Active filters:

  • Tool task: Tokenisation
  • Language: Slovenian
13 record(s) found

Search results

  • Universal Dependencies 2.10 models for UDPipe 2 (2022-07-11)

    Tokenizer, POS Tagger, Lemmatizer and Parser models for 123 treebanks of 69 languages of Universal Dependencies 2.10 Treebanks, created solely using UD 2.10 data (https://hdl.handle.net/11234/1-4758). The model documentation, including performance, can be found at https://ufal.mff.cuni.cz/udpipe/2/models#universal_dependencies_210_models . To use these models, you need UDPipe version 2.0, which you can download from https://ufal.mff.cuni.cz/udpipe/2 .
  • Universal Dependencies 2.15 models for UDPipe 2 (2024-11-21)

    Tokenizer, POS Tagger, Lemmatizer and Parser models for 147 treebanks of 78 languages of Universal Dependencies 2.15 Treebanks, created solely using UD 2.15 data (https://hdl.handle.net/11234/1-5787). The model documentation, including performance, can be found at https://ufal.mff.cuni.cz/udpipe/2/models#universal_dependencies_215_models . To use these models, you need UDPipe version 2.0, which you can download from https://ufal.mff.cuni.cz/udpipe/2 .
  • Universal Dependencies 2.12 models for UDPipe 2 (2023-07-17)

    Tokenizer, POS Tagger, Lemmatizer and Parser models for 131 treebanks of 72 languages of Universal Dependencies 2.12 Treebanks, created solely using UD 2.12 data (https://hdl.handle.net/11234/1-5150). The model documentation, including performance, can be found at https://ufal.mff.cuni.cz/udpipe/2/models#universal_dependencies_212_models . To use these models, you need UDPipe version 2.0, which you can download from https://ufal.mff.cuni.cz/udpipe/2 .
  • Universal Dependencies 2.4 Models for UDPipe (2019-05-31)

    Tokenizer, POS Tagger, Lemmatizer and Parser models for 90 treebanks of 60 languages of Universal Dependencies 2.4 Treebanks, created solely using UD 2.4 data (http://hdl.handle.net/11234/1-2988). The model documentation, including performance, can be found at http://ufal.mff.cuni.cz/udpipe/models#universal_dependencies_24_models . To use these models, you need the UDPipe binary, version 1.2 or later, which you can download from http://ufal.mff.cuni.cz/udpipe . In addition to the models themselves, all additional data and the hyperparameter values used for training are available in the second archive, allowing reproducible training.
  • Universal Dependencies 2.6 models for UDPipe 2 (2020-08-31)

    Tokenizer, POS Tagger, Lemmatizer and Parser models for 99 treebanks of 63 languages of Universal Dependencies 2.6 Treebanks, created solely using UD 2.6 data (https://hdl.handle.net/11234/1-3226). The model documentation, including performance, can be found at https://ufal.mff.cuni.cz/udpipe/2/models#universal_dependencies_26_models . To use these models, you need UDPipe version 2.0, which you can download from https://ufal.mff.cuni.cz/udpipe/2 .
  • Universal Dependencies 2.5 Models for UDPipe (2019-12-06)

    Tokenizer, POS Tagger, Lemmatizer and Parser models for 94 treebanks of 61 languages of Universal Dependencies 2.5 Treebanks, created solely using UD 2.5 data (http://hdl.handle.net/11234/1-3105). The model documentation, including performance, can be found at http://ufal.mff.cuni.cz/udpipe/models#universal_dependencies_25_models . To use these models, you need the UDPipe binary, version 1.2 or later, which you can download from http://ufal.mff.cuni.cz/udpipe . In addition to the models themselves, all additional data and the hyperparameter values used for training are available in the second archive, allowing reproducible training.
  • Trankit model for SST 2.15

    This is a retrained Slovenian model for the Trankit v1.1.1 library for multilingual natural language processing (https://pypi.org/project/trankit/), trained on the SST treebank of spoken Slovenian (UD v2.15, https://github.com/UniversalDependencies/UD_Slovenian-SST/tree/dev) featuring transcriptions of spontaneous speech in various everyday settings. It is able to predict sentence segmentation, tokenization, lemmatization, language-specific morphological annotation (MULTEXT-East morphosyntactic tags), as well as universal part-of-speech tagging, morphological feature prediction, and dependency parses in accordance with the Universal Dependencies annotation scheme (https://universaldependencies.org/). Please note that this model has been published for archiving purposes only. For production use, we recommend the state-of-the-art Trankit model available here: http://hdl.handle.net/11356/1965. The latter was trained on both spoken (SST) and written (SSJ) data and performs significantly better than the model featured in this submission.
  • The Trankit model for linguistic processing of standard Slovenian

    This is a retrained standard Slovenian model for the Trankit v1.1.1 library (https://pypi.org/project/trankit/). It is able to predict sentence segmentation, tokenization, lemmatization, language-specific morphological annotation (MULTEXT-East morphosyntactic tags), as well as universal part-of-speech tagging, feature prediction, and dependency parsing in accordance with the Universal Dependencies annotation scheme (https://universaldependencies.org/). The model was trained using a dataset published by Universal Dependencies in release 2.12 (https://github.com/UniversalDependencies/UD_Slovenian-SSJ/tree/r2.12). Because it was trained on a larger dataset than the original Trankit v1.1.1 model, this version yields better results and achieves state-of-the-art parsing performance for Slovenian (https://slobench.cjvt.si/leaderboard/view/11). To use this model, please follow the instructions provided in our GitHub repository (https://github.com/clarinsi/trankit-train) or refer to the Trankit documentation (https://trankit.readthedocs.io/en/latest/training.html#loading). This ZIP file contains models for both xlm-roberta-large (which delivers better performance but requires more hardware resources) and xlm-roberta-base.
  • The Trankit model for linguistic processing of standard written Slovenian 1.1

    This is a retrained Slovenian model for the Trankit v1.1.1 library for multilingual natural language processing (https://pypi.org/project/trankit/), trained on the reference SSJ UD treebank featuring fiction, non-fiction, periodical and Wikipedia texts in standard modern Slovenian. It is able to predict sentence segmentation, tokenization, lemmatization, language-specific morphological annotation (MULTEXT-East morphosyntactic tags), as well as universal part-of-speech tagging, morphological features, and dependency parses in accordance with the Universal Dependencies annotation scheme (https://universaldependencies.org/). The model was trained using a dataset published by Universal Dependencies in release 2.14 (https://github.com/UniversalDependencies/UD_Slovenian-SSJ/tree/r2.14). To use this model, please follow the instructions provided in our GitHub repository (https://github.com/clarinsi/trankit-train) or refer to the Trankit documentation (https://trankit.readthedocs.io/en/latest/training.html#loading). This ZIP file contains models for both xlm-roberta-large (which delivers better performance but requires more hardware resources) and xlm-roberta-base. This version was trained on a newer, slightly improved version of the SSJ UD treebank (UD v2.14) than the previous version of the model and produces similar results.
  • Trankit model for SST 2.15 1.1

    This is a retrained Slovenian model for the Trankit v1.1.1 library for multilingual natural language processing (https://pypi.org/project/trankit/), trained on the SST treebank of spoken Slovenian (UD v2.15, https://github.com/UniversalDependencies/UD_Slovenian-SST/tree/r2.15) featuring transcriptions of spontaneous speech in various everyday settings. It is able to predict sentence segmentation, tokenization, lemmatization, language-specific morphological annotation (MULTEXT-East morphosyntactic tags), as well as universal part-of-speech tagging, morphological feature prediction, and dependency parses in accordance with the Universal Dependencies annotation scheme (https://universaldependencies.org/). Please note that this model has been published for archiving purposes only. For production use, we recommend the state-of-the-art Trankit model available here: http://hdl.handle.net/11356/1965 (v1.2 or newer). The latter was trained on both spoken (SST) and written (SSJ) data and performs significantly better than the model featured in this submission. Compared with version 1.0, this model was trained on the new train-dev-test split of the SST treebank introduced in UD release v2.15.
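
The UDPipe 2 records above point to the UDPipe 2 download page. UD models of this kind can also be used through LINDAT's hosted UDPipe 2 REST service. The sketch below only builds a request URL and sends nothing. The endpoint, the parameter names (model, tokenizer, tagger, parser, data), and the assumption that a language name such as "slovenian" is accepted as a model name come from memory of the service documentation, not from the records here. Verify them, and look up exact model names through the service's models endpoint, before relying on this. The helper name build_udpipe_request is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint of the public UDPipe 2 REST service hosted by LINDAT.
UDPIPE_API = "https://lindat.mff.cuni.cz/services/udpipe/api/process"


def build_udpipe_request(text: str, model: str = "slovenian", tokenize_only: bool = True) -> str:
    """Return a GET URL asking the UDPipe 2 service to process `text`.

    `model` is a placeholder (assumption); check the service's list of
    models for the exact name of the UD 2.x Slovenian model you want.
    """
    # An empty value is assumed to be enough to switch a processing stage on.
    params = {"model": model, "tokenizer": "", "data": text}
    if not tokenize_only:
        # Also request POS tagging/lemmatization and dependency parsing.
        params["tagger"] = ""
        params["parser"] = ""
    return UDPIPE_API + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_udpipe_request("Dober dan. Kako si?"))
```

For longer texts a POST request is preferable to GET because of URL length limits. The service's JSON reply is expected to carry the CoNLL-U output in a result field.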
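
Both the UDPipe 1.x binary and UDPipe 2 produce CoNLL-U output by default. Since the active filter here is tokenisation, the following standard-library sketch recovers the surface tokens of each sentence from such output. It keeps a multiword token (an ID range such as 1-2) as one surface token, skips the syntactic words that range covers, and ignores empty nodes (decimal IDs). The function name conllu_tokens is hypothetical.

```python
def conllu_tokens(conllu: str) -> list[list[str]]:
    """Group the surface tokens of a CoNLL-U string by sentence."""
    sentences: list[list[str]] = []
    current: list[str] = []
    covered = 0  # highest word ID already covered by a multiword token
    for line in conllu.splitlines():
        if not line.strip():
            # A blank line ends a sentence; word IDs restart at 1.
            if current:
                sentences.append(current)
            current, covered = [], 0
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as "# text = ..."
        cols = line.split("\t")
        token_id, form = cols[0], cols[1]
        if "-" in token_id:
            # Multiword token range "m-n": keep its surface form once.
            current.append(form)
            covered = int(token_id.split("-")[1])
        elif "." in token_id:
            continue  # empty node: no surface form
        elif int(token_id) > covered:
            current.append(form)
    if current:
        sentences.append(current)  # last sentence without a trailing blank line
    return sentences
```

Feeding it the CoNLL-U produced for "Dober dan." yields [["Dober", "dan", "."]].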
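
For the retrained Trankit models, the records defer to the clarinsi/trankit-train repository and to the "Loading" section of the Trankit documentation. The following is a minimal sketch modelled on the customized-pipeline loading that documentation describes, assuming the ZIP has been unpacked into save_dir. The category value "customized" and the directory layout are assumptions, because the correct category depends on how the model was trained. load_pipeline and embedding_name are hypothetical helper names. Trankit is imported inside the function so the encoder-selection helper works without Trankit installed.

```python
def embedding_name(large: bool) -> str:
    # The ZIP ships models for both encoders; large is more accurate but heavier.
    return "xlm-roberta-large" if large else "xlm-roberta-base"


def load_pipeline(save_dir: str, large: bool = False, category: str = "customized"):
    """Load a retrained Trankit model unpacked into `save_dir`.

    Assumption: `category` must match the category used at training time;
    check the clarinsi/trankit-train instructions for the right value.
    """
    import trankit  # lazy import: only needed when actually loading a model

    emb = embedding_name(large)
    # Check that the expected model files are present before loading.
    trankit.verify_customized_pipeline(category=category, save_dir=save_dir, embedding_name=emb)
    return trankit.Pipeline(lang=category, cache_dir=save_dir, embedding=emb)
```

A call such as load_pipeline("./trankit-sl").tokenize("Dober dan. Kako si?") should then return Trankit's dictionary of sentences and tokens. The "./trankit-sl" path is a placeholder.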