CLARIN Tool Portal

Active filters:

Tool task: Lemmatisation

66 record(s) found

Search results

The CLASSLA-Stanza model for lemmatisation of standard Slovenian 2.0

2 resources

This model for lemmatisation of standard Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SUK training corpus (http://hdl.handle.net/11356/1747) and using the CLARIN.SI-embed.sl word embeddings (http://hdl.handle.net/11356/1204) expanded with the MaCoCu-sl Slovene web corpus (http://hdl.handle.net/11356/1517). The estimated F1 of the lemma annotations is ~99.11. Unlike the previous version, this model is trained on the SUK training corpus and uses new embeddings as well as the new version of the Slovene morphological lexicon, Sloleks 3.0 (http://hdl.handle.net/11356/1745).
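Each description quotes an estimated F1 for the lemma annotations. As an illustration of what such a score measures, here is a minimal sketch that compares predicted and gold lemmas token by token, assuming the tokenisation is already aligned. The helper `lemma_f1` and the toy sentence are hypothetical, and this is not the evaluation procedure behind the reported figures.

```python
from typing import Dict


def lemma_f1(gold: Dict[int, str], pred: Dict[int, str]) -> float:
    """Lemma F1 over aligned token positions (hypothetical helper).

    gold and pred map a token index to its lemma; a token missing from
    pred lowers recall, a token missing from gold lowers precision.
    """
    correct = sum(1 for i, lemma in pred.items() if gold.get(i) == lemma)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy Slovenian sentence (invented data): "Mačke so spale na oknu."
gold = {0: "mačka", 1: "biti", 2: "spati", 3: "na", 4: "okno", 5: "."}
pred = {0: "mačka", 1: "biti", 2: "spati", 3: "na", 4: "okna", 5: "."}
print(f"lemma F1 = {100 * lemma_f1(gold, pred):.2f}")  # lemma F1 = 83.33
```

When every token receives exactly one lemma, precision, recall and F1 all reduce to lemma accuracy; they diverge only when tokens are missing or extra.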

The CLASSLA-Stanza model for lemmatisation of non-standard Croatian 2.1

2 resources

The model for lemmatisation of non-standard Croatian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the hr500k training corpus (http://hdl.handle.net/11356/1792) and the ReLDI-NormTagNER-hr corpus (http://hdl.handle.net/11356/1793), using the hrLex inflectional lexicon (http://hdl.handle.net/11356/1232). To handle missing diacritics, the corpora were additionally augmented by repeating parts of them with the diacritics removed. The estimated F1 of the lemma annotations is ~94.23. Unlike the previous version, this version is trained on a combination of two corpora (hr500k and ReLDI-NormTagNER-hr).
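The diacritic augmentation described above can be sketched as follows, assuming each sentence is a list of (word form, lemma) pairs. A random share of sentences is copied with č, ć, š, ž and đ replaced by plain letters, while the lemmas keep their diacritics so that the model learns to restore them. The 20% share, the seed, the mapping of đ to "dj" and the names `strip_diacritics` and `augment` are illustrative assumptions; the description does not say which parts of the corpora were repeated.

```python
import random
from typing import List, Tuple

# Croatian/Serbian Latin diacritics and common plain-ASCII stand-ins.
# Mapping đ to "dj" is an assumption; some writers use a plain "d".
DIACRITIC_MAP = str.maketrans({
    "č": "c", "ć": "c", "š": "s", "ž": "z", "đ": "dj",
    "Č": "C", "Ć": "C", "Š": "S", "Ž": "Z", "Đ": "Dj",
})

Sentence = List[Tuple[str, str]]  # (word form, gold lemma) pairs


def strip_diacritics(text: str) -> str:
    """Replace diacritic letters with their plain stand-ins."""
    return text.translate(DIACRITIC_MAP)


def augment(corpus: List[Sentence], share: float = 0.2, seed: int = 0) -> List[Sentence]:
    """Append diacritic-free copies of a random share of the sentences.

    Only word forms are stripped; lemmas are left untouched (an assumption
    about the original setup).
    """
    rng = random.Random(seed)
    extra = [
        [(strip_diacritics(form), lemma) for form, lemma in sent]
        for sent in corpus
        if rng.random() < share
    ]
    return corpus + extra


corpus = [[("Išla", "ići"), ("sam", "biti"), ("kući", "kuća")]]
for sent in augment(corpus, share=1.0):
    print(" ".join(form for form, _ in sent))
# Išla sam kući
# Isla sam kuci
```

`str.maketrans` is used instead of Unicode decomposition because "đ" has no canonical decomposition into "d" plus a combining mark, so NFD-based stripping would leave it unchanged.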

Universal Dependencies 2.10 models for UDPipe 2 (2022-07-11)

2 resources

Tokenizer, POS Tagger, Lemmatizer, and Parser models for 123 treebanks in 69 languages of the Universal Dependencies 2.10 treebanks, created solely using UD 2.10 data (https://hdl.handle.net/11234/1-4758). The model documentation, including performance, can be found at https://ufal.mff.cuni.cz/udpipe/2/models#universal_dependencies_210_models . To use these models, you need UDPipe version 2.0, which you can download from https://ufal.mff.cuni.cz/udpipe/2 .
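UDPipe writes its analyses in the CoNLL-U format, where the lemma is the third (LEMMA) column. A minimal standard-library sketch for pulling (form, lemma) pairs out of such output follows. The function name `conllu_lemmas` is hypothetical, and the sample sentence is hand-written and simplified (HEAD and DEPREL are left as "_"), not real model output.

```python
from typing import List, Tuple


def conllu_lemmas(conllu: str) -> List[List[Tuple[str, str]]]:
    """Collect (FORM, LEMMA) pairs for every sentence in CoNLL-U text."""
    sentences: List[List[Tuple[str, str]]] = []
    current: List[Tuple[str, str]] = []
    for line in conllu.splitlines():
        if not line.strip():  # a blank line closes the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):  # sentence-level comments such as "# text = ..."
            continue
        cols = line.split("\t")
        token_id = cols[0]
        if "-" in token_id or "." in token_id:
            continue  # skip multiword-token ranges (2-3) and empty nodes (8.1)
        current.append((cols[1], cols[2]))  # columns 2 and 3: FORM and LEMMA
    if current:  # tolerate input without a trailing blank line
        sentences.append(current)
    return sentences


def _row(*cols: str) -> str:
    """Build a 10-column CoNLL-U line, padding unused columns with "_"."""
    return "\t".join(cols + ("_",) * (10 - len(cols)))


# Simplified Spanish sample: "del" is a multiword token split into "de" + "el".
SAMPLE = "\n".join([
    "# text = Vengo del mercado.",
    _row("1", "Vengo", "venir", "VERB"),
    _row("2-3", "del"),
    _row("2", "de", "de", "ADP"),
    _row("3", "el", "el", "DET"),
    _row("4", "mercado", "mercado", "NOUN"),
    _row("5", ".", ".", "PUNCT"),
    "",
])

print(conllu_lemmas(SAMPLE))
# [[('Vengo', 'venir'), ('de', 'de'), ('el', 'el'), ('mercado', 'mercado'), ('.', '.')]]
```

Multiword-token lines are skipped because lemmas belong to the syntactic words listed beneath them.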

The CLASSLA-Stanza model for lemmatisation of non-standard Serbian 2.1

2 resources

The model for lemmatisation of non-standard Serbian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SETimes.SR training corpus (http://hdl.handle.net/11356/1200) and the ReLDI-NormTagNER-sr corpus (http://hdl.handle.net/11356/1794), using the srLex inflectional lexicon (http://hdl.handle.net/11356/1233). To handle missing diacritics, the corpora were additionally augmented by repeating parts of them with the diacritics removed. The estimated F1 of the lemma annotations is ~94.92. Unlike the previous version, this version is trained on a combination of two corpora (SETimes.SR and ReLDI-NormTagNER-sr).

Czech PDT-C 1.0 Model for UDPipe 2 (2023-11-16)

2 resources

Tokenizer, POS Tagger, Lemmatizer, and Parser model based on the PDT-C 1.0 treebank (https://hdl.handle.net/11234/1-3185). The model documentation, including performance, can be found at https://ufal.mff.cuni.cz/udpipe/2/models#czech_pdtc1.0_model . To use this model, you need UDPipe version 2.1, which you can download from https://ufal.mff.cuni.cz/udpipe/2 .
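Besides a local installation, UDPipe 2 models can be tried through the public LINDAT REST service. The sketch below assumes its `process` endpoint at the URL shown and its `model`, `tokenizer`, `tagger` and `data` form parameters; check the service's API reference before relying on them. The helper names are hypothetical, and the exact model identifier for PDT-C 1.0 is left as a placeholder rather than guessed. As far as we know, UDPipe 2 produces lemmas in the tagger step, so the parser is not requested here.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the public LINDAT UDPipe 2 REST service.
UDPIPE_URL = "https://lindat.mff.cuni.cz/services/udpipe/api/process"


def build_udpipe_request(text: str, model: str) -> urllib.request.Request:
    """Build a POST request asking UDPipe 2 to tokenize, tag and lemmatize text.

    Assumption: an empty value for "tokenizer" or "tagger" enables that step
    with default options; "parser" is omitted because only lemmas are needed.
    """
    form = {"model": model, "tokenizer": "", "tagger": "", "data": text}
    body = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(UDPIPE_URL, data=body, method="POST")


def run_udpipe(text: str, model: str) -> str:
    """Send the request and return the CoNLL-U string from the JSON reply."""
    with urllib.request.urlopen(build_udpipe_request(text, model)) as resp:
        return json.load(resp)["result"]


# Example (requires network access; replace the placeholder with a model
# name from the service's model list):
# print(run_udpipe("Dobrý den.", "<czech-pdtc-model-name>"))
```

Building the request separately from sending it keeps the parameter handling testable offline.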

The Model latinpipe-evalatin24-240520 for LatinPipe 2024

2 resources

The latinpipe-evalatin24-240520 model is a PhilBerta-based model for LatinPipe 2024 (https://github.com/ufal/evalatin2024-latinpipe) that performs tagging, lemmatization, and dependency parsing of Latin and is based on the winning entry to the EvaLatin 2024 shared task (https://circse.github.io/LT4HALA/2024/EvaLatin). It is released under the CC BY-NC-SA 4.0 license.

The CLASSLA-StanfordNLP model for lemmatisation of standard Bulgarian 1.0

2 resources

The model for lemmatisation of standard Bulgarian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the BulTreeBank training corpus (http://hdl.handle.net/11495/D93F-C6E9-65D9-2) and using the Bulgarian inflectional lexicon (Popov, Simov, and Vidinska 1998). The estimated F1 of the lemma annotations is ~98.8.

The CLASSLA-Stanza model for lemmatisation of spoken Slovenian 2.2

2 resources

This model for lemmatisation of spoken Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SST treebank of spoken Slovenian (https://github.com/UniversalDependencies/UD_Slovenian-SST) combined with the SUK training corpus (http://hdl.handle.net/11356/1959) and using the CLARIN.SI-embed.sl word embeddings (http://hdl.handle.net/11356/1791) that were expanded with the MaCoCu-sl Slovene web corpus (http://hdl.handle.net/11356/1517). The estimated F1 of the lemma annotations is ~99.23.

The CLASSLA-StanfordNLP model for lemmatisation of standard Serbian 1.2

2 resources

The model for lemmatisation of standard Serbian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the SETimes.SR training corpus (http://hdl.handle.net/11356/1200) and using the srLex inflectional lexicon (http://hdl.handle.net/11356/1233). The estimated F1 of the lemma annotations is ~97.9. Unlike the previous version, it now relies solely on XPOS annotations rather than on a combination of UPOS and FEATS (for lexicon lookup) and XPOS (for lemma prediction) annotations.
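One possible reading of this change is sketched below: the lemma is looked up in an inflectional lexicon (such as srLex) under the pair of word form and XPOS tag, and only pairs missing from the lexicon are passed to the trained lemmatiser. The lexicon entries, the MULTEXT-East-style tags, and the `predict_lemma` fallback are illustrative stand-ins, not the actual CLASSLA-StanfordNLP code.

```python
from typing import Callable, Dict, Tuple

# Hypothetical lexicon excerpt: (lowercased form, XPOS) -> lemma.
# The MULTEXT-East-style tags and entries are illustrative only.
LEXICON: Dict[Tuple[str, str], str] = {
    ("kuće", "Ncfsg"): "kuća",    # genitive singular
    ("kuće", "Ncfpn"): "kuća",    # nominative plural
    ("radili", "Vmp-pm"): "raditi",
}


def lemmatise(form: str, xpos: str, predict_lemma: Callable[[str, str], str]) -> str:
    """Prefer the lexicon entry for (form, XPOS); otherwise ask the model."""
    lemma = LEXICON.get((form.lower(), xpos))
    if lemma is not None:
        return lemma
    return predict_lemma(form, xpos)


def identity_model(form: str, xpos: str) -> str:
    """Stand-in for the neural lemmatiser: returns the form unchanged."""
    return form


print(lemmatise("Kuće", "Ncfsg", identity_model))  # kuća
print(lemmatise("grad", "Ncmsn", identity_model))  # grad (not in the lexicon)
```

Keying both the lookup and the fallback on a single XPOS tag means one annotation layer drives the whole decision, instead of splitting it across UPOS and FEATS for lookup and XPOS for prediction.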

The CLASSLA-StanfordNLP model for lemmatisation of standard Macedonian 1.0

2 resources

The model for lemmatisation of standard Macedonian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the 1984 training corpus (to be published). The estimated F1 of the lemma annotations is ~99.1.
