CLARIN Tool Portal

Active filters:

Project: Development of Slovene in a Digital Environment
Keywords: lemmatisation

3 record(s) found

Search results

The CLASSLA-Stanza model for lemmatisation of standard Slovenian 2.0

2 resources

This model for lemmatisation of standard Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SUK training corpus (http://hdl.handle.net/11356/1747) and using the CLARIN.SI-embed.sl word embeddings (http://hdl.handle.net/11356/1204) expanded with the MaCoCu-sl Slovene web corpus (http://hdl.handle.net/11356/1517). The estimated F1 of the lemma annotations is ~99.11. Compared with the previous version, this model was trained on the SUK training corpus and uses new embeddings and the new version of the Slovene morphological lexicon, Sloleks 3.0 (http://hdl.handle.net/11356/1745).
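As a rough illustration of how a model of this kind is typically used, the sketch below loads the default standard Slovenian pipeline through the classla Python package and reads the lemma of each word. It assumes the package is installed and that network access is available for the download; which model version classla.download fetches is not stated in this record, and the helper names build_standard_pipeline and lemmatise are hypothetical.

```python
def build_standard_pipeline():
    """Download and load the default standard Slovenian pipeline (assumed setup)."""
    # Imported lazily so that lemmatise() below can be used without classla installed.
    import classla

    classla.download("sl")  # needs network access; the version fetched is not pinned here
    # The default pipeline runs further processors besides lemmatisation.
    return classla.Pipeline("sl")


def lemmatise(text, nlp):
    """Return (word form, lemma) pairs for every word in `text`.

    `nlp` is any callable that returns a Stanza-style document, i.e. an object
    with `.sentences`, each holding `.words` that carry `.text` and `.lemma`.
    """
    doc = nlp(text)
    return [
        (word.text, word.lemma)
        for sentence in doc.sentences
        for word in sentence.words
    ]
```

With the package installed, a call could look like lemmatise("France Prešeren je rojen v Vrbi.", build_standard_pipeline()).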

The CLASSLA-Stanza model for lemmatisation of spoken Slovenian 2.2

2 resources

This model for lemmatisation of spoken Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SST treebank of spoken Slovenian (https://github.com/UniversalDependencies/UD_Slovenian-SST) combined with the SUK training corpus (http://hdl.handle.net/11356/1959) and using the CLARIN.SI-embed.sl word embeddings (http://hdl.handle.net/11356/1791) that were expanded with the MaCoCu-sl Slovene web corpus (http://hdl.handle.net/11356/1517). The estimated F1 of the lemma annotations is ~99.23.

The CLASSLA-Stanza model for lemmatisation of non-standard Slovenian 2.1

2 resources

This model for lemmatisation of non-standard Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SUK training corpus (http://hdl.handle.net/11356/1747) and the Janes-Tag corpus (http://hdl.handle.net/11356/1732), using the CLARIN.SI-embed.sl word embeddings (http://hdl.handle.net/11356/1204) that were expanded with the MaCoCu-sl Slovene web corpus (http://hdl.handle.net/11356/1517). The training corpora were additionally augmented to handle missing diacritics by repeating parts of them with the diacritics removed. The estimated F1 of the lemma annotations is ~91.45. Compared with the previous version, this model was trained on the SUK training corpus and version 3.0 of Janes-Tag, and uses new embeddings and the new version of the Slovene morphological lexicon, Sloleks 3.0 (http://hdl.handle.net/11356/1745).
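The diacritic augmentation mentioned above can be pictured with a minimal sketch. This is not the authors' actual procedure: the function names, the fraction parameter, the (form, lemma) data layout and the choice to leave the gold lemmas unchanged are all assumptions made for illustration.

```python
import random
import unicodedata


def strip_diacritics(text):
    """Remove combining marks, e.g. 'č' -> 'c', 'š' -> 's', 'ž' -> 'z'."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def augment_without_diacritics(sentences, fraction=0.5, seed=0):
    """Append diacritic-free copies of a random share of the sentences.

    `sentences` is a list of sentences, each a list of (form, lemma) pairs.
    Only the word forms are stripped; the lemmas are kept as they are
    (an assumption of this sketch), so a model trained on the result sees
    'casa' paired with the lemma 'čaša'.
    """
    rng = random.Random(seed)
    augmented = list(sentences)
    for tokens in sentences:
        if rng.random() < fraction:
            augmented.append(
                [(strip_diacritics(form), lemma) for form, lemma in tokens]
            )
    return augmented
```

Decomposition handles the Slovenian letters č, š and ž; characters that have no Unicode decomposition, such as đ, would pass through unchanged.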

