This model for named entity recognition of non-standard Croatian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the hr500k training corpus (http://hdl.handle.net/11356/1183), the ReLDI-NormTagNER-hr corpus (http://hdl.handle.net/11356/1241) and the ReLDI-NormTagNER-sr corpus (http://hdl.handle.net/11356/1240), using the CLARIN.SI-embed.hr word embeddings (http://hdl.handle.net/11356/1205). The training corpora were additionally augmented for handling missing diacritics by repeating parts of the corpora with diacritics removed.
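The diacritic augmentation described above can be sketched roughly as follows. This is a minimal illustration, not the procedure actually used to build the model: the character mapping (in particular "đ" to "d" rather than "dj"), the `fraction` parameter, and the function names `strip_diacritics` and `augment` are all assumptions introduced here.

```python
import random

# Assumed mapping for Croatian/Serbian Latin letters. Mapping "đ" to "d" is one
# common convention ("dj" is another); it is chosen here for illustration only.
DIACRITIC_MAP = str.maketrans({
    "č": "c", "ć": "c", "š": "s", "ž": "z", "đ": "d",
    "Č": "C", "Ć": "C", "Š": "S", "Ž": "Z", "Đ": "D",
})


def strip_diacritics(token: str) -> str:
    """Replace each diacritic letter with its plain ASCII counterpart."""
    return token.translate(DIACRITIC_MAP)


def augment(sentences, fraction=0.5, seed=0):
    """Return the original sentences plus diacritic-free copies of a random subset.

    Each sentence is a list of (token, label) pairs. Labels are kept unchanged,
    so the copies teach the model to tag text written without diacritics.
    The fraction of sentences to repeat is a hypothetical parameter.
    """
    rng = random.Random(seed)
    k = int(len(sentences) * fraction)
    sampled = rng.sample(sentences, k)
    copies = [
        [(strip_diacritics(tok), label) for tok, label in sent]
        for sent in sampled
    ]
    return list(sentences) + copies


if __name__ == "__main__":
    corpus = [
        [("Šibenik", "B-loc"), ("čeka", "O"), ("Đuru", "B-per")],
        [("Zagreb", "B-loc"), ("spava", "O")],
    ]
    for sentence in augment(corpus, fraction=1.0):
        print(sentence)
```

An explicit mapping is used instead of Unicode decomposition because "đ" has no canonical decomposition and would survive a simple strip of combining marks.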
The "Samrómur-Adolescents Kaldi Recipe 22.06" is a code recipe intended to
show how to integrate the adolescent portion of the corpus "Samrómur
Children's Icelandic Speech Data 21.09" [1] and the "Icelandic Language
Models with Pronunciations 22.01" [2] to create automatic speech recognition
systems using the Kaldi toolkit [3].
VIADAT-STAT is a VIADAT module for statistical analysis of the recordings stored by the platform.
Developed in cooperation with ÚSD AV ČR and NFA.
ZRCalo is an open font meant to gradually phase out the ZRCola font as one of the components of the ZRCola 2 input system (http://hdl.handle.net/11356/1090). The current version is a baseline variant covering the basic Latin Unicode blocks. Future versions will aim to build on Unicode's combining character mechanism to replace ZRCola's extensive use of the Private Use Area.
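To illustrate the difference between the two encoding approaches mentioned above, the following sketch uses Python's standard `unicodedata` module to build an accented letter from a base letter plus combining marks, and to test whether a code point lies in a Private Use Area. The helper names `compose` and `is_private_use` are introduced here for illustration and are not part of ZRCola or ZRCalo.

```python
import unicodedata


def is_private_use(ch: str) -> bool:
    """True if the character has general category Co (Private Use)."""
    return unicodedata.category(ch) == "Co"


def compose(base: str, *marks: str) -> str:
    """Join a base letter with combining marks and normalize to NFC.

    NFC merges the sequence into a precomposed character where Unicode
    defines one, and leaves the remaining marks as combining characters.
    """
    return unicodedata.normalize("NFC", base + "".join(marks))


if __name__ == "__main__":
    acute = "\u0301"      # COMBINING ACUTE ACCENT
    dot_below = "\u0323"  # COMBINING DOT BELOW

    single = compose("e", acute)
    stacked = compose("e", dot_below, acute)

    print([f"U+{ord(c):04X}" for c in single])
    print([f"U+{ord(c):04X}" for c in stacked])
    print(any(is_private_use(c) for c in stacked))
    print(is_private_use("\ue000"))
```

Text encoded this way stays within standard Unicode, so it remains searchable and portable across fonts that support combining marks, whereas Private Use Area code points only display as intended with a font that assigns glyphs to them.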
This model for UD dependency parsing of standard Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SUK training corpus (http://hdl.handle.net/11356/1747) and using the CLARIN.SI-embed.sl word embeddings (http://hdl.handle.net/11356/1204) expanded with the MaCoCu-sl Slovene web corpus (http://hdl.handle.net/11356/1517). The estimated LAS of the parser is ~90.42.
Compared with the previous version, this model was trained on the improved SUK 1.1 version of the training corpus.
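For readers unfamiliar with the metric, the labelled attachment score (LAS) is the percentage of tokens that receive both the correct head and the correct dependency relation. The sketch below shows the basic computation under the assumption that gold and predicted analyses share the same tokenization. The function name `attachment_scores` is illustrative, and the evaluation actually used to obtain the reported figure may differ in detail.

```python
def attachment_scores(gold, pred):
    """Compute (UAS, LAS) in percent for aligned token sequences.

    gold, pred: lists of (head_index, deprel) tuples, one per token,
    assumed to be in the same order with identical tokenization.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must have equal length")
    if not gold:
        raise ValueError("cannot score an empty sequence")

    total = len(gold)
    # UAS counts a token as correct when only the head matches.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS additionally requires the dependency relation to match.
    las_hits = sum(g == p for g, p in zip(gold, pred))
    return 100.0 * uas_hits / total, 100.0 * las_hits / total


if __name__ == "__main__":
    gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
    pred = [(2, "nsubj"), (0, "root"), (2, "iobj")]
    uas, las = attachment_scores(gold, pred)
    print(f"UAS = {uas:.2f}, LAS = {las:.2f}")
```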
The "Samrómur DeepSpeech Recipe 22.06" is a code recipe intended to show how to integrate the corpus "Samromur 21.05" [1] and the "DeepSpeech Scorer for Icelandic 22.06" [2] to create automatic speech recognition systems using Mozilla's DeepSpeech recognizer [3].
This model for morphosyntactic annotation of non-standard Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SUK training corpus (http://hdl.handle.net/11356/1747) and the Janes-Tag corpus (http://hdl.handle.net/11356/1732), using the CLARIN.SI-embed.sl word embeddings (http://hdl.handle.net/11356/1204) that were expanded with the MaCoCu-sl Slovene web corpus (http://hdl.handle.net/11356/1517). These corpora were additionally augmented for handling missing diacritics by repeating parts of the corpora with diacritics removed. The model simultaneously produces UPOS, FEATS and XPOS (MULTEXT-East) labels. The estimated F1 of the XPOS annotations is ~92.17.
Compared with the previous version, this model was trained on the SUK training corpus and version 3.0 of Janes-Tag, and uses new embeddings and the new version of the Slovene morphological lexicon, Sloleks 3.0 (http://hdl.handle.net/11356/1745).
This model for morphosyntactic annotation of standard Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SUK training corpus (http://hdl.handle.net/11356/1747) and using the CLARIN.SI-embed.sl word embeddings (http://hdl.handle.net/11356/1204) that were expanded with the MaCoCu-sl Slovene web corpus (http://hdl.handle.net/11356/1517). The model simultaneously produces UPOS, FEATS and XPOS (MULTEXT-East) labels. The estimated F1 of the XPOS annotations is ~98.27.
Compared with the previous version, this model was trained on the SUK training corpus and uses new embeddings and the new version of the Slovene morphological lexicon, Sloleks 3.0 (http://hdl.handle.net/11356/1745).