Result filters

Metadata provider

  • DSpace

Language

Resource type

Availability

Active filters:

  • Metadata provider: DSpace
Loading...
419 record(s) found

Search results

  • Piper TTS (VITS) models for Talrómur1

    Trained models for four voices from the Talrómur [1] corpus trained with VITS [2] and exported to the onnxruntime [3] for Piper TTS [4]. The four voices are Búi, Salka, Steinn and Ugla. Módel fyrir fjórar raddir úr Talrómi [1]. Raddirnar eru þjálfaðar með VITS [2] og varpað í onnxruntime [3] skrá fyrir Piper TTS [4] verkefnið. Raddirnar fjórar eru Búi, Salka, Steinn og Ugla. [1] http://hdl.handle.net/20.500.12537/104 [2] https://github.com/jaywalnut310/vits/ [3] https://onnxruntime.ai/ [4] https://github.com/rhasspy/piper
  • Word embeddings CLARIN.SI-embed.mk 2.0

    CLARIN.SI-embed.mk contains word embeddings induced from a large collection of Macedonian texts crawled from the .mk top-level domain. The embeddings are based on the skip-gram model of fastText trained on 933,231,582 tokens of running text for 986,670 lowercased surface forms. The difference to the previous version of the embeddings is that this version was trained on the original dataset expanded with the MaCoCu-mk web crawl corpus (http://hdl.handle.net/11356/1512).
  • Slavic Forest, Norwegian Wood (models)

    Trained models for UDPipe used to produce our final submission to the Vardial 2017 CLP shared task (https://bitbucket.org/hy-crossNLP/vardial2017). The SK model was trained on CS data, the HR model on SL data, and the SV model on a concatenation of DA and NO data. The scripts and commands used to create the models are part of separate submission (http://hdl.handle.net/11234/1-1970). The models were trained with UDPipe version 3e65d69 from 3rd Jan 2017, obtained from https://github.com/ufal/udpipe -- their functionality with newer or older versions of UDPipe is not guaranteed. We list here the Bash command sequences that can be used to reproduce our results submitted to VarDial 2017. The input files must be in CoNLLU format. The models only use the form, UPOS, and Universal Features fields (SK only uses the form). You must have UDPipe installed. The feats2FEAT.py script, which prunes the universal features, is bundled with this submission. SK -- tag and parse with the model: udpipe --tag --parse sk-translex.v2.norm.feats07.w2v.trainonpred.udpipe sk-ud-predPoS-test.conllu A slightly better after-deadline model (sk-translex.v2.norm.Case-feats07.w2v.trainonpred.udpipe), which we mention in the accompanying paper, is also included. It is applied in the same way (udpipe --tag --parse sk-translex.v2.norm.Case-feats07.w2v.trainonpred.udpipe sk-ud-predPoS-test.conllu). HR -- prune the Features to keep only Case and parse with the model: python3 feats2FEAT.py Case < hr-ud-predPoS-test.conllu | udpipe --parse hr-translex.v2.norm.Case.w2v.trainonpred.udpipe NO -- put the UPOS annotation aside, tag Features with the model, merge with the left-aside UPOS annotation, and parse with the model (this hassle is because UDPipe cannot be told to keep UPOS and only change Features): cut -f1-4 no-ud-predPoS-test.conllu > tmp udpipe --tag no-translex.v2.norm.tgttagupos.srctagfeats.Case.w2v.udpipe no-ud-predPoS-test.conllu | cut -f5- | paste tmp - | sed 's/^\t$//' | udpipe --parse no-translex.v2.norm.tgttagupos.srctagfeats.Case.w2v.udpipe
  • The CLASSLA-StanfordNLP model for lemmatisation of standard Croatian 1.1

    The model for lemmatisation of standard Croatian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the hr500k training corpus (http://hdl.handle.net/11356/1183) and using the hrLex inflectional lexicon (http://hdl.handle.net/11356/1232). The estimated F1 of the lemma annotations is ~97.6. The difference to the previous version of the model is that it is trained with the lemmatiser padding bug removed, cf. https://github.com/stanfordnlp/stanfordnlp/issues/143.
  • PyTorch model for Slovenian Named Entity Recognition SloNER 1.0

    The SloNER is a model for Slovenian Named Entity Recognition. It is is a PyTorch neural network model, intended for usage with the HuggingFace transformers library (https://github.com/huggingface/transformers). The model is based on the Slovenian RoBERTa contextual embeddings model SloBERTa 2.0 (http://hdl.handle.net/11356/1397). The model was trained on the SUK 1.0 training corpus (http://hdl.handle.net/11356/1747).The source code of the model is available on GitHub repository https://github.com/clarinsi/SloNER.
  • The CLASSLA-Stanza model for UD dependency parsing of standard Slovenian 2.0

    This model for UD dependency parsing of standard Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SUK training corpus (http://hdl.handle.net/11356/1747) and using the CLARIN.SI-embed.sl word embeddings (http://hdl.handle.net/11356/1204) expanded with the MaCoCu-sl Slovene web corpus (http://hdl.handle.net/11356/1517). The estimated LAS of the parser is ~91.11. The difference to the previous version of the model is that the model was trained using the SUK training corpus and uses the updated embeddings.