CLARIN Tool Portal

698 record(s) found

Search results

Translation Models (en-de) (v1.0)

2 resources

En-De translation models, exported via TensorFlow Serving, available in the Lindat translation service (https://lindat.mff.cuni.cz/services/translation/). The models are compatible with Tensor2Tensor version 1.6.6. For details about the model training (data, model hyper-parameters), please contact the archive maintainer. Evaluation on newstest2020 (BLEU): en->de 25.9, de->en 33.4 (evaluated using multeval: https://github.com/jhclark/multeval).

The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Croatian 1.1

3 resources

The model for morphosyntactic annotation of standard Croatian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the hr500k training corpus (http://hdl.handle.net/11356/1183) and using the CLARIN.SI-embed.hr word embeddings (http://hdl.handle.net/11356/1205). The model simultaneously produces UPOS, FEATS and XPOS (MULTEXT-East) labels. The estimated F1 of the XPOS annotations is ~94.1. Unlike the previous version, the model now predicts the whole XPOS tag rather than its individual characters; the character-level prediction used in stanfordnlp produced illegal XPOS tags (and slightly decreased performance).
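The "illegal XPOS tags" issue can be illustrated with a minimal sketch. It is not the CLASSLA-StanfordNLP code: the tagset, scores and function names below are made up for illustration. Picking each tag character independently can assemble a string that is not in the tagset, whereas choosing among whole tags cannot.

```python
# Hypothetical mini-tagset; the real MULTEXT-East tagset is far larger.
LEGAL_XPOS = {"Ncmsn", "Ncfsn", "Agpmsny", "Vmr3s"}


def is_legal(tag):
    """Return True if the tag belongs to the known tagset."""
    return tag in LEGAL_XPOS


def predict_per_character(position_scores):
    """Pick the best-scoring character independently at each tag position."""
    return "".join(max(scores, key=scores.get) for scores in position_scores)


def predict_whole_tag(tag_scores):
    """Pick the best-scoring tag, considering legal tags only."""
    return max((t for t in tag_scores if is_legal(t)), key=tag_scores.get)


# Made-up scores: each position looks plausible on its own.
position_scores = [
    {"V": 0.6, "N": 0.4},
    {"c": 0.8, "m": 0.2},
    {"m": 0.6, "f": 0.4},
    {"s": 0.9, "p": 0.1},
    {"n": 0.7, "y": 0.3},
]
tag_scores = {"Ncmsn": 0.5, "Vmr3s": 0.3, "Ncfsn": 0.2}

per_char = predict_per_character(position_scores)
whole = predict_whole_tag(tag_scores)
print(per_char, is_legal(per_char))
print(whole, is_legal(whole))
```

Here the per-character prediction combines a verb marker with noun features, giving a tag outside the tagset, while the whole-tag prediction is legal by construction.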

HaskPL

2 resources

HaskPL is a Polish phraseological database designed for language professionals, including linguists, language teachers, lexicographers, language materials developers and translators. Query results can be visualised and exported as spreadsheets. A complementary tool, HaskProof (http://pelcra.clarin-pl.eu:9894/#/lang/pl), identifies potential collocations in any text entered by the user.

DigiLing e-Learning Hub: e-Courses for Digital Linguistics

8 resources

The files represent exported e-learning resources created within the DigiLing project, www.digiling.eu. We have identified seven core subjects in Digital Linguistics and built seven corresponding courses:
- Introduction to Text Processing and Analysis
- Introduction to Python for Linguists
- Computational Lexicology and Lexicography
- Localization Tools and Workflows
- Post-Editing Machine Translation
- Mining and Managing Multilingual Terminology
- Variability of Languages in Time and Space
The data format is .mbz, a compressed archive compatible with any e-learning environment running Moodle.
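Because .mbz is a compressed archive, its contents can be inspected without a Moodle installation. The sketch below assumes the file is either a gzip-compressed tar or a zip archive (both occur in Moodle backups, depending on version and settings); the function name and the file name in the usage comment are hypothetical.

```python
import tarfile
import zipfile


def list_mbz(path):
    """Return the member names of a Moodle .mbz backup (zip or tar-based)."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            return archive.namelist()
    # "r:*" lets tarfile detect the compression (e.g. gzip) on its own.
    with tarfile.open(path, "r:*") as archive:
        return archive.getnames()


# Usage (hypothetical file name):
# for name in list_mbz("course.mbz"):
#     print(name)
```

To actually use a course, restore the .mbz file through Moodle's own course restore function rather than unpacking it by hand.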

The CLASSLA-StanfordNLP model for lemmatisation of standard Slovenian 1.2

2 resources

The model for lemmatisation of standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the ssj500k training corpus (http://hdl.handle.net/11356/1210) and using the Sloleks inflectional lexicon (http://hdl.handle.net/11356/1230). The estimated F1 of the lemma annotations is ~99.0. Unlike the previous version, the model now relies solely on XPOS annotations rather than on a combination of UPOS, FEATS (lexicon lookup) and XPOS (lemma prediction) annotations.

NLP Web services and NLP workflow engine

2 resources

Web-based system for natural language processing of texts in Polish. It allows running complex workflows of language and machine learning tools and is available via REST web services.

XLM-RoBERTa-LARGE events relation recognition

1 resource

A set of basic language tools for the Polish language. Z4.2a Improving the quality of recognition of relations between events using Transformer-type deep networks.

Multi-speaker GlowTTS model for Talrómur 2 (prerelease) (22.10)

3 resources

This release includes a partially trained multi-speaker model using the GlowTTS architecture in the Coqui TTS library [1]. The model is trained on all of the speakers in the Talrómur 2 [2] corpus. The release includes the model, the training log, the model configuration file and the recipe used to train the model. The included model is the best one available during training at the time of publishing. At run time, any of the Talrómur 2 voices can be chosen to produce a similar-sounding synthesized voice.

[1] https://github.com/cadia-lvl/coqui-ai-TTS/releases/tag/M9
[2] http://hdl.handle.net/20.500.12537/167

RÚV-DI Speaker Diarization v5 models (21.05)

2 resources

This archive contains files generated from the recipe in kaldi-speaker-diarization/v5/. Its contents should be placed in a similar directory, with symbolic links to diarization/, sid/, steps/, etc. It was created when Kaldi's master branch was at git commit 321d3959dabf667ea73cc98881400614308ccbbb.

The v5 models are trained on the Althingi Parliamentary Speech corpus, available on malfong.is. They use MFCCs, x-vectors, PLDA and AHC. The recipe uses the Icelandic RÚV-DI corpus as two hold-out sets for tuning parameters. The RÚV-DI corpus is currently not publicly available.
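The symbolic links mentioned above could be set up with a short script. This is a minimal sketch, not part of the archive: the helper name is hypothetical, and the target paths in the usage comment are assumptions about a typical Kaldi checkout that should be checked against the local installation.

```python
import os


def link_recipe_dirs(recipe_dir, targets):
    """Create recipe_dir/<name> symlinks pointing at the given targets.

    Names that already exist in recipe_dir are left untouched.
    Returns the list of link names that were created.
    """
    os.makedirs(recipe_dir, exist_ok=True)
    created = []
    for name, target in targets.items():
        link = os.path.join(recipe_dir, name)
        if os.path.lexists(link):
            continue  # do not overwrite existing files or links
        os.symlink(target, link, target_is_directory=True)
        created.append(name)
    return created


# Usage (all paths are assumptions, adjust to the local Kaldi checkout):
# link_recipe_dirs("kaldi/egs/my_diarization/v5", {
#     "diarization": "../../callhome_diarization/v1/diarization",
#     "sid": "../../sre08/v1/sid",
#     "steps": "../../wsj/s5/steps",
# })
```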

Universal Dependencies 2.4 Models for UDPipe (2019-05-31)

93 resources

Tokenizer, POS Tagger, Lemmatizer and Parser models for 90 treebanks of 60 languages of Universal Dependencies 2.4 Treebanks, created solely using UD 2.4 data (http://hdl.handle.net/11234/1-2988). The model documentation, including performance, can be found at http://ufal.mff.cuni.cz/udpipe/models#universal_dependencies_24_models . To use these models, you need the UDPipe binary, version 1.2 or later, which you can download from http://ufal.mff.cuni.cz/udpipe . In addition to the models themselves, all additional data and the hyperparameter values used for training are available in the second archive, allowing reproducible training.
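UDPipe writes its analyses in the CoNLL-U format by default. The following is a minimal sketch of reading such output in plain Python, assuming ten tab-separated columns per token line. The sample sentence and its annotations are made up for illustration and are not actual output of these models.

```python
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text):
    """Yield sentences as lists of token dicts; comment lines are skipped."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
        elif not line.startswith("#"):
            sentence.append(dict(zip(FIELDS, line.split("\t"))))
    if sentence:
        yield sentence


# Illustrative sample, not real model output.
sample = (
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\tVBP\tMood=Ind|Tense=Pres\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_\n"
)

sentences = list(parse_conllu(sample))
for token in sentences[0]:
    print(token["form"], token["upos"], token["head"], token["deprel"])
```

For anything beyond quick inspection, a dedicated CoNLL-U library handles details this sketch ignores, such as multiword token ranges and empty nodes.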

