CLARIN Tool Portal

703 record(s) found

Search results

The CLASSLA-StanfordNLP model for morphosyntactic annotation of non-standard Serbian 1.0

3 resources

This model for morphosyntactic annotation of non-standard Serbian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the SETimes.SR training corpus (http://hdl.handle.net/11356/1200), the ReLDI-NormTagNER-sr corpus (http://hdl.handle.net/11356/1240), the ReLDI-NormTagNER-hr corpus (http://hdl.handle.net/11356/1241), the hr500k training corpus (http://hdl.handle.net/11356/1210) and the RAPUT corpus (https://www.aclweb.org/anthology/L16-1513/), using the CLARIN.SI-embed.sr word embeddings (http://hdl.handle.net/11356/1206). To handle missing diacritics, the corpora were additionally augmented by repeating parts of them with the diacritics removed. The model simultaneously produces UPOS, FEATS and XPOS (MULTEXT-East) labels. The estimated F1 of the XPOS annotations is ~94.91.
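The diacritic-removal augmentation mentioned above could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the procedure actually used to build the model: the character mapping (in particular "đ" to "dj"), the `fraction` parameter and the function names are all hypothetical.

```python
import random

# Assumed mapping for Serbian Latin letters. "đ" has no Unicode decomposition,
# so it is mapped explicitly; "dj" is one common informal spelling (assumption).
DIACRITIC_MAP = str.maketrans({
    "č": "c", "ć": "c", "š": "s", "ž": "z", "đ": "dj",
    "Č": "C", "Ć": "C", "Š": "S", "Ž": "Z", "Đ": "Dj",
})


def strip_diacritics(form: str) -> str:
    """Return the word form with Serbian Latin diacritics removed."""
    return form.translate(DIACRITIC_MAP)


def augment(sentences, fraction=0.5, seed=0):
    """Append diacritic-free copies of a random sample of sentences.

    `sentences` is a list of sentences, each a list of (form, tag) pairs.
    The tags are copied unchanged, so the tagger sees the same labels for
    the stripped forms.
    """
    rng = random.Random(seed)
    sample_size = int(len(sentences) * fraction)
    sampled = rng.sample(sentences, sample_size)
    copies = [
        [(strip_diacritics(form), tag) for form, tag in sentence]
        for sentence in sampled
    ]
    return sentences + copies
```

Keeping the original sentences alongside the stripped copies means the training data still contains the correctly spelled forms.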

VIADAT-ANNOTATE

2 resources

VIADAT-ANNOTATE is a VIADAT module that provides an interactive annotation environment. It was developed in cooperation with ÚSD AV ČR and NFA.

EVALD 4.0 for Beginners – Evaluator of Discourse

3 resources

EVALD 4.0 for Beginners is software for the automatic evaluation of Czech texts written by non-native speakers of Czech who are beginners in the language.

Samrómur-Children Demonstration Scripts 22.01

2 resources

The "Samrómur-Children Demonstration Scripts 22.01" is a set of three code recipes intended to show how to combine the corpus "Samrómur Children's Icelandic Speech Data 21.09" with the "Icelandic Language Models with Pronunciations 22.01" to create automatic speech recognition systems using the Kaldi toolkit.

Poliqarp2

6 resources

Poliqarp2 is a linguistic search engine capable of searching through large corpora annotated on multiple levels. It is not an upgraded version of Poliqarp but completely new software developed from scratch.

CorpoGrabber

1 resource

CorpoGrabber: The Toolchain to Automatic Acquiring and Extraction of the Website Content
Jan Kocoń, Wroclaw University of Technology

CorpoGrabber is a pipeline of tools that retrieves the most relevant content of a website, including all subsites (up to a user-defined depth). The toolchain can be used to build large Web corpora of text documents. It requires only a list of root websites as input. The tools composing CorpoGrabber are adapted to Polish, but most subtasks are language independent. The whole process can be run in parallel on a single machine and includes the following tasks:

- downloading the HTML subpages of each input page URL [1],
- extracting plain text from each subpage by removing boilerplate content (such as navigation links, headers, footers and advertisements) [2],
- deduplicating the plain text [2],
- removing poor-quality documents using the Morphological Analysis Converter and Aggregator (MACA) [3],
- tagging the documents using the Wrocław CRF Tagger (WCRFT) [4].

The last two steps are available only for Polish. The result is a corpus in the form of a set of tagged documents for each website.

References
[1] https://www.httrack.com/html/faq.html
[2] J. Pomikalek. 2011. Removing Boilerplate and Duplicate Content from Web Corpora. Ph.D. Thesis. Masaryk University, Faculty of Informatics. Brno.
[3] A. Radziszewski, T. Sniatowski. 2011. Maca – a configurable tool to integrate Polish morphological data. Proceedings of the Second International Workshop on Free/Open-Source Rule-Based Machine Translation. Barcelona, Spain.
[4] A. Radziszewski. 2013. A tiered CRF tagger for Polish. Intelligent Tools for Building a Scientific Information Platform: Advanced Architectures and Solutions. Springer Verlag.
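The deduplication task listed for CorpoGrabber can be illustrated with a minimal Python sketch. This is not the method CorpoGrabber uses: it only catches exact duplicates after case and whitespace normalisation, and the function names are hypothetical.

```python
import hashlib


def normalise(text: str) -> str:
    """Lower-case the text and collapse all runs of whitespace."""
    return " ".join(text.lower().split())


def deduplicate(documents):
    """Keep the first occurrence of each document, comparing normalised text.

    Documents are compared by a hash of their normalised form, so only
    documents that are identical after normalisation count as duplicates.
    """
    seen = set()
    unique = []
    for document in documents:
        digest = hashlib.sha1(normalise(document).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(document)
    return unique
```

Hashing keeps memory use proportional to the number of distinct documents rather than to their total length, which matters when the input is a whole crawled website.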

Samrómur NeMo Recipe 22.06

2 resources

The "Samrómur NeMo Recipe 22.06" is a code recipe intended to show how to integrate the corpus "Samromur 21.05" [1] and the "6-GRAM Language Model in Icelandic for NeMo (Binary Format) 22.06" [2] to create automatic speech recognition systems using the NVIDIA-NeMo framework [3].

GreynirTranslate - mBART25 NMT (with layer drop) models for Translations between Icelandic and English (1.0)

3 resources

These are the models in http://hdl.handle.net/20.500.12537/125 trained with 40% layer drop. They are suitable for inference with every other layer removed, which speeds up inference at the cost of lower translation quality. We refer to the prior submission for usage and to the documentation on layerdrop at https://github.com/pytorch/fairseq/blob/fcca32258c8e8bcc9f9890bf4714fa2f96b6b3e1/examples/layerdrop/README.md.
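Keeping every other layer amounts to choosing the even-numbered layer indices. The Python sketch below only builds such an index list as a comma-separated string. The 12-layer stack size and the function name are assumptions for illustration; how the list is passed to Fairseq should be taken from the linked layerdrop README.

```python
def layers_to_keep(num_layers: int, step: int = 2) -> str:
    """Return a comma-separated list of layer indices, keeping every `step`-th layer."""
    return ",".join(str(index) for index in range(0, num_layers, step))


# Assumed stack of 12 layers; halving it keeps layers 0, 2, 4, 6, 8 and 10.
print(layers_to_keep(12))  # prints "0,2,4,6,8,10"
```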

The CLASSLA-Stanza model for UD dependency parsing of standard Slovenian 2.2

3 resources

This model for UD dependency parsing of standard Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SUK training corpus (http://hdl.handle.net/11356/1747) and using the CLARIN.SI-embed.sl word embeddings (http://hdl.handle.net/11356/1204) expanded with the MaCoCu-sl Slovene web corpus (http://hdl.handle.net/11356/1517). The estimated LAS of the parser is ~90.42. Unlike the previous version, this model was trained on the improved SUK 1.1 version of the training corpus.
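LAS (labelled attachment score) is the percentage of tokens that receive both the correct head and the correct dependency label. The Python sketch below computes it, together with the unlabelled score (UAS), assuming gold-standard tokenisation so that gold and predicted tokens align one to one. The function name and the toy data are hypothetical, and this is not the evaluation script used for the reported figure.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) as percentages.

    `gold` and `predicted` are equally long lists of (head, deprel) pairs,
    one pair per token.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty token list")
    total = len(gold)
    # UAS counts tokens whose head is correct, ignoring the label.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    # LAS additionally requires the dependency label to match.
    las_hits = sum(g == p for g, p in zip(gold, predicted))
    return 100 * uas_hits / total, 100 * las_hits / total


gold = [(2, "nsubj"), (0, "root"), (4, "amod"), (2, "obj")]
predicted = [(2, "nsubj"), (0, "root"), (4, "det"), (3, "obj")]
uas, las = attachment_scores(gold, predicted)
print(uas, las)  # prints "75.0 50.0"
```

In the toy example the third token has the right head but the wrong label, so it counts towards UAS only, and the fourth token has the wrong head, so it counts towards neither.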

Multi-speaker GlowTTS model for Talrómur 2 (prerelease) (22.10)

3 resources

This release includes a partially trained multi-speaker model using the GlowTTS architecture in the Coqui TTS library [1]. The model is trained on all of the speakers in the Talrómur 2 [2] corpus. The release includes the model, the training log, the model configuration file and the recipe used to train the model. The model included here is the best model available during training at the time of publishing. At run time, any of the Talrómur 2 voices can be selected to synthesize speech in a similar-sounding voice.

[1] https://github.com/cadia-lvl/coqui-ai-TTS/releases/tag/M9
[2] http://hdl.handle.net/20.500.12537/167


