CLARIN Tool Portal

Grafon

1 resource

Representation of sentence semantics using deepened semantic graphs. The graphs are built from the output of the saper tool (https://clarin-pl.eu/dspace/handle/11321/278).

ENIAMtoolkit

2 resources

ENIAMtoolkit is a collection of libraries that perform tokenization, lemmatization, and part-of-speech tagging; detect multiword expressions (MWEs) and abbreviations; and split text into sentences.

SuperMatrix

SuperMatrix is a system that supports automatic extraction of semantic relations based on the analysis of large text corpora. It was developed as a tool for expanding the Polish wordnet (Słowosieć). Expansion consists of two steps: the system suggests potential links between lexical units, and linguists then verify these suggestions and decide which ones go into the wordnet. This speeds up the work and preserves the integrity of data entry.

Malach Center User Interface 1.0

2 resources

Source code of the first full, running version of the Malach Center User Interface. It does not contain data or metadata for the digital objects and resources.

The CLASSLA-StanfordNLP model for lemmatisation of standard Slovenian

2 resources

The model for lemmatisation of standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the ssj500k training corpus (http://hdl.handle.net/11356/1210) and using the Sloleks inflectional lexicon (http://hdl.handle.net/11356/1230). The estimated F1 of the lemma annotations is ~99.0.

The CLASSLA-StanfordNLP model for UD dependency parsing of standard Croatian

3 resources

The model for UD dependency parsing of standard Croatian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the UD-parsed portion of the hr500k training corpus (http://hdl.handle.net/11356/1183) and using the CLARIN.SI-embed.hr word embeddings (http://hdl.handle.net/11356/1205). The estimated LAS of the parser is ~85.9.
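
For readers unfamiliar with the metric, LAS (labeled attachment score) is the percentage of tokens whose predicted head and dependency label both match the gold annotation. The sketch below illustrates the idea on invented toy data; the function name and the example arcs are hypothetical, and official evaluation scripts may differ in details such as token alignment and label normalization.

```python
from typing import List, Tuple

# Each token is described by (head_index, dependency_label).
# All data here is invented for illustration; it is not from hr500k.
Arc = Tuple[int, str]


def las(gold: List[Arc], pred: List[Arc]) -> float:
    """Percentage of tokens whose predicted head AND label both match gold."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must cover the same tokens")
    if not gold:
        return 0.0
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)


gold = [(2, "nsubj"), (0, "root"), (4, "amod"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "obj")]  # third label is wrong

print(las(gold, pred))  # 75.0
```

Here three of the four tokens have both the correct head and the correct label, so the toy score is 75.0; a reported LAS of ~85.9 would mean roughly 86 of every 100 tokens are attached and labeled correctly.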

EdUKate translation software 1

2 resources

This software package includes three tools: a web frontend for machine translation featuring phonetic transcription of Ukrainian suitable for Czech speakers, an API server, and a tool for translating documents with markup (html, docx, odt, pptx, odp, ...). These tools are used in the Charles Translator service (https://translator.cuni.cz). The software was developed within the EdUKate project, which aims to help reduce the language barriers that non-Czech-speaking children in the Czech Republic face in the Czech school system. The project focuses on the development and dissemination of multilingual digital learning materials for students in primary and secondary schools.

MorphoDiTa-based tagger for Polish language

4 resources

A MorphoDiTa-based tagger for the Polish language. It is a tool for morphosyntactic unification of Polish according to the NKJP tagset.

GlowTTS models for Talrómur1 (22.10)

6 resources

This release contains GlowTTS models for four different voices from the Talrómur 1 [1] corpus. The models were trained using the Coqui TTS library after it was adapted for Icelandic. Included for each voice are the model, the model configuration, the training log file, and the recipe used. [1] http://hdl.handle.net/20.500.12537/104

Semi-supervised Icelandic-Polish Translation System (22.09)

8 resources

This bi-directional Icelandic-Polish translation model was trained with fairseq (https://github.com/facebookresearch/fairseq) using semi-supervised translation, starting from the mBART50 model. The model was trained with a multi-task curriculum: it first learned to denoise sentences, then to translate using aligned parallel texts. Finally, it was given monolingual texts in both Icelandic and Polish, from which it iteratively created back-translations. For the PL-IS direction the model achieves a BLEU score of 27.60 on true parallel data held out from training and 15.30 on the out-of-domain Flores devset. For the IS-PL direction the model achieves a BLEU score of 27.70 on the true data and 13.30 on the Flores devset.
