CLARIN Tool Portal

Tests for Word Embeddings

4 resources

Evaluation tools (WBST, HWBST, EWBST) for word embedding models, used to assess and compare the usefulness of different word embeddings.
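The description does not say how the tests are scored. The sketch below assumes a common setup for WBST-style tests: each item is a multiple-choice question, and the embedding "answers" it by picking the candidate whose vector has the highest cosine similarity to the question word. Accuracy over all items then gives one score per embedding model, so different models can be compared. This is not taken from the tool itself. The helper names (`cosine`, `answer`, `accuracy`) and the toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def answer(embeddings, question, candidates):
    """Pick the candidate whose vector is most similar to the question word."""
    return max(candidates, key=lambda c: cosine(embeddings[question], embeddings[c]))

def accuracy(embeddings, items):
    """Fraction of (question, candidates, gold) items answered correctly."""
    correct = sum(answer(embeddings, q, cands) == gold for q, cands, gold in items)
    return correct / len(items)

# Hypothetical 3-dimensional vectors, made up purely for illustration.
embeddings = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "banana": [0.0, 0.2, 0.9],
    "river": [0.1, 0.9, 0.1],
    "chair": [0.3, 0.3, 0.3],
}
items = [("car", ["automobile", "banana", "river", "chair"], "automobile")]
print(accuracy(embeddings, items))  # 1.0
```

Running the same `items` against several embedding dictionaries and comparing the resulting accuracies is the kind of comparison the tools are described as supporting.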

Czech PDT-C 1.0 Model for UDPipe 2 (2023-11-16)

2 resources

Tokenizer, POS tagger, lemmatizer, and parser model based on the PDT-C 1.0 treebank (https://hdl.handle.net/11234/1-3185). The model documentation, including performance figures, can be found at https://ufal.mff.cuni.cz/udpipe/2/models#czech_pdtc1.0_model . To use these models, you need UDPipe version 2.1, which you can download from https://ufal.mff.cuni.cz/udpipe/2 .

The CLASSLA-StanfordNLP model for morphosyntactic annotation of non-standard Croatian 1.0

3 resources

This model for morphosyntactic annotation of non-standard Croatian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the hr500k training corpus (http://hdl.handle.net/11356/1210), the ReLDI-NormTagNER-hr corpus (http://hdl.handle.net/11356/1241), the RAPUT corpus (https://www.aclweb.org/anthology/L16-1513/) and the ReLDI-NormTagNER-sr corpus (http://hdl.handle.net/11356/1240), using the CLARIN.SI-embed.hr word embeddings (http://hdl.handle.net/11356/1205). To handle missing diacritics, the corpora were additionally augmented by repeating parts of them with the diacritics removed. The model simultaneously produces UPOS, FEATS and XPOS (MULTEXT-East) labels. The estimated F1 of the XPOS annotations is ~95.11.
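The description does not specify which parts of the corpora were repeated or how letters without a Unicode decomposition were handled. The sketch below illustrates the general augmentation idea under stated assumptions: sentences are lists of (form, tag) pairs, a random share of sentences (the `ratio` parameter is hypothetical) is copied with diacritics stripped from the word forms only, and `đ`/`Đ` are mapped to `d`/`D` because Unicode NFD decomposition does not split them. None of the names below come from CLASSLA-StanfordNLP.

```python
import random
import unicodedata

# đ/Đ have no Unicode decomposition, so NFD alone would keep them.
# Mapping them to d/D is an assumption; some normalisations use "dj" instead.
SPECIAL = str.maketrans({"đ": "d", "Đ": "D"})

def strip_diacritics(text):
    """Remove combining marks after NFD decomposition (č -> c, ž -> z, ...)."""
    decomposed = unicodedata.normalize("NFD", text.translate(SPECIAL))
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def augment(corpus, ratio=0.5, seed=0):
    """Return the corpus plus a diacritic-free copy of a random share of its sentences.

    Each sentence is a list of (form, tag) pairs; only the forms are altered,
    so the copies keep their original annotation.
    """
    rng = random.Random(seed)
    extra = [
        [(strip_diacritics(form), tag) for form, tag in sentence]
        for sentence in corpus
        if rng.random() < ratio
    ]
    return corpus + extra

print(strip_diacritics("Đurđevac čuva žute šešire"))  # Durdevac cuva zute sesire
```

Keeping the tags unchanged on the stripped copies is what lets the tagger learn that, for example, "kuca" in non-standard text can carry the same annotation as "kuća".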

Translation Models (en-ru) (v1.0)

2 resources

En-Ru translation models, exported via TensorFlow Serving, available in the Lindat translation service (https://lindat.mff.cuni.cz/services/translation/). The models are compatible with Tensor2tensor version 1.6.6. For details about the model training (data, model hyper-parameters), please contact the archive maintainer. Evaluation on newstest2020 (BLEU): en->ru: 18.0, ru->en: 30.4 (evaluated using multeval: https://github.com/jhclark/multeval).

The Model latinpipe-evalatin24-240520 for LatinPipe 2024

2 resources

The latinpipe-evalatin24-240520 is a PhilBerta-based model for LatinPipe 2024 <https://github.com/ufal/evalatin2024-latinpipe>, performing tagging, lemmatization, and dependency parsing of Latin, based on the winning entry to the EvaLatin 2024 <https://circse.github.io/LT4HALA/2024/EvaLatin> shared task. It is released under the CC BY-NC-SA 4.0 license.

Universal Dependencies 1.2 Models for Parsito

2 resources

Parsing models for all Universal Dependencies 1.2 treebanks, created solely using UD 1.2 data (http://hdl.handle.net/11234/1-1548). To use these models, you need the Parsito binary, which you can download from http://hdl.handle.net/11234/1-1584.

CUBBITT Translation Models (en-pl) (v1.0)

3 resources

CUBBITT En-Pl translation models, exported via TensorFlow Serving, available in the Lindat translation service (https://lindat.mff.cuni.cz/services/translation/). The models are compatible with Tensor2tensor version 1.6.6. For details about the model training (data, model hyper-parameters), please contact the archive maintainer. Evaluation on newstest2020 (BLEU): en->pl: 12.3, pl->en: 20.0 (evaluated using multeval: https://github.com/jhclark/multeval).

Morphological Analyzer for Shipibo-Konibo

2 resources

This tool is the first morphological analyzer for this language. The analyzer is a finite-state transducer (FST) that produces all possible segmentations and tagging sequences, word by word.
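To illustrate what "all possible segmentations and tagging sequences" means for a single word, the sketch below enumerates every way to split a word into known morphs, combined with every tag each morph may carry. It is not an FST and not the analyzer's implementation: it swaps in a plain recursive enumerator, ignores morphotactic ordering constraints, and uses an entirely hypothetical lexicon of placeholder strings and tags (not real Shipibo-Konibo morphology).

```python
from typing import Dict, List, Tuple

Analysis = List[Tuple[str, str]]  # sequence of (morph, tag) pairs

def analyses(word: str, lexicon: Dict[str, List[str]]) -> List[Analysis]:
    """Enumerate every segmentation of `word` into lexicon morphs, with every tag choice."""
    if word == "":
        return [[]]  # one empty analysis for the empty remainder
    results: List[Analysis] = []
    for end in range(1, len(word) + 1):
        morph = word[:end]
        for tag in lexicon.get(morph, []):
            # Combine this morph/tag with every analysis of the remainder.
            for rest in analyses(word[end:], lexicon):
                results.append([(morph, tag)] + rest)
    return results

# Hypothetical placeholder lexicon: morph -> possible tags.
LEXICON = {
    "a": ["ROOT"],
    "ab": ["ROOT"],
    "b": ["SUF1", "SUF2"],
    "bc": ["SUF4"],
    "c": ["SUF3"],
}

for analysis in analyses("abc", LEXICON):
    print(" + ".join(f"{m}/{t}" for m, t in analysis))
```

For "abc" this yields four analyses, including both segmentations a + b + c and ab + c; a word that cannot be segmented with the lexicon yields an empty list. An FST encodes the same lexicon and morphotactics as a compiled automaton, which avoids the repeated work of this naive recursion.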

CorPipe 23 multilingual CorefUD 1.1 model (corpipe23-corefud1.1-231206)

2 resources

The `corpipe23-corefud1.1-231206` is an `mT5-large`-based multilingual model for coreference resolution usable in CorPipe 23 (https://github.com/ufal/crac2023-corpipe). It is released under the CC BY-NC-SA 4.0 license. The model is language agnostic (no _corpus id_ on input), so it can be used to predict coreference in any `mT5` language (for zero-shot evaluation, see the paper). Note, however, that empty nodes must already be present in the input; the model does not predict them (the same setting as in the CRAC23 shared task).

WMT21 Marian translation model (ca-oc multi-task)

1 resource

Marian NMT model for Catalan to Occitan translation. It is a multi-task model that also produces a phonemic transcription of the Catalan source. The model was submitted to the WMT'21 Shared Task on Multilingual Low-Resource Translation for Indo-European Languages as a CUNI-Contrastive system for Catalan to Occitan.
