CLARIN Tool Portal

WoSeDon

1 resource

WoSeDon is a tool for Word Sense Disambiguation. It works on Polish texts and uses plWordNet as its source of possible senses.

Use "WoSeDon"

KER - Keyword Extractor

KER is a keyword extractor designed for scanned texts in Czech and English. It is based on the standard tf-idf algorithm, with idf tables trained on Wikipedia texts. To deal with data sparsity, texts are preprocessed with MorphoDiTa, a morphological dictionary and tagger.
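The tf-idf scoring mentioned in the KER description can be illustrated with a minimal sketch. This is not KER's actual code: the function names `train_idf` and `extract_keywords`, the plain `log(N / df)` idf variant, and the zero score for unseen terms are all assumptions made for illustration. The input is assumed to be already tokenised and lemmatised (the step KER delegates to MorphoDiTa).

```python
import math
from collections import Counter


def train_idf(documents):
    """Build an idf table from tokenised background documents.

    Assumption: idf(t) = log(N / df(t)); KER's exact variant is not stated.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def extract_keywords(tokens, idf, top_k=3, default_idf=0.0):
    """Rank the terms of one document by tf * idf and return the top_k."""
    counts = Counter(tokens)
    total = len(tokens)
    scores = {
        term: (count / total) * idf.get(term, default_idf)
        for term, count in counts.items()
    }
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda term: (-scores[term], term))[:top_k]


# Hypothetical toy data standing in for a Wikipedia-trained idf table.
background = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "parser", "failed"],
]
idf_table = train_idf(background)
keywords = extract_keywords(["the", "parser", "the", "parser", "cat"], idf_table, top_k=2)
```

In this toy run "the" occurs in every background document, so its idf is zero and it drops out of the ranking despite its high term frequency.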

Use "KER - Keyword Extractor"

SlopeQ for BNC Search Engine

2 resources

The SlopeQ for BNC Search Engine provides access to the British National Corpus dataset. In addition to linguistically motivated corpus queries, it supports a number of data exploration and visualisation features. Most of the functionality of the search engine is available through a REST web service.

Use "SlopeQ for BNC Search Engine"

The CLASSLA-Stanza model for UD dependency parsing of standard Serbian 2.1

3 resources

The model for UD dependency parsing of standard Serbian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SETimes.SR training corpus (http://hdl.handle.net/11356/1200) and using the CLARIN.SI-embed.sr word embeddings (http://hdl.handle.net/11356/1789). The estimated LAS of the parser is ~89.83. This version differs from the previous version of the model in that it uses the new version of the Serbian word embeddings.
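For readers unfamiliar with the LAS figure quoted above: the labelled attachment score is the share of tokens whose predicted head and dependency label both match the gold annotation. The sketch below assumes identical tokenisation in the gold and predicted analyses. The function name `attachment_scores` and the `(head, label)` tuple layout are illustrative choices, not part of CLASSLA-Stanza, and the evaluation script actually used for this model is not stated in the description.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) as percentages.

    gold and predicted are equal-length lists of (head_index, label) pairs,
    one pair per token. Assumes the two analyses share the same tokenisation.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    n_tokens = len(gold)
    # UAS: only the head has to match.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    # LAS: head and label both have to match.
    las_hits = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * uas_hits / n_tokens, 100.0 * las_hits / n_tokens


# Hypothetical four-token sentence: one wrong label, one wrong head.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
predicted = [(2, "nsubj"), (0, "root"), (2, "nmod"), (3, "punct")]
uas, las = attachment_scores(gold, predicted)
```

Here three of four heads are correct and two of four tokens have both head and label correct, so the sketch yields a UAS of 75.0 and a LAS of 50.0.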

Use "The CLASSLA-Stanza model for UD dependency parsing of standard Serbian 2.1"

EvaLatin 2020 models for UDPipe 2 (2020-08-31)

2 resources

POS tagger and lemmatizer models for EvaLatin2020 data (https://github.com/CIRCSE/LT4HALA). The model documentation, including performance, can be found at https://ufal.mff.cuni.cz/udpipe/2/models#evalatin20_models . To use these models, you need UDPipe version 2.0 or later, which you can download from https://ufal.mff.cuni.cz/udpipe/2 .

Use "EvaLatin 2020 models for UDPipe 2 (2020-08-31)"

The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Slovenian

3 resources

The model for morphosyntactic annotation of standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the ssj500k training corpus (http://hdl.handle.net/11356/1210) and using the CLARIN.SI-embed.sl word embeddings (http://hdl.handle.net/11356/1204). The model simultaneously produces UPOS, FEATS and XPOS (MULTEXT-East) labels. The estimated F1 of the XPOS annotations is ~96.7.

Use "The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Slovenian"

The CLASSLA-StanfordNLP model for lemmatisation of standard Serbian

2 resources

The model for lemmatisation of standard Serbian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the SETimes.SR training corpus (http://hdl.handle.net/11356/1200) and using the srLex inflectional lexicon (http://hdl.handle.net/11356/1233). The estimated F1 of the lemma annotations is ~97.9.

Use "The CLASSLA-StanfordNLP model for lemmatisation of standard Serbian"

The CLASSLA-StanfordNLP model for lemmatisation of non-standard Croatian 1.0

2 resources

The model for lemmatisation of non-standard Croatian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the hr500k training corpus (http://hdl.handle.net/11356/1210), the ReLDI-NormTagNER-hr corpus (http://hdl.handle.net/11356/1241), the RAPUT corpus (https://www.aclweb.org/anthology/L16-1513/) and the ReLDI-NormTagNER-sr corpus (http://hdl.handle.net/11356/1240), using the hrLex inflectional lexicon (http://hdl.handle.net/11356/1232). These corpora were additionally augmented for handling missing diacritics by repeating parts of the corpora with diacritics removed. The estimated F1 of the lemma annotations is ~97.54.
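The diacritic augmentation described above can be sketched as follows. This is an assumption-laden illustration, not the authors' procedure: the function names, the sampled fraction, the choice to keep the gold lemma unchanged, and the mapping of "đ" to "dj" are all hypothetical, since the description only says that parts of the corpora were repeated with diacritics removed.

```python
import random
import unicodedata

# Assumption: "đ" has no Unicode decomposition, so it is mapped by hand.
# Mapping it to "dj" is one common informal spelling; "d" is another.
EXTRA_MAP = str.maketrans({"đ": "dj", "Đ": "Dj"})


def strip_diacritics(text):
    """Remove diacritics, e.g. 'č' -> 'c', via NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text.translate(EXTRA_MAP))
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def augment(sentences, fraction=0.5, seed=0):
    """Append diacritic-free copies of a random share of the sentences.

    Each sentence is a list of (form, lemma) pairs. Only the surface form is
    stripped; the lemma keeps its diacritics (an assumption of this sketch).
    """
    rng = random.Random(seed)
    n_copies = round(len(sentences) * fraction)
    chosen = rng.sample(sentences, n_copies)
    copies = [
        [(strip_diacritics(form), lemma) for form, lemma in sentence]
        for sentence in chosen
    ]
    return sentences + copies


# Hypothetical one-sentence corpus.
corpus = [[("kuća", "kuća"), ("šuma", "šuma")]]
augmented = augment(corpus, fraction=1.0)
```

With this setup a lemmatiser sees both "kuća" and "kuca" paired with the lemma "kuća", which is the effect the augmentation is meant to achieve for text written without diacritics.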

Use "The CLASSLA-StanfordNLP model for lemmatisation of non-standard Croatian 1.0"

Korektor 2

3 resources

Korektor is a statistical spell-checker and (occasionally) grammar-checker. It is released under the 2-Clause BSD license (http://opensource.org/licenses/BSD-2-Clause). Korektor started as Michal Richter's diploma thesis, Advanced Czech Spellchecker (https://redmine.ms.mff.cuni.cz/documents/1), and is being developed further. There are two versions: a command-line utility (tested on Linux, Windows and OS X) and a REST service with a publicly available API (http://lindat.mff.cuni.cz/services/korektor/api-reference.php) and an HTML front end (https://lindat.mff.cuni.cz/services/korektor/).

Use "Korektor 2"

CUBBITT Translation Models (en-fr) (v1.0)

3 resources

CUBBITT En-Fr translation models, exported via TensorFlow Serving, available in the Lindat translation service (https://lindat.mff.cuni.cz/services/translation/). Models are compatible with Tensor2tensor version 1.6.6. For details about the model training (data, model hyper-parameters), please contact the archive maintainer. Evaluation on newstest2014 (BLEU): en->fr: 38.2, fr->en: 36.7 (evaluated using multeval: https://github.com/jhclark/multeval).

Use "CUBBITT Translation Models (en-fr) (v1.0)"
