CLARIN Tool Portal

WiKNN Text Classifier

2 resources

WiKNN is an online text classifier service for Polish and English texts. It supports hierarchical labelled classification of user-submitted texts with Wikipedia categories. WiKNN is available through a web-based interface (http://pelcra.clarin-pl.eu/tools/classifier/) and as a REST service with interactive documentation available at http://clarin.pelcra.pl/apidocs/wiknn.

Use "WiKNN Text Classifier"

Integrated Parser

2 resources

Integrated Parser is an application that combines and normalizes the outputs of several parsers for Polish. It is based on the ENIAM processing stream, extended with Polish Dependency Parser, Świgra and POLFIE. Individual parsers may be turned on and off according to user requirements.

Use "Integrated Parser"

CorPipe 24 Multilingual CorefUD 1.2 Model (corpipe24-corefud1.2-240906)

2 resources

The `corpipe24-corefud1.2-240906` model is an `mT5-large`-based multilingual model for coreference resolution, usable in CorPipe 24 (https://github.com/ufal/crac2024-corpipe). It is released under the CC BY-NC-SA 4.0 license. The model is language agnostic (no corpus id on input), so in theory it can be used to predict coreference in any `mT5` language. The model also jointly predicts the empty nodes needed for zero coreference. The paper introducing this model also presents an alternative two-stage approach that first predicts empty nodes (via https://www.kaggle.com/models/ufal-mff/crac2024_zero_nodes_baseline/) and then performs coreference resolution (via http://hdl.handle.net/11234/1-5673); that approach is about twice as slow but slightly better.

Use "CorPipe 24 Multilingual CorefUD 1.2 Model (corpipe24-corefud1.2-240906)"

The CLASSLA-StanfordNLP model for lemmatisation of standard Slovenian 1.3

2 resources

The model for lemmatisation of standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the ssj500k training corpus (http://hdl.handle.net/11356/1210) and using the Sloleks inflectional lexicon (http://hdl.handle.net/11356/1230). The estimated F1 of the lemma annotations is ~99.7. The difference from the previous version is that the internal lexicon is built on the lexicon training data only, and not on the (automatically XPOS-annotated) corpus data.

Use "The CLASSLA-StanfordNLP model for lemmatisation of standard Slovenian 1.3"

The CLASSLA-Stanza model for lemmatisation of standard Croatian 2.1

2 resources

The model for lemmatisation of standard Croatian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the hr500k training corpus (http://hdl.handle.net/11356/1792) and using the hrLex inflectional lexicon (http://hdl.handle.net/11356/1232). The estimated F1 of the lemma annotations is ~98.02. The difference to the previous version is that this version was trained on the new version of the hr500k corpus.

Use "The CLASSLA-Stanza model for lemmatisation of standard Croatian 2.1"

LiStr: Linguistic Structure Induction Toolkit

2 resources

This toolkit comprises the tools and supporting scripts for unsupervised induction of dependency trees from raw texts or texts with already assigned part-of-speech tags. There are also scripts for simple machine translation based on unsupervised parsing and scripts for minimally supervised parsing into Universal-Dependencies style.

Use "LiStr: Linguistic Structure Induction Toolkit"

The CLASSLA-StanfordNLP model for lemmatisation of non-standard Slovenian 1.1

2 resources

The model for lemmatisation of non-standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the ssj500k training corpus (http://hdl.handle.net/11356/1210) and the Janes-Tag corpus (http://hdl.handle.net/11356/1238), using the Sloleks inflectional lexicon (http://hdl.handle.net/11356/1230). These corpora were additionally augmented for handling missing diacritics by repeating parts of the corpora with diacritics removed. The estimated F1 of the lemma annotations is ~98.86. The difference to the previous version of the lemmatizer is that now it relies solely on XPOS annotations, and not on a combination of UPOS, FEATS (lexicon lookup) and XPOS (lemma prediction) annotations.
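The diacritic augmentation described above can be illustrated with a minimal Python sketch. This is not the CLASSLA training code: the function names, the `fraction` parameter, and the choice to strip diacritics from word forms while keeping the gold lemmas unchanged are all assumptions made for illustration.

```python
import random
import unicodedata

# "đ"/"Đ" have no Unicode decomposition, so they need an explicit mapping.
EXTRA = str.maketrans({"đ": "d", "Đ": "D"})


def strip_diacritics(text: str) -> str:
    """Remove combining marks (č -> c, š -> s, ž -> z) and map đ -> d."""
    decomposed = unicodedata.normalize("NFD", text)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.translate(EXTRA)


def augment(sentences, fraction=0.5, seed=0):
    """Return the corpus plus de-diacritized copies of a sample of its sentences.

    Each sentence is a list of (form, lemma) pairs. Only the form is stripped;
    the lemma is kept as-is (an assumption of this sketch).
    """
    rng = random.Random(seed)
    k = int(len(sentences) * fraction)
    sampled = rng.sample(sentences, k)
    copies = [[(strip_diacritics(form), lemma) for form, lemma in sent]
              for sent in sampled]
    return list(sentences) + copies
```

With `fraction=1.0`, every sentence is repeated once without diacritics, so the model sees both "hiša" and "hisa" paired with the lemma "hiša".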

Use "The CLASSLA-StanfordNLP model for lemmatisation of non-standard Slovenian 1.1"

UDify Pretrained Model

3 resources

Pretrained model weights for the UDify model, and extracted BERT weights in pytorch-transformers format. Note that these weights slightly differ from those used in the paper.

Use "UDify Pretrained Model"

The CLASSLA-Stanza model for morphosyntactic annotation of non-standard Serbian 2.1

3 resources

This model for morphosyntactic annotation of non-standard Serbian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SETimes.SR training corpus (http://hdl.handle.net/11356/1200), the ReLDI-NormTagNER-sr corpus (http://hdl.handle.net/11356/1794) and the hr500k training corpus (http://hdl.handle.net/11356/1792), using the CLARIN.SI-embed.sr word embeddings (http://hdl.handle.net/11356/1789). These corpora were additionally augmented for handling missing diacritics by repeating parts of the corpora with diacritics removed. The model simultaneously produces UPOS, FEATS and XPOS (MULTEXT-East) labels. The estimated F1 of the XPOS annotations is ~92.64. The difference from the previous version of the model is that this version uses the new version of the Serbian word embeddings and is trained on a combination of three training corpora (SETimes.SR, ReLDI-NormTagNER-sr, hr500k).

Use "The CLASSLA-Stanza model for morphosyntactic annotation of non-standard Serbian 2.1"

DG-POLFIE: POLFIE and Malt-based syntactic parser

1 resource

DG-POLFIE is a prototype parser that tries to merge parse fragments generated by POLFIE using the Polish Dependency Parser. DG-POLFIE aims to improve the coverage of the POLFIE parser (i.e. the percentage of sentences with at least one analysis). To increase the number of Polish sentences and constructions that can be parsed with the POLFIE-based parser, DG-POLFIE defines rules that use dependency structure to build a full parse from the FRAGMENTS provided by POLFIE.
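The coverage measure defined above (percentage of sentences with at least one analysis) can be computed as in the following sketch. The function name and the input format, a list holding the number of analyses found for each sentence, are hypothetical and not part of DG-POLFIE.

```python
def coverage(analyses_per_sentence):
    """Percentage of sentences for which the parser returned >= 1 analysis."""
    if not analyses_per_sentence:
        return 0.0  # convention of this sketch for an empty test set
    parsed = sum(1 for n in analyses_per_sentence if n >= 1)
    return 100.0 * parsed / len(analyses_per_sentence)
```

For example, if a parser returns analyses for two of four sentences, `coverage([0, 2, 1, 0])` gives 50.0.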

Use "DG-POLFIE: POLFIE and Malt-based syntactic parser"
