CLARIN Tool Portal

698 record(s) found

Search results

The CLASSLA-StanfordNLP model for lemmatisation of standard Croatian

2 resources

The model for lemmatisation of standard Croatian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the hr500k training corpus (http://hdl.handle.net/11356/1183) and using the hrLex inflectional lexicon (http://hdl.handle.net/11356/1232). The estimated F1 of the lemma annotations is ~97.6.
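
The record does not say how the F1 score was computed. As an illustration only, the sketch below shows one common way to score token-level annotations such as lemmas when the system tokenisation may differ from the gold one: a token counts as correct only if both its character span and its label match. The function name `annotation_f1` and the span-to-label dictionaries are assumptions made for this example, not part of the CLASSLA tooling.

```python
from typing import Dict, Tuple

Span = Tuple[int, int]  # (start_char, end_char) of a token in the raw text


def annotation_f1(gold: Dict[Span, str], system: Dict[Span, str]) -> float:
    """F1 over tokens; a system token is correct if gold has the same span and label.

    Hypothetical helper for illustration, not the evaluation used for the model.
    """
    correct = sum(1 for span, label in system.items() if gold.get(span) == label)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: three gold tokens, the system gets two lemmas right.
gold = {(0, 3): "biti", (4, 8): "kuća", (9, 12): "mali"}
system = {(0, 3): "biti", (4, 8): "kuća", (9, 12): "malo"}
score = annotation_f1(gold, system)
```

When the gold and system tokenisations are identical, precision and recall coincide and this F1 reduces to plain accuracy.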

The CLASSLA-StanfordNLP model for named entity recognition of standard Serbian 1.0

3 resources

This model for named entity recognition of standard Serbian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the SETimes.SR training corpus (http://hdl.handle.net/11356/1200) and using the CLARIN.SI-embed.sr word embeddings (http://hdl.handle.net/11356/1206).

The CLASSLA-Stanza model for morphosyntactic annotation of standard Bulgarian 2.1

3 resources

This model for morphosyntactic annotation of standard Bulgarian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the BulTreeBank training corpus (https://clarino.uib.no/korpuskel/corpora) and using the CLARIN.SI-embed.bg word embeddings (http://hdl.handle.net/11356/1796). The model produces UPOS, FEATS and XPOS (MULTEXT-East) labels simultaneously. The estimated F1 of the XPOS annotations is ~96.83. Unlike the previous version of the model, this version was trained using the new version of the Bulgarian word embeddings.

The CLASSLA-StanfordNLP model for lemmatisation of standard Bulgarian 1.1

2 resources

The model for lemmatisation of standard Bulgarian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the BulTreeBank training corpus (http://hdl.handle.net/11495/D93F-C6E9-65D9-2) and using the Bulgarian inflectional lexicon (Popov, Simov, and Vidinska 1998). The estimated F1 of the lemma annotations is ~98.8. Unlike the previous version of the lemmatiser, this version relies solely on XPOS annotations, rather than on a combination of UPOS and FEATS (for lexicon lookup) and XPOS (for lemma prediction) annotations.
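
The description mentions two sources of lemmas, lexicon lookup and lemma prediction, both conditioned on the XPOS tag. The sketch below illustrates that general pattern under stated assumptions: the lexicon is modelled as a dictionary keyed by (word form, XPOS), and the predictor is any callable used as a fallback. The names `lemmatise`, `toy_lexicon` and `toy_predict`, and the tag strings, are placeholders invented for this example; they do not reflect the actual CLASSLA implementation or the real MULTEXT-East tag inventory.

```python
from typing import Callable, Dict, Tuple

Lexicon = Dict[Tuple[str, str], str]  # (word form, XPOS tag) -> lemma


def lemmatise(form: str, xpos: str, lexicon: Lexicon,
              predict: Callable[[str, str], str]) -> str:
    """Return the lexicon lemma for (form, xpos), else fall back to the predictor."""
    key = (form, xpos)
    if key in lexicon:
        return lexicon[key]
    return predict(form, xpos)


# Placeholder tags ("NOUN-PL", "NOUN-SG") stand in for real XPOS labels.
toy_lexicon: Lexicon = {("книги", "NOUN-PL"): "книга"}


def toy_predict(form: str, xpos: str) -> str:
    # Stand-in for a trained lemma predictor: returns the form unchanged.
    return form


known = lemmatise("книги", "NOUN-PL", toy_lexicon, toy_predict)
unknown = lemmatise("маса", "NOUN-SG", toy_lexicon, toy_predict)
```

Keying the lookup on a single tag keeps the lexicon and the predictor consistent with each other, which is one plausible motivation for relying on XPOS alone.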

The CLASSLA-Stanza model for named entity recognition of standard Slovenian 2.2

3 resources

This model for named entity recognition of standard Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SUK training corpus (http://hdl.handle.net/11356/1959) and using the CLARIN.SI-embed.sl 2.0 word embeddings (http://hdl.handle.net/11356/1791). The difference to the previous version of the model is that the model was trained using the SUK training corpus and uses new embeddings.
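
For orientation, a minimal sketch of how a model of this kind might be run through the `classla` Python package follows. It assumes the package is installed, that `classla.download` fetches a suitable default model for the language (which need not be exactly the version listed here), and that the catalogue task names map to the processor names `pos`, `lemma` and `ner`. The helpers `processors_for` and `annotate` are hypothetical wrappers written for this example.

```python
from typing import Iterable

# Assumed mapping from catalogue task names to pipeline processor names.
TASK_TO_PROCESSOR = {
    "morphosyntactic annotation": "pos",
    "lemmatisation": "lemma",
    "named entity recognition": "ner",
}
ORDER = ["tokenize", "pos", "lemma", "ner"]  # order used in the processors string


def processors_for(tasks: Iterable[str]) -> str:
    """Build a comma-separated processors string for the requested tasks."""
    wanted = {"tokenize"}  # every task needs tokenised input
    for task in tasks:
        proc = TASK_TO_PROCESSOR[task]
        wanted.add(proc)
        if proc == "lemma":
            wanted.add("pos")  # assumption: the lemmatiser consumes XPOS tags
    return ",".join(p for p in ORDER if p in wanted)


def annotate(lang: str, text: str, tasks: Iterable[str]) -> str:
    """Download default models for `lang`, run the pipeline, return CoNLL-U text."""
    import classla  # imported lazily so processors_for works without the package

    classla.download(lang)
    nlp = classla.Pipeline(lang, processors=processors_for(tasks))
    return nlp(text).to_conll()
```

A call such as `annotate("sl", "France Prešeren je rojen v Vrbi.", ["named entity recognition"])` would then download the Slovenian models on first use and return the annotated text.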

The CLASSLA-Stanza model for morphosyntactic annotation of standard Serbian 2.1

3 resources

The model for morphosyntactic annotation of standard Serbian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SETimes.SR training corpus (http://hdl.handle.net/11356/1200) combined with the Croatian hr500k training dataset (http://hdl.handle.net/11356/1792) to ensure sufficient representation of certain labels. The CLARIN.SI-embed.sr word embeddings (http://hdl.handle.net/11356/1789) were used during training. The model produces UPOS, FEATS and XPOS (MULTEXT-East) labels simultaneously. The estimated F1 of the XPOS annotations is ~96.19. Unlike the previous version of the model, this version was trained on the SETimes.SR corpus expanded with the hr500k dataset, and it was trained using the new version of the Serbian word embeddings.

Liner2.5-events and event relations

1 resource

Liner2.5 configured for the recognition of event attributes and event relations.

WiKNN Text Classifier

2 resources

WiKNN is an online text classifier service for Polish and English texts. It supports hierarchical labelled classification of user-submitted texts with Wikipedia categories. WiKNN is available through a web-based interface (http://pelcra.clarin-pl.eu/tools/classifier/) and as a REST service with interactive documentation available at http://clarin.pelcra.pl/apidocs/wiknn.

Integrated Parser

2 resources

Integrated Parser is an application that combines and normalizes the outputs of several parsers for Polish. It is based on the ENIAM processing stream, extended with the Polish Dependency Parser, Świgra and POLFIE. Individual parsers may be turned on and off according to the user's requirements.

CorPipe 24 Multilingual CorefUD 1.2 Model (corpipe24-corefud1.2-240906)

2 resources

The `corpipe24-corefud1.2-240906` is a `mT5-large`-based multilingual model for coreference resolution usable in CorPipe 24 (https://github.com/ufal/crac2024-corpipe). It is released under the CC BY-NC-SA 4.0 license. The model is language agnostic (no corpus id on input), so it can in theory be used to predict coreference in any `mT5` language. The model also jointly predicts the empty nodes needed for zero coreference. The paper introducing this model also presents an alternative two-stage approach that first predicts empty nodes (via https://www.kaggle.com/models/ufal-mff/crac2024_zero_nodes_baseline/) and then performs coreference resolution (via http://hdl.handle.net/11234/1-5673), which is roughly twice as slow but slightly better.


