Search results

703 record(s) found

  • EpiLexO

    EpiLexO is a user-friendly web application for creating and editing an integrated system of language resources for ancient fragmentary languages centered on the lexicon, in compliance with current digital humanities and Linked Open Data principles. EpiLexO allows for the editing of lexica with all relevant cross-references: linking them to their testimonies, as well as to bibliographic information and other (external) resources and common vocabularies. The front-end application rests on a Service-Oriented Architecture with two main back-end components, the LexO-server and the CASH-server, which manage lexica and textual documents respectively via RESTful web services, plus additional services for other aspects such as access and authentication, XML rendering, etc. All code is available at https://github.com/DigItAnt/. The application has been developed in the context of a project on the languages of fragmentary attestation of ancient Italy, but can be applied to other similar contexts.
  • CombiTagger

    The main purpose of CombiTagger is to read data files generated by individual taggers and use them to build a combined tagger according to a specified algorithm. The system provides algorithms for simple and weighted voting, but it is extensible so that other combination algorithms can be added easily. CombiTagger is implemented in Java. (A minimal voting sketch appears after this results list.)
  • The CLASSLA-Stanza model for morphosyntactic annotation of standard Bulgarian 2.1

    This model for morphosyntactic annotation of standard Bulgarian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the BulTreeBank training corpus (https://clarino.uib.no/korpuskel/corpora) and using the CLARIN.SI-embed.bg word embeddings (http://hdl.handle.net/11356/1796). The model simultaneously produces UPOS, FEATS and XPOS (MULTEXT-East) labels. The estimated F1 of the XPOS annotations is ~96.83. The difference from the previous version of the model is that this version was trained using the new version of the Bulgarian word embeddings. (A usage sketch appears after this results list.)
  • Grafon

    Representation of sentence semantics with deepened semantic graphs. The graphs are composed based on the output of the saper tool (https://clarin-pl.eu/dspace/handle/11321/278).
  • Liner2.6 model NER NKJP

    The package contains a pre-trained Liner2 (https://github.com/CLARIN-PL/Liner2) model for the recognition of named entities according to the NKJP guidelines. The model was trained on the NKJP corpus (http://nkjp.pl/) and evaluated in PolEval 2018 Task 2 (http://poleval.pl/tasks/), where it won third place with the following results: Exact 0.778, Overlap 0.818, Final 0.810. (A sketch of the matching criteria appears after this results list.)
    References:
    * NKJP corpus in TEI format: http://clip.ipipan.waw.pl/NationalCorpusOfPolish?action=AttachFile&do=view&target=NKJP-PodkorpusMilionowy-1.2.tar.gz
    * PolEval 2018 Task 2 evaluation corpus: http://mozart.ipipan.waw.pl/~axw/poleval2018/
  • Universal Dependencies 2.3 Models for UDPipe (2018-11-15)

    Tokenizer, POS tagger, lemmatizer and parser models for 84 treebanks of 56 languages of the Universal Dependencies 2.3 treebanks, created solely using UD 2.3 data (http://hdl.handle.net/11234/1-2895). The model documentation, including performance figures, can be found at http://ufal.mff.cuni.cz/udpipe/models#universal_dependencies_23_models . To use these models, you need UDPipe binary version 1.2 or later, which you can download from http://ufal.mff.cuni.cz/udpipe . In addition to the models themselves, all additional data and the hyperparameter values used for training are available in a second archive, allowing reproducible training. (A loading sketch with the Python bindings appears after this results list.)
  • The CLASSLA-StanfordNLP model for named entity recognition of standard Serbian 1.0

    This model for named entity recognition of standard Serbian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the SETimes.SR training corpus (http://hdl.handle.net/11356/1200) and using the CLARIN.SI-embed.sr word embeddings (http://hdl.handle.net/11356/1206).
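
Code sketches

CombiTagger (weighted voting). The entry above mentions simple and weighted voting for combining individual taggers. CombiTagger itself is implemented in Java; the following is only a minimal Python sketch of the weighted-voting idea, assuming every tagger proposes one tag per token, and is not CombiTagger's actual implementation.

    from collections import defaultdict

    def combine_tags(per_tagger_tags, weights=None):
        """Combine per-token tag sequences from several taggers by weighted voting."""
        if weights is None:
            weights = [1.0] * len(per_tagger_tags)       # equal weights = simple voting
        combined = []
        for token_tags in zip(*per_tagger_tags):         # tags proposed for one token
            scores = defaultdict(float)
            for tag, weight in zip(token_tags, weights):
                scores[tag] += weight                    # accumulate weighted votes
            combined.append(max(scores, key=scores.get)) # best-scoring tag wins
        return combined

    # Three taggers disagree on the second token; the majority (or weight) decides.
    print(combine_tags([["DET", "NOUN", "VERB"],
                        ["DET", "VERB", "VERB"],
                        ["DET", "NOUN", "VERB"]]))       # ['DET', 'NOUN', 'VERB']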
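CLASSLA-Stanza model for standard Bulgarian. A minimal usage sketch, assuming the current classla Python package and its Stanza-style API; processor and attribute names may differ between versions, so check the documentation at https://github.com/clarinsi/classla.

    import classla

    # Download the standard Bulgarian models, including the morphosyntactic model.
    classla.download('bg')

    # The 'pos' processor produces UPOS, XPOS (MULTEXT-East) and FEATS in one pass.
    nlp = classla.Pipeline('bg', processors='tokenize,pos')

    doc = nlp('Това е изречение на български език.')
    for sentence in doc.sentences:
        for word in sentence.words:
            print(word.text, word.upos, word.xpos, word.feats)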
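Liner2 evaluation scores. The Exact and Overlap figures quoted for PolEval 2018 Task 2 refer to two span-matching criteria; the sketch below illustrates their usual meaning (identical boundaries and type vs. any overlap with a span of the same type). It is not the official PolEval scorer, and the Final score is a combination of the two defined by the task organisers.

    def exact_match(pred, gold):
        """Correct only if boundaries and entity type are identical."""
        return pred == gold                      # spans are (start, end, type) tuples

    def overlap_match(pred, gold):
        """Correct if the spans overlap and the entity types agree."""
        (ps, pe, pt), (gs, ge, gt) = pred, gold
        return pt == gt and ps < ge and gs < pe

    def f1(pred_spans, gold_spans, match):
        tp_p = sum(any(match(p, g) for g in gold_spans) for p in pred_spans)
        tp_r = sum(any(match(p, g) for p in pred_spans) for g in gold_spans)
        precision = tp_p / len(pred_spans) if pred_spans else 0.0
        recall = tp_r / len(gold_spans) if gold_spans else 0.0
        return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    gold = [(0, 2, 'persName'), (5, 7, 'placeName')]
    pred = [(0, 2, 'persName'), (5, 6, 'placeName')]
    print(f1(pred, gold, exact_match), f1(pred, gold, overlap_match))   # 0.5 1.0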
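Universal Dependencies 2.3 models for UDPipe. A loading sketch using the ufal.udpipe Python bindings (pip install ufal.udpipe), which wrap the same UDPipe library (version 1.2 or later is needed for these models); the model filename below is a placeholder, so substitute the actual .udpipe file from the downloaded archive.

    from ufal.udpipe import Model, Pipeline, ProcessingError

    # Placeholder filename; use the actual model file from the UD 2.3 archive.
    model = Model.load('english-ewt-ud-2.3-181115.udpipe')
    if model is None:
        raise RuntimeError('Cannot load model')

    # Tokenize, tag and parse raw text, producing CoNLL-U output.
    pipeline = Pipeline(model, 'tokenize', Pipeline.DEFAULT, Pipeline.DEFAULT, 'conllu')
    error = ProcessingError()
    conllu = pipeline.process('UDPipe turns raw text into CoNLL-U.', error)
    if error.occurred():
        raise RuntimeError(error.message)
    print(conllu)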