CLARIN Tool Portal

Active filters:

Organisation: Jožef Stefan Institute
Project: Treebank-Driven Approach to the Study of Spoken Slovenian

6 record(s) found

Search results

The CLASSLA-Stanza model for lemmatisation of spoken Slovenian 2.2

2 resources

This model for lemmatisation of spoken Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SST treebank of spoken Slovenian (https://github.com/UniversalDependencies/UD_Slovenian-SST) combined with the SUK training corpus (http://hdl.handle.net/11356/1959) and using the CLARIN.SI-embed.sl word embeddings (http://hdl.handle.net/11356/1791) that were expanded with the MaCoCu-sl Slovene web corpus (http://hdl.handle.net/11356/1517). The estimated F1 of the lemma annotations is ~99.23.

The CLASSLA-Stanza model for UD dependency parsing of spoken Slovenian 2.2

3 resources

This model for UD dependency parsing of spoken Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SST treebank of spoken Slovenian (https://github.com/UniversalDependencies/UD_Slovenian-SST) combined with the SUK training corpus (http://hdl.handle.net/11356/1959) and using the CLARIN.SI-embed.sl word embeddings (http://hdl.handle.net/11356/1791) that were expanded with the MaCoCu-sl Slovene web corpus (http://hdl.handle.net/11356/1517). The estimated LAS of the parser is ~81.91.
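As a reading aid for the LAS figure quoted above: the labelled attachment score is the percentage of tokens for which both the predicted head and the predicted dependency relation match the gold annotation. The following is a minimal sketch only, assuming gold and predicted tokens are already aligned one to one; the function name and the toy sentence are hypothetical, and the evaluation actually used for this model may differ in detail.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) in percent for two aligned lists of (head, deprel) pairs."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        raise ValueError("at least one token is required")
    # UAS counts tokens whose head is correct, regardless of the relation label.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    # LAS counts tokens whose head AND relation label are both correct.
    full_hits = sum(g == p for g, p in zip(gold, predicted))
    total = len(gold)
    return 100.0 * head_hits / total, 100.0 * full_hits / total


# Hypothetical four-token sentence: (head index, dependency relation) per token.
gold = [(2, "nsubj"), (0, "root"), (4, "amod"), (2, "obj")]
predicted = [(2, "nsubj"), (0, "root"), (2, "amod"), (2, "obl")]
uas, las = attachment_scores(gold, predicted)
```

In this toy case the third token has a wrong head and the fourth a wrong label, so the labelled score is lower than the unlabelled one.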

The CLASSLA-Stanza model for UD dependency parsing of standard Slovenian 2.2

3 resources

This model for UD dependency parsing of standard Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SUK training corpus (http://hdl.handle.net/11356/1747) and using the CLARIN.SI-embed.sl word embeddings (http://hdl.handle.net/11356/1204) expanded with the MaCoCu-sl Slovene web corpus (http://hdl.handle.net/11356/1517). The estimated LAS of the parser is ~90.42. Unlike the previous version, this model was trained on the improved SUK 1.1 version of the training corpus.

The CLASSLA-Stanza model for morphosyntactic annotation of spoken Slovenian 2.2

3 resources

This model for morphosyntactic annotation of spoken Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SST treebank of spoken Slovenian (https://github.com/UniversalDependencies/UD_Slovenian-SST) combined with the SUK training corpus (http://hdl.handle.net/11356/1959) and using the CLARIN.SI-embed.sl word embeddings (http://hdl.handle.net/11356/1791) that were expanded with the MaCoCu-sl Slovene web corpus (http://hdl.handle.net/11356/1517). The model simultaneously produces UPOS, FEATS and XPOS (MULTEXT-East) labels. The estimated F1 of the XPOS annotations is ~96.76.
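To illustrate where the three label types sit in annotated output, here is a small sketch that reads UPOS, XPOS and FEATS from a single CoNLL-U token line (columns 4, 5 and 6 of the ten tab-separated columns). The helper names and the example line are hypothetical and hand-written; they are not output of the model described here.

```python
def parse_feats(feats):
    """Turn a CoNLL-U FEATS string into a dict; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def morphosyntax(token_line):
    """Extract FORM, UPOS, XPOS and FEATS from one tab-separated CoNLL-U token line."""
    cols = token_line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError("a CoNLL-U token line has exactly 10 columns")
    return {
        "form": cols[1],
        "upos": cols[3],
        "xpos": cols[4],
        "feats": parse_feats(cols[5]),
    }


# Illustrative token line (hand-written for this sketch).
line = "1\thiša\thiša\tNOUN\tNcfsn\tCase=Nom|Gender=Fem|Number=Sing\t0\troot\t_\t_"
tags = morphosyntax(line)
```

Here `tags["xpos"]` holds the compact MULTEXT-East tag, while `tags["upos"]` and `tags["feats"]` hold the corresponding Universal Dependencies labels.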

The CLASSLA-Stanza model for named entity recognition of standard Slovenian 2.2

3 resources

This model for named entity recognition of standard Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SUK training corpus (http://hdl.handle.net/11356/1959) and using the CLARIN.SI-embed.sl 2.0 word embeddings (http://hdl.handle.net/11356/1791). Unlike the previous version, this model was trained on the SUK training corpus and uses new embeddings.

Q-CAT Corpus Annotation Tool 1.5

2 resources

Q-CAT (Querying-Supported Corpus Annotation Tool) is a tool for manual linguistic annotation of corpora that also supports advanced queries over those annotations. It has been used in several annotation campaigns on the ssj500k reference training corpus of Slovenian (http://hdl.handle.net/11356/1210), covering named entities, dependency syntax, semantic roles and multi-word expressions, but it can also be used to add new annotation layers of various types to this or other language corpora. Q-CAT is a .NET application that runs on the Windows operating system.

Version 1.1 enables the automatic attribution of token IDs and personalized font adjustments. Version 1.2 supports the CoNLL-U format and working with UD POS tags. Version 1.3 supports adding new layers of annotation on top of CoNLL-U (and then saving the corpus as XML TEI). Version 1.4 introduces new features in command line mode (filtering by sentence ID, multiple link type visualizations). Version 1.5 supports listening to audio recordings (provided in the # sound_url comment line in CoNLL-U).
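For readers preparing input with audio links, the following sketch shows one way to collect the sound_url comment of each sentence from CoNLL-U text. It assumes the comment follows the usual CoNLL-U "# key = value" convention and that each sentence carries a sent_id comment; the function name, sample sentences and example.org URL are hypothetical, and this is not Q-CAT's own code.

```python
def sound_urls(conllu_text):
    """Map each sent_id to the value of its '# sound_url = ...' comment, if any."""
    urls = {}
    sent_id = None
    url = None
    # The extra empty string flushes the last sentence if the text lacks a final blank line.
    for raw in conllu_text.splitlines() + [""]:
        line = raw.strip()
        if not line:  # a blank line ends the current sentence
            if sent_id is not None and url is not None:
                urls[sent_id] = url
            sent_id = None
            url = None
        elif line.startswith("#"):
            key, sep, value = line[1:].partition("=")
            if sep and key.strip() == "sent_id":
                sent_id = value.strip()
            elif sep and key.strip() == "sound_url":
                url = value.strip()
    return urls


# Hand-written sample: the second sentence has no audio link.
sample = (
    "# sent_id = s1\n"
    "# sound_url = https://example.org/audio/s1.wav\n"
    "# text = ja\n"
    "1\tja\tja\tPART\tQ\t_\t0\troot\t_\t_\n"
    "\n"
    "# sent_id = s2\n"
    "# text = ne\n"
    "1\tne\tne\tPART\tQ\t_\t0\troot\t_\t_\n"
)
urls = sound_urls(sample)
```

Sentences without a sound_url comment are simply left out of the result, so the mapping can be used to check which sentences have a recording attached.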
