703 record(s) found

Search results

  • The CLASSLA-Stanza model for morphosyntactic annotation of non-standard Croatian 2.1

    This model for morphosyntactic annotation of non-standard Croatian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the hr500k training corpus (http://hdl.handle.net/11356/1792) and the ReLDI-NormTagNER-hr corpus (http://hdl.handle.net/11356/1793), using the CLARIN.SI-embed.hr word embeddings (http://hdl.handle.net/11356/1790). These corpora were additionally augmented to handle missing diacritics by repeating parts of the corpora with diacritics removed. The model simultaneously produces UPOS, FEATS and XPOS (MULTEXT-East) labels. The estimated F1 of the XPOS annotations is ~92.49. Compared with the previous version, this model uses the new version of the Croatian word embeddings and is trained on a combination of two datasets (hr500k, ReLDI-NormTagNER-hr).
  • The CLASSLA-Stanza model for lemmatisation of non-standard Slovenian 2.1

    This model for lemmatisation of non-standard Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SUK training corpus (http://hdl.handle.net/11356/1747) and the Janes-Tag corpus (http://hdl.handle.net/11356/1732), using the CLARIN.SI-embed.sl word embeddings (http://hdl.handle.net/11356/1204) that were expanded with the MaCoCu-sl Slovene web corpus (http://hdl.handle.net/11356/1517). These corpora were additionally augmented to handle missing diacritics by repeating parts of the corpora with diacritics removed. The estimated F1 of the lemma annotations is ~91.45. Compared with the previous version, this model was trained on the SUK training corpus and the 3.0 version of Janes-Tag, and uses new embeddings and the new version of the Slovene morphological lexicon Sloleks 3.0 (http://hdl.handle.net/11356/1745).
  • CorPipe 23 multilingual CorefUD 1.1 model (corpipe23-corefud1.1-231206)

    The `corpipe23-corefud1.1-231206` is a `mT5-large`-based multilingual model for coreference resolution usable in CorPipe 23 (https://github.com/ufal/crac2023-corpipe). It is released under the CC BY-NC-SA 4.0 license. The model is language agnostic (no _corpus id_ on input), so it can be used to predict coreference in any `mT5` language (for zero-shot evaluation, see the paper). However, note that empty nodes must already be present in the input; they are not predicted (the same setting as in the CRAC23 shared task).
  • VIADAT (2019-12-31)

    This component integrates other VIADAT modules; together with VIADAT-REPO this composes the Virtual Assistant for accessing historical audiovisual data. The zip archive contains sources for the following modules: VIADAT, VIADAT-DEPOSIT, VIADAT-TEXT, VIADAT-ANNOTATE, VIADAT-ANALYZE, VIADAT-STAT, VIADAT-GIS and VIADAT-SEARCH. Developed in cooperation with ÚSD AV ČR and NFA.
  • The CLASSLA-StanfordNLP model for named entity recognition of standard Slovenian 1.0

    This model for named entity recognition of standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the ssj500k training corpus (http://hdl.handle.net/11356/1210) and using the CLARIN.SI-embed.sl word embeddings (http://hdl.handle.net/11356/1204).
  • Universal Dependencies 2.0 Models for UDPipe (2017-08-01)

    Tokenizer, POS Tagger, Lemmatizer and Parser models for all 50 languages of Universal Dependencies 2.0 Treebanks, created solely using UD 2.0 data (http://hdl.handle.net/11234/1-1983). The model documentation, including performance, can be found at http://ufal.mff.cuni.cz/udpipe/users-manual#universal_dependencies_20_models . To use these models, you need the UDPipe binary, version 1.2 or later, which you can download from http://ufal.mff.cuni.cz/udpipe . In addition to the models themselves, all additional data and the hyperparameter values used for training are available in the second archive, allowing reproducible training.
  • Universal Dependencies 2.5 Models for UDPipe (2019-12-06)

    Tokenizer, POS Tagger, Lemmatizer and Parser models for 94 treebanks of 61 languages of Universal Dependencies 2.5 Treebanks, created solely using UD 2.5 data (http://hdl.handle.net/11234/1-3105). The model documentation, including performance, can be found at http://ufal.mff.cuni.cz/udpipe/models#universal_dependencies_25_models . To use these models, you need the UDPipe binary, version 1.2 or later, which you can download from http://ufal.mff.cuni.cz/udpipe . In addition to the models themselves, all additional data and the hyperparameter values used for training are available in the second archive, allowing reproducible training.
  • EXMARaLDA

    **EXMARaLDA** is a system for working with oral corpora on a computer. It consists of a transcription and annotation tool ([Partitur-Editor](https://exmaralda.org/en/partitur-editor-2/ "Partitur Editor")), a tool for managing corpora ([Corpus-Manager](https://exmaralda.org/en/corpus-manager-coma-2/ "Corpus-Manager (Coma)")) and a query and analysis tool ([EXAKT](https://exmaralda.org/en/exakt-3/ "EXAKT")).

    **EXMARaLDA's** features include, for instance:

    - time-aligned transcription of digital audio or video
    - flexible annotation for freely choosable categories
    - systematic documentation of a corpus through metadata
    - flexible output of transcription data in various layouts and formats (notation, document)
    - computer-assisted querying of transcription, annotation and metadata
    - interoperability: it works with XML-based data formats that allow for data exchange with other tools (such as Praat, ELAN, Transcriber etc.) and enable flexible processing and sustainable usage of the data

    **EXMARaLDA** is used by [researchers worldwide](https://exmaralda.org/en/projects/ "Projekte") in different contexts in which spoken language is analysed, including:

    - conversation and discourse analysis
    - study of language acquisition and multilingualism
    - phonetics and phonology
    - dialectology and sociolinguistics

    **EXMARaLDA** was developed in the project "Computer assisted methods for the creation and analysis of multilingual data" at the Collaborative Research Center "Multilingualism" (Sonderforschungsbereich "Mehrsprachigkeit" – SFB 538) at the University of Hamburg. Since July 2011, the development of EXMARaLDA has continued at the [Hamburg Centre for Language Corpora](https://corpora.uni-hamburg.de/drupal/en), and since November 2011 in cooperation with the [Archive for Spoken German](http://agd.ids-mannheim.de/index.shtml) at the Institute for the German Language in Mannheim.