Active filters:

  • Availability: Share-alike
  • Keywords: computer-mediated communication
18 record(s) found

Search results

  • The CLASSLA-StanfordNLP model for named entity recognition of non-standard Croatian 1.0

    This model for named entity recognition of non-standard Croatian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the hr500k training corpus (http://hdl.handle.net/11356/1183), the ReLDI-NormTagNER-hr corpus (http://hdl.handle.net/11356/1241) and the ReLDI-NormTagNER-sr corpus (http://hdl.handle.net/11356/1240), using the CLARIN.SI-embed.hr word embeddings (http://hdl.handle.net/11356/1205). The training corpora were additionally augmented for handling missing diacritics by repeating parts of the corpora with diacritics removed.
  • The CLASSLA-StanfordNLP model for morphosyntactic annotation of non-standard Croatian 1.0

    This model for morphosyntactic annotation of non-standard Croatian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the hr500k training corpus (http://hdl.handle.net/11356/1210), the ReLDI-NormTagNER-hr corpus (http://hdl.handle.net/11356/1241), the RAPUT corpus (https://www.aclweb.org/anthology/L16-1513/) and the ReLDI-NormTagNER-sr corpus (http://hdl.handle.net/11356/1240), using the CLARIN.SI-embed.hr word embeddings (http://hdl.handle.net/11356/1205). These corpora were additionally augmented for handling missing diacritics by repeating parts of the corpora with diacritics removed. The model simultaneously produces UPOS, FEATS and XPOS (MULTEXT-East) labels. The estimated F1 of the XPOS annotations is ~95.11.
  • The CLASSLA-StanfordNLP model for lemmatisation of non-standard Croatian 1.0

    The model for lemmatisation of non-standard Croatian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the hr500k training corpus (http://hdl.handle.net/11356/1210), the ReLDI-NormTagNER-hr corpus (http://hdl.handle.net/11356/1241), the RAPUT corpus (https://www.aclweb.org/anthology/L16-1513/) and the ReLDI-NormTagNER-sr corpus (http://hdl.handle.net/11356/1240), using the hrLex inflectional lexicon (http://hdl.handle.net/11356/1232). These corpora were additionally augmented for handling missing diacritics by repeating parts of the corpora with diacritics removed. The estimated F1 of the lemma annotations is ~97.54.
  • The CLASSLA-StanfordNLP model for named entity recognition of non-standard Slovenian 1.0

    This model for named entity recognition of non-standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the ssj500k training corpus (http://hdl.handle.net/11356/1210) and the Janes-Tag training corpus (http://hdl.handle.net/11356/1238), using the CLARIN.SI-embed.sl word embeddings (http://hdl.handle.net/11356/1204). The training corpora were additionally augmented for handling missing diacritics by repeating parts of the corpora with diacritics removed.
  • The CLASSLA-Stanza model for lemmatisation of non-standard Serbian 2.1

    The model for lemmatisation of non-standard Serbian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SETimes.SR training corpus (http://hdl.handle.net/11356/1200) and the ReLDI-NormTagNER-sr corpus (http://hdl.handle.net/11356/1794), using the srLex inflectional lexicon (http://hdl.handle.net/11356/1233). These corpora were additionally augmented for handling missing diacritics by repeating parts of the corpora with diacritics removed. The estimated F1 of the lemma annotations is ~94.92. Unlike the previous version, this model is trained on a combination of two corpora (SETimes.SR and ReLDI-NormTagNER-sr).
  • The CLASSLA-StanfordNLP model for lemmatisation of non-standard Slovenian 1.1

    The model for lemmatisation of non-standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the ssj500k training corpus (http://hdl.handle.net/11356/1210) and the Janes-Tag corpus (http://hdl.handle.net/11356/1238), using the Sloleks inflectional lexicon (http://hdl.handle.net/11356/1230). These corpora were additionally augmented for handling missing diacritics by repeating parts of the corpora with diacritics removed. The estimated F1 of the lemma annotations is ~98.86. Unlike the previous version, the lemmatizer now relies solely on XPOS annotations rather than on a combination of UPOS, FEATS (lexicon lookup) and XPOS (lemma prediction) annotations.
  • The CLASSLA-StanfordNLP model for lemmatisation of non-standard Croatian 1.1

    The model for lemmatisation of non-standard Croatian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the hr500k training corpus (http://hdl.handle.net/11356/1183), the ReLDI-NormTagNER-hr corpus (http://hdl.handle.net/11356/1241), the RAPUT corpus (https://www.aclweb.org/anthology/L16-1513/) and the ReLDI-NormTagNER-sr corpus (http://hdl.handle.net/11356/1240), using the hrLex inflectional lexicon (http://hdl.handle.net/11356/1232). These corpora were additionally augmented for handling missing diacritics by repeating parts of the corpora with diacritics removed. The estimated F1 of the lemma annotations is ~97.54. Unlike the previous version, the lemmatizer now relies solely on XPOS annotations rather than on a combination of UPOS, FEATS (lexicon lookup) and XPOS (lemma prediction) annotations.
  • The CLASSLA-Stanza model for morphosyntactic annotation of non-standard Slovenian 2.1

    This model for morphosyntactic annotation of non-standard Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SUK training corpus (http://hdl.handle.net/11356/1747) and the Janes-Tag corpus (http://hdl.handle.net/11356/1732), using the CLARIN.SI-embed.sl word embeddings (http://hdl.handle.net/11356/1204) that were expanded with the MaCoCu-sl Slovene web corpus (http://hdl.handle.net/11356/1517). These corpora were additionally augmented for handling missing diacritics by repeating parts of the corpora with diacritics removed. The model simultaneously produces UPOS, FEATS and XPOS (MULTEXT-East) labels. The estimated F1 of the XPOS annotations is ~92.17. Unlike the previous version, this model was trained on the SUK training corpus and the 3.0 version of Janes-Tag, and it uses new embeddings and the new version of the Slovene morphological lexicon, Sloleks 3.0 (http://hdl.handle.net/11356/1745).
  • The CLASSLA-StanfordNLP model for lemmatisation of non-standard Serbian 1.1

    The model for lemmatisation of non-standard Serbian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the SETimes.SR training corpus (http://hdl.handle.net/11356/1200), the ReLDI-NormTagNER-sr corpus (http://hdl.handle.net/11356/1240), the ReLDI-NormTagNER-hr corpus (http://hdl.handle.net/11356/1241), the hr500k training corpus (http://hdl.handle.net/11356/1183) and the RAPUT corpus (https://www.aclweb.org/anthology/L16-1513/), using the srLex inflectional lexicon (http://hdl.handle.net/11356/1233). These corpora were additionally augmented for handling missing diacritics by repeating parts of the corpora with diacritics removed. The estimated F1 of the lemma annotations is ~97.62. Unlike the previous version, the lemmatizer now relies solely on XPOS annotations rather than on a combination of UPOS, FEATS (lexicon lookup) and XPOS (lemma prediction) annotations.
  • The CLASSLA-StanfordNLP model for morphosyntactic annotation of non-standard Serbian 1.0

    This model for morphosyntactic annotation of non-standard Serbian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the SETimes.SR training corpus (http://hdl.handle.net/11356/1200), the ReLDI-NormTagNER-sr corpus (http://hdl.handle.net/11356/1240), the ReLDI-NormTagNER-hr corpus (http://hdl.handle.net/11356/1241), the hr500k training corpus (http://hdl.handle.net/11356/1210) and the RAPUT corpus (https://www.aclweb.org/anthology/L16-1513/), using the CLARIN.SI-embed.sr word embeddings (http://hdl.handle.net/11356/1206). These corpora were additionally augmented for handling missing diacritics by repeating parts of the corpora with diacritics removed. The model simultaneously produces UPOS, FEATS and XPOS (MULTEXT-East) labels. The estimated F1 of the XPOS annotations is ~94.91.
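
The records above repeatedly mention augmenting the training corpora "by repeating parts of the corpora with diacritics removed". The records do not say how this was done, so the following is only a minimal sketch of what such a step might look like. The character mapping (including "đ" to "d"), the function names `strip_diacritics` and `augment`, and the `fraction` and `seed` parameters are all assumptions made for illustration and are not taken from the CLASSLA tools.

```python
import random

# Hypothetical mapping: the records do not state how diacritics were removed.
# Mapping "đ" to "d" is an assumption; "dj" is another common transliteration.
DIACRITIC_MAP = str.maketrans({
    "č": "c", "ć": "c", "š": "s", "ž": "z", "đ": "d",
    "Č": "C", "Ć": "C", "Š": "S", "Ž": "Z", "Đ": "D",
})


def strip_diacritics(text: str) -> str:
    """Return text with South Slavic diacritics replaced by plain ASCII letters."""
    return text.translate(DIACRITIC_MAP)


def augment(sentences, fraction=0.5, seed=0):
    """Return the original sentences followed by diacritic-free copies
    of a random sample covering roughly `fraction` of them."""
    sentences = list(sentences)
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    sample_size = int(len(sentences) * fraction)
    sampled = rng.sample(sentences, sample_size)
    return sentences + [strip_diacritics(s) for s in sampled]
```

Calling `augment(corpus, fraction=0.5)` on a list of sentences would keep every original sentence and append stripped copies of about half of them, so a model trained on the result sees both "šećer" and "secer" style spellings.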