703 record(s) found

Search results

  • IceParser 1.5.0

    IceParser is a shallow parser for Icelandic. The parser consists of a sequence of finite-state transducers that incrementally add syntactic information to the input text. The input to IceParser is part-of-speech (PoS) tagged text, and its output is annotated with both constituent structure and syntactic functions. The distributed file contains the whole of IceNLP, a toolkit of NLP tools for processing and analysing Icelandic. The version of IceParser in the current IceNLP release has been updated to annotate input tagged with the revised Icelandic PoS tagset.
  • Tokenizer for Icelandic text (2.0.3)

    Tokenizer is a compact, pure-Python (2 and 3) executable program and module for tokenizing Icelandic text. It converts input text into streams of tokens, where each token is a separate word, punctuation mark, number/amount, date, e-mail address, URL/URI, etc. It also segments the token stream into sentences, handling corner cases such as abbreviations and dates in the middle of sentences.
  • Icelandic NER API - Ensemble model (21.09)

    A dockerized Named Entity Recognition (NER) API for Icelandic. It uses the IceBERT language model from Miðeind as its primary model, but it can also combine IceBERT with three other transformer language models (ELECTRA-base, convbert-small, and multilingual-BERT) using CombiTagger. All four models were fine-tuned for NER on MIM-GOLD-NER. IceBERT was the best individual model, achieving an F1 score of ~92.73 on the MIM-GOLD-NER test set, while the combination of all four models via CombiTagger achieved an F1 score of 93.21. The code for the API is available at https://github.com/icelandic-lt/Icelandic-NER-API, and the files for the fine-tuned models are available in this submission.
  • Trankit model for SST 2.15

    This is a retrained Slovenian model for the Trankit v1.1.1 library for multilingual natural language processing (https://pypi.org/project/trankit/), trained on the SST treebank of spoken Slovenian (UD v2.15, https://github.com/UniversalDependencies/UD_Slovenian-SST/tree/dev), which features transcriptions of spontaneous speech in various everyday settings. It predicts sentence segmentation, tokenization, lemmatization and language-specific morphological annotation (MULTEXT-East morphosyntactic tags), as well as universal part-of-speech tags, morphological features and dependency parses in accordance with the Universal Dependencies annotation scheme (https://universaldependencies.org/). Please note that this model has been published for archiving purposes only. For production use, we recommend the state-of-the-art Trankit model available at http://hdl.handle.net/11356/1965, which was trained on both spoken (SST) and written (SSJ) data and performs significantly better than the model featured in this submission.
  • UDPipe

    UDPipe is a trainable pipeline for tokenization, tagging, lemmatization and dependency parsing of CoNLL-U files. UDPipe is language-agnostic and can be trained given only annotated data in CoNLL-U format. Trained models are provided for nearly all UD treebanks. UDPipe is available as a binary, as a library for C++, Python, Perl, Java and C#, and as a web service. UDPipe is free software under the Mozilla Public License 2.0 (http://www.mozilla.org/MPL/2.0/), and the linguistic models are free for non-commercial use and distributed under the CC BY-NC-SA license (http://creativecommons.org/licenses/by-nc-sa/4.0/), although for some models the original data used to create the model may impose additional licensing conditions. UDPipe is versioned using Semantic Versioning (http://semver.org/). The UDPipe website, http://ufal.mff.cuni.cz/udpipe, contains download links for both the released packages and trained models, hosts documentation and offers an online demo. The UDPipe development repository, http://github.com/ufal/udpipe, is hosted on GitHub.
  • COMBO-based UD Parser 22.10

    This Universal Dependencies parser for Icelandic was trained with COMBO on IcePaHC and UD_Icelandic-Modern; the latter was revised before training to remove some duplicate sentences. It uses information from an ELECTRA language model (https://huggingface.co/jonfd/electra-base-igc-is). Its UAS (unlabeled attachment score) is 89.13 and its LAS (labeled attachment score) is 85.97.
  • Yfirlestur 1.0.1 (22.10)

    Yfirlestur.is is a public website where you can enter or submit Icelandic text and have it checked for spelling and grammar errors. The tool also flags words and structures that may be inappropriate for the text's intended audience. The core spelling and grammar checking functionality of Yfirlestur.is is provided by the GreynirCorrect engine, by the same authors. This software is licensed under the MIT License. More information is available at https://github.com/icelandic-lt/Yfirlestur.
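For the IceParser entry above, the idea of a cascade of finite-state transducers, each adding one layer of syntactic annotation to the output of the previous one, can be illustrated with a few regular-expression rewrite stages standing in for the transducers. This is a toy sketch, not IceParser's code: the `word_tag` input format, the simplified tags (`adj`, `noun`, `verb`), the bracket labels and the `STAGES`/`parse` names are all hypothetical, and the real transducers and the Icelandic PoS tagset are far richer.

```python
import re

# Each stage is one rewrite rule standing in for a finite-state transducer.
# Input tokens are hypothetical "word_tag" pairs with simplified tags.
STAGES = [
    # Stage 1: an optional adjective followed by a noun forms a noun phrase.
    (re.compile(r"((?:\S+_adj )?\S+_noun)"), r"[NP \1 ]"),
    # Stage 2: a verb forms a verb phrase.
    (re.compile(r"(\S+_verb)"), r"[VP \1 ]"),
    # Stage 3: a noun phrase immediately before a verb phrase is marked as subject.
    (re.compile(r"\[NP ([^\]]*) \] (?=\[VP)"), r"{*SUBJ [NP \1 ] } "),
]


def parse(tagged: str) -> str:
    """Apply every stage in order, each one reading the previous stage's output."""
    for pattern, replacement in STAGES:
        tagged = pattern.sub(replacement, tagged)
    return tagged


print(parse("stór_adj hundur_noun geltir_verb"))
# prints: {*SUBJ [NP stór_adj hundur_noun ] } [VP geltir_verb ]
```

Because every stage reads the output of the previous one, the subject rule in stage 3 can rely on the phrase brackets that stages 1 and 2 have already inserted, which is what makes the annotation incremental.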
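For the Tokenizer entry above, the corner case of abbreviations in the middle of a sentence can be illustrated with a deliberately small, standard-library sketch. It is not the Tokenizer package's algorithm or API: `ABBREVIATIONS`, `TOKEN_RE`, `tokenize` and `split_sentences` are hypothetical names, the three-item abbreviation list stands in for the much larger lexicon a real tokenizer needs, and dates, numbers, URLs and sentence-final abbreviations are not handled.

```python
import re
from typing import List

# Hypothetical, tiny abbreviation list; a real tokenizer needs a far larger one.
ABBREVIATIONS = {"t.d.", "þ.e.", "o.s.frv."}

# A word (possibly with internal periods and a trailing period),
# or a single punctuation character.
TOKEN_RE = re.compile(r"\w+(?:\.\w+)*\.?|[^\w\s]")


def tokenize(text: str) -> List[str]:
    tokens: List[str] = []
    for match in TOKEN_RE.finditer(text):
        tok = match.group()
        if len(tok) > 1 and tok.endswith(".") and tok.lower() not in ABBREVIATIONS:
            # Ordinary word followed by a full stop: split the period off.
            tokens.extend([tok[:-1], "."])
        else:
            tokens.append(tok)
    return tokens


def split_sentences(tokens: List[str]) -> List[List[str]]:
    sentences: List[List[str]] = []
    current: List[str] = []
    for tok in tokens:
        current.append(tok)
        # Only stand-alone terminators end a sentence; "t.d." stays one token.
        if tok in {".", "?", "!"}:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


tokens = tokenize("Hún keypti t.d. epli. Hann keypti ekkert!")
for sentence in split_sentences(tokens):
    print(sentence)
```

Keeping the abbreviation's final period inside the token is what prevents "t.d." from being mistaken for a sentence boundary.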
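For the Icelandic NER API entry above, one simple way to combine the outputs of several taggers is a per-token majority vote. The sketch below assumes that scheme and breaks ties in favour of a designated primary model (IceBERT in the example); CombiTagger's actual combination method may differ. The function name `combine_by_vote` and the labels in the example are illustrative only.

```python
from collections import Counter
from typing import List, Sequence


def combine_by_vote(predictions: Sequence[Sequence[str]], primary: int = 0) -> List[str]:
    """Majority vote over per-token labels; ties go to the primary model.

    predictions[m][i] is the label that model m assigns to token i.
    """
    if not predictions:
        raise ValueError("need at least one model's predictions")
    n_tokens = len(predictions[0])
    if any(len(p) != n_tokens for p in predictions):
        raise ValueError("all models must label the same tokens")
    # Consult the primary model first, then the others in their given order.
    order = [primary] + [m for m in range(len(predictions)) if m != primary]
    combined = []
    for i in range(n_tokens):
        votes = Counter(p[i] for p in predictions)
        top = max(votes.values())
        winners = {label for label, count in votes.items() if count == top}
        # The first model in `order` whose label is among the tied winners decides.
        combined.append(next(predictions[m][i] for m in order if predictions[m][i] in winners))
    return combined


# Hypothetical outputs of four fine-tuned models for a four-token sentence.
icebert = ["B-Person", "I-Person", "O", "B-Location"]
electra = ["B-Person", "O", "O", "B-Organization"]
convbert = ["B-Person", "I-Person", "O", "B-Organization"]
mbert = ["O", "I-Person", "O", "B-Location"]
print(combine_by_vote([icebert, electra, convbert, mbert]))
```

On the last token the vote is tied two to two, so IceBERT's `B-Location` wins. Voting token by token can also produce sequences that violate the IOB scheme (for example an `I-` tag with no preceding `B-`), so a real system might repair or constrain the combined output.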
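UDPipe, listed above, reads and writes CoNLL-U, in which every word is a line of ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), sentences are separated by blank lines, and comment lines start with `#`. The standard-library reader below is a minimal sketch of that format, not part of UDPipe's own API; `read_conllu` and `FIELDS` are hypothetical names, and the reader simply skips multiword-token ranges and empty nodes.

```python
from typing import Dict, List

# The ten CoNLL-U columns, in order.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def read_conllu(text: str) -> List[List[Dict[str, str]]]:
    """Parse CoNLL-U text into sentences, each a list of word dictionaries."""
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Sentence-level comments such as "# text = ..." are ignored here.
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected 10 tab-separated columns: {line!r}")
        if "-" in cols[0] or "." in cols[0]:
            # Skip multiword-token ranges ("1-2") and empty nodes ("8.1").
            continue
        current.append(dict(zip(FIELDS, cols)))
    if current:
        # The last sentence may lack a trailing blank line.
        sentences.append(current)
    return sentences


sample = (
    "# text = Hundurinn geltir.\n"
    "1\tHundurinn\thundur\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tgeltir\tgelta\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)
for word in read_conllu(sample)[0]:
    print(word["id"], word["form"], word["head"], word["deprel"])
```

HEAD holds the ID of the governing word (0 for the root), which is the information attachment scores such as UAS and LAS are computed from.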
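The COMBO-based parser entry above reports UAS and LAS. UAS is the percentage of words assigned the correct head, and LAS is the percentage assigned both the correct head and the correct dependency relation. The sketch below computes both for one sentence, assuming gold and predicted analyses over identical tokenization; official UD evaluation scripts additionally align differing tokenizations and may ignore relation subtypes. `attachment_scores` and `Arc` are hypothetical names.

```python
from typing import List, Tuple

# One word's analysis: (index of its head, dependency relation label).
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages for aligned gold and predicted arcs."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must cover the same words")
    if not gold:
        raise ValueError("cannot score an empty sentence")
    n = len(gold)
    # UAS: only the head index has to match.
    uas_hits = sum(g_head == p_head for (g_head, _), (p_head, _) in zip(gold, pred))
    # LAS: both the head index and the relation label have to match.
    las_hits = sum(g == p for g, p in zip(gold, pred))
    return 100 * uas_hits / n, 100 * las_hits / n


# Example: four words, one wrong label (word 3) and one wrong head (word 4).
gold = [(2, "nsubj"), (0, "root"), (2, "punct"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "obj"), (1, "obj")]
print(attachment_scores(gold, pred))  # prints: (75.0, 50.0)
```

For a whole test set the hit counts would be summed over all words before dividing, so longer sentences weigh more than short ones.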