Result filters

Metadata provider

  • DSpace

Language

Resource type

Tool task

Keywords

  • embeddings

Active filters:

  • Metadata provider: DSpace
  • Keywords: embeddings
Loading...
2 record(s) found

Search results

  • Icelandic Homograph Classifier (24.04.)

    IceHoC is a binary classifier for Icelandic homographs following the pattern V-ll-(V|$) where the 'll' can be pronounced either /tl/ or /l/. The classifier was trained on the Labeled Corpus of Icelandic Homographs (http://hdl.handle.net/20.500.12537/327). Please refer to the projects README for further discussions and guidelines for usage. IceHoC er tól sem flokkar íslensk samstafa orð sem fylgja mynstrinu V-ll-(V|$), eða sérhljóð-ll-sérhljóð_eða_lok_orðs. Í þessum orðum er 'll' borið fram ýmist /tl/ eða /l/, eftir merkingu orðsins. IceHoC var þjálfað á málheild íslenskra samstafa orða (http://hdl.handle.net/20.500.12537/327). Fyrir nánari umfjöllun og leiðbeiningar um notkun, sjá README.
  • LitLat BERT

    Trilingual BERT-like (Bidirectional Encoder Representations from Transformers) model, trained on Lithuanian, Latvian, and English data. State of the art tool representing words/tokens as contextually dependent word embeddings, used for various NLP classification tasks by fine-tuning the model end-to-end. LitLat BERT are neural network weights and configuration files in pytorch format (i.e. to be used with pytorch library). The corpora used for training the model have 4.07 billion tokens in total, of which 2.32 billion are English, 1.21 billion are Lithuanian and 0.53 billion are Latvian. LitLat BERT is based on XLM-RoBERTa model and comes in two versions, one for usage with transformers library (https://github.com/huggingface/transformers), and one for usage with fairseq library (https://github.com/pytorch/fairseq). More information is in the readme.txt.