703 record(s) found

Search results

  • Slowal (2018-06-29)

    Slowal is a web tool designed for creating, editing and browsing valence dictionaries. So far, it has mainly been used for creating the Polish Valence Dictionary (Walenty). Slowal supports the process of creating the dictionary; it also facilitates access by making it possible to browse the dictionary using an advanced built-in filtering system, covering both syntactic and semantic phenomena. Slowal also gives control over the work of lexicographers involved in creating the dictionary, for instance by using predefined lists of values, which prevents spelling errors and enforces consistency, as well as by imposing strict validation rules. Last but not least, the created dictionary can be exported from Slowal in various formats: plain text, TeX, PDF, and TEI XML. This version was adapted for describing the semantics of nouns and adjectives.
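The predefined value lists and validation rules described above can be sketched as follows. This is an illustrative sketch only, not Slowal's actual data model; the field names and allowed values are hypothetical:

```python
# Illustrative sketch (hypothetical field names and values): enforcing
# closed value lists of the kind Slowal uses to prevent spelling errors
# and keep lexicographers' entries consistent.

ALLOWED_ASPECTS = {"perf", "imperf", "biaspectual"}

def validate_entry(entry: dict) -> list[str]:
    """Return a list of validation errors for a draft dictionary entry."""
    errors = []
    if not entry.get("lemma"):
        errors.append("missing lemma")
    if entry.get("aspect") not in ALLOWED_ASPECTS:
        errors.append(f"unknown aspect: {entry.get('aspect')!r}")
    return errors
```

A typo such as `"imperfect"` instead of `"imperf"` is rejected at entry time rather than silently entering the dictionary.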
  • Pedagogic Corpus of Lithuanian

    The Pedagogic Corpus of Lithuanian is a monolingual specialized corpus, prepared for learning and teaching Lithuanian in a foreign language classroom. The pedagogic corpus includes authentic Lithuanian texts, selected using criteria such as learner-relevant communicative function and genre. Both spoken and written language are represented in the corpus. The size of the corpus is 669,000 tokens: 111,000 tokens from texts and spoken language for A1-A2 levels and 558,000 tokens from texts and spoken language for B1-B2 levels (according to the Common European Framework of Reference for Languages). The spoken component constitutes approx. 7.5% of the corpus. The written subpart of the corpus (containing 620,000 tokens) includes levelled texts from coursebooks and unlevelled texts from other sources. The texts from coursebooks and other sources can be classified into 29 text types (dialogs, narratives, information, etc.) and four groups according to communicative aim: informational texts, educational texts, advertising, and fiction. There are two types of searches in the corpus: simple and advanced (see "Search Tips"). Simple Search allows you to find instances of a search item (word form, lemma, two words) in the whole corpus or in a particular part of the corpus (spoken or written texts). After selecting the written subcorpus, you can further select the text type (coursebooks or non-coursebook texts) and/or the genre of the written texts. Advanced Search offers all the features of Simple Search plus additional options. Since the Pedagogic Corpus is morphologically annotated, the advanced search allows you to search by grammatical features (e.g. part of speech, case, number, verb form).
At https://kalbu.vdu.lt/mokymosi-priemones/mokomasis-tekstynas/ you can find truncated wordlists: a list of lemmas, lists of word forms (for the whole corpus, for the spoken and written components, and for each level), and lists of particular parts of speech in the whole corpus. The lists can be downloaded as .xlsx files. Reference: Kovalevskaitė, Jolanta, and Erika Rimkutė. "Pedagogic Corpus of Lithuanian: A New Resource for Learning and Teaching Lithuanian as a Foreign Language." Sustainable Multilingualism, vol. 17, no. 1, 2020, pp. 197-230. https://doi.org/10.2478/sm-2020-0019
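Searching by grammatical features over a morphologically annotated corpus, as the Advanced Search does, can be sketched in a few lines. This is an illustrative sketch only, not the corpus's real data model or query engine; the token dictionaries and feature names are hypothetical:

```python
# Illustrative sketch (hypothetical data model): querying morphologically
# annotated tokens by grammatical features, the kind of operation the
# Advanced Search performs.

tokens = [
    {"form": "namą",    "lemma": "namas", "pos": "noun", "case": "acc"},
    {"form": "eina",    "lemma": "eiti",  "pos": "verb", "case": None},
    {"form": "namuose", "lemma": "namas", "pos": "noun", "case": "loc"},
]

def search(tokens: list[dict], **features) -> list[dict]:
    """Return tokens matching all of the given grammatical features."""
    return [t for t in tokens
            if all(t.get(k) == v for k, v in features.items())]

hits = search(tokens, lemma="namas", pos="noun")
```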
  • Q-CAT Corpus Annotation Tool 1.3

    The Q-CAT (Querying-Supported Corpus Annotation Tool) is a computational tool for manual annotation of language corpora, which also enables advanced queries on top of these annotations. The tool has been used in various annotation campaigns related to the ssj500k reference training corpus of Slovenian (http://hdl.handle.net/11356/1210), such as named entities, dependency syntax, semantic roles and multi-word expressions, but it can also be used for adding new annotation layers of various types to this or other language corpora. Q-CAT is a .NET application which runs on the Windows operating system. Version 1.1 enables the automatic attribution of token IDs and personalized font adjustments. Version 1.2 supports the CoNLL-U format and working with UD POS tags. Version 1.3 supports adding new layers of annotation on top of CoNLL-U (and then saving the corpus as TEI XML).
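The CoNLL-U format mentioned above is a plain-text format with one token per line and ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC); comment lines start with "#" and sentences are separated by blank lines. A minimal reader, as a sketch of the format rather than of Q-CAT's own implementation:

```python
# Minimal CoNLL-U reader sketch (not Q-CAT's implementation): parse the
# ten tab-separated columns per token, skip comment lines, and split
# sentences on blank lines.

FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]

def parse_conllu(text: str) -> list[list[dict]]:
    sentences, current = [], []
    for line in text.splitlines():
        if line.startswith("#"):          # comment / metadata line
            continue
        if not line.strip():              # blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        current.append(dict(zip(FIELDS, line.split("\t"))))
    if current:
        sentences.append(current)
    return sentences

sample = ("# sent_id = 1\n"
          "1\tJaz\tjaz\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
          "2\tgrem\titi\tVERB\t_\t_\t0\troot\t_\t_\n")
sents = parse_conllu(sample)
```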
  • Q-CAT Corpus Annotation Tool 1.2

    The Q-CAT (Querying-Supported Corpus Annotation Tool) is a computational tool for manual annotation of language corpora, which also enables advanced queries on top of these annotations. The tool has been used in various annotation campaigns related to the ssj500k reference training corpus of Slovenian (http://hdl.handle.net/11356/1210), such as named entities, dependency syntax, semantic roles and multi-word expressions, but it can also be used for adding new annotation layers of various types to this or other language corpora. Q-CAT is a .NET application which runs on the Windows operating system. Version 1.1 enables the automatic attribution of token IDs and personalized font adjustments. Version 1.2 supports the CoNLL-U format and working with UD POS tags.
  • SELEXINI corpus

    SELEXINI is a large automatically annotated corpus for French. The corpus is divided into two parts: the first drawn from BigScience and the second from HPLT. The annotated documents from HPLT were selected in order to optimise the lexical diversity of the final SELEXINI corpus.
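One common way to select documents for lexical diversity is a greedy strategy: repeatedly pick the document that contributes the most unseen word types. This is an illustrative sketch only, not the actual SELEXINI selection procedure:

```python
# Illustrative sketch (NOT the actual SELEXINI selection method): greedy
# document selection maximising lexical diversity, picking at each step
# the document that adds the most word types not yet covered.

def select_diverse(docs: list[list[str]], k: int) -> list[int]:
    """Return indices of up to k documents chosen for vocabulary coverage."""
    seen: set[str] = set()
    chosen: list[int] = []
    remaining = list(range(len(docs)))
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda i: len(set(docs[i]) - seen))
        chosen.append(best)
        remaining.remove(best)
        seen |= set(docs[best])
    return chosen

docs = [["le", "chat", "dort"], ["le", "chat"], ["un", "chien", "court"]]
order = select_diverse(docs, 2)
```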
  • Heyra (1.0)

    Heyra is an Android application that provides three loosely coupled components: an implementation of Android's speech recognition interface, an intent handler activity for speech recognition actions from other applications, and an input method service (i.e. a virtual keyboard) that can either be used on its own or launched by supported applications. Heyra can be downloaded from the Google Play Store at https://play.google.com/store/apps/details?id=is.tiro.heyra
  • ALEXIA: Lexicon Acquisition Tool for Icelandic (Orðtökutólið Alexía) 3.0 (21.08)

    ALEXIA is a command-line corpus tool used for comparing a certain vocabulary to that of a larger corpus or corpora. In order to maintain lexicons, dictionaries and terminologies, it is necessary to be able to systematically go through large amounts of text considered representative of the language or category in question in order to find potential gaps in the data. ALEXIA provides an easy way to generate such candidate lists. To run ALEXIA, the user must run main.py. This script offers two language options, Icelandic and English, and guides the user through a series of options, including the necessary set-up of SQL databases. After the setup is completed, the user is offered the option of continuing to the actual program. The user is greeted with a welcome message and asked whether to create the default databases for the demo version of the program or to provide their own lexicon files. If the default set-up is chosen, the user must indicate whether to use the Database of Icelandic Morphology (DIM) or A Dictionary of Contemporary Icelandic (DCI), whose vocabulary is then compared to that of the Icelandic Gigaword Corpus (IGC). A number of filters are applied to reduce noise in the results. The linked video includes a detailed description of the tool's use.
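The core comparison ALEXIA performs, finding frequent corpus words that are missing from a lexicon, can be sketched as follows. This is an illustrative sketch of the idea only, not ALEXIA's actual code; the function and threshold names are hypothetical:

```python
# Illustrative sketch of ALEXIA's core idea (hypothetical names): compare
# a lexicon's vocabulary with a corpus frequency list and emit frequent
# corpus words absent from the lexicon as candidates for inclusion.
from collections import Counter

def gap_candidates(lexicon: set[str], corpus_tokens: list[str],
                   min_freq: int = 2) -> list[tuple[str, int]]:
    """Return (word, frequency) pairs missing from the lexicon,
    most frequent first; min_freq filters out noise."""
    freq = Counter(corpus_tokens)
    return sorted(
        ((w, n) for w, n in freq.items()
         if w not in lexicon and n >= min_freq),
        key=lambda wn: -wn[1],
    )

lexicon = {"hús", "bók"}
corpus = ["hús", "tölva", "tölva", "bók", "tölva", "sími", "sími"]
candidates = gap_candidates(lexicon, corpus)
```

Here "tölva" and "sími" are frequent in the corpus but absent from the lexicon, so they surface as candidates.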
  • GreynirCorrect 3.4.5 (22.10)

    GreynirCorrect is a Python 3 package and a command-line tool for checking and correcting various types of spelling and grammar errors in Icelandic text. GreynirCorrect relies on the Tokenizer package, by the same authors, to tokenize text (see http://hdl.handle.net/20.500.12537/262). More information can be found at https://github.com/icelandic-lt/GreynirCorrect, and detailed documentation at https://yfirlestur.is/doc/.
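The tokenize-and-correct pipeline such a checker performs can be sketched in miniature. This is an illustrative sketch only, NOT GreynirCorrect's actual API; a toy replacement table stands in for its real tokenizer and rule engine (see the documentation linked above for the real interface):

```python
# Illustrative sketch, NOT GreynirCorrect's API: a minimal
# tokenize-and-correct pipeline with a toy replacement table standing in
# for a real spelling/grammar rule engine.
import re

CORRECTIONS = {"efitr": "eftir"}  # toy example of a known misspelling

def correct_text(text: str) -> str:
    """Replace each known misspelled word, leaving other text untouched."""
    def fix(match: re.Match) -> str:
        word = match.group(0)
        return CORRECTIONS.get(word.lower(), word)
    return re.sub(r"\w+", fix, text)
```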