703 record(s) found

Search results

  • Slowal (2018-06-29)

    Slowal is a web tool designed for creating, editing and browsing valence dictionaries. So far, it has mainly been used for creating the Polish Valence Dictionary (Walenty). Slowal supports the process of creating the dictionary; it also facilitates access by making it possible to browse the dictionary using an advanced built-in filtering system, covering both syntactic and semantic phenomena. Slowal also gives control over the work of lexicographers involved in creating the dictionary, for instance by using predefined lists of values, which prevents spelling errors and enforces consistency, as well as by imposing strict validation rules. Last but not least, the created dictionary can be exported from Slowal in various formats: plain text, TeX, PDF, and TEI XML. This version was adapted for describing the semantics of nouns and adjectives.
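The predefined value lists and validation rules described above can be sketched as follows. This is an illustrative sketch only, not Slowal's actual data model; the field names and allowed values are hypothetical:

```python
# Illustrative sketch (hypothetical field names and values): enforcing
# closed value lists of the kind Slowal uses to prevent spelling errors
# and keep lexicographers' entries consistent.

ALLOWED_ASPECTS = {"perf", "imperf", "biaspectual"}

def validate_entry(entry: dict) -> list[str]:
    """Return a list of validation errors for a draft dictionary entry."""
    errors = []
    if not entry.get("lemma"):
        errors.append("missing lemma")
    if entry.get("aspect") not in ALLOWED_ASPECTS:
        errors.append(f"unknown aspect: {entry.get('aspect')!r}")
    return errors
```

A typo such as `"imperfect"` instead of `"imperf"` is rejected at entry time rather than silently entering the dictionary.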
  • Pedagogic Corpus of Lithuanian

    The Pedagogic Corpus of Lithuanian is a monolingual specialized corpus, prepared for learning and teaching Lithuanian in a foreign language classroom. The pedagogic corpus includes authentic Lithuanian texts, selected using criteria such as learner-relevant communicative function and genre. Both spoken and written language are represented in the corpus. The size of the corpus is 669,000 tokens: 111,000 tokens from texts and spoken language for A1-A2 levels and 558,000 tokens from texts and spoken language for B1-B2 levels (according to the Common European Framework of Reference for Languages). The spoken component constitutes approx. 7.5% of the corpus. The written subpart of the corpus (containing 620,000 tokens) includes levelled texts from coursebooks and unlevelled texts from other sources. The texts from coursebooks and other sources can be classified into 29 text types (dialogs, narratives, information, etc.) and four groups according to communicative aim: informational texts, educational texts, advertising, and fiction. There are two types of searches in the corpus: simple and advanced (see "Search Tips"). Simple Search allows you to find instances of a search item (word form, lemma, two words) in the whole corpus or in a particular part of the corpus (spoken or written texts). After selecting the written subcorpus, you can further select the text type (coursebooks or non-coursebook texts) and/or the genre of the written texts. Advanced Search offers all the features of Simple Search plus additional options. Since the Pedagogic Corpus is morphologically annotated, the advanced search allows you to search by grammatical features (e.g. part of speech, case, number, verb form).
At https://kalbu.vdu.lt/mokymosi-priemones/mokomasis-tekstynas/ you can find truncated wordlists: a list of lemmas, lists of word forms (for the whole corpus, for the spoken and written components, and for each level), and lists of particular parts of speech in the whole corpus. The lists can be downloaded as .xlsx files. Reference: Kovalevskaitė, Jolanta, and Erika Rimkutė. "Pedagogic Corpus of Lithuanian: A New Resource for Learning and Teaching Lithuanian as a Foreign Language." Sustainable Multilingualism, vol. 17, no. 1, 2020, pp. 197-230. https://doi.org/10.2478/sm-2020-0019
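Searching by grammatical features over a morphologically annotated corpus, as the Advanced Search does, can be sketched in a few lines. This is an illustrative sketch only, not the corpus's real data model or query engine; the token dictionaries and feature names are hypothetical:

```python
# Illustrative sketch (hypothetical data model): querying morphologically
# annotated tokens by grammatical features, the kind of operation the
# Advanced Search performs.

tokens = [
    {"form": "namą",    "lemma": "namas", "pos": "noun", "case": "acc"},
    {"form": "eina",    "lemma": "eiti",  "pos": "verb", "case": None},
    {"form": "namuose", "lemma": "namas", "pos": "noun", "case": "loc"},
]

def search(tokens: list[dict], **features) -> list[dict]:
    """Return tokens matching all of the given grammatical features."""
    return [t for t in tokens
            if all(t.get(k) == v for k, v in features.items())]

hits = search(tokens, lemma="namas", pos="noun")
```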
  • Q-CAT Corpus Annotation Tool 1.3

    The Q-CAT (Querying-Supported Corpus Annotation Tool) is a computational tool for manual annotation of language corpora, which also enables advanced queries on top of these annotations. The tool has been used in various annotation campaigns related to the ssj500k reference training corpus of Slovenian (http://hdl.handle.net/11356/1210), such as named entities, dependency syntax, semantic roles and multi-word expressions, but it can also be used for adding new annotation layers of various types to this or other language corpora. Q-CAT is a .NET application which runs on the Windows operating system. Version 1.1 enables the automatic attribution of token IDs and personalized font adjustments. Version 1.2 supports the CoNLL-U format and working with UD POS tags. Version 1.3 supports adding new layers of annotation on top of CoNLL-U (and then saving the corpus as TEI XML).
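The CoNLL-U format mentioned above is a plain-text format with one token per line and ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC); comment lines start with "#" and sentences are separated by blank lines. A minimal reader, as a sketch of the format rather than of Q-CAT's own implementation:

```python
# Minimal CoNLL-U reader sketch (not Q-CAT's implementation): parse the
# ten tab-separated columns per token, skip comment lines, and split
# sentences on blank lines.

FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]

def parse_conllu(text: str) -> list[list[dict]]:
    sentences, current = [], []
    for line in text.splitlines():
        if line.startswith("#"):          # comment / metadata line
            continue
        if not line.strip():              # blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        current.append(dict(zip(FIELDS, line.split("\t"))))
    if current:
        sentences.append(current)
    return sentences

sample = ("# sent_id = 1\n"
          "1\tJaz\tjaz\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
          "2\tgrem\titi\tVERB\t_\t_\t0\troot\t_\t_\n")
sents = parse_conllu(sample)
```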
  • Q-CAT Corpus Annotation Tool 1.2

    The Q-CAT (Querying-Supported Corpus Annotation Tool) is a computational tool for manual annotation of language corpora, which also enables advanced queries on top of these annotations. The tool has been used in various annotation campaigns related to the ssj500k reference training corpus of Slovenian (http://hdl.handle.net/11356/1210), such as named entities, dependency syntax, semantic roles and multi-word expressions, but it can also be used for adding new annotation layers of various types to this or other language corpora. Q-CAT is a .NET application which runs on the Windows operating system. Version 1.1 enables the automatic attribution of token IDs and personalized font adjustments. Version 1.2 supports the CoNLL-U format and working with UD POS tags.
  • SELEXINI corpus

    SELEXINI is a large automatically annotated corpus for French. The corpus is divided into two parts: the first drawn from BigScience and the second from HPLT. The annotated documents from HPLT were selected in order to optimise the lexical diversity of the final SELEXINI corpus.
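One common way to select documents for lexical diversity is a greedy strategy: repeatedly pick the document that contributes the most unseen word types. This is an illustrative sketch only, not the actual SELEXINI selection procedure:

```python
# Illustrative sketch (NOT the actual SELEXINI selection method): greedy
# document selection maximising lexical diversity, picking at each step
# the document that adds the most word types not yet covered.

def select_diverse(docs: list[list[str]], k: int) -> list[int]:
    """Return indices of up to k documents chosen for vocabulary coverage."""
    seen: set[str] = set()
    chosen: list[int] = []
    remaining = list(range(len(docs)))
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda i: len(set(docs[i]) - seen))
        chosen.append(best)
        remaining.remove(best)
        seen |= set(docs[best])
    return chosen

docs = [["le", "chat", "dort"], ["le", "chat"], ["un", "chien", "court"]]
order = select_diverse(docs, 2)
```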
  • Heyra (1.0)

    Heyra is an Android application that provides three loosely coupled components: an implementation of Android's speech recognition interface, an intent handler activity for speech recognition actions from other applications, and an input method service (i.e. a virtual keyboard) that can either be used on its own or launched by supported applications. Heyra can be downloaded from the Google Play Store at https://play.google.com/store/apps/details?id=is.tiro.heyra
  • ALEXIA: Lexicon Acquisition Tool for Icelandic (Orðtökutólið Alexía) 3.0 (21.08)

    ALEXIA is a command-line corpus tool used for comparing a certain vocabulary to that of a larger corpus or corpora. In order to maintain lexicons, dictionaries and terminologies, it is necessary to be able to systematically go through large amounts of text considered representative of the language or category in question in order to find potential gaps in the data. ALEXIA provides an easy way to generate such candidate lists. To run ALEXIA, the user must run main.py. This script offers two language options, Icelandic and English, and guides the user through a series of options, including the necessary set-up of SQL databases. After the setup is completed, the user is offered the option of continuing to the actual program. The user is greeted with a welcome message and asked whether to create the default databases for the demo version of the program or to provide their own lexicon files. If the default set-up is chosen, the user must indicate whether to use the Database of Icelandic Morphology (DIM) or A Dictionary of Contemporary Icelandic (DCI), whose vocabulary is then compared to that of the Icelandic Gigaword Corpus (IGC). A number of filters are applied to reduce noise in the results. The linked video includes a detailed description of the tool's use.
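The core comparison ALEXIA performs, finding frequent corpus words that are missing from a lexicon, can be sketched as follows. This is an illustrative sketch of the idea only, not ALEXIA's actual code; the function and threshold names are hypothetical:

```python
# Illustrative sketch of ALEXIA's core idea (hypothetical names): compare
# a lexicon's vocabulary with a corpus frequency list and emit frequent
# corpus words absent from the lexicon as candidates for inclusion.
from collections import Counter

def gap_candidates(lexicon: set[str], corpus_tokens: list[str],
                   min_freq: int = 2) -> list[tuple[str, int]]:
    """Return (word, frequency) pairs missing from the lexicon,
    most frequent first; min_freq filters out noise."""
    freq = Counter(corpus_tokens)
    return sorted(
        ((w, n) for w, n in freq.items()
         if w not in lexicon and n >= min_freq),
        key=lambda wn: -wn[1],
    )

lexicon = {"hús", "bók"}
corpus = ["hús", "tölva", "tölva", "bók", "tölva", "sími", "sími"]
candidates = gap_candidates(lexicon, corpus)
```

Here "tölva" and "sími" are frequent in the corpus but absent from the lexicon, so they surface as candidates.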
  • GreynirCorrect 3.4.5 (22.10)

    GreynirCorrect is a Python 3 package and a command-line tool for checking and correcting various types of spelling and grammar errors in Icelandic text. GreynirCorrect relies on the Tokenizer package, by the same authors, to tokenize text (see http://hdl.handle.net/20.500.12537/262). More information can be found at https://github.com/icelandic-lt/GreynirCorrect, and detailed documentation at https://yfirlestur.is/doc/.
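The tokenize-and-correct pipeline such a checker performs can be sketched in miniature. This is an illustrative sketch only, NOT GreynirCorrect's actual API; a toy replacement table stands in for its real tokenizer and rule engine (see the documentation linked above for the real interface):

```python
# Illustrative sketch, NOT GreynirCorrect's API: a minimal
# tokenize-and-correct pipeline with a toy replacement table standing in
# for a real spelling/grammar rule engine.
import re

CORRECTIONS = {"efitr": "eftir"}  # toy example of a known misspelling

def correct_text(text: str) -> str:
    """Replace each known misspelled word, leaving other text untouched."""
    def fix(match: re.Match) -> str:
        word = match.group(0)
        return CORRECTIONS.get(word.lower(), word)
    return re.sub(r"\w+", fix, text)
```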