CLARIN Tool Portal

698 record(s) found

Search results

Slovene Punctuation and Capitalisation model RSDO-DS2-P&C 3.6

2 resources

This Punctuation and Capitalisation model was trained following the NVIDIA NeMo Punctuation and Capitalisation recipe (for details, see the official NVIDIA NeMo P&C documentation, https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/punctuation_and_capitalization.html, and the NVIDIA NeMo GitHub repository, https://github.com/NVIDIA/NeMo). It restores punctuation (,.!?) and capital letters in lowercased, unpunctuated Slovene text. The training corpus was built from publicly available datasets and a small portion of proprietary data. In total, the training corpus consisted of 38,829,529 sentences and the validation corpus of 2,092,497 sentences.
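As a rough illustration of what restoring punctuation and capital letters involves, the sketch below rebuilds a sentence from lowercased words and per-word labels. The two-character label scheme (a punctuation mark or "O", followed by "U" or "O") is an assumption based on the NeMo P&C data format; the labels are written by hand rather than produced by the model, and `apply_labels` is a hypothetical helper, not part of NeMo or of this resource.

```python
def apply_labels(words, labels):
    """Rebuild text from lowercased words and two-character labels.

    Assumed label scheme: the first character is the punctuation mark that
    follows the word ('O' for none); the second is 'U' if the word should be
    capitalised and 'O' otherwise.
    """
    if len(words) != len(labels):
        raise ValueError("words and labels must have the same length")
    restored = []
    for word, label in zip(words, labels):
        punct, cap = label[0], label[1]
        if cap == "U":
            word = word[:1].upper() + word[1:]
        if punct != "O":
            word += punct
        restored.append(word)
    return " ".join(restored)


# Hand-written labels for illustration only (not model output).
words = "dober dan kako ste".split()
labels = ["OU", ",O", "OO", "?O"]
print(apply_labels(words, labels))
```

In this sketch the model's job would be to predict the `labels` list; applying the labels is a simple deterministic step.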

Spellchecking app for Android (22.10)

2 resources

This Android application provides spelling and grammar checking for Icelandic. It is available on the Google Play Store under the name "Réttritun". The source code is written in Kotlin and could serve as a base for Android app projects that need an Icelandic spell-checking service. The app uses the spell-checker service as implemented by Miðeind ehf. in the Language Technology Program. See also: http://hdl.handle.net/20.500.12537/266 and http://hdl.handle.net/20.500.12537/270

SELEXINI corpus

5 resources

We present here a large automatically annotated corpus for French. This corpus is divided into two parts: the first from BigScience, and the second from HPLT. The annotated documents from HPLT were selected in order to optimise the lexical diversity of the final corpus SELEXINI.

GreynirPackage v3.5.1

3 resources

GreynirPackage is a Python 3 package for working with Icelandic natural language text. Greynir can parse text into sentence trees, find lemmas, inflect noun phrases, assign part-of-speech tags and much more. Greynir's sentence trees can, among other things, be used to extract information from text, for instance about people, titles, entities, facts, actions and opinions. Greynir uses the Tokenizer package, by the same authors, to tokenize text. More information is available at https://github.com/mideind/GreynirPackage and detailed documentation (in English) at https://greynir.is/doc/.

Tiro TTS web service (22.06)

2 resources

Tiro TTS is a text-to-speech (TTS) API web service that works with various TTS backends. By default, it expects a FastSpeech2+Melgan+Sequitur backend. See the https://github.com/cadia-lvl/fastspeech2 repository for more information on the backend. The service accepts either unnormalized text or an SSML document and responds with audio (MP3, Ogg Vorbis or raw 16-bit PCM) or with speech marks, which indicate the byte and time offset of each synthesized word in the request. The full API documentation in OpenAPI 2 format is available online at tts.tiro.is. The code for the service, along with further information, is at https://github.com/tiro-is/tiro-tts/releases/tag/M8. Check whether a newer version has been released (see README.md).

Corpus extraction tool LIST 1.0

2 resources

The LIST corpus extraction tool is a Java program for extracting lists from text corpora on the levels of characters, word parts, words, and word sets. It supports VERT and TEI P5 XML formats and outputs .CSV files that can be imported into Microsoft Excel or similar statistical processing software.

NameTag 2

2 resources

NameTag 2 is a named entity recognition tool. It recognizes named entities (e.g., names, locations, etc.), both flat and embedded (nested). NameTag 2 can be used either as a command-line tool or by sending requests to the NameTag web service.
The NameTag web service can be found at: https://lindat.mff.cuni.cz/services/nametag/
The NameTag command-line tool can be downloaded from the NameTag GitHub repository, branch nametag2: git clone https://github.com/ufal/nametag -b nametag2
The latest models and documentation can be found at: https://ufal.mff.cuni.cz/nametag/2
This software is subject to the terms of the Mozilla Public License, v. 2.0 (http://mozilla.org/MPL/2.0/). The associated models are distributed under the CC BY-NC-SA license.
Please cite as: Jana Straková, Milan Straka, Jan Hajič (2019): Neural Architectures for Nested NER through Linearization. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 5326-5331, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-48-2 (https://aclweb.org/anthology/papers/P/P19/P19-1527/)

GreynirT2T - En--Is NMT with Tensor2Tensor (1.0)

2 resources

A program library for training English-Icelandic neural machine translation systems, built on top of Tensor2Tensor and TensorFlow. It supports training with or without back-translated data.

GreynirCorrect 3.4.5 (22.10)

2 resources

GreynirCorrect is a Python 3 package and command-line tool for checking and correcting various types of spelling and grammar errors in Icelandic text. GreynirCorrect relies on the Tokenizer package, by the same authors, to tokenize text (see http://hdl.handle.net/20.500.12537/262). More information can be found at https://github.com/icelandic-lt/GreynirCorrect, and detailed documentation (in English) at https://yfirlestur.is/doc/.

Slovene Text Normalizator RSDO-DS2-NORM 1.0

2 resources

This Text Normalisator converts Slovene text from its written form into its spoken form. This is traditionally an essential preprocessing step before text-to-speech (TTS). It accepts text as a string and returns a dictionary with the fields "input_text", "normalized_text", "status" and "logs".
Example:
normalize_text("Sodobna definicija Celzijeve temperaturne lestvice, ki velja od leta 1954, je, da je temperatura trojne točke vode enaka 0,01 °C.")
{'input_text': 'Sodobna definicija Celzijeve temperaturne lestvice, ki velja od leta 1954, je, da je temperatura trojne točke vode enaka 0,01 °C.', 'normalized_text': 'Sodobna definicija Celzijeve temperaturne lestvice, ki velja od leta tisoč devetsto štiriinpetdeset, je, da je temperatura trojne točke vode enaka nič celih nič ena stopinje Celzija.', 'status': 1, 'logs': [('1954', 'tisoč devetsto štiriinpetdeset'), ('0,01', 'nič celih nič ena'), ('°C', 'stopinje Celzija')]}
For further details see README.md.
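The returned dictionary can be inspected with ordinary Python. The minimal sketch below works on a copy of the example output given in the description rather than calling the normaliser itself; `summarize_replacements` is a hypothetical helper, and it assumes that each entry in "logs" is a (written form, spoken form) pair, as the example suggests.

```python
# Example return value copied from the description; the normaliser itself
# is not called here.
result = {
    "input_text": "Sodobna definicija Celzijeve temperaturne lestvice, ki velja "
                  "od leta 1954, je, da je temperatura trojne točke vode enaka 0,01 °C.",
    "normalized_text": "Sodobna definicija Celzijeve temperaturne lestvice, ki velja "
                       "od leta tisoč devetsto štiriinpetdeset, je, da je temperatura "
                       "trojne točke vode enaka nič celih nič ena stopinje Celzija.",
    "status": 1,
    "logs": [
        ("1954", "tisoč devetsto štiriinpetdeset"),
        ("0,01", "nič celih nič ena"),
        ("°C", "stopinje Celzija"),
    ],
}


def summarize_replacements(result):
    """Return one 'written -> spoken' line per entry in the 'logs' field.

    Assumes each log entry is a (written form, spoken form) pair.
    """
    return [f"{written} -> {spoken}" for written, spoken in result["logs"]]


for line in summarize_replacements(result):
    print(line)
```

The meaning of the "status" values is not stated in the description, so the sketch does not interpret that field; README.md is the place to check.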


