CLARIN Tool Portal

WebStylo

2 resources

A web-based, open stylometry system built on multilevel text analysis. It runs the CLUTO and stylo (R) clustering methods and relies on the Natural Language Processing workflow engine, which is included in the distribution.


Liner2.5 rc3

2 resources

A framework for multitask sequence labeling, designed for natural language processing tasks.


Cinderella - tool for Clustering and Classifications of Texts in Polish

2 resources

A system for clustering and classification of texts in Polish. Source code.


NLP Web services and NLP workflow engine

2 resources

A web-based system for natural language processing of texts in Polish. It runs complex workflows of language and machine learning tools and makes them available via REST web services.


Liner2.5 model NER

2 resources

Prepared by: Michał Marcińczuk <marcinczuk@gmail.com>
Date: 25.05.2016
Project: Clarin-PL (http://clarin-pl.eu)
Authors: Michał Marcińczuk, Jan Kocoń, Michał Krautforst

Models for the Liner2.5 tool for named entity recognition. The Liner2.5 tool is available at http://hdl.handle.net/11321/231. The package contains three models:
1. config-nam.ini -- named entity boundaries,
2. config-top9.ini -- boundaries and coarse-grained categorization of entities (9 categories),
3. config-n82.ini -- boundaries and fine-grained categorization of entities (82 categories).


Polish Grapheme-to-phoneme tool and service

2 resources

This archive contains the source code of the Polish grapheme-to-phoneme conversion tool and of the web service located at http://mowa.clarin-pl.eu/transcriber/


Polish Speech Services

2 resources

This archive contains the source code and configuration of the speech tools web service available at http://mowa.clarin-pl.eu/mowa. The services provided include:
+ speech-to-text alignment
+ speaker diarization
+ speech transcription
+ speech activity detection and noise classification
+ keyword spotting


Long term archive operating system source code

2 resources

This submission contains the operating system of the long-term archive built at the Polish-Japanese Academy of Information Technology for the Clarin-PL project.

The basic elements of the archive are data nodes equipped with mass storage. The nodes are controlled by embedded low-power computers, which are powered up independently and only when their storage is about to be accessed. This limits overall energy consumption and also lowers environmental demands (no air conditioning is needed). The nodes are grouped in trays. The basic, recommended configuration allows for 30 nodes per tray, but this limit can be extended up to 253. Each tray contains several networks designed for data transport, control of the devices’ state, and power supply.

Communication with clients is conducted through buffers, which are the only parts visible from externally connected networks. Stored files are therefore completely isolated and cannot be accessed directly. Multiple trays located at a single physical site form a complete archive. The storage space can be split into virtual archives that are separated at the logical level.

The operating system of the data network can store from 3 to 7 copies of a single digital file in different nodes. Additional copies of a resource may also be stored automatically in remotely located archives, with the trays treated as local parts of a wider, dispersed data network structure. The archive software not only provides secure read and write operations but also automatically takes care of the stored data: it periodically regenerates the physical state of saved files, and in case of device failure clients are transparently redirected to local or remote redundant copies.

A mechanism of "software bots" has been implemented. The archive can be supplied with external programs that process files stored inside the data network. This allows for data analysis, indexing, post-data creation, statistical computations, or finding associations in unstructured data sets of the Big Data type. Only the output of a software bot can be accessed externally, which makes such operations very secure.

Client programs communicate with the archive using a set of simple protocols based on key-value pair strings, which makes it convenient to build web interfaces for archive access and administration. By automating the supervision of resources, reducing storage requirements, and precisely controlling energy consumption, the proposed solution significantly lowers the cost of long-term data storage.
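
The replication constraint described above (3 to 7 copies of a file, each on a different node) can be illustrated with a short Python sketch. This is not the archive's actual code: the function name `choose_replica_nodes`, the node numbering, and the uniform random selection policy are all assumptions made for illustration. Only the 3-to-7 range and the 30-node tray size come from the description.

```python
import random

MIN_COPIES = 3          # lower bound stated in the description
MAX_COPIES = 7          # upper bound stated in the description
DEFAULT_TRAY_SIZE = 30  # recommended number of nodes per tray


def choose_replica_nodes(node_ids, copies, rng=None):
    """Pick `copies` distinct nodes to hold one file.

    Hypothetical policy: uniform random choice among the given nodes.
    The real archive may select nodes quite differently.
    """
    if not MIN_COPIES <= copies <= MAX_COPIES:
        raise ValueError(f"copies must be between {MIN_COPIES} and {MAX_COPIES}")
    nodes = list(node_ids)
    if len(set(nodes)) != len(nodes):
        raise ValueError("node identifiers must be unique")
    if copies > len(nodes):
        raise ValueError("not enough nodes for the requested number of copies")
    if rng is None:
        rng = random.Random()
    # sample() draws without replacement, so every copy lands on a different node
    return sorted(rng.sample(nodes, copies))


if __name__ == "__main__":
    tray = range(1, DEFAULT_TRAY_SIZE + 1)  # assumed node numbering 1..30
    print(choose_replica_nodes(tray, copies=5))
```

Sampling without replacement is what guarantees that no two copies share a node; a production placement policy would presumably also weigh node health, free space, and power state, none of which is modeled here.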

