CLARIN Tool Portal

698 record(s) found

Search results

Odissei Code Library

1 resources

The ODISSEI code library is a collection of code and scripts used to execute projects using the ODISSEI infrastructure. ODISSEI (Open Data Infrastructure for Social Science and Economic Innovations) is the national research infrastructure for the social sciences in the Netherlands. ODISSEI brings together researchers with the necessary data, expertise and resources to conduct ground-breaking research and embrace the computational turn in social enquiry. Through ODISSEI, researchers have access to large-scale, longitudinal data collections as well as innovative and diverse new forms of data. These can be linked to administrative data at Statistics Netherlands (CBS). Combining data from a wide range of sources enables researchers to answer new, exciting, interdisciplinary research questions and to investigate existing questions in novel, new ways.
GrETEL

2 resources

Search engine for the exploitation of syntactically annotated corpora or treebanks

Use "GrETEL"
stam

2 resources

Stand-off Text Annotation Model (STAM) is a data model for stand-off-text annotation where any information on a text is represented as an annotation. This repository contains the model's full specification, extensions, schemas, examples and documentation.

Use "stam"
Metadata Editor, Browser and Organiser for IMDI and CMDI

1 resources

Arbil (Archive Builder) is a metadata editor, browser and organiser for metadata in IMDI and CMDI format. It is a Java desktop application that runs on most operating systems. Arbil can be used to create new metadata from scratch for resources on your local machine, or it can be used to download and modify metadata that are already in an archive. Arbil is a generic CMDI editor and therefore supports all CMDI profiles. It has a built-in file type verification tool that is configured to check files against the list of accepted file types for The Language Arhive, this can however be overruled for other archives.
Frog: An advanced Natural Language Processing Suite for Dutch (Web Service and Application)

1 resources

Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. It performs automatic linguistic enrichment such as part of speech tagging, lemmatisation, named entity recognition, shallow parsing, dependency parsing and morphological analysis. All NLP modules are based on TiMBL.

Iris Hendrickx, Antal van den Bosch, Maarten van Gompel, Ko van der Sloot and Walter Daelemans. 2016.Frog: A Natural Language Processing Suite for Dutch. CLST Technical Report 16-02, pp 99-114. Nijmegen, the Netherlands. https://github.com/LanguageMachines/frog/blob/master/docs/frogmanual.pdf

Van den Bosch, A., Busser, G.J., Daelemans, W., and Canisius, S. (2007). An efficient memory-based morphosyntactic tagger and parser for Dutch, In F. van Eynde, P. Dirix, I. Schuurman, and V. Vandeghinste (Eds.), Selected Papers of the 17th Computational Linguistics in the Netherlands Meeting, Leuven, Belgium, pp. 99-114. http://ilk.uvt.nl/downloads/pub/papers/tadpole-final.pdf

Frog (plain text input)

Frog (folia+xml input)
Nederlab, online laboratory for humanities research on Dutch text collections

1 resources

The Nederlab project aims to bring together all digitized texts relevant to Dutch national heritage, the history of Dutch language and culture (c. 800 - present) in one user-friendly and tool-enriched open access web interface, allowing scholars to simultaneously search and analyze data from texts spanning the full recorded history of the Netherlands, its language and culture. The project builds on various initiatives: for corpora Nederlab collaborates with the scientific libraries and institutions, for infrastructure with CLARIN (and CLARIAH), for tools with eHumanities programmes such as Catch, IMPACT and CLARIN (TICCL, frog). Nederlab will offer a large number of search options with which researchers can find the occurrence of a particular term in a particular corpus or subcorpus. It'll also offer visualization of search results through line graphs, bar graphs, circle graphs, or scatter graphs. Furthermore, this online lab will offer a large set of tools, like tokenization tools, tools for spelling normalization, PoS-tagging tools, lemmatization tools, a computational historical lexicon and indices. Also, the use of (semi-) automatic syntactic parsing, tools for text mining, data mining and sentiment mining, Named Entity Recognition tools, coreference resolution tools, plagiarism detection tools, paraphrase detection tools and cartographical tools is offered The first version of Nederlab was launched in early 2015, it’ll be expanded until the end of 2017. Nederlab is financed by NWO, KNAW, CLARIAH and CLARIN-NL.

http://www.nederlab.nl/wp/?page_id=12
Automatic Annotation of Multi-modal Language Resources

1 resources

The AAM-LR project provides a web service that helps field researchers to annotate audio- and video-recordings. At the top level the service marks the time intervals at which specific persons in the recording are speaking. In addition, the service provides a global phonetic annotation, using language independent phone models and phonetic features. Speech is separated from speaker noises such as laughing. Note: this service has been withdrawn and the URLs and PID do not resolve anymore!
LASSY Word Relations Search Web Application

1 resources

The LASSY word relations web application makes it possible to search for sentences that contain pairs of words between which there is a grammatical relation. One can search in the Dutch LASSY-SMALL Treebank (1 million tokens), in which the syntactic parse of each sentence has been manually verified, and in (a part of) the LASSY-LARGE Treebank (700 million tokens ),in which the syntactic parse of each sentence has been added by the automatic parser Alpino. One can restrict the query to search for words of a particular Part-of-Speech, which is very useful in the case of syntactic ambiguities. One can also leave out the string of the word, so that one can obtain e.g. a list of sentences in which any adverb modifies a given verb, or even any word modifies a given verb. On the page that lists the found sentences one can view the exact syntactic structure of each sentence by a simple click. The application also provides detailed frequency information of all found sentences and word pairs. The Lassy treebanks have been made by the KU Leuven and the Rijksuniversiteit Groningen through financing of the Dutch Language Union. One can obtain these treebanks through the HLT Agency (TST-Centrale). Use PaQu (http://dev.clarin.nl/node/4182) for many more options and if you want to search for word pairs in your own text corpus.
GrETEL Search Engine for Querying Syntactic Constructions in Treebanks

1 resources

GrETEL is a query engine in which linguists can use a natural language example as a starting point for searching a treebank with limited knowledge about tree representations and formal query languages. Instead of a formal search instruction, it takes a natural language example as input. This provides a convenient way for novice and non-technical users to use treebanks with a limited knowledge of the underlying syntax and formal query languages. By allowing linguists to search for constructions similar to the example they provide, it aims to bridge the gap between descriptive-theoretical and computational linguistics. The example-based query procedure consists of several steps. In the first step the user enters an example of the construction he/she is interested in. In the second step the example is returned in the form of a matrix, in which the user specifies which aspects of this example are essential for the construction under investigation. The third step provides an overview of the search instruction, i.e. the subpart of the parse tree that contains the elements relevant for the construction under investigation. This query tree is automatically converted in an XPath query which can be used for the actual treebank search. This query can be edited if desired. In the fourth step the query is executed on the selected corpus. The matching constructions are presented to the user as a list of sentences, which can be downloaded. The user can also click on the sentences in order to visualize the results as syntax trees. GrETEL enables search in the LASSY-SMALL and the CGN (Spoken Dutch Corpus) Treebanks (1 million tokens each). GrETEL was created by CLARIN Dutch Language Union in Flanders in the context of the CLARIN-NL / CLARIN Flanders cooperation project.

Liesbeth Augustinus, Vincent Vandeghinste, and Frank Van Eynde (2012). "Example-Based Treebank Querying" In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC-2012). Istanbul, Turkey. pp. 3161-3167

Augustinus, L, Vandeghinste, V, Schuurman, I and Van Eynde, F. 2017. GrETEL: A Tool for Example-Based Treebank Mining. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 269–280. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.22. License: CC-BY 4.0

http://gretel.ccl.kuleuven.be/project/publications.php
PICCL: Philosophical Integrator of Computational and Corpus Libraries

1 resources

PICCL is a set of workflows for corpus building through OCR, post-correction, modernization of historic language and Natural Language Processing. It combines Tesseract Optical Character Recognition, TICCL functionality and Frog functionality in a single pipeline. Tesseract offers Open Source software for optical character recognition. TICCL (Text Induced Corpus Clean-up) is a system that is designed to search a corpus for all existing variants of (potentially) all words occurring in the corpus. This corpus can be one text, or several, in one or more directories, located on one or more machines. TICCL creates word frequency lists, listing for each word type how often the word occurs in the corpus. These frequencies of the normalized word forms are the sum of the frequencies of the actual word forms found in the corpus. TICCL is a system that is intended to detect and correct typographical errors (misprints) and OCR errors (optical character recognition) in texts. When books or other texts are scanned from paper by a machine, that then turns these scans, i.e. images, into digital text files, errors occur. For instance, the letter combination `in' can be read as `m', and so the word `regeering' is incorrectly reproduced as `regeermg'. TICCL can be used to detect these errors and to suggest a correct form. Frog enriches textual documents with various linguistic annotations.

Martin Reynaert, Maarten van Gompel, Ko van der Sloot and Antal van den Bosch. 2015. PICCL: Philosophical Integrator of Computational and Corpus Libraries. Proceedings of CLARIN Annual Conference 2015, pp. 75-79. Wrocław, Poland. http://www.nederlab.nl/cms/wp-content/uploads/2015/10/Reynaert_PICCL-Philosophical-Integrator-of-Computational-and-Corpus-Libraries.pdf

PICCL

Use "PICCL: Philosophical Integrator of Computational and Corpus Libraries"

Result filters

Metadata provider

Language

Resource type

Type of tool

Tool task

Field of study

Availability

Organisation

Project

Keywords

Search results

Odissei Code Library

GrETEL

stam

Metadata Editor, Browser and Organiser for IMDI and CMDI

Frog: An advanced Natural Language Processing Suite for Dutch (Web Service and Application)

Nederlab, online laboratory for humanities research on Dutch text collections

Automatic Annotation of Multi-modal Language Resources

LASSY Word Relations Search Web Application

GrETEL Search Engine for Querying Syntactic Constructions in Treebanks

PICCL: Philosophical Integrator of Computational and Corpus Libraries