CLARIN Tool Portal

WFT-GTB: Integrating the Wurdboek fan 'e Fryske Taal into the Geïntegreerde TaalBank

1 resource

The Dictionary of the Frisian Language (Wurdboek fan de Fryske Taal) is available online through the GTB dictionary web application. The GTB also holds other major Dutch historical dictionaries, such as the Dictionary of Old Dutch (ONW), the Dictionary of Early Middle Dutch (VMNW), the Dictionary of Middle Dutch (MNW) and the Dictionary of the Dutch Language (WNT). This digital environment supports extensive free and structured search queries, including comparative studies with Dutch material. The Wurdboek fan de Fryske Taal (Dictionary of the Frisian Language) project covers the vocabulary of Modern West Frisian from the period 1800-1975. The dictionary's metalanguage is Dutch. One volume of 400 pages appeared every year, the first in 1984. The editorial phase was completed in 2009, and the final editing and publication phase in 2010.

Modern Dutch lemma and Frisian lemma

Describes the origin of a word

Describes the meaning of a word

Describes the structure of a word

Describes the possible spellings of a word

Depuydt, K, de Does, J, Duijff, P and Sijens, H. 2017. Making the Dictionary of the Frisian Language Available in the Dutch Historical Dictionary Portal. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 151–165. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.13. License: CC-BY 4.0


VK: Verrijkt Koninkrijk (Enriched Kingdom)

2 resources

Dr Loe de Jong’s Het Koninkrijk der Nederlanden in de Tweede Wereldoorlog remains the most appealing history of Dutch society under German occupation (1940-1945). Published between 1969 and 1991, its 14 volumes, comprising 30 parts and 18,000 pages, combine the qualities of an authoritative work for a general audience with those of an indispensable point of reference for scholars. In VK this corpus is enriched with:
- tokenization, sentence splitting, part-of-speech tagging and lemmatization (done with the FROG software from Tilburg University);
- named entity recognition (done with UvA's NE tagger, specially trained for Dutch within the STEVIN DuoMan project);
- polarity tagging, i.e. the positive or negative connotation of words (done with UvA's FietsTas software, developed for Dutch within the STEVIN DuoMan project);
- named entity reconciliation by linking to Wikipedia (done with software developed by Edgar Meij, UvA).

REST web interface, HTTP GET

De Boer, V., J. van Doornik, L. Buitinck, K. Ribbens, and T. Veken. Enriched Access to a Large War Historical Text using the Back of the Book Index. Extended abstract presented at the Workshop on Semantic Web and Information Extraction (SWAIE 2012), Galway, Ireland, 9 October 2012.

L. Buitinck and M. Marx. Two-Stage Named-Entity Recognition Using Averaged Perceptrons. In: Proceedings of NLDB 2012, Groningen, Netherlands, 2012. http://link.springer.com/chapter/10.1007%2F978-3-642-31178-9_17


Cornetto: Combinatorial and Relational Network as Toolkit for Dutch Language Technology

1 resource

Cornetto is a lexical resource for Dutch that combines two resources with different semantic organisations: the Dutch Wordnet, with its synset organisation, and the Dutch Reference Lexicon, which includes definitions, usage constraints, selectional restrictions, syntactic behaviours, illustrative contexts, etc. The Cornetto database contains over 92K lemmas and almost 120K word meanings, covering the most generic and central part of the language. Cornetto combines the structures of the Princeton Wordnet, some features of FrameNet for English, and the information on morphological, syntactic, semantic and combinatorial features of lexemes normally found in dictionaries. The resource was compiled by combining and aligning two existing semantic resources for Dutch: the Dutch wordnet (DWN) and the Referentie Bestand Nederlands (RBN). It has recently been revised and extended with sentiment values in the From Text to Political Positions project, and with semantic annotations in SONAR, CGN and texts from the Web in the DutchSemCor project. The Cornetto lexical resource consists of two large repositories of lexicon data: the lexical entry repository and the synset repository. A Lexical Entry (LE) is a word-meaning pair (i.e. a single meaning of a certain word form), for which morphological, syntactic, semantic and combinatorial information is given. As such, LEs are word senses in the lexical semantic tradition, containing the linguistic knowledge needed to use the word properly in a specific meaning. Since the LEs follow a word-to-meaning view, the semantic and combinatorial information for each meaning clarifies the differences between meanings. LEs focus on the polysemy of words and typically represent condensed and generalised meanings from which more specific ones can be derived.
Each LE is aligned with a synset (set of synonyms) in the synset repository. A synset can thus be seen as a set of LEs with the same meaning, and every synset stands for a concept. The synsets in Cornetto are interconnected by semantic relations such as hyponymy, antonymy and meronymy. The Cornetto resource is aligned with the English Wordnet, from which domain information was imported. The domains represent clusters of concepts related by a shared area of interest, such as sport, education or politics. The definitions of LEs in the same synset should be semantically equivalent, and the LEs of a single word form should belong to different synsets. The LEs of a single word form typically differ in connotation, pragmatics, syntax and semantics, whereas synonymous words in the same synset can be differentiated by connotation, pragmatics and syntax but not by semantics. This structure makes it possible to combine the very detailed information on the form and usage of a specific LE or group of LEs with the semantic relations specified in the corresponding synset(s). For an open-source lexical-semantic database for Dutch, see the Open Source Dutch Wordnet (ODWN): http://wordpress.let.vupr.nl/odwn/
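The two-repository design can be illustrated with a small Python sketch, assuming a much simplified model: each LE records its word form, a sense number and the synset it is aligned with, so a synset is simply the group of LEs sharing a synset id. The toy entries, the `syn-…` ids, the relation table and all function names are hypothetical and do not reflect Cornetto's actual schema or identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalEntry:
    """A word-meaning pair: one sense of one word form (simplified model)."""
    form: str        # the word form, e.g. "bank"
    sense: int       # hypothetical sense number within this form
    synset_id: str   # hypothetical id of the synset this LE is aligned with


# Hypothetical toy data: Dutch "bank" as a seat and as a financial institution.
ENTRIES = [
    LexicalEntry("bank", 1, "syn-seat"),
    LexicalEntry("bank", 2, "syn-finance"),
    LexicalEntry("sofa", 1, "syn-seat"),
    LexicalEntry("geldinstelling", 1, "syn-finance"),
]

# Hypothetical synset relations: (source synset, relation) -> target synsets.
RELATIONS = {
    ("syn-seat", "hypernym"): ["syn-furniture"],
}


def forms_in_distinct_synsets(entries):
    """True if no word form has two LEs aligned with the same synset."""
    pairs = [(le.form, le.synset_id) for le in entries]
    return len(pairs) == len(set(pairs))


def synonyms(entries, form, sense):
    """Other word forms whose LEs share the synset of (form, sense)."""
    target = next(le.synset_id for le in entries
                  if le.form == form and le.sense == sense)
    return sorted(le.form for le in entries
                  if le.synset_id == target and le.form != form)


def related(synset_id, relation):
    """Synsets reached from synset_id via a semantic relation."""
    return RELATIONS.get((synset_id, relation), [])


print(synonyms(ENTRIES, "bank", 1))        # ['sofa']
print(synonyms(ENTRIES, "bank", 2))        # ['geldinstelling']
print(forms_in_distinct_synsets(ENTRIES))  # True
print(related("syn-seat", "hypernym"))     # ['syn-furniture']
```

Keeping the synset link on each LE mirrors the description of a synset as "a set of LEs with the same meaning", and makes the rule that the LEs of one word form belong to different synsets a simple uniqueness check.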

Vossen, P., I. Maks, R. Segers, H. van der Vliet, M.F. Moens, K. Hofmann, E. Tjong Kim Sang, M. de Rijke (2013). Cornetto: a lexical semantic database for Dutch. In: P. Spyns and J. Odijk (eds.), Essential Speech and Language Technology for Dutch: Results by the STEVIN-programme. Springer, Theory and Applications of Natural Language Processing. ISBN 978-3-642-30909-0.

Vossen, P., I. Maks, R. Segers and H. van der Vliet (2008). Integrating Lexical Units, Synsets, and Ontology in the Cornetto Database. In: Proceedings of LREC-2008, Marrakech, Morocco.

AVResearcherXL: Exploring audiovisual metadata in historical context

1 resource

AVResearcherXL is a tool for exploring radio and television programme descriptions, television subtitles and general newspaper articles. The interface searches across the "iMMix" catalogue of the Netherlands Institute for Sound and Vision and a selection of newspapers from the KB, National Library of the Netherlands. As of the end of 2014, AVResearcherXL uses the following data:

iMMix
- 932,035 broadcasts indexed
- 18,124 broadcasts with subtitles
- 1 January 1900 is the date of the first broadcast in the index
- 26 October 2013 is the date of the last broadcast in the index

KB newspapers
- 25,811,413 articles indexed
- 16,294,029 articles of type "artikel" (article)
- 8,483,542 articles of type "advertentie" (advertisement)
- 630,929 articles of type "illustratie met onderschrift" (illustration with caption)
- 402,913 articles of type "familiebericht" (family announcement)
- 1 January 1900 is the date of the first article in the index
- 30 November 1994 is the date of the last article in the index

AVResearcherXL is financially supported by CLARIN-NL within the QuaMeRDES project and by CLARIAH-SEED within the Research Instruments for Media Studies project. AVResearcherXL is an extended version of MeRDES, the tool developed in 2012 by the NWO-CATCH project BRIDGE. The Netherlands Institute for Sound and Vision further developed MeRDES into AVResearcher in 2013. AVResearcherXL is a collaborative project of the Centre for Television in Transition (Utrecht University), the Intelligent Systems Lab Amsterdam (University of Amsterdam) and the Netherlands Institute for Sound and Vision. The partners worked with Dispectu on the development of the interface and back end, and with Frontwise on the styling of the interface.
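The four KB newspaper article types add up exactly to the 25,811,413 indexed articles, and fewer than 2% of the iMMix broadcasts have subtitles. The short Python check below makes that arithmetic explicit using only the figures reported for the end of 2014; the constant and function names are illustrative.

```python
# Counts reported for the AVResearcherXL index at the end of 2014.
KB_TOTAL = 25_811_413
KB_BY_TYPE = {
    "artikel": 16_294_029,
    "advertentie": 8_483_542,
    "illustratie met onderschrift": 630_929,
    "familiebericht": 402_913,
}
IMMIX_BROADCASTS = 932_035
IMMIX_WITH_SUBTITLES = 18_124


def type_shares(by_type, total):
    """Percentage share of each article type, rounded to one decimal."""
    if sum(by_type.values()) != total:
        raise ValueError("type counts do not add up to the reported total")
    return {name: round(100 * count / total, 1) for name, count in by_type.items()}


shares = type_shares(KB_BY_TYPE, KB_TOTAL)
print(shares)
# {'artikel': 63.1, 'advertentie': 32.9, 'illustratie met onderschrift': 2.4, 'familiebericht': 1.6}

subtitled = round(100 * IMMIX_WITH_SUBTITLES / IMMIX_BROADCASTS, 1)
print(subtitled)  # 1.9
```

In other words, regular articles make up almost two thirds of the newspaper index and advertisements roughly a third.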

Bron, M., Gorp, J. van, Nack, F., Rijke, M. de, Vishneuski, A. and Leeuw, J.S. de (2012). A Subjunctive Exploratory Search Interface to Support Media Studies Researchers. In: SIGIR '12: Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, Portland, Oregon. ACM.

Huurnink, B., Bronner, A., Bron, M., Gorp, J. van, Goede, B. de and Wees, J. van (2013). AVResearcher: Exploring Audiovisual Metadata. In: DIR 2013: Dutch-Belgian Information Retrieval Conference, Delft.

FESLI: Functional elements in Specific Language Impairment

1 resource

Tool for the quantitative and qualitative comparison of the acquisition of functional elements (morphological inflection, articles, pronouns, etc.) in a corpus of data from monolingual and bilingual (Dutch-Turkish) children with and without Specific Language Impairment (SLI). The FESLI data come from two NWO-sponsored projects: BiSLI and Variflex. The numbers of children included in the resources are:
- 12 bilingual children without SLI;
- 25 monolingual children with SLI;
- 20 bilingual children with SLI.
The children's ages ranged from 6;0 to 8;5. For more precise information about the age distribution in each group, the reader is referred to the dissertation by Antje Orgassa (http://dare.uva.nl/document/147433). The non-impaired children were included in the Variflex project (data collected by Elma Blom) and were also used in the BiSLI project; the data from the children with SLI were exclusive to the BiSLI project. The FESLI web application is based on modules of the COAVA web application.
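The ages are given in the usual child-language notation years;months, so 8;5 means eight years and five months. The minimal Python sketch below converts that notation to total months and back, and sums the three reported group sizes; the helper and group names are illustrative assumptions, not part of the FESLI application.

```python
def age_to_months(age: str) -> int:
    """Convert an age in years;months notation (e.g. '6;0') to total months."""
    years_part, months_part = age.split(";")
    years, months = int(years_part), int(months_part)
    if not 0 <= months < 12:
        raise ValueError(f"month part out of range in {age!r}")
    return 12 * years + months


def months_to_age(total: int) -> str:
    """Inverse of age_to_months, e.g. 101 -> '8;5'."""
    years, months = divmod(total, 12)
    return f"{years};{months}"


# Group sizes as listed in the FESLI description (labels are illustrative).
GROUP_SIZES = {
    "bilingual, no SLI": 12,
    "monolingual, SLI": 25,
    "bilingual, SLI": 20,
}

youngest, oldest = age_to_months("6;0"), age_to_months("8;5")
print(youngest, oldest, oldest - youngest)  # 72 101 29
print(months_to_age(oldest))                # 8;5
print(sum(GROUP_SIZES.values()))            # 57
```

Working in months makes the age range easy to compare: the children span 29 months, from 72 to 101 months old.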

General Dutch Dictionary

1 resource

Corpus-based dictionary describing contemporary Dutch as used in the Netherlands and in Flanders in the period 1970-2019.

Modern Dutch lemma

Describes the origin of a word

Describes the meaning of a word

Describes the structure of a word


Manual Oral History Annotation Tool

1 resource

The Oral History Annotation Tool, developed by the Centre for Language and Speech Technology (CLST) at Radboud University Nijmegen, enables users to annotate and search oral history resources. The tool has been used to enrich a corpus of 250 interviews from the Living Oral History Workbench with commentary. All 250 interviews are searchable through a fragment finder and can be annotated. These annotations can be shared with other researchers, making the interviews more accessible to a much wider range of researchers in the humanities in general and in linguistics in particular. The Annotation Tool is available only for scientific research, and only after approval by the Veterans Institute. Interview data can be used in a number of ways, such as comparative research, restudy or follow-up study, re-analysis or secondary analysis, research design and methodological advancement, replication and validation of published work, and teaching and learning. Recent experience with the re-use of interview data shows that this type of data has enormous potential. Multidisciplinary research is carried out especially on interview data related to the Second World War and other military conflicts. This corpus consists of about 30 fully transcribed interviews from the Veteran Tapes VP project, and 250 interviews from the Living Oral History Workbench project:
- 120 World War II interviews presenting a range of experiences and frames of reference of Dutch soldiers between 1935 and 1945;
- 100 interviews with veterans of the Dutch East Indies. This collection exhibits a large diversity in experiences at the local level in guerrilla warfare;
- 30 interviews with veterans of New Guinea. This is a relatively unknown conflict with very interesting elements (soldiers left in uncertainty and isolation, and pressure from the international community to decolonize the area).
Each interview lasts between 1 and 1.5 hours.

INPOLDER: Integrated Parser and Lemmatizer of Dutch in Retrospect

1 resource

INPOLDER (Integrated Parser and Lemmatizer of Dutch in Retrospect) is a tool that performs morphological tagging, lemmatization and syntactic parsing of historical Dutch texts. It is built on the Adelheid tool (tagging and lemmatization) and the Collins-Bikel statistical parser. The Dutch historical record is an essential part of Dutch cultural heritage, and it is vital that it be made accessible for research into a wide range of historical and linguistic questions. In the transition from the Middle Ages to the modern era, the Netherlands developed from a region speaking a diverse group of dialects (Hollandic, Brabantic, Flemish, North-eastern, Limburgian) into a country with a standard language, and there is good reason to believe that this process was extremely dynamic. Systematic research into these processes affecting syntax, phonology, morphology and spelling is impossible without lemmatized, tagged and parsed corpora of historical Dutch. In recent years, Hans van Halteren has developed a tagger-lemmatizer (Adelheid, also available in the CLARIN infrastructure). INPOLDER complements this enrichment tool with a parser for historical Dutch. The INPOLDER parser is trained on a subset of the corpus of fourteenth-century texts (Corpus van Reenen/Mulder, CRM; van Reenen and Mulder, 1993; Rem, 2003) and a subset of the Drenthe corpus (DC). CRM consists of 2700 charters from 345 places of origin. The corpus was designed to be representative of local language use in Middle Dutch and to be suitable for all types of linguistic research.

PaQu - Parse and Query

1 resource

PaQu uses the Alpino parser to build treebanks from your own text corpus, and lets you search these treebanks through an interface based on the LASSY Word Relations Search interface (http://dev.clarin.nl/node/1966). Several treebanks are already available in the application, such as Lassy Klein (1M words, manually checked syntactic analysis) and Lassy Groot (700M words, syntactic analysis assigned automatically by Alpino). PaQu offers two ways to search the syntactically annotated texts. The first is the search bar, which looks up word pairs, optionally together with their syntactic relation. The second is the XPath query language.
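Alpino stores each parse as XML in which nested `node` elements carry attributes such as `rel` (dependency relation), `cat` (phrase category), `word` and `lemma`. The Python sketch below uses a hand-written toy analysis of "De man slaapt" in that style (not actual Alpino output) to show both search styles in miniature: an XPath-style query, using only the limited XPath subset of the standard library's ElementTree rather than the full XPath that PaQu supports, and a simplified extraction of head-relation-dependent word pairs. The `word_relations` and `head_lemma` functions are illustrative assumptions, not PaQu's implementation.

```python
import xml.etree.ElementTree as ET

# Hand-written toy analysis in Alpino style (hypothetical, simplified).
ALPINO_XML = """<alpino_ds version="1.3">
  <node begin="0" cat="top" end="3" id="0" rel="top">
    <node begin="0" cat="smain" end="3" id="1" rel="--">
      <node begin="0" cat="np" end="2" id="2" rel="su">
        <node begin="0" end="1" id="3" lemma="de" pos="det" rel="det" word="De"/>
        <node begin="1" end="2" id="4" lemma="man" pos="noun" rel="hd" word="man"/>
      </node>
      <node begin="2" end="3" id="5" lemma="slapen" pos="verb" rel="hd" word="slaapt"/>
    </node>
  </node>
  <sentence>De man slaapt</sentence>
</alpino_ds>"""


def head_lemma(node):
    """Lemma of a leaf, or of the lexical head (rel="hd") of a phrase."""
    if node.get("lemma") is not None:
        return node.get("lemma")
    for child in node.findall("node"):
        if child.get("rel") == "hd":
            return head_lemma(child)
    return None  # e.g. phrases without a head daughter


def word_relations(root):
    """Collect (head lemma, relation, dependent lemma) triples, phrase by phrase."""
    triples = []
    for phrase in root.iter("node"):
        children = phrase.findall("node")
        heads = [c for c in children if c.get("rel") == "hd"]
        if not heads:
            continue
        head = head_lemma(heads[0])
        for dep in children:
            if dep is heads[0]:
                continue
            dep_lemma = head_lemma(dep)
            if head and dep_lemma:
                triples.append((head, dep.get("rel"), dep_lemma))
    return triples


root = ET.fromstring(ALPINO_XML)

# XPath-style query: noun phrases that act as subject.
subjects = root.findall(".//node[@cat='np'][@rel='su']")
print(len(subjects))         # 1

# Word-pair view of the same sentence.
print(word_relations(root))  # [('slapen', 'su', 'man'), ('man', 'det', 'de')]
```

The word-pair view is convenient for simple lookups such as "which subjects occur with slapen", while the XPath view can express arbitrary structural constraints on the tree.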

Odijk, J, van Noord, G, Kleiweg, P and Tjong Kim Sang, E. 2017. The Parse and Query (PaQu) Application. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 281–297. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.23. License: CC-BY 4.0

OpenConvert

1 resource

The OpenConvert tools, created by IVDNT in the OpenConvert project, convert a number of input formats (ALTO, plain text, Word, HTML, ePub) to TEI or FOLiA. The tools are available as a Java command-line tool, a web service and a web application. In addition, as a proof of concept, the website currently provides two annotation tools: a simple tokenizer for TEI files and a part-of-speech tagger for modern Dutch.

The service can be called as a REST web service that returns XML responses, so it can be part of a web-service tool chain.

TEI, plain text or HTML input

ALTO XML input

ePub input

directory containing files of a valid input type

zip file (with extension .zip) containing files of a valid input type

Free for academic use. Not available to commercial parties.

CLARIN-based login required. The CLARIN federation accepts logins from many European institutions; please see http://www.clarin.eu/content/service-provider-federation for more details.

Input file name (file upload)

Format of input file

Format of output file

Specifies the tagger or tokeniser

input file mimetype is application/tei+xml

input file mimetype is text/html

input file mimetype is text/alto+xml

input file mimetype is application/msword

input file mimetype is application/epub+zip

input file mimetype is text/plain

output file mimetype is application/tei+xml

output file mimetype is text/folia+xml

Basic tagger-lemmatizer for modern Dutch

A TEI tokenizer
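The listed mimetypes suggest how a client might label its input before calling the service. The Python sketch below maps format names to the listed input and output mimetypes and filters the members of a .zip archive down to files whose format can be guessed from the extension, mirroring the batch (directory or .zip) input. The extension table and all function names are assumptions, not part of OpenConvert, and the sketch does not call the service itself; `.xml` is left out on purpose because such a file may be either TEI or ALTO and its format has to be stated explicitly.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Input and output mimetypes as listed for the OpenConvert service.
INPUT_MIMETYPES = {
    "tei": "application/tei+xml",
    "html": "text/html",
    "alto": "text/alto+xml",
    "word": "application/msword",
    "epub": "application/epub+zip",
    "text": "text/plain",
}
OUTPUT_MIMETYPES = {
    "tei": "application/tei+xml",
    "folia": "text/folia+xml",
}

# Hypothetical extension-to-format guesses (not defined by OpenConvert).
# ".xml" is deliberately absent: it could be TEI or ALTO.
EXTENSION_FORMATS = {
    ".txt": "text",
    ".html": "html",
    ".htm": "html",
    ".doc": "word",
    ".epub": "epub",
}


def guess_input_format(filename):
    """Return the input format name for a file, or None if it cannot be guessed."""
    return EXTENSION_FORMATS.get(PurePosixPath(filename).suffix.lower())


def convertible_members(zip_bytes):
    """List the files in a .zip archive whose input format can be guessed."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [name for name in archive.namelist()
                if not name.endswith("/") and guess_input_format(name) is not None]


# Usage: build a small archive in memory and see which members qualify.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for name in ["a.txt", "b.html", "c.xml", "notes.pdf"]:
        archive.writestr(name, "example")

print(convertible_members(buffer.getvalue()))          # ['a.txt', 'b.html']
print(INPUT_MIMETYPES[guess_input_format("book.epub")])  # application/epub+zip
print(OUTPUT_MIMETYPES["folia"])                        # text/folia+xml
```

Refusing to guess for ambiguous extensions keeps a batch from silently sending ALTO files labelled as TEI, or vice versa.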
