Active filters:

  • Language: Dutch
  • Organisation: Meertens Institute
9 record(s) found

Search results

  • Nederlab, online laboratory for humanities research on Dutch text collections

    The Nederlab project aims to bring together all digitized texts relevant to Dutch national heritage and to the history of Dutch language and culture (c. 800 to the present) in one user-friendly, tool-enriched, open-access web interface, allowing scholars to simultaneously search and analyze data from texts spanning the full recorded history of the Netherlands, its language and culture. The project builds on various initiatives: for corpora, Nederlab collaborates with the scientific libraries and institutions; for infrastructure, with CLARIN (and CLARIAH); for tools, with eHumanities programmes such as Catch, IMPACT and CLARIN (TICCL, Frog). Nederlab will offer a large number of search options with which researchers can find the occurrences of a particular term in a particular corpus or subcorpus. It will also offer visualization of search results through line graphs, bar graphs, circle graphs or scatter graphs. Furthermore, this online lab will offer a large set of tools, such as tokenization tools, tools for spelling normalization, PoS-tagging tools, lemmatization tools, a computational historical lexicon and indices. It will also offer (semi-)automatic syntactic parsing, tools for text mining, data mining and sentiment mining, Named Entity Recognition tools, coreference resolution tools, plagiarism detection tools, paraphrase detection tools and cartographical tools. The first version of Nederlab was launched in early 2015; it will be expanded until the end of 2017. Nederlab is financed by NWO, KNAW, CLARIAH and CLARIN-NL.
    http://www.nederlab.nl/wp/?page_id=12
  • MIGMAP: Detailed interactive mapping of migration in The Netherlands in the 20th century.

    MIGMAP is a web application that shows migration flows between Dutch municipalities. The user first chooses a generation (forward or backward in time) and a gender; the application then shows the migration map of the Netherlands for an interactively selected municipality (or other aggregation unit). The data underlying the migration maps originate from the first name selection from the Civil Registration, acquired by Utrecht University and the Meertens Institute in 2006. They comprise 16 million records of persons with Dutch citizenship who were alive in 2006, plus 6 million persons who died before 2006 but are mentioned in other records, mainly as parents. The records include identifiers by which family relations can be reconstructed. After considerable effort in data cleaning and reconstruction of older generations, the data provide an almost complete overview of the Dutch population born after 1930 and a fairly good sample (>25%) of the period 1880-1930. The generation options are: the places of birth of the current population, of their parents, grandparents or great-grandparents; or, starting with the persons born between 1880 and 1900, the current places of residence of their children or grandchildren. Each map will be made available as a .csv record with municipality number and percentage as fields, and can thus be used in correlation studies with other variables. Utrecht University and the Meertens Institute have the signed permission of the "Basisadministratie voor Persoonsgegevens en Reisdocumenten, The Hague" to use the data for scientific purposes. The migration maps present the data in aggregated form and do not violate privacy requirements (no individual can be identified from the maps). However, the underlying data, which contain information about individual persons and their family relations, cannot be made available for reasons of privacy.
    Bloothooft, G, Onland, D and Kunst, J.P. 2017. Mapping Migration across Generations. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 351–360. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.29. License: CC-BY 4.0
    Ekamper, P. and Bloothooft, G. (2013), "Weg van je wortels. De afstand tussen overgrootouders en achterkleinkinderen", DEMOS 29, 2, p. 8.
  • COAVA: Cognition, Acquisition and Variation Tool

    In COAVA two sets of databases are made available in a standardized way: one with historical dialect data (the databases WBD and WLD, with lexical data of the Brabantish and Limburgian dialects between 1880 and 1980) and one with first language acquisition data (four databases from the CHILDES project). The databases contain linguistic information (dialect form, standardised (“Dutchified”) form, lexical meaning), geographical information (locality, dialect area, province) and information on the source (inquiry forms or monotopic dictionaries, and the date of documentation). Visualising the linguistic and geographical information yields lexical maps. Users will typically reach the data through the browsable concept taxonomy; in other words, the databases can be approached via search tools as well as via a thematic taxonomy. This taxonomy was developed for the dialect databases and covers the general vocabulary. COAVA (COgnition, Acquisition and VAriation Tool) brings together two strange bedfellows: first language acquisition and historical dialectology. Historical linguistics commonly assumes that language change in the past is due to non-target-like transmission of linguistic features between generations, i.e. between parents and children. Despite this assumption, the two disciplines remain isolated from each other owing to, among other things, different methods of data collection and different types of resources with empirical data. The aim of the COAVA project was to demonstrate that this common assumption can be examined in detail with the help of Digital Humanities. This interdisciplinary research aims to develop a tool for easily exploring the linguistic characteristics of concepts.
    Leonie Cornips, Jos Swanenberg, Wilbert Heeringa, Folkert de Vriend (2016). The relationship between first language acquisition and dialect variation: Linking resources from distinct disciplines in a CLARIN-NL project. Lingua, Vol. 178, 07.2016, p. 32-45. doi:10.1016/j.lingua.2015.11.007
    Cornips, L., Swanenberg, J., Vriend, F. de, Heeringa, W. (2012), Is what we have acquired early, less vulnerable to variation? A comparison between data from dialectlexicography and data from first language acquisition. http://www.meertens.knaw.nl/coavasite/wp-content/uploads/2012/10/Abstract-SIDG-2-JS.pdf
    Cornips, L., Kemps-Snijders, M., Snijders, M., Swanenberg, J. and Vriend, F. de (2011). Bridging the Gap between First Language Acquisition and Historical Dialectology with the Help of Digital Humanities. SDH Copenhagen. http://www.meertens.knaw.nl/coavasite/wp-content/uploads/2011/11/Paper-SDH.pdf
  • INPOLDER: Integrated Parser and Lemmatizer Dutch in Retrospect

    INPOLDER (Integrated Parser and Lemmatizer of Dutch in Retrospect) provides a tool that performs morphological tagging, lemmatization and syntactic parsing of historical Dutch texts. It is built on the Adelheid tool (tagging and lemmatization) and the Collins-Bikel statistical parser. The Dutch historical record is an essential part of Dutch cultural heritage, so it is vital that it be made accessible for research into a wide range of historical and linguistic questions. In the transition from the Middle Ages to the modern era, the Netherlands developed from an area with a diverse group of dialects (Hollandic, Brabantic, Flemish, North-eastern, Limburgian) into a country with a standard language, and there is good reason to believe that this process was an extremely dynamic one. Systematic research into these processes, which affect syntax, phonology, morphology and spelling, cannot be done without access to lemmatized, tagged and parsed corpora of historical Dutch. In recent years, a tagger-lemmatizer has been developed by Hans van Halteren (Adelheid, also available in the CLARIN infrastructure). INPOLDER complements this enrichment tool with a parser for historical Dutch. The INPOLDER parser is trained on a subset of the corpus of fourteenth-century texts (Corpus van Reenen/Mulder, CRM; van Reenen and Mulder, 1993; Rem, 2003) and a subset of the Drenthe corpus (DC). CRM consists of 2700 charters from 345 places of origin. The corpus was designed to be representative of local language use in Middle Dutch and to be suitable for all types of linguistic research.
  • TTNWW - TST Tools for the Dutch Language as Web services in a Workflow

    TTNWW integrates and makes available existing Language Technology (LT) software components for the Dutch language that have been developed in the STEVIN and CGN projects. The LT components (for text and speech) are made available as web services in a simplified workflow system that enables researchers without much technical background to use standard LT workflow recipes. The web services are available in two separate domains: "Text" and "Speech" processing.
    For "Text", TTNWW offers workflows for the following functionality:
    - Orthographic Normalisation using TICCLops (version CLARIN-NL 1.0);
    - Part of Speech Tagging, Lemmatisation, Chunking, limited Multiword Unit Recognition, and Grammatical Relation Assignment by Frog (version 012.012);
    - Syntactic Parsing (including grammatical relation assignment, limited named entity recognition, and limited multiword unit recognition) by the Alpino Parser (version 1.3);
    - Semantic Annotation;
    - Named Entity Recognition;
    - Co-reference Assignment.
    For "Speech", the following workflows are offered:
    - Automatic Transcription of speech files using a Netherlands Dutch acoustic model;
    - Automatic Transcription of speech files using a Flemish Dutch acoustic model;
    - Conversion of the input speech file to the required sampling rate, followed by automatic transcription.
    The TTNWW services have been created in a Dutch and Flemish collaboration project building on the results of past Dutch and Flemish projects. The web services are deployed partly in the SURF-SARA BiG-Grid cloud and partly at CLARIN centres in the Netherlands and at CLARIN VL university partners. The architecture of the TTNWW portal consists of several components and follows the principles of Service Oriented Architecture (SOA). The TTNWW GUI front-end is a Flex module that communicates with the TTNWW web application, which keeps track of the different sessions and knows which LT recipes are available. TTNWW communicates assignments (workflow specifications) to the WorkflowService, which evaluates the requested workflow and requests the DeploymentService to start the required LT web services. After initialization of the LT web services, the workflow specification is sent to the Taverna Server, which takes further care of the workflow.
    To facilitate the process of wrapping applications that were originally designed as standalone applications into web services, the CLAM (Computational Linguistics Application Mediator) wrapper software allows for easy and transparent transformation of applications into RESTful web services. The CLAM software has been used extensively in the TTNWW project for both text and speech processing tools. With the exception of Alpino and MBSRL, all web services operate on CLAM wrappers. Given the number of web services involved in the TTNWW project and the possibilities offered by the cloud environment, the preferred method of delivering the web service installations was delivery of complete virtual machine images by the LT providers. These could be uploaded directly into the cloud environment, thus relieving the CLARIN centres and LT providers of the originally foreseen task of running the web services themselves. A potential advantage of this method, which has not yet been exploited in the project, is that these images may also be delivered directly to end users, who can run them in a local configuration using virtualization software such as VMware or VirtualBox.
    The workflow engine used in the project was Taverna. Built on top of it is a number of selectable task recipes, following a task-oriented approach in line with the premise that users with little or no technical expertise should be able to use the system. In this context, tasks are understood in terms of the end results of processes such as semantic role labelling, PoS tagging or syntactic analysis, and ready-made workflows are constructed that the end user can readily use.
    Kemps-Snijders, M, Schuurman, I, Daelemans, W, Demuynck, K, Desplanques, B, Hoste, V, Huijbregts, M, Martens, J-P, Paulussen, H, Pelemans, J, Reynaert, M, Vandeghinste, V, van den Bosch, A, van den Heuvel, H, van Gompel, M, van Noord, G and Wambacq, P. 2017. TTNWW to the Rescue: No Need to Know How to Handle Tools and Resources. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 83–93. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.7. License: CC-BY 4.0
  • FESLI: Functional elements in Specific Language Impairment

    FESLI is a tool for the quantitative and qualitative comparison of the acquisition of functional elements (morphological inflection, articles, pronouns, etc.) in a corpus with data from monolingual and bilingual (Dutch-Turkish) children with and without Specific Language Impairment (SLI). The FESLI data come from two NWO-sponsored projects: BiSLI and Variflex. The numbers of children included in the resources are:
    - 12 bilingual children without SLI;
    - 25 monolingual children with SLI;
    - 20 bilingual children with SLI.
    The children's ages ranged from 6;0 to 8;5. For more precise information about the age distribution in each group, the reader is referred to the dissertation written by Antje Orgassa (http://dare.uva.nl/document/147433). The non-impaired children were included in the Variflex project (data collected by Elma Blom) and were also used in the BiSLI project; the data from the children with SLI were exclusive to the BiSLI project. The technology used in the FESLI web application is based on modules of the COAVA web application.
  • VK: Verrijkt Koninkrijk (Enriched Kingdom)

    Dr Loe de Jong’s Het Koninkrijk der Nederlanden in de Tweede Wereldoorlog remains the most appealing history of German-occupied Dutch society (1940-1945). Published between 1969 and 1991, the 14 volumes, consisting of 30 parts and 18,000 pages, combine the qualities of an authoritative work for a general audience and an unavoidable point of reference for scholars. In VK this corpus is enriched with:
    - Tokenization, sentence splitting, part-of-speech tagging and lemmatization, done with the FROG software from Tilburg University;
    - Named entity recognition, done using UvA's NE tagger, which was specially trained for Dutch within the Stevin DuoMan project;
    - Polarity tagging (positive/negative connotation of words), done using UvA's FietsTas software, developed for Dutch within the Stevin DuoMan project;
    - Named entity reconciliation by linking to Wikipedia, done using software developed by Edgar Meij (UvA).
    REST web interface, HTTP GET
    De Boer, V., J. van Doornik, L. Buitinck, K. Ribbens, and T. Veken. Enriched Access to a Large War Historical Text using the Back of the Book Index. Extended abstract presented at the Workshop on Semantic Web and Information Extraction (SWAIE 2012), Galway, Ireland, 9 October 2012.
    L. Buitinck and M. Marx. Two-Stage Named-Entity Recognition Using Averaged Perceptrons. In: Proceedings of NLDB, Groningen, Netherlands, 2012. http://link.springer.com/chapter/10.1007%2F978-3-642-31178-9_17
  • MIMORE: Microcomparative Morphosyntax Research Tool

    With the MIMORE search engine one can search three databases together, using text strings, part-of-speech tags and syntactic variables. The researcher can combine categories and features into complex tags or use predefined tags. All categories and features are defined in ISOcat. Since all sentences have a location code, the morphosyntactic phenomena found in a set of sentences resulting from a search can be plotted automatically on a geographic map. It is possible to include more than one morphosyntactic phenomenon in one map, thus visualizing potential correlations between these phenomena. There is also a user-friendly function to export the data to a statistical program.
    The data in DynaSAND, the dynamic syntactic atlas of the Dutch dialects (http://www.meertens.knaw.nl/sand/), were collected between 2000 and 2005 through oral interviews (fieldwork and telephone) in about 300 locations across the Netherlands, Belgium and a small part of north-west France. Dialect speakers were asked to judge and/or translate some 150 test sentences. DynaSAND makes available the full recordings and transcriptions of these interviews. Together, the DynaSAND data cover the syntactic variation in the Dutch language area in the left periphery of the clause (the complementizer system and complementizer agreement), variation in subject pronoun form depending on syntactic position, subject pronoun doubling, cliticization on YES/NO, the reflexive system, fronting constructions (Wh-clauses, relative clauses, topicalization), word order and morphological variation in verb clusters, negation and quantification.
    The data in DiDDD (Diversity in Dutch DP Design; http://www.meertens.knaw.nl/diddd/) were collected between 2005 and 2009 through oral and written interviews in about 200 locations in the Dutch language area, with a methodology closely parallel to that of DynaSAND. The data involve translations of and judgements on test sentences. For 29 interviews there are sound recordings that have been aligned with their transcriptions. The DiDDD data cover the morphosyntactic variation within nominal groups, in particular possessives, partitives, noun ellipsis, the demonstrative system, the numeral modification system, what-for constructions, quantitative er, adjectival inflection, negation and exclamatives.
    The data in GTRP (Goeman, Taeldeman, van Reenen Project; http://www.meertens.knaw.nl/mand/database/) were collected between 1979 and 2000 through oral interviews in about 600 locations in the Dutch language area. Informants were asked to translate words or short sentences. Part of the transcriptions has been aligned with the sound recordings. The morphological data in GTRP include plural forms of nouns, diminutives, gender on nouns and adjectives, comparatives, superlatives, verbal inflection including participles, and subject, object and possessive pronouns.
    S. Barbiers, M. van Koppen, H. Bennis, N. Corver, MIcrocomparative MOrphosyntactic REsearch (MIMORE): Mapping partial grammars of Flemish, Brabantish and Dutch. Lingua Vol. 178, 5-31. doi:10.1016/j.lingua.2015.10.018
  • Taalportaal, the linguistics of Dutch, Frisian and Afrikaans online.

    Taalportaal (or Language Portal) is an interactive knowledge base about Dutch, Frisian and Afrikaans. It provides access to a comprehensive and authoritative scientific grammar for these three languages.
    van der Wouden, T, Bouma, G, van de Camp, M, van Koppen, M, Landsbergen, F and Odijk, J. 2017. Enriching a Scientific Grammar with Links to Linguistic Resources: The Taalportaal. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 299–310. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.24. License: CC-BY 4.0
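
The MIGMAP entry above notes that each map is made available as a .csv record with municipality number and percentage as fields, for use in correlation studies with other variables. The following is a minimal sketch of such a study, not MIGMAP's own tooling. The column names `municipality` and `percentage`, the helper names, and the sample values are all assumptions made for illustration; the header of a real export may differ.

```python
import csv
import io
import math


def read_map(text, key="municipality", value="percentage"):
    """Parse a MIGMAP-style CSV export into {municipality number: percentage}.

    The column names are assumptions; adjust them to the actual header.
    """
    reader = csv.DictReader(io.StringIO(text))
    return {int(row[key]): float(row[value]) for row in reader}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlate(map_a, map_b):
    """Correlate two maps over the municipalities present in both."""
    shared = sorted(set(map_a) & set(map_b))
    return pearson([map_a[m] for m in shared], [map_b[m] for m in shared])


# Made-up example data, not real MIGMAP output.
migration_csv = "municipality,percentage\n1,10.0\n2,20.0\n3,30.0\n4,5.0\n"
other_csv = "municipality,percentage\n1,1.0\n2,2.0\n3,3.0\n5,9.0\n"

r = correlate(read_map(migration_csv), read_map(other_csv))
print(round(r, 3))
```

Only municipalities that occur in both files enter the calculation, so maps with different coverage can still be compared. The second file stands in for any other per-municipality variable.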