Search results

703 record(s) found

  • Namescape Search

    Searching and visualizing named entities in modern Dutch novels. The named entity (NE) tagging and resolution in NameScape enables quantitative and repeatable research where previously only guesswork and anecdotal evidence were feasible. The visualisation module enables researchers with a less technical background to draw conclusions about the functions of names in literary work, and helps them explore the material in search of more interesting questions (and answers). Users from other communities (sociolinguistics, sentiment analysis, …) also benefit from the NE-tagged data, especially since the NE recognizer is available as a web service, enabling researchers to annotate their own research data.

    Datasets in NameScape (1,129 books in total):
    - Corpus Sanders: 582 Dutch novels written and published between 1970 and 2009.
    - Corpus Huygens: 22 novels manually tagged with detailed named entity information. IPR for this corpus does not allow distribution.
    - Corpus eBooks: 7,000+ Dutch eBooks tagged automatically with basic NER features and person name part information. IPR for this corpus does not allow distribution.
    - Corpus SoNaR Books: 105 NE-tagged Dutch books.
    - Corpus Gutenberg Dutch: 530 NE-tagged TEI files converted from the EPUB versions of the corresponding Gutenberg documents.

    Recent research has shown that names in literary works can only be put fully into perspective when studied in a wider context (landscape) of names, either in the same text or in related material (the onymic landscape or “namescape”). Research on large corpora is needed to gain a better understanding of, for example, what is characteristic of a certain period, genre, author or cultural region. The data necessary for research on this scale simply did not exist yet. NameScape aims to fill this need by providing a substantial number of literary works annotated with a rich tag set, enabling researchers to perform their research in more depth than previously possible. Several exploratory visualization tools help the scholar answer old questions and uncover many new ones, which can be addressed using the demonstrator.
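    NE-tagged text pairs each token with an entity label; as a minimal illustration of how such annotations can be consumed (the inline `word/TAG` format and the helper function below are hypothetical sketches, not NameScape's actual output format), consecutive tokens sharing a label can be grouped into entities:

```python
def extract_entities(tagged: str):
    """Group consecutive tokens with the same non-O tag into entities.

    Assumes a hypothetical inline format like "Maardam/LOC ligt/O ...",
    where "O" marks tokens outside any named entity.
    """
    entities = []
    current, label = [], None
    for token in tagged.split():
        word, _, tag = token.rpartition("/")
        if tag != "O":
            if tag == label:
                current.append(word)          # extend the running entity
            else:
                if current:
                    entities.append((" ".join(current), label))
                current, label = [word], tag  # start a new entity
        else:
            if current:
                entities.append((" ".join(current), label))
            current, label = [], None
    if current:
        entities.append((" ".join(current), label))
    return entities
```

    Such a pass over tagged tokens is what makes counts per entity type, or per-novel name inventories, straightforward to compute.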
    de Does, J, Depuydt, K, van Dalen-Oskam, K and Marx, M. 2017. Namescape: Named Entity Recognition from a Literary Perspective. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 361–370. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.30. License: CC-BY 4.0
    Karina van Dalen-Oskam (2013), Nordic Noir: a background check on Inspector Van Veeteren, 31 May 2012, http://blog.namescape.nl/?p=47
  • Frog: An advanced Natural Language Processing suite for Dutch

    Frog's current version tokenizes, tags, lemmatizes, and morphologically segments word tokens in Dutch text files, assigns a dependency graph to each sentence, identifies the base phrase chunks in the sentence, and attempts to find and label all named entities.
    Van den Bosch, A., Busser, G.J., Daelemans, W., and Canisius, S. (2007). An efficient memory-based morphosyntactic tagger and parser for Dutch, In F. van Eynde, P. Dirix, I. Schuurman, and V. Vandeghinste (Eds.), Selected Papers of the 17th Computational Linguistics in the Netherlands Meeting, Leuven, Belgium, pp. 99-114
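    Frog's tab-separated columned output lends itself to simple downstream processing. A minimal sketch (the assumed column order of index, token, lemma, morphological segmentation, POS tag is an assumption about the format, and the helper is illustrative):

```python
def parse_frog_columns(output: str):
    """Parse tab-separated tagger output into per-token dicts.

    Assumed columns: index, token, lemma, morphology, POS tag.
    Blank lines are treated as sentence boundaries and skipped here.
    """
    tokens = []
    for line in output.strip().splitlines():
        if not line.strip():
            continue
        cols = line.split("\t")
        tokens.append({"token": cols[1], "lemma": cols[2], "pos": cols[4]})
    return tokens
```

    With the output parsed into dicts, lemma frequency counts or POS-filtered token lists are one comprehension away.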
  • Alpino: a dependency parser for Dutch

    Alpino is a dependency parser for Dutch, developed in the context of the PIONIER Project Algorithms for Linguistic Processing.
    Bouma, G., van Noord, G. J. M. and Malouf, R. 2001. Alpino: Wide-coverage computational analysis of Dutch. In: Daelemans, W., Simaan, K., Veenstra, J. and Zavrel, J. (eds.), Computational Linguistics in the Netherlands 2000. Amsterdam: Rodopi, pp. 45-59. (Language and Computers: Studies in Practical Linguistics)
    Robert Malouf and Gertjan van Noord. Wide Coverage Parsing with Stochastic Attribute Value Grammars. In: IJCNLP-04 Workshop Beyond Shallow Analyses - Formalisms and statistical modeling for deep analyses.
    Leonoor van der Beek, Gosse Bouma, and Gertjan van Noord. Een brede computationele grammatica voor het Nederlands. 2002. Nederlandse Taalkunde. https://www.let.rug.nl/~vannoord/papers/taalkunde.pdf .
  • Arthurian Fiction

    This research tool provides information on medieval Arthurian narratives and the manuscripts in which they were transmitted throughout Europe. The tool discloses a database consisting of linked records on over two hundred texts, more than a thousand manuscripts and two hundred persons. The database is a work in progress: a considerable number of records have yet to be completed, while fresh discoveries of narratives and manuscripts invite new entries. The compilers of the database hope that this tool will contribute to further research into Arthurian fiction as a pan-European phenomenon.

    The Arthurian Fiction web application enables searching for manuscripts, narratives and persons from the Arthurian Fiction narratives and manuscripts metadata database Arthurian Fiction Data. Each of these object types can be searched for using facets specific to the object type:
    - for manuscripts: institute, date, origin, physical form, extant leaves, leaf sizes, illustration type, scripts, scribe, patron and several more;
    - for narratives: date, origin, languages, cycle, manuscript, author, patron, verse type, meter, length, intertextuality properties and many more;
    - for persons: name, gender, subtype, background, manuscript, and narratives.

    The user can, if desired, select a subset of the facets to work with. In addition, keyword search is possible for all fields, query results can be sorted by a variety of keys, and queries can be saved. There is also a web service with an API for the Arthurian Fiction narratives and manuscripts database. This web service makes use of SOLR queries via HTTP POST requests.
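    A faceted Solr request of the kind the web service accepts can be assembled as standard query parameters. The sketch below uses generic Solr parameters (`q`, `fq`, `facet.field`); the field names and object types are illustrative, not the actual Arthurian Fiction schema:

```python
from urllib.parse import urlencode

def build_solr_facet_query(object_type, facets, keyword="*:*", rows=10):
    """Assemble Solr parameters for a faceted search.

    `facets` maps facet field -> selected value (None = facet requested
    but no value selected yet). Field names here are assumptions.
    """
    params = [
        ("q", keyword),                        # keyword search over all fields
        ("fq", f"object_type:{object_type}"),  # restrict to one object type
        ("rows", str(rows)),
        ("facet", "true"),
    ]
    for field, value in facets.items():
        params.append(("facet.field", field))  # ask Solr for facet counts
        if value is not None:
            params.append(("fq", f'{field}:"{value}"'))  # drill down
    return urlencode(params)
```

    The resulting string would form the body of an HTTP POST to the service's Solr endpoint; selecting a facet value simply adds another `fq` filter.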
    Besamusca, A.A.M. and Quinlan, J. (2012). The Fringes of Arthurian Fiction. Arthurian literature, 29, 191-241.
    Boot, P. (2012), Manuscripten koning Arthur op tafel, E-Data & Research 7(1), 2012.
    Dalen-Oskam, K. van and Besamusca, B. (2011), Arthurian Fiction in Medieval Europe: Narratives and Manuscripts, presentation held at the CLARIN-NL Kick-off meeting Call 2, Utrecht, February 9, 2011.
    Dalen-Oskam, K. van (2011), ArthurianFiction, presentation held at the Call 3 information session, Utrecht, August 25, 2011.
  • Cornetto: Combinatorial and Relational Network as Toolkit for Dutch Language Technology

    Cornetto is a lexical resource for the Dutch language which combines two resources with different semantic organisations: the Dutch Wordnet, with its synset organisation, and the Dutch Reference Lexicon, which includes definitions, usage constraints, selectional restrictions, syntactic behaviours, illustrative contexts, etc. The Cornetto database contains over 92K lemmas and almost 120K word meanings, covering the most generic and central part of the language. Cornetto combines the structures of the Princeton Wordnet, some of the features of FrameNet for English, and the information on morphological, syntactic, semantic and combinatorial features of lexemes normally found in dictionaries. The resource was compiled by combining and aligning two existing semantic resources for Dutch: the Dutch wordnet (DWN) and the Referentie Bestand Nederlands (RBN). More recently, the resource was revised and extended with sentiment values in the From Text to Political Positions project, and with semantic annotations in SONAR, CGN and texts from the Web in the DutchSemCor project.

    The Cornetto Lexical Resource consists of two large repositories of lexicon data: the lexical entry repository and the synset repository. A Lexical Entry (LE) is a word-meaning pair (i.e. a single meaning of a certain word form), for which morphological, syntactic, semantic and combinatorial information is given. As such, LEs are word senses in the lexical semantic tradition, containing the linguistic knowledge that is needed to properly use the word in a specific meaning in a language. Since the LEs follow a word-to-meaning view, the semantic and combinatorial information for each meaning clarifies the differences across the meanings. LEs focus on the polysemy of words and typically represent condensed and generalised meanings from which more specific ones can be derived.

    Each LE is aligned with a synset (set of synonyms) in the synset repository. A synset can thus be seen as a set of LEs with the same meaning, and every synset stands for a concept. The synsets in Cornetto are interconnected by different semantic relations such as hyponymy, antonymy and meronymy. The Cornetto resource is aligned with the English Wordnet, from which domain information was imported. The domains represent clusters of concepts that are related by a shared area of interest, such as sport, education or politics. The definitions of LEs from the same synset should be semantically equivalent, and the LEs of a single word form should belong to different synsets. The LEs of a single word form typically differ in terms of connotation, pragmatics, syntax and semantics, whereas synonymous words in the same synset can be differentiated along connotation, pragmatics and syntax but not semantics. This structure of the resource makes it possible to combine the very detailed information on the form and usage of a specific LE, or a group of LEs, with the semantic relations specified in the corresponding synset(s). For an open-source lexical semantic database for Dutch, see the Open Source Dutch Wordnet (ODWN): http://wordpress.let.vupr.nl/odwn/
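    The two-repository design (LEs aligned to synsets) can be modelled in a few lines; the field names and lookup helper below are illustrative assumptions, not Cornetto's actual database schema:

```python
from dataclasses import dataclass, field

@dataclass
class LexicalEntry:
    form: str        # word form
    sense_id: str    # identifies this word-meaning pair (one LE)
    synset_id: str   # alignment into the synset repository
    pos: str

@dataclass
class Synset:
    synset_id: str
    members: list = field(default_factory=list)    # sense_ids of synonymous LEs
    relations: dict = field(default_factory=dict)  # e.g. {"hyponym_of": [...]}

def synonyms_of(entry, synsets, lexicon):
    """All other word forms whose LE sits in the same synset."""
    synset = synsets[entry.synset_id]
    return sorted({lexicon[sid].form for sid in synset.members
                   if sid != entry.sense_id})
```

    Because each LE carries its own sense identifier, two senses of the same word form land in different synsets, exactly as the description above requires.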
    Vossen, P., I. Maks, R. Segers, H. van der Vliet, M.F. Moens, K. Hofmann, E. Tjong Kim Sang, M. de Rijke (2013), Cornetto: a lexical semantic database for Dutch, Chapter in: P. Spyns and J. Odijk (eds): Essential Speech and Language Technology for Dutch, Results by the STEVIN-programme, Publ. Springer series Theory and Applications of Natural Language Processing, ISBN 978-3-642-30909-0.
    Vossen, P., I. Maks, R. Seegers and H. van der Vliet (2008). Integrating Lexical Units, Synsets, and Ontology in the Cornetto Database. In Proceedings of LREC-2008, Marrakech, Morocco.
  • BNM-I: Linked Data on Middle Dutch Sources Kept Worldwide

    Web application for consultation, using faceted search, and collaborative editing of the curated e-BNM collection of textual, codicological and historical information about thousands of Middle Dutch manuscripts kept worldwide. The Bibliotheca Neerlandica Manuscripta and Impressa collects and makes available information on medieval manuscripts produced in the Netherlands, regardless of where they are kept. Documentation activities concentrate on the Middle Dutch texts and their authors that have been transmitted in these manuscripts, on the individuals and institutions that have been involved in manuscript production (scribes, illuminators, monasteries), and on the former and present manuscript owners. Since 1991 two-thirds of this ‘paper’ information, checked and supplemented with information from recent publications, has been converted into electronic data and incorporated in a database (BNM-I), which can be searched online. In 2013 this database was converted in the e-BNM+ project into a flexible data structure that turned BNM-I into a key open access resource to which many other resources can easily be linked. The new BNM-I:
    - will be freely accessible for every user, anywhere in the world;
    - can easily incorporate new contributions or corrections by scientists;
    - can easily be linked to related databases; in the near future cross-searching several databases in one interface will be possible;
    - will be prepared for the inclusion of new data, such as research data on Middle Dutch texts that were printed before 1541 and the books in which they are preserved, and articles on Middle Dutch texts and their authors (associated with the current thesaurised information).
  • TTNWW - TST Tools for the Dutch Language as Web services in a Workflow

    TTNWW integrates and makes available existing Language Technology (LT) software components for the Dutch language that were developed in the STEVIN and CGN projects. The LT components (for text and speech) are made available as web services in a simplified workflow system that enables researchers without much technical background to use standard LT workflow recipes. The web services are available in two separate domains: "Text" and "Speech" processing.

    For "Text", TTNWW offers workflows for the following functionality:
    - Orthographic Normalisation using TICCLops (version CLARIN-NL 1.0);
    - Part-of-Speech Tagging, Lemmatisation, Chunking, limited Multiword Unit Recognition, and Grammatical Relation Assignment by Frog (version 012.012);
    - Syntactic Parsing (including grammatical relation assignment, limited named entity recognition, and limited multiword unit recognition) by the Alpino Parser (version 1.3);
    - Semantic Annotation;
    - Named Entity Recognition;
    - Co-reference Assignment.

    For "Speech", the following workflows are offered:
    - Automatic Transcription of speech files using a Netherlands Dutch acoustic model;
    - Automatic Transcription of speech files using a Flemish Dutch acoustic model;
    - Conversion of the input speech file to the required sampling rate, followed by automatic transcription.

    The TTNWW services have been created in a Dutch and Flemish collaboration project building on the results of past Dutch and Flemish projects. The web services are partly deployed in the SURF-SARA BiG-Grid cloud or at CLARIN centres in the Netherlands and at CLARIN VL university partners.

    The architecture of the TTNWW portal consists of several components and follows the principles of Service Oriented Architecture (SOA). The TTNWW GUI front-end is a Flex module that communicates with the TTNWW web application, which keeps track of the different sessions and knows which LT recipes are available. TTNWW communicates assignments (workflow specifications) to the WorkflowService, which evaluates the requested workflow and asks the DeploymentService to start the required LT web services. After initialization of the LT web services, the workflow specification is sent to the Taverna Server, which takes further care of the workflow.

    To facilitate the process of wrapping applications that were originally designed as standalone applications into web services, the CLAM (Computational Linguistics Application Mediator) wrapper software allows for easy and transparent transformation of applications into RESTful web services. CLAM has been used extensively in the TTNWW project for both text and speech processing tools; with the exception of Alpino and MBSRL, all web services operate via CLAM wrappers.

    Given the number of web services involved in the TTNWW project and the possibilities offered by the cloud environment, the preferred method of delivering the web service installations was delivery of complete virtual machine images by the LT providers. These could be uploaded directly into the cloud environment, relieving the CLARIN centres and LT providers of the originally foreseen task of running the web services themselves. A potential advantage of this method, not yet exploited in the project, is that these images may also be delivered directly to the end user, so they can be run in a local configuration using virtualization software such as VMware or VirtualBox.

    The workflow engine used in the project was Taverna, but built on top of it was a number of selectable task recipes, following a task-oriented approach in line with the premise that users with little or no technical expertise should be able to use the system. In this context, tasks are understood in terms of the end results of processes such as semantic role labelling, POS tagging or syntactic analysis, and ready-made workflows are constructed that can be readily used by the end user.
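    A RESTful wrapper of the kind CLAM provides is driven by a short sequence of HTTP requests per run. The sketch below only enumerates that sequence; the base URL, project name and exact paths are illustrative, not CLAM's documented interface:

```python
def clam_style_workflow(base_url, project, input_file):
    """Return the (method, URL) sequence for one run against a
    CLAM-style RESTful wrapper. Paths are illustrative assumptions."""
    return [
        ("PUT",  f"{base_url}/{project}"),                     # create a project
        ("POST", f"{base_url}/{project}/input/{input_file}"),  # upload the input
        ("POST", f"{base_url}/{project}"),                     # start the run
        ("GET",  f"{base_url}/{project}"),                     # poll for status
        ("GET",  f"{base_url}/{project}/output/"),             # fetch the results
    ]
```

    Because every wrapped tool exposes the same request shape, a workflow engine like Taverna can chain heterogeneous LT services without tool-specific glue code.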
    Kemps-Snijders, M, Schuurman, I, Daelemans, W, Demuynck, K, Desplanques, B, Hoste, V, Huijbregts, M, Martens, J-P, Paulussen, H, Pelemans, J, Reynaert, M, Vandeghinste, V, van den Bosch, A, van den Heuvel, H, van Gompel, M, van Noord, G and Wambacq, P. 2017. TTNWW to the Rescue: No Need to Know How to Handle Tools and Resources. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 83–93. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.7. License: CC-BY 4.0
  • ePistolarium: A Web-based Humanities’ Collaboratory on Correspondences

    Circulation of Knowledge and Learned Practices in the 17th-century Dutch Republic (CKCC) investigates the circulation of knowledge in the 17th-century Dutch Republic. A multi-disciplinary project team of historians, literature researchers, linguists and computer scientists works together in this project and created a web-based Humanities’ Collaboratory on Correspondences. The project is carried out thanks to an NWO Medium investment subsidy and with CLARIN subsidies to make the resources available within the CLARIN domain. A consortium of Dutch universities and cultural heritage institutions is building a web-based collaboratory (an online space for asynchronous collaboration) around a corpus of 20,000 letters of scholars who lived in the 17th-century Dutch Republic, to answer the research question: how did knowledge circulate in the 17th century? To this end, it is necessary to analyze this large amount of correspondence systematically. Based on this (extendable) corpus, a content processing workflow is implemented that consists of iterative cycles of conceptual analysis, enrichment with several layers of annotation, and visualization.

    With advice from CLARIN-EU, in the first stage of the project a demonstrator was developed which implements techniques of keyword extraction. The second stage consists of evaluating existing, more complex tools and techniques that can tackle one or more aspects of the targeted grammatical, content-related, and network complexity analysis, annotation, and visualization. This phase shall identify a set of tools that can be readily utilized in CKCC, as well as tools that need to be adapted or extended to the needs of CKCC; in short, by the end of this phase resources, requirements and risks shall become clear (deadline: December 2010). In the third stage the collaboratory is further developed according to the description in the CKCC project goals, centering around the technique of concept extraction. These three stages constitute the Work Package Analysis Tools, the core of the CKCC project, which was supported by CLARIN-NL. Other Work Packages provide the data and software tools needed to create a complete system: the digital corpus of letters (WP6), the editing collaboratory that will contain the letters (WP1), and the archiving environment for data and software (WP2).
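    To illustrate the idea behind the first-stage demonstrator: keyword extraction at its simplest ranks content words by frequency after filtering stopwords. The toy function below is only an illustration of that idea, not the demonstrator's actual method:

```python
from collections import Counter

def keywords(text, stopwords, top_n=3):
    """Toy frequency-based keyword extraction: lowercase, strip
    punctuation, drop stopwords, return the top_n most frequent words."""
    words = [w.lower().strip(".,;:!?") for w in text.split()]
    counts = Counter(w for w in words if w and w not in stopwords)
    return [w for w, _ in counts.most_common(top_n)]
```

    Real pipelines weight frequencies against a background corpus (e.g. TF-IDF) so that common letter-writing formulas do not crowd out topical terms.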
    Ravenek, W, van den Heuvel, C and Gerritsen, G. 2017. The ePistolarium: Origins and Techniques. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 317–323. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.26. License: CC-BY 4.0