Result filters

Metadata provider

Language

Resource type

Availability

Loading...
703 record(s) found

Search results

  • MUSCLE

    MUSCLE (MUSic-related Citizen Science Listening Experiments) is an infrastructure for reusable online music experiments. It can be used by researchers from musicology, music cognition, and other audio related fields to implement experiments through a graphical user interface and Python. MUSCLE presents questions, along with text, audio or visual stimuli, to participants. The participants' response is saved to a database and can be exported to json for analysis in R or other software.
  • GaLAHaD

    GaLAHaD (Generating Linguistic Annotations for Historical Dutch) allows linguists to compare taggers, tag their own corpora, evaluate the results and export their tagged documents.
  • GrNe: Greek-Dutch dictionary

    Online dictionary (ancient) Greek - Dutch for the letter Pi. Search functions include searches for Greek lemmata, search of Greek declined or conjugated word-forms that lead to the correct lemma ('lemmatizer'), searches for Dutch words leading to different Greek lemmata, and etymological searches. The dictionary is linked to Logeion, the international website of Greek dictionaries at the University of Chicago. The developers estimate that a complete version of the dictionary will be finished by the end of 2015 and that it will be published by the end of 2016. A new dictionary ancient Greek – Dutch is currently under construction at Leiden University. The dictionary is being financed through the 2010 Spinoza award of project director Ineke Sluiter. CLARIN funding enabled the digital production of the letter Pi. Currently, the letters beta, gamma, zeta, pi and sigma are available online. The developers estimate that a complete first version of the dictionary will be finished by the end of 2015 and that it will be published by the end of 2016. The corpus that is being covered by this dictionary covers Greek literature from its beginnings (Homer) and consists of ca. 3.680.000 words (tokens); it includes all classical authors from the 5th and 4th centuries BCE, and a selection of later Greek (selection based on the likelihood that the text will be used by our target groups), but all of the New Testament, Lucian and Plutarch. The dictionary will eventually contain ca. 52.500 headwords. It is based on a thorough comparison of state of the art dictionaries, supplemented with the help of the material from the Thesaurus Linguae Graecae. Greek morphology is complicated. In order to use a dictionary effectively, a rather high level of initial language competence is necessary for the user to be able to relate the word-form s/he finds in a text to the correct basic lemma form, where the definition of the word can be found. This digital dictionary however has an added ‘lemmatizer’ function, which enables the user to type in the word as found in the text and to be redirected to the correct lemma. The digital resource enables both Greek-Dutch searches and searches for the possible Greek equivalents of Dutch terms. This also makes it possible to explore the relation of semantic fields in Dutch and Greek. E.g., it is possible to locate all Greek words that have ‘courage’ as part of their definition. Furthermore, the digital resource makes it possible to locate different Greek words with the same etymological roots. And finally, the dictionary is linked to the website of the University of Chicago, where a comparison of all Greek-x dictionaries is supported. Here, one can enter a Greek word and be provided with the equivalents and definitions in all the dictionaries that are linked on this website.
  • MIGMAP: Detailed interactive mapping of migration in The Netherlands in the 20th century.

    MIGMAP is a web application that can show migration flow between Dutch municipalities. The user first chooses generation (forward or backward in time) and gender, while subsequently the migration map of The Netherlands related to an interactively pointed municipality (or other aggregation unit) is shown. The data underlying the migration maps originate from the first name selection from the Civil Registration, acquired by Utrecht University and the Meertens Institute in 2006. These concern 16 million records from persons with Dutch citizenship, alive in 2006, and in addition 6 million persons deceased before 2006, but mentioned in other records – mainly as parents. The records include identifiers by which family relations can be reconstructed. After considerable efforts in data clearing and reconstruction of older generations, the data provide an almost complete overview of the Dutch population, born after 1930, and a fairly good sample from the period 1880-1930 (>25%). The user will be given options to choose generation (places of birth of the current population, their parents, grandparents grand-grandparents, or starting with the persons born between 1880-1900: the current places of residence of their children, grandchildren), and gender. Each map will be made available as a .csv record with municipality number and percentage as fields, and thus can be used by users in correlation studies with other variables. Utrecht University and the Meertens Institute have the signed permission of the "Basisadministratie voor Persoonsgegevens en Reisdocumenten, The Hague" to use the data for scientific purposes. The migration maps present the data in an aggregated way, and do not violate privacy requirements (no individual can possibly be identified from the maps). However, the underlying data containing information about individual persons and their family relations cannot be made available for reasons of privacy.
    Bloothooft, G, Onland, D and Kunst, J.P. 2017. Mapping Migration across Generations. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 351–360. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.29. License: CC-BY 4.0
    Ekamper, P. en Bloothooft, G. (2013), "Weg van je wortels. De afstand tussen overgrootouders en achterkleinkinderen", DEMOS 29, 2, p8.
  • TiCClops: Text-Induced Corpus Clean-up online processing system

    TICCL (Text Induced Corpus Clean-up) is a system that is designed to search a corpus for all existing variants of (potentially) all words occurring in the corpus. This corpus can be one text, or several, in one or more directories, located on one or more machines. TICCL creates word frequency lists, listing for each word type how often the word occurs in the corpus. These frequencies of the normalized word forms are the sum of the frequencies of the actual word forms found in the corpus. TICCL is a system that is intended to detect and correct typographical errors (misprints) and OCR errors (optical character recognition) in texts. When books or other texts are scanned from paper by a machine, that then turns these scans, i.e. images, into digital text files, errors occur. For instance, the letter combination `in' can be read as `m', and so the word `regeering' is incorrectly reproduced as `regeermg'. TICCL can be used to detect these errors and to suggest a correct form. Text-Induced Corpus Clean-up (TICCL) was developed first as a prototype at the request of the Koninklijke Bibliotheek - The Hague (KB) and reworked into a production tool according to KB specifications (currently at production version 2.0) mainly during the second half of 2008. It is a fully functional environment for processing possibly very large corpora in order to largely remove the undesirable lexical variation in them. It has provisions for various input and output formats, is flexible and robust and has very high recall and acceptable precision. As a spelling variation detection system it is to the developer’s knowledge unique in making principled use of the input text as possible source for target output canonical forms. As such it is far less domain-sensitive than other approaches: the domain is largely covered by the input text collection. TICCL comes in two variants: one with a classic CLAM web application interface, and one with the PhilosTEI interface.
    Reynaert, M. (2008). All, and only, the errors: More complete and consistent spelling and OCR-error correction evaluation. In: Proceedings of the Sixth International Language Resources and Evaluation (LREC’08), Marrakech, Morocco.
    Reynaert, M. (2010). Character confusion versus focus word-based correction of spelling and ocr variants in corpora. International Journal on Document Analysis and Recognition, pp 1-15, URL http://dx.doi.org/10.1007/s10032-010-0133-5
  • Usage

    The system here allows you to convert your book pages' images into editable text, presented in a particular text format called XML (eXtended Markup Language) of a particular type called Text-Encoding Initiative or TEI XML. This particular format was developed specifically for being able to mark-up or annotate the text you want to work on, i.e. to add all manner of further information to the actual text, e.g. to build a critical edition of it, which is most likely exactly what you want to do with your author's work.
    Betti, A, Reynaert, M and van den Berg, H. 2017. @PhilosTEI: Building Corpora for Philosophers. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 379–392. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.32. License: CC-BY 4.0
  • Namescape Visualizer

    Searching and visualizing Named Entities in modern Dutch novels. The named entity (NE) tagging and resolution in NameScape enables quantitative and repeatable research where previously only guesswork and anecdotal evidence was feasible. The visualisation module enables researchers with a less technical background to draw conclusions about functions of names in literary work and help them to explore the material in search of more interesting questions (and answers). Users from other communities (sociolinguistics, sentiment analysis, …) also benefit from the NE tagged data, especially since the NE recognizer is available as a web service, enabling researchers to annotate their own research data. Datasets in NameScape (total of 1.129 books): Corpus Sanders: A corpus of 582 Dutch novels written and published between 1970 and 2009 will. Corpus Huygens: Consists of 22 novels manually tagged with detailed named entity information. IPR for this corpus do not allow distribution. Corpus eBooks: Consists of 7000+ Dutch eBooks tagged automatically with basic NER features and person name Part information. IPR for this corpus do not allow distribution. Corpus SoNaR Books: 105 Dutch books; NE tagged. Corpus Gutenberg Dutch: Consists of 530 NE tagged TEI files converted from the Epub versions of the corresponding Gutenberg documents. Recent research has conclusively proven names in literary works can only be put fully into perspective when studied in a wider context (landscape) of names either in the same text or in related material (the onymic landscape or “namescape”). Research on large corpora is needed to gain a better understanding of e.g. what is characteristic for a certain period, genre, author or cultural region. The data necessary for research on this scale simply does not exist yet. NameScape aims to fill the need by providing a substantial amount of literary works annotated with a rich tag set, thereby enabling researchers to perform their research in more depth than previously possible. Several exploratory visualization tools help the scholar to answer old questions and uncover many more new ones, which can be addressed using the demonstrator.
    de Does, J, Depuydt, K, van Dalen-Oskam, K and Marx, M. 2017. Namescape: Named Entity Recognition from a Literary Perspective. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 361–370. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.30. License: CC-BY 4.0
    Karina van Dalen-Oskam (2013), Nordic Noir: a background check on Inspector Van Veeteren, 31 May 2012, http://blog.namescape.nl/?p=47