703 record(s) found

Search results

  • WAHSP/BILAND: web application for (bilingual) historical sentiment mining in public media

    WAHSP/BILAND has been succeeded by TexCavator: http://texcavator.surfsaralabs.nl/
    WAHSP/BILAND is a research tool for historians that uses textual data of news media from the period 1863-1940, held by the Koninklijke Bibliotheek and the Staatsbibliothek zu Berlin, as input material. One can search with single query terms or with combinations thereof. Apart from showing the articles that match the query, the tool can visualize the results as word clouds of single articles with sentiment words highlighted, or as a word cloud of the whole result set together with newspaper statistics derived from the metadata. WAHSP/BILAND enables historians to collect and process large bilingual (Dutch and German) sets of opinionated text data from news media and to extract discourse identity and intensity patterns in two different countries with different scripts (e.g. Latin and Gothic). This tool offers a unique opportunity for non-technical humanities researchers to perform a new kind of historical e-research for studying changing opinions, notions and perceptions regarding public health and policy issues. The text mining tools for opinion/sentiment extraction that form the technological base for WAHSP/BILAND have been developed within the NTU/STEVIN DuOMAn project. The technology includes algorithms and tools for identifying polarity (positive/support or negative/criticism), sources (opinion holders), frequency of items and specific targets of discourses. The tools and subjectivity lexicons are implemented as modules of ‘Fietstas’, a web service for text analysis. Fietstas also provides other essential text processing modules (morphological normalization, format and encoding reconciliation, named entity recognition and normalization, etc.) and visualization modules (interactive word clouds and timelines). Fietstas has been developed for, and is being used in, the processing of large-scale datasets in several projects, such as DuOMAn.
    A text translation service based on machine learning can be used to translate existing lexicons and documents between Dutch and German (in both directions). The web application uses this functionality of Fietstas to support interactive creation, expansion and refinement of lexicons specific to the user’s research questions and needs. For BILAND, new bilingual and biscriptural lexicons have been developed. The application uses the visualization features of Fietstas to allow users to examine the research domain along the dimensions of time, context, and the identity and frequency of the discourse. WAHSP/BILAND is meant to be generic and testable in all domains where analysis of topics, contexts and attitudes in large volumes of text is needed.
    Snelders, S, Huijnen, P, Verheul, J, de Rijke, M and Pieters, T. 2017. A Digital Humanities Approach to the History of Culture and Science: Drugs and Eugenics Revisited in Early 20th-Century Dutch Newspapers, Using Semantic Text Mining. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 325–336. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.27. License: CC-BY 4.0
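    The lexicon-based polarity identification and word-cloud highlighting described above can be illustrated with a minimal sketch. This is not the Fietstas implementation: the three-word lexicon and the function names `tokenize`, `polarity_counts` and `term_frequencies` are hypothetical and only serve to show the general idea.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon (assumption); a real subjectivity lexicon is far larger.
LEXICON = {
    "gevaarlijk": "negative",
    "schadelijk": "negative",
    "heilzaam": "positive",
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def polarity_counts(text, lexicon=LEXICON):
    """Count how many tokens carry each polarity label found in the lexicon."""
    return Counter(lexicon[t] for t in tokenize(text) if t in lexicon)


def term_frequencies(text, lexicon=LEXICON):
    """Return token frequencies (word-cloud weights) and the sentiment words to highlight."""
    freqs = Counter(tokenize(text))
    highlighted = {t for t in freqs if t in lexicon}
    return freqs, highlighted
```

    A word cloud would scale each term by its frequency and colour the highlighted terms by polarity; source and target identification, as mentioned in the description, would need considerably more than a lookup.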
  • DSS: Dutch Ships and Sailors

    A tool chain and methodology for converting legacy datasets in the area of maritime history. Set up to facilitate over 25 datasets, the initial population consists of 4 selected maritime-historical datasets. The maritime industry has been central to regional and global economic, social and cultural exchange. It is also one of the best historically documented sectors of human activity. Many aspects of it have been recorded by shipping companies, governments, newspapers and other institutions. In the past few decades, much of the data in the preserved historical source material has been digitized. Among the most interesting data are those on shipping movements and crew members. The Dutch Republic in the 17th and 18th centuries had to rely to a large extent on immigration to man its fleet. Especially in Asian waters, it also relied on Asian crews. Often, information about the same shipping movements and crew composition is spread over several historical sources and hence over several databases. The data often refer to the same ‘places’, ‘ships’, ‘persons’ and ‘events’. When the different available databases are linked, the data complement and amplify each other, and new research possibilities open up. Ideally, we would want to follow a ship from port to port, and crew members pursuing their careers from ship to ship. The Dutch Ships and Sailors project provides a tool chain and methodology for converting legacy datasets. The infrastructure includes common vocabularies to normalize and enrich existing data. Links are established between the datasets and to other relevant datasets. In doing so, Dutch Ships and Sailors builds a (semantic) web-based structure that aims to function as a future platform and infrastructure for maritime historical datasets.
    Initially, this portal contains the following datasets:
    - Historische Kranten of the Koninklijke Bibliotheek;
    - the Monsterrollen databases, which contain elaborate data on the crew composition of ships from the Northern Netherlands (c. 1800-1930) and provide information on the sailors involved, such as place of origin, wage and age;
    - the VOC Opvarenden databases, providing extensive data on crews of VOC ships leaving the Republic;
    - the Dutch-Asiatic Shipping database, providing data on all inter-continental voyages of VOC ships;
    - the Generale Zeemonsterrollen database, providing data on the crew composition and sometimes the location of VOC ships stationed in Asia and not engaged in inter-continental shipping.
    Victor de Boer, Jur Leinenga, Matthias van Rossum and Rik Hoekstra. Dutch Ships and Sailors Linked Data Cloud. In Proceedings of the International Semantic Web Conference (ISWC 2014), 19-23 October, Riva del Garda, Italy, 2014.
    A. Bravo Balado. Information extraction on newspaper archives for historical research: A Dutch maritime history case study. M.Sc. thesis VU University Amsterdam (forthcoming), 2014.
    Andrea Bravo Balado, Victor de Boer, and Guus Schreiber. Linking historical ship records to a newspaper archive. Proceedings of the 6th International Conference on Social Informatics (workshops). LNCS. ed. Luca Maria Aiello, Daniel McFarland, 2014.
    R. Ponstein. Reconciling Dutch ships and sailors. M.Sc. thesis VU University Amsterdam, 2014.
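    The idea of linking records that refer to the same ship across several databases can be sketched as follows. This is a deliberately simplified illustration, not the project's tool chain, which relies on shared vocabularies and linked data; the record layout (`id`, `ship`) and the function names are assumptions.

```python
import unicodedata
from collections import defaultdict


def normalize_name(name):
    """Fold case and strip accents and punctuation so spelling variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c for c in without_accents.lower() if c.isalnum())


def link_records(*datasets):
    """Group records from several datasets under a shared normalized ship name.

    Each dataset is a (dataset_name, records) pair; each record is a dict with
    hypothetical 'id' and 'ship' keys.
    """
    links = defaultdict(list)
    for dataset_name, records in datasets:
        for record in records:
            links[normalize_name(record["ship"])].append((dataset_name, record["id"]))
    # Keep only names that connect at least two different datasets.
    return {
        name: refs
        for name, refs in links.items()
        if len({dataset for dataset, _ in refs}) > 1
    }
```

    Matching on the name alone would conflate different ships that share a name; a realistic reconciliation step would also compare dates, ports and crew.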
  • MIMORE: Microcomparative Morphosyntax Research Tool

    With the MIMORE search engine one can search three databases together, using text strings, part-of-speech tags and syntactic variables. The researcher can combine categories and features into complex tags or use predefined tags. All categories and features are defined in ISOcat. Since all sentences have a location code, the morphosyntactic phenomena found in a set of sentences resulting from a search can be automatically plotted on a geographic map. It is possible to include more than one morphosyntactic phenomenon in one map, thus visualizing potential correlations between these phenomena. There is also a user-friendly function to export the data to a statistical program.
    The data in DynaSAND, the dynamic syntactic atlas of the Dutch dialects (http://www.meertens.knaw.nl/sand/), were collected between 2000 and 2005 by oral interviews (fieldwork and telephone) in about 300 locations across the Netherlands, Belgium and a small part of north-west France. Dialect speakers were asked to judge and/or translate some 150 test sentences. DynaSAND makes available the full recordings and transcriptions of these interviews. Together, the DynaSAND data cover the syntactic variation in the Dutch language area in the left periphery of the clause (the complementizer system and complementizer agreement), variation in subject pronoun form depending on syntactic position, subject pronoun doubling, cliticization on YES/NO, the reflexive system, fronting constructions (Wh-clauses, relative clauses, topicalization), word order and morphological variation in verb clusters, negation and quantification.
    The data in DiDDD (Diversity in Dutch DP Design; http://www.meertens.knaw.nl/diddd/) were collected between 2005 and 2009 with oral and written interviews in about 200 locations in the Dutch language area, with a methodology highly parallel to that of DynaSAND. The data involve translations of and judgements on test sentences. For 29 interviews there are sound recordings which have been aligned with their transcriptions. The DiDDD data cover the morphosyntactic variation within nominal groups, in particular possessives, partitives, noun ellipsis, the demonstrative system, the numeral modification system, what-for constructions, quantitative er, adjectival inflection, negation and exclamatives.
    The data in GTRP (Goeman, Taeldeman, van Reenen Project; http://www.meertens.knaw.nl/mand/database/) were collected between 1979 and 2000 with oral interviews in about 600 locations in the Dutch language area. Informants were asked to translate words or short sentences. Some of the transcriptions have been aligned with the sound recordings. The morphological data in GTRP include plural forms of nouns, diminutives, gender on nouns and adjectives, comparatives, superlatives, verbal inflection including participles, and subject, object and possessive pronouns.
    S. Barbiers, M. van Koppen, H. Bennis, N. Corver, MIcrocomparative MOrphosyntactic REsearch (MIMORE): Mapping partial grammars of Flemish, Brabantish and Dutch. Lingua Vol. 178, 5-31. doi:10.1016/j.lingua.2015.10.018
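    Because every sentence carries a location code, checking whether two phenomena tend to occur in the same places comes down to comparing sets of locations. The sketch below illustrates this under stated assumptions: the sentence layout (`location`, `tags`), the tag names and the use of Jaccard overlap are illustrative choices, not MIMORE's own data model or statistics.

```python
from collections import defaultdict


def locations_by_phenomenon(sentences):
    """Map each phenomenon tag to the set of location codes where it is attested."""
    result = defaultdict(set)
    for sentence in sentences:
        for tag in sentence["tags"]:
            result[tag].add(sentence["location"])
    return result


def overlap(sentences, tag_a, tag_b):
    """Return the locations attesting both phenomena and their Jaccard overlap."""
    locations = locations_by_phenomenon(sentences)
    a = locations.get(tag_a, set())
    b = locations.get(tag_b, set())
    union = a | b
    jaccard = len(a & b) / len(union) if union else 0.0
    return a & b, jaccard
```

    The shared locations are what a combined map would show; the exported data could then be tested more rigorously in a statistical program.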
  • VLO: The Virtual Language Observatory

    The VLO is a faceted browser that shows the metadata records harvested from within the CLARIN joint metadata domain. It also shows part of the language resource metadata that can be harvested from the OLAC domain. The Virtual Language Observatory (VLO) is meant to be the open marketplace where users can find metadata descriptions of all language resources and tools/services that can be harvested from any useful and trusted source. The VLO currently contains more than 230,000 resources and more than 400 tools. Different user interfaces are maintained to allow users to find and select resources, such as a Google Earth overlay for geographic browsing, a faceted browser for easy search and browsing along major criteria, and a normal catalogue. The VLO machinery is ready to harvest various types of metadata offered via the OAI-PMH protocol. It currently harvests data from OLAC, the DFKI Tool registry, DOBES, DELAMAN partners, the MPI registry, the ELRA catalogue and the CLARIN Language Resource and Technology inventory, which was meant as a simple registry for resources and tools from CLARIN members. The VLO is based on the principle that metadata needs to be open.
    Van Uytvanck, D., Stehouwer, H., and Lampen, L. (2012). Semantic metadata mapping in practice: The Virtual Language Observatory. In N. Calzolari (Ed.), Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012), Istanbul, May 23rd-25th, 2012 (pp. 1029-1034). European Language Resources Association (ELRA).
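    Harvesting via OAI-PMH amounts to issuing `ListRecords` requests and following resumption tokens until the list is exhausted. The sketch below shows only the request URL and the parsing of a response; it is not the VLO harvester, and the function names and any base URL passed in are hypothetical. The verb, argument names and XML namespace are those of OAI-PMH 2.0.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespace of OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    if resumption_token:
        # resumptionToken is an exclusive argument: no metadataPrefix alongside it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    identifiers = [
        element.text
        for element in root.findall(".//oai:header/oai:identifier", OAI_NS)
    ]
    token = root.findtext(".//oai:resumptionToken", default=None, namespaces=OAI_NS)
    # An empty resumptionToken element marks the end of the list.
    return identifiers, token or None
```

    A harvester would fetch the first URL, parse the response, and repeat with the returned token until `parse_list_records` yields `None` as the token.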
  • WIP: War in Parliament

    An advanced search engine for the OCR-ed scanned image collection of proceedings of the Dutch Hansard (Handelingen der Staten-Generaal 1930-1995). These proceedings are available as a fully annotated semi-structured dataset for historical and social science research. The output of the search engine can be restricted by speaker name, party, date range, and other criteria. References to the Second World War (WW II) have shaped political debate in the Netherlands for many decades. However, we have no systematic knowledge of why, how often, when, by whom or from which political party, and in which context, these references were made. Nor do we know the meanings politicians ascribed to the war years, the lessons the war was supposed to teach, and how all of this influenced political decision-making. WIP helps answer these questions and will help us better understand the complex legacies of WW II. The WIP project bridges the gap between historical and social science practices and the possibilities offered by using large corpora and language resources, in particular CLARIN tools for Dutch. The dataset, the Handelingen der Staten-Generaal (Dutch Hansard), is made compliant with CLARIN, ISOcat and ISO/TC 37/SC 4 standards. The search engine for this dataset uses an intuitive and powerful query language based on XPath, and its output can be fed directly into further analysis programs like SPSS. Integrating this technology with important historical research questions will directly contribute to new and innovative ways of writing about history. The search engine results can be exported in CSV format (comma-separated values). This makes it easy to calculate statistics offline from a result set and apply further filters.
    Marx, M. (2011), Oorlog in de Kamer, NRC, March 3, 2011
    ‘Waarom politici graag over de oorlog praten’, NRC-Handelsblad, 25 February 2011
    ‘Zoekmachine vindt relevante WO2-verwijzingen in Handelingen der Staten Generaal. Dat doet denken aan de oorlog’, in: E-data & research, volume 6, number 2, October 2011 http://www.edata.nl/0602_011011/pdf/0602_011011_1.pdf
    ‘NIOD ontwikkelt zoekmachine die verwijzingen naar de oorlog opspoort’, in: Informatie Professional. Vakblad voor informatiewerkers, no. 11 (2011)
    L. Buitinck and M. Marx (2012), ‘Two-stage named entity recognition using averaged perceptrons’, in: Proc. NLDB 2012, pp. 171-176
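    Offline statistics on an exported result set can be computed with a few lines of code. The sketch below counts hits per party within an optional range of years. The column names `speaker`, `party` and `date`, and the ISO date format, are assumptions made for illustration; the actual export layout may differ.

```python
import csv
import io
from collections import Counter


def hits_per_party(csv_text, year_from=None, year_to=None):
    """Count result rows per party, optionally restricted to a range of years."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        year = int(row["date"][:4])  # assumes ISO dates such as 1946-05-07
        if year_from is not None and year < year_from:
            continue
        if year_to is not None and year > year_to:
            continue
        counts[row["party"]] += 1
    return counts
```

    For an exported file, the text can be read with `open(path, newline="", encoding="utf-8").read()` before being passed to `hits_per_party`.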
  • Texcavator End-user Manual

    WAHSP/BILAND has been succeeded by TexCavator: http://texcavator.surfsaralabs.nl/
    Texcavator enables a researcher to use full-text search on the newspaper archive of the Dutch Royal Library. On top of that, it offers visualizations such as word clouds, timelines and heat maps. It also provides services that refine a search, such as filtering, stopword removal, normalization and stemming. Texcavator also gives access to ShiCo (Shifting Concepts), developed by Carlos Martinez Ortiz (NL eScience Center). ShiCo is a tool for visualizing concepts that shift over time. We refer to a concept as the set of words which are related to a given seed word. ShiCo uses a set of semantic models (word2vec) spanning a number of years to explore how concepts change over time: words related to a given concept at time t=0 may differ from the words related to the same concept at time t=n. Texcavator originated from the earlier text mining applications WAHSP and BiLand. During the Translantis project, the application was renamed to Texcavator and further developed by the UvA (Fons Laan). In May 2014, development was taken over by the Netherlands eScience Center (Janneke van der Zwaan). From April 2015 onwards, Texcavator was developed at the Digital Humanities lab of Utrecht University (Julian Gonggrijp and Martijn van der Klis). ShiCo was created in cooperation with the NL eScience Center (Carlos Martinez Ortiz).
    Snelders, S, Huijnen, P, Verheul, J, de Rijke, M and Pieters, T. 2017. A Digital Humanities Approach to the History of Culture and Science: Drugs and Eugenics Revisited in Early 20th-Century Dutch Newspapers, Using Semantic Text Mining. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 325–336. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.27. License: CC-BY 4.0
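    The notion of a concept as the set of words related to a seed word, tracked across one semantic model per period, can be sketched as follows. This is not ShiCo's algorithm, which builds on trained word2vec models and is more elaborate: here each model is simply a hypothetical dictionary from words to small vectors, and relatedness is plain cosine similarity.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def concept(model, seed, k=2):
    """Return the k words closest to the seed word in one period's model."""
    seed_vector = model[seed]
    ranked = sorted(
        (word for word in model if word != seed),
        key=lambda word: cosine(seed_vector, model[word]),
        reverse=True,
    )
    return ranked[:k]


def concept_over_time(models, seed, k=2):
    """Map each period to the words related to the seed in that period's model."""
    return {
        period: concept(model, seed, k)
        for period, model in sorted(models.items())
        if seed in model
    }
```

    Comparing the word lists for consecutive periods shows which related words enter or leave the concept over time.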
  • Taalportaal, the linguistics of Dutch, Frisian and Afrikaans online.

    Taalportaal (or Language Portal) is an interactive knowledge base about Dutch, Frisian and Afrikaans. It provides access to a comprehensive and authoritative scientific grammar for these three languages.
    van der Wouden, T, Bouma, G, van de Camp, M, van Koppen, M, Landsbergen, F and Odijk, J. 2017. Enriching a Scientific Grammar with Links to Linguistic Resources: The Taalportaal. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 299–310. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.24. License: CC-BY 4.0
  • Dupira; the Dutch Parser for IR Applications

    Dupira is a rule-based parser, generated by means of the AGFL parser generator from the Dupira grammar, lexicon and fact tables. By means of transductions which are specified in the grammar (and can be modified), the parser transduces sentences to dependency graphs. Dupira was developed for practical applications in Information Retrieval and for Information Systems needing a Natural Language interface. Its intended users are computer scientists and computer professionals rather than linguists.