Result filters

Metadata provider

  • CSD Tools

Language

Resource type

Active filters:

  • Metadata provider: CSD Tools
  • Organisation: University of Amsterdam
Loading...
7 record(s) found

Search results

  • Namescape Visualizer

    Searching and visualizing Named Entities in modern Dutch novels. The named entity (NE) tagging and resolution in NameScape enables quantitative and repeatable research where previously only guesswork and anecdotal evidence was feasible. The visualisation module enables researchers with a less technical background to draw conclusions about functions of names in literary work and help them to explore the material in search of more interesting questions (and answers). Users from other communities (sociolinguistics, sentiment analysis, …) also benefit from the NE tagged data, especially since the NE recognizer is available as a web service, enabling researchers to annotate their own research data. Datasets in NameScape (total of 1.129 books): Corpus Sanders: A corpus of 582 Dutch novels written and published between 1970 and 2009 will. Corpus Huygens: Consists of 22 novels manually tagged with detailed named entity information. IPR for this corpus do not allow distribution. Corpus eBooks: Consists of 7000+ Dutch eBooks tagged automatically with basic NER features and person name Part information. IPR for this corpus do not allow distribution. Corpus SoNaR Books: 105 Dutch books; NE tagged. Corpus Gutenberg Dutch: Consists of 530 NE tagged TEI files converted from the Epub versions of the corresponding Gutenberg documents. Recent research has conclusively proven names in literary works can only be put fully into perspective when studied in a wider context (landscape) of names either in the same text or in related material (the onymic landscape or “namescape”). Research on large corpora is needed to gain a better understanding of e.g. what is characteristic for a certain period, genre, author or cultural region. The data necessary for research on this scale simply does not exist yet. NameScape aims to fill the need by providing a substantial amount of literary works annotated with a rich tag set, thereby enabling researchers to perform their research in more depth than previously possible. Several exploratory visualization tools help the scholar to answer old questions and uncover many more new ones, which can be addressed using the demonstrator.
    de Does, J, Depuydt, K, van Dalen-Oskam, K and Marx, M. 2017. Namescape: Named Entity Recognition from a Literary Perspective. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 361–370. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.30. License: CC-BY 4.0
    Karina van Dalen-Oskam (2013), Nordic Noir: a background check on Inspector Van Veeteren, 31 May 2012, http://blog.namescape.nl/?p=47
  • Namescape Named Entity Recognition

    Searching and visualizing Named Entities in modern Dutch novels. The named entity (NE) tagging and resolution in NameScape enables quantitative and repeatable research where previously only guesswork and anecdotal evidence was feasible. The visualisation module enables researchers with a less technical background to draw conclusions about functions of names in literary work and help them to explore the material in search of more interesting questions (and answers). Users from other communities (sociolinguistics, sentiment analysis, …) also benefit from the NE tagged data, especially since the NE recognizer is available as a web service, enabling researchers to annotate their own research data. Datasets in NameScape (total of 1.129 books): Corpus Sanders: A corpus of 582 Dutch novels written and published between 1970 and 2009 will. Corpus Huygens: Consists of 22 novels manually tagged with detailed named entity information. IPR for this corpus do not allow distribution. Corpus eBooks: Consists of 7000+ Dutch eBooks tagged automatically with basic NER features and person name Part information. IPR for this corpus do not allow distribution. Corpus SoNaR Books: 105 Dutch books; NE tagged. Corpus Gutenberg Dutch: Consists of 530 NE tagged TEI files converted from the Epub versions of the corresponding Gutenberg documents. Recent research has conclusively proven names in literary works can only be put fully into perspective when studied in a wider context (landscape) of names either in the same text or in related material (the onymic landscape or “namescape”). Research on large corpora is needed to gain a better understanding of e.g. what is characteristic for a certain period, genre, author or cultural region. The data necessary for research on this scale simply does not exist yet. NameScape aims to fill the need by providing a substantial amount of literary works annotated with a rich tag set, thereby enabling researchers to perform their research in more depth than previously possible. Several exploratory visualization tools help the scholar to answer old questions and uncover many more new ones, which can be addressed using the demonstrator.
    de Does, J, Depuydt, K, van Dalen-Oskam, K and Marx, M. 2017. Namescape: Named Entity Recognition from a Literary Perspective. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 361–370. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.30. License: CC-BY 4.0
    Karina van Dalen-Oskam (2013), Nordic Noir: a background check on Inspector Van Veeteren, 31 May 2012, http://blog.namescape.nl/?p=47
  • Namescape Barcode Browser

    Searching and visualizing Named Entities in modern Dutch novels. The named entity (NE) tagging and resolution in NameScape enables quantitative and repeatable research where previously only guesswork and anecdotal evidence was feasible. The visualisation module enables researchers with a less technical background to draw conclusions about functions of names in literary work and help them to explore the material in search of more interesting questions (and answers). Users from other communities (sociolinguistics, sentiment analysis, …) also benefit from the NE tagged data, especially since the NE recognizer is available as a web service, enabling researchers to annotate their own research data. Datasets in NameScape (total of 1.129 books): Corpus Sanders: A corpus of 582 Dutch novels written and published between 1970 and 2009 will. Corpus Huygens: Consists of 22 novels manually tagged with detailed named entity information. IPR for this corpus do not allow distribution. Corpus eBooks: Consists of 7000+ Dutch eBooks tagged automatically with basic NER features and person name Part information. IPR for this corpus do not allow distribution. Corpus SoNaR Books: 105 Dutch books; NE tagged. Corpus Gutenberg Dutch: Consists of 530 NE tagged TEI files converted from the Epub versions of the corresponding Gutenberg documents. Recent research has conclusively proven names in literary works can only be put fully into perspective when studied in a wider context (landscape) of names either in the same text or in related material (the onymic landscape or “namescape”). Research on large corpora is needed to gain a better understanding of e.g. what is characteristic for a certain period, genre, author or cultural region. The data necessary for research on this scale simply does not exist yet. NameScape aims to fill the need by providing a substantial amount of literary works annotated with a rich tag set, thereby enabling researchers to perform their research in more depth than previously possible. Several exploratory visualization tools help the scholar to answer old questions and uncover many more new ones, which can be addressed using the demonstrator.
    de Does, J, Depuydt, K, van Dalen-Oskam, K and Marx, M. 2017. Namescape: Named Entity Recognition from a Literary Perspective. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 361–370. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.30. License: CC-BY 4.0
    Karina van Dalen-Oskam (2013), Nordic Noir: a background check on Inspector Van Veeteren, 31 May 2012, http://blog.namescape.nl/?p=47
  • Namescape Search

    Searching and visualizing Named Entities in modern Dutch novels. The named entity (NE) tagging and resolution in NameScape enables quantitative and repeatable research where previously only guesswork and anecdotal evidence was feasible. The visualisation module enables researchers with a less technical background to draw conclusions about functions of names in literary work and help them to explore the material in search of more interesting questions (and answers). Users from other communities (sociolinguistics, sentiment analysis, …) also benefit from the NE tagged data, especially since the NE recognizer is available as a web service, enabling researchers to annotate their own research data. Datasets in NameScape (total of 1.129 books): Corpus Sanders: A corpus of 582 Dutch novels written and published between 1970 and 2009 will. Corpus Huygens: Consists of 22 novels manually tagged with detailed named entity information. IPR for this corpus do not allow distribution. Corpus eBooks: Consists of 7000+ Dutch eBooks tagged automatically with basic NER features and person name Part information. IPR for this corpus do not allow distribution. Corpus SoNaR Books: 105 Dutch books; NE tagged. Corpus Gutenberg Dutch: Consists of 530 NE tagged TEI files converted from the Epub versions of the corresponding Gutenberg documents. Recent research has conclusively proven names in literary works can only be put fully into perspective when studied in a wider context (landscape) of names either in the same text or in related material (the onymic landscape or “namescape”). Research on large corpora is needed to gain a better understanding of e.g. what is characteristic for a certain period, genre, author or cultural region. The data necessary for research on this scale simply does not exist yet. NameScape aims to fill the need by providing a substantial amount of literary works annotated with a rich tag set, thereby enabling researchers to perform their research in more depth than previously possible. Several exploratory visualization tools help the scholar to answer old questions and uncover many more new ones, which can be addressed using the demonstrator.
    de Does, J, Depuydt, K, van Dalen-Oskam, K and Marx, M. 2017. Namescape: Named Entity Recognition from a Literary Perspective. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 361–370. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.30. License: CC-BY 4.0
    Karina van Dalen-Oskam (2013), Nordic Noir: a background check on Inspector Van Veeteren, 31 May 2012, http://blog.namescape.nl/?p=47
  • WAHSP/BILAND: web application for (bilingual) historical sentiment mining in public media

    WAHSP/BLAND has been succeeded by TexCavator: http://texcavator.surfsaralabs.nl/
    WAHSP/BILAND is a research tool for historians that uses textual data of news media from the period 1863-1940 of the Koninklijke Bibliotheek and Staatsbibliothek zu Berlin as input material. One can search with single query terms or with combinations thereof. Apart from showing the articles that match the query, the results can be visualized by word clouds of single articles together with sentiment words highlighted, or by a word cloud of the whole result set together with newspaper statistics derived from their metadata. WAHSP/BILAND enables historians to collect and process large bi-lingual (Dutch and German) sets of opinionated text-data from news media and extract discourse identity and intensity patterns in two different countries with different scripts (e.g. Latin and Gothic). This tool offers a unique opportunity for non-technical humanities researchers to perform a new kind of historical e-research for studying changing opinions, notions and perceptions regarding public health and policy issues. The text mining tools for opinion/sentiment extraction that form the technological base for WAHSP/BILAND have been developed within the NTU/STEVIN DuOMAn project. The technology includes algorithms and tools for identification of polarity (positive/support or negative/criticism), sources (opinion-holders), frequency of items and specific targets of discourses. The tools and subjectivity lexicons are implemented as modules of ‘Fietstas’ 2, an web service for text analysis. Fietstas also provides other essential text processing modules (morphological normalization, format and encoding reconciliation, named entity recognition and normalization, etc.) and visualization modules (interactive word clouds and timelines). Fietstas has been developed and is being used for processing of large-scale datasets in the context of several projects, such as DuOMAn. A text translation service based on Machine Learning can be used to translate existing lexicons and documents between Dutch and German (both directions). The web application uses this functionality of Fietstas to leverage interactive creation, expansion and refinement of lexicons specific to the user’s research questions and needs. For BILAND new bilingual and biscriptural lexicons have been developed. The application uses the visualization features of Fietstas to allow users to examine the research domain along the dimensions of time, context, and the identity and frequency of the discourse. WAHSP/BILAND is meant to be generic and testable in all domains, where analysis of topics, contexts and attitudes in large volumes of text is needed.
    Snelders, S, Huijnen, P, Verheul, J, de Rijke, M and Pieters. T. 2017. A Digital Humanities Approach to the History of Culture and Science: Drugs and Eugenics Revisited in Early 20th-Century Dutch Newspapers, Using Semantic TextMining. In:Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 325–336. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.27. License: CC-BY 4.0
  • WIP: War in Parliament

    An advanced search engine for the OCR-ed scanned image collection of proceedings of the Dutch Hansard (Handelingen der Staten-Generaal 1930-1995). These proceedings are available as a fully annotated semi-structures dataset for historical and social science research. The output of the search engine can be restricted by speaker name, party, date range, and other criteria. References to the Second World War (WW II) have shaped political debate in the Netherlands for many decades. However, we have no systematic knowledge of why, how often, when, by whom or from which political party, and in which context, these references were made. Nor do we know the meanings politicians ascribed to the war years, the lessons the war was supposed to teach, and how all of this influenced political decision-making. WIP helps answering these questions and will help us better understand the complex legacies of WW II. The WIP project bridges the gap between historical and social science practices and the possibilities offered by using large corpora and language resources, in particular Clarin tools for Dutch. The dataset - de Handelingen der Staten-Generaal (Dutch Hansard) - are made compliant with Clarin, ISOCAT and ISO/TC 37/SC 4 standards. The search engine for this dataset uses an intuitive and powerful query language based on XPath, and its output can be fed directly into further analysis programs like SPSS. Integrating this technology with important historical research questions will directly contribute to new and innovative ways of writing about history. The search engine results can be exported in a CSV-format (comma seperated values). This makes it easy to calculate statistics offline from a result set and apply further filters.
    Marx, M. (2011), Oorlog in de Kamer, NRC, March 3, 2011
    ‘Waarom politici graag over de oorlog praten’, NRC-Handelsblad, 25 februari 2011
    ‘Zoekmachine vindt relevante WO2-verwijzingen in Handelingen der Staten Generaal. Dat doet denken aan de oorlog’, in: E-data & research, Jaargang 6, nummer 2, oktober 2011 http://www.edata.nl/0602_011011/pdf/0602_011011_1.pdf
    ‘NIOD ontwikkelt zoekmachine die verwijzingen naar de oorlog opspoort’, in: Informatie Professional. Vakblad voor informatiewerkers, nr. 11 (2011)
    L. Buitinck en M. Marx (2012), ‘Two-stage named entity recognition using averaged perceptrons’, in: Proc. NLDB 2012, pp. 171-176
  • Texcavator End-user Manual

    WAHSP/BLAND has been succeeded by TexCavator: http://texcavator.surfsaralabs.nl/
    Texcavator enables a researcher to use full-text search on the newspaper archive of the Dutch Royal Library. On top of that, it allows for visualizations like word clouds, time lines and heat maps. It also provides services to enhance your search experience like filtering, stopword removal, normalization and stemming. Texcavator also gives access to ShiCo (Shifting Concepts), developed by Carlos Martinez Ortiz (NL eScience Center).ShiCo is a tool for visualizing time shifting concepts. We refer to a concept as the set of words which are related to a given seed word. ShiCo uses a set of semantic models (word2vec) spanning a number of years to explore how concepts change over time -- words related to a given concept at time t=0 may differ from the words related to the same concept at time t=n . Texcavator originated from the earlier text mining applications WAHSP and BiLand. During the Translantis project, the application was renamed to Texcavator and further developed by the UvA (Fons Laan). In May 2014, development was taken over by the Netherlands eScience Center (Janneke van der Zwaan). From April 2015 onwards, Texcavator was developed at the Digital Humanities lab of Utrecht University (Julian Gonggrijp and Martijn van der Klis). ShiCo was created in cooperation with the NL eScience Center (Carlos Martinez Ortiz).
    Snelders, S, Huijnen, P, Verheul, J, de Rijke, M and Pieters. T. 2017. A Digital Humanities Approach to the History of Culture and Science: Drugs and Eugenics Revisited in Early 20th-Century Dutch Newspapers, Using Semantic TextMining. In:Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 325–336. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.27. License: CC-BY 4.0