
Search results

  • FESLI: Functional elements in Specific Language Impairment

    Tool for the quantitative and qualitative comparison of the acquisition of functional elements (morphological inflection, articles, pronouns, etc.) in a corpus with data from monolingual and bilingual children (Dutch - Turkish) with and without Specific Language Impairment (SLI). The FESLI data come from two NWO-sponsored projects: BiSLI and Variflex. The numbers of children included in the resources are: - 12 bilingual children without language impairment; - 25 monolingual children with SLI; - 20 bilingual children with SLI. The children's ages ranged from 6;0 to 8;5. For more precise information about the age distribution in each group, the reader is referred to the dissertation by Antje Orgassa (http://dare.uva.nl/document/147433). The non-impaired children were included in the Variflex project (data collected by Elma Blom) and also used in the BiSLI project; the data from the children with SLI were exclusive to the BiSLI project. The technology used in the FESLI web application is based on modules of the COAVA web application.
  • PaQu - Parse and Query

    PaQu uses the Alpino parser to make treebanks of your own text corpus, and to search in these treebanks using an interface based on the LASSY Word Relations Search interface (http://dev.clarin.nl/node/1966). Several treebanks are already available in the application, such as Lassy Klein (1M words, manually checked syntactic analysis) and Lassy Groot (700M words, syntactic analysis automatically assigned by Alpino). PaQu offers two ways to search through the syntactically annotated texts. The first option is to use the search bar to look for word pairs, optionally complemented by their syntactic relationship. The second option is to use the query language XPath.
    Odijk, J, van Noord, G, Kleiweg, P and Tjong Kim Sang, E. 2017. The Parse and Query (PaQu) Application. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 281–297. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.23. License: CC-BY 4.0
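    XPath queries over Alpino-style dependency trees can be illustrated with a small sketch. The XML fragment and attribute values below are a simplified toy example, not actual PaQu output:

```python
# Query an Alpino-style dependency tree the way PaQu's XPath option does.
# Toy fragment; real Lassy/Alpino trees carry many more attributes.
import xml.etree.ElementTree as ET

fragment = """
<alpino_ds>
  <node cat="smain">
    <node rel="su" pt="n" word="Jan"/>
    <node rel="hd" pt="ww" word="leest"/>
    <node rel="obj1" pt="n" word="boeken"/>
  </node>
</alpino_ds>
"""
tree = ET.fromstring(fragment)
# Equivalent of //node[@rel="obj1" and @pt="n"]: nominal direct objects.
hits = [n for n in tree.iter("node")
        if n.get("rel") == "obj1" and n.get("pt") == "n"]
print([n.get("word") for n in hits])  # ['boeken']
```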
  • Gabmap: A web application for dialectometry

    Gabmap is a free web-based application for dialectometry. It measures the differences in sets of phonetic (or phonemic) transcriptions via edit distance. Gabmap has a graphical user interface that makes the string comparison facility available as a web application, which enables wider experimentation with the techniques. Gabmap (a.k.a. ADEPT) measures pronunciation distances based on transcriptions and aligns pronunciation transcription data. Because the measurements are numeric, they can be aggregated in order to obtain an estimate of overall pronunciation differences among varieties. The software uses a range of edit distance (or Levenshtein) algorithms. It is useful for dialectologists and has been used extensively in dialectology. It has occasionally been used for other purposes, e.g. trying to identify loan words automatically (Paris, Musée de l'Homme, a central Asian project involving Turkic and also Indo-Iranian languages). The software has also been used as the basis of a program to multi-align pronunciation data for the purpose of phylogenetic analysis. The Gabmap developers claim that the program could also be used to measure deviant pronunciation, e.g. of second-language learners or of speakers with speech defects.
    A variety of related algorithms are implemented in the package of C programs (and R programs) the developers turned into a web application, including: a basic version regarding segments only as same or different; versions variously respecting consonant/vowel distinctions; versions using phonetic segment distances as provided via an assignment of phonetic or phonological features to segments; versions using segment distances as learned from refining alignment correspondences; and versions applying weightings derived from (inverse) frequency (derived from Goebl's work) or depending on the position within a word. There are useful auxiliary programs aimed at assisting users in converting phonetic data to X-SAMPA and at spotting errors. (In working with users in the past, the developers have noted that data conversion is a major hurdle.) There are additional meta-analytical calculations aimed at gauging how reliable the signal is from a given set of data, and at comparing various options with respect to the degree to which they capture the geographic cohesion one assumes in dialectology. Gabmap was developed in the CLARIN-NL project ADEPT: Assaying Differences via Edit-Distance of Pronunciation Transcriptions.
    Nerbonne, J., Colen, R., Gooskens, C., Kleiweg, P., and Leinonen, T. (2011). Gabmap — A Web Application for Dialectology. Dialectologia, Special Issue II, 65-89.
    Leinonen, T., Çöltekin, Ç., and Nerbonne, J. Using Gabmap. Lingua, Vol. 178, 71-83. doi:10.1016/j.lingua.2015.02.004
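    The basic edit-distance variant mentioned above, treating segments only as same or different, is the classic Levenshtein algorithm; a minimal sketch:

```python
# Edit (Levenshtein) distance between two transcriptions: the minimum
# number of insertions, deletions and substitutions, each at cost 1.
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

# Two hypothetical transcriptions of the same word in two varieties:
print(levenshtein("melk", "molke"))  # 2
```

Numeric distances like this one can then be averaged over many word pairs to estimate the overall pronunciation difference between two varieties, which is the aggregation step the description refers to.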
  • SHEBANQ: System for HEBrew Text: ANnotations for Queries and Markup

    The WIVU (Werkgroep Informatica Vrije Universiteit) Hebrew Text Database contains the Biblia Hebraica Stuttgartensia (BHS) version of the text of the Hebrew Bible. Portions of other Semitic languages are included as well: the Aramaic sections of the Old Testament, two Syriac versions, and annotated portions of the Syriac and Aramaic translations. All these texts have been enriched with features that primarily result from linguistic analysis. The database can be queried by means of a language that is optimized for data modeled as objects + features. SHEBANQ builds a bridge between the linguistically annotated Hebrew text corpus and biblical scholars by (1) making this text, including its annotations, available to scholars; (2) demonstrating how queries can function to address research questions: the saved queries and the metadata added to them will form a growing repository of best practices showing which queries are used in addressing research questions and how they contribute to answering them; (3) giving textual scholarship a more empirical basis, by creating the opportunity for claims made in scholarly articles (e.g. "this syntactic pattern is not attested elsewhere in the Hebrew Bible") to be accompanied by the unique identifiers of the saved queries that led to the claim. The WIVU database is a resource under long-term development; new features and corrections are added over time.
    Roorda, D. 2017. The Hebrew Bible as Data: Laboratory - Sharing - Experiences. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 217–229. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.18. License: CC-BY 4.0
    Roorda, D. (2015). The Hebrew Bible as Data: Laboratory - Sharing - Experiences http://arxiv.org/abs/1501.01866
    Roorda, D. (2014). LAF-Fabric: a data analysis tool for Linguistic Annotation Framework with an application to the Hebrew Bible, Computational Linguistics in the Netherlands Journal, Volume 4, December 2014, pp. 105-109 http://www.clinjournal.org/sites/clinjournal.org/files/08-Roorda-etal-CLIN2014.pdf and http://arxiv.org/abs/1410.0286
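    The "objects + features" data model can be pictured as feature bundles attached to text objects; a toy sketch (the feature names and values are illustrative, and this is Python rather than the database's actual query language):

```python
# Text objects (here: words) as bundles of features, queried by
# feature constraints -- a toy model of the WIVU objects + features idea.
words = [
    {"lexeme": "BR>", "part_of_speech": "verb", "stem": "qal"},
    {"lexeme": ">LHJM", "part_of_speech": "noun", "state": "absolute"},
    {"lexeme": "DBR", "part_of_speech": "verb", "stem": "piel"},
]

def query(objects, **features):
    """Return all objects whose features match every given constraint."""
    return [o for o in objects
            if all(o.get(k) == v for k, v in features.items())]

print([w["lexeme"] for w in query(words, part_of_speech="verb")])
```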
  • WFT-GTB: Integrating the Wurdboek fan de Fryske Taal into the Geïntegreerde TaalBank

    The Dictionary of the Frisian Language (Wurdboek fan de Fryske Taal) is available online via the GTB dictionary web application. The GTB also holds other major Dutch historical dictionaries, such as the Dictionary of Old Dutch (ONW), the Dictionary of Early Middle Dutch (VMNW), the Dictionary of Middle Dutch (MNW), and the Dictionary of the Dutch Language (WNT). The digital environment enables extensive free and structured search queries, including comparative studies with Dutch materials. The Wurdboek fan de Fryske Taal covers the vocabulary of Modern West Frisian from the period 1800-1975; the dictionary's metalanguage is Dutch. A volume of 400 pages appeared every year, the first in 1984. The editorial phase was finalized in 2009, the final editing and publication phase in 2010.
    Each entry provides:
    - the modern Dutch lemma and the Frisian lemma;
    - the origin of a word;
    - the meaning of a word;
    - the structure of a word;
    - the possible spellings of a word.
    Depuydt, K, de Does, J, Duijff, P and Sijens, H. 2017. Making the Dictionary of the Frisian Language Available in the Dutch Historical Dictionary Portal. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 151–165. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.13. License: CC-BY 4.0
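    A structured, cross-language lookup of the kind the portal enables can be pictured with toy entries (the field names and entries are illustrative, not the GTB schema):

```python
# Toy dictionary entries: a Frisian lemma, its modern Dutch counterpart,
# and an English gloss (illustrative fields only).
entries = [
    {"lemma_fy": "hynder", "lemma_nl": "paard", "gloss": "horse"},
    {"lemma_fy": "wetter", "lemma_nl": "water", "gloss": "water"},
]
# Structured search: find the Frisian lemma via its Dutch lemma,
# the kind of comparative query an integrated portal makes possible.
hits = [e["lemma_fy"] for e in entries if e["lemma_nl"] == "water"]
print(hits)  # ['wetter']
```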
  • PoliMedia: Interlinking multimedia for the analysis of media coverage of political debates

    PoliMedia links the minutes of the debates in the Dutch Parliament (Dutch Hansard) to databases of historical newspapers and ANP radio bulletins to allow cross-media analysis of coverage in a uniform search interface. For each fragment from a single speaker in a debate, the developers extracted relevant information: the speaker, the date, important terms from the fragment's content, and important terms from the description of the complete debate. This information was then combined into a query with which they searched the archives of newspapers, radio bulletins, and television programmes. Media items that matched this query were retrieved and a link was created between the speech and the media item, creating a Semantic Web of Dutch Hansard and media coverage. This Semantic Web contains links from the Dutch Hansard to newspaper articles and radio bulletins. Evaluations found 62% recall and 75% precision. To navigate this Semantic Web, a search user interface was developed based on a requirements study with five scholars in history and political communication. The developers created a faceted search interface in which the Dutch parliamentary minutes can be searched in full text and in which refinements can be made by speaker, role of the speaker (parliament or government), political party, and year. The debates are presented with links to the original locations of the media items. PoliMedia is a collaboration of TU Delft and the Free University (development of the Semantic Web of Dutch Hansard and media), the Netherlands Institute for Sound and Vision (development of the search user interface), and Erasmus University Rotterdam (project leader, and user research with historians and political communication researchers).
    Juric, D., Hollink, L., and Houben, G. (2013). Discovering links between political debates and media. The 13th International Conference on Web Engineering (ICWE'13). Aalborg, Denmark.
    Juric, D., Hollink, L., and Houben, G. (2012). Bringing parliamentary debates to the Semantic Web. DeRiVE workshop on Detection, Representation, and Exploitation of Events in the Semantic Web.
    Kemman, M. J., and Kleppe, M. (2013). PoliMedia - Improving Analyses of Radio, TV and Newspaper Coverage of Political Debates. In T. Aalberg and E. Al. (Eds.), TPDL2013, LCNS 8092 (pp. 409-412). Springer-Verlag Berlin Heidelberg.
    Kemman, M. J., Kleppe, M., and Maarseveen, J. (2013). Eye Tracking the Use of a Collapsible Facets Panel in a Search Interface. In T. Aalberg and E. Al. (Eds.), TPDL2013, LCNS 8092 (pp. 405-408). Springer-Verlag Berlin Heidelberg.
    Martijn Kleppe, Laura Hollink, Max Kemman, Damir Juric, Henri Beunders, Jaap Blom, Johan Oomen and Geert-Jan Houben. PoliMedia: Analysing Media Coverage of political debates by automatically generated links to Radio & Newspaper Items. http://ceur-ws.org/Vol-1124/linkedup_veni2013_04.pdf
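    The recall and precision figures above follow the standard definitions; a sketch with hypothetical link counts chosen to reproduce the reported values:

```python
# Precision = correct links / all links made;
# recall = correct links / all links that should have been made.
# The counts below are illustrative, not the actual evaluation data.
tp, fp, fn = 62, 21, 38   # true positives, false positives, false negatives
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.75 recall=0.62
```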
  • VK: Verrijkt Koninkrijk (Enriched Kingdom)

    Dr Loe de Jong's Het Koninkrijk der Nederlanden in de Tweede Wereldoorlog remains the most appealing history of German-occupied Dutch society (1940-1945). Published between 1969 and 1991, the 14 volumes, consisting of 30 parts and 18,000 pages, combine the qualities of an authoritative work for a general audience and an indispensable point of reference for scholars. In VK this corpus is enriched with:
    - tokenization, sentence splitting, part-of-speech tagging and lemmatization (done with the FROG software from Tilburg University);
    - named entity recognition (done using UvA's NE tagger, specially trained for Dutch within the Stevin DuoMan project);
    - polarity tagging (positive/negative connotation of words) (done using UvA's FietsTas software, developed for Dutch within the Stevin DuoMan project);
    - named entity reconciliation by linking to Wikipedia (done using software developed by Edgar Meij (UvA)).
    REST web interface, HTTP GET
    De Boer, V., van Doornik, J., Buitinck, L., Ribbens, K., and Veken, T. Enriched Access to a Large War Historical Text using the Back of the Book Index. Extended abstract presented at the Workshop on Semantic Web and Information Extraction (SWAIE 2012), Galway, Ireland, 9 October 2012.
    Buitinck, L. and Marx, M. Two-Stage Named-Entity Recognition Using Averaged Perceptrons. In Proceedings of NLDB, Groningen, Netherlands, 2012. http://link.springer.com/chapter/10.1007%2F978-3-642-31178-9_17
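    Lexicon-based polarity tagging of the kind listed above can be sketched with a toy lexicon (both the lexicon and the example sentence are invented; this is not the FietsTas implementation):

```python
# Toy polarity lexicon: "+" positive, "-" negative; unknown words get "0".
POLARITY = {"overwinning": "+", "vrede": "+", "bezetting": "-", "honger": "-"}

def tag_polarity(tokens):
    """Attach a polarity label to each token via lexicon lookup."""
    return [(t, POLARITY.get(t.lower(), "0")) for t in tokens]

print(tag_polarity(["De", "bezetting", "bracht", "honger"]))
```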
  • SPOD: Syntactic Profiler of Dutch

    SPOD is a syntactic profiler that covers a broad spectrum of syntactic properties. It is part of the PaQu application but has its own interface with a menu of predefined queries. It can be used to provide general information about corpus properties, such as the number and types of main and subordinate clauses, their frequencies, and the average length of clauses per clause type (e.g. relative clauses, indirect questions, finite complement clauses, infinitival clauses, finite adverbial clauses, etc.). It yields output in HTML and tab-separated text format.
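    The kind of profile SPOD reports, clause-type frequencies and average clause lengths in tab-separated form, can be sketched from per-clause records; the clause types and lengths below are made up rather than derived from SPOD's own queries:

```python
# Aggregate per-clause records into frequency and average-length figures,
# printed as tab-separated text (toy data only).
from collections import defaultdict

clauses = [("smain", 8), ("smain", 12), ("rel", 5), ("whsub", 7)]  # (type, length in words)
stats = defaultdict(list)
for ctype, length in clauses:
    stats[ctype].append(length)
for ctype, lengths in sorted(stats.items()):
    print(f"{ctype}\t{len(lengths)}\t{sum(lengths) / len(lengths):.1f}")
```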