698 record(s) found

Search results

  • ISOcat

    This service is no longer operational! The ISO TC37 Data Category Registry (DCR) was created in 2008 as one of the first ISO standards delivered in the form of a database (ISOcat). The Max Planck Institute for Psycholinguistics (MPI) provided development, hosting, and support services and acted as the Registration Authority (RA) until December 2014. For users from the European CLARIN research infrastructure, the Meertens Institute develops and hosts a new registry for CLARIN-relevant concepts based on the corresponding ISOcat data categories, such as those used for the Component MetaData Infrastructure (CMDI). This can be found here: http://portal.clarin.nl/node/4216. ISO 12620 provides a framework for defining data categories compliant with the ISO/IEC 11179 family of standards. According to this model, each data category is assigned a unique administrative identifier, together with information on the status or decision-making process associated with the data category. In addition, data category specifications in the DCR contain linguistic descriptions, such as data category definitions, statements of associated value domains, and examples. Data category specifications can be associated with a variety of data element names and with language-specific versions of definitions, names, value domains and other attributes. For now the entries of the Data Category Registry are still available in static form, i.e. they can no longer be changed. All Data Category Persistent IDentifiers, e.g. http://www.isocat.org/datcat/DC-4146, remain resolvable. The public part of the registry can be browsed via the Guest workspace: http://www.isocat.org/rest/user/guest/workspace. The new location for this data category registry is http://www.datcatinfo.net/.
  • Namescape Named Entity Recognition

    Searching and visualizing Named Entities in modern Dutch novels. The named entity (NE) tagging and resolution in NameScape enables quantitative and repeatable research where previously only guesswork and anecdotal evidence were feasible. The visualisation module enables researchers with a less technical background to draw conclusions about the functions of names in literary work and helps them to explore the material in search of more interesting questions (and answers). Users from other communities (sociolinguistics, sentiment analysis, …) also benefit from the NE tagged data, especially since the NE recognizer is available as a web service, enabling researchers to annotate their own research data.
    Datasets in NameScape (total of 1.129 books):
    - Corpus Sanders: A corpus of 582 Dutch novels written and published between 1970 and 2009.
    - Corpus Huygens: Consists of 22 novels manually tagged with detailed named entity information. IPR for this corpus does not allow distribution.
    - Corpus eBooks: Consists of 7000+ Dutch eBooks tagged automatically with basic NER features and person name part information. IPR for this corpus does not allow distribution.
    - Corpus SoNaR Books: 105 Dutch books; NE tagged.
    - Corpus Gutenberg Dutch: Consists of 530 NE tagged TEI files converted from the Epub versions of the corresponding Gutenberg documents.
    Recent research has shown that names in literary works can only be put fully into perspective when studied in a wider context (landscape) of names, either in the same text or in related material (the onymic landscape or “namescape”). Research on large corpora is needed to gain a better understanding of e.g. what is characteristic for a certain period, genre, author or cultural region. The data necessary for research on this scale simply does not exist yet. NameScape aims to fill the need by providing a substantial amount of literary works annotated with a rich tag set, thereby enabling researchers to perform their research in more depth than previously possible. Several exploratory visualization tools help the scholar to answer old questions and uncover many more new ones, which can be addressed using the demonstrator.
    de Does, J, Depuydt, K, van Dalen-Oskam, K and Marx, M. 2017. Namescape: Named Entity Recognition from a Literary Perspective. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 361–370. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.30. License: CC-BY 4.0
    Karina van Dalen-Oskam (2013), Nordic Noir: a background check on Inspector Van Veeteren, 31 May 2012, http://blog.namescape.nl/?p=47
  • Gabmap

    Gabmap is a free web-based application for dialectometry. It measures the differences in sets of phonetic (or phonemic) transcriptions via edit distance. Gabmap has a graphical user interface that makes a string comparison facility available as a web application. This enables wider experimentation with the techniques. Gabmap (a.k.a. ADEPT) measures pronunciation distances based on transcriptions and aligns pronunciation transcription data. Because the measurements are numeric, they can be aggregated in order to obtain an estimate of overall pronunciation differences among varieties. The software uses a range of edit distance (or Levenshtein) algorithms. It is useful for dialectologists, and has been used extensively in dialectology. It has occasionally been used for other purposes, e.g. trying to identify loan words automatically (Paris, Musée de l’Homme, central Asian project involving Turkic and also Indo-Iranian languages). The software has also been used as the basis of a program to multi-align pronunciation data for the purpose of phylogenetic analysis. The Gabmap developers claim that the program could also be used to measure deviant pronunciation, e.g. of second-language learners or of speakers with speech defects. A variety of related algorithms are implemented in the package of C programs (and R programs) that the developers turned into a web application, including a basic version regarding segments only as same or different, and other versions variously respecting consonant/vowel distinctions; using phonetic segment distances as provided via an assignment of phonetic or phonological features to segments; using segment distances as learned from refining alignment correspondences; and applying weightings derived from (inverse) frequency (derived from Goebl’s work) or depending on the position within a word. There are useful auxiliary programs aimed at assisting users in converting phonetic data to X-SAMPA and at spotting errors. (In working with users in the past, the developers have noted that data conversion is a major hurdle.) There are additional meta-analytical calculations aimed at gauging how reliable the signal is from a given set of data, and aimed at comparing various options with respect to the degree to which they capture the geographic cohesion one assumes in dialectology. Gabmap was developed in the CLARIN-NL project ADEPT: Assaying Differences via Edit-Distance of Pronunciation Transcriptions.
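    The basic variant described above, which treats segments only as same or different, can be illustrated with a short sketch. This is not Gabmap's own code (the package is written in C and R); the function names, the normalisation by the longer transcription, and the toy transcriptions are assumptions made for illustration only.

```python
def levenshtein(a, b):
    """Plain Levenshtein distance over two segment sequences.

    Insertions, deletions and substitutions all cost 1, i.e. segments are
    treated only as same or different.
    """
    previous = list(range(len(b) + 1))
    for i, seg_a in enumerate(a, start=1):
        current = [i]
        for j, seg_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (seg_a != seg_b)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def aggregate_distance(site_a, site_b):
    """Mean normalised distance over the words that both sites share.

    Assumption: each word distance is divided by the length of the longer
    transcription; other normalisations are possible.
    """
    shared = sorted(set(site_a) & set(site_b))
    if not shared:
        raise ValueError("the two sites share no words")
    total = 0.0
    for word in shared:
        x, y = site_a[word], site_b[word]
        total += levenshtein(x, y) / max(len(x), len(y), 1)
    return total / len(shared)


# Made-up transcriptions for two hypothetical sites, one list of segments per word.
site_a = {"milk": ["m", "E", "l", "k"], "house": ["h", "u", "s"]}
site_b = {"milk": ["m", "E", "l", "@", "k"], "house": ["h", "y", "s"]}

print(levenshtein("kitten", "sitting"))
print(round(aggregate_distance(site_a, site_b), 3))
```

    Because each word distance is a number, averaging over many words gives a single site-to-site value, which is the kind of aggregation the description refers to.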
    Nerbonne, J., Colen, R., Gooskens, C., Kleiweg, P., and Leinonen, T. (2011). Gabmap — A Web Application for Dialectology. Dialectologica, Special issue II, 65-89.
    T. Leinonen, Ç. Çöltekin, J. Nerbonne, Using Gabmap. Lingua Vol. 178, 71-83, doi:10.1016/j.lingua.2015.02.004
  • Usage

    The system here allows you to convert images of your book pages into editable text, presented in a text format called XML (eXtensible Markup Language), specifically the variety defined by the Text Encoding Initiative (TEI XML). This format was developed specifically for marking up or annotating the text you want to work on, i.e. for adding all manner of further information to the actual text, e.g. to build a critical edition of it, which is most likely exactly what you want to do with your author's work.
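    To give an impression of what such a file looks like, here is a minimal sketch that builds a small TEI XML document with Python's standard library. It is not the output of the system described here; the helper names, the sample text and the header wording are assumptions, and only the TEI namespace and the basic teiHeader/text/body skeleton come from the TEI format itself.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialise TEI as the default namespace


def tei(tag):
    """Return the namespace-qualified name of a TEI element."""
    return f"{{{TEI_NS}}}{tag}"


def build_tei(title, paragraphs):
    """Build a minimal TEI document.

    `paragraphs` is a list of (text, note) pairs; `note` may be None.
    The note stands in for the kind of annotation an editor might add.
    """
    root = ET.Element(tei("TEI"))
    header = ET.SubElement(root, tei("teiHeader"))
    file_desc = ET.SubElement(header, tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title
    publication = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(publication, tei("p")).text = "Unpublished working copy."
    source = ET.SubElement(file_desc, tei("sourceDesc"))
    ET.SubElement(source, tei("p")).text = "Transcribed from page images."

    body = ET.SubElement(ET.SubElement(root, tei("text")), tei("body"))
    for text, note in paragraphs:
        paragraph = ET.SubElement(body, tei("p"))
        paragraph.text = text
        if note:
            ET.SubElement(paragraph, tei("note")).text = note
    return root


doc = build_tei(
    "Sample page",
    [("First recognised paragraph.", "Editorial remark."),
     ("Second recognised paragraph.", None)],
)
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```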
    Betti, A, Reynaert, M and van den Berg, H. 2017. @PhilosTEI: Building Corpora for Philosophers. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 379–392. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.32. License: CC-BY 4.0
  • Arthurian Fiction

    This research tool provides information on medieval Arthurian narratives and the manuscripts in which they are transmitted throughout Europe. The tool discloses a database consisting of linked records on over two hundred texts, more than a thousand manuscripts and two hundred persons. The database is a work in progress: a considerable number of records have yet to be completed, while fresh discoveries of narratives and manuscripts invite new entries. The compilers of the database hope that this tool will contribute to further research into Arthurian fiction as a pan-European phenomenon. The Arthurian Fiction web application enables searching for manuscripts, narratives and persons from the Arthurian Fiction narratives and manuscripts metadata database Arthurian Fiction Data. Each of these object types can be searched for using facets specific to the object type. These include:
    - for manuscripts: institute, date, origin, physical form, extant leaves, leaf sizes, illustration type, scripts, scribe, patron and several more;
    - for narratives: date, origin, languages, cycle, manuscript, author, patron, verse type, meter, length, intertextuality properties and many more;
    - for persons: name, gender, subtype, background, manuscript, and narratives.
    The user can, if desired, select a subset of the facets to work with. In addition, keyword search is possible for all fields, query results can be sorted by a variety of keys, and queries can be saved. There is also a web service with an API for the Arthurian Fiction narratives and manuscripts database. This web service makes use of SOLR queries via HTTP POST requests.
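    A Solr query sent via HTTP POST could be prepared roughly as follows. The endpoint URL and the facet field name below are hypothetical, since the listing does not give the service address or its schema; only the parameter names (q, fq, rows, wt) are standard Solr query parameters. The sketch builds the request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint: the real service URL is not given in this listing.
SOLR_SELECT_URL = "https://example.org/solr/arthurian/select"


def build_solr_request(keyword, facets=None, rows=10):
    """Prepare a form-encoded POST request for a Solr select handler.

    `facets` maps field names to required values; each pair becomes one
    `fq` filter query. The field names are assumptions for illustration.
    """
    params = [("q", keyword), ("rows", str(rows)), ("wt", "json")]
    for field, value in (facets or {}).items():
        # Escape backslashes and quotes so the value stays one quoted phrase.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        params.append(("fq", f'{field}:"{escaped}"'))
    body = urlencode(params).encode("utf-8")
    return Request(
        SOLR_SELECT_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


request = build_solr_request("Lancelot", {"origin": "France"})
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

    Passing the prepared request to urllib.request.urlopen would send it; that step is left out here because the endpoint is a placeholder.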
    Besamusca, A.A.M. and Quinlan, J. (2012). The Fringes of Arthurian Fiction. Arthurian literature, 29, 191-241.
    Boot, P. (2012), Manuscripten koning Arthur op tafel, E-Data & Research 7(1), 2012.
    Dalen-Oskam, K. van and Besamusca, B. (2011), Arthurian Fiction in Medieval Europe: Narratives and Manuscripts, presentation held at the CLARIN-NL Kick-off meeting Call 2, Utrecht, February 9, 2011.
    Dalen-Oskam, K. van (2011), ArthurianFiction, presentation held at the Call 3 information session, Utrecht, August 25, 2011.
  • BinPackage 0.4.4 (22.10)

    BinPackage is a Python package that embeds the vocabulary of the Database of Modern Icelandic Inflection (DMII; Icelandic: Beygingarlýsing íslensks nútímamáls, BÍN; https://bin.arnastofnun.is) and offers various lookups and queries of the data. The database, maintained by The Árni Magnússon Institute for Icelandic Studies, contains over 6.5 million entries, over 3.1 million unique word forms, and about 300,000 distinct lemmas. The database has been encapsulated in an easy-to-install Python package and compressed from a 400+ megabyte CSV file to an ~80 megabyte indexed binary structure. More information at: https://github.com/mideind/BinPackage
  • Service for querying dependency treebanks Drevesnik 1.0

    Drevesnik (https://orodja.cjvt.si/drevesnik/) is an online service for querying syntactically parsed Slovenian corpora annotated with the Universal Dependencies scheme. It combines an easy-to-use query language with user-friendly graph visualizations. It is based on the open-source dep_search tool (https://github.com/TurkuNLP/dep_search), which was localized and modified so as to also support querying by JOS morphosyntactic tags, random distribution of results, and filtering by sentence length. The source code and the documentation for the search backend and the web user interface are publicly available in the CLARIN.SI GitHub repository https://github.com/clarinsi/drevesnik. This submission corresponds to release 1.0: https://github.com/clarinsi/drevesnik/releases/tag/1.0.
  • Biaffine-based UD Parser for Icelandic 22.12

    This Universal Dependencies parser for Icelandic was trained with Diaparser [1]. This version was trained on v2.11 of UD_Icelandic-IcePaHC [2] and UD_Icelandic-Modern [3]. (Note that the texts in UD_Icelandic-Modern [3] labeled RUV_TGS_2017 and RUV_ESP_2017 were not included, as these were originally parsed with COMBO-based UD Parser 22.10 [4] and the output subsequently corrected.) The parser utilizes information from an ELECTRA language model [5]. Its UAS (unlabeled attachment score) is 89.58 and its LAS (labeled attachment score) is 86.46.
    [1] Diaparser: https://github.com/Unipisa/diaparser
    [2] UD_Icelandic-IcePaHC: https://github.com/UniversalDependencies/UD_Icelandic-IcePaHC/
    [3] UD_Icelandic-Modern: https://github.com/UniversalDependencies/UD_Icelandic-Modern/
    [4] COMBO-based UD Parser 22.10: http://hdl.handle.net/20.500.12537/272
    [5] electra-base-igc-is: https://huggingface.co/jonfd/electra-base-igc-is
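    For readers unfamiliar with the two scores, the sketch below shows how UAS and LAS are commonly computed: UAS is the percentage of tokens whose predicted head is correct, and LAS additionally requires the dependency label to match. The function name and the toy sentence are assumptions for illustration; this is not the evaluation script used for the figures reported above.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) as percentages.

    Both arguments are lists of (head_index, dependency_label) pairs,
    one pair per token, in the same token order.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    labeled_hits = sum(g == p for g, p in zip(gold, predicted))
    total = len(gold)
    return 100.0 * head_hits / total, 100.0 * labeled_hits / total


# Toy four-token sentence (made up): head index 0 marks the root.
gold = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "obj")]
predicted = [(2, "nsubj"), (0, "root"), (2, "det"), (2, "obl")]

uas, las = attachment_scores(gold, predicted)
print(f"UAS = {uas:.2f}, LAS = {las:.2f}")
```

    In the toy example three of four heads are correct but only two tokens also carry the correct label, which is why LAS can never exceed UAS.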
  • Yfirlestur 1.0.0 (22.06)

    Yfirlestur.is is a public website where you can enter or submit your Icelandic text and have it checked for spelling and grammar errors. The tool also gives hints on words and structures that might not be appropriate, depending on the intended audience for the text. The core spelling and grammar checking functionality of Yfirlestur.is is provided by the GreynirCorrect engine, by the same authors. This software is licensed under the MIT License. More information at https://github.com/mideind/Yfirlestur.
  • GreynirCorrect4LT (1.0)

    This is a slightly adapted version of Miðeind's spelling and grammar checker GreynirCorrect <CLARIN link: http://hdl.handle.net/20.500.12537/174>. This version is implemented for use in a text pre-processing pipeline for text-to-speech, but it also includes guidelines for quickly adapting GreynirCorrect to other use cases in language technology applications, where the requirements may differ from those of proofreading for general users.