CLARIN Tool Portal

UPSKILLS Teaching and Learning Content

14 resources

This is a collection of modular teaching and learning content created in the UPSKILLS project (UPgrading the SKIlls of Linguistics and Language Students) and downloaded from the Moodle platform in .mbz format. The learning content can be reused and adapted by curriculum designers, lecturers, and instructors of courses in linguistics and language-related subjects. Different blocks, or individual units within a block, can be combined to create new learning paths at the BA and MA levels. Some of the learning content is also suitable for the PhD level. Students can also use the content for self-study, bearing in mind that it is not a MOOC (Massive Open Online Course).

Before downloading the files, it is recommended to:
- use the project URL to read the descriptions of each learning block on the UPSKILLS project website
- use the demo link to preview the learning content on the Moodle platform and decide which learning blocks you would like to download.

Each learning block in Moodle contains several units on different topics, including presentations, learning activities, assignments, and a final student project. We have also included a short guide explaining how the materials are organised and how they can be used and cited.

Please note that the .mbz files can be used exclusively on Moodle systems, version 3.8+. The material can be imported directly in MBZ format without changes. If help is required, please consult the Moodle User Guide > Course Restore: https://docs.moodle.org/402/en/Course_restore.

The "Processing Texts and Corpora" and "Introduction to Language Data: Standards and Repositories" blocks contain interactive presentations and quizzes created in H5P. To view and reuse this content, H5P should be available in your Moodle instance (both in code and as a plugin), together with the tiles format, stashes and badges. The badges are provided as a separate downloadable file.

The H5P content can also be downloaded directly from the UPSKILLS Moodle platform and reused outside Moodle. H5P is an HTML5-based technology that has become popular for creating interactive learning objects (e.g. presentations, videos, gamified learning activities). It is a free and open format, which can be used as a plugin in Learning Management Systems, such as Moodle, Blackboard, Brightspace, Canvas, and Open edX, and in Content Management Systems, such as WordPress and Drupal. See the H5P administrators' guides for more information: https://help.h5p.com/hc/en-us/sections/7556764070429-Guides.

All UPSKILLS learning content is made available under the CC-BY 4.0 International license. This means you can copy and share it with others in any medium or format, even for commercial purposes. However, you are required to give appropriate credit to the source, include the license link, and indicate whether any changes were made to the original content.

To learn more about the UPSKILLS project, please visit the project website and the following guides:
1. Research-Based Teaching: Guidelines and Best Practices
2. Integrating Research Infrastructures into Teaching (this guide is especially relevant if you are interested in reusing the learning content created by CLARIN, namely Introduction to Language Data: Standards and Repositories)
3. Integrating Industry-Based Research into Teaching

Finally, all project deliverables are accessible in the UPSKILLS Community on Zenodo: https://zenodo.org/communities/upskills/?page=1&size=20.

DigiLing e-Learning Hub: e-Courses for Digital Linguistics

8 resources

The files represent exported e-learning resources created within the DigiLing project, www.digiling.eu. We have identified seven core subjects in Digital Linguistics and built seven corresponding courses:
- Introduction to Text Processing and Analysis
- Introduction to Python for Linguists
- Computational Lexicology and Lexicography
- Localization Tools and Workflows
- Post-Editing Machine Translation
- Mining and Managing Multilingual Terminology
- Variability of Languages in Time and Space

The data format is .mbz, a compressed archive compatible with any e-learning environment running Moodle.
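Because an .mbz file is a compressed archive, its contents can be listed before it is restored into Moodle. The following is a minimal sketch, assuming the backup is either a gzip-compressed tar archive or a ZIP archive; the function name `list_mbz_members` is hypothetical and is not part of Moodle or of the DigiLing materials.

```python
import sys
import tarfile
import zipfile


def list_mbz_members(path):
    """Return the member names stored in a Moodle backup (.mbz) archive."""
    path = str(path)
    # Assumption: the backup is a (possibly gzip-compressed) tar archive
    # or a ZIP archive; anything else is rejected.
    if tarfile.is_tarfile(path):
        with tarfile.open(path, "r:*") as archive:
            return archive.getnames()
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            return archive.namelist()
    raise ValueError(f"{path} is neither a tar nor a ZIP archive")


if __name__ == "__main__":
    # Usage: python list_mbz.py course1.mbz course2.mbz
    for mbz_path in sys.argv[1:]:
        for member in list_mbz_members(mbz_path):
            print(member)
```

The sketch only reads the archive index, so it does not unpack or modify the backup.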

Service for querying dependency treebanks Drevesnik 1.0

2 resources

Drevesnik (https://orodja.cjvt.si/drevesnik/) is an online service for querying Slovenian corpora syntactically parsed with the Universal Dependencies annotation scheme. It offers an easy-to-use query language and user-friendly graph visualizations. It is based on the open-source dep_search tool (https://github.com/TurkuNLP/dep_search), which was localized and modified to also support querying by JOS morphosyntactic tags, random distribution of results, and filtering by sentence length. The source code and the documentation for the search backend and the web user interface are publicly available in the CLARIN.SI GitHub repository: https://github.com/clarinsi/drevesnik. This submission corresponds to release 1.0: https://github.com/clarinsi/drevesnik/releases/tag/1.0.

Corpus extraction tool LIST 1.2

2 resources

The LIST corpus extraction tool is a Java program for extracting lists from text corpora on the levels of characters, word parts, words, and word sets. It supports VERT and TEI P5 XML formats and outputs .CSV files that can be imported into Microsoft Excel or similar statistical processing software. Version 1.2 adds support for Gigafida 2.0 in XML format and fixes a bug which disabled the extraction of character-level n-grams from normalized forms in the GOS 1.0 corpus.
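To illustrate what a character-level list looks like, here is a minimal Python sketch that counts character n-grams within words and writes them as CSV. It is not LIST's implementation (LIST is a Java program): the function names, the whitespace tokenisation, and the column headers `ngram` and `frequency` are all assumptions made for this example.

```python
import csv
import io
from collections import Counter


def char_ngrams(text, n):
    """Count overlapping character n-grams inside each whitespace-separated word."""
    counts = Counter()
    for word in text.split():
        for start in range(len(word) - n + 1):
            counts[word[start:start + n]] += 1
    return counts


def to_csv(counts):
    """Serialise counts as CSV text, most frequent first (ties in alphabetical order)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["ngram", "frequency"])  # hypothetical column names
    for ngram, freq in sorted(counts.items(), key=lambda item: (-item[1], item[0])):
        writer.writerow([ngram, freq])
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(char_ngrams("banana bandana", 2)), end="")
```

The resulting text can be saved with a .csv extension and opened in a spreadsheet program.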

Q-CAT Corpus Annotation Tool 1.5

2 resources

The Q-CAT (Querying-Supported Corpus Annotation Tool) is a tool for manual linguistic annotation of corpora, which also enables advanced queries on top of these annotations. The tool has been used in various annotation campaigns related to the ssj500k reference training corpus of Slovenian (http://hdl.handle.net/11356/1210), such as named entities, dependency syntax, semantic roles and multi-word expressions, but it can also be used for adding new annotation layers of various types to this or other language corpora. Q-CAT is a .NET application, which runs on the Windows operating system.
- Version 1.1 enables the automatic attribution of token IDs and personalized font adjustments.
- Version 1.2 supports the CONLL-U format and working with UD POS tags.
- Version 1.3 supports adding new layers of annotation on top of CONLL-U (and then saving the corpus as XML TEI).
- Version 1.4 introduces new features in command line mode (filtering by sentence ID, multiple link type visualizations).
- Version 1.5 supports listening to audio recordings (provided in the # sound_url comment line in CONLL-U).
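The audio link mentioned for version 1.5 lives in a sentence-level comment line of the CONLL-U file. The sketch below shows one way such lines could be read in Python. It assumes the comment follows the usual CONLL-U `# key = value` pattern, i.e. `# sound_url = <URL>`; the function name, the sample sentences and the example.org URL are made up for illustration and do not come from Q-CAT or ssj500k.

```python
def sound_urls(conllu_text):
    """Map each sentence's sent_id to its sound_url comment, when both are present."""
    urls = {}
    sent_id = None
    url = None
    # The extra empty string makes sure the last sentence is flushed.
    for line in conllu_text.splitlines() + [""]:
        line = line.strip()
        if line.startswith("#"):
            key, separator, value = line[1:].partition("=")
            if separator:
                key, value = key.strip(), value.strip()
                if key == "sent_id":
                    sent_id = value
                elif key == "sound_url":  # assumed "# sound_url = <URL>" form
                    url = value
        elif not line:
            # A blank line ends the current sentence.
            if sent_id is not None and url is not None:
                urls[sent_id] = url
            sent_id = None
            url = None
    return urls


# Made-up sample: only the first sentence carries a sound_url comment.
SAMPLE = "\n".join([
    "# sent_id = s1",
    "# sound_url = https://example.org/audio/s1.mp3",
    "# text = Dober dan",
    "1\tDober\tdober\tADJ\t_\t_\t2\tamod\t_\t_",
    "2\tdan\tdan\tNOUN\t_\t_\t0\troot\t_\t_",
    "",
    "# sent_id = s2",
    "# text = Hvala",
    "1\tHvala\thvala\tNOUN\t_\t_\t0\troot\t_\t_",
    "",
])

if __name__ == "__main__":
    print(sound_urls(SAMPLE))
```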

Q-CAT Corpus Annotation Tool 1.4

2 resources

The Q-CAT (Querying-Supported Corpus Annotation Tool) is a tool for manual linguistic annotation of corpora, which also enables advanced queries on top of these annotations. The tool has been used in various annotation campaigns related to the ssj500k reference training corpus of Slovenian (http://hdl.handle.net/11356/1210), such as named entities, dependency syntax, semantic roles and multi-word expressions, but it can also be used for adding new annotation layers of various types to this or other language corpora. Q-CAT is a .NET application, which runs on the Windows operating system.
- Version 1.1 enables the automatic attribution of token IDs and personalized font adjustments.
- Version 1.2 supports the CONLL-U format and working with UD POS tags.
- Version 1.3 supports adding new layers of annotation on top of CONLL-U (and then saving the corpus as XML TEI).
- Version 1.4 introduces new features in command line mode (filtering by sentence ID, multiple link type visualizations).

Service for querying dependency treebanks Drevesnik 1.1

2 resources

Drevesnik (https://orodja.cjvt.si/drevesnik/) is an online service for querying Slovenian corpora parsed with the Universal Dependencies annotation scheme. It features an easy-to-use query language and user-friendly graph visualizations. It is based on the open-source dep_search tool (https://github.com/TurkuNLP/dep_search), which was localized and modified to also support querying by JOS morphosyntactic tags, random distribution of results, and filtering by sentence length. The source code and the documentation for the search backend and the web user interface are publicly available in the CLARIN.SI GitHub repository: https://github.com/clarinsi/drevesnik. This submission corresponds to release 1.1: https://github.com/clarinsi/drevesnik/releases/tag/1.1, which brings improved architecture, documentation and branding in comparison to release 1.0.

Dependency tree extraction tool STARK 2.0

2 resources

STARK is a Python-based command-line tool for extracting dependency trees from parsed corpora, aimed at corpus-driven linguistic investigations of syntactic and lexical phenomena of various kinds. It takes a treebank in the CONLL-U format as input and returns a list of all relevant dependency trees with frequency information and other useful statistics, such as the strength of association between the nodes of a tree, or its significance in comparison to another treebank. For installation, execution and the description of various user-defined parameter settings, see the official project page at: https://github.com/clarinsi/STARK. In comparison with v1, this version introduces several new features and improvements, such as the option to set parameters in the command line, compare treebanks or visualise results online.
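The general idea of listing dependency structures with their frequencies can be sketched in a few lines of Python. This is not STARK's code or its command-line interface: it is a simplified illustration that counts only the smallest possible trees (one head, one dependent, one relation), and the function name and sample sentence are assumptions made for this example.

```python
from collections import Counter


def count_relations(conllu_text):
    """Count (head form, relation, dependent form) triples across all sentences."""
    counts = Counter()
    sentence = []
    # The extra empty string makes sure the last sentence is flushed.
    for line in conllu_text.splitlines() + [""]:
        if line.startswith("#"):
            continue  # sentence-level comments carry no tokens
        if line.strip():
            cols = line.split("\t")
            # Keep ordinary tokens; skip multiword ranges (1-2) and empty nodes (1.1).
            if len(cols) == 10 and cols[0].isdigit():
                sentence.append(cols)
            continue
        # Blank line: the sentence is complete, so resolve heads to word forms.
        forms = {cols[0]: cols[1] for cols in sentence}
        for cols in sentence:
            head, deprel = cols[6], cols[7]
            if head in forms:  # the root has head 0, which is not a token
                counts[(forms[head], deprel, cols[1])] += 1
        sentence = []
    return counts


# Made-up sample sentence in the ten-column CONLL-U layout.
SAMPLE = "\n".join([
    "# text = The dog barks",
    "1\tThe\tthe\tDET\t_\t_\t2\tdet\t_\t_",
    "2\tdog\tdog\tNOUN\t_\t_\t3\tnsubj\t_\t_",
    "3\tbarks\tbark\tVERB\t_\t_\t0\troot\t_\t_",
    "",
])

if __name__ == "__main__":
    for (head, relation, dependent), freq in count_relations(SAMPLE).most_common():
        print(head, relation, dependent, freq, sep="\t")
```

Larger trees, association measures and treebank comparison, as described above, go well beyond this sketch.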

Corpus extraction tool LIST 1.3

2 resources

The LIST corpus extraction tool is a Java program for extracting lists from text corpora on the levels of characters, word parts, words, and word sets. It supports VERT and TEI P5 XML formats and outputs .CSV files that can be imported into Microsoft Excel or similar statistical processing software. Version 1.3 adds support for the KOST 2.0 Slovene Learner Corpus (http://hdl.handle.net/11356/1887) in XML format. It also allows program execution using the command line (see 00README.txt for details), and uses a later version of Java (tested using JDK 21). In addition, Windows users no longer need to have Java installed on their computers to run the program.

Q-CAT Corpus Annotation Tool 1.1

2 resources

The Q-CAT (Querying-Supported Corpus Annotation Tool) is a computational tool for manual annotation of language corpora, which also enables advanced queries on top of these annotations. The tool has been used in various annotation campaigns related to the ssj500k reference training corpus of Slovenian (http://hdl.handle.net/11356/1210), such as named entities, dependency syntax, semantic roles and multi-word expressions, but it can also be used for adding new annotation layers of various types to this or other language corpora. Q-CAT is a .NET application, which runs on the Windows operating system. Version 1.1 enables the automatic attribution of token IDs and personalized font adjustments.
