703 record(s) found

Search results

  • UPSKILLS Teaching and Learning Content

    This is a collection of modular teaching and learning content created in the UPSKILLS project (UPgrading the SKIlls of Linguistics and Language Students) and downloaded from the Moodle platform in .mbz format. The learning content can be reused and adapted by curriculum designers, lecturers, and instructors of courses in linguistics and language-related subjects. Different blocks, or individual units within a block, can be combined to create new learning paths at the BA and MA levels. Some of the learning content is also suitable for the PhD level. Students can also use the content for self-study, bearing in mind that this is not a MOOC (Massive Open Online Course).

    Before downloading the files, it is recommended to:
    - use the project URL to read the descriptions of each learning block on the UPSKILLS project website
    - use the demo link to preview the learning content on the Moodle platform and decide which learning blocks you would like to download.

    Each learning block in Moodle contains several units on different topics, including presentations, learning activities, assignments, and a final student project. We have also included a short guide explaining how the materials are organised and how they can be used and cited.

    Please note that the .mbz files can be used exclusively on Moodle systems, version 3.8 or later. The material can be imported directly in MBZ format without changes. If help is required, please consult the Moodle User Guide > Course Restore: https://docs.moodle.org/402/en/Course_restore. The "Processing Texts and Corpora" and "Introduction to Language Data: Standards and Repositories" blocks contain interactive presentations and quizzes created in H5P, which means that the H5P plugin should be available in your Moodle instance to view and reuse the content (both in code and as a plugin), along with the tiles format, stashes, and badges. The badges are provided as a separate downloadable file.

    The H5P content can also be downloaded directly from the UPSKILLS Moodle platform and reused outside Moodle. H5P is a rich HTML5 content format that has become popular for creating interactive learning objects (e.g. presentations, videos, gamified learning activities). It is a free and open format, which can be used as a plugin in Learning Management Systems, such as Moodle, Blackboard, Brightspace, Canvas, OpenEdX, etc., and Content Management Systems, such as WordPress and Drupal. See the H5P administrators' guides for more information: https://help.h5p.com/hc/en-us/sections/7556764070429-Guides.

    All UPSKILLS learning content is made available under the CC-BY 4.0 International license. This means you can copy and share it with others in any medium or format, even for commercial purposes. However, you are required to give appropriate credit to the source, include the license link, and indicate whether any changes were made to the original content.

    To learn more about the UPSKILLS project, please visit the project website and the following guides:
    1. Research-Based Teaching: Guidelines and Best Practices
    2. Integrating Research Infrastructures into Teaching (this guide is especially relevant if you are interested in reusing the learning content created by CLARIN, namely Introduction to Language Data: Standards and Repositories)
    3. Integrating Industry-Based Research into Teaching

    Finally, all project deliverables are accessible in the UPSKILLS Community on Zenodo: https://zenodo.org/communities/upskills/?page=1&size=20.
  • The CLASSLA-StanfordNLP model for lemmatisation of standard Slovenian 1.2

    The model for lemmatisation of standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the ssj500k training corpus (http://hdl.handle.net/11356/1210) and using the Sloleks inflectional lexicon (http://hdl.handle.net/11356/1230). The estimated F1 of the lemma annotations is ~99.0. Unlike the previous version, the model now relies solely on XPOS annotations rather than on a combination of UPOS, FEATS (lexicon lookup) and XPOS (lemma prediction) annotations.
  • The CLASSLA-Stanza model for morphosyntactic annotation of standard Slovenian 2.0

    This model for morphosyntactic annotation of standard Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SUK training corpus (http://hdl.handle.net/11356/1747) and using the CLARIN.SI-embed.sl word embeddings (http://hdl.handle.net/11356/1204) that were expanded with the MaCoCu-sl Slovene web corpus (http://hdl.handle.net/11356/1517). The model simultaneously produces UPOS, FEATS and XPOS (MULTEXT-East) labels. The estimated F1 of the XPOS annotations is ~98.27. Unlike the previous version, this model was trained on the SUK training corpus and uses new embeddings and the new version of the Slovene morphological lexicon, Sloleks 3.0 (http://hdl.handle.net/11356/1745).
  • Terminal-based CoNLL-file viewer, v2

    A simple, fast, text-based way of browsing CoNLL format files in your terminal. To open a CoNLL file, run:

        ./view_conll sample.conll

    The output is piped through less, so you can use less commands to navigate the file. By default, less searches for sentence beginnings, so you can press "n" to go to the next sentence and "N" to go to the previous one. Press "q" to quit. Trees with a high number of non-projective edges may be difficult to read, as I have not found a good way of displaying them intelligibly. If you are on Windows and do not have less (but have Python), run:

        python view_conll.py sample.conll

    For complete instructions, see the README file. You need Python 2 to run the viewer.
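For readers who want to inspect such files programmatically rather than in a pager, the following is a minimal sketch of reading a CoNLL file sentence by sentence. It is not part of the viewer and is written for Python 3, not the Python 2 the viewer requires. The function names `read_sentences` and `forms_with_heads` are hypothetical, and the column positions assume the CoNLL-U layout (FORM in column 2, HEAD in column 7); other CoNLL variants order their columns differently.

```python
from typing import Iterable, Iterator, List, Tuple

Row = List[str]


def read_sentences(lines: Iterable[str]) -> Iterator[List[Row]]:
    """Yield one sentence at a time as a list of token rows (lists of columns)."""
    sentence: List[Row] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
        elif line.startswith("#"):
            # Comment lines (e.g. "# text = ...") carry no tokens.
            continue
        else:
            sentence.append(line.split("\t"))
    if sentence:
        # The last sentence may not be followed by a blank line.
        yield sentence


def forms_with_heads(sentence: List[Row]) -> List[Tuple[str, int]]:
    """Return (word form, head index) pairs.

    Assumption: CoNLL-U column order, where index 1 is FORM and index 6 is HEAD.
    """
    return [(row[1], int(row[6])) for row in sentence]


# A tiny made-up sample, used only to exercise the functions.
SAMPLE = (
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
    "1\tYes\tyes\tINTJ\t_\t_\t0\troot\t_\t_\n"
)

sentences = list(read_sentences(SAMPLE.splitlines()))
pairs = forms_with_heads(sentences[0])
```

To read a real file, pass an open file object instead of the sample, for example `read_sentences(open("sample.conll", encoding="utf-8"))`.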
  • VIADAT-SEARCH

    VIADAT-SEARCH, in connection with VIADAT-REPO, enables searching transcripts of oral history recordings. Language analysis has been used to preprocess the recordings, which makes it possible to search the full text using multiple criteria, including names, different forms of the same word, etc. Developed in cooperation with ÚSD AV ČR and NFA.
  • Slovenian text summarization models

    A text summarization task aims to convert a longer text into a shorter text while preserving the essential information of the source text. In general, there are two approaches to text summarization. The extractive approach simply selects the most important sentences or parts of the text, whereas the abstractive approach is more similar to human-made summaries. We release 5 models that cover extractive, abstractive, and hybrid types:
    - Metamodel: a neural model based on the Doc2Vec document representation that suggests the best summarizer.
    - Graph-based model: an unsupervised graph-based extractive approach that returns the N most relevant sentences.
    - Headline model: a supervised abstractive approach (T5 architecture) that returns headline-like abstracts.
    - Article model: a supervised abstractive approach (T5 architecture) that returns short summaries.
    - Hybrid-long model: an unsupervised hybrid (graph-based and transformer model-based) approach that returns short summaries of long texts.

    Details and instructions to run and train the models are available at https://github.com/clarinsi/SloSummarizer. The web service with a demo is available at https://slovenscina.eu/povzemanje.
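To illustrate what an unsupervised graph-based extractive approach can look like, here is a minimal sketch in the spirit of TextRank. It is not the released Graph-based model and does not reflect the SloSummarizer code. The function names, the word-overlap (Jaccard) similarity, the regex sentence splitter, and the damping value of 0.85 are all assumptions chosen for brevity.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a: str, b: str) -> float:
    # Jaccard overlap of lowercased word sets, standing in for a real similarity measure.
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def rank_sentences(sentences: List[str], damping: float = 0.85,
                   iterations: int = 50) -> List[float]:
    """Score sentences with a PageRank-style iteration over the similarity graph."""
    n = len(sentences)
    # Edge weights between distinct sentences; no self-loops.
    weights = [
        [similarity(a, b) if i != j else 0.0 for j, b in enumerate(sentences)]
        for i, a in enumerate(sentences)
    ]
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            incoming = sum(
                scores[j] * weights[j][i] / out_sums[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            updated.append((1 - damping) / n + damping * incoming)
        scores = updated
    return scores


def summarize(text: str, n: int = 2) -> List[str]:
    """Return the n highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    scores = rank_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(top)]


# Made-up example text; the third sentence shares no words with the others.
TEXT = (
    "The model summarizes Slovenian news articles. "
    "The model returns the most relevant sentences of news articles. "
    "Bananas are yellow. "
    "Relevant sentences summarize the articles well."
)

summary = summarize(TEXT, n=3)
```

In this example the sentence about bananas has no edges in the graph, so it receives only the baseline score and is left out when three of the four sentences are requested.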