Active filters:

  • Resource type: Unspecified
  • Keywords: Lithuanian
6 record(s) found

Search results

  • Lithuanian speech-to-text Transcriber

    The automatic speech-to-text transcriber for Lithuanian is a containerized application composed of 17 containers. It covers four domains: administrative, legal, medical and general spoken language. To install the Transcriber, we recommend installing Docker and Docker Compose. A demo service is available at https://semantika.lt/Analysis/Transcriber, and IT solutions can be found at https://semantika.lt/Help/Info/Solutions. The transcription result is a set of files containing the same information in different formats: plain text, a WebVTT file (for subtitling) and a file with data synchronizing the transcription with the audio recording. The latter file is intended for convenient editing of an audio-synchronized transcription; a transcription editor for this purpose is available at http://semantikadocker.vdu.lt/files/transcription-editor-multi.zip.
  • Lithuanian keyboard for macOS users

    This keyboard driver gives easy access to Lithuanian letters via the conventional keyboard layout, also known as "Lithuanian letters instead of numbers". An essential new feature of this layout is the extensive use of the "dead key" technique to type single letters of the following kinds: Lithuanian accented (ą̃, ū́, m̃, ė́ etc.); Latvian; Estonian; Polish; French; German; Scandinavian; Ancient Greek; Russian.
  • Pedagogic Corpus of Lithuanian

    The Pedagogic Corpus of Lithuanian is a monolingual specialized corpus prepared for learning and teaching Lithuanian as a foreign language. It includes authentic Lithuanian texts, selected by criteria such as learner-relevant communicative function and genre. Both spoken and written language are represented. The corpus contains 669,000 tokens: 111,000 tokens from texts and spoken language for levels A1-A2 and 558,000 tokens from texts and spoken language for levels B1-B2 (according to the Common European Framework of Reference for Languages). The spoken component constitutes approximately 7.5% of the corpus. The written part of the corpus (620,000 tokens) includes levelled texts from coursebooks and unlevelled texts from other sources. These texts are classified into 29 text types (dialogues, narratives, information, etc.) and 4 groups according to their communicative aims: informational texts, educational texts, advertising and fiction.

    The corpus offers two types of search: simple and advanced (see "Search Tips"). Simple Search finds instances of a search item (word form, lemma, two words) in the whole corpus or in a particular part of it (spoken or written texts). After selecting the written subcorpus, you can further select the text type (coursebooks or non-coursebook texts) and/or the genre of the written texts. Advanced Search offers all the features of Simple Search plus additional options. Since the Pedagogic Corpus is morphologically annotated, Advanced Search also lets you search by grammatical features (e.g. part of speech, case, number, verb form).

    At https://kalbu.vdu.lt/mokymosi-priemones/mokomasis-tekstynas/ you can find truncated wordlists: lists of lemmas and word forms (for the whole corpus, for the spoken and written components, and for each level) and lists of particular parts of speech in the whole corpus. The lists can be downloaded as .xlsx files.

    Reference: Kovalevskaitė, Jolanta and Rimkutė, Erika. "Pedagogic Corpus of Lithuanian: A New Resource for Learning and Teaching Lithuanian as a Foreign Language". Sustainable Multilingualism, vol. 17, no. 1, 2020, pp. 197-230. https://doi.org/10.2478/sm-2020-0019
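
The token counts quoted for the Pedagogic Corpus can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical, and it assumes that the spoken component is simply the total minus the written part, which the description does not state explicitly.

```python
# Token counts as stated in the corpus description.
TOTAL_TOKENS = 669_000
LEVEL_TOKENS = {"A1-A2": 111_000, "B1-B2": 558_000}
WRITTEN_TOKENS = 620_000


def spoken_share(total: int, written: int) -> float:
    """Return the spoken share in percent.

    Assumption (not stated in the description): spoken = total - written.
    """
    return 100 * (total - written) / total


# The two level groups should add up to the stated corpus size.
levels_sum = sum(LEVEL_TOKENS.values())
share = spoken_share(TOTAL_TOKENS, WRITTEN_TOKENS)

print(f"levels sum to {levels_sum:,} tokens")
print(f"spoken share is about {share:.1f}%")
```

Under that assumption the two level groups add up to the stated total, and the derived spoken share is close to the approximate figure given in the description.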