The Text Tonsorium designs and enacts workflows that fulfil your goal. Here, the goal is set to "lemmatization of the input". Once in Text Tonsorium, you can refine or change the goal.
A tool for automatically extracting domain terminology from texts; it can also extract multi-word units. Terminology extraction is useful, among other things, for building domain dictionaries, translation resources, and document summaries, for developing an ontology of a given field, and for document annotation and question answering.
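The tool's own extraction method is not described here; as an illustration, one common baseline for multi-word-unit detection is to score bigrams by pointwise mutual information (PMI), which measures how much more often two words co-occur than chance would predict. The token list below is an invented example.

```python
from collections import Counter
from math import log2

def extract_mwu(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information:
    PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) ).
    High-PMI bigrams co-occur more often than chance, a common
    signal for multi-word units / candidate terms."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scores = {}
    for (x, y), c in bigrams.items():
        if c < min_count:
            continue  # rare pairs give unstable PMI estimates
        pmi = log2((c / (n - 1)) / ((unigrams[x] / n) * (unigrams[y] / n)))
        scores[(x, y)] = pmi
    return sorted(scores.items(), key=lambda kv: -kv[1])

tokens = ("machine translation is hard ; machine translation needs data ; "
          "data is everywhere").split()
for pair, score in extract_mwu(tokens):
    print(pair, round(score, 2))
```

Real terminology extractors add linguistic filters (e.g. keeping only noun phrases) and frequency normalization on top of such association scores.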
Unlike the simple tools built into word processors, this tool applies context-sensitive spelling rules rather than placing characters mechanically. It inserts not only standard punctuation marks but also periods after ordinal numbers and paired delimiters such as parentheses.
A service that integrates several keyword-extraction methods, including generative models and multi-label classification. Combining several advanced techniques makes the results more reliable and accurate.
A service for automatically extracting the topics covered in a collection of texts. It uses topic modeling (LDA), which detects topics based on the co-occurrence of words within a document. The service assigns each document to several topics, and each detected topic is represented as a list of pairs: a word and the probability of its occurrence in that topic. This enables both qualitative analysis (detection of non-obvious topics) and quantitative analysis of the processed texts.
A service for processing literary texts to extract statistical information from them. It supports, among other things, lemmatization, part-of-speech tagging, characterization of the verbs used in a text, creation of proper-name lists, and extraction of statistics from a corpus of texts.
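Once a text has been lemmatized and POS-tagged, the statistics listed above reduce to simple counting; the sketch below illustrates this with a hand-made list of (token, lemma, POS) triples standing in for the tagger's output (the triples and tag names are assumptions, not the service's actual format).

```python
from collections import Counter

# Hypothetical tagged input, as an upstream tagger/lemmatizer might produce.
tagged = [
    ("Dogs", "dog", "NOUN"), ("were", "be", "VERB"), ("barking", "bark", "VERB"),
    ("in", "in", "ADP"), ("Copenhagen", "Copenhagen", "PROPN"),
    ("while", "while", "SCONJ"), ("Anna", "Anna", "PROPN"),
    ("slept", "sleep", "VERB"),
]

# Part-of-speech profile of the text.
pos_counts = Counter(pos for _, _, pos in tagged)
# Characterization of the verbs used, by lemma.
verb_lemmas = Counter(lemma for _, lemma, pos in tagged if pos == "VERB")
# Sorted list of proper names.
proper_names = sorted({tok for tok, _, pos in tagged if pos == "PROPN"})

print(pos_counts.most_common())
print(verb_lemmas.most_common())
print(proper_names)
```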
A portal that implements a transcription chain for batch processing of speech files, combining automatic speech recognition, the OCTRA editor, the Munich Automatic Segmentation system (MAUS), and the EMU-webApp viewer.