CLARIN Tool Portal

GrETEL

2 resources

Search engine for the exploitation of syntactically annotated corpora or treebanks

Pretrained models for recognising sex education concepts SemSEX 1.0

6 resources

Pretrained language models for detecting and classifying the presence of sex education concepts in Slovene curriculum documents. The models are PyTorch neural network models, intended for use with the HuggingFace transformers library (https://github.com/huggingface/transformers). They are based on the Slovenian RoBERTa contextual embeddings model SloBERTa 2.0 (http://hdl.handle.net/11356/1397) and on the CroSloEngual BERT model (http://hdl.handle.net/11356/1330). The source code of the models, with example usage, is available in the GitHub repository https://github.com/TimotejK/SemSex. The models and tokenizers can be loaded with the AutoModelForSequenceClassification.from_pretrained() and AutoTokenizer.from_pretrained() functions from the transformers library. An example of such usage is available at https://github.com/TimotejK/SemSex/blob/main/Concept%20detection/Classifiers/full_pipeline.py. The corpus on which these models were trained is available at http://hdl.handle.net/11356/1895.
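
The loading step described above can be sketched roughly as follows. This is a minimal illustration, not the code from the SemSex repository: the helper names (load_classifier, classify, pick_label) are hypothetical, model_dir is assumed to be a local directory holding one downloaded and unpacked model, and the sketch assumes a single-label classifier whose label names are stored in the model configuration.

```python
def load_classifier(model_dir):
    """Load tokenizer and model from a local directory (assumed layout)."""
    # Imported lazily so the pure-Python helper below works without transformers.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def pick_label(logits, id2label):
    """Return the label name for the highest-scoring class."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return id2label.get(best, str(best))


def classify(text, tokenizer, model):
    """Classify one text; assumes one label per input (an assumption)."""
    import torch

    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return pick_label(logits[0].tolist(), model.config.id2label)
```

With a model unpacked locally, usage would look like `tokenizer, model = load_classifier("path/to/model")` followed by `classify("...", tokenizer, model)`; the path here is a placeholder. The full_pipeline.py script linked above shows how the authors actually combine the models.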

libfolia

1 resource

libfolia is a C++ library for working with the Format for Linguistic Annotation (FoLiA).

Frog

1 resource

Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. It performs automatic linguistic enrichment such as part-of-speech tagging, lemmatisation, named entity recognition, shallow parsing, dependency parsing and morphological analysis. All NLP modules are based on TiMBL.

mbt

1 resource

MBT is a memory-based tagger-generator and tagger in one. The tagger-generator part can generate a sequence tagger on the basis of a training set of tagged sequences; the tagger part can tag new sequences. MBT can, for instance, be used to generate part-of-speech taggers or chunkers for natural language processing. It has also been used for named-entity recognition, information extraction in domain-specific texts, and disfluency chunking in transcribed speech.
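
To make the generate-then-tag idea concrete, here is a toy sketch of memory-based tagging in Python. It is not MBT's implementation or API: the feature set, the simple overlap metric and the function names (features, train, tag) are assumptions chosen for illustration. Training stores every tagged token with its context; tagging picks the tag of the most similar stored instances.

```python
from collections import Counter


def features(words, i, prev_tag):
    """Context features for the token at position i (illustrative choice)."""
    prev_word = words[i - 1] if i > 0 else "<s>"
    next_word = words[i + 1] if i + 1 < len(words) else "</s>"
    word = words[i].lower()
    return (prev_tag, prev_word, word, word[-2:], next_word)


def overlap(a, b):
    """Number of feature positions on which two instances agree."""
    return sum(x == y for x, y in zip(a, b))


def train(tagged_sentences):
    """'Generate' a tagger: simply memorise every training instance."""
    memory = []
    for sentence in tagged_sentences:
        words = [w for w, _ in sentence]
        prev_tag = "<s>"
        for i, (_, tag_) in enumerate(sentence):
            memory.append((features(words, i, prev_tag), tag_))
            prev_tag = tag_
    return memory


def tag(words, memory):
    """Tag left to right, voting among the nearest stored instances."""
    if not memory:
        raise ValueError("memory is empty; train on tagged data first")
    result, prev_tag = [], "<s>"
    for i, word in enumerate(words):
        feats = features(words, i, prev_tag)
        best = max(overlap(feats, f) for f, _ in memory)
        votes = Counter(t for f, t in memory if overlap(feats, f) == best)
        prev_tag = votes.most_common(1)[0][0]
        result.append((word, prev_tag))
    return result
```

Swapping the tag set is enough to turn the same sketch into a chunker or a named-entity tagger, which is the sense in which one generator can serve several sequence-labelling tasks.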

Glem

1 resource

GLEM is a lemmatizer for Ancient Greek.

T-Scan

1 resource

T-Scan is an analysis tool that assesses the complexity of Dutch texts. It is based on original work by Rogier Kraf.

CLAM

1 resource

Quickly turn command-line applications into RESTful webservices with a web-application front-end. You provide a specification of your command line application, its input, output and parameters, and CLAM wraps around your application to form a fully fledged RESTful webservice.

ucto

1 resource

Ucto tokenizes text files: it separates words from punctuation and splits the text into sentences. This is one of the first tasks for almost any Natural Language Processing application. Ucto also offers other basic preprocessing steps, such as case conversion, which help make your text suitable for further processing such as indexing, part-of-speech tagging, or machine translation.
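
As a rough picture of what these steps involve, the following Python sketch separates words from punctuation, splits sentences and optionally lowercases. It does not use Ucto or reproduce its rules: the regular expression, the sentence-final punctuation set and the function names are simplifying assumptions, and real tokenizers handle abbreviations, URLs and language-specific cases that this ignores.

```python
import re

# A word (optionally with internal hyphens/apostrophes) or one punctuation mark.
TOKEN_RE = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")
SENTENCE_END = {".", "!", "?"}


def tokenize(text, lowercase=False):
    """Split text into word and punctuation tokens."""
    tokens = TOKEN_RE.findall(text)
    return [t.lower() for t in tokens] if lowercase else tokens


def split_sentences(tokens):
    """Group tokens into sentences, closing one at each end mark."""
    sentences, current = [], []
    for token in tokens:
        current.append(token)
        if token in SENTENCE_END:
            sentences.append(current)
            current = []
    if current:  # trailing tokens without final punctuation
        sentences.append(current)
    return sentences
```

For example, `split_sentences(tokenize("Hello, world! It's fine."))` yields two sentences, with the comma, exclamation mark and full stop as separate tokens.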

I-Analyzer

2 resources

I-analyzer is a tool for exploring corpora (large collections of texts). You can use I-analyzer to find relevant documents, or to make visualisations to understand broader trends in the corpus. The interface is designed to be accessible for users of all skill levels. I-analyzer is primarily intended for academic research and higher education. We focus on data that is relevant for the humanities, but we are open to datasets that are relevant for other fields.
