This is the second version of the morpho-syntactic tagger for the Polish language, adapted to processing user-generated content (UGC). It has been enriched with a tokenizer and with heuristics that improve its accuracy.
ENIAMtoolkit is a collection of libraries that:
- perform tokenization, lemmatization, and part-of-speech tagging;
- detect multiword expressions (MWEs) and abbreviations;
- split text into sentences;
- perform LCG parsing.
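
To give a rough idea of how tokenization, abbreviation detection, and sentence splitting interact, here is a minimal Python sketch. It is not ENIAMtoolkit's API or its algorithm: the names `ABBREVIATIONS`, `tokenize`, and `split_sentences` are hypothetical, and the abbreviation list is a tiny hand-picked sample for illustration only.

```python
import re

# Hypothetical sample of Polish abbreviations; a real tool would use a full lexicon.
ABBREVIATIONS = {"np.", "prof.", "ul."}

# A word optionally followed by a period, or a single punctuation character.
TOKEN_RE = re.compile(r"\w+\.?|[^\w\s]")


def tokenize(text):
    """Split text into tokens, keeping known abbreviations (with their period) whole."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        tok = match.group()
        if tok.endswith(".") and tok.lower() not in ABBREVIATIONS:
            # Not a known abbreviation: detach the period as its own token.
            tokens.extend([tok[:-1], "."])
        else:
            tokens.append(tok)
    return tokens


def split_sentences(tokens):
    """Group tokens into sentences, closing a sentence at '.', '!' or '?'."""
    sentences, current = [], []
    for tok in tokens:
        current.append(tok)
        if tok in {".", "!", "?"}:
            sentences.append(current)
            current = []
    if current:  # trailing tokens without final punctuation
        sentences.append(current)
    return sentences


text = "Prof. Nowak mieszka przy ul. Długiej. Lubi koty!"
sentences = split_sentences(tokenize(text))
```

Because the periods in "Prof." and "ul." stay attached to their tokens, they do not end a sentence, so the sample text is split into two sentences rather than four. The sketch ignores cases such as decimal numbers or abbreviations that end a sentence.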