The CLASSLA-Stanza model for lemmatisation of standard Bulgarian 2.1
The model for lemmatisation of standard Bulgarian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla). It was trained on the BulTreeBank training corpus (https://clarino.uib.no/korpuskel/corpora) and uses the Bulgarian inflectional lexicon (Popov, Simov, and Vidinska 1998). The estimated F1 score of the lemma annotations is approximately 98.93.
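The description does not state how the F1 score was computed. As a rough illustration only, the sketch below follows one common convention: a system token counts as correct when its character span matches a gold token and the lemmas are equal, and F1 is the harmonic mean of the resulting precision and recall. The `Token` type, the `lemma_f1` function and the sample sentence are hypothetical and are not taken from the CLASSLA-Stanza evaluation code.

```python
from typing import List, Tuple

# Hypothetical token representation: (start offset, end offset, lemma).
Token = Tuple[int, int, str]


def lemma_f1(gold: List[Token], system: List[Token]) -> float:
    """Return lemma F1 (0..1) over tokens aligned by character span."""
    # Index the gold lemmas by their character span.
    gold_lemmas = {(start, end): lemma for start, end, lemma in gold}
    # A system token is correct if a gold token has the same span and lemma.
    correct = sum(
        1 for start, end, lemma in system
        if gold_lemmas.get((start, end)) == lemma
    )
    if not gold or not system or correct == 0:
        return 0.0
    precision = correct / len(system)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


# Illustrative data for the sentence "Децата четат книги".
gold = [(0, 6, "дете"), (7, 12, "чета"), (13, 18, "книга")]
# The system output gets the last lemma wrong (assumed error, for illustration).
system = [(0, 6, "дете"), (7, 12, "чета"), (13, 18, "книги")]

score = lemma_f1(gold, system)
print(f"Lemma F1: {100 * score:.2f}")
```

When the system and gold tokenisations are identical, precision and recall coincide, so under this convention F1 equals plain lemma accuracy.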
Compared with the previous version of the lemmatiser, this version was trained with the new version of the Bulgarian word embeddings.
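A Bulgarian lemmatisation model can typically be loaded through the CLASSLA-Stanza Python package. The following is a minimal sketch, assuming the `classla` package is installed and that `classla.download("bg")` fetches the Bulgarian standard models; which model version is actually downloaded depends on the installed release, so it may not be exactly version 2.1. The helper name `lemmatise` is hypothetical.

```python
from typing import List, Tuple


def lemmatise(text: str) -> List[Tuple[str, str]]:
    """Return (word form, lemma) pairs for a Bulgarian text."""
    # Imported lazily so that this module loads even without classla installed.
    import classla

    # Download the Bulgarian standard models (requires network access).
    classla.download("bg")

    # The lemma processor relies on the output of the tokenizer and the tagger.
    nlp = classla.Pipeline("bg", processors="tokenize,pos,lemma")
    doc = nlp(text)

    # Collect the surface form and the predicted lemma of every word.
    return [
        (word.text, word.lemma)
        for sentence in doc.sentences
        for word in sentence.words
    ]
```

Calling `lemmatise("Децата четат книги.")` would return one pair per token. For repeated use it would be better to build the pipeline once and reuse it, since downloading and loading the models on each call is slow.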