Active filters:

  • Language: Icelandic
  • Keywords: ocr
2 records found

Search results

  • OCR Post-Processing Transformer Model 23.04

    Several OCR post-processing models were trained during project L11 (Error models for OCR) of the Language Technology Programme 2019-2023. This is the best-performing one. On texts from the 19th century to the early 20th century, it reduces the word error rate from 6.49% to 3.08% and the character error rate from 1.39% to 0.73%. On modern texts, it reduces the word error rate from 5.52% to 3.60% and the character error rate from 1.17% to 1.0%. The README has more information, including how to use the model for inference.
  • OCR Post-Processing Tool for Icelandic 22.10

    This entry contains two trained transformer models for correcting OCR errors, along with approximately 50,000 line pairs of OCRed and corrected text. The models were trained on approximately 900,000 lines (~7,000,000 tokens), of which only 50,000 (~400,000 tokens) came from real OCRed texts. Increasing the amount of such data would likely improve the tool significantly. See README.md for more information.
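
To compare the figures quoted in the two records above, the error rates can be converted into relative reductions. The snippet below is a minimal sketch of that arithmetic only; the function name `relative_reduction` and the dictionary layout are illustrative assumptions and are not part of either resource. The input numbers are copied from the descriptions, and the 22.10 line counts are the approximate ("ca") values given there.

```python
def relative_reduction(before: float, after: float) -> float:
    """Relative reduction in percent when an error rate drops from `before` to `after`."""
    return (before - after) / before * 100.0


# (before, after) error rates in percent, as reported for model 23.04.
reported = {
    ("historical", "WER"): (6.49, 3.08),
    ("historical", "CER"): (1.39, 0.73),
    ("modern", "WER"): (5.52, 3.60),
    ("modern", "CER"): (1.17, 1.0),
}

# Relative reduction per text type and metric, rounded to one decimal place.
reductions = {
    key: round(relative_reduction(before, after), 1)
    for key, (before, after) in reported.items()
}

for (text_type, metric), value in reductions.items():
    print(f"{text_type:<10} {metric}: {value}% relative reduction")

# Share of real OCRed lines in the 22.10 training data (approximate counts).
real_share = round(50_000 / 900_000 * 100, 1)
print(f"real OCRed lines in 22.10 training data: about {real_share}%")
```

Read this way, the 23.04 model roughly halves both error rates on the older texts, while the gain on modern texts is smaller, particularly for character error rate.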