Fulltext available Open Access
DC Field | Value | Language
dc.contributor.advisor | Taefi, Tessa | -
dc.contributor.author | Zach, Sophie | -
dc.date.accessioned | 2026-06-02T08:23:12Z | -
dc.date.available | 2026-06-02T08:23:12Z | -
dc.date.issued | 2025-07-10 | -
dc.identifier.uri | https://hdl.handle.net/20.500.12738/19399 | -
dc.description.abstract | This thesis investigates the influence of training language on the performance of handwritten text recognition (HTR) models. Two separate Vision Transformer-based models were trained on datasets in different languages: one on English data (IAM dataset) and the other on German data (fhswf/german_handwriting). Both models were evaluated on their native test sets as well as on a cross-lingual test set to assess generalization and linguistic robustness. Quantitative evaluation using Character Error Rate (CER) and Word Error Rate (WER) shows a clear degradation in recognition performance when a model is tested on a language different from that of its training set. This highlights the sensitivity of HTR models to language-specific features, even when they are based on language-agnostic decoding mechanisms such as Connectionist Temporal Classification (CTC). A qualitative error analysis was conducted to illustrate how specific types of language-dependent character sequences contribute to recognition failures. Furthermore, a pipeline for character-level n-gram-based error attribution was implemented to explore whether misrecognitions correlate with language-dominant character patterns. Although the n-gram analysis could not be fully utilized due to insufficient cross-lingual performance, its results are discussed in the Appendix and the implemented tools remain available for future experimentation. The findings underscore the need for either multilingual training strategies or language-specific adaptation in practical HTR systems. The code is publicly available at: https://github.com/Mir0da/HTR-VT_Bachelor The German-trained model is available at: https://huggingface.co/Mir0da/HTR-VT-german The English-trained model is available at: https://huggingface.co/Mir0da/HTR-VT-english | en
dc.language.iso | en | en_US
dc.subject.ddc | 004: Computer science | en_US
dc.title | Comparison of language-specific HTR models : “Does the language of the training corpus affect the performance of a handwritten text recognition (HTR) model on crosslingual settings?” | en
dc.type | Thesis | en_US
openaire.rights | info:eu-repo/semantics/openAccess | en_US
thesis.grantor.department | Fakultät Design, Medien und Information (former, dissolved 10.2025) | en_US
thesis.grantor.department | Department Medientechnik (former, dissolved 10.2025) | en_US
thesis.grantor.universityOrInstitution | Hochschule für Angewandte Wissenschaften Hamburg | en_US
tuhh.contributor.referee | Schumann, Sabine | -
tuhh.identifier.urn | urn:nbn:de:gbv:18302-reposit-240720 | -
tuhh.oai.show | true | en_US
tuhh.publication.institute | Fakultät Design, Medien und Information (former, dissolved 10.2025) | en_US
tuhh.publication.institute | Department Medientechnik (former, dissolved 10.2025) | en_US
tuhh.type.opus | Bachelor Thesis | -
dc.type.casrai | Supervised Student Publication | -
dc.type.dini | bachelorThesis | -
dc.type.driver | bachelorThesis | -
dc.type.status | info:eu-repo/semantics/publishedVersion | en_US
dc.type.thesis | bachelorThesis | en_US
dcterms.DCMIType | Text | -
tuhh.dnb.status | domain | en_US
item.openairecristype | http://purl.org/coar/resource_type/c_46ec | -
item.cerifentitytype | Publications | -
item.openairetype | Thesis | -
item.fulltext | With Fulltext | -
item.creatorGND | Zach, Sophie | -
item.grantfulltext | open | -
item.languageiso639-1 | en | -
item.creatorOrcid | Zach, Sophie | -
item.advisorGND | Taefi, Tessa | -
Appears in Collections: Theses
Files in This Item:
File | Description | Size | Format
BA_A_comparison_of_language-specific_HTR_models.pdf |  | 1.05 MB | Adobe PDF
Items in REPOSIT are protected by copyright, with all rights reserved, unless otherwise indicated.