Untersuchung von Active Learning Methoden für BERT-Modelle

Wischhusen, Jonathan

DC Field	Value	Language
dc.contributor.advisor	Zukunft, Olaf	-
dc.contributor.author	Wischhusen, Jonathan	-
dc.date.accessioned	2024-01-12T08:25:44Z	-
dc.date.available	2024-01-12T08:25:44Z	-
dc.date.created	2021-12-23	-
dc.date.issued	2024-01-12	-
dc.identifier.uri	http://hdl.handle.net/20.500.12738/14558	-
dc.description.abstract	Automatisierte Textklassifikation ist für viele praktische Anwendungen ein aussichtsreiches Analyse- und Moderationsinstrument. In der Praxis steht Textklassifikation jedoch oft teuren Annotationskosten und einem Ungleichgewicht der Klassen in den Trainingsdaten gegenüber. Active Learning beschreibt ein Paradigma, das die Kosten für die Annotation signifikant senken kann, indem über mehrere Iterationen geeignete Daten gezielt annotiert werden. In Verbindung mit vortrainierten Sprachmodellen, wie BERT und seinen Variationen, ist Active Learning bisher wenig untersucht. Diese Arbeit untersucht verschiedene Active Learning Methoden für BERT-Modelle unter dem Problem der Mehr-Klassen-Textklassifikation. Der Fokus liegt auf Szenarien mit praxisnahen Startmengen und ihrer Auswirkung auf vielfältige Datensätze. Die Ergebnisse zeigen, dass Discriminate Active Learning im Umfeld der Untersuchung als einzige Methode über die Modelle und Daten hinweg signifikant besser ist als der Zufall. Andere Methoden sind in der Regel nicht besser als der Zufall, außer in einigen praxisnahen Situationen. Die Arbeit gewährt ebenfalls einen Einblick in den Einsatz des Menschen als Orakel. Durch eine Benutzerstudie wird beobachtet, dass für ein kleines Annotationsbudget Menschen eine konstante Leistung zeigen und Domänenwissen auch bei einfachen Kategorien hilfreich ist.	de
dc.description.abstract	Automated text classification is a promising analysis and moderation tool for many practical applications. In practice, however, text classification often faces high annotation costs and class imbalance in the training data. Active Learning describes a paradigm that can significantly reduce annotation costs by selectively annotating the most suitable data over multiple iterations. In conjunction with pre-trained language models, such as BERT and its variations, Active Learning has received little study so far. This work investigates various Active Learning methods for BERT models on the problem of multi-class text classification. The focus is on scenarios with practical warm-start sets and their impact on diverse datasets. The results show that Discriminate Active Learning is the only method that significantly outperforms the random baseline across models and datasets in the setting of the study. Other methods generally do not outperform the random baseline, except in some real-world scenarios. The work also provides insight into the use of humans as an oracle. A user study shows that humans perform consistently over a small annotation budget and that domain knowledge is helpful even for simple categories.	en
dc.language.iso	de	en_US
dc.subject	Active Learning	en_US
dc.subject	BERT	en_US
dc.subject	Textklassifikation	en_US
dc.subject.ddc	004: Informatik	en_US
dc.title	Untersuchung von Active Learning Methoden für BERT-Modelle	de
dc.type	Thesis	en_US
openaire.rights	info:eu-repo/semantics/openAccess	en_US
thesis.grantor.department	Fakultät Technik und Informatik	en_US
thesis.grantor.department	Department Informatik	en_US
thesis.grantor.universityOrInstitution	Hochschule für Angewandte Wissenschaften Hamburg	en_US
tuhh.contributor.referee	Sarstedt, Stefan	-
tuhh.identifier.urn	urn:nbn:de:gbv:18302-reposit-167956	-
tuhh.oai.show	true	en_US
tuhh.publication.institute	Fakultät Technik und Informatik	en_US
tuhh.publication.institute	Department Informatik	en_US
tuhh.type.opus	Masterarbeit	-
dc.type.casrai	Supervised Student Publication	-
dc.type.dini	masterThesis	-
dc.type.driver	masterThesis	-
dc.type.status	info:eu-repo/semantics/publishedVersion	en_US
dc.type.thesis	masterThesis	en_US
dcterms.DCMIType	Text	-
tuhh.dnb.status	domain	en_US
item.creatorGND	Wischhusen, Jonathan	-
item.grantfulltext	open	-
item.openairetype	Thesis	-
item.advisorGND	Zukunft, Olaf	-
item.fulltext	With Fulltext	-
item.languageiso639-1	de	-
item.cerifentitytype	Publications	-
item.creatorOrcid	Wischhusen, Jonathan	-
item.openairecristype	http://purl.org/coar/resource_type/c_46ec	-
Appears in Collections:	Theses