Eine vergleichende Untersuchung zum Clustering von Textdokumenten (A Comparative Study on the Clustering of Text Documents)

Nour Alhuda, Hasan

DC Element	Value	Language
dc.contributor.advisor	Zukunft, Olaf	-
dc.contributor.author	Nour Alhuda, Hasan	-
dc.date.accessioned	2024-04-26T09:14:10Z	-
dc.date.available	2024-04-26T09:14:10Z	-
dc.date.issued	2022-03-20	-
dc.identifier.uri	http://hdl.handle.net/20.500.12738/15632	-
dc.description.abstract	Cluster analysis is one of the main research areas of artificial intelligence and data mining. Its application to text documents is called document clustering, which is the focus of this thesis. Document clustering is the automatic grouping of documents into clusters such that documents within a cluster are more similar to one another than to documents in other clusters. It plays an important role in fields such as web mining, search engines and information retrieval. This thesis applies and examines two clustering algorithms, K-Means and DBScan, each combined with three feature extraction techniques: TF-IDF, Word2Vec and BERT. The performance of these methods is measured on three selected data sets using clustering evaluation metrics and evaluated accordingly.	en
dc.language.iso	de	en_US
dc.subject	Data Mining	en_US
dc.subject	K-Means	en_US
dc.subject	DBScan	en_US
dc.subject	TF-IDF	en_US
dc.subject	Word2Vec	en_US
dc.subject	BERT	en_US
dc.subject	Artificial Intelligence	en_US
dc.subject	Document Clustering	en_US
dc.subject	Web Mining	en_US
dc.subject	Search Engines	en_US
dc.subject	Information Retrieval	en_US
dc.subject.ddc	004: Computer science	en_US
dc.title	Eine vergleichende Untersuchung zum Clustering von Textdokumenten	de
dc.type	Thesis	en_US
openaire.rights	info:eu-repo/semantics/openAccess	en_US
thesis.grantor.department	Faculty of Engineering and Computer Science	en_US
thesis.grantor.department	Department of Computer Science	en_US
thesis.grantor.universityOrInstitution	Hochschule für Angewandte Wissenschaften Hamburg (Hamburg University of Applied Sciences)	en_US
tuhh.contributor.referee	Tropmann-Frick, Marina	-
tuhh.identifier.urn	urn:nbn:de:gbv:18302-reposit-185193	-
tuhh.oai.show	true	en_US
tuhh.publication.institute	Faculty of Engineering and Computer Science	en_US
tuhh.publication.institute	Department of Computer Science	en_US
tuhh.type.opus	Bachelor Thesis	-
dc.type.casrai	Supervised Student Publication	-
dc.type.dini	bachelorThesis	-
dc.type.driver	bachelorThesis	-
dc.type.status	info:eu-repo/semantics/publishedVersion	en_US
dc.type.thesis	bachelorThesis	en_US
dcterms.DCMIType	Text	-
tuhh.dnb.status	domain	en_US
item.fulltext	With Fulltext	-
item.creatorGND	Nour Alhuda, Hasan	-
item.advisorGND	Zukunft, Olaf	-
item.languageiso639-1	de	-
item.cerifentitytype	Publications	-
item.openairecristype	http://purl.org/coar/resource_type/c_46ec	-
item.creatorOrcid	Nour Alhuda, Hasan	-
item.grantfulltext	open	-
item.openairetype	Thesis	-
Appears in Collections:	Theses
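The abstract describes pairing a feature extraction technique with a clustering algorithm and scoring the result with clustering evaluation metrics. Below is a minimal, self-contained Python sketch of that pipeline for the TF-IDF case only; it is not the thesis's implementation. The TF-IDF weighting variant, the deterministic farthest-first K-Means seeding, the cosine distance used for DBScan, the purity metric and the toy corpus with its topic labels are all illustrative assumptions. Word2Vec and BERT features are not shown.

```python
import math
from collections import Counter

def tfidf(docs):
    """One sparse, L2-normalised TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        # Assumption: raw term count times smoothed IDF, one common TF-IDF variant among several.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(toks).items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in a.keys() | b.keys())

def cos_dist(a, b):
    """Cosine distance; the plain dot product suffices because TF-IDF vectors are unit length."""
    return 1.0 - sum(w * b.get(t, 0.0) for t, w in a.items())

def kmeans(vectors, k, max_iter=50):
    """Lloyd's K-Means; assumption: deterministic farthest-first seeding instead of random init."""
    centroids = [dict(vectors[0])]
    while len(centroids) < k:
        far = max(range(len(vectors)),
                  key=lambda i: min(sq_dist(vectors[i], c) for c in centroids))
        centroids.append(dict(vectors[far]))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each document goes to its nearest centroid.
        new = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        if new == labels:
            break  # converged: no assignment changed
        labels = new
        # Update step: each centroid becomes the mean of its member vectors.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                total = Counter()
                for v in members:
                    total.update(v)
                centroids[c] = {t: w / len(members) for t, w in total.items()}
    return labels

def dbscan(vectors, eps, min_pts):
    """DBSCAN with cosine distance; returns a cluster id per document, -1 for noise."""
    labels = [None] * len(vectors)
    cluster = -1

    def neighbours(i):
        # The point itself counts towards min_pts, as in the original DBSCAN definition.
        return [j for j in range(len(vectors)) if cos_dist(vectors[i], vectors[j]) <= eps]

    for i in range(len(vectors)):
        if labels[i] is not None:
            continue
        nb = neighbours(i)
        if len(nb) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbj = neighbours(j)
            if len(nbj) >= min_pts:  # only core points expand the cluster
                queue.extend(nbj)
    return labels

def purity(labels, truth):
    """Share of documents matching their cluster's majority label (noise -1 counts as one cluster)."""
    clusters = {}
    for lab, t in zip(labels, truth):
        clusters.setdefault(lab, []).append(t)
    return sum(Counter(m).most_common(1)[0][1] for m in clusters.values()) / len(truth)

# Hypothetical toy corpus with hypothetical topic labels: sport = 0, finance = 1.
DOCS = [
    "football match goal team", "team wins football league", "goal scored in the match",
    "stock market shares fall", "investors sell shares market", "market prices and stock trading",
]
TRUTH = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    vecs = tfidf(DOCS)
    for name, labels in [("K-Means", kmeans(vecs, k=2)), ("DBScan", dbscan(vecs, eps=0.7, min_pts=2))]:
        print(name, labels, "purity:", purity(labels, TRUTH))
```

On this toy corpus both algorithms recover the two hypothetical topics. On real data sets, k for K-Means and eps and min_pts for DBScan have to be tuned, and swapping `tfidf` for dense Word2Vec or BERT embeddings would only change the feature step.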