PhD DATABASE

Title:  
Statistical Phrase Clustering and Hybrid Classifier for Text Classification
Abstract:  
The dissertation analyses the automated text classification task, which is the assignment of text documents to one or more pre-defined thematic classes based on their contents. The main aim of the dissertation is to improve the accuracy of automated text classification. The experimental text classification system is divided into three separate phases: feature extraction, classifier learning and classifier evaluation. The dissertation thoroughly analyses existing feature extraction approaches and classifiers for text classification and their influence on classification accuracy. A new feature extraction approach based on statistical phrase clustering, using the sequential information-theoretic clustering algorithm, is presented. A hybrid classifier that had not previously been used for text classification is analysed. An experimental investigation of the suitability of the hybrid classifier, combined with the new feature extraction approach, for text classification is presented. A modified hybrid classifier, which is original in that it uses a decision tree with an error-based pruning approach, is proposed. Experimental investigations of English and Lithuanian text classification are presented. It was shown that the modified hybrid classifier was more accurate than the decision tree, the multilayer neural network and the Bajernee hybrid classifier. With a small training set (2% of the 20 Newsgroups text corpus), statistical phrase clustering with the modified hybrid classifier increased classification accuracy by 10.5% and 11% compared with statistical phrase features and word features, respectively. With a large training set (66.7% of the 20 Newsgroups text corpus), statistical phrase clustering gave only a minor increase in classification accuracy; however, the number of features used was reduced from 3000 to 300.
URL:  
Area of Science:  
Informatics
PhD Student:  
Nerijus Remeikis
E-mail:  
Scientific Adviser:  
prof. habil. dr. Ignas Skucas
E-mail:  
University:  
Vytautas Magnus University
City:  
Kaunas
Country:  
Lithuania