A Comparative Study on Vietnamese Text Classification Methods

Authors: Vu Cong Duy Hoang, Dien Dinh, Nguyen Le Nguyen, Hung Quoc Ngo
DOI: 10.1109/RIVF.2007.369167

Text classification is the problem of automatically assigning given text passages (or documents) to predefined categories (or topics). A wide range of methods have been applied to English text classification, but relatively few studies have addressed Vietnamese text classification. Using a Vietnamese news corpus, we present two approaches to the Vietnamese text classification problem: Bag of Words (BOW) and Statistical N-Gram Language Modeling (N-Gram). We evaluated these two widely used approaches on our task. Both achieved an average accuracy above 95%, with an average classification time of 79 minutes for about 14,000 documents (roughly 3 documents per second). We also analyze the advantages and disadvantages of each approach to determine which method is best suited to specific circumstances.
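The N-Gram approach builds one language model per category and assigns a document to the category whose model gives it the highest probability. The following is a minimal sketch of that idea, not the authors' implementation. It makes several assumptions. Tokens are whitespace-separated syllables, and the model uses bigrams with add-one (Laplace) smoothing plus a class prior. The class name `NGramLMClassifier`, the toy training sentences, and the labels `the_thao` (sports) and `kinh_te` (economy) are all hypothetical. The paper's actual n-gram order, smoothing method, and tokenization may differ.

```python
import math
from collections import Counter


class NGramLMClassifier:
    """Hypothetical sketch: one bigram LM per category, add-one smoothing."""

    def __init__(self):
        self.history = {}       # category -> Counter of tokens used as bigram history
        self.bigrams = {}       # category -> Counter of (previous, current) pairs
        self.doc_counts = Counter()  # category -> number of training documents
        self.vocab = set()

    @staticmethod
    def _tokens(text):
        # Assumption: Vietnamese syllables are space-separated, so split on whitespace.
        return ["<s>"] + text.lower().split() + ["</s>"]

    def fit(self, docs, labels):
        for doc, label in zip(docs, labels):
            toks = self._tokens(doc)
            self.vocab.update(toks)
            self.doc_counts[label] += 1
            self.history.setdefault(label, Counter()).update(toks[:-1])
            self.bigrams.setdefault(label, Counter()).update(zip(toks[:-1], toks[1:]))
        return self

    def score(self, text, label):
        # log P(label) + sum of log P(cur | prev) under add-one smoothing.
        toks = self._tokens(text)
        hist, bi = self.history[label], self.bigrams[label]
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen tokens
        log_prior = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        return log_prior + sum(
            math.log((bi[(p, c)] + 1) / (hist[p] + v))
            for p, c in zip(toks[:-1], toks[1:])
        )

    def predict(self, text):
        # Pick the category whose language model best explains the document.
        return max(self.doc_counts, key=lambda label: self.score(text, label))


# Toy, made-up training data (not from the paper's corpus).
docs = [
    "đội tuyển bóng đá thắng trận",
    "cầu thủ ghi bàn trong trận đấu",
    "giá cổ phiếu tăng mạnh",
    "ngân hàng giảm lãi suất",
]
labels = ["the_thao", "the_thao", "kinh_te", "kinh_te"]

clf = NGramLMClassifier().fit(docs, labels)
print(clf.predict("cầu thủ ghi bàn"))          # → the_thao
print(clf.predict("ngân hàng tăng lãi suất"))  # → kinh_te
```

Training one model per category keeps the approach simple to extend: adding a topic means training one more language model. Classification cost grows linearly with the number of categories, which is one reason throughput (documents per second) matters when comparing it with a BOW classifier.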