Keywords: Clustering Method, Feature Selection, Generalization Capability, Latent Semantic Analysis, Text Classification, Rule Based
Efficient Text Classification Using Term Projection
Yabin Zheng, Zhiyuan Liu, Shaohua Teng, Maosong Sun
In this paper, we propose an efficient text classification method using term projection. First, we use a modified χ2 statistic to project terms into predefined categories, which is more efficient than other clustering methods. We then use the generated clusters as features to represent the documents. Classification is performed either in a rule-based manner or via SVM. Experimental results show that our modified χ2 statistic feature selection method outperforms the traditional χ2 statistic, especially at lower dimensionalities. Our method is also more efficient than Latent Semantic Analysis (LSA) on a homogeneous dataset. Meanwhile, we can reduce the feature dimensionality by three orders of magnitude, saving training and testing cost while maintaining comparable accuracy. Moreover, with a small training set we gain an improvement of approximately 4.3% on a heterogeneous dataset compared to the traditional method, which indicates that our method has better generalization capability.
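The pipeline described in the abstract (project terms into categories, use the resulting clusters as features, classify by rule) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not spell out how the χ2 statistic is modified, so the sketch uses the standard χ2 statistic and, as an assumption, keeps only positive term-category associations. All function names (`chi2`, `project_terms`, `to_cluster_features`, `classify_by_rule`) are hypothetical, and the rule shown (pick the category whose cluster has the most term hits) is one plausible reading of "rule-based".

```python
from collections import Counter, defaultdict


def chi2(a, b, c, d):
    """Standard chi-square score for a 2x2 term/category table.

    a: docs in the category that contain the term
    b: docs outside the category that contain the term
    c: docs in the category without the term
    d: docs outside the category without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def project_terms(docs, labels):
    """Assign each term to the category with the highest chi-square score.

    Assumption (not from the abstract): a term is projected only onto a
    category it is positively associated with, i.e. a*d > c*b.
    """
    categories = sorted(set(labels))
    n_docs = len(docs)
    cat_size = Counter(labels)
    df = Counter()                 # document frequency of each term
    df_cat = defaultdict(Counter)  # document frequency per category
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df[term] += 1
            df_cat[label][term] += 1

    projection = {}
    for term, total in df.items():
        best_cat, best_score = None, 0.0
        for cat in categories:
            a = df_cat[cat][term]
            b = total - a
            c = cat_size[cat] - a
            d = n_docs - a - b - c
            if a * d <= c * b:
                continue  # negative or no association: skip this category
            score = chi2(a, b, c, d)
            if score > best_score:
                best_cat, best_score = cat, score
        if best_cat is not None:
            projection[term] = best_cat
    return projection


def to_cluster_features(tokens, projection, categories):
    """Represent a document by how many of its tokens fall in each cluster."""
    counts = Counter(projection[t] for t in tokens if t in projection)
    return [counts[c] for c in categories]


def classify_by_rule(tokens, projection, categories):
    """Pick the category whose term cluster is hit most often.

    Ties, and documents with no projected terms, fall back to the first
    category in the list.
    """
    features = to_cluster_features(tokens, projection, categories)
    return categories[features.index(max(features))]
```

With one feature per category, the document vector has as many dimensions as there are categories rather than as many as there are terms, which is the kind of dimensionality reduction the abstract refers to. The same vectors could be passed to an SVM instead of the rule above.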
Conference: Asia Information Retrieval Symposium - AIRS, pp. 230-241, 2009
DOI: 10.1007/978-3-642-04769-5_20
References (16)
Machine learning in automated text categorization (Citations: 1858)
Fabrizio Sebastiani
Journal: ACM Computing Surveys - CSUR, vol. 34, no. 1, pp. 1-47, 2002
Text categorization with support vector machines: learning with many relevant features (Citations: 2321)
Thorsten Joachims
Conference: The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases - ECML, 1998
Pattern Classification (2nd ed.) (Citations: 2195)
Richard O. Duda, Peter E. Hart, David G. Stork
Published in 2000.
A comparative study on feature selection in text categorization (Citations: 1737)
Yiming Yang, Jan O. Pedersen
Conference: International Conference on Machine Learning - ICML, pp. 412-420, 1997
A Comparison and Semi-Quantitative Analysis of Words and Character-Bigrams as Features in Chinese Text Categorization (Citations: 9)
Jingyang Li, Maosong Sun, Xian Zhang
Conference: Meeting of the Association for Computational Linguistics - ACL, 2006