Towards a Universal Text Classifier: Transfer Learning Using Encyclopedic Knowledge   (Citations: 1)
Document classification is a key task for many text mining applications. However, traditional text classification requires labeled data to construct reliable and accurate classifiers. Unfortunately, labeled data are seldom available, and often too expensive to obtain. In this work, we propose a universal text classifier, which does not require any labeled training document. Our approach simulates the capability of people to classify documents based on background knowledge. As such, we build a classifier that can effectively group documents based on their content, under the guidance of a few words, which we call discriminant words, describing the classes of interest. Background knowledge is modeled using encyclopedic knowledge, namely Wikipedia. Wikipedia's articles related to the specific problem domain at hand are selected, and used during the learning process for predicting labels of test documents. The universal text classifier can also be used to perform document retrieval, in which the pool of test documents may or may not be relevant to the topics of interest for the user. In our experiments with real data we test the feasibility of our approach for both the classification and retrieval tasks. The results demonstrate the advantage of incorporating background knowledge through Wikipedia, and the effectiveness of modeling such knowledge via probabilistic topic modeling. The accuracy achieved by the universal text classifier is comparable to that of a supervised learning technique for transfer learning.
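To make the idea of classifying without labeled documents concrete, the following is a minimal sketch and not the method of the paper. The abstract states that background knowledge is modeled with probabilistic topic modeling over Wikipedia; as a simplifying assumption, this sketch swaps that for plain bag-of-words profiles and cosine similarity. The two background texts stand in for selected Wikipedia articles, and all names (`build_class_profiles`, `classify`, the class labels, the discriminant words) are hypothetical and chosen for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_class_profiles(discriminant_words, background_articles):
    """Expand each class's discriminant words with background text.

    Assumption: an article is related to a class if it contains at least
    one of that class's discriminant words. The paper uses topic modeling
    here instead; this is a deliberately simpler stand-in.
    """
    profiles = {}
    for label, words in discriminant_words.items():
        profile = Counter(words)
        for article in background_articles:
            tokens = tokenize(article)
            if any(word in tokens for word in words):
                profile.update(tokens)
        profiles[label] = profile
    return profiles


def classify(document, profiles):
    """Assign the document to the class with the most similar profile."""
    vector = Counter(tokenize(document))
    return max(profiles, key=lambda label: cosine(vector, profiles[label]))


# Hypothetical stand-ins for Wikipedia articles (not real article text).
background = [
    "Football is a team sport played with a ball between two teams of players",
    "A computer processor executes software instructions stored in memory",
]

# Hypothetical discriminant words describing the classes of interest.
discriminant = {"sports": ["football"], "computing": ["computer"]}

profiles = build_class_profiles(discriminant, background)
sports_label = classify("The players passed the ball to win the match", profiles)
computing_label = classify("New software needs more memory", profiles)
```

Neither test document contains a discriminant word, so any correct assignment comes only from the vocabulary contributed by the background text, which is the role the abstract assigns to Wikipedia.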