Keywords (6): Machine Learning, Semantic Information, Web Pages, Information Bottleneck, Open Directory Project, World Wide Web
Can Chinese web pages be classified with English data source?
(Citations: 11)
Xiao Ling, Gui-rong Xue, Wenyuan Dai, Yun Jiang, Qiang Yang, Yong Yu
As the World Wide Web in China grows rapidly, mining knowledge from Chinese Web pages becomes more and more important. Mining Web information usually relies on machine learning techniques, which require a large amount of labeled data to train credible models. Although the number of Chinese Web pages is increasing quite fast, labeled Chinese data are still lacking. However, there are relatively sufficient labeled English Web pages. These labeled data, though in a different linguistic representation, share a substantial amount of semantic information with the Chinese ones and can be utilized to help classify Chinese Web pages. In this paper, we propose an information bottleneck based approach to address this cross-language classification problem. Our algorithm first translates all the Chinese Web pages to English. Then, all the Web pages, both Chinese and English, are encoded through an information bottleneck that allows only limited information to pass. Therefore, in order to retain as much useful information as possible, the common part between Chinese and English Web pages tends to be encoded to the same code (i.e., class label), which makes the cross-language classification accurate. We evaluated our approach using Web pages collected from the Open Directory Project (ODP). The experimental results show that our method significantly improves on several existing supervised and semi-supervised classifiers.
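The idea of encoding pages so that as much useful information as possible survives the bottleneck can be illustrated with a small sketch. This is not the authors' algorithm: it only computes the information lost when pages are mapped to class codes, assuming (hypothetically) that the "useful information" is the mutual information between pages and the words they contain. The names `mutual_information`, `information_loss`, and the toy `counts` matrix are illustrative assumptions, and NumPy is assumed to be available.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information I(A;B) in nats for a table of joint counts or probabilities."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()                  # normalize to a joint distribution
    pa = joint.sum(axis=1, keepdims=True)        # marginal over rows, shape (n, 1)
    pb = joint.sum(axis=0, keepdims=True)        # marginal over columns, shape (1, m)
    mask = joint > 0                             # skip zero cells (0 * log 0 = 0)
    return float((joint[mask] * np.log(joint[mask] / (pa @ pb)[mask])).sum())

def information_loss(counts, labels, n_classes):
    """I(pages; words) - I(codes; words) when each page is encoded as a class code."""
    counts = np.asarray(counts, dtype=float)
    merged = np.zeros((n_classes, counts.shape[1]))
    for row, label in zip(counts, labels):
        merged[label] += row                     # pages with the same code are pooled
    return mutual_information(counts) - mutual_information(merged)

# Toy page-by-word counts (hypothetical): rows 0 and 2 stand for labeled English
# pages, rows 1 and 3 for Chinese pages already translated to English.
counts = np.array([
    [3, 1, 0, 0],
    [1, 3, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 3],
])

# Encoding that gives pages with shared vocabulary the same code.
loss_consistent = information_loss(counts, [0, 0, 1, 1], n_classes=2)
# Encoding that mixes pages with unrelated vocabulary.
loss_mixed = information_loss(counts, [0, 1, 0, 1], n_classes=2)
print(loss_consistent, loss_mixed)
```

In this toy setting, the encoding that assigns pages with shared vocabulary to the same code loses less information than the mixed one, which matches the intuition in the abstract that the common part of Chinese and English pages tends to receive the same class label.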
Conference: World Wide Web Conference Series - WWW, pp. 969-978, 2008
DOI: 10.1145/1367497.1367628
Citation Context (8)
...In [82], Ling et al. proposed an information-theoretic approach for transfer learning to address the cross-language classification problem for translating webpages from English to Chinese...
Sinno Jialin Pan, et al. A Survey on Transfer Learning
...Some scholars have proposed a method called translated learning [5] [6] to solve the problem of using labeled data from one feature space to enhance the classification of other entirely different learning spaces...
Geli Fei, et al. Research on Domain-Adaptive Transfer Learning Method and Its Applicati...
...[10] proposes an information theory framework to address cross-language classification problem...
Sihong Xie, et al. Latent space domain transfer between high dimensional overlapping dist...
...Classification results have been reported for various language pairs: e.g., English-Italian [18], English-Czech [16], English-Spanish [13], English-Japanese [10], and English-Chinese [14]...
...Even though the translated Web pages might not be easily readable by human readers, a machine-learned classifier can still reliably classify MT output [14], which is also demonstrated in Table 1. Finally, the voting mechanism further increases the robustness of our method as it alleviates the impacts of irrelevant search results or partially incorrect translations...
Xuerui Wang, et al. Cross-language query classification using web search for exogenous kno...
...Recently, transfer learning [12, 13] is designed to solve this problem...
Yabin Zheng, et al. Efficient Text Classification Using Term Projection
References (25)
An EM Based Training Algorithm for Cross-Language Text Categorization (Citations: 18)
Leonardo Rigutini, Marco Maggini, Bing Liu
Conference: Web Intelligence - WI, pp. 529-535, 2005
NewsWeeder: learning to filter netnews (Citations: 610)
Ken Lang
Conference: International Conference on Machine Learning - ICML, pp. 331-339, 1995
Automatic search engine performance evaluation with click-through data analysis (Citations: 11)
Yiqun Liu, Yupeng Fu, Min Zhang, Shaoping Ma, Liyun Ru
Conference: World Wide Web Conference Series - WWW, pp. 1133-1134, 2007
An algorithm for suffix stripping (Citations: 3068)
Martin Porter
Journal: Program-electronic Library and Information Systems - PROGRAM-ELECTRON LIBR INFORM, vol. 14, no. 3, pp. 130-137, 1980
Enhanced word clustering for hierarchical text classification (Citations: 61)
Inderjit S. Dhillon, Subramanyam Mallela, Rahul Kumar
Conference: Knowledge Discovery and Data Mining - KDD, pp. 191-200, 2002
Citations (11)
A Survey on Transfer Learning (Citations: 72)
Sinno Jialin Pan, Qiang Yang
Journal: IEEE Transactions on Knowledge and Data Engineering - TKDE, vol. 22, no. 10, pp. 1345-1359, 2010
Cross-Language Text Classification Using Structural Correspondence Learning (Citations: 1)
Peter Prettenhofer, Benno Stein
Published in 2010.
Research on Domain-Adaptive Transfer Learning Method and Its Applications
Geli Fei, Dequan Zheng
Conference: International Conference on Asian Language Processing - IALP, pp. 162-165, 2010
Co-Training for Cross-Lingual Sentiment Classification (Citations: 15)
Xiaojun Wan
Conference: Meeting of the Association for Computational Linguistics - ACL, pp. 235-243, 2009
Latent space domain transfer between high dimensional overlapping distributions (Citations: 4)
Sihong Xie, Wei Fan, Jing Peng, Olivier Verscheure, Jiangtao Ren
Conference: World Wide Web Conference Series - WWW, pp. 91-100, 2009