Keywords
(5)
Digital Library
Machine Learning
Metadata Extraction
Research Paper
Support Vector Machine
Related Publications
(28)
Ontology research and development. Part 1 - a review of ontology generation
Knowledge-based metadata extraction from PostScript files
Learning-based linguistic indexing of pictures with 2--d MHMMs
CiteSeer: an autonomous Web agent for automatic retrieval and identification of interesting publications
Learning Hidden Markov Model Structure for Information Extraction
Automatic document metadata extraction using support vector machines
(Citations: 118)
Hui Han, C. Lee Giles, Eren Manavoglu, Hongyuan Zha, Zhenyue Zhang, Edward A. Fox
Automatic metadata generation provides scalability and usability for digital libraries and their collections. Machine learning methods offer robust and adaptable automatic metadata extraction. We describe a Support Vector Machine classification-based method for metadata extraction from the header part of research papers and show that it outperforms other machine learning methods on the same task. The method first classifies each line of the header into one or more of 15 classes. An iterative convergence procedure is then used to improve the line classification by using the predicted class labels of its neighbor lines in the previous round. Further metadata extraction is done by seeking the best chunk boundaries of each line. We found that discovery and use of the structural patterns of the data and domain-based word clustering can improve the metadata extraction performance. An appropriate feature normalization also greatly improves the classification performance. Our metadata extraction method was originally designed to improve the metadata extraction quality of the digital libraries Citeseer [17] and EbizSearch [24]. We believe it can be generalized to other digital libraries.
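The two-stage line classification described in the abstract (an independent first pass, then repeated passes that use the neighbor lines' labels from the previous round) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The hand-written rules below are hypothetical stand-ins for the trained Support Vector Machine classifiers, each line receives a single label rather than one or more of 15 classes, and the function names, label names, and sample header are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-ins for trained classifiers (the paper uses SVMs).
IndependentClassifier = Callable[[str], str]
ContextualClassifier = Callable[[str, str, str], str]


def independent_label(line: str) -> str:
    """First pass: label a line from its own text only (toy rules)."""
    if "@" in line:
        return "email"
    if "University" in line:
        return "affiliation"
    if line.lower().startswith("abstract"):
        return "abstract"
    return "other"


def contextual_label(line: str, prev_label: str, next_label: str) -> str:
    """Later passes: also use the neighbors' labels from the previous round."""
    base = independent_label(line)
    if base != "other":
        return base
    if prev_label == "<start>":
        return "title"
    if prev_label == "title" and next_label in ("affiliation", "email"):
        return "author"
    if prev_label == "abstract":
        return "abstract"
    return base


def iterative_line_classification(
    lines: Sequence[str],
    independent: IndependentClassifier,
    contextual: ContextualClassifier,
    max_rounds: int = 10,
) -> List[str]:
    """Relabel lines round by round until the labels stop changing."""
    labels = [independent(line) for line in lines]
    for _ in range(max_rounds):
        new_labels = []
        for i, line in enumerate(lines):
            # Neighbor labels always come from the previous round.
            prev_label = labels[i - 1] if i > 0 else "<start>"
            next_label = labels[i + 1] if i + 1 < len(lines) else "<end>"
            new_labels.append(contextual(line, prev_label, next_label))
        if new_labels == labels:  # converged
            break
        labels = new_labels
    return labels


# Made-up header used only to exercise the sketch.
HEADER = [
    "A Hypothetical Paper Title",
    "Jane Doe",
    "Example University",
    "jane@example.org",
    "Abstract",
    "We study a toy problem.",
]

if __name__ == "__main__":
    print(iterative_line_classification(HEADER, independent_label, contextual_label))
```

In this toy run the author line can only be resolved in the second contextual round, after the line above it has been relabeled as a title, which is the kind of dependency the iterative procedure is meant to exploit.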
Conference: ACM/IEEE Joint Conference on Digital Libraries - JCDL, pp. 37-48, 2003
DOI: 10.1109/JCDL.2003.1204842
Citation Context
(89)
...The header parser [17] extracts document title, author, abstract and affiliation information...
Pradeep Teregowda, et al.
Cloud Computing: A Digital Libraries Perspective
...According to studies, the existing approaches achieve excellent accuracy, significantly above 90%, sometimes close to 100% [1, 2, 3]. However, all existing approaches for extracting titles from PDF files have two shortcomings...
...Then, titles from the same PDFs were extracted with a Support Vector Machine from Cite-Seer [1] to compare results...
Jöran Beel, et al.
SciPlore Xtract: Extracting Titles from Scientific PDF Documents by An...
...Manual [2], semiautomatic [9] [10] [15], and automatic techniques [1] [5] [6] [7] [8] [14] were proposed to accommodate changes of web pages and to reduce the cost for developing and maintaining the wrappers...
Yaw-Huei Chen, et al.
Extracting Topics Information from Conference Web Pages Using Page Seg...
...Automatic metadata extraction methodologies can be classified into two main categories: machine learning methods [4][5][7][11] and other methods which based on rules combined with dictionaries and ontology [8][10][12]...
...According to [5], machine learning for information extraction include symbolic learning, inductive logic programming, grammar induction, Support Vector Machine, Hidden Markov models (HMMs), and statistical methods...
...In paper [5], authors suggested using SVM for automatic metadata extraction...
...In [7], authors also suggested automatic metadata extraction by using CRF (Conditional Random Fields) and their approach gave a comparable result with SVM in [5]...
...In [5][7] the Precision is from 86 % to 99%, the Recall is from 45% to 100%, the Accuracy is from 96% to 100% (depends on various metadata)...
...Similar to [5], we define these measures as following...
Tin Huynh, et al.
GATE framework based metadata extraction from scientific papers
...In addition to harvesting existing information, Machine learning technologies have been used for automatic metadata extraction, authors in [6] proposed a method to conduct metadata extraction from header part of scientific research papers...
Sahar Changuel, et al.
A General Learning Method for Automatic Title Extraction from HTML Pag...
References
(38)
The open archives initiative protocol for metadata harvesting (Citations: 74)
H. Van De Sompel, C. Lagoze
Published in 2001.
Generic summarization and keyphrase extraction using mutual reinforcement principle and sentence clustering (Citations: 67)
Hongyuan Zha
Conference: Research and Development in Information Retrieval - SIGIR, pp. 113-120, 2002
Digital Libraries and Autonomous Citation Indexing (Citations: 389)
Steve Lawrence, C. Lee Giles, Kurt D. Bollacker
Journal: IEEE Computer - COMPUTER, vol. 32, no. 6, pp. 67-71, 1999
eBizSearch: an OAI-compliant digital library for eBusiness (Citations: 13)
Yves Petinot, Pradeep B. Teregowda, Hui Han, C. Lee Giles, Steve Lawrence, Arvind Rangaswamy, Nirmal Pal
Conference: ACM/IEEE Joint Conference on Digital Libraries - JCDL, pp. 199-209, 2003
Distributional clustering of words for text classification (Citations: 317)
Douglas Baker, Andrew Kachites McCallum
Conference: Research and Development in Information Retrieval - SIGIR, pp. 96-103, 1998
Citations
(118)
Cloud Computing: A Digital Libraries Perspective (Citations: 2)
Pradeep Teregowda, Bhuvan Urgaonkar, C. Lee Giles
Conference: IEEE International Conference on Cloud Computing - CLOUD, 2010
Automatic Spatial Metadata Update: a New Approach (Citations: 1)
Hamed Olfat, Abbas Rajabifard, Mohsen Kalantari
Published in 2010.
oreChem ChemXSeer: a semantic digital library for chemistry (Citations: 1)
Na Li, Leilei Zhu, Prasenjit Mitra, Karl Mueller, Eric Poweleit, C. Lee Giles
Conference: ACM/IEEE Joint Conference on Digital Libraries - JCDL, pp. 245-254, 2010
SciPlore Xtract: Extracting Titles from Scientific PDF Documents by Analyzing Style Information (Font Size)
Jöran Beel, Bela Gipp, Ammar Shaker, Nick Friedrich
Conference: European Conference on Digital Libraries - ECDL, pp. 413-416, 2010
Extracting Topics Information from Conference Web Pages Using Page Segmentation and SVM
Yaw-Huei Chen, Sin-Sian Li, Yu-Ta Chen
Conference: International Conference on Technologies and Applications of Artificial Intelligence - TAAI, 2010