Keywords (9)
Document Representation
Edit Distance
Latent Semantic Indexing
Performance Evaluation
Semantic Information
Semantic Similarity
Similarity Relation
Xml Document
Vector Space Model
Integrating Element and Term Semantics for Similarity-Based XML Document Clustering
(Citations: 4)
Jianwu Yang, William K. Cheung, Xiaoou Chen
Structured link vector model (SLVM) is a recently proposed document representation that takes into account both structural and semantic information for measuring XML document similarity. Its formulation includes an element similarity matrix for capturing the semantic similarity between XML elements - the structural components of XML documents. In this paper, instead of applying heuristics to define the similarity matrix, we proposed to learn the matrix using pair-wise similar training data in an iterative manner. In addition, we extended SLVM to SLVM-LSI by incorporating term semantics into SLVM using latent semantic indexing, with the element similarity related properties of the original SLVM preserved. For performance evaluation, we applied SLVM-LSI to similarity-based clustering of two XML datasets and the proposed SLVM-LSI was found to significantly outperform the conventional vector space model and the edit-distance based methods. The similarity matrix, obtained as a by-product via the learning, can provide higher-level knowledge about the semantic relationship between the XML elements.
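The two ideas named in the abstract, an element similarity matrix and a latent semantic projection of the term space, can be illustrated with a small sketch. This is not the paper's implementation: the abstract gives no formulas, so the bilinear similarity form, the array layout (one term vector per XML element), and the function names `element_weighted_similarity` and `lsi_project` are all assumptions made for illustration. The iterative learning of the matrix from pair-wise similar training data is not shown.

```python
import numpy as np


def element_weighted_similarity(doc_x, doc_y, elem_sim):
    """Similarity of two documents under an element similarity matrix.

    Assumed layout (hypothetical, not taken from the paper):
    doc_x, doc_y: arrays of shape (n_elements, n_terms), one term-weight
                  vector per XML element type.
    elem_sim:     array of shape (n_elements, n_elements); entry (i, j)
                  weights how much element i in doc_x may match element j
                  in doc_y.
    """
    # cross[i, j] is the dot product between element i of doc_x
    # and element j of doc_y.
    cross = doc_x @ doc_y.T
    return float(np.sum(elem_sim * cross))


def lsi_project(docs, k):
    """Project the term axis onto k latent dimensions via truncated SVD.

    docs: array of shape (n_docs, n_elements, n_terms).
    Returns an array of shape (n_docs, n_elements, k).
    """
    n_docs, n_elements, n_terms = docs.shape
    # Treat every (document, element) pair as one row of a term matrix.
    flat = docs.reshape(n_docs * n_elements, n_terms)
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    basis = vt[:k].T  # (n_terms, k): top-k right singular vectors
    return docs @ basis


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    docs = rng.random((3, 2, 4))      # 3 documents, 2 elements, 4 terms
    latent = lsi_project(docs, k=2)   # term axis reduced from 4 to 2
    elem_sim = np.array([[1.0, 0.3],
                         [0.3, 1.0]])  # hand-picked, not learned
    print(element_weighted_similarity(latent[0], latent[1], elem_sim))
```

With `elem_sim` set to the identity matrix, the first function reduces to a sum of per-element dot products, which is the case where no cross-element matching is allowed; off-diagonal entries are what let differently named elements contribute to the score.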
Conference: Web Intelligence - WI, pp. 222-228, 2005
DOI: 10.1109/WI.2005.80
Citation Context (3)
...Yang et al. [101], for example, calculate the similarity between documents by their representations in the Structured Linked Vector Model Latent Semantic Indexing (SLVM-LSI), which is their extension to the Structured Linked Vector Model (SLVM)...
...Milano et al. [71] String similarity No Ma and Chbeir [67] String similarity No Al techniques Yang et al. [101] SLVM-LSI No...
...Recent work, such as [90, 101, 102], have been considering both structure and data in the clustering process...
...The proposal in [101] analyzes how data contained inside XML documents are structured and how they relate to other close data...
...In [101], a similarity model is proposed in order to measure similarities scores between XML documents, which combines structure and contents of the XML documents...
...Yang et al. [101] XML document Clustering Large XML Luis et al. [66] Duplicate detection Data cleaning Small XML Park et al. [77] XML document Similarity queries Medium XML Carvalho et al. [18] Object identification Data cleaning Medium Semi-structured Weis and Naumann [95] Duplicate detection Data cleaning Large XML...
...Ma and Chbeir [67] Common and uncommon No NA Similarity Yang et al. [101] Common and uncommon No Learning Similarity Luis et al. [66] Common No Learning Similarity Park et al. [77] NA No NA Similarity Carvalho et al. [18] Common and uncommon No Comparison Similarity...
Carina Friedrich Dorneles, et al.
Approximate data instance matching: a survey
...To obtain an optimal Me for a specific type of XML data, we proposed in [21] to learn the matrix using pair-wise similar training data in an iterative manner...
Jianwu Yang, et al.
XML Document Classification Using Extended VSM
...As for the main differences, we can observe that: (i) the approach of [16] returns very accurate results but needs a deep analysis of the structural properties of an XML document; (ii) it does not consider semantic similarities among the concepts of involved sources; on the contrary, in our approach, this information plays an important role; (iii) the approach of [16] is extensional whereas ours is intensional. Approach of [44]...
...In [44] a framework exploiting matrix algebra for clustering XML documents is presented...
...We can recognize some similarities between our approach and that of [44]...
...As for the main differences between them, we observe that: (i) the approach of [44] considers only synonymies whereas our approach handles a wide range of interschema properties; (ii) the approach of [44] is quite sophisticated and precise, since it computes various statistics on the terms occurring in an XML source (e.g., the frequency of a term in a document); this allows accurate results to be obtained but requires a significant ...
Pasquale De Meo, et al.
Semantics-Guided Clustering of Heterogeneous XML Schemas
References (14)
Similarity Metric for XML Documents
(Citations: 37)
Zhongping Zhang, Rong Li, Shunliang Cao, Yangyong Zhu
Published in 2003.
Evaluating Structural Similarity in XML Documents
(Citations: 207)
Andrew Nierman, H. V. Jagadish
Conference: International Workshop on the Web and Databases - WebDB, pp. 61-66, 2002
On the Editing Distance Between Unordered Labeled Trees
(Citations: 168)
Kaizhong Zhang, Richard Statman, Dennis Shasha
Journal: Information Processing Letters - IPL, vol. 42, no. 3, pp. 133-139, 1992
Detecting Structural Similarities between XML Documents
(Citations: 63)
Sergio Flesca, Giuseppe Manco, Elio Masciari, Luigi Pontieri, Andrea Pugliese
Conference: International Workshop on the Web and Databases - WebDB, pp. 55-60, 2002
A Semi-Structured Document Model for Text Mining
(Citations: 4)
Jianwu Yang, Xiaoou Chen
Journal: Journal of Computer Science and Technology - JCST, vol. 17, no. 5, pp. 603-610, 2002
Citations (4)
Approximate data instance matching: a survey
(Citations: 2)
Carina Friedrich Dorneles, Rodrigo Gonçalves, Ronaldo dos Santos Mello
Journal: Knowledge and Information Systems - KAIS, vol. 27, no. 1, pp. 1-21, 2011
Semantic clustering of XML documents
(Citations: 4)
Andrea Tagarelli, Sergio Greco
Journal: ACM Transactions on Information Systems - TOIS, vol. 28, no. 1, pp. 1-56, 2010
XML Document Classification Using Extended VSM
(Citations: 4)
Jianwu Yang, Fudong Zhang
Conference: INitiative for the Evaluation of XML Retrieval - INEX, pp. 234-244, 2007
Semantics-Guided Clustering of Heterogeneous XML Schemas
(Citations: 1)
Pasquale De Meo, Giovanni Quattrone, Giorgio Terracina, Domenico Ursino
Journal: Journal on Data Semantics - JODS, vol. 9, pp. 39-81, 2007