Chem X Seer: A Web Search Engine and Repository for e-Chemistry

C. Lee Giles, Prasenjit Mitra, Karl Mueller, James Z. Wang, Bingjun Sun, Levent Bolelli, Ying Liu, Isaac Councill, William Brower, Qingzhao Tan, Anuj Jaiswal, James Kubicki
Cyberinfrastructure or e-science has become crucial for scientific progress, and open source systems have greatly facilitated design and implementation. In chemistry, the growth of data has been explosive, and timely and effective access to information and data is critical. We discuss the architecture of Chem X Seer (funded by NSF Chemistry), a portal and search engine for academic researchers in environmental chemistry that integrates the scientific literature with experimental, analytical, and simulation datasets. Chem X Seer consists of information crawled from the web, manually submitted scientific documents, and user-submitted datasets, as well as scientific documents and metadata provided by major publishers. Information gathered from the web is publicly accessible, whereas access to restricted publisher resources will be provided by linking to the publishers' respective sites, and users can control access to their own data. Thus, instead of being a fully open search engine and repository, Chem X Seer will be a hybrid one, limiting access to some resources. Chem X Seer offers some unique aspects of search not yet present in other scientific search services or search engines. We have developed or are developing algorithms for the extraction of tables, figures, and chemical names and formulae from scientific documents, enabling users to search on those fields. In particular, Chem X Seer will provide the following search features:
- Full text search
- Author, affiliation, title and venue search
- Table search
- Figure search
- Chemical formulae and name search
- Citation and acknowledgement search
- Citation linking and statistics
Chem X Seer takes advantage of many open source search and indexing tools such as Lucene and CiteSeer. For dataset search, we are developing tools that automatically annotate published data representations such as figures and that permit researchers to annotate their datasets by providing both document-level and attribute-level metadata in OAI-PMH format.
This level of data annotation permits more effective data search at both the attribute and semantic levels, and allows browsing of datasets and linking to existing scientific literature and other datasets in our own and other repositories. Because Chem X Seer requires unique information extraction, several different machine learning methods, such as conditional random fields, support vector machines, mutual-information-based feature selection, and sequence mining, are critical for performance. We give a progress report on Chem X Seer and draw lessons for other e-science and cyberinfrastructure systems in terms of design, implementation and research.
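
To make the idea of chemical formula search concrete, the following is a minimal sketch of how formula candidates might be pulled out of running text so they can be indexed as a separate field. It is not the method described in the abstract, which relies on machine learning techniques such as conditional random fields and support vector machines; it is a simple rule-based stand-in. The names `ELEMENTS`, `parse_formula`, and `extract_formulae` are hypothetical, and the element set is a deliberately small, non-exhaustive subset chosen for illustration.

```python
import re

# Assumption: a small illustrative subset of element symbols, not the full table.
ELEMENTS = {"H", "C", "N", "O", "Na", "Cl", "S", "P", "K", "Ca", "Fe", "Al", "Si", "Mg"}

# One element-like token: an uppercase letter, an optional lowercase letter,
# and an optional atom count, e.g. "H2" or "Na".
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")

# A candidate is a whole word made up only of such tokens.
CANDIDATE = re.compile(r"\b(?:[A-Z][a-z]?\d*)+\b")


def parse_formula(word):
    """Return {element: count} if every token is a known element, else None."""
    counts = {}
    for symbol, digits in TOKEN.findall(word):
        if symbol not in ELEMENTS:
            return None
        counts[symbol] = counts.get(symbol, 0) + (int(digits) if digits else 1)
    return counts


def extract_formulae(sentence):
    """Return (surface form, element counts) pairs for formula-like words."""
    found = []
    for match in CANDIDATE.finditer(sentence):
        word = match.group(0)
        counts = parse_formula(word)
        # Skip lone symbols without a count (such as a bare "C"), which are
        # far more likely to be ordinary words or abbreviations.
        if counts and (len(counts) > 1 or any(ch.isdigit() for ch in word)):
            found.append((word, counts))
    return found


print(extract_formulae("The sample contained H2O and NaCl under He."))
# → [('H2O', {'H': 2, 'O': 1}), ('NaCl', {'Na': 1, 'Cl': 1})]
```

With a complete element table, ordinary words such as "In" or "No" would also parse as valid symbols, which is one reason a purely rule-based approach is insufficient and learned taggers that use the surrounding context are attractive for this task.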