Keywords (7)
Human Behavior
Large Data
Machine Learning
part-of-speech tagging
Semantic Web
User Interaction
Web Pages
Related Publications (3)
"The Semantic Web" in Scientific American
Towards Ontology Generation from Tables
Semi-supervised Learning of Dependency Parsers using Generalized Expectation Criteria
The Unreasonable Effectiveness of Data
(Citations: 35)
Alon Y. Halevy, Peter Norvig, Fernando Pereira
At Brown University, there was excitement about having access to the Brown Corpus, containing one million English words. Since then, we have seen several notable corpora that are about 100 times larger, and in 2006, Google released a trillion-word corpus with frequency counts for all sequences up to five words long. In some ways this corpus is a step backwards from the Brown Corpus: it's taken from unfiltered Web pages and thus contains incomplete sentences, spelling errors, grammatical errors, and all sorts of other errors. It's not annotated with carefully hand-corrected part-of-speech tags. But the fact that it's a million times larger than the Brown Corpus outweighs these drawbacks. A trillion-word corpus - along with other Web-derived corpora of millions, billions, or trillions of links, videos, images, tables, and user interactions - captures even very rare aspects of human behavior. So, this corpus could serve as the basis of a complete model for certain tasks - if only we knew how to extract the model from the data.
Journal: IEEE Expert / IEEE Intelligent Systems - EXPERT, vol. 24, no. 2, pp. 8-12, 2009
DOI: 10.1109/MIS.2009.36
Citation Context (18)
...Researchers in data mining and machine translation have been able to take advantage of Google’s index of billions of crowdsourced documents and trillions of words to show that simple learning algorithms that focus upon recognizing specific features outperform more conceptually sophisticated ones [13]...
Christopher Crick, et al.
Human and robot perception in large-scale learning from demonstration
...We believe that large-scale analysis of more complex medical data can be used to improve patient outcomes [22] [1] [17], in much the way big data has transformed other domains [23]...
Daniel J. Crichton, et al.
An informatics architecture for the Virtual Pediatric Intensive Care U...
...[14] for a thoughtful perspective on the role of data in computing...
David F. Gleich, et al.
Some computational tools for digital archive and metadata maintenance
...Enabled by the power of such web scale data, simple models can yield remarkable results in various real-world applications such as statistical machine translation [17]...
Jian Huang, et al.
Exploring web scale language models for search query processing
...[15, 22]) use labeled images to learn a discriminative visual dictionary through supervised learning, but we want here to be able to exploit a distributed environment with unlabeled data, which is more plentiful [12]...
Raphaël Marée, et al.
Incremental indexing and distributed image search using shared randomi...
References (12)
The unreasonable effectiveness of mathematics in the natural sciences (Citations: 211)
E. P. Wigner
Published in 1960.
A Comprehensive Grammar of the English Language (Citations: 897)
Randolph Quirk, Sidney Greenbaum, Geoffrey Leech, Jan Svartvik
Published in 1985.
Computational Analysis of Present-Day American English (Citations: 2300)
H. Kucera, W. N. Francis
Published in 1967.
Translating Queries into Snippets for Improved Query Expansion (Citations: 7)
Stefan Riezler, Yi Liu, Alexander Vasserman
Conference: International Conference on Computational Linguistics - COLING, pp. 737-744, 2008
Learning to create data-integrating queries (Citations: 13)
Partha Pratim Talukdar, Marie Jacob, Muhammad Salman Mehmood, Koby Crammer, Zachary G. Ives, Fernando Pereira, Sudipto Guha
Journal: Proceedings of the VLDB Endowment - PVLDB, vol. 1, no. 1, pp. 785-796, 2008
Citations (35)
Human and robot perception in large-scale learning from demonstration (Citations: 1)
Christopher Crick, Sarah Osentoski, Graylin Jay, Odest Chadwicke Jenkins
Conference: Human-Robot Interaction - HRI, pp. 339-346, 2011
An informatics architecture for the Virtual Pediatric Intensive Care Unit
Daniel J. Crichton, Chris A. Mattmann, Andrew F. Hart, David Kale, Robinder G. Khemani, Patrick Ross, Sarah Rubin, Paul Veeravatanayothin, Amy Braverman, Cameron Goodale, Randall C. Wetzel
Conference: IEEE Symposium on Computer-Based Medical Systems - CBMS, pp. 1-6, 2011
Some computational tools for digital archive and metadata maintenance
David F. Gleich, Ying Wang, Xiangrui Meng, Farnaz Ronaghi, Margot Gerritsen, Amin Saberi
Published in 2011.
Supercharging Enterprise 2.0
Konstantinos Christidis, Gregoris Mentzas, Dimitris Apostolou
Journal: IT Professional, vol. 13, no. 4, pp. 29-35, 2011
Modeling Player Experience for Content Creation (Citations: 15)
Christopher Pedersen, Julian Togelius, Georgios N. Yannakakis
Journal: IEEE Transactions on Computational Intelligence and AI in Games - TCIAIG, vol. 2, no. 1, pp. 54-67, 2010