Abstract
We propose a new method to extract semantic knowledge from the World Wide Web
for both supervised and unsupervised learning using the Google search engine in
an unconventional manner. The approach is novel in its unrestricted problem
domain, simplicity of implementation, and manifestly ontological underpinnings.
We give evidence of elementary learning of the semantics of concepts, in
contrast to most prior approaches. The method works as follows: The
World Wide Web is the largest database on Earth, and it induces a probability
mass function, the Google distribution, via page counts for combinations of
search queries. This distribution allows us to tap the latent semantic
knowledge on the web. Shannon's coding theorem is used to establish a
code-length associated with each search query. Viewing this mapping as a data
compressor, we connect to earlier work on Normalized Compression Distance. We
give applications in (i) unsupervised hierarchical clustering, demonstrating
the ability to distinguish between colors and numbers, and to distinguish
between 17th century Dutch painters; (ii) supervised concept-learning by
example, using Support Vector Machines, demonstrating the ability to understand
electrical terms, religious terms, emergency incidents, and by conducting a
massive experiment in understanding WordNet categories; and (iii) matching of
meaning, in an example of automatic English-Spanish translation.
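The distance underlying these applications is the Normalized Google Distance, obtained by substituting the Shannon-Fano code lengths of the Google distribution into the Normalized Compression Distance formula. The following is a minimal sketch in Python, assuming fixed illustrative page counts in place of live search-engine queries; PAGE_COUNTS, TOTAL_PAGES, hits, and ngd are hypothetical names introduced here for illustration, not part of the thesis.

```python
import math

# Hypothetical page counts standing in for search-engine hit counts; in the
# thesis these values come from Google queries for the terms and their
# conjunction. The figures below are purely illustrative.
PAGE_COUNTS = {
    ("horse",): 46_700_000,
    ("rider",): 12_200_000,
    ("horse", "rider"): 2_630_000,
}
TOTAL_PAGES = 8_058_044_651  # assumed number of indexed pages (N)


def hits(*terms):
    """Look up the (illustrative) number of pages containing all given terms."""
    return PAGE_COUNTS[tuple(sorted(terms))]


def ngd(x, y, n=TOTAL_PAGES):
    """Normalized Google Distance between two search terms.

    NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y))
                / (log N - min(log f(x), log f(y)))
    where f(.) are page counts and N is the number of indexed pages.
    """
    fx, fy, fxy = math.log(hits(x)), math.log(hits(y)), math.log(hits(x, y))
    return (max(fx, fy) - fxy) / (math.log(n) - min(fx, fy))


print(f"NGD(horse, rider) = {ngd('horse', 'rider'):.3f}")
```

Terms that co-occur on many pages relative to their individual counts get a small distance, which is what the clustering and SVM experiments in the abstract exploit.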
Description
PhD thesis, version 2009-10-23