<rdf:RDF xmlns:burst="http://xmlns.com/burst/0.1/" xmlns:admin="http://webns.net/mvcb/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:cc="http://web.resource.org/cc/" xmlns:xsd="http://www.w3.org/2001/XMLSchema#" xmlns:swrc="http://swrc.ontoware.org/ontology#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns="http://purl.org/rss/1.0/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"><channel rdf:about="http://www.bibsonomy.org/burst/user/diego_ma/web_data_extraction"><title>BibSonomy publications for /user/diego_ma/web_data_extraction</title><link>http://www.bibsonomy.org/burst/user/diego_ma/web_data_extraction</link><description>BibSonomy BuRST Feed for /user/diego_ma/web_data_extraction</description><dc:date>2008-08-21T13:22:47+02:00</dc:date><items><rdf:Seq><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/2527f1d41b4598dfc788a61c31c21e37e/diego_ma"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/25959de6883d786df9eae38e385183837/diego_ma"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/2c5c24c789f4b4c9ab6bcc2678fac37af/diego_ma"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/28c6aec5e528a18d01427e6a601e2dd26/diego_ma"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/279604d45adda39d78379de3eb1af3f40/diego_ma"/><rdf:li rdf:resource="http://www.bibsonomy.org/bibtex/2edb8b670fb4dc8cdcebbbce2110149e0/diego_ma"/></rdf:Seq></items></channel><item rdf:about="http://www.bibsonomy.org/bibtex/2527f1d41b4598dfc788a61c31c21e37e/diego_ma"><title>Effective Web Data Extraction with Standard XML Technologies</title><link>http://www.bibsonomy.org/bibtex/2527f1d41b4598dfc788a61c31c21e37e/diego_ma</link><dc:creator>diego_ma</dc:creator><dc:date>2007-12-14T02:44:25+01:00</dc:date><dc:subject>web_data_extraction 
</dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;Jussi &lt;a href=&#034;http://www.bibsonomy.org/author/Myllymaki&#034;&gt;Myllymaki&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;Proc. WWW10, &lt;/em&gt;(&lt;em&gt;2001&lt;/em&gt;)</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/web_data_extraction"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/2527f1d41b4598dfc788a61c31c21e37e/diego_ma"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/2527f1d41b4598dfc788a61c31c21e37e/diego_ma"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#InProceedings"/><owl:sameAs rdf:resource="http://citeseer.nj.nec.com/452335.html"/><swrc:date>Fri Dec 14 02:44:25 CET 2007</swrc:date><swrc:booktitle>Proc. WWW10</swrc:booktitle><swrc:title>Effective Web Data Extraction with Standard XML Technologies</swrc:title><swrc:year>2001</swrc:year><swrc:keywords>web_data_extraction </swrc:keywords><swrc:abstract>We discuss the problem of Web data extraction and describe an XML-based methodology whose goal extends far beyond simple &#034;screen scraping.&#034; An ideal data extraction process is able to digest target Web databases that are visible only as HTML pages, and create a local, identical replica of those databases as a result. What is needed in this process is much more than a Web crawler and set of Web site wrappers. A comprehensive data extraction process needs to deal with such roadblocks as session identifiers, HTML forms, and client-side JavaScript, and data integration problems such as incompatible datasets and vocabularies, and missing and conflicting data. 
Proper data extraction also requires a solid data validation and error recovery service to handle data extraction failures, which are unavoidable...</swrc:abstract><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Jussi Myllymaki"/></rdf:_1></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/25959de6883d786df9eae38e385183837/diego_ma"><title>Web Mining Research: A Survey</title><link>http://www.bibsonomy.org/bibtex/25959de6883d786df9eae38e385183837/diego_ma</link><dc:creator>diego_ma</dc:creator><dc:date>2007-12-14T02:41:43+01:00</dc:date><dc:subject>web_data_extraction </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;Raymond &lt;a href=&#034;http://www.bibsonomy.org/author/Kosala&#034;&gt;Kosala&lt;/a&gt;  and Hendrik &lt;a href=&#034;http://www.bibsonomy.org/author/Blockeel&#034;&gt;Blockeel&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;SIGKDD Explorations&lt;/em&gt; &lt;em&gt;2(1):1-15&lt;/em&gt;, &lt;em&gt;July 2000. 
&lt;/em&gt;</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/web_data_extraction"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/25959de6883d786df9eae38e385183837/diego_ma"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/25959de6883d786df9eae38e385183837/diego_ma"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><owl:sameAs rdf:resource="http://citeseer.nj.nec.com/459121.html"/><swrc:date>Fri Dec 14 02:41:43 CET 2007</swrc:date><swrc:journal>SIGKDD Explorations</swrc:journal><swrc:month>July</swrc:month><swrc:number>1</swrc:number><swrc:pages>1-15</swrc:pages><swrc:title>Web Mining Research: A Survey</swrc:title><swrc:volume>2</swrc:volume><swrc:year>2000</swrc:year><swrc:keywords>web_data_extraction </swrc:keywords><swrc:abstract>With the huge amount of information available online, the World Wide Web is a fertile area for data mining research. The Web mining research is at the cross road of research from several research communities, such as database, information retrieval, and within AI, especially the sub-areas of machine learning and natural language processing. However, there is a lot of confusions when comparing research efforts from different point of views. In this paper, we survey the research in the area of Web mining, point out some confusions regarded the usage of the term Web mining and suggest three Web mining categories. Then we situate some of the research with respect to these three categories. We also explore the connection between the Web mining categories and the related agent paradigm. For the survey, we focus on representation issues, on the process, on the learning algorithm, and on the application of the recent works as the criteria. 
We conclude the paper with some research issues.</swrc:abstract><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Raymond Kosala"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Hendrik Blockeel"/></rdf:_2></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/2c5c24c789f4b4c9ab6bcc2678fac37af/diego_ma"><title>Database Techniques for the World Wide Web: A Survey</title><link>http://www.bibsonomy.org/bibtex/2c5c24c789f4b4c9ab6bcc2678fac37af/diego_ma</link><dc:creator>diego_ma</dc:creator><dc:date>2007-12-14T02:38:52+01:00</dc:date><dc:subject>web_data_extraction </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;Daniela &lt;a href=&#034;http://www.bibsonomy.org/author/Florescu&#034;&gt;Florescu&lt;/a&gt;  and Alon &lt;a href=&#034;http://www.bibsonomy.org/author/Levy&#034;&gt;Levy&lt;/a&gt;  and Alberto &lt;a href=&#034;http://www.bibsonomy.org/author/Mendelzon&#034;&gt;Mendelzon&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;ACM SIGMOD Record&lt;/em&gt;(&lt;em&gt;1998&lt;/em&gt;)</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/web_data_extraction"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/2c5c24c789f4b4c9ab6bcc2678fac37af/diego_ma"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/2c5c24c789f4b4c9ab6bcc2678fac37af/diego_ma"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><owl:sameAs rdf:resource="http://citeseer.nj.nec.com/florescu98database.html"/><swrc:date>Fri Dec 14 02:38:52 CET 2007</swrc:date><swrc:journal>ACM SIGMOD Record</swrc:journal><swrc:number>3</swrc:number><swrc:title>Database Techniques for the World Wide Web: A Survey</swrc:title><swrc:volume>27</swrc:volume><swrc:year>1998</swrc:year><swrc:keywords>web_data_extraction </swrc:keywords><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Daniela Florescu"/></rdf:_1><rdf:_2><swrc:Person 
swrc:name="Alon Levy"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Alberto Mendelzon"/></rdf:_3></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/28c6aec5e528a18d01427e6a601e2dd26/diego_ma"><title>A Scalable Comparison-Shopping Agent for the World-Wide Web</title><link>http://www.bibsonomy.org/bibtex/28c6aec5e528a18d01427e6a601e2dd26/diego_ma</link><dc:creator>diego_ma</dc:creator><dc:date>2007-12-14T02:38:16+01:00</dc:date><dc:subject>web_data_extraction </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;Robert B. &lt;a href=&#034;http://www.bibsonomy.org/author/Doorenbos&#034;&gt;Doorenbos&lt;/a&gt;  and Oren &lt;a href=&#034;http://www.bibsonomy.org/author/Etzioni&#034;&gt;Etzioni&lt;/a&gt;  and Daniel S. &lt;a href=&#034;http://www.bibsonomy.org/author/Weld&#034;&gt;Weld&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;UW-CSE-96-01-03. &lt;/em&gt;&lt;em&gt;Department of Computer Science and Engineering, University of Washington, &lt;/em&gt;(&lt;em&gt;1996&lt;/em&gt;)</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/web_data_extraction"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/28c6aec5e528a18d01427e6a601e2dd26/diego_ma"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/28c6aec5e528a18d01427e6a601e2dd26/diego_ma"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#TechnicalReport"/><owl:sameAs rdf:resource="http://citeseer.nj.nec.com/doorenbos97scalable.html"/><swrc:date>Fri Dec 14 02:38:16 CET 2007</swrc:date><swrc:institution><swrc:Organization swrc:name="Department of Computer Science and Engineering, University of Washington"/></swrc:institution><swrc:number>UW-CSE-96-01-03</swrc:number><swrc:title>A Scalable Comparison-Shopping Agent for the World-Wide Web</swrc:title><swrc:year>1996</swrc:year><swrc:keywords>web_data_extraction </swrc:keywords><swrc:abstract>The 
Web is less agent-friendly than we might hope. Most information on the Web is presented in loosely structured natural language text with no agent-readable semantics. HTML annotations structure the display of Web pages, but provide virtually no insight into their content. Thus, the designers of intelligent Web agents need to address the following questions: (1) To what extent can an agent understand information published at Web sites? (2) Is the agent&#039;s understanding sufficient to provide genuinely useful assistance to users? (3) Is site-specific hand-coding necessary, or can the agent automatically extract information from unfamiliar Web sites? (4) What aspects of the Web facilitate...</swrc:abstract><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Robert B. Doorenbos"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Oren Etzioni"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Daniel S. Weld"/></rdf:_3></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/279604d45adda39d78379de3eb1af3f40/diego_ma"><title>Learning to Extract Symbolic Knowledge from the World Wide Web</title><link>http://www.bibsonomy.org/bibtex/279604d45adda39d78379de3eb1af3f40/diego_ma</link><dc:creator>diego_ma</dc:creator><dc:date>2007-12-14T02:37:51+01:00</dc:date><dc:subject>web_data_extraction </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;Mark &lt;a href=&#034;http://www.bibsonomy.org/author/Craven&#034;&gt;Craven&lt;/a&gt;  and Dan &lt;a href=&#034;http://www.bibsonomy.org/author/DiPasquo&#034;&gt;DiPasquo&lt;/a&gt;  and Dayne &lt;a href=&#034;http://www.bibsonomy.org/author/Freitag&#034;&gt;Freitag&lt;/a&gt;  and Andrew &lt;a href=&#034;http://www.bibsonomy.org/author/McCallum&#034;&gt;McCallum&lt;/a&gt;  and Tom &lt;a href=&#034;http://www.bibsonomy.org/author/Mitchell&#034;&gt;Mitchell&lt;/a&gt;  and Kamal &lt;a href=&#034;http://www.bibsonomy.org/author/Nigam&#034;&gt;Nigam&lt;/a&gt;  and 
Seán &lt;a href=&#034;http://www.bibsonomy.org/author/Slattery&#034;&gt;Slattery&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;Proc. AAAI-98, &lt;/em&gt;(&lt;em&gt;1998&lt;/em&gt;)</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/web_data_extraction"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/279604d45adda39d78379de3eb1af3f40/diego_ma"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/279604d45adda39d78379de3eb1af3f40/diego_ma"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#InProceedings"/><owl:sameAs rdf:resource="http://citeseer.nj.nec.com/9546.html"/><swrc:date>Fri Dec 14 02:37:51 CET 2007</swrc:date><swrc:booktitle>Proc. AAAI-98</swrc:booktitle><swrc:title>Learning to Extract Symbolic Knowledge from the World Wide Web</swrc:title><swrc:year>1998</swrc:year><swrc:keywords>web_data_extraction </swrc:keywords><swrc:abstract>The World Wide Web is a vast source of information accessible to computers, but understandable only to humans. The goal of the research described here is to automatically create a computer understandable knowledge base whose content mirrors that of the World Wide Web. Such a knowledge base would enable much more effective retrieval of Web information, and promote new uses of the Web to support knowledge-based inference and problem solving. 
Our approach is to develop a trainable information extraction system that takes two inputs: an ontology defining the classes and relations of interest, and a set of training data consisting of labeled regions of hypertext representing instances of these classes and relations...</swrc:abstract><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Mark Craven"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Dan DiPasquo"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Dayne Freitag"/></rdf:_3><rdf:_4><swrc:Person swrc:name="Andrew McCallum"/></rdf:_4><rdf:_5><swrc:Person swrc:name="Tom Mitchell"/></rdf:_5><rdf:_6><swrc:Person swrc:name="Kamal Nigam"/></rdf:_6><rdf:_7><swrc:Person swrc:name="Seán Slattery"/></rdf:_7></rdf:Seq></swrc:author></rdf:Description></burst:publication></item><item rdf:about="http://www.bibsonomy.org/bibtex/2edb8b670fb4dc8cdcebbbce2110149e0/diego_ma"><title>Learning to Construct Knowledge Bases from the World Wide Web</title><link>http://www.bibsonomy.org/bibtex/2edb8b670fb4dc8cdcebbbce2110149e0/diego_ma</link><dc:creator>diego_ma</dc:creator><dc:date>2007-12-14T02:37:50+01:00</dc:date><dc:subject>web_data_extraction </dc:subject><content:encoded>&lt;span style=&#034;color:#555555;&#034;&gt;Mark &lt;a href=&#034;http://www.bibsonomy.org/author/Craven&#034;&gt;Craven&lt;/a&gt;  and Dan &lt;a href=&#034;http://www.bibsonomy.org/author/DiPasquo&#034;&gt;DiPasquo&lt;/a&gt;  and Dayne &lt;a href=&#034;http://www.bibsonomy.org/author/Freitag&#034;&gt;Freitag&lt;/a&gt;  and Andrew &lt;a href=&#034;http://www.bibsonomy.org/author/McCallum&#034;&gt;McCallum&lt;/a&gt;  and Tom &lt;a href=&#034;http://www.bibsonomy.org/author/Mitchell&#034;&gt;Mitchell&lt;/a&gt;  and Kamal &lt;a href=&#034;http://www.bibsonomy.org/author/Nigam&#034;&gt;Nigam&lt;/a&gt;  and Seán &lt;a href=&#034;http://www.bibsonomy.org/author/Slattery&#034;&gt;Slattery&lt;/a&gt;  &lt;/span&gt;&lt;em&gt;Artificial 
Intelligence&lt;/em&gt;(&lt;em&gt;2000&lt;/em&gt;)</content:encoded><taxo:topics><rdf:Bag><rdf:li rdf:resource="http://www.bibsonomy.org/tag/web_data_extraction"/></rdf:Bag></taxo:topics><burst:publication><rdf:Description rdf:about="http://www.bibsonomy.org/bibtex/2edb8b670fb4dc8cdcebbbce2110149e0/diego_ma"><owl:sameAs rdf:resource="http://www.bibsonomy.org/uri/bibtex/2edb8b670fb4dc8cdcebbbce2110149e0/diego_ma"/><rdf:type rdf:resource="http://swrc.ontoware.org/ontology#Article"/><owl:sameAs rdf:resource="http://citeseer.nj.nec.com/198786.html"/><swrc:date>Fri Dec 14 02:37:50 CET 2007</swrc:date><swrc:journal>Artificial Intelligence</swrc:journal><swrc:title>Learning to Construct Knowledge Bases from the World Wide Web</swrc:title><swrc:year>2000</swrc:year><swrc:keywords>web_data_extraction </swrc:keywords><swrc:abstract>The World Wide Web is a vast source of information accessible to computers, but understandable only to humans. The goal of the research described here is to automatically create a computer understandable knowledge base whose content mirrors that of the World Wide Web. Such a knowledge base would enable much more effective retrieval of Web information, and promote new uses of the Web to support knowledge-based inference and problem solving. Our approach is to develop a trainable information extraction system that takes two inputs. 
The first is an ontology that defines the classes (e.g., company, person, employee, product) and relations (e.g., employed by, produced by) of interest when creating the knowledge base...</swrc:abstract><swrc:author><rdf:Seq><rdf:_1><swrc:Person swrc:name="Mark Craven"/></rdf:_1><rdf:_2><swrc:Person swrc:name="Dan DiPasquo"/></rdf:_2><rdf:_3><swrc:Person swrc:name="Dayne Freitag"/></rdf:_3><rdf:_4><swrc:Person swrc:name="Andrew McCallum"/></rdf:_4><rdf:_5><swrc:Person swrc:name="Tom Mitchell"/></rdf:_5><rdf:_6><swrc:Person swrc:name="Kamal Nigam"/></rdf:_6><rdf:_7><swrc:Person swrc:name="Seán Slattery"/></rdf:_7></rdf:Seq></swrc:author></rdf:Description></burst:publication></item></rdf:RDF>