Abstract
The explosive growth of World Wide Web resulted in the largest Knowledge base ever developed and made available to the public. These documents are typically formatted for human viewing (HTML) and vary widely from document to document. So we can’t construct a global schema, discovery of rules from it is complex and tedious process. Most of the existing system uses hand coded wrappers to extract information, which is monotonous and time consuming. Learning grammatical information from given set of Web pages (HTML) has attracted lots of attention in the past decades. In this paper I proposed a method of learning Context-free grammar rules from HTML documents using probabilities association of HTML tags.
Users
Please
log in to take part in the discussion (add own reviews or comments).