Inproceedings,

Recovering Structure from Unstructured Web-accessible Rental Classifieds

Proceedings of the 4th Australian Document Computing Symposium (ADCS’00), DSTC Pty Ltd, pages 23–28. (2000)

Abstract

This paper describes a research prototype system called RFCA for structuring Web-accessible rental classified advertisements based on semantic content. A hand-crafted parser is used to extract various facets of the rental property being advertised, including, among others, the number of rooms, type of garage, dwelling type (unit, house, or high-rise apartment), price, and contact details. The performance of the parser is measured in terms of precision and recall by comparing its output to that of a human expert. Once extracted, the structured information is stored in a relational database, and users searching for rental properties are presented with a graphical organisation of rental properties according to predefined themes. The overall result is a suite of tools for extracting, cleaning, structuring, and visually querying and browsing a collection of Web-accessible rental advertisements. The mathematical and methodological foundation for the graphical organisation of the structured information is provided by formal concept analysis. Using formal concept analysis, each property is understood to be an object possessing attributes with attribute values. The data is then organised dynamically into concept lattices according to predefined conceptual scales. The concept lattice organises rental properties into conceptual groupings. The user can then view the attributes of all properties in a grouping and navigate back to the source advertisements. The interface is delivered over the Web using CGI, with images and image maps created dynamically. The ideas presented are general enough to be relevant to other Web-accessible unstructured text sources.
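
As an illustration of the formal concept analysis step the abstract mentions, the following is a minimal sketch of how formal concepts (pairs of an object set and the attribute set they share) can be enumerated from a small object-attribute table. It is not the RFCA implementation: the advertisement identifiers, the attribute names, and the function names are all hypothetical, and the brute-force enumeration is only practical for toy inputs.

```python
from itertools import combinations

# Hypothetical toy context (not data from the paper):
# each advertisement id maps to the attributes extracted for it.
CONTEXT = {
    "ad1": {"house", "garage"},
    "ad2": {"unit", "garage"},
    "ad3": {"house"},
}


def common_attributes(objects, context):
    """Attributes shared by every object in `objects`.

    For the empty object set this is the full attribute set.
    """
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return shared


def objects_with(attributes, context):
    """Objects that possess every attribute in `attributes`."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def formal_concepts(context):
    """Enumerate all formal concepts as (extent, intent) pairs.

    Brute force: derive the intent of every subset of objects, then
    close it back to the full extent. Exponential in the number of
    objects, so suitable for illustration only.
    """
    concepts = set()
    objs = sorted(context)
    for size in range(len(objs) + 1):
        for subset in combinations(objs, size):
            intent = common_attributes(subset, context)
            extent = objects_with(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


if __name__ == "__main__":
    # Print concepts from the largest grouping down to the smallest.
    for extent, intent in sorted(formal_concepts(CONTEXT),
                                 key=lambda c: (-len(c[0]), sorted(c[0]))):
        print(sorted(extent), sorted(intent))
```

Each pair printed is one node of the concept lattice: a grouping of advertisements together with exactly the attributes they all share. Ordering the pairs by inclusion of their object sets gives the lattice structure that a browsing interface could draw.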
