Inproceedings,

Utilizing Web Search Engines for Program Analysis

, and .
18th Int'l Conf. on Program Comprehension, page 94 -103. IEEE, (June 2010)
DOI: 10.1109/ICPC.2010.26

Abstract

Programming involves representing domain concepts by using programming abstractions. In object-oriented programs, concepts and relations of the business domain are represented as classes, attributes and methods. However, the concepts and relations that logically belong together are scattered across different modules, interleaved with technical concepts, and distorted due to implementation details. In this paper, we present an automatic method to identify logically related concepts and the relations among them. To achieve this, we systematically transform program identifiers into fragments of natural language sentences and check whether these sentence fragments are meaningful for humans. In order to automatically perform such checks, we use the World Wide Web as a knowledge base that contains a huge number of meaningful texts, and use the Google web search engine to validate the meaningfulness of these sentences. If the search engine returns a sufficient number of hits, we discovered a piece of knowledge in the code. By systematically applying this method, we obtain a condensed form of the knowledge embodied in the program which is an enabler for automatic analyses. We present our experience with several use-cases: (1) assessing the meaningfulness of identifiers, (2) extracting complex concepts from compound identifiers, (3) extracting a meaningful taxonomy from the class hierarchy, and (4) extracting complex conceptual relations from the code. We report on our observations during the analysis of real world Java code, discuss the limitations of our approach and sketch extension possibilities.

Tags

Users

  • @sjbutler

Comments and Reviews