This is an abstractive summarization demo program. It was mainly used to summarize opinions, but since it does not rely on any domain information, it can be used to summarize any highly redundant text.
The web can be represented by a graph with special regions: SCC, IN, OUT and TENDRILS.
Each region is defined by which websites can reach which others along paths of links.
The number of links to and from a website (its in- and out-degree) appears to follow a power law, which this document also discusses.
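The region definitions above can be illustrated with a small sketch. It assumes the graph is given as a list of directed (source, target) links; the function names and the toy data are hypothetical and not taken from the document. It treats the largest strongly connected component as SCC, and it lumps tendrils together with any disconnected nodes under OTHER, which is a simplification.

```python
from collections import defaultdict

def reachable(adj, sources):
    """Return every node reachable from `sources` by following edges in `adj`."""
    seen = set(sources)
    stack = list(sources)
    while stack:
        node = stack.pop()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def bow_tie(edges):
    """Split a directed graph into SCC, IN, OUT and OTHER regions."""
    fwd = defaultdict(set)   # node -> nodes it links to
    bwd = defaultdict(set)   # node -> nodes linking to it
    nodes = set()
    for src, dst in edges:
        fwd[src].add(dst)
        bwd[dst].add(src)
        nodes.update((src, dst))

    # The SCC of a node is the intersection of what it reaches and what reaches it.
    # This is quadratic in the worst case, which is fine for a toy graph only.
    core, assigned = set(), set()
    for node in nodes:
        if node in assigned:
            continue
        component = reachable(fwd, [node]) & reachable(bwd, [node])
        assigned |= component
        if len(component) > len(core):
            core = component

    in_region = reachable(bwd, core) - core    # can reach the core
    out_region = reachable(fwd, core) - core   # reachable from the core
    other = nodes - core - in_region - out_region  # tendrils and the rest
    return {"SCC": core, "IN": in_region, "OUT": out_region, "OTHER": other}

# Hypothetical toy web: a, b, c link in a cycle; i links in; o is linked out; t hangs off i.
toy_links = [("a", "b"), ("b", "c"), ("c", "a"), ("i", "a"), ("c", "o"), ("i", "t")]
regions = bow_tie(toy_links)
```

In the toy graph, `t` is reachable from the IN region but cannot reach the core, so it plays the role of a tendril.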
Graph mining refers to extracting knowledge from massive graphs. The data sets of telephone calls we see at AT&T can be viewed as a single graph, with several hundred million phone numbers as nodes, and calls between phone numbers as edges. It is a giant social network, like an internet connections graph or a rich citation network.
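As a minimal sketch of the graph view described above, call records can be folded into an adjacency structure with phone numbers as nodes and calls as weighted edges. The record format, the function names and the sample numbers are assumptions made for illustration; they do not describe the actual AT&T data.

```python
from collections import Counter, defaultdict

def build_call_graph(calls):
    """Build a directed graph from (caller, callee) records.

    Returns a mapping caller -> Counter(callee -> number of calls).
    """
    graph = defaultdict(Counter)
    for caller, callee in calls:
        graph[caller][callee] += 1
        graph[callee]  # touch the entry so the callee exists as a node
    return graph

def out_degrees(graph):
    """Number of distinct numbers each node has called."""
    return {number: len(callees) for number, callees in graph.items()}

# Hypothetical call log.
call_log = [
    ("555-0100", "555-0101"),
    ("555-0100", "555-0101"),
    ("555-0100", "555-0102"),
    ("555-0102", "555-0100"),
]
call_graph = build_call_graph(call_log)
degrees = out_degrees(call_graph)
```

Repeated calls between the same pair collapse into one edge with a count, so the structure grows with the number of distinct pairs rather than the number of calls.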
This dissertation presents an algorithm on the webgraph for finding dense bipartite graphs, which represent web communities.
Further steps of the algorithm identify several levels of communities, which can be related to the communities of earlier levels.
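To make the notion of a dense bipartite graph as a community signature concrete, here is a brute-force sketch that looks for complete bipartite cores: a set of "fan" pages that all link to the same set of "center" pages. This is not the dissertation's algorithm, which is not described here; the function name, parameters and toy data are hypothetical, and the enumeration is only practical for very small graphs.

```python
from itertools import combinations

def bipartite_cores(out_links, n_fans, n_centers):
    """Find groups of `n_fans` pages sharing at least `n_centers` link targets.

    out_links maps each page to the set of pages it links to.
    Returns a list of (fans, shared_centers) pairs.
    """
    cores = []
    for fans in combinations(sorted(out_links), n_fans):
        # Targets that every fan in this group links to.
        shared = set.intersection(*(set(out_links[fan]) for fan in fans))
        shared -= set(fans)  # keep the two sides of the bipartite graph disjoint
        if len(shared) >= n_centers:
            cores.append((fans, sorted(shared)))
    return cores

# Hypothetical toy link data.
toy_out_links = {
    "p1": {"x", "y", "z"},
    "p2": {"x", "y"},
    "p3": {"y", "q"},
}
found = bipartite_cores(toy_out_links, n_fans=2, n_centers=2)
```

Here only `p1` and `p2` share two targets, so they form the single core with centers `x` and `y`.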
M. Zaki. Proceedings of the 18th International Conference on Conceptual Structures (ICCS 2010), volume 6208 of Lecture Notes in Computer Science, page 13. Springer, (2010)
L. Akoglu, D. Chau, U. Kang, D. Koutra, and C. Faloutsos. Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, pages 717--720. New York, NY, USA, ACM, (2012)