The web can be represented by a graph with special regions: SCC, IN, OUT and TENDRILS.
Each region is defined by reachability, that is, by which websites can reach the others, or be reached from them, by following links.
The number of links to and from a website (its in- and out-degree) appears to follow a power law, which this document also mentions.
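
The reachability-based definition of the regions can be illustrated with a small sketch. It assumes a hypothetical edge list and a page that is already known to lie in the SCC (`core_node`); the function names and the `REST` label are illustrative and not taken from the document. `REST` collects everything that neither reaches nor is reached from the core, which is where the TENDRILS fall.

```python
from collections import deque


def reachable(adj, start):
    """Return the set of nodes reachable from start by following edges in adj."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def bow_tie(edges, core_node):
    """Split nodes into regions relative to core_node (assumed to be in the SCC)."""
    fwd, bwd, nodes = {}, {}, set()
    for src, dst in edges:
        fwd.setdefault(src, []).append(dst)  # follow links forwards
        bwd.setdefault(dst, []).append(src)  # follow links backwards
        nodes.update((src, dst))
    from_core = reachable(fwd, core_node)  # pages the core can reach
    to_core = reachable(bwd, core_node)    # pages that can reach the core
    scc = from_core & to_core
    return {
        "SCC": scc,
        "IN": to_core - scc,
        "OUT": from_core - scc,
        "REST": nodes - from_core - to_core,
    }


# Hypothetical toy graph: a three-page core, one IN page, one OUT page,
# and two pages hanging off IN and OUT without touching the core.
edges = [
    ("a", "b"), ("b", "c"), ("c", "a"),
    ("in1", "a"), ("c", "out1"),
    ("in1", "t1"), ("t2", "out1"),
]
regions = bow_tie(edges, "a")
```

On this toy graph, `t1` and `t2` end up in `REST`: one is reachable from IN and the other leads into OUT, but neither has a path to or from the core.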
The authors of this document define a community as a set of websites that share a common subject.
They explain an algorithm that crawls the web for communities by running a maximum flow algorithm on the web graph.
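
As a rough illustration of how a maximum flow computation can separate a community from the rest of a graph, the sketch below runs a plain Edmonds-Karp search and returns the pages left on the source side of the resulting minimum cut. It is a simplification under stated assumptions, not the authors' procedure: it uses a single seed page as the source, a single page assumed to lie outside the community as the sink, treats every link as undirected with capacity 1, and works on a hypothetical in-memory edge list rather than a crawl.

```python
from collections import deque


def min_cut_community(edges, source, sink):
    """Return (max_flow, nodes on the source side of a minimum cut)."""
    # Residual capacities; each link is treated as undirected with capacity 1.
    cap = {source: {}, sink: {}}
    for u, v in edges:
        cap.setdefault(u, {})
        cap.setdefault(v, {})
        cap[u][v] = cap[u].get(v, 0) + 1
        cap[v][u] = cap[v].get(u, 0) + 1

    def bfs_parents():
        # Breadth-first search over edges that still have residual capacity.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = bfs_parents()
        if sink not in parent:
            # No augmenting path left: the nodes still reachable from the
            # source form the source side of a minimum cut.
            return flow, set(parent)
        # Find the bottleneck capacity along the path found by the search.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        # Push flow along the path and update the residual capacities.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck


# Hypothetical toy graph: two tightly linked triangles joined by one link.
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("x", "y"), ("y", "z"), ("x", "z"),
    ("c", "x"),
]
flow, community = min_cut_community(edges, source="a", sink="z")
```

Here the only link between the two triangles is the cheapest place to cut, so the seed `a` is grouped with `b` and `c`. A real crawl would need some way to choose seeds and a sink, which this sketch leaves out.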