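The web can be represented as a directed graph that splits into distinct regions: a large strongly connected component (SCC), IN, OUT and TENDRILS.
The regions are defined by reachability along link paths: pages in IN can reach the SCC but cannot be reached from it, pages in OUT can be reached from the SCC but cannot reach it, and TENDRILS are reachable from IN or lead to OUT without passing through the SCC.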
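The reachability definitions above can be turned into a classification with two breadth-first searches, one along links and one against them. The sketch below is a minimal illustration, not the document's method. It assumes a hypothetical input `core_seed`, a node already known to lie in the giant SCC, and it uses a loose TENDRILS bucket that also absorbs "tubes" running from IN to OUT.

```python
from collections import defaultdict, deque

def reachable(adj, starts):
    """Return every node reachable from any node in `starts` (breadth-first)."""
    seen = set(starts)
    queue = deque(starts)
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def bow_tie(edges, core_seed):
    """Split a directed graph into SCC, IN, OUT, TENDRILS and OTHER.

    `core_seed` is a node assumed to lie in the giant SCC (hypothetical input).
    """
    fwd, rev, nodes = defaultdict(set), defaultdict(set), set()
    for u, v in edges:
        fwd[u].add(v)
        rev[v].add(u)
        nodes.update((u, v))
    down = reachable(fwd, [core_seed])  # nodes the core can reach
    up = reachable(rev, [core_seed])    # nodes that can reach the core
    scc = down & up
    out = down - scc
    in_ = up - scc
    classified = scc | in_ | out
    # Loose TENDRILS: reachable from IN, or able to reach OUT, but not yet
    # classified (this also lumps in "tubes" that run from IN to OUT).
    tendrils = (reachable(fwd, in_) | reachable(rev, out)) - classified
    other = nodes - classified - tendrils  # disconnected pieces
    return {"SCC": scc, "IN": in_, "OUT": out, "TENDRILS": tendrils, "OTHER": other}

edges = [("a", "b"), ("b", "c"), ("c", "a"),  # core cycle
         ("i", "a"), ("c", "o"),               # one IN page, one OUT page
         ("i", "t"), ("x", "y")]               # a tendril and a disconnected pair
regions = bow_tie(edges, core_seed="a")
```

On real crawls the seed would have to be chosen so that it really sits in the large SCC; a seed outside it yields a tiny "SCC" and meaningless regions.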
The number of links to and from a page (its in- and out-degree) appears to follow a power law, which the document also discusses.
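Checking a power-law claim starts with counting degrees and estimating the exponent. The sketch below is an illustration under stated assumptions, not the document's procedure: it uses the approximate discrete maximum-likelihood estimator from Clauset, Shalizi and Newman (2009) rather than a log-log regression, and the cutoff `x_min` is a hypothetical parameter the caller must choose.

```python
import math
from collections import Counter

def degree_counts(edges):
    """Return (in-degree, out-degree) Counters; nodes with degree 0 are absent."""
    indeg = Counter(v for _, v in edges)
    outdeg = Counter(u for u, _ in edges)
    return indeg, outdeg

def powerlaw_alpha(values, x_min=1):
    """Approximate discrete maximum-likelihood estimate of a power-law exponent.

    Uses alpha ~= 1 + n / sum(ln(x / (x_min - 0.5))) over all x >= x_min;
    the approximation is rough for small x_min.
    """
    if x_min < 1:
        raise ValueError("x_min must be at least 1")
    xs = [x for x in values if x >= x_min]
    if not xs:
        raise ValueError("no values at or above x_min")
    return 1 + len(xs) / sum(math.log(x / (x_min - 0.5)) for x in xs)

links = [(1, 2), (3, 2), (4, 2), (2, 1)]
indeg, outdeg = degree_counts(links)
alpha_in = powerlaw_alpha(indeg.values(), x_min=1)
```

The same call on `outdeg.values()` gives the out-degree exponent. A toy graph like this one says nothing about power laws; the estimate only becomes meaningful on large link graphs and with a well-chosen `x_min`.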
The authors of this document define a community as a set of websites that share a common subject.
They describe an algorithm that crawls the web for such communities by computing a maximum flow (equivalently, a minimum cut) on the web graph.
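The max-flow idea can be illustrated on a toy graph: seed pages are tied to a virtual source, every other page to a virtual sink, and the pages left on the source side of a minimum cut form the community. This is a minimal sketch, not the authors' exact construction. The capacities (undirected links of capacity `k`, sink edges of capacity 1) and the parameter `k` itself are hypothetical choices, and the max flow is computed with a plain Edmonds-Karp loop.

```python
from collections import defaultdict, deque

def max_flow_community(edges, seeds, k=2):
    """Return the source side of a minimum s-t cut around `seeds`.

    Hypothetical setup: links are undirected with capacity k, a virtual source
    feeds each seed with effectively unbounded capacity, and every non-seed
    page drains into a virtual sink with capacity 1.
    """
    SRC, SNK = object(), object()  # sentinels that cannot collide with page ids
    cap = defaultdict(lambda: defaultdict(int))
    nodes = set()
    for u, v in edges:
        cap[u][v] += k
        cap[v][u] += k
        nodes.update((u, v))
    unbounded = len(nodes) + 1  # larger than cutting every sink edge
    for s in seeds:
        cap[SRC][s] = unbounded
    for n in nodes - set(seeds):
        cap[n][SNK] += 1

    def residual_bfs():
        """Breadth-first search over edges with spare capacity; returns parents."""
        parent = {SRC: None}
        queue = deque([SRC])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    # Edmonds-Karp: push flow along shortest augmenting paths until none is left.
    while SNK in (parent := residual_bfs()):
        path, v = [], SNK
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck

    # Pages still reachable from the source in the residual graph form the community.
    return set(residual_bfs()) - {SRC, SNK}

def clique(names):
    """All undirected pairs among `names` (helper for the toy example)."""
    return [(u, v) for i, u in enumerate(names) for v in names[i + 1:]]

example_edges = clique("abcd") + clique("efgh") + [("d", "e")]
community = max_flow_community(example_edges, seeds=["a"], k=2)
```

In the example two densely linked groups share a single bridge; starting from seed `a`, the cheapest cut severs the bridge and the sink edges of `b`, `c` and `d`, so the first group is returned.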
This document presents algorithms for finding complete, dense bipartite subgraphs, which signal small emerging communities: websites that mention a common subject. These methods can find communities that the HITS or CLEVER algorithms would not find.
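A complete bipartite core K_{i,j} consists of i "fan" pages that each link to all of the same j "center" pages. The sketch below shows the basic prune-then-enumerate idea on a toy link list; it is an illustrative brute force, not the document's algorithm, and the function name `find_cores` is hypothetical. Pruning is safe because a fan of a real core always keeps its j centers and each center keeps at least i in-links.

```python
from collections import defaultdict
from itertools import combinations

def find_cores(edges, i, j):
    """Enumerate complete bipartite cores K_{i,j}: i fans that all link to j centers.

    Brute force after pruning; only practical on small graphs.
    """
    if i < 1 or j < 1:
        raise ValueError("i and j must be positive")
    out = defaultdict(set)
    for u, v in edges:
        if u != v:
            out[u].add(v)
    # Prune repeatedly: a center needs >= i in-links, a fan needs >= j kept out-links.
    changed = True
    while changed:
        changed = False
        indeg = defaultdict(int)
        for targets in out.values():
            for v in targets:
                indeg[v] += 1
        for u in list(out):
            kept = {v for v in out[u] if indeg[v] >= i}
            if len(kept) < j:
                del out[u]
                changed = True
            elif kept != out[u]:
                out[u] = kept
                changed = True
    # Enumerate fan sets among the survivors and their common targets.
    cores = []
    for fans in combinations(sorted(out), i):
        common = set.intersection(*(out[f] for f in fans))
        for centers in combinations(sorted(common), j):
            cores.append((fans, centers))
    return cores

page_links = [("x", "p"), ("x", "q"), ("y", "p"), ("y", "q"),
              ("z", "p"), ("z", "q"), ("w", "p")]
cores = find_cores(page_links, i=3, j=2)
```

Here `w` links to only one candidate center and is pruned, leaving the single core with fans `x`, `y`, `z` and centers `p`, `q`. On a web-scale graph the enumeration step would need a far more careful strategy than `itertools.combinations`.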
This document deals with web graph mining in which the nodes are hosts rather than individual web pages; the resulting graph is called a hostgraph.
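A hostgraph can be derived from page-level links by mapping each URL to its host. The sketch below is one plausible construction under assumptions that the document may not share: links within a single host are dropped, and parallel links between two hosts are merged into one edge weighted by their count. The function name `host_graph` is hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit

def host_graph(page_links):
    """Collapse (source URL, target URL) page links into weighted host-to-host edges.

    Assumptions: links within one host are dropped, and parallel links between
    two hosts are merged into a single edge whose weight counts them.
    """
    weights = Counter()
    for src, dst in page_links:
        src_host, dst_host = urlsplit(src).hostname, urlsplit(dst).hostname
        if src_host is None or dst_host is None or src_host == dst_host:
            continue  # relative or invalid URL, or an intra-host link
        weights[(src_host, dst_host)] += 1
    return dict(weights)

links = [("http://a.example/x", "http://b.example/y"),
         ("http://a.example/z", "http://B.example/w"),  # hostnames are case-insensitive
         ("http://a.example/x", "http://a.example/z")]
hosts = host_graph(links)
```

Relying on `urlsplit(...).hostname` gives lowercased hostnames without ports, so `B.example` and `b.example` collapse to the same node.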
The power-law distribution of host in- and out-degrees is examined, and a variant of the copy model for generating the hostgraph is presented.
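For orientation, the sketch below implements a basic copying model of graph growth, not the document's variant, whose details are not given here. Each new node picks a random existing "prototype" and fills each of its `d` link slots either with a uniformly random existing node (probability `alpha`) or with the prototype's link in that slot. The parameters `d`, `alpha` and the complete starting graph are hypothetical choices. Copying tends to concentrate in-links on already popular nodes, which is why such models are used to produce heavy-tailed in-degree distributions.

```python
import random

def copy_model(n, d=3, alpha=0.2, seed=None):
    """Grow a directed graph with n nodes, each with d out-links, by copying.

    Each new node picks a uniformly random existing "prototype"; for each of its
    d link slots it links to a uniformly random existing node with probability
    alpha, and otherwise copies the prototype's link in that slot.
    """
    if n < d + 1:
        raise ValueError("n must be at least d + 1")
    rng = random.Random(seed)
    # Start with d + 1 nodes that all link to each other.
    out = [[w for w in range(d + 1) if w != v] for v in range(d + 1)]
    for t in range(d + 1, n):
        proto = out[rng.randrange(t)]
        out.append([rng.randrange(t) if rng.random() < alpha else proto[slot]
                    for slot in range(d)])
    # Duplicate links are possible; self-links are not (targets are always < t).
    return [(u, v) for u, targets in enumerate(out) for v in targets]

edges = copy_model(1000, d=3, alpha=0.2, seed=42)
```

Every node ends up with exactly `d` out-links, so the heavy tail, if any, shows up only in the in-degrees.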
J. Schlötterer, C. Seifert, and M. Granitzer. In: Machine Learning and Knowledge Extraction, pp. 237–251. Springer International Publishing, Cham (2017).
G. Styliaras and S. Christodoulou. In: HT '09: Proceedings of the Twentieth ACM Conference on Hypertext and Hypermedia. ACM, New York, NY, USA (July 2009).