Categories are pages used to group other pages on similar subjects. They help users find the pages they are looking for, even if they do not know whether those pages exist or what they are called.
Every page should belong to at least one category, and a page will often be in several. However, putting a page in too many categories may not be useful.
Wikipedia is a terrific knowledge resource, and many recent studies in artificial intelligence, information retrieval and related fields have used Wikipedia to endow computers with (some) human knowledge. Wikipedia dumps are publicly available in XML format, but they have a few shortcomings. First, they contain a lot of information that is often not needed when Wikipedia texts are used as a knowledge source (e.g., IDs of users who changed each article, timestamps of article modifications). Second, the XML dumps do not explicitly contain a lot of useful information that could be inferred from them, such as link tables, the category hierarchy, and the resolution of redirection links.
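As a rough illustration of how such information can be inferred from page text, the following Python sketch derives a link table, category memberships, and a redirect map from a tiny dump-like XML string. The sample XML, the regular expressions, and the names `SAMPLE_DUMP`, `build_tables`, and `resolve_redirect` are all assumptions made for this example. Real dumps use an XML namespace, are far too large to load into memory at once, and contain wiki markup that these simple patterns do not fully cover.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, heavily simplified stand-in for a real dump file.
SAMPLE_DUMP = """<mediawiki>
  <page><title>Cat</title><revision><text>A [[mammal]] kept as a [[pet|companion]]. [[Category:Felines]]</text></revision></page>
  <page><title>Kitty</title><revision><text>#REDIRECT [[Cat]]</text></revision></page>
  <page><title>Category:Felines</title><revision><text>[[Category:Mammals]]</text></revision></page>
</mediawiki>"""

# Captures the target of [[Target]], [[Target|label]] or [[Target#section]].
LINK_RE = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")
# Captures the target of a redirect page such as "#REDIRECT [[Target]]".
REDIRECT_RE = re.compile(r"^\s*#REDIRECT\s*\[\[([^\]|#]+)", re.IGNORECASE)


def local_name(tag):
    """Drop an XML namespace prefix such as '{http://...}page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(xml_text):
    """Yield (title, wikitext) for every <page> element."""
    root = ET.fromstring(xml_text)
    for page in root.iter():
        if local_name(page.tag) != "page":
            continue
        title, text = None, ""
        for node in page.iter():
            name = local_name(node.tag)
            if name == "title":
                title = node.text
            elif name == "text":
                text = node.text or ""
        yield title, text


def build_tables(xml_text):
    """Return (links, categories, redirects) inferred from the page text."""
    links = defaultdict(set)       # page title -> titles it links to
    categories = defaultdict(set)  # page title -> categories it belongs to
    redirects = {}                 # redirect title -> target title
    for title, text in iter_pages(xml_text):
        match = REDIRECT_RE.match(text)
        if match:
            redirects[title] = match.group(1).strip()
            continue
        for target in LINK_RE.findall(text):
            target = target.strip()
            if target.startswith("Category:"):
                categories[title].add(target)
            else:
                links[title].add(target)
    return links, categories, redirects


def resolve_redirect(title, redirects):
    """Follow a redirect chain to its end, stopping if a cycle is found."""
    seen = set()
    while title in redirects and title not in seen:
        seen.add(title)
        title = redirects[title]
    return title
```

Calling `build_tables(SAMPLE_DUMP)` returns the three tables. Because category pages can themselves carry category links, the `categories` table also holds the parent-child edges from which a category hierarchy can be built. For a full dump, a streaming parser such as `xml.etree.ElementTree.iterparse` would be a more realistic choice than `ET.fromstring`.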
C. Karande, K. Chellapilla, and R. Andersen. WSDM '09: Proceedings of the Second ACM International Conference on Web Search and Data Mining, pages 272-281. New York, NY, USA: ACM, 2009.