
The rapid growth of online investing and virtual investing-related communities (VICs) has a wide-ranging impact on research, practice, and policy. Given the enormous volume of postings on VICs, automated classification of messages to extract relevant content is critical. Classification is complicated by three factors: (a) the large number of irrelevant or "noise" messages (e.g., spam, insults), (b) the highly unstructured nature of the text (e.g., abbreviations), and (c) the wide variation in the relevance of messages for a given firm. We develop and validate an approach based on a variety of classifiers to identify: (1) "noisy" messages that bear no relevance to the topic, (2) messages that are relevant to the topic but contain no sentiment about the investment, and (3) messages that are relevant and contain sentiment. Preliminary results suggest that the approach shows sufficient promise for classifying messages.

