In this article, we will explore how to use Llama2 for topic modeling without needing to pass every single document to the model. Instead, we will leverage BERTopic, a modular topic-modeling technique that can use any LLM to fine-tune topic representations.
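To make this concrete, here is a minimal sketch of wiring an LLM into BERTopic via its `TextGeneration` representation wrapper. The prompt text is illustrative, `docs` stands in for your own corpus, and the Llama2 checkpoint name assumes you have access to it on the Hugging Face Hub; any text-generation pipeline would work the same way.

```python
from transformers import pipeline
from bertopic import BERTopic
from bertopic.representation import TextGeneration

# Illustrative prompt; [KEYWORDS] and [DOCUMENTS] are placeholders that
# BERTopic fills in with each topic's top keywords and representative documents.
prompt = (
    "I have a topic described by the following keywords: [KEYWORDS]. "
    "The topic contains these documents: [DOCUMENTS]. "
    "Give a short label for this topic."
)

# Assumed model name; Llama2 checkpoints are gated and require HF access.
generator = pipeline("text-generation", model="meta-llama/Llama-2-7b-chat-hf")
representation_model = TextGeneration(generator, prompt=prompt)

# Embedding and clustering proceed as usual; the LLM only sees a handful of
# keywords and representative documents per topic, never the full corpus.
topic_model = BERTopic(representation_model=representation_model)
topics, probs = topic_model.fit_transform(docs)  # `docs`: list of str
```

This is why the LLM does not need every document: BERTopic clusters the corpus first and only asks the model to label each resulting cluster.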
As Jacob Devlin, BERT's first author, put it: "I'm not sure what these vectors are, since BERT does not generate meaningful sentence vectors. It seems that this is doing average pooling over the word tokens to get a sentence vector, but we never suggested that this will generate meaningful sentence representations. And even if they are decent representations when fed into a DNN trained for a downstream task, it doesn't mean that they will be meaningful in terms of cosine distance (since cosine distance is a linear space where all dimensions are weighted equally)."
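To make the pooling the quote describes concrete, here is a minimal sketch of averaging BERT's token embeddings into a sentence vector and comparing two such vectors with cosine similarity. The model name and example sentences are illustrative; the point is only to show the operation, not to claim the resulting similarities are meaningful.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def mean_pooled_embedding(text: str) -> torch.Tensor:
    # Run BERT; last_hidden_state has shape (1, seq_len, hidden_size)
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # Average pooling over token embeddings, ignoring padding positions
    mask = inputs["attention_mask"].unsqueeze(-1)
    summed = (outputs.last_hidden_state * mask).sum(dim=1)
    return summed / mask.sum(dim=1)

a = mean_pooled_embedding("The cat sat on the mat.")
b = mean_pooled_embedding("A kitten rested on the rug.")
# Cosine similarity weights every dimension equally, which is exactly
# the assumption the quote warns about.
print(torch.nn.functional.cosine_similarity(a, b))
```

This is the gap that sentence-embedding models such as Sentence-BERT were designed to close: they are trained so that cosine similarity between pooled vectors is actually meaningful.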