We investigate the statistical filtering
of phishing emails, in which a classifier is
trained on characteristic features of existing
emails and is subsequently able to identify
new phishing emails with different contents.
We propose advanced email features generated
by adaptively trained Dynamic Markov
Chains and by novel latent Class-Topic Models.
On a publicly available test corpus, classifiers
using these features reduce the number
of misclassified emails by two-thirds
compared with previous work. Using a
recently proposed, more expressive evaluation
method, we show that these results are statistically
significant. In addition, we successfully
tested our approach on a non-public email
corpus with a real-life composition.
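The abstract does not say exactly how the Dynamic Markov Chain features are computed. The following is a minimal sketch, assuming the common compression-based approach: one bit-level DMC (in the style of Cormack and Horspool, with state cloning) is trained adaptively per class, and an email's features are its average code lengths in bits per byte under each class model. The class `DMC`, the thresholds `min_cnt1`/`min_cnt2`, the initial count, the single-state starting model, and `dmc_features` are all hypothetical illustrations, not the authors' implementation.

```python
import math


class DMC:
    """Bit-level Dynamic Markov Chain with state cloning (illustrative sketch).

    Assumption: the model starts from a single self-looping state (an order-0
    bit model); practical DMC coders often start from a larger byte-aligned
    structure. Thresholds and the initial count are hypothetical defaults.
    """

    def __init__(self, min_cnt1=2.0, min_cnt2=2.0, init_count=0.2):
        self.min_cnt1 = min_cnt1
        self.min_cnt2 = min_cnt2
        # Each state is [count_of_0, count_of_1, next_on_0, next_on_1].
        self.states = [[init_count, init_count, 0, 0]]
        self.state = 0

    def _update(self, bit):
        s = self.states[self.state]
        t_idx = s[2 + bit]
        t = self.states[t_idx]
        t_total = t[0] + t[1]
        # Clone the target state if this transition is used often and the
        # target is also reached often via other transitions.
        if s[bit] > self.min_cnt1 and t_total - s[bit] > self.min_cnt2:
            ratio = s[bit] / t_total  # < 1 because of the condition above
            clone = [t[0] * ratio, t[1] * ratio, t[2], t[3]]
            t[0] -= clone[0]
            t[1] -= clone[1]
            self.states.append(clone)
            t_idx = len(self.states) - 1
            s[2 + bit] = t_idx  # redirect this transition to the clone
        s[bit] += 1
        self.state = t_idx

    def train(self, data: bytes):
        """Adapt the model to one document, bit by bit (most significant bit first)."""
        self.state = 0
        for byte in data:
            for i in range(7, -1, -1):
                self._update((byte >> i) & 1)

    def code_length(self, data: bytes) -> float:
        """Average code length in bits per byte; does not modify the model."""
        state, total = 0, 0.0
        for byte in data:
            for i in range(7, -1, -1):
                bit = (byte >> i) & 1
                s = self.states[state]
                total -= math.log2(s[bit] / (s[0] + s[1]))
                state = s[2 + bit]
        return total / max(len(data), 1)


def dmc_features(email: bytes, models: dict) -> dict:
    """Hypothetical feature vector: code length of the email under each class model."""
    return {label: model.code_length(email) for label, model in models.items()}


if __name__ == "__main__":
    models = {"ham": DMC(), "phishing": DMC()}
    for _ in range(50):
        models["ham"].train(b"Hi team, the meeting moved to 3pm.")
        models["phishing"].train(b"Verify your account now at our login page.")
    print(dmc_features(b"Verify your account now at our login page.", models))
```

A lower code length under a class model means the email "looks like" that class to the model. A downstream classifier would consume these per-class values (or their difference) as input features; whether the paper combines them in this way is an assumption, not something the abstract states.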
Parallel or distributed mining; Cluster-based data mining algorithms and systems; Grid-based data mining algorithms and systems; Peer-to-Peer based data mining algorithms and systems; Data mining algorithms and systems based on parallel hardware platforms
Privacy is a micro concern, i.e., it refers to individual database records, while data mining tools aim to learn macro rules that hold for a large fraction of the database. Techniques that publish data while preserving the right balance between individual