We investigate the statistical filtering
of phishing emails, where a classifier is
trained on characteristic features of existing
emails and is subsequently able to identify
new phishing emails with different contents.
We propose advanced email features generated
by adaptively trained Dynamic Markov
Chains and by novel latent Class-Topic Models.
On a publicly available test corpus, classifiers
using these features are able to reduce
the number of misclassified emails by two
thirds compared to previous work. Using a
recently proposed, more expressive evaluation
method, we show that these results are statistically
significant. In addition, we successfully
tested our approach on a non-public email
corpus with a real-life composition.
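
As a rough illustration of how a Markov chain can yield email features, the sketch below trains one character-level chain per class and uses the per-character cross-entropy of a new message under each chain as a feature. It is a simplification under stated assumptions: it uses a fixed-order chain with add-one smoothing, not the adaptively trained Dynamic Markov Chains proposed here, and the class name, function names, and toy messages are hypothetical.

```python
import math
from collections import defaultdict


class CharMarkovModel:
    """Fixed-order character Markov chain with add-one smoothing.

    Illustrative stand-in only; not the adaptive Dynamic Markov Chain.
    """

    def __init__(self, order=2, alphabet_size=256):
        self.order = order
        self.alphabet_size = alphabet_size  # assumed symbol count for smoothing
        self.counts = defaultdict(lambda: defaultdict(int))
        self.totals = defaultdict(int)

    def train(self, texts):
        for text in texts:
            padded = " " * self.order + text
            for i in range(self.order, len(padded)):
                context = padded[i - self.order:i]
                self.counts[context][padded[i]] += 1
                self.totals[context] += 1

    def cross_entropy(self, text):
        """Average number of bits per character needed to encode `text`."""
        if not text:
            return 0.0
        padded = " " * self.order + text
        bits = 0.0
        for i in range(self.order, len(padded)):
            context = padded[i - self.order:i]
            # .get avoids creating new entries while scoring
            count = self.counts.get(context, {}).get(padded[i], 0)
            total = self.totals.get(context, 0)
            prob = (count + 1) / (total + self.alphabet_size)
            bits -= math.log2(prob)
        return bits / len(text)


def markov_features(text, ham_model, phish_model):
    """Cross-entropies under both class models, plus their difference."""
    ham_bits = ham_model.cross_entropy(text)
    phish_bits = phish_model.cross_entropy(text)
    return {"ham_bits": ham_bits, "phish_bits": phish_bits,
            "diff": ham_bits - phish_bits}


# Toy training data (hypothetical, for illustration only).
ham_model = CharMarkovModel()
ham_model.train(["meeting at noon tomorrow", "see you at lunch"])
phish_model = CharMarkovModel()
phish_model.train(["verify your account password now",
                   "your account is suspended verify now"])

features = markov_features("verify your account", ham_model, phish_model)
```

A positive `diff` means the message is encoded more cheaply by the phishing chain than by the ham chain; such values could be passed to a classifier alongside other features.
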
We have developed a system that enables
the detection of certain common salting
tricks that are employed by criminals. Salting
is the intentional addition or distortion of
content. In this paper, we describe a framework
to identify email messages that might
contain new, previously unseen tricks. To
this end, we compare the simulated perceived
email message text generated by our hidden
salting simulation system to the OCRed
text we obtain from the rendered email message.
We present robust text comparison
techniques and train a classifier based on the
differences between these two texts. In simulations,
we show that we can detect suspicious emails
with a high level of accuracy.
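
The comparison step can be sketched as follows. This is a minimal example under stated assumptions, not the robust comparison techniques of the paper: it tokenizes both texts, aligns them with Python's standard `difflib.SequenceMatcher`, and returns a few difference statistics that could serve as classifier inputs. The function names and the sample strings are hypothetical.

```python
import difflib
import re


def normalize(text):
    """Lowercase and keep alphanumeric tokens, so that OCR differences
    in whitespace and punctuation are ignored."""
    return re.findall(r"[a-z0-9]+", text.lower())


def difference_features(simulated_text, ocr_text):
    """Token-level difference statistics between the simulated perceived
    text and the OCRed text of the rendered message."""
    sim_tokens = normalize(simulated_text)
    ocr_tokens = normalize(ocr_text)
    matcher = difflib.SequenceMatcher(None, sim_tokens, ocr_tokens,
                                      autojunk=False)
    inserted = deleted = replaced = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":      # tokens only in the OCRed text
            inserted += j2 - j1
        elif tag == "delete":    # tokens only in the simulated text
            deleted += i2 - i1
        elif tag == "replace":   # tokens that differ between the texts
            replaced += max(i2 - i1, j2 - j1)
    total = max(len(sim_tokens), len(ocr_tokens), 1)
    return {
        "similarity": matcher.ratio(),
        "inserted_frac": inserted / total,
        "deleted_frac": deleted / total,
        "replaced_frac": replaced / total,
    }


# Hypothetical example: the simulation keeps two hidden tokens that the
# reader (and hence the OCR) never sees.
same = difference_features("Please verify your account",
                           "Please verify your account.")
salted = difference_features("Please verify xq your zz account",
                             "Please verify your account")
```

When the two texts agree, `similarity` is 1.0 and all fractions are zero; a large discrepancy suggests that the message uses a trick the simulation does not yet handle.
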