ScamSlam Project
Unsolicited communications currently account for more than 65% of all e-mail sent, with projections reaching the mid-80% range. While much “spam” is innocuous, a portion is engineered by criminals to prey upon, or “scam”, unsuspecting people. The senders of scam spam attempt to disguise their messages as non-spam and con recipients through a range of tactics, including pyramid schemes, securities fraud, and identity theft via phishing mechanisms (e.g. faux PayPal or AOL websites). To lessen the suspicion of fraudulent activity, the same individual, or collaborating group, varies the text of its scam messages and sends them under a variety of pseudonyms with different stories. In this paper, we introduce ScamSlam, a software system designed to learn the number of different authors that exist for a particular type of scam, as well as to identify which scam spam messages were written by which author. The system consists of two main components: 1) a filtering mechanism to separate scam from general spam and non-spam messages, and 2) a message normalization and clustering technique to relate different scam messages to one another. We test ScamSlam on a corpus of approximately 500 scam messages communicating the “Nigerian” advance fee fraud, and our results demonstrate that the filtering technique achieves over 99% accuracy. Furthermore, we discover that at least half of all scam messages are accounted for by 20 individuals.
Keywords: Spam, scam, Internet fraud, e-mail filtering, text analysis, text classification, Poisson classification models, single linkage clustering, information retrieval, semantic learning
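As a rough illustration of the second component (message normalization followed by single linkage clustering), the sketch below groups messages whose normalized word sets overlap strongly. It is not the ScamSlam implementation: the normalization rule, the Jaccard similarity measure, the 0.5 threshold, and the function names `normalize`, `jaccard`, and `single_linkage_clusters` are all assumptions made for this example. Cutting a single linkage hierarchy at a fixed similarity threshold is equivalent to taking connected components of the "similar enough" graph, which is what the union-find loop computes.

```python
import re
from itertools import combinations


def normalize(message):
    """Hypothetical normalization: lowercase and keep alphabetic word tokens as a set."""
    return frozenset(re.findall(r"[a-z]+", message.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (an assumed measure, not from the source)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def single_linkage_clusters(messages, threshold=0.5):
    """Group message indices by single linkage cut at `threshold`.

    Two messages end up in the same cluster if a chain of pairwise
    similarities >= threshold connects them.
    """
    token_sets = [normalize(m) for m in messages]
    parent = list(range(len(messages)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(messages)), 2):
        if jaccard(token_sets[i], token_sets[j]) >= threshold:
            parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i in range(len(messages)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    sample = [
        "Dear friend, I am the son of a late minister and need your help to transfer funds.",
        "Dear FRIEND: I am the son of a late general and need your help to transfer funds!",
        "Congratulations, you have won the lottery. Claim your prize today.",
    ]
    print(single_linkage_clusters(sample))
```

In this toy input the first two messages differ by a single word after normalization, so they fall into one cluster, while the third stays on its own. Each resulting cluster would then be read as a candidate author or collaborating group; a real corpus would likely need a more careful similarity measure and threshold.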