ScamSlam Project


An Architecture for Learning the Criminal Relations Behind Scam Spam

Unsolicited communications currently account for more than 65% of all sent e-mail, with projections reaching the mid-80% range. While much “spam” is innocuous, a portion is engineered by criminals to prey upon, or “scam”, unsuspecting people. The senders of scam spam attempt to disguise their messages as non-spam and con recipients through a range of tactics, including pyramid schemes, securities fraud, and identity theft via phishing mechanisms (e.g. faux PayPal or AOL websites). To lessen suspicion of fraudulent activity, the same individual or collaborating group varies the text of its messages and writes under a variety of pseudonyms with different stories. In this paper, we introduce ScamSlam, a software system designed to learn the number of different authors that exist for a particular type of scam, as well as to identify which scam spam messages were written by which author. The system consists of two main components: 1) a filtering mechanism to separate scam from general spam and non-spam messages, and 2) a message normalization and clustering technique to relate different scam messages to one another. We test ScamSlam on a corpus of approximately 500 scam messages communicating the “Nigerian” advance fee fraud, and our results demonstrate that the filtering technique achieves over 99% accuracy. Furthermore, we discover that at least half of all scam messages are accounted for by 20 individuals.

Keywords: Spam, scam, Internet fraud, e-mail filtering, text analysis, text classification, Poisson classification models, single linkage clustering, information retrieval, semantic learning
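
As an illustration of the second component (message normalization followed by single linkage clustering), the following is a minimal sketch and not the ScamSlam implementation. The tokenizer, the Jaccard similarity measure, the threshold value, and all function names are assumptions made for this example; the technical report should be consulted for the normalization and distance actually used. Cutting a single linkage hierarchy at a fixed similarity is equivalent to taking connected components of the graph that links every pair of messages at or above that similarity, which is what the union-find loop computes.

```python
import re
from itertools import combinations


def normalize(message):
    """Hypothetical normalization: lowercase, keep alphabetic word tokens only."""
    return set(re.findall(r"[a-z]+", message.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets (an assumed measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def single_linkage(messages, threshold=0.5):
    """Group message indices so that any pair with similarity >= threshold,
    directly or through a chain of such pairs, lands in the same cluster."""
    tokens = [normalize(m) for m in messages]
    parent = list(range(len(messages)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(messages)), 2):
        if jaccard(tokens[i], tokens[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(messages)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Under these assumptions, each resulting cluster would stand for one candidate author or collaborating group, and the number of clusters would serve as an estimate of the number of distinct authors. Single linkage chains clusters together through intermediate messages, so two messages with little direct overlap can still be grouped if a third message resembles both.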

Related Publications

  • E. Airoldi, B. Malin and L. Sweeney. Technologies to Defeat Fraudulent Schemes Related to Email Requests. AAAI Spring Symposium on AI for Homeland Security, 2005. (PDF)

  • E. Airoldi and B. Malin. Data Mining Challenges for Electronic Safety: The Case of Fraudulent Intent Detection in E-mails. In Proceedings of the Workshop on Privacy and Security Aspects of Data Mining, in conjunction with the IEEE International Conference on Data Mining. Brighton, England: November 2004. (PDF)

  • E. Airoldi and B. Malin. ScamSlam: An Architecture for Learning the Criminal Relations Behind Scam Spam. Carnegie Mellon University, School of Computer Science, Technical Report CMU-ISRI-04-121. Pittsburgh: May 2004. (PDF) (PS)



Fall 2005 Data Privacy Lab