One of the main tasks in email exchange is filtering out malicious and spam messages that do not deserve the user’s attention. Spam filtering algorithms are designed to detect such messages. Read on to learn how they work and which spam filtering algorithms are the most popular.

Data Preparation

Most traditional and cloud-based spam filtering systems, like the one described at https://cleantalk.org/help/cleantalk-spam-firewall, use filtering algorithms to detect spam emails. To do so, they first determine the key characteristics of an email on which filtering is based. These usually include:

  • duplicate words and phrases;
  • the total number of characters, punctuation marks, numbers, spaces, words;
  • the frequency of each character;
  • average word and sentence lengths;
  • the diversity of the vocabulary;
  • the number of unique words.
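
The characteristics listed above could be computed roughly as follows. This is a minimal sketch, not the method of any particular filter: the function name `extract_features`, the regular expressions used to split words and sentences, and the reading of "diversity" as the ratio of unique words to total words are all assumptions made for illustration.

```python
import re
from collections import Counter


def extract_features(text: str) -> dict:
    """Compute simple text statistics of the kind listed above (illustrative)."""
    # Assumption: a "word" is a run of ASCII letters or apostrophes.
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Assumption: sentences end with '.', '!' or '?'; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    word_counts = Counter(words)
    unique_words = len(word_counts)
    return {
        "chars": len(text),
        "digits": sum(c.isdigit() for c in text),
        "spaces": sum(c.isspace() for c in text),
        "punctuation": sum(not c.isalnum() and not c.isspace() for c in text),
        "words": len(words),
        "unique_words": unique_words,
        # Number of distinct words that occur more than once.
        "repeated_words": sum(1 for n in word_counts.values() if n > 1),
        "avg_word_len": sum(map(len, words)) / len(words) if words else 0.0,
        "avg_sentence_len": len(words) / len(sentences) if sentences else 0.0,
        # Assumption: "diversity" is taken as unique words / total words.
        "diversity": unique_words / len(words) if words else 0.0,
        # Frequency of each character, case-insensitive.
        "char_freq": Counter(text.lower()),
    }
```

Calling `extract_features("Win cash now! Win big.")` returns a dictionary in which, for example, `words` is 5 and `unique_words` is 4. A real filter would typically feed such values into a classifier rather than inspect them directly.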

To obtain these characteristics, the algorithms rely on a dictionary of words, symbols, and other text elements that are most typical of spam. Based on the characteristics, a text can be assessed as spam or not, using several parameters that complement each other when making the decision.
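
A dictionary-based check of this kind might look like the sketch below. The dictionary `SPAM_TERMS`, its weights, and the function `dictionary_score` are hypothetical and chosen only to illustrate the idea; a production filter would build its dictionary from real data and combine the score with other parameters.

```python
import re

# Hypothetical spam dictionary: terms and weights are illustrative only.
SPAM_TERMS = {"free": 2.0, "winner": 3.0, "prize": 2.5, "urgent": 1.5, "$$$": 3.0}


def dictionary_score(text: str, terms: dict = SPAM_TERMS) -> float:
    """Sum the weights of dictionary terms found, divided by the token count."""
    # Tokens are lowercase words or runs of dollar signs.
    tokens = re.findall(r"[a-z']+|\$+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(terms.get(tok, 0.0) for tok in tokens)
    return hits / len(tokens)
```

A higher score means a larger share of the message consists of terms from the dictionary. On its own this is a weak signal, which is why it is normally one of several complementary parameters.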

The Choice of Algorithms

To filter spam messages, the characteristics obtained above are processed with text and data classification algorithms. These fall into two groups: classical algorithms and those based on artificial neural networks.

Classical algorithms are based on statistical data analysis and mathematical calculations. They include:

  • Naive Bayes classifier;
  • k-nearest neighbors method;
  • support vector machine (SVM);
  • genetic algorithms.
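
To make the first item on this list concrete, here is a minimal sketch of a Naive Bayes classifier with add-one (Laplace) smoothing, written in plain Python. The class name `NaiveBayesSpamFilter`, the whitespace tokenization, and the labels `"spam"` and `"ham"` are assumptions for illustration. The sketch also assumes that at least one message of each class has been seen before `predict` is called.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes over words; an illustrative sketch only."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}
        self.vocab = set()

    def train(self, text: str, label: str) -> None:
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def _log_score(self, words, label) -> float:
        total_docs = sum(self.doc_counts.values())
        # Log prior: share of training messages carrying this label.
        score = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        for w in words:
            # Add-one smoothing keeps unseen words from zeroing the product.
            p = (self.word_counts[label][w] + 1) / (total_words + len(self.vocab))
            score += math.log(p)
        return score

    def predict(self, text: str) -> str:
        words = text.lower().split()
        return max(("spam", "ham"), key=lambda lbl: self._log_score(words, lbl))
```

After training on a few labeled messages with `train(text, label)`, calling `predict(text)` returns whichever label has the higher log-probability. Working in log space avoids numeric underflow when many small probabilities are multiplied.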

Filtering algorithms based on artificial neural networks process the input through interconnected blocks of simple units (neurons). These algorithms include:

  • pattern recognition;
  • perceptron;
  • Kohonen neural network;
  • Kohonen self-organizing map.
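
As an illustration of the simplest model on this list, the sketch below trains a single-layer perceptron on numeric feature vectors. The function names, the learning rate, the number of epochs, and the example features (a count of spam terms and a count of exclamation marks) are assumptions for illustration, not values taken from any real filter.

```python
def train_perceptron(samples, labels, epochs=20, lr=0.1):
    """Train a perceptron.

    samples: list of equal-length numeric feature lists
    labels:  1 for spam, 0 for not spam
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            predicted = 1 if activation > 0 else 0
            error = y - predicted
            # error is 0 for a correct prediction, so nothing changes then.
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


def perceptron_predict(weights, bias, x):
    """Return 1 (spam) if the weighted sum is positive, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


# Hypothetical features per message: [spam-term count, exclamation marks]
samples = [[3, 2], [4, 1], [0, 0], [1, 0]]
labels = [1, 1, 0, 0]
weights, bias = train_perceptron(samples, labels)
```

A perceptron can only separate classes with a straight line (or hyperplane) in feature space, so it works here because the toy data is linearly separable. Harder cases call for multi-layer networks or the other models listed above.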

Modern spam filters used by Google or Yahoo are complex systems, but the basic idea behind them is simple: first train the computer to classify messages as spam or not spam, and then apply the resulting filter in practice.

First, emails are described by a set of parameters, and for each email in the training set it is known whether it is spam or not. Then the computer is trained to divide emails into two categories based on these parameters. The more carefully the parameters are selected, the more accurately the classifier will work.

In addition to analyzing the messages themselves, filters consider the period during which the recipient was active or inactive when deciding whether to mark a message as spam. For example, if there has been some activity over the past 90 days (such as opening emails), a message is treated as organic. Once 90 days have passed since the last activity, this stops, and reactivation actions are performed. The period may be shorter or longer, but the idea remains the same.
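
The activity check described above can be sketched as a simple date comparison. The function name `is_recipient_active`, the constant `ACTIVITY_WINDOW`, and the decision to treat exactly 90 days as already expired are assumptions for illustration. How "activity" is recorded is outside the scope of this sketch.

```python
from datetime import datetime, timedelta
from typing import Optional

# The 90-day example from the text; real systems may use another period.
ACTIVITY_WINDOW = timedelta(days=90)


def is_recipient_active(last_activity: Optional[datetime],
                        now: datetime,
                        window: timedelta = ACTIVITY_WINDOW) -> bool:
    """True if the last recorded activity falls strictly inside the window."""
    if last_activity is None:
        # No recorded activity at all: treat the recipient as inactive.
        return False
    # Assumption: once the full window has elapsed, the recipient is inactive.
    return now - last_activity < window
```

A filter could use the result as one more signal: messages to an active recipient are treated as organic, while an inactive recipient triggers the reactivation flow instead.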

Now you know how spam filtering works, so do not hesitate to use advanced anti-spam systems and share your recommendations with us.
