Detecting spam users automatically with a neural network
Spam Stopper
Build a neural network that uncovers spam websites.
Website builders – online hosting services that provide tools for nontechnical users to build their own websites – are frequently exploited by spammers looking for a convenient launching pad. Checking thousands, or sometimes millions, of web pages manually to look for evidence of a spammer is both tedious and inefficient.
In this article, I show how to build a suitable spamsearching neural network with help from Google's TensorFlow machine learning library [2] [3] and TFLearn [4], a library with a highlevel API for TensorFlow. Even if you don't spend your days searching for spammers, the techniques described in this article will give you some insights on how to harness the power of neural networks for other complex problems.
Training Day
The neural network needs both positive and negative samples in order to learn. This solution starts with a manually compiled list of sample users divided into spammers and legitimate users, taking care to distribute both types in equal numbers. Alongside this classification (spammer or not spammer), the data set contained the user's name or the website that belongs to the user, the IP address with which the site is registered, and the language version associated with the site.
As a result of the solution described here, the neural network now automatically recognizes new spammers as they register. The next step is to combine this automatic check with a manual check. A Python script automatically blocks sites that the network classifies with a very high probability of being spam, and an employee manually checks the sites that are deemed high probability.
Sound Network
Neural networks are mathematical models that can approximate any function. A neural network is guided by networked neurons similar to those in the human brain, such as in the visual cortex. What makes these networks special is that you do not have to model their behavior explicitly; instead, you train the network using sample data.
Neural networks help out when it is difficult to model functions manually, and they are often used in image and speech recognition. You need to provide the neural network with training data that has already been classified, and it will then attempt to classify new data in a similar way.
A single artificial neuron comprises several weighted inputs and an activation function, which is usually nonlinear and helps to determine the output value of the neuron. There is also a threshold value or bias, which complements the weighted inputs, thus influencing the activation function. The mathematical formula behind this concept is as follows:
The formula uses the w
vector to weight the input vector x
and calculate the sum of both. It then adds the bias b
, using the activation function phi. When developers skillfully combine several neurons, they can compute more complex functions (see the box titled "Solving Problems with Neural Networks").
Solving Problems with Neural Networks
A single neuron can already solve linearly separable problems. The binary OR function is an example of such a linearly separable problem. If you enter the possible inputs in a coordinate system, the two output values can be separated with a straight line (the top right neuron in Figure 1).
Few problems, however, are so easy to solve. A single neuron is typically not enough to make a classification. Full networks composed of neurons are used in practice, because more complex challenges can largely be split into separable subproblems, which individual neurons can then solve.
Figure 1 shows the binary exclusive OR, which proves not to be linearly separable. A single line is not sufficient to separate the two ones from the zeros. As the propositional logic is aware, the XOR function consists of a combination of two conjunctions:
Both the two conjunctions and the disjunction can in turn be separated linearly. It is therefore possible to model the binary exclusive OR with three neurons, with one of them receiving the outputs of the two others. This combination of neurons forms a small, twolayered neural network.
As the small example demonstrates, deep learning experts can calculate complex functions with ease by combining multiple neurons. The strength of a neural network increases with the number of layers used. The layers allow experts to compute more functions.
Networked Learning
Several layers of interconnected neurons form a neural network (Figure 2). These layers consist of at least an input layer, which receives the input values, and an output layer, on which the data arrives after passing through several hidden layers. All the neurons on a particular layer generally use the same activation function.
Neural networks learn through an optimization process that determines the parameters of the network, the weightings of the connections, and the bias of all the neurons, then refines these values step by step. The process that determines parameters is one of the optimization problems. This process involves the use of many traditional numerical analysis methods (e.g., the gradient method [5]).
The script first initializes the network using random parameters. Next, the script applies the training data set to the neural network and determines the difference between the network's results and the correct results from the training data. The gap between these results is the loss, which the script attempts to minimize in the course of the optimization process.
Buy this article as PDF
(incl. VAT)
Buy Linux Magazine
Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Subscribe to our ADMIN Newsletters
Support Our Work
Linux Magazine content is made possible with support from readers like you. Please consider contributing when you've found an article to be beneficial.
News

The GNU Project Celebrates Its 40th Birthday
September 27 marks the 40th anniversary of the GNU Project, and it was celebrated with a hacker meeting in Biel/Bienne, Switzerland.

Linux Kernel Reducing LongTerm Support
LTS support for the Linux kernel is about to undergo some serious changes that will have a considerable impact on the future.

Fedora 39 Beta Now Available for Testing
For fans and users of Fedora Linux, the first beta of release 39 is now available, which is a minor upgrade but does include GNOME 45.

Fedora Linux 40 to Drop X11 for KDE Plasma
When Fedora 40 arrives in 2024, there will be a few big changes coming, especially for the KDE Plasma option.

RealTime Ubuntu Available in AWS Marketplace
Anyone looking for a Linux distribution for realtime processing could do a whole lot worse than RealTime Ubuntu.

KSMBD Finally Reaches a Stable State
For those who've been looking forward to the first release of KSMBD, after two years it's no longer considered experimental.

Nitrux 3.0.0 Has Been Released
The latest version of Nitrux brings plenty of innovation and fresh apps to the table.

Linux From Scratch 12.0 Now Available
If you're looking to roll your own Linux distribution, the latest version of Linux From Scratch is now available with plenty of updates.

Linux Kernel 6.5 Has Been Released
The newest Linux kernel, version 6.5, now includes initial support for two very exciting features.

UbuntuDDE 23.04 Now Available
A new version of the UbuntuDDE remix has finally arrived with all the updates from the Deepin desktop and everything that comes with the Ubuntu 23.04 base.