Detecting spam users automatically with a neural network
Spam Stopper
Build a neural network that uncovers spam websites.
Website builders – online hosting services that provide tools for nontechnical users to build their own websites – are frequently exploited by spammers looking for a convenient launching pad. Checking thousands, or sometimes millions, of web pages manually to look for evidence of a spammer is both tedious and inefficient.
In this article, I show how to build a suitable spam-searching neural network with help from Google's TensorFlow machine learning library [2] [3] and TFLearn [4], a library with a high-level API for TensorFlow. Even if you don't spend your days searching for spammers, the techniques described in this article will give you some insights on how to harness the power of neural networks for other complex problems.
Training Day
The neural network needs both positive and negative samples in order to learn. This solution starts with a manually compiled list of sample users divided into spammers and legitimate users, taking care to include both types in equal numbers. Alongside this classification (spammer or not spammer), the data set contains the user's name or the website that belongs to the user, the IP address with which the site is registered, and the language version associated with the site.
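One way to get equal numbers of both classes is sketched below. The field names, the sample records, and the downsampling strategy are assumptions made for illustration; the article's list was compiled manually, and its actual schema is not shown.

```python
import random

# Made-up sample records; field names are assumptions, not the article's schema.
SAMPLES = [
    {"name": "alice-bakery", "ip": "192.0.2.10", "language": "en", "is_spammer": False},
    {"name": "bobs-bikes", "ip": "192.0.2.11", "language": "de", "is_spammer": False},
    {"name": "carols-choir", "ip": "192.0.2.12", "language": "en", "is_spammer": False},
    {"name": "cheap-pills-4u", "ip": "198.51.100.7", "language": "en", "is_spammer": True},
    {"name": "win-money-now", "ip": "198.51.100.8", "language": "en", "is_spammer": True},
]


def balance(samples, seed=0):
    """Downsample the larger class so both classes are equally represented."""
    spammers = [s for s in samples if s["is_spammer"]]
    legitimate = [s for s in samples if not s["is_spammer"]]
    size = min(len(spammers), len(legitimate))
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    balanced = rng.sample(spammers, size) + rng.sample(legitimate, size)
    rng.shuffle(balanced)  # avoid presenting one class as a single block
    return balanced


balanced = balance(SAMPLES)
```

Downsampling discards data, so it only makes sense when the smaller class is still large enough to learn from.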
As a result of the solution described here, the neural network now automatically recognizes new spammers as they register. The next step is to combine this automatic check with a manual check. A Python script automatically blocks sites that the network classifies with a very high probability of being spam, and an employee manually checks the sites whose spam probability is high but falls short of that level.
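The split between automatic blocking and manual review could look roughly like the following sketch. The two threshold values and the function name `triage` are hypothetical; the article does not state which probabilities count as "very high" or "high."

```python
# Hypothetical thresholds; the article does not state the actual values.
BLOCK_THRESHOLD = 0.99   # "very high" spam probability: block automatically
REVIEW_THRESHOLD = 0.80  # "high" spam probability: queue for manual review


def triage(spam_probability):
    """Map the network's spam probability to one of three actions."""
    if not 0.0 <= spam_probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if spam_probability >= BLOCK_THRESHOLD:
        return "block"
    if spam_probability >= REVIEW_THRESHOLD:
        return "review"
    return "accept"
```

Keeping the thresholds as named constants makes it easy to tighten or relax them as the manual reviews reveal how often the network is wrong.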
Sound Network
Neural networks are mathematical models that, given enough neurons, can in principle approximate any continuous function to arbitrary accuracy. They are modeled on the networked neurons of the human brain, such as those in the visual cortex. What makes these networks special is that you do not have to model their behavior explicitly; instead, you train the network using sample data.
Neural networks help out when it is difficult to model functions manually, and they are often used in image and speech recognition. You need to provide the neural network with training data that has already been classified, and it will then attempt to classify new data in a similar way.
A single artificial neuron comprises several weighted inputs and an activation function, which is usually nonlinear and helps to determine the output value of the neuron. There is also a threshold value or bias, which complements the weighted inputs, thus influencing the activation function. The mathematical formula behind this concept is as follows:
$$ y = \varphi\left(\sum_{i=1}^{n} w_i x_i + b\right) $$
The formula weights each component of the input vector x with the matching component of the weight vector w and sums the products. It then adds the bias b and passes the result to the activation function phi, which yields the neuron's output y. When developers skillfully combine several neurons, they can compute more complex functions (see the box titled "Solving Problems with Neural Networks").
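The weighted sum, the bias, and the activation function fit into a few lines of Python. This is a minimal sketch; the sigmoid is only one common choice of activation function, and the weights and inputs are arbitrary example values.

```python
import math


def sigmoid(z):
    """A common nonlinear activation function with outputs between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(weights, inputs, bias, activation=sigmoid):
    """Compute activation(sum of w_i * x_i + bias) for a single neuron."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return activation(weighted_sum + bias)


# Arbitrary example values: 0.5 * 1.0 + (-0.25) * 2.0 + 0.0 = 0.0
output = neuron([0.5, -0.25], [1.0, 2.0], 0.0)
```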
Solving Problems with Neural Networks
A single neuron can already solve linearly separable problems. The binary OR function is an example of such a linearly separable problem. If you enter the possible inputs in a coordinate system, the two output values can be separated with a straight line (the top right neuron in Figure 1).
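As a small illustration, a single neuron with a step activation reproduces the OR function. The weights and the bias are hand-picked assumptions; any values that place the separating line between (0, 0) and the other three points work just as well.

```python
def step(z):
    """Threshold activation: the neuron fires when z is not negative."""
    return 1 if z >= 0 else 0


def or_neuron(x1, x2):
    # Weights 1 and 1 with bias -0.5 describe the line x1 + x2 = 0.5,
    # which separates (0, 0) from the three inputs that yield 1.
    return step(1.0 * x1 + 1.0 * x2 - 0.5)


or_table = {(a, b): or_neuron(a, b) for a in (0, 1) for b in (0, 1)}
```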
Few problems, however, are so easy to solve. A single neuron is typically not enough to make a classification. Full networks composed of neurons are used in practice, because more complex challenges can largely be split into separable subproblems, which individual neurons can then solve.
Figure 1 shows the binary exclusive OR (XOR), which is not linearly separable: No single line can separate the two ones from the two zeros. Propositional logic, however, shows that XOR can be written as a disjunction of two conjunctions:
$$ x_1 \oplus x_2 = (x_1 \land \lnot x_2) \lor (\lnot x_1 \land x_2) $$
Both the two conjunctions and the disjunction can in turn be separated linearly. It is therefore possible to model the binary exclusive OR with three neurons, with one of them receiving the outputs of the two others. This combination of neurons forms a small, two-layered neural network.
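The three-neuron construction can be sketched as follows. The weights and biases are hand-picked for illustration rather than learned; other values that implement the same two conjunctions and the disjunction would also work.

```python
def step(z):
    """Threshold activation: the neuron fires when z is not negative."""
    return 1 if z >= 0 else 0


def xor_network(x1, x2):
    # First layer: two neurons, one per conjunction.
    h1 = step(x1 - x2 - 0.5)    # x1 AND NOT x2
    h2 = step(-x1 + x2 - 0.5)   # NOT x1 AND x2
    # Second layer: one neuron that ORs the two intermediate results.
    return step(h1 + h2 - 0.5)


xor_table = {(a, b): xor_network(a, b) for a in (0, 1) for b in (0, 1)}
```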
As the small example demonstrates, combining multiple neurons makes it possible to compute functions that a single neuron cannot. The expressive power of a neural network grows with the number of layers, because each additional layer extends the range of functions the network can compute.
Networked Learning
Several layers of interconnected neurons form a neural network (Figure 2). These layers consist of at least an input layer, which receives the input values, and an output layer, on which the data arrives after passing through several hidden layers. All the neurons on a particular layer generally use the same activation function.
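How values travel from the input layer to the output layer can be sketched in plain Python. The 2-3-1 layout and all parameter values below are arbitrary assumptions for illustration, not the network used in the article.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases, activation):
    """Compute one layer; weights holds one row of input weights per neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def network_forward(inputs, layers):
    """Feed the input values through every layer in turn."""
    values = inputs
    for weights, biases, activation in layers:
        values = layer_forward(values, weights, biases, activation)
    return values


# Hypothetical network: 2 inputs, 3 hidden neurons, 1 output neuron.
layers = [
    ([[0.1, 0.2], [-0.3, 0.4], [0.5, -0.6]], [0.0, 0.1, -0.1], sigmoid),
    ([[0.7, -0.8, 0.9]], [0.05], sigmoid),
]
result = network_forward([1.0, 0.0], layers)
```

Each layer in this sketch carries its own activation function, which mirrors the convention that all neurons on one layer share the same one.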
Neural networks learn through an optimization process that determines the parameters of the network (the weights of the connections and the biases of all the neurons) and then refines these values step by step. Determining the parameters is an optimization problem, which can be tackled with many traditional numerical analysis methods (e.g., the gradient method [5]).
The script first initializes the network using random parameters. Next, the script applies the training data set to the neural network and determines the difference between the network's results and the correct results from the training data. The gap between these results is the loss, which the script attempts to minimize in the course of the optimization process.
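The sequence of random initialization, loss calculation, and step-by-step refinement is sketched below for a single sigmoid neuron that learns the OR function. This is not the article's TFLearn code: plain batch gradient descent with a cross-entropy loss is swapped in for illustration, and the learning rate, epoch count, and seed are arbitrary assumptions.

```python
import math
import random

OR_SAMPLES = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, inputs):
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def loss(weights, bias, samples):
    """Mean cross-entropy between the outputs and the correct labels."""
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for inputs, target in samples:
        p = min(max(predict(weights, bias, inputs), eps), 1.0 - eps)
        total -= target * math.log(p) + (1 - target) * math.log(1.0 - p)
    return total / len(samples)


def train(samples, epochs=2000, learning_rate=0.5, seed=0):
    rng = random.Random(seed)
    # Step 1: initialize the parameters randomly.
    weights = [rng.uniform(-1.0, 1.0) for _ in samples[0][0]]
    bias = rng.uniform(-1.0, 1.0)
    initial_loss = loss(weights, bias, samples)
    for _ in range(epochs):
        # Step 2: apply the training data and accumulate the gradient.
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for inputs, target in samples:
            error = predict(weights, bias, inputs) - target
            for i, x in enumerate(inputs):
                grad_w[i] += error * x
            grad_b += error
        # Step 3: move the parameters against the gradient to reduce the loss.
        weights = [w - learning_rate * g / len(samples)
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(samples)
    return weights, bias, initial_loss, loss(weights, bias, samples)


weights, bias, initial_loss, final_loss = train(OR_SAMPLES)
predictions = [round(predict(weights, bias, x)) for x, _ in OR_SAMPLES]
```

Libraries such as TensorFlow compute the gradients automatically and offer more refined optimizers, but the underlying loop follows the same pattern.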