Detecting spam users automatically with a neural network
Future
The method described in this article has some limitations. Although a neural network can, in principle, approximate any function, however complex, the optimization process is not guaranteed to find the optimum solution. In that case, the network achieves only a low level of accuracy.
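To illustrate why optimization can miss the optimum, here is a minimal sketch that runs plain gradient descent on a toy loss curve with two valleys. The loss function, the learning rate, and the starting points are assumptions chosen for illustration; they are not taken from the article's listings, and a real network has many more parameters.

```python
# Toy example: gradient descent on a non-convex curve.
# The loss below is hypothetical and only serves to show two valleys.

def loss(x):
    """Non-convex toy loss with a deep valley on the left and a shallow one on the right."""
    return x**4 - 3 * x**2 + x


def gradient(x):
    """Analytic derivative of the toy loss."""
    return 4 * x**3 - 6 * x + 1


def gradient_descent(start, learning_rate=0.01, steps=1000):
    """Follow the negative gradient from `start` for a fixed number of steps."""
    x = start
    for _ in range(steps):
        x -= learning_rate * gradient(x)
    return x


# Same procedure, two different starting points.
x_left = gradient_descent(-2.0)
x_right = gradient_descent(2.0)

print("start -2.0 -> x = %.3f, loss = %.3f" % (x_left, loss(x_left)))
print("start +2.0 -> x = %.3f, loss = %.3f" % (x_right, loss(x_right)))
```

Both runs converge, but the run that starts on the right settles in the shallower valley and ends with a higher loss. Restarting training with different initial weights is one common way to reduce this risk.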
A further potential problem is unbalanced or contradictory training data in which, for instance, only the spammers happen to have hyphens in their names; the network may then learn this coincidence as if it were a rule. Large networks also carry the previously mentioned risk of overfitting, where the network learns the training data by heart but doesn't gain the ability to evaluate new, unknown data.
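Some of these problems can be spotted before or after training with a few simple checks. The following sketch assumes the training data is available as rows of binary features plus a label (1 for spammer, 0 for legitimate user). The feature names, the sample rows, and the accuracy figures are made up for illustration.

```python
from collections import Counter


def class_balance(labels):
    """Return the share of each label in the training set."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def one_class_features(rows, labels, feature_names):
    """List binary features that are set in only one class.

    Such a feature may be a genuine signal, or an accident of the sample.
    """
    suspicious = []
    for index, name in enumerate(feature_names):
        classes = {label for row, label in zip(rows, labels) if row[index]}
        if len(classes) == 1:
            suspicious.append(name)
    return suspicious


def overfitting_gap(train_accuracy, validation_accuracy):
    """A large positive gap hints that the network memorized the training data."""
    return train_accuracy - validation_accuracy


# Hypothetical sample: columns are (has_hyphen, has_digits).
feature_names = ["has_hyphen", "has_digits"]
rows = [(1, 1), (1, 0), (0, 1), (0, 0), (0, 1), (0, 0)]
labels = [1, 1, 0, 0, 0, 0]

balance = class_balance(labels)
suspicious = one_class_features(rows, labels, feature_names)
gap = overfitting_gap(train_accuracy=0.99, validation_accuracy=0.71)

print("class shares:", balance)
print("features seen in one class only:", suspicious)
print("train/validation accuracy gap: %.2f" % gap)
```

In this sample, the hyphen feature appears only among spammers, so it deserves a second look before the network is trusted on new users.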
Despite these limitations, the method described in this article lets you check far more pages than before, because the neural network pre-sorts potential spammers. If you find additional spammers manually, you can feed them into the network later as training data.
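The pre-sorting and feedback loop could look roughly like the following sketch. It assumes the trained network has already produced a spam score between 0 and 1 for each user; the user names, the scores, the threshold, and both helper functions are hypothetical and not part of the article's listings.

```python
def presort(users, scores, threshold=0.5):
    """Return users whose spam score reaches the threshold, highest score first."""
    flagged = [(score, user) for user, score in zip(users, scores) if score >= threshold]
    flagged.sort(key=lambda pair: pair[0], reverse=True)
    return [user for _, user in flagged]


def add_confirmed_spammers(training_set, confirmed):
    """Append manually confirmed spammers as positive examples (label 1).

    Users already present in the training set are skipped.
    """
    known = {name for name, _ in training_set}
    for name in confirmed:
        if name not in known:
            training_set.append((name, 1))
            known.add(name)
    return training_set


# Hypothetical scores, as they might come out of the trained network.
users = ["alice", "buy-pills-now", "bob", "cheap-watches"]
scores = [0.03, 0.97, 0.41, 0.88]

review_queue = presort(users, scores)
print("check these first:", review_queue)

# After manual review, confirmed spammers become new training data.
training_set = [("alice", 0), ("buy-pills-now", 1)]
training_set = add_confirmed_spammers(training_set, ["buy-pills-now", "cheap-watches"])
print("training set:", training_set)
```

Reviewers work through the queue from the top, and each confirmed spammer is added once as training data for the next round of training.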
Infos
- All listings for the article: http://www.linux-magazin.de/static/listings/magazin/2016/12/machine_learning/
- TensorFlow: https://www.tensorflow.org
- TensorFlow: Large-scale machine learning on heterogeneous systems (2015): http://download.tensorflow.org/paper/whitepaper2015.pdf
- TFLearn: http://tflearn.org
- Bengio, Yoshua, Practical recommendations for gradient-based training of deep architectures. In G. Montavon, G.B. Orr, and K.-R. Müller (eds.), Neural Networks: Tricks of the Trade, 2nd ed. Springer-Verlag, 2012, pp. 437-478
- Overfitting: https://www.ibm.com/developerworks/community/blogs/jfp/entry/Overfitting_In_Machine_Learning
- Installing TensorFlow: https://www.tensorflow.org/versions/r0.10/get_started/os_setup.html#pip-installation
- Installing TFLearn: http://tflearn.org/installation/