Developing a mailbot script
Address Catcher

Lead Image © Konstantin Inozemtcev, 123RF.com
A Python script that captures email addresses will help you understand how bots analyze and extract data from the web.
Bots constantly crawl the Internet, capturing information from public websites for later processing. Although bot design has become highly sophisticated, the basic steps for capturing data from an HTML page are simple. This article describes an example script that extracts email addresses. The script can also extend the search to the URLs found on the target page. Rolling your own bot will help you build a deeper understanding of privacy defense and cybersecurity.
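As a rough illustration of those basic steps, the following minimal sketch pulls email addresses and outgoing links out of an HTML string using only the Python standard library. It is not the article's mailbot.py: the names EMAIL_RE, LinkCollector, extract_emails, extract_links, and SAMPLE_HTML are hypothetical, and the regular expression is a deliberately simple approximation of address syntax.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Simplified pattern (an assumption for this sketch); valid address syntax
# is broader than what this expression accepts.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links such as /about.html into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_emails(html):
    """Return the unique email addresses found in the page, sorted."""
    return sorted(set(EMAIL_RE.findall(html)))


def extract_links(html, base_url):
    """Return the absolute HTTP(S) links found in the page, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    # Drop mailto:, javascript:, and other non-web schemes.
    return [link for link in parser.links
            if link.startswith(("http://", "https://"))]


# Hypothetical page content used only to exercise the functions.
SAMPLE_HTML = """
<html><body>
<p>Contact: <a href="mailto:info@example.com">info@example.com</a></p>
<p>Sales: sales@example.org</p>
<a href="/about.html">About</a>
<a href="https://example.net/team">Team</a>
</body></html>
"""
```

Calling extract_emails(SAMPLE_HTML) yields the two sample addresses once each, and extract_links(SAMPLE_HTML, "https://example.com/") yields the two web links that a bot could visit next to extend the search.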
Setting Up the Environment
I recommend setting up an integrated development environment, such as Visual Studio (VS) Code, for Python programming and having a basic understanding of the language. You can download VS Code from the VS Code website [1]. On Ubuntu, an easy way to install the application is to download the .deb package, right-click the file, and select the Install option. Alternatively, you can search for "vscode" in the App Center and click the Install button. If you prefer using the terminal, the VS Code website [2] provides detailed instructions for any Linux distribution. I also suggest adding Python development extensions, including Pylance and the Python Debugger.
The Script
The full text of the mailbot.py script is available on the Linux Magazine website [3]. Listing 1 shows the beginning of the script, where I import the modules I will need to communicate over HTTP, search for string patterns using regular expressions, implement asynchronous functions, manage script input arguments, and show a progress bar to track the script's progress. The alive_progress module is not part of the standard library, so I have to install it with the following command:
[...]
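To make the roles of those modules more concrete, here is a minimal sketch that combines argument parsing, HTTP retrieval, regular expressions, and asynchronous execution. It is not the article's Listing 1: it uses only standard-library modules, leaves out the progress bar, and every name in it (parse_args, fetch, scan, the --follow-links flag) is an assumption made for illustration.

```python
import argparse
import asyncio
import re
import urllib.request

# Simplified address pattern (an assumption, not taken from mailbot.py).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def parse_args(argv=None):
    """Parse command-line arguments; the flag name is hypothetical."""
    parser = argparse.ArgumentParser(
        description="Collect email addresses from a web page.")
    parser.add_argument("url", help="page to scan")
    parser.add_argument("--follow-links", action="store_true",
                        help="also scan URLs found on the page")
    return parser.parse_args(argv)


def fetch(url, timeout=10):
    """Download a page over HTTP and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


async def scan(urls, fetcher=fetch):
    """Fetch all URLs concurrently and return the unique addresses found.

    The blocking fetcher runs in worker threads via asyncio.to_thread
    (Python 3.9 or later). Pages that fail to download are skipped.
    """
    pages = await asyncio.gather(
        *(asyncio.to_thread(fetcher, url) for url in urls),
        return_exceptions=True,
    )
    found = set()
    for page in pages:
        if isinstance(page, Exception):
            continue  # ignore unreachable or broken pages
        found.update(EMAIL_RE.findall(page))
    return sorted(found)
```

A caller could tie the pieces together with args = parse_args() followed by asyncio.run(scan([args.url])). Passing a different fetcher makes it possible to try the logic without touching the network.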