Developing a mailbot script
Address Catcher

© Lead Image © Konstantin Inozemtcev, 123RF.com
A Python script that captures email addresses will help you understand how bots analyze and extract data from the web.
Bots crawl around constantly on the Internet, capturing information from public websites for later processing. Although the science of bot design has become quite advanced, the basic steps for capturing data from an HTML page are quite simple. This article describes an example script that extracts email addresses. The script even provides the option to extend the search to the URLs found on the target page. Rolling your own bot will help you build a deeper understanding of privacy defense and cybersecurity.
Setting Up the Environment
I recommend setting up an integrated development environment, like Visual Studio (VS) Code for Python programming, and having a basic understanding of the language. You can download VS Code from the VS Code website [1]. On Ubuntu, an easy way to install the application is by downloading the .deb
package, right-clicking the file, and selecting the Install
option. Alternatively, you can search for "vscode" in the App Center and click the Install
button. If you prefer using the terminal, the VS Code website [2] provides detailed instructions for any Linux distribution. I also suggest adding Python development extensions, including Pylance and the Python Debugger.
The Script
The full text of the mailbot.py
script is available on the Linux Magazine website [3]. Listing 1 shows the beginning of the script where I import the modules I will need to manage communications via the HTTP protocol, search for string patterns using regular expressions, implement asynchronous functions, manage script input arguments, and show a progress bar to track process advancement. The alive_progress
module is not part of the standard library, so I have to install it with the following command:
[...]
Buy this article as PDF
(incl. VAT)
Buy Linux Magazine
Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Subscribe to our ADMIN Newsletters
Support Our Work
Linux Magazine content is made possible with support from readers like you. Please consider contributing when you’ve found an article to be beneficial.

News
-
Red Hat Releases RHEL 10 Early
Red Hat quietly rolled out the official release of RHEL 10.0 a bit early.
-
openSUSE Joins End of 10
openSUSE has decided to not only join the End of 10 movement but it also will no longer support the Deepin Desktop Environment.
-
New Version of Flatpak Released
Flatpak 1.16.1 is now available as the latest, stable version with various improvements.
-
IBM Announces Powerhouse Linux Server
IBM has unleashed a seriously powerful Linux server with the LinuxONE Emperor 5.
-
Plasma Ends LTS Releases
The KDE Plasma development team is doing away with the LTS releases for a good reason.
-
Arch Linux Available for Windows Subsystem for Linux
If you've ever wanted to use a rolling release distribution with WSL, now's your chance.
-
System76 Releases COSMIC Alpha 7
With scores of bug fixes and a really cool workspaces feature, COSMIC is looking to soon migrate from alpha to beta.
-
OpenMandriva Lx 6.0 Available for Installation
The latest release of OpenMandriva has arrived with a new kernel, an updated Plasma desktop, and a server edition.
-
TrueNAS 25.04 Arrives with Thousands of Changes
One of the most popular Linux-based NAS solutions has rolled out the latest edition, based on Ubuntu 25.04.
-
Fedora 42 Available with Two New Spins
The latest release from the Fedora Project includes the usual updates, a new kernel, an official KDE Plasma spin, and a new System76 spin.