Ssscrape 1.0 Collects Dynamic Web Data
The Ssscrape tool screen-scrapes data from RSS and Atom feeds, blogs and podcasts. The open source software is now available in version 1.0.
Ssscrape tracks feeds and other collections for similar elements on updates, and downloads and cleans content by converting HTML to plain text. The database used is MySQL. The tool can also gather statistics about feed activities and report errors. A scheduler takes care of the periodic checks and a monitor displays the running activities.
Known as a Web crawler, a program that scrapes together information off the Web, Ssscrape is short for Syndicated and Semi-Structured Content Retrieval and Processing Environment. The Web scraper is written in Python with Twisted used for network programming and the not always standards-based Beautiful Soup used for parsing HTML/XML content.
Ssscrape was developed in the Information and Language Processing Systems (ILPS) department of the University of Amsterdam and is under LGPLv3 licensing. Ssscrape 1.0 requires Python 2.4 and is available for download as a tarball from the project page.
Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Subscribe to our ADMIN Newsletters
Support Our Work
Linux Magazine content is made possible with support from readers like you. Please consider contributing when you’ve found an article to be beneficial.

News
-
Go-Based Botnet Attacking IoT Devices
Using an SSH credential brute-force attack, the Go-based PumaBot is exploiting IoT devices everywhere.
-
Plasma 6.5 Promises Better Memory Optimization
With the stable Plasma 6.4 on the horizon, KDE has a few new tricks up its sleeve for Plasma 6.5.
-
KaOS 2025.05 Officially Qt5 Free
If you're a fan of independent Linux distributions, the team behind KaOS is proud to announce the latest iteration that includes kernel 6.14 and KDE's Plasma 6.3.5.
-
Linux Kernel 6.15 Now Available
The latest Linux kernel is now available with several new features/improvements and the usual bug fixes.
-
Microsoft Makes Surprising WSL Announcement
In a move that might surprise some users, Microsoft has made Windows Subsystem for Linux open source.
-
Red Hat Releases RHEL 10 Early
Red Hat quietly rolled out the official release of RHEL 10.0 a bit early.
-
openSUSE Joins End of 10
openSUSE has decided to not only join the End of 10 movement but it also will no longer support the Deepin Desktop Environment.
-
New Version of Flatpak Released
Flatpak 1.16.1 is now available as the latest, stable version with various improvements.
-
IBM Announces Powerhouse Linux Server
IBM has unleashed a seriously powerful Linux server with the LinuxONE Emperor 5.
-
Plasma Ends LTS Releases
The KDE Plasma development team is doing away with the LTS releases for a good reason.