Ssscrape 1.0 Collects Dynamic Web Data
The Ssscrape tool screen-scrapes data from RSS and Atom feeds, blogs and podcasts. The open source software is now available in version 1.0.
Ssscrape monitors feeds and other collections of similar items for updates, then downloads the new content and cleans it by converting HTML to plain text. Data is stored in a MySQL database. The tool can also gather statistics about feed activity and report errors. A scheduler handles the periodic checks, and a monitor displays the currently running activities.
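As a rough illustration of two of these steps, the following minimal sketch detects new items in an RSS 2.0 feed and converts their HTML descriptions to plain text. It is not Ssscrape's own code, which targets Python 2.4 and uses Twisted and Beautiful Soup. Instead it assumes modern Python 3 and uses only the standard library. The helper names `html_to_text` and `new_items` are hypothetical, using `guid` (falling back to `link`) as an item's identity is an assumption, and an in-memory set stands in for the MySQL storage a real tool would use.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. in data
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html):
    """Hypothetical helper: drop tags, keep text, collapse whitespace.

    Text nodes are joined with a space, so an inline tag inside a word
    may introduce an extra space; acceptable for a sketch.
    """
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(" ".join(parser.parts).split())


def new_items(rss_xml, seen_ids):
    """Hypothetical helper: return (id, title, text) for unseen RSS 2.0 items.

    seen_ids is updated in place (a stand-in for a database table),
    so a later periodic check reports only items that appeared since.
    """
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        # Assumption: guid identifies an item; fall back to link.
        item_id = item.findtext("guid") or item.findtext("link")
        if not item_id or item_id in seen_ids:
            continue
        seen_ids.add(item_id)
        title = item.findtext("title", default="")
        body = html_to_text(item.findtext("description", default=""))
        fresh.append((item_id, title, body))
    return fresh
```

Calling `new_items` repeatedly on successive downloads of the same feed, with the same `seen_ids` set, mimics the periodic update checks a scheduler would trigger: the first call returns every item, later calls only the new ones.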
Ssscrape, short for Syndicated and Semi-Structured Content Retrieval and Processing Environment, is a Web crawler: a program that gathers information from the Web. The scraper is written in Python, using Twisted for network programming and Beautiful Soup, which also copes with markup that is not standards-compliant, for parsing HTML/XML content.
Ssscrape was developed in the Information and Language Processing Systems (ILPS) department of the University of Amsterdam and is licensed under the LGPLv3. Ssscrape 1.0 requires Python 2.4 and is available for download as a tarball from the project page.