Aggregating data with Portia
Itsy, Bitsy Spider

Are you interested in retrieving stock quotes in machine-readable form off the Internet? No problem: After a few mouse clicks, Portia weaves a command line and wraps the data in JSON format.
The Internet is a treasure trove of useful information, but much of it resides on colorful HTML pages from which it cannot easily be extracted and processed. If you want to automate the processing of current stock quotes or aggregate news, for example, you need to dismantle the HTML code of news portals such as CNN or Slashdot. This can be pretty ugly work.
Portia, a tool written in Python [1], promises a remedy. It shares its name with a genus of spiders, which seems fitting for a tool that works the World Wide Web. The tool consists of a web application that lets a user select stock quotes, news items, or any other desired content with a simple click. Portia then extracts this data and outputs it in JSON format.
Supported by a bundled web crawler, Portia can also ransack complete websites. For example, if you need the headings from all Wikipedia articles, you show Portia exactly once where the heading resides on a Wikipedia page. The crawler then traverses the entire website and returns all matching headings in JSON format (see the "Warning" box for more information).
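To make the idea concrete, the following minimal sketch shows what "one extraction rule, applied to many pages, emitted as JSON" could look like if coded by hand with Python's standard library. It is not Portia's code or API. The class `HeadingExtractor`, the function `extract_headings`, the sample pages, and the example.org URLs are all hypothetical, and the rule itself (an `h1` element with the id `firstHeading`) is an assumption about how a Wikipedia-style page marks its heading.

```python
import json
from html.parser import HTMLParser

# Assumption: the page heading sits in <h1 id="firstHeading">...</h1>.
TARGET_TAG = "h1"
TARGET_ID = "firstHeading"


class HeadingExtractor(HTMLParser):
    """Collects the text of the first element matching the rule above."""

    def __init__(self):
        super().__init__()
        self._inside = False   # True while parsing inside the target element
        self._parts = []       # text fragments found inside the target
        self.heading = None    # final result, or None if nothing matched

    def handle_starttag(self, tag, attrs):
        if (self.heading is None and tag == TARGET_TAG
                and dict(attrs).get("id") == TARGET_ID):
            self._inside = True

    def handle_endtag(self, tag):
        if self._inside and tag == TARGET_TAG:
            self._inside = False
            self.heading = "".join(self._parts).strip()

    def handle_data(self, data):
        if self._inside:
            self._parts.append(data)


def extract_headings(pages):
    """Apply the same rule to every page; pages maps URL -> HTML source."""
    results = []
    for url, html in pages.items():
        parser = HeadingExtractor()
        parser.feed(html)
        results.append({"url": url, "heading": parser.heading})
    return results


# Hypothetical stand-ins for pages that a crawler would have downloaded.
SAMPLE_PAGES = {
    "https://example.org/wiki/Spider":
        '<html><body><h1 id="firstHeading"><span>Spider</span></h1>'
        "<p>Body text</p></body></html>",
    "https://example.org/wiki/Portia":
        '<html><body><h1 id="firstHeading"><i>Portia</i> (spider)</h1>'
        "</body></html>",
    "https://example.org/about":
        "<html><body><h2>No matching heading here</h2></body></html>",
}

if __name__ == "__main__":
    print(json.dumps(extract_headings(SAMPLE_PAGES), indent=2))
```

Even this toy version needs a hand-written parser class for a single field, and it breaks as soon as the site changes its markup. That is the kind of work a point-and-click tool aims to take off your hands.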
[...]