Aggregating data with Portia

Itsy, Bitsy Spider

Article from Issue 169/2014

Author(s): Tim Schürmann

Are you interested in retrieving stock quotes in machine-readable form off the Internet? No problem: After a few mouse clicks, Portia weaves a command line and wraps the data in JSON format.

The Internet is a treasure trove of useful information, often residing on colorful HTML pages that are not easily extracted and processed. If you want to automate processing of current stock quotes or aggregate news, for example, you need to dismantle the HTML code of news portals such as CNN or Slashdot. This can be pretty ugly work.

Portia, a tool written in Python [1], promises a remedy; its name also refers to a genus of spiders, which would seem to make sense on the World Wide Web. The tool consists of a web application that, with a simple click, allows a user to select stock quotes, messages, and any other desired content. Portia then extracts this data and outputs it in JSON format.

Supported by a supplied web crawler, Portia can also ransack complete websites. As an example, if you need the headings from all Wikipedia articles, you show Portia exactly once where the headline resides on a Wikipedia page. The crawler then traverses the entire website and returns all matching headings in JSON format (see the "Warning" box for more information).

[...]

Use Express-Checkout link below to read the full article (PDF).

Buy this article as PDF

Express-Checkout as PDF

Price $2.95
(incl. VAT)

Buy Linux Magazine

SINGLE ISSUES

Print Issues

Digital Issues

SUBSCRIPTIONS

Print Subs

Digisubs

TABLET & SMARTPHONE APPS

US / Canada

UK / Australia

Support Our Work

Linux Magazine content is made possible with support from readers like you. Please consider contributing when you’ve found an article to be beneficial.

News

Linux Kernel 6.16 Reaches EOL

Kernel , Linux

Linux kernel 6.16 has reached its end of life, which means you'll need to upgrade to the next stable release, Linux kernel 6.17.
Amazon Ditches Android for a Linux-Based OS

Linux , Operating Systems , Tools

Amazon has migrated from Android to the Linux-based Vega OS for its Fire TV.
Cairo Dock 3.6 Now Available for More Compositors

Desktop , graphics , Linux

If you're a fan of third-party desktop docks, then the latest release of Cairo Dock with Wayland support is for you.
System76 Unleashes Pop!_OS 24.04 Beta

COSMIC , Operating Systems , Pop!_OS

System76's first beta of Pop!_OS 24.04 is an impressive feat.
Linux Kernel 6.17 is Available

Games , Kernel , Linux

Linus Torvalds has announced that the latest kernel has been released with plenty of core improvements and even more hardware support.
Kali Linux 2025.3 Released with New Hacking Tools

Kali Linux , Linux , Operating Systems

If you're a Kali Linux fan, you'll be glad to know that the third release of this famous pen-testing distribution is now available with updates for key components.
Zorin OS 18 Beta Available for Testing

Linux , Operating Systems , Zorin OS

The latest release from the team behind Zorin OS is ready for public testing, and it includes plenty of improvements to make it more powerful, user-friendly, and productive.
Fedora Linux 43 Beta Now Available for Testing

Fedora , Gnome , Plasma

Fedora Linux 43 Beta ships with Gnome 49 and KDE Plasma 6.4 (and other goodies).
USB4 Maintainer Leaves Intel

Community , Kernel , Linux

Michael Jamet, one of the primary maintainers of USB4 and Thunderbolt drivers, has left Intel, leaving a gaping hole for the Linux community to deal with.
Budgie 10.9.3 Now Available

Budgie , Desktop , Linux

The latest version of this elegant and configurable Linux desktop aligns with changes in Gnome 49.

Aggregating data with Portia

Itsy, Bitsy Spider

Buy this article as PDF

Buy Linux Magazine

Related content

Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Subscribe to our ADMIN Newsletters

Support Our Work

News

Linux Kernel 6.16 Reaches EOL

Amazon Ditches Android for a Linux-Based OS

Cairo Dock 3.6 Now Available for More Compositors

System76 Unleashes Pop!_OS 24.04 Beta

Linux Kernel 6.17 is Available

Kali Linux 2025.3 Released with New Hacking Tools

Zorin OS 18 Beta Available for Testing

Fedora Linux 43 Beta Now Available for Testing

USB4 Maintainer Leaves Intel

Budgie 10.9.3 Now Available

Aggregating data with Portia

Itsy, Bitsy Spider

Buy this article as PDF

Buy Linux Magazine

Related content

Subscribe to our Linux Newsletters Find Linux and Open Source Jobs Subscribe to our ADMIN Newsletters

Support Our Work

News

Tag Cloud

Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Subscribe to our ADMIN Newsletters