Multilingual programming for retrieving web pages
Functional Node.js
If you want to retrieve a URL from a snippet of JavaScript in the browser, or do something similar in Node.js code on the server side (for example, on Amazon Lambda [2]), you need to switch your brain to functional programming mode. After all, event-based systems do not follow the paradigm of "Do this, wait until it is finished, then do that." Instead, they want to receive their instructions in the form of "Do this, then this, then this… and go."
The reason for this is the event loop, which can only run short callbacks and then wants control back. It steps in again whenever data slowly trickles in from external interfaces. This structure makes code harder to read and takes considerable experience in designing software components so that they interact well and remain easy to maintain.
The dreaded pyramid of doom [3], composed of nested callbacks, can be resolved by several helper constructs. Node 7.6 now even comes with support for the async and await keywords, which force asynchronous code into a synchronous straitjacket to make things look tidier [4].
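To make the async/await point concrete, here is a minimal sketch, assuming Node 7.6 or later. The helper names getBody and fetchInOrder are invented for illustration and do not come from the article's listings. getBody wraps the callback-driven http.get in a Promise so that await can be applied to it; fetchInOrder then reads top to bottom instead of nesting one callback inside the next.

```javascript
const http = require('http');

// Hypothetical helper: resolve with the response body, reject on any error.
function getBody(url) {
  return new Promise((resolve, reject) => {
    http.get(url, (res) => {
      let content = '';
      res.setEncoding('utf8');            // assumption: the body is UTF-8 text
      res.on('data', (chunk) => { content += chunk; });
      res.on('error', reject);
      res.on('end', () => resolve(content));
    }).on('error', reject);               // connection-level failures
  });
}

// Hypothetical helper: fetch several URLs one after the other.
// Each await hands control back to the event loop until the Promise settles,
// so nothing blocks, even though the code looks synchronous.
async function fetchInOrder(urls) {
  const bodies = [];
  for (const url of urls) {
    bodies.push(await getBody(url));
  }
  return bodies;
}
```

A call such as fetchInOrder(['http://example.com/']).then((bodies) => console.log(bodies[0])) would print the first page; the URL is only a placeholder.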
Listing 4 shows a get call of the HTTP module in Node.js. In addition to the URL for the web document, it expects a function. This is called later with a response object and defines a closure with a variable (content) and three callbacks for the events data, error, and end.
Listing 4
http-get.js
The data event gets triggered whenever a bunch of data arrives from the server. Its callback collects the data chunks one by one and reassembles them in the content variable. The error callback gets involved in case of an error and writes the reason to the log in line 11. When the server signals the end of the transmission, the event loop jumps to the end callback, which in line 15 outputs the content of content, where all the body data in the HTTP response is now located. Note that the core Node.js http module does not follow redirects on its own; a 3xx response is delivered to the callback like any other.
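The callback structure described above can be sketched as follows. This is not the article's Listing 4, so the line numbers mentioned in the text do not apply to it; the function name fetchBody and the done(err, body) callback convention are assumptions made for illustration.

```javascript
const http = require('http');

// Hypothetical helper: fetch a URL and pass the collected body to a
// Node-style callback, done(err, body).
function fetchBody(url, done) {
  const req = http.get(url, (res) => {
    let content = '';              // closure variable shared by the callbacks
    res.setEncoding('utf8');       // assumption: the body is UTF-8 text

    // 'data' fires for every chunk that arrives from the server
    res.on('data', (chunk) => { content += chunk; });

    // 'error' fires if the response stream breaks
    res.on('error', (err) => done(err));

    // 'end' fires once the server has sent the whole body
    res.on('end', () => done(null, content));
  });

  // Failures before a response exists (DNS, refused connection)
  // are reported on the request object instead.
  req.on('error', (err) => done(err));
}

// Usage (placeholder URL):
// fetchBody('http://example.com/', (err, body) => {
//   if (err) { console.error(err.message); return; }
//   console.log(body);
// });
```

Keeping content inside the response callback means every request gets its own buffer, which is what the closure buys you here.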
Good Old Perl
Good Old Perl traditionally retrieves web documents with the CPAN LWP::UserAgent module. SSL support is not automatic but gets magically added if the admin retroactively installs the CPAN LWP::Protocol::https module, which depends on the availability of an OpenSSL installation and a list of root certificates.
Listing 5 also shows a peculiarity, along with correct error handling: Like some other libraries presented here, LWP::UserAgent automatically follows redirects and identifies the encoding of google.de as ISO-8859-1, but it returns a UTF-8 string from decoded_content() (as opposed to content()). That is a good thing, because program code that processes the data often relies on UTF-8, and anything else leads to ugly, mangled text.
Listing 5
http-get.pl
To output a UTF-8 string unmodified using print, the script first needs to switch stdout to UTF-8 mode with the help of binmode. This rather elaborate procedure exists for compatibility reasons and at least ensures that old scripts from the early days of Perl's UTF-8 support don't freak out when they meet the new versions of Perl.
Yeah, old age is not a piece of cake, when all of your joints are aching and the young folks are turning somersaults!
Infos
[1] Listings for this article: ftp://ftp.linux-magazine.com/pub/listings/magazine/201
[2] "Equipping Alexa with Self-Programmed Skills" by Michael Schilli, Linux Magazine, issue 199, June 2017: http://www.linux-magazine.com/Issues/2017/199/Programming-Snapshot-Alexa
[3] "Pyramid of Doom" by Mike Schilli, Linux Magazine, issue 170, January 2015: http://www.linux-magazine.com/Issues/2015/170/Perl-Asynchronous-Code
[4] "Node 7.6 Brings Default Async/Await Support" by Sergio De Simone: https://www.infoq.com/news/2017/02/node-76-async-await