An introduction to Parrot
Birdsong
Parrot is an all-in-one tool for developing and executing new programming languages. Perl 6 runs on Parrot; chances are your language can run on it, too.
Are you exhausted from working on the same old code day after day? Is your programming language a clunker? Do you need a little more oomph to merge onto the information superhighway? Suffer no more. Coder's Kingdom has something for everyone, so come on down! Here's PHP 5: practical and great for the web with spunky bytecode acceleration. Or Ruby 1.9: this looker, fresh from the factory, is a real cloud-pleaser. Do you want to save a little cash? Try certified, pre-owned programming languages. Perl 5.8 is priced to move. Coder's Kingdom. Where every coder is king.
Coder's Kingdom might not exist, but choosing a programming language is strikingly similar to buying a car. Like a car, each programming language is designed with an intent, be it utility, speed, or size. Like a car, a language can offer exotic, cutting-edge features. Choosing a language is a significant investment, too, with potentially harmful and even disastrous consequences if the selection is ill considered. Additionally, each language includes mandatory parts – literals, variables, subroutines, and flow control instead of four tires, a steering wheel, and a windshield.
Indeed, the bulk of the work developing a new programming language is spent on (pardon the allusion) reinventing the wheel. The source of language must be decomposed by a parser, assembled into tokens, collated into statements, abstracted into syntax trees, and finally, either interpreted or converted into something executable, such as bytecode or binary.
Moreover, even once constructed, a new programming language is limited unless it provides a raft of supplemental capabilities, such as regular expressions and interfaces to system calls. Some overhead can be avoided by constructing facades around C libraries, such as the Perl Compatible Regular Expression (PCRE) library, but that too requires at least some effort. And ironically, language after language pursues such work because there is little other choice.
However, there are exceptions: Microsoft's long-standing .NET initiative permits many languages – C#, Visual Basic, C++, and open source languages ported to .NET – to share code because each is ultimately compiled to a common low-level target, Common Intermediate Language (CIL) [1]. JRuby is also an exception: JRuby runs Ruby code on top the Java Virtual Machine (JVM).
Still, the vast majority of computer programming languages remain isolated, like trying to put a Hemi into a Honda.
A Unified Approach
To speed the development of new languages, facilitate code sharing among all languages, and hasten the execution of dynamic languages such as Perl, Ruby, and Python, the Parrot development team has constructed a virtual machine and a suite of tools to compile and execute (conceivably) any computer language. Parrot [2] was initially conceived as an engine for Perl 6, but its charter was expanded because it's suitable for a vast class of code. As the Parrot website states, "In theory, you will be able to write a class in Perl, subclass it in Python, and then instantiate and use that subclass in a Tcl program."
Because Parrot is a one-stop compiler shop, it provides all the tools required to convert source code into a running program.
- The Parser Grammar Engine (PGE) defines the syntax of a programming language with rules. One rule might define the syntax of an assignment statement, whereas another might define the structure of a subroutine. PGE parses input, yields tokens, and attempts to match the stream of tokens against all the rules. A match indicates a valid clause in the language.
- Each rule, in turn, may call one or more actions. If you are writing a new programming language, the semantics of the language are expressed in actions. Each action transforms a match into a language-independent data structure called a Parrot Abstract Syntax Tree (PAST). PAST represents a statement as a tree, typically with operands as the leaves and operators, including control structures, as branches. Cumulatively, an entire program is ultimately represented by a PAST tree with the special node TOP at the root.
- Parrot Intermediate Representation (PIR) is easily written by hand and is the usual output of a compiler targeting Parrot. It abstracts away some low-level details to reduce the burden on programmers and compiler writers.
- Parrot Assembly (PASM) is more mechanical than PIR. It can be coded manually and can be generated by a compiler, but every detail, such as register allocation, must be attended to.
- The lowest level representation is Parrot Byte Code (PBC). Although you can think of this as machine code, it does not run on hardware. Instead, PBC is executed by Parrot.
Parrot can run PBC, PASM, PIR, and PAST files. PAST isn't suitable for manual coding, but it is ideal for integration between tools.
Rules
A rule is like a form you fill out in the doctor's office. Some of the fields are printed on the form and simply provide context; others fields are blank and must be filled out. In the rule if_statement, the subrules expression, block, and (the second block alias) else are the empty fields. As mentioned in the discussion about the Parrot object for an if-then-else, these subrules provide the values used to instantiate the object.
A traditional compiler like the ubiquitous GCC typically generates an abstract syntax tree as an intermediate form to deduce and apply optimizations. Adapting such a tree to PAST is straightforward, and because Parrot can read PAST, you can connect an existing compiler to Parrot with ease.
In addition to these components, Parrot also supplies a number of PAST nodes (think of a node as a template or object) to represent fundamental features found in every programming language.
For example, conceptually, Parrot includes a node for if-then-else, a construct found in all languages. To add such a node to your program's larger tree, you must provide a condition, a block of statements to execute if the condition is true, and, optionally, a block of statements to execute if the condition is false. No matter what the original syntax looked like, the if node captures the semantics of such a statement.
Other PAST nodes realize other features: a variable, including scope and type; a loop; and operators, including arithmetic and subroutine calls.
And that's the real power of Parrot. Most language features need not be reinvented because Parrot provides suitable abstractions for everyone to reuse, independent of syntax. However, if a language dictates something truly novel, you can focus on implementing a new PAST node to achieve its semantics.
Figures 1 through 5 picture the process of converting a C-like if statement to a PAST node. Figure 1 shows the source code; in this form, it is a sequence of characters that might or might not adhere to proper syntax.
Figure 2 shows the source code decomposed into tokens according to the rules of the parser. At this point, individual characters are no longer important; instead, an aggregate is now the coin of the realm.
Figure 3 pictures rules. Each rule dictates a piece of proper grammar. When a collection of tokens matches a rule, it is collected and formed into a rule, another kind of amalgam.
Each agglomeration step in the compilation process moves code from syntax closer to semantics. Figure 4 shows the action for the if_statement. Each amalgam in Figure 3 has already been reduced by its separate (sub)action and is now amassed for subsequent processing. At this point, each unit has purpose, but the logic of the conditional is not yet realized.
Figure 5 shows the final assembly after the action for the if_statement. The PAST node captures the intent and operation of the statement in a data structure. The final step, not shown, traverses the PAST depth-first (top-to-bottom and left-to-right) and generates the proper code. The code would load the first variable, then the second, and perform a comparison. The comparison either continues execution in sequence (with the next code, which implements a successful comparison, printing "A less than B") or branches to the code for false (printing "B is greater than or equal to A").
Coding for Parrot
Listing 1 shows a concrete example of the data discussed above. This example demonstrates some grammar for Squaak, a sample programming language included with the Parrot source code. In particular, Listing 1 is a rule that defines the syntax of a Squaak if-then-else statement. A valid statement must have a leading literal, the if; an expression, which is defined in its own eponymous rule (not shown); the keyword then; a block of statements, also defined in its own rule; and the keyword end. Those are mandatory phrases.
Listing 1
A Rule Defining Syntax
The phrase ['else' <else=block>]? is optional, as denoted by the trailing question mark, which means "one or zero" occurrences. If the phrase does appear, it must have the keyword else followed by another block. The notation <else=block> simply assigns an alias called else to the second block to differentiate it from the first.
If this rule matches, an action is called at the end, which is the purpose of the quirky looking {*} marker at the end of the rule.
An action is simply a subroutine. By default, Parrot calls the action with the same name as the matched rule. Furthermore, the action is called with arguments, one argument per subrule.
Listing 2 shows the action associated with the if_statement rule. Even without an elaborate description, its operation should be fairly obvious. Given a condition and a block to execute if the condition is true, the code builds a PAST for the statement. If the additional block is provided, it's amended to the PAST. The final statement make does the heavy lifting for you.
Listing 2
An Action Associated with a Rule
With Listing 2 in mind, the following code is the action for an assignment statement, such as x = 10:
method assignment($/) { my $rhs := $<expression>.ast; my $lhs := $<primary>.ast; $lhs.lvalue(1); make PAST::Op.new( $lhs, $rhs, :pasttype('bind'), :node($/) ); }
Listing 3 shows an example PIR program taken from the Parrot website. The application sums the square of numbers 1 through 10. In the code, indentation is used for readability, but not semantics. (In other words, the white space is not active.)
Listing 3
A PIR Program
Listing 3 looks like assembly language, with some exceptions that are conveniences provided in PIR. Specifically, temp = i * i is shorthand for mul temp, i, i. The code .local int i, total, temp uses friendly names instead of hard-wired registers for the three variables and uses .local to specify a scope. The assignment operator (=) is an alias for set, and goto is also a convenience. Its equivalent is le i, maxnum, loop.
To run the code shown in Listing 3, you can either interpret the PIR code or compile the PIR into PBC and execute that code instead. (See the "Building Parrot" box for instructions on how to build Parrot.)
$ parrot sum.pir The sum of the first 10 squares is 385. $ parrot -o sum.pbc sum.pir $ parrot sum.pbc The sum of the first 10 squares is 385.
Parrot can emit PASM source, if you're curious how PIR translates. To generate PASM, substitute -o somefilename.pbc with -o somefilename.pasm instead:
$ parrot -o sum.pasm sum.pir
Listing 4 shows the PASM equivalent of Listing 3.
Listing 4
PASM Source
Parrot can run PASM directly, too:
$ parrot sum.pasm The sum of the first 10 squares is 385.
Building Parrot
Building Parrot is surprisingly easy, assuming you have common software development tools on your Linux system. If you have Perl 5, the GNU Compiler Collection (GCC), and Subversion (or if you have Xcode installed on Mac OS X), the entire process requires just a few minutes.
Your first step is to check out the latest code with Subversion:
$ svn co https://svn.parrot.org/parrot/trunk parrot $ cd parrot
Next, configure the build with Parrot's special Configure.pl script. Typically, the defaults derived by the script work fine. However, if you want to customize your build, type perl Configure.pl --help to review the list of options. Most options are general, like the install option --prefix, but a handful affect Parrot features, such as which garbage collection strategy to apply.
$ perl Configure.pl Parrot Version 1.2.0 Configure 2.0 Copyright (C) 2001-2009, Parrot Foundation. Hello, I'm Configure. My job is to poke and prod your system to figure out how to build Parrot. The process is completely automated, unless you passed in the <C>--ask<C> flag on the command line, in which case I'll prompt you for a few pieces of info. Since you're running this program, you obviously have Perl 5--I'll be pulling some defaults from its configuration. ... Now you can use <C>make<C> to build your Parrot. After that, you can use <C>make test<C> to run the test suite.
To continue, compile the software and run its verification suite to ensure the stability of your build:
$ make $ make test
Assuming the tests pass, you can proceed and install the Parrot tools to your system:
$ sudo make install
By default, the Parrot software is installed in /usr/local. If you want to install it to your home directory, use perl Configure.pl --prefix=$HOME/parrot in the configuration step. The installation adds eight utilities to your system. The most important one is parrot, the Parrot virtual machine. To try it, use Listings 3 and 4 and the commands discussed previously.
Once you have the Parrot source code, you can keep up to date with svn update, with one caveat: You must do make realclean before any refresh.
$ make realclean $ svn update
If you skip the cleanup, Parrot can break or exhibit strange and unreproducible bugs.
One Talkative Bird
Apropos of its name, Parrot speaks lots of languages [3].
Rakudo Perl 6 [4] is an implementation of Perl 6 on Parrot. Cardinal [5] is a version of Ruby 1.0, and Pipp [6] is a version of PHP.
Other projects are moving Java, Lua, and Smalltalk to the Parrot engine, and 40 other projects are listed in the index.
As this issue of Linux Magazine/Linux Pro Magazine goes to press, the latest version of Parrot is 1.2.0. The software is stable and suitable for use as both a programming language and for virtual machine research and development. If you need a domain-specific language or want to extend an existing language, try Perl 6 now.
Take Parrot down from its perch.
Infos
- The Common Intermediate Language: http://en.wikipedia.org/wiki/Common_Intermediate_Language
- The Parrot homepage: http://www.parrot.org/
- Language implementations based on Parrot: http://www.parrot.org/languages/
- Rakudo Perl 6 on Parrot: http://www.rakudo.org/
- Cardinal, Ruby for Parrot: http://github.com/cardinal/cardinal/tree/master
- Pipp, PHP for Parrot: http://wiki.github.com/bschmalhofer/pipp/
Buy Linux Magazine
Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Subscribe to our ADMIN Newsletters
Support Our Work
Linux Magazine content is made possible with support from readers like you. Please consider contributing when you’ve found an article to be beneficial.
News
-
AlmaLinux 10.0 Beta Released
The AlmaLinux OS Foundation has announced the availability of AlmaLinux 10.0 Beta ("Purple Lion") for all supported devices with significant changes.
-
Gnome 47.2 Now Available
Gnome 47.2 is now available for general use but don't expect much in the way of newness, as this is all about improvements and bug fixes.
-
Latest Cinnamon Desktop Releases with a Bold New Look
Just in time for the holidays, the developer of the Cinnamon desktop has shipped a new release to help spice up your eggnog with new features and a new look.
-
Armbian 24.11 Released with Expanded Hardware Support
If you've been waiting for Armbian to support OrangePi 5 Max and Radxa ROCK 5B+, the wait is over.
-
SUSE Renames Several Products for Better Name Recognition
SUSE has been a very powerful player in the European market, but it knows it must branch out to gain serious traction. Will a name change do the trick?
-
ESET Discovers New Linux Malware
WolfsBane is an all-in-one malware that has hit the Linux operating system and includes a dropper, a launcher, and a backdoor.
-
New Linux Kernel Patch Allows Forcing a CPU Mitigation
Even when CPU mitigations can consume precious CPU cycles, it might not be a bad idea to allow users to enable them, even if your machine isn't vulnerable.
-
Red Hat Enterprise Linux 9.5 Released
Notify your friends, loved ones, and colleagues that the latest version of RHEL is available with plenty of enhancements.
-
Linux Sees Massive Performance Increase from a Single Line of Code
With one line of code, Intel was able to increase the performance of the Linux kernel by 4,000 percent.
-
Fedora KDE Approved as an Official Spin
If you prefer the Plasma desktop environment and the Fedora distribution, you're in luck because there's now an official spin that is listed on the same level as the Fedora Workstation edition.