From one to many: AsciiDoc converts a text file to various output formats

Single Source

© Lead Image © Galina Peshkova, 123RF.com

Article from Issue 171/2015
Author(s):

AsciiDoc syntax along with its eponymous command lets users create a text document with unobtrusive markup and convert it to a variety of output formats.

Write once, publish many – the idea behind AsciiDoc [1] is not new. The AsciiDoc syntax was created as a simple method of editing DocBook documents and has established itself as a more or less ubiquitous document format that acts as a source for a variety of other output formats.

AsciiDoc is both in wide use and actively developed. Even publishing companies accept manuscripts in this format or use it internally. The system comprises a source text and a converter that converts the source into the desired output.

With the -d switch, the asciidoc command accepts three document types: book, article, and manpage. Each comes with its own default front and back matter sections (Table 1), and article is the default type. With the -b switch, you select a back end, which determines the output format (Table 2).

Table 2

AsciiDoc Back Ends

Back End    Formats
docbook5    DocBook (current version), PDF
docbook45   DocBook (widespread version), PDF
xhtml11     XHTML 1.1 (default)
html4       HTML 4
html5       HTML5
slidy       (X)HTML presentations
wordpress   Websites, blogs, CMS
latex       Standard LaTeX (for fine-tuning PDF output)
epub        eBooks

Table 1

AsciiDoc Document Types

Type      Default Sections
book      dedication, preface, appendix, bibliography, glossary, index, colophon
article   abstract, appendix, glossary, bibliography, index
manpage   NAME*, SYNOPSIS*

* Mandatory sections
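The special sections in Table 1 are requested by placing the section style in square brackets above the section title. The following skeleton is a minimal sketch, assuming the syntax of the classic Python asciidoc (8.6.x); the title, author, and section texts are placeholders invented for illustration.

```asciidoc
= Sample Book
Jane Doe

// Special sections are selected with a style name in brackets.

[dedication]
== Dedication
For everyone who writes documentation.

[preface]
== Preface
Why this book exists.

// A regular chapter needs no style.

== First Chapter
The main text starts here.

[appendix]
== Extra Material
Supplementary notes.
```

You would convert such a file with the book document type, for example asciidoc -d book sample.txt, where sample.txt is a hypothetical filename.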

The design goal for the source markup was a format that is easy for humans to read but that still offers relatively advanced features: all of the usual tags and structures, hierarchical levels for structuring text, cross-references within the document, and URLs for external content. Beyond this, the software supports keywords, indexes, footnotes, bibliographic references, images, and tables.
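As a small illustration of indexes and footnotes, here is a sketch assuming the ((term)) and (((primary, secondary))) index notation and the footnote:[...] macro of the Python asciidoc implementation. The sentences are made up, and index entries typically only take effect in DocBook-based output, such as PDFs built with a2x.

```asciidoc
// ((term)) shows the term in the text and adds it to the index;
// (((primary, secondary))) only adds an index entry.
AsciiDoc is a ((markup language)) for plain text files.
(((DocBook, conversion)))
The syntax was originally designed to produce DocBook.footnote:[DocBook is an XML vocabulary for technical documentation.]
```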

Text

AsciiDoc builds on many popular conventions for marking up plain text files. You will recognize many of them and have probably used some already: for example, underlining a title with equals signs or a subheading with dashes, or marking a bullet point with an asterisk at the start of the line.

Paragraphs, probably the most important element for structuring text, are separated by an empty line between blocks of text. If the first line of a paragraph does not start in the first column (i.e., it is indented by at least one space or a tab), AsciiDoc treats the whole paragraph as literal text: It reproduces the text exactly as typed, without interpreting any markup, and sets it in a monospaced font in the output.

AsciiDoc also supports special paragraph styles: You put the style name in square brackets ([type]) on its own line, directly above the paragraph (Listing 1). AsciiDoc differentiates between four types (Table 3).

Listing 1

Listing Paragraph Style

 

Table 3

Paragraph Types

Instruction   Result
verse         Normal typeface, hard line breaks preserved, left justified
quote         Normal typeface, balanced typography
listing       Monospaced font, often with a background or border; hard line breaks preserved; left justified
literal       Monospaced font, hard line breaks preserved, left justified
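The four styles from Table 3 might be applied like this. This is a sketch assuming Python asciidoc 8.6.x, where each bracketed style name applies only to the paragraph directly below it; the sample text and the notes.txt filename are hypothetical.

```asciidoc
// Each bracketed style name applies to the paragraph below it.
[verse]
Line one of a short poem,
line two keeps its own line.

[quote]
This quotation is reflowed like normal body text.

[listing]
$ asciidoc -d article notes.txt
$ ls notes.html

[literal]
Preformatted text, kept
    exactly as typed.
```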

During conversion, AsciiDoc collapses multiple successive spaces in the body text into one. To insert extra spaces at a specific position, you need non-breaking spaces, which you generate by writing {nbsp} once or several times in succession. Inline markup for emphasis follows conventions similar to those of the popular Markdown syntax, although the details differ (Table 4).

Table 4

Basic Inline Markup

Instruction   Result
*Text*        Bold
_Text_        Italics
+Text+        Literal text in a monospaced font
`Text`        Literal text in a monospaced font
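A short sketch combining the inline markup from Table 4 with non-breaking spaces; the sentences are made up, and the syntax assumes Python asciidoc 8.6.x.

```asciidoc
This sentence contains *bold*, _italic_, and +monospaced+ words.

Type `asciidoc --help` to see the available options.

// Three non-breaking spaces keep the gap from being collapsed.
Total:{nbsp}{nbsp}{nbsp}42 pages
```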

There are two ways to mark the section levels that give a document its structure: single-line titles or two-line titles.

The converters recognize single-line titles (Table 5) correctly only if the title really does occupy just one line; the closing equals signs are optional. Alternatively, you can mark the level by underlining the title with different characters: equals signs for the first level, dashes for the second, then tildes, carets, and finally plus signs for the lowest level (Listing 2).

Listing 2

Document Headings

 

Table 5

Layers

Instruction          Result
= Text =             Document title
== Text ==           Chapter
=== Text ===         Section
==== Text ====       Subsection
===== Text =====     Sub-subsection
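The two-line notation might look like this for the first four levels; the fifth level uses plus signs in the same way. This is a sketch with placeholder titles, and each underline deliberately has exactly the same length as its title, which is the safe choice. The block is equivalent to single-line titles from = Document Title down to ==== Subsection Title.

```asciidoc
Document Title
==============

Chapter Title
-------------

Section Title
~~~~~~~~~~~~~

Subsection Title
^^^^^^^^^^^^^^^^
```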

Additionally, you can insert anchors (i.e., labels) in front of headings:

[[Label]]
=== Section heading

The heading syntax is fairly flexible, because the same structure can be written in more than one notation in the source.

Being able to omit the closing markup is convenient, but it can make it harder to track down the errors that cause a conversion to fail. Two-line titles cause similar trouble: In practice, they tend to produce far more problems than their single-line counterparts.

AsciiDoc reserves the first lines of a document (the header) for special purposes. In addition to the document title, you can enter the author's name, an email address, a version number, and a date. The software uses these details both as the document's metadata and as the basic document information in the output.
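A header with these details might look like the following sketch, assuming the usual author line and revision line layout of Python asciidoc 8.6.x. The name, address, version, and date are hypothetical.

```asciidoc
= User Manual
Jane Doe <jane.doe@example.com>
v1.0, 2015-01-15

The body text starts after the first empty line.
```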

References and Links

If you need references within the document, [[Label]] creates the anchor that you can point to at any position in the text using <<Label>> or <<Label,Text>>. Although AsciiDoc relies fully on Unicode, non-standard characters in the label can cause problems during conversion.

The optional Text in the reference mitigates the problem somewhat: The program displays it at the point where the reference appears in the text, and it can contain arbitrary characters.

If the reference text is missing, AsciiDoc inserts the label itself or a different element during conversion. The rules for what the software uses here vary: For sections, for example, the default is the heading text.
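Anchors and references might be combined like this. The label installation and the surrounding text are hypothetical; keeping labels to plain ASCII letters avoids the conversion problems mentioned above.

```asciidoc
[[installation]]
== Installation

Unpack the archive first.

== Usage

Read <<installation>> before you start, or jump back to
<<installation,the installation steps>> if something fails.
```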

You can reference other documents in one of these forms:

link:filename#ID[]
link:filename#ID[Text]

If necessary, the filename can include a path. URLs use a simple syntax:

http://address
http://address[Text]

The program detects both forms automatically from the http: prefix. The same applies to email addresses:

mailto:address
mailto:address[Text]
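In running text, the three link types might look like this. The file install.html, the anchor prerequisites, and the example.com addresses are placeholders, not real targets.

```asciidoc
See link:install.html#prerequisites[the prerequisites] first.

The project page at http://www.example.com/ is detected automatically,
and http://www.example.com/docs/[the documentation] gets a custom text.

Send questions to mailto:docs@example.com[the documentation team].
```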

AsciiDoc supports two ways of embedding images: inline, within the body text, or as a separate block (i.e., an image paragraph). For images in the body text, use the image: keyword:

image:Filename[Text]

The square brackets are mandatory, even if the text is missing. Besides the text, they can hold additional attributes for the image, for example, height=14pt to set its size.

If you put images in separate paragraphs, use block macros. The macro starts with image followed by two colons, then the filename (with an optional path) and square brackets. To add an image caption, use the caption= keyword with the caption text in double quotes. Alternatively, and more simply, you can use the following form, which also defines a label for the image:

[[Label]]
.Text
image::Filename[Options]
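Both image variants side by side, as a sketch assuming Python asciidoc 8.6.x attribute syntax; the file paths, the label fig-architecture, and the texts are hypothetical.

```asciidoc
// Inline image: alternative text first, then optional attributes.
Click the image:icons/save.png["Save icon", height=16] button to save.

// Block image with an anchor and a title line.
[[fig-architecture]]
.System architecture
image::images/architecture.png[Architecture diagram]

<<fig-architecture>> shows the overall structure.
```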

Lists are easy to format in AsciiDoc. Bulleted list items begin with a single dash or an asterisk. To create nested lists, increase the number of leading asterisks (up to five) to indent further.

Numbered lists follow a similar principle, and AsciiDoc changes the numbering style at each level (Listing 3). An even more elegant approach is to type dots instead of numbers: You can put one to five dots at the start of a line to create numbered items at up to five nesting levels.

Listing 3

Numbered Lists

 

You can mix and nest the list types, but you need to pay close attention to empty lines so that AsciiDoc can tell which elements belong together and where a list ends. Because this often causes difficulties, it is useful to save the lists in a separate file first and convert them as a sanity check. If that works, you can then add the code to the document.
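A sketch of both list types with nesting, assuming Python asciidoc 8.6.x; the items are placeholders. The ordinary paragraph between the two lists, preceded by an empty line, ends the bulleted list so that the numbered list is not nested inside it.

```asciidoc
* Fruit
** Apples
** Pears
* Vegetables
** Carrots

The steps, in order:

. Write the source text
.. Mark up the headings
.. Check the lists
. Convert the document
```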

Delimited blocks are another kind of structural element; they apply specific formatting, for example, to display source code. In LaTeX-speak, these elements are known as "environments." A number of delimited blocks are already defined in asciidoc.conf. Each of these environments usually has several variants; however, not all of them are available for all output formats, and their appearance can differ. Listing 4 shows some of the basic structures. A block consists of the desired text enclosed by two lines of special characters, such as asterisks or dashes: one above the text and one below.
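A few of the predefined blocks might look like the following sketch, assuming the default asciidoc.conf of Python asciidoc 8.6.x, where each delimiter line consists of four or more identical characters. The block contents, the notes.txt filename, and the attribution are placeholders.

```asciidoc
// Listing block: dashes
----
$ asciidoc -b html5 notes.txt
----

// Literal block: dots
....
Text kept exactly as typed.
....

// Sidebar block: asterisks
****
A boxed aside.
****

// Example block: equals signs
====
An example.
====

// Quote block with an attribution: underscores
[quote, Anonymous]
____
Write once, publish many.
____
```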

Listing 4

Predefined Blocks

 

On the line that precedes the first line of a block, you define the environment style. However, many of the styles do not work with all output formats, which can cause the conversion process to exit prematurely.

For example, the open block structure, which starts and ends with two dashes,

[Style]
--
Code
--

is used for summaries in the abstract style or for introductory text to parts of a publication in the partintro style. In a list, a block delimited by double dashes lets you attach an open block to a list item [2].
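Attaching an open block to a list item might look like this sketch; the + line is the list continuation that joins the open block to the item above it. The item texts are placeholders.

```asciidoc
* First item
+
--
This paragraph belongs to the first item.

So does this one, despite the empty line above.
--
* Second item
```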

The AsciiDoc table format gives you useful results for simple and short content. Listing 5 shows the source code for a simple table. All the major settings are options that go in the square brackets before the table. The table itself is created with pipe and equals signs. Although this works well for short tables, it can be error prone for longer ones.

Listing 5

AsciiDoc Tables

 

The table options fall into several groups (Table 6). In AsciiDoc, tables can optionally have a header, typically for the column headings, and a footer. Both receive special formatting if you enable them with the header and footer options.

Table 6

Lines and Columns

Option    Values                                   Function
grid      none, cols, rows, all                    Table grid lines
frame     topbot, none, sides, all                 Table border
options   header, footer                           Header, footer
format    psv, csv, dsv                            Separator character for columns
valign    top, bottom, middle                      Vertical alignment within table cell
width     Between 1% and 100%                      Table width
cols      Multiplier (*), alignment, width, style  Column description
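Several of the options from Table 6 can be combined in one attribute list. The following is a sketch assuming Python asciidoc 8.6.x, where < aligns a column left, ^ centers it, and > aligns it right, and the number after the alignment sets the relative width. The cell contents are placeholders.

```asciidoc
[options="header,footer", cols="<2,^1,>1", width="60%", frame="topbot", grid="rows"]
|===========================
|Back end  |Output   |Notes
|html5     |HTML5    |Fast preview
|docbook45 |DocBook  |Input for PDF tools
|Total     |2 back ends |
|===========================
```

With options="header,footer", the first row becomes the header and the last row the footer.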

Defining the style for the columns causes a number of side effects; you can try them out with the example in Listing 5 by changing the column format from a to v. Formatting individual cells quickly makes the source extremely messy; Figure 1 shows an example of what you will want to avoid.

Figure 1: Although AsciiDoc supports a wide range of table formats, the source code can become very difficult to read.

Problems

Errors in AsciiDoc documents do not become apparent until you try to convert them. Some editors, including Emacs, offer special modes for AsciiDoc source that catch at least simple oversights while you type; however, even these helpers cannot find more complex logical or syntactic errors.

The AsciiDoc documentation [3] is fairly sparse, so it makes a lot of sense to look at a few examples before you start on your own projects. Some documents are available on the website, including an article that explains the use of indexes [4], a variant of the User Guide [5] formatted as an article, and a template for a book [6] that includes more complex structures, such as a reference list, a colophon, and a dedication.

Generating EPUBs or PDFs with the asciidoc script and the a2x toolchain takes a surprisingly long time, even for relatively simple source text. The -v option gives you more detailed information on what is taking place.

Things look a little better in the case of text-only or HTML output; however, these formats only support a subset of the available options, so they are only useful for previewing documents at best.

To minimize the time overhead, you can use the shell's && operator to chain the creation of an HTML file and a more complex format; the second command only runs if the first one succeeds. This way, you discover and resolve syntax errors with the faster HTML conversion before the final format is built.

That said, a successful conversion to HTML is no guarantee that a PDF file will give you the desired results. In fact, it is quite difficult to create PDFs with an attractive layout directly in AsciiDoc (Figure 2). The workaround is to export to LaTeX. AsciiDoc generates standard LaTeX, which you can then convert to a PDF document after manual editing.

Figure 2: Even documents that work well with dblatex and Apache FOP (Formatting Objects Processor) still exhibit major differences in their PDF output.

The advantage of the workaround is that you have access to all of LaTeX's options with minimal additional overhead. AsciiDoc only supports a small subset of the formats available in LaTeX; among other things, it does not support picture environments, which either rules out their use or forces you to do some post-editing.
