Customizing file formats with unoconv

Flexible Import/Export

© Photo by Hudson Hintze on Unsplash

Article from Issue 208/2018

A hidden utility in the LibreOffice toolbox, unoconv offers a wide array of import and export filter options for use at the command line.

LibreOffice is designed to save, import, or export one file at a time, using standard filter settings. The File menu lets you choose PDF export options, but for most other file types, you must accept the default filter settings. If you want to convert multiple files or adjust the filter settings, you need to shift to the command line and run unoconv [1], a little-known Python script that gives you greater control through a wide array of import and export filter options.

Unoconv is short for Universal Network Objects (UNO) conversion, a reference to the UNO API used by both LibreOffice and OpenOffice [2]. UNO bindings are available for several languages, including C++, Java, and Python, and the API is used to create extensions, as well as to provide support for formats not visible in the LibreOffice desktop window, such as the obsolete OpenOffice.org 1.0 file formats.

Unsurprisingly, unoconv requires access to LibreOffice's resources. The easiest way to provide this access is to install unoconv on a system that already has LibreOffice installed. However, as detailed in the man page, you can also use the --connection (-c) option followed by a UNO connection string (a comma-separated list of settings) to reach a remote LibreOffice instance, or --listener (-l) to start unoconv as a listener that other unoconv clients can connect to.

Unoconv's basic command structure is shown in Figure 1.

Figure 1: Unoconv's basic command structure. The --verbose option has been added to show the script's operation.

Other files can be added, either in a space-separated list or by using shell wildcards. The command structure assumes that you are exporting the file(s) to PDF format, which is probably the most widely used operation for the command. The file extension is the quickest way to identify the type of file, although you can instead use the option --doctype (-d) [TYPE], specifying document, graphics, presentation, or spreadsheet. Formulas, databases, and charts are not supported by unoconv, no doubt due to lack of demand, even though these types of documents have existed in LibreOffice and its predecessor for over a decade. If you prefer to see confirmation that the command has been carried out, you can add up to three --verbose (-v) options. Without at least one, unoconv displays only error messages, and the only sign of a completed conversion is the return to the command prompt.
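As a rough illustration of how these pieces fit together, the following Python sketch assembles the argument list for a default (PDF) conversion. The function name basic_command and the file names are hypothetical; only the --doctype and --verbose options come from the text above, and the sketch builds the command without running it.

```python
# Document types accepted by --doctype, as listed in the article.
DOCTYPES = ("document", "graphics", "presentation", "spreadsheet")


def basic_command(files, doctype=None, verbosity=0):
    """Return an argument list for a default (PDF) unoconv conversion."""
    if not files:
        raise ValueError("at least one input file is required")
    if not 0 <= verbosity <= 3:
        raise ValueError("verbosity must be between 0 and 3")
    cmd = ["unoconv"]
    # Each repetition of --verbose raises the level of detail.
    cmd += ["--verbose"] * verbosity
    if doctype is not None:
        if doctype not in DOCTYPES:
            raise ValueError("unsupported doctype: " + doctype)
        cmd += ["--doctype", doctype]
    # Input files go last, as a space-separated list.
    return cmd + list(files)


# Hypothetical file names, shown only to illustrate the result.
print(" ".join(basic_command(["report.odt", "notes.odt"], verbosity=1)))
```

On a system with unoconv installed, the resulting list could be handed to subprocess.run() to perform the conversion.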

If you want to change the export format, add the option --format (-f) followed by the name of the format. The supported formats for both exports and imports are displayed by running unoconv --show. Supported formats include text, CSV, dBase, HTML, PDF, several versions of Microsoft Office formats, formats from StarOffice (LibreOffice's original ancestor), common graphic formats, and, of course, current LibreOffice formats (Figure 2).

Figure 2: A few of unoconv's supported formats. Unoconv supports dozens of formats, some of which are not listed in the desktop interface.

Unoconv also includes several housekeeping options. If a file's attributes matter, you can add --preserve so that the output file keeps the timestamp and permissions of the original file. For batch conversions, you might want to use --output (-o) to place all the output files in a separate directory, rather than have them mixed together with the original files. If the original file is password protected, you can supply the password needed to open it by adding:

--password=[PASSWORD]

Still another interesting option is to set the output file to the same format as the original, then add --template (-t) [FILE] to add styles from another file to the output – a command-line version of the Load Styles feature in the Styles and Formatting window on LibreOffice's desktop interface.
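The format and housekeeping options can be combined in a single command. The sketch below is one possible way to assemble such a command in Python; batch_command and the file and directory names are hypothetical, and the command is only built, not executed.

```python
def batch_command(files, fmt="pdf", output_dir=None, preserve=False, template=None):
    """Return an unoconv argument list for converting several files at once."""
    if not files:
        raise ValueError("at least one input file is required")
    cmd = ["unoconv", "--format", fmt]
    if output_dir is not None:
        # Keep converted files apart from the originals.
        cmd += ["--output", output_dir]
    if preserve:
        # Carry the original file's attributes over to the output.
        cmd.append("--preserve")
    if template is not None:
        # Load styles from another file into the output.
        cmd += ["--template", template]
    return cmd + list(files)


# Restyle a document: same format in and out, styles taken from a template.
# All names here are placeholders.
print(" ".join(batch_command(["draft.odt"], fmt="odt",
                             output_dir="restyled",
                             template="house-style.ott")))
```

Keeping the command as a list rather than a single string avoids quoting problems when file names contain spaces.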

Import and Export Filter Settings

For many users, the default filter settings are all that is needed. However, you can adjust both import and export filter settings to your own preferences, using --export (-e) [SETTING] or --import (-i) [SETTING]. Among other purposes, this ability can be used as an easy method for adjusting the character encoding or date formats in the original file.

Filter settings are added directly after --export (-e) or --import (-i). For text and CSV files, these settings are introduced by FilterOptions= and completed by a comma-separated list unique to the format. In the list, a setting can be left blank (,,) or, if it falls at the end of the list, omitted altogether. Either way, the default setting is used.
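The rule for blank and omitted settings can be sketched in a few lines of Python. The helper name filter_options is hypothetical, and None is used here as a stand-in for "use the default"; the numeric values are only sample data.

```python
def filter_options(values):
    """Build a FilterOptions= string from an ordered list of settings.

    None means "use the default": it becomes an empty field in the middle
    of the list and is dropped entirely at the end of the list.
    """
    fields = list(values)
    # Trailing defaults can simply be left off.
    while fields and fields[-1] is None:
        fields.pop()
    # Remaining defaults are written as empty fields (,,).
    return "FilterOptions=" + ",".join(
        "" if field is None else str(field) for field in fields
    )


print(filter_options([44, None, 76, None]))  # prints FilterOptions=44,,76
```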

By contrast, PDF and graphics settings are given as name=value pairs, with a separate --export (-e) or --import (-i) option for each setting. In other words, to set a permission password and cap the image resolution at 300 dpi in a PDF file, the command would include:

--export PermissionPassword=abcdef --export MaxImageResolution=300
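When several such settings are needed, repeating --export by hand becomes tedious. A minimal sketch, assuming the settings are held in a Python dictionary (the function name export_options is hypothetical), might generate the repeated options like this:

```python
def export_options(settings):
    """Turn a mapping of export settings into repeated --export options."""
    args = []
    for name, value in settings.items():
        # One --export option per setting, each in name=value form.
        args += ["--export", "{}={}".format(name, value)]
    return args


pdf_settings = {"PermissionPassword": "abcdef", "MaxImageResolution": 300}
print(" ".join(export_options(pdf_settings)))
# prints --export PermissionPassword=abcdef --export MaxImageResolution=300
```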

A complete list of standard import and export settings is available online [3], but it is far too long to reproduce here. Each type of file has its own set of filter options, a few of which are described below.

Text Export and Import

For text import, the most common setting to customize is the encoding. A single value can be entered, such as

--import FilterOptions=76

which would set the encoding to UTF-8. However, for exporting text from a spreadsheet, the FilterOptions fields are encoding, field-separator, text-delimiter, quote-all-text-cells, and save-cell-content-as-shown.

CSV Text File Import

CSV files have four basic settings. In order, they are the field separator, the text delimiter, the encoding, and the first line in the file to convert to or from a spreadsheet. For example,

--export FilterOptions=44,34,76,2

will set a comma as the field separator, a double quotation mark as the text delimiter, UTF-8 as the encoding, and the second line of the file as the first line to convert. In theory, at the end of the settings, you could add the date format for each column, so that:

--export FilterOptions=44,34,76,2,1/5/2/5/3/5

would specify that the date formats for the first three columns would be YY/MM/DD. Any other columns would use the date format already specified for them.
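The separator and delimiter values in these examples are simply the character codes of the comma (44) and the double quotation mark (34). The following Python sketch derives them from the characters themselves and covers only the four basic settings; the function name csv_filter_options and its defaults are assumptions made for illustration.

```python
def csv_filter_options(separator=",", delimiter='"', encoding=76, first_line=1):
    """Build the four basic CSV settings as a FilterOptions= string.

    The separator and delimiter are passed as characters and converted to
    their numeric character codes; 76 is the code the article gives for UTF-8.
    """
    if len(separator) != 1 or len(delimiter) != 1:
        raise ValueError("separator and delimiter must be single characters")
    if first_line < 1:
        raise ValueError("first_line starts counting at 1")
    fields = [ord(separator), ord(delimiter), encoding, first_line]
    return "FilterOptions=" + ",".join(str(field) for field in fields)


print(csv_filter_options(first_line=2))    # prints FilterOptions=44,34,76,2
print(csv_filter_options(separator="\t"))  # prints FilterOptions=9,34,76,1
```

Passing characters rather than numbers makes the intent easier to read, for example when switching from commas to tabs.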
