Rewriting a photo tagger in Go
Programming Snapshot – Go Photo Tagger
In honor of the 25th anniversary of his Programming Snapshot column, Mike Schilli revisits an old problem and solves it with Go instead of Perl.
Hurray! This issue marks the 25th anniversary of my "Programming Snapshot" column, which first appeared in the German edition of Linux Magazine back in October 1997 (originally under the "Perl Snapshot" banner). Times have changed: Now the featured programs in this column mainly use Go, but you might also see Ruby, Python, or even TeX, as was the case recently.
For this dinosaur birthday party, I thought I might rewrite a tool I put together in Perl back in the dot-com era, this time in Go and from today's perspective. The photo tagger from 2003 (it was called Image Database [1], or idb for short) is something I've been wanting to use again for a long time.
The idb tool assigns one or more tags to a set of photo files, distributed over arbitrary subdirectories somewhere on the hard drive. Once the photos are tagged, the same program can retrieve them if you provide the name of the desired tag. The problem with the old Perl code, though, is that you need both the time and the inclination to go through the installation and dependency hell of all the Perl modules it uses. Moreover, many years have passed since then, and some CPAN module developers have broken backward compatibility by changing the original programming interfaces. Luckily, it's 2022, and Go has solved these kinds of installation problems for all time, as you can compile static binaries that run on similar architectures.
Also, the old tagger script used a separate standalone MySQL server back in the day, but today – at least for tools that only run locally – I prefer to have everything bundled into a single binary, such as an embedded SQLite flat file database engine. Somewhat surprisingly, the rewrite with newfangled technology was remarkably quick.
SQLosaurus Rex
SQL databases are a bit out of fashion these days. If you only need a key-value store for your data, you are more likely to use a persistent cache or a server solution like Redis. However, for local data not exceeding a few megabytes, running an external process is unlikely to be worthwhile. Also, I tend to be suspicious of binary data in caches or key-value stores such as Berkeley DB. Instead, I prefer to take a direct look at the data myself from time to time. SQLite is an ideal database because it stores the data in a single file that a command-line tool such as sqlite3 lets you browse. Plus, backing up a single file is easier than creating and backing up a dump of a running database.
On top of this, SQLite is one of the few open source tools that is truly in the public domain. This is why a Go module on GitHub such as mattn/go-sqlite3 can let you legally include the SQLite source code in any program you write and distribute. The Go compiler then turns SQLite, the library, and the application code into a single binary that can be copied to other computers with a similar architecture and will run there without any installation hassles. It's the end of dependency hell as we know it – I never thought I'd get to see that! That applies to installation, at least; recompiling legacy code can be a different story and is subject to problems arising from non-backward-compatible changes.
Three Tables
So which relational data model is suitable for a photo tagger application? The idb tool assigns one or more tags to one or more files. Since the Stone Age of data processing, the three-table model has proven useful for many-to-many relations like this: two tables to assign index numbers to tag names and file paths, and then a third, two-column table that maps the index numbers to each other if a particular tag is attached to a particular file.
In this way, the database only needs to store the full tag or file name once in each case, a basic requirement for a normalized database. This has advantages beyond avoiding the disk space wasted by duplicate storage. For example, if the user corrects a typo in a tag, the database only has to correct it in one place, even if the tag is attached to thousands of files.
For example, to tag the dsc13.jpg photo file with the surfing tag (Figure 1), the tool first creates a new entry for the surfing tag in the tag table (on the left of Figure 1) if the entry does not already exist. SQLite automatically assigns the associated sequential index number, 2 in this case, to the entry because entries start at an index of 0 and surfing is the third entry in the name column. In addition, the file name dsc13.jpg, if not already present, needs to be inserted into the file table – in Figure 1 it ends up in the third row and has an index number of 2 (again, an ascending index starting at 0).
That takes care of the two lookup tables for tags and file names. Now you need the actual assignment of the tag to the photo. This is handled by an entry in the tag map table (center, Figure 1), which assigns a tag ID of 2 to the file ID 2. All done! Using typical SQL joins, it is then easy for the database to respond to the question as to which photos were tagged with surfing. An SQL query to this effect quickly yields dsc13.jpg and possibly others. In the opposite case, the query engine can also easily discover which tags are attached to the dsc13.jpg image file, again by joining the tables.
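To make the three-table idea concrete without a database, here is a minimal in-memory sketch in Go. It is not the idb code: the type and function names (nameTable, store, Tag, FilesWithTag, TagsOf) and the sample tags beach and sunset are made up for illustration, and plain slices stand in for the SQLite tables. What it does mirror is the mechanism described above: two lookup tables that hand out ascending IDs starting at 0, plus a mapping table of ID pairs that the lookups walk the way a join would.

```go
package main

import "fmt"

// nameTable mimics a lookup table with an ID column and a unique name column.
// IDs are slice positions, starting at 0 to match the numbering in the text.
type nameTable struct {
	names []string
	ids   map[string]int
}

// idFor returns the ID stored for name, adding a new row if none exists yet.
func (t *nameTable) idFor(name string) int {
	if id, ok := t.ids[name]; ok {
		return id
	}
	if t.ids == nil {
		t.ids = map[string]int{}
	}
	id := len(t.names)
	t.names = append(t.names, name)
	t.ids[name] = id
	return id
}

// pair is one row of the mapping table: a tag ID next to a file ID.
type pair struct{ tagID, fileID int }

// store bundles the three tables (a hypothetical stand-in for the database).
type store struct {
	tag, file nameTable
	tagMap    []pair
}

// Tag attaches tag to file, creating lookup rows only where they are missing.
func (s *store) Tag(file, tag string) {
	p := pair{s.tag.idFor(tag), s.file.idFor(file)}
	for _, q := range s.tagMap {
		if q == p {
			return // this tag is already attached to this file
		}
	}
	s.tagMap = append(s.tagMap, p)
}

// FilesWithTag plays the role of a join from tag via the mapping table to file.
func (s *store) FilesWithTag(tag string) []string {
	id, ok := s.tag.ids[tag]
	if !ok {
		return nil
	}
	var out []string
	for _, p := range s.tagMap {
		if p.tagID == id {
			out = append(out, s.file.names[p.fileID])
		}
	}
	return out
}

// TagsOf is the reverse join: from file via the mapping table to tag.
func (s *store) TagsOf(file string) []string {
	id, ok := s.file.ids[file]
	if !ok {
		return nil
	}
	var out []string
	for _, p := range s.tagMap {
		if p.fileID == id {
			out = append(out, s.tag.names[p.tagID])
		}
	}
	return out
}

func main() {
	var s store
	s.Tag("dsc11.jpg", "beach")  // tag 0, file 0 (made-up sample data)
	s.Tag("dsc12.jpg", "sunset") // tag 1, file 1 (made-up sample data)
	s.Tag("dsc13.jpg", "surfing") // tag 2, file 2, as in the example above
	fmt.Println(s.FilesWithTag("surfing")) // [dsc13.jpg]
	fmt.Println(s.TagsOf("dsc13.jpg"))     // [surfing]
}
```

In a real SQL version, idFor would presumably become an insert-if-missing followed by a select on the name column, and the two lookup functions would each be a single SELECT joining all three tables.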
Homemade
The finished idb binary, linked together from the Go sources for this article, can carry out the commands listed in Table 1. The binary supports tagging files, searching for files with a specific tag, and listing all tags assigned so far. As a special treat, the --xlink option generates a directory full of symlinks pointing to the original photos for files found for a given tag. With a tool such as iNuke [2], featured in a recent column, the photos can then be viewed, and the best ones selected.
Table 1: Commands
Command | Description
idb --tag=foo image.jpg … | Tag photos with foo
idb --tag=foo | Find photos with the foo tag
idb --tag=foo --xlink | Find photos with the foo tag and create a local symlink
idb --tags | List all tags
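The four invocations in Table 1 map fairly naturally onto Go's standard flag package, which accepts flags written with either one or two leading dashes. The sketch below shows one way the dispatch could look; the request type, the parse function, and the mode names are invented for this illustration and are not taken from the idb sources.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

// request describes what the user asked for on the command line.
type request struct {
	mode  string   // "tag", "find", "xlink", or "list" (hypothetical names)
	tag   string   // value of --tag, if any
	files []string // photo files to tag, if any
}

// parse maps command-line arguments onto the four cases in Table 1.
func parse(args []string) (request, error) {
	fs := flag.NewFlagSet("idb", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // leave error reporting to the caller
	tag := fs.String("tag", "", "tag name to assign or search for")
	tags := fs.Bool("tags", false, "list all tags")
	xlink := fs.Bool("xlink", false, "create symlinks for the photos found")
	if err := fs.Parse(args); err != nil {
		return request{}, err
	}
	switch {
	case *tags:
		return request{mode: "list"}, nil
	case *tag == "":
		return request{}, errors.New("need --tag=name or --tags")
	case *xlink:
		return request{mode: "xlink", tag: *tag}, nil
	case fs.NArg() > 0:
		// Remaining arguments are the photo files to be tagged.
		return request{mode: "tag", tag: *tag, files: fs.Args()}, nil
	default:
		return request{mode: "find", tag: *tag}, nil
	}
}

func main() {
	// The four invocations from Table 1, minus the program name.
	for _, args := range [][]string{
		{"--tag=foo", "image.jpg"},
		{"--tag=foo"},
		{"--tag=foo", "--xlink"},
		{"--tags"},
	} {
		req, err := parse(args)
		if err != nil {
			fmt.Println(args, "-> error:", err)
			continue
		}
		fmt.Println(args, "->", req.mode)
	}
}
```

Note that the flag package stops parsing at the first non-flag argument, so in this sketch the options have to come before the file names, just as they do in Table 1.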