dd(1): deceptively simple

Paw Prints: Writings of the maddog

Oct 26, 2010 GMT

Jon maddog Hall

Unix (and Linux) command line programs are like old friends. You get caught up in the day-to-day hustle of life and you may forget about them temporarily, but sooner or later you remember them and that warm feeling comes over you....

dd(1) is one of those programs that gives me a warm feeling. To most people dd(1) seems simple: it reads data in at one end and writes it out at the other, perhaps doing a little blocking or unblocking and a little conversion along the way. And of course many of us use dd(1) to clone disk drives and for other “utility” tasks, because it is simple, fast and works from the command line.

Yet on two separate occasions dd “saved my bacon”, so there is a soft spot in my heart for the command.

The first time was around 1984, when I had just started working for Digital Equipment Corporation. A salesman had a nine-track tape with some data on it, and the customer had told him that if Digital could get the FORTRAN programs and the data off the tape, then compile and run them on our Ultrix system, the customer would buy a lot of systems. The salesman, having nowhere else to go, found me and asked if I would help. He told me that “the format of the programs and the data on the tape was well documented.” I had heard stories of “well documented tapes” before and was skeptical, but I agreed to help.

Amazingly enough, the programs and the data on the tape WERE well documented. The tape had been made on an IBM system, so all the sources on it were in EBCDIC (pronounced “eb seh dik”, a character encoding used by IBM) instead of ASCII (you know how to pronounce that), and the instructions explained how the records (80-character card images) and the fixed-size blocks were laid out on the tape. First came the programs, then the data. The data used a different block size, but still 80-character records.

After I mounted the tape, one shell “for” loop running dd(1), set up to unblock the records and translate EBCDIC to ASCII, pulled all the FORTRAN programs off the tape into separate source files for compilation. Another “for” loop with another dd(1) command put all the data from the tape onto the disk in separate files.
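For readers who have never unblocked a card-image tape, the conversion can be sketched in Python. This is a hedged illustration of what dd's `conv=ascii,unblock` with `cbs=80` does to one fixed-size tape block, not the commands used that day. It assumes IBM code page 037 (Python's `cp037` codec) as the EBCDIC variant; dd uses its own built-in translation table, which may differ for a few characters.

```python
# Sketch of dd(1)'s conv=ascii,unblock cbs=80 applied to one tape block.
# Assumption: cp037 stands in for dd's built-in EBCDIC-to-ASCII table.

RECORD_LEN = 80  # card-image width, as given in the tape documentation


def unblock_ebcdic(block: bytes, record_len: int = RECORD_LEN) -> str:
    """Split a block into fixed-length EBCDIC records, translate each one,
    strip its trailing blanks and end it with a newline, which is roughly
    what conv=ascii,unblock does."""
    if len(block) % record_len != 0:
        raise ValueError("block size is not a multiple of the record length")
    lines = []
    for start in range(0, len(block), record_len):
        record = block[start:start + record_len].decode("cp037")
        lines.append(record.rstrip(" ") + "\n")  # trailing blanks -> newline
    return "".join(lines)
```

Each block read from the tape would be passed through `unblock_ebcdic` and the resulting text appended to the current source file; the data files differ only in block size, not in record length.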

I looked at the FORTRAN source code and noticed that its logical unit numbers for I/O more or less matched what Ultrix expected for standard input, standard output and standard error. I compiled, linked and ran the programs, redirecting the data files to standard input and capturing the output by redirecting standard output onto the disk.

At that point I had done everything the customer and the salesman had requested. Total time was about an hour.

Then I noticed that the customer also had a plotting program, written in FORTRAN, to help them visualize the data. I had no plotter available, but the VT125 character-cell terminal on my desk had a crude graphics mode called ReGIS, which let you draw on the terminal using byte codes. I decided to write a small set of subroutines that matched the customer's subroutine calls but sent ReGIS byte codes to the terminal.
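The idea of replacing plotter calls with terminal output can be sketched in Python (the original subroutines were FORTRAN, and the customer's routine names are not given, so `move_to` and `draw_to` here are hypothetical). The sketch assumes the usual ReGIS conventions: `ESC P p` enters ReGIS mode, `P[x,y]` moves the cursor without drawing, `V[x,y]` draws a line from the current position, and `ESC \` returns the terminal to text mode.

```python
# Hypothetical plotting shims that emit ReGIS commands instead of pen moves.
# move_to/draw_to are illustrative names, not the customer's real routines.

ESC = "\x1b"
REGIS_ENTER = ESC + "Pp"  # DCS followed by 'p': enter ReGIS mode
REGIS_EXIT = ESC + "\\"   # ST: leave ReGIS mode


class RegisPlot:
    def __init__(self) -> None:
        self.commands: list[str] = []

    def move_to(self, x: int, y: int) -> None:
        # P[x,y]: position the graphics cursor without drawing ("pen up")
        self.commands.append(f"P[{x},{y}]")

    def draw_to(self, x: int, y: int) -> None:
        # V[x,y]: draw a vector to (x, y) ("pen down")
        self.commands.append(f"V[{x},{y}]")

    def render(self) -> str:
        # Wrap the accumulated commands in the enter/exit sequences
        return REGIS_ENTER + "".join(self.commands) + REGIS_EXIT
```

Writing the string from `render()` to a ReGIS-capable terminal should draw the figure; a real shim would also have to scale the plotting program's coordinates to the terminal's screen coordinates.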

The subroutine calls were simple and few, so I finished the subroutines in a couple of hours, and linked them into the plotting program. Now I could see the customer's data on a simple, relatively fast and relatively inexpensive VT125 instead of having to put a piece of paper in a plotter and wait for a pen to draw the diagram.

I put all of this back onto the magnetic tape, but in a tar file, and wrote the same careful instructions about how to pull it off and what I had done.

The salesman bought me dinner.

The next time that dd(1) played a role was with the TK50 streaming tape drive that I blogged about yesterday.

Everyone tried to keep the TK50 streaming, but the VAX processors, disks and memories of the day could not keep up with its appetite for data. The buffer feeding the TK50 would empty, and sooner or later the drive would stop, back up and reposition itself. This made backups take a very, very long time. It did not matter whether you were using tar(1), dump(8) or any other program to write the data: we just could not keep that tiny buffer full.

The engineers were discussing this one day, and I happened to mention that we once had a similar problem on a large IBM mainframe. We had fixed it there by creating a ring buffer and using asynchronous I/O to fill the buffers as fast as possible and write them out as fast as possible. Asynchronous I/O was not as useful for a data stream, but the engineers at Digital thought about it and made a version of dd(1) with an option to allocate “n” buffers for data I/O. That was enough to let dd(1) buffer data and stay ahead of the streaming tape drives.
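The multi-buffer idea can be sketched in Python. This is not Digital's code, which was never published; it is a generic producer/consumer copy in which a bounded queue holds up to n blocks, so a reader thread can run ahead while the writer keeps the output device busy. The block size and buffer count are illustrative assumptions. Python threads release the GIL during blocking reads and writes, so the two sides do overlap.

```python
# Generic n-buffer copy (producer/consumer over a bounded queue).
# Not Digital's implementation; BLOCK_SIZE and N_BUFFERS are assumptions.

import queue
import threading
from typing import BinaryIO

BLOCK_SIZE = 10240  # bytes per buffer (assumed: 20 x 512-byte tar records)
N_BUFFERS = 16      # the "n" buffers


def buffered_copy(src: BinaryIO, dst: BinaryIO,
                  block_size: int = BLOCK_SIZE,
                  n_buffers: int = N_BUFFERS) -> int:
    """Copy src to dst, letting the reader get up to n_buffers blocks
    ahead of the writer so short input stalls do not starve dst."""
    ring: queue.Queue = queue.Queue(maxsize=n_buffers)
    errors: list = []

    def reader() -> None:
        try:
            while True:
                block = src.read(block_size)
                if not block:
                    break
                ring.put(block)  # waits only when all n buffers are full
        except Exception as exc:
            errors.append(exc)
        finally:
            ring.put(None)       # end-of-data marker

    worker = threading.Thread(target=reader, daemon=True)
    worker.start()

    written = 0
    while True:
        block = ring.get()       # waits only when all buffers are empty
        if block is None:
            break
        dst.write(block)
        written += len(block)
    worker.join()
    if errors:
        raise errors[0]
    return written
```

As the last stage of a pipeline, the same function could be called with `sys.stdin.buffer` as the source and `sys.stdout.buffer` redirected to the tape device; that wiring is left as an assumption here.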

We encouraged the engineers to also add this functionality to tar(1), dump(8) and other utilities, but they (quite correctly) pointed out that these programs were maintained by other people, and to spread these changes to all these programs would cause huge amounts of work integrating the code into future versions of those commands.

Then the engineers came up with a unique idea: people could put the new dd(1) as a filter in a pipeline after tar(1) or dump(8). If dd(1) was the last stage of the pipeline before the streaming device, it could do its buffering and keep the device streaming. This concept worked very well.

A couple of years later, at a DECUS user group meeting, a developer of a backup utility program approached me. He told me he had tried everything but could not get his TK50 to stream. I told him that I could get mine to stream any time I wanted, and told him the story about the buffers in dd(1). His eyes lit up, and a couple of days later I got an email from him that said only “it's streaming, it's streaming!”

It is a shame that Ultrix and Digital Unix were never “open source”. The current versions of dd(1) seem to be missing the buffering technology. Perhaps in the days of gigabyte memories and large I/O buffers this buffering technology is not needed as much, but it was interesting nevertheless.

Carpe Diem!

Support Our Work

Linux Magazine content is made possible with support from readers like you. Please consider contributing when you’ve found an article to be beneficial.
