Examining the algorithms of the diff utility


Article from Issue 76/2007

Diff finds the differences between two versions of a file. We’ll show you how diff finds changes and matches in files without overtaxing a system’s resources.

For a user at the command line, discovering the differences between two text files is easy: a simple command, such as diff Version_1.txt Version_2.txt, is all it takes. On closer inspection, however, it turns out that comparing files can require a large amount of memory, and diff relies on some ingenious algorithms to keep that cost in check. This article investigates how diff manages to find changes and matches in multi-megabyte files without overtaxing a system’s resources.
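To make the memory problem concrete, here is a minimal Python sketch of the textbook longest-common-subsequence (LCS) approach to line-based diffing. It illustrates the general idea only and is not diff's own implementation: GNU diff's comparison core is based on Eugene Myers' O(ND) difference algorithm, which avoids building a full table. The function names `lcs_table` and `line_diff` and the `' '`/`'-'`/`'+'` tags are hypothetical choices made for this example.

```python
def lcs_table(a, b):
    """Build the (len(a)+1) x (len(b)+1) LCS length table.

    table[i][j] holds the length of the longest common subsequence of
    a[i:] and b[j:]. Memory grows with len(a) * len(b), which is why this
    naive approach does not scale to very large files.
    """
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    # Fill the table from the end of both sequences backwards.
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def line_diff(a, b):
    """Return (tag, line) pairs: ' ' = common, '-' = only in a, '+' = only in b."""
    table = lcs_table(a, b)
    result = []
    i = j = 0
    # Walk the table forwards, following the path of a longest common subsequence.
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append((" ", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            result.append(("-", a[i]))  # dropping a[i] keeps the LCS length
            i += 1
        else:
            result.append(("+", b[j]))  # dropping b[j] keeps the LCS length
            j += 1
    # Whatever remains in either sequence has no partner in the other.
    result.extend(("-", line) for line in a[i:])
    result.extend(("+", line) for line in b[j:])
    return result


if __name__ == "__main__":
    old = ["apple", "banana", "cherry", "date"]
    new = ["apple", "cherry", "date", "elderberry"]
    for tag, line in line_diff(old, new):
        print(tag, line)
```

Running the script marks apple, cherry, and date as common lines, banana as removed, and elderberry as added. Because the table has (len(a)+1) × (len(b)+1) entries, two 100,000-line files would need roughly 10^10 cells, which shows why a practical diff cannot afford this naive layout.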
