Examining the algorithms of the diff utility


Article from Issue 76/2007

Diff finds the differences between two versions of a file. We’ll show you how diff finds changes and matches in files without affecting a system's resources.

For a user at the command line, discovering the differences between two text files is easy: a simple command, such as diff Version_1.txt Version_2.txt, is all it takes. On closer inspection, however, it turns out that diff needs a large amount of memory and some ingenious algorithms to compare files. This article investigates how diff manages to find changes and matches in multiple megabyte files without affecting a system’s resources.

Buy this article as PDF

Express-Checkout as PDF
Price $2.95
(incl. VAT)

Buy Linux Magazine

Get it on Google Play

US / Canada

Get it on Google Play

UK / Australia

Related content

  • Perl: Automating Color Correction

    If you have grown tired of manually correcting color-casted images (as described in last month's Perl column), you might appreciate a script that automates this procedure.

  • BeeDiff

    BeeDiff compares two files and quickly displays the differences in a convenient desktop GUI interface.

  • Command Line: Diffutils

    The Diffutils tool set helps you compare text files, discover and display the differences between files, and even automatically synchronize files.

  • Hash Functions

    Cryptographic hash functions help you protect your passwords, but hashing is only secure if properly understood.

  • Security Lessons: System Rescue

    Kurt provides some tips and recommends some tools to help you detect signs of network intrusion and data corruption.

comments powered by Disqus

Direct Download

Read full article as PDF:

Diff_Algorithms.pdf  (320.74 kB)