Analyzing Ada: Who wrote the notes attributed to Ada Lovelace?
Off the Beat: Bruce Byfield's Blog
Ada Lovelace is a hero of women in computing. Crediting her as the first computer programmer, her admirers defend her fiercely against detractors who question her accomplishments, pointing out the misogyny that lurks behind the attempts at debunking. However, so far as I know, nobody has attempted to challenge the detractors directly by comparing known samples of Lovelace's writing against the Notes that are her claim to fame.
The issue concerns Lovelace's translation and annotation of Luigi Menabrea's transcript of a lecture delivered by Charles Babbage at the University of Turin in the early 1840s. With Babbage's encouragement, Lovelace added seven highly technical notes labelled A to G. The most important of these is Note G, which concerns an algorithm for computing Bernoulli numbers and includes a table showing the punch card flow that has been called the first computer program.
Although the Notes are signed by Lovelace, detractors claim that she lacked the mathematical skills to write them. Instead, they suggest that the Notes were written by Babbage, who credited Lovelace in the hopes of gaining publicity for his work.
Even without a prose analysis, these claims are unlikely for at least two reasons. To start with, Babbage was punctilious about giving both himself and others credit for their work. The idea that he would give anyone false credit for any reason is contrary to his entire personality. Nor did he show the least ability to publicize his work.
More importantly, anyone questioning Lovelace's mathematical talents has to explain away the verdict of her contemporaries. Her mathematical teacher, Augustus de Morgan, told her mother that, had a man had her abilities, "they would have certainly made him an original mathematical investigator, perhaps of first-rate eminence."
Similarly, Babbage wrote to Michael Faraday that she "has thrown her magical spell around the most abstract of Sciences and has grasped it with a force that few men (in this country at least) could have exerted over it."
There are also numerous pieces of secondary evidence, such as Lovelace's letters to Babbage as the translation and the Notes were going to print, which sound exactly like a writer talking to an editor. The deeper you dig, the less plausible Lovelace's detractors appear.
Still, there is something satisfying in the possibility of answering the detractors on their own terms. Lacking the time for an original analysis, I took samples from the translation itself, from Notes A, B, D, and G, from Lovelace's letters, and from Babbage's casual and professional writing. (I would have preferred to find some undisputed technical prose by Lovelace to use as well, but could find none.) Whenever possible, these samples were at least 500 words long, although I had to compromise in the case of Lovelace's letters, which are often brief and available only in highly edited form.
Using Editing Central and the Coherence rating in the Readability Report extension for Apache OpenOffice, I ran them through a variety of tests for clarity and complexity: the Flesch reading score, Automated Readability index, Flesch-Kincaid index, Coleman-Liau, Gunning Fog index, and SMOG index. The Flesch score runs on a scale of roughly 0 to 100, with more difficult samples scoring lower, while the others are supposed to give the American education level needed to read the sample, with 14 signifying two years of university, and a score of 17 or over post-graduate work. Editing Central also gives other statistics about each sample, including the average number of words per sentence and the average number of syllables per word. The Readability Report's Coherence rating, which measures the extent to which new information is introduced without making clear its connection to previous text, gives a percentage score in which most samples have a single-digit rating and very few exceed 15%.
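For readers who want to see what two of these tests actually measure, here is a minimal sketch in Python. It uses the commonly published coefficients for the Flesch reading score and the Flesch-Kincaid index; whether Editing Central uses exactly these constants, or counts syllables the same way, is an assumption. The helper names (`count_syllables`, `averages`) are hypothetical, and the syllable counter is only a rough heuristic.

```python
import re

def count_syllables(word):
    """Rough heuristic (an assumption, not Editing Central's method):
    count groups of vowels and drop a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def averages(text):
    """Return (words per sentence, syllables per word) for a text sample."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("sample needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return len(words) / len(sentences), syllables / len(words)

def flesch_reading_ease(words_per_sentence, syllables_per_word):
    """Flesch reading score: higher means easier to read."""
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

def flesch_kincaid_grade(words_per_sentence, syllables_per_word):
    """Flesch-Kincaid index: approximate American school grade level."""
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59

if __name__ == "__main__":
    wps, spw = averages("The cat sat. The dog ran away.")
    print(round(flesch_reading_ease(wps, spw), 1))
    print(round(flesch_kincaid_grade(wps, spw), 1))
```

Both formulas depend only on sentence length and syllables per word, which is why those two averages matter so much in the comparisons that follow. Fed an average of 27.5 words per sentence and 1.6 syllables per word, these formulas give a reading score of about 43.6 and a grade level of about 14.0.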
Whether these tools help users to write more clearly is debatable, especially since they often give very different results. However, the tests do produce consistent characteristics that can be compared to help establish authorship.
Statistics for the Translation and Notes
The translation and the Notes are highly technical documents. The Flesch scores for the Notes range from 39 for Note A to 59 for Note B, while the translation scores 47.7. The average reading level from the other tests ranges from 12.4 for Notes B and G to 14.3 for the translation and 16.8 for Note A. The average sentence length varies from 22 words for Note D to 27.5 for the translation, while the average number of syllables per word is 1.6 for the translation, and 1.45-1.65 for the Notes. The Coherence rating is equally variable, with Notes D and G recording 2%, Note B 7%, and Note A 13%.
By themselves, these statistics indicate a sophisticated writer, capable of complex sentences and possessing an advanced vocabulary, but whose coherence varied considerably. More importantly, unless you assume multiple authors, they also suggest a writer whose style varies considerably, depending on subject matter.
Statistics for Lovelace's Letters
I took excerpts from Lovelace's letters in 1842-3, just before and during the time the Notes were written.
These letters show that Lovelace has one characteristic indicated by the Notes -- considerable variation in her prose. In letters written to her husband and mother, Lovelace's Flesch scores are 80-83. The index scores average 6-7, the sentence lengths 11-13 words, and the average number of syllables per word 1.31-1.35.
However, when writing about her mathematical interests, Lovelace's style starts to change. In an 1842 letter to De Morgan about her ambitions, her Flesch score drops to 70.4. Similarly, the last-minute revisions sent to Babbage on July 4, 1843 register 78 on the Flesch score. The index scores move upward almost to 11-12, while the sentence length rises to 21 words when writing to De Morgan, and 15 when writing to Babbage, although the number of syllables per word rises only slightly.
The trouble is, even though she is writing about mathematics, these scores are nothing like those of the translation and the Notes. That is probably to be expected, since the letters are casual writing, and Lovelace presumably did not take much care over them. Still, detractors might be tempted to take them as proof that Lovelace did not write the translation and Notes.
However, another possibility exists. Lovelace was still learning mathematics, and the translation and Notes were her first technical publications. In her letters, she was frequently defensive about her efforts, and repeatedly talked about how she labored over the projects.
Under these circumstances, what could be more natural than making a deliberate effort at clarity while elevating her diction, both as a defense against her doubts and as proof to others of her competence? She would hardly be the first writer in such circumstances to write with such motives, as any graduate student could tell you.
Still, one promising statistic does emerge. In all Lovelace's writing, the coherence rating is low, averaging 0-1%. This rating does not indicate incompetence in writing so much as a tendency to move quickly from idea to idea. It is compatible with Note D and the all-important G, but at odds with the ratings for B and A. It suggests that she is the most likely person to have written Note G, but, overall, the statistics are less definitive than might be expected. Perhaps other samples of her writing would offer different results.
Statistics for the Babbage sample
However, if comparisons cannot completely establish Lovelace as the author, they can rule out Babbage as a candidate. As a writer, Babbage is exceptionally consistent. Regardless of whether he is writing casually or formally, his writing remains much the same. His Flesch scores fall between 48 and 53, while the indexes usually give him a score of 14-14.3, although his formal writing sometimes goes as high as 15.4. His sentences average 29-35 words in length, with 1.4-1.52 syllables per word, and his coherence rating is usually around 7%.
Unsurprisingly, these scores fit well with the translation, which after all is supposed to be a transcript of Babbage lecturing. Apparently, being rendered into French by one writer, then back into English by another, and transposed from first to third person is not enough to alter Babbage's prose style to any degree. However, aside from Note A, which is considerably higher, the Notes themselves all have consistently lower test scores than Babbage's writing, and most of them have fewer words per sentence and a higher average number of syllables per word.
These results make the idea that Babbage was sole author of the Notes next to impossible. Longer sentences for Notes G and B might indicate more editing by Babbage, while the coherence rate suggests that he might have composed some or all of Note B, but even these are uncertain. For the most part, his involvement with the Notes in general and Note G in particular appears minimal.
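The coherence comparison can be sketched in a few lines of Python. This illustrates the reasoning, not the procedure of either tool. The figures are the ones reported in this article, taking Notes D and G at 2% as the discussion of Lovelace's letters implies. The 0.5% figure for Lovelace is an assumed midpoint of her 0-1% range, and the three-point tolerance and the name `closest_author` are hypothetical choices made for this example.

```python
# Coherence ratings (percent) reported in the article.
# 0.5 is an assumed midpoint of Lovelace's "0-1%"; 7.0 is Babbage's "around 7%".
AUTHOR_COHERENCE = {"Lovelace": 0.5, "Babbage": 7.0}
NOTE_COHERENCE = {"A": 13.0, "B": 7.0, "D": 2.0, "G": 2.0}

# Hypothetical cut-off, in percentage points, beyond which no author
# is treated as a plausible match.
TOLERANCE = 3.0

def closest_author(rating, authors=AUTHOR_COHERENCE, tolerance=TOLERANCE):
    """Return the author whose typical rating is nearest, or None if
    even the nearest one is further away than the tolerance."""
    name, value = min(authors.items(), key=lambda item: abs(item[1] - rating))
    return name if abs(value - rating) <= tolerance else None

matches = {note: closest_author(r) for note, r in NOTE_COHERENCE.items()}

if __name__ == "__main__":
    for note, author in matches.items():
        print(f"Note {note}: {author or 'no close match'}")
```

With these assumptions, Notes D and G land nearest Lovelace, Note B nearest Babbage, and Note A matches neither. A different tolerance would shift the result for Note A, which is one reason to treat a single statistic as suggestive rather than conclusive.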
Authorship by elimination
These statistics are less conclusive than I had hoped. With a little interpretation, they suggest the likelihood that Lovelace wrote the Notes, but they cannot prove that she did so.
The problem is that the available samples of Lovelace's prose are not a close fit for the Notes, either, although complete and unedited samples might give a closer match. As things are, the closest that the available letters come to the Notes is the description of the Solitaire game in a letter written on February 16, 1840, in which Lovelace describes to Babbage the rules of the game with a precision and attention to detail somewhat reminiscent of the Notes.
Both the Notes and Lovelace's letters also share a sense of structure utterly different from Babbage's engaging but rambling prose, as well as a fondness for diagrams. If you copy out part of the Notes then parts of her letters, as Robert Graves used to do to gain insight into a work, the resemblance in tone is unmistakable -- but that is a feature beyond the limits of the tests used for analysis here. For now, the best that can be said is that, based on the coherence rating, Lovelace most likely wrote Notes D and G on her own, and that Babbage, with the possible exception of Note B, had very little to do with their composition. In other words, given that no third candidate is in sight, the conventional attribution to Lovelace is likely to be correct.