3: Stylometry
Literary studies has arguably been the most active branch of the humanities in computational text analysis, most notably through stylometry, the statistical study of writing style, which combines literary theory with linguistics to examine the underlying structure of texts. For a colorful example of stylometry, see Radiolab’s story on how aging affected Agatha Christie’s writing style.
Authorship Attribution
One pioneering work of authorship attribution was a 1964 study of the Federalist Papers by statisticians Frederick Mosteller and David Wallace. Mosteller and Wallace analyzed the twelve papers of disputed authorship, comparing how often common “function words” (but, and, of, etc.) appear in those essays with how often they appear in the known writings of John Jay, Alexander Hamilton, and James Madison. Their conclusion was that the twelve disputed papers were most likely written by Madison, supporting suspicions historians already held. One piece of evidence was the word “upon,” which Hamilton used far more often than Madison and which is rare in the disputed papers.
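The function-word comparison described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not Mosteller and Wallace's method: their study used Bayesian statistics, whereas this sketch swaps in a much cruder measure, the sum of absolute differences between rates per 1,000 words. The word list, the function names (rates_per_thousand, nearest_author), and the input format are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical, illustrative word list; NOT the set used in the 1964 study.
FUNCTION_WORDS = ("upon", "but", "and", "of")


def rates_per_thousand(text, words=FUNCTION_WORDS):
    """Return each word's frequency per 1,000 tokens of the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {w: 0.0 for w in words}
    counts = Counter(tokens)
    return {w: 1000.0 * counts[w] / len(tokens) for w in words}


def nearest_author(disputed, known):
    """Return the author whose function-word profile is closest to the
    disputed text. `known` maps an author label to a sample of that
    author's writing. Distance is a simple sum of absolute rate
    differences (an assumption made here for brevity).
    """
    target = rates_per_thousand(disputed)

    def distance(author):
        profile = rates_per_thousand(known[author])
        return sum(abs(target[w] - profile[w]) for w in target)

    return min(known, key=distance)
```

In practice, `known` would hold long samples of each candidate's undisputed writing, since rates computed from a handful of sentences are too noisy to mean much.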
Macro Patterns
Literary historians have often classified novels by genre (Gothic, Bildungsroman, etc.) based on content and themes. Many are now considering whether books can also be grouped by their stylistic attributes. Are Gothic novels structurally different from satire in a measurable way? What about authors who cross genres: do they write differently in each, or does their personal literary “fingerprint” remain the same? Can computational analysis be used to discover new genres? Stylometry might also be able to measure broader differences between groups of authors. Do American novelists tend to write substantially differently from Irish or Scottish novelists? Can a computer program differentiate between male and female authors? How does syntactic structure change over time?