Tooling Up for Digital Humanities

3: Stylometry

Literary studies has arguably been the most active branch of the humanities in computational text analysis, most notably in its use of stylometry, the statistical analysis of writing style, which combines literary theory with linguistics to examine the underlying structure of texts. For a colorful example of stylometry, see Radiolab’s story on how the aging process affected Agatha Christie’s writing style.

Authorship Attribution

One pioneering work of authorship attribution was a 1964 study of the Federalist Papers by statisticians Frederick Mosteller and David Wallace. The essays were written by John Jay, Alexander Hamilton, and James Madison, but twelve of them were claimed by both Hamilton and Madison. Mosteller and Wallace analyzed those twelve disputed papers, comparing their use of common “function words” (but, and, of, etc.) with known writings by the candidate authors. Their conclusion (based on evidence such as the scarcity of the word “upon”, which Hamilton used far more often than Madison) was that the twelve disputed papers were most likely written by Madison, supporting suspicions historians already held.
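The core idea of comparing function-word rates can be illustrated with a short Python sketch. This is not Mosteller and Wallace’s method, which relied on Bayesian models over a carefully chosen word list. The marker words, the simple distance measure, and the toy sentences below are all hypothetical, chosen only to show how rates per 1,000 words let a disputed text be compared against candidate authors.

```python
import re
from collections import Counter

# Hypothetical marker list for illustration only; the actual 1964 study
# used a larger, carefully screened word list and a Bayesian model.
FUNCTION_WORDS = ["upon", "while", "whilst", "by", "also", "on"]

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def rates_per_thousand(text, words=FUNCTION_WORDS):
    """Return occurrences per 1,000 tokens for each marker word."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return {w: 1000 * counts[w] / total for w in words}

def distance(a, b):
    """Sum of absolute differences between two rate profiles."""
    return sum(abs(a[w] - b[w]) for w in a)

def closest_author(disputed, known):
    """Return the candidate whose marker-word profile is nearest the disputed text."""
    target = rates_per_thousand(disputed)
    return min(known, key=lambda name: distance(target, rates_per_thousand(known[name])))

# Invented toy sentences, not excerpts from the Federalist Papers.
known = {
    "Candidate A": "upon the matter we rely upon reason and upon law",
    "Candidate B": "whilst the people deliberate the states also act by consent",
}
disputed = "whilst we wait the assembly also votes by ballot"
print(closest_author(disputed, known))  # → Candidate B
```

With these toy inputs the disputed sentence lands closest to Candidate B, because it shares that candidate’s use of “whilst”, “also”, and “by” and never uses “upon”.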

Macro Patterns

Literary historians have often classified novels by genre (Gothic, Bildungsroman, etc.) based on content and themes. Many now ask whether books can also be grouped by their stylistic attributes. Are Gothic novels structurally different from satire in a measurable way? What about authors who cross genres: do they write differently in each, or does their personal literary “fingerprint” remain the same? Can computational analysis be used to discover new genres? Stylometry might also measure broader differences between groups of authors. Do American novelists write substantially differently from Irish or Scottish novelists? Can a computer program distinguish between male and female authors? How does syntactic structure change over time?
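One common way to test whether texts group by style is Burrows’s Delta, a stylometric distance measure that this section does not itself discuss. The Python sketch below is a simplified version under stated assumptions: regex tokenization, a tiny invented corpus, and a hypothetical default of 30 most frequent words. Real studies typically use hundreds of words over full-length texts and may then cluster the resulting distances to see whether genre, nationality, or period groupings emerge.

```python
import re
from collections import Counter
from statistics import mean, pstdev

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def relative_freqs(tokens):
    """Map each word to its share of the text's tokens."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return {w: c / total for w, c in counts.items()}

def delta_matrix(texts, n_words=30):
    """Pairwise Burrows's Delta between named texts (simplified sketch).

    Delta averages, over the corpus's most frequent words, the absolute
    difference between two texts' z-scored relative frequencies.
    """
    tokens = {name: tokenize(t) for name, t in texts.items()}
    freqs = {name: relative_freqs(toks) for name, toks in tokens.items()}
    # The n most frequent words across the whole corpus form the feature set.
    combined = Counter()
    for toks in tokens.values():
        combined.update(toks)
    vocab = [w for w, _ in combined.most_common(n_words)]
    # Standardize each word's frequency against its corpus-wide mean and spread.
    z = {name: {} for name in texts}
    for w in vocab:
        values = [freqs[name].get(w, 0.0) for name in texts]
        mu, sigma = mean(values), pstdev(values)
        for name in texts:
            z[name][w] = 0.0 if sigma == 0 else (freqs[name].get(w, 0.0) - mu) / sigma
    # Lower Delta means more similar style.
    return {
        (a, b): mean(abs(z[a][w] - z[b][w]) for w in vocab)
        for a in texts for b in texts if a != b
    }

# Toy corpus of invented sentences, not real novels.
toy_corpus = {
    "text_1": "the ship sailed upon the sea",
    "text_2": "the ship sailed upon the sea",
    "text_3": "whilst the army marched by the river",
}
for pair, score in sorted(delta_matrix(toy_corpus).items()):
    print(pair, round(score, 3))
```

Because text_1 and text_2 are identical, their Delta is 0, while text_3, which shares only “the” with them, scores higher against both. Pairs with low Delta can then be grouped together, which is one way to make “measurable” stylistic similarity concrete.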