My concern is that an accumulation of facts about a book or an author makes one think one knows what the book and the author are about. It is but a rake skimming across a pond. You'll find some interesting things, but what lies beneath stays opaque. [DE]
* * * * * * * * * * * * * * *
Dickens, Austen and Twain, Through a Digital Lens
by STEVE LOHR
ANY list of the leading novelists of the 19th century, writing in English, would almost surely include Charles Dickens, Thomas Hardy, Herman Melville, Nathaniel Hawthorne and Mark Twain.
But they do not appear at the top of a list of the most influential writers of their time. Instead, a recent study has found, Jane Austen, author of “Pride and Prejudice,” and Sir Walter Scott, the creator of “Ivanhoe,” had the greatest effect on other authors, in terms of writing style and themes.
These two were “the literary equivalent of Homo erectus, or, if you prefer, Adam and Eve,” Matthew L. Jockers wrote in research published last year. He based his conclusion on an analysis of 3,592 works published from 1780 to 1900. It was a lot of digging, and a computer did it.
The study, which involved statistical parsing and aggregation of thousands of novels, made other striking observations. For example, Austen’s works cluster tightly together in style and theme, while those of George Eliot (a k a Mary Ann Evans) range more broadly, and more closely resemble the patterns of male writers. Using similar criteria, Harriet Beecher Stowe was 20 years ahead of her time, said Mr. Jockers, whose research will soon be published in a book, “Macroanalysis: Digital Methods and Literary History” (University of Illinois Press).
These findings are hardly the last word. At this stage, this kind of digital analysis is mostly an intriguing sign that Big Data technology is steadily pushing beyond the Internet industry and scientific research into seemingly foreign fields like the social sciences and the humanities. The new tools of discovery provide a fresh look at culture, much as the microscope gave us a closer look at the subtleties of life and the telescope opened the way to faraway galaxies.
“Traditionally, literary history was done by studying a relative handful of texts,” says Mr. Jockers, an assistant professor of English and a researcher at the Center for Digital Research in the Humanities at the University of Nebraska. “What this technology does is let you see the big picture — the context in which a writer worked — on a scale we’ve never seen before.”
Mr. Jockers, 46, personifies the digital advance in the humanities. He received a Ph.D. in English literature from Southern Illinois University, but was also fascinated by computing and became a self-taught programmer. Before he moved to the University of Nebraska last year, he spent more than a decade at Stanford, where he was a founder of the Stanford Literary Lab, which is dedicated to the digital exploration of books.
Today, Mr. Jockers describes the tools of his trade in terms familiar to an Internet software engineer — algorithms that use machine learning and network analysis techniques. His mathematical models are tailored to identify word patterns and thematic elements in written text. The number and strength of links among novels determine influence, much the way Google ranks Web sites.
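The idea that "the number and strength of links" determine influence can be illustrated with a small PageRank-style calculation. What follows is only a rough sketch under stated assumptions, not Mr. Jockers's actual model, which the article does not spell out. The function name `influence_scores`, the placeholder titles and the similarity weights are all invented for illustration. A link here points from a later novel to an earlier one it resembles, and a heavier weight means a stronger resemblance.

```python
# Hypothetical sketch: score works by weighted links, PageRank-style.
# Titles and weights are made up; they are not data from the study.

def influence_scores(links, damping=0.85, iterations=100):
    """links maps (later_work, earlier_work) -> similarity weight."""
    nodes = sorted({name for pair in links for name in pair})
    n = len(nodes)

    # Total outgoing weight per work, used to normalize each link.
    out_weight = {name: 0.0 for name in nodes}
    for (src, _dst), weight in links.items():
        out_weight[src] += weight

    scores = {name: 1.0 / n for name in nodes}
    for _ in range(iterations):
        # Works with no outgoing links spread their score evenly.
        dangling = sum(scores[name] for name in nodes if out_weight[name] == 0.0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_scores = {name: base for name in nodes}
        # Each work passes its score along its links, in proportion to weight.
        for (src, dst), weight in links.items():
            new_scores[dst] += damping * scores[src] * weight / out_weight[src]
        scores = new_scores
    return scores


example_links = {
    ("novel_b", "novel_a"): 0.5,
    ("novel_c", "novel_a"): 0.9,
    ("novel_c", "novel_b"): 0.1,
    ("novel_d", "novel_a"): 0.7,
    ("novel_d", "novel_b"): 0.3,
}

scores = influence_scores(example_links)
for title in sorted(scores, key=scores.get, reverse=True):
    print(title, round(scores[title], 3))
```

In this toy network the work that many later works resemble strongly, `novel_a`, ends up with the highest score, which is the sense in which such a ranking resembles the way a search engine ranks Web sites.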
It is this ability to collect, measure and analyze data for meaningful insights that is the promise of Big Data technology. In the humanities and social sciences, the flood of new data comes from many sources including books scanned into digital form, Web sites, blog posts and social network communications.
Data-centric specialties are growing fast, giving rise to a new vocabulary. In political science, this quantitative analysis is called political methodology. In history, there is cliometrics, which applies econometrics to history. In literature, stylometry is the study of an author’s writing style, and these days it leans heavily on computing and statistical analysis. Culturomics is the umbrella term used to describe rigorous quantitative inquiries in the social sciences and humanities.