Distant Reading: Characteristics & Overview

Instructor: Benjamin Gaines

Benjamin has his master's degree in literature and has taught writing in and out of academia.

Distant reading is an unusual and controversial approach to analyzing literature, developed by literary scholar Franco Moretti. Read more about how Moretti is challenging tradition by viewing literature as data.

What is Distant Reading?

When you read a novel or poem, how do you analyze it? What kind of information do you look for, and how do you decide what it means? What can a British novel written in 1855 tell you about the society and people's lives back then? What can it tell you about what's being written today?

For most students and teachers, understanding literature involves close reading. Close reading is done by carefully reading and reflecting on a piece of literature, be it a poem, novel, or essay. You might pay special attention to characterization, to the pace of the plot, or to the symbolism and imagery found throughout a work. This has traditionally led to many interesting and complex theories about literature, but what if there is another way?

That's just the question that Stanford professor Franco Moretti set out to answer. Moretti has pioneered a practice called distant reading, which is the opposite of close reading. Instead of carefully reading and analyzing a single work (or a small group of works), distant reading takes thousands of pieces of literature and feeds them into a computer for analysis.

Why Be Distant?

Distant reading attempts to uncover the patterns and unspoken rules behind literature from a very technical perspective. Where close reading relies on subjective analysis of what a single piece of literature means, distant reading compiles objective data about many, many works.

This idea is born from the fact that there are simply too many books for anyone to read and study seriously. If you're studying Victorian literature, Moretti says, there are roughly two hundred novels in the canon to read. A literary canon is the list of important works that make up most of the literature that is studied. Comparing and analyzing two hundred books is a very large task, but Moretti sees it as still too limited.

You see, the canon represents a very small selection of what was written. Tens of thousands of other books were written during the Victorian era that are rarely (if ever) read anymore. How could you possibly read through them all to make sure you had the 'whole picture' of what writing was like then? And what about all the other periods of history, and all the other places where things were written?

The simple answer is that you can't effectively study it all. No person can read and carefully analyze so much information, but a computer can. To this end, Moretti founded the Stanford Literary Lab, a part of Stanford University dedicated to using computers to analyze literature.

How Distant Reading Works

With the help of other academics and data specialists, Moretti has developed a system of using computers to analyze novels as raw data. While computers can't 'read' and understand a novel in the way people can, they are very good at searching for specific information you give them and finding patterns. They can measure sentence length, structure, and lexicon, and they can give a scholar patterns of data to analyze. In this way, distant reading is more of a practice than a literary theory.
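To make the idea of measuring a text concrete, here is a minimal sketch in Python. It is not the Stanford Literary Lab's software; the function name `text_stats`, the naive sentence-splitting rule, and the sample text are all assumptions made for illustration.

```python
import re


def text_stats(text):
    """Return simple surface measurements for a piece of text (illustrative only)."""
    # Naive sentence split: break after '.', '!' or '?' when followed by whitespace.
    # Real texts (abbreviations, dialogue) would need a more careful rule.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Words: runs of letters and apostrophes, lowercased so 'The' and 'the' match.
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "sentences": len(sentences),
        "words": len(words),
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        "vocabulary_size": len(set(words)),  # number of distinct words
    }


# Invented sample text, not taken from any novel.
sample = "The rain fell. The rain fell all night, and nobody slept!"
stats = text_stats(sample)
print(stats)
```

Run over thousands of novels instead of one sample, measurements like these become the patterns of data that a scholar then has to interpret.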

For example, Moretti put the titles of 7,000 British novels published between 1740 and 1850 into his computer. He then had the computer count the words in each title and compare the averages over time. The computer found that as time went on, the titles grew shorter. In 1740, the average novel had a title of twenty-five words. By 1850, that had shrunk to a much more manageable eight words.
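The title count described above can be sketched in a few lines of Python. This is only an illustration of the technique, not Moretti's actual code or data: the function name `mean_title_length_by_year` is hypothetical, and the four titles are invented placeholders rather than real novels.

```python
from collections import defaultdict


def mean_title_length_by_year(records):
    """Given (year, title) pairs, return {year: average number of words per title}."""
    lengths = defaultdict(list)
    for year, title in records:
        # Count words by splitting the title on whitespace.
        lengths[year].append(len(title.split()))
    return {year: sum(counts) / len(counts) for year, counts in sorted(lengths.items())}


# Invented placeholder titles, used only to show the calculation.
records = [
    (1740, "The Surprising Life and Strange Adventures of a Young Lady of Quality"),
    (1740, "A True History of the Misfortunes of a Country Gentleman"),
    (1850, "The Old House"),
    (1850, "A Winter Journey"),
]
averages = mean_title_length_by_year(records)
print(averages)
```

With a real list of several thousand titles in place of the four placeholders, the same calculation would produce a year-by-year average that can be plotted and compared.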

It's natural to wonder 'so what?' when you hear that. On its own, that information is little more than a piece of trivia. However, Moretti took this information and looked into what else changed at the same rate as titles, and he found an answer: the market for books.

Before 1740, the market for novels in Britain was very small, with five or ten new titles published each year. As time went on, more and more novels were published. At the same time, authors started writing shorter titles.
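One common way to check whether two trends move together is a correlation coefficient. The sketch below computes Pearson's r in plain Python. The source does not say which statistic Moretti used, so treat this as an assumption, and note that both lists of numbers are invented for illustration and are not his data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Invented numbers for four hypothetical sample years, not real figures.
novels_per_year = [5, 20, 50, 100]
mean_title_words = [20.0, 15.0, 10.0, 6.0]

r = pearson(novels_per_year, mean_title_words)
print(round(r, 2))
```

A value of r close to -1 means that one series falls as the other rises. It does not explain why, which is where the scholar's interpretation comes in.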

Why might this be? What is it about having lots of competition that makes a writer need to come up with a shorter title? What is the appeal of a shorter title? What does it say about writers or readers that those titles flourish as more and more books are produced? These are all interesting questions, and they're questions we might never think to ask without distant reading.

This text-as-data approach also allows computer programs to learn how to spot what genre a story belongs to without being told. Interestingly, the programs use very different measures for this than a human reader would.

