So here’s an embarrassing confession:
I’ve had a bit of a complex about my reading pace since middle school; I don’t actually know if I read slowly relative to The Masses–probably not?–but I certainly feel like it takes me longer than it should to finish a book compared to certain peers and/or siblings. I’m self-conscious about reading on public transportation, as if my fellow riders are carefully tracking the rate at which I turn pages and judging me accordingly–as in, “Wow, she’s still on the same page she was on when we left the station? What a dimwit.” Obviously, no one actually gives a shit. Still, it means that length is something I take into consideration when choosing books to read; I’m reluctant to start anything over 500 pages, and I’m ashamed of that reluctance, since it seems super anti-intellectual.
There are way more Important books than can be completed in one’s lifespan, especially if one wants to be not only well-read but also well-versed in other forms of media and, on top of that, one only has a few hours per day for “leisure activities.” So how to prioritize? Spend a month just reading Moby Dick because you feel like you Ought To (whatever that means) or spend a month reading several shorter novels of greater personal interest? If we want to be super cliched about it, ‘brevity is the soul of wit,’ ‘variety is the [very] spice of life,’ etc., right?
Of course there’s that ingrained idea, probably from whenever you started to think of yourself as Gifted and Talented, that lugging around a heavy tome marks you as a Smart Person to the rest of the world. But to conflate a book’s word count with its quality/depth/brilliance is to assume that its length is purely an artistic choice and that the author is basically infallible; what about the cases where the author is paid by the word or by the installment? Or where the author is at the point in their career where no one dares advise them to edit? Does the author really need 700 pages to tell the story they’re trying to tell, or are they to some extent just reveling in their ability to hold their readers hostage? The eternal question: will reading a Charles Dickens novel actually improve my life at all?
I intend to do a more extensive analysis of this sort in the future, but let’s just start with this.
In 2015, BBC Culture put out a list of the 100 greatest British novels. Assessing the “validity” of that list or any similar lists–definitely not in scope; but if you’re, say, a non-English major looking to build on your foundational knowledge of Literature, these lists are always a good place to look. (And lists are fun!)
The 40 oldest books on that list are available on Project Gutenberg (okay, technically, 41, but Clarissa is broken up into multiple volumes and we couldn’t be bothered to deal with that), and luckily, there’s a charming R package (gutenbergr) that makes pulling down and messing around with Project Gutenberg super easy.
So you want to cross off some classics without devoting too much time¹. Where to start?
(It’s maybe worth noting that titles and chapter names are going into the total word counts, average word lengths, etc., so these numbers shouldn’t be taken as the precise, canonical values, but I’m assuming they’re very close?)
Shortest:
- Alice’s Adventures in Wonderland, Lewis Carroll (27k)
- Heart of Darkness, Joseph Conrad (39k)
- The Wind in the Willows, Kenneth Grahame (60k)
- A Room with a View, E.M. Forster (67k)
- Frankenstein, Mary Shelley (75k)
- The Good Soldier, Ford Madox Ford (76k)
Longest (aka what the fuck, Charles Dickens):
- David Copperfield, Charles Dickens (359k)
- Dombey and Son, Charles Dickens (359k)
- Bleak House, Charles Dickens (358k)
- The Way We Live Now, Anthony Trollope (355k)
- The History of Tom Jones, a Foundling, Henry Fielding (353k)
Note that the six shortest books in total have fewer words than any one of the five longest.
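If you want to check that claim yourself, here’s a quick sketch in Python (not the R I actually used for the analysis). It only uses the rounded counts from the two lists above, so the totals are approximate.

```python
# Rounded word counts (in thousands), copied from the two lists above.
shortest = {
    "Alice's Adventures in Wonderland": 27,
    "Heart of Darkness": 39,
    "The Wind in the Willows": 60,
    "A Room with a View": 67,
    "Frankenstein": 75,
    "The Good Soldier": 76,
}
longest = {
    "David Copperfield": 359,
    "Dombey and Son": 359,
    "Bleak House": 358,
    "The Way We Live Now": 355,
    "The History of Tom Jones, a Foundling": 353,
}

total_shortest = sum(shortest.values())    # all six short books combined
smallest_longest = min(longest.values())   # the "shortest" of the long books

print(f"Six shortest combined: ~{total_shortest}k words")
print(f"Shortest of the five longest: ~{smallest_longest}k words")
print("Claim holds:", total_shortest < smallest_longest)
```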
But okay, sheer length isn’t the only thing that goes into readability. Look at Robinson Crusoe–relatively few words (~121,500), but some excruciatingly long sentences.
Average sentence length (in words):
Shortest:
- Sons and Lovers, D.H. Lawrence (10)
- A Room with a View, E.M. Forster (11)
- Howards End, E.M. Forster (11)
- Women in Love, D.H. Lawrence (11)
- The Old Wives’ Tale, Arnold Bennett (12)
Longest:
- Robinson Crusoe, Daniel Defoe (51)
- The Life and Opinions of Tristram Shandy, Gentleman, Laurence Sterne (46)
- Moll Flanders, Daniel Defoe (42)
- Gulliver’s Travels, Jonathan Swift (37)
- The History of Tom Jones, a Foundling, Henry Fielding (31)
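For the curious, average sentence length can be approximated along these lines. This is a minimal Python sketch, not the R code behind the numbers above; the splitting rule (break on `.`, `!`, `?`) and the function name `average_sentence_length` are my assumptions for illustration, and abbreviations like “Mr.” will be miscounted as sentence breaks.

```python
import re

# A "word" here is a run of letters, optionally with internal apostrophes.
WORD_RE = re.compile(r"[A-Za-z]+(?:['’][A-Za-z]+)*")


def average_sentence_length(text):
    """Mean words per sentence, using a naive split on ., ! and ?."""
    chunks = re.split(r"[.!?]+", text)
    # Count the words in each chunk and drop chunks with no words at all
    # (e.g. the empty string left after the final full stop).
    counts = [len(WORD_RE.findall(chunk)) for chunk in chunks]
    counts = [n for n in counts if n > 0]
    return sum(counts) / len(counts) if counts else 0.0


toy_text = "The boat sank. We swam to the shore! Why?"
print(average_sentence_length(toy_text))  # sentences of 3, 5 and 1 words -> 3.0
```

A proper sentence tokenizer would handle abbreviations and dialogue better, but for comparing a Defoe against a Lawrence the crude version should get the gist across.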
And to cover all our bases, what about the use of big words?
Average word length (in letters):
Shortest:
- Moll Flanders, Daniel Defoe (3.92)
- Robinson Crusoe, Daniel Defoe (3.96)
- Alice’s Adventures in Wonderland, Lewis Carroll (4.06)
- Great Expectations, Charles Dickens (4.10)
- Sons and Lovers, D.H. Lawrence (4.13)
Longest:
- The Old Wives’ Tale, Arnold Bennett (4.47)
- Nostromo, Joseph Conrad (4.44)
- Frankenstein, Mary Shelley (4.42)²
- Pride and Prejudice, Jane Austen (4.40)
- Sense and Sensibility, Jane Austen (4.39)
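Here’s roughly how an average word length could be computed, again as a hedged Python sketch rather than the original R. The `exclude` parameter is a hypothetical addition of mine, there to show how one might drop a single very long, very frequent word (say, “frankenstein”) before taking the mean.

```python
import re

# A "word" is a run of letters, optionally with internal apostrophes.
WORD_RE = re.compile(r"[a-z]+(?:['’][a-z]+)*")


def average_word_length(text, exclude=()):
    """Mean number of letters per word, ignoring any words in `exclude`."""
    excluded = {w.lower() for w in exclude}
    words = [w for w in WORD_RE.findall(text.lower()) if w not in excluded]
    if not words:
        return 0.0
    # Count letters only, so the apostrophe in "don't" adds nothing.
    letters = sum(sum(ch.isalpha() for ch in w) for w in words)
    return letters / len(words)


toy_text = "Frankenstein made a monster"
print(average_word_length(toy_text))                            # (12+4+1+7)/4 -> 6.0
print(average_word_length(toy_text, exclude=["frankenstein"]))  # (4+1+7)/3 -> 4.0
```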
How rich is the vocabulary used? We judge this by finding the number of distinct words in a random sample of 1000 words³ from each text (i.e. the type-token ratio of that sample).
- Moll Flanders, Daniel Defoe
- Robinson Crusoe, Daniel Defoe
- Sense and Sensibility, Jane Austen
- Pride and Prejudice, Jane Austen
- The Way We Live Now, Anthony Trollope
- Nostromo, Joseph Conrad
- Villette, Charlotte Brontë
- A Room with a View, E.M. Forster
- Gulliver’s Travels, Jonathan Swift
- Tess of the d’Urbervilles, Thomas Hardy
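To make the sampling idea concrete, here’s a small Python sketch of a sampled type-token ratio. It is not the R code used for the list above, and a few things are assumptions on my part: the function names, the tokenizing regex, sampling without replacement, and the fixed seed (only there to make the result repeatable). The second function shows the contiguous-window alternative for comparison.

```python
import random
import re

WORD_RE = re.compile(r"[a-z]+(?:['’][a-z]+)*")


def tokenize(text):
    """Lowercase the text and pull out runs of letters as words."""
    return WORD_RE.findall(text.lower())


def sampled_ttr(text, sample_size=1000, seed=0):
    """TTR of a random bag-of-words sample (drawn without replacement)."""
    tokens = tokenize(text)
    if len(tokens) < sample_size:
        raise ValueError("text has fewer words than the requested sample")
    sample = random.Random(seed).sample(tokens, sample_size)
    return len(set(sample)) / sample_size


def contiguous_ttr(text, sample_size=1000, seed=0):
    """TTR of a randomly placed window of consecutive words."""
    tokens = tokenize(text)
    if len(tokens) < sample_size:
        raise ValueError("text has fewer words than the requested sample")
    start = random.Random(seed).randrange(len(tokens) - sample_size + 1)
    window = tokens[start:start + sample_size]
    return len(set(window)) / sample_size


# Toy usage: a 6-word "text" sampled in full, so both versions agree.
toy_text = "The cat sat on the mat"
print(sampled_ttr(toy_text, sample_size=6))
print(contiguous_ttr(toy_text, sample_size=6))
```

Because every text contributes the same number of tokens, the ratios are comparable across books of very different lengths, which is the whole point of sampling.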
Conclusion so far? E.M. Forster is probably the ideal mix of cachet and accessibility for someone trying to brush up on pre-1920s British Literature, which we probably could have already told you.
1. Not accounting for the case where you, say, read Howards End…and then Maurice…and then A Room with a View…and then before you know it, you have a quasi-religious compulsion to complete E.M. Forster’s entire literary output, personal correspondence, biographies, etc. (And probably, yes, in that time you could have completed a Dickens or two.)
2. Although one wonders how this would change if one removed “frankenstein” from the words used to calculate the mean.
3. Why sample 1000 words? If we use the whole text, the TTR will be highly correlated with the word count: as the text gets longer, the likelihood of a word popping up for the first time decreases, unless the text in question is like actually a dictionary. Possibly it would be better methodology to use 1000 contiguous words from the text, but using the bag-of-words model was easier.