Video: Matthew Wilkens on using big data for literary studies

Author: Todd Boruff


“A lot of the questions are, in some sense, the same ones that literary scholars have long asked, but we do it with respect not just to one or two or half a dozen books at a time, but to hundreds or thousands — or, these days, even potentially millions — of books.” 

— Matthew Wilkens

Matthew Wilkens is associate professor of English at the University of Notre Dame. His research interests include contemporary American fiction, digital humanities, and computational literary studies. More information can be found at Matthew Wilkens' faculty page.

Video Transcript

I work on American fiction, primarily after the Second World War, but I do it in a way that’s a little unconventional. It's primarily computational and data-driven; you might think of it as something like big data for literary studies.

A lot of the questions are, in some sense, the same ones that literary scholars have long asked, but we do it with respect not just to one or two or half a dozen books at a time, but to hundreds or thousands or, these days, even potentially millions of books. How you work with that much text becomes an interesting problem for computer scientists, and I'm not sure you ever want to be an interesting problem for computer scientists: it means you have something very difficult indeed. But that does happen, and so I work with the Center for Research Computing here at Notre Dame. I also work with the Center for Digital Scholarship in the Libraries here, so there are resources around to help scale up that work from what works very smoothly on your laptop to what needs a cluster or supercomputer.

We examined 20th-century American fiction, so from about 1900 up to the present, looking for genres. What we found is that we were indeed able to identify some of the standard genres, things that wouldn't surprise you. It’s not very hard to find detective fiction: detectives and guns and, somewhat surprisingly, offices and telephones really stand out in detective fiction.

What we didn’t really expect to find, and what really stood out in that research, was a cluster of pretty prominent, sort of near-canonical books by white male authors published between the 60s and the 80s. That is to say, that cluster is as heavily marked by the kinds of conventions, the kinds of narrowness of scope, that we find in West Coast detective fiction of the 70s; it really is that specific. You see some of the great writers of suburban angst in the early part of the period, the Cheevers and Updikes or Richard Yates. You also see some authors who hang around sort of right at the margin of the main literary canon, so Stephen King is in there, or James A. Michener.

That really was a shock, and it was a shock, I think, because our professional understanding of literary fiction, as opposed to detective fiction or Westerns, is that it’s a kind of anti-genre: what these books have in common, if anything, is that they don't look like anything else. But that just isn't true. And that, I think, should produce some real changes in the way that we teach literature to our students and in the way we think about it in the profession.