Updated Tue Mar 7 12:54:00 EST 2023

HUM 307: Literature as Data (Spring 2023)

KN95 masks are required in the classroom and in office hours. Always bring a mask. Thanks!

Dramatis personae:

Meredith Martin (English, Center for Digital Humanities), Brian Kernighan (Computer Science)


Weekly assignments and other material

Feb 1: MM assignment 1; Studio 1 (in class); BK assignment 1; required readings for Feb 8; observations on Studio 1

Feb 8: Week 2 readings; Studio 2 (basic Unix commands, week 2); BK assignment 2 (due Feb 15)

Feb 15: Studio 3 (exploratory data analysis, week 3); BK assignment 3 (due Feb 22)

Feb 22: Studio 4 (starting Python, week 4); BK assignment 4 (due Mar 1)

Mar 1: Awk to Python; no studio or assignment

Mar 8: Studio 5 (Pandas, week 6); BK assignment 5 (due Mar 22)

Quick links and old stuff:

Command-line tips and tricks
Loose ends from studio 2
Google Drive for the class
Google Drive class folder
Slack channel
recommended readings for Feb 1
Overview, guidelines, administration
Weekly schedule
Studio 0

Source Materials

Some files can't be put on the public Internet because of copyright issues. You can find those on the Google Drive for the class.

barrett.zip, a Zip file of Elizabeth Barrett Browning's Sonnets from the Portuguese, which we will use in the first class or two.

shakespeare.zip, a Zip file of Shakespeare's sonnets for the first take-home programming exercise.

Basic Unix commands, a quick summary.

Ken Church's Unix for Poets, a useful take on Unix commands for non-experts.

Overview, Guidelines, Administration

This seminar introduces students to basic concepts of working with literary texts and working with data. Crossing the divisional boundaries of literary analysis and quantitative and computational reasoning, we'll learn how to develop a compelling research question, to explore a few of the many methodologies for using computation to analyze literature, and to put our work in the context of the long history of literature conceived of as data. We'll think broadly about the role of humanities in data science, and learn the importance of interpretation, exploration, iteration, creativity, analysis, and critique in both literary and quantitative work.

Weekly readings, reflections, and in-class studio work will introduce the key concepts, methods, and histories of digital humanities, focusing particularly on literature and data. Students will explore these methods and concepts through short code assignments, reflection work, exercises in data curation and critique, and final projects. Course meetings will begin with short lectures and discussion, and the second half of class will be studio-based, with visits from practitioners and researchers across and beyond campus.

In this class, you will:

Attendance and Participation

The first half of every class will require active listening to lectures and participation in discussion, while the second half will require active engagement and participation in the studio assignments. It is your responsibility to arrive at the seminar on time, to complete the readings and exercises, to complete your reading reflections and discussion questions, and to be prepared to engage in class discussions and activities. Because this course meets only 12 times, any unexcused absence beyond the first will affect your grade; medical or personal emergencies are the exception. We are still in a global pandemic, and if you find yourself struggling, please contact one of us as soon as possible so that we can work out an accommodation.

Covid Protocol

If you test positive for COVID, stay home. The sooner you let us know, the sooner we can work out some kind of Zoom alternative.

KN95 masks are required. Always bring a mask and wear it properly. We know that this is really a pain in the nose most of the time, but your cooperation will help to keep all of us (and our friends and families) safer.

Distractions & Learning on Laptops

Because we will often be on our screens in this course, we must create an atmosphere that minimizes distraction. During our class sessions, whether in discussion or in group or solo work, close all windows except those with our course materials. If you use your computer to check social media, message friends, deal with email, and the like (except during the break), you will be marked as absent for that class session.

Academic Integrity

You are required to adhere to the Princeton University Honor Code. Always err on the side of citation.

Assignments and Assessment

In-Class Participation:                       20%
Weekly Reading Assignments and Short Papers:  20%
Programming Exercises:                        30%
Oral Presentation & Final Project:            30%
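The weights above sum to 100%. Assuming each component is scored on a 0-100 scale and combined as a plain weighted average (the syllabus states neither), a final score could be computed as in the sketch below. The component names are hypothetical labels chosen for illustration, not official grading categories.

```python
# Weights taken from the table above; the key names are hypothetical labels.
WEIGHTS = {
    "participation": 0.20,
    "reading_and_papers": 0.20,
    "programming": 0.30,
    "final_project": 0.30,
}


def weighted_grade(scores: dict[str, float]) -> float:
    """Combine component scores (assumed 0-100) into one weighted score."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    example = {"participation": 90, "reading_and_papers": 85,
               "programming": 80, "final_project": 95}
    print(round(weighted_grade(example), 2))
```

With these sample scores the weighted result is 18 + 17 + 24 + 28.5 = 87.5.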

In-Class Participation

As discussed above, for the discussion parts of class, be prepared to contribute in positive ways to our classroom culture. During lectures, guest visits, and studio sessions we expect active engagement as well as a willingness to try again. Learning how to fail productively is part of both literary analysis and quantitative reasoning. Admit when you don't know something and ask questions so that you can better judge when you need to adjust and change course. Conversely, help your classmates when you can; people learn things at different speeds and in different ways, and hearing how something works from multiple sources can be good for everyone.

Classroom Culture

Here are a few guidelines for class participation:

Reading Assignments, Weekly Reflections, and Short Papers

This class is an experiment and you are our collaborators. We rely on your observations to know how the course is going so that we can adjust week by week.

Every week (after every class and before the next class) we'll require you to complete a short reflection on the prior class: what was mysterious or frustrating or engaging, what you would like to know more about. These are written reflections, so pay attention to sentence structure and punctuation, but they are informal, around 300 words (and no more than 500). You may choose to reflect on recommended or required readings if we discuss them in class, or even if we don't, but no pressure.

You'll have five main reading and writing assignments over the 12 weeks of the course:

  1. A sonnet exercise and 1-2 page reflection due Feb 8.
  2. A 2-3 page close-reading of a data set or “data biography” due March 22.
  3. A 2-3 page summary and analysis of a critical article related to data-driven work in the humanities, due on a date of your choosing; you must choose your due date before class on Feb. 8.
  4. A close-reading of a data visualization due April 12.
  5. A reflection paper of 3-4 pages to accompany the creation of your project for Princeton Research Day, due on Dean's Date.

Reflection papers do not have arguments; they're explanatory papers that document your experience completing the exercise or offer a reflective critique of an object (e.g., “this dataset was easy to access and I can see how it relates to the graph on the website because the article clearly explained it”). We'll give more information on these later on.

Programming Exercises

You must complete all coding exercises, which are intended to reinforce the class studio sessions. You'll complete these exercises on your own. For the last four weeks of class we'll help you apply what you've learned to an independent project.

Oral Presentation / Final Project

The final project is intended to be suitable for submission to Princeton Research Day as a poster. The final two weeks of class will be devoted to presenting your poster (solo or collaborative) to the CDH staff, in anticipation of submission to Princeton Research Day, and to integrating their feedback. You'll also write a 3-4 page reflection paper on the process and findings of your work.

Weekly Schedule at the Bottom of the Page

Week One (Feb 1): What is Humanities Data? Or “Let me count the ways”

Introductions. Who are we and what do we bring to this class? What is the history of the attempt to turn humanities sources into data? In what ways is tabulating or counting words in literature nothing new? In what ways is it (now) entirely new? What is the data in literature? What does a 19th-century sonnet sequence have to teach us about the math of poetry?
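Counting words is the simplest kind of tabulation these questions point to. As a minimal sketch, assuming a word is any run of ASCII letters or apostrophes after lowercasing (a simplifying choice a real analysis might revisit), Python's standard collections.Counter can tally the opening lines of Barrett Browning's Sonnet 43:

```python
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Count words, where a word is assumed to be a run of a-z letters or apostrophes."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Opening lines of Sonnet 43 from Sonnets from the Portuguese.
SONNET_43_OPENING = """How do I love thee? Let me count the ways.
I love thee to the depth and breadth and height"""

if __name__ == "__main__":
    counts = word_counts(SONNET_43_OPENING)
    print(sum(counts.values()), "words,", len(counts), "distinct")
    # Ties are listed in the order each word was first encountered.
    for word, n in counts.most_common(5):
        print(word, n)
```

On these two lines it reports 20 words, 15 of them distinct; "i", "love", "thee", "the", and "and" each appear twice. Changing the tokenizing rule (for example, how apostrophes or hyphens are handled) can change the counts, which is one way of seeing that deciding what counts as a word is already an interpretive choice.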

Week Two (Feb 8): What is Computational Humanities?

Historical overview of the fields of Computational Humanities and Digital Humanities, and questions about methodology. Introduction to a few examples of exploring literature with data. Using sonnet structures to learn about close reading. Close & distant reading as practices. How do what we choose to count, and the way that we count it, teach us about literature? Do we need a computer for this?

Week Three (Feb 15): The Shapes of Humanities Data

What is a category? What is a literary genre? What is metadata and what do I need to know about it in order to translate humanities sources into data?

Week Four (Feb 22): Humanities Data are Messy. Data Formats are Interpretive

What do we learn when we make decisions about a format? What are the formats for data work? How does our choice of data formats shape what questions we can ask? How does the representation of information work in literary formats and standard data formats? What does all of this have to do with how we find things now and later? Data types and file formats.
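One concrete way a format shapes what we can ask: the same records survive some formats better than others. The sketch below, using only Python's standard csv and json modules, writes two hypothetical metadata records (the field names are illustrative assumptions, not a CDH schema) to both formats and reads them back. CSV has no notion of data types, so the sonnet number comes back as the string "43", while JSON keeps it as the integer 43.

```python
import csv
import io
import json

# Hypothetical metadata records; the field names are illustrative only.
records = [
    {"collection": "Sonnets from the Portuguese", "number": 43,
     "first_line": "How do I love thee? Let me count the ways."},
    {"collection": "Shakespeare's Sonnets", "number": 18,
     "first_line": "Shall I compare thee to a summer's day?"},
]


def to_csv(rows: list[dict]) -> str:
    """Serialize records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json(rows: list[dict]) -> str:
    """Serialize records as an indented JSON array."""
    return json.dumps(rows, indent=2)


if __name__ == "__main__":
    from_csv = list(csv.DictReader(io.StringIO(to_csv(records))))
    from_json = json.loads(to_json(records))
    # CSV stores only text: every field is read back as a string.
    print(repr(from_csv[0]["number"]))
    # JSON distinguishes numbers from strings, so the integer survives.
    print(repr(from_json[0]["number"]))
```

Sorting the CSV version by "number" without converting it first would compare strings, not integers, which is a small instance of a format limiting the questions we can ask directly.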

Week Five (March 1): Kinds of Literary Data and Where to Get It Part 1: Existing Datasets

Do we start with a question or do we start with the data? Methods vs. tools vs. questions in reading literature as data. What does “regular” data look like in the humanities? What kinds of data do we use as scholars and readers of literature?

Week Six (March 8): Kinds of Literary Data and Where to Get It Part 2: Transformations and Tags

A tour of CDH data. Visit from CDH staff. What is lost when we translate historical sources into tractable data? Spring Break: pick your dataset.

Week Seven (March 22): More Data, More Questions, Metadata, More Questions

Let’s look at the data you’ve decided to work with. This class will blend studio and discussion as we work out how your data might lead to a good research question. Students may elect to work together on a particular dataset or to work solo; either way, this class will be devoted to creating a work plan for the rest of the semester based on what students decide about their literary data, and we’ll leave class with a research question and a plan of action. Data Biography due.


The final four weeks of class will be responsive to the questions students worked on in week seven; we’ll follow the class needs from here on out, so the following weeks are a rough sketch.

Week Eight (March 29): Exploring with Literary Data / Text Analysis

What is Natural Language Processing and what does it have to do with literature?

Programming Assignment 5.

Week Nine (April 5): Storytelling with Literary Data

Introduction to basic visualization. What stories do we want to tell with our dataset and what is the best way to tell them?

Week Ten (April 12): Ethics of Cultural Data

Who owns cultural data? Accountability and ethics in data work.

Week Eleven (April 19): Poster & research critique with CDH staff & invited faculty

Week Twelve (April 26): Finalizing poster, in-class work.