HathiTrust Research Center

How to use this guide

This resource was designed with absolute beginners in mind. For those who are entirely new to text analysis, HTRC provides introductory tools and access to one of the world's largest online libraries. While you can refer to HTRC's official documentation for more information, this abbreviated guide will walk you through the following processes:

About the HathiTrust Research Center

The HathiTrust Research Center (HTRC) makes the millions of scanned books in the HathiTrust Digital Library available for computational analysis. Researchers can apply computer programs and machine learning algorithms to these books to reveal insights about literature at an unprecedented scale. Using tools provided by HTRC, you can:

  1. Create a Workset containing anywhere from one to several thousand books, including books still in copyright, that were scanned and deposited in the HathiTrust Digital Library by its membership of large North American academic libraries.
  2. Apply one or more of a handful of standard text mining Algorithms with the push of a few buttons.
  3. Export the resulting data for exploration in the tool of your choice.

HTRC also provides tools that offer researchers with coding skills access to pre-calculated data from every page in every volume in HathiTrust via the Extracted Features Dataset, and the chance to run their own code on the full text of HathiTrust works via Data Capsules.

HTRC is made possible by its policy of non-consumptive access, which makes it legal for HTRC to let researchers export facts about HathiTrust books (such as word counts, the beginning and ending letters of lines, and identified parts of speech) but not the full texts of in-copyright works.
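To make the idea of "facts about a book" concrete, here is a minimal Python sketch of how a page of text might be reduced to counts of that kind. It is an illustration only, not HTRC's actual pipeline or data format: the function name `page_features`, the dictionary keys, and the sample text are all hypothetical.

```python
from collections import Counter
import re


def page_features(page_text):
    """Reduce one page of text to counts (hypothetical helper, not an HTRC API).

    Only tallies are returned; the running text itself is not.
    """
    # Keep non-empty lines, trimmed of surrounding whitespace.
    lines = [line.strip() for line in page_text.splitlines() if line.strip()]
    # Very simple tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", page_text.lower())
    return {
        "token_counts": Counter(tokens),
        "begin_line_chars": Counter(line[0] for line in lines),
        "end_line_chars": Counter(line[-1] for line in lines),
    }


# Made-up sample page, used only for illustration.
sample_page = "It was the best of times,\nit was the worst of times."
features = page_features(sample_page)

print(features["token_counts"])
print(features["begin_line_chars"])
print(features["end_line_chars"])
```

The output tells you how often each word appears and which characters begin and end the lines, but the original sentences cannot be read from it. That is the general sense in which exported data is non-consumptive.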

HTRC was founded in 2011 and is hosted by Indiana University and the University of Illinois. Ohio University Libraries has been a member of HathiTrust since 2020. You can log in to HTRC with your Ohio University credentials.