
HathiTrust Research Center

HTRC Advanced Features

HTRC advanced features offer flexibility and scale but require more technical knowledge on the part of the researcher.

  • The Extracted Features Dataset offers prepackaged structural and part-of-speech data for every volume in HathiTrust, but requires automated downloading using rsync, decompression using bzip2, and bulk JSON wrangling.
  • Data Capsules offer a flexible environment for computational access to full OCR transcripts, but require researchers to develop their own code and deploy it to an Ubuntu virtual machine while adhering to HTRC's non-consumptive use policy.
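To make the Extracted Features workflow more concrete, here is a minimal Python sketch of the decompression and JSON-wrangling steps. The rsync download is not shown. It assumes each volume is a `.json.bz2` file in which page-level counts sit under `features`, then `pages`, then each page's `body.tokenPosCount` (a mapping of token to `{POS tag: count}`). Those field names reflect the Extracted Features schema as we understand it and should be checked against the documentation for the dataset version you downloaded. The helper names and the sample volume are hypothetical.

```python
import bz2
import json
import tempfile
from collections import Counter
from pathlib import Path


def load_volume(path):
    """Decompress and parse one bzip2-compressed Extracted Features JSON file."""
    with bz2.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


def volume_token_counts(volume):
    """Sum each token's counts across all pages, collapsing part-of-speech tags.

    Assumed schema: features -> pages -> body -> tokenPosCount.
    """
    totals = Counter()
    for page in volume["features"]["pages"]:
        for token, by_pos in page["body"]["tokenPosCount"].items():
            totals[token] += sum(by_pos.values())
    return totals


def pos_counts(volume):
    """Count part-of-speech tags across the body of every page."""
    totals = Counter()
    for page in volume["features"]["pages"]:
        for by_pos in page["body"]["tokenPosCount"].values():
            totals.update(by_pos)  # Counter.update adds counts from a mapping
    return totals


def corpus_token_counts(directory):
    """Aggregate token counts over every *.json.bz2 file below a directory
    (e.g. an rsync target; rglob also walks nested subdirectories)."""
    totals = Counter()
    for path in sorted(Path(directory).rglob("*.json.bz2")):
        totals.update(volume_token_counts(load_volume(path)))
    return totals


# Demo on a tiny hand-made volume (hypothetical ID and counts, not real EF data).
sample = {
    "id": "example.001",
    "features": {
        "pages": [
            {"seq": "00000001",
             "body": {"tokenPosCount": {"whale": {"NN": 2}, "the": {"DT": 3}}}},
            {"seq": "00000002",
             "body": {"tokenPosCount": {"whale": {"NN": 1}, "swims": {"VBZ": 1}}}},
        ]
    },
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.001.json.bz2"
    with bz2.open(path, "wt", encoding="utf-8") as fh:
        json.dump(sample, fh)  # write a compressed file like the ones rsync delivers
    volume = load_volume(path)
    corpus = corpus_token_counts(tmp)

print(volume_token_counts(volume).most_common(2))
print(pos_counts(volume))
```

Streaming each file through `bz2.open` in text mode avoids writing decompressed copies to disk, which matters when the same loop runs over many thousands of volumes.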
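As an illustration of the kind of code a researcher might write for a Data Capsule, the sketch below reads plain-text OCR files and exports only aggregate term counts, never the text itself. It uses Python, which is an assumption since the capsule lets researchers choose their own tools. The directory layout, file names, and the simple ASCII-only tokenizer are hypothetical simplifications. Whether a particular output is acceptable is governed by HTRC's non-consumptive use policy and its review process, not by this code.

```python
import re
import tempfile
from collections import Counter
from pathlib import Path

# Simplified tokenizer: lowercase ASCII letters only (drops accents and digits).
TOKEN_RE = re.compile(r"[a-z]+")


def term_frequencies(text_dir):
    """Count tokens across every .txt file under text_dir (hypothetical layout)."""
    totals = Counter()
    for path in sorted(Path(text_dir).rglob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        totals.update(TOKEN_RE.findall(text.lower()))
    return totals


def export_counts(counts, out_path, min_count=1):
    """Write derived data only: 'term,count' rows, most frequent first."""
    with open(out_path, "w", encoding="utf-8") as fh:
        for term, count in counts.most_common():
            if count < min_count:
                break  # most_common() is sorted descending, so stop early
            fh.write(f"{term},{count}\n")


# Demo with stand-in files; in a real capsule the texts would already be present.
with tempfile.TemporaryDirectory() as tmp:
    vol_dir = Path(tmp) / "volumes"
    vol_dir.mkdir()
    (vol_dir / "vol1.txt").write_text("Call me Ishmael. Call me.", encoding="utf-8")
    (vol_dir / "vol2.txt").write_text("The whale, the whale!", encoding="utf-8")
    counts = term_frequencies(vol_dir)
    out = Path(tmp) / "term_counts.csv"
    export_counts(counts, out, min_count=2)
    exported = out.read_text(encoding="utf-8").splitlines()

print(exported)
```

The `min_count` threshold is one simple way to keep rare terms out of an export; it is a design choice in this sketch, not an HTRC requirement.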