HathiTrust Research Center

What are HTRC worksets?

HTRC worksets are lists of volumes from the HathiTrust Digital Library that tell HTRC tools which books to analyze. A workset can be public or private to your HTRC account. You can create one by importing a collection from the HathiTrust Digital Library or by uploading a file of HathiTrust identifiers. Any work in HathiTrust can be included in a workset, regardless of its copyright status.

Accessing & creating worksets

This page will guide you through two ways of generating worksets:

  • Create and import a HathiTrust collection
  • Browse public collections and worksets

For additional workset creation methods, see the following resources:

  • Upload a list of HathiTrust volume IDs - Ideal for users who want to adapt existing public worksets and collections to create their own. HTRC has a video and step-by-step instructions on its Workset Tutorials page.
  • HTRC Workset Builder 2.0 - This tool is currently in development. It is ideal for building very large worksets using complex and specific queries to locate volumes by page text or metadata. HTRC has a video introduction to Workset Builder, as well as a wiki page with detailed information.
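If you plan to upload a list of volume IDs, it can help to tidy the list first. The sketch below is only an illustration: it assumes a plain text file with one ID per line and assumes that IDs follow the "namespace.local-id" shape seen in IDs such as wu.89098885551. The function names are hypothetical, and the exact file format HTRC expects should be confirmed on the Workset Tutorials page.

```python
import re

# Assumption: a HathiTrust volume ID looks like "<namespace>.<local id>",
# e.g. "wu.89098885551". This is a loose sanity check, not an official rule.
ID_PATTERN = re.compile(r"^[a-z0-9]+\.\S+$")


def clean_volume_ids(raw_lines):
    """Strip whitespace, skip blanks and duplicates, and keep first-seen order.

    Returns (cleaned_ids, rejected_lines).
    """
    seen = set()
    cleaned = []
    rejected = []
    for line in raw_lines:
        vol_id = line.strip()
        if not vol_id:
            continue  # ignore empty lines
        if not ID_PATTERN.match(vol_id):
            rejected.append(vol_id)  # does not look like an ID
            continue
        if vol_id not in seen:
            seen.add(vol_id)
            cleaned.append(vol_id)
    return cleaned, rejected


def write_id_file(ids, path):
    """Write one ID per line (assumed upload format)."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for vol_id in ids:
            handle.write(vol_id + "\n")


if __name__ == "__main__":
    ids, bad = clean_volume_ids(
        ["wu.89098885551\n", "  wu.89098853716 ", "", "wu.89098885551"]
    )
    write_id_file(ids, "volume_ids.txt")
    print(len(ids), "IDs written;", len(bad), "lines rejected")
```

Review any rejected lines by hand before uploading, since the pattern is deliberately loose and may not match every valid ID.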

Import a collection

Create a workset from scratch by first building a collection in the HathiTrust Digital Library:

  1. Log in to HathiTrust to create a public collection. See Ohio University's login instructions for details.
  2. Once logged in, use the search bar or Advanced Full-text Search to find volumes.
  3. A checkbox will appear to the left of each item in the search results, along with a "Select all on page" option at the top of the list. Add the selected items to an existing or new collection by clicking the "Add" button. (Note: The checkboxes appear only in full-text search results, not in catalog search results.)
  4. When creating a new collection, you must provide a name and may add a description. You can also choose whether it will be public or private. Collections must be public to generate worksets and run HTRC algorithms. The number of works in a collection affects HTRC processing times and dataset sizes, so smaller collections may produce more manageable results. Additionally, collections exceeding 1,000 items will not be searchable until indexed, usually within 1-5 days.
  5. You can view your collections in the Collections tab by filtering to view My Collections or searching all public collections. Clicking through to a collection page will display the list of works and assorted tools, including a hathitrust.org link required for sharing and generating worksets.
  6. Copy the collection link and log in to HTRC. Select the Worksets tab, then Create a Workset.
  7. Import a collection from HathiTrust by pasting the URL to retrieve its details and contents.
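Before pasting a collection link into HTRC, you may want to confirm that you copied the right URL. The following is a minimal sketch, not part of any HathiTrust or HTRC tool. It checks that the link points to a hathitrust.org host, and it assumes (without confirmation from this guide) that the collection identifier is carried in a query parameter named c. The example URL and its identifier are made up for illustration.

```python
from urllib.parse import urlparse, parse_qs


def check_collection_url(url):
    """Return (looks_valid, collection_id) for a copied collection link.

    looks_valid is True when the link uses http(s) and a hathitrust.org host.
    collection_id is the value of the "c" query parameter, or None.
    Assumption: the collection ID lives in "c"; verify against your own link.
    """
    parts = urlparse(url.strip())
    host = (parts.hostname or "").lower()
    on_hathitrust = host == "hathitrust.org" or host.endswith(".hathitrust.org")
    looks_valid = on_hathitrust and parts.scheme in ("http", "https")
    collection_id = parse_qs(parts.query).get("c", [None])[0]
    return looks_valid, collection_id


if __name__ == "__main__":
    # Hypothetical link; the numeric ID is invented for this example.
    example = "https://babel.hathitrust.org/cgi/mb?a=listis&c=123456789"
    print(check_collection_url(example))
```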

 

Import a HathiTrust Collection video tutorial produced by HTRC

Browse public worksets

HathiTrust Digital Library has thousands of public, user-created collections that you can browse and import using the above steps.

You can also browse public worksets within HTRC:

  1. Log in to the HathiTrust Research Center.
  2. Go to the Worksets tab and filter the search to ensure you are viewing All Worksets instead of just your own.

Screenshot of HathiTrust Research Center Workset tab showing search options

  3. Browse or search by keyword to make a selection.
  4. Click the workset name to view details, including a list of works that it contains.
  5. You can initiate an algorithm directly from the workset's page by selecting from the Analyze with Algorithm dropdown menu.

Validating worksets

Because the HathiTrust Digital Library is a dynamic collection to which volumes are added and occasionally removed, HTRC offers a workset validation tool. It confirms whether all the works in a workset are currently available for analysis. You can run it on any public workset or on your own private worksets.

The validation tool also provides a convenient way to download the workset's metadata as a CSV file, including volume titles, authors, languages, publication dates, and HathiTrust IDs.

Screenshot of HTRC workset validation report with red box indicating download button

 

Workset validation report (top) and sample from downloaded metadata (bottom)

 

id | title | year | language | authors
wu.89098885551 | Anarchy! : an anthology of Emma Goldman's Mother earth... | 2001 | eng | Glassgold, Peter 1939-
wu.89098853716 | Buonasera, what won him : also the stories... | 1901 | eng | Gibbud, H. B. (Henry B.) 1857-1901; Gibbud, H. B., (Ellen) Mrs
wu.89098850209 | The Ladies' wreath : a selection from the female poetic writers of England and America... | 1839 | eng | Hale, Sarah Josepha Buell 1788-1879
wu.89100339860 | Greenwood leaves : a collection of sketches and letters... | 1849 | eng | Greenwood, Grace 1823-1904
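Once downloaded, the metadata file can be explored with a few lines of code. The sketch below assumes a comma-separated file whose header row matches the sample above (id, title, year, language, authors); the actual download may use different column names or quoting, so adjust accordingly. The sample rows are copied from the table above and embedded as text so the example runs on its own; the function names are hypothetical.

```python
import csv
import io
from collections import Counter

# Sample rows from this guide, written as CSV. Fields containing commas
# are quoted. Assumption: the real download uses the same header names.
SAMPLE_CSV = '''id,title,year,language,authors
wu.89098885551,"Anarchy! : an anthology of Emma Goldman's Mother earth...",2001,eng,"Glassgold, Peter 1939-"
wu.89098853716,"Buonasera, what won him : also the stories...",1901,eng,"Gibbud, H. B. (Henry B.) 1857-1901; Gibbud, H. B., (Ellen) Mrs"
wu.89098850209,"The Ladies' wreath : a selection from the female poetic writers of England and America...",1839,eng,"Hale, Sarah Josepha Buell 1788-1879"
wu.89100339860,"Greenwood leaves : a collection of sketches and letters...",1849,eng,"Greenwood, Grace 1823-1904"
'''


def load_metadata(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def summarize(rows):
    """Count volumes per language and find the earliest and latest year."""
    by_language = Counter(row["language"] for row in rows)
    years = [int(row["year"]) for row in rows if row["year"].isdigit()]
    year_range = (min(years), max(years)) if years else None
    return by_language, year_range


if __name__ == "__main__":
    rows = load_metadata(SAMPLE_CSV)
    languages, year_range = summarize(rows)
    print("Volumes:", len(rows))
    print("By language:", dict(languages))
    print("Year range:", year_range)
```

To work with a real download, replace SAMPLE_CSV with the contents of your file, for example by reading it with open(path, encoding="utf-8", newline="").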