Digital Exhibit Sampler

A testing ground for online display of Digital Archives content, including collection visualizations and presentation templates that can be applied to curated exhibits.

W.E. Peters Athens County map

Created using ArcGIS Online. To gain access to the Libraries' ArcGIS Organization, contact Erin Wilson.

Book Beat place mentions Google Map

The Book Beat Test Map below is intended to show every geographic place mentioned in the Don Swaim Collection's Book Beat transcripts. It was generated using spaCy, a Python natural language processing library. A Python script scanned each of the Book Beat broadcast transcript text files for instances of

  • GPE ("Countries, cities, states")
  • LOC ("Non-GPE locations, mountain ranges, bodies of water")
  • FAC ("Buildings, airports, highways, bridges, etc.")

as identified by the en_core_web_sm pretrained statistical model for English. A second Python script then searched Wikidata via its Wikibase API, took the first search result with geographic coordinates (P625) for each place, and compiled the results into a CSV file, which was then uploaded to Google Maps.
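The original scripts are not reproduced on this page, so the following is only a minimal sketch of how the first step could work. It assumes the transcripts are plain .txt files in a single folder; the function names (place_entities, tally_places, load_docs) are hypothetical and chosen for illustration.

```python
from pathlib import Path

# The three spaCy entity labels treated as "places" on this page.
PLACE_LABELS = {"GPE", "LOC", "FAC"}


def place_entities(doc):
    """Yield (text, label) for each place-like entity in a spaCy Doc."""
    for ent in doc.ents:
        if ent.label_ in PLACE_LABELS:
            yield ent.text.strip(), ent.label_


def tally_places(docs_by_filename):
    """Count mentions per place name across all transcripts.

    docs_by_filename maps a transcript file name to its processed Doc.
    Returns {place name: {"mentions", "filenames", "labels"}}.
    """
    places = {}
    for filename, doc in docs_by_filename.items():
        for text, label in place_entities(doc):
            record = places.setdefault(
                text, {"mentions": 0, "filenames": set(), "labels": set()}
            )
            record["mentions"] += 1
            record["filenames"].add(filename)
            record["labels"].add(label)
    return places


def load_docs(transcript_dir, model_name="en_core_web_sm"):
    """Run every .txt transcript in a folder through a spaCy model."""
    import spacy  # imported here so the helpers above work without spaCy

    nlp = spacy.load(model_name)
    return {
        path.name: nlp(path.read_text(encoding="utf-8"))
        for path in sorted(Path(transcript_dir).glob("*.txt"))
    }


# Example use (the folder name is an assumption):
# places = tally_places(load_docs("transcripts"))
```

Counting by exact entity text keeps the sketch simple, but it also means that spelling variants of the same place are tallied separately.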

Each point is shaded to reflect the number of mentions in the Book Beat transcripts (darker points have more mentions) and is tagged with the following fields:

  • Entity: the name of the place as identified by spaCy
  • filenames: the file names of the text transcripts where the entity was identified
  • labels: the labels (GPE, LOC, FAC) spaCy used to identify the place
  • mentions: the number of mentions of that place
  • description: the Wikidata description of that place
  • wikidata-label: the name of the place in Wikidata
  • wikidata-url: the URL of the Wikidata page for that place
  • latitude and longitude
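The fields listed above map directly onto CSV columns. The sketch below shows one way such a file could be written with Python's standard csv module; the column names follow the list above, while the function name write_places_csv, the "; " separator for multi-valued fields, and the sort order are assumptions made for illustration.

```python
import csv

# Column names taken from the field list above.
FIELDNAMES = [
    "Entity", "filenames", "labels", "mentions", "description",
    "wikidata-label", "wikidata-url", "latitude", "longitude",
]


def write_places_csv(rows, fileobj):
    """Write one CSV row per place, most-mentioned places first.

    Each row is a dict keyed by FIELDNAMES; "filenames" and "labels"
    are collections of strings and are joined with "; " (an assumption).
    """
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES)
    writer.writeheader()
    for row in sorted(rows, key=lambda r: r["mentions"], reverse=True):
        flat = dict(row)
        for key in ("filenames", "labels"):
            flat[key] = "; ".join(sorted(row[key]))
        writer.writerow(flat)


# Example use (the output file name is an assumption):
# with open("places.csv", "w", newline="", encoding="utf-8") as f:
#     write_places_csv(rows, f)
```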

Because the Wikidata search is very rudimentary, there are many obvious mistakes on this map. "Georgia" was mentioned 36 times, but it is unlikely that all of those mentions refer to the "republic in the Caucasus region," and there is a "Long Island" in the Outer Hebrides that was probably not actually referred to 38 times. A proper reconciliation would significantly improve the resulting data, and with links to the broadcasts themselves in CONTENTdm the map could serve as a collection browsing interface.
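To make the source of these mistakes concrete, here is a hedged sketch of a "first search result with coordinates" lookup. It is not the script used for this map. It assumes the standard Wikidata API actions wbsearchentities and wbgetentities, uses only the Python standard library, and the function names, the User-Agent string, and the limit of 10 candidates are all illustrative choices.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.wikidata.org/w/api.php"


def api_get(params):
    """Send one GET request to the Wikidata API and decode the JSON reply."""
    query = urllib.parse.urlencode({**params, "format": "json"})
    request = urllib.request.Request(
        f"{API_URL}?{query}",
        headers={"User-Agent": "place-lookup-sketch/0.1 (example)"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def coordinates(entity):
    """Return (latitude, longitude) from the first P625 value, or None."""
    for claim in entity.get("claims", {}).get("P625", []):
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") == "value":
            value = snak["datavalue"]["value"]
            return value["latitude"], value["longitude"]
    return None


def first_hit_with_coordinates(name, fetch=api_get):
    """Return the first search result for name that has coordinates."""
    search = fetch({"action": "wbsearchentities", "search": name,
                    "language": "en", "type": "item", "limit": 10})
    ids = [hit["id"] for hit in search.get("search", [])]
    if not ids:
        return None
    entities = fetch({"action": "wbgetentities", "ids": "|".join(ids),
                      "props": "claims|labels|descriptions",
                      "languages": "en"}).get("entities", {})
    for qid in ids:  # keep the search ranking; no disambiguation at all
        entity = entities.get(qid, {})
        coords = coordinates(entity)
        if coords:
            return {
                "wikidata-label": entity.get("labels", {}).get("en", {}).get("value", ""),
                "description": entity.get("descriptions", {}).get("en", {}).get("value", ""),
                "wikidata-url": f"https://www.wikidata.org/wiki/{qid}",
                "latitude": coords[0],
                "longitude": coords[1],
            }
    return None
```

Because the loop accepts whichever candidate ranks first, every mention of an ambiguous name lands on the same item regardless of what the transcript was talking about, which is exactly the "Georgia" and "Long Island" problem.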

Book Beat places mentioned dashboard

The Book Beat places mentioned map was recreated as an ArcGIS map and dashboard in August 2020. Book Beat transcript text was downloaded from CONTENTdm, then run through two spaCy models (en_core_web_md and xx_ent_wiki_sm) to reduce the number of false identifications. The ~1,000 unique places identified were reconciled against Wikidata using OpenRefine to get name, coordinate, and type data. Place and transcript mention data were then joined in ArcGIS Online to create a map and associated dashboard.
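This page does not record how the output of the two models was combined. One plausible approach, sketched here purely as an assumption, is to keep only the place names that both models flag. The sketch also assumes that xx_ent_wiki_sm uses a coarser label set in which places appear as LOC, and the function names are hypothetical.

```python
# Labels treated as places in each model (the second set is an assumption
# about the multilingual model's coarser label scheme).
EN_PLACE_LABELS = {"GPE", "LOC", "FAC"}
XX_PLACE_LABELS = {"LOC"}


def place_texts(doc, labels):
    """Return the set of entity strings in a spaCy Doc with the given labels."""
    return {ent.text.strip() for ent in doc.ents if ent.label_ in labels}


def agreed_places(en_doc, xx_doc):
    """Keep only place names that both models identified in the same text."""
    return place_texts(en_doc, EN_PLACE_LABELS) & place_texts(xx_doc, XX_PLACE_LABELS)


def agreed_places_in_text(text, en_nlp, xx_nlp):
    """Run one transcript through both pipelines and intersect the results."""
    return agreed_places(en_nlp(text), xx_nlp(text))


# Example use, assuming both models are installed:
# import spacy
# en_nlp = spacy.load("en_core_web_md")
# xx_nlp = spacy.load("xx_ent_wiki_sm")
# places = agreed_places_in_text(transcript_text, en_nlp, xx_nlp)
```

Requiring agreement trades recall for precision: a real place that only one model recognizes, or that the two models delimit differently, is dropped along with the false positives.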

ArcGIS has much richer data analysis and display features than Google Maps, but it does not seem to have a way to filter location points on more than one value of Wikidata's "instance of" types.

A similar workflow has been described in detail by Charlie Harper and R. Benjamin Gorham in their article "From Text to Map: Combining Named Entity Recognition and Geographic Information Systems" in issue 49 of the Code4Lib Journal.