
The George Washington Financial Papers Project: Building Content-Specific Taxonomies and System Specifications

By Senior Editor Jennifer Stertzer & Research Editor Erica Cavanaugh
April 28, 2016

One of the many interesting challenges the George Washington Financial Papers Project (GWFPP) team has faced is how best to make content accessible, or more accurately, intellectually accessible. This is hardly a new challenge, though, as editors have always worked to move beyond mere availability.

Regardless of approach (whether print or digital), documentary editions are created to make documents and content accessible: transcriptions make hard-to-decipher text readable; annotations provide contextualization and aid in understanding; and indexes allow users to search for both explicit text and indirect references, concepts, themes, and ideas. Indeed, several of the project’s goals relate directly to this intellectual accessibility: to provide accurate and understandable transcriptions and manuscript images, to supply context for these materials, and to create an opportunity for reader/user engagement.

What makes this challenge particularly interesting for the project has been the opportunity to create accessibility while developing a content management/publication platform. This allows us to experiment with how best to organize and structure the content within the system so that we can build a variety of access points. We will explore the different aspects of this process in the next few blog posts from the GWFPP team, beginning with our work with taxonomies.

Taxonomy is a word we generally associate with the sciences. Organizing information in an orderly, structured way, though, is hardly unique to them; as more and more projects put content online, the desire for common, shareable ontologies (a formally established common language within a field of work, or in this case, for the organization of digital content) has become the topic of many a conversation in the digital humanities. The question we tackled with the financial papers was whether this standardization was possible, and if so, whether it was desirable.


An image from a GWFPP presentation made in April 2016 at the MEDEA spring conference.

Consequently, we began our investigation with this question: are there established controlled vocabularies that accurately capture and categorize content within very specific collections, such as Washington’s financial papers? To date, we have not identified an established ontology that addresses the time period, geographical location, demographics, and etymology of Washington’s financial papers, so it was necessary to create our own.

We started by examining the content and considering five questions: 1) what information is in the ledgers; 2) how that information should be recorded in the platform; 3) how the pieces of information relate to one another; 4) how best to structure and define this information; and 5) how the answers to these questions influence and enhance browsing and searching capabilities.

The ledgers are full of interesting content: names of people, places, ships, and organizations; occupational names and titles; services being performed and paid for; types of currency; and multiple types of commodities. Drupal (the system our platform is built on) provides two solutions for capturing and organizing this information: taxonomies and content types.

Taxonomies are controlled vocabulary lists, hierarchically structured, that allow for the classification of content. Within the platform, we use taxonomies for occupation/title, services, place type (tavern, city, house, etc.), county, state, and country.

Content types are similar to templates designed specifically for content such as document type, person, place, etc. Within the platform, content types contain a pre-defined collection of fields; for example, a person content type contains fields to capture name(s), alternate spellings, birth/death dates, gender, and IDs. Content types can also call on taxonomy lists, so a person can be associated with related locations, occupations, and services.
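As a rough illustration of the relationship between content types and taxonomies, the sketch below models a person record as a fixed set of fields, one of which may only hold terms from a controlled vocabulary. It is written in Python rather than Drupal's own configuration, and the class name, field names, and vocabulary terms are assumptions made for the example, not the project's actual setup.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical controlled vocabulary; these terms are placeholders,
# not the project's actual occupation list.
OCCUPATIONS = {"Merchant", "Blacksmith"}


@dataclass
class Person:
    """A 'person' content type: a pre-defined collection of fields."""
    name: str
    alternate_spellings: List[str] = field(default_factory=list)
    birth_year: Optional[int] = None
    death_year: Optional[int] = None
    gender: Optional[str] = None
    # This field draws on the occupation taxonomy rather than free text.
    occupations: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject any occupation that is not in the controlled vocabulary.
        unknown = [o for o in self.occupations if o not in OCCUPATIONS]
        if unknown:
            raise ValueError(f"Not in occupation taxonomy: {unknown}")


# Robert Adam's identification as a merchant is taken from the ledgers.
adam = Person(name="Robert Adam", occupations=["Merchant"])
```

Because the occupation field accepts only vocabulary terms, a misspelled or ad hoc label fails at entry time instead of silently creating a second, near-duplicate category.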

Once it was determined what type of information could be found within the ledger, the GWFPP team focused on the terminology used by Washington himself, creating extensive lists of all of the occupational names and titles, services being performed, and commodities. The next step was to illustrate how these taxonomy lists related to the content (account pages, people, and places). This was accomplished through the use of fields, allowing for the terms within the controlled vocabulary lists to be associated with and thus further contextualize the content from the ledgers.


By using the appropriate fields, we were also able to increase accessibility and browsing possibilities. Not only can you see that Robert Adam was a merchant, but you can also click on “Merchant” to see every individual who has been identified as one.
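The browsing behavior described here amounts to an index from each taxonomy term back to the records tagged with it. A minimal sketch of that idea follows, in Python; the function name and the placeholder people are assumptions for illustration, and only Robert Adam's entry comes from the ledgers.

```python
from collections import defaultdict

# (person, occupation term) pairs. Only the Robert Adam row reflects
# the ledgers; the other rows are hypothetical placeholders.
records = [
    ("Robert Adam", "Merchant"),
    ("Example Person A", "Merchant"),
    ("Example Person B", "Blacksmith"),
]


def index_by_term(pairs):
    """Group people under each taxonomy term they are tagged with."""
    index = defaultdict(list)
    for person, term in pairs:
        index[term].append(person)
    # Sort each list so the browse page has a stable order.
    return {term: sorted(people) for term, people in index.items()}


people_by_occupation = index_by_term(records)
# Everyone tagged "Merchant", which is what a click on the term would list.
merchants = people_by_occupation["Merchant"]
```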


To further increase browsing and searching capabilities, the GWFPP team thought it important to create a hierarchy within the various taxonomies, because the lists by themselves are large and difficult to navigate.


After consulting already established taxonomies and indexes such as the Getty Research Institute’s Art & Architecture Thesaurus (AAT) and the cumulative index in The Papers of George Washington Digital Edition, we were able to create broad categories that our content-based terms could be nested under. The result is a user-friendly taxonomy list that can be easily browsed.
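Nesting content-based terms under broad categories can be pictured as a simple parent-child structure. The Python sketch below is one way to represent and browse such a hierarchy; the category names ("Trades", "Places") and the helper functions are hypothetical, chosen only to show the shape of the data, and do not reproduce the project's actual categories.

```python
# Hypothetical parent links: each term points to its broad category.
# None marks a top-level category.
PARENT = {
    "Trades": None,
    "Merchant": "Trades",
    "Blacksmith": "Trades",
    "Places": None,
    "Tavern": "Places",
}


def children_of(parent):
    """Return the terms directly under a category, alphabetically."""
    return sorted(t for t, p in PARENT.items() if p == parent)


def render(parent=None, depth=0):
    """Return the hierarchy as indented lines, suitable for a browse list."""
    lines = []
    for term in children_of(parent):
        lines.append("  " * depth + term)
        lines.extend(render(term, depth + 1))
    return lines


def descendants(term):
    """Return every term nested under a category, at any depth."""
    found = []
    for child in children_of(term):
        found.append(child)
        found.extend(descendants(child))
    return found


browse_list = render()
```

With this structure, a reader can start from a handful of broad categories and drill down, and a search on a broad category can be widened to include everything in `descendants(...)` of that category.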


Through careful consideration, we were able to determine what information needed to be captured, structured, and defined, and where established ontologies and content-driven taxonomies could be effectively used. The resulting vocabularies and complementary interfaces provide a variety of search and browse features for users so they can explore, search, and discover information in the edition. While we are constantly working to improve accessibility, our future work will also look at sharing these taxonomies so that other projects may use them.


All images provided by Jennifer Stertzer and Erica Cavanaugh. Unless otherwise noted, all images are April 2016 screenshots of the GWFPP website.