Despite its growing importance in the digital world, data has always existed. All archival material is data, even if it is written by hand. But what about data that describes archival material? This is called metadata, or the data behind the data. It’s incredibly important since it becomes the voice of that original data. Moreover, in the digital world, the data we tend to see most is metadata. For example, when we look at a painting or photograph, the physical artwork is the data, while the information describing it, such as the title, genre, year, and artist, is the metadata. If a person wants to learn about a specific item, how it is described is often more important than the work itself. Now what if you were to use the metadata to try to visualize a collection? What would this tell us about the original data itself? This is what I set out to understand.

For my Digital Humanities class at Carleton University, I was tasked with creating a digital project. I wanted to incorporate the work being done at the Canadian Museum of Nature (CMN) in their Library and Archives. Since 2019, I have been scanning and importing Arctic-related images into the CMN archive, so I exported the metadata from Portfolio, the museum's digital asset management software, to create some knowledge graphs.

The original objective was to gain some experience with knowledge graphs. I had never used them before, but after my supervisor, Shawn Graham, explained what they were, I decided to have a crack at them. This visual representation of the data would help answer two questions: What have I digitized? And who have I digitized? However, as my own dataset was relatively small, I decided to add more. I focused on the three largest contributors (catalogers) of images to Portfolio: myself, Christina Kum (current Digital Asset Manager) and Susan Goods (retired Digital Asset Manager).

For the first graph, I decided to isolate data on individual species (represented by their common English names) and link them based on their taxonomic relationships. What I created was a map of all the species that the three of us had catalogued in Portfolio. Given the time available and the amount of data, I limited my results to species from image creators (i.e., photographers) who have uploaded at least 10 images. What resulted was a detailed graph indicating which species have been digitized. The graph is colour-coded by kingdom, except for animals, which are colour-coded by class.

From a single circle, branches of multicoloured data points representing animal, plant, and mineral taxonomy groups spread out in a web.
A map of the data extracted from Portfolio representing different taxonomy groups. Image: Callum McDermott © Canadian Museum of Nature 
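For readers curious about the mechanics, the filtering and linking described above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not the code used for the project: the field names (`creator`, `kingdom`, `class`, `common_name`), the root node label and the sample records are all hypothetical and do not reflect Portfolio's actual export format.

```python
from collections import Counter

# Invented sample records; the field names are assumptions, not Portfolio's schema.
SAMPLE = (
    [{"creator": "Photographer A", "kingdom": "Animalia",
      "class": "Aves", "common_name": "Snowy Owl"}] * 6
    + [{"creator": "Photographer A", "kingdom": "Plantae",
        "class": "Magnoliopsida", "common_name": "Arctic Poppy"}] * 5
    + [{"creator": "Photographer B", "kingdom": "Animalia",
        "class": "Mammalia", "common_name": "Polar Bear"}] * 3
)


def frequent_creators(records, min_images=10):
    """Return the creators who appear on at least `min_images` records."""
    counts = Counter(record["creator"] for record in records)
    return {creator for creator, n in counts.items() if n >= min_images}


def taxonomy_edges(records, min_images=10, root="Collection"):
    """Build root -> kingdom -> class -> common name edges for the graph."""
    keep = frequent_creators(records, min_images)
    edges = set()  # a set, so repeated images of one species add no duplicates
    for record in records:
        if record["creator"] not in keep:
            continue  # skip creators below the image threshold
        edges.add((root, record["kingdom"]))
        edges.add((record["kingdom"], record["class"]))
        edges.add((record["class"], record["common_name"]))
    return sorted(edges)


if __name__ == "__main__":
    for source, target in taxonomy_edges(SAMPLE):
        print(f"{source} -> {target}")
```

In this sketch, Photographer B falls below the 10-image threshold, so the Polar Bear never reaches the graph. The resulting edge list could then be loaded into any graph visualization tool.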

The second graph focuses on the catalogers, the people who uploaded the data itself. This time Christina, Susan, and I are represented as three large nodes in different colours, each linked to the image creators we catalogued, who are in turn linked to the species (by common English name) that appear in their images. As before, an image creator needed to be attributed to at least 10 images to appear in this graph. This tells us which cataloger (Christina, Susan or Callum) digitized which creator, and what that individual has taken photos of. This helps us know who might have expert knowledge about a particular fonds or gallery.

Multiple branches of multicoloured data points spread out in a web from three names: Christina, Susan and Callum. Each name then branches out to names of different species.
A map of the data extracted from Portfolio representing the three main people who have ingested images into the database and what types of species are in their images. Image: Callum McDermott © Canadian Museum of Nature   
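The three-layer structure of this graph (cataloger, then image creator, then species) could be sketched in the same way. Again, this is a hedged illustration rather than the project's actual code: the field names `cataloger`, `creator` and `common_name`, the placeholder people and the sample records are all assumptions made for the example.

```python
from collections import Counter

# Invented sample records with placeholder names; field names are assumptions.
SAMPLE = (
    [{"cataloger": "Cataloger X", "creator": "Photographer A",
      "common_name": "Arctic Fox"}] * 10
    + [{"cataloger": "Cataloger Y", "creator": "Photographer B",
        "common_name": "Purple Saxifrage"}] * 12
    + [{"cataloger": "Cataloger Z", "creator": "Photographer C",
        "common_name": "Walrus"}] * 2
)


def cataloger_graph(records, min_images=10):
    """Return cataloger -> creator and creator -> species edges."""
    per_creator = Counter(record["creator"] for record in records)
    edges = set()
    for record in records:
        if per_creator[record["creator"]] < min_images:
            continue  # creators with too few images are left out
        edges.add((record["cataloger"], record["creator"]))
        edges.add((record["creator"], record["common_name"]))
    return edges


def creators_by_cataloger(edges, catalogers):
    """Answer the question: which cataloger digitized which creators?"""
    result = {name: set() for name in catalogers}
    for source, target in edges:
        if source in result:
            result[source].add(target)
    return result


if __name__ == "__main__":
    graph = cataloger_graph(SAMPLE)
    names = ["Cataloger X", "Cataloger Y", "Cataloger Z"]
    for name, creators in creators_by_cataloger(graph, names).items():
        print(name, sorted(creators))
```

Here the lookup by cataloger is the programmatic equivalent of following the branches out from one of the three large nodes.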

The third graph focuses on the image creators in Portfolio. In this final graph, each creator is uniquely coloured, with arms branching out to represent the (unlabeled) species that appear in the images they created. This was made to help the CMN see the volume of images contributed by its staff.

Multiple branches of multicoloured data points spread out in a web from a central name.
A map of the data extracted from Portfolio representing the different individuals whose images are in Portfolio. Image: Callum McDermott © Canadian Museum of Nature   

I went into this project wanting to answer questions about what has been digitized, but in the process, these three graphs taught me three additional things I did not anticipate:   

First, by using the graph of species' common names, I can see that most digitization efforts have been centred on the Arctic. This did not come as much of a surprise, as I was specifically hired to digitize Arctic material, and with the growing dangers of climate change, more attention is being placed on the region every year.

Second, the graphs tell us about the evolution of the collection. We can use them to explain who is in the database, and to help answer some deeper questions about the institution, such as who has worked on which areas and which collections are actively being put into Portfolio.

Finally, the graphs help us with questions about the creators themselves. How are images being prioritized for digitization? The most important images will include those that align with the Museum’s strategic priorities and/or are at the highest risk for loss of information. As many staff members have been retiring, we need to ensure we have as much material and information as possible from each individual before they leave. By identifying the gaps in our collection, we can encourage employees to upload their images on a regular basis.

All three graphs demonstrate just one of the many ways that metadata can be used to help highlight and explore the original data itself. By presenting metadata in this visual manner, we can more easily see patterns emerge. Metadata is important: not only is it the voice of the data, describing it and giving it agency, it can also be the face of the data, presenting it visually and showing new patterns and connections that would remain hidden in its original format.