iDigBio 2014: Geolocation, Digitization Modules and Crowdsourcing

Day three of the 2014 iDigBio Conference was held at Bishop Museum, and while many of us were eager for more information about digitization in Pacific collections, we were also exhausted from the flood of new knowledge. It was really nice to hold the conference in the open air at Atherton Hale at Bishop! During the Digitization Modules, Tasks and Workflows session presented by Gil Nelson, I learned some pretty interesting things, such as how to create digitization workflows and training modules.

Source: Panda Security, 2014

Geolocation

Another interesting topic that came up was geolocation. Library and information science students do not typically learn about data geolocation. However, as a library student who is very interested in archives and special collections, I found this information extremely useful. So, for those who aren’t sure what geolocation is, let me explain. Usually, when people hear the term geolocation, they think of finding something or someone on a map. This applies to almost all the apps on your smart device, since most of them track your every move or allow you to “check in” to local businesses. In the world of online taxonomic and anthropological collections, however, geolocation is used to pinpoint the specific place on a map where an artifact, remains, or a plant or animal specimen was found at the time of collection, or where it originates. This makes it an extremely useful tool for generating maps that reveal patterns in plant and animal specimens based on location and climate.

During this discussion, some issues were raised: not enough online collections have a geolocation feature built in, and many collections enter GIS coordinates but not the actual name of the location. The group agreed that adding the location name is important for accuracy, since coordinates may be entered incorrectly and the name of the originating location may change over time. Therefore, for accuracy and legacy tracking, not to mention ease of understanding (so you don’t have to look up the coordinates), recording both the coordinates and the location name is strongly encouraged. To learn more about this topic, read Deborah Paul’s presentation, Georeferencing the past, present, and future.
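To make the “coordinates plus place name” recommendation concrete, here is a minimal Python sketch of a check a collection might run before accepting a record. It assumes records use the Darwin Core term names locality, decimalLatitude, decimalLongitude, and geodeticDatum; the GeoreferencedRecord class, the validate function, and the catalog numbers are hypothetical illustrations, not part of any tool discussed at the conference.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class GeoreferencedRecord:
    """Hypothetical specimen location record using Darwin Core term names."""
    catalogNumber: str
    locality: str            # human-readable place name, kept alongside the coordinates
    decimalLatitude: float   # -90 to 90, positive = north of the Equator
    decimalLongitude: float  # -180 to 180, positive = east of Greenwich
    geodeticDatum: str = "WGS84"

def validate(record: GeoreferencedRecord) -> list[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    if not record.locality.strip():
        problems.append("missing locality name")
    if not -90 <= record.decimalLatitude <= 90:
        problems.append("decimalLatitude out of range [-90, 90]")
    if not -180 <= record.decimalLongitude <= 180:
        problems.append("decimalLongitude out of range [-180, 180]")
    return problems

ok = GeoreferencedRecord("EX-0001", "Manoa Valley, Oahu, Hawaii", 21.33, -157.80)
swapped = GeoreferencedRecord("EX-0002", "", -157.80, 21.33)  # no name, lat/long swapped
print(validate(ok))       # → []
print(validate(swapped))  # → ['missing locality name', 'decimalLatitude out of range [-90, 90]']
```

A range check only catches swapped coordinates when the value in the latitude slot falls outside ±90, which is one more reason the location name is worth keeping as a human-readable cross-check.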

Digitization Modules

Task Cluster, by Gil Nelson, 2014

If you are working in a collection with volunteers who enter data and scan, process, and arrange collection items, it is of the utmost importance to have a written protocol. If you do not write down the step-by-step process for each task, volunteers and technicians will not know what to do. They will most likely work around the process as best they can, sometimes inventing a new process or unknowingly creating issues that will need to be corrected later down the road. Therefore, before bringing students and volunteers into your collection, make sure you have clearly defined workflows that tell them what to do on a daily basis, as well as what to do when they hit a wall or encounter a problem. Creating this workflow means accounting for each step in the processing, scanning, and digitizing process. You should also consider adding special icons to mark tasks that will be repeated over and over again.
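A written protocol can be as simple as an ordered list of steps, each with instructions, a note on what to do when stuck, and a marker for repeated tasks (standing in for the icons). The Python sketch below models that idea; the Step and Workflow classes, their field names, and the example steps are hypothetical, assumed for illustration rather than taken from Gil Nelson’s modules.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Step:
    name: str
    instructions: str
    if_stuck: str            # what to do when a volunteer hits a wall
    repeated: bool = False   # stands in for a "repeated task" icon

@dataclass
class Workflow:
    title: str
    steps: List[Step] = field(default_factory=list)

    def add(self, step: Step) -> "Workflow":
        self.steps.append(step)
        return self  # allows chaining: wf.add(...).add(...)

    def missing_guidance(self) -> List[str]:
        """Names of steps with no instructions or no 'if stuck' note."""
        return [s.name for s in self.steps
                if not s.instructions.strip() or not s.if_stuck.strip()]

    def render(self) -> str:
        """Produce a numbered, printable protocol sheet."""
        lines = [self.title]
        for i, s in enumerate(self.steps, start=1):
            marker = " [repeat]" if s.repeated else ""
            lines.append(f"{i}. {s.name}{marker}: {s.instructions}")
            lines.append(f"   If stuck: {s.if_stuck}")
        return "\n".join(lines)

wf = Workflow("Hypothetical specimen digitization protocol")
wf.add(Step("Attach barcode", "Attach a pre-printed barcode label to the specimen.",
            "Set the specimen aside and notify the collection manager.", repeated=True))
wf.add(Step("Image specimen", "Photograph the specimen and all of its labels.", "",
            repeated=True))
print(wf.render())
print(wf.missing_guidance())  # → ['Image specimen']
```

Running a check like missing_guidance() before volunteers arrive flags any step that would leave them guessing, which is exactly the situation a written protocol is meant to prevent.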

Darwin Core

While librarians typically follow RDA or Dublin Core, taxonomic collections follow Darwin Core. When creating a database, it is therefore important to train your volunteers and students in Darwin Core, explaining the importance and purpose of each field being used, and to add a unique identifier and barcode to each specimen being recorded. Additionally, many collection curators now generate a QR code that is physically attached to the specimen and links back to all the data on that particular item. Finally, although collecting this data is a major part of the process, you should also plan to export it into an application that makes it easy to manipulate and share, such as Excel or Access. Darwin Core classes and terms typically used for taxonomic collections are:

  • Occurrence: Category of information pertaining to evidence of an occurrence in nature, in a collection, or in a dataset (specimen, observation, etc.).
  • Material Sample: Category of information pertaining to the physical results of a sampling (or subsampling) event. In biological collections, the material sample is typically collected, and either preserved or destructively processed.
  • Event: Category of information pertaining to an event (an action that occurs at a place and during a period of time).
  • Location: A spatial region or named place. For Darwin Core, a set of terms describing a place, whether named or not.
  • Geological Context: The category of information pertaining to a location within a geological context, such as stratigraphy.
  • Decimal Latitude: The geographic latitude (in decimal degrees, using the spatial reference system given in geodeticDatum) of the geographic center of a Location. Positive values are north of the Equator, negative values are south of it. Legal values lie between -90 and 90, inclusive.
  • Decimal Longitude: The geographic longitude (in decimal degrees, using the spatial reference system given in geodeticDatum) of the geographic center of a Location. Positive values are east of the Greenwich Meridian, negative values are west of it. Legal values lie between -180 and 180, inclusive.
  • Identification: The category of information pertaining to taxonomic determinations (the assignment of a scientific name).
  • Taxon: The category of information pertaining to taxonomic names, taxon name usages, or taxon concepts.
  • Resource relationship: The category of information pertaining to relationships between resources (instances of data records, such as Occurrences, Taxa, Locations, Events).
  • Measurement or Fact: The category of information pertaining to measurements, facts, characteristics, or assertions about a resource (instance of data record, such as Occurrence, Taxon, Location, Event).
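Because the column headers can simply be Darwin Core term names, a spreadsheet-friendly export can be quite small. The Python sketch below writes records to CSV, which Excel and Access can both import, and mints a unique occurrenceID for each specimen. The choice of terms, the urn:uuid identifier format, the function names, and the sample record are assumptions for illustration; each collection decides which terms it uses and how it forms identifiers.

```python
import csv
import io
import uuid

# Column headers are Darwin Core term names (an illustrative subset).
DWC_COLUMNS = [
    "occurrenceID", "catalogNumber", "basisOfRecord", "scientificName",
    "eventDate", "locality", "decimalLatitude", "decimalLongitude", "geodeticDatum",
]

def new_occurrence_id() -> str:
    """Mint a globally unique identifier for a specimen record (URN-formatted UUID)."""
    return f"urn:uuid:{uuid.uuid4()}"

def export_csv(records, out) -> None:
    """Write records (dicts keyed by Darwin Core terms) as CSV.

    Missing terms become empty cells; a misspelled term raises ValueError,
    because DictWriter rejects keys that are not in fieldnames by default.
    """
    writer = csv.DictWriter(out, fieldnames=DWC_COLUMNS)
    writer.writeheader()
    for rec in records:
        writer.writerow(rec)

# Hypothetical example record.
record = {
    "occurrenceID": new_occurrence_id(),
    "catalogNumber": "EX-0001",
    "basisOfRecord": "PreservedSpecimen",
    "scientificName": "Hypothetical species",
    "eventDate": "1921-03-12",
    "locality": "Manoa Valley, Oahu, Hawaii",
    "decimalLatitude": 21.33,
    "decimalLongitude": -157.80,
    "geodeticDatum": "WGS84",
}
buf = io.StringIO()
export_csv([record], buf)
print(buf.getvalue())
```

To write a real file instead of an in-memory buffer, pass a file opened with open("export.csv", "w", newline="", encoding="utf-8"); the newline="" argument is what the csv module expects for correct line endings.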

After some discussion, attendees felt that the following additional identifier fields would be useful if added to Darwin Core:

  • Geolocation: A tool that allows users to search for and find the location on a US GIS map or Google Maps.
  • Images: Allowing users to upload images to the dataset rather than adding a link or just describing them.
  • Barcode/QR: A section that allows users to upload an image of the barcode and enter the assigned barcode number, as well as an image of the associated QR code.

Digitization

This was not the first time I had heard Susan Shaner’s talk, Digital Preservation Begins at Creation, but I still find it not only interesting but useful. She discusses born-digital items, the process of digitizing print collections, the maintenance of those collections, and key concepts in digital preservation and curation. If you are interested in creating a mass digitization project for your organization, please check out her talk!

Crowdsourcing

Workflow example by Nicole Fisher, 2014.

Crowdsourcing has become increasingly popular because of the limited number of full-time technicians. This has led to a growing reliance on volunteers and students to enter data for collections. Crowdsourcing expands that pool even further, making it possible to turn a 25-year project into a 10-year project, and it can help institutions make their collections more accessible and searchable online. Through crowdsourcing, users can gain a better understanding of the museum collection while making its data available for exciting new research. Finally, it is important to note that digitizing collections is not enough: you must make them known and available for searching and use.

Nicole Fisher’s workflow example shows what a simple workflow for a digitization project may look like. Crowdsourcing comes into play at the last step, Transcription and Data Enrichment. At this point, organizations may make their collection available online so users can add meaningful information to the dataset, such as transcriptions of the item label, which may include fields like date, location, classification, description, geographic information, and other pertinent label information. As an avid volunteer, I was intrigued by the idea of doing citizen science volunteer work by transcribing for online collections. For more information, please review Nicole Fisher’s talk, Crowd-sourcing, Public Participation, and Data Enrichment Using crowd-sourcing tools.
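Many transcription projects have more than one volunteer transcribe the same label and then reconcile their answers. The Python sketch below shows one simple way to do that, a per-field majority vote with an agreement score. It is an assumption for illustration, not the method used by Notes from Nature or any other specific project, and the field names, threshold, and sample transcriptions are hypothetical.

```python
from collections import Counter

def normalize(value: str) -> str:
    """Trim and collapse whitespace so 'Kauai ' and 'Kauai' count as the same answer."""
    return " ".join(value.split())

def reconcile(transcriptions, fields):
    """Per field, return (most common non-empty value, share of volunteers who gave it).

    Ties go to the value seen first. Fields nobody filled in return (None, 0.0).
    """
    result = {}
    for f in fields:
        values = [normalize(t.get(f, "")) for t in transcriptions]
        values = [v for v in values if v]
        if not values:
            result[f] = (None, 0.0)
            continue
        value, count = Counter(values).most_common(1)[0]
        result[f] = (value, count / len(transcriptions))
    return result

def needs_review(reconciled, threshold=0.5):
    """Fields whose agreement is at or below the threshold, for a curator to check."""
    return [f for f, (_, agreement) in reconciled.items() if agreement <= threshold]

# Three hypothetical volunteer transcriptions of the same specimen label.
label = [
    {"date": "12 Mar 1921", "location": "Kauai", "collector": ""},
    {"date": "12 Mar 1921", "location": "Kauai ", "collector": ""},
    {"date": "12 May 1921", "location": "", "collector": ""},
]
result = reconcile(label, ["date", "location", "collector"])
print(result)
print(needs_review(result))  # → ['collector']
```

Here the date disagreement (“Mar” vs. “May”) is resolved 2-to-1, but a curator might still want to check any field without unanimous agreement; where to set the threshold is a judgment call for each project.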

Notes from Nature, 2014

Citizen Science Opportunities

Below is a list of sites I personally tested that are great for anyone interested in developing skills while helping to build a collection. In my opinion, Notes from Nature ranks number one for user interface and ease of use, particularly for those who are new to transcription. It does not require a login, offers a variety of collections to transcribe, and does not require additional downloads. The Smithsonian Digital Volunteer ranks second, only because it requires users to go through some basic online training and sign up for a user account. However, this is good for the collection side, as it reduces the number of avoidable mistakes and allows the curators to contact contributors in case of major issues.

Thank you!

Photo Credit: iDigBio

I’m so grateful to have been invited to attend the iDigBio Digitization in the Pacific 2014 Conference! I was able to meet so many inspiring people and learned many useful things. I hope that I can use this knowledge to help others in my future career as a librarian.

Other talks presented on this day are as follows: