Skip to main content

Exploring Machine Learning with the Project Aida Team

image of digitized manuscript page
Image courtesy of Yi Liu, Chulwoo Pack, Elizabeth Lorang, and Leen-Kiat Soh. Reference slide 10 in their midterm presentation linked below.

Initiated Summer 2019, Published Spring 2020

In 2019, the LC Labs team embarked on a series of experiments, events, and engagements with external partners and Library staff to learn more about how machine learning and artificial intelligence processes might connect with Library of Congress collections, understand what information could be created, and identify directions or indicators of how machine learning could be applied to collections broadly.

In Summer and Fall 2019, the LC Labs team partnered with the University of Lincoln-Nebraska (UNL) Project AIDA team for a 16 week collaboration that included on-site research, biweekly meeting calls, and presentations across 3 phases of iterative research. The collaboration with UNL was designed to provide evidence and information to the Library of Congress, as well as its professional community; to assist in decision-making around when and how to apply ML techniques to library and archival collections and processes.

The overarching recommendations are to:

  1. Develop a statement of values or principles that will guide how the Library of Congress pursues the use, application, and development of machine learning for cultural heritage.
  2. Create and scope a machine learning roadmap for the Library that looks both internally to the Library of Congress and its needs and goals and externally to the larger cultural heritage and other research communities.
  3. Focus efforts on developing ground truth sets and benchmarking data and making these easily available.

To support ongoing explorations and investigations, the Project Aida team further recommends that the Library:

  1. Join the Library of Congress’s emergent efforts in machine learning with its existing expertise and leadership in crowdsourcing. Combine these areas as “informed crowdsourcing” as appropriate.
  2. Sponsor challenges for teams to create additional metadata for digital collections in the Library of Congress. As part of these challenges, require teams to engage across a range of social and technical questions and problem areas.
  3. Continue to create and support opportunities for researchers to partner in substantive ways with the Library of Congress on machine learning explorations.

Read more about the research from these blog posts, presentations, the final recommendations, project code and documentation:

 Back to top