Experimenting with artificial intelligence and machine learning at the Library of Congress
Since 2018, the Library of Congress has been researching and experimenting with artificial intelligence and machine learning. This work focuses on ethical uses of these technologies and on addressing the challenges of adopting them in libraries and cultural memory organizations.
Below are resources from expert consultations, explorations with Library staff and users, and experiments demonstrating how automated technology can enhance collections, operations, and services.
For years, the Library of Congress has taken part in international, federal, and sector-wide community conversations about AI. This includes participating in AI conferences as part of the International Federation of Library Associations and Institutions (IFLA), and serving as a member of the Secretariat of the international AI4LAM community alongside other national libraries and major research institutions.
Within the federal government, Library staff have joined several communities of practice, including the Equitable Data Community of Practice, the Congressional AI Advisory Group, and the AI Community of Practice hosted by the General Services Administration (GSA). In the GSA community, a Library staff member chairs a subgroup dedicated to developing requirements that vendors must follow when using natural language processing (NLP).
Active Artificial Intelligence Use Cases
Current use cases for artificial intelligence, at varying stages of development, include:
- Creating machine-readable text from digitized documents using Optical Character Recognition (OCR) to support search and discovery of collections and content online.
- Creating standardized catalog records from eBooks and other digital materials: testing different machine learning (ML) models to generate data for bibliographic records, measuring the quality of the outcomes, and understanding how ML fits into the cataloging process.
- Extracting key data from historic handwritten and typed Copyright record forms: an experiment training multiple ML models on historical legislative data to generate terms for geographic and organizational data fields from established vocabularies, enhancing the discovery and management of legislative data in Congress.gov.
- Parsing legislative data: an experiment testing ML models in creating geographic place and subject terms for legislative data, with an emphasis on measuring the quality of outcomes and analyzing the role of ML in the larger legislative data workflow that helps analysts deliver efficient and accurate services.
- The National Library Service for the Blind and Print Disabled is experimenting with available ML models to condense lengthy book descriptions into succinct, engaging summaries that help patrons discover titles.
Experiments to Date
- Speech to Text Viewer: a proof-of-concept tool for testing off-the-shelf transcription software
- Exploring ML with the Project Aida team: six explorations of how machine learning could be applied to the Library's digital collections
- Experimental Access: exploring experimental ways of providing access to the Library's digital collections
- Humans in the Loop: an experimental workflow that pairs human decision-making with automated processes
- Newspaper Navigator by 2020 Innovator in Residence Ben Lee
- Citizen DJ by 2020 Innovator in Residence Brian Foo
- America’s Public Bible: Machine-Learning Detection of Biblical Quotations Across LOC Collections via Cloud Computing by Computing Cultural Heritage in the Cloud (CCHC) Research Expert Lincoln Mullen
- Access & Discovery of Documentary Images by CCHC Research Expert Lauren Tilton
- Situating Ourselves in Cultural Heritage: Using Neural Nets to Expand the Reach of Metadata and See Cultural Data on Our Own Terms by CCHC Research Expert Andromeda Yelton
Reports and Presentations
- The Machine Learning + Libraries Summit: Event Summary provides a detailed account of the summit LC Labs hosted in September 2019.
- Digital Libraries, Intelligent Data Analytics, and Augmented Description: Final Report details exploratory projects conducted by the Project Aida team at the University of Nebraska-Lincoln in collaboration with LC Labs. It also addresses social and technical challenges that provide critical context for developing machine learning in the cultural heritage sector.
- Machine Learning + Libraries: A Report on the State of the Field was commissioned by LC Labs from Ryan Cordell, Associate Professor of English at Northeastern University, to survey the “state of the field in machine learning and libraries.”
- "Feasible, Adaptable and Shared: A Call for a Community Framework for Implementing ML and AI" (2022) by Abigail Potter, Meghan Ferriter, Eileen J. Manchester, and Jaime Mears. Proceedings of the 18th International Conference on Digital Preservation, 2022, p. 145.
Artificial Intelligence Governance
Our current policies guide the use of technology in support of the agency’s mission and encourage the adoption of tools that improve our ability to meet the information needs of Congress and the American people. As AI experiments move into further planning and implementation, we will update our policies and governance frameworks to reflect the particular challenges and opportunities these tools present. This work includes reviewing and updating existing Library policies and evaluating the need for new ones.
Over the past several years, we have been developing a framework for AI decision-making. It aligns closely with the AI Risk Management Framework from the National Institute of Standards and Technology (NIST) and with the recommendations of Office of Management and Budget Memorandum M-21-06, "Guidance for Regulation of Artificial Intelligence Applications": working toward voluntary, sector-based standards, making data available, and communicating with the public.