Data for Exploration

Explore the many ways the Library of Congress provides machine-readable access to its digital collections.

Get Started

Accessing images for image analysis on Loc.gov
Using the Loc.gov JSON to grab WWI Sheet Music
Cats or dogs? An example of exploring the Chronicling America API
Search the Library of Congress from your browser in one step
Change image URLs to rotate and resize images on loc.gov
Extracting location data from the loc.gov API for geovisualization with the Historic American Engineering Record
Exploring the Meme Generator Metadata demonstrates some basic ways to work with the data set from Meme Generator.
Exploring the GIPHY.com Metadata demonstrates an intermediate approach to exploring the GIPHY.com data set produced by the Library of Congress.
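The rotate-and-resize tutorial listed above works by editing segments of an image URL. Many loc.gov images are served through an IIIF image service, where the path ends in region/size/rotation/quality.format. The sketch below assumes exactly that IIIF Image API layout. The service base URL used in the usage line is a hypothetical placeholder, not a real loc.gov identifier, and the helper name iiif_image_url is illustrative only.

```python
def iiif_image_url(service_base: str, region: str = "full", size: str = "max",
                   rotation: int = 0, quality: str = "default", fmt: str = "jpg") -> str:
    """Build an image URL assuming the IIIF Image API pattern:
    {base}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # IIIF rotation is clockwise degrees; normalize to the range [0, 360).
    rotation = rotation % 360
    # Drop a trailing slash so the path segments join cleanly.
    base = service_base.rstrip("/")
    return f"{base}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Hypothetical service base: a half-size image rotated 90 degrees clockwise.
print(iiif_image_url("https://example.org/iiif/item1", size="pct:50", rotation=90))
```

Changing only the size segment (for example pct:25) or the rotation segment is usually enough to get a smaller or reoriented copy without downloading the full-resolution file.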

Loc.gov JSON API - provides data about Library of Congress digital collections. The API is a work in progress and subject to change.
Congress.gov API - The Congress.gov Application Programming Interface (API) provides a method for Congress and the public to view, retrieve, and re-use machine-readable data from collections available on Congress.gov such as bills, amendments and committee reports.
Chronicling America APIs - provide access to more than 12 million digitized historic newspaper pages (and growing) from almost every U.S. state and territory.
American Archive of Public Broadcasting APIs - tens of thousands of historic public radio and television programs are available for streaming, and more content is added periodically. In addition, the website provides data records for approximately 2.5 million items inventoried by public broadcasting stations for this project. Scholars may also request access to JSON and text transcripts for items in the AAPB's Online Reading Room through the AAPB Transcripts Research Access service. Credentials are required to access the transcripts API and can be obtained by contacting aapb_notifications@wgbh.org. Contact AAPB for more information about accessing the collection for digital humanities and research projects.
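As a starting point for the Loc.gov JSON API, here is a minimal sketch. It relies on the fact that adding fo=json to a loc.gov search URL requests JSON instead of HTML. It further assumes that c sets results per page, that sp sets the page number, and that matching items appear in a "results" list. Because the API is described as a work in progress, these parameter and field names may change. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

LOC_SEARCH = "https://www.loc.gov/search/"


def loc_search_url(query: str, per_page: int = 25, page: int = 1) -> str:
    # fo=json asks loc.gov for JSON; c and sp are assumed to control
    # results per page and the page number.
    params = {"q": query, "fo": "json", "c": per_page, "sp": page}
    return f"{LOC_SEARCH}?{urlencode(params)}"


def titles_from_response(data: dict) -> list[str]:
    # Each entry in "results" describes one item; tolerate a missing title.
    return [item.get("title", "") for item in data.get("results", [])]


def fetch_titles(query: str) -> list[str]:
    # Network call: performs a live request against loc.gov.
    with urlopen(loc_search_url(query)) as resp:
        return titles_from_response(json.load(resp))
```

Keeping URL construction and response parsing separate from the network call makes it easy to test the parsing on a saved response before sending live requests.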
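For the Congress.gov API, requests are made to versioned endpoints and require an API key. The sketch below builds a bill-detail URL under several assumptions. It assumes the /v3/bill/{congress}/{billType}/{billNumber} path layout and lowercase bill-type abbreviations such as hr or s. It also assumes the key is passed as an api_key query parameter. The key in the example is a placeholder, and bill_url is a hypothetical helper.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.congress.gov/v3"


def bill_url(congress: int, bill_type: str, number: int, api_key: str) -> str:
    # Assumed path layout: /bill/{congress}/{billType}/{billNumber}
    path = f"{API_ROOT}/bill/{congress}/{bill_type.lower()}/{number}"
    # Request JSON and pass the key as a query parameter (assumption).
    return f"{path}?{urlencode({'format': 'json', 'api_key': api_key})}"


print(bill_url(117, "HR", 3076, "YOUR_API_KEY"))
```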

Bulk data for Congress.gov bills, bill status, and bill summaries
MARC records - bibliographic information for most of the Library’s collections. Twenty-five million records are available for exploration in UTF-8, MARC8, and XML formats.
Sample MARC data set and ReadMe file
Chronicling America Bulk OCR Data – text only
Chronicling America Bulk Data – image, metadata, and OCR text batches
Selected Datasets collection on loc.gov – datasets acquired by the Library for the permanent collection
Web Archive Datasets – derivative datasets from the Library's web archives
Computing Cultural Heritage in the Cloud Data Sandbox – The CCHC grant team devised data.labs.loc.gov as an experimental sandbox for sharing data packages compiled as part of the initiative.
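The XML option for the MARC records can be read with Python's standard library. The sketch below assumes the files use the MARC 21 slim (MARCXML) namespace, with records wrapped in a collection element. It pulls the title proper from field 245, subfield a. The sample record is made up for illustration, and titles_from_marcxml is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Made-up record for illustration only.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Example title</subfield>
    </datafield>
  </record>
</collection>"""


def titles_from_marcxml(xml_text: str) -> list[str]:
    root = ET.fromstring(xml_text)
    titles = []
    # iter() also matches the root itself, so a bare <record> works too.
    for record in root.iter("{http://www.loc.gov/MARC21/slim}record"):
        # Field 245 subfield a holds the title proper.
        sub = record.find("marc:datafield[@tag='245']/marc:subfield[@code='a']", MARC_NS)
        if sub is not None and sub.text:
            titles.append(sub.text.strip())
    return titles


print(titles_from_marcxml(SAMPLE))
```

Real records often end the 245 $a value with ISBD punctuation such as " /". This sketch keeps that punctuation rather than guessing which characters are safe to trim.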