By the People: LC Crowdsourcing

By the People invites students and lifelong learners to contribute to the Library of Congress as virtual volunteers -- transcribing, reviewing, and tagging historical texts to improve search and accessibility of Library of Congress digital collections.

Published October 2018

Graduated to a Permanent Program 2020

About

By the People is a web-based crowdsourcing application where anyone with an internet connection can transcribe documents from Library of Congress digitized collections.

This application was initially launched as a crowdsourcing pilot by the LC Labs team (as part of National and International Outreach), the Manuscript Division, and the Office of the Chief Information Officer. By the People is now managed by the Digital Content Management Section at the Library.

By the People invites members of the public, non-specialists and specialists alike, to help make data more usable and discoverable. In this project, volunteers transcribe, review, and tag digitized images of manuscripts and typed materials from the Library's collections. Everyone is welcome to take part, with or without an account! Creating an account gives volunteers access to additional features, such as tagging and reviewing other people's transcriptions. All transcriptions are made and reviewed by volunteers before they are returned to loc.gov, the Library's website. These transcriptions will improve search, readability, and access to handwritten and typed documents for those who are not fully sighted or cannot read the handwriting of the original documents. Check out the FAQs in our Help Center for more detailed information.

By the People runs on Concordia, open source software developed by the Library of Congress to power crowdsourced transcription projects. The code is visible and free to reuse and improve: visit our GitHub repository for more information. The platform was built using user-centered design principles, which emphasize building trust and approachability. This project is a partnership between the Library and a growing community of volunteers who help us to iteratively improve the platform. Get in touch to give feedback about the experience of transcribing, or about how to improve the code base and the project itself, by emailing crowd@loc.gov or mentioning @Crowd_LOC on Twitter.

By the People Datasets

All contributions to the By the People application are released into the public domain as they are created. Interested in exploring the transcription data created as the By the People campaigns are completed? Transcription data will be made available on loc.gov as datasets; for example, the Branch Rickey scouting reports data is available in bulk as zipped .csv files. Anyone is free to use and reuse these datasets in any way they want. Portions of the Branch Rickey scouting reports dataset were previously shared as an experiment on labs.loc.gov, but have been superseded by the loc.gov datasets described above. Datasets will consist of raw transcription data and README files that include details of data creation and field names. Read about the 2019 data release of the Branch Rickey scouting reports transcriptions in this Library of Congress blog post. The data available in that release are drawn from part of legendary baseball scout Rickey’s Baseball Files, available in his archival papers. You can check back on loc.gov for additional By the People datasets as they become available.
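For readers who want to explore a downloaded dataset, here is a minimal Python sketch of reading every .csv file inside a zip archive using only the standard library. The file names, column names, and sample rows below are made up for illustration and are not the real dataset contents; the README shipped with each dataset is the place to find the real field names. The helper names `read_zipped_csvs` and `build_sample_archive` are also hypothetical.

```python
import csv
import io
import zipfile


def read_zipped_csvs(zip_source):
    """Return {member_name: list of row dicts} for every .csv file in a zip archive.

    zip_source may be a filesystem path or a binary file-like object.
    """
    tables = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip the README and any other non-CSV members
            with archive.open(name) as raw:
                # utf-8-sig tolerates an optional byte-order mark at the start
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                tables[name] = list(csv.DictReader(text))
    return tables


def build_sample_archive():
    """Build a tiny in-memory archive with made-up contents for demonstration."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("README.txt", "Hypothetical README describing the fields.\n")
        archive.writestr(
            "sample.csv",
            "item_id,transcription\n1,First sample page\n2,Second sample page\n",
        )
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    # Assumption: a real download would be passed as a path instead,
    # e.g. read_zipped_csvs("path/to/downloaded-dataset.zip")
    for name, rows in read_zipped_csvs(build_sample_archive()).items():
        print(name, len(rows), "rows; fields:", list(rows[0].keys()))
```

To try this on a real download, pass the path of the zip file saved from loc.gov to `read_zipped_csvs` in place of the in-memory sample.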

Special Thanks

The By the People program has been generously supported by the National Digital Library Trust Fund. The design and development of By the People were informed by Beyond Words, a 2017 experiment by LC Labs, OCIO, and the Serial and Government Publications Division. This application is the result of collaboration between numerous divisions and teams at the Library of Congress, as well as members of the public who contribute to By the People.