How to transcribe
We transcribe to improve search. Our goal is to make documents word-searchable on the Library of Congress website, which means typing transcriptions that can be read by that computer system as well as by humans. Most handwriting and some typed text cannot be accurately converted into machine-readable text by current technologies -- that’s why we need your help!
We ask you to transcribe a document roughly as it appears on the page. Preserve line breaks, except where a word is broken over two lines. Our main goal is to capture all of the words on these pages, and broken words are not helpful for search. Using line breaks to roughly capture the layout of the page helps reviewers (other volunteers) check transcriptions. Transcriptions will be viewable beside the original images on the website, so anyone who is interested in the physical layout of the original document will be able to see it.
All in order:
Transcribe text in the order it appears.
Preserve original spelling unless the author seems to have made a minor error, such as writing “teh” instead of “the”. If a misspelling will impact the searchability of the document, use a tag to add the correct spelling. Example:
- An author wrote “Willa Kather” instead of “Willa Cather”. Transcribe Willa Kather, and tag “Willa Cather”.
When text has been inserted or added later, but should be read as part of a sentence, bring it down into the line and type it in the order you would read it aloud.
Line-breaks, words broken across two lines or two pages:
Do not reproduce words broken between two lines. Write the full word on the first line, for example library rather than li-brary, kitten rather than kit-ten. Otherwise, always preserve line breaks by pressing "enter" or "return" at the end of a line, to roughly mirror the layout of the original document. This will help reviewers in By the People to easily compare your transcription with the image. If a word is broken between two pages, transcribe the full word on both pages, unless you are transcribing a two-page spread where you can see both pages side by side. In that case, transcribe it on the first page only.
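For anyone checking a draft transcription with a script, the broken-word rule above can be sketched in Python. This is only an illustration, not part of the By the People tools: the function name join_broken_words is hypothetical, and it assumes that a line ending in a hyphen followed by a line starting with a lowercase letter is a broken word. A genuine hyphenated compound split at the hyphen would be joined incorrectly, so the result still needs a human check.

```python
def join_broken_words(lines):
    """Move the second half of a hyphen-broken word up to the first line.

    Hypothetical helper: assumes a trailing "-" followed by a lowercase
    start on the next line marks a broken word. All other line breaks
    are left exactly as they were.
    """
    lines = list(lines)  # work on a copy so the caller's list is untouched
    for i in range(len(lines) - 1):
        current = lines[i].rstrip()
        following = lines[i + 1].lstrip()
        if current.endswith("-") and following[:1].islower():
            # Split off the first fragment of the next line and attach it
            fragment, _, rest = following.partition(" ")
            lines[i] = current[:-1] + fragment
            lines[i + 1] = rest
    return lines


page = ["The kit-", "ten sat in the li-", "brary all day."]
print("\n".join(join_broken_words(page)))
```

The line count stays the same, so the transcription still roughly mirrors the layout of the page.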
The Library website cannot search for bold, italic, underlined or superscript text, so even when you see these features please transcribe the words without any styling.
Illegible text is anything you can’t read because a page is damaged, text is heavily crossed out, or you can’t tell what the author has written. If there is a word or a string of words you cannot read, use a pair of square brackets around an empty space [ ]. Example:
- "I have [ ] loved coffee ice cream"
If you can read any letters or parts of words, transcribe what you can and use question marks for the remaining letters or words. Examples:
- "I have [a?????] loved coffee ice cream"
- "I have [a?] loved coffee ice cream"
If you cannot read a word or phrase, that’s OK. Another volunteer may be able to identify the missing letters and update your transcription. If there is a lot of text you cannot read, consider looking for another page that you can decipher better.
If you can read crossed out or otherwise deleted text, transcribe the deleted words within a pair of square brackets. Example:
- “I have always loved [vanilla] coffee ice cream.”
Marginalia is text written in the space around the main block of text. It is usually a comment on the main body text but can also be unrelated. It is different from an insertion, because it cannot be directly inserted into the main text and still make sense when read aloud. Enclose marginalia in square brackets with an asterisk just inside each bracket, [* *], and place it within the transcription where it makes the most sense (or at the end of the transcription if it appears unrelated). Example:
- I have always loved coffee ice cream. Last summer I made my own using a recipe from the 1970s. It was the creamiest coffee ice cream I ever ate. No one else in my family likes that flavor. Oh well, more for me! [*Brazil was the largest coffee producing country in the world in 2017*]
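The bracket conventions above (illegible text, partly legible text, deleted text, and marginalia) can be told apart mechanically. The Python sketch below is an illustration only and is not how the Library website processes transcriptions; the function name classify_brackets and the four labels are hypothetical. It assumes brackets are never nested and that any bracketed text without question marks or asterisks is deleted text.

```python
import re

# Matches one bracketed span with no nested brackets inside it
BRACKET = re.compile(r"\[([^\[\]]*)\]")


def classify_brackets(text):
    """Return (label, content) pairs for each bracketed span in text.

    Hypothetical labels: "illegible" for [ ], "marginalia" for [*...*],
    "partial" for spans containing "?", and "deleted" for everything else.
    """
    found = []
    for match in BRACKET.finditer(text):
        content = match.group(1)
        if content.strip() == "":
            found.append(("illegible", ""))
        elif len(content) >= 2 and content.startswith("*") and content.endswith("*"):
            # Checked before "partial" because marginalia may contain "?"
            found.append(("marginalia", content[1:-1].strip()))
        elif "?" in content:
            found.append(("partial", content.strip()))
        else:
            found.append(("deleted", content.strip()))
    return found


sample = "I have [ ] loved [a?] [vanilla] coffee ice cream. [*Brazil note*]"
for label, content in classify_brackets(sample):
    print(label, repr(content))
```

A reviewer could use a listing like this to find the spots in a transcription that another volunteer may still be able to fill in.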
Printed and typed text:
Some material on crowd.loc.gov was created on a typewriter or printed. If we have included it here, it is because the text is not machine-readable: a computer using Optical Character Recognition (OCR) technology cannot create an accurate word-searchable transcription. Examples include the scouting reports of Branch Rickey, which are typed on thin paper and are often too fuzzy for successful OCR. Similarly, mixed materials containing manuscript and print have not been run through OCR, so please transcribe letterhead and any other printed features that will shed light on where a document was created.
When not to transcribe printed text:
Some calendars and diaries contain many pages of pre-printed almanacs or other text that is probably machine-readable and therefore should not be transcribed as part of this project. It might be interesting to copy the first page from a repeating template in a diary or journal, but this is not the core text we are aiming to capture. However, if you want to transcribe it, feel free. Alternatively, click "Nothing to transcribe". This button should also be used for archival folders, blank pages, and pictorial images.
Tables, graphics, images:
Some documents will contain tables. Transcribe these in a way that will make them relatively easy for a reviewer to check over, but don't try to capture the exact layout of the data. The material will go back into the Library website without styling. Don't include notes or descriptions of visual features. If you would like to describe images, watermarks, stamps, or any other non-text features, feel free to use the tagging function. Register for an account if you want to add tags!