How to transcribe
There are many ways to transcribe documents, and many crowdsourcing projects ask volunteers to transcribe in different ways. This section explains the By the People transcription conventions. Our primary goals are to improve search on the Library's website and to make documents more accessible to people who use screen readers or other assistive technology. The Library of Congress website cannot search for formatted text such as superscripts or underlining, so we ask that you not try to capture this kind of information. The following instructions were created with the Library's website search functionality in mind, and with the intention of making these pages a pleasure to hear read aloud. It is also our goal to preserve the original spelling, grammar, and punctuation of the documents, in order to honor the original creators' historical reality and stylistic choices. We hope these instructions address most of the issues you will encounter as you transcribe, but if not, please contact us through History Hub, the Crowd project discussion forum, or by email. History Hub is a separate website requiring its own username and password.
In what order should I transcribe?
Transcribe text in the order it appears on the page. If there is more than one page in an image, transcribe both pages one after the other. You can use two hard returns ("enter" or "return" keys) to divide the two pages, which will make it easier for the next volunteer to review. In some rare cases an author writes across two pages rather than writing from top to bottom and then top to bottom again. Transcribe the text in the way that it would make sense to read it aloud.
Spelling, grammar, punctuation, word order, long s, page numbers:
Preserve original spelling, punctuation, grammar, word order, and any page numbers or catalog letters or numbers. Do not paraphrase the original text; just type what you see. Some historical handwriting uses the "long s" form, which looks like a lowercase "f". Transcribe this as a lowercase "s". Some writers use an equals sign as a dash or to add emphasis. You can use an equals sign to represent this feature. Dashes and other punctuation can be a little unusual in documents from the early twentieth century and earlier, so just make your best guess as to whether something is an en dash, em dash, period, or something else.
If a misspelling will impact the searchability of the document, use a tag to add the correct spelling. Only registered users can add tags and review documents, so create an account if you'd like to do either of those activities.
When text has been inserted above a line or added later, but should be read as part of a sentence, bring it down into the line and type it in the order you would read it aloud. Do not use caret symbols or brackets to indicate that the text has been inserted.
Preserve line breaks to make it easier for someone to review your transcription. To create a line break, hit "enter" or "return" at the end of a line. Don't worry if your text in the transcription box spills over two lines before you hit enter. So long as you have not inserted a hard return, no line break will be recorded. The exception to preserving line breaks is where words are broken over two lines or two pages. When a word is broken over two lines on the same page, type the whole word on the first line where it appears. In the case of a word that breaks across two pages, transcribe it on both pages.
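For volunteers who prepare text with OCR or dictation software before pasting it in, the rule for words broken over two lines on the same page can be sketched in a few lines of Python. This is only an illustration, not a tool provided by the project: the function name join_broken_words is hypothetical, and the sketch assumes the break is marked by a hyphen at the end of the line, which will not be true of every document. A line that simply ends in a dash would be joined by mistake, so always check the result against the image.

```python
def join_broken_words(lines: list[str]) -> list[str]:
    """Complete words broken over two lines on the first line they appear.

    Assumption (not from the guide): a broken word is marked by a
    trailing hyphen. A hyphen on the last line of the page is left
    alone, since a word broken across pages is handled separately.
    """
    out: list[str] = []
    pending = ""  # previous line minus its trailing hyphen, if any
    for line in lines:
        text = line.strip()
        moved = False
        if pending:
            # Pull the rest of the broken word up onto the previous line.
            head, _, rest = text.partition(" ")
            out[-1] = pending + head
            text = rest
            pending = ""
            moved = True
        if moved and not text:
            continue  # the line held only the second half of the word
        out.append(text)
        if len(text) > 1 and text.endswith("-"):
            pending = text[:-1]
    return out


page = ["I have always loved cof-", "fee ice cream."]
print(join_broken_words(page))
```

All other line breaks are kept exactly as they were, which matches the instruction to preserve them for the reviewer.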
Do not expand abbreviations, just type what you see. You can use the tagging function to record an important abbreviation such as a proper noun that is not already recorded in the item description.
Bold, underline, italic, superscript, indentation, etc.:
The Library website cannot search for bold, italic, underlined, superscript or indented text, so even when you see these features please transcribe the words without any styling.
Illegible or unclear text:
Illegible text is anything you can’t read because a page is damaged, because text is heavily crossed out, or because you can’t tell what the author has written. If there is a word or a string of words you cannot read, use a pair of square brackets around a question mark [?]. Example:
- "I have [?] loved coffee ice cream"
If you can read any letters or parts of words, transcribe what you can and use question marks for the remaining letters or words. Example:
- "I have [a?????] loved coffee ice cream"
If you cannot read a word or phrase, that’s ok. Another volunteer may be able to identify the missing letters and update your transcription. If there is a lot of text you cannot read, consider saving your transcription and looking for another page that you can decipher better.
If you can read crossed out or otherwise deleted text, transcribe the deleted words within a pair of square brackets. Example:
- “I have always loved [vanilla] coffee ice cream.”
Marginalia is text written in the space around the main block of text. It is usually a comment on the main body text but can also be unrelated. It is different from an insertion, because it cannot be directly inserted into the main text and still make sense when read aloud. Enclose marginalia in a pair of square brackets with an asterisk just inside each bracket, [* *], and place it within the transcription where it makes the most sense (or at the end of the transcription if it appears unrelated). Example:
- I have always loved coffee ice cream. Last summer I made my own using a recipe from the 1970s. It was the creamiest coffee ice cream I ever ate. [*Brazil was the largest coffee producing country in the world in 2017*]
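The bracket conventions above (illegible text, partly legible words, deleted text, and marginalia) can be summarized in a short Python sketch that labels each bracketed marker in a transcription. This is an illustration only and not part of the By the People software: the function name classify_markers and the label strings are hypothetical. It also assumes that any bracketed text containing a question mark is a partly legible word, so deleted text that itself contains a question mark would be mislabeled.

```python
import re


def classify_markers(text: str) -> list[tuple[str, str]]:
    """Return (kind, content) pairs for each bracketed marker, in order."""
    markers = []
    for match in re.finditer(r"\[([^\[\]]+)\]", text):
        content = match.group(1)
        if len(content) >= 2 and content.startswith("*") and content.endswith("*"):
            # [*...*] marks marginalia
            markers.append(("marginalia", content.strip("*").strip()))
        elif content == "?":
            # [?] marks a word or string of words that cannot be read
            markers.append(("illegible", ""))
        elif "?" in content:
            # [a?????] marks a word with some readable letters (assumption:
            # a question mark inside brackets always means unread letters)
            markers.append(("partly legible", content))
        else:
            # [vanilla] marks crossed out or deleted text that can be read
            markers.append(("deleted", content))
    return markers


sample = "I have [a?????] loved [vanilla] coffee ice cream [?] [*Brazil note*]"
for kind, content in classify_markers(sample):
    print(kind, repr(content))
```

A reviewer could use a check like this to count how many [?] markers remain on a page before deciding whether it needs another look.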
Printed and typed text:
Some material in crowd.loc.gov is typed or printed. Most of this text is not machine-readable, meaning that a computer using Optical Character Recognition (OCR) technology cannot create an accurate word-searchable transcription. Examples include the scouting reports of Branch Rickey, which are typed on thin paper and are often too fuzzy for successful OCR, and the speeches of Mary Church Terrell, which were typed by her and then preserved on microfilm at the Library. If you would like to try using dictation software or OCR, you are welcome to do so, but please go back and check the output for accuracy and insert any line breaks. Read how some of our volunteers have used these technologies, and join in the conversation on the project discussion forum, History Hub. Please transcribe letterhead, including names, places, and any other words that appear in it.
When not to transcribe printed text:
Some mass-produced calendars and diaries contain many pages of pre-printed almanacs or other text that should not be transcribed as part of this project. This is not the core text we are aiming to capture. However, if you want to transcribe it, feel free. Alternatively, click "Nothing to Transcribe".
Use the "Nothing to Transcribe" button for blank pages, images or printed templates.
Tables and tabular data:
Some documents will contain tables. Transcribe these in a way that will make them relatively easy for a reviewer to check over, but don't try to capture the exact layout of the data. You can use spaces and hard returns, but please do not add any additional characters such as the pipe symbol or slashes to divide the data.
Images, photographs, and graphical material:
Don't describe visual elements or add editorial notes in the transcription box. If you would like to describe images, watermarks, stamps, or any other non-text features, use the tagging function. Register for an account to add tags! If you want to start a longer conversation about your findings, join the discussion on History Hub.
Translation and text other than English:
If you can transcribe the original language of a document, please do so. You can change your language input settings in your browser, and may need to use a foreign language keyboard or shortcuts for non-English characters. However, do not translate documents. The Library is not currently able to support translation. If you would like to translate a document and discuss it with other volunteers, please visit History Hub and join in one of the many conversations on this topic. Please note that the project ran a short trial period of translation, but we now ask that all translations be kept out of the transcription space.
May I do additional research?
You most certainly may! If you are stumped about a word such as the name of a person or place, it is often helpful to visit the original document on the Library of Congress website. Do this by clicking on the grey button "view original on www.loc.gov", located in the transcription interface. You can also consult the library resources linked on each campaign page. Additional information or historical context can be found through general web searches, maps, books, and more.
Key commands or why did the image flip or rotate?
You can use keyboard commands to manipulate the image you are transcribing in the viewer. Press the question mark button above the image to see the keystroke combinations or refer to this guide. Click into the viewer and then use the following keystrokes: