How to transcribe

This section explains the By the People transcription conventions. There are many ways to transcribe documents, and different crowdsourcing projects ask volunteers to transcribe in different ways.

Our primary goals are to improve the searchability, readability, and accessibility of these documents for people who use screen readers or other assistive technology. We also want to honor the creators' historical reality by preserving the original spelling, grammar, and punctuation. The instructions were created with the Library's website search functionality in mind, and with the intention of making these pages a pleasure to hear aloud.

These instructions cover most issues you will encounter as you transcribe, but we can't cover everything! Post questions or clarifications in our History Hub discussion forum or contact us directly.

Text order

Transcribe text in the order it appears on the page. If you're unsure of order, transcribe the text in the way it would make sense to read aloud.

If there is more than one page in an image, transcribe all pages, one after the other, in the order they appear. You can use two hard returns to leave space between pages. This may make it easier for another volunteer to review.

Spelling and punctuation

Preserve the original spelling, punctuation, grammar, word order, page numbers, and catalog letters or numbers. Do not paraphrase the original text; just type what you see. Some writers use an equals sign as a dash or to add emphasis. You can use an equals sign to represent this feature. Dashes and other punctuation can be a little unusual in documents from the early twentieth century and earlier, so just make your best guess as to whether something is an en dash, em dash, period, or something else.

If a misspelling will impact the searchability of the document, use a tag to add the correct spelling. Only registered users can add tags and review documents, so create an account if you'd like to do either of those activities.

  • Example: An author wrote "Ace Blinckin" instead of "Abe Lincoln". Transcribe as "Ace Blinckin" and tag "Abe Lincoln".

    Don't leave notes in text

    Only type the original text of the page into the transcription box. You may be tempted to leave notes about the document, styling, or research you have done in the text. Please don't! Helpful information or context you want to leave for others can be added as a tag or posted in our discussion forum on History Hub.

    Blank pages

    Leave the transcription box blank and use the "Nothing to Transcribe" button for blank pages, images or printed templates.

    Line breaks

    Preserve line breaks. Line breaks make it easier for someone to review your transcription. Don't worry if your text in the transcription box wraps onto a second line before you hit enter: if you have not inserted a hard return, no line break will be recorded. The exception is when a word is broken across two lines or two pages. When a word is broken across two lines on the same page, type the whole word on the first line where it appears. When a word breaks across two pages, transcribe the whole word on the first page.

  • Example: Write library rather than li-brary, kitten rather than kit-ten.
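If you start from OCR or dictation output rather than typing by hand, the word-joining rule above can be sketched in a few lines of Python. This is only an illustration, not a By the People tool: the function name `join_broken_words` is hypothetical, and the sketch assumes that any line ending in a letter plus a hyphen, followed by a line starting with a lowercase letter, is a broken word. A genuine hyphenated compound split at the hyphen would be joined incorrectly, so always check the result against the page.

```python
import re


def join_broken_words(lines):
    """Move the tail of a word split by an end-of-line hyphen up to the first line."""
    out = [line.rstrip() for line in lines]
    emptied = set()  # indexes of lines that held nothing but a word tail
    for i in range(len(out) - 1):
        # Only act on lines that end with a letter followed by a hyphen.
        if not re.search(r"[A-Za-z]-$", out[i]):
            continue
        # Split the next line into its first chunk and everything after it.
        first, _, rest = out[i + 1].lstrip().partition(" ")
        if first[:1].islower():
            out[i] = out[i][:-1] + first  # drop the hyphen, attach the tail
            out[i + 1] = rest.lstrip()
            if not out[i + 1]:
                emptied.add(i + 1)
    return [line for i, line in enumerate(out) if i not in emptied]


page = ["I went to the li-", "brary to see a kit-", "ten."]
print(join_broken_words(page))
```

All other line breaks are left where they were, in keeping with the instruction to preserve them.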

    Abbreviations

    Do not expand abbreviations; just type what you see. You can use the tagging function to record the expanded text of an important abbreviation, such as a proper noun that otherwise does not appear in the text.

    Formatting: Bold, underline, italic, indents, superscript, etc.

    Do not try to capture formatting, such as underlining or indents. Preserving the font styling and formatting does not enhance the discoverability or accessibility of a page and may not display as desired when the transcriptions are published. Please also do not make note of formatting in the text.

    Insertions

    When text has been inserted over a line or otherwise added later, but should be read as part of a sentence, bring it down into the original text and type it in the order you would read it aloud. Do not use caret symbols or brackets to indicate that the text has been inserted.

    Illegible or unclear text

    Illegible text is anything you can’t read because a page is damaged, the text is crossed out, or you can’t tell what the author has written. If there is a word or a string of words you cannot read, transcribe as a pair of square brackets around a question mark [?]. Example:

    • "I have [?] loved coffee ice cream"

    If you can read any letters or parts of words, transcribe what you can and use question marks for the remaining letters or words. Example:

    • "I have [a?????] loved coffee ice cream"

    If you cannot read a word or phrase, that’s ok! Another volunteer may be able to add to your transcription. If there is a lot of text you cannot read, consider saving your transcription and looking for another page you can better decipher.

    Deletions

    If you can read crossed out or otherwise deleted text, transcribe the deleted words within square brackets [ ]. Example:

    • “I have always loved [vanilla] coffee ice cream.”

    Marginalia

    Marginalia is text written in the space around the main block of text. It is often a comment on the main body text but may also be unrelated. It differs from an insertion, in that it cannot be directly inserted into the main text. Put a pair of square brackets and asterisks [* *] around marginalia text and order it within the transcription where it makes the most sense (or at the end of the transcription if it appears unrelated). Transcribe all original punctuation within the [* *] as well, including brackets, parentheses, and other special characters. Example:

    • I have always loved coffee ice cream. Last summer I made my own. [*In 2017, Brazil was the largest coffee producing country*]

    Printed, typed, or newspaper text

    Some material in By the People is typed or printed, including typewritten letters and memos, newspaper clippings, printed forms, and more. This text still needs to be transcribed as it is not yet machine-readable. For various reasons, the Library has been unable to automate transcription using Optical Character Recognition (OCR) technology. If you would like to try using dictation software or OCR, you are welcome to do so, but please check the output for accuracy and insert line breaks. Read how other volunteers have used these technologies, and join in the conversation on History Hub.

    • Letterhead: Transcribe letterhead text.
    • Newspapers: Transcribe all articles, not just those you think are relevant. Transcribe columns in the order you would read them. Don't try to preserve layout.

    When not to transcribe printed text

    Some mass-produced calendars and diaries contain many pages of pre-printed almanacs or other text that should not be transcribed as part of this project. This is not the core text we are aiming to capture. However, if you want to transcribe it, feel free. Alternatively, if a page is blank other than pre-printed template text, you can click "Nothing to transcribe".

    Images

    Don't describe images or other visual elements within the transcription box. If you would like to describe images, watermarks, stamps, or any other non-text features, use the tagging function. Register for an account to tag!

    Non-English languages, characters, and translation

    If you can transcribe the original language of a document, please do so! Many languages can be found throughout our campaigns. We want to make sure these materials are also transcribed wholly and accurately.

    Please use the correct characters when transcribing non-English text. You can change the language input settings in your browser, and you may need to use keyboard shortcuts to type non-English characters.

    For our Herencia campaign we created guides and cheatsheets to help you transcribe Spanish and Latin.

    Please do not translate non-English text in the transcription box. If you have translated a By the People document, we would love for you to share it in History Hub!

    Shorthand

    Some of our campaigns include shorthand text. Shorthand is a writing method that uses symbols for words or phrases to capture notes more quickly and efficiently (some examples here and here). Many forms of shorthand exist, and we have found that transcribing shorthand is really closer to translation. When you recognize text as shorthand, do not transcribe it. Instead, type [[shorthand]] where it appears on the page. You can also add a "shorthand" tag.
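For anyone working with finished transcriptions afterwards, the bracket conventions in this guide ([?] and [a?????] for illegible text, [text] for deletions, [* *] for marginalia, and [[shorthand]]) can be told apart mechanically. The Python sketch below is one possible way to do that and is not part of By the People: the label names and the function `find_markup` are hypothetical. It assumes that any single-bracketed text containing a question mark is illegible text, so a deletion such as [why?] would be mislabeled.

```python
import re

# Order matters: the most specific patterns are tried first.
PATTERNS = [
    ("shorthand", re.compile(r"\[\[shorthand\]\]")),
    ("marginalia", re.compile(r"\[\*(.*?)\*\]", re.DOTALL)),
    ("illegible", re.compile(r"\[[^\[\]]*\?[^\[\]]*\]")),
    ("deletion", re.compile(r"\[[^\[\]]+\]")),
]


def find_markup(text):
    """Return (label, matched_text) pairs in the order they appear."""
    found = []
    pos = 0
    while pos < len(text):
        for label, pattern in PATTERNS:
            match = pattern.match(text, pos)
            if match:
                found.append((label, match.group(0)))
                pos = match.end()
                break
        else:
            pos += 1  # no convention starts here; move on one character
    return found


sample = (
    "I have [?] loved [vanilla] coffee ice cream. "
    "[*In 2017, Brazil was the largest coffee producing country*] "
    "[[shorthand]]"
)
for label, span in find_markup(sample):
    print(label, span)
```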

    Other symbols and special characters

    Transcribe symbols and other special characters within words when they are used in the original document. These include ampersands (&), currency symbols ($, £, etc.) and the silcrow (§, also called the section sign, used in legal documents).

    You can learn about British Colonial currency markings here.

    Tables

    Some documents will contain tables of data. Transcribe these in a way that will preserve the relationships between columns and rows, and reflect the meaning of the original documents. Try to make your transcription relatively easy for a reviewer to check, but don't try to capture the exact layout of the data. You can use spaces and hard returns, but please do not add any additional characters such as the pipe symbol or slashes to divide the data.
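If you are preparing table text outside the transcription box, the "spaces and hard returns only" rule can be sketched in Python as follows. This is an illustration under stated assumptions, not an official tool: `format_table` is a hypothetical helper, the sample rows are invented, and every row is assumed to have the same number of cells. Columns will only line up visually in a fixed-width font, which is fine, since the goal is to keep each value with its row and column rather than to reproduce the layout.

```python
def format_table(rows, gap=2):
    """Separate columns with spaces only; add no pipes, slashes, or other characters."""
    # Width of each column is the length of its longest cell.
    widths = [max(len(cell) for cell in column) for column in zip(*rows)]
    lines = []
    for row in rows:
        padded = [cell.ljust(width) for cell, width in zip(row, widths)]
        # Join with spaces and trim trailing padding on the last cell.
        lines.append((" " * gap).join(padded).rstrip())
    return "\n".join(lines)  # one hard return per table row


# Invented sample data, for illustration only.
rows = [
    ["Item", "Qty", "Cost"],
    ["Flour", "2 bbl", "$9.50"],
    ["Coffee", "10 lb", "$1.25"],
]
print(format_table(rows))
```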

    Long s or "funny" f

    Some historical handwriting and printing uses the "long s" form, which looks like a lowercase "f". Transcribe this as a lowercase "s".
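If OCR output happens to contain the actual long s character (ſ, Unicode U+017F), the rule above amounts to a single replacement. The short Python sketch below shows this; `normalize_long_s` is a hypothetical name. Note the limitation: OCR software often reads a long s as an ordinary "f", and this sketch cannot detect that, so those cases still need to be corrected by eye.

```python
LONG_S = "\u017f"  # ſ, LATIN SMALL LETTER LONG S


def normalize_long_s(text):
    """Replace every long s with a lowercase s; leave ordinary f untouched."""
    return text.replace(LONG_S, "s")


print(normalize_long_s("Congre\u017fs a\u017f\u017fembled"))
```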

    Cross-writing

    Some letters have "cross-writing" where the author has written text in two directions to save paper or the cost of postage. Transcribe these letters in the order they were written or would make the most sense to read. You can also add the tag "cross-writing."

    Bleed-through

    Text written or printed on thin paper, like letterbooks, will often have bleed-through backward text, where ink from the preceding page has seeped through the paper or is just visible through the thin paper. Bleed-through can make transcribing a page more difficult, but try to ignore the bleed-through and do your best to decipher the non-mirror text. You can usually go to the preceding page to view and transcribe the bleed-through text in the correct direction.

    Research

    We're often asked "can I do research?" -- of course! If you are stumped about a word such as the name of a person or place, it is often helpful to do a little research. We suggest starting by visiting the original document on the Library of Congress website. Do this by clicking the button "View original on www.loc.gov", located above the transcription interface. We've also linked helpful resources on each campaign page. Additional information or historical context can be found through general web searches, maps, books, and more.

    Saving work in progress

    Saving a transcription stores what is in the transcription box; it does not reserve that page for a user. Saved transcriptions move to the status of "In Progress" and can be edited by another user once you leave that page. Saving and remaining on a page for longer than 2 hours will also result in that page being released for editing by another user.

