Transcription
Transcription Overview
The Team: Keller Brown, Kaitlin Carman, Carlos Febres-Cordero & Scott Cousland
Our goal was to complete an initial transcription of each journal: a smaller portion by students and staff, and the bulk by optical character recognition (OCR) software.
Early FSU Students: Transcribe-A-Thon Sub-Project
To get more pages transcribed more quickly, we hosted a Transcribe-A-Thon with the support of the Framingham State University Library's Special Collections. It was held in the Archives Room at the Whittemore Library on November 17, 2022.
With the help of our amazing volunteers, we began transcribing Louisa Harris' sixth journal. Although the transcription process proved more time-consuming than we had anticipated, we are pleased with the progress we made in such a short time. We hope our work serves as a solid starting point for those who continue to transcribe the Louisa Harris journals and work on the Early FSU Journals project.
Cursive Optical Character Recognition Sub-Project
An initial search found three websites that claimed to handle handwritten documents, but a closer look showed that they drew the line at diaries.
A refined search turned up a company that specializes in cursive OCR, "Pen-to-Print," but initial tests did not go well.
We described the problem to Pen-to-Print's support department, which responded that it had not been reported before.
The solution turned out to be forcing Adobe Acrobat to "Remove Hidden Information" from the PDF before submitting it for processing.
A side-by-side comparison of a page from Eliza Gould's journal, transcribed once by a human and once by the OCR software, found the OCR output to be 88.5 percent accurate. The purchase of a Pen-to-Print license is in process.
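The report does not say how the 88.5 percent figure was calculated. One common measure, assumed here rather than taken from the project, is character accuracy: one minus the edit (Levenshtein) distance between the human and OCR transcriptions, divided by the length of the human transcription. The minimal Python sketch below computes it; the function names and sample sentences are hypothetical and are not drawn from Eliza Gould's journal.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed metric: 1 - edit distance / reference length, floored at 0.

    reference is the human transcription, hypothesis the OCR output.
    """
    if not reference:
        raise ValueError("reference transcription must not be empty")
    errors = levenshtein(reference, hypothesis)
    return max(0.0, 1.0 - errors / len(reference))


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from the journals.
    human = "Went to church this morning."
    ocr = "Went to chureh this moming."
    # 3 edits over 28 reference characters: prints 89.3%
    print(f"{character_accuracy(human, ocr):.1%}")
```

Word-level accuracy would be computed the same way over lists of words instead of characters, and can give a noticeably different number for the same page.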
Pen-to-Print can process 50 pages of a single PDF at once. I used Adobe Acrobat Pro to submit single-page PDFs because the free license is limited to 10 pages.
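Because of these per-PDF page limits, a longer scanned journal has to be submitted in several pieces. The hypothetical helper below is a minimal sketch of how page numbers could be grouped into submission batches, defaulting to the 50-page limit; the 120-page journal in the example is made up, and the pages themselves would still be extracted in Acrobat Pro or another PDF tool.

```python
def page_batches(total_pages: int, max_per_pdf: int = 50) -> list[range]:
    """Group pages 1..total_pages into consecutive batches of at most
    max_per_pdf pages (hypothetical helper).

    Each batch is a range of 1-based page numbers; like range(), the
    stop value is exclusive.
    """
    if total_pages < 0 or max_per_pdf < 1:
        raise ValueError("total_pages must be >= 0 and max_per_pdf must be >= 1")
    return [
        range(start, min(start + max_per_pdf, total_pages + 1))
        for start in range(1, total_pages + 1, max_per_pdf)
    ]


if __name__ == "__main__":
    # Made-up example: a 120-page journal under the 50-page limit.
    for batch in page_batches(120):
        print(f"pages {batch.start}-{batch.stop - 1}")
    # Prints: pages 1-50, pages 51-100, pages 101-120 (one per line)
```

Passing `max_per_pdf=10` gives batches sized for a 10-page limit instead.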
In the existing journal PDFs, each PDF page contains both the left and right pages of the open journal.
It may be more efficient to re-scan the journals so that each left or right page is its own page in the PDF, rather than cropping every "double" page into two separate pages.
A license to use Acrobat Pro is included in the Adobe Creative Cloud software, which must be requested from the IT Department.
Here is the link to request Virtual Software, Apporto, Adobe Creative Cloud, etc.