A new electronic text retrieval and imaging system, being developed by Oregon State University's (OSU) Kerr Library Special Collections, points to revolutionary possibilities for research with rare and valuable documents. OSU's library is in the process of converting the collection of two-time Nobel Prize winner Linus Pauling into a digital format that can be stored and retrieved by computer. The new system will give researchers around the world remote access to the Pauling collection without ever touching the original documents.

The Pauling collection contains about 150,000 items stored in over 800 archival boxes. It includes letters, speeches, handwritten manuscripts, and newspaper articles. The Special Collections staff is in the process of scanning each document for storage in digital form. After a document is scanned and stored using LaserFiche imaging software, optical character recognition (OCR) software developed by Calera is used to extract the typewritten text of the scanned document. This method lets researchers view a facsimile image of the original document and the extracted text side by side on a single screen. Researchers using the system will have the option of storing the image, the text, or both in digital form.

The system will also allow researchers to do key-word, easy Boolean, and fuzzy word searches. For example, if a researcher wants to find all the correspondence between Pauling and Albert Einstein, the name "Einstein" may be entered as a key-word. The computer will find every piece in the collection where Einstein's name is mentioned.
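The three kinds of search mentioned above can be illustrated with a minimal Python sketch. It is not the LaserFiche or Calera implementation, whose internals are not described here. The sample records, the function names, and the 0.8 similarity threshold are all assumptions made for illustration. The fuzzy search uses the standard library's difflib to tolerate the kind of misspelling that OCR or typing errors might introduce.

```python
import re
from difflib import SequenceMatcher

# Hypothetical sample records standing in for OCR-extracted document text.
DOCUMENTS = {
    "letter-001": "Dear Professor Einstein, thank you for your letter on the committee.",
    "speech-014": "Remarks on peace and the dangers of nuclear testing.",
    "letter-027": "I spoke with Einstien about the committee last week.",  # misspelled name
}


def tokenize(text):
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_search(docs, word):
    """Return ids of documents containing the exact word (case-insensitive)."""
    word = word.lower()
    return sorted(doc_id for doc_id, text in docs.items() if word in tokenize(text))


def boolean_and_search(docs, words):
    """Return ids of documents containing every one of the given words."""
    wanted = {w.lower() for w in words}
    return sorted(
        doc_id for doc_id, text in docs.items() if wanted <= set(tokenize(text))
    )


def fuzzy_search(docs, word, threshold=0.8):
    """Return ids of documents with at least one word similar to the query.

    The threshold is an assumed value; a real system would tune it.
    """
    word = word.lower()
    hits = []
    for doc_id, text in docs.items():
        if any(
            SequenceMatcher(None, word, token).ratio() >= threshold
            for token in tokenize(text)
        ):
            hits.append(doc_id)
    return sorted(hits)


if __name__ == "__main__":
    print(keyword_search(DOCUMENTS, "Einstein"))
    print(boolean_and_search(DOCUMENTS, ["Einstein", "committee"]))
    print(fuzzy_search(DOCUMENTS, "Einstein"))
```

In this sketch the exact key-word search misses the record with the misspelled name, while the fuzzy search finds it. That difference is one reason fuzzy matching is useful on text produced by imperfect character recognition.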

Project coordinator Ramesh Krishnamurthy proposed the project to Clifford Mead, head of OSU's Special Collections. The two teamed with Bob Baker, Kerr Library information analyst, in 1991 to begin defining the scope and goals of the project. Funding has come from Kerr Library. Mead expects the project to be completed within one year. According to Krishnamurthy and Mead, some problems with the optical character recognition software must still be worked out. Existing software can now accurately scan 75-80 percent of a document; the goal is to achieve nearly 100 percent accuracy. Mead is confident that the remaining technical problems can be resolved.

Public access to the system will come through OSU's Kerr Library local area network. People will be able to access the system in much the same way as they do when searching the library's OASIS electronic catalog. Researchers from around the world will be able to access the Pauling collection through the Internet.

