Document imaging is the scanning of paper documents and their conversion into electronic images stored on a computer, where they can be retrieved in seconds.

Production capture encompasses a complex flow of processes that includes scanning but extends much further. In general, production capture includes six operations: document preparation, scanning, recognition, indexing and data validation, QC and rescanning, and release.

Image processing and storage puts millions of documents at your fingertips. Imagine retrieving vast amounts of data in mere seconds. Discover a new dimension in efficiency, accuracy and cost savings. From simply archiving for disaster recovery to complete office automation and network integration, we work with our clients to design and implement the customized solutions that meet their needs.

Our focus is not simply on technology but on solutions that meet our clients' specific needs for increased productivity, security and cost savings. We work with our clients to develop complete document management strategies that integrate into their current environment with efficiency and ease.

Scanning refers to the actual transformation of paper documents into digital images. In addition to these newly created files, existing digital image files can be imported into a new system. Effective scanning requires precise control over the scanning equipment and scanner settings, including resolution, contrast, simplex or duplex operation, and advanced thresholding options.
Data can be extracted from images automatically via a recognition process or manually by a keyboard operator (an operation known as “key from image,” typically used when the accuracy of automatic recognition on a zone is too poor to be useful). In either case, the data must be validated and verified, sometimes by a second independent operator and sometimes via automated processes such as database lookups and built-in business rules.
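The validation step described above can be illustrated with a minimal sketch. It assumes a record is a plain dictionary of keyed fields; the field names, the date rule, and the `KNOWN_CUSTOMERS` table (standing in for a database lookup) are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical reference table standing in for a real database lookup.
KNOWN_CUSTOMERS = {"C-1001": "Acme Corp", "C-1002": "Globex"}


def validate_record(record):
    """Apply simple business rules and a lookup; return a list of problems."""
    problems = []
    # Business rule (assumed): dates must be keyed as YYYY-MM-DD.
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", record.get("invoice_date", "")):
        problems.append("invoice_date is not in YYYY-MM-DD form")
    # Lookup (assumed): the customer id must exist in the reference table.
    if record.get("customer_id") not in KNOWN_CUSTOMERS:
        problems.append("customer_id not found in lookup table")
    return problems


def double_key_mismatches(first_pass, second_pass):
    """Return the field names on which two independent operators disagree."""
    all_fields = first_pass.keys() | second_pass.keys()
    return sorted(f for f in all_fields if first_pass.get(f) != second_pass.get(f))
```

A record that returns an empty problem list and has no double-key mismatches could be released; anything else would go back to an operator for review.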
Optical character recognition (OCR) is the most common type of recognition. It is the process by which a computer program reads the image of text on a page and makes that text editable. OCR can be classified in two ways: zonal and full-text. Zonal OCR is typically used on forms, where only specific fields on the form are of interest. Full-text OCR is used on free-form documents, such as legal briefs, to read the entire document and then prepare a searchable, full-text index of the document.
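
The difference between the zonal and full-text approaches can be sketched as follows. The sketch assumes a recognition engine has already produced words with page coordinates; the `(text, x, y)` tuples, the zone rectangles, and both function names are hypothetical, and no particular OCR product is implied.

```python
from collections import defaultdict


def zonal_extract(words, zones):
    """Collect the recognized words that fall inside each named zone.

    words: list of (text, x, y) tuples in reading order.
    zones: {field_name: (left, top, right, bottom)} rectangles on the form.
    """
    fields = {}
    for name, (left, top, right, bottom) in zones.items():
        inside = [text for (text, x, y) in words
                  if left <= x <= right and top <= y <= bottom]
        fields[name] = " ".join(inside)
    return fields


def build_full_text_index(pages):
    """Build a word -> sorted page numbers index from full-page text.

    pages: {page_number: recognized text of the whole page}.
    """
    index = defaultdict(set)
    for page_no, text in pages.items():
        for raw in text.lower().split():
            word = raw.strip(".,;:()\"'")  # drop surrounding punctuation
            if word:
                index[word].add(page_no)
    return {word: sorted(page_set) for word, page_set in index.items()}
```

Zonal extraction returns only the fields of interest on a form, while the full-text index covers every word so that any term in a free-form document can be searched.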

OCR output can also take one of two forms: text-over and text-under. Text-over places the OCR data over the image, so the text looks very clean but no longer appears as the original print. Text-under places the OCR data underneath the image, at the same x-y coordinates as the original text. Text-under is used when the original look of a document must be kept.
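
To place text-under data at the same x-y coordinates as the printed text, the word positions reported in image pixels have to be mapped to the coordinates of the output page. The sketch below assumes a PDF-style output, where units are points (72 per inch) and the origin is the bottom-left corner; the function name and the box layout are hypothetical.

```python
def pixel_box_to_page_points(box, dpi, page_height_px):
    """Convert a word box from image pixels to page points.

    box: (left, top, right, bottom) in pixels, origin at the top-left.
    dpi: scan resolution in dots per inch.
    page_height_px: height of the scanned image in pixels.
    Returns (left, bottom, right, top) in points, origin at the bottom-left.
    """
    left, top, right, bottom = box

    def to_points(pixels):
        return pixels * 72.0 / dpi  # 72 points per inch

    # The vertical axis is flipped because the image origin is at the top.
    return (
        to_points(left),
        to_points(page_height_px - bottom),
        to_points(right),
        to_points(page_height_px - top),
    )
```

For example, a word box one inch in from the top-left corner of a letter-size page scanned at 300 dpi maps to a position one inch from the left edge and ten inches up from the bottom of the page.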

Image cleanup is also performed in the recognition step. Techniques include:

  • Deskewing, despeckling, deshading, streak removal, and other basic cleanup functions
  • Line removal and character reconstruction for use on forms
  • Edge enhancement, which sharpens character edges to increase OCR accuracy

The purpose of image cleanup is to remove unwanted noise that can decrease the accuracy of automated recognition.
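
As an illustration of one of these techniques, despeckling can be sketched as the removal of very small groups of black pixels. This is a minimal sketch, not the method of any particular product: it assumes a bitonal image held as a list of rows with 1 for black and 0 for white, and the `min_size` threshold is a hypothetical parameter.

```python
def despeckle(image, min_size=3):
    """Return a copy of a bitonal image with small black specks removed.

    image: list of rows, each a list of 0 (white) or 1 (black).
    min_size: connected groups with fewer black pixels than this are erased.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    cleaned = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]

    for r in range(height):
        for c in range(width):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Flood fill to collect one 8-connected group of black pixels.
            stack = [(r, c)]
            seen[r][c] = True
            component = []
            while stack:
                y, x = stack.pop()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            # Groups smaller than the threshold are treated as noise.
            if len(component) < min_size:
                for y, x in component:
                    cleaned[y][x] = 0
    return cleaned
```

The threshold has to stay below the size of the smallest real mark on the page, such as a period or the dot of an "i", or the cleanup itself will lower recognition accuracy.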

Quality control involves systematic reviews and checks to ensure that the scanned images are readable. QC includes methods for flagging bad images and explaining why or how images should be rescanned, and can be performed by a dedicated QC operator.
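A QC flag of this kind can be represented very simply. The sketch below assumes each reviewed image gets a small record; the `QCReview` fields and the `rescan_queue` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class QCReview:
    """One operator decision about one scanned image."""
    image_id: str
    readable: bool
    reason: str = ""  # why or how the image should be rescanned


def rescan_queue(reviews):
    """Return (image_id, reason) pairs for every image flagged as unreadable."""
    return [
        (review.image_id, review.reason or "no reason given")
        for review in reviews
        if not review.readable
    ]
```

The resulting list gives the scanning operator both the images to pull and the instructions for rescanning them.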
The Federal Agencies Digital Guidelines Initiative (FADGI) is a collaborative effort started in 2007 by federal agencies to articulate common sustainable practices and guidelines for digitized and born-digital historical, archival and cultural content. Two working groups study issues specific to two major areas, Still Image and Audio-Visual.

2018. Federal Agencies Digital Guidelines Initiative. Retrieved from http://www.digitizationguidelines.gov/