Document Conversion FAQs: IDI Solutions

What is imaging?

Document imaging is the scanning of paper documents and their conversion into electronic images that can be stored on a computer and retrieved in seconds.

Production capture encompasses a complex flow of processes that includes scanning but extends much further. In general, production capture includes six operations: document preparation, scanning, recognition, indexing and data validation, QC and rescanning, and release.
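
The six operations can be pictured as an ordered pipeline that every batch of documents passes through. The Python sketch below is illustrative only. The `Batch` record, the `run_pipeline` function and the per-stage handler callbacks are assumptions made for this example. They are not part of any particular capture product.

```python
from dataclasses import dataclass, field
from typing import Callable

# The six production-capture operations, in the order listed above.
STAGES = [
    "document preparation",
    "scanning",
    "recognition",
    "indexing and data validation",
    "QC and rescanning",
    "release",
]

@dataclass
class Batch:
    """Hypothetical batch record that keeps a history of completed stages."""
    batch_id: str
    history: list[str] = field(default_factory=list)

def run_pipeline(batch: Batch, handlers: dict[str, Callable[[Batch], None]]) -> Batch:
    """Run every stage in order; a stage without a handler is only recorded."""
    for stage in STAGES:
        handler = handlers.get(stage)
        if handler is not None:
            handler(batch)  # stage-specific work (hypothetical callback)
        batch.history.append(stage)
    return batch
```

Keeping the stage order in one list makes it easy to see that release always comes last, after QC has had a chance to send pages back for rescanning.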

Why should I be interested in document imaging?

Image processing and storage puts millions of documents at your fingertips. Imagine retrieving vast amounts of data in mere seconds. Discover a new dimension in efficiency, accuracy and cost savings. From simply archiving for disaster recovery to complete office automation and network integration, we work with our clients to design and implement the customized solutions that meet their needs.

Our focus is not simply on technology but on solutions that meet our clients' specific needs for increased productivity, security and cost savings. We work with our clients to develop complete document management strategies that integrate into their current environment efficiently and easily.

What is scanning?

Scanning refers to the actual transformation of paper documents into digital images. In addition to these newly created files, existing digital image files can be imported into a new system. Effective scanning requires precise control over the scanning equipment and scanner settings, including resolution, contrast, simplex or duplex operation, and advanced thresholding options.
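
As a rough illustration, a scan profile can be held in a small settings object. The simplest form of thresholding fits in a few lines: a single global cutoff that turns each grayscale pixel black or white. The `ScanSettings` fields and value ranges are hypothetical and do not correspond to any real scanner driver API. The advanced thresholding options mentioned above, such as adaptive methods that vary the cutoff across the page, are more involved than this baseline.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ScanSettings:
    """Hypothetical scan profile; field names and ranges are illustrative."""
    dpi: int = 300        # scan resolution
    duplex: bool = True   # True = scan both sides of each sheet (duplex)
    contrast: int = 0     # illustrative range -100..100
    threshold: int = 128  # 0..255 cutoff used for black/white conversion

def to_bilevel(gray: list[list[int]], threshold: int) -> list[list[int]]:
    """Global thresholding: grayscale values run from 0 (black) to 255 (white).
    Pixels darker than the cutoff become black (1); all others become white (0)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]
```

For example, with `ScanSettings(threshold=140)`, the grayscale rows `[[250, 30, 200], [135, 145, 10]]` become `[[0, 1, 0], [1, 0, 1]]`.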

What is indexing and data validation?

Data can be extracted from images automatically via a recognition process or manually by a keyboard operator. The manual method is known as “key from image,” and it is typically used when automatic recognition on a zone is too inaccurate to be useful. In either case, the data must be validated and verified, sometimes by a second independent operator and sometimes via automated processes such as database lookups and built-in business rules.
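
The sketch below assumes that each operator's keyed values arrive as a Python dict. It shows the three kinds of checks described above:

* double-key comparison between two independent operators
* a lookup against known values
* a business rule

`KNOWN_ACCOUNTS`, the field names and the positive-amount rule are hypothetical placeholders. In practice the lookup would query a real database.

```python
# Hypothetical stand-in for a database table of valid account numbers.
KNOWN_ACCOUNTS = {"10042", "10057", "20311"}

def validate_record(first_key: dict[str, str], second_key: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    # Double-key verification: two independent operators must agree on every field.
    for field_name in sorted(first_key.keys() | second_key.keys()):
        if first_key.get(field_name) != second_key.get(field_name):
            problems.append(f"mismatch in {field_name}")
    # The remaining checks run on the first operator's values.
    # Database lookup: the account number must already exist.
    if first_key.get("account", "") not in KNOWN_ACCOUNTS:
        problems.append("unknown account")
    # Business rule (illustrative): amounts must be positive numbers.
    try:
        if float(first_key.get("amount", "")) <= 0:
            problems.append("amount must be positive")
    except ValueError:
        problems.append("amount is not a number")
    return problems
```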

What is Optical Character Recognition (OCR)?

OCR is the most common type of recognition. It is the process by which a computer program reads the image of text on a page to make the text editable. OCR can be classified in two ways: zonal and full-text. Zonal OCR is typically used on forms, where only specific fields on the form are of interest. Full-text OCR is used on free-form documents, such as legal briefs, to read the entire document and then prepare a searchable, full-text index of it.
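
The minimal Python sketch below makes the zonal versus full-text distinction concrete. It assumes the OCR engine returns each recognized word with a pixel bounding box; the `Word` type is hypothetical.

* Zonal OCR keeps only the words that fall inside named form zones.
* Full-text OCR feeds every word into an inverted index that maps each term to the documents containing it.

```python
import re
from collections import defaultdict
from typing import NamedTuple

class Word(NamedTuple):
    """One recognized word and its bounding box in page pixels (hypothetical format)."""
    text: str
    x: int
    y: int
    w: int
    h: int

def zonal_ocr(words: list[Word], zones: dict[str, tuple[int, int, int, int]]) -> dict[str, str]:
    """Zonal: for each named zone (x0, y0, x1, y1), keep only words fully inside it."""
    fields = {}
    for name, (x0, y0, x1, y1) in zones.items():
        inside = [wd.text for wd in words
                  if wd.x >= x0 and wd.y >= y0 and wd.x + wd.w <= x1 and wd.y + wd.h <= y1]
        fields[name] = " ".join(inside)
    return fields

def full_text_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Full-text: map every lowercased term to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in pages.items():
        for term in re.findall(r"\w+", text.lower()):
            index[term].add(doc_id)
    return dict(index)
```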

OCR output can also be delivered in two forms: text-over and text-under. With text-over, the OCR text is placed over the image. The text looks very clean but no longer resembles the original print. With text-under, the OCR text is placed underneath the image, at the same x-y coordinates as the original text. Text-under is used when the original look of a document must be preserved.
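
A text-under layer is useful because each recognized word keeps its position on the page. The sketch below treats a hypothetical `OcrWord` list as the hidden layer. A search matches against that text and returns the boxes a viewer could highlight on the unmodified scan. Searchable PDFs commonly store such a layer as invisible text, but the format here is only an assumption for illustration.

```python
from typing import NamedTuple

class OcrWord(NamedTuple):
    """A recognized word plus the x-y box where it appears on the page image."""
    text: str
    x: int
    y: int
    w: int
    h: int

def find_in_text_layer(layer: list[OcrWord], query: str) -> list[tuple[int, int, int, int]]:
    """Search the hidden text-under layer (case-insensitive, whole words).
    Return the boxes to highlight. Each word keeps the original coordinates,
    so the hits can be drawn directly over the untouched scan."""
    q = query.lower()
    return [(wd.x, wd.y, wd.w, wd.h) for wd in layer if wd.text.lower() == q]
```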

Image cleanup is also performed in the recognition step. Techniques include:

Deskewing, despeckling, deshading, streak removal, and other basic cleanup functions
Line removal and character reconstruction for use on forms
Edge enhancement, which sharpens character edges to increase OCR accuracy

The purpose of image cleanup is to remove unwanted noise that can decrease the accuracy of automated recognition.
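
Despeckling is the easiest of these techniques to show in code. The function below is a deliberately simple filter. It assumes a bilevel image stored as a list of rows (1 = black, 0 = white) and clears any black pixel that has no black neighbor. Commercial cleanup tools use more sophisticated methods, such as removing connected specks below a size threshold. Treat this only as an illustration of the idea.

```python
def despeckle(img: list[list[int]]) -> list[list[int]]:
    """Remove isolated black pixels (1) that have no black 8-neighbors.
    Returns a new image; the input is not modified."""
    rows = len(img)
    cols = len(img[0]) if rows else 0
    out = [row[:] for row in img]
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != 1:
                continue
            # Look at the up-to-8 surrounding pixels, clipped at the page edges.
            has_neighbor = any(
                img[rr][cc] == 1
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if (rr, cc) != (r, c)
            )
            if not has_neighbor:
                out[r][c] = 0  # isolated speck: turn it white
    return out
```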

What is Quality Control (QC), and why do you need it?

Quality control involves systematic reviews and checks to ensure that the scanned images are readable. QC includes methods for flagging bad images and explaining why or how images should be rescanned, and can be performed by a dedicated QC operator.
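
Some QC checks can be automated before an operator reviews the images. The sketch below flags pages for rescanning and attaches a reason to each flag. The `PageImage` type, the 300 dpi minimum and the ink-coverage limits are illustrative assumptions. They are not values taken from this FAQ or from any standard.

```python
from dataclasses import dataclass

@dataclass
class PageImage:
    """Minimal stand-in for a scanned page: bilevel pixels plus capture resolution."""
    page_id: str
    dpi: int
    pixels: list[list[int]]  # 1 = black, 0 = white

def qc_flags(page: PageImage, min_dpi: int = 300,
             min_ink: float = 0.005, max_ink: float = 0.5) -> list[str]:
    """Return human-readable reasons to rescan; an empty list means the page passes.
    All thresholds are illustrative defaults."""
    reasons = []
    if page.dpi < min_dpi:
        reasons.append(f"resolution {page.dpi} dpi is below {min_dpi} dpi")
    total = sum(len(row) for row in page.pixels)
    ink = sum(sum(row) for row in page.pixels) / total if total else 0.0
    if ink < min_ink:
        reasons.append("page appears blank; check feed or duplex setting")
    elif ink > max_ink:
        reasons.append("page is mostly black; check contrast/threshold")
    return reasons
```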

What is FADGI?

FADGI (Federal Agencies Digital Guidelines Initiative) is a collaborative effort started in 2007 by federal agencies. Its aim is to articulate common sustainable practices and guidelines for digitized and born-digital historical, archival and cultural content. Two working groups study issues specific to its two major areas: Still Image and Audio-Visual.

Federal Agencies Digital Guidelines Initiative. (2018). Retrieved from http://www.digitizationguidelines.gov/

What is imaging?

Imaging refers to the process of converting paper documents into electronic images that can be stored and accessed digitally. This allows for efficient document management, easy retrieval, and enhanced security compared to physical paper records.

The imaging process typically involves scanning the documents, converting them into digital files, and organizing them for easy access and retrieval. This technology revolutionizes how organizations manage and interact with their important documents.

Why should I be interested in document imaging?

Document imaging offers numerous benefits for businesses and individuals. It significantly improves efficiency, productivity, and accessibility by providing quick and easy access to digital records. This can lead to cost savings, better security, and enhanced disaster recovery capabilities.

Additionally, document imaging allows for advanced features like Optical Character Recognition (OCR), which enables full-text searching and indexing of the digitized documents. This makes it easier to find and retrieve specific information when needed, improving overall document management and streamlining business processes.

What is Optical Character Recognition (OCR)?

Optical Character Recognition (OCR) is a technology that allows computer software to recognize and extract text from scanned or digital images of documents. This enables the conversion of printed or handwritten text into machine-readable and searchable digital data.

OCR is a crucial component of document imaging, as it allows for the indexing and retrieval of information within the digitized documents. By applying OCR, the content of the scanned documents becomes searchable, facilitating efficient information management and data extraction.

What is Quality Control (QC), and why do you need it?

Quality Control (QC) is an essential step in the document imaging process. It involves the systematic inspection and verification of the digitized documents to ensure their accuracy, completeness, and overall quality.

QC is necessary to identify and address any issues or errors that may have occurred during the scanning, indexing, or data validation stages. By performing QC, organizations can maintain the integrity and reliability of their digital document repository, ensuring the information is accurate and readily available when needed.

What is FADGI?

FADGI (Federal Agencies Digital Guidelines Initiative) is a collaborative effort initiated in 2007 by federal agencies in the United States. The goal of FADGI is to establish common sustainable practices and guidelines for the digitization and preservation of historical, archival, and cultural content.

FADGI provides standards and best practices for the digital capture and representation of still images and audio-visual materials. These guidelines help ensure the long-term viability and accessibility of digitized content, enabling organizations to maintain high-quality digital archives and collections.
