  News: Mortgage Document Capture Made Easy



New technologies make it easy to speed up the loan approval cycle.

As stated in Integrated Solutions (March 2005):

"Companies today have plenty of reasons to look into implementing imaging solutions. Businesses can reduce the quantity of their paper files, speed up forms processing procedures, and improve their ability to distribute crucial information."

Improved Workflow

Companies are proving that documents can flow through an organization up to 100 times faster as electronic images. These firms have gained the highest degree of efficiency and have even been able to profit from automated “missing document” retrieval and reporting methods. Because of this, some companies are offering incentives for documents that arrive in electronic form versus paper form.

Documents can be quickly routed, stored and maintained in an electronic form. However, the challenge to achieving automated document flow is, “How do you get the documents in?” Management wants easy solutions to get unstructured information accurately classified, indexed, and routed quickly.

Document Capture Industry

This challenge has created an entirely new industry called “Document Capture.” Document capture is the practice of intercepting and accurately classifying faxes, e-mails, voice mails, mail, overnight shipments, and physical folder files containing an array of miscellaneous document types.

The document capture industry comprises multiple product companies. Lending entities need to understand the difference between a document capture software product and a document repository product.

Many institutions assume that scanning documents is an extension of the viewer and retrieval software. That is like assuming that a TV network is responsible for creating its content. Typically, TV networks just broadcast, and other entities create the content.

In the document management world, a skillful designer will look for the most cost-effective approach to move documents and information into a system. However, many institutions do not realize all of their options until they have implemented a document storage system. Once they have implemented a system, they assume that repository software products such as FileNet, Documentum, Hyland OnBase, Liberty, OpenText, and countless others have a sufficient method of document classification and identification.

Frequently, when budgeting the integration of an electronic document management system (EDMS), the company buying the new system understands only that the documents will be organized electronically and underestimates the time required to identify each document visually.

Consequently, most VARs or integrators task their customers with document identification, which is costly, time-intensive, and often error-prone.

Three document identification methods are commonly used by system integrators:

1. Key-From-Image

“Key-from-image” is the most common method. This technique requires the user to identify the scanned document image and manually key in the index. Although every capture process will ultimately require a small percentage of documents to be identified this way, to have all documents identified this way is usually cost prohibitive. Because of this, many production managers are not committed to EDMS.

Rather, they remain determined to work with the hard-copy records, maintaining that it's just quicker to have an experienced employee rapidly fan through the paper files. The reality is that most operators only have one or two PC windows available at a time to page down through thumbnail images in search of a specific document. The result is often a frustrating task for the user.

Many times, lending companies will take a half-step towards document identification: they group the documents into sections without making a full commitment to identifying each and every document type. This causes problems down the line when looking for a more accurate inventory of the document images or for a specific document.

2. Barcode Separators

“Barcode Separators” are the next most common consideration for document identification. The method is easy to describe, but it is hard to envision placing a document separator (a colored 8 ½ by 11 sheet of paper with a barcode identifier) between each and every document type in a loan file.

This manual insertion process introduces a highly recognizable and accurate machine-readable code that identifies each document type. Loan files can contain as many as 50 to 100 uniquely different document types. Ironically, we have to create more paper to reduce paper. In addition, the error rate can be high if someone is not attentive and/or not completely sure of the documents they are identifying.
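The separator-sheet process described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Page` and `Document` classes, the barcode values, and the "UNIDENTIFIED" label are hypothetical, and the barcode is assumed to have been read already by the scanner software.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Page:
    """One scanned page; `barcode` is set only on separator sheets (hypothetical model)."""
    number: int
    barcode: Optional[str] = None


@dataclass
class Document:
    doc_type: str
    pages: list = field(default_factory=list)


def split_by_separators(pages):
    """Group pages into documents, starting a new document at each separator sheet."""
    documents = []
    current = None
    for page in pages:
        if page.barcode is not None:
            # Separator sheet: its barcode names the document type that follows.
            # The separator itself is not kept as a page of the document.
            current = Document(doc_type=page.barcode)
            documents.append(current)
            continue
        if current is None:
            # Pages before the first separator cannot be identified automatically.
            current = Document(doc_type="UNIDENTIFIED")
            documents.append(current)
        current.pages.append(page.number)
    return documents


scan = [
    Page(1, barcode="NOTE"),
    Page(2),
    Page(3),
    Page(4, barcode="APPRAISAL"),
    Page(5),
]
result = split_by_separators(scan)
for doc in result:
    print(doc.doc_type, doc.pages)
```

Note that the accuracy of the result depends entirely on the person who inserted the separators: a sheet placed one page too late silently assigns that page to the wrong document type.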

Most service bureaus work with document separator techniques to identify the documents. This forces imaging to move to a post-loan closing consideration versus the more efficient pre-funding imaging process.

In consumer lending, some of the documents can originate with pre-printed barcodes that identify the document type. This reduces the need for insertion of document separators.

Using barcodes to identify documents is logical. It shifts the expense from scanning work to prepping work. A new labor element is created to reduce a future labor element.

However, this approach creates skepticism among production managers, and they fall back to the key-from-image process, batching the documents into groups, or imaging at the close of the loan.

3. Rules-Based

“Rules-Based” document identification is another approach. Documents can be organized in a sequence. Footers and headers are read and classified using a technique called zonal optical character recognition (OCR). To determine document types, programming and scripts can be used with a combination of barcode technology, key-from-image, and process of elimination.

A serious production-oriented document identification process would not use rules and programming techniques as the main approach to identifying documents. It would require too much programming to set up, and the process would be too slow to run. Nevertheless, every good document capture process will always have a small percentage of rules-based technology to clean up those difficult document types that cannot be recognized by any other method.
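As a rough illustration of what such rules look like, the sketch below checks header and footer text against a short rule table and then falls back to process of elimination. It assumes the zone text has already been produced by a zonal OCR step that is not shown; the rule table, the zone names, and the `classify_by_rules` function are illustrative assumptions, not any vendor's configuration.

```python
# Each rule: (zone name, text that must appear in that zone, resulting document type).
# The rule values are illustrative examples only.
RULES = [
    ("header", "UNIFORM RESIDENTIAL LOAN APPLICATION", "Loan Application"),
    ("footer", "FORM 1003", "Loan Application"),
    ("header", "TRUTH-IN-LENDING", "Truth in Lending Disclosure"),
]


def classify_by_rules(zones, expected_types):
    """Return a document type from header/footer rules, else by process of elimination.

    zones: dict mapping a zone name to the text read from that zone.
    expected_types: set of document types still missing from the loan file.
    """
    for zone, needle, doc_type in RULES:
        if needle in zones.get(zone, "").upper():
            return doc_type
    # Process of elimination: if only one expected type is still missing, assume it.
    if len(expected_types) == 1:
        return next(iter(expected_types))
    return None  # Nothing matched; leave the page for key-from-image.


print(classify_by_rules({"footer": "Fannie Mae Form 1003"}, {"Appraisal", "Note"}))
print(classify_by_rules({"footer": "Page 2 of 4"}, {"Hazard Insurance"}))
```

Even this small table hints at the maintenance burden: every new form variant needs another hand-written rule, which is why rules work better as a clean-up step than as the main method.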

Templating: A Better Approach to Document Capture

“Templating” is both the new and old buzzword used to describe a better approach to setting up a document capture process. Setting up templates, or examples of document types for quick recognition, means something different to each capture software vendor in the marketplace.

Tagging

Tagging is a technique used to identify a document to a template.
The most common methods of tagging are:

  • Anchor Point Tagging
  • OCR Tagging
  • Pattern Detection Tagging

The subtle differences in these technical approaches can make or break a document capture process. As the volume and complexity of document type variables increase, the selection of the technical approach becomes more critical.

1. Anchor Point Tagging Technology

Documents have elements on them that are unique. They can be logos, lines, form identifiers, signature blocks or virtually any distinct object on the image. One might think of these as fingerprints unique to a document.

The capture industry has coined the term “Anchor Points” for these unique points. They are anchored in a specific zone or area on the document. For example, the XYZ logo might always appear 1” in from the left edge and 2” down from the top of a document.

A configuration expert could find unique anchor points on every document that comes into an organization. Those anchor points could be mapped back to one of the pertinent document types that are part of the lending organization’s classification criteria. As the capture software finds each new anchor point, it can determine the start and end of a document type.

Barcode separators are no longer needed to flag the start of a new document type. However, blank sheets are inserted between each document type to alert the software to look for an anchor point on the next page.

This technique puts a lot of pressure on the software configuration process. The designer must remember where anchor points were used in previous documents and must have a solid understanding of what is and what is not a good anchor point.

Typically, this technique calls for two or three unique anchors per template.
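A minimal sketch of anchor point matching follows. It assumes that an earlier image-analysis step (not shown) has already detected labeled objects and their positions on the page; the `Anchor` class, the template contents, and the quarter-inch tolerance are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    """A distinctive object at a position measured in inches from the left and top."""
    label: str
    x: float
    y: float


# Hypothetical templates with two anchors per document type.
TEMPLATES = {
    "Hazard Insurance": [Anchor("XYZ logo", 1.0, 2.0), Anchor("policy number box", 5.5, 2.0)],
    "Appraisal": [Anchor("form identifier", 0.5, 10.5), Anchor("signature block", 4.0, 9.0)],
}


def matches(anchor, found, tolerance=0.25):
    """True if a detected object has the anchor's label and lies within tolerance inches."""
    return any(
        obj.label == anchor.label
        and abs(obj.x - anchor.x) <= tolerance
        and abs(obj.y - anchor.y) <= tolerance
        for obj in found
    )


def tag_by_anchors(found):
    """Return the first document type whose anchors are all present on the page, else None."""
    for doc_type, anchors in TEMPLATES.items():
        if all(matches(anchor, found) for anchor in anchors):
            return doc_type
    return None


# Objects detected on a scanned page, slightly shifted by the scanner feed.
page_objects = [Anchor("XYZ logo", 1.1, 1.9), Anchor("policy number box", 5.5, 2.1)]
print(tag_by_anchors(page_objects))  # prints "Hazard Insurance"
```

The tolerance is where the configuration pressure shows up: too tight and skewed scans are missed, too loose and two templates that share a logo start to collide.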

2. OCR Tagging Technology

It is amazing that software can recognize every character on a printed page. That is the ultimate outcome of OCR technology. ICR, or intelligent character recognition, is even more amazing because it can recognize handwritten characters.

If software can recognize the characters, it can recognize the words. If software can recognize the words, it can ultimately look for key words that would be unique to certain document types. Eventually, software will be able to string words together and derive meaning from the document. This is the exciting future of OCR technology.

OCR tagging is recommended in A/P and A/R document capture processes because its strength lies in forms-based data retrieval. It is best used when:

  • there are few document types
  • data needs to be derived from within the document
  • the data may not be in common zones on the document
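The sketch below illustrates these conditions on text that is assumed to have come out of an OCR engine already. It tags the document by key words and pulls a total amount from wherever it appears on the page. The keyword sets, the `AMOUNT` pattern, and the `tag_by_ocr` function are hypothetical and far simpler than a production configuration.

```python
import re

# Hypothetical keyword sets, one per document type (only a few types are needed).
KEYWORDS = {
    "Invoice": {"invoice", "remit"},
    "Purchase Order": {"purchase", "order"},
}

# Finds a total anywhere in the text, so the value need not sit in a fixed zone.
AMOUNT = re.compile(r"total\s*(?:due)?\s*:?\s*\$?([\d,]+\.\d{2})", re.IGNORECASE)


def tag_by_ocr(text):
    """Return (document type or None, total amount or None) from recognized text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    doc_type = None
    for name, keys in KEYWORDS.items():
        if keys <= words:  # every key word for this type appears in the text
            doc_type = name
            break
    found = AMOUNT.search(text)
    amount = found.group(1) if found else None
    return doc_type, amount


ocr_text = "ACME Corp\nInvoice 1042\nPlease remit payment\nTotal due: $1,250.00"
print(tag_by_ocr(ocr_text))  # prints ('Invoice', '1,250.00')
```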

3. Pattern-Based Tagging Technology

What if we had a great population of documents already scanned and identified? They may have been identified using one of the aforementioned techniques.

Then, what if we could program the capture software to know that all these pre-identified scanned images (that have been mapped back to one of the document types we are looking for) are to be used as templates?

Pattern-based tagging technology is used to tag a new scan to an image that was previously scanned. The two patterns have enough similarity to statistically deem them a match, so we tag the document's identity and go on.

Rapid Document Identification

As new documents are scanned, the software recognizes what they are and the pattern is updated for future images that are scanned. This technique is the best approach to rapid document identification.

Instead of looking for a specific “anchor point” the whole image is looked at as a fingerprint and compared against documents that have been recognized once before. If there are enough statistical similarities, it is a match.
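To make the idea concrete, here is a small sketch in Python. The article does not say which statistical comparison capture products use, so the sketch substitutes a simple average-hash-style fingerprint: each page is reduced to a small grayscale grid, and each cell becomes a 1 if it is darker than the page average. The `PatternTagger` class, the 4-by-4 grids, and the 0.9 threshold are all assumptions chosen for illustration.

```python
def fingerprint(pixels):
    """Reduce a small grayscale grid to bits: 1 where a cell is darker than the page mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value < mean else 0 for value in flat)


def similarity(a, b):
    """Fraction of fingerprint bits that agree; grids are assumed to be the same size."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


class PatternTagger:
    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.known = []  # (fingerprint, document type) pairs from identified scans

    def learn(self, pixels, doc_type):
        """Register an already identified scan as a template."""
        self.known.append((fingerprint(pixels), doc_type))

    def tag(self, pixels):
        """Return the best-matching document type, or None if nothing is similar enough."""
        fp = fingerprint(pixels)
        best_score, best_type = 0.0, None
        for known_fp, doc_type in self.known:
            score = similarity(fp, known_fp)
            if score > best_score:
                best_score, best_type = score, doc_type
        if best_score >= self.threshold:
            self.known.append((fp, best_type))  # update the patterns for future scans
            return best_type
        return None


hazard = [[0, 0, 255, 255], [0, 0, 255, 255], [255, 255, 255, 255], [255, 255, 255, 255]]
new_scan = [[10, 5, 250, 240], [0, 20, 255, 255], [250, 255, 245, 255], [255, 255, 255, 200]]
other = [[255, 255, 255, 255], [255, 255, 255, 255], [0, 0, 0, 0], [255, 255, 255, 255]]

tagger = PatternTagger()
tagger.learn(hazard, "Hazard Insurance")
print(tagger.tag(new_scan))  # prints "Hazard Insurance"
print(tagger.tag(other))     # prints "None"
```

No one had to decide which part of the page is distinctive. The noisy rescan matches because its overall dark-and-light layout agrees with the template, and the unrelated page is rejected because only half of its cells agree.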

To illustrate this point, imagine a partially blind man entering a room in a house and through blurred vision he makes out the pattern of a refrigerator. He could quickly conclude that he is in the kitchen. The pattern of the refrigerator maps back in his mind that he has seen this pattern before and has already learned that refrigerators are always in the kitchen.

In the same way, Lending Capture software can be taught that all “Hazard Insurance” documents from State Farm have a similar pattern.

What is the best approach?

It depends on how and when the technique is applied.

The pattern-based tagging approach may be the best technique to use first. Documents that are not recognized by pattern-based tagging can be tagged by anchor points. If no anchor points can be found, you will need to open up the document and apply OCR tagging. Lastly, when all of these techniques fail, use some rules-based technology to assist in the document identification.

By using all of these approaches, we fine-sort the document down to fewer possibilities and ultimately tag it with its correct identity.

We use the fastest technology first to sort through the greatest bulk of documents and then move to more accurate but more processing-intensive techniques for the final document identification.
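The ordering described above amounts to a simple cascade: try each technique in turn and stop at the first one that produces an answer. In the sketch below the four classifier functions are hypothetical stand-ins that just read a precomputed result from a dictionary; in a real system each would call the corresponding capture technology.

```python
# Stand-in classifiers (hypothetical): each returns a document type or None.
def by_pattern(page):
    return page.get("pattern_match")


def by_anchor(page):
    return page.get("anchor_match")


def by_ocr(page):
    return page.get("ocr_match")


def by_rules(page):
    return page.get("rules_match")


# Fastest technique first, most processing-intensive last.
STAGES = [
    ("pattern", by_pattern),
    ("anchor", by_anchor),
    ("ocr", by_ocr),
    ("rules", by_rules),
]


def identify(page, stages=STAGES):
    """Run the stages in order and return (document type, name of the stage that found it)."""
    for name, classify in stages:
        doc_type = classify(page)
        if doc_type is not None:
            return doc_type, name
    # Every automated stage failed, so a person must identify the page.
    return None, "key-from-image"


print(identify({"anchor_match": "Appraisal"}))  # prints ('Appraisal', 'anchor')
print(identify({}))                             # prints (None, 'key-from-image')
```

Because the function returns as soon as one stage succeeds, the slower stages run only on the shrinking remainder of pages that the faster stages could not identify.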

This technique of cascading down to an ultimate conclusion is common in the banking industry. When bulk sorting checks, it is optimal to move them through multiple sorts, with the last step being the fine sort.

Spam detection is another analogy to illustrate the use of cascading. We all know what Spam e-mail is. What we don't know is why we can't stop it completely.

When we use software to tag Spam, there are multiple approaches to determine what is good versus bad e-mail. There is “White Listing” (the known people), there is “Black Listing” (the known bad people), there is “Content Searching” (Does the word “timeshare” appear in the e-mail?) and ultimately authentication of each e-mail.

If you rely on one approach, you will still have Spam. If you applied every one of the approaches above to each e-mail, tagging could be close to 100% effective, but it would dramatically slow the e-mail process.

The best anti-Spam infrastructures use a combination of several approaches in a specific cascading order. It is this same cascading approach that we believe makes the most sense for lending operations to use in their capture techniques.

Conclusion

Document Capture is a new industry to fix a new problem. Technology-based electronic document workflow has created the need to identify document types faster. The increasing number of documents has created a need to set up templates in a way that does not require sophisticated programming or human decisions about what is a unique component of a document. Such hand-built rules should be reserved for the small percentage of documents that are not recognized through pattern technology.

About the Author:

Steve MacWilliams has 24 years of experience with lending operation software applications. He works at DocuSource, a company that specializes in document capture software products for lending applications and represents multiple software repository applications.

Steve MacWilliams
Senior Vice President
DocuSource