Does OCR work well for reading IDs?
By Ihar Kliashchou, Chief Technology Officer at Regula
Waiting kills many things, and user experience is one of the first victims. Take banking, an industry where the UX bar is set high. As a regulated industry, it must verify plenty of personal data for even a basic operation, such as opening a new account. Yet competition dictates that the process must also be fast and smooth, without unnecessary bottlenecks. No surprise, then, that banking is one of the industries that leverage Optical Character Recognition (OCR) tools the most.
The tricky thing with OCR is that it’s just not enough when it comes to processing ID documents, as they contain much more than just text. In this article, we’ll dive into the difference between OCR and ID data parsing, and how the latter can benefit high-load institutions’ workflows.
“Reading” the image
Optical character recognition (OCR) is a technology that turns an image of text into actual editable text. A scanned passport, for example, is an image. If you need to copy its details for use somewhere else, OCR does exactly that: it distinguishes text characters within the image and converts them into text format. This is convenient, as it spares you tedious and time-consuming manual data entry.
Some OCR tools are smart enough to additionally help you with structuring extracted data. This comes in handy when it’s not a one-time operation. Such capabilities are called “OCR templating” and allow you to manually create document templates, using a set of your most common documents as a foundation. These templates let the computer know where important elements are located on the page, so you can automate some repetitive processes at scale.
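To make the idea of a template concrete, here is a minimal sketch in Python. It assumes the OCR engine returns each recognized word together with the coordinates where it was found. The names `Box`, `Word`, `ID_CARD_TEMPLATE`, and `apply_template`, as well as the coordinates, are invented for illustration and do not come from any particular OCR product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page coordinates (pixels)."""
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x: int, y: int) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


@dataclass(frozen=True)
class Word:
    """One recognized word and the center point of where it was found."""
    text: str
    x: int
    y: int


# Hypothetical template: field name -> region where that field is printed.
ID_CARD_TEMPLATE = {
    "surname": Box(200, 40, 600, 80),
    "given_names": Box(200, 90, 600, 130),
    "date_of_birth": Box(200, 140, 400, 180),
}


def apply_template(words: list[Word], template: dict[str, Box]) -> dict[str, str]:
    """Group OCR words into named fields according to the template regions."""
    fields = {}
    for name, box in template.items():
        inside = [w for w in words if box.contains(w.x, w.y)]
        inside.sort(key=lambda w: w.x)  # left-to-right reading order
        fields[name] = " ".join(w.text for w in inside)
    return fields


ocr_words = [
    Word("DOE", 250, 60),
    Word("JANE", 250, 110),
    Word("MARIA", 340, 110),
    Word("12.03.1990", 260, 160),
    Word("SPECIMEN", 700, 300),  # outside every region, so it is ignored
]
fields = apply_template(ocr_words, ID_CARD_TEMPLATE)
```

A template like this only says where to look. It knows nothing about what a field is allowed to contain, which is the gap that identity data parsing is meant to close.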
All is well when working solely with text. The very definition of OCR implies that the technology works with characters. Modern IDs, however, can include as many as four different types of data sources: the visual inspection zone, MRZs (machine-readable zones), RFID (radio frequency identification) chips, and barcodes. An OCR tool can neither fetch encoded data (in QR codes, for instance) nor validate and cross-check it. This is where identity data parsing comes into play.
How does data parsing from identity documents work?
The point of data parsing is that the outcome is data that has been both structured and analyzed. Generally, the process of document parsing consists of five steps:
- Scanning a document;
- Automatically identifying its type by comparing the document against a database of document templates;
- Reading and validating the fields that are defined by the template;
- Structuring the output;
- Document verification.
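The five steps can be sketched as a small Python pipeline. This is a toy model under loud assumptions, not how Regula or any other vendor implements it: the "scan" is a plain dictionary of already-recognized values, the template database holds two invented entries, and verification is reduced to "known type and no missing fields". All names (`TEMPLATES`, `ParsedDocument`, `parse_document`, and so on) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical template database: document type -> fields the template defines.
TEMPLATES = {
    "passport": {"surname", "given_names", "document_number", "date_of_expiry"},
    "id_card": {"surname", "given_names", "personal_number"},
}


@dataclass
class ParsedDocument:
    doc_type: Optional[str]
    fields: dict = field(default_factory=dict)
    problems: list = field(default_factory=list)

    @property
    def verified(self) -> bool:
        # Step 5 (toy version): a known type and no open problems.
        return self.doc_type is not None and not self.problems


def scan(raw: dict) -> dict:
    """Step 1 stand-in: tidy up values that a real scanner would capture."""
    return {key.strip().lower(): value.strip() for key, value in raw.items()}


def identify(scanned: dict) -> Optional[str]:
    """Step 2: pick the template that shares the most fields with the scan."""
    present = set(scanned)
    best = max(TEMPLATES, key=lambda doc_type: len(TEMPLATES[doc_type] & present))
    return best if TEMPLATES[best] & present else None


def parse_document(raw: dict) -> ParsedDocument:
    scanned = scan(raw)
    doc_type = identify(scanned)
    if doc_type is None:
        return ParsedDocument(None, problems=["unknown document type"])
    expected = TEMPLATES[doc_type]
    # Step 3: read only the fields the template defines and note the gaps.
    fields = {name: scanned[name] for name in sorted(expected) if scanned.get(name)}
    problems = [f"missing field: {name}" for name in sorted(expected - set(fields))]
    # Step 4: return the structured result; step 5 is the `verified` property.
    return ParsedDocument(doc_type, fields, problems)


result = parse_document({
    "Surname": "DOE",
    "Given_Names": "JANE",
    "Document_Number": "X1234567",
    "Date_Of_Expiry": "",
})
```

In this sketch, `result.doc_type` is `"passport"`, and the empty expiry date is reported in `result.problems`, so `result.verified` is `False`.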
While the first three steps of the ID document parsing process resemble the principles of OCR templating, there can be major differences, depending on who created the document templates, the number of templates, and how well they are done. To illustrate the point, we’ll use the data parsing capabilities of Regula solutions, which are purpose-built for reading identity documents.
When using an OCR solution, the number of templates is usually limited to the few most common ones. In contrast, Regula’s solution for data parsing leverages the world’s largest document template database, which currently includes over 12,000 templates of passports, ID cards, visas, driver’s licenses, and other documents from all over the world. It saves you a tremendous amount of time, as you don’t need to create any ID document templates. But it’s not only about saving time.
To create a reliable template, you need to have information about all possible variations for each of the fields in the document. This isn’t something you can do with only a couple of samples at hand. For example, on ID cards, the expiration date is usually written as a date. In some countries, like Bulgaria or Vietnam, however, cards issued to people over a certain age instead carry the words “No expiration date” (or words to that effect in the respective language). If you don’t know these peculiarities, the template becomes useless.
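The expiration date case can be illustrated with a short Python sketch. It assumes the field has already been read as a string. The marker list and the accepted date formats are placeholders invented for this example. A real template would need the actual wording used by each issuing country, which is exactly the knowledge a couple of samples won't give you.

```python
from datetime import date, datetime
from typing import Optional

# Placeholder markers for "never expires". Real documents use wording in the
# local language, which is NOT listed here.
NO_EXPIRY_MARKERS = {"no expiration date", "unlimited", "indefinite"}

# Assumed date layouts for this sketch only.
DATE_FORMATS = ("%d.%m.%Y", "%Y-%m-%d", "%d/%m/%Y")


def parse_expiry(raw: str) -> Optional[date]:
    """Return the expiry date, or None when the document never expires.

    Raises ValueError when the value is neither a known marker nor a date.
    """
    text = raw.strip()
    if text.casefold() in NO_EXPIRY_MARKERS:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized expiry value: {raw!r}")
```

A template that only knows the date branch would reject every card that carries one of the markers, even though those cards are perfectly valid.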
Can a data parsing solution really verify documents?
It depends on the depth of analysis you need, but the short answer is: yes, it can. Even if your customer submits a document you’ve never seen, there are solutions that can recognize it in a moment and tell you what it is and what its characteristics are.
For example, the Regula data parser starts with lexical analysis, validating that every field in the document says exactly what it should say. It checks whether the dates are valid and flags the document if it has expired. The lexical analysis also covers mask violations (say we expect a field to contain a date, but it’s empty or has another value). There is also an analysis for stop words: the provided documents shouldn’t contain words such as “sample,” “specimen” or “test.” All this happens automatically and is indicated in the field statuses.
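The three lexical checks described above (expiry, mask violations, stop words) can be sketched in a few lines of Python. This is an illustration only and not Regula's implementation. It assumes dates have already been normalized to ISO format (yyyy-mm-dd) and that date fields are named with a `date_of_` prefix. The status strings are invented.

```python
import re
from datetime import date

STOP_WORDS = {"sample", "specimen", "test"}
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")  # the "mask" for date fields


def check_field(name: str, value: str, today: date) -> str:
    """Return a status string for one field."""
    # Stop words: compare whole words so that a name merely containing
    # the letters "test" is not flagged.
    words = set(re.findall(r"[a-z]+", value.lower()))
    if words & STOP_WORDS:
        return "stop_word"
    if name.startswith("date_of_"):
        # Mask violation: the field is empty or does not look like a date.
        if not ISO_DATE.fullmatch(value):
            return "mask_violation"
        try:
            parsed = date.fromisoformat(value)
        except ValueError:  # matches the mask but is not a real calendar date
            return "mask_violation"
        if name == "date_of_expiry" and parsed < today:
            return "expired"
    return "ok"


def check_fields(fields: dict[str, str], today: date) -> dict[str, str]:
    return {name: check_field(name, value, today) for name, value in fields.items()}


statuses = check_fields(
    {
        "surname": "SPECIMEN",
        "given_names": "JANE",
        "date_of_birth": "",
        "date_of_expiry": "2020-01-31",
    },
    today=date(2024, 1, 1),
)
```

Passing `today` in explicitly keeps the check deterministic, which makes it easy to test.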
Also, as noted above, identity documents can have four types of data sources: the visual inspection zone, MRZ, RFID chip, and barcodes. The data in different sources is often duplicated. Unlike an OCR solution, Regula reads all the sources and automatically compares all similar fields. For example, it can take a person’s last name from the RFID chip and compare it to the last name written in the MRZ and the one in the visual inspection zone. If anything doesn’t match, the solution will mark this field as invalid. So, if someone altered their name in the visual inspection zone (relatively easy to do) but failed to update the chip (much harder to do), it’ll be detected.
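Here is a minimal Python sketch of such a cross-check, assuming each source has already been read into a dictionary of field values. The source names, the field names, and the status strings are hypothetical. A production comparison would also have to account for transliteration and truncation rules in the MRZ, which this sketch ignores.

```python
def normalize(value: str) -> str:
    """Make values from different sources comparable.

    The MRZ pads and separates names with '<', so it is treated as a space.
    """
    return " ".join(value.replace("<", " ").upper().split())


def cross_check(sources: dict[str, dict[str, str]]) -> dict[str, str]:
    """Compare every field that appears in at least two sources."""
    field_names = sorted({name for fields in sources.values() for name in fields})
    statuses = {}
    for name in field_names:
        values = [normalize(fields[name]) for fields in sources.values() if name in fields]
        if len(values) < 2:
            statuses[name] = "not_compared"  # only one source carries the field
        elif len(set(values)) == 1:
            statuses[name] = "valid"
        else:
            statuses[name] = "invalid"
    return statuses


comparison = cross_check({
    "visual": {"surname": "Smith", "document_number": "X1234567"},  # altered name
    "mrz": {"surname": "DOE<<", "document_number": "X1234567"},
    "rfid": {"surname": "DOE", "nationality": "UTO"},
})
```

Here the surname is marked invalid because the visual zone disagrees with the MRZ and the chip, while the document number, which matches in both sources that carry it, is marked valid.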
Structuring data makes it actionable
Data can hardly be used in its raw state. Once it’s collected, it needs to be broken down and analyzed to have value and, eventually, turn into decisions. While OCR is a great technology that has revolutionized data collection, it’s no longer enough to effectively deal with identity documents. The highly structured output is one of the biggest pros of applying data parsing for processing ID documents.
With data parsing, everything the solution reads and analyzes is divided into groups, fields, and types. You can scan a document and instantly pull out the specific information you need, such as the full name or date of birth. You can also have relevant data automatically converted into the proper format: say, bring measurement systems (metric/imperial) and date formats (yyyy/dd/mm, dd.mm.yyyy) into a unified format. This allows you to show users values they are familiar with and to compare apples to apples during verification checks without any workarounds.
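As an illustration of this kind of normalization, the Python sketch below converts dates to ISO format and heights to centimeters. It assumes the source format of each date is already known (for instance, from the document template) and that heights arrive either as "180 cm" or in a feet-and-inches notation such as 5'11". Both function names are invented for the example.

```python
import re
from datetime import datetime

CM_PER_INCH = 2.54


def to_iso_date(raw: str, source_format: str) -> str:
    """Convert a date in a known source format to yyyy-mm-dd."""
    return datetime.strptime(raw.strip(), source_format).date().isoformat()


def height_to_cm(raw: str) -> int:
    """Convert '180 cm' or feet-and-inches notation to whole centimeters."""
    text = raw.strip().lower()
    metric = re.fullmatch(r"(\d+)\s*cm", text)
    if metric:
        return int(metric.group(1))
    imperial = re.fullmatch(r"(\d+)'\s*(\d+)\"?", text)
    if imperial:
        feet, inches = int(imperial.group(1)), int(imperial.group(2))
        return round((feet * 12 + inches) * CM_PER_INCH)
    raise ValueError(f"unrecognized height: {raw!r}")


unified = {
    "date_of_expiry": to_iso_date("31.12.2030", "%d.%m.%Y"),
    "date_of_birth": to_iso_date("1990/03/12", "%Y/%m/%d"),
    "height_cm": height_to_cm("5'11\""),
}
```

Once every source is brought to the same representation, comparing a height or a date across documents becomes a plain equality check.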
The main idea behind data parsing is to deliver ready-to-use results quickly. You get the analysis, make sure the document is authentic, fetch information from specific fields, and scan and digitize the document to fill out a form in your internal system, all without delay. When backed by solid expertise in protected document forensics, data parsing solutions help you effectively tackle most challenges with identity document processing.
About the author
Ihar Kliashchou is the Chief Technology Officer at Regula.
DISCLAIMER: Biometric Update’s Industry Insights are submitted content. The views expressed in this post are those of the author, and don’t necessarily reflect the views of Biometric Update.