Explainer: What is OCR, and how does it work?

Optical character recognition (OCR) is the process of converting an image of text into a machine-readable text format.

The technology was invented because text editors cannot edit, search or count the words in image files.

OCR is particularly relevant as increasing digitalization requires businesses to receive information from print media, which is traditionally harder to store and manage. This includes scans of identity documents, such as passports or driver’s licenses, which also include photos that can be used for biometric identity binding.

Scanning images via OCR eliminates manual intervention and enables the conversion of text images into text data that can later be analyzed by other business software. 

Companies can use the data to conduct analytics, streamline operations, automate processes and enhance productivity.

How does OCR work?

OCR systems comprise both hardware and software components. The hardware is used to physically scan the document, while the software takes care of the analysis of the characters and their translation into machine-readable text.

From a technical standpoint, OCR software transforms the document into a two-color (usually black-and-white) version. The scanned image, or bitmap, is subsequently analyzed for light and dark areas, with the latter identified as characters to be recognized. In contrast, the former areas are classified as background and therefore excluded from further processing. 
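The light-versus-dark split described above can be illustrated with a minimal sketch. It assumes the scan is already available as a grid of grayscale values from 0 (black) to 255 (white) and uses a fixed cutoff of 128. The function names, the cutoff and the sample values are illustrative assumptions, not details of any particular OCR product.

```python
def binarize(gray, threshold=128):
    """Return a bitmap where 1 marks a dark (ink) pixel and 0 marks background.

    `gray` is a list of rows of grayscale values (0 = black, 255 = white).
    The fixed threshold is an assumption made for this sketch.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def dark_pixel_count(bitmap):
    """Count the pixels that were classified as ink."""
    return sum(sum(row) for row in bitmap)


# Hypothetical 3x4 grayscale patch containing a thin dark stroke.
sample = [
    [250, 240, 30, 245],
    [248, 20, 25, 251],
    [252, 247, 15, 249],
]

bitmap = binarize(sample)
for row in bitmap:
    print("".join(str(px) for px in row))
```

Only the pixels marked 1 would be passed on to character recognition; the pixels marked 0 are treated as background and ignored.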

The dark areas are analyzed to find either alphabetic letters or numeric digits. This part of the process typically targets characters individually and identifies them using one of two types of algorithms: pattern matching or feature extraction.

Pattern matching isolates a character image (called a glyph) and compares it with a stored glyph. Notably, pattern matching works only when the stored glyph has a similar font and scale to the input glyph. Because of this, the method works best with scanned images of documents that rely on standard fonts.
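As a rough illustration of pattern matching, the sketch below compares an input glyph with a few stored glyphs pixel by pixel and picks the one with the highest agreement. The tiny 3x3 templates, the similarity score and the function names are all assumptions made for the example. Real systems store far larger glyph sets.

```python
# Hypothetical stored glyphs: "1" is ink, "0" is background.
TEMPLATES = {
    "I": ["010", "010", "010"],
    "L": ["100", "100", "111"],
    "T": ["111", "010", "010"],
}


def similarity(a, b):
    """Fraction of pixels that agree between two equally sized glyph bitmaps."""
    total = sum(len(row) for row in a)
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total


def match_glyph(glyph, templates=TEMPLATES):
    """Return the name of the stored glyph that agrees with the input most."""
    return max(templates, key=lambda name: similarity(glyph, templates[name]))


# A "T" with one stray ink pixel in the bottom-right corner.
noisy_t = ["111", "010", "011"]
print(match_glyph(noisy_t))
```

Because the comparison is pixel by pixel, it only makes sense when the input glyph and the stored glyph have the same size and a similar shape, which mirrors the font and scale limitation noted above.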

The second type of algorithm uses feature extraction, a method that breaks down the glyphs into features such as lines, closed loops, line direction, and line intersections. These features are then used to find the best match among the stored glyphs.
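A simplified sketch of feature extraction follows. It describes each glyph by three hand-picked features (closed loops, full-width horizontal lines and full-height vertical lines) and picks the stored glyph whose features are closest. The feature set, the templates and the distance measure are assumptions chosen to keep the example short. Production systems use richer features.

```python
def count_closed_loops(glyph):
    """Count background regions fully enclosed by ink (e.g. 1 for '0', 2 for '8')."""
    rows, cols = len(glyph), len(glyph[0])
    seen = [[False] * cols for _ in range(rows)]
    loops = 0
    for r in range(rows):
        for c in range(cols):
            if glyph[r][c] == "1" or seen[r][c]:
                continue
            # Flood-fill this background region and note whether it reaches the border.
            stack, touches_border = [(r, c)], False
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                if y in (0, rows - 1) or x in (0, cols - 1):
                    touches_border = True
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and glyph[ny][nx] == "0" and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            if not touches_border:
                loops += 1
    return loops


def extract_features(glyph):
    """Return (closed loops, full-width ink rows, full-height ink columns)."""
    full_rows = sum(all(px == "1" for px in row) for row in glyph)
    full_cols = sum(all(row[c] == "1" for row in glyph) for c in range(len(glyph[0])))
    return (count_closed_loops(glyph), full_rows, full_cols)


# Hypothetical stored glyphs on a 3x5 grid.
TEMPLATES = {
    "0": ["111", "101", "101", "101", "111"],
    "8": ["111", "101", "111", "101", "111"],
    "L": ["100", "100", "100", "100", "111"],
    "I": ["010", "010", "010", "010", "010"],
}
STORED_FEATURES = {name: extract_features(g) for name, g in TEMPLATES.items()}


def classify(glyph, stored=STORED_FEATURES):
    """Return the stored glyph whose feature vector is closest to the input's."""
    feats = extract_features(glyph)
    return min(stored, key=lambda n: sum((a - b) ** 2 for a, b in zip(feats, stored[n])))


# A larger "8" (5x7) that would not line up pixel by pixel with the 3x5 template.
big_eight = ["11111", "10001", "10001", "11111", "10001", "10001", "11111"]
print(classify(big_eight))
```

In this toy case the larger glyph yields the same feature values as the stored "8", so it is still recognized even though its size differs from the template.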

After analysis, the system converts the extracted text data into a digital file. The file can also be used to automate the completion of forms. 

Companies using the technology in conjunction with biometrics include OCR Labs, Datatang and Smart Engines.
