FB pixel

Datatang adds 5,000 Traditional Chinese characters to OCR system

Categories Biometrics News  |  Trade Notes
Datatang adds 5,000 Traditional Chinese characters to OCR system

Beijing-based artificial intelligence (AI) company Datatang has updated its optical character recognition (OCR) database to include 5,000 handwritten characters in Traditional Chinese.

In a dedicated webpage for the new set, Datatang said the characters were collected by various samples written on A4 paper, square paper, and lined paper, among others.

By adding the characters to its software suite, Datatang enables customers to use OCR of the corresponding traditional Chinese characters when encountering them in the wild. In other words, by scanning a text through a smartphone and the Datatang app, users will now be able to automate data entry and filling out forms.

OCR is sometimes implemented for document scanning in digital identity verification and onboarding applications.

According to the company, the error bound of each vertex of the quadrilateral bounding box around each character is within five pixels, for a qualified annotation. The accuracy of bounding boxes and text transcription accuracy are both reportedly not less than 97 percent.

The addition of the new dataset comes months after Datatang executives said their speech recognition datasets were created with native language speakers and surpassed the industry’s standards. 

More recently, the company showcased its synthetic data generation technology at the 2022 Conference on Computer Vision and Pattern Recognition (CVPR 2022).

Related Posts

Article Topics

 |   |   |   | 

Latest Biometrics News


Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

The ID16.9 Podcast

Most Read This Week

Featured Company

Biometrics Research

Biometrics White Papers

Biometrics Events

Explaining Biometrics