Reading and writing are skills that humans alone possess among all living beings, yet people often find it hard to read text written by others. The rise of IT is steadily reducing the need to write by hand and hence to read handwritten text. Even so, as per the World Bank, global literacy rates are near 90%, while even in richer OECD countries about a quarter of the population cannot use computers.
Typed text
The technology to read typed text is called OCR, or Optical Character Recognition. It can be used to read documents, traffic signs, number plates etc. in printed form or in images. The first task for any OCR software is pre-processing, which involves a number of activities: it de-skews, i.e. rotates the image so that lines of text are perfectly horizontal or vertical, and despeckles, i.e. removes spots and smoothens edges. Thereafter it isolates characters and removes unnecessary boxes and lines. It also does zonal analysis to find what type of text is expected in each section of the document.
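A minimal sketch of these pre-processing steps is shown below using the open-source OpenCV library; the thresholding, median-filter and rotation choices are illustrative assumptions, not a description of any particular OCR product.

```python
import cv2
import numpy as np

def preprocess(path):
    """Illustrative OCR pre-processing: binarise, despeckle and de-skew a scanned page."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)

    # Binarise so that text pixels become white (255) on a black background.
    _, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    # Despeckle: a small median filter removes isolated spots.
    binary = cv2.medianBlur(binary, 3)

    # De-skew: estimate the tilt of the text block from the minimum-area
    # rectangle around all text pixels, then rotate to make lines horizontal.
    # Note: the angle convention of minAreaRect varies across OpenCV versions,
    # so the sign/range handling below may need adjusting for your version.
    coords = np.column_stack(np.where(binary > 0)).astype(np.float32)
    angle = cv2.minAreaRect(coords)[-1]
    if angle > 45:
        angle -= 90
    elif angle < -45:
        angle += 90
    h, w = binary.shape
    rot = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(binary, rot, (w, h), flags=cv2.INTER_NEAREST)
```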
There are two main methods to read characters: pattern matching and feature extraction. In the former, older method, the software tries to match each character against a matrix of characters stored in a database. This works well if the database contains characters of the same font and roughly the same size as the character to be matched. The more recent method, feature extraction, extracts features such as lines, their intersections, circles etc.; for example, the letter “A” has two inclined, intersecting lines. This removes the restriction of matching only stored fonts. Recently, neural networks have been applied to both methods.
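The pattern-matching approach can be illustrated in a few lines; the normalised binary character images and the template dictionary below are assumptions made for the sake of the example.

```python
import numpy as np

def match_character(char_img, templates):
    """Pattern matching: compare a normalised character image (a 2-D array of
    0/1 pixels) against stored templates and return the best-scoring label.
    `templates` is an assumed dict {label: 2-D array} of the same size."""
    best_label, best_score = None, -1.0
    for label, tmpl in templates.items():
        # Score = fraction of pixels that agree; this only works well when the
        # stored templates roughly match the input character's font and size.
        score = np.mean(char_img == tmpl)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score
```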
Accuracy of OCR can be further improved by post-processing, i.e. using a lexicon that contains the list of permitted words for that type of document, or checking groups of words that appear together, e.g. correcting “living groom” to “living room”. Commercially available software has accuracy of up to 99%.
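A toy sketch of lexicon-based post-processing follows; the lexicon, the bigram counts and the 0.6 similarity cutoff are invented for illustration, and a real system would derive them from a domain dictionary and a text corpus.

```python
from difflib import get_close_matches

# Assumed, tiny lexicon and word-pair (bigram) counts.
LEXICON = {"living", "room", "groom", "dining"}
BIGRAM_COUNTS = {("living", "room"): 120, ("living", "groom"): 0}

def correct_pair(prev_word, word):
    """Pick the candidate that is close in spelling to the OCR output and most
    frequent after the previous word, e.g. 'living groom' -> 'living room'."""
    candidates = get_close_matches(word, LEXICON, n=5, cutoff=0.6) or [word]
    return max(candidates, key=lambda c: BIGRAM_COUNTS.get((prev_word, c), 0))

print(correct_pair("living", "groom"))  # -> 'room'
```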
Handwritten text
However, the accuracy of OCR techniques falls dramatically with handwritten text, as humans have different writing styles. Characters may also be joined together to form words, as in cursive handwriting. Modern handwriting recognition software almost certainly uses neural networks. The subset of OCR that deals with handwriting recognition is called Intelligent Character Recognition (ICR), and the subset targeted at cursive handwriting is called Intelligent Word Recognition (IWR).
Technology
Convolutional neural networks (CNNs) are used in image processing for feature extraction. Recurrent neural networks and their variants, e.g. LSTMs (Long Short-Term Memory) and GRUs (Gated Recurrent Units), are used in language models and can predict the correct sequence of words in a sentence. The same networks are used here. Training neural networks requires vast amounts of data, and many databases have been created for training them for handwriting recognition, e.g. MNIST. Its extended version, EMNIST, contains 240,000 training images and 40,000 testing images of digits and characters written by different people. The Street View Text dataset was created from Google Street View and contains outdoor, street-level signs and boards.
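To make this concrete, here is a minimal CNN sketch for handwritten digit recognition in Keras, trained on MNIST (EMNIST would need its own loader); the layer sizes and the three training epochs are arbitrary choices, not a recommended architecture.

```python
import tensorflow as tf

# Load MNIST: 28x28 greyscale images of handwritten digits 0-9.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0   # add channel axis, scale to [0, 1]
x_test = x_test[..., None] / 255.0

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),   # feature extraction
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),     # 10 digit classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, validation_data=(x_test, y_test))
```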
Handwritten text recognition has two main branches: online, where conversion happens instantaneously as characters are written on a screen, and offline, where the analysis happens on a static document.
Online
In online methods, a pen or stylus is used for data input on a transducer-based screen that can detect pressure. Online methods benefit from additional inputs, e.g. the direction of movement of the pen, when the pen or stylus is lifted or put down, pen pressure etc., together called digital ink. This additional temporal information, along with the absence of background noise, makes them more accurate than offline scenarios; they can even recognize scientific and mathematical equations. However, their software is more complicated, as it needs to process this additional information and handle issues such as stroke-order variation (the letter ‘E’, for example, could be written in four ways) and delayed strokes. They use the same pre-processing, feature extraction, classification and post-processing cycle.
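The "digital ink" idea can be sketched as follows; the point format (x, y, pressure) and the strokes of the hypothetical letter 'T' are assumptions made purely to show what the extra temporal features look like.

```python
import math

# Illustrative digital-ink representation: each stroke is a list of sampled
# (x, y, pressure) points recorded between pen-down and pen-up events.

def direction_features(stroke):
    """Pen-movement direction (angle) between consecutive samples -- one of the
    temporal features online recognizers can use but offline ones cannot."""
    feats = []
    for (x0, y0, _), (x1, y1, _) in zip(stroke, stroke[1:]):
        feats.append(math.atan2(y1 - y0, x1 - x0))
    return feats

# Two strokes of a hypothetical letter 'T': the horizontal bar, then the stem.
ink = [
    [(0, 0, 0.5), (10, 0, 0.6), (20, 0, 0.5)],
    [(10, 0, 0.4), (10, 10, 0.7), (10, 20, 0.6)],
]
print([direction_features(s) for s in ink])
```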
Accuracy rates of up to 90% are possible, but even this level of accuracy implies around 10 errors in a document of just 100 words. Online methods are still less accurate than keyboards, and that has limited their use to a small number of mobile phones, tablets etc. Companies such as Google and Microsoft, along with many smaller companies, are active in this area.
Algorithms
Many of the algorithms used for online recognition are similar to those used for offline methods but use different features. Machine learning algorithms such as K-Nearest Neighbours (KNN), Hidden Markov Models (HMM) and Support Vector Machines (SVM) have been used.
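As a rough illustration of these classical approaches, the sketch below trains KNN and SVM classifiers on scikit-learn's small built-in 8x8 digits dataset; real recognizers would use stroke-level or other task-specific features rather than raw pixels.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# 8x8 images of handwritten digits, flattened to 64 pixel features.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

for name, clf in [("KNN", KNeighborsClassifier(n_neighbors=3)),
                  ("SVM", SVC(kernel="rbf"))]:
    clf.fit(X_train, y_train)
    print(name, "accuracy:", clf.score(X_test, y_test))
```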
Nowadays deep learning algorithms are being used, e.g. Multi-Dimensional LSTMs that read from multiple previous layers. Their bidirectional versions can read a word from either side, allowing them to capture context from multiple dimensions. Another example is encoder-decoder models with attention, which use CNNs and GRUs; the attention mechanism focuses only on the required features, so even longer sentences can be handled. Transformer models use multi-headed attention and can handle both character recognition and language-related dependencies, thus removing the need for post-processing.
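A minimal model-definition sketch of a bidirectional LSTM recognizer is given below in Keras; the sequence length, feature size and character-set size are assumed values, and production systems typically add CTC or attention-based decoding on top of such a backbone.

```python
import tensorflow as tf

# Assumed sizes: timesteps per sample (pen samples or image columns from a CNN),
# features per timestep, and size of the character set.
NUM_TIMESTEPS, NUM_FEATURES, NUM_CHARS = 128, 64, 80

model = tf.keras.Sequential([
    tf.keras.Input(shape=(NUM_TIMESTEPS, NUM_FEATURES)),
    # Bidirectional layers read the sequence from both ends, capturing context
    # on either side of each character.
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128, return_sequences=True)),
    # Per-timestep distribution over the character set.
    tf.keras.layers.Dense(NUM_CHARS, activation="softmax"),
])
model.summary()
```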
Offline
Smaller specific applications
Handwriting recognition works more accurately when used with segmentation techniques, e.g. a pre-printed form that forces users to enter data in individual boxes, thus isolating characters; asking for capital letters only, which reduces the character set to look for; separate areas for entering numerical characters; and checkboxes that limit the variety of possible answers, e.g. a single tick to indicate gender out of a fixed set of options. This is similar to zonal OCR. The pre-printed colour, called the ‘dropout colour’, can easily be discarded by scanners, leaving only the handwriting.
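Dropout-colour removal can be sketched with a simple colour filter; the assumption below is a form printed in light red ink, and the HSV hue ranges would need tuning to the actual form.

```python
import cv2

def remove_dropout_colour(path):
    """Illustrative dropout-colour removal: paint over pixels whose hue falls in
    an assumed range for red pre-printed form boxes, keeping only the (dark)
    handwriting."""
    img = cv2.imread(path)                       # BGR scan of the filled form
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    # Red hues wrap around 0 on OpenCV's 0-179 hue scale, so combine two ranges.
    mask = cv2.inRange(hsv, (0, 60, 60), (10, 255, 255)) | \
           cv2.inRange(hsv, (170, 60, 60), (179, 255, 255))
    cleaned = img.copy()
    cleaned[mask > 0] = 255                      # replace form lines/boxes with white
    return cleaned
```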
Such templates can be created for reading patient prescriptions in the healthcare industry, claims forms in the insurance industry, cheques in the banking sector, invoices across multiple industries, etc. For cheques, signature verification is done automatically for the majority of cases and, based on the thresholds set, the rest are verified manually.
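The threshold-based routing for cheque signatures amounts to a one-line decision rule; the 0.95 threshold and the similarity score itself are placeholders for whatever the verification model actually produces.

```python
AUTO_ACCEPT_THRESHOLD = 0.95   # assumed operating point, set from historical error rates

def route_cheque(similarity_score):
    """Route a cheque on an automatic signature-similarity score: high-confidence
    matches clear automatically, the rest go to a human checker."""
    return "auto-cleared" if similarity_score >= AUTO_ACCEPT_THRESHOLD else "manual review"
```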
Longer text
The hardest, and potentially the biggest, use case for handwriting recognition is longer, free-flowing text. The challenges multiply for cursive writing, or when no contextual or grammar-specific information is available.
The current low level of accuracy implies that the market is small. As per Credence Research, the global handwriting recognition market was worth about 1 billion in 2016 but is expected to grow at a compound annual rate of 15.7% between 2016 and 2025.
Summary
More research and sophistication are needed, especially in neural network models, for handwriting recognition software to gain mainstream acceptance. Despite the increasing use of IT, handwriting recognition remains a largely untapped area of potential.