Ever since computers entered the lives of humans, they have dictated the ways humans interact with them. Humans interact with one another using the text and speech of human languages, but they need to change their ways of communication to interact with computers.
Natural Language Processing
Natural Language Processing, or NLP, aims to change this man-machine interaction. It is a branch of computer science that deals with software processing of natural (human) languages. It has three main parts: input, processing and output. On the input side, it deals with reading data from texts and images and with understanding human handwriting. This has to be followed by the computationally hard problem of processing the data, as human communication is often imprecise, grammatically incorrect, loaded with slang and acronyms, and dependent on context. On the output side, it aims to generate natural language. On both the input and the output side, additional challenges appear when working with speech rather than text, i.e. speech recognition and speech generation. On the processing side, different human languages bring different sets of difficulties, and techniques developed for one language, e.g. English, need modifications before they can be used for another language, e.g. Arabic.
The growth of NLP is aided by advances in Artificial Intelligence (especially Machine Learning and Neural Networks) and in Computational Linguistics, as well as by increases in computer processing speed and storage space.
Input recognition
Input recognition has two main areas: reading typed or printed text and understanding handwritten text. The technology that extracts text from images of typed or printed pages is called Optical Character Recognition (OCR) and is fairly common. It may compare each character image with stored character images pixel by pixel, or it may compare extracted features. Commercial software achieves more than 90% accuracy and also tries to form correct words from misspelled ones. The technology has advanced to the extent that various free online tools are available. One obvious use is to convert the printed text of books and newspapers, including old ones, to electronic form, so that the output is editable as well as searchable. One example of the use of OCR is Project Gutenberg, which has created 57,000 eBooks so far. The technology can also find pieces of text in images of e.g. passports, number plates and visiting cards.
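The pixel-by-pixel comparison mentioned above can be sketched in a few lines. This is only a toy illustration, not how any particular OCR product works: the 3x5 bitmaps, the TEMPLATES table and the function names are all made up for the example, and real systems must first locate, scale and straighten each character.

```python
# Hypothetical 3x5 bitmaps for three glyphs; '#' is ink, '.' is background.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}

def pixel_score(image, template):
    """Count the pixels on which the image and the template agree."""
    return sum(
        a == b
        for img_row, tpl_row in zip(image, template)
        for a, b in zip(img_row, tpl_row)
    )

def recognize(image):
    """Return the template character with the highest pixel agreement."""
    return max(TEMPLATES, key=lambda ch: pixel_score(image, TEMPLATES[ch]))

# A noisy "T": one extra ink pixel in the middle row.
noisy_t = ["###", ".#.", "##.", ".#.", ".#."]
print(recognize(noisy_t))  # T
```

The noisy image still agrees with the "T" template on 14 of 15 pixels, more than with any other template, so a single flipped pixel does not change the answer. Feature-based matching would instead compare properties such as strokes and loops, which is more robust to changes in font and size.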
One area of research is handwriting recognition. Decoding text from handwritten pages uses OCR but needs additional samples of the same handwriting, which serve as the target data. These samples are used to "train" the underlying neural network to improve accuracy. This feature is available in various operating systems, e.g. Windows, Linux etc. The more common handwriting recognition use case is the online one, where a person enters data into a tablet or a laptop using a stylus or a finger. This looks like a natural input method, but limited accuracy and speed have restricted its use to small inputs.
Speech recognition is more challenging than text recognition as it has to deal with factors such as accent, pitch, pronunciation, volume, background noise, culture and gender.
Some research is also happening on the use of natural languages as programming languages, e.g. in Wolfram Mathematica. However, programming languages do more than pass precise instructions: they are optimized for the underlying hardware and software, have extensive libraries and are built for a specific purpose. Hence this is not a significant use case as of now.
Processing of information
Processing of information, or Natural Language Understanding (NLU), is the most challenging part of NLP and is considered to be an AI-hard problem. It deals with unstructured inputs that are governed by poorly defined and flexible rules and converts them into a structured form that a machine can understand and act upon. There are many tasks to accomplish, e.g. it has to break the document into sentences and each sentence into words (tokenization), find the part of speech of each word (part-of-speech tagging), tag words as proper names where possible (Named Entity Recognition), find keywords across documents (TF-IDF), find words that are used together (n-grams) etc. Advanced systems can even detect emotions and sentiments.
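A few of these tasks can be sketched with nothing but the Python standard library. The code below is a deliberately naive illustration: the regular expressions only suit simple English text, the function names are invented for this example, and TF-IDF is computed with one common variant of the formula (term frequency times the logarithm of the inverse document frequency). Production systems usually rely on dedicated NLP libraries instead.

```python
import math
import re
from collections import Counter

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Naive tokenizer: lowercase runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", sentence.lower())

def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tf_idf(docs):
    """docs is a list of token lists; returns one {term: score} dict per doc."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

text = "The cat sat on the mat. The dog sat on the log."
docs = [tokenize(s) for s in split_sentences(text)]
print(docs[0])             # ['the', 'cat', 'sat', 'on', 'the', 'mat']
print(ngrams(docs[0], 2)[:2])  # [('the', 'cat'), ('cat', 'sat')]
scores = tf_idf(docs)
print(scores[0]["the"], scores[0]["cat"] > 0)  # 0.0 True
```

Words that occur in every document, such as "the", receive a score of zero, while words that are specific to one document, such as "cat", score higher. That is the sense in which TF-IDF finds keywords.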
The challenges that accompany these tasks vary with the language. For example, tokenization, which looks trivial in English, is difficult in languages such as Chinese or Japanese, where words are not separated by spaces.
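To see why, consider one classic dictionary-based approach, forward maximum matching, which repeatedly takes the longest dictionary word that starts at the current position. The sketch below is only an illustration under strong assumptions: the tiny VOCABULARY set is made up for the example, and practical segmenters use large dictionaries together with statistical or neural models.

```python
# Hypothetical mini-dictionary; real segmenters use far larger vocabularies.
VOCABULARY = {"我", "喜欢", "自然", "语言", "自然语言", "处理"}

def max_match(text, vocabulary):
    """Greedy forward maximum matching over a fixed vocabulary."""
    max_len = max(len(word) for word in vocabulary)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first and shrink until a word is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Unknown single characters are emitted as-is so the loop advances.
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens

sentence = "我喜欢自然语言处理"  # "I like natural language processing"
print(sentence.split())          # whitespace splitting finds a single "word"
print(max_match(sentence, VOCABULARY))
# ['我', '喜欢', '自然语言', '处理']
```

The greedy choice is not always right: when a longer dictionary word happens to span a real word boundary, the segmentation comes out wrong, which is one reason this step is much harder than splitting English text on spaces.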
NLP on the input side is already used by voice assistants, e.g. Google's Assistant, Microsoft's Cortana, Apple's Siri and Amazon's Alexa. It is behind voice commands in automobiles, voice dialling in phones, medical transcription, voice search, chatbots, IVRs etc.
Improvements in speech recognition will allow complex, multi-sentence input to be used in all the previously mentioned cases rather than simple commands. The numerous customer reviews posted on company websites are already processed for sentiment, which gives a more objective picture than reading a few of them by hand. The success of IoT will also be influenced, as the sheer number of devices means that users may find it easier to command at least some of them by voice. Gartner says that by 2019, 20 percent of user interactions with smartphones will take place via virtual personal assistants (VPAs).
Output generation
Output generation, or Natural Language Generation (NLG), is the task of turning a machine's knowledge base, facts and statistics into natural language. The task involves document planning, in which the system decides the overall structure and content of the document. Then it needs to choose the words and form sentences. Alongside this, it needs to make grammatical choices, e.g. voice, person and use of phrases. Speech generation, or synthesis, further needs the expansion of text such as numerals, special characters and abbreviations into words, as well as correct pronunciation and prosody, which includes pitch, duration, intonation etc.
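The text expansion step for speech synthesis can be illustrated with a small sketch. Everything in it is an assumption made for the example: the ABBREVIATIONS table is hypothetical, only whole numbers from 0 to 999 are expanded, and real synthesizers handle dates, currencies, ordinals and context-dependent abbreviations as well.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

# Hypothetical expansion table for abbreviations and special characters.
ABBREVIATIONS = {"Dr.": "Doctor", "approx.": "approximately", "%": " percent"}

def number_to_words(n):
    """Spell out an integer in the range 0..999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")

def normalize(text):
    """Expand abbreviations, then spell out numbers of up to three digits."""
    for abbreviation, expansion in ABBREVIATIONS.items():
        text = text.replace(abbreviation, expansion)
    # Longer digit runs (e.g. years) are left untouched by this pattern.
    return re.sub(r"\b\d{1,3}\b",
                  lambda match: number_to_words(int(match.group())), text)

print(normalize("Dr. Lee saw 42 patients, approx. 20% more."))
# Doctor Lee saw forty-two patients, approximately twenty percent more.
```

Only after this kind of normalization can the synthesizer look up pronunciations and apply prosody, because "42" or "%" has no pronunciation of its own.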
NLG can improve interaction with voice assistants, chatbots, IVRs etc. Voice assistants may evolve into companions whom people can talk to. One important use of NLG is the conversion of data into natural language using templates, e.g. to summarize business and financial data. This can create personalized reports and hence improve customer experience. Various companies and products, e.g. Amazon, IBM, NLS cloud, Wordsmith, Yseop, Quill and Arria, offer commercial-grade software for text and speech generation. The more challenging use case of NLG is automated content generation using web mining, which reduces effort and speeds up analysis. It is already used extensively by news organizations, e.g. the Washington Post, the New York Times and the Norwegian news agency NTB. NLG is even used for writing books, e.g. "The Day A Computer Writes A Novel" was written by a bot.
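Template-based generation, the simplest of these uses, can be sketched as follows. This is a minimal illustration and not the method of any of the products named above: the record fields, the company name "Acme" and the describe_revenue function are all hypothetical.

```python
def describe_revenue(record):
    """Turn one row of (hypothetical) financial data into a sentence."""
    change = (record["revenue"] - record["previous"]) / record["previous"] * 100
    # Word choice depends on the data: this is the "deciding the words" step.
    if change > 0:
        trend = f"rose {change:.1f}%"
    elif change < 0:
        trend = f"fell {abs(change):.1f}%"
    else:
        trend = "was flat"
    return (f"{record['company']}'s revenue {trend} in {record['period']}, "
            f"ending at ${record['revenue']:,}.")

row = {"company": "Acme", "period": "Q2",
       "revenue": 4_500_000, "previous": 4_000_000}
print(describe_revenue(row))
# Acme's revenue rose 12.5% in Q2, ending at $4,500,000.
```

Running the same function over every row of a table yields one personalized sentence per customer or company. Templates are easy to control but repetitive, which is why content generation from unstructured web data is the harder problem.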
Summary
NLP is allowing humans to interact with computers the same way they interact with one another. It is also increasing the use of computers in our lives. Hence the man-machine interface is not only changing; its importance is actually reducing.