Oct 14, 2021 By Team YoungWonks *
What is Natural Language Processing aka NLP? When we share a voice command with an intelligent virtual assistant (IVA) such as Siri or Alexa, what is the technology that helps the device (iPhone or Amazon Echo) process the information? Broadly speaking, it is Machine Learning (ML), a field of Artificial Intelligence (AI). But there is yet another subfield of AI that comes into play here, and that is NLP. In this blog post, we shall talk about this field that has become increasingly relevant today.
What is Natural Language Processing (NLP)?
A subfield of linguistics, computer science, and AI, Natural Language Processing (NLP) falls under the data sciences umbrella; it deals with the interactions between computers and human language, especially with programming computers to process and analyze large amounts of natural language data. The aim here is to give software the ability to understand the contents shared with it along with the contextual nuances of the language, so it can process information in a given language the way a human would. This technology can accurately extract information and insights from the shared documents and even classify and sort the documents themselves.
Thus, NLP is an interdisciplinary field whose software helps us in our day-to-day lives. Knowing NLP is key for data scientists, given that text is an easy-to-use and common container for storing data.
Some of the use cases for NLP are IVAs such as Siri, Cortana and Google Assistant (read more about IVAs in our blog here: https://www.youngwonks.com/blog/What-is-an-Intelligent-Virtual-Assistant); the auto-complete feature in search engines such as Google and Bing; spell checks in one’s browser, one’s Integrated Development Environment (IDE, say Visual Studio) and desktop apps such as Microsoft Word; and machine translation (for example, Google Translate).
In addition to the above daily computer interactions, NLP also boasts several business-related use cases. For instance, it can be - and is - used as part of conversational AI, where virtual assistants automate processes such as handling orders and complaints, thereby reducing human intervention. This also ends up saving companies lots of time, labor and cost.
Today, NLP algorithms are used to share automatic summarization of the main points in a given text or document. There are also algorithms that can organize information and even sort text as per the predefined categories or classes, particularly for email routing and spam filtering.
In fact, NLP models are being used today to measure the success rate of social media campaigns; they do so by interpreting consumers’ public sentiment, again saving humans the effort of doing this task.
Today, an NLP task comprises three parts:
Speech Recognition
This entails the translation of spoken language into text.
Natural Language Understanding (NLU)
This refers to the computer's ability to comprehend what we say.
Natural Language Generation (NLG)
It refers to the generation of natural language by a computer.
NLP is rather challenging since it is about understanding human language, something that is difficult for a program to achieve given the complexity of language. For instance, there are countless ways of arranging words in a sentence. Plus, words mean different things in different situations, so context becomes crucial to correctly interpreting sentences.
Different programming languages have different toolkits/libraries for NLP. For instance, NLTK (Natural Language Toolkit) is a popular Python library for NLP.
It is important to note that NLP works through syntactic analysis (syntax) and semantic analysis (semantics), the two primary techniques for understanding natural language. While syntax refers to the grammatical structure of the text, semantics is the meaning being conveyed. So a sentence needs to be both syntactically and semantically correct. For example, the sentence “Birds flow supremely” may be grammatically sound, but it still doesn’t make any sense.
The process of analyzing natural language with the rules of a formal grammar is called syntactic analysis, syntax analysis or parsing. Grammatical rules are applied to categories and groups of words, not individual words. Syntactic analysis basically lends a syntactic structure to the text.
How one understands what has been said is something that depends on the context, our intuition and knowledge about language itself. So meaning and context have a key role to play.
Semantic analysis is thus the process of understanding the meaning and interpretation of words, signs and sentence structure. It allows computers to process natural language the way humans do; it is also one of the toughest parts of NLP and is not fully solved yet. So while speech recognition is very doable now and is carried out almost flawlessly, understanding natural language the way a human does is still very tricky. Because of this, your phone - despite having a basic understanding of what you said - may often not act upon it, since it doesn’t grasp the meaning behind it.
Today, some technologies may make you think they understand the meaning of a text but it may not be the case. An approach based on keywords or statistics or even pure machine learning methods could mean just relying on a matching or frequency technique for clues about the meaning of the text. Hence, such methods are limited as they do not seek the real underlying meaning.
What is parsing? Parsing refers to resolving a sentence into its component parts and describing their syntactic roles. It is the formal analysis of a sentence by a computer: breaking the sentence into its constituents yields a parse tree that visually shows their syntactic relation to one another, and this tree can then be used for further processing and understanding.
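To make the idea concrete, here is a minimal sketch of a parser for one toy grammar (S → NP VP; NP → Det Noun; VP → Verb). The grammar, word lists and sentences are made-up assumptions for illustration only; real parsers handle vastly larger grammars learned from data.

```python
# Toy lexicon mapping words to their grammatical categories (an assumption
# for this sketch; real systems use large, learned lexicons).
LEXICON = {
    "the": "Det", "a": "Det",
    "bird": "Noun", "dog": "Noun",
    "sings": "Verb", "barks": "Verb",
}

def parse(sentence):
    """Parse a 'Det Noun Verb' sentence into a tiny nested-tuple parse tree."""
    words = sentence.lower().split()
    tags = [LEXICON[w] for w in words]
    if tags == ["Det", "Noun", "Verb"]:
        det, noun, verb = words
        # S -> NP VP ; NP -> Det Noun ; VP -> Verb
        return ("S", ("NP", ("Det", det), ("Noun", noun)), ("VP", ("Verb", verb)))
    raise ValueError("sentence not covered by the toy grammar")

print(parse("The bird sings"))
```

The nested tuples play the role of the parse tree: each inner tuple is a constituent labeled with its syntactic category.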
Stemming
Stemming is a technique that comes from morphology and information retrieval and is used for pre-processing and efficiency purposes. Stemming is about reducing words to their word stem. For example, the stem of the word ‘cleaned’ is ‘clean’; ‘clean’ is also the stem of ‘cleaning’ and so on. Since understanding language means having to interpret different combinations of words with the same stem and the same meaning, stemming is quite useful.
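A naive suffix-stripping stemmer can be sketched in a few lines. This is an illustrative toy, not a real stemmer: production stemmers such as NLTK’s PorterStemmer apply many more rules and exceptions.

```python
def naive_stem(word):
    """Strip a few common suffixes to approximate the word stem.
    A toy sketch; real stemmers handle far more morphology."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when enough of the word remains to be a plausible stem.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

print(naive_stem("cleaned"))   # clean
print(naive_stem("cleaning"))  # clean
```

Both ‘cleaned’ and ‘cleaning’ reduce to the same stem, which is exactly what makes stemming useful for matching related word forms.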
Text Segmentation
Text segmentation in NLP is the process of converting text into units such as words, sentences, topics, the underlying intent and more. Mostly, the text is broken down into its component words, which is a difficult task, thanks to the complexity of human language.
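A rough sketch of segmentation using only regular expressions is shown below. The patterns are simplifying assumptions; real segmenters must handle abbreviations, quotes, numbers and much more.

```python
import re

def segment(text):
    """Split text into sentences, then each sentence into words.
    A rough regex-based sketch, not a production segmenter."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Pull out word-like tokens from each sentence.
    return [re.findall(r"[A-Za-z']+", s) for s in sentences]

print(segment("Birds sing. Dogs bark loudly!"))
# [['Birds', 'sing'], ['Dogs', 'bark', 'loudly']]
```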
Named Entity Recognition
Named entity recognition (NER) focuses on figuring out the items in a text (i.e. the "named entities") that can be located and sorted into pre-defined categories. These categories range from the names of people, things, organizations and locations to monetary values and percentages. Say, for example, there’s this sentence: Mike donated 20 dollars to UNICEF in 2019. After NER, it would be processed like this: [Mike]Person donated [20 dollars]Money to [UNICEF]Organization in [2019]Time.
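A toy, rule-based version of NER can be sketched with small lookup lists and a pattern for years. The name lists here are made-up assumptions; real NER systems are statistical models trained on annotated data, not hand-written lists.

```python
import re

# Toy gazetteers (assumed lists for this sketch only).
PEOPLE = {"Mike"}
ORGS = {"UNICEF"}

def toy_ner(sentence):
    """Tag tokens as Person, Organization or Time using toy rules."""
    entities = []
    for token in re.findall(r"\w+", sentence):
        if token in PEOPLE:
            entities.append((token, "Person"))
        elif token in ORGS:
            entities.append((token, "Organization"))
        elif re.fullmatch(r"(19|20)\d\d", token):  # four-digit years
            entities.append((token, "Time"))
    return entities

print(toy_ner("Mike donated 20 dollars to UNICEF in 2019"))
# [('Mike', 'Person'), ('UNICEF', 'Organization'), ('2019', 'Time')]
```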
Relationship Extraction
Relationship extraction is when the semantic relationships between the named entities found by NER are identified. This includes finding out who is related to whom, whether a person works for a particular company and so on. This task can also be converted into a classification problem, with a machine learning model trained for each relationship type.
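The simplest form of relation extraction is pattern matching over one relation type, sketched below. The "works for" pattern and the example sentence are assumptions for illustration; production systems use trained classifiers rather than a single regex.

```python
import re

def extract_works_for(sentence):
    """Extract a (works_for, person, organization) triple via a toy pattern.
    Illustrative only; real systems learn such relations from data."""
    m = re.search(r"([A-Z][a-z]+) works for ([A-Z][A-Za-z]+)", sentence)
    if m:
        return ("works_for", m.group(1), m.group(2))
    return None

print(extract_works_for("Anna works for UNICEF."))
# ('works_for', 'Anna', 'UNICEF')
```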
Sentiment Analysis
Here, the attitude (i.e. the sentiment) of a speaker or writer - with regard to a document, interaction or event - is sought out. So the text needs to be understood in order to get at the underlying intent. The sentiment is usually classified into positive, negative and neutral categories. With the help of sentiment analysis, we could predict a customer’s take on a product on the basis of a review they have shared. Sentiment analysis is mainly applied to assess reviews, surveys and documents.
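A minimal lexicon-based sketch of sentiment classification is shown below. The word lists are toy assumptions; as the Deep Learning section below notes, this keyword approach is exactly the kind that fails on sentences like "couldn't care less".

```python
# Toy sentiment lexicons (assumed word lists for this sketch).
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}

def classify_sentiment(review):
    """Score a review by counting positive vs. negative words."""
    words = review.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify_sentiment("I love this excellent product"))  # positive
```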
NLP and Deep Learning
Deep learning is a Machine Learning technique that teaches computers to do what humans can do naturally: learn by example. It’s not surprising to see that Deep Learning, a subset of ML, is used extensively for NLP. (If you wish to know more about Deep Learning, check out our blog here: https://www.youngwonks.com/blog/What-is-Deep-Learning.)
To begin with, Deep Learning can make sense of the structure of sentences with syntactic parsers. Deep learning models are also used for sentiment analysis. Take, for example, a comment like this: “Mike couldn’t care less about cleanliness.” A traditional keyword-based approach might interpret this as positive intent, but a neural network would be able to get at its real meaning. Chatbots, IVAs (such as Siri and Google Assistant), machine translation and Google Inbox’s suggested replies count among other applications of Deep Learning and NLP together.
Applications of NLP
1. Search Autocorrect and Autocomplete
When you look up something on Google, you get a prompt with possible search terms after typing just two or three letters. And even if you search for something with typos, Google corrects them and manages to get you relevant results. This is thanks to NLP.
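At its core, autocomplete can be sketched as prefix matching over a vocabulary, as below. The vocabulary is a toy assumption; real search engines rank completions by query popularity and personalization rather than alphabetically.

```python
def autocomplete(prefix, vocabulary, limit=3):
    """Return up to `limit` vocabulary entries starting with `prefix`.
    A prefix-matching sketch; real engines rank by query popularity."""
    prefix = prefix.lower()
    return sorted(w for w in vocabulary if w.startswith(prefix))[:limit]

# Toy vocabulary of past queries (an assumption for this example).
VOCAB = ["natural", "nature", "nlp", "network", "neural"]
print(autocomplete("na", VOCAB))  # ['natural', 'nature']
```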
2. Grammar checkers
NLP is widely used for checking grammar. Grammar-checking tools such as Grammarly come with loads of features that help you write better content.
3. Language Translation
Google Translate and machine translation in general - the process of automatically converting text in one language to another language while keeping the meaning intact - are examples of NLP being used for language translation.
4. Email filtering
When we get an email, it gets sorted into our Primary, Social and Promotions categories. Spam gets piled up separately as well. This email filtering takes place thanks to text classification, which is an NLP technique where a piece of text gets sorted into pre-defined categories.
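A bare-bones sketch of such text classification is a keyword score, as below. The spam word list and threshold are toy assumptions; real filters use trained models (for example, naive Bayes or neural classifiers) over many features.

```python
# Toy list of spam-indicative words (an assumption for this sketch).
SPAM_WORDS = {"winner", "free", "prize", "urgent"}

def classify_email(text, threshold=2):
    """Label an email 'spam' if it contains enough spam-indicative words."""
    hits = sum(w in SPAM_WORDS for w in text.lower().split())
    return "spam" if hits >= threshold else "primary"

print(classify_email("Urgent winner claim your free prize"))  # spam
print(classify_email("Meeting notes attached"))               # primary
```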
5. Social media monitoring
As mentioned earlier, NLP is also used for gleaning and understanding information from social media. With increasingly more people on social media, NLP is used to analyze vast amounts of unstructured data, including people's likes and dislikes. This in turn helps generate valuable insights.
6. Targeted Advertising
Targeted advertising works mainly on keyword matching: ads are linked to certain keywords or phrases and shown only to those users who search for keywords similar to the ones associated with the advertisement. Here again, NLP is at play.
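Keyword matching can be sketched as a set intersection between query terms and each ad’s keywords. The ad inventory and keyword sets below are made-up examples; real ad systems also score relevance, bids and user context.

```python
# Toy ad inventory mapping each ad to its keyword set (assumed examples).
ADS = {
    "running shoes sale": {"running", "shoes", "sneakers"},
    "python course": {"python", "programming", "coding"},
}

def match_ads(query):
    """Return ads whose keyword sets overlap the user's query terms."""
    terms = set(query.lower().split())
    return [ad for ad, keywords in ADS.items() if terms & keywords]

print(match_ads("best python programming tutorials"))  # ['python course']
```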
7. Chatbots
NLP is used to run chatbots that help companies offer a smooth customer experience.
8. Recruitment
With NLP, recruitment can be a lot easier. A recruiter deploying NLP would not need to check every resume and shortlist the right candidates manually. With Named Entity Recognition, NLP can extract relevant data about skills, name, location and education.
9. Survey Analysis
Surveys help with assessing a company’s performance and they also help companies get customer feedback on various products. But as more people take a survey, it gets increasingly tricky to read them all. This is where NLP comes in: it is used these days for analyzing surveys and gathering insights from them.
10. Voice assistants or Intelligent Virtual Assistants (IVAs)
Voice assistants such as Siri, Alexa and Google Assistant use speech recognition, natural language understanding and natural language generation to understand a user's voice commands and carry out tasks accordingly.
11. NLP in healthcare
The adoption of natural language processing in healthcare is rising because of its recognized potential to search, analyze and interpret mammoth amounts of patient data.
Thus we see how NLP has shot to prominence in recent times and is likely to stay so in light of recent tech developments and its ever growing scope.
*Contributors: Written by Vidya Prabhu; Lead image by: Abhishek Aggarwal