Natural language processing (NLP) is the semantic analysis and interpretation of text, allowing computers to learn, analyze and understand human language. These capabilities can be applied to use cases such as text mining, identifying individual authors from their writing, and sentiment analysis. Watson™ Natural Language Understanding offers advanced text analysis.
Watson Natural Language Understanding can analyze text and classify its content into a taxonomy up to five levels deep, as well as return concepts, emotions, sentiment, entities and relationships. The new version of its syntax API feature lets users extract much more semantic information from their content through tokenization, parts of speech, stemming and sentence segmentation. Let's take a look at each of these subfeatures.
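As a rough illustration of how these subfeatures map onto a request, the sketch below builds a JSON body for the service's `/v1/analyze` endpoint. The field names (`features.syntax.tokens.lemma`, `part_of_speech`, `sentences`) follow our reading of IBM's API reference and should be checked against the current documentation; note that the service names its base-form option `lemma`. The helper `build_syntax_request` is hypothetical, and the sketch only builds the payload: authentication and the HTTP call are left out.

```python
import json


def build_syntax_request(text: str) -> dict:
    """Build a request body asking Watson NLU for syntax analysis.

    Hypothetical helper: field names reflect our reading of the v1
    /analyze API reference and should be verified before use.
    """
    return {
        "text": text,
        "features": {
            "syntax": {
                # Per-token output: base form (lemma) and part of speech
                "tokens": {"lemma": True, "part_of_speech": True},
                # Also ask for sentence boundaries
                "sentences": True,
            }
        },
    }


# The serialized body would be POSTed to the service's /v1/analyze endpoint.
body = build_syntax_request("Mr. Smith runs every morning.")
print(json.dumps(body, indent=2))
```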
Tokenization
Tokenization is the process of segmenting text into words, punctuation marks or other symbols called tokens. It is a crucial preprocessing step that takes place before any further analysis: it identifies the basic units to be processed, and without these units content analysis is difficult.
In most languages, words are usually separated by spaces, but spaces alone are not enough. Tokenization also separates words from adjacent punctuation, providing the building blocks for the rest of the analysis.
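A minimal tokenizer sketch, assuming English-like text and a simple regular expression rather than Watson's own tokenizer. A plain `str.split()` would leave `world!` as a single token, whereas the pattern below separates words from punctuation while keeping contractions such as `It's` together. The `tokenize` function and `TOKEN_PATTERN` are illustrative only.

```python
import re

# A word (letters/digits, optionally joined by internal apostrophes)
# or any single character that is neither a word character nor whitespace.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)*|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Split text into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


print(tokenize("Hello, world! It's sunny."))
# ['Hello', ',', 'world', '!', "It's", 'sunny', '.']
```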
Stemming
A document will often contain several forms of the same word, such as execute and executing. These forms share a meaning that only becomes apparent once they are reduced to their simplest form. The purpose of stemming is to reduce this complexity by cutting each word down to its base form, or stem.
After stemming, the more complex versions of a word are reduced to simpler forms that still convey the same idea. Because many surface forms collapse into a single stem, stemming lets users reduce the complexity of their algorithms.
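The toy stemmer below strips at most one common English suffix per word. Its rules (`SUFFIXES`, `MIN_STEM`) are assumptions chosen for illustration; it is neither the Porter algorithm nor what Watson does internally. It shows that a stem does not have to be a dictionary word: all four forms of execute collapse to `execut`.

```python
# Hypothetical rules: suffixes are tried in order and the first match is stripped.
SUFFIXES = ("ing", "ed", "es", "e", "s")
MIN_STEM = 3  # never shorten a word below this many characters


def stem(word: str) -> str:
    """Strip one common English suffix (toy rules, not Porter or Watson)."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


forms = ["execute", "executes", "executed", "executing"]
print({w: stem(w) for w in forms})
# {'execute': 'execut', 'executes': 'execut', 'executed': 'execut', 'executing': 'execut'}
```

Production stemmers such as Porter's apply many ordered rules with conditions on the remaining stem; this sketch only demonstrates the idea of collapsing several forms into one.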
Parts of speech
After a sentence is tokenized, each token is assigned a grammatical class, or part of speech. Watson Natural Language Understanding uses the same universal set of grammatical classes for all languages, including noun, verb, adjective, pronoun, punctuation and proper noun. Parts of speech are extremely important in natural language processing: they can be used to disambiguate the meaning of a word and to understand the role each word plays within a sentence.
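To illustrate how context disambiguates a word, here is a hypothetical lexicon-based tagger using universal-style tags (`PRON`, `DET`, `NOUN`, `VERB`, `PUNCT`). The lexicon and the single rule (prefer `NOUN` after a determiner, otherwise `VERB`) are assumptions for this sketch; real taggers typically learn such decisions from annotated data rather than from one hand-written rule.

```python
# Hypothetical toy lexicon: each word maps to the tags it can take.
LEXICON = {
    "i": {"PRON"},
    "they": {"PRON"},
    "the": {"DET"},
    "a": {"DET"},
    "book": {"NOUN", "VERB"},  # ambiguous: "a book" vs. "book a flight"
    "flight": {"NOUN"},
    "read": {"VERB"},
    ".": {"PUNCT"},
}


def tag(tokens: list[str]) -> list[tuple[str, str]]:
    """Assign one tag per token, using the previous tag to resolve ambiguity."""
    tagged: list[tuple[str, str]] = []
    for token in tokens:
        options = LEXICON.get(token.lower(), {"NOUN"})  # unknown word: guess NOUN
        if len(options) == 1:
            choice = next(iter(options))
        else:
            prev = tagged[-1][1] if tagged else None
            if prev == "DET" and "NOUN" in options:
                choice = "NOUN"  # a determiner usually precedes a noun
            elif "VERB" in options:
                choice = "VERB"  # otherwise assume a predicate
            else:
                choice = sorted(options)[0]  # deterministic fallback
        tagged.append((token, choice))
    return tagged


print(tag(["I", "book", "a", "flight", "."]))
print(tag(["I", "read", "the", "book", "."]))
```

The same word, book, is tagged `VERB` in the first sentence and `NOUN` in the second, which is the kind of disambiguation the part-of-speech step makes possible.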
Sentence segmentation
Users often need to know where one sentence ends and the next begins, without being misled by abbreviations such as “Mr.” or “Mrs.” that also end in a period. Watson Natural Language Understanding determines when a complete thought has been expressed in a sentence and can tell where that sentence ends and the next one begins.
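A minimal sentence splitter, assuming whitespace-separated text and a small hand-picked abbreviation list (`ABBREVIATIONS` is hypothetical and far from complete). A token ending in `.`, `!` or `?` closes a sentence unless it is a known abbreviation such as “Mr.”, so the period after a title does not split the sentence.

```python
# Hypothetical, deliberately short list of abbreviations whose period
# does not end a sentence.
ABBREVIATIONS = {"mr.", "mrs.", "ms.", "dr.", "e.g.", "i.e."}


def split_sentences(text: str) -> list[str]:
    """Split text into sentences, ignoring periods in known abbreviations."""
    sentences: list[str] = []
    current: list[str] = []
    for token in text.split():
        current.append(token)
        ends_sentence = token.endswith((".", "!", "?"))
        if ends_sentence and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without final punctuation
        sentences.append(" ".join(current))
    return sentences


print(split_sentences("Mr. Smith arrived. Mrs. Jones left early! Did they meet?"))
# ['Mr. Smith arrived.', 'Mrs. Jones left early!', 'Did they meet?']
```

This rule still fails when an abbreviation genuinely ends a sentence or when a closing quotation mark follows the period; handling such cases is what makes sentence segmentation non-trivial.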