Important questions on NLP

Two (02) Mark Questions

Q. 1. What is a Chatbot?
A.
A chatbot is a computer program designed to simulate human conversation through voice commands, text chats, or both. E.g. Mitsuku Bot, ChatterBot, etc.
OR
A chatbot is a software application used to conduct an online chat conversation via text or text-to-speech, in lieu of providing direct contact with a live human agent.


Q. 2. What is the full form of NLP?
A.
Natural Language Processing


Q. 3. While working with NLP, what is the meaning of the following?
a. Syntax
b. Semantics
A.
Syntax: Syntax refers to the grammatical structure of a sentence.
Semantics: It refers to the meaning of the sentence.

Q. 4. What is the difference between stemming and lemmatization?
A.
Stemming is a technique used to extract the base form of the words by removing affixes from them. For example, the stem of the words eating, eats, eaten is eat.

OR
Stemming is the process in which the affixes of words are removed and the words are converted to their base form.
In lemmatization, the word we get after affix removal (also known as lemma) is a meaningful one. Lemmatization makes sure that lemma is a word with meaning and hence it takes a longer time to execute than stemming.


Q. 5. What is the full form of TFIDF?
A.
Term Frequency and Inverse Document Frequency
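Beyond the full form, the two parts are usually combined as TFIDF(W) = TF(W) × log(N / DF(W)), where N is the total number of documents and DF(W) is the number of documents containing the word W. A minimal Python sketch, assuming log base 10 (one common choice) and an illustrative two-document corpus:

```python
import math

def tfidf(term, doc, corpus):
    # Term Frequency: occurrences of the term in this one document.
    tf = doc.count(term)
    # Document Frequency: number of documents that contain the term.
    df = sum(1 for d in corpus if term in d)
    # Inverse Document Frequency: log of (total documents / document frequency).
    return tf * math.log10(len(corpus) / df)

corpus = [
    ["we", "are", "going", "to", "mumbai"],
    ["mumbai", "is", "a", "famous", "place"],
]
print(tfidf("mumbai", corpus[0], corpus))  # 0.0 -- appears in every document
print(tfidf("we", corpus[0], corpus))      # ~0.301 -- rarer words score higher
```

Note that a word occurring in every document gets a TFIDF of 0, which is how the measure down-weights stop words.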


Q. 6. What is meant by a dictionary in NLP?
A.
Dictionary in NLP means a list of all the unique words occurring in the corpus. If some words are repeated in different documents, they are written just once while creating the dictionary.


Q. 7. What is term frequency?
A.
Term frequency is the frequency of a word in one document. It can easily be read off the document vector table, as that table records the frequency of each word of the vocabulary in each document.
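Term frequency can be computed in a couple of lines of Python; the sentence below is just an illustration:

```python
from collections import Counter

document = "mumbai is a famous place and mumbai is big"
# Tokenise on whitespace and count each word's occurrences in this one document.
term_frequency = Counter(document.split())

print(term_frequency["mumbai"])  # 2 -- frequency of "mumbai" in this document
print(term_frequency["famous"])  # 1
```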

Q. 8. Which package is used for Natural Language Processing in Python programming?
A.
Natural Language Toolkit (NLTK). NLTK is one of the leading platforms for building Python programs that can work with human language data.


Q. 9. What is a document vector table?
A.
Document Vector Table is used while implementing the Bag of Words algorithm. In a document vector table, the header row contains the vocabulary of the corpus and the other rows correspond to the different documents. If a document contains a particular word, it is represented by 1; the absence of a word is represented by 0.
OR
Document Vector Table is a table containing the frequency of each word of the vocabulary in each document.


Q. 10. What do you mean by corpus?
A.
In Text Normalization, we undergo several steps to normalize the text to a lower level. That is, we work on text from multiple documents, and the term used for the whole textual data from all the documents taken together is known as the corpus.
OR
A corpus is a large and structured set of machine-readable texts that have been produced in a natural communicative setting.
OR
A corpus can be defined as a collection of text documents. It can be thought of as just a bunch of text files in a directory, often alongside many other directories of text files.

Q. 11. What are the types of data used for Natural Language Processing applications?
A.
Natural Language Processing takes in natural-language data in the form of the written and spoken words that humans use in their daily lives, and operates on it.


Q. 12. Differentiate between a script-bot and a smart-bot. (Any 2 differences)
A.

● A script-bot works around a pre-written script, can handle only a limited set of inputs, is easy to make and needs little or no language-processing skill.
● A smart-bot is flexible and powerful, works on a wider range of inputs, uses AI and NLP to learn from data, and is comparatively difficult to make.

Q. 13. Give an example of the following:
● Multiple meanings of a word
● Perfect syntax, no meaning
A.
● Example of Multiple meanings of a word –
His face turns red after consuming the medicine.
Meaning - Is he having an allergic reaction? Or is he not able to bear the taste of that medicine?

● Example of Perfect syntax, no meaning –
Chickens feed extravagantly while the moon drinks tea.
This statement is correct grammatically but it does not make any sense. In Human language, a perfect balance of syntax and semantics is important for better understanding.


Q. 14. Define the following:
● Stemming
● Lemmatization
A.
Stemming: Stemming is a rudimentary rule-based process of stripping the suffixes (“ing”, “ly”, “es”, “s” etc) from a word. Stemming is a process of reducing words to their word stem, base or root form (for example, books — book, looked — look).

Lemmatization: Lemmatization is an organized & step by step procedure of obtaining the root form of the word, it makes use of vocabulary (dictionary importance of words) and morphological analysis (word structure and grammar relations).
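To illustrate the difference, here is a toy rule-based stemmer — a deliberately simplified sketch, not the Porter algorithm that real toolkits such as NLTK implement. Notice how the stem need not be a meaningful word, which is exactly what lemmatization guarantees instead:

```python
def toy_stem(word):
    # A toy rule-based stemmer: strip the first matching common suffix.
    # Real stemmers (e.g. NLTK's PorterStemmer) apply many more rules.
    for suffix in ("ing", "ly", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

print(toy_stem("looking"))  # look
print(toy_stem("books"))    # book
print(toy_stem("studies"))  # studi  (a stem need not be a meaningful word)
```

A lemmatizer, by contrast, would consult a vocabulary and return the meaningful lemma "study" for "studies", which is why it is slower than stemming.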

Q. 15. Which words in a corpus have the highest values and which ones have the least?
A.
Stop words like and, this, is, the, etc. occur with the highest frequency in a corpus but add little meaning to it. Hence, these are termed stop words and are mostly removed at the pre-processing stage itself.

Rare or valuable words occur the least but add the most importance to the corpus. Hence, when we look at the text, we take both frequent and rare words into consideration.

[Figure: occurrence of words in a corpus versus their value]

Q. 16. Does the vocabulary of a corpus remain the same before and after text normalization? Why?
A.
No, the vocabulary of a corpus does not remain the same before and after text normalization. Reasons are –
● In normalization, the text is normalized through various steps and reduced to a minimal vocabulary, since the machine does not require grammatically correct statements but only the essence of them.
● In normalization Stop words, Special Characters and Numbers are removed.
● In stemming the affixes of words are removed and the words are converted to their base form.
So, after normalization, we get the reduced vocabulary.


Q. 17. What is the significance of converting the text into a common case?
A.
In Text Normalization, we undergo several steps to normalize the text to a lower level. After the removal of stop words, we convert the whole text into a similar case, preferably lower case. This ensures that the machine does not treat the same word as different words just because of case differences.


Q. 18. Mention some applications of Natural Language Processing.
A.
Natural Language Processing Applications-
● Sentiment Analysis.
● Chatbots & Virtual Assistants.
● Text Classification.
● Text Extraction.
● Machine Translation
● Text Summarization
● Market Intelligence
● Auto-Correct

Q. 19. What is the need of text normalization in NLP?
A.
The language of computers is numerical, so the very first step is to convert our language into numbers.
This conversion takes a few steps. The first of these is Text Normalization. Since human languages are complex, we first need to simplify them to make understanding possible. Text Normalization helps clean up the textual data so that its complexity comes down to a level lower than that of the actual data.
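The cleanup described above can be sketched in a few lines of Python. The stop-word list here is a small illustrative subset; real lists (e.g. NLTK's) are much longer:

```python
import string

# A small illustrative stop-word list; real stop-word lists are far larger.
STOP_WORDS = {"and", "this", "is", "the", "a", "an", "to", "are"}

def normalize(text):
    # 1. Convert the text to a common (lower) case.
    text = text.lower()
    # 2. Remove special characters (punctuation).
    text = text.translate(str.maketrans("", "", string.punctuation))
    # 3. Tokenise on whitespace and remove stop words.
    return [word for word in text.split() if word not in STOP_WORDS]

print(normalize("Raj and Vijay are best friends."))
# ['raj', 'vijay', 'best', 'friends']
```

Stemming/lemmatization would follow these steps in a full pipeline; they are omitted here to keep the sketch short.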


Q. 20. Explain the concept of Bag of Words.
A.
Bag of Words is a Natural Language Processing model which helps in extracting features out of text, which can be helpful in machine learning algorithms. In Bag of Words, we get the occurrences of each word and construct the vocabulary for the corpus. Bag of Words simply creates a set of vectors containing the count of word occurrences in each document. Bag of Words vectors are easy to interpret.
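A minimal Python sketch of Bag of Words — build the vocabulary, then produce one count vector per document (the two documents below are illustrative):

```python
def bag_of_words(documents):
    # Build the vocabulary: each unique word, in order of first appearance.
    vocabulary = []
    for doc in documents:
        for word in doc.split():
            if word not in vocabulary:
                vocabulary.append(word)
    # One vector per document: the count of each vocabulary word in that document.
    vectors = [[doc.split().count(word) for word in vocabulary] for doc in documents]
    return vocabulary, vectors

docs = ["aman likes chess", "anil likes cricket and chess"]
vocab, vecs = bag_of_words(docs)
print(vocab)  # ['aman', 'likes', 'chess', 'anil', 'cricket', 'and']
print(vecs)   # [[1, 1, 1, 0, 0, 0], [0, 1, 1, 1, 1, 1]]
```

The rows of `vecs` are exactly the rows of a document vector table for this tiny corpus.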


Four (04) Mark Questions
Practice such questions.

Q. 1. Create a document vector table for the given corpus:
Document 1: We are going to Mumbai
Document 2: Mumbai is a famous place.
Document 3: We are going to a famous place.
Document 4: I am famous in Mumbai.
A.

             we  are  going  to  mumbai  is  a  famous  place  i  am  in
Document 1    1    1      1   1       1   0  0       0      0  0   0   0
Document 2    0    0      0   0       1   1  1       1      1  0   0   0
Document 3    1    1      1   1       0   0  1       1      1  0   0   0
Document 4    0    0      0   0       1   0  0       1      0  1   1   1
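The document vector table for this corpus can also be computed programmatically; a short Python sketch (normalization here is just lower-casing and stripping full stops):

```python
corpus = [
    "We are going to Mumbai",
    "Mumbai is a famous place.",
    "We are going to a famous place.",
    "I am famous in Mumbai.",
]

# Normalize: lower case and strip the full stops, then tokenise.
docs = [d.lower().replace(".", "").split() for d in corpus]

# Vocabulary: each unique word, in order of first appearance.
vocabulary = []
for doc in docs:
    for word in doc:
        if word not in vocabulary:
            vocabulary.append(word)

# Each row of the document vector table: frequency of every vocabulary word.
table = [[doc.count(word) for word in vocabulary] for doc in docs]

print(vocabulary)
for row in table:
    print(row)
```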

Q. 2. What are the steps of Text Normalization? Explain them in brief.
A.
The steps of Text Normalization are:
1. Sentence Segmentation: the whole corpus is divided into sentences.
2. Tokenisation: each sentence is further divided into tokens (words, numbers and special characters).
3. Removing Stop Words, Special Characters and Numbers: tokens that add no real meaning to the text are removed.
4. Converting text to a common case: the whole text is converted to a similar case, preferably lower case.
5. Stemming: the affixes of words are removed and the words are converted to their base form; the stem obtained may not be a meaningful word.
6. Lemmatization: like stemming, but it makes sure the output (lemma) is a meaningful word, and hence it takes longer than stemming.

Q. 3. Normalize the given text and comment on the vocabulary before and after the normalization:
Raj and Vijay are best friends. They play together with other friends. Raj likes to play football but Vijay prefers to play online games. Raj wants to be a footballer. Vijay wants to become an online gamer.
Normalization of the given text:
A.
Sentence Segmentation:
1. Raj and Vijay are best friends.
2. They play together with other friends.
3. Raj likes to play football but Vijay prefers to play online games.
4. Raj wants to be a footballer.
5. Vijay wants to become an online gamer.