The Importance of Naive Bayes Classifiers
In recent years, with the growing wealth of information on the internet, such as emails, tweets, posts, and consumer product reviews, businesses are trying to derive sentiment from these texts in order to understand the likes and dislikes of their customers.
Therefore, sentiment analysis has become a major need for businesses. Fortunately, with the power of machine learning and artificial intelligence, we can extract sentiments from texts and predict users' opinions about a particular subject.
There are various techniques for sentiment analysis, and knowing which one is best suited to the business requirements is one of the most complicated tasks. Unlike humans, a machine has no concept of spoken language. To enable a machine to make decisions or predict sentiments from a given text, it must be trained first. A training dataset teaches the machine words with positive, negative and neutral polarity. By applying mathematics to this dataset, the machine classifies text as positive, negative or neutral, and so predicts the overall emotion of a piece of text.
DIFFERENT TECHNIQUES FOR SENTIMENT ANALYSIS
Several algorithms can be used to classify or group texts for sentiment analysis, for example Decision Tree, Naive Bayes, Support Vector Machines, K-Nearest Neighbours, K-Means, Fuzzy C-Means and LDA topic clustering. Below are short descriptions of the basic strategy behind some of these algorithms. We will not go deep into them, except for Naive Bayes.
DECISION TREE
The decision tree algorithm belongs to the family of supervised learning algorithms and can solve both regression and classification problems. It uses a tree graph model: each internal node in the tree tests an attribute, and every leaf node corresponds to a class label. The best attribute of the training dataset is placed at the root of the tree. The training dataset is then split into subsets such that each subset contains data with the same value for that attribute, and the process is repeated on each subset.
Deciding which attribute to place at the root node is the primary challenge in implementing a decision tree. This step is called attribute selection, and there are different attribute selection measures, such as information gain and the Gini index.
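To make attribute selection concrete, here is a minimal Python sketch that scores each candidate attribute by information gain (reduction in entropy) and by Gini gain (reduction in Gini impurity), then picks the best-scoring attribute as the root. The helper names (`entropy`, `gini`, `split_gain`) and the toy `has_emoji`/`length` data are hypothetical, chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of the class distribution."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(rows, labels, attribute, impurity):
    """Impurity of the parent minus the size-weighted impurity of the subsets
    created by splitting on `attribute` (information gain when impurity=entropy)."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    n = len(labels)
    weighted = sum(len(g) / n * impurity(g) for g in groups.values())
    return impurity(labels) - weighted


# Hypothetical toy data: each row describes one text by two attributes.
rows = [
    {"has_emoji": True, "length": "short"},
    {"has_emoji": True, "length": "long"},
    {"has_emoji": False, "length": "short"},
    {"has_emoji": False, "length": "long"},
]
labels = ["positive", "positive", "negative", "negative"]

for measure in (entropy, gini):
    root = max(rows[0], key=lambda a: split_gain(rows, labels, a, measure))
    print(measure.__name__, root)  # has_emoji separates the classes perfectly
```

Both measures agree here because `has_emoji` splits the labels into pure subsets while `length` gains nothing; on real data the two measures can occasionally pick different attributes.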
SUPPORT VECTOR MACHINES
In machine learning, support vector machines (SVMs) are supervised learning models originally developed by Vladimir Vapnik and colleagues. An SVM performs classification by constructing a hyperplane, or a set of hyperplanes, in an n-dimensional (possibly infinite-dimensional) space. For instance, if there are two input variables, the input space is two-dimensional and the hyperplane is simply a line that splits it, with the inputs of each class on either side; a classifier that separates the data with such a line is called a linear classifier. SVMs are widely used for classification in industrial applications.
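The following minimal Python sketch fits a linear SVM for the two-input-variable case by minimising the regularised hinge loss with plain full-batch subgradient descent. This is only one simple way to find the separating line (real projects normally use a dedicated solver), and the function names, hyperparameters (`lam`, `lr`, `epochs`) and data points are hypothetical. Non-linear kernels are not covered.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=1000):
    """Minimise lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b)))
    by subgradient descent. Labels must be +1 or -1."""
    dim = len(points[0])
    n = len(points)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regulariser
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # point is inside the margin or misclassified
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Report which side of the hyperplane w.x + b = 0 the point lies on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical data with two input variables, so the hyperplane is a line.
points = [(2, 3), (3, 3), (3, 2), (-2, -1), (-3, -2), (-1, -3)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
print(w, b)
print([predict(w, b, p) for p in points])
```

The hinge loss only penalises points on the wrong side of, or too close to, the line, which is what pushes the learned line towards the maximum-margin position.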
K-MEANS CLUSTERING
K-means clustering is an unsupervised learning algorithm, used when the data is unlabeled, that is, data without defined categories. The K-means model finds groups of similar items in the data and partitions them into K groups, where K is chosen in advance. To measure similarity, it uses the Euclidean distance. First, K points called means are initialized at random. They are called means because each one holds the mean value of the items assigned to it. Each item is assigned to its closest mean, and that mean's coordinates are then updated to the average of the items assigned to it so far. This process is repeated until the assignments stop changing, at which point the clusters are obtained.
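A minimal Python sketch of K-means follows. Note that it uses the common batch variant (Lloyd's algorithm), which assigns every item before updating all the means, rather than updating a mean after each individual item; the function name, its parameters and the sample points are hypothetical.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm for K-means; returns (means, labels)."""
    rng = random.Random(seed)
    # Initialise the K means at randomly chosen data points.
    means = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its closest mean. Squared Euclidean
        # distance ranks the means exactly as Euclidean distance does.
        new_labels = [
            min(range(k), key=lambda j: sum((a - m) ** 2 for a, m in zip(p, means[j])))
            for p in points
        ]
        if new_labels == labels:  # no point changed group: converged
            break
        labels = new_labels
        # Update step: move each mean to the average of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave the mean unchanged if its group is empty
                means[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return means, labels


# Hypothetical unlabeled 2-D data with two obvious groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
means, labels = kmeans(points, k=2)
print(labels)
print(means)
```

Because the result depends on the random starting means, K-means is often run several times with different seeds, keeping the clustering with the smallest total distance.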
Different techniques can yield different results on the same dataset. Which one to use depends on the dataset and the subject of the sentiment analysis; no single algorithm works best in all cases, so each candidate should be tested properly on the dataset. Among the many text classification algorithms, Naive Bayes is one of the most basic, simple and easy to use. Its mathematics is simple, it performs fairly well, and it is inexpensive in terms of memory and CPU usage.
The Naive Bayes classifier is a machine learning technique widely used for text classification. It has various applications, such as sentiment analysis, email spam detection, language detection and document categorization. Naive Bayes classifiers are a family of algorithms that are all based on Bayes' theorem.
Bayes' theorem gives the probability of an event occurring, based on prior knowledge of the probabilities of other conditions related to that event.
Mathematically, Bayes' theorem can be stated as follows:
P(A|B) = P(B|A) * P(A) / P(B)
where A and B are events and P(B) ≠ 0.
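As a quick numeric check of the formula, here is a minimal Python sketch. It also expands the denominator with the law of total probability, P(B) = P(B|A) * P(A) + P(B|not A) * P(not A), since P(B) is often not known directly. The function name and the probabilities used are hypothetical.

```python
def posterior(p_b_given_a, p_b_given_not_a, p_a):
    """P(A|B) from Bayes' theorem, with P(B) from the law of total probability."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    if p_b == 0:
        raise ValueError("P(B) must be non-zero")
    return p_b_given_a * p_a / p_b


# Hypothetical numbers: P(A) = 0.6, P(B|A) = 0.5, P(B|not A) = 0.25,
# so P(B) = 0.5*0.6 + 0.25*0.4 = 0.4 and P(A|B) = 0.3 / 0.4 = 0.75.
print(posterior(0.5, 0.25, 0.6))
```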
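As the name implies, Naive Bayes makes the “naive” assumption that all events or features make an equal and independent contribution to the outcome; more precisely, the features are assumed to be conditionally independent of each other given the class. This assumption rarely holds in practice, so the probability estimates are often inaccurate, but the resulting decisions are often good and the computation is very cheap.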
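Written out for a class (tag) C and features x_1, ..., x_n (our notation, not the article's), the independence assumption turns Bayes' theorem into a product of per-feature terms. The denominator does not depend on C, so when comparing classes it can be dropped and the class with the largest score is chosen:

```latex
P(C \mid x_1, \ldots, x_n)
  = \frac{P(C)\, P(x_1, \ldots, x_n \mid C)}{P(x_1, \ldots, x_n)}
  = \frac{P(C) \prod_{i=1}^{n} P(x_i \mid C)}{P(x_1, \ldots, x_n)}
  \propto P(C) \prod_{i=1}^{n} P(x_i \mid C)

\hat{C} = \arg\max_{C} \, P(C) \prod_{i=1}^{n} P(x_i \mid C)
```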
NAIVE BAYES BASED TEXT CLASSIFIER EXAMPLE
For example, suppose we are building a text classifier that tells whether a given text is about cryptocurrency or not. First, we need a training dataset. Suppose it is as follows, with texts and tags:
| Text | Tag |
|---|---|
| India may not ban Cryptocurrencies but treat them as commodities. | Cryptocurrency |
| Bitcoin prices fall following the slump in Cryptocurrency market. | Cryptocurrency |
| An enterprise value of Parador Holdings is 80 million euros. | Not Cryptocurrency |
| Over 30 stocks defy positive market mood, hit lowest on NSE. | Not Cryptocurrency |
| BuyUcoin is a cryptocurrency exchange. | Cryptocurrency |
Suppose we want to know which tag the text “Best platform to buy Cryptocurrency” falls under. Following Naive Bayes, we want to calculate P(Cryptocurrency | best platform to buy Cryptocurrency), that is, the probability that the tag is Cryptocurrency given the text “best platform to buy Cryptocurrency”, and compare it with the corresponding probability for Not Cryptocurrency. To let the machine decide on the tag, we give it information it can use, such as the words or word frequencies in the text. This information does not include sentence construction or word order. Using this word information, Bayes' theorem calculates the probability of the sentence falling under each tag.
The calculation is done on each individual word in the sentence “best platform to buy Cryptocurrency”, so “Cryptocurrency to buy best platform” and “buy to platform best Cryptocurrency” are all the same to the machine.
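A minimal sketch of this bag-of-words representation in Python, assuming a simple tokenizer that lower-cases the text and keeps only runs of letters and digits (the tokenizer is our choice, not the article's): any reordering of the same words produces the same word counts.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count its words, discarding order and punctuation."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


a = bag_of_words("best platform to buy Cryptocurrency")
b = bag_of_words("Cryptocurrency to buy best platform")
c = bag_of_words("buy to platform best Cryptocurrency")
print(a == b == c)  # True: word order is ignored
```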
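Mathematically, under the naive independence assumption, the probability of the sentence given a tag is the product of the probabilities of its words given that tag:
P(best platform to buy Cryptocurrency | tag) = P(best|tag) * P(platform|tag) * P(to|tag) * P(buy|tag) * P(cryptocurrency|tag)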
Some of these words may appear in our training dataset; here, cryptocurrency is the only one that does. The other words never occur in the training data, so their estimated probabilities are zero, which makes the whole product zero. Laplace (add-one) smoothing solves this by adding 1 to every word count so that no probability is ever zero: P(word|tag) = (count of the word in texts with that tag + 1) / (total number of words in texts with that tag + number of distinct words in the training data).
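Here is a minimal Python sketch of the zero-probability problem and the add-one fix, using the rows of the training table. The tokenizer is an assumption (lower-casing, letters and digits only, no stemming, so “Cryptocurrencies” counts as a different word from “cryptocurrency”), and the vocabulary in the denominator is taken to be the distinct words of the whole training table.

```python
import re
from collections import Counter
from math import prod


def tokenize(text):
    # Assumed tokenizer: lower-case and keep runs of letters/digits (no stemming).
    return re.findall(r"[a-z0-9]+", text.lower())


crypto_texts = [
    "India may not ban Cryptocurrencies but treat them as commodities.",
    "Bitcoin prices fall following the slump in Cryptocurrency market.",
    "BuyUcoin is a cryptocurrency exchange.",
]
other_texts = [
    "An enterprise value of Parador Holdings is 80 million euros.",
    "Over 30 stocks defy positive market mood, hit lowest on NSE.",
]

counts = Counter(w for t in crypto_texts for w in tokenize(t))
total = sum(counts.values())  # number of words in the Cryptocurrency texts
vocab = set(counts) | {w for t in other_texts for w in tokenize(t)}

query = tokenize("best platform to buy Cryptocurrency")

# Without smoothing, every unseen word has probability 0, so the product is 0.
unsmoothed = prod(counts[w] / total for w in query)
# Laplace (add-one) smoothing: (count + 1) / (total + vocabulary size).
smoothed = prod((counts[w] + 1) / (total + len(vocab)) for w in query)

print(unsmoothed)  # 0.0
print(smoothed)
```

With this tokenizer the Cryptocurrency texts contain 24 words and the table has 42 distinct words, so each unseen query word gets 1/66 and “cryptocurrency” (seen twice) gets 3/66.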
For each tag, we multiply the prior probability of the tag by the (smoothed) probabilities of the individual words given that tag, and compare the two results; the larger one gives the predicted tag. The denominator of Bayes' theorem, P(best platform to buy Cryptocurrency), is the same for both tags, so it can be ignored in the comparison.
P(Cryptocurrency) * P(best|Cryptocurrency) * P(platform|Cryptocurrency) * P(to|Cryptocurrency) * P(buy|Cryptocurrency) * P(cryptocurrency|Cryptocurrency)
P(Not Cryptocurrency) * P(best|Not Cryptocurrency) * P(platform|Not Cryptocurrency) * P(to|Not Cryptocurrency) * P(buy|Not Cryptocurrency) * P(cryptocurrency|Not Cryptocurrency)
Voila! The first product is the larger one, so we can conclude that “best platform to buy Cryptocurrency” has the Cryptocurrency tag.
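Putting the steps together, here is a minimal, self-contained Python sketch of a Naive Bayes text classifier trained on the five-row table above. It assumes a simple lower-casing tokenizer without stemming and Laplace smoothing over the training vocabulary; the function names `tokenize`, `train` and `scores` are hypothetical. It adds log-probabilities instead of multiplying raw probabilities, a common way to avoid numerical underflow on longer texts; since the logarithm is increasing, the winning tag is unchanged.

```python
import re
from collections import Counter
from math import log


def tokenize(text):
    # Assumed tokenizer: lower-case and keep runs of letters/digits (no stemming).
    return re.findall(r"[a-z0-9]+", text.lower())


def train(examples):
    """examples: list of (text, tag). Returns priors, per-tag word counts, vocabulary."""
    tag_docs = Counter(tag for _, tag in examples)
    word_counts = {tag: Counter() for tag in tag_docs}
    for text, tag in examples:
        word_counts[tag].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    priors = {tag: n / len(examples) for tag, n in tag_docs.items()}
    return priors, word_counts, vocab


def scores(text, priors, word_counts, vocab):
    """log P(tag) + sum of log Laplace-smoothed P(word|tag), for each tag."""
    result = {}
    for tag, counts in word_counts.items():
        total = sum(counts.values())
        result[tag] = log(priors[tag]) + sum(
            log((counts[w] + 1) / (total + len(vocab))) for w in tokenize(text)
        )
    return result


training = [
    ("India may not ban Cryptocurrencies but treat them as commodities.", "Cryptocurrency"),
    ("Bitcoin prices fall following the slump in Cryptocurrency market.", "Cryptocurrency"),
    ("An enterprise value of Parador Holdings is 80 million euros.", "Not Cryptocurrency"),
    ("Over 30 stocks defy positive market mood, hit lowest on NSE.", "Not Cryptocurrency"),
    ("BuyUcoin is a cryptocurrency exchange.", "Cryptocurrency"),
]
model = train(training)
s = scores("Best platform to buy Cryptocurrency", *model)
print(max(s, key=s.get))  # Cryptocurrency
```

Under these assumptions the two products are 0.6 * 3/66^5 ≈ 1.4e-9 for Cryptocurrency and 0.4 * 1/63^5 ≈ 4.0e-10 for Not Cryptocurrency, which agrees with the conclusion above.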
The Naive Bayes classifier outperforms several more sophisticated classifiers in many cases and has low CPU usage. There are several variants of Naive Bayes, such as Multinomial Naive Bayes and Bernoulli Naive Bayes, and they can perform differently on the same dataset. Each of them should be tested to determine which one is best suited to the text classification task at hand.