Naive Bayes Tutorial for Machine Learning

Naive Bayes is a simple classification algorithm that makes a strong assumption: that each input variable is independent of the others, given the class.

Nevertheless, it has been shown to be effective in a large number of problem domains. In this post, you will discover the Naive Bayes algorithm for categorical data. After reading this post, you will know:

• How to work with categorical data for Naive Bayes.
• How to prepare the class and conditional probabilities for a Naive Bayes model.
• How to use a learned Naive Bayes model to make predictions.

This post was written for developers and does not assume a background in statistics or probability. Open a spreadsheet and follow along. If you have any questions about Naive Bayes, ask in the comments and I will do my best to answer.

Let’s get started.

Tutorial Dataset

The dataset is contrived. It describes two categorical input variables and a class variable that has two possible values.

```
Weather  Car      Class
sunny    working  go-out
rainy    broken   go-out
sunny    working  go-out
sunny    working  go-out
sunny    working  go-out
rainy    broken   stay-home
rainy    broken   stay-home
sunny    working  stay-home
sunny    broken   stay-home
rainy    broken   stay-home
```

We can convert this into numbers. Each input variable has only two values, as does the output class variable, so we can convert each variable to binary as follows:

Variable: Weather

• sunny = 1
• rainy = 0

Variable: Car

• working = 1
• broken = 0

Variable: Class

• go-out = 1
• stay-home = 0

Therefore, we can restate the dataset as:

```
Weather  Car  Class
1        1    1
0        0    1
1        1    1
1        1    1
1        1    1
0        0    0
0        0    0
1        1    0
1        0    0
0        0    0
```

This can make the data easier to work with in a spreadsheet or code if you are following along.
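If you are following along in code rather than a spreadsheet, the conversion above can be sketched in Python as follows. The names `DATASET`, `WEATHER_CODES`, `CAR_CODES`, `CLASS_CODES` and `encode` are illustrative choices for this sketch, not part of any library.

```python
# The tutorial's contrived dataset: one (weather, car, class) tuple per row.
DATASET = [
    ("sunny", "working", "go-out"),
    ("rainy", "broken", "go-out"),
    ("sunny", "working", "go-out"),
    ("sunny", "working", "go-out"),
    ("sunny", "working", "go-out"),
    ("rainy", "broken", "stay-home"),
    ("rainy", "broken", "stay-home"),
    ("sunny", "working", "stay-home"),
    ("sunny", "broken", "stay-home"),
    ("rainy", "broken", "stay-home"),
]

# Binary codes for each variable, following the mapping listed above.
WEATHER_CODES = {"sunny": 1, "rainy": 0}
CAR_CODES = {"working": 1, "broken": 0}
CLASS_CODES = {"go-out": 1, "stay-home": 0}


def encode(rows):
    """Replace each categorical value with its 0/1 code."""
    return [
        (WEATHER_CODES[weather], CAR_CODES[car], CLASS_CODES[label])
        for weather, car, label in rows
    ]


encoded = encode(DATASET)
for row in encoded:
    print(*row)  # first line printed: 1 1 1
```

The printed rows match the numeric version of the dataset shown above.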

Learn a Naive Bayes Model

Two types of quantities need to be calculated from the dataset for the Naive Bayes model:

• Class Probabilities.
• Conditional Probabilities.

Calculate the Class Probabilities

The dataset is a two-class problem, and we already know the probability of each class because we contrived the dataset.

Nevertheless, we can calculate the class probabilities for classes 0 and 1 as follows:

• P(class=1) = count(class=1) / (count(class=0) + count(class=1))
• P(class=0) = count(class=0) / (count(class=0) + count(class=1))

or

• P(class=1) = 5 / (5 + 5)
• P(class=0) = 5 / (5 + 5)

This works out to be a probability of 0.5 for any given data instance belonging to class 0 or class 1.
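The class probability calculation is just counting, and can be sketched in Python with `collections.Counter`. The function name `class_probabilities` is a hypothetical helper for this sketch; the label list is the Class column of the tutorial dataset.

```python
from collections import Counter

# The Class column of the tutorial dataset, in row order.
labels = ["go-out"] * 5 + ["stay-home"] * 5


def class_probabilities(column):
    """P(class=c) = count(class=c) / total number of rows, for every class c."""
    counts = Counter(column)
    total = len(column)
    return {c: n / total for c, n in counts.items()}


priors = class_probabilities(labels)
print(priors)  # {'go-out': 0.5, 'stay-home': 0.5}
```

Because the counts are divided by the total number of rows, the class probabilities always sum to 1, however many classes there are.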

Calculate the Conditional Probabilities

The conditional probabilities are the probability of each input value given each class value.

The conditional probabilities for the dataset can be calculated as follows:

Weather Input Variable

• P(weather=sunny|class=go-out) = count(weather=sunny and class=go-out) / count(class=go-out)
• P(weather=rainy|class=go-out) = count(weather=rainy and class=go-out) / count(class=go-out)
• P(weather=sunny|class=stay-home) = count(weather=sunny and class=stay-home) / count(class=stay-home)
• P(weather=rainy|class=stay-home) = count(weather=rainy and class=stay-home) / count(class=stay-home)

Plugging in the counts from the dataset, we get:

• P(weather=sunny|class=go-out) = 4 / 5 = 0.8
• P(weather=rainy|class=go-out) = 1 / 5 = 0.2
• P(weather=sunny|class=stay-home) = 2 / 5 = 0.4
• P(weather=rainy|class=stay-home) = 3 / 5 = 0.6

Car Input Variable

• P(car=working|class=go-out) = count(car=working and class=go-out) / count(class=go-out)
• P(car=broken|class=go-out) = count(car=broken and class=go-out) / count(class=go-out)
• P(car=working|class=stay-home) = count(car=working and class=stay-home) / count(class=stay-home)
• P(car=broken|class=stay-home) = count(car=broken and class=stay-home) / count(class=stay-home)

Plugging in the counts from the dataset, we get:

• P(car=working|class=go-out) = 4 / 5 = 0.8
• P(car=broken|class=go-out) = 1 / 5 = 0.2
• P(car=working|class=stay-home) = 1 / 5 = 0.2
• P(car=broken|class=stay-home) = 4 / 5 = 0.8
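The conditional probability formulas for both input variables can be computed in a single counting pass. The sketch below is one way to do it in Python, assuming the dataset is held as (weather, car, class) tuples; `INPUTS` and `conditional_probabilities` are names invented for this example.

```python
from collections import Counter

# The tutorial dataset: (weather, car, class) per row.
DATASET = [
    ("sunny", "working", "go-out"),
    ("rainy", "broken", "go-out"),
    ("sunny", "working", "go-out"),
    ("sunny", "working", "go-out"),
    ("sunny", "working", "go-out"),
    ("rainy", "broken", "stay-home"),
    ("rainy", "broken", "stay-home"),
    ("sunny", "working", "stay-home"),
    ("sunny", "broken", "stay-home"),
    ("rainy", "broken", "stay-home"),
]
INPUTS = ("weather", "car")  # names of the input columns, in order


def conditional_probabilities(rows):
    """Return {(input, value, class): count(input=value and class) / count(class)}."""
    class_counts = Counter(row[-1] for row in rows)
    joint_counts = Counter(
        (name, value, row[-1])
        for row in rows
        for name, value in zip(INPUTS, row[:-1])
    )
    return {
        (name, value, cls): count / class_counts[cls]
        for (name, value, cls), count in joint_counts.items()
    }


probs = conditional_probabilities(DATASET)
for (name, value, cls), p in sorted(probs.items()):
    print(f"P({name}={value}|class={cls}) = {p}")
```

Only combinations that actually occur in the data get an entry; in this dataset every input value appears with both classes, so all eight probabilities listed above are present.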

We now have everything we need to make predictions using the Naive Bayes model.

Make Predictions with Naive Bayes

We can make predictions using Bayes’ Theorem:

P(h|d) = (P(d|h) * P(h)) / P(d)

Where:

• P(h|d) is the probability of hypothesis h given the data d. This is called the posterior probability.
• P(d|h) is the probability of data d given that the hypothesis h was true.
• P(h) is the probability of hypothesis h being true (regardless of the data). This is called the prior probability of h.
• P(d) is the probability of the data (regardless of the hypothesis).
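To make the role of each term concrete, here is a minimal Python sketch of Bayes’ Theorem for a set of competing hypotheses. It assumes P(d) is obtained by summing P(d|h) * P(h) over every hypothesis (the law of total probability); the function name `posterior` is hypothetical, and the numbers are the tutorial's probabilities for weather=sunny, car=working.

```python
def posterior(likelihoods, priors):
    """Bayes' Theorem: P(h|d) = P(d|h) * P(h) / P(d), for every hypothesis h.

    P(d) is computed as the sum of P(d|h) * P(h) over all hypotheses
    (the law of total probability), so the returned values sum to 1.
    """
    numerators = {h: likelihoods[h] * priors[h] for h in priors}
    evidence = sum(numerators.values())  # P(d)
    return {h: num / evidence for h, num in numerators.items()}


# P(d|h) for d = (weather=sunny, car=working): the product of the two
# conditional probabilities, as the naive independence assumption allows.
likelihoods = {"go-out": 0.8 * 0.8, "stay-home": 0.4 * 0.2}
priors = {"go-out": 0.5, "stay-home": 0.5}

posteriors = posterior(likelihoods, priors)
print(posteriors)
```

Here P(d) works out to 0.32 + 0.04 = 0.36, so the posterior for go-out is 0.32 / 0.36, roughly 0.89, and for stay-home roughly 0.11.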

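In fact, we don’t need the full probability to predict the most likely class for a new data instance. Because the denominator P(d) is the same for every class, we only need to calculate the numerator, P(d|h) * P(h), for each class; the class that gives the largest value is the predicted output. This is known as the maximum a posteriori (MAP) hypothesis: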

MAP(h) = argmax_h(P(d|h) * P(h))
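The MAP rule is an argmax over the unnormalized scores. The Python sketch below, using the tutorial's probabilities for weather=sunny, car=working and a hypothetical helper `map_hypothesis`, checks that dividing by P(d) does not change which class wins.

```python
# Unnormalized scores P(d|h) * P(h) for weather=sunny, car=working,
# using the class and conditional probabilities learned above.
scores = {"go-out": 0.8 * 0.8 * 0.5, "stay-home": 0.4 * 0.2 * 0.5}


def map_hypothesis(scores):
    """Return the hypothesis h with the largest P(d|h) * P(h)."""
    return max(scores, key=scores.get)


# Dividing every score by the same P(d) rescales them all equally, so the
# hypothesis with the largest value does not change.
evidence = sum(scores.values())
posteriors = {h: s / evidence for h, s in scores.items()}

print(map_hypothesis(scores))      # go-out
print(map_hypothesis(posteriors))  # go-out
```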

Let’s take the first record from our dataset and use our learned model to predict which class it belongs to.

weather=sunny, car=working

We plug our model’s probabilities in for both classes and calculate a response for each, starting with the output “go-out”. We multiply the conditional probabilities together, then multiply the result by the probability of any instance belonging to the class.

• go-out = P(weather=sunny|class=go-out) * P(car=working|class=go-out) * P(class=go-out)
• go-out = 0.8 * 0.8 * 0.5
• go-out = 0.32

We can perform the same calculation for the stay-home case:

• stay-home = P(weather=sunny|class=stay-home) * P(car=working|class=stay-home) * P(class=stay-home)
• stay-home = 0.4 * 0.2 * 0.5
• stay-home = 0.04

We can see that 0.32 is greater than 0.04, so we predict “go-out” for this instance, which is correct.
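The calculation for a single instance can be sketched in Python by hard-coding the probability tables learned earlier. `PRIORS`, `P_WEATHER`, `P_CAR`, `class_scores` and `predict` are illustrative names for this sketch only.

```python
# Probability tables learned earlier in the tutorial.
PRIORS = {"go-out": 0.5, "stay-home": 0.5}
P_WEATHER = {  # P(weather=value | class)
    "go-out": {"sunny": 0.8, "rainy": 0.2},
    "stay-home": {"sunny": 0.4, "rainy": 0.6},
}
P_CAR = {  # P(car=value | class)
    "go-out": {"working": 0.8, "broken": 0.2},
    "stay-home": {"working": 0.2, "broken": 0.8},
}


def class_scores(weather, car):
    """Response per class: P(weather|class) * P(car|class) * P(class)."""
    return {
        cls: P_WEATHER[cls][weather] * P_CAR[cls][car] * PRIORS[cls]
        for cls in PRIORS
    }


def predict(weather, car):
    """Return the class with the largest response."""
    scores = class_scores(weather, car)
    return max(scores, key=scores.get)


scores = class_scores("sunny", "working")
print({cls: round(s, 2) for cls, s in scores.items()})  # {'go-out': 0.32, 'stay-home': 0.04}
print(predict("sunny", "working"))  # go-out
```

Rounding is only for display; raw floating-point products such as 0.8 * 0.8 * 0.5 can differ from 0.32 in the last few digits.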

We can repeat this operation for the entire dataset, as follows:

```
Weather  Car      Class      Score(go-out)  Score(stay-home)  Prediction
sunny    working  go-out     0.32           0.04              go-out
rainy    broken   go-out     0.02           0.24              stay-home
sunny    working  go-out     0.32           0.04              go-out
sunny    working  go-out     0.32           0.04              go-out
sunny    working  go-out     0.32           0.04              go-out
rainy    broken   stay-home  0.02           0.24              stay-home
rainy    broken   stay-home  0.02           0.24              stay-home
sunny    working  stay-home  0.32           0.04              go-out
sunny    broken   stay-home  0.08           0.16              stay-home
rainy    broken   stay-home  0.02           0.24              stay-home
```

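If we tally up the predictions against the actual class values, 8 of the 10 are correct, an accuracy of 80%. That is excellent given that the dataset contains conflicting examples: sunny/working and rainy/broken each appear with both class labels, so no model using only these two inputs could classify every row correctly.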
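Putting the steps together, the whole tutorial can be sketched end to end in Python: learn the class and conditional probabilities by counting, predict each training row, and measure accuracy. This is a minimal from-scratch illustration under the tutorial's assumptions (categorical inputs, no smoothing); `fit` and `predict` are hypothetical names, not a library API.

```python
from collections import Counter

# The tutorial dataset: (weather, car, class) per row.
DATASET = [
    ("sunny", "working", "go-out"),
    ("rainy", "broken", "go-out"),
    ("sunny", "working", "go-out"),
    ("sunny", "working", "go-out"),
    ("sunny", "working", "go-out"),
    ("rainy", "broken", "stay-home"),
    ("rainy", "broken", "stay-home"),
    ("sunny", "working", "stay-home"),
    ("sunny", "broken", "stay-home"),
    ("rainy", "broken", "stay-home"),
]


def fit(rows):
    """Learn P(class) and P(input_i=value | class) by counting."""
    class_counts = Counter(row[-1] for row in rows)
    priors = {c: n / len(rows) for c, n in class_counts.items()}
    joint_counts = Counter(
        (i, value, row[-1])
        for row in rows
        for i, value in enumerate(row[:-1])
    )
    conditionals = {
        (i, value, c): n / class_counts[c]
        for (i, value, c), n in joint_counts.items()
    }
    return priors, conditionals


def predict(priors, conditionals, inputs):
    """Return the class with the largest P(class) * product of P(input_i=value | class)."""
    def score(c):
        s = priors[c]
        for i, value in enumerate(inputs):
            # A value never seen with class c in training contributes 0 here.
            s *= conditionals.get((i, value, c), 0.0)
        return s
    return max(priors, key=score)


priors, conditionals = fit(DATASET)
predictions = [predict(priors, conditionals, row[:-1]) for row in DATASET]
correct = sum(pred == row[-1] for pred, row in zip(predictions, DATASET))
accuracy = correct / len(DATASET)
print(f"{correct}/{len(DATASET)} correct, accuracy = {accuracy:.0%}")  # 8/10 correct, accuracy = 80%
```

The two misclassified rows are the conflicting ones: the second row (rainy/broken, actually go-out) and the eighth row (sunny/working, actually stay-home).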

Summary

In this post you discovered exactly how to implement Naive Bayes from scratch. You learned:

• How to work with categorical data with Naive Bayes.
• How to calculate class probabilities from training data.
• How to calculate conditional probabilities from training data.
• How to use a learned Naive Bayes model to make predictions on new data.

Do you have any questions about Naive Bayes or this post? Ask your question by leaving a comment and I will do my best to answer it.
