
Interactive Data Visualization in Python with Bokeh

Portuguese version of this post

Introduction

After covering matplotlib, the entry point for data visualization in Python, in the previous post, let’s now talk about Bokeh.

Bokeh (official website) is a Python library for interactive data visualization, with a style similar to D3.js. Its goal is to enable the creation of interactive charts, dashboards, and data applications.

Installation

Bokeh does not come installed with Anaconda, but it is very simple to install. If you are using Anaconda, a single command is enough:

conda install bokeh

If you have all the dependencies installed (NumPy, Pandas, Redis, among others), you can also install Bokeh through pip (pip install bokeh).

For more information about installing Bokeh, see the installation guide in the official documentation.

Getting Started

Well, let’s start using Bokeh, first with a very simple example, as always 😉

Let’s build a basic line chart. First, we prepare the data for the chart, define the output file with the “output_file” function, and create a figure for plotting with the “figure” function, setting its title and the axis titles. Then we plot the line by passing the prepared data to the figure’s “line” function, and finally we call “show” to display the figure:

import pandas as pd
import numpy as np
from bokeh.plotting import figure, output_file, show

# Data preparation (x is materialized as a list so it can be used as plain data)
y = [10, 20, 30, 40, 50]
x = list(range(len(y)))

# Configuring plot output file
output_file("bokeh_example_1.html", title="Bokeh Line Chart Example")

# Create the figure and define some properties
fig = figure(title="Bokeh Line Chart Example", x_axis_label='x', y_axis_label='y')

# Add the line
fig.line(x, y)

# Show results, similar to matplotlib
show(fig)

Note that you can pan the chart, save it, and zoom with the mouse wheel. This interactivity is really nice when you want to create a web application that involves charts.

Scatter Plots

Now, let’s see how to create a scatter plot with Bokeh, like the one we created in the previous post. As in the first example, we set up the data for the plot, this time extracting it from the Titanic dataset. Then we configure the output file and the figure, but now we use the figure’s “circle” function to draw the points. We also set an alpha value for transparency and a size for the points:

# Load the Titanic training data
train_df = pd.read_csv('train.csv')

# Columns used for the two axes
ages = train_df['Age']
fare = train_df['Fare']

output_file("bokeh_scatter_example.html", title="Bokeh Scatter Plot Example")

fig2 = figure(title="Bokeh Scatter Plot Example", x_axis_label='Age',
              y_axis_label='Fare')

# One semi-transparent circle per passenger
fig2.circle(ages, fare, size=5, alpha=0.5)

show(fig2)

Nice, isn’t it? Now let’s create some bar charts. Bar charts in Bokeh work a little differently.

Bar Charts

Data for a bar chart in Bokeh is organized in Python dictionaries, composed of lists with the values to be used in the chart. Let’s redo the Titanic survival by gender example with Bokeh. We will create multiple charts, so we can learn how to build both simple bar charts and stacked bar charts.

First, let’s define the values we need: the number of survivors and non-survivors for each gender. We use Pandas’ pivot_table to calculate that, and then turn the values into a Python list. The list contains the counts for “female”, non-survivors and then survivors, followed by the counts for “male” in the same order. The “gender” and “survival” lists indicate which category each value belongs to. So, if the first value in the quantities list refers to female non-survivors, the first item in the gender list must be “female” and the first item in the survival list must be “Not Survived”, and so on for the remaining values.
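The parallel-list layout described above can be sketched in plain Python, without Pandas or Bokeh. This is only an illustration: the records below are made-up placeholders rather than the real Titanic counts, and the names records, counts, and survival_names are assumptions introduced for the example.

```python
from collections import Counter

# Hypothetical passenger records as (sex, survived) pairs; in the post these
# come from the 'Sex' and 'Survived' columns of train.csv.
records = [
    ("female", 1), ("female", 1), ("female", 0),
    ("male", 0), ("male", 0), ("male", 1), ("male", 0),
]

# Count passengers for each (sex, survived) combination.
counts = Counter(records)

survival_names = {0: "Not Survived", 1: "Survived"}

# Build three parallel lists: position i in each list describes the same bar.
chart_data = {"gender": [], "survival": [], "quantity": []}
for sex in ("female", "male"):
    for survived in (0, 1):  # non-survivors first, then survivors
        chart_data["gender"].append(sex)
        chart_data["survival"].append(survival_names[survived])
        chart_data["quantity"].append(counts[(sex, survived)])

print(chart_data)
```

Whatever way the lists are built, the important property is that the three lists have the same length and the same ordering.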

Then we use the Bar function that we imported to create two charts, one stacked and one not stacked. To this function we pass the Dict we created (called “chart_data”), the values to be aggregated (the “quantity” key of the Dict), the key to use as the label, and the title. For the non-stacked chart, we pass two keys to the “label” parameter, and Bokeh creates four bars, one for each possible combination of the label values. For the stacked chart, we pass the key by which the chart should be stacked, in this case “survival”, to the “stack” parameter, and the other key to the “label” parameter. To show the charts, we use the “hplot” function, which lays out multiple plots horizontally. This is what the result looks like:

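# bokeh.charts is part of older Bokeh releases; it was later moved to the
# separate bkcharts package
from bokeh.charts import Bar, hplot

# Count passengers by gender (rows) and survival (columns)
table = pd.pivot_table(data=train_df, values='PassengerId', index='Sex',
                       columns='Survived', aggfunc='count')

# Flatten to: female not survived, female survived, male not survived, male survived
# (.loc replaces the .ix indexer, which newer Pandas versions no longer provide)
chart_values = list(table.loc['female'].values) + list(table.loc['male'].values)

output_file("bokeh_barchart_example.html", title="Bokeh Bar Chart Example")

chart_data = {
    'survival': ['Not Survived', 'Survived', 'Not Survived', 'Survived'],
    'gender': ['female', 'female', 'male', 'male'],
    'quantity': chart_values
}

# Stacked chart: one bar per gender, split by survival
bar = Bar(chart_data, values='quantity', label='gender', stack='survival',
          title="Titanic Survival by Gender - Stacked", legend='top_left')

# Non-stacked chart: one bar per (gender, survival) combination
bar2 = Bar(chart_data, values='quantity', label=['gender', 'survival'],
           title="Titanic Survival by Gender")

# Show both charts side by side
show(hplot(bar, bar2))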
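For intuition about what stacking changes, here is a small plain-Python sketch of the geometry: in a stacked chart, each segment starts where the previous segment with the same label ended. It does not use Bokeh and is not how Bokeh implements the “stack” parameter internally; the function name stack_segments and the quantities are assumptions made up for the illustration.

```python
def stack_segments(labels, quantities):
    """Return (label, bottom, top) per value, stacking values that share a label."""
    running_top = {}  # current height of the stack for each label
    segments = []
    for label, quantity in zip(labels, quantities):
        bottom = running_top.get(label, 0)
        top = bottom + quantity
        running_top[label] = top
        segments.append((label, bottom, top))
    return segments


# Placeholder data in the same layout as chart_data in the post.
gender = ["female", "female", "male", "male"]
quantity = [1, 2, 3, 1]

segments = stack_segments(gender, quantity)
for label, bottom, top in segments:
    print(label, bottom, top)
```

In the non-stacked chart every bar would instead start at zero, which is why it needs four separate bars.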

Histograms

Histograms in Bokeh are pretty simple. We need to import the Histogram function. To this function we pass the DataFrame itself and the name of the variable to be plotted. We can also define the number of bins through the “bins” parameter. Let’s plot a histogram of the ages in the Titanic dataset, with 10 bins.

from bokeh.charts import Histogram

# Histogram of the 'Age' column, split into 10 bins
hist = Histogram(train_df, values="Age",
                 title="Age Distribution on Titanic", bins=10)

output_file("bokeh_histogram_example.html", title="Bokeh Histogram Example")

show(hist)
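To make the “bins” parameter more concrete, the sketch below counts values into 10 equal-width intervals in plain Python. It only illustrates the general idea of binning and may differ in detail from what Bokeh computes; the function name equal_width_histogram and the ages are placeholders invented for the example, and it assumes missing values have already been removed.

```python
def equal_width_histogram(values, bins=10):
    """Count how many values fall into each of `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        raise ValueError("all values are identical, cannot build bins")
    counts = [0] * bins
    for v in values:
        # The maximum value belongs to the last bin rather than to a new one.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    # bins intervals are delimited by bins + 1 edges
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges


# Placeholder ages, not taken from the Titanic dataset.
ages = [0, 4, 8, 15, 16, 23, 24, 30, 32, 40, 47, 48, 56, 63, 64, 72, 80]

counts, edges = equal_width_histogram(ages, bins=10)
print(counts)
print(edges)
```

Each bar of the histogram then corresponds to one entry of counts, drawn between two consecutive edges.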

Boxplots

Boxplots are useful when you want to see how values vary within a category and spot possible outliers. Let’s create one to see how Fare varies according to the Passenger Class on the Titanic.

Let’s import the BoxPlot function and pass the DataFrame to it. Then we define the variable with the values to be aggregated, and pass the variable that contains the category to the “label” parameter. In this case, we pass the “Fare” column as the values and the “Pclass” column as the label, so each unique value in the “Pclass” column becomes a separate box.

from bokeh.charts import BoxPlot 

box = BoxPlot(train_df, label="Pclass", values="Fare")

output_file("bokeh_boxplot_example.html", title="Bokeh Boxplot Example")

show(box)
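As a rough idea of what each box summarizes, the sketch below computes quartiles per category and flags outliers in plain Python. It assumes the common 1.5 × IQR convention for outliers, which may not match exactly what Bokeh computes; the function name box_stats and the fares are placeholders made up for the example, not values from the Titanic dataset.

```python
from statistics import quantiles


def box_stats(values):
    """Quartiles and 1.5 * IQR outliers for the values of one box."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}


# Placeholder fares grouped by passenger class (hypothetical values).
fares_by_class = {
    1: [30, 50, 60, 80, 500],
    2: [10, 13, 15, 26, 30],
    3: [7, 8, 8, 9, 10],
}

# One set of statistics per class, like one box per unique "Pclass" value.
stats = {pclass: box_stats(fares) for pclass, fares in fares_by_class.items()}
for pclass, summary in stats.items():
    print(pclass, summary)
```

The box spans q1 to q3 with a line at the median, and points outside the fences are the candidates drawn as individual outliers.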

We’re getting to the end, and maybe you are asking where the pie charts are. As far as I know, pie charts are not well supported in Bokeh. They are not even mentioned in the official documentation. That said, maybe they will be added in a future release. For now, we have to live without them in Bokeh.

The next post will cover Seaborn, which improves on matplotlib charts. Stay tuned :)

Regards!
