Making boxplots with custom statistical values for the boxes

We will see here how to draw boxplots, using Matplotlib, when we have a set of values that represent the box statistics, such as median, mean, minimum, maximum, first quartile, and third quartile.

In some cases, we may also have the confidence interval of the median values.

We will use the bxp method of the Axes object returned by pyplot.subplots.

We begin by importing the packages needed to hold the data and draw the boxplots, namely NumPy, pandas, and Matplotlib.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

The data

Let’s take an example dataset.
Shown in the table below is the data, which can be read from an Excel or CSV file into a pandas DataFrame (df).

df

  label  whislo  q1  med  mean   q3  whishi  cilo  cihi
0     A       8  26   56    49   96     116    54    62
1     B       7  21   53    45   96     122    51    59
2     C      10  22   57    54  101     120    55    63
3     D       9  26   54    50   98     116    52    60
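
If you do not have such a file at hand, the same dataframe can be built directly in code. The following is a minimal sketch that simply types in the values from the table above; the variable names columns and rows are only illustrative.

```python
import pandas as pd

# Summary statistics copied from the table above, one row per box.
columns = ["label", "whislo", "q1", "med", "mean", "q3", "whishi", "cilo", "cihi"]
rows = [
    ["A", 8, 26, 56, 49, 96, 116, 54, 62],
    ["B", 7, 21, 53, 45, 96, 122, 51, 59],
    ["C", 10, 22, 57, 54, 101, 120, 55, 63],
    ["D", 9, 26, 54, 50, 98, 116, 52, 60],
]

df = pd.DataFrame(rows, columns=columns)
print(df)
```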

Convert data into list of dictionaries

To make the plot, we need to convert this data into a list of dictionaries. Each dictionary in this list holds the statistics of one box as key:value pairs.

The keys of this dictionary are:
['label', 'med', 'q1', 'q3', 'whislo', 'whishi', 'mean', 'cilo', 'cihi', 'fliers']
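
For comparison, Matplotlib can produce dictionaries of this kind itself when it is given raw data, through matplotlib.cbook.boxplot_stats. The sketch below uses made-up random numbers (not the data of this post) only to show which keys come back; the exact key set may differ slightly from the list above, for example an extra iqr entry, and the label is added by hand here.

```python
import numpy as np
from matplotlib import cbook

# Made-up raw data, used only to illustrate the dictionary format.
rng = np.random.default_rng(0)
raw = rng.normal(loc=50, scale=20, size=200)

# boxplot_stats returns a list with one dictionary per data series.
stats = cbook.boxplot_stats(raw)
stats[0]["label"] = "raw"  # label added manually for this illustration

print(sorted(stats[0]))
```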

The key fliers is for any outliers you want to show as data points in the plot. In our example, there is no column called fliers, so we will first create one. Every value in this column is an empty list.

df['fliers'] = [[],[],[],[]]

The dataframe can then be converted to a “list of dictionaries” using the df.to_dict() method.

dataset = df.to_dict(orient='records')

As we have four values for each statistic, the dataset contains four dictionaries.

Below is what one of the dictionaries looks like.

dataset[0]
{'label': 'A',
 'whislo': 8,
 'q1': 26,
 'med': 56,
 'mean': 49,
 'q3': 96,
 'whishi': 116,
 'cilo': 54,
 'cihi': 62,
 'fliers': []}
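
The conversion steps above can be wrapped in a small function so that they work for any number of rows. This is only a sketch: to_bxp_stats and EXPECTED_COLUMNS are hypothetical names introduced here, and the column check simply mirrors the keys used in this post.

```python
import pandas as pd

# Columns used in this post (mean, cilo and cihi only matter when means or notches are shown).
EXPECTED_COLUMNS = ["label", "whislo", "q1", "med", "mean", "q3", "whishi", "cilo", "cihi"]


def to_bxp_stats(df):
    """Hypothetical helper: convert a summary-statistics dataframe to a list of dictionaries."""
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    out = df.copy()  # leave the caller's dataframe untouched
    if "fliers" not in out.columns:
        # One separate empty list per row, whatever the number of rows.
        out["fliers"] = [[] for _ in range(len(out))]
    return out.to_dict(orient="records")


df = pd.DataFrame({
    "label": ["A", "B", "C", "D"],
    "whislo": [8, 7, 10, 9],
    "q1": [26, 21, 22, 26],
    "med": [56, 53, 57, 54],
    "mean": [49, 45, 54, 50],
    "q3": [96, 96, 101, 98],
    "whishi": [116, 122, 120, 116],
    "cilo": [54, 51, 55, 52],
    "cihi": [62, 59, 63, 60],
})

dataset = to_bxp_stats(df)
print(dataset[0])
```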

Simple boxplot with default settings

Now we are ready to plot the boxplots. Below is how to plot it with default parameters.

fig, ax = plt.subplots(1,1, figsize=(5,5))
ax.bxp(dataset)
plt.show()


Customizing the boxplot

To this simple boxplot, we can add other details from the data, such as:
- show the confidence intervals of the median
- show mean values
- colour the boxplot

The following code shows these examples in different plots.

fig, ax = plt.subplots(2,2, figsize=(10,10))

bx1 = ax[0,0].bxp(dataset)
ax[0,0].set_title('default')

# Notches drawn from the cilo and cihi values
bx2 = ax[0,1].bxp(dataset,
                  shownotches=True)
ax[0,1].set_title('with median CI notches')

# Notches plus a marker at each mean value
bx3 = ax[1,0].bxp(dataset,
                  shownotches=True,
                  showmeans=True)
ax[1,0].set_title('with means shown')

# patch_artist=True lets the boxes be filled with a colour
bx4 = ax[1,1].bxp(dataset,
                  shownotches=True,
                  showmeans=True,
                  patch_artist=True,
                  boxprops={'facecolor': '#f7bd98'})
ax[1,1].set_title('with coloured boxes')

plt.show()
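
The value returned by bxp is a dictionary of the drawn artists, which can be used to style each box separately. The sketch below gives every box its own colour; the colours list is an arbitrary choice made for this illustration, and the statistics are retyped from the table so that the example runs on its own.

```python
import matplotlib.pyplot as plt

# Box statistics retyped from the table in this post.
dataset = [
    {"label": "A", "whislo": 8, "q1": 26, "med": 56, "mean": 49, "q3": 96, "whishi": 116, "cilo": 54, "cihi": 62, "fliers": []},
    {"label": "B", "whislo": 7, "q1": 21, "med": 53, "mean": 45, "q3": 96, "whishi": 122, "cilo": 51, "cihi": 59, "fliers": []},
    {"label": "C", "whislo": 10, "q1": 22, "med": 57, "mean": 54, "q3": 101, "whishi": 120, "cilo": 55, "cihi": 63, "fliers": []},
    {"label": "D", "whislo": 9, "q1": 26, "med": 54, "mean": 50, "q3": 98, "whishi": 116, "cilo": 52, "cihi": 60, "fliers": []},
]

# Illustrative colours, one per box (arbitrary choices).
colours = ["#f7bd98", "#a6cee3", "#b2df8a", "#cab2d6"]

fig, ax = plt.subplots(1, 1, figsize=(5, 5))
bx = ax.bxp(dataset, patch_artist=True)  # boxes are drawn as fillable patches

# bx["boxes"] holds one patch per dictionary, in the same order as dataset.
for box, colour in zip(bx["boxes"], colours):
    box.set_facecolor(colour)

ax.set_title("one colour per box")
```

Call plt.show() afterwards to display the figure.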


In addition to this, we can customize the properties of the box, whiskers, and median line using the boxprops, whiskerprops, and medianprops arguments, respectively. I will not go through them in detail in this post.
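
As a brief illustration only, here is a minimal sketch that passes all three arguments at once. The particular colours, line style, and line width are arbitrary choices made for this example, and the statistics are retyped from the table so that the example runs on its own.

```python
import matplotlib.pyplot as plt

# Box statistics retyped from the table in this post.
dataset = [
    {"label": "A", "whislo": 8, "q1": 26, "med": 56, "mean": 49, "q3": 96, "whishi": 116, "cilo": 54, "cihi": 62, "fliers": []},
    {"label": "B", "whislo": 7, "q1": 21, "med": 53, "mean": 45, "q3": 96, "whishi": 122, "cilo": 51, "cihi": 59, "fliers": []},
    {"label": "C", "whislo": 10, "q1": 22, "med": 57, "mean": 54, "q3": 101, "whishi": 120, "cilo": 55, "cihi": 63, "fliers": []},
    {"label": "D", "whislo": 9, "q1": 26, "med": 54, "mean": 50, "q3": 98, "whishi": 116, "cilo": 52, "cihi": 60, "fliers": []},
]

fig, ax = plt.subplots(1, 1, figsize=(5, 5))
bx = ax.bxp(
    dataset,
    patch_artist=True,  # needed so that facecolor applies to the boxes
    boxprops={"facecolor": "#f7bd98", "edgecolor": "black"},
    whiskerprops={"linestyle": "--", "color": "grey"},
    medianprops={"color": "darkred", "linewidth": 2},
)
ax.set_title("custom box, whisker and median styles")
```

Call plt.show() afterwards to display the figure.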
