Making boxplots with custom statistical values for the boxes
We will see here how to draw boxplots, using Matplotlib, when we have a set of values that represent the box statistics, such as median, mean, minimum, maximum, first quartile, and third quartile.
In some cases, we may also have confidence intervals for the median values.
We will use the .bxp method of the Axes object returned by pyplot.subplots.
We begin by importing the packages used in this post, namely NumPy, pandas (for the dataframe), and Matplotlib.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
The data
Let’s take an example dataset.
Shown in the table below is the data, which can be read from an Excel or CSV file into a pandas dataframe (df).
df
| | label | whislo | q1 | med | mean | q3 | whishi | cilo | cihi |
|---|---|---|---|---|---|---|---|---|---|
| 0 | A | 8 | 26 | 56 | 49 | 96 | 116 | 54 | 62 |
| 1 | B | 7 | 21 | 53 | 45 | 96 | 122 | 51 | 59 |
| 2 | C | 10 | 22 | 57 | 54 | 101 | 120 | 55 | 63 |
| 3 | D | 9 | 26 | 54 | 50 | 98 | 116 | 52 | 60 |
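For readers who want to follow along without a file on disk, here is a minimal sketch that builds the same dataframe directly from the values in the table. The file name in the commented-out line is hypothetical and only illustrates the read-from-file route.

```python
import pandas as pd

# Values copied from the table above; one row per box.
df = pd.DataFrame({
    "label":  ["A", "B", "C", "D"],
    "whislo": [8, 7, 10, 9],
    "q1":     [26, 21, 22, 26],
    "med":    [56, 53, 57, 54],
    "mean":   [49, 45, 54, 50],
    "q3":     [96, 96, 101, 98],
    "whishi": [116, 122, 120, 116],
    "cilo":   [54, 51, 55, 52],
    "cihi":   [62, 59, 63, 60],
})

# Reading from a file would look like this instead (hypothetical file name):
# df = pd.read_csv("box_stats.csv")
```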
Convert data into list of dictionaries
To make the plot, we need to convert this data into a list
of dictionaries. Each dictionary in this list holds the statistics
for one box as key:value pairs.
The keys of this dictionary are:
['label', 'med', 'q1', 'q3', 'whislo', 'whishi', 'mean', 'cilo', 'cihi', 'fliers']
The key fliers is for any outliers you want to
represent as data points in the plot. In our example, there is no column
called fliers, so we will first create one. Every value in this
column will be an empty list.
df['fliers'] = [[] for _ in range(len(df))]  # one empty list per row
The dataframe can then be converted to a list of dictionaries using the
df.to_dict() method with orient='records'.
dataset = df.to_dict(orient='records')
As the dataframe has four rows, the dataset will contain four dictionaries, one per box.
Below is what one of the dictionaries looks like.
dataset[0]
{'label': 'A',
'whislo': 8,
'q1': 26,
'med': 56,
'mean': 49,
'q3': 96,
'whishi': 116,
'cilo': 54,
'cihi': 62,
'fliers': []}
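Before plotting, it can help to sanity-check the dictionaries. The Matplotlib documentation for Axes.bxp lists med, q1, q3, whislo, and whishi as required keys, with the others optional. The helper below is a hypothetical sketch (check_box_stats is not part of Matplotlib or pandas) that reports missing required keys and statistics that are out of order.

```python
# Keys that Axes.bxp requires in every dictionary.
REQUIRED_KEYS = {"med", "q1", "q3", "whislo", "whishi"}

def check_box_stats(stats):
    """Return a list of problems found in a list of bxp-style dictionaries."""
    problems = []
    for i, box in enumerate(stats):
        missing = REQUIRED_KEYS - set(box)
        if missing:
            problems.append(f"box {i}: missing keys {sorted(missing)}")
            continue
        # The five statistics should not decrease from lower to upper whisker.
        ordered = [box[k] for k in ("whislo", "q1", "med", "q3", "whishi")]
        if ordered != sorted(ordered):
            problems.append(f"box {i}: statistics are not in increasing order")
    return problems

example = [{"label": "A", "whislo": 8, "q1": 26, "med": 56, "mean": 49,
            "q3": 96, "whishi": 116, "cilo": 54, "cihi": 62, "fliers": []}]
print(check_box_stats(example))  # an empty list means no problems were found
```

The same function can be called on the list produced by df.to_dict(orient='records').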
Simple boxplot with default settings
Now we are ready to draw the boxplots. Below is how to plot them with the default parameters.
fig, ax = plt.subplots(1,1, figsize=(5,5))
ax.bxp(dataset)
plt.show()
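The dataframe is only a convenience: bxp accepts any list of dictionaries. As a minimal sketch, the example below passes hand-written dictionaries containing only the five required keys plus a label. Because there is no fliers key here, showfliers=False is passed so that bxp does not look for it.

```python
import matplotlib.pyplot as plt

# Only the required statistics (plus a label), taken from rows A and B of the table.
stats = [
    {"label": "A", "whislo": 8, "q1": 26, "med": 56, "q3": 96, "whishi": 116},
    {"label": "B", "whislo": 7, "q1": 21, "med": 53, "q3": 96, "whishi": 122},
]

fig, ax = plt.subplots(figsize=(5, 5))
# bxp returns a dictionary of the artists it created, grouped by kind.
artists = ax.bxp(stats, showfliers=False)
print(sorted(artists))
```

Calling plt.show() afterwards displays the figure as in the example above.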
Customizing the boxplot
To this simple boxplot, we can add other details from the data, such as:
- show the confidence intervals of the median
- show mean values
- colour the boxplot
The following code shows these examples in different plots.
fig, ax = plt.subplots(2, 2, figsize=(10, 10))

# Top left: default settings
bx1 = ax[0, 0].bxp(dataset)
ax[0, 0].set_title('default')

# Top right: notches drawn from the cilo and cihi values
bx2 = ax[0, 1].bxp(dataset,
                   shownotches=True)
ax[0, 1].set_title('with median CI notches')

# Bottom left: notches plus a marker for each mean
bx3 = ax[1, 0].bxp(dataset,
                   shownotches=True,
                   showmeans=True)
ax[1, 0].set_title('with means shown')

# Bottom right: filled boxes (patch_artist=True is needed for facecolor)
bx4 = ax[1, 1].bxp(dataset,
                   shownotches=True,
                   showmeans=True,
                   patch_artist=True,
                   boxprops={'facecolor': '#f7bd98'})
ax[1, 1].set_title('with coloured boxes')

plt.show()
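The dictionary returned by bxp can also be used after plotting, for instance to give each box its own colour. The sketch below assumes patch_artist=True, so that each entry in the 'boxes' list is a filled patch. The colour values are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt

# Required statistics for the four boxes, taken from the table.
stats = [
    {"label": "A", "whislo": 8, "q1": 26, "med": 56, "q3": 96, "whishi": 116},
    {"label": "B", "whislo": 7, "q1": 21, "med": 53, "q3": 96, "whishi": 122},
    {"label": "C", "whislo": 10, "q1": 22, "med": 57, "q3": 101, "whishi": 120},
    {"label": "D", "whislo": 9, "q1": 26, "med": 54, "q3": 98, "whishi": 116},
]
colours = ["#f7bd98", "#98c1f7", "#a8e6a1", "#e6a1d8"]  # arbitrary choices

fig, ax = plt.subplots(figsize=(5, 5))
bx = ax.bxp(stats, showfliers=False, patch_artist=True)

# Pair each box patch with a colour and fill it.
for box, colour in zip(bx["boxes"], colours):
    box.set_facecolor(colour)
```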
In addition to this, we can customize the box, whisker, and median
line properties using the boxprops, whiskerprops, and medianprops
arguments respectively. But in this post, I will leave it here.
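For readers who want a starting point, here is a minimal sketch of those three arguments used together. Each one takes a dictionary of ordinary Matplotlib line or patch properties. The specific colours, line style, and line width below are arbitrary choices, not recommendations.

```python
import matplotlib.pyplot as plt

# Required statistics for two boxes, taken from rows A and B of the table.
stats = [
    {"label": "A", "whislo": 8, "q1": 26, "med": 56, "q3": 96, "whishi": 116},
    {"label": "B", "whislo": 7, "q1": 21, "med": 53, "q3": 96, "whishi": 122},
]

fig, ax = plt.subplots(figsize=(5, 5))
bx = ax.bxp(
    stats,
    showfliers=False,          # the dictionaries above have no 'fliers' key
    patch_artist=True,         # needed so that boxes accept a facecolor
    boxprops={"facecolor": "#f7bd98", "edgecolor": "black"},
    whiskerprops={"linestyle": "--", "color": "gray"},
    medianprops={"color": "black", "linewidth": 2},
)
```

Calling plt.show() afterwards displays the styled plot.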