How to plot product concentrations in different strains using python?
The most common type of graph that we, as experimental biologists, make is the bar graph. We typically plot one when we want to compare:
- the amount of a product secreted by different cells or under different conditions
- the enzyme activity in different cells or under different conditions

or, more generally, the value of an observation under different conditions. When the experiments have replicates, we plot the mean and the standard deviation of the observations.
Excel is perhaps the quickest way to draw a single such graph. However, if you want to make similar graphs for several observations, or plot two or more such graphs in one figure as subplots, Python may be a better choice. Otherwise you end up spending time arranging each graph on a PowerPoint slide or in Inkscape to make a collage.
Here we will see how to plot this kind of graph using Python. We will use the numpy, pandas and matplotlib packages to do this.
We will take as an example a set of observations giving the concentration (g/l) of a product at the end of an experiment in four different strains of bacteria. For each strain we have the observation in triplicate.
Import python packages
The first thing we do is import the three packages into the workspace.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
Read data
We will read the data using the pandas package. The data is stored as an Excel file (data1.xlsx) in a folder called “data” relative to our current working directory.
data = pd.read_excel('data/data1.xlsx')
data
| | Strain | Rep1 | Rep2 | Rep3 |
|---|---|---|---|---|
| 0 | A | 12.4 | 11.8 | 12.7 |
| 1 | B | 8.2 | 8.5 | 7.6 |
| 2 | C | 18.5 | 17.2 | 17.6 |
| 3 | D | 14.1 | 14.8 | 15.3 |
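If you do not have the Excel file at hand, the same table can be built directly in Python. The following is a minimal sketch that types in the values shown in the table above. It is a stand-in for pd.read_excel, not part of the original workflow. Note also that pandas needs the openpyxl package installed to read .xlsx files.

```python
import pandas as pd

# Same values as in the table above, typed in by hand so that the
# example runs without the Excel file.
data = pd.DataFrame({
    "Strain": ["A", "B", "C", "D"],
    "Rep1": [12.4, 8.2, 18.5, 14.1],
    "Rep2": [11.8, 8.5, 17.2, 14.8],
    "Rep3": [12.7, 7.6, 17.6, 15.3],
})

print(data)
```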
Above, we see that the first column holds the names of the strains and the columns ‘Rep1’, ‘Rep2’, and ‘Rep3’ hold the observations from the three replicates respectively.
The data we read from the Excel file above is stored as a pandas dataframe object. In Python, every value or piece of data is an object, and each object has a ‘type’. Depending on its type, an object has several attributes. Attributes that are functions associated with the object are called methods.
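To make this concrete, the sketch below checks the type of a small stand-in dataframe (df, a name chosen only for this example), reads a data attribute (shape, written without parentheses) and calls a method (head(), a function attached to the object).

```python
import pandas as pd

# A tiny stand-in dataframe (first two strains from the table above).
df = pd.DataFrame({"Strain": ["A", "B"], "Rep1": [12.4, 8.2]})

print(type(df))    # the type of the object: a pandas DataFrame
print(df.shape)    # a data attribute: (number of rows, number of columns)
print(df.head(1))  # a method: a function attached to the object, called with ()
```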
Calculation of mean and standard deviation
A pandas dataframe object has the methods mean() and std(), with which we can calculate the means and standard deviations of our replicate observations. The following statements show how to do this. The calculated values are first stored in variables and then added as columns of the dataframe, i.e. the variable data we created by reading the Excel file. The text written within triple quotes is a comment for you to read. It has no effect when the code runs.
''' axis=1 means calculate means of rows,
# numeric_only=True means that only numeric values will be taken into account
# i.e. the 'Strain' column would be ignored here while doing the calculations
'''
mean = data.mean(axis=1, numeric_only=True)
stdev = data.std(axis=1, numeric_only=True)
data['Mean'] = mean
data['stdev'] = stdev
data
| | Strain | Rep1 | Rep2 | Rep3 | Mean | stdev |
|---|---|---|---|---|---|---|
| 0 | A | 12.4 | 11.8 | 12.7 | 12.300000 | 0.458258 |
| 1 | B | 8.2 | 8.5 | 7.6 | 8.100000 | 0.458258 |
| 2 | C | 18.5 | 17.2 | 17.6 | 17.766667 | 0.665833 |
| 3 | D | 14.1 | 14.8 | 15.3 | 14.733333 | 0.602771 |
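One detail worth knowing here: the pandas std() method computes the sample standard deviation by default (it divides by n − 1, i.e. ddof=1), whereas numpy's std divides by n unless told otherwise. The sketch below recomputes the numbers with numpy to illustrate the difference. The values are typed in from the table above, and the variable names are chosen only for this example.

```python
import numpy as np
import pandas as pd

# Replicate values typed in from the table above.
data = pd.DataFrame({
    "Strain": ["A", "B", "C", "D"],
    "Rep1": [12.4, 8.2, 18.5, 14.1],
    "Rep2": [11.8, 8.5, 17.2, 14.8],
    "Rep3": [12.7, 7.6, 17.6, 15.3],
})

# Plain numpy array with one row per strain and one column per replicate.
reps = data[["Rep1", "Rep2", "Rep3"]].to_numpy()

mean_np = reps.mean(axis=1)
sd_sample = reps.std(axis=1, ddof=1)  # divides by n - 1, like the pandas default
sd_population = reps.std(axis=1)      # numpy default (ddof=0), divides by n

print(np.round(mean_np, 6))
print(np.round(sd_sample, 6))
print(np.round(sd_population, 6))
```

With only three replicates the two definitions differ noticeably, so it helps to state in the figure legend which one the error bars show.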
Plotting the bar graph
Now that we have calculated the means and standard deviations of the observations, we will plot them. We will make a bar plot in which each bar represents one strain of bacteria. The height of the bar represents the mean value of the product concentration. The error bars represent the standard deviations.
plt.bar(data.Strain, data.Mean, alpha=0.5)
plt.ylabel('product concentration (g/l)')
plt.title('Product formation in various strains')
plt.errorbar(data.Strain, data.Mean, yerr=data.stdev, fmt='.', capsize=10)
<ErrorbarContainer object of 3 artists>
The plt.bar function has the following syntax:
plt.bar(x, y)
Here x holds the bar labels (the strain names) and y holds the bar heights (the means). The additional argument that we have used is alpha=0.5. It makes the bars 50 percent transparent so that the part of each error bar that lies inside the bar remains visible.
To add the error bar we used:
plt.errorbar(data.Strain, data.Mean, yerr=data.stdev, fmt='.', capsize=10)
Here the syntax is:
plt.errorbar(x, y, yerr)
where yerr is the size of the error bars, in our case the standard deviations.
The argument fmt='.' puts a small dot at the mid-point of each error bar, i.e. at the mean value.
The argument capsize=10 adds caps (whiskers) to the ends of the error bars and sets their length to 10 points.
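As an alternative, the bar function also accepts yerr and capsize directly, so bars and error bars can be drawn in a single call. The sketch below wraps this in a small helper (plot_strain_bars, a hypothetical name) that draws on a given Axes, which makes it easier to place several such graphs side by side as subplots, as mentioned in the introduction. The second panel reuses the same data and overlays the individual replicates. The values are typed in from the table above, and the backend line is an assumption made so that the sketch also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Replicate values typed in from the table above.
data = pd.DataFrame({
    "Strain": ["A", "B", "C", "D"],
    "Rep1": [12.4, 8.2, 18.5, 14.1],
    "Rep2": [11.8, 8.5, 17.2, 14.8],
    "Rep3": [12.7, 7.6, 17.6, 15.3],
})
rep_cols = ["Rep1", "Rep2", "Rep3"]
data["Mean"] = data[rep_cols].mean(axis=1)
data["stdev"] = data[rep_cols].std(axis=1)


def plot_strain_bars(ax, df, title):
    """Draw mean bars with standard-deviation error bars on the given Axes."""
    bars = ax.bar(df["Strain"], df["Mean"], yerr=df["stdev"],
                  capsize=10, alpha=0.5)
    ax.set_ylabel("product concentration (g/l)")
    ax.set_title(title)
    return bars


# One figure with two panels side by side, sharing the y axis.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4), sharey=True)

bars1 = plot_strain_bars(ax1, data, "Mean and SD")

bars2 = plot_strain_bars(ax2, data, "Mean and SD with replicates")
for col in rep_cols:
    # Overlay each replicate as a black dot on top of the bars.
    ax2.scatter(data["Strain"], data[col], color="black", s=15, zorder=3)

fig.tight_layout()
# Call plt.show() in an interactive session, or fig.savefig(...) to write a file.
```

Passing the Axes into the helper, rather than calling plt.bar inside it, is what lets the same code fill any panel of a larger figure.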