How to plot product concentrations in different strains using python?

The most common type of graph that we, as experimental biologists, make is the bar graph. We typically plot a bar graph when we want to compare the value of an observation across different conditions, for example:

- the amount of a product secreted under different conditions or by different cells
- the enzyme activity in different conditions or cells

When an experiment has replicates, we plot the mean of the replicates as the bar and show the standard deviation as an error bar.

Excel is perhaps the quickest way to draw a single such graph. But if you want to make similar graphs for several observations, or plot two or more such graphs in one figure as subplots, Python may be a better choice, unless you want to spend time arranging each graph on a PowerPoint slide or in Inkscape to make a collage.

Here we will see how to plot this kind of graph using Python. We will use the numpy, pandas and matplotlib packages to do this.

We will take an example of observations depicting the concentration (g/l) of a product at the end of experiment in four different strains of bacteria. For each strain we have the observation in triplicate.

Import python packages

The first thing we do is import the three packages into the workspace.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Read data

We will read the data using the pandas package. The data is stored as an Excel file (data1.xlsx) in a folder called “data”, located inside our current working directory.

data = pd.read_excel('data/data1.xlsx')
data

Strain Rep1 Rep2 Rep3
0 A 12.4 11.8 12.7
1 B 8.2 8.5 7.6
2 C 18.5 17.2 17.6
3 D 14.1 14.8 15.3

Above, we see that the first column holds the names of the strains and the columns ‘Rep1’, ‘Rep2’, and ‘Rep3’ hold the observations from the three replicates respectively. The unnamed numbers on the far left (0 to 3) are the row index that pandas adds automatically.
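If you do not have the Excel file at hand, the same table can be typed in directly. This is only a minimal sketch for following along: it rebuilds the dataframe from the values printed above instead of reading data/data1.xlsx. Note also that reading .xlsx files with pandas normally needs the openpyxl package to be installed.

```python
import pandas as pd

# The same values as in the table above, entered by hand
# so that no Excel file is needed to follow the tutorial.
data = pd.DataFrame({
    'Strain': ['A', 'B', 'C', 'D'],
    'Rep1': [12.4, 8.2, 18.5, 14.1],
    'Rep2': [11.8, 8.5, 17.2, 14.8],
    'Rep3': [12.7, 7.6, 17.6, 15.3],
})

print(data)
```

Everything that follows works the same way whether data comes from the Excel file or from this hand-typed dataframe.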

The data we read from the Excel file above is stored as a pandas dataframe object. In Python, every value or piece of data is an object, and every object has a ‘type’. Depending on its type, an object comes with a set of attributes. Some attributes simply hold information about the object, while others are functions associated with the object. The latter are called methods and are called with parentheses.
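To make the idea of types, attributes and methods concrete, here is a small sketch using the dataframe from this example (rebuilt by hand here so that the snippet runs on its own). It checks the type of the object, reads two plain attributes, and calls one method.

```python
import pandas as pd

# Hand-typed copy of the example data so the snippet is self-contained
data = pd.DataFrame({
    'Strain': ['A', 'B', 'C', 'D'],
    'Rep1': [12.4, 8.2, 18.5, 14.1],
    'Rep2': [11.8, 8.5, 17.2, 14.8],
    'Rep3': [12.7, 7.6, 17.6, 15.3],
})

print(type(data))      # the type of the object: a pandas DataFrame
print(data.shape)      # plain attribute: (number of rows, number of columns)
print(data.columns)    # plain attribute: the column names
print(data.head(2))    # method: a function of the object, called with parentheses
```

Plain attributes such as shape are read without parentheses, whereas methods such as head(), mean() and std() need parentheses and can take arguments.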

Calculation of mean and standard deviation

The pandas dataframe object has the methods mean() and std(), which we can use to calculate the means and standard deviations of our replicate observations. The following statements show how to do this. The calculated values are first stored in variables and then added as new columns of the dataframe, i.e. the variable data we created by reading the Excel file. The text written within triple quotes is a comment for you to read. It has no effect on the calculation.

''' axis=1 means calculate means of rows, 
# numeric_only=True means that only numeric values will be taken into account
# i.e. the 'Strain' column would be ignored here while doing the calculations
'''
mean = data.mean(axis=1, numeric_only=True) 
stdev = data.std(axis=1, numeric_only=True)
data['Mean'] = mean
data['stdev'] = stdev
data

Strain Rep1 Rep2 Rep3 Mean stdev
0 A 12.4 11.8 12.7 12.300000 0.458258
1 B 8.2 8.5 7.6 8.100000 0.458258
2 C 18.5 17.2 17.6 17.766667 0.665833
3 D 14.1 14.8 15.3 14.733333 0.602771
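One detail worth knowing is which standard deviation is being calculated. The std() method of pandas returns the sample standard deviation (it divides by n-1, i.e. ddof=1) by default, whereas np.std() from numpy divides by n (ddof=0) unless told otherwise. The sketch below compares the two on the same data. The dataframe is rebuilt by hand, and the replicate columns are selected by name, which is an alternative to numeric_only=True.

```python
import numpy as np
import pandas as pd

# Hand-typed copy of the example data so the snippet is self-contained
data = pd.DataFrame({
    'Strain': ['A', 'B', 'C', 'D'],
    'Rep1': [12.4, 8.2, 18.5, 14.1],
    'Rep2': [11.8, 8.5, 17.2, 14.8],
    'Rep3': [12.7, 7.6, 17.6, 15.3],
})

# Select the replicate columns explicitly instead of relying on numeric_only=True
reps = data[['Rep1', 'Rep2', 'Rep3']]

sample_sd = reps.std(axis=1)                               # pandas default: ddof=1
population_sd = np.std(reps.to_numpy(), axis=1)            # numpy default: ddof=0
numpy_sample_sd = np.std(reps.to_numpy(), axis=1, ddof=1)  # matches pandas

print(pd.DataFrame({
    'Strain': data['Strain'],
    'pandas std (ddof=1)': sample_sd,
    'numpy std (ddof=0)': population_sd,
    'numpy std (ddof=1)': numpy_sample_sd,
}))
```

With only three replicates the difference between the two is noticeable, so it helps to state in the figure legend which one the error bars show.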

Plotting the bar graph

Now that we have calculated the means and standard deviations of the observations, we will plot them. We will make a bar plot in which each bar represents one strain of bacteria. The height of the bar represents the mean product concentration, and the error bars represent the standard deviations.

plt.bar(data.Strain, data.Mean, alpha=0.5)
plt.ylabel('product concentration (g/l)')
plt.title('Product formation in various strains')
plt.errorbar(data.Strain, data.Mean, yerr=data.stdev, fmt='.', capsize=10)
<ErrorbarContainer object of 3 artists>

The 'plt.bar' function has the following syntax:

plt.bar(x,y)

Here x holds the positions or labels of the bars (the strain names) and y holds their heights (the mean values). The additional argument we used is alpha=0.5. It makes the bars 50 percent transparent so that the part of each error bar that lies inside the bar remains visible.

To add the error bar we used:

plt.errorbar(data.Strain, data.Mean, yerr=data.stdev, fmt='.', capsize=10)

Here the syntax was:

plt.errorbar(x, y, stdev)

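The argument fmt='.' draws a small dot at the mid-point of each error bar, i.e. at the mean value, and no line connecting the points.

The argument capsize=10 adds caps (the short horizontal whiskers) at the ends of the error bars and sets their length to 10 points.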
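Putting the steps together, the following is one possible sketch of a two-panel figure made with subplots, as mentioned at the beginning. It is not the only way to do it: here the error bars are drawn by the bar function itself through its yerr and capsize arguments instead of a separate plt.errorbar call, and the second panel overlays the individual replicate values on the bars. The dataframe is rebuilt by hand, and the panel titles and the output file name strain_comparison.png are made up for this illustration.

```python
import matplotlib
matplotlib.use('Agg')  # draw straight to a file, no window needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hand-typed copy of the example data so the script is self-contained
data = pd.DataFrame({
    'Strain': ['A', 'B', 'C', 'D'],
    'Rep1': [12.4, 8.2, 18.5, 14.1],
    'Rep2': [11.8, 8.5, 17.2, 14.8],
    'Rep3': [12.7, 7.6, 17.6, 15.3],
})
rep_cols = ['Rep1', 'Rep2', 'Rep3']
data['Mean'] = data[rep_cols].mean(axis=1)
data['stdev'] = data[rep_cols].std(axis=1)

x = np.arange(len(data))  # numeric bar positions 0, 1, 2, 3

# One figure with two panels side by side, sharing the y axis
fig, axes = plt.subplots(1, 2, figsize=(8, 3.5), sharey=True)

# Left panel: bars with error bars drawn by bar() itself via yerr
axes[0].bar(x, data['Mean'], yerr=data['stdev'], capsize=10, alpha=0.5)
axes[0].set_title('Mean and standard deviation')
axes[0].set_ylabel('product concentration (g/l)')

# Right panel: the same bars with the individual replicates as points
axes[1].bar(x, data['Mean'], alpha=0.5)
for col in rep_cols:
    axes[1].scatter(x, data[col], color='black', s=12, zorder=3)
axes[1].set_title('Mean with individual replicates')

# Label the bars with the strain names on both panels
for ax in axes:
    ax.set_xticks(x)
    ax.set_xticklabels(data['Strain'])

fig.tight_layout()
fig.savefig('strain_comparison.png', dpi=150)  # hypothetical file name
```

In a Jupyter notebook the line matplotlib.use('Agg') can be left out so that the figure is displayed inline as usual.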
