Plotting a growth curve using Python
Data plotting can be easily done in Excel. Excel is an easy and efficient tool for calculations and for plotting biological data, and most people, including me, prefer it. With Excel, however, the data have to be plotted and every customization repeated for each new data set. When the same kind of plot has to be made for many similar data sets, a programming language is more efficient. Once a template script for a plot is ready, it can plot any number of data sets in a few seconds. Here we will see how to make a simple line plot, taking as an example the growth profile (i.e. time vs O.D.) of a cell culture. The readings are from three experiments, and the O.D. was measured every hour from 0 to 6 hours.
First, we import the packages needed for plotting the data. The matplotlib package is used for plotting, and pandas is used for reading the data from the Excel sheet. pandas also has functions that plot data through matplotlib, but here we will use matplotlib directly. Below, the pyplot module of matplotlib is imported under the name "plt", so anywhere in the program, plt means pyplot. Similarly, the pandas package is imported as "pd". You can import a package or module under any name you like, but by convention pyplot and pandas are imported as "plt" and "pd" respectively. The commands for importing these packages are as follows.
from matplotlib import pyplot as plt
import pandas as pd
Now we need to read our data from the Excel sheet. I have saved the data in an Excel file named "growth_profile.xlsx". A screenshot of the Excel file is shown below.
The following code reads the Excel sheet. The read_excel function reads the data from the sheet and converts it into a pandas data frame, which we name "readings". The data frame looks similar to the Excel sheet: the data are arranged in columns, and each column is named after the label in the first row of the sheet. Reading ".xlsx" files with pandas requires the openpyxl package to be installed.
readings = pd.read_excel('growth_profile.xlsx')
print(readings)
| | Time | rep1 | rep2 | rep3 |
|---|---|---|---|---|
| 0 | 0 | 0.027 | 0.031 | 0.032 |
| 1 | 1 | 0.063 | 0.059 | 0.057 |
| 2 | 2 | 0.125 | 0.131 | 0.133 |
| 3 | 3 | 0.246 | 0.254 | 0.255 |
| 4 | 4 | 0.512 | 0.502 | 0.498 |
| 5 | 5 | 1.121 | 1.136 | 1.034 |
| 6 | 6 | 1.873 | 1.759 | 1.985 |
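If you want to follow along without the Excel file, one option is to build the same data frame directly in pandas from the values printed above. This is a minimal sketch. The column names `Time`, `rep1`, `rep2` and `rep3` and the values are taken from the table. Building the frame by hand is only a stand-in for `pd.read_excel`.

```python
import pandas as pd

# Same values as in growth_profile.xlsx: time in hours, O.D. at 600 nm for three replicates.
readings = pd.DataFrame({
    'Time': [0, 1, 2, 3, 4, 5, 6],
    'rep1': [0.027, 0.063, 0.125, 0.246, 0.512, 1.121, 1.873],
    'rep2': [0.031, 0.059, 0.131, 0.254, 0.502, 1.136, 1.759],
    'rep3': [0.032, 0.057, 0.133, 0.255, 0.498, 1.034, 1.985],
})
print(readings)
```

The rest of the code works the same way on this data frame as on the one returned by `read_excel`.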
To plot the data, we need to separate the data for the x-axis (the time) from the data for the y-axis (the O.D.s). Here we select the first column of "readings", "Time", and store it as "x_data".
Similarly, we select all the other columns, which hold the O.D.s, and name them "y_data".
x_data = readings['Time']                  # select the time column by its name
print(x_data)
y_data = readings[readings.columns[1:]]    # select every column after the first (the replicates)
print(y_data)
| | rep1 | rep2 | rep3 |
|---|---|---|---|
| 0 | 0.027 | 0.031 | 0.032 |
| 1 | 0.063 | 0.059 | 0.057 |
| 2 | 0.125 | 0.131 | 0.133 |
| 3 | 0.246 | 0.254 | 0.255 |
| 4 | 0.512 | 0.502 | 0.498 |
| 5 | 1.121 | 1.136 | 1.034 |
| 6 | 1.873 | 1.759 | 1.985 |
Note that x_data was selected using the column name, whereas y_data was selected using the positions (indices) of the columns: readings.columns[1:] takes every column label from position 1 onwards. In Python, indexing starts at 0 (zero), so the first column has index 0 and the second column has index 1.
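The same two selections can also be written with `iloc`, which selects purely by position, or with an explicit list of column names. The sketch below rebuilds a shortened copy of `readings` (the first three time points) so it runs on its own. It checks that all the approaches pick out the same data.

```python
import pandas as pd

# Shortened copy of the readings data frame (first three time points only).
readings = pd.DataFrame({
    'Time': [0, 1, 2],
    'rep1': [0.027, 0.063, 0.125],
    'rep2': [0.031, 0.059, 0.131],
    'rep3': [0.032, 0.057, 0.133],
})

# By position: column 0 is 'Time', columns 1 onwards are the replicates.
x_by_position = readings.iloc[:, 0]
y_by_position = readings.iloc[:, 1:]

# By name: list the replicate columns explicitly.
y_by_name = readings[['rep1', 'rep2', 'rep3']]

# The selection used in the text, for comparison.
y_from_labels = readings[readings.columns[1:]]

print(x_by_position.equals(readings['Time']))  # True
print(y_by_position.equals(y_by_name))         # True
print(y_from_labels.equals(y_by_name))         # True
```

Selecting by name is more robust if columns are later reordered. Selecting by position is handy when the number of replicates changes from one data set to the next.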
To make the plot, we use the pyplot module, which we imported earlier as "plt". We then customize the plot by adding a title and axis labels.
plt.plot(x_data, y_data)
plt.title('Growth curve', fontsize=16)
plt.xlabel('Time (h)', fontsize=14)
plt.ylabel('O.D. 600nm', fontsize=14)
plt.show()
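`plt.plot(x_data, y_data)` draws all three replicates, but the plot has no legend saying which line is which. One way to add a legend is to plot each column in a loop with `label=` set to the column name and then call `plt.legend()`. This is a sketch, not the author's code, and the helper name `plot_replicates` is hypothetical.

```python
import pandas as pd
from matplotlib import pyplot as plt


def plot_replicates(x_data, y_data):
    """Hypothetical helper: one line per replicate column, labelled by column name."""
    plt.figure()                                   # start a fresh figure
    for column in y_data.columns:
        plt.plot(x_data, y_data[column], marker='o', label=column)
    plt.title('Growth curve', fontsize=16)
    plt.xlabel('Time (h)', fontsize=14)
    plt.ylabel('O.D. 600nm', fontsize=14)
    plt.legend()                                   # uses the label= of each line
    return plt.gca()


readings = pd.DataFrame({
    'Time': [0, 1, 2, 3, 4, 5, 6],
    'rep1': [0.027, 0.063, 0.125, 0.246, 0.512, 1.121, 1.873],
    'rep2': [0.031, 0.059, 0.131, 0.254, 0.502, 1.136, 1.759],
    'rep3': [0.032, 0.057, 0.133, 0.255, 0.498, 1.034, 1.985],
})
ax = plot_replicates(readings['Time'], readings[readings.columns[1:]])
plt.show()
```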
The plot above shows the growth curve of each experiment separately. For a report, however, we usually plot the mean and standard deviation of the independent experiments. We will now calculate the mean and standard deviation of the O.D.s and store them as new columns in the "readings" data frame. The argument axis=1 computes across the replicate columns, giving one value per row (i.e. per time point).
readings['mean'] = y_data.mean(axis=1)
readings['std'] = y_data.std(axis=1)
print(readings)
| | Time | rep1 | rep2 | rep3 | mean | std |
|---|---|---|---|---|---|---|
| 0 | 0 | 0.027 | 0.031 | 0.032 | 0.030000 | 0.002646 |
| 1 | 1 | 0.063 | 0.059 | 0.057 | 0.059667 | 0.003055 |
| 2 | 2 | 0.125 | 0.131 | 0.133 | 0.129667 | 0.004163 |
| 3 | 3 | 0.246 | 0.254 | 0.255 | 0.251667 | 0.004933 |
| 4 | 4 | 0.512 | 0.502 | 0.498 | 0.504000 | 0.007211 |
| 5 | 5 | 1.121 | 1.136 | 1.034 | 1.097000 | 0.055073 |
| 6 | 6 | 1.873 | 1.759 | 1.985 | 1.872333 | 0.113001 |
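`std` in pandas computes the sample standard deviation by default (`ddof=1`, i.e. dividing by n - 1), whereas NumPy's `np.std` defaults to the population formula (`ddof=0`, dividing by n). The quick check below uses the 0 h readings. It reproduces the 0.002646 shown in the table, so the error bars plotted below are sample standard deviations.

```python
import numpy as np
import pandas as pd

# O.D. readings of the three replicates at 0 h (first row of the table).
od_0h = pd.Series([0.027, 0.031, 0.032])

sample_sd = od_0h.std()                     # pandas default: ddof=1 (divide by n - 1)
population_sd = np.std(od_0h.to_numpy())    # NumPy default: ddof=0 (divide by n)

print(f"{sample_sd:.6f}")       # 0.002646
print(f"{population_sd:.6f}")   # 0.002160
print(np.isclose(sample_sd, np.std(od_0h.to_numpy(), ddof=1)))  # True
```

With only three replicates the two formulas differ noticeably, so it is worth stating in a report which one was used.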
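Finally, we plot the means with the standard deviations as error bars. In plt.errorbar, yerr sets the size of the error bars, fmt='-o' draws a line with circular markers, and capsize sets the length of the caps at the ends of the error bars. The figure should also be saved so that it can be shared with others. Here we save it as "growth_curve.png" at 200 dpi. plt.savefig is called before plt.show(), because in some environments the figure is cleared once the plot window is closed.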
plt.errorbar(readings['Time'], readings['mean'],
             yerr=readings['std'],
             fmt='-o',
             capsize=5)
plt.title('Growth curve', fontsize=16)
plt.xlabel('Time (h)', fontsize=14)
plt.ylabel('O.D. 600nm', fontsize=14)
plt.savefig('growth_curve.png', dpi=200)
plt.show()
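Note that the file names used here are relative paths. Python therefore looks for "growth_profile.xlsx" in the current working directory and saves the ".png" file there. The simplest arrangement is to keep the Excel file and the Python script in the same folder and run the script from that folder.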
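The main advantage over Excel mentioned at the start is reuse: once the plotting code works, it can be applied to any number of data sets. Below is a minimal sketch of such a template. It assumes every workbook has the same layout, a `Time` column followed by replicate columns. The function `plot_growth_curve` and the naming scheme (`name.xlsx` to `name.png`) are hypothetical. Reading `.xlsx` files needs openpyxl installed.

```python
from pathlib import Path

import pandas as pd
from matplotlib import pyplot as plt


def plot_growth_curve(readings, output_png):
    """Hypothetical template: plot mean +/- SD of the replicate columns and save it."""
    y_data = readings[readings.columns[1:]]   # every column after 'Time'
    mean = y_data.mean(axis=1)
    std = y_data.std(axis=1)                  # sample SD (ddof=1), as in the text

    fig, ax = plt.subplots()                  # a separate figure for each data set
    ax.errorbar(readings['Time'], mean, yerr=std, fmt='-o', capsize=5)
    ax.set_title('Growth curve', fontsize=16)
    ax.set_xlabel('Time (h)', fontsize=14)
    ax.set_ylabel('O.D. 600nm', fontsize=14)
    fig.savefig(output_png, dpi=200)
    plt.close(fig)                            # free memory when looping over many files
    return mean, std


if __name__ == '__main__':
    # Plot every Excel workbook in the current folder, e.g. growth_profile.xlsx -> growth_profile.png
    for excel_file in sorted(Path('.').glob('*.xlsx')):
        data = pd.read_excel(excel_file)
        plot_growth_curve(data, excel_file.with_suffix('.png'))
```

Creating a new figure with `plt.subplots()` and closing it after saving keeps the plots from different files from being drawn on top of each other.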