Posts

Showing posts with the label pandas

Fill missing values using SimpleImputer

Data often contain missing values. Sometimes it makes sense to fill them with an appropriate value; for example, we may want to fill a missing value with the mean of the available values. We can do this by calculating the mean of the column and using the fillna() function. However, if several columns have missing values, we have to repeat this process for each column or write a loop. Scikit-learn offers a class called SimpleImputer to fill such missing values easily.
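As a minimal sketch of the idea, the snippet below fills the gaps in two columns with their column means in one call. The dataframe and its column names (height, weight) are made up for illustration and are not taken from the post.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical data: two numeric columns, each with one missing value
df = pd.DataFrame({
    "height": [1.0, np.nan, 3.0],
    "weight": [10.0, 20.0, np.nan],
})

# strategy="mean" replaces each NaN with the mean of its own column
imputer = SimpleImputer(strategy="mean")

# fit_transform returns a NumPy array, so wrap it back into a dataframe
filled = pd.DataFrame(
    imputer.fit_transform(df), columns=df.columns, index=df.index
)
print(filled)
```

Other strategies such as "median" or "most_frequent" can be passed the same way if the mean is not an appropriate fill value.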

Divide numerical data into categories

Sometimes we need to sort numerical values into categories. For example, the incomes of a town's population might need to be categorized into different income groups. Or the marks of students might need to be categorized into different grade levels. Pandas' cut() function can be used to categorize numerical values very easily.
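A small sketch of the marks example could look like this. The marks, bin edges and grade labels are assumptions chosen only for illustration.

```python
import pandas as pd

# Hypothetical marks of four students
marks = pd.Series([35, 58, 72, 91])

# Assumed bin edges and labels; by default each bin includes its right edge,
# so the bins are (0, 40], (40, 60], (60, 80] and (80, 100]
bins = [0, 40, 60, 80, 100]
labels = ["D", "C", "B", "A"]

grades = pd.cut(marks, bins=bins, labels=labels)
print(grades)
```

Note that with the default settings a value equal to the lowest edge (here 0) falls outside every bin; passing include_lowest=True to cut() includes it in the first bin.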

How to plot product concentrations in different strains using Python?

The most common type of graph that we, as experimental biologists, make is the bar graph. We typically plot a bar graph when we want to compare the value of an observation under different conditions, for example the amount of a product secreted by different cells or under different conditions, or the enzyme activity in different conditions or cells. With replicate experiments, we also plot the mean and standard deviation. Excel is perhaps the quickest way to draw a single such graph, but if you want to make similar graphs for several observations, or plot two or more such graphs in one figure as subplots, Python may be a better choice, unless you want to spend time arranging each graph on a PowerPoint slide or in Inkscape to make a collage. Here we will see how to plot these kinds of graphs using Python. We will use the numpy, pandas and matplotlib packages to do this. We will take an example of observations depicting the concentration (g/l) ...
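The general approach can be sketched as follows: compute the mean and standard deviation across replicates with pandas, then draw a bar graph with error bars in matplotlib. The strain names and concentration values are invented for illustration and are not the post's data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical replicate measurements (g/l): one column per strain, one row per replicate
data = pd.DataFrame({
    "strain_A": [1.0, 1.2, 1.1],
    "strain_B": [2.0, 2.4, 2.2],
    "strain_C": [0.5, 0.7, 0.6],
})

means = data.mean()  # mean of the replicates for each strain
stds = data.std()    # sample standard deviation (ddof=1) for each strain

fig, ax = plt.subplots()
# yerr draws the standard deviation as error bars on top of each bar
ax.bar(list(means.index), means.values, yerr=stds.values, capsize=4)
ax.set_xlabel("Strain")
ax.set_ylabel("Concentration (g/l)")
fig.tight_layout()
```

To view or keep the figure, call plt.show() in an interactive session or fig.savefig() with a file name of your choice. For several observations, plt.subplots() can create a grid of axes, and the same bar() call can be repeated on each one.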

How to convert categorical text data into numerical data using OneHotEncoder

Machine learning algorithms handle numerical data better than text data. A dataset can contain categorical data in text form, such as gender, food_type, taxonomic_class, etc. To better utilize the power of machine learning algorithms, we have to convert the categorical text data into numerical form. This can be done using encoders. Scikit-learn has a few types of encoders that convert categorical data into either binary or numerical data. Here we will learn about OneHotEncoder in scikit-learn. OneHotEncoder converts categorical data into binary data: each category in a dataframe column becomes a separate column whose value is 1 in the rows where that category is present and 0 elsewhere. For example, if the gender in row number 12 of a dataset is 'male', then the column that OneHotEncoder creates for the 'male' category will have 1 in row number 12. We will see an example of how to encod...
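A minimal sketch of the gender example is shown here. The four-row dataframe is made up for illustration, and the gender_ prefix on the new column names is just a naming choice for this sketch.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical dataset with a single categorical text column
df = pd.DataFrame({"gender": ["male", "female", "female", "male"]})

encoder = OneHotEncoder()
# The encoder expects 2-D input, hence df[["gender"]]; the default output is a
# sparse matrix, which toarray() converts to a dense NumPy array
encoded = encoder.fit_transform(df[["gender"]]).toarray()

# categories_ holds the categories found in each input column, in sorted order
columns = [f"gender_{category}" for category in encoder.categories_[0]]
encoded_df = pd.DataFrame(encoded, columns=columns, index=df.index)
print(encoded_df)
```

Each row has a 1 in exactly one of the new columns, the one matching its original category.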