

How to convert categorical text data into numerical data using OneHotEncoder

Most machine learning algorithms, including the estimators in scikit-learn, work with numerical input rather than text. A dataset can contain categorical data in text form, such as gender, food_type, taxonomic_class, etc. To make full use of machine learning algorithms, we have to convert this categorical text data into numerical form. This can be done using encoders. scikit-learn provides a few types of encoders that convert categorical data into either binary or numerical data. Here we will learn about OneHotEncoder in scikit-learn.

OneHotEncoder converts categorical data into binary data: each category in a dataframe column becomes a separate column, whose value is 1 in the rows where that category is present and 0 elsewhere. For example, if the gender in row 12 of a dataset is 'male', then the 'male' column created by OneHotEncoder will have a 1 in row 12. We will see an example of how to encod...