Posts

Showing posts with the label machine learning

Why is accuracy not a good metric for scoring classification models?

  The accuracy of a machine learning model trained to classify data into discrete categories is the proportion of samples the model classifies correctly. For example, in data that contains two categories, if the model correctly predicts 45 out of 50 samples, then the accuracy of the model is 90%.
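The 45-out-of-50 example can be reproduced with a few lines of Python. This is only a minimal sketch: the `accuracy` helper and the label lists are made up for illustration and are not taken from the post.

```python
def accuracy(y_true, y_pred):
    """Proportion of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Hypothetical data: 50 samples in two categories (0 and 1).
y_true = [0] * 25 + [1] * 25
# The model gets all 25 samples of category 0 right and 20 of category 1 right.
y_pred = [0] * 25 + [1] * 20 + [0] * 5

acc = accuracy(y_true, y_pred)
print(f"{acc:.0%}")  # 90%
```

scikit-learn offers the same calculation as `sklearn.metrics.accuracy_score(y_true, y_pred)`.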

How to convert categorical text data into numerical data using OneHotEncoder

 Most machine learning algorithms work with numerical data and cannot use raw text directly. A dataset can contain categorical data in text form, such as gender, food_type, taxonomic_class, etc. To make use of such columns, we have to convert the categorical data from text form into numerical form. This can be done using encoders. There are a few types of encoders in scikit-learn that convert the categorical data into either binary or numerical data. Here we will learn about OneHotEncoder in scikit-learn. OneHotEncoder converts categorical data into binary data: each category in a dataframe column becomes a separate column whose value is 1 in the rows where that category is present and 0 elsewhere. For example, if the gender in row number 12 of a dataset is 'male', then the column created by OneHotEncoder for the 'male' category will have 1 in row number 12. We will see an example how to encod...
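As a rough illustration of the idea described in this excerpt, the sketch below one-hot encodes a small gender column. The data is hypothetical and is not the example from the full post. A NumPy array stands in for the dataframe column.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical gender column. OneHotEncoder expects 2D input
# with shape (n_samples, n_features), hence the nested lists.
gender = np.array([["male"], ["female"], ["female"], ["male"]])

encoder = OneHotEncoder()
# fit_transform returns a sparse matrix by default; toarray() makes it dense.
encoded = encoder.fit_transform(gender).toarray()

# Categories are sorted, so column 0 is 'female' and column 1 is 'male'.
print(encoder.categories_[0])
print(encoded)
```

Each row has a single 1, in the column of the category that appears in that row.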

Setting up a simple GridSearchCV in Scikit-learn

 In most machine learning projects we need to train the model with different parameter values to get the best results out of it. We also need to divide the training data into training and validation sets, so that each parameter setting is scored on data the model was not trained on. This helps improve the model while guarding against over-fitting. GridSearchCV in the scikit-learn package provides both functionalities: it tries every combination in a parameter grid and creates the training and validation sets automatically through cross-validation, so we do not need to code them ourselves. We will see here how to set up a simple GridSearchCV using the k-nearest neighbors algorithm.
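A minimal sketch of such a setup follows. It assumes the excerpt's algorithm is scikit-learn's `KNeighborsClassifier`. The iris dataset, the parameter grid, and the 5-fold setting are illustrative choices, not values from the post.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Illustrative dataset; any feature matrix X and label vector y would do.
X, y = load_iris(return_X_y=True)

# Hypothetical grid: 3 values of n_neighbors x 2 weighting schemes = 6 candidates.
param_grid = {
    "n_neighbors": [3, 5, 7],
    "weights": ["uniform", "distance"],
}

# cv=5 splits the data into 5 folds; each candidate is trained on 4 folds
# and validated on the remaining one, rotating through all 5.
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print(search.best_params_)  # parameter combination with the best mean validation score
print(search.best_score_)   # mean cross-validated accuracy of that combination
```

With the default `refit=True`, the best combination is retrained on all of the data and is available as `search.best_estimator_`.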