Category: Machine Learning

Posted 2017-12-04Updated 2023-07-02Machine Learning14 minutes read (About 2089 words)

Data cleaning is a very important part of any data science project as data scientist spend 80% of their time is this step of the project. But not very much attentions is given to the cleaning process and not much research efforts are put to create any sort of framework recently I came across an amazing paper titled as Tidy data by Hadley Wickham in Journal of Statistical Software in which he talks about common problems one might encounter in data cleaning and what a Tidy data looks like I couldn’t agree more to him, he has also created a R package reshape and reshape2 for data cleaning, but the problem was the paper had very little to no code I also found the code version of the paper but it was in R, while most of my data cleaning work is done in pandas, I had to translate all those R solutions to pandas equivalent, so in this post the I will summarize all the main idea of the paper that the author suggests in the paper and also how we can solve it in pandas.

Posted 2017-10-14Updated 2023-07-02Machine Learning10 minutes read (About 1491 words)

Handling categorical features with python

As a data scientist, you may very frequently encounter categorical variable in your dataset like location, car model, gender, etc. You cannot directly use them in our machine learning algorithm as these algorithms only understand numbers. There are various techniques to convert these categorical features to numerical features but that is not the focus of this post, this post is about how to implement these techniques in python. I will talk a little bit about these techniques but won’t go into too much depth, I will emphasise more on various ways how you can implement this technique in python.

Posted 2017-09-25Updated 2023-07-02Machine Learning18 minutes read (About 2774 words)

Implementing K-NearestNeighbour algorithm from scratch in python

K-Nearest Neighbour is the simplest of machine learning algorithms which can be very effective in some cases. The objective of the post it to implement it from scratch in python, you need to know a fair bit of python for and a little bit of numpy for the faster version of the algorithm. Once we have implemented the algorithm we will also see how to improve the performance of the algorithm. As there is no single invincible algorithm, we will look into advantage/disadvantage of the algorithm, this will help us to decide on when to use the algorithm. Alright, then let’s get straight into it.

Categories

Tags

Recents

Archives

Your browser is out-of-date!