Learning Python

robobcb$ python3 classifier_challenge.py; 

After watching a YouTube video on machine learning by Siraj Raval, I was inspired. Seeing how 10 lines of Python could put a machine learning library to work was enough to convince me that the C and Java I learned in uni were not enough. Time to learn another language.

In this video, "Intro to Deep Learning #1", Siraj lays out his code for using the tree classifier from scikit-learn to predict someone's gender from their height, weight, and shoe size. Whether or not these measurements actually correlate with gender is irrelevant, as this is about how different algorithms learn from a given data set.

In the video, I was challenged to take the code he started and move it to the next level by adding 3 more classifiers and then comparing their accuracies. If implemented correctly, the code should print only the most accurate prediction and which classifier made it.

I used three libraries to complete this challenge: numpy, sklearn, and textwrap. Numpy and sklearn were required, while I used textwrap just to format the output to my liking. Since this is my first time programming in Python, I tried many different ways to accomplish these tasks and most likely used some less-than-ideal ones.

X and Y are the given data sets we are training these algorithms on, as shown in the four lines below them. Then we see new_data, which is the height, weight, and shoe size of a new person to be classified by our code. I then created an array which held the predictions of all four classifiers. I suppose I could have just used a list here, but I had just run through an online course on basic Python syntax that was partial to numpy arrays, so that was my first instinct.

As noted earlier, a list would work in place of an array here, and that's what I did with the precision scores. I held all four values in a list, found the index of the highest value, then used that index to pull the prediction from the prediction array. All that was left was to pull out the classifier's name and print my results.
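The flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the original script: the measurements are made up, the three extra classifiers (KNeighborsClassifier, SVC, GaussianNB) are just one possible choice, and scoring against the training rows with accuracy_score is assumed because there is no separate test set.

```python
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Made-up [height_cm, weight_kg, shoe_size] rows (assumption, not the video's data).
X = [[181, 80, 44], [177, 70, 43], [160, 60, 38], [154, 54, 37],
     [166, 65, 40], [190, 90, 47], [175, 64, 39], [159, 55, 37]]
Y = ["male", "male", "female", "female", "male", "male", "female", "female"]

# The new person to classify.
new_data = [[190, 70, 43]]

# Four classifiers; which three join the decision tree is an assumption.
classifiers = [
    DecisionTreeClassifier(random_state=0),
    KNeighborsClassifier(n_neighbors=3),
    SVC(),
    GaussianNB(),
]

names, predictions, scores = [], [], []
for clf in classifiers:
    clf.fit(X, Y)
    names.append(type(clf).__name__)
    predictions.append(clf.predict(new_data)[0])
    # Score each classifier on all training rows, not on the single new sample.
    scores.append(accuracy_score(Y, clf.predict(X)))

# Index of the highest score, reused to look up the name and prediction.
best = scores.index(max(scores))
print(f"{names[best]} predicts {predictions[best]} "
      f"(training accuracy {scores[best]:.2f})")
```

Plain lists are enough here because the only operations are appending and looking up by index. Note that `scores.index(max(scores))` returns the first classifier with the top score, so ties always go to whichever classifier comes earlier in the list.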

As you can see from the terminal output, the decision tree classifier made the most accurate prediction: male! Of course I can't stop here, though. What is 2nd? 3rd? 4th? Did I pick good algorithms for this? Fortunately, scikit-learn provides a handy flowchart to help new peeps like me choose. (Handy Flow Chart) Using a simple for loop, I can see the outputs are as follows...

I can quickly see that they all predicted the same output: male. However, given that the precision scores are a binary 1 or 0, I am inclined to think that I used sklearn.metrics incorrectly or picked the wrong metric. After swapping the precision score for the accuracy score, I get the same result: 1, 0, 1, 0.
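One possible explanation, offered as a guess since the scoring code is not shown here: a score computed from a single prediction can only ever be 0 or 1. The sketch below uses a hand-written `accuracy` helper (a hypothetical stand-in for the idea behind sklearn's accuracy_score, which by default also returns the fraction of matching labels) and made-up labels to show the difference between scoring one sample and scoring several.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)

# With a single sample the score is all-or-nothing.
single_hit = accuracy(["male"], ["male"])
single_miss = accuracy(["male"], ["female"])

# With several samples (made-up labels) the score can land in between.
y_true = ["male", "female", "female", "male"]
y_pred = ["male", "female", "male", "male"]
many = accuracy(y_true, y_pred)  # 3 of 4 labels match

print(single_hit, single_miss, many)  # prints: 1.0 0.0 0.75
```

If that is what is happening, scoring each classifier against a set of labeled rows rather than one sample should give scores that are easier to rank.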

Below will be the GitHub link for my code when it is uploaded. So let me know what you think of my response to the classifier challenge. Is my scoring method accurate? How would you rate the different classifiers? Leave a comment with any observations or improvements for my code!



  1. GitHub link to my repository as I follow through the machine learning tutorials.


