In the previous article, we explored a simple definition of Machine Learning. Since last week, I have started taking the Machine Learning Specialization on Coursera. I now have a better understanding of this topic and I would like to share it with you.
Revisiting our Machine Learning Definition
Revisiting our discussion last week, we found that Machine Learning is the process of training software to make predictions by running data through an algorithm. One vital point I missed last time is the relationship between the data and the model. The accuracy of a model depends on the quality and quantity of the data it is trained on. Naturally, the more correct the data is, the more correct the model’s predictions will tend to be. Additionally, a model trained on more data has more data points to use as a basis for its predictions, which generally leads to a more accurate and fine-tuned model.
Types of Machine Learning
We have thoroughly discussed what machine learning is, but we can further dissect this topic by diving into its different types. There are two major types of machine learning: supervised learning and unsupervised learning. Supervised learning, the more common of the two, trains the model with data points that contain both an input and an output. This means that each point already has an expected value associated with it. You can use this expected value to verify whether your model is producing correct predictions. In supervised learning, you want to find the relationship between the input data and its output. On the other hand, data in unsupervised learning does not contain an expected output. You are only given the data, without the relationship that it describes. In this type of machine learning, you instead let the machine come up with its own conclusions about the data. The machine analyzes the data and finds patterns that would normally be difficult to see with the human eye.
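To make the difference concrete, here is a small sketch of how the two kinds of data might look in Python. The lot areas, prices, and function names are all made up for illustration and do not come from the course. The point is only that labeled data lets you check a prediction against an expected value, while unlabeled data gives you nothing to check against.

```python
# Supervised: every data point pairs an input with its expected output.
# (lot area in square metres, price) -- hypothetical values
labeled_data = [
    (50, 150_000),
    (80, 240_000),
    (120, 360_000),
]

# Unsupervised: the same inputs, but with no expected output attached.
unlabeled_data = [50, 80, 120]


def mean_absolute_error(predict, data):
    """Average gap between a model's predictions and the expected outputs."""
    return sum(abs(predict(x) - y) for x, y in data) / len(data)


def good_guess(area):
    """A hypothetical model that prices every square metre at 3,000."""
    return 3_000 * area


def poor_guess(area):
    """A hypothetical model that ignores the input entirely."""
    return 200_000


# Only the labeled data lets us score the two candidate models.
print(mean_absolute_error(good_guess, labeled_data))
print(mean_absolute_error(poor_guess, labeled_data))
```

With unlabeled_data there is no second value in each point, so a function like mean_absolute_error has nothing to compare against. That is why unsupervised learning has to look for structure in the inputs themselves.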
To help understand the difference between these two types, I would like to continue last week’s analogy of machine learning being similar to human learning. From a human’s point of view, I would compare supervised learning to formal education and unsupervised learning to being self-taught. Supervised learning is like formal education because you are given feedback on your answers so you can see whether you properly understand the lessons. Every exam, for example, has an answer key with the correct answers. With each exam, you are graded against that key, and this allows you to adjust your understanding of the lesson based on what you got wrong. There is no need for you to figure things out on your own because the educational system guides you through it. On the other hand, unsupervised learning is like self-study because you are not spoon-fed answers as you are in formal education. You study the material on your own and have to build your own conclusions and insights from it. You have no guideline to follow and have to find your own way to make sense of the data.
Subtypes of Machine Learning
Beneath these two overarching types of machine learning are also various subtypes. Let us quickly go through these to improve our understanding.
Supervised learning has two subtypes, namely regression and classification. We can infer from its name that classification deals with sorting data into relevant classes. Each input is labeled with the class it belongs to, and this labeled data is what the model is trained on. The model then answers the question, “Given the input, what class would this be categorized under?” Regression, on the other hand, aims to map the relationship between the input data and a numeric output. Each input is labeled with a specific output value, and our goal is to find a line or curve that models the relationship across all points. A regression model usually answers the question, “Given the input, how much of the output do I get?” Unlike classification, which chooses from a limited set of classes, regression predicts a continuous value, so its output can be any number in a range. To clarify the difference between the two, here are some sample questions that can be answered by each type. Classification would be able to answer the question, “If an animal has a long wagging tail, is it a dog or a cat?” Regression, on the other hand, would be able to answer the question, “Given the lot area of a house, how much would it cost?”
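Here is a rough sketch of both subtypes using only plain Python. I chose a least-squares line for the regression part and a nearest-neighbour lookup for the classification part because they are short to write, not because they are the only options. The house prices, the tail lengths, and the function names fit_line and nearest_label are all invented for this example.

```python
def fit_line(points):
    """Least-squares fit of y = slope * x + intercept to (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in points)
        / sum((x - mean_x) ** 2 for x, _ in points)
    )
    return slope, mean_y - slope * mean_x


def nearest_label(examples, value):
    """1-nearest-neighbour classification: copy the label of the closest example."""
    closest = min(examples, key=lambda example: abs(example[0] - value))
    return closest[1]


# Regression: lot area (square metres) -> price. The answer is a number.
houses = [(50, 150_000), (80, 240_000), (120, 360_000)]  # hypothetical data
slope, intercept = fit_line(houses)
predicted_price = slope * 100 + intercept
print(round(predicted_price))

# Classification: tail length (cm) -> animal. The answer is one of a fixed set of classes.
animals = [(25, "cat"), (28, "cat"), (35, "dog"), (40, "dog")]  # hypothetical data
predicted_animal = nearest_label(animals, 38)
print(predicted_animal)
```

Notice that the regression model can return any number, including prices it never saw during training, while the classifier can only ever answer “cat” or “dog”.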
Unsupervised learning has three subtypes under its belt: clustering, anomaly detection and dimensionality reduction. Clustering groups similar data points together to reveal structure in the data. Anomaly detection tries to identify unusual data points (outliers) that do not fit the pattern of the rest of the data set. Dimensionality reduction reduces the number of features (dimensions) used to describe each data point while keeping as much of the overall structure and pattern of the data as possible.
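The three subtypes can each be sketched in a few lines of plain Python. These are deliberately tiny stand-ins that I wrote for illustration: a two-group version of k-means for clustering, a distance-from-the-mean rule for anomaly detection, and a filter that drops features that never change for dimensionality reduction. Real projects would more likely reach for a library and for techniques such as PCA. All data and function names below are hypothetical.

```python
from statistics import mean, pstdev, pvariance


def two_means(values, rounds=10):
    """Tiny 1-D k-means with k=2. Assumes at least two distinct values."""
    low, high = min(values), max(values)  # initial cluster centres
    for _ in range(rounds):
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        low, high = mean(group_low), mean(group_high)
    return group_low, group_high


def find_outliers(values, threshold=2.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    centre, spread = mean(values), pstdev(values)
    return [v for v in values if abs(v - centre) > threshold * spread]


def drop_flat_features(rows, min_variance=0.01):
    """Crude dimensionality reduction: drop features that barely vary."""
    columns = list(zip(*rows))
    keep = [i for i, col in enumerate(columns) if pvariance(col) > min_variance]
    return [tuple(row[i] for i in keep) for row in rows]


# Clustering: no labels are given, yet two groups emerge from the values.
heights = [150, 152, 155, 180, 183, 185]
print(two_means(heights))

# Anomaly detection: one reading is far from all the others.
readings = [10, 11, 9, 10, 12, 10, 11, 50]
print(find_outliers(readings))

# Dimensionality reduction: the middle feature is always 1, so it carries
# no information. Every row is kept, but each row gets shorter.
rows = [(150, 1, 50), (160, 1, 58), (170, 1, 66)]
print(drop_flat_features(rows))
```

In the last example the number of data points stays the same. Only the number of features per point goes down, which is the key idea behind dimensionality reduction.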
Conclusion
This week, we discussed the different types of machine learning and delved into their definitions and differences. The next section of the Machine Learning specialization course is going to focus mainly on regression. I will share the knowledge that I gain from studying as soon as I finish that section. Until then, see you next time!