The 5 Steps of Machine Learning
Machine learning is the ability of a system to learn from data without being explicitly programmed. In this article I’m going to share the approach I use when working on machine learning projects and hackathons.
There are 5 main steps involved in machine learning. To explain them I am going to use a very basic example. Let’s say we want to build a system that identifies whether a flower is a setosa, a versicolor, or a virginica. The system we build is called a model, and the process by which this model is created is called training.
1. Data Collection and Preparation
The first step of machine learning is gathering data. This is extremely important because the amount and quality of the data you collect will have a huge impact on the results. In our case, the data we collect will be the sepal length, sepal width, petal length, and petal width of each flower, along with its type. This will be our training data.
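As a rough illustration, the classic Iris dataset that ships with scikit-learn contains exactly these four measurements for the three flower types. The sketch below assumes scikit-learn is installed and uses its `load_iris` helper in place of collecting the measurements by hand; the variable names are only illustrative.

```python
from sklearn.datasets import load_iris

# Assumption: we use the bundled Iris dataset instead of our own measurements.
iris = load_iris()
X = iris.data    # one row per flower: sepal length, sepal width, petal length, petal width (cm)
y = iris.target  # one integer label per flower: 0, 1, or 2

print(iris.feature_names)       # names of the four measured features
print(list(iris.target_names))  # names of the three flower types
print(X.shape)                  # (number of flowers, number of features)
```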
After this data is collected, we have to prepare it for use in training. In practice we would want far more data than a handful of sample measurements, because more examples generally make the system more accurate.
Once we have collected all the data, we need to randomize its order, because we don’t want the order of the data to affect the training. We also need to make sure there is a roughly equal number of data points for each flower type. If one type has far more data points than the others, the model will tend to predict that type almost all the time.
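The shuffling and balance check described above could look something like this. It is a minimal sketch that again assumes the bundled Iris dataset; `random_state=42` is an arbitrary seed chosen only to make the shuffle repeatable.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.utils import shuffle

iris = load_iris()
X, y = iris.data, iris.target

# Shuffle features and labels together so every row keeps its own label.
X_shuffled, y_shuffled = shuffle(X, y, random_state=42)

# Count how many examples each flower type has to check the balance.
counts = np.bincount(y_shuffled)
for name, count in zip(iris.target_names, counts):
    print(name, count)
```

If the counts were very uneven, one option would be to collect more examples of the rarer flower types before training.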
We also need to divide the data into 2 parts. The first part, the majority of the data, will be used for training, and the second part will be used for evaluating the model’s performance. We don’t want to use the same data for both, because the model has already seen the training data and could simply memorize it, which would tell us nothing about how it handles new flowers.
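One common way to make this split is scikit-learn's `train_test_split`. The sketch below assumes an 80/20 split, which is a typical choice rather than a rule, and uses `stratify=y` so both parts keep the same mix of flower types.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target

# Hold out 20% of the flowers for evaluation; the split shuffles the data by default.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(len(X_train), "training examples")
print(len(X_test), "evaluation examples")
```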
2. Type of Model
The next step is to choose a model. scikit-learn offers many of them, for example logistic regression, other linear models, decision trees, and many more.
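To make this concrete, here is a small sketch that creates a few candidate classifiers. The particular models and settings are just examples, not a recommendation. Every scikit-learn classifier exposes the same `fit` and `predict` methods, so swapping one model for another later is easy.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# A few candidate models; the settings here are illustrative assumptions.
candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=42),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
}

for name, model in candidates.items():
    print(name, "->", type(model).__name__)
```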
3. Training
Next comes training, the part that makes up most of machine learning. We use our training data to improve the model’s ability to predict whether the flower it is shown is a setosa, a versicolor, or a virginica.
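In scikit-learn, training comes down to a single call to `fit`. The sketch below assumes logistic regression as the chosen model and an 80/20 split of the bundled Iris data; any other classifier could be dropped in the same way.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, stratify=iris.target, random_state=42
)

# Train the model using only the training portion of the data.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# After fitting, the model knows which classes it can predict.
print(model.classes_)
```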
4. Evaluation and Parameter Tuning
Once the training is complete, we need to see if the model actually works. This is where we use the second part of the data that we set aside. Evaluation lets us test the model against data it has never seen before, and how it performs there gives an estimate of how it will perform in the real world.
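A simple way to evaluate is to compare the model's predictions on the held-out flowers with their true labels. This sketch assumes accuracy (the fraction of correct predictions) as the metric, which suits a balanced dataset like this one.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, stratify=iris.target, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict on flowers the model has never seen, then measure how often it is right.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy on held-out data: {accuracy:.2f}")
```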
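Once the evaluation is done, we need to see if we can still improve the model. We can do this by tuning its parameters, such as how complex the model is allowed to be or how strongly it is regularized. Tuning should rely on the training portion only, so that the evaluation data stays unseen and remains a fair test. Once we are happy with the settings, we can train the model on the full dataset so that it can fine-tune its predictions. For some models it also helps to show the training data multiple times, because this can make the predictions more accurate.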
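Parameter tuning can be sketched with scikit-learn's `GridSearchCV`, which tries each setting using cross-validation on the training portion only. The parameter grid below (values of the regularization strength `C`) is an assumption chosen for illustration; other models have different parameters to tune.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, stratify=iris.target, random_state=42
)

# Try several regularization strengths; the candidate values are illustrative.
param_grid = {"C": [0.01, 0.1, 1, 10]}
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print(f"Cross-validated accuracy: {search.best_score_:.2f}")

# The held-out data is touched only once, for the final check.
print(f"Held-out accuracy: {search.score(X_test, y_test):.2f}")
```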
5. Prediction
We are finally at the last step of machine learning. Prediction is the step where we actually get to answer the question. We can now use our model to predict whether a flower is a setosa, a versicolor, or a virginica, given its sepal and petal measurements.
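As a final sketch, here is how a trained model could classify one new flower. The measurements in `new_flower` are made up for illustration, and the model here is trained on the full bundled Iris dataset, assuming evaluation and tuning are already done.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()

# Train the final model on all available data.
model = LogisticRegression(max_iter=1000)
model.fit(iris.data, iris.target)

# Hypothetical new flower: sepal length, sepal width, petal length, petal width (cm).
new_flower = [[5.1, 3.5, 1.4, 0.2]]

# predict returns an integer label, which we map back to the flower name.
predicted_label = model.predict(new_flower)[0]
predicted_name = iris.target_names[predicted_label]
print("Predicted flower type:", predicted_name)
```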
Thank you for reading :).
Please let me know if you have any feedback.