The Machine Learning Step-by-Step Workflow

Farshid Cheraghchian

Well, here’s the recipe for machine learning! Seriously, it’s not always as easy as this, but most projects follow these steps:

1- What is the task?

First of all, you need to ask: What is the purpose here?
What am I predicting or classifying?
Is it classification, regression, segmentation, clustering, or something else?
What output am I supposed to have?

2- Data Collection

Gather a good amount of data that is relevant, diverse, and representative of real-world scenarios for the task.
You can use data sources like:

  • Databases
  • APIs
  • Web scraping
  • CSV files
  • Self-collected data
  • Open datasets (Kaggle, UCI, etc.)

3- Data Preprocessing

Most of the time, your data is messy, and you need to:

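  • Clean (fill or remove missing values)
  • Normalize (rescale numeric features to a common range)
  • Convert (e.g., encode categories as numbers)
  • Format (consistent types, units, and text casing)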
  • Remove duplicates

before using it for training.
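
As a rough illustration of these preprocessing steps, here is a minimal sketch using pandas. The tiny fruit table, its column names (`weight_g`, `size_cm`, `label`), and the choice of median filling and min-max scaling are all assumptions made for this example, not the only way to do it.

```python
import pandas as pd

# Hypothetical messy data: a missing weight, a duplicated row,
# and labels written with inconsistent casing.
raw = pd.DataFrame({
    "weight_g": [150.0, 130.0, None, 120.0, 130.0],
    "size_cm": [7.0, 6.0, 6.5, 5.5, 6.0],
    "label": ["Apple", "orange", "apple", "ORANGE", "orange"],
})

clean = raw.copy()

# Format: make the label text consistent.
clean["label"] = clean["label"].str.lower()

# Clean: fill the missing weight with the column median.
clean["weight_g"] = clean["weight_g"].fillna(clean["weight_g"].median())

# Remove duplicates.
clean = clean.drop_duplicates().reset_index(drop=True)

# Convert: turn text labels into integer codes.
clean["label_id"] = clean["label"].map({"apple": 0, "orange": 1})

# Normalize: min-max scale the numeric columns to the range [0, 1].
for col in ["weight_g", "size_cm"]:
    lo, hi = clean[col].min(), clean[col].max()
    clean[col] = (clean[col] - lo) / (hi - lo)

print(clean)
```

In a real project, the fill value and the scaling range should be computed from the training set only and then reused for the validation and test sets, so no information leaks from them.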

4- Split the Dataset

To estimate real-world performance and to detect overfitting, you need to split your dataset into:

  • Training set (e.g., 60%) to teach the model.
  • Validation set (e.g., 20%) to adjust and tune hyperparameters.
  • Test set (e.g., 20%) to evaluate final results and performance.
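
One common way to get a 60/20/20 split is to call scikit-learn's `train_test_split` twice. The sketch below uses made-up data (100 dummy samples) just to show the mechanics; the `random_state` value is arbitrary.

```python
from sklearn.model_selection import train_test_split

# Dummy data for illustration: 100 samples with two features each.
X = [[i, i % 7] for i in range(100)]
y = [i % 2 for i in range(100)]

# First split: 60% training, 40% held back.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, random_state=42, stratify=y
)

# Second split: divide the held-back 40% into validation and test halves.
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, random_state=42, stratify=y_tmp
)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```

Passing `stratify` keeps the class proportions similar in every part, which usually matters when one class is rare.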

5- Model Selection

It is crucial to choose an appropriate model (algorithm) based on your problem.
For example:

  • Classification: Logistic Regression, Decision Tree, Random Forest, SVM, Neural Networks
  • Regression: Linear Regression, Ridge/Lasso Regression
  • Unsupervised: K-Means, PCA
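
When several models look reasonable, one practical approach is to try a few and compare them on the same data. The sketch below does this for three of the classifiers listed above. It assumes the Iris dataset bundled with scikit-learn as a stand-in for your own data, and the settings (`max_iter`, `n_estimators`, `random_state`) are arbitrary choices for the example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Score every candidate with 5-fold cross-validation (mean accuracy).
mean_scores = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    mean_scores[name] = scores.mean()
    print(f"{name}: {mean_scores[name]:.3f}")

best_name = max(mean_scores, key=mean_scores.get)
print("Best candidate:", best_name)
```

A small difference in score is rarely decisive on its own; training time, interpretability, and ease of deployment are also worth weighing.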

6- Model Training

The actual learning starts in this step. The model is fitted to the training set: it adjusts its internal parameters (for example, weights) to decrease its prediction error, in many models by using optimization techniques like Gradient Descent.
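
To make the idea of "adjusting weights to decrease error" concrete, here is a minimal sketch of gradient descent written in plain Python. It fits a straight line to toy data generated from y = 2x + 1; the data, the learning rate, and the number of steps are assumptions picked for this example.

```python
# Toy training data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

# Model: prediction = w * x + b, starting from zero.
w, b = 0.0, 0.0
learning_rate = 0.05
n = len(xs)

for step in range(2000):
    # Gradients of the mean squared error with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n

    # Move each parameter a small step against its gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 3), round(b, 3))  # close to 2 and 1
```

Libraries such as scikit-learn hide this loop behind `model.fit(...)`, and not every model trains this way (a decision tree, for instance, is built by choosing splits rather than by gradient steps).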

7- Model Evaluation

In this step, you check how well your model generalizes by scoring it on data it was not trained on, normally the validation set. Depending on the task, you may use metrics such as accuracy, precision, recall, and F1 score for classification, or RMSE and MAE for regression.
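
The sketch below computes these metrics by hand on small made-up predictions, to show what each number means. The label lists are invented for illustration. In practice, `sklearn.metrics` offers ready-made functions such as `accuracy_score`, `precision_score`, `recall_score`, and `f1_score`.

```python
from math import sqrt

# Classification: 1 = positive class, 0 = negative class (made-up values).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)  # true positives
tn = sum(t == 0 and p == 0 for t, p in pairs)  # true negatives
fp = sum(t == 0 and p == 1 for t, p in pairs)  # false positives
fn = sum(t == 1 and p == 0 for t, p in pairs)  # false negatives

accuracy = (tp + tn) / len(y_true)   # share of all predictions that are right
precision = tp / (tp + fp)           # share of predicted positives that are right
recall = tp / (tp + fn)              # share of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, f1)  # 0.75 0.75 0.75 0.75

# Regression: errors between actual and predicted numbers (made-up values).
actual = [3.0, 5.0, 2.0]
predicted = [2.0, 5.0, 4.0]
errors = [p - a for a, p in zip(actual, predicted)]

mae = sum(abs(e) for e in errors) / len(errors)        # mean absolute error
rmse = sqrt(sum(e * e for e in errors) / len(errors))  # root mean squared error
print(mae, round(rmse, 3))  # 1.0 1.291
```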

8- Tune Hyperparameters

Hyperparameters are settings that are not learned from the data; you choose them before training. Common examples are the learning rate, the number of trees, and the maximum depth.
Common techniques for hyperparameter tuning are:

  • Grid search
  • Random search
  • Bayesian optimization
  • Cross-validation (not a search method by itself, but a way to score each candidate setting more reliably)
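
As a small illustration, the sketch below combines grid search with cross-validation using scikit-learn's `GridSearchCV`. The Iris dataset stands in for your own data, and the grid values are arbitrary assumptions for the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Keep a test set aside; the search only ever sees the training part.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Every combination in this grid is tried: 5 x 2 = 10 candidates.
param_grid = {
    "max_depth": [1, 2, 3, 4, 5],
    "min_samples_leaf": [1, 5],
}

# Each candidate is scored with 5-fold cross-validation on the training data.
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X_train, y_train)

print("Best settings:", search.best_params_)
print("Cross-validated accuracy:", round(search.best_score_, 3))

# Final check on the untouched test set, done once at the end.
test_accuracy = search.score(X_test, y_test)
print("Test accuracy:", round(test_accuracy, 3))
```

Here cross-validation plays the role of the validation set: each candidate is validated on folds of the training data, so a separate validation split is not needed for the search.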

9- Model Testing (Generalization)

Generalization is the goal of machine learning. Once the model’s performance on the validation set is satisfactory, you can evaluate it on the test set to see if it works well on unseen data; in other words, you check whether it generalizes well.
Don’t forget to keep the test set untouched until the end of the process!

10- Model Deployment

Now it’s time to use the model in the real world!
Common deployment options include:

  • REST API (using Flask or FastAPI)
  • Embedded in apps or websites
  • Serverless functions (AWS Lambda, Google Cloud Functions), etc.
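
Whichever option you pick, the core pattern is usually the same: save the trained model, load it inside the service, and wrap `predict` in a function that accepts a request and returns a response. The sketch below shows that pattern without any web framework. The `predict_handler` function and the payload keys `weight_g` and `size_cm` are hypothetical names chosen for this example; in Flask or FastAPI, a route function would play this role.

```python
import pickle

from sklearn.tree import DecisionTreeClassifier

# Train a tiny model (features: weight in grams, size in cm).
features = [[150, 7], [130, 6], [140, 6.5], [120, 5.5]]
labels = ["apple", "orange", "apple", "orange"]
model = DecisionTreeClassifier(random_state=0).fit(features, labels)

# Save the trained model as bytes; a real service would write these to a file.
model_bytes = pickle.dumps(model)

# Inside the deployed service: load the model once at startup.
loaded_model = pickle.loads(model_bytes)


def predict_handler(payload):
    """Turn a JSON-like request dict into a JSON-like response dict."""
    row = [[payload["weight_g"], payload["size_cm"]]]
    label = loaded_model.predict(row)[0]
    return {"prediction": str(label)}


print(predict_handler({"weight_g": 150, "size_cm": 7}))  # {'prediction': 'apple'}
```

Only unpickle model files that you created yourself or that come from a trusted source, since loading a pickle can run arbitrary code.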

11- Monitoring and Maintaining the Model

Even deployment is not the end of your job!
You will need to monitor:

  • Model accuracy
  • Data drift (real-world data changes over time)
  • Latency and performance

The model also needs to be maintained by periodically retraining or fine-tuning it on new data.
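
A very simple way to watch for data drift is to compare the values the model sees in production with the values it was trained on. The sketch below is a rough heuristic, not a standard method: the `drift_score` function, the sample numbers, and the threshold of 2.0 are all assumptions made for this example.

```python
from statistics import mean, stdev


def drift_score(train_values, live_values):
    """How far the live mean has moved from the training mean,
    measured in training standard deviations."""
    return abs(mean(live_values) - mean(train_values)) / stdev(train_values)


# Fruit weights (grams) seen during training.
train_weights = [150, 130, 140, 120, 145, 135]

# Two batches of production data: one similar, one clearly shifted.
live_ok = [138, 142, 131, 149]
live_shifted = [190, 205, 198, 210]

DRIFT_THRESHOLD = 2.0  # hypothetical alert level

for name, batch in [("live_ok", live_ok), ("live_shifted", live_shifted)]:
    score = drift_score(train_weights, batch)
    status = "DRIFT: consider retraining" if score > DRIFT_THRESHOLD else "ok"
    print(f"{name}: score={score:.2f} -> {status}")
```

Real monitoring setups typically track several features and the model's accuracy together, and use proper statistical tests rather than a single mean comparison.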


Simple Code Example

Let’s make a mini brain and teach it to tell apples from oranges. In this example:

  • Weight and size are our features.
  • “Apple” or “Orange” are our labels.
  • “Decision Tree Classifier” is the model we will train.
  • The label the model gives to a new fruit is our prediction!

Code part

You can use Google Colab to try this machine learning example:

  • Open the Google Colab website.
  • Click “+ New Notebook”
  • Install scikit-learn (sometimes it’s already installed in Colab)
    Run this in a cell to install it:
%pip install scikit-learn

Now copy the following code into a new cell and run it (Shift + Enter) to see the result.

# Step 1: Import the Machine Learning library
from sklearn.tree import DecisionTreeClassifier

# Step 2: Prepare the training data
# Features = [weight in grams, size in cm]
features = [
    [150, 7],  # apple
    [130, 6],  # orange
    [140, 6.5],  # apple
    [120, 5.5]   # orange
]

# Labels = the correct answers
labels = ["apple", "orange", "apple", "orange"]

# Step 3: Create the model
model = DecisionTreeClassifier()

# Step 4: Train the model
model.fit(features, labels)

# Step 5: Predict a new fruit
new_fruit = [[135, 6]]  # A fruit with 135g weight and 6cm size
prediction = model.predict(new_fruit)

print("The model predicts:", prediction[0])

Output:
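The model predicts: orange

The new fruit lands on the orange side of the split the tree learns from these four examples (135 g for weight or 6.25 cm for size), so the prediction is orange.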


Take ML Easy!

Machine learning sounds like a complicated area, but it’s not! After understanding the basics, you will find yourself on the path of learning. Remember that ML is just a way of teaching computers to think and learn from experience, just like us humans!

Future posts will continue this exciting ML journey:

  • What are supervised and unsupervised learning?
  • What are neural networks (the brain behind deep learning)?
  • And so on …

Stay tuned then!