🚀 Building an End-to-End Iris Classification Pipeline using Airflow, Flask, and Docker

In this blog, we'll walk through how to build and deploy a machine learning pipeline using the Iris dataset, orchestrated with Apache Airflow, served with a Flask API, and packaged with Docker.

๐Ÿ“ Project Folder Structure

airflow-final/
├── Airflow_Setup_Steps.txt
├── Dockerfile
├── app.py
├── iris-train-model.py
├── iris_pipeline_dag.py
├── requirements.txt
├── iris-airflow-main.zip
├── data/
│   ├── iris_train.csv
│   └── iris_test.csv
└── model/
    └── iris_model.joblib

โš™๏ธ Step-by-Step Breakdown

✅ 1. Data – data/iris_train.csv

This is the standard Iris dataset, split into a training set (iris_train.csv) used to fit the model and a held-out test set (iris_test.csv).
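The training script below only assumes a species label column plus numeric feature columns. The exact feature names are an assumption here (the common Iris convention), so the first rows of data/iris_train.csv would look something like:

sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor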

✅ 2. Training Script – iris-train-model.py

import os

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
import joblib

# Load the training split and separate features from the target label
data = pd.read_csv('data/iris_train.csv')
X = data.drop(columns=['species'])
y = data['species']

# Fit a random forest classifier on the full training set
model = RandomForestClassifier()
model.fit(X, y)

# Persist the trained model so the Flask API can load it later
os.makedirs('model', exist_ok=True)  # ensure the output directory exists
joblib.dump(model, 'model/iris_model.joblib')
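Running python3 iris-train-model.py from the project root writes model/iris_model.joblib, the artifact that both the Airflow DAG and the Flask API below depend on.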

✅ 3. Flask API – app.py

from flask import Flask, request, jsonify
import joblib
import numpy as np

app = Flask(__name__)

# Load the model trained by iris-train-model.py once at startup
model = joblib.load("model/iris_model.joblib")

@app.route('/predict', methods=['POST'])
def predict():
    # Expect a JSON body of the form {"features": [sl, sw, pl, pw]}
    data = request.get_json(force=True)
    features = np.array(data['features']).reshape(1, -1)  # single-row 2D array
    prediction = model.predict(features)
    return jsonify({'prediction': prediction.tolist()})

if __name__ == '__main__':
    # Bind to 0.0.0.0 so the API is reachable from outside the container
    app.run(host='0.0.0.0', port=5000)

✅ 4. Airflow DAG – iris_pipeline_dag.py

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator  # Airflow 2.x import path

dag = DAG(
    'iris_training_pipeline',
    start_date=datetime(2023, 1, 1),
    schedule_interval='@daily',
    catchup=False,  # skip backfilling runs between start_date and today
)

# Re-train the model every day by invoking the training script
train_model = BashOperator(
    task_id='train_model',
    bash_command='python3 /opt/airflow/iris-train-model.py',
    dag=dag,
)
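Since the project ships a held-out test split (data/iris_test.csv), a natural next step is a second task that evaluates the freshly trained model. The sketch below assumes a hypothetical iris-evaluate-model.py script; the BashOperator and the >> dependency operator are standard Airflow:

evaluate_model = BashOperator(
    task_id='evaluate_model',
    # iris-evaluate-model.py is a hypothetical script that scores the
    # trained model against data/iris_test.csv
    bash_command='python3 /opt/airflow/iris-evaluate-model.py',
    dag=dag,
)

# Run evaluation only after training has finished
train_model >> evaluate_model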

✅ 5. Dockerfile

FROM python:3.9

WORKDIR /app
COPY . .

# Install the API's dependencies inside the image
RUN pip install -r requirements.txt

# Document the port the Flask app listens on, then start it
EXPOSE 5000
CMD ["python3", "app.py"]
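Flask's built-in development server is fine for a demo like this, but for anything production-facing you would typically swap the CMD for a WSGI server such as gunicorn (which would also need to be added to requirements.txt):

CMD ["gunicorn", "-b", "0.0.0.0:5000", "app:app"]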

✅ 6. Requirements – requirements.txt

flask
scikit-learn
joblib
pandas
numpy

(numpy is listed explicitly because app.py imports it directly, even though scikit-learn would pull it in anyway.)

✅ 7. Airflow Setup – Airflow_Setup_Steps.txt

Contains the steps to set up Airflow, place the DAG in the dags folder, and run the scheduler and webserver.
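The file itself isn't reproduced here, but for a local Airflow 2.x install the sequence typically looks something like this (paths and versions are assumptions, not the exact contents of the file):

pip install apache-airflow
airflow db init
cp iris_pipeline_dag.py ~/airflow/dags/
airflow webserver --port 8080
airflow scheduler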

🧪 Testing the Flask API

docker build -t iris-api .
docker run -p 5000:5000 iris-api
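With the container running, you can exercise the /predict endpoint with a quick curl call (the feature values are just illustrative Iris measurements):

curl -X POST http://localhost:5000/predict \
  -H "Content-Type: application/json" \
  -d '{"features": [5.1, 3.5, 1.4, 0.2]}'

A healthy response looks like {"prediction": ["setosa"]}, with the exact label depending on how the species column is encoded in the training CSV.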

✅ Conclusion

This project delivers a reproducible pipeline, with scheduled training via Airflow, serving via Flask, and packaging via Docker, that extends naturally to larger problems. Future steps could include integration with S3, model versioning via MLflow, and alerting for drift detection.
