🚀 Building an End-to-End Iris Classification Pipeline using Airflow, Flask, and Docker
In this blog, we'll walk through how to build and deploy a machine learning pipeline using the Iris dataset, orchestrated with Apache Airflow, served with a Flask API, and packaged with Docker.
📁 Project Folder Structure
```text
airflow-final/
├── Airflow_Setup_Steps.txt
├── Dockerfile
├── app.py
├── iris-train-model.py
├── iris_pipeline_dag.py
├── requirements.txt
├── iris-airflow-main.zip
├── data/
│   ├── iris_train.csv
│   └── iris_test.csv
└── model/
    └── iris_model.joblib
```
⚙️ Step-by-Step Breakdown
✅ 1. Data – data/iris_train.csv
This is the training split of the standard Iris dataset; the matching hold-out split, data/iris_test.csv, is kept aside for evaluation.
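The repo ships both CSVs ready-made. If you ever need to regenerate them, here is a minimal sketch using scikit-learn's built-in copy of the dataset (the 80/20 split and the `species` column name are assumptions chosen to match the training script):

```python
# Hypothetical helper to rebuild data/iris_train.csv and data/iris_test.csv.
import os
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris(as_frame=True)
df = iris.frame.rename(columns={'target': 'species'})
# Map the numeric targets to species names so the CSVs are self-describing.
df['species'] = df['species'].map(dict(enumerate(iris.target_names)))

# Stratified 80/20 split (assumed ratio) so class balance is preserved.
train_df, test_df = train_test_split(
    df, test_size=0.2, random_state=42, stratify=df['species'])
os.makedirs('data', exist_ok=True)
train_df.to_csv('data/iris_train.csv', index=False)
test_df.to_csv('data/iris_test.csv', index=False)
```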
✅ 2. Training Script – iris-train-model.py
```python
import os
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Load the training split and separate the features from the label.
data = pd.read_csv('data/iris_train.csv')
X = data.drop(columns=['species'])
y = data['species']
# Fit a random forest with default hyperparameters.
model = RandomForestClassifier()
model.fit(X, y)
# Ensure the output directory exists, then persist the model.
os.makedirs('model', exist_ok=True)
joblib.dump(model, 'model/iris_model.joblib')
```
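With the model on disk, a quick sanity check against the hold-out split is cheap (this assumes data/iris_test.csv shares the training CSV's columns):

```python
# Score the saved model on the hold-out split.
import pandas as pd
import joblib

model = joblib.load('model/iris_model.joblib')
test = pd.read_csv('data/iris_test.csv')
X_test = test.drop(columns=['species'])
y_test = test['species']
print(f"Hold-out accuracy: {model.score(X_test, y_test):.3f}")
```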
✅ 3. Flask API – app.py
```python
from flask import Flask, request, jsonify
import joblib
import numpy as np

app = Flask(__name__)
# Load the trained model once at startup.
model = joblib.load("model/iris_model.joblib")

@app.route('/predict', methods=['POST'])
def predict():
    # Expects a JSON body like {"features": [5.1, 3.5, 1.4, 0.2]}.
    data = request.get_json(force=True)
    prediction = model.predict([np.array(data['features'])])
    return jsonify({'prediction': prediction.tolist()})

if __name__ == '__main__':
    # Bind to 0.0.0.0 so the server is reachable from outside the container.
    app.run(host='0.0.0.0', port=5000)
```
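The handler above trusts the client to send a well-formed payload; anything malformed surfaces as an unhelpful 500. A slightly more defensive version of the same route might look like this (a sketch; the four-feature check simply mirrors the Iris schema):

```python
@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json(force=True)
    features = data.get('features')
    # The Iris model expects exactly four numeric features.
    if not isinstance(features, list) or len(features) != 4:
        return jsonify({'error': 'expected "features": [f1, f2, f3, f4]'}), 400
    try:
        row = np.array(features, dtype=float)
    except (TypeError, ValueError):
        return jsonify({'error': 'features must be numeric'}), 400
    prediction = model.predict([row])
    return jsonify({'prediction': prediction.tolist()})
```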
✅ 4. Airflow DAG – iris_pipeline_dag.py
```python
from airflow import DAG
# Note: the old airflow.operators.bash_operator path is deprecated in Airflow 2.x.
from airflow.operators.bash import BashOperator
from datetime import datetime

dag = DAG(
    'iris_training_pipeline',
    start_date=datetime(2023, 1, 1),
    schedule_interval='@daily',
    catchup=False,  # don't backfill runs between start_date and today
)

train_model = BashOperator(
    task_id='train_model',
    bash_command='python3 /opt/airflow/iris-train-model.py',
    dag=dag,
)
```
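Before letting the scheduler own it, you can exercise the DAG once from the CLI (standard Airflow 2.x commands):

```bash
# Confirm the DAG file imports cleanly and is registered.
airflow dags list | grep iris_training_pipeline

# Run the task once for a given logical date, without the scheduler.
airflow tasks test iris_training_pipeline train_model 2023-01-01
```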
✅ 5. Dockerfile
```dockerfile
FROM python:3.9
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
# Document the port the Flask app listens on.
EXPOSE 5000
CMD ["python3", "app.py"]
```
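One caveat with COPY . .: it also drags iris-airflow-main.zip and any local caches into the image. A hypothetical .dockerignore keeps the build context lean:

```text
# .dockerignore (hypothetical): exclude files the API image doesn't need
iris-airflow-main.zip
__pycache__/
*.pyc
```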
✅ 6. Requirements – requirements.txt
```text
flask
scikit-learn
joblib
pandas
numpy
```
numpy is listed explicitly because app.py imports it directly, rather than relying on it arriving as a scikit-learn dependency.
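These are unpinned, so rebuilds can drift as upstream releases land. Once you have a working environment, you can lock the exact versions with:

```bash
pip freeze > requirements.txt
```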
✅ 7. Airflow Setup – Airflow_Setup_Steps.txt
Contains the steps to set up Airflow, drop the DAG into the dags folder, and run the scheduler and webserver.
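The exact commands live in that file; for a local Airflow 2.x install they typically look something like the following (AIRFLOW_HOME and the credentials are placeholder assumptions):

```bash
# Initialize the metadata database and create an admin user.
export AIRFLOW_HOME=~/airflow
airflow db init
airflow users create --username admin --password admin \
    --firstname Admin --lastname User --role Admin --email admin@example.com

# Make the DAG discoverable, then start the scheduler and webserver.
cp iris_pipeline_dag.py $AIRFLOW_HOME/dags/
airflow scheduler &
airflow webserver --port 8080
```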
🧪 Testing the Flask API
```bash
docker build -t iris-api .
docker run -p 5000:5000 iris-api
```
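With the container running, hit the endpoint with a sample measurement (the four values are an arbitrary setosa-like example):

```bash
curl -X POST http://localhost:5000/predict \
    -H "Content-Type: application/json" \
    -d '{"features": [5.1, 3.5, 1.4, 0.2]}'
# Returns something like {"prediction": ["setosa"]};
# the exact label format depends on the species column in the CSV.
```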
✅ Conclusion
This project delivers a reproducible and scalable pipeline that can easily be extended to larger problems. Future steps could include integration with S3, model versioning via MLflow, and alerting for drift detection.