Introduction to Machine Learning Models (Regression & Decision Trees)

Machine Learning (ML) enables computers to learn from data and make predictions. Two common ML techniques are Regression (used for predicting continuous values) and Decision Trees (used for classification and regression tasks).


1. Introduction to Linear Regression

Linear Regression is a supervised learning algorithm used to predict a continuous target variable based on input features.

Key Concept:

  • It assumes a linear relationship between input (X) and output (Y).
  • The goal is to find the best-fit line, represented by: Y = mX + b, where:
    • m is the slope (weight of the feature)
    • b is the intercept (constant value)
    • Y is the predicted value

Real-World Example:

Predicting house prices based on features like area, number of rooms, and location.


2. Implementing Linear Regression in Python

Using Scikit-learn

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Sample Data (House size vs. Price)
X = np.array([500, 700, 1000, 1200, 1500]).reshape(-1, 1)  # House size in sq ft
Y = np.array([100000, 140000, 200000, 230000, 300000])  # Price in USD

# Creating and training the model
model = LinearRegression()
model.fit(X, Y)

# Predicting price for a 1100 sq ft house
predicted_price = model.predict([[1100]])
print(f"Predicted Price: ${predicted_price[0]:,.2f}")

# Plotting the regression line
plt.scatter(X, Y, color='blue', label="Actual Prices")
plt.plot(X, model.predict(X), color='red', label="Regression Line")
plt.xlabel("House Size (sq ft)")
plt.ylabel("Price (USD)")
plt.legend()
plt.show()

Practice Problems

  1. Train a Linear Regression model to predict car prices based on mileage.
  2. Modify the above code to predict employee salaries based on experience.

3. Understanding Logistic Regression

Logistic Regression is used for classification problems where the output is categorical (e.g., Yes/No, Spam/Not Spam).

Key Concept:

  • Instead of predicting a continuous value, it predicts probabilities.
  • The logistic function (sigmoid function) transforms predictions between 0 and 1.
  • The decision boundary is set (e.g., if probability > 0.5, classify as 1, else classify as 0).

Real-World Example:

Predicting whether a customer will buy a product based on age and income.


4. Implementing Logistic Regression in Python

Master This Topic with PrepAI

Transform your learning with AI-powered tools designed to help you excel.

Using Scikit-learn

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Sample Data: Age & Income vs. Purchase Decision (1 = Buy, 0 = No Buy)
X = np.array([[25, 50000], [30, 60000], [35, 70000], [40, 80000], [50, 90000]])
Y = np.array([0, 0, 1, 1, 1])  # Purchase Decision

# Splitting data
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

# Training the model
log_model = LogisticRegression()
log_model.fit(X_train, Y_train)

# Predicting
Y_pred = log_model.predict(X_test)
accuracy = accuracy_score(Y_test, Y_pred)
print(f"Model Accuracy: {accuracy:.2f}")

Practice Problems

  1. Use Logistic Regression to predict whether a student passes based on study hours.
  2. Train a model to classify emails as spam or not spam using text data.

5. Introduction to Decision Trees and Their Use Cases

A Decision Tree is a flowchart-like structure where:

  • Each node represents a decision based on an attribute.
  • Each branch represents an outcome.
  • The leaves represent final predictions.

Key Advantages:

✔ Easy to understand and interpret.
✔ Works with both numerical and categorical data.
✔ Handles missing values well.

Real-World Example:

A bank uses a decision tree to determine whether to approve a loan based on credit score, income, and debt.


6. Implementing Decision Trees in Python

Using Scikit-learn

from sklearn.tree import DecisionTreeClassifier
from sklearn import tree

# Sample Data: Age & Income vs. Loan Approval (1 = Approved, 0 = Denied)
X = np.array([[25, 30000], [35, 60000], [45, 80000], [50, 100000]])
Y = np.array([0, 1, 1, 1])  # Loan Approval

# Training Decision Tree Model
dt_model = DecisionTreeClassifier()
dt_model.fit(X, Y)

# Visualizing the Tree
tree.plot_tree(dt_model, feature_names=["Age", "Income"], class_names=["Denied", "Approved"], filled=True)
plt.show()

Practice Problems

  1. Build a Decision Tree to classify customers as high-risk or low-risk for insurance.
  2. Train a Decision Tree to predict if a person will buy a car based on salary and age.

Conclusion

  • Linear Regression is used for predicting continuous values.
  • Logistic Regression is used for binary classification problems.
  • Decision Trees are flexible models used for both classification and regression.
  • Understanding these models is essential for building ML applications in Python.

By practicing these concepts, you'll develop strong ML modeling skills! 🚀