Labs must be completed individually by each student.
Lab work will be released weekly. Students will have one full week to complete each lab.
During the scheduled lab session, students are required to present their work and explain their solutions.
Each lab will be evaluated during the lab class and will contribute 50% of the overall lab component grade.
Any form of copying or academic dishonesty will be dealt with strictly.
Teaching Assistants (TAs) will be available during every lab session to assist students and answer questions.
Part 1: Perceptron
Select any classification dataset from Scikit-learn (e.g., the Iris or Breast Cancer dataset; note that the Diabetes dataset is for regression).
Implement the basic Perceptron algorithm from scratch.
Report the following:
- Final training loss after completion of training
- Confusion matrix values: TP, TN, FP, FN
- Decision boundary plot (consider any two features)
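A minimal sketch of the classic Perceptron update rule, assuming the Iris dataset restricted to its first two classes and first two features (so the boundary is plottable in 2-D). The names `w`, `b`, and the epoch budget are illustrative choices, not requirements:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
mask = iris.target < 2                       # keep two classes (linearly separable)
X = iris.data[mask][:, :2]                   # two features for a 2-D boundary plot
y = np.where(iris.target[mask] == 0, -1, 1)  # Perceptron uses {-1, +1} labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

w, b = np.zeros(X_train.shape[1]), 0.0
for epoch in range(50):
    errors = 0
    for xi, yi in zip(X_train, y_train):
        if yi * (xi @ w + b) <= 0:           # misclassified: apply the update rule
            w += yi * xi
            b += yi
            errors += 1
    if errors == 0:                          # converged on the training set
        break

pred = np.where(X_test @ w + b > 0, 1, -1)
tp = int(np.sum((pred == 1) & (y_test == 1)))
tn = int(np.sum((pred == -1) & (y_test == -1)))
fp = int(np.sum((pred == 1) & (y_test == -1)))
fn = int(np.sum((pred == -1) & (y_test == 1)))
print("TP, TN, FP, FN:", tp, tn, fp, fn)
```

The decision boundary is the line `w[0]*x1 + w[1]*x2 + b = 0`, which can be drawn with Matplotlib over a scatter of the two features.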
Part 2: Logistic Regression (From Scratch)
Implement Logistic Regression from scratch on the same dataset.
Plot: Training loss vs. epochs and Test loss vs. epochs
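One way to sketch this: batch gradient descent on the cross-entropy loss, recording training and test loss each epoch for the required plots. The two-class Iris subset, learning rate, and epoch count below are assumptions:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(X, y, w, b):
    # clip probabilities to avoid log(0)
    p = np.clip(sigmoid(X @ w + b), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

iris = load_iris()
mask = iris.target < 2
X, y = iris.data[mask], iris.target[mask]   # same two Iris classes as Part 1
X = (X - X.mean(axis=0)) / X.std(axis=0)    # standardize for stable gradients
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

w, b, lr = np.zeros(X_tr.shape[1]), 0.0, 0.1
train_losses, test_losses = [], []
for epoch in range(200):
    p = sigmoid(X_tr @ w + b)
    w -= lr * X_tr.T @ (p - y_tr) / len(y_tr)   # gradient of the cross-entropy loss
    b -= lr * np.mean(p - y_tr)
    train_losses.append(log_loss(X_tr, y_tr, w, b))
    test_losses.append(log_loss(X_te, y_te, w, b))
print("final train loss:", round(train_losses[-1], 4))
```

Plotting `train_losses` and `test_losses` against the epoch index gives the two required curves.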
Part 3: Logistic Regression (Scikit-Learn)
Use the inbuilt Logistic Regression function from Scikit-learn.
Train on the training dataset.
Evaluate on the test dataset.
Plot: Training loss, Test loss
Report: TP, TN, FP, FN
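A possible sketch using scikit-learn's built-in classifier, again assuming the two-class Iris subset; `log_loss` and `confusion_matrix` compute the requested quantities:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, log_loss
from sklearn.model_selection import train_test_split

iris = load_iris()
mask = iris.target < 2                       # same two-class subset as before
X, y = iris.data[mask], iris.target[mask]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# cross-entropy loss on train and test predictions
train_loss = log_loss(y_tr, clf.predict_proba(X_tr))
test_loss = log_loss(y_te, clf.predict_proba(X_te))

# scikit-learn's confusion matrix ravels in (tn, fp, fn, tp) order
tn, fp, fn, tp = confusion_matrix(y_te, clf.predict(X_te), labels=[0, 1]).ravel()
print(f"train loss {train_loss:.4f}, test loss {test_loss:.4f}")
print("TP, TN, FP, FN:", tp, tn, fp, fn)
```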
Part 4: Classification Using PyTorch
Use the PyTorch library to implement a classification model on any dataset.
Divide the dataset into:
Training set: 80%
Test set: 20%
Plot: Training loss, Test loss
Report: TP, TN, FP, FN
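The PyTorch part can be sketched as a single linear layer trained with `BCEWithLogitsLoss` (a numerically stable sigmoid + binary cross-entropy). The Breast Cancer dataset here stands in for "any dataset":

```python
import torch
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X = (X - X.mean(axis=0)) / X.std(axis=0)    # standardize features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)  # 80/20
X_tr = torch.tensor(X_tr, dtype=torch.float32)
X_te = torch.tensor(X_te, dtype=torch.float32)
y_tr = torch.tensor(y_tr, dtype=torch.float32)
y_te = torch.tensor(y_te, dtype=torch.float32)

model = torch.nn.Linear(X_tr.shape[1], 1)
loss_fn = torch.nn.BCEWithLogitsLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

train_losses, test_losses = [], []
for epoch in range(100):
    opt.zero_grad()
    loss = loss_fn(model(X_tr).squeeze(1), y_tr)
    loss.backward()
    opt.step()
    train_losses.append(loss.item())
    with torch.no_grad():                   # no gradients needed for evaluation
        test_losses.append(loss_fn(model(X_te).squeeze(1), y_te).item())

with torch.no_grad():
    pred = (model(X_te).squeeze(1) > 0).long()  # threshold the logit at 0
tp = int(((pred == 1) & (y_te == 1)).sum())
tn = int(((pred == 0) & (y_te == 0)).sum())
fp = int(((pred == 1) & (y_te == 0)).sum())
fn = int(((pred == 0) & (y_te == 1)).sum())
print("TP, TN, FP, FN:", tp, tn, fp, fn)
```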
Part 5: Cat vs. Non-Cat Classification
Use a Cat image dataset.
Implement Logistic Regression to classify images into:
Cat
Non-Cat
You may use PyTorch for implementation.
Split the dataset:
Training set: 80%
Test set: 20%
Plot: Training loss, Test loss
Report: TP, TN, FP, FN
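A sketch of the image pipeline only. No cat images are bundled here, so random arrays stand in for 64x64 RGB images purely to show the flatten-then-classify flow; replace them with your actual dataset and labels:

```python
import numpy as np
import torch

rng = np.random.default_rng(0)
images = rng.random((200, 64, 64, 3)).astype(np.float32)  # placeholder "images"
labels = rng.integers(0, 2, 200)                          # placeholder cat/non-cat labels

X = torch.tensor(images.reshape(len(images), -1))  # flatten each image to one vector
y = torch.tensor(labels, dtype=torch.float32)
n_train = int(0.8 * len(X))                        # 80/20 train/test split
X_tr, X_te, y_tr, y_te = X[:n_train], X[n_train:], y[:n_train], y[n_train:]

model = torch.nn.Linear(X_tr.shape[1], 1)          # logistic regression = linear + sigmoid
loss_fn = torch.nn.BCEWithLogitsLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.01)
for epoch in range(20):
    opt.zero_grad()
    loss = loss_fn(model(X_tr).squeeze(1), y_tr)
    loss.backward()
    opt.step()

with torch.no_grad():
    pred = (model(X_te).squeeze(1) > 0).long()     # 1 = cat, 0 = non-cat
```

With real images, the loss curves and TP/TN/FP/FN counts follow exactly as in Part 4.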
Part 1: Linear Regression
Implement linear regression using gradient descent from scratch, without using inbuilt functions. Use any publicly available regression dataset of your choice (for example, one downloaded from Kaggle).
For the first model, select any one input feature and train the model on it; call this model1.
Now choose more than five features and train another model; call it model2.
For both these models, plot the loss with the number of iterations.
Compare the performance of model1 and model2.
For both models, plot the final fitted line on a graph together with a sample of points from the dataset.
For model1, draw a contour plot showing the loss function with respect to parameters θ0 and θ1. Show the path followed by gradient descent during optimization.
Repeat the training procedure for different learning rates:
η = {0.001, 0.005, 0.01, 0.05}
Plot the loss curves for all learning rates on the same graph. Comment on the effect of learning rate on convergence.
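The model1 training loop and learning-rate sweep could look like the sketch below. The Diabetes dataset (with its BMI column as the single feature) stands in for a Kaggle download, and the recorded `path` of (θ0, θ1) pairs is what you would overlay on the contour plot:

```python
import numpy as np
from sklearn.datasets import load_diabetes

X_full, y = load_diabetes(return_X_y=True)
x = X_full[:, 2]                 # one feature (the BMI column) for model1
y = (y - y.mean()) / y.std()     # scale the target for stable gradients

def train(lr, epochs=200):
    t0, t1 = 0.0, 0.0            # theta_0 (intercept), theta_1 (slope)
    losses, path = [], [(t0, t1)]
    for _ in range(epochs):
        err = (t0 + t1 * x) - y
        t0 -= lr * err.mean()            # dL/dtheta_0 for the MSE/2 loss
        t1 -= lr * (err * x).mean()      # dL/dtheta_1
        losses.append(0.5 * np.mean(err ** 2))
        path.append((t0, t1))            # gradient-descent trajectory for the contour plot
    return losses, path

curves = {lr: train(lr)[0] for lr in (0.001, 0.005, 0.01, 0.05)}
for lr, losses in curves.items():
    print(f"lr={lr}: final loss {losses[-1]:.4f}")
```

Plotting all four loss lists on one set of axes shows how larger learning rates converge faster (up to the point of instability).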
Part 2: Normal Equations
Implement the Normal Equation for model1 and model2 from Part 1.
Compute the parameters using matrix operations.
Compare with Gradient Descent (from Part 1).
Compare:
Final training loss
Test loss
Learned parameters
Computational time
Create a comparison table with the following columns:
Method | Train Loss | Test Loss | Time Taken | Remarks
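A minimal sketch of the normal equation θ = (XᵀX)⁻¹Xᵀy for model1-style data, timed against a quick gradient-descent fit; the dataset, learning rate, and iteration count are illustrative assumptions:

```python
import time
import numpy as np
from sklearn.datasets import load_diabetes

X_full, y = load_diabetes(return_X_y=True)
X = np.column_stack([np.ones(len(y)), X_full[:, 2]])   # bias column + one feature

t0 = time.perf_counter()
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)  # solve() is safer than an explicit inverse
ne_time = time.perf_counter() - t0

t0 = time.perf_counter()
theta_gd = np.zeros(2)
for _ in range(5000):                          # plain batch gradient descent
    theta_gd -= 0.5 * X.T @ (X @ theta_gd - y) / len(y)
gd_time = time.perf_counter() - t0

def mse(theta):
    return np.mean((X @ theta - y) ** 2)

print(f"{'Method':<16}{'Train Loss':<14}{'Time Taken'}")
print(f"{'Normal Eq.':<16}{mse(theta_ne):<14.4f}{ne_time:.5f}s")
print(f"{'Gradient Desc.':<16}{mse(theta_gd):<14.4f}{gd_time:.5f}s")
```

The same pattern extends to model2 by stacking more feature columns into `X`.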
Part 1: Learning Curves
To study the effect of model complexity on training and validation performance, use any publicly available dataset of your choice that contains input features and corresponding outputs.
Split the dataset into a training set (70%) and a validation set (30%).
Train polynomial regression models with degree:
d = {1, 2, 3, 4, 5}
Plot the regression curve for each degree along with the data points.
For each model compute:
• Training error
• Validation error
Plot learning curves showing training error and validation error as a function of training set size.
Based on the plots, identify situations where the model exhibits:
• Underfitting
• Overfitting
Explain your observations using the bias–variance tradeoff.
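The degree sweep can be sketched with a `PolynomialFeatures` + `LinearRegression` pipeline; the Diabetes BMI feature is an illustrative stand-in for "any publicly available dataset":

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X_full, y = load_diabetes(return_X_y=True)
X = X_full[:, [2]]                                    # single feature, kept 2-D for sklearn
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)  # 70/30

errors = {}
for d in (1, 2, 3, 4, 5):
    model = make_pipeline(PolynomialFeatures(degree=d), LinearRegression())
    model.fit(X_tr, y_tr)
    errors[d] = (mean_squared_error(y_tr, model.predict(X_tr)),   # training error
                 mean_squared_error(y_va, model.predict(X_va)))   # validation error
    print(f"degree {d}: train {errors[d][0]:.1f}, val {errors[d][1]:.1f}")
```

For the learning curves, refit each model on growing prefixes of `X_tr` (e.g., 10%, 20%, ..., 100%) and plot both errors against the training-set size.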
Part 2: Regularization
To understand the role of regularization in controlling model complexity.
Implement Ridge Regression using gradient descent. Train the model for the following values of the regularization parameter:
λ = {0, 0.01, 0.1, 1, 10}
Report the final parameter values and training loss.
Plot the regression curves obtained for each value of λ along with the dataset.
Implement Lasso Regression and compare the results with Ridge regression.
Comment on how the value of λ affects:
• Model complexity
• Parameter magnitude
• Overfitting behaviour
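A hedged sketch of ridge regression by gradient descent for each λ, leaving the bias term unregularized (a common convention); the dataset, learning rate, and epoch count are assumptions:

```python
import numpy as np
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)
y = (y - y.mean()) / y.std()        # scale the target for stable gradients

def ridge_gd(lam, lr=0.1, epochs=500):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        err = X @ w + b - y
        w -= lr * (X.T @ err / len(y) + lam * w)   # data gradient + L2 penalty
        b -= lr * err.mean()                       # bias left unregularized
    loss = 0.5 * np.mean((X @ w + b - y) ** 2)
    return w, b, loss

results = {lam: ridge_gd(lam) for lam in (0, 0.01, 0.1, 1, 10)}
for lam, (w, b, loss) in results.items():
    print(f"lambda={lam}: train loss {loss:.4f}, ||w|| = {np.linalg.norm(w):.4f}")
```

Note how the parameter norm shrinks as λ grows, which is the key observation for the questions above. For Lasso, the penalty gradient `lam * w` would be replaced by the subgradient `lam * np.sign(w)`.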
Part 3: Cross-Validation
To select optimal model parameters using cross-validation.
Perform 5-fold cross-validation on the training dataset to determine the optimal value of the regularization parameter λ. Consider the following values:
λ = {10^−4, 10^−3, 10^−2, 10^−1, 1, 10}
For each value of λ, compute the average validation error across all folds.
Plot validation error vs λ using logarithmic scale on the x-axis.
Select the best value of λ and retrain the model on the entire training dataset.
Evaluate the final model on the test dataset and report the test error.
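The fold loop can be sketched with `KFold` and scikit-learn's `Ridge` (its `alpha` parameter plays the role of λ); the dataset choice is an assumption:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, train_test_split

X, y = load_diabetes(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

lambdas = [1e-4, 1e-3, 1e-2, 1e-1, 1, 10]
kf = KFold(n_splits=5, shuffle=True, random_state=0)
avg_err = {}
for lam in lambdas:
    fold_errs = []
    for tr_idx, va_idx in kf.split(X_tr):          # 5 train/validation splits
        model = Ridge(alpha=lam).fit(X_tr[tr_idx], y_tr[tr_idx])
        fold_errs.append(mean_squared_error(y_tr[va_idx], model.predict(X_tr[va_idx])))
    avg_err[lam] = float(np.mean(fold_errs))       # average validation error

best_lam = min(avg_err, key=avg_err.get)           # pick λ with lowest CV error
final = Ridge(alpha=best_lam).fit(X_tr, y_tr)      # retrain on the full training set
test_err = mean_squared_error(y_te, final.predict(X_te))
print(f"best lambda {best_lam}, test error {test_err:.2f}")
```

Plotting `avg_err` values against `lambdas` with `plt.semilogx` gives the required log-scale curve.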
Part 1: Support Vector Machine I
To implement a Support Vector Machine classifier.
Use any classification dataset.
Train a linear Support Vector Machine (SVM) classifier using stochastic gradient descent. Report the learned weight vector and bias term.
Plot the data points and draw the decision boundary obtained by the SVM model.
Train SVM models for the following values of the regularization parameter:
C = {0.01, 0.1, 1, 10}
Plot the decision boundaries for each value of C.
Evaluate the model on the test dataset and compute the following metrics:
• Accuracy
• Precision
• Recall
Construct and display the confusion matrix for the classification results.
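One way to sketch the SGD training loop for the soft-margin objective ½‖w‖² + C·Σ max(0, 1 − y(w·x + b)); the Breast Cancer dataset, learning rate, and C = 1 are illustrative choices:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y01 = load_breast_cancer(return_X_y=True)
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = np.where(y01 == 1, 1, -1)                      # SVMs use {-1, +1} labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

rng = np.random.default_rng(0)
w, b, C, lr = np.zeros(X_tr.shape[1]), 0.0, 1.0, 0.01
for epoch in range(20):
    for i in rng.permutation(len(y_tr)):           # one SGD pass in random order
        margin = y_tr[i] * (X_tr[i] @ w + b)
        if margin < 1:                             # inside the margin: hinge is active
            w -= lr * (w - C * y_tr[i] * X_tr[i])
            b += lr * C * y_tr[i]
        else:
            w -= lr * w                            # only the regularizer contributes

pred = np.where(X_te @ w + b >= 0, 1, -1)
acc = float(np.mean(pred == y_te))
tp = int(np.sum((pred == 1) & (y_te == 1)))
fp = int(np.sum((pred == 1) & (y_te == -1)))
fn = int(np.sum((pred == -1) & (y_te == 1)))
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
print(f"accuracy {acc:.3f}, precision {precision:.3f}, recall {recall:.3f}")
```

Repeating the run for each C in {0.01, 0.1, 1, 10} and plotting `w·x + b = 0` over two chosen features gives the required boundary comparison.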
Part 2: SVM II [Bonus Question - Optional]
In part 1, you trained and evaluated an SVM classifier. In this question, you will compare the implementation of the SVM model from scratch with an existing library implementation.
Implement a linear Support Vector Machine from scratch using the hinge loss objective and stochastic gradient descent (SGD) or mini-batch updates. Train the model on the same dataset used in Part 1 and report the training and test accuracy.
Using a standard machine learning library (such as scikit-learn or LIBSVM), train an SVM classifier on the same dataset using:
• Linear kernel
• Gaussian (RBF) kernel
Compare the performance of your implementation with the library-based models in terms of:
• classification accuracy
• training time
• behaviour of the decision boundary
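The library side of the comparison can be sketched with scikit-learn's `SVC`, timing each fit; using the Breast Cancer dataset here is an assumption (it should be whatever dataset you used in Part 1):

```python
import time
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)              # SVMs are scale-sensitive
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

accs, times = {}, {}
for kernel in ("linear", "rbf"):                   # linear vs. Gaussian (RBF) kernel
    t0 = time.perf_counter()
    clf = SVC(kernel=kernel).fit(X_tr, y_tr)
    times[kernel] = time.perf_counter() - t0
    accs[kernel] = clf.score(X_te, y_te)
    print(f"{kernel}: test acc {accs[kernel]:.3f}, fit time {times[kernel]:.3f}s")
```

Comparing these accuracies and times against your from-scratch SGD implementation covers the three requested comparison axes.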