This project was done over the course of a semester in CSE 432/532: Machine Learning.
It focused on classifying human activities using data from four sensors: accelerometer,
gyroscope, magnetometer, and pressure sensor. The data came from 100+ experiments from a dozen users, and
the goal was to preprocess it, extract features, and train machine learning models to identify
activities like walking, running, and typing.
I started by organizing the raw sensor data, conducting exploratory data analysis (EDA), and cleaning it up.
Missing values were handled using interpolation, filling techniques, and dropping any leftover columns with
too many NaNs. To avoid data leakage, I split the data into training and testing groups based on users. I also
standardized all features and used SMOTE to address class imbalances.
Feature extraction was a big part of this project. I split the time-series data into overlapping windows, with
sizes from 100 to 500 samples and overlaps of 25% or 50%. From each window, I extracted 114 features, including
things like mean, variance, skewness, and kurtosis. These features were consistent across both training and testing
datasets. Feature selection was performed to identify the most relevant features, using mutual information and
correlation analysis to reduce redundancy and improve model performance.
I used the selected features to train models like Logistic Regression, SVM, Random Forest, KNN, and XGBoost.
I tuned their hyperparameters with GridSearchCV or RandomizedSearchCV and evaluated them using metrics like
F1 Score, Cohen's Kappa, and Precision. SVM and Random Forest performed the best in my testing.
This project had its challenges—things like computational limits and handling redundant features—but it taught
me a lot about working with time-series data and machine learning workflows. Overall, it was a excellent learning
experience and I am excited to use what I learned here in future projects.