Selected Projects
Data Analysis
Using Patient Data to Predict COVID-19 Fatality Risk: A Logistic Regression Approach
Built an explainable logistic regression model to predict COVID-19 mortality using patient demographics, symptoms, and comorbidities. Handled missing data, performed stepwise feature selection, and achieved 89.6% accuracy. Generated interpretable insights, highlighting key risk factors such as age, pneumonia, and chronic diseases, to support clinical decision-making and early risk stratification.
View full PDF report with plots and tables
Housing Price Prediction Model for Modern Homes in New Taipei City
Developed a linear regression model using filtered real estate data (houses ≤ 20 years old) from Sindian district, New Taipei City (2012–2013). Identified key pricing factors including transaction date, proximity to MRT, latitude, house age, and local amenities. Achieved RMSE of 8.50 and MAPE of 17.7%, enabling practical price estimation and insights for urban planning and investment.
View full PDF report with plots and tables
Titanic Passenger Survival Prediction Using Classification Models
Developed a logistic regression model to predict passenger survival on the Titanic using cleaned historical data. Identified key predictors (passenger class, sex, age, and number of siblings/spouses aboard) via stepwise feature selection. Achieved 80.9% accuracy, with clear interpretability highlighting lower survival rates for males, older individuals, and lower-class passengers. Simulated survival probabilities for common passenger profiles to support model validation and insight communication.
View full PDF report with plots and tables
Predicting Household Income in Later Life: Insights from the SHARE Survey in Sweden
Analyzed household income determinants among older Swedish adults using simulated SHARE data (n = 420). Built a log-linear regression model identifying key predictors such as household size, partner status, sex, retirement, and early-life education indicators. Achieved moderate model accuracy (MAPE: 42.7%, RMSE: 13,851 SEK), highlighting long-term socioeconomic influences on income.
View full PDF report with plots and tables
Mammalian Sleep Data Analysis
Performed exploratory data analysis on the msleep dataset: cleaned data, visualized patterns using ggplot2, and extracted insights on sleep behavior across mammal categories.
View full PDF report with plots and tables
Statistical Analysis and Modeling of the Iris Dataset in R
Analyzed the Iris dataset in R using statistical and visual techniques. Built regression models to predict sepal length and explored species-specific patterns. Applied the model for practical prediction.
View full PDF report with plots and tables
Predictive Modeling of Diabetes Onset in Pima Indian Women
Built a logistic regression model using EDA, mean imputation, and AIC-based variable selection to predict diabetes onset with 77.4% accuracy.
View full PDF report with plots and tables
Key skills: R, Tidyverse, data wrangling, data visualization, exploratory data analysis (EDA), correlation analysis, linear regression, logistic regression, statistical modeling, model evaluation.
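Several of the regression projects above report RMSE and MAPE. As a minimal sketch of how these two metrics are defined (written in Python for illustration, while the reports themselves were produced in R; the function names and sample values are assumptions, not taken from any report):

```python
import math

def rmse(actual, predicted):
    """Root mean squared error, in the same units as the target."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def mape(actual, predicted):
    """Mean absolute percentage error, as a percentage.

    Assumes no actual value is zero, since each error is divided by it.
    """
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100 * sum(ratios) / len(ratios)

# Made-up values for illustration only, not project data.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
print(round(rmse(actual, predicted), 2))
print(round(mape(actual, predicted), 2))
```

RMSE penalizes large misses more heavily and keeps the target's units, while MAPE is scale-free, which is why the two are often reported together.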
Programming
Temple of the Treasure Hunt – A Console-Based Treasure Game with Leaderboard & Data Visualization
Developed an interactive Python game where players search for hidden treasure behind one of ten doors in a mysterious temple. The game tracks player performance, stores results in a CSV file, and visualizes statistics with a bar chart using Matplotlib. Implemented file handling, weighted averages, and leaderboard ranking based on fewest attempts. The game integrates user input, randomization, and performance analysis, offering both engaging gameplay and insight into basic data science concepts.
View full PDF report
Python Programming Foundations – Practical Problem Solving and Visualization
Developed a suite of Python programs across nine distinct tasks focused on mastering fundamental programming constructs and problem-solving skills. The project included implementation of control flow logic (conditionals, loops), user-defined functions, data type handling, string processing, numerical computation, and basic algorithm design.
View full PDF report
Python Programming Project: Data Analysis, Algorithm Design & Visualization
Developed a multi-part Python project covering user input processing, algorithm development, and real-world data analysis. Implemented custom logic for statistical calculations and debugging, and created a basic bilingual word translator without external libraries. Analyzed and visualized classic car data using manual data aggregation and Matplotlib, enhancing core programming, data handling, and visualization skills.
View full PDF report
Word Counting
Built a simple text processing tool in Python to tokenize text, remove stopwords, and compute word frequencies. Practiced core programming concepts and explored basic NLP techniques such as statistical word analysis.
View full PDF report
Numeral Base Evaluator
Created a Python script to convert numbers between binary, decimal, hexadecimal, and nonary bases. Built logic dictionaries and implemented conversion with error handling.
View full PDF report
- Key tools: Python, dictionaries, lists, string operations, loops, conditionals, functions, file handling
- Key skills: Python programming, algorithm design, numeral system conversion, data validation and error handling, dictionary and list manipulation, mathematical reasoning, functional decomposition, modular code structure
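To illustrate the kind of logic behind the Numeral Base Evaluator described above, here is a minimal sketch of dictionary-driven conversion between binary, nonary, decimal, and hexadecimal with basic error handling. The names `BASES`, `to_decimal`, `from_decimal`, and `convert` are hypothetical and chosen for this example; the sketch handles non-negative integers only and is not the code from the project report.

```python
# Digit alphabet shared by all supported bases (hexadecimal needs all 16).
DIGITS = "0123456789abcdef"

# Hypothetical lookup dictionary mapping base names to their radix.
BASES = {"binary": 2, "nonary": 9, "decimal": 10, "hexadecimal": 16}

def to_decimal(text, base):
    """Parse a non-negative number written in the given base."""
    value = 0
    for ch in text.lower():
        digit = DIGITS.find(ch)
        if digit == -1 or digit >= base:
            raise ValueError(f"invalid digit {ch!r} for base {base}")
        value = value * base + digit
    return value

def from_decimal(value, base):
    """Render a non-negative integer in the given base."""
    if value == 0:
        return "0"
    out = []
    while value > 0:
        value, remainder = divmod(value, base)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))

def convert(text, source, target):
    """Convert text from one named base to another via an integer value."""
    if not text:
        raise ValueError("empty input")
    if source not in BASES or target not in BASES:
        raise ValueError("unsupported base name")
    return from_decimal(to_decimal(text, BASES[source]), BASES[target])

print(convert("ff", "hexadecimal", "binary"))
print(convert("100", "nonary", "decimal"))
```

Routing every conversion through a single integer value keeps the number of code paths at two (parse and render) instead of one per pair of bases.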
Machine learning
- Proposing and implementing a novel deep learning model that combines feedforward neural networks and a bidirectional long short-term memory (LSTM) network to improve the accuracy of protein secondary structure prediction on benchmark datasets.
- Proposing and implementing reinforcement learning (RL) algorithms such as Q-learning, and deep reinforcement learning (DRL) algorithms such as double deep Q-network (DDQN), for an energy management system (EMS) to improve vehicle energy efficiency and convergence rate, using data from the Alternative Fuels Data Center (AFDC).
- Proposing and implementing a bidirectional long short-term memory (LSTM) network to enhance the deep Q-learning-based energy management system (EMS) and improve energy efficiency, evaluated on synthesized data.
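For readers unfamiliar with Q-learning, mentioned in the list above, the core update can be sketched in tabular form. This is a generic illustration under stated assumptions: the state and action names (`"low_soc"`, `"engine"`, `"battery"`), the reward, and the hyperparameters are hypothetical, and the projects above use deep variants (DDQN, LSTM-based networks) rather than this table-based version.

```python
import random
from collections import defaultdict

def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step and return the updated value.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]

def epsilon_greedy(q, state, actions, epsilon=0.1, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])

# Hypothetical toy setup: two power-source actions and string-labelled states.
q = defaultdict(float)           # unseen (state, action) pairs default to 0.0
actions = ["engine", "battery"]
q_learning_update(q, "low_soc", "engine", reward=1.0,
                  next_state="mid_soc", actions=actions)
print(epsilon_greedy(q, "low_soc", actions, epsilon=0.0))
```

Deep Q-learning replaces the table `q` with a neural network that estimates the same values, and double DQN uses a second network to compute the target in order to reduce overestimation.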
Software engineering
- Using SolidWorks to create detailed 3D models with complex shapes and geometries, applying parametric modeling to govern model behavior, and using simulation tools to test and analyze designs virtually.
- Using Python to simulate real-time sensor data processing and analysis, aiding in decision-making for autonomous systems, predictive maintenance, and performance optimization.
- Using Python to code, run, and test a task management system project, and using Git to track the project’s version history.