Selected Projects
Data Analysis
Using Patient Data to Predict COVID-19 Fatality Risk: A Logistic Regression Approach
Built an explainable logistic regression model to predict COVID-19 mortality using patient demographics, symptoms, and comorbidities. Handled missing data, performed stepwise feature selection, and achieved 89.6% accuracy. Generated interpretable insights on key risk factors such as age, pneumonia, and chronic diseases, supporting clinical decision-making and early risk stratification.
View full PDF report with plots and tables
Housing Price Prediction Model for Modern Homes in New Taipei City
Developed a linear regression model using filtered real estate data (houses ≤ 20 years old) from Sindian district, New Taipei City (2012–2013). Identified key pricing factors including transaction date, proximity to MRT, latitude, house age, and local amenities. Achieved RMSE of 8.50 and MAPE of 17.7%, enabling practical price estimation and insights for urban planning and investment.
View full PDF report with plots and tables
Titanic Passenger Survival Prediction Using Classification Models
Developed a logistic regression model to predict passenger survival on the Titanic using cleaned historical data. Identified key predictors (passenger class, sex, age, and number of siblings/spouses aboard) via stepwise feature selection. Achieved 80.9% accuracy, with clear interpretability highlighting lower survival rates for males, older individuals, and lower-class passengers. Simulated survival probabilities for common passenger profiles to support model validation and insight communication.
View full PDF report with plots and tables
Predicting Household Income in Later Life: Insights from the SHARE Survey in Sweden
Analyzed household income determinants among older Swedish adults using simulated SHARE data (n = 420). Built a log-linear regression model identifying key predictors such as household size, partner status, sex, retirement, and early-life education indicators. Achieved moderate model accuracy (MAPE: 42.7%, RMSE: 13,851 SEK), highlighting long-term socioeconomic influences on income.
View full PDF report with plots and tables
Mammalian Sleep Data Analysis
Performed exploratory data analysis on the msleep dataset: cleaned data, visualized patterns using ggplot2, and extracted insights on sleep behavior across mammal categories.
View full PDF report with plots and tables
Statistical Analysis and Modeling of the Iris Dataset in R
Analyzed the Iris dataset in R using statistical and visual techniques. Built regression models to predict sepal length and explored species-specific patterns. Applied the model for practical prediction.
View full PDF report with plots and tables
Predictive Modeling of Diabetes Onset in Pima Indian Women
Built a logistic regression model using EDA, mean imputation, and AIC-based variable selection to predict diabetes onset with 77.4% accuracy.
View full PDF report with plots and tables
Key skills: R, Tidyverse, Data wrangling, Data visualization, Exploratory data analysis (EDA), Correlation analysis, Linear regression, Logistic regression, Statistical modeling, Model evaluation.
Programming
Game Design and Analysis: The Temple of Treasure Hunt
Developed a console-based treasure hunting game where players search for hidden rewards behind doors, with performance affecting the final prize. Implemented file handling to save player results, a leaderboard to track top scores, and matplotlib visualizations to display player attempts, combining game design, programming logic, and basic statistical analysis.
View full PDF report
Python Programming and Data Analysis in the Energy Finance Market: A Case Study with OKQ8 Fuel Prices
Conducted Python-based analysis of fuel price trends (2015–2021) for gasoline, E85, and diesel using OKQ8 data. Implemented CSV parsing and manual statistical calculations, and visualized results with matplotlib. Developed complementary programs for algorithmic problem solving, user interaction, and simulations (e.g., savings growth, FizzBuzz, temperature conversion), demonstrating applied skills in programming, data analysis, and financial market visualization.
View full PDF report
Python Programming and Data Analysis with the 1974 Motor Trend Car Dataset
Developed Python programs for numerical problem-solving, algorithm design, and data analysis. Implemented custom statistical functions and applied CSV handling with matplotlib to analyze and visualize the 1974 Motor Trend car dataset, demonstrating skills in clean coding, data structuring, and manual computation without built-in functions.
View full PDF report
Structured Python Programming and Data Visualization with an Auto-Generated Dataset
This assignment series comprises nine structured Python tasks covering control flow, user input, custom functions, numerical logic, and data structure manipulation. One exercise generates 50 random integers using the random module, stores and sorts them into separate lists, and visualizes both sequences in a scatter plot with matplotlib, emphasizing modularity, computational thinking, and practical data visualization skills.
View full PDF report
Word Counting
Built a simple text processing tool in Python to tokenize text, remove stopwords, and compute word frequencies. Practiced core programming concepts and explored basic NLP techniques like statistical word analysis.
View full PDF report
Numeral Base Evaluator
Created a Python script to convert numbers between binary, decimal, hexadecimal, and nonary bases. Built dictionary-based conversion logic with error handling.
View full PDF report
- Key tools: Python, dictionaries, lists, string operations, loops, conditionals, functions, file handling
- Key skills: Python Programming, Algorithm Design, Numeral System Conversion, Data Validation & Error Handling, Dictionary & List Manipulation, Mathematical Reasoning, Functional Decomposition, Modular Code Structure
Machine learning
- Proposing and implementing a novel deep learning model that combines feedforward neural networks and bidirectional long short-term memory (LSTM) networks to improve the accuracy of protein secondary structure prediction on benchmark datasets.
- Proposing and implementing reinforcement learning (RL) algorithms such as Q-learning, and deep reinforcement learning (DRL) algorithms such as double deep Q-network (DDQN), for an energy management system (EMS) to improve vehicle energy efficiency and convergence rate, using data from the Alternative Fuels Data Center (AFDC).
- Proposing and implementing a bidirectional long short-term memory (LSTM) network to enhance the deep Q-learning-based energy management system (EMS) and improve energy efficiency on synthesized data.
Software engineering
- Using SolidWorks to create detailed 3D models with complex shapes and geometries, applying parametric modeling to govern model behavior, and using simulation tools to test and analyze designs virtually.
- Using Python to simulate real-time sensor data processing and analysis, aiding in decision-making for autonomous systems, predictive maintenance, and performance optimization.
- Using Python to implement, run, and test a task management system project, and using Git to track the project’s version history.