Modelling individuals' income using data collected from the 1994 U.S. Census. The goal is to build a model that accurately predicts whether an individual makes more than $50,000. This kind of prediction is useful in a non-profit setting, where organizations survive on donations.
Segmenting customers based on channel and region
Identifying key factors that influence 8th grade student performance in Indian schools.
Visualization design principles
Emails between Enron employees are used to build a predictive model for identifying persons of interest (POI) using the scikit-learn package in Python.
natural language processing
Univariate, bivariate, and multivariate plots and summary statistics are used to explore the data and the relationships between variables. Data visualizations are used to compare and identify trends.
Audited OpenStreetMap data for validity, accuracy, completeness, consistency and uniformity.
Cleaned data from many large files, then stored, queried, and aggregated it using MongoDB.
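As an illustration of the auditing step, here is a minimal sketch of a street-type consistency check using only the Python standard library. The sample XML, the `EXPECTED_TYPES` set, and the function name `audit_street_types` are assumptions made for this example, not the project's actual data or code.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical OSM fragment; a real extract would be read from a file.
SAMPLE_OSM = """<osm>
  <node id="1"><tag k="addr:street" v="Market Street"/></node>
  <node id="2"><tag k="addr:street" v="Mission St."/></node>
  <way id="3"><tag k="addr:street" v="Valencia Ave"/></way>
</osm>"""

# Assumed list of acceptable street types for this sketch.
EXPECTED_TYPES = {"Street", "Avenue", "Road"}

# The last whitespace-delimited token of a street name is treated as its type.
STREET_TYPE_RE = re.compile(r"\S+$")


def audit_street_types(osm_xml):
    """Group street names by any street type not in EXPECTED_TYPES."""
    unexpected = defaultdict(set)
    root = ET.fromstring(osm_xml)
    for tag in root.iter("tag"):
        if tag.get("k") != "addr:street":
            continue
        name = tag.get("v", "").strip()
        match = STREET_TYPE_RE.search(name)
        if match and match.group() not in EXPECTED_TYPES:
            unexpected[match.group()].add(name)
    return dict(unexpected)


problems = audit_street_types(SAMPLE_OSM)
for street_type, names in sorted(problems.items()):
    print(street_type, sorted(names))
```

Grouping by the offending suffix makes it easy to see which abbreviations need a cleaning rule before the data is loaded into a database.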
Analyzed the data set from a Stroop task using hypothesis testing to compare the effect of congruent and incongruent words.
t-test
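A minimal sketch of how such a test statistic can be computed, assuming each participant completed both conditions, so that a dependent-samples (paired) t-test applies. The reading times below are hypothetical values chosen for illustration, and `paired_t_statistic` is a name invented for this example.

```python
import math
import statistics


def paired_t_statistic(congruent, incongruent):
    """Return (t, degrees_of_freedom) for a dependent-samples t-test."""
    if len(congruent) != len(incongruent):
        raise ValueError("samples must be paired and of equal length")
    # Per-participant difference: incongruent time minus congruent time.
    diffs = [inc - con for con, inc in zip(congruent, incongruent)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical completion times in seconds, one pair per participant.
congruent = [12.0, 14.0, 11.0, 15.0, 13.0]
incongruent = [18.0, 21.0, 15.0, 22.0, 19.0]

t_stat, dof = paired_t_statistic(congruent, incongruent)
print(f"t = {t_stat:.3f}, df = {dof}")
```

The resulting statistic would then be compared against the critical value of the t distribution for the chosen significance level and `dof` degrees of freedom.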
Exploratory Data Analysis of Titanic Data
Go Bike is a company that provides on-demand bike rentals for customers in San Francisco. Created visualizations and performed an exploratory data analysis based on its open dataset.