Projects


Bertelsmann Arvato Customer Segmentation and Prediction

Summary of the project

The motivation behind this project is to use unsupervised learning techniques to uncover the relationship between the demographics of the company’s existing customers and the general population of Germany. This makes it possible to identify which parts of the population are more likely to be customers of the mail-order company and which are less likely. The next step involves building a prediction model, using demographic information from individuals who were targeted by a mail-order marketing campaign, to predict whether they would convert into customers. By the end of the project, we aim to provide insights that will help the mail-order company better target its campaigns and increase its customer base.
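
As a rough illustration of the segmentation step, the sketch below compares how often each cluster appears among customers versus the general population. It assumes cluster labels have already been assigned by some unsupervised model; the function name `cluster_representation` and the toy labels are hypothetical and not taken from the project code.

```python
from collections import Counter


def cluster_representation(population_labels, customer_labels):
    """Return {cluster: customer share / population share}.

    A ratio above 1 means the cluster is over-represented among customers;
    a ratio below 1 means it is under-represented.
    """
    if not population_labels or not customer_labels:
        raise ValueError("both label lists must be non-empty")

    pop_counts = Counter(population_labels)
    cust_counts = Counter(customer_labels)
    n_pop = len(population_labels)
    n_cust = len(customer_labels)

    ratios = {}
    for cluster, pop_count in pop_counts.items():
        pop_share = pop_count / n_pop
        cust_share = cust_counts.get(cluster, 0) / n_cust
        ratios[cluster] = cust_share / pop_share
    return ratios


# Toy cluster labels, made up for illustration only.
population = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
customers = [0, 0, 0, 1, 2]

ratios = cluster_representation(population, customers)
for cluster in sorted(ratios):
    print(f"cluster {cluster}: ratio {ratios[cluster]:.2f}")
```

In this toy data, cluster 0 is over-represented among customers and cluster 1 is under-represented, which is the kind of signal a campaign could use for targeting.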


NYC Electricity Demand Forecasting

Summary of the project

For this project I wanted to analyze NYC electricity demand, enriched with NOAA weather data for the city, and answer the following questions:

  • What role do date and time play in influencing daily electricity demand in NYC?
    • What are the seasonal (hourly, daily, monthly, etc.) patterns in the data?
    • Do holidays or events play any role?
  • How viable are traditional ML algorithms for forecasting day-ahead electricity demand?
  • How important are NYC weather variables as predictors for forecasting energy/electricity demand?
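
One way to probe the date and time questions above is to turn each timestamp into calendar features that a traditional ML model can consume. The following is a minimal sketch using only the standard library; the `HOLIDAYS` set and the feature names are assumptions made for illustration, not the project's actual feature set.

```python
from datetime import date, datetime

# Hypothetical holiday list, for illustration only.
HOLIDAYS = {date(2019, 1, 1), date(2019, 7, 4), date(2019, 12, 25)}


def calendar_features(ts: datetime) -> dict:
    """Derive simple date/time features from one timestamp."""
    return {
        "hour": ts.hour,
        "day_of_week": ts.weekday(),  # Monday = 0, Sunday = 6
        "month": ts.month,
        "is_weekend": int(ts.weekday() >= 5),
        "is_holiday": int(ts.date() in HOLIDAYS),
    }


features = calendar_features(datetime(2019, 7, 4, 18))
print(features)
```

Features like these can be joined with weather variables for the same hour, so that their relative importance can be compared in one model.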

Disaster Response Pipeline

Summary of the project

This project is part of the Udacity Data Scientist Nanodegree in collaboration with Figure Eight. It demonstrates the development and deployment of ETL and model pipelines that classify disaster messages into categories. The messages and their categories are provided as labelled data. The ETL pipeline cleans the messages and categories data and stores it in a SQLite database. The ML pipeline processes the text data, trains the classifier, and saves the fitted classifier.
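
The ETL step can be sketched as follows. The sketch assumes each message's categories arrive as a single string such as `related-1;request-0`; that format, the table layout, and the helper name `parse_categories` are assumptions for illustration, and an in-memory SQLite database stands in for the real file.

```python
import sqlite3


def parse_categories(raw: str) -> dict:
    """Turn 'related-1;request-0' into {'related': 1, 'request': 0}."""
    parsed = {}
    for item in raw.split(";"):
        # Split on the last hyphen so names containing hyphens stay intact.
        name, _, flag = item.rpartition("-")
        parsed[name] = int(flag)
    return parsed


# Toy rows, made up for illustration only.
rows = [
    ("We need water and food", "related-1;request-1;offer-0"),
    ("Sunny weather today", "related-0;request-0;offer-0"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages "
    "(message TEXT, related INTEGER, request INTEGER, offer INTEGER)"
)
for message, raw in rows:
    cats = parse_categories(raw)
    conn.execute(
        "INSERT INTO messages VALUES (?, ?, ?, ?)",
        (message, cats["related"], cats["request"], cats["offer"]),
    )
conn.commit()

stored = conn.execute(
    "SELECT message, related, request, offer FROM messages ORDER BY rowid"
).fetchall()
print(stored)
```

Storing one integer column per category keeps the table directly usable as a multi-label target when the ML pipeline loads it.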


SageMaker Sentiment Analysis Web App

Summary of the project

The notebook and Python files provided here, once completed, result in a simple web app which interacts with a deployed recurrent neural network performing sentiment analysis on movie reviews. The project includes a binary classification neural network model for sentiment analysis of movie reviews and scripts to deploy the trained model to a web app using AWS Lambda.


SageMaker Plagiarism Detection

Summary of the project

The project includes a binary classification neural network model implemented in PyTorch and an AdaBoostClassifier ensemble model implemented in scikit-learn, both used to detect plagiarism. Feature engineering involved computing containment and longest common subsequence values for the text data.
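
The two features can be sketched in plain Python. This uses one common definition of each: containment as the share of the answer's word n-grams that also occur in the source, and longest common subsequence (LCS) length in words, normalized by the answer length. The function names and the whitespace tokenization are assumptions for illustration, not the project's exact implementation.

```python
from collections import Counter


def ngrams(words, n):
    """Return the list of word n-grams (as tuples) in a token list."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def containment(answer: str, source: str, n: int = 1) -> float:
    """Share of the answer's n-grams that also appear in the source."""
    answer_counts = Counter(ngrams(answer.lower().split(), n))
    source_counts = Counter(ngrams(source.lower().split(), n))
    total = sum(answer_counts.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((answer_counts & source_counts).values())
    return overlap / total


def lcs_norm(answer: str, source: str) -> float:
    """Word-level LCS length divided by the number of answer words."""
    a = answer.lower().split()
    s = source.lower().split()
    if not a:
        return 0.0
    prev = [0] * (len(s) + 1)  # previous row of the dynamic-programming table
    for word_a in a:
        curr = [0]
        for j, word_s in enumerate(s, start=1):
            if word_a == word_s:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1] / len(a)


answer_text = "the cat sat on the mat"
source_text = "the cat lay on the mat"
c1 = containment(answer_text, source_text, n=1)
c2 = containment(answer_text, source_text, n=2)
lcs = lcs_norm(answer_text, source_text)
print(c1, c2, lcs)
```

Higher values of either feature suggest more text shared with the source, and the resulting numbers can be fed to either classifier as input columns.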


Python Package for Vulnerability Index Calculation

Summary of the project

This package aims to provide a collection of indices commonly used to measure socio-economic performance and vulnerability.