Harshit Srivastava

San Diego, CA | +1 (347) 335-8170 | hs3500@nyu.edu

I believe in three fundamentals: Curiosity, Code and Communication.

During my Master's in Computer Science, I focused on solving a diverse range of problems using data. I enjoy working on complex real-world problems and using structured and unstructured datasets to solve them. I worked at a hedge fund, where I built predictive models to optimize bond pricing in real time and deployed them to a production environment on Google Cloud Platform. I also worked as a Teaching Assistant (TA) for "Machine Learning for Cities" at the NYU Center for Urban Science + Progress (CUSP).

I have experience working with real-world data in the following areas:
• Visualization
• Anomaly detection
• Machine Learning
• Deep Neural Networks
• Probabilistic Graphical models
• Big Data Analysis
• Image / Video / Scene Analysis
• Text Mining / Sentiment Analysis

Experience

Computer Scientist

California Medical Innovations Institute

Working under a National Institutes of Health (NIH) grant to develop a system for curating medical research data

• Using artificial neural networks to automate creation of the file content required to submit research data to the NIH
September 2019 - Present

Quantitative Analyst Intern

Constellation Capital Management

Built the complete infrastructure of quantitative/financial models to stream, process and present market data in real time

• Fetched FINRA's TRACE data from sources such as Bloomberg and Interactive Brokers, and engineered a pipeline to stream market data to Google Cloud Platform using Pub/Sub, Dataflow and BigQuery through the Python API
• Predicted bond prices in real time using regression models and LSTMs, trained online on Google Cloud GPUs
• Increased F1 score of price-movement forecasting by more than 25%, impacting multi-million-dollar trades daily
• Performed sentiment analysis on Twitter and news feeds to stream relevant market information to the company website

Summer 2018, January - August 2019

Teaching Assistant

NYU Center for Urban Science + Progress

Applied Data Science - Professors Sobolevsky and Savage (Fall 2018)
Machine Learning for Cities - Professor Daniel Neil (Spring 2019)

• Assisted ~60 students with data science and machine learning techniques, focusing on urban datasets
• Organized and graded assignments on topics such as regression, classification, clustering, time series and deep learning

Fall 2018, Spring 2019

Software Engineer

Accenture

Performed data analysis for the SAP Security module; responsible for analyzing activity-monitoring tables and the role database

• Automated security risk detection by running analytical jobs on SAP Security Audit logs and traces
• Created R scripts to verify user and security role consistency, reducing critical incident count by 50%
• Used SAP Scripts to automate bulk change requests during infrastructure changes and migration projects

August 2015 - July 2017

Research Intern

University of Malaya, Kuala Lumpur

Researched Information Retrieval Evaluation under the supervision of Dr. Sri Devi Ravana, Senior Lecturer, University of Malaya.

• Wrote R scripts to analyze data sets and ran statistical tests to evaluate our hypothesis
• Published a paper based on the results (http://www.informationr.net/ir/22-2/paper752.html)
Summer 2014

Education

New York University

Master of Science (M.S.) - Computer Science
Fall 2017 Courses:
1. Foundations of Data Science
2. Big Data Analytics
3. Information Security & Privacy

Spring 2018 Courses:
1. Machine Learning
2. Design & Analysis of Algorithms
3. Computer Vision & Scene Analytics

Fall 2018 Courses:
1. Deep Learning
2. Information Visualization
3. Cloud Computing

Fall 2017 - Spring 2019

Guru Gobind Singh Indraprastha University

Bachelor of Technology (B.Tech) - Information Technology

Relevant Courses:
• Data Structures
• Algorithms
• Database Management System
• Data Warehousing & Mining
• Object Oriented Programming using C++
• Java
August 2011 - June 2015

Massive Open Online Courses (MOOCs)

Certifications / Independent Coursework

• Machine Learning by Andrew Ng (Coursera)
• Machine Learning and Reinforcement Learning in Finance (Coursera)
• Complete Guide to TensorFlow for Deep Learning (Udemy)
• Bayesian Statistics (Coursera)
• SQL for Data Science (Coursera)

Skills

Programming Languages, Frameworks & Tools
Proficient: Python, R, Linux, PySpark, Hadoop
Working Knowledge: SQL, D3.js, C, C++, Java, HTML, CSS
Familiar: Django, JavaScript, jQuery

Python Libraries: NumPy, pandas, SciPy, math, Matplotlib, statsmodels, NetworkX, OpenCV, scikit-learn, TensorFlow
Machine Learning: Bayesian Statistics, Classification, Regression, Clustering, Kernel methods, PCA, Neural Networks
Google Cloud Platform: Dataflow, Pub/Sub, ML Engine, BigQuery, Cloud Composer, Data Studio, Stackdriver
Amazon Web Services: Rekognition, Elasticsearch, Elastic MapReduce, Lambda, Cognito, SNS, RDS
Tools: Jupyter Notebook, LaTeX, Git, Docker, SAP GUI 740, Sabrix, MDM, TriplePoint CSL, Microsoft Office

Experience in Data Science lifecycle:

  • Data Cleaning
  • Exploratory Data Analysis
  • Visualization
  • Hypothesis Testing
  • Predictive Modelling
  • Model Evaluation
  • Reproducibility

Experience in Software Development lifecycle:

  • Requirements Estimation
  • Code Development
  • Automating Processes
  • System Administration
  • Production Outage
  • Website Development
  • Documentation

Projects

Unsupervised denoising of images

Deep Learning by Prof. Iddo Drori

• Image restoration without using ground truth images
• Used U-Net and SRResNet to reconstruct the original image
• Used Bayesian optimization for hyperparameter search and trained on GPUs on NYU's High Performance Computing cluster
• Obtained different Peak Signal-to-Noise Ratio (PSNR) values for various noise settings

Facial recognition system for Identity Verification using AWS Components

Cloud Computing by Prof. Sambit Sahu

• Web application for identity verification through facial features using AWS
• Used Rekognition, Elasticsearch and other AWS components to build the training/prediction module
• Web portal for training and real-time deployment of the facial recognition system, for use in buildings, offices, schools etc.

Flight Delay Prediction

Machine Learning by Prof. Lisa Hellerstein

• Pre-processed data by binning, reducing and normalizing features
• Compared various machine learning models by performing grid search over hyperparameters
• Built a randomized cross-validation function to obtain reliable estimates of model performance
• Obtained a root-mean-squared error (RMSE) of ~400

“DERMASCAN” - Melanoma detection application

Presented at HackNYU - March 2018

• Won Google's Award for best use of Google Infrastructure in the hackathon
• Built a multi-platform application for melanoma detection, aiming to reduce the initial cost of detecting skin cancer
• Built an image classification model using transfer learning on Google's Inception model
• Obtained 94% cross-validation accuracy

Analyzing and predicting counter-terrorism impact in Pakistan

Foundations of Data Science by Prof. Rumi Chunara

• Performed text analysis on news clippings to discover high-frequency words used in news reports of drone strikes
• Ran Spearman's rank correlation test between terrorist strikes and drone strikes; found a significant correlation (p-value = 0.001) at a lag of 1 day
• Discovered a high number of terrorist attacks following drone strikes
• Used an ARMA model for time-series forecasting to predict the number of drone strikes in the coming years, with Mean Forecast Error (MFE) = 0.01

NYPD Crime Data Analysis

Big Data Analytics by Prof. Claudio Silva

• Exploratory analysis of crime patterns across New York using big data tools such as Hadoop, Spark and HPC
• Discovered interesting patterns: a high crime count on January 1 and the lowest crime count during Christmas week
• Found a statistically significant correlation between daily crime count and temperature
• Built an ARMA model for time-series forecasting to predict crime counts in the coming years, with MFE = -0.45

AI Music Generator using TensorFlow

Independent project - using AI to compose music

• Used TensorFlow and Restricted Boltzmann Machines (RBMs) to build the model
• Trained the model on classic jazz progressions to generate similar jazz music
• Can be trained on any kind of music to generate relevant chord progressions

Document Level Assessment of Information Retrieval Systems

Research Internship under Dr. Sri Devi Ravana

• Proposed new method for evaluation of Information Retrieval systems
• Performed Student's t-test to identify significantly different pairs of IR systems
• Paper published based on the results obtained (http://www.informationr.net/ir/22-2/paper752.html)

Publications

Published papers in Journals and Conferences

Document Level Assessment of Document Retrieval Systems in a Pairwise System Evaluation

Journal Publication (2017)

Sri Devi Ravana, Prabha Rajagopal, Harshit Srivastava, Masumeh S Taheri (2017). Information Research (ISSN 1368-1613), Vol. 22, No. 2, June 2017

Security in Internet of Things: Challenges, Solutions and Future Directions

IEEE Conference Publication (2016)

Sathish Alampalayam Kumar, Tyler Vealey and Harshit Srivastava (2016). 49th IEEE Hawaii International Conference on System Sciences (HICSS), 2016, Koloa, HI, USA, January 5-8, 2016 (Pages 5772-5781)

Control Framework for Secure Cloud Computing

Journal Publication (2015)

Harshit Srivastava and Sathish Alampalayam Kumar (2015). Journal of Information Security, 6, 12-23

Research Opportunities and Challenges in Cloud Security

University Conference Publication (2014)

Harshit Srivastava (2014). 6th International Conference (ACSEICT-2014) held at Jawaharlal Nehru University, New Delhi, India. Published in Advances in Computer Science and Information Technology (ACSIT), Volume 1, Number 2, pp. 13-17