HARSH SHAH
Phone: +1 703 689 9945 EXT 141
[email protected] Data Scientist/Python Developer
PROFESSIONAL SUMMARY:
Around 7 years of experience as a Data Scientist with Machine Learning, Data mining with large datasets of
Structured and Unstructured data, Data Acquisition, Data Validation, Predictive Modeling, and Data
Visualization.
Good understanding of Systems Development Life Cycle (SDLC), Agile, and waterfall methodologies.
Experience in complete Software Development Life Cycle including Analysis, Design, Development,
Testing and Implementation using Python, Django, and Flask technologies.
Good knowledge of the use of Pandas, NumPy, Seaborn, SciPy, Matplotlib, and scikit-learn in Python for
developing various machine learning algorithms.
Good experience in statistical programming languages like Python, R, SQL, and SAS.
Proficient in Machine learning algorithms like Linear Regression, Logistic Regression, Decision Trees,
Supervised Learning, Unsupervised Learning, Classification, SVM, Random Forests, Naive Bayes, KNN,
K Means, and CNN.
Experience in configuring and managing AWS cloud services: EC2, S3, EBS, ELB, Elastic IP,
RDS, SNS, SQS, Glacier, IAM, VPC, CloudFormation, Lambda, CloudFront, Route 53,
CloudWatch, AWS CodeCommit, and AWS CodeDeploy.
Experience as a Web/Application Developer and coding with analytical programming using Python, Django,
Flask, API, MS-SQL.
Experience with Docker containers and container orchestration systems such as Amazon ECS, Kubernetes,
and Docker Swarm.
Experience in writing SQL, PL/SQL, and stored procedures for querying and managing data in
Oracle, MySQL, SQL Server, DB2, and NoSQL databases such as MongoDB.
Experience with the AWS platform (services: S3, RDS, CloudFront, Route 53, API Gateway, Lambda, pytest tests
for Athena, QuickSight, CloudWatch, CloudTrail, SNS, IAM, KMS, Glue).
Experience building microservices that invoke REST services using Docker, Kubernetes, Consul, Swagger, http4s,
and Typelevel Scala.
Experience in developing MVC web-based applications using Python web frameworks like Django, Pyramid,
Flask and Web2Py.
Experienced in web application development using Django, Flask, PyTorch, Node.js, AngularJS, Dojo, and
jQuery.
Experienced in working with various Python IDEs and environments, including PyCharm, PyScripter, Spyder,
PyStudio, PyDev, and Anaconda.
Experience in integrating Docker Swarm into Docker Engine to orchestrate and schedule containers.
Experience working in WAMP (Windows, Apache, MySQL) and LAMP (Linux, Apache, MySQL) architectures.
Experienced with databases using ORMs/ODMs for integrating with PostgreSQL, Neo4j, MongoDB,
Cassandra, and SQLite.
Experience in writing UNIX Shell Scripts and automation of the ETL processes using UNIX Shell Scripting and
SQL.
Extensive experience in Java/J2EE technologies like Core Java, Servlets, JSP, JSTL, JDBC, Hibernate,
Spring, Struts, Web Services, JMS, multi-threading, MVC architecture, and Design Patterns.
Experienced in developing API services in Node.js while leveraging AMQP and RabbitMQ for distributed
architectures.
Well-versed in data visualization tools such as Tableau and Power BI.
Working experience in writing Sub Queries, Stored Procedures, Triggers, Cursors, and Functions in MySQL.
SKILLS:
Languages: Python (NumPy, Pandas, scikit-learn, Matplotlib, Seaborn), R, JavaScript, HTML/CSS, UNIX, SQL, Java
Frameworks: Django, Flask, React, Hadoop
Libraries: Pandas, NumPy, Scikit-learn, Beautiful Soup, Keras
Statistical methods: Hypothesis Testing, ANOVA, Principal Component Analysis (PCA), Time Series, Correlation,
Chi-square Test, Covariance, Multivariate Analysis, Bayes' Theorem.
Machine learning: Linear Regression, Logistic Regression, Naïve Bayes, Decision Trees, Random Forest, Support
Vector Machines (SVM), K-Means Clustering, K-Nearest Neighbors (KNN), XGBoost, PCA, SMOTE.
Deep learning: Neural Networks, Convolutional Neural Networks, RNN, LSTM, Bi-LSTM
Natural language processing: TF-IDF, SVD, LDA, Word2Vec, GloVe, BERT, ELMo
Databases: PostgreSQL, MongoDB, GraphQL
Data visualization: Tableau, Python (Matplotlib, Seaborn)
Tools: Docker, Tableau, TensorFlow, Keras, AWS SageMaker, GCP, NLTK, spaCy, Gensim, MS Office Suite,
GitHub, AWS (EC2/S3/Redshift/Lambda).
Architecture: REST, Microservices, MVC
WORK EXPERIENCE:
IBM, San Francisco, California May 2022 – Present
Python Developer
Responsibilities:
• Involved in developing backend and data access logic using Oracle DB and JDBC.
• Involved in the Agile software development life cycle (SDLC) using the Scrum methodology.
• Involved in building database Model, APIs, and Views utilizing Python technologies to build web-based
applications.
• Involved in software development in Python (libraries used: Beautiful Soup, NumPy, SciPy, Matplotlib,
pandas DataFrame, NetworkX, urllib2, MySQLdb for database connectivity) and IDEs (Sublime
Text, Spyder, PyCharm).
• Deployed cloud stacks using AWS S3 and EC2 instances & created multi-AZ VPC instances.
• Developed CRUD applications using the MERN stack (MongoDB, Express, React, and Node.js) and REST-based
APIs.
• Deployed the project to Heroku using the Git version control system.
• Designed and developed a dashboard control panel for customers and administrators using Django,
HTML, CSS, JavaScript, Bootstrap, jQuery, and REST API calls.
• Implemented AWS solutions using DynamoDB, EBS, Elastic Load Balancer, and Auto Scaling groups.
• Set up a CI/CD pipeline for efficient, automated production deployments using DevOps
deployment and monitoring tools such as Drone, Docker, and Kubernetes.
• Developed RESTful microservices using Flask and Django and deployed them on AWS using EBS and EC2.
• Created web service components using REST, SOAP, WSDL, XML, and XSLT to interact with the middleware.
• Developed Web Applications using Python and Django Framework.
• Worked on the Django ORM module for designing complex queries. Developed an API that asynchronously
distributes tasks using RabbitMQ and Celery.
• Worked with Oracle RDBMS, writing complex queries and PL/SQL and SQL for stored procedures, triggers, and
events to generate responses required by the application.
• Developed SQL queries with PHP Doctrine and Propel to create, retrieve, and update data, and wrote
SQL statements for CRUD operations.
• Developed International Error Correction Screens with custom front-end forms using XML, XSLT, AngularJS, and
jQuery.
• Worked on application development using Oracle SQL, PL/SQL, Oracle Forms/Reports and Informatica ETL.
• Created mock-ups using web technologies like HTML/HTML5, CSS/CSS3, Sass, jQuery, and JavaScript, with Git and
GitHub for version control.
• Used Eclipse IDE for Java and XML development and QA, and branched projects using TortoiseSVN.
• Designed and implemented disaster recovery for the PostgreSQL database.
• Wrote unit test cases in Python and Objective-C for other API calls in the customer frameworks.
• Performed end-to-end testing using Selenium WebDriver, JBehave, and TestNG.
• Automated environment deployment for *nix systems using a Jenkins server and wrote automated unit tests with Jest.
Environment: JavaScript, HTML5, CSS3, Angular 14, React, Redux, machine learning, data science, TensorFlow, PyTorch,
scikit-learn, NumPy, ES6, TypeScript, JSON Web Token, Java, MySQL, SQL, PL/SQL, PostgreSQL, Cassandra, REST,
PyCharm, GitLab, Git, unit testing, Selenium, Jest, Mockito, Agile, Scrum, AWS EC2, S3, RDS, DynamoDB, Google Cloud
Platform, Maven, JSON, TFS, Linux, Nginx, Slack, Zoom, Docker, Jenkins, Travis, Ansible, Kubernetes, Docker Swarm.
Arihant AI, India Aug 2020 – July 2021
Python Developer
Responsibilities:
• Performed regression testing under Agile (Scrum, Kanban) and Waterfall software development processes.
• Primarily responsible for continuous integration and delivery: created a structured build and delivery
system using Git, Jenkins, and Docker Registry to increase productivity and software quality.
• Involved in the analysis, design, and development of the front end/UI using JSP, HTML, DHTML, JavaScript, jQuery, and
AJAX.
• Migrated CI/CD processes using CloudFormation, Terraform, and Packer templates, and
containerized the infrastructure using Docker, which was set up in OpenShift, AWS, and VPCs.
• Developed Java UI JFC/Swing screens and components using NetBeans. Assisted with development of web
applications using Flask, Pyramid, and Django.
• Developed an MVC architecture using Django, Servlets, RESTful and SOAP web services, and SoapUI, and
developed user interface solutions using the Django web framework.
• Designed and developed machine learning functions for data filtering, classification, and
clustering.
• Implemented a JavaScript front end for a purchase order processing application built using AngularJS.
• Implemented AWS solutions using DynamoDB, EBS, Elastic Load Balancer, and Auto Scaling groups.
• Implemented REST APIs in Python using the Flask microframework with SQLAlchemy on the back end for
data center management.
• Implemented the application using the concrete principles laid down by several Java/J2EE Design patterns like
Business Delegate, MVC, Session Façade, Factory Method, Service Locator, Singleton and Data
Transfer Objects (DTO).
• Implemented Microservices architecture in developing the web application with the help of Flask framework.
• Developed frontend and backend modules using Python on the Django web framework, with Git for version control.
• Developed supervised and unsupervised learning models using Python libraries such as scikit-learn,
TensorFlow, PyTorch, and NumPy.
• Developed the user interface of the web application using HTML and CSS.
• Wrote custom user-defined functions in JavaScript to validate application functionality and features.
• Created a fast, low-thread-count microservice that converts log file entries into SNMP traps or REST service
invocations, using Docker, Kubernetes, Consul, Swagger, http4s, and Typelevel Scala.
• Worked in AWS Cloud platform and its features which include EC2, RDS, DynamoDB, S3, and
CloudFormation.
• Performed end-to-end testing using Selenium WebDriver, JBehave, and TestNG.
• Proficient with various Python IDEs, including PyCharm, PyScripter, Spyder, PyStudio, and PyDev.
• Worked on application development using Oracle SQL, PL/SQL, Oracle Forms/Reports and Informatica ETL.
• Instructed the QuakeFinder data science team on how to set up and train a model using TensorFlow.
• Built custom CMSs using Node.js and React to migrate off WordPress.
• Used IDEs such as Eclipse and NetBeans and version control tools such as Mercurial, SVN, and Git.
Environment: Django, ORM, Microservices, Pandas, Flask, Python API, Celery, Tornado, SQLAlchemy, JavaScript,
HTML5, CSS3, Angular, React, Redux, machine learning, data science, TensorFlow, PyTorch, scikit-learn, NumPy, ES6,
TypeScript, JSON Web Token, Java, MySQL, SQL, PL/SQL, PostgreSQL, DynamoDB, REST, PyCharm, Git, Selenium,
Agile, Scrum, AWS EC2, Google Cloud Platform (GCP), JSON, Linux, Jenkins, Kubernetes, Docker.
Arihant AI, India June 2016 – July 2020
Python Developer
Responsibilities:
• Involved in development of entire frontend and backend modules using Python on Django Web Framework.
• Developed various micro web applications using Flask and
SQLAlchemy.
• Involved in R&D tasks; researched and documented Docker.
• Involved in building database Model, APIs, and Views utilizing Python technologies to build web-based
applications.
• Involved in designing, implementing, and modifying the Python code and MySQL database on the back end.
• Developed and tested many features for dashboard using Python, Bootstrap, CSS, JavaScript and jQuery.
• Developed data engineering and ETL Python scripts for ingestion pipelines that run on an AWS
infrastructure of EMR, S3, Glue, and Lambda.
• Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources like
S3 and ORC/Parquet/text files into Amazon Redshift.
• Implemented custom Python machine learning algorithms and modified open-source Python algorithms.
• Designed and implemented solutions using open-source AI frameworks like PyTorch, TensorFlow, and scikit-learn.
• Used Docker containers to deploy microservices and scaled the deployments using Kubernetes.
• Worked on CI/CD pipeline and created sandbox, UAT and Production Environments in Google Cloud Platform.
• Used Kubernetes to orchestrate the deployment, scaling and management of Docker Containers.
• Created a Spark cluster on AWS EC2 and integrated it with IPython to provide the team with a machine learning
environment.
• Developed consumer-based features and applications using Python and Django with test-driven
development and pair programming.
• Performed data extraction, aggregation, and consolidation of Adobe data within AWS Glue using PySpark.
• Used Python libraries such as Pandas and NumPy for various analyses.
Environment: Python, Django, Angular, TypeScript, Node.js, NPM, JSON, MongoDB, AWS, Boto API, Shell Scripting, XML,
Jenkins, PyUnit, Git, Docker, Jira, Agile.
EDUCATION:
Master of Science, Data Science – University of the Pacific, Stockton, California
Bachelor of Engineering (BE), Computer Engineering – Gujarat Technological University, India
OTHER PROJECTS:
Web Crawler and Text Classification (Python, Pandas, NumPy, Beautiful Soup, Keras) GitHub
Developed a web crawler using Python and Beautiful Soup, successfully extracting over 50,000 keyword-based
HTML files from multiple diverse websites, averaging a crawl rate of 100 pages per minute.
Designed and trained a text classification model using a deep learning architecture, achieving an accuracy
of 92% in classifying HTML files into 'good' and 'non-relevant' categories, reducing manual review efforts by 80%.
Utilized transfer learning and fine-tuning with pre-trained language models, resulting in a 30% reduction in model
training time.
Telecom Customer Churn Prediction (Python, Pandas, Scikit-learn, Matplotlib) GitHub
Engineered and executed a customer churn prediction system using Python, Scikit-learn, and pandas,
achieving an accuracy of 82% through feature engineering and model optimization techniques.
Utilized data visualization with Matplotlib to present actionable insights to stakeholders, enabling the identification
of key factors influencing churn and facilitating data-driven decision making for customer retention strategies.
Customer Segmentation Using Credit Card Data (Python, Pandas, Scikit-learn, Matplotlib) GitHub
Conducted customer segmentation analysis using credit card transaction data for a major financial institution,
resulting in a 15% increase in targeted marketing campaign effectiveness.
Utilized advanced clustering algorithms, such as k-means and hierarchical clustering, to identify distinct
customer segments based on spending behavior, leading to a more personalized approach in customer
engagement.
Global Health Expenditure Profile (Tableau, Data Visualization, Storytelling) Dashboard
Created an interactive Tableau dashboard showcasing global health expenditure trends and insights,
contributing to evidence-based decision-making in the healthcare sector.