HARITHA REDDY KAITHI
Sr. Data Scientist / Data Analyst
Phone: 281-529-5059
Email: harithadatascience@gmail.com
PROFESSIONAL SUMMARY:
• 10+ years of experience as a Data Scientist with strong technical, business, and communication skills.
• Created on-demand tables in Amazon Redshift from S3 files using Lambda functions written in
Python.
• Extensively involved in data preparation, exploratory analysis, and predictive modeling.
• Experienced with Machine Learning, Semantic Web, and Natural Language Processing using word embedding
and text vectorization methods such as Word2Vec, GloVe, and TF-IDF.
• Experienced in working with Python packages such as NumPy, Matplotlib, Beautiful Soup, SciPy, and Pandas.
• Responsible for implementing data pipelines for various business problems.
• Used Amazon SageMaker to quickly build, train, and deploy machine learning models.
• Strong understanding of analytics concepts and supervised machine learning algorithms such as Logistic
Regression, Linear Regression, K-Nearest Neighbors, Naïve Bayes, Support Vector Machines, Decision
Trees, and ensemble models (Random Forests, Gradient Boosted Decision Trees, Stacking Models).
• Direct advanced SQL experience in summarizing, transforming, segmenting, and joining datasets.
• Well experienced in normalization and denormalization techniques for optimum performance in relational
and dimensional database environments.
• Experienced in working with imbalanced datasets, using resampling methods such as oversampling
and undersampling to build models.
• Strong experience with data build tools, AWS Lambda, and databases such as SQL Server and MySQL,
with proficiency in writing SQL queries.
• Experienced in several projects that incorporate Gen AI-driven solutions into business processes,
significantly improving performance metrics and customer satisfaction.
• Designed and implemented machine learning solutions with tasks ranging from feature selection, model
building, maintaining the models, and retraining the models.
• Strong experience administering Databricks clusters, including cluster setup, optimization, and ensuring
that the platform runs smoothly.
• Worked on cluster configuration and security using Databricks.
• Experienced in working with Agile Scrum and Waterfall models.
• Involved in various phases of the Software Development Life Cycle (SDLC) such as requirements gathering,
modeling, analysis, and design.
• Worked in a high-performing transformational team shaping the next-generation data & BI platforms.
• Designed, developed, and maintained dashboards and reports providing actionable business intelligence
and data analytics for operational support using Tableau, Power BI, and Business Objects.
• Adept in employing data visualization tools such as Tableau and Python libraries Matplotlib, Seaborn, and
Plotly to create visually appealing plots and interactive dashboards.
SKILLS:
CATEGORY                      SKILLS
Programming                   Python, R, SQL, Scala, Unix Scripting, Spark
Web Technologies              HTML, CSS, JavaScript, and Bootstrap
Data Visualization Tools      Tableau, Power BI, Advanced Excel, and Data Studio
Database Systems              MySQL, MongoDB, SQL, SQL Server
Libraries                     NumPy, Scikit-Learn, TensorFlow, and Pandas
Cloud Tools                   Amazon Web Services (AWS), Azure
Version Control               Git
Big Data Technologies         Hadoop Ecosystem (HDFS, Hive, MapReduce, Pig)
Tools                         Google Cloud Platform (GCP), Git, Regression, NLP, BERT, GPT-3,
                              GPT-4, Clustering, Large Language Models (LLMs) with Hugging Face
                              and LangChain, Transformers, and Speech Recognition
Machine Learning Algorithms   Supervised learning: Logistic Regression, Random Forests, SVM,
                              XGBoost, GBM, etc.
                              Deep learning: Convolutional Neural Networks, sequence modeling,
                              LSTM, and GRU
                              Unsupervised learning: K-means, K-median, and hierarchical clustering
PROFESSIONAL WORK EXPERIENCE:
SR. DATA SCIENCE ANALYST - Cielo Talent, Brookfield, WI May 2022 - Present
Run SQL and Python queries against the internal warehouse data repository or raw retailer data
repository to perform data analysis.
Retrieve, manipulate, chart, and visualize data to determine a diagnosis and resolution.
Provide ad hoc analysis to internal teams and clients, presenting findings on trends and patterns
in current data.
Orchestrated the development of an AI-driven avatar using GPT-4 and OpenAI's Whisper,
pioneering new prompt-engineering methods to simulate real-time participation in Zoom meetings,
enhancing remote communication and collaboration.
Expertise in prompting Large Language Models such as GPT-4, GPT-3, and Llama 2.
Led end-to-end execution of data science projects, ensuring timely delivery of insights and models
through Agile methodologies (Scrum & Kanban).
Designed and implemented machine learning models using deep learning frameworks (TensorFlow,
Keras) to predict sales patterns, improving inventory management efficiency by 25%.
Initiated the deployment of advanced Large Language Models (LLMs), including ChatGPT and Llama 2,
to craft content that resonates with brand values, increasing customer engagement by 40%.
Advocated for responsible use of Generative AI technologies, developing guidelines for ethical AI
deployment in projects and ensuring compliance with best practices.
Used Python in Jupyter Notebook to build a screening procedure, vary drug features, and estimate
the cost and performance of drug candidates.
Optimized data center operations by implementing efficient resource allocation and workload
management.
Contributed mathematical and programming support for advanced NLP challenges
involving live-stream and post-processed incident data.
Integrated Dialogflow chatbots with various messaging platforms (e.g., Google Assistant, Slack,
Facebook Messenger) to extend reach and improve user engagement.
Contributed to the development of data-driven models for predicting the most cost-effective
configurations in SAP S/4HANA based on customer orders and historical data.
Designed and implemented machine learning models for predictive analytics, leveraging SAS Viya's
capabilities to achieve [X]% accuracy improvements in key predictive tasks.
Automated data extraction from SAP S/4HANA using ABAP and integrated it with Python for further
analysis, reducing manual data handling by 30%.
Knowledge of MLOps frameworks such as Kubeflow or MLflow.
Experience with natural language processing (NLP), computer vision, or time-series forecasting.
Machine Learning Frameworks: Experience with TensorFlow, PyTorch, Keras, or scikit-learn.
Managed multiple projects simultaneously, coordinating with teams to ensure project goals were met
on schedule.
Managed BigQuery datasets for large-scale data analysis, reducing SQL query times by 50% through
optimized table partitioning and clustering.
Collaborated with engineering teams to ensure seamless integration of BigQuery with internal data
sources, improving data availability for analytics.
Operated a Radio Access Network (RAN) in a cellular structure that divides wide areas into small cells,
each served by a single base station within its range.
Created more flexible, vendor-agnostic networks using O-RAN standards and data software
components.
DATA SCIENTIST - DaVita, Denver, CO Nov 2019 - Apr 2022
Applied statistical analysis to derive actionable insights from data.
Developed predictive models for various business applications using Python.
Hands-on experience with deep learning frameworks like TensorFlow and Keras.
Strong knowledge of Machine learning algorithms and libraries such as Scikit-Learn.
Collaborated with cross-functional teams to deploy Auto-Sklearn-based models in real-world
applications.
Employed Auto-Sklearn's ensemble techniques to improve model robustness and generalization.
Developed a logistic regression model in Jupyter Notebook to predict star ratings on a
5-point scale for unrated text.
Strong ability to communicate complex findings in a clear and concise manner through data
storytelling.
Proven track record of translating technical analysis into actionable business recommendations.
Analyzed and created Monte Carlo simulations and probabilistic approaches for
ML and NLP problems, taking them from theory to application using
MATLAB, R, Python, and other programming languages.
Expertise with AI implementations and experience using LLMs such as GPT-3 and GPT-4.
Skilled in data preprocessing and feature engineering using SageMaker Processing jobs, ensuring
high-quality input for model training.
Designed and developed medium- to large-scale BI solutions on Azure using Azure Data
Platform services (Azure Data Lake, Data Factory, Azure Storage Explorer, Logic Apps, Azure SQL DW,
HDInsight, Azure Databricks, Azure Key Vault, API Connections).
Led the Advanced Variant Configuration (AVC) product modeling efforts within SAP S/4HANA, enabling
customized product offerings based on customer-specific configurations while optimizing the supply
chain and pricing models.
Proficient in implementing computer vision and Natural Language Processing (NLP) tasks.
Utilized Python libraries like Pandas and NumPy for efficient data cleaning and manipulation.
Familiar with Google Cloud's big data services, including BigQuery and Dataflow, for handling large
datasets with Python.
Developed a machine learning model using LSTM (Long Short-Term Memory) to predict stock prices
based on historical data.
Built and tested machine learning models (Logistic Regression, SVM) for fraud detection, increasing
accuracy by 40%.
Published research on generative AI advancements in peer-reviewed journals or conferences,
contributing to the knowledge base in the field.
Automated data pipelines in SAS Viya, reducing manual data processing time by [X]% and ensuring
real-time analytics availability.
Collaborated with cross-functional teams to gather requirements and deliver customized, scalable
analytics solutions using SAS Viya, leading to [X]% improvement in project delivery times.
Developed and implemented data governance frameworks to ensure data integrity and compliance
with regulatory standards (e.g., GDPR, HIPAA).
Data Handling: Strong understanding of SQL, NoSQL, and big data tools (e.g., Spark, Hadoop).
Cloud Platforms: Familiarity with AWS, Google Cloud, or Microsoft Azure for deploying ML models.
Model Evaluation: Expertise in using metrics like accuracy, precision, recall, RMSE, or AUC-ROC for
performance evaluation.
Utilized SAP HANA and SQL to query and manipulate large datasets, ensuring high performance for
AVC-related analytics (e.g., pricing, BOM, order management).
Utilized GCP's Pub/Sub for managing real-time event data streams, integrating with Dataflow for
processing and analysis.
DATA ANALYST - Elliott Hospitals, Manchester Dec 2018 - Oct 2019
Extensive experience in data manipulation and analysis using Python libraries such as Pandas, NumPy,
and SciPy.
Proficient in data visualization techniques with Matplotlib and Seaborn to effectively communicate
insights.
Utilized H2O.ai's AutoML functionality for automating model selection, hyperparameter tuning, and
feature engineering.
Experienced in handling missing values, outliers, and duplicate data for improved data quality.
Leveraged Python to interact with Amazon Redshift, automating data pipelines.
Optimized SQL queries for BigQuery to improve query performance.
Experienced in customizing Matplotlib for publication-quality graphics.
Skilled in using Cloud Functions and Cloud Run for serverless Python application development.
Implemented the Agile Scrum methodology and Test-Driven Development (TDD) for application
development.
Implemented new cloud management platforms from AWS, OpenStack, and Google Cloud Platform.
Responsible for MySQL database administration and maintenance over Linux servers throughout the
project lifecycle.
Worked on Microservices for Continuous Delivery environment using Docker and Jenkins.
DATA ANALYST - Systek, LLC, Prime Vendor Sterling, VA Aug 2015 - Sept 2018
Engaged in various stages of the Software Development Life Cycle (SDLC), encompassing activities such
as requirements gathering, modeling, analysis, design, development, testing, and monitoring.
Participated in Data Preprocessing, conducted Exploratory Data Analysis, and constructed Machine
Learning models to identify customer churn.
Developed AWS Lambda services using Python and executed load testing with Python Locust.
Implemented data analysis and visualization using NumPy, Pandas, and Matplotlib.
Successfully created a machine learning model for predicting customer churn with a high level of
accuracy.
Designed and developed automation frameworks, implementing test automation using
Python pytest.
Utilized AWS deployment services to swiftly deploy applications.
Created proofs of concept for machine learning algorithms (Random Forest, XGBoost) using Python.
Took charge of designing databases, creating tables, and composing complex SQL queries and stored
procedures in alignment with project requirements.
Followed agile software development practices, including pair programming, test-driven
development, and active participation in scrum status meetings.
Contributed to unit integration, bug fixing, and acceptance testing with the creation of comprehensive
test cases.
Engaged in code reviews to ensure code quality and adherence to established standards.
EDUCATION
Bachelor's in Pharmacy, Nalla Narasimha Reddy Group of Institutions, 2009-2012.
Master of Studies in Information Sciences (MSIS), 2013-2015.
I came to the USA on an F-1 visa and received H-1B status in 2022. This is the first extension.