Summer Internship Report ON: "Data Analytics"
ON
“DATA ANALYTICS”
Submitted in partial fulfilment of the requirements for the
award of degree of
BACHELOR OF COMPUTER APPLICATIONS
To
UTTARANCHAL SCHOOL OF COMPUTING SCIENCES
UTTARANCHAL UNIVERSITY, DEHRADUN
(Session: 2024-2025)
I would like to express my gratitude to everyone who helped me during the internship. I
am grateful to the company (Qlik) for giving me the opportunity to do the internship and
to my mentor for his guidance and support. I would also like to thank my colleagues for
their help and cooperation.
First, I would like to thank Dr. Sonal Sharma, Dean, USCS, for providing a healthy
and encouraging environment in which to study.
I am profoundly thankful to the Department of Computing Science, to Dr. Sameer Dev
Sharma, HOD, and to my project mentor, Ms. Rashmi Kuksal. I am also thankful to my
mentor, Mr. Akshay Rawat, for guiding me throughout the internship period. He was
generous enough to give me this opportunity and accept my candidature, and I am
grateful for the valuable guidance and affable treatment given to me at every stage
to boost my morale.
Priya Bhatt
UU2209000265
DECLARATION
I hereby declare that this report is an original work done by me during my Summer Internship
Program (SIP) in Data Analytics at Qlik for a duration of 8 weeks. This report is a genuine
representation of my project and does not contain any plagiarized or copyrighted material.
I have not submitted this report to any other institution or organization, and I am responsible
for any errors or omissions in this report. I have ensured that the information presented is
accurate and true to the best of my knowledge. I have conducted this project independently,
and all the data, results, and conclusions presented in this report are based on my own research
and experimentation.
I understand the importance of academic integrity and have adhered to the guidelines and rules
set by Qlik and my institution. I am aware of the consequences of plagiarism and have taken
necessary precautions to avoid it. I have properly cited all the sources used in this report,
including books, journals, articles, and online resources, and have given due credit to the
original authors and creators.
I also declare that I have not received any unauthorized assistance or guidance during the
preparation of this report. I have followed the instructions and guidelines provided by my
supervisor and mentors, and I have maintained a record of all the meetings, discussions, and
feedback received during the internship period.
Date Signature
CERTIFICATE OF INTERNSHIP
CERTIFICATE OF ORIGINALITY
This is to certify that the internship entitled “Data Analytics” by Priya Bhatt has been
submitted in the partial fulfilment of the requirements for the award of the degree of
BCA from Uttaranchal University, Dehradun. The results embodied in this project have
not been submitted to any other University or Institution for the award of any degree.
Assistant Professor
TABLE OF CONTENTS
1 INTRODUCTION
*Overview:* Qlik is a global leader in data analytics and business intelligence solutions. The
company provides innovative software and tools that empower businesses to transform raw
data into actionable insights. With a focus on user-friendly interfaces and powerful analytics
capabilities, Qlik supports organizations in making data-driven decisions to enhance
performance and achieve strategic goals.
Qlik’s product portfolio includes Qlik Sense, QlikView, and Qlik Data Catalyst, among others.
These tools facilitate data integration, visualization, and analysis, enabling businesses to
uncover hidden trends and patterns. Qlik’s solutions are used by companies in various
industries, including health care, finance, retail, and manufacturing.
Qlik’s mission is to simplify the complex process of data analysis, making it accessible to users
at all levels of expertise. The company emphasizes the importance of data literacy, providing
training and resources to help users develop their data skills. Qlik’s commitment to innovation
and customer satisfaction has established it as a trusted partner in the data analytics space.
INTRODUCTION
The Data Analytics Certificate Exam preparation course, offered by Qlik, is designed to equip
participants with the essential skills and knowledge required to excel in the field of data
analytics. This comprehensive eight-week program covers a wide range of topics, from
foundational concepts to advanced analytical techniques, ensuring that participants are
well-prepared for the certification exam and capable of applying their skills in practical settings.
The course is structured to provide a balanced mix of theoretical knowledge and hands-on
practice. Each week focuses on a specific aspect of data analytics, building on previous lessons
to create a cohesive learning experience. Participants will engage with real-world datasets, use
state-of-the-art analytics tools, and develop their ability to interpret and communicate analytical
results effectively.
This report outlines the detailed curriculum of the eight-week preparation course, highlighting
the key topics covered each week and the expected learning outcomes. By the end of the course,
participants will have a solid understanding of data management principles, foundational and
advanced analytics techniques, and best practices for interpreting and visualizing data. This
comprehensive approach ensures that participants are not only prepared to pass the certification
exam but also to apply their knowledge in professional scenarios, driving value for their
organizations through effective data analysis.
Week 1: Data Foundations
Topics Covered:
Back-End: Data Architect:
The role of a data architect is crucial in designing and managing databases that support robust
data analytics. Participants will learn about the responsibilities of a data architect, including data
modelling, database design, and the implementation of data storage solutions. The course will
cover best practices for creating scalable and efficient database architectures that can handle
large volumes of data and support complex queries. The curriculum will also delve into various
database management systems (DBMS), such as MySQL, PostgreSQL, and NoSQL databases
like MongoDB, and their use cases in different business scenarios.
Database Design:
Database design is a critical skill for any data analyst. This section will introduce the principles
of database design, including the concepts of relational databases, schema design, and entity-
relationship modelling. Participants will learn how to design databases that are both efficient
and easy to maintain, ensuring data integrity and consistency. The course will also cover
advanced topics such as indexing, database partitioning, and the use of data.
Data Basics:
Understanding the basics of data is essential for any data analyst. This section will cover the
different types of data, including ordinal, cardinal, and nominal data. Participants will learn
how to identify and classify data types, which is fundamental to choosing the appropriate
analytical techniques and tools. Additionally, the course will explore the differences between
quantitative and qualitative data, and how to handle each type in data analysis processes.
Participants will also be introduced to metadata and its importance in managing and
understanding data.
unstructured, and semi-structured data, and how to handle them in analytical processes. This
section will include practical exercises on transforming and preparing data for analysis, using
ETL (Extract, Transform, Load) processes and data wrangling techniques.
Normalization/Optimization:
Normalization is a technique used to organize data in a database to reduce redundancy and
improve data integrity. This section will introduce the different forms of normalization (1NF,
2NF, 3NF, BCNF) and their benefits. Participants will also learn about database optimization
techniques to enhance query performance and efficiency. The curriculum will include case
studies demonstrating the impact of normalization on database performance and real-world
scenarios where denormalization might be preferred for read-heavy applications.
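As a minimal sketch of this idea, the following uses Python's built-in sqlite3 module with a hypothetical orders data set. The table and column names are illustrative assumptions and are not taken from the course material. Customer details are stored once and referenced by key, so a change of city is made in one row rather than in every order.

```python
import sqlite3

# In-memory database; table and column names are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        city        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount      REAL NOT NULL
    );
""")
conn.execute("INSERT INTO customers VALUES (1, 'Asha', 'Dehradun')")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(101, 1, 250.0), (102, 1, 400.0)],
)

# Updating the city touches one row, not one row per order.
conn.execute("UPDATE customers SET city = 'Delhi' WHERE customer_id = 1")

# A join reassembles the combined view whenever it is needed.
rows = conn.execute("""
    SELECT o.order_id, c.name, c.city, o.amount
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()
print(rows)
```

The join at the end is the cost that denormalization avoids in read-heavy applications, at the price of storing the customer details redundantly.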
Big Data:
Big data refers to extremely large data sets that cannot be easily managed or analyzed using
traditional data processing techniques. This section will provide an overview of big data
concepts, including the characteristics of big data (volume, velocity, variety), and the tools and
technologies used to handle big data, such as Hadoop and Spark. Participants will explore the
Hadoop ecosystem, including HDFS (Hadoop Distributed File System) and MapReduce, as
well as the role of Spark in providing faster in-memory processing. The course will also discuss
the challenges and opportunities presented by big data and how organizations can leverage big
data analytics to gain competitive advantages.
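The MapReduce pattern can be illustrated with a single-process word count in plain Python. This is only a sketch of the map, shuffle, and reduce phases under the assumption of two tiny in-memory documents; a real Hadoop or Spark job distributes these phases across many machines, and the function names here are hypothetical.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Group all emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the values for each key to get the final counts."""
    return {key: sum(values) for key, values in grouped.items()}

documents = ["big data needs big tools", "data tools scale"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle_phase(pairs))
print(word_counts)
```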
Learning Outcomes:
By the end of Week 1,
Week 2: Foundational Analytics
Topics Covered:
Aggregations:
Aggregations are fundamental techniques for summarizing and analyzing data. Participants will
learn how to perform various types of aggregations, such as sum, average, count, min, and max,
using SQL and other data analysis tools. The course will also cover group by operations and
the use of aggregation functions to derive meaningful insights from data.
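The aggregation functions named above can be sketched with SQL run through Python's built-in sqlite3 module. The sales table and its values are hypothetical and serve only to show GROUP BY together with COUNT, SUM, AVG, MIN, and MAX.

```python
import sqlite3

# Hypothetical sales data held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 100.0), ("North", 300.0), ("South", 50.0)],
)

# One summary row per region.
summary = conn.execute("""
    SELECT region, COUNT(*), SUM(amount), AVG(amount), MIN(amount), MAX(amount)
    FROM sales
    GROUP BY region
    ORDER BY region
""").fetchall()
for row in summary:
    print(row)
```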
Distribution Analysis:
Distribution analysis involves understanding the spread and pattern of data points within a data
set. This section will cover different types of distributions (normal, skewed, uniform) and their
properties. Participants will learn how to create and interpret histograms, frequency
distributions, and density plots to analyze data distributions effectively. The course will also
discuss the implications of different distribution shapes on statistical analysis and
decision-making.
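A frequency distribution can be sketched in a few lines of Python. The ages below are made-up values, and the helper name frequency_distribution is an assumption for illustration. The text histogram it prints shows a cluster in the thirties with a longer tail to the right.

```python
from collections import Counter

def frequency_distribution(values, bin_width):
    """Count how many values fall into each [start, start + bin_width) bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

ages = [21, 23, 25, 31, 34, 35, 36, 38, 47, 62]  # hypothetical sample
bin_width = 10
histogram = frequency_distribution(ages, bin_width)

# Print a simple text histogram, one row per non-empty bin.
for start, count in histogram.items():
    print(f"{start}-{start + bin_width - 1}: {'#' * count}")
```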
Standard Deviation:
Standard deviation is a measure of the dispersion or spread of data points around the mean.
This section will explain the concept of standard deviation and its importance in data analysis.
Participants will learn how to calculate standard deviation and interpret its value to assess the
variability of data. The course will also cover related concepts such as variance and the
coefficient of variation, providing a comprehensive understanding of data dispersion.
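These measures can be computed with Python's statistics module. The data set is a small made-up example chosen so that the population variance comes out as a whole number.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

mean = statistics.mean(data)
pop_variance = statistics.pvariance(data)  # divides by n
pop_std = statistics.pstdev(data)          # square root of the population variance
sample_std = statistics.stdev(data)        # divides by n - 1 (Bessel's correction)
cv = pop_std / mean                        # coefficient of variation (unitless)

print(mean, pop_variance, pop_std, round(sample_std, 3), cv)
```

The sample standard deviation is slightly larger than the population value because it divides by n - 1 rather than n.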
Probability:
Probability is the foundation of statistical analysis and decision-making. This section will cover
basic probability concepts, including probability rules, conditional probability, and Bayes'
theorem. Participants will learn how to calculate and interpret probabilities to make informed
decisions based on data. The course will also discuss probability distributions (binomial,
normal, Poisson) and their applications in data analysis.
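Bayes' theorem can be illustrated with a short worked example. The prior, sensitivity, and false positive rate below are assumed numbers for a hypothetical screening test, not figures from the course.

```python
def bayes_posterior(prior, sensitivity, false_positive_rate):
    """Return P(condition | positive test) using Bayes' theorem."""
    # Law of total probability: overall chance of a positive result.
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

posterior = bayes_posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(f"P(condition | positive) = {posterior:.3f}")
```

Even with a fairly accurate test, the posterior stays low here because the condition is rare, which is the usual lesson of this kind of example.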
Sampling:
Sampling is a technique used to select a representative subset of data from a larger population.
This section will cover different sampling methods (random, stratified, cluster) and their
advantages and disadvantages. Participants will learn how to design and implement sampling
strategies to ensure accurate and reliable results. The course will also discuss sample size
determination, sampling bias, and the implications of sampling errors on data analysis.
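Stratified sampling can be sketched as follows, assuming a hypothetical population of 80 urban and 20 rural records. The helper name stratified_sample and the fixed seed are illustrative choices; the point is that each stratum contributes the same fraction, so the sample keeps the 80/20 split.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction from every stratum defined by `key`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)
    sample = []
    for group in strata.values():
        size = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, size))
    return sample

population = [("urban", i) for i in range(80)] + [("rural", i) for i in range(20)]
sample = stratified_sample(population, key=lambda r: r[0], fraction=0.1)
print(len(sample), sum(1 for s in sample if s[0] == "rural"))
```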
I learned how to perform aggregations and distribution analysis to summarize data. Calculate
Distinguish between signal and noise in data and apply techniques to mitigate the effects of
noise.
Build and interpret decision tree models for classification and regression tasks.
Understand Goodhart's Law and its implications for metric design and decision-making.
Week 3: Interpretation of Analytics
Topics Covered:
Hypothesis Testing:
Hypothesis testing is a statistical method used to make inferences about population parameters
based on sample data. This section will cover the steps involved in hypothesis testing, including
formulating null and alternative hypotheses, selecting appropriate test statistics, and
determining significance levels. Participants will learn about different types of hypothesis tests,
such as z-tests, t-tests, and chi-square tests, and how to interpret the results.
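The steps of a hypothesis test can be sketched with a one-sample z-test. The example assumes the population standard deviation is known and uses made-up numbers; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, population_mean, population_sd, n):
    """Two-sided z-test for a mean when the population SD is known."""
    z = (sample_mean - population_mean) / (population_sd / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# H0: the population mean is 50. H1: it is not 50.
z, p_value = one_sample_z_test(sample_mean=52, population_mean=50, population_sd=8, n=64)
alpha = 0.05  # significance level chosen before looking at the data
reject_null = p_value < alpha
print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject_null}")
```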
Visualization Interpretation:
Effective data visualization is essential for communicating analytical results. This section will
teach participants how to create and interpret various types of visualizations, such as bar charts,
line graphs, scatter plots, and heat maps. Participants will learn best practices for designing
clear and informative visualizations that accurately represent the data and support decision-
making. The course will also cover common pitfalls in data visualization, such as misleading
scales and improper use of colors.
Descriptive Statistics:
Descriptive statistics provide a summary of the main features of a data set. This section will
cover measures of central tendency (mean, median, mode) and measures of dispersion (range,
variance, standard deviation). Participants will learn how to calculate and interpret these
statistics to gain a quick overview of the data. The course will also discuss the use of descriptive
statistics in exploratory data analysis (EDA) and data summarization.
Bias:
Bias in data analysis can lead to incorrect conclusions and poor decision-making. This section
will explore different types of bias, such as selection bias, confirmation bias, and measurement
bias. Participants will learn how to identify and mitigate bias in their analyses to ensure accurate
and reliable results. The course will include case studies demonstrating the impact of bias on
data analysis and decision-making.
Histogram/Box Plots:
Histograms and box plots are effective tools for visualizing the distribution and spread of data.
This section will teach participants how to create and interpret histograms to understand the
frequency distribution of data. Participants will also learn how to use box plots to visualize the
spread, central tendency, and outliers in the data. The course will cover best practices for
creating these visualizations and interpreting their results.
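The numbers behind a box plot can be computed directly. The sketch below assumes the common 1.5 * IQR rule for flagging outliers and uses hypothetical exam scores; plotting libraries may compute quartiles with slightly different conventions.

```python
import statistics

def box_plot_summary(values):
    """Return quartiles and the outliers flagged by the 1.5 * IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}

scores = [52, 55, 57, 58, 60, 61, 63, 64, 95]  # hypothetical exam scores
summary = box_plot_summary(scores)
print(summary)
```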
Inferential Statistics:
Inferential statistics are used to make inferences about a population based on sample data. This
section will introduce participants to key concepts in inferential statistics, including confidence
intervals, margin of error, and hypothesis tests. Participants will learn how to use inferential
statistics to draw conclusions and make predictions based on sample data. The course will also
cover common inferential statistical tests, such as t-tests and ANOVA.
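A confidence interval and its margin of error can be sketched as follows. This version uses the normal approximation for simplicity; for a sample this small a t critical value would normally be preferred, so the interval shown is slightly too narrow. The sample values are made up.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin_of_error = z * stdev(sample) / sqrt(len(sample))
    centre = mean(sample)
    return centre - margin_of_error, centre + margin_of_error

sample = [48, 50, 51, 49, 52, 50, 47, 53, 50, 50]  # hypothetical measurements
low, high = mean_confidence_interval(sample)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```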
Learning Outcomes:
By the end of Week 3,
I learned how to conduct hypothesis tests and understand Type I and Type II errors.
Week 4: Advanced Analytics
Topics Covered:
K-Means Clustering:
K-means clustering is a popular unsupervised learning technique used for data segmentation.
This section will introduce participants to the fundamentals of the k-means algorithm, including
selecting the number of clusters (k), initializing centroids, assigning data points to the nearest
centroids, and updating centroid positions iteratively. Participants will apply k-means clustering
to identify patterns and group similar data points based on features. The course will cover
practical considerations like choosing the appropriate number of clusters using the elbow
method and interpreting clustering results. Hands-on exercises will involve using tools like
Python's scikit-learn library or R's cluster package to perform k-means clustering on real-world
datasets.
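The assign-and-update loop described above can be sketched in plain Python without any external library. This is an illustrative implementation, not the scikit-learn one: the starting centroids are fixed by hand rather than chosen at random, the number of iterations is fixed, and the points are made up.

```python
import math

def k_means(points, centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then recompute."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            distances = [math.dist(point, c) for c in centroids]
            clusters[distances.index(min(distances))].append(point)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]  # two obvious groups
centroids, clusters = k_means(points, centroids=[(0, 0), (10, 10)])
print(centroids)
print(clusters)
```

Running the same function for several values of k and comparing the total within-cluster distance is the basis of the elbow method.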
Markov Analysis:
Markov analysis is used to model and analyze systems that transition from one state to another
based on certain probabilities. This section will cover the basics of Markov chains, including
states, transitions, and steady-state probabilities. Participants will learn to model and predict
system behavior over time. The course will explore applications of Markov analysis in fields
like finance (modeling stock price movements) and marketing (customer journey analysis).
Exercises will involve building and interpreting Markov models using tools like Python's
numpy and scipy libraries or specialized software like MATLAB.
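Steady-state probabilities can be approximated by applying the transition matrix repeatedly. The sketch below uses plain Python lists instead of numpy, and the two-state customer model with its probabilities is a hypothetical example.

```python
def steady_state(transition, iterations=200):
    """Approximate the stationary distribution by repeated multiplication."""
    n = len(transition)
    state = [1.0 / n] * n  # start from a uniform distribution
    for _ in range(iterations):
        state = [
            sum(state[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
    return state

# Hypothetical two-state model: rows are the "from" state, columns the "to" state.
transition = [
    [0.9, 0.1],  # active -> active, lapsed
    [0.5, 0.5],  # lapsed -> active, lapsed
]
pi = steady_state(transition)
print([round(p, 4) for p in pi])
```

For this matrix the long-run share of active customers is 5/6, which can be checked by solving pi = pi P by hand.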
Text Mining and Sentiment Analysis:
Text mining involves extracting useful information from unstructured text data. This section
will cover text preprocessing techniques such as tokenization, stemming, and removing stop
words. Participants will convert text data into numerical formats using methods like Term
Frequency-Inverse Document Frequency (TF-IDF) and word embeddings. The course will also
cover sentiment analysis, teaching participants to classify text data based on sentiment using
machine learning algorithms like Naive Bayes and Support Vector Machines. Hands-on
projects will apply these techniques to data sets like social media posts and customer reviews.
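TF-IDF weighting can be sketched with the basic formula tf * log(N / df). The reviews are made up, tokenization is a simple whitespace split, and the function name is hypothetical; libraries such as scikit-learn use smoothed variants of the idf term, so their numbers will differ.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

reviews = ["great product great price", "poor product", "great service"]
weights = tf_idf(reviews)
print(weights[0])
```

In the first review, "price" receives a higher weight than "product" because it appears in only one document, even though both occur once in that review.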
Learning Outcomes:
By the end of Week 4,
I learned to understand and apply k-means clustering techniques for data segmentation, including
practical considerations for choosing the number of clusters and interpreting results.
Perform Markov analysis to model and predict system behavior, with practical applications in
various fields.
Utilize advanced statistical techniques like PCA and Factor Analysis for dimensionality
reduction and implement Time Series Analysis methods for forecasting and trend analysis.
Conduct text mining and sentiment analysis using appropriate preprocessing techniques and
machine learning algorithms, with hands-on experience on real-world data sets.
Week 5: Advanced Analytics II
Topics Covered:
Regression Modeling:
Regression modeling is a powerful tool for understanding relationships between variables and
making predictions. This section covers various regression techniques, including linear
regression, multiple regression, and logistic regression. Participants will learn to build,
interpret, and validate regression models, as well as use them for prediction and
analysis. Practical exercises using regression software and tools ensure that
participants gain hands-on experience. This section will also cover how to handle issues such
as multicollinearity and heteroscedasticity in regression models.
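Simple linear regression can be fitted with the closed-form least squares formulas. The sketch below covers only the one-predictor case, with made-up study hours and scores; multiple and logistic regression need matrix methods or a statistics library.

```python
from statistics import mean

def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def r_squared(x, y, slope, intercept):
    """Share of the variance in y explained by the fitted line."""
    y_bar = mean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

hours = [1, 2, 3, 4, 5]        # hypothetical study hours
scores = [52, 54, 59, 61, 64]  # hypothetical exam scores
slope, intercept = fit_simple_linear_regression(hours, scores)
fit = r_squared(hours, scores, slope, intercept)
print(f"score = {intercept:.1f} + {slope:.1f} * hours, R^2 = {fit:.3f}")
```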
T-Test:
The t-test is a statistical test used to compare the means of two groups. This section explores the different
types of t-tests (independent, paired, one-sample), their assumptions, and how to conduct and
interpret the results. Participants will learn to determine when each type of t-test is appropriate,
supported by practical examples and exercises using statistical software. This section will help
participants understand the importance of sample size and power in hypothesis testing.
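An independent two-sample comparison can be sketched with Welch's version of the t-test, which does not assume equal variances. The code computes only the t statistic and the degrees of freedom; the p-value needs the t distribution, which is available in statistical packages such as SciPy (scipy.stats.ttest_ind with equal_var=False). The two groups are made-up data.

```python
from math import sqrt
from statistics import mean, variance

def welch_t_statistic(a, b):
    """Return the t statistic and Welch-Satterthwaite degrees of freedom."""
    var_a, var_b = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(a) - 1) + var_b ** 2 / (len(b) - 1)
    )
    return t, df

group_a = [12, 14, 15, 13, 16]  # hypothetical measurements
group_b = [10, 11, 12, 10, 12]
t_stat, df = welch_t_statistic(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```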
Chi-Square Test:
The chi-square test assesses the association between categorical variables. Participants will
learn to conduct chi-square tests of independence, calculate chi-square statistics, and interpret
results. The section covers how to analyze contingency tables and test hypotheses about categorical
data, with hands-on exercises using statistical software. This section will also explore
goodness-of-fit tests and their applications in real-world scenarios.
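The chi-square statistic for a contingency table can be computed from observed and expected counts. The 2 x 2 table below is hypothetical, and the sketch applies no continuity correction.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

observed = [
    [30, 20],  # group 1: outcome yes, outcome no
    [20, 30],  # group 2: outcome yes, outcome no
]
chi2, dof = chi_square_statistic(observed)
print(f"chi-square = {chi2:.2f}, degrees of freedom = {dof}")
```

With one degree of freedom the 5% critical value is about 3.84, so a statistic of this size would lead to rejecting independence at that level.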
A/B Testing:
A/B testing is a method used to compare two versions of a product or process to determine
which one performs better. Participants will learn to design and implement A/B tests,
select appropriate metrics, randomize subjects, and analyze results. The section covers the
entire process from hypothesis formulation to result interpretation, supported by practical
examples and exercises. Participants will learn to apply A/B testing in various contexts, such
as marketing campaigns and website optimization.
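The analysis step of an A/B test on conversion rates can be sketched with a two-proportion z-test. The visitor and conversion counts are made up, and the function name is hypothetical; this is one common way to analyze such a test, not necessarily the method used in the course.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conversions_a, n_a, conversions_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conversions_a / n_a, conversions_b / n_b
    # Pooled rate under the null hypothesis that both versions convert equally.
    pooled = (conversions_a + conversions_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Version A: 100 of 1000 visitors convert. Version B: 130 of 1000.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```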
Learning Outcomes:
By the end of Week 5,
Design and implement effective A/B tests to compare different versions of products or
processes.
Week 6: Advanced Analytics III
Topics Covered:
Algorithms:
Algorithms are the foundation of data analytics and machine learning. This section explores key
algorithms used in data analysis, including sorting and searching algorithms, clustering
algorithms, and optimization algorithms. Participants will learn about the principles behind
these algorithms, their applications, and how to implement them using programming languages
such as Python or R.
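As one example of a searching algorithm, binary search can be written in a few lines of Python. The list of identifiers is made up. The algorithm requires sorted input and halves the search range at each step.

```python
def binary_search(sorted_values, target):
    """Return the index of target in sorted_values, or -1 if absent."""
    low, high = 0, len(sorted_values) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_values[mid] == target:
            return mid
        if sorted_values[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return -1

ids = sorted([42, 7, 19, 3, 88])  # sorting first is a precondition
position = binary_search(ids, 42)
print(ids, position)
```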
Learning Outcomes:
By the end of Week 6,
I learned to understand and implement key data analysis algorithms, recognizing their strengths and
limitations.
Perform ANOVA to compare means across multiple groups, interpreting the results effectively.
Week 7: Integration and Application
Topics Covered:
Real-World Data Projects:
Participants will apply the skills learned throughout the course to real-world data projects. They will
work on case studies and project-based learning, using data sets from various industries such
as health care, finance, retail, and manufacturing. Participants will go through the entire data
analysis process, from data collection and cleaning to analysis and interpretation, using the
tools and techniques covered in the course to derive insights and make data-driven decisions.
Capstone Project:
The capstone project is an opportunity for participants to demonstrate their understanding and
application of data analytic concepts. Participants will choose a topic of interest, formulate a
research question or business problem, and conduct a comprehensive data analysis project. This
project will involve collecting and preparing data, applying appropriate analytical techniques,
and presenting the findings in a well-structured report. Participants will receive feedback on
their projects from instructors and peers, ensuring they gain valuable insights and refine their
skills.
Learning Outcomes:
By the end of Week 7,
Complete a comprehensive capstone project that demonstrates my ability to conduct and
present data analysis.
Week 8: Review and Exam Preparation
Topics Covered:
Review of Key Concepts:
During Week 8, I focused on reviewing all the key concepts covered throughout the course.
This comprehensive review helped me consolidate my understanding of critical topics such as
database design, data types, statistical analysis, hypothesis testing, regression modeling,
clustering, and data visualization. The summary notes and key takeaways provided clear and
concise information, making it easier to refresh my memory on each subject.
Mock Exams:
Taking mock exams was a crucial part of my preparation. These exams simulated the actual
certification exam experience, covering all the topics we studied over the past eight weeks. The
mock exams helped me gauge my understanding and identify areas where I needed further
review.
Exam Strategies:
We also covered effective exam strategies, which boosted my confidence and readiness. I
learned valuable tips on time management, question interpretation, and efficient study
techniques. Understanding the format and structure of the certification exam was particularly
helpful; it alleviated any anxiety about what to expect and how to approach different types of
questions. These strategies equipped me with the tools to tackle the exam confidently and
effectively.
Learning Outcomes:
By the end of Week 8,
The thorough review, mock exams, and exam strategies provided a solid foundation for success.
I was confident in my ability to apply the concepts learned and perform well on the exam.
Conclusion
Reflecting on the eight-week Data Analytics Certificate Exam preparation course by Qlik, I
realize how much I have learned and grown as a data analyst. The course offered a
comprehensive and structured learning experience, covering everything from foundational
concepts to advanced analytics techniques.
Starting with the basics, I gained a solid understanding of data architecture, database design,
data normalization, and optimization. As we moved into foundational analytics, I learned about
aggregation, distribution analysis, correlation and causation, probability, and decision tree
modeling. The hands-on exercises and practical applications helped me see the real-world
relevance of these concepts.
In the advanced analytics sections, I delved into k-means clustering, Markov analysis, regression
modeling, t-tests, chi-square tests, and A/B testing. I particularly enjoyed learning about
advanced statistical techniques and machine learning algorithms, as they opened up new
possibilities for analyzing complex data sets and making data-driven decisions.
Week 7’s integration and application projects allowed me to apply my skills to real-world data
sets, reinforcing my learning and building my confidence. The capstone project was a highlight,
as it enabled me to demonstrate my understanding and present a comprehensive analysis of a
topic I was passionate about.
Week 8's review and exam preparation solidified my knowledge and equipped me with the
strategies needed to excel in the certification exam. The mock exams and detailed feedback
were invaluable in identifying areas for improvement and ensuring I was ready for the
challenge.
Overall, this course has been an incredibly enriching experience. I now feel confident in my
ability to apply data analytics techniques in professional scenarios, make data-driven decisions,
and drive value for my organization. The balanced approach of combining theoretical
knowledge with hands-on practice ensured that I not only understood the concepts but also
knew how to implement them effectively. Qlik’s expertise in data analytics and business
intelligence has provided me with high-quality training that will support my career growth and
success in the field of data analytics.