QUANTUM SERIES

For B.Tech Students of Third Year of All Engineering Colleges Affiliated to Dr. A.P.J. Abdul Kalam Technical University, Uttar Pradesh, Lucknow (Formerly Uttar Pradesh Technical University)

* Topic-wise coverage of entire syllabus in Question-Answer form.
* Short Questions (2 Marks)

Data Analytics
By Aditya Kumar

QUANTUM PAGE PVT. LTD.
Ghaziabad | New Delhi

PUBLISHED BY : Apram Singh, Quantum Publications (A Unit of Quantum Page Pvt. Ltd.)
Plot No. 59/2/7, Site-4, Industrial Area, Sahibabad, Ghaziabad-201 010
Phone : 0120-4160479   Email : [email protected]   Website : www.quantumpage.co.in
Delhi Office : 1/6590, East Rohtas Nagar, Shahdara, Delhi-110032

© All Rights Reserved. No part of this publication may be reproduced or transmitted, in any form or by any means, without permission. Information contained in this work is derived from sources believed to be reliable. Every effort has been made to ensure accuracy; however, neither the publisher nor the authors guarantee the accuracy or completeness of any information published herein, and neither the publisher nor the authors shall be responsible for any errors, omissions, or damages arising out of use of this information.

Data Analytics (CS : Sem-5 and IT : Sem-6)
1st Edition : 2020-21
Price : Rs. 55/- only
Printed Version : e-Book

UNIT-1 : INTRODUCTION TO DATA ANALYTICS (1-1 J to 1-20 J)
Introduction to Data Analytics : Sources and nature of data, classification of data (structured, semi-structured, unstructured), characteristics of data, introduction to Big Data platform, need of data analytics, evolution of analytic scalability, analytic process and tools, analysis vs reporting, modern data analytic tools, applications of data analytics.
Data Analytics Lifecycle : Need, key roles for successful analytic projects, various phases of data analytics lifecycle : discovery, data preparation, model planning, model building, communicating results, operationalization.

UNIT-2 : DATA ANALYSIS (2-1 J to 2-28 J)
Regression modeling, multivariate analysis, Bayesian modeling, inference and Bayesian networks, support vector and kernel methods, analysis of time series : linear systems analysis & nonlinear dynamics, rule induction, neural networks : learning and generalisation, competitive learning, principal component analysis and neural networks, fuzzy logic : extracting fuzzy models from data, fuzzy decision trees, stochastic search methods.

UNIT-3 : MINING DATA STREAMS (3-1 J to 3-20 J)
Introduction to streams concepts, stream data model and architecture, stream computing, sampling data in a stream, filtering streams, counting distinct elements in a stream, estimating moments, counting oneness in a window, decaying window, Real-time Analytics Platform (RTAP) applications, case studies : real time sentiment analysis, stock market predictions.

UNIT-4 : FREQUENT ITEMSETS & CLUSTERING (4-1 J to 4-28 J)
Mining frequent itemsets, market based modelling, Apriori algorithm, handling large data sets in main memory, limited pass algorithm, counting frequent itemsets in a stream, clustering techniques : hierarchical, K-means, clustering high dimensional data, CLIQUE and ProCLUS, frequent pattern based clustering methods, clustering in non-Euclidean space, clustering for streams & parallelism.

UNIT-5 : FRAME WORKS & VISUALIZATION (5-1 J to 5-30 J)
Frame Works and Visualization : MapReduce, Hadoop, Pig, Hive, HBase, MapR, Sharding, NoSQL databases, S3, Hadoop Distributed File Systems, Visualization : visual data analysis techniques, interaction techniques, systems and applications.
Introduction to R : R graphical user interfaces, data import and export, attribute and data types, descriptive statistics, exploratory data analysis, visualization before analysis, analytics for unstructured data.

SHORT QUESTIONS (SQ-1 J to SQ-15 J)

QUANTUM Series
For Semester - 5 (Computer Science & Engineering / Information Technology)
+ Database Management System
+ Design and Analysis of Algorithm
+ Compiler Design
+ Web Technology
Departmental Elective-I
+ Data Analytics
+ Computer Graphics
+ Object Oriented System Design
Departmental Elective-II
+ Machine Learning Techniques
+ Application of Soft Computing
+ Human Computer Interface
Common Non Credit Course (NC)
+ Constitution of India, Law & Engineering
+ Indian Tradition, Culture & Society

Quantum Series is the complete one-stop solution for the engineering student looking for a simple yet effective guidance system for core engineering subjects. Based on the needs of students and catering to the requirements of the syllabi, this series uniquely addresses the way in which concepts are tested through university examinations. The easy-to-comprehend question-answer form adhered to by the books in this series is suitable and recommended for students. The students are able to effortlessly grasp the concepts and ideas discussed in their course books with the help of this series. The solved question papers of previous years act as an additional advantage for students to comprehend the paper pattern, and thus anticipate and prepare for examinations accordingly. The coherent manner in which the books in this series present new ideas and concepts to students makes this series play an essential role in the preparation for university examinations. The detailed and comprehensive discussions, easy-to-understand examples, objective questions and ample exercises all aid the students to understand everything in an all-inclusive manner.

* Topic-wise coverage in Question-Answer form.
* The perfect assistance for scoring good marks.
* Clears course fundamentals.
* Good for brush-up before exams.
* Includes solved question papers. Ideal for self-study.

Quantum Publications (A Unit of Quantum Page Pvt. Ltd.)
Plot No. 59/2/7, Site-4, Industrial Area, Sahibabad, Ghaziabad, 201010 (U.P.)
Phone : 0120-4160479   E-mail : pagequantum@gmail.com   Web : www.quantumpage.co.in
Find us on : facebook.com/quantumseriesofficial

Data Analytics (KCS-051)

Course Outcome (CO) and Bloom's Knowledge Level (KL)
At the end of the course, the student will be able to :

CO 1 | Describe the life cycle phases of Data Analytics through discovery, planning and building. | K1, K2
CO 2 | Understand and apply Data Analysis Techniques. | K2, K3
CO 3 | Implement various Data streams. | K3
CO 4 | Understand item sets, Clustering, frame works & Visualizations. | K2
CO 5 | Apply R tool for developing and evaluating real time applications. | K3, K5, K6

DETAILED SYLLABUS (3-0-0)

Unit I (Proposed Lectures : 08)
Introduction to Data Analytics : Sources and nature of data, classification of data (structured, semi-structured, unstructured), characteristics of data, introduction to Big Data platform, need of data analytics, evolution of analytic scalability, analytic process and tools, analysis vs reporting, modern data analytic tools, applications of data analytics.
Data Analytics Lifecycle : Need, key roles for successful analytic projects, various phases of data analytics lifecycle : discovery, data preparation, model planning, model building, communicating results, operationalization.
Unit II (Proposed Lectures : 08)
Data Analysis : Regression modeling, multivariate analysis, Bayesian modeling, inference and Bayesian networks, support vector and kernel methods, analysis of time series : linear systems analysis & nonlinear dynamics, rule induction, neural networks : learning and generalisation, competitive learning, principal component analysis and neural networks, fuzzy logic : extracting fuzzy models from data, fuzzy decision trees, stochastic search methods.

Unit III (Proposed Lectures : 08)
Mining Data Streams : Introduction to streams concepts, stream data model and architecture, stream computing, sampling data in a stream, filtering streams, counting distinct elements in a stream, estimating moments, counting oneness in a window, decaying window, Real-time Analytics Platform (RTAP) applications, case studies : real time sentiment analysis, stock market predictions.

Unit IV (Proposed Lectures : 08)
Frequent Itemsets and Clustering : Mining frequent itemsets, market based modelling, Apriori algorithm, handling large data sets in main memory, limited pass algorithm, counting frequent itemsets in a stream, clustering techniques : hierarchical, K-means, clustering high dimensional data, CLIQUE and ProCLUS, frequent pattern based clustering methods, clustering in non-Euclidean space, clustering for streams and parallelism.

Unit V (Proposed Lectures : 08)
Frame Works and Visualization : MapReduce, Hadoop, Pig, Hive, HBase, MapR, Sharding, NoSQL databases, S3, Hadoop Distributed File Systems, Visualization : visual data analysis techniques, interaction techniques, systems and applications.
Introduction to R : R graphical user interfaces, data import and export, attribute and data types, descriptive statistics, exploratory data analysis, visualization before analysis, analytics for unstructured data.

Text books and References :
1. Michael Berthold, David J. Hand, "Intelligent Data Analysis", Springer.
2. Anand Rajaraman and Jeffrey David Ullman, "Mining of Massive Datasets", Cambridge University Press.
3. Bill Franks, "Taming the Big Data Tidal Wave : Finding Opportunities in Huge Data Streams with Advanced Analytics", John Wiley & Sons.
4. Michael Minelli, Michelle Chambers, and Ambiga Dhiraj, "Big Data, Big Analytics : Emerging Business Intelligence and Analytic Trends for Today's Businesses", Wiley.
5. David Dietrich, Barry Heller, Beibei Yang, "Data Science and Big Data Analytics", EMC Education Series, John Wiley.
6. Frank J. Ohlhorst, "Big Data Analytics : Turning Big Data into Big Money", Wiley and SAS Business Series.
7. Colleen McCue, "Data Mining and Predictive Analysis : Intelligence Gathering and Crime Analysis", Elsevier.
8. Paul Zikopoulos, Chris Eaton, "Understanding Big Data : Analytics for Enterprise Class Hadoop and Streaming Data", McGraw Hill.
9. Trevor Hastie, Robert Tibshirani, Jerome Friedman, "The Elements of Statistical Learning", Springer.
10. Mark Gardener, "Beginning R : The Statistical Programming Language", Wrox Publication.
11. Pete Warden, "Big Data Glossary", O'Reilly.
12. Glenn J. Myatt, "Making Sense of Data", John Wiley & Sons.
13. Peter Bühlmann, Petros Drineas, Michael Kane, Mark van der Laan, "Handbook of Big Data", CRC Press.
14. Jiawei Han, Micheline Kamber, "Data Mining : Concepts and Techniques", Second Edition, Elsevier.

Data Analytics 1-1 J (CS-5/IT-6)

UNIT-1 : Introduction to Data Analytics

CONTENTS
Part-1 : Introduction of Data Analytics : Sources and Nature of Data, Classification of Data (Structured, Semi-Structured, Unstructured), Characteristics of Data ................ 1-2J to 1-5J
Part-2 : Introduction to Big Data Platform, Need of Data Analytics ................ 1-5J to 1-6J
Part-3 : Evolution of Analytic
Scalability, Analytic Process and Tools, Analysis Vs Reporting, Modern Data Analytic Tools, Applications of Data Analysis ................ 1-6J to 1-13J
Part-4 : Data Analytics Lifecycle : Need, Key Roles for Successful Analytic Projects, Various Phases of Data Analytics Life Cycle : Discovery, Data Preparation ................ 1-13J to 1-17J
Part-5 : Model Planning, Model Building, Communicating Results, Operationalization ................ 1-17J to 1-20J

PART-1
Introduction to Data Analytics : Sources and Nature of Data, Classification of Data (Structured, Semi-Structured, Unstructured), Characteristics of Data.

Long Answer Type and Medium Answer Type Questions

Que 1.1. What is data analytics ?

Answer
1. Data analytics is the science of analyzing raw data in order to make conclusions about that information.
2. Any type of information can be subjected to data analytics techniques to get insight that can be used to improve things.
3. Data analytics techniques help in finding the trends and metrics that can be used to optimize processes and increase the overall efficiency of a business or system.
4. Many of the techniques and processes of data analytics have been automated into mechanical processes and algorithms that work over raw data for human consumption.
5. For example, manufacturing companies often record the runtime, downtime, and work queue for various machines and then analyze the data to better plan the workloads so the machines operate closer to peak capacity.

Que 1.2. Explain the sources of data (or Big Data).

Answer
Three primary sources of Big Data are :
1. Social data :
   a. Social data comes from the likes, tweets and retweets, comments, video uploads, and general media that are uploaded and shared via social media platforms.
   b. This kind of data provides invaluable insights into consumer behaviour and sentiment and can be enormously influential in marketing analytics.
   c. The public web is another good source of social data, and tools like Google Trends can be used to good effect to increase the volume of big data.
2. Machine data :
   a. Machine data is defined as information which is generated by industrial equipment, sensors that are installed in machinery, and even web logs which track user behaviour.
   b. This type of data is expected to grow exponentially as the Internet of Things grows ever more pervasive and expands around the world.
   c. Sensors such as medical devices, smart meters, road cameras, satellites, games and the rapidly growing Internet of Things will deliver high velocity, value, volume and variety of data in the very near future.
3. Transactional data :
   a. Transactional data is generated from all the daily transactions that take place both online and offline.
   b. Invoices, payment orders, storage records and delivery receipts are characterized as transactional data.

Que 1.3. Write short notes on classification of data.

Answer
1. Unstructured data :
   a. Unstructured data is the rawest form of data.
   b. It is data that has no inherent structure, which may include text documents, PDFs, images, and video.
   c. This data is often stored in a repository of files.
2. Structured data :
   a. Structured data is tabular data (rows and columns) which is very well defined.
   b. It is data containing a defined data type, format, and structure, which may include transaction data, traditional RDBMS tables, CSV files, and simple spreadsheets.
3. Semi-structured data :
   a. Semi-structured data consists of textual data files with a distinct pattern that enables parsing, such as Extensible Markup Language (XML) data files or JSON.
   b. A consistent format is defined; however, the structure is not very strict.
   c. Semi-structured data are often stored as files.

Que 1.4. Differentiate between structured, semi-structured and unstructured data.
Answer

Properties | Structured data | Semi-structured data | Unstructured data
Technology | It is based on a relational database table. | It is based on XML/RDF. | It is based on character and binary data.
Transaction management | Matured transaction and various concurrency techniques. | Transaction is adapted from DBMS. | No transaction management and no concurrency.
Flexibility | It is schema dependent and less flexible. | It is more flexible than structured data but less flexible than unstructured data. | It is very flexible and there is absence of schema.
Scalability | It is very difficult to scale the database schema. | It is more scalable than structured data. | It is very scalable.
Query performance | Structured query allows complex joining. | Queries over anonymous nodes are possible. | Only textual queries are possible.

Que 1.5. Explain the characteristics of Big Data.

Answer
Big Data is characterized into four dimensions :
1. Volume :
   a. Volume is concerned with the scale of data, i.e., the volume at which the data is growing.
   b. The volume of data is growing rapidly, due to several applications of business, social, web and scientific explorations.
2. Velocity :
   a. Velocity is the speed at which data is increasing, thus demanding analysis of streaming data.
   b. The velocity is due to the growing speed of business intelligence applications such as trading, transactions in the telecom and banking domains, and the growing number of internet connections with the increased usage of the internet.
3. Variety : It depicts the different forms of data used for analysis, such as structured, semi-structured and unstructured.
4. Veracity :
   a. Veracity is concerned with uncertainty or inaccuracy of the data.
   b. In many cases the data will be inaccurate, hence filtering and selecting the data which is actually needed is a complicated task.
   c. A lot of statistical and analytical processing goes into data cleansing for choosing intrinsic data for decision making.
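The three classes of data compared in Que 1.3 and Que 1.4 can be seen concretely in a short sketch (a minimal illustration; the records and field names below are invented for the example): structured data parses directly into rows and columns, semi-structured data needs a parser that follows its keys or tags, and unstructured text has no schema at all, so any structure must be inferred.

```python
import csv
import io
import json

# Structured: CSV with a fixed schema (rows and columns).
csv_text = "id,name,spend\n1,Asha,250\n2,Ravi,180\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: JSON is parsable, but fields can vary per record.
json_text = '{"id": 3, "name": "Meena", "tags": ["prepaid", "urban"]}'
record = json.loads(json_text)

# Unstructured: free text; "structure" must be inferred, e.g. by keyword search.
review = "The delivery was late but the product quality is excellent."
mentions_delivery = "delivery" in review.lower()

print(rows[0]["name"], record["tags"][0], mentions_delivery)
```

Note how the CSV and JSON cases use standard parsers, while the free-text case falls back to a hand-written rule; this is exactly why unstructured data is the hardest of the three to query.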
PART-2
Introduction to Big Data Platform, Need of Data Analytics.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 1.6. Write a short note on big data platform.

Answer
1. A big data platform is a type of IT solution that combines the features and capabilities of several big data applications and utilities within a single solution.
2. It is an enterprise-class IT platform that enables an organization to develop, deploy, operate and manage a big data infrastructure/environment.
3. A big data platform generally consists of big data storage, servers, databases, big data management, business intelligence and other big data management utilities.
4. It also supports custom development, querying and integration with other systems.
5. The primary benefit behind a big data platform is to reduce the complexity of multiple vendors/solutions into one cohesive solution.
6. Big data platforms are also delivered through the cloud, where the provider supplies all-inclusive big data solutions and services.

Que 1.7. What are the features of big data platform ?

Answer
Features of a big data analytics platform :
1. A big data platform should be able to accommodate new platforms and tools based on the business requirement.
2. It should support linear scale-out.
3. It should have capability for rapid deployment.
4. It should support a variety of data formats.
5. The platform should provide data analysis and reporting tools.
6. It should provide real-time data analysis software.
7. It should have tools for searching the data through large data sets.

Que 1.8. Why is there a need of data analytics ?

Answer
Need of data analytics :
1. It optimizes the business performance.
2. It helps to make better decisions.
3. It helps to analyze customer trends and solutions.

PART-3
Evolution of Analytic Scalability, Analytic Process and Tools, Analysis vs Reporting, Modern Data Analytic Tools, Applications of Data Analysis.
Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 1.9. What are the steps involved in data analysis ?

Answer
Steps involved in data analysis are :
1. Determine the data :
   a. The first step is to determine the data requirements or how the data is grouped.
   b. Data may be separated by age, demographic, income, or gender.
   c. Data values may be numerical or be divided by category.
Data preparation is made up of joins, aggregations, derivations, and transformations. In this process, they pull data from various sources and merge it all together to create the variables required for an analysis. 6. Massively Parallel Processing (MPP) system is the most mature, proven, and widely deployed mechanism for storing and analyzing large amounts of data. = An MPP database breaks the data into independent pieces managed by independent storage and central processing unit (CPU) resources. 00 GB 100 GB 100 GB 100 GB 100 GB Chunks | | Chunks | | Chunks | | Chunk Chunks 1 terabyte Cd table 100 GB 100 GB 100 GB 100 GB 100 GB Chunks || Chunks | | Chunks | | Chunks | | Chunks A traditional database 10 Simultaneous 100-GB queries will query a one terabyte one row at time. Fig. 1.10.1. Massively Parallel Processing system data storage. 8. MPP systems build in redundancy to make recovery easy. 9. MPP systems have resource management tools : a. Manage the CPU and disk space b. Query optimizer Que 1.11. ] write short notes on evolution of analytic process. Answer 1. With increased level of scalability, it needs to update analytic processes to take advantage of it. w This can be achieved with the use of analytical sandboxes to provide analytic professionals with a scalable environment to build advanced analytics processes. One of the uses of MPP database system is to facilitate the building and deployment of advanced analytic processes. 4. An analytic sandbox is the mechanism to utilize an enterprise data warehouse. 5. If used appropriately, an analytic sandbox can be one of the primary drivers of value in the world of big data. Analytical sandbox : 1. An analytie sandbox provides a set of resources with which in-depth analysis can be done to answer critical business questions. Data Analyties 1-9 J (CS-5/IT-6) 2. ae a An analytic sandbox is ideal for data exploration, development of analytical processes, proof of concepts, and prototyping. 
Once things progress into ongoing, user-managed processes or production processes, then the sandbox should not be involved Asandbox is going to be leveraged by a fairly small set of users. There will be data created within the sandbox that is segregated from the production database. Sandbox users will also be allowed to load data of their own for brief time periods as part of a project, even if that datais not part of the official enterprise data model. Que 1.12. | Explain modern data analytic tools. Answer Modern data analytic tools : 1 Apache Hadoop : a. Apache Hadoop, a big data analytics tool which is a Java based free software framework. b. It helps in effective storage of huge amount of data in a storage place known as a cluster. ec. It runs in parallel ona cluster and also has ability to process huge data across all nodes in it. d. There isa storage system in Hadoop popularly known as the Hadoop Distributed File System (HDF), which helps to splits the large volume of data and distribute across many nodes present in a cluster. KNIME: a. KNIME analytics platform is one of the leading open solutions for data-driven innovation. b. This tool helps in discovering the potential and hidden in a huge volume of data, it also performs mine for fresh insights, or predicts the new futures. OpenRefine: a. OneRefine tool is one of the efficient tools to work on the messy and large volume of data. b. It includes cleansing data, transforming that data from one format another. ec. Ithelps to explore large data sets easily. Orange: a. Orange is famous open-source data visualization and helps in data analysis for beginner and as well to the expert. Introduction to Data Analytics 1-10 J (CS-5/IT-6) Q b. This tool provides interactive workflows with a large toolbox option to create the same which helps in analysis and visualizing of data. RapidMiner: a. RapidMiner tool operates using visual programming and also it is much capable of manipulating, analyzing and modeling the data. 
b. RapidMiner tools make data science teams easier and productive by using an open-source platform for all their jobs like machine learning, data preparation, and model deployment. R-programming : a. Risa free open source software programming language and a software environment for statistical computing and graphics. b. Itisused by data miners for developing statistical software and data analysis. ec. It has become a highly popular tool for big data in recent years. Datawrapper: a. Itis an online data visualization tool for making interactive charts. b. It uses data file ina esv, pdf or excel format. e. Datawrapper generate visualization in the form of bar, line, map etc. It can be embedded into any other website as well. Tableau : a. Tableauis another popular big data tool. It issimple and very intuitive to use. b. It communicates the insights of the data through data visualization. Through Tableau, an analyst ean check a hypothesis and explore the data before starting to work on it extensively. ue 1.13. | What are the benefits of analytic sandbox from the view Ea of an analytic professional ? Answer | Benefits of analytic sandbox from the view of an analytic professional : 1 Independence : Analytic professionals will be able to work independently on the database system without needing to continually go back and ask for permissions for specific projects. Flexibility : Analytic professionals will have the flexibility to use whatever business intelligence, statistical analysis, or visualization tools that they need to use. Efficiency: Analytic professionals will be able to leverage the existing enterprise data warehouse or data mart, without having to move or migrate data. Data Analyties 1-11 J (CS-5/IT-6) 4. a Freedom: Analytic professionals can reduce focus on the administration of systems and production processes by shifting those maintenance tasks to IT. Speed : Massive speed improvement will be realized with the move to parallel processing. 
This also enables rapid iteration and the ability to "fail fast" and take more risks to innovate.

Que 1.14. What are the benefits of an analytic sandbox from the view of IT ?

Answer
Benefits of an analytic sandbox from the view of IT :
1. Centralization : IT will be able to centrally manage a sandbox environment just as every other database environment on the system is managed.
2. Streamlining : A sandbox will greatly simplify the promotion of analytic processes into production since there will be a consistent platform for both development and deployment.
3. Simplicity : There will be no more processes built during development that have to be totally rewritten to run in the production environment.
4. Control : IT will be able to control the sandbox environment, balancing sandbox needs and the needs of other users. The production environment is safe from an experiment gone wrong in the sandbox.
5. Costs : Big cost savings can be realized by consolidating many analytic data marts into one central system.

Que 1.15. Explain the applications of data analytics.

Answer
Applications of data analytics :
1. Security : Data analytics applications or, more specifically, predictive analysis has also helped in dropping crime rates in certain areas.
2. Transportation :
   a. Data analytics can be used to revolutionize transportation.
   b. It can be used especially in areas where we need to transport a large number of people to a specific area and require seamless transportation.
3. Risk detection :
   a. Many organizations were struggling under debt, and they wanted a solution to the problem of fraud.
   b. They already had enough customer data in their hands, and so, they applied data analytics.
   c. They used a 'divide and conquer' policy with the data, analyzing recent expenditure, profiles, and any other important information to understand any probability of a customer defaulting.
4. Delivery :
   a.
Several top logistics companies are using data analysis to examine collected data and improve their overall efficiency.
   b. Using data analytics applications, the companies were able to find the best shipping routes and delivery times, as well as the most cost-efficient means of transport.
5. Fast internet allocation :
   a. While it might seem that allocating fast internet in every area makes a city 'smart', in reality, it is more important to engage in smart allocation. Smart allocation means understanding how bandwidth is being used in specific areas and for the right cause.
   b. It is also important to shift the data allocation based on timing and priority. It is assumed that financial and commercial areas require the most bandwidth during weekdays, while residential areas require it during the weekends. But the situation is much more complex. Data analytics can solve it.
   c. For example, using applications of data analysis, a community can draw the attention of high-tech industries; in such cases, higher bandwidth will be required in those areas.
6. Internet searching :
   a. When we use Google, we are using one of the many data analytics applications employed by the company.
   b. Most search engines like Google, Bing, Yahoo, AOL etc. use data analytics. These search engines use different algorithms to deliver the best result for a search query.
7. Digital advertisement :
   a. Data analytics has revolutionized digital advertising.
   b. Digital billboards in cities as well as banners on websites, that is, most of the advertisement sources nowadays, use data analytics and data algorithms.

Que 1.16. What are the different types of Big Data analytics ?

Answer
Different types of Big Data analytics :
1. Descriptive analytics :
   a. It uses data aggregation and data mining to provide insight into the past.
   b. Descriptive analytics describes or summarizes raw data and makes it interpretable by humans.
2. Predictive analytics :
   a.
It uses statistical models and forecasting techniques to understand the future.
   b. Predictive analytics provides companies with actionable insights based on data. It provides estimates about the likelihood of a future outcome.
3. Prescriptive analytics :
   a. It uses optimization and simulation algorithms to advise on possible outcomes.
   b. It allows users to "prescribe" a number of different possible actions and guides them towards a solution.
4. Diagnostic analytics :
   a. It is used to determine why something happened in the past.
   b. It is characterized by techniques such as drill-down, data discovery, data mining and correlations.
   c. Diagnostic analytics takes a deeper look at data to understand the root causes of events.

PART-4
Data Analytics Lifecycle : Need, Key Roles for Successful Analytic Projects, Various Phases of Data Analytics Life Cycle : Discovery, Data Preparation.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 1.17. Explain the key roles for a successful analytics project.

Answer
Key roles for a successful analytics project :
1. Business user :
   a. The business user is someone who understands the domain area and usually benefits from the results.
   b. This person can consult and advise the project team on the context of the project, the value of the results, and how the outputs will be operationalized.
   c. Usually a business analyst, line manager, or deep subject matter expert in the project domain fulfills this role.
2. Project sponsor :
   a.
Project manager : Project manager ensures that key milestones and objectives are met on time and at the expected quality.
4. Business Intelligence Analyst :
a. Business Intelligence Analyst provides business domain expertise based on a deep understanding of the data, Key Performance Indicators (KPIs), key metrics, and business intelligence from a reporting perspective.
b. Business Intelligence Analysts generally create dashboards and reports and have knowledge of the data feeds and sources.
5. Database Administrator (DBA) :
a. DBA provisions and configures the database environment to support the analytics needs of the working team.
b. These responsibilities may include providing access to key databases or tables and ensuring the appropriate security levels are in place related to the data repositories.
6. Data engineer : Data engineer has deep technical skills to assist with tuning SQL queries for data management and data extraction, and provides support for data ingestion into the analytic sandbox.
7. Data scientist :
a. Data scientist provides subject matter expertise for analytical techniques, data modeling, and applying valid analytical techniques to given business problems.
b. They ensure overall analytics objectives are met.
c. They design and execute analytical methods and approaches with the data available to the project.

Que 1.18. Explain the various phases of the data analytics life cycle.

Answer
Various phases of the data analytics life cycle are :
Phase 1 : Discovery :
1. In Phase 1, the team learns the business domain, including relevant history such as whether the organization or business unit has attempted similar projects in the past from which it can learn.
2. The team assesses the resources available to support the project in terms of people, technology, time, and data.
3. Important activities in this phase include framing the business problem as an analytics challenge and formulating initial hypotheses (IHs) to test and begin learning the data.
Phase 2 : Data preparation :
1. Phase 2 requires the presence of an analytic sandbox, in which the team can work with data and perform analytics for the duration of the project.
2. The team needs to execute extract, load, and transform (ELT) or extract, transform, and load (ETL) to get data into the sandbox. Data should be transformed in the ETL process so the team can work with it and analyze it.
3. In this phase, the team also needs to familiarize itself with the data thoroughly and take steps to condition the data.
Phase 3 : Model planning :
1. Phase 3 is model planning, where the team determines the methods, techniques, and workflow it intends to follow for the subsequent model building phase.
2. The team explores the data to learn about the relationships between variables and subsequently selects key variables and the most suitable models.
Phase 4 : Model building :
1. In Phase 4, the team develops data sets for testing, training, and production purposes.
2. In addition, in this phase the team builds and executes models based on the work done in the model planning phase.
3. The team also considers whether its existing tools will be adequate for running the models, or if it will need a more robust environment for executing models and workflows.
Phase 5 : Communicate results :
1. In Phase 5, the team, in collaboration with major stakeholders, determines if the results of the project are a success or a failure based on the criteria developed in Phase 1.
2. The team should identify key findings, quantify the business value, and develop a narrative to summarize and convey findings to stakeholders.
Phase 6 : Operationalize :
1. In Phase 6, the team delivers final reports, briefings, code, and technical documents.
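The extract-transform-load (ETL) step of Phase 2 (Data preparation) can be sketched minimally in Python. The records, field names, and the "sandbox" list below are invented purely for illustration; a real project would read from operational systems and load into an analytic database.

```python
# A minimal ETL sketch for the data preparation phase.
# All data here is hypothetical.
import csv
import io

# Extract : read raw records from a source (an in-memory CSV stands in
# for a file or database export).
raw = io.StringIO("order_id,amount\n1,100\n2,\n3,250\n")
rows = list(csv.DictReader(raw))

# Transform : condition the data -- discard incomplete rows, cast types.
clean = [
    {"order_id": int(r["order_id"]), "amount": float(r["amount"])}
    for r in rows
    if r["amount"]  # drop rows with a missing amount
]

# Load : place the conditioned data into the analytic sandbox
# (represented here by a plain list).
sandbox = clean
print(sandbox)  # two conditioned records; the incomplete row is gone
```

The same three stages apply whatever the tooling: extraction pulls raw data in, transformation conditions it, and loading lands it where the team can analyze it.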

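As a companion to the types of analytics in Que 1.16, the sketch below contrasts descriptive analytics (aggregating raw data to summarize the past) with predictive analytics (estimating a future outcome). The monthly sales figures are made up, and the simple least-squares trend line is only one of many possible forecasting techniques.

```python
# Descriptive vs. predictive analytics on a toy series.
# The sales figures are hypothetical.
sales = [100, 110, 120, 130]  # past monthly observations

# Descriptive : aggregate raw data to describe the past.
average = sum(sales) / len(sales)

# Predictive : fit a least-squares trend line and extrapolate one step.
n = len(sales)
xs = range(n)
x_mean = sum(xs) / n
slope = sum((x - x_mean) * (y - average) for x, y in zip(xs, sales)) \
        / sum((x - x_mean) ** 2 for x in xs)
forecast = average + slope * (n - x_mean)  # estimate for the next month

print(average, forecast)
```

Descriptive analytics stops at the summary (the average); predictive analytics uses the same data to produce an estimate of a future outcome (the forecast).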