DATA ANALYSIS (MBA NOTES)

UNDERSTANDING DATA - Data are individual facts, statistics, or items of information, often numeric. In a more technical sense, data are a set of values of qualitative or quantitative variables about one or more persons or objects. Data are sometimes said to be transformed into information when they are viewed in context or in post-analysis. However, in academic treatments of the subject, data are simply units of information. Data are used in scientific research, business management (e.g., sales data, revenue, profits, stock price), finance, governance (e.g., crime rates, unemployment rates, literacy rates), and in virtually every other form of human organizational activity (e.g., censuses of the number of homeless people by non-profit organizations). Data are measured, collected, reported, and analyzed, and used to create data visualizations such as graphs, tables, or images. Data as a general concept refers to the fact that some existing information or knowledge is represented or coded in some form suitable for better usage or processing. Raw data ("unprocessed data") is a collection of numbers or characters before it has been "cleaned" and corrected by researchers. Raw data needs to be corrected to remove outliers or obvious instrument or data entry errors (e.g., a thermometer reading from an outdoor Arctic location recording a tropical temperature).

TYPES OF DATA -
- Qualitative Types: Qualitative or categorical data describes the object under consideration using a finite set of discrete classes. This type of data cannot easily be counted or measured using numbers and is therefore divided into categories. The gender of a person (male, female, or other) is a good example of this data type. Such data are usually extracted from audio, images, or text. Another example is a smartphone brand that provides information about the current rating, the color of the phone, the category of the phone, and so on; all of this information can be categorized as qualitative data. There are two subcategories:
  - Nominal: Values that do not possess a natural ordering. The color of a smartphone is a nominal data type, since we cannot compare one color with another; it is not possible to state that 'Red' is greater than 'Blue'.
  - Ordinal: Values that have a natural ordering while maintaining their class of values. If we consider the sizes of a clothing brand, we can easily sort them according to their name tags in the order small < medium < large.
- Quantitative Types: This data type quantifies things using numerical values that make it countable in nature. The price of a smartphone, the discount offered, the number of ratings on a product, the frequency of a smartphone's processor, or the RAM of that particular phone all fall under the category of quantitative data. The key point is that a feature can take an infinite number of values; for instance, the price of a smartphone can vary from one amount to any other, and it can be broken down further into fractional values. The two subcategories are:
  - Discrete: Numerical values that are integers or whole numbers. The number of speakers in the phone, cameras, cores in the processor, and the number of SIMs supported are all examples of the discrete data type.
  - Continuous: Fractional numbers are considered continuous values. These can take the form of the operating frequency of the processors, the Android version of the phone, Wi-Fi frequency, the temperature of the cores, and so on.
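As a small illustration of these four subtypes, the sketch below tags the columns of a toy smartphone dataset as nominal, ordinal, discrete, or continuous using pandas (the column names and values are invented for the example):

```python
import pandas as pd

# Toy smartphone dataset; values are invented for illustration.
phones = pd.DataFrame({
    "color": ["Red", "Blue", "Black"],       # qualitative, nominal
    "size": ["small", "medium", "large"],    # qualitative, ordinal
    "num_cameras": [2, 3, 4],                # quantitative, discrete
    "price": [199.99, 349.50, 899.00],       # quantitative, continuous
})

# An ordinal category carries an explicit ordering: small < medium < large.
phones["size"] = pd.Categorical(
    phones["size"], categories=["small", "medium", "large"], ordered=True
)

print(phones.dtypes)         # object / category / int64 / float64
print(phones["size"].min())  # the ordering makes min()/max() meaningful
```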
STRUCTURED VS. UNSTRUCTURED DATA - Data is the lifeblood of business, and it comes in a huge variety of formats — everything from strictly formed relational databases to your last post on Facebook. All of that data, in all its different formats, can be sorted into one of two categories: structured and unstructured data. Structured vs. unstructured data can be understood by considering the who, what, when, where, and how of the data:
- Who will be using the data?
- What type of data are you collecting?
- When does the data need to be prepared, before storage or when used?
- Where will the data be stored?
- How will the data be stored?
These five questions highlight the fundamentals of both structured and unstructured data, allow general users to understand how the two differ, help users understand nuances like semi-structured data, and guide us as we navigate the future of data in the cloud.
- Structured Data: Structured data is data that has been predefined and formatted to a set structure before being placed in data storage, which is often referred to as schema-on-write. The best example of structured data is the relational database: the data has been formatted into precisely defined fields, such as credit card numbers or addresses, in order to be easily queried with SQL.
- Unstructured Data: Unstructured data is data stored in its native format and not processed until it is used, which is known as schema-on-read. It comes in a myriad of file formats, including email, social media posts, presentations, chats, IoT sensor data, and satellite imagery.
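To make the schema-on-write idea concrete, here is a minimal sketch using Python's built-in sqlite3 module: the table structure is declared before any rows are written, which is what allows a precise SQL query later (the table and column names are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Schema-on-write: the structure is fixed before any data is stored.
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, city TEXT, balance REAL)"
)
conn.executemany(
    "INSERT INTO customers (city, balance) VALUES (?, ?)",
    [("Mumbai", 1200.50), ("Delhi", 310.00), ("Mumbai", 95.75)],
)

# Because every row follows the schema, the data is easily queried with SQL.
for row in conn.execute("SELECT city, SUM(balance) FROM customers GROUP BY city"):
    print(row)
```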
DATA CLEANING - Data cleaning is the process of fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset. When combining multiple data sources, there are many opportunities for data to be duplicated or mislabeled. If data is incorrect, outcomes and algorithms are unreliable, even though they may look correct. There is no one absolute way to prescribe the exact steps in the data cleaning process, because the process varies from dataset to dataset. Data cleaning is the process that removes data that does not belong in your dataset. Data transformation is the process of converting data from one format or structure into another. Transformation processes can also be referred to as data wrangling or data munging: transforming and mapping data from one "raw" data form into another format for warehousing and analysis.

BENEFITS OF DATA CLEANING -
- Removal of errors when multiple sources of data are at play.
- Fewer errors make for happier clients and less-frustrated employees.
- Ability to map the different functions and what your data is intended to do.
- Monitoring errors and better reporting to see where errors are coming from, making it easier to fix incorrect or corrupt data for future applications.
- Using tools for data cleaning will make for more efficient business practices and quicker decision-making.
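A minimal pandas sketch of the cleaning tasks described above — dropping duplicates, filling missing values, and removing obvious outliers (the file name, column names, and thresholds are invented for the example):

```python
import pandas as pd

# Hypothetical raw sales extract; column names are invented for illustration.
raw = pd.read_csv("sales_raw.csv")

clean = (
    raw.drop_duplicates()                # remove duplicate rows
       .dropna(subset=["customer_id"])   # a row without its key is unusable
       .fillna({"discount": 0.0})        # fill a missing value with a default
)

# Remove obvious data-entry outliers, e.g. negative or absurdly large prices.
clean = clean[(clean["price"] > 0) & (clean["price"] < 1_000_000)]

clean.to_csv("sales_clean.csv", index=False)
```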
DATA PREPARATION - Data preparation is the process of cleaning and transforming raw data prior to processing and analysis. It is an important step before processing and often involves reformatting data, making corrections to data, and combining datasets to enrich the data. Data preparation is often a lengthy undertaking for data professionals or business users, but it is essential as a prerequisite to put data in context, in order to turn it into insights and eliminate bias resulting from poor data quality. For example, the data preparation process usually includes standardizing data formats, enriching source data, and/or removing outliers.

DATA PREPARATION STEPS -
- Gather data: The data preparation process begins with finding the right data. This can come from an existing data catalog or can be added ad hoc.
- Discover and assess data: After collecting the data, it is important to discover each dataset. This step is about getting to know the data and understanding what has to be done before the data becomes useful in a particular context. Discovery is a big task, but Talend's data preparation platform offers visualization tools which help users profile and browse their data.
- Cleanse and validate data: Cleaning up the data is traditionally the most time-consuming part of the data preparation process, but it is crucial for removing faulty data and filling in gaps. Important tasks here include: removing extraneous data and outliers; filling in missing values; conforming data to a standardized pattern; and masking private or sensitive data entries. Once data has been cleansed, it must be validated by testing for errors in the data preparation process up to this point. Oftentimes, an error in the system will become apparent during this step and will need to be resolved before moving forward.
- Transform and enrich data: Transforming data is the process of updating the format or value entries in order to reach a well-defined outcome, or to make the data more easily understood by a wider audience. Enriching data refers to adding and connecting data with other related information to provide deeper insights (see the sketch after this list).
- Store data: Once prepared, the data can be stored or channeled into a third-party application — such as a business intelligence tool — clearing the way for processing and analysis to take place.
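As a sketch of the transform-and-enrich step, the example below standardizes a date format and enriches transactions by joining a second dataset (all file and column names are invented for the example):

```python
import pandas as pd

# Hypothetical inputs; names are invented for illustration.
orders = pd.read_csv("orders_clean.csv")    # order_id, customer_id, order_date, amount
customers = pd.read_csv("customers.csv")    # customer_id, region, segment

# Transform: conform dates to a single standardized format.
orders["order_date"] = pd.to_datetime(orders["order_date"]).dt.strftime("%Y-%m-%d")

# Enrich: connect the orders to related customer attributes.
enriched = orders.merge(customers, on="customer_id", how="left")

# The enriched table now supports deeper questions, e.g. revenue by region.
print(enriched.groupby("region")["amount"].sum())
```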
INFORMATION - The term 'information' is difficult to define precisely, although its properties and effects are observed in all walks of life. The usage of information has given it different names; the dictionary meanings of the term include 'knowledge', 'intelligence', 'facts', 'data', 'a message', or 'a signal' transmitted by the act or process of communication. Information is a fact, thought, or data conveyed or described through various mediums, such as written, oral, visual, and audio communications. It is knowledge shared or obtained through study, instruction, investigation, or news, and you share it through the act of communicating, whether verbally, nonverbally, visually, or through the written word. Knowing what type of information you need, and how to share it, can help you save time, stay organized, and establish best practices for divulging information.

CHARACTERISTICS OF INFORMATION -
- Subjectivity: The value and usefulness of information are highly subjective, because what is information for one person may not be for another.
- Relevance: Information is good only if it is relevant - that is, pertinent and meaningful to the decision maker.
- Timeliness: Information must be delivered at the right time and the right place to the right person.
- Accuracy: Information must be free of errors, because erroneous information can result in poor decisions and erode the confidence of users.
- Correct information format: Information must be in the right format to be useful to the decision maker.
- Completeness: Information is said to be complete if the decision maker can satisfactorily solve the problem at hand using that information.
- Accessibility: Information is useless if it is not readily accessible to decision makers, in the desired format, when it is needed.

INFORMATION SYSTEMS IN MODERN DAY BUSINESS - In today's continuously changing and fast-moving world, where customers' requirements and preferences are always evolving, the only businesses that can hope to remain competitive and continue to function at performance levels that match their customers' expectations are those that embrace innovation. In the recent past, business success has been pegged on the quality of the information technology that the business has employed and the capability to use such information correctly. The importance of information systems (IS) has increased dramatically, and most businesses have been prompted to introduce them to keep their competitive edge. Today, nobody can envisage a business without an effective information system. Introducing an information system to a business can bring numerous benefits and assist in the way the business handles its daily external and internal processes, as well as its decision making for the future. Some of the benefits or importance of an information system include:
- New Products and Services: Any company looking to improve and secure its future has to establish a broader perspective with the use of a well-designed and coordinated information system. The IS makes it easier to analyze independent processes, such as using information to produce valuable products or services, and organized work activities.
- Information Storage: Every organization needs records of its activities to find the cause of problems and proper solutions. Information systems come in handy when it comes to storing operational data, communication records, documents, and revision histories. Manual data storage will cost the company lots of time, especially when it comes to searching for specific data.
- Easier Decision Making: Without an information system, a company can take a lot of time and energy in the decision-making process. However, with the use of IS, it is easier to deliver all the necessary information and model the results, and this can help you make better decisions.
- Behavioral Change: Employers and employees can communicate rapidly and more effectively with an information system. While emails are quick and effective, the use of information systems is more efficient, since documents are stored in folders that can be shared and accessed by employees.

IMPORTANCE OF INFORMATION PROCESSING IN MANAGEMENT - Information processing is crucial for effective management. In today's information age, information is a valuable resource that can give businesses a competitive advantage. Managers need to process information effectively to make informed decisions, allocate resources efficiently, and plan for the future. Here are some reasons why information processing is essential in management:
- Decision-making: Managers need to make decisions quickly and accurately. Processing information helps them to identify problems, analyze data, and choose the best course of action. Without effective information processing, managers may make decisions based on incomplete or inaccurate information, which can lead to poor outcomes.
- Resource allocation: Managers need to allocate resources such as staff, time, and money effectively. Information processing helps them to identify where resources are needed most, track resource usage, and adjust resource allocation as needed. This can help businesses to optimize their use of resources and improve their bottom line.
- Planning: Managers need to plan for the future and anticipate changes in the market. Effective information processing helps them to analyze trends, forecast future demand, and develop strategies to capitalize on opportunities or mitigate risks.
- Communication: Managers need to communicate with employees, stakeholders, and customers effectively. Information processing helps them to gather, organize, and present information in a clear and concise manner. This can help to ensure that everyone has the same understanding of the situation and can work together towards a common goal.
ONLINE DATA STORAGE - Online data storage is a virtual storage approach that allows users to use the Internet to store recorded data in a remote network. This data storage method may be either a cloud service component or used with other options not requiring on-site data backup. Online data storage uses Internet channels to store information on remote servers kept secure by service providers. It can cost a company a lot of money to store data on-site, and every day someone loses their entire family photo album. Online data storage is a virtual storage model that lets users and businesses upload their data across Internet channels to a remote data network. Data stored in the cloud is stored on servers that are not owned by the person using them, and other users can also access the same infrastructure. It is like a cloud in the sky: we can all see the same cloud, and no single individual owns it, yet we can each access it. Unlike with a USB drive, external hard drive, or flash drive, users do not need to carry around a physical device to store their data; they just have to remember a password and trust the security of the service provider. Online storage is a viable option for data backup: it provides both security and convenience to the end user. Small businesses and individuals may not have the network bandwidth or the resources to maintain a strong on-site storage and retrieval system. Further, online storage may alleviate the need to keep physical backups of the data: the storage solution provider may already do this at their data centers.
- Advantages: Data storage savings / worldwide accessibility / data safety / security / easy sharing / data recovery / automatic backup.
- Disadvantages: Improper handling can cause trouble / a trustworthy provider must be chosen to avoid any hazard / an Internet connection is required.
RELEVANCE OF ONLINE DATA PROCESSING -
- Ease of making reports: Since data is already processed, it can be obtained and used directly. These processed facts and figures can be arranged appropriately so that they help executives make quick analyses. Pre-defined reports help professionals make reports speedily.
- Accuracy and speed: Digitization helps to process information quickly. Thousands of files can be processed in a minute, storing the required information from each. During business data processing, the system itself checks for and takes care of invalid data or errors. Such processes thus help companies ensure high accuracy in information management.
- Cost reduction: The cost of digitized processing is much lower than that of managing and maintaining paper documents. It decreases expenditure on stationery, such as photocopies and mailing, by using digital information and email systems. Companies can thus save millions of dollars every year by improving their data management systems.
- Easy storage: Online data processing helps to increase the storage space for adding, managing, and modifying information. By eliminating unnecessary paperwork, it minimizes clutter and also improves search efficiency by eliminating the need to go through data manually.
CLOUD COMPUTING - Cloud computing is the on-demand availability of computer system resources, especially data storage (cloud storage) and computing power, without direct active management by the user. Large clouds often have functions distributed over multiple locations, each location being a data center. Cloud computing relies on the sharing of resources to achieve coherence and economies of scale, typically using a "pay-as-you-go" model, which can help in reducing capital expenses but may also lead to unexpected operating expenses for unaware users. Simply put, cloud computing is the delivery of computing services — including servers, storage, databases, networking, software, analytics, and intelligence — over the Internet ("the cloud") to offer faster innovation, flexible resources, and economies of scale. You typically pay only for the cloud services you use, helping you lower your operating costs, run your infrastructure more efficiently, and scale as your business needs change.

BENEFITS OF CLOUD COMPUTING -
1. Cost: Cloud computing eliminates the capital expense of buying hardware and software and setting up and running on-site data centers — the racks of servers, the round-the-clock electricity for power and cooling, the IT experts for managing the infrastructure. It adds up fast.
2. Speed: Most cloud computing services are provided self-service and on demand, so even vast amounts of computing resources can be provisioned in minutes, typically with just a few mouse clicks, giving businesses a lot of flexibility and taking the pressure off capacity planning.
3. Global scale: The benefits of cloud computing services include the ability to scale elastically. In cloud speak, that means delivering the right amount of IT resources — for example, more or less computing power, storage, bandwidth — right when they are needed, and from the right geographic location.
4. Productivity: On-site datacenters typically require a lot of "racking and stacking" — hardware setup, software patching, and other time-consuming IT management chores. Cloud computing removes the need for many of these tasks, so IT teams can spend time on achieving more important business goals.
5. Performance: The biggest cloud computing services run on a worldwide network of secure datacenters, which are regularly upgraded to the latest generation of fast and efficient computing hardware. This offers several benefits over a single corporate datacenter, including reduced network latency for applications and greater economies of scale.
6. Reliability: Cloud computing makes data backup, disaster recovery, and business continuity easier and less expensive, because data can be mirrored at multiple redundant sites on the cloud provider's network.
7. Security: Many cloud providers offer a broad set of policies, technologies, and controls that strengthen your security posture overall, helping protect your data, apps, and infrastructure from potential threats.
TYPES OF CLOUD COMPUTING -
- Public cloud: Public clouds are owned and operated by third-party cloud service providers, which deliver their computing resources, like servers and storage, over the Internet. Microsoft Azure is an example of a public cloud. With a public cloud, all hardware, software, and other supporting infrastructure is owned and managed by the cloud provider. You access these services and manage your account using a web browser.
- Private cloud: A private cloud refers to cloud computing resources used exclusively by a single business or organisation. A private cloud can be physically located in the company's on-site datacenter. Some companies also pay third-party service providers to host their private cloud. A private cloud is one in which the services and infrastructure are maintained on a private network.
- Hybrid cloud: Hybrid clouds combine public and private clouds, bound together by technology that allows data and applications to be shared between them. By allowing data and applications to move between private and public clouds, a hybrid cloud gives your business greater flexibility and more deployment options, and helps optimise your existing infrastructure, security, and compliance.

USES OF CLOUD COMPUTING -
- Create cloud-native applications: Quickly build, deploy, and scale applications — web, mobile, and API. Take advantage of cloud-native technologies and approaches, such as containers, Kubernetes, microservices architecture, API-driven communication, and DevOps.
- Test and build applications: Reduce application development cost and time by using cloud infrastructures that can easily be scaled up or down.
- Store, back up, and recover data: Protect your data more cost-efficiently — and at massive scale — by transferring your data over the Internet to an offsite cloud storage system that is accessible from any location and any device (a code sketch appears after the vendor lists below).
- Analyze data: Unify your data across teams, divisions, and locations in the cloud. Then use cloud services, such as machine learning and artificial intelligence, to uncover insights for more informed decisions.
- Stream audio and video: Connect with your audience anywhere, anytime, on any device with high-definition video and audio with global distribution.
- Embed intelligence: Use intelligent models to help engage customers and provide valuable insights from the data captured.
- Deliver software on demand: Also known as software as a service (SaaS), on-demand software lets you offer the latest software versions and updates to customers — anytime they need them, anywhere they are.

What advantages do organizations have in adopting cloud storage and cloud computing?
- Scalability: Cloud services allow organizations to easily scale their storage and computing resources up or down as needed. This means that they can quickly respond to changes in demand without having to invest in costly hardware or infrastructure.
- Cost savings: Using cloud services can be more cost-effective than maintaining and upgrading in-house systems. Organizations can avoid the costs of hardware, software, and maintenance, and only pay for the resources they actually use.
- Flexibility: Cloud services can be accessed from anywhere with an internet connection, making it easier for employees to work remotely or collaborate with partners and customers in different locations.
- Security: Cloud providers often have extensive security measures in place to protect against data breaches, which can be more effective than what individual organizations can implement on their own.
- Improved reliability: Cloud providers typically have redundant systems and backup mechanisms in place, which can provide greater reliability and ensure that data and services are always available.
- Faster innovation: Cloud providers often offer the latest technologies and software, allowing organizations to innovate more quickly and stay ahead of the competition.

Cloud based services offered by Amazon - Amazon Web Services offers a wide range of business-purpose, global, cloud-based products, including storage, databases, analytics, networking, mobile, development tools, and enterprise applications, with a pay-as-you-go pricing model.
- Compute Services: EC2 (Elastic Compute Cloud), LightSail, Elastic Beanstalk, EKS (Elastic Container Service for Kubernetes), AWS Lambda.
- Migration: DMS (Database Migration Service), SMS (Server Migration Service), Snowball.
- Storage: Amazon Glacier, Amazon Elastic Block Store (EBS), AWS Storage Gateway.
- Security Services: IAM (Identity and Access Management), Inspector, Certificate Manager, WAF (Web Application Firewall), Cloud Directory, KMS (Key Management Service), Organizations, Shield, Macie, GuardDuty.
- Database Services: Amazon RDS, Amazon DynamoDB, Amazon ElastiCache, Amazon RedShift.
- Analytics: Athena, CloudSearch, ElasticSearch, Kinesis, QuickSight, EMR (Elastic Map Reduce), Data Pipeline.
- Management Services: CloudWatch, CloudFormation, CloudTrail, OpsWorks, Config, Service Catalog, AWS Auto Scaling, Systems Manager, Managed Services.
- Internet of Things: IoT Core, IoT Device Management, IoT Analytics, Amazon FreeRTOS.
- Application Services: Step Functions, SWF (Simple Workflow Service), SNS (Simple Notification Service), SQS (Simple Queue Service), Elastic Transcoder, Deployment and Management, AWS CloudTrail, Amazon CloudWatch, AWS CloudHSM.
- Developer Tools: CodeStar, CodeCommit, CodeBuild, CodeDeploy, CodePipeline, Cloud9.
- Mobile Services: Mobile Hub, Cognito, Device Farm, AWS AppSync.

Cloud based services offered by Google -
- Compute: App Engine, Compute Engine, Google Cloud VMware Engine (GCVE).
- Storage: Cloud Storage, Persistent Disk, Cloud Filestore, Cloud Storage for Firebase.
- Databases: Cloud Bigtable, Datastore, Firestore, Memorystore, Cloud Spanner, Cloud SQL.
- Networking: Cloud CDN, Cloud DNS, Cloud IDS (Cloud Intrusion Detection System), Cloud Interconnect, Cloud Load Balancing, Cloud NAT (Network Address Translation), Cloud Router, Cloud VPN, Google Cloud Armor, Google Cloud Armor Managed Protection Plus, Network Connectivity Center, Network Intelligence Center, Network Service Tiers, Service Directory, Traffic Director, Virtual Private Cloud.
- Operations: Cloud Debugger, Cloud Logging, Cloud Monitoring, Cloud Profiler, Cloud Trace.
- Developer Tools: Artifact Registry, Container Registry, Cloud Build, Cloud Source Repositories, Firebase Test Lab, Google Cloud Deploy.
- Data Analytics: BigQuery, Cloud Composer, Cloud Data Fusion, Cloud Life Sciences (formerly Google Genomics), Data Catalog, Dataplex, Dataflow, Datalab, Dataproc, Dataproc Metastore, Datastream, Pub/Sub.
- User Protection Services: reCAPTCHA Enterprise, Web Risk API.
- Serverless Computing: Cloud Run, Cloud Functions, Cloud Functions for Firebase, Cloud Scheduler, Cloud Tasks, Eventarc, Workflows.

Cloud based services offered by IBM -
- Compute Infrastructure: includes its bare metal servers (single-tenant servers that are highly customizable), virtual servers, GPU computing, POWER servers (based on IBM's POWER architecture), and server software.
- Compute Services: includes OpenWhisk serverless computing, containers, and Cloud Foundry runtimes.
- Storage: includes object, block, and file storage, as well as server-backup capabilities.
- Network: includes load balancing, Direct Link private secure connections, network appliances, content delivery network, and domain services.
- Mobile: includes IBM's Swift tools for creating iOS apps, its MobileFirst Starter package for getting a mobile app up and running, and its Mobile Foundation app back-end services.
- Watson: includes IBM's artificial intelligence and machine learning services, which it calls "cognitive computing," such as Discovery search and content analytics, and Conversation natural language services and speech-to-text.
- Data and analytics: includes data services, analytics services, big data hosting, Cloudera hosting, MongoDB hosting, and Riak hosting.
- Internet of Things: includes IBM's IoT platform and its IoT starter packages.
- Security: includes tools for securing cloud environments, such as a firewall, hardware security modules (physical devices with key management capabilities), Intel Trusted Execution Technology, security software, and SSL certificates.
- DevOps: includes the Eclipse IDE, continuous delivery tools, and availability monitoring.
- Application services: includes Blockchain, Message Hub, and business rules, among others.
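As a small illustration of the "store, back up, and recover data" use case above, the sketch below uploads and retrieves a file with Amazon S3 via the boto3 SDK (the bucket and key names are invented, and the code assumes AWS credentials are already configured):

```python
import boto3

# Assumes AWS credentials are configured (e.g. via environment variables).
s3 = boto3.client("s3")

BUCKET = "example-company-backups"   # hypothetical bucket name
KEY = "reports/sales_clean.csv"      # hypothetical object key

# Back up a local file to cloud storage...
s3.upload_file("sales_clean.csv", BUCKET, KEY)

# ...and recover it later, from any machine with credentials and connectivity.
s3.download_file(BUCKET, KEY, "sales_clean_restored.csv")
```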
E-COMMERCE APPLICATIONS -
- Retail and Wholesale: E-commerce has numerous applications in this sector. E-retailing is basically a B2C, and in some cases a B2B, sale of goods and services through online stores designed using virtual shopping carts and electronic catalogs. A subset of retail e-commerce is m-commerce, or mobile commerce, wherein a consumer purchases goods and services using their mobile device through the mobile-optimized site of the retailer.
- Online Marketing: This refers to the gathering of data about consumer behaviors, preferences, needs, buying patterns, and so on. It helps marketing activities like fixing prices, negotiating, enhancing product features, and building strong customer relationships, as this data can be leveraged to provide customers a tailored and enhanced purchase experience.
- Finance: Banks and other financial institutions are using e-commerce to a significant extent. Customers can check account balances, transfer money to other accounts held by them or others, pay bills through internet banking, pay insurance premiums, and so on.
- Manufacturing: Supply chain operations also use e-commerce; usually, a few companies form a group and create an electronic exchange to facilitate the purchase and sale of goods, the exchange of market information, and back-office information like inventory control, and so on.
- Online Booking: This is something almost every one of us has done at some time - booking hotels, holidays, airline tickets, travel insurance, etc. These bookings and reservations are made possible through an internet booking engine, or IBE.
- Online Publishing: This refers to the digital publication of books, magazines, and catalogues, and the development of digital libraries.
- Digital Advertising: Online advertising uses the internet to deliver promotional material to consumers; it involves a publisher and an advertiser. The advertiser provides the ads, and the publisher integrates the ads into online content.
- Auctions: Online auctions bring together numerous people from various geographical locations and enable the trading of items at negotiated prices, implemented with e-commerce technologies. This enables more people to participate in auctions.

BIG DATA - Big Data is an umbrella term for a collection of datasets so large and complex that it becomes difficult to process them using traditional data management tools. There has been increasing democratization of the process of content creation and sharing over the Internet using social media applications. The combination of cloud-based storage, social media applications, and mobile access devices is helping crystallize the big data phenomenon. The leading management consulting firm McKinsey & Co. created a flutter when it published a report in 2011 showing the huge impact of such big data on business and other organizations. It also reported that there will be millions of new jobs in the next decade related to the use of big data in many industries. Big data can be used to discover new insights from a 360-degree view of a situation, allowing for a completely new perspective on situations, new models of reality, and potentially new types of solutions. It can help spot business trends and opportunities. For example, Google is able to predict the spread of a disease by tracking the use of search terms related to the symptoms of the disease across the globe in real time. Big data can help determine the quality of research, prevent diseases, link legal citations, combat crime, and determine real-time roadway traffic conditions. Big data is enabling evidence-based medicine and many other innovations. Data has become the new natural resource. Organizations have a choice in how to engage with this exponentially growing volume, variety, and velocity of data: they can choose to be buried under the avalanche, or they can choose to use it for competitive advantage. Challenges in big data include the entire range of operations from capture, curation, storage, search, and sharing to analysis and visualization. Big data is more valuable when analyzed as a whole; more information is derivable from the analysis of a single large set of related data than from separate smaller sets. However, special tools and skills are needed to manage such extremely large datasets.
CHARACTERISTICS OF BIG DATA -
- Variety: There are many types of data, including structured and unstructured data. Structured data consists of numeric and text fields; unstructured data includes images, video, audio, and many other types. There are also many sources of data. The traditional sources of structured data include data from ERP systems and other operational systems, while sources of unstructured data include social media, the web, RFID, machine data, and others. Unstructured data comes in a variety of sizes and resolutions and is subject to different kinds of analysis. For example, video files can be tagged with labels and they can be played, but video data is typically not computed, and the same holds for audio data. Graphic data can be analyzed for network distances. Facebook texts and tweets can be analyzed for sentiment but cannot be directly compared.
- Velocity: The Internet greatly increases the speed of movement of data; from e-mails to social media to video files, data can move quickly. Cloud-based storage makes sharing instantaneous and easily accessible from anywhere. Social media applications enable people to share their data with each other instantly, and mobile access to these applications also speeds up the generation of and access to data.
- Volume: Websites have become great sources and repositories for many kinds of data. User clickstreams are recorded and stored for future use. Social media applications such as Facebook, Twitter, Pinterest, and others have enabled users to become prosumers (producers and consumers) of data. There is an increase in the number of data shares and also in the size of each data element. High-definition videos can increase the total shared data. There are autonomous data streams of video, audio, text, and data coming from social media sites, websites, RFID applications, and so on.
BUSINESS IMPLICATIONS OF BIG DATA - Any industry that produces information-based products is most likely to be disrupted. Thus, the newspaper industry has taken a hit from digital distribution channels, as well as from published-on-web-only blogs. Entertainment has been impacted by digital distribution and piracy, as well as by user-generated and uploaded content on the internet. The education industry is being disrupted by massive open online courses (MOOCs) and user-uploaded content. Healthcare delivery is impacted by electronic health records and digital medicine. The retail industry has been highly disrupted by e-commerce companies. Fashion companies are impacted by quick feedback on their designs on social media. The banking industry has been impacted by cost-effective online self-serve banking applications, and this will impact employment levels in the industry. There is a rapid change in business models enabled by big data technologies. Steve Jobs, the ex-CEO of Apple, conceded that his company's products and business models would be disrupted; he preferred his older products to be cannibalized by his own new products rather than by those of the competition. Every other business, too, will likely be disrupted. The key issue for business is how to harness big data to generate growth opportunities and to leapfrog the competition. Organizations need to learn how to organize their businesses so that they do not get buried in the high volume, velocity, and variety of data, but instead use it smartly and proactively to obtain a quick but decisive advantage over their competition. Organizations need to figure out how to use big data as a strategic asset in real time, to identify opportunities, thwart threats, build new capabilities, and enhance operational efficiencies.

TECHNOLOGY IMPLICATIONS OF BIG DATA - The growth of data is made possible in part by the advancement of storage technology, as illustrated by the steady growth of average disk drive capacities. The cost of storage is falling, the size of storage is getting smaller, and the speed of access is going up. Flash drives have become cheaper. Random access memory used to be expensive but is now so inexpensive that entire databases can be loaded and processed quickly, instead of swapping sections of them into and out of high-speed memory. New data management and processing technologies have emerged. IT professionals integrate big data structured assets with content, and must increase their business requirement identification skills. Big data is going democratic: business functions will be protective of their data and will begin initiatives around exploiting it. IT support teams need to find ways to support end-user-deployed big data solutions. Enterprise data warehouses will need to include big data in some form. The IT platform needs to be strengthened to help enable a 'digital business strategy' around digital assets and capabilities.
IMPORTANCE OR RELEVANCE OF BIG DATA -
1. Cost Savings: Big data tools like Apache Hadoop, Spark, etc. bring cost-saving benefits to businesses when they have to store large amounts of data. These tools also help organizations identify more effective ways of doing business.
2. Time Saving: Real-time in-memory analytics helps companies collect data from various sources. Tools like Hadoop help them analyze data immediately, thus helping them make quick decisions based on the learnings.
3. Understand Market Conditions: Big data analysis helps businesses get a better understanding of market situations. For example, analysis of customer purchasing behavior helps companies identify the products that sell most, so they can produce those products accordingly. This helps companies get ahead of their competitors.
4. Social Media Listening: Companies can perform sentiment analysis using big data tools. These enable them to get feedback about their company - that is, who is saying what about the company.
5. Boost Customer Acquisition and Retention: Customers are a vital asset on which any business depends. No business can achieve success without building a robust customer base; but even with a solid customer base, companies cannot ignore the competition in the market.
6. Solve Advertisers' Problems and Offer Marketing Insights: Big data analytics shapes all business operations. It enables companies to fulfill customer expectations, helps in changing the company's product line, and ensures powerful marketing campaigns.
7. Driver of Innovation and Product Development: Big data makes companies capable of innovating and redeveloping their products.
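As a taste of the kind of tooling named above, here is a minimal PySpark sketch that aggregates a large transaction file in parallel (the file path and column names are invented for the example):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("spending-habits").getOrCreate()

# Hypothetical transaction log; path and columns are invented for illustration.
tx = spark.read.csv("hdfs:///data/transactions.csv", header=True, inferSchema=True)

# Aggregate spending per product category across the whole dataset in parallel.
summary = (
    tx.groupBy("category")
      .agg(F.sum("amount").alias("total_spent"), F.count("*").alias("num_orders"))
      .orderBy(F.desc("total_spent"))
)
summary.show()
```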
APPLICATIONS OF BIG DATA -
1. Tracking Customer Spending Habits and Shopping Behavior: In big retail stores (like Amazon, Walmart, Big Bazaar, etc.), the management team has to keep data on customers' spending habits (on which products customers spend, which brands they prefer, how frequently they spend), shopping behavior, and customers' most-liked products (so that they can keep those products in the store). Based on data about which products are searched for or sold most, the production/collection rate of those products is fixed.
2. Recommendation: By tracking customer spending habits and shopping behavior, big retail stores provide recommendations to the customer. E-commerce sites like Amazon, Walmart, and Flipkart do product recommendation: they track what products a customer is searching for, and based on that data they recommend that type of product to that customer.
3. Smart Traffic System: Data about traffic conditions on different roads is collected through cameras kept beside the road at the entry and exit points of the city, and from GPS devices placed in vehicles (Ola, Uber cabs, etc.). All such data are analyzed, and jam-free or less-congested, faster routes are recommended. In this way, a smart traffic system can be built in the city through big data analysis. A further benefit is that fuel consumption can be reduced.
4. Secure Air Traffic System: Sensors are present at various places in an aircraft (such as the propellers). These sensors capture data like the speed of the flight, moisture, temperature, and other environmental conditions. Based on such data analysis, environmental parameters within the flight are set up and varied. By analyzing the flight's machine-generated data, it can be estimated how long the machine can operate flawlessly and when it needs to be replaced or repaired.
5. Auto-Driving Cars: Big data analysis helps drive a car without human intervention. At various spots on the car, cameras and sensors are placed that gather data like the size of surrounding cars, obstacles, and the distance from them.
6. Virtual Personal Assistant Tools: Big data analysis helps virtual personal assistant tools (like Siri on Apple devices, Cortana on Windows, and Google Assistant on Android) provide answers to the various questions asked by users. These tools track the location of the user, their local time, the season, and other data related to the question asked.
7. IoT: Manufacturing companies install IoT sensors in machines to collect operational data. By analyzing such data, it can be predicted how long a machine will work without any problem and when it will require repair, so that the company can take action before the machine faces a lot of issues or goes down entirely. Thus, the cost of replacing the whole machine can be saved.
8. Education Sector: Organizations running online educational courses utilize big data to find candidates interested in a course. If someone searches for a YouTube tutorial video on a subject, then online or offline course provider organizations for that subject send that person online ads about their courses.
9. Energy Sector: A smart electric meter reads the power consumed every 15 minutes and sends this data to the server, where it is analyzed to estimate the times of day when the power load is lowest throughout the city.
10. Media and Entertainment Sector: Media and entertainment service providers like Netflix, Amazon Prime, and Spotify analyze data collected from their users. Data like what types of videos or music users are watching or listening to most, and how long users spend on the site, are collected and analyzed to set the next business strategy.
DATA SEARCH ALGORITHMS IN SEARCH ENGINES - A search engine algorithm is a complex algorithm used by search engines such as Google, Yahoo, and Bing to determine a web page's significance. According to Netcraft, an Internet research company, there are over 150,000,000 active websites on the Internet. Without search engines, there would be no way to determine which of these sites are worthy of viewers' time and which sites are simply spam. Search engines collect significant data, which allows them to almost instantly determine whether a site is spam or relevant. Relevant sites receive high rankings in search engines, while spam or irrelevant sites can receive exceptionally low rankings. Each search engine uses its own search engine algorithm, and no two search engines use exactly the same formula to determine a page's ranking. However, there are several things that all search engines will look for when crawling a web page.

A few data search algorithms (linear and binary search are sketched in code after this list):
- Linear Search: In linear search, also known as sequential search, the algorithm sequentially checks each element in the list until the desired element is found.
- Binary Search: Binary search is an efficient algorithm for finding an item in a sorted list. It works by repeatedly dividing the list in half and eliminating the half that cannot contain the item.
- Hashing: Hashing is a technique that maps data of arbitrary size to a fixed-size value. This can be useful for indexing and searching data quickly.
- Depth-First Search (DFS): DFS is an algorithm for traversing or searching a tree or graph data structure. It starts at the root node and explores as far as possible along each branch before backtracking.
- Breadth-First Search (BFS): BFS is another algorithm for traversing or searching a tree or graph data structure. It starts at the root node and explores all the neighbor nodes at the current depth before moving on to the next depth level.
- A* Search: A* search is an informed search algorithm that uses heuristics to guide the search. It is often used in pathfinding and other optimization problems.
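A minimal Python sketch of the first two algorithms in the list; note how binary search discards half of the remaining (sorted) list at each step:

```python
def linear_search(items, target):
    """Sequentially check each element until the target is found."""
    for i, value in enumerate(items):
        if value == target:
            return i
    return -1  # not found


def binary_search(sorted_items, target):
    """Repeatedly halve a sorted list, discarding the half that cannot hold the target."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        elif sorted_items[mid] < target:
            lo = mid + 1   # target can only be in the upper half
        else:
            hi = mid - 1   # target can only be in the lower half
    return -1  # not found


print(linear_search([7, 3, 9, 1], 9))       # 2
print(binary_search([1, 3, 7, 9, 12], 7))   # 2
```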
DIGITAL ADVERTISEMENTS - Digital advertising refers to marketing through online channels, such as websites, streaming content, and more. Digital ads span media formats, including text, image, audio, and video. They can help you achieve a variety of business goals across the marketing funnel, ranging from brand awareness to customer engagement, to launching new products and driving repeat sales. The field of digital advertising is relatively young in comparison to traditional channels such as magazines, billboards, and direct mail. The evolution of advertising isn't just about what the ads look like or where they appear, but also about the ways they're built, sold, and measured. What are the different types of digital advertising?
- Search advertising: Search ads, also called search engine marketing (SEM), appear in search engine results pages (SERPs). These are typically text ads that appear above or alongside organic search results.
- Display advertising: Display ads are online ads that use text and visual elements, such as an image or animation, and can appear on websites, apps, and devices. They appear in or alongside the content of a website.
- Online video advertising: Online video ads are ads that use a video format. Out-stream video ads appear in places similar to display ads: on websites, apps, and devices. In-stream video ads appear before, during, or after video content.
- Streaming media advertising: Also known as over-the-top (OTT), these are a specific type of video ad that appears in streaming media content delivered over the Internet without satellite or cable.
- Audio advertising: In the context of digital advertising, audio ads are ads that play before, during, or after online audio content, such as streaming music or podcasts.
- Social media advertising: Social media ads appear in social media platforms, such as Twitter or LinkedIn.
RECOMMENDER SYSTEMS - A recommender system, or a recommendation system (sometimes replacing 'system' with a synonym such as 'platform' or 'engine'), is a subclass of information filtering system that seeks to predict the "rating" or "preference" a user would give to an item. Recommender systems are used in a variety of areas, with commonly recognised examples taking the form of playlist generators for video and music services, product recommenders for online stores, content recommenders for social media platforms, and open web content recommenders. These systems can operate using a single input, like music, or multiple inputs within and across platforms like news, books, and search queries. There are also popular recommender systems for specific topics like restaurants and online dating. Recommender systems have also been developed to explore research articles and experts, collaborators, and financial services. Recommender systems usually make use of collaborative filtering, content-based filtering (also known as the personality-based approach), or both, as well as other approaches such as knowledge-based systems. Collaborative filtering approaches build a model from a user's past behavior (items previously purchased or selected and/or numerical ratings given to those items) as well as from similar decisions made by other users. This model is then used to predict items (or ratings for items) that the user may have an interest in. Content-based filtering approaches utilize a series of discrete, pre-tagged characteristics of an item in order to recommend additional items with similar properties.
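A toy sketch of the collaborative-filtering idea described above: users are compared through their past ratings, and an unseen item is scored from the ratings of similar users (the matrix and values are invented for the example):

```python
import numpy as np

# Rows = users, columns = items; 0 means "not yet rated" (values invented).
ratings = np.array([
    [5, 4, 0, 1],   # user 0 (target: has not rated item 2)
    [4, 5, 2, 1],   # user 1, similar tastes, rated item 2 low
    [1, 1, 5, 4],   # user 2, different tastes, rated item 2 high
], dtype=float)

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

target_user, target_item = 0, 2
others = [u for u in range(len(ratings))
          if u != target_user and ratings[u, target_item] > 0]

# Weight each other user's rating by their similarity to the target user.
sims = np.array([cosine(ratings[target_user], ratings[u]) for u in others])
rated = np.array([ratings[u, target_item] for u in others])
prediction = np.dot(sims, rated) / sims.sum()

# ~2.7: the prediction leans toward the similar user's low rating.
print(round(prediction, 2))
```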
CUSTOMER ANALYTICS - Customer analytics is a process by which data from customer behavior is used to help make key business decisions via market segmentation and predictive analytics. This information is used by businesses for direct marketing, site selection, and customer relationship management, and plays an important role in the prediction of customer behavior.

Use of Customer Analytics -
- Retail: Although until recently over 90% of retailers had limited visibility on their customers, with increasing investments in loyalty programs, customer tracking solutions, and market research, this industry has started increasing its use of customer analytics in decisions ranging from product, promotion, and price to distribution management.
- Retail Management: Companies can use data about customers to restructure retail management. This restructuring using data often occurs in dynamic scheduling and worker evaluations. Through dynamic scheduling, companies optimize staffing through predictive scheduling software based on predicted customer traffic.
- Criticisms of Use: As retail technologies become more data driven, the use of customer analytics has raised criticisms, specifically in how it affects the retail worker.
- Finance: Banks, insurance companies, and pension funds make use of customer analytics for understanding customer lifetime value, identifying below-zero customers (estimated to be around 30% of the customer base), increasing cross-sales, managing customer attrition, and migrating customers to lower-cost channels in a targeted manner.
- Community: Municipalities utilize customer analytics in an effort to lure retailers to their cities.
- Customer relationship management: Analytical customer relationship management, commonly abbreviated as CRM, enables measurement of and prediction from customer data to provide a 360° view of the client.
OPERATIONAL ANALYTICS - … an organization's operations. Now, that's a fairly boring definition for an idea that I sincerely believe has the potential to change how organizations use data. Instead of using dashboards or reports to understand your data, operational analytics drives action by automatically delivering real-time data to the exact place it'll be most useful, no matter where that is in your organization. And when implemented well, this flow of real-time data (from data warehouses to people who can actually do something with that data) becomes an undercurrent of insights powering important daily decisions, both big and small.

Why You Should Use Operational Analytics - Any company with more than a handful of customers will likely need operational analytics. Once you reach the point of having a few customers' data flowing into your data warehouse, you'll quickly find the limitations of the individual tools you're currently using. Operational analytics not only provides a way to overcome these limitations at scale, but also allows your data team to step up and take a more proactive role in how your business uses data. Without operational analytics, you have to rely on the analytics capabilities of your individual tools.
Recommender systems have also been developed to explore person logging into a bank account online is actually that analyze transactions, behavior patterns, and other data to
research articles and experts, collaborators, and financial person and not a fraudster logging in with stolen credentials? detect potential fraud. This helps businesses prevent losses
services. Recommender systems usually make use of either or The number of stolen credentials available to fraudsters is and protect themselves against financial crimes. - Customer
both collaborative filtering and content-based filtering (also staggering. There are over 15 billion stolen credentials for service: Chatbots and virtual assistants powered by natural
known as the personality-based approach), as well as other sale on the dark web. Cybercriminals can purchase them for language processing and machine learning can provide
systems such as knowledge-based systems. Collaborative as little as an average $15.43 for consumer credentials to automated customer support. This helps businesses improve
filtering approaches build a model from a user's past behavior more than an average $3,139 for credentials for an customer satisfaction and reduce costs associated with
(items previously purchased or selected and/or numerical organization’s key systems. Fraud analytics is key to financial customer service. - Supply chain optimization: Machine
ratings given to those items) as well as similar decisions made fraud risk management - The bad news is that online fraud is learning can be used to optimize supply chain operations,
by other users. This model is then used to predict items (or constantly evolving. As banks put remediation measures in such as predicting demand, improving inventory
ratings for items) that the user may have an interest in. place, new threats appear. Traditional, static rules-based management, and optimizing logistics. This helps businesses
Content-based filtering approaches utilize a series of discrete, fraud prevention systems can’t keep pace. The good news is reduce costs and improve efficiency. - Marketing
pre-tagged characteristics of an item in order to recommend that there is a wealth of data available to financial automation: Machine learning can be used to personalize
additional items with similar properties. CUSTOMER organizations that can be used to predict and detect financial marketing campaigns based on customer behavior and
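As a rough illustration of the collaborative-filtering idea described above, the following minimal Python sketch predicts a missing rating as a similarity-weighted average of other users' ratings. The rating matrix and the helper functions are invented for this example; real recommender systems use far larger data and more robust models.

```python
import numpy as np

# Toy user-item rating matrix (rows = users, columns = items; 0 = not rated).
# The values here are made up purely for illustration.
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def cosine_similarity(a, b):
    """Cosine of the angle between two users' rating vectors."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def predict_rating(user, item, ratings):
    """Predict a missing rating as the similarity-weighted average of
    the ratings that other users gave this item."""
    scores, weights = 0.0, 0.0
    for other in range(ratings.shape[0]):
        if other == user or ratings[other, item] == 0:
            continue
        sim = cosine_similarity(ratings[user], ratings[other])
        scores += sim * ratings[other, item]
        weights += sim
    return scores / weights if weights else 0.0

# Predict how user 0 might rate item 2, which they have not rated yet.
print(round(predict_rating(0, 2, ratings), 2))
```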
CUSTOMER ANALYTICS - Customer analytics is a process by which data from customer behavior is used to help make key business decisions via market segmentation and predictive analytics. This information is used by businesses for direct marketing, site selection, and customer relationship management. Marketing provides services in order to satisfy customers. With that in mind, the productive system is considered from its beginning at the production level to the end of the cycle at the consumer. Customer analytics plays an important role in the prediction of customer behavior.
…advanced analytics at Protiviti. Manually sifting through data also leaves the door open for misconduct or a policy violation to go undetected—a very real concern for a global financial institution, for example, which typically has dozens of lines of business, millions of customers, and billions of records. Merely taking a risk-based sample of data doesn't satisfy regulators, Dil says, because it raises the questions: "How do you know you've picked a comprehensive data sample? How do you know this sample covers all your potential risks?"

FRAUD ANALYTICS - Fraud analytics is the use of big data analysis techniques to prevent online financial fraud. It can help financial organizations predict future fraudulent behavior, and help them apply fast detection and mitigation of fraudulent activity in real time. The challenge of financial fraud - Banks and other financial institutions have a responsibility to their customers to secure their data and finances against fraud or outright theft. This has become a complex task due, at least in part, to customers being able to access their accounts via multiple channels: a mobile banking app, an online banking portal, calling into the call center, or even visiting the bank in person. A teller can verify a customer's identity with reasonable confidence. But how do you verify that the person logging into a bank account online is actually that person and not a fraudster logging in with stolen credentials? The number of stolen credentials available to fraudsters is staggering: there are over 15 billion stolen credentials for sale on the dark web. Cybercriminals can purchase them for as little as an average of $15.43 for consumer credentials, up to an average of more than $3,139 for credentials to an organization's key systems. Fraud analytics is key to financial fraud risk management - The bad news is that online fraud is constantly evolving: as banks put remediation measures in place, new threats appear, and traditional, static rules-based fraud prevention systems can't keep pace. The good news is that there is a wealth of data available to financial organizations that can be used to predict and detect financial fraud and adapt to new threats. Collecting a username and password at login is no longer sufficient to guard against fraudulent activity. When someone accesses, or attempts to access, an account, there is other data that can be used to determine whether this is a legitimate customer and whether the requested transaction is legitimate.
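The notes above do not name a specific algorithm, but one common big-data technique for flagging suspicious activity is anomaly detection. The sketch below uses scikit-learn's Isolation Forest on invented transaction data (amount and hour of day); the features and the contamination rate are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Simulated transaction features: [amount, hour of day]; values are invented.
normal = np.column_stack([rng.normal(50, 15, 500), rng.normal(14, 3, 500)])
odd = np.array([[900.0, 3.0], [750.0, 4.0]])      # unusually large, late-night
transactions = np.vstack([normal, odd])

# Isolation Forest flags observations that are easy to "isolate" as outliers.
model = IsolationForest(contamination=0.01, random_state=0).fit(transactions)
flags = model.predict(transactions)               # -1 = flagged as anomalous
print(f"{(flags == -1).sum()} of {len(flags)} transactions flagged for review")
```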
OPERATIONAL ANALYTICS - Operational analytics is a type of analytics that informs day-to-day decisions with the goal of improving the efficiency and effectiveness of your operations.

…through unsupervised learning. Some implementations of machine learning use data and neural networks in a way that mimics the working of a biological brain. In its application across business problems, machine learning is also referred to as predictive analytics. Machine learning programs can perform tasks without being explicitly programmed to do so. It involves computers learning from data provided so that they carry out certain tasks. For simple tasks assigned to computers, it is possible to program algorithms telling the machine how to execute all steps required to solve the problem at hand; on the computer's part, no learning is needed. For more advanced tasks, it can be challenging for a human to manually create the needed algorithms. In practice, it can turn out to be more effective to help the machine develop its own algorithm, rather than having human programmers specify every needed step.

MACHINE INTELLIGENCE APPLICATIONS IN BUSINESS AND DATA ANALYTICS - • Predictive analytics: Machine learning algorithms can analyze large volumes of data to identify patterns and make predictions about future trends or events. This helps businesses make informed decisions, such as predicting customer behavior, identifying market trends, or forecasting sales. • Fraud detection: Machine learning algorithms can analyze transactions, behavior patterns, and other data to detect potential fraud. This helps businesses prevent losses and protect themselves against financial crimes. • Customer service: Chatbots and virtual assistants powered by natural language processing and machine learning can provide automated customer support. This helps businesses improve customer satisfaction and reduce costs associated with customer service. • Supply chain optimization: Machine learning can be used to optimize supply chain operations, such as predicting demand, improving inventory management, and optimizing logistics. This helps businesses reduce costs and improve efficiency. • Marketing automation: Machine learning can be used to personalize marketing campaigns based on customer behavior and preferences. This helps businesses improve the effectiveness of their marketing efforts and increase customer engagement. • Sentiment analysis: Machine learning can analyze social media and other sources of customer feedback to identify trends and sentiment. This helps businesses improve their products and services based on customer feedback.
TYPES OF LEARNING ALGORITHMS - • Supervised learning: Supervised learning algorithms build a mathematical model of a set of data that contains both the inputs and the desired outputs. The data is known as training data, and consists of a set of training examples. Each training example has one or more inputs and the desired output, also known as a supervisory signal. In the mathematical model, each training example is represented by an array or vector, sometimes called a feature vector, and the training data is represented by a matrix. Through iterative optimization of an objective function, supervised learning algorithms learn a function that can be used to predict the output associated with new inputs. • Unsupervised learning: Unsupervised learning algorithms take a set of data that contains only inputs, and find structure in the data, like grouping or clustering of data points. The algorithms therefore learn from test data that has not been labeled, classified or categorized. Instead of responding to feedback, unsupervised learning algorithms identify commonalities in the data and react based on the presence or absence of such commonalities in each new piece of data. • Semi-supervised learning: Semi-supervised learning falls between unsupervised learning (without any labeled training data) and supervised learning (with completely labeled training data). Some of the training examples are missing training labels, yet many machine-learning researchers have found that unlabeled data, when used in conjunction with a small amount of labeled data, can produce a considerable improvement in learning accuracy. • Reinforcement learning: Reinforcement learning is an area of machine learning concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. Due to its generality, the field is studied in many other disciplines, such as game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, statistics and genetic algorithms. In machine learning, the environment is typically represented as a Markov decision process (MDP).
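A minimal sketch contrasting the first two learning types, assuming scikit-learn and its bundled Iris data: the supervised learner is given both inputs and labels, while the unsupervised learner sees only the inputs. The model choices (logistic regression and k-means) are illustrative assumptions, not the only options.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised: inputs X and desired outputs y are both given to the learner.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("supervised accuracy:", round(clf.score(X, y), 2))

# Unsupervised: only the inputs X are given; the algorithm finds structure
# (here, three clusters) without ever seeing the labels.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster sizes:", [int((km.labels_ == k).sum()) for k in range(3)])
```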
MACHINE LEARNING MODELS - Artificial neural networks (ANNs), or connectionist systems, are computing systems vaguely inspired by the biological neural networks that constitute animal brains. Such systems "learn" to perform tasks by considering examples, generally without being programmed with any task-specific rules. An ANN is a model based on a collection of connected units or nodes called "artificial neurons", which loosely model the neurons in a biological brain. Each connection, like the synapses in a biological brain, can transmit information, a "signal", from one artificial neuron to another. An artificial neuron that receives a signal can process it and then signal additional artificial neurons connected to it. In common ANN implementations, the signal at a connection between artificial neurons is a real number, and the output of each artificial neuron is computed by some non-linear function of the sum of its inputs. The connections between artificial neurons are called "edges". Artificial neurons and edges typically have a weight that adjusts as learning proceeds. The weight increases or decreases the strength of the signal at a connection.
Decision tree learning uses a decision tree as a predictive model to go from observations about an item (represented in the branches) to conclusions about the item's target value (represented in the leaves). It is one of the predictive modeling approaches used in statistics, data mining, and machine learning. Tree models where the target variable can take a discrete set of values are called classification trees; in these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels. Decision trees where the target variable can take continuous values (typically real numbers) are called regression trees. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data, but the resulting classification tree can be an input for decision making.
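A short sketch of decision tree learning as described above, again assuming scikit-learn: the printed rules show branches testing feature values and leaves holding class labels. The depth limit and dataset are arbitrary choices made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# A shallow classification tree: branches test feature values,
# leaves hold the predicted class labels.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_train, y_train)
print("test accuracy:", round(tree.score(X_test, y_test), 2))
print(export_text(tree))   # prints the learned if/else structure
```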
Support-vector machines (SVMs), also known as support-vector networks, are a set of related supervised learning methods used for classification and regression. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts whether a new example falls into one category or the other. An SVM training algorithm is a non-probabilistic, binary, linear classifier, although methods such as Platt scaling exist to use SVM in a probabilistic classification setting. In addition to performing linear classification, SVMs can efficiently perform a non-linear classification using what is called the kernel trick, implicitly mapping their inputs into high-dimensional feature spaces.
Regression analysis encompasses a large variety of statistical methods to estimate the relationship between input variables and their associated features. Its most common form is linear regression, where a single line is drawn to best fit the given data according to a mathematical criterion such as ordinary least squares. The latter is often extended by regularization methods to mitigate overfitting and bias, as in ridge regression. When dealing with non-linear problems, go-to models include polynomial regression (for example, used for trendline fitting in Microsoft Excel), logistic regression (often used in statistical classification) or even kernel regression, which introduces non-linearity by taking advantage of the kernel trick to implicitly map input variables to a higher-dimensional space.
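The following sketch fits the most common form, linear regression, by the ordinary-least-squares criterion mentioned above. The data is synthetic, and `np.polyfit` is just one of several ways to solve this in Python.

```python
import numpy as np

# Invented data: y is roughly 2x + 1 plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 1, x.size)

# Ordinary least squares: choose slope/intercept minimising squared error.
# np.polyfit solves exactly this criterion for a degree-1 polynomial.
slope, intercept = np.polyfit(x, y, 1)
print(f"fitted line: y = {slope:.2f}x + {intercept:.2f}")
```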
A Bayesian network, belief network, or directed acyclic graphical model is a probabilistic graphical model that represents a set of random variables and their conditional independence with a directed acyclic graph (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms: given symptoms, the network can be used to compute the probabilities of the presence of various diseases. Efficient algorithms exist that perform inference and learning. Bayesian networks that model sequences of variables, like speech signals or protein sequences, are called dynamic Bayesian networks. Generalizations of Bayesian networks that can represent and solve decision problems under uncertainty are called influence diagrams.
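To make the disease-symptom example concrete, the sketch below computes P(disease | symptom) with Bayes' theorem, which is the elementary calculation a two-node Bayesian network performs. All probabilities are invented for illustration.

```python
# Assumed toy numbers, for illustration only.
p_disease = 0.01              # prior P(D)
p_symptom_given_d = 0.90      # P(S | D)
p_symptom_given_not_d = 0.05  # P(S | not D)

# Total probability of observing the symptom.
p_symptom = (p_symptom_given_d * p_disease
             + p_symptom_given_not_d * (1 - p_disease))

# Bayes' theorem: P(D | S) = P(S | D) * P(D) / P(S).
p_disease_given_symptom = p_symptom_given_d * p_disease / p_symptom
print(f"P(disease | symptom) = {p_disease_given_symptom:.3f}")  # ~0.154
```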
A genetic algorithm (GA) is a search algorithm and heuristic technique that mimics the process of natural selection, using methods such as mutation and crossover to generate new genotypes in the hope of finding good solutions to a given problem. In machine learning, genetic algorithms were used in the 1980s and 1990s. Conversely, machine learning techniques have been used to improve the performance of genetic and evolutionary algorithms.
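A toy genetic algorithm in the spirit of the description above: selection keeps the fitter half of a population, crossover averages two parents, and mutation adds random noise. The fitness function and every parameter are arbitrary illustrative choices.

```python
import random
random.seed(0)

def fitness(x):
    return -(x - 3.0) ** 2        # peak at x = 3

# Start from a random population of candidate solutions ("genotypes").
population = [random.uniform(-10, 10) for _ in range(20)]

for generation in range(50):
    # Selection: keep the fitter half of the population.
    population.sort(key=fitness, reverse=True)
    parents = population[:10]
    # Crossover: average two parents; mutation: add small random noise.
    children = []
    while len(children) < 10:
        a, b = random.sample(parents, 2)
        children.append((a + b) / 2 + random.gauss(0, 0.1))
    population = parents + children

print(f"best solution found: x = {max(population, key=fitness):.3f}")
```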
Training models - Typically, machine learning models require a high quantity of reliable data in order for the models to perform accurate predictions. When training a machine learning model, machine learning engineers need to target and collect a large and representative sample of data. Data from the training set can be as varied as a corpus of text, a collection of images, sensor data, and data collected from individual users of a service. Overfitting is something to watch out for when training a machine learning model. Trained models derived from biased or non-evaluated data can result in skewed or undesired predictions. Biased models may result in detrimental outcomes, thereby furthering the negative impacts to society or business objectives.
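One standard way to watch for the overfitting mentioned above is to hold out a validation set and compare scores: a training score far above the validation score is a warning sign. The model and dataset below are assumptions chosen only for illustration (on a dataset this easy the gap is small).

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0)

# An unconstrained tree can memorise the training data (overfitting):
# training accuracy stays near perfect while validation accuracy lags.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("train accuracy:     ", round(model.score(X_train, y_train), 2))
print("validation accuracy:", round(model.score(X_val, y_val), 2))
```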
AREAS OF APPLICATION - • Artificial Intelligence (AI) and Machine Learning (ML): AI and ML can be applied in numerous areas such as finance, healthcare, transportation, agriculture, e-commerce, customer service, cybersecurity, education, and many others. For instance, AI can help diagnose diseases, detect fraud, and personalize content. • Internet of Things (IoT): IoT can be applied in areas such as smart homes, smart cities, healthcare, logistics, and agriculture. For example, IoT can help track inventory, monitor patient health, and optimize energy consumption. • Blockchain: Blockchain can be applied in areas such as finance, supply chain management, voting systems, and identity verification. For example, blockchain can help improve supply chain transparency and traceability, enhance security in financial transactions, and verify identity without the need for intermediaries. • Virtual and Augmented Reality (VR/AR): VR and AR can be applied in areas such as entertainment, education, healthcare, and marketing. For example, VR can help simulate training environments, while AR can provide interactive product demonstrations. • Robotics: Robotics can be applied in areas such as manufacturing, logistics, healthcare, and entertainment. For instance, robots can help automate repetitive tasks, provide assistance to people with disabilities, and perform surgeries.

BUSINESS INTELLIGENCE - Business intelligence (BI) comprises the strategies and technologies used by enterprises for the analysis and management of business information. Common functions of business intelligence technologies include reporting, online analytical processing, analytics, dashboard development, data mining, process mining, complex event processing, business performance management, benchmarking, text mining, predictive analytics, and prescriptive analytics. BI technologies can handle large amounts of structured and sometimes unstructured data to help identify, develop, and otherwise create new strategic business opportunities. They aim to allow for the easy interpretation of these big data. Identifying new opportunities and implementing an effective strategy based on insights can provide businesses with a competitive market advantage and long-term stability. Business intelligence can be used by enterprises to support a wide range of business decisions, ranging from operational to strategic. Basic operating decisions include product positioning or pricing. Strategic business decisions involve priorities, goals, and directions at the broadest level. Elements of business intelligence - • Multidimensional aggregation and allocation. • Denormalization, tagging, and standardization. • Real-time reporting with analytical alerts. • A method of interfacing with unstructured data sources. • Group consolidation, budgeting, and rolling forecasts. • Statistical inference and probabilistic simulation. • Key performance indicator optimization. • Version control and process management. • Open item management. Importance - • Identify ways to increase profit. • Analyze customer behavior. • Compare data with competitors. • Track performance. • Optimize operations. • Predict success. • Spot market trends. • Discover issues or problems.

DATA GATHERING (OR DATA COLLECTION) - Data collection is the process of gathering and measuring information on targeted variables in an established system, which then enables one to answer relevant questions and evaluate outcomes. Data collection is a research component in all study fields, including physical and social sciences, humanities, and business. While methods vary by discipline, the emphasis on ensuring accurate and honest collection remains the same. The goal for all data collection is to capture quality evidence that allows analysis to lead to the formulation of convincing and credible answers to the questions that have been posed. Data collection and validation consists of four steps when it involves taking a census and seven steps when it involves sampling. A formal data collection process is necessary as it ensures that the data gathered are both defined and accurate. This way, subsequent decisions based on arguments embodied in the findings are made using valid data. The process provides both a baseline from which to measure and, in certain cases, an indication of what to improve. There are five common data collection methods: 1. Closed-ended surveys and quizzes, 2. Open-ended surveys and questionnaires, 3. One-on-one interviews, 4. Focus groups, and 5. Direct observation.

DATA STORAGE AND KNOWLEDGE MANAGEMENT - Data storage and knowledge management are two closely related areas that are essential for organizations to manage and leverage data effectively. Data storage involves the physical storage and retrieval of data in a way that is secure, reliable, and efficient. There are various types of data storage technologies available, including on-premises storage, cloud storage, and hybrid storage. Organizations need to choose the appropriate storage technology that meets their needs in terms of capacity, accessibility, security, and cost. Knowledge management involves the creation, storage, and retrieval of knowledge and information that is relevant to an organization. This includes tacit knowledge, such as the expertise and experience of employees, as well as explicit knowledge, such as documents, procedures, and policies. Knowledge management systems use technologies such as databases, search engines, and collaboration tools to manage and share knowledge effectively within an organization.
DATA ANALYSIS - Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, and is used in different business, science, and social science domains. In today's business world, data analysis plays a role in making decisions more scientific and helping businesses operate more effectively. Data mining is a particular data analysis technique that focuses on statistical modelling and knowledge discovery for predictive rather than purely descriptive purposes, while business intelligence covers data analysis that relies heavily on aggregation, focusing mainly on business information. In statistical applications, data analysis can be divided into descriptive statistics, exploratory data analysis (EDA), and confirmatory data analysis (CDA). EDA focuses on discovering new features in the data while CDA focuses on confirming or falsifying existing hypotheses. Predictive analytics focuses on the application of statistical models for predictive forecasting or classification, while text analytics applies statistical, linguistic, and structural techniques to extract and classify information from textual sources, a species of unstructured data. All of the above are varieties of data analysis. Data integration is a precursor to data analysis, and data analysis is closely linked to data visualization and data dissemination.

INTRODUCTION TO R PROGRAMMING - R is a programming language for statistical computing and graphics supported by the R Core Team and the R Foundation for Statistical Computing. Created by statisticians Ross Ihaka and Robert Gentleman, R is used among data miners and statisticians for data analysis and developing statistical software. Users have created packages to augment the functions of the R language. The official R software environment is an open-source free software environment within the GNU package, available under the GNU General Public License. It is written primarily in C, Fortran, and R itself (partially self-hosting). Precompiled executables are provided for various operating systems. R has a command-line interface, and multiple third-party graphical user interfaces are also available, such as RStudio, an integrated development environment, and Jupyter, a notebook interface. Features - • Statistics: R and its libraries implement various statistical and graphical techniques, including linear and nonlinear modeling, classical statistical tests, spatial and time-series analysis, classification, clustering, and others. R is easily extensible through functions and extensions, and its community is noted for contributing packages. Many of R's standard functions are written in R itself, which makes it easy for users to follow the algorithmic choices made. For computationally intensive tasks, C, C++, and Fortran code can be linked and called at run time. Advanced users can write C, C++, Java, .NET or Python code to manipulate R objects directly. R is highly extensible through the use of packages for specific functions and specific applications. Due to its S heritage, R has stronger object-oriented programming facilities than most statistical computing languages, and extending it is facilitated by its lexical scoping rules. • Programming: R is an interpreted language; users can access it through a command-line interpreter. If a user types 2+2 at the R command prompt and presses enter, the computer replies with 4. Like languages such as APL and MATLAB, R supports matrix arithmetic. R's data structures include vectors, matrices, arrays, data frames (similar to tables in a relational database) and lists. Arrays are stored in column-major order. R's extensible object system includes objects for (among others) regression models, time series, and geospatial coordinates. R has no scalar data type; instead, a scalar is represented as a length-one vector. Many features of R derive from Scheme. R uses S-expressions to represent both data and code. Functions are first-class objects and can be manipulated in the same way as data objects, facilitating metaprogramming that allows multiple dispatch. Variables in R are lexically scoped and dynamically typed. Function arguments are passed by value and are lazy—that is to say, they are only evaluated when they are used, not when the function is called.
INTRODUCTION TO PYTHON - Python is an interpreted high-level general-purpose programming language. Its design philosophy emphasizes code readability with its use of significant indentation. Its language constructs as well as its object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects. Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly procedural), object-oriented and functional programming, and is often described as a "batteries included" language due to its comprehensive standard library. Object-oriented programming and structured programming are fully supported, and many of its features support functional programming and aspect-oriented programming (including by metaprogramming and metaobjects, i.e. magic methods). Many other paradigms are supported via extensions, including design by contract and logic programming. Python uses dynamic typing and a combination of reference counting and a cycle-detecting garbage collector for memory management. It also features dynamic name resolution (late binding), which binds method and variable names during program execution.
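A small sketch of two of the Python traits described above, dynamic typing and functions as first-class objects; the function names here are invented for the example.

```python
# Dynamic typing: the same name may refer to values of different types.
value = 42
value = "now a string"

# Functions are objects: they can be stored, passed around and returned.
def shout(text):
    return text.upper() + "!"

transformations = [str.strip, shout]
message = "  hello  "
for fn in transformations:
    message = fn(message)
print(message)   # HELLO!
```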
INTRODUCTION TO SPSS - SPSS Statistics is a statistical software suite developed by IBM for data management, advanced analytics, multivariate analysis, business intelligence, and criminal investigation. Long produced by SPSS Inc., it was acquired by IBM in 2009. Current versions (post 2015) have the brand name IBM SPSS Statistics. SPSS is a widely used program for statistical analysis in social science. It is also used by market researchers, health researchers, survey companies, government, education researchers, marketing organizations, data miners, and others. The original SPSS manual (Nie, Bent & Hull, 1970) has been described as one of "sociology's most influential books" for allowing ordinary researchers to do their own statistical analysis. In addition to statistical analysis, data management (case selection, file reshaping, creating derived data) and data documentation (a metadata dictionary is stored in the datafile) are features of the base software. The many features of SPSS Statistics are accessible via pull-down menus or can be programmed with a proprietary 4GL command syntax language. Command syntax programming has the benefits of reproducible output, simplifying repetitive tasks, and handling complex data manipulations and analyses. Additionally, some complex applications can only be programmed in syntax and are not accessible through the menu structure. The pull-down menu interface also generates command syntax: this can be displayed in the output, although the default settings have to be changed to make the syntax visible to the user. Syntax can also be pasted into a syntax file using the "paste" button present in each menu. Programs can be run interactively or unattended, using the supplied Production Job Facility.

INTRODUCTION TO AMOS - IBM® SPSS® Amos is a powerful structural equation modeling (SEM) package that helps support your research and theories by extending standard multivariate analysis methods, including regression, factor analysis, correlation, and analysis of variance. It lets you build attitudinal and behavioral models that reflect complex relationships more accurately than standard multivariate statistics techniques, using either an intuitive graphical or a programmatic user interface. Amos is included in the Premium edition of SPSS Statistics (except in the Campus Edition, where it is sold separately); you can also buy Amos as part of the Base, Standard and Professional editions of SPSS Statistics, or separately as a standalone application. It is available for Windows only. AMOS stands for Analysis of Moment Structures. It is an added SPSS module used especially for structural equation modeling, path analysis, and confirmatory factor analysis, and is also known as analysis of covariance or causal modeling software. AMOS is a visual program for SEM: models can be drawn graphically using simple drawing tools, and AMOS quickly performs the computations for SEM and displays the results.
INTRODUCTION TO MS EXCEL - Microsoft Excel is a spreadsheet developed by Microsoft for Windows, macOS, Android and iOS. It features calculation or computation capabilities, graphing tools, pivot tables, and a macro programming language called Visual Basic for Applications (VBA). Excel forms part of the Microsoft Office suite of software. Microsoft Excel has the basic features of all spreadsheets, using a grid of cells arranged in numbered rows and letter-named columns to organize data manipulations like arithmetic operations. It has a battery of supplied functions to answer statistical, engineering, and financial needs. In addition, it can display data as line graphs, histograms and charts, and with a very limited three-dimensional graphical display. It allows sectioning of data to view its dependencies on various factors for different perspectives (using pivot tables and the scenario manager); a PivotTable is a tool for data analysis that works by simplifying large data sets via PivotTable fields. Excel's programming aspect, Visual Basic for Applications, allows the user to employ a wide variety of numerical methods, for example for solving differential equations of mathematical physics, and then report the results back to the spreadsheet. It also has a variety of interactive features allowing user interfaces that can completely hide the spreadsheet from the user, so that the spreadsheet presents itself as a so-called application, or decision support system (DSS), via a custom-designed user interface, for example a stock analyzer, or in general as a design tool that asks the user questions and provides answers and reports. In a more elaborate realization, an Excel application can automatically poll external databases and measuring instruments using an update schedule, analyze the results, make a Word report or PowerPoint slide show, and e-mail these presentations on a regular basis to a list of participants. Excel was not, however, designed to be used as a database.
KEY DATA ANALYSIS TECHNIQUES USED IN CREATING DATA SETS FOR BUSINESS - 1. Regression analysis: Regression analysis is used to estimate the relationship between a set of variables. When conducting any type of regression analysis, you're looking to see if there's a correlation between a dependent variable (the variable or outcome you want to measure or predict) and any number of independent variables (factors which may have an impact on the dependent variable). The aim of regression analysis is to estimate how one or more variables might impact the dependent variable, in order to identify trends and patterns. This is especially useful for making predictions and forecasting future trends. 2. Monte Carlo simulation: When making decisions or taking certain actions, there are a range of different possible outcomes. If you take the bus, you might get stuck in traffic. If you walk, you might get caught in the rain or bump into your chatty neighbor, potentially delaying your journey. In everyday life, we tend to briefly weigh up the pros and cons before deciding which action to take; however, when the stakes are high, it's essential to calculate, as thoroughly and accurately as possible, all the potential risks and rewards.
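In practice, a Monte Carlo simulation runs a model of the decision many times with randomly sampled inputs and then summarizes the distribution of outcomes. The sketch below applies this to the commute example; every number in it is an invented assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
simulations = 100_000

# Hypothetical commute: bus time is uncertain (normal around 30 min),
# and 20% of trips hit traffic that adds a further 5-25 minutes.
bus = rng.normal(30, 5, simulations)
traffic = rng.uniform(5, 25, simulations) * (rng.random(simulations) < 0.20)
total = bus + traffic

print(f"mean journey: {total.mean():.1f} min")
print(f"probability of arriving within 40 min: {(total <= 40).mean():.2%}")
```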
3. Factor analysis: Factor analysis is a technique used to reduce a large number of variables to a smaller number of factors. It works on the basis that multiple separate, observable variables correlate with each other because they are all associated with an underlying construct. This is useful not only because it condenses large datasets into smaller, more manageable samples, but also because it helps to uncover hidden patterns. This allows you to explore concepts that cannot be easily measured or observed—such as wealth, happiness, fitness, or, for a more business-relevant example, customer loyalty and satisfaction. 4. Cohort analysis: Cohort analysis is defined on Wikipedia as follows: "Cohort analysis is a subset of behavioral analytics that takes the data from a given dataset and rather than looking at all users as one unit, it breaks them into related groups for analysis. These related groups, or cohorts, usually share common characteristics or experiences within a defined time-span." 5. Cluster analysis: Cluster analysis is an exploratory technique that seeks to identify structures within a dataset. The goal of cluster analysis is to sort different data points into groups (or clusters) that are internally homogeneous and externally heterogeneous. This means that data points within a cluster are similar to each other, and dissimilar to data points in another cluster. Clustering is used to gain insight into how data is distributed in a given dataset, or as a preprocessing step for other algorithms.
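A minimal cluster-analysis sketch, assuming scikit-learn's k-means and invented customer data: the algorithm sorts the points into internally similar, mutually dissimilar groups.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Invented customer data: [annual spend, visits per month], two natural groups.
low = np.column_stack([rng.normal(200, 40, 50), rng.normal(2, 0.5, 50)])
high = np.column_stack([rng.normal(1200, 150, 50), rng.normal(9, 1.5, 50)])
customers = np.vstack([low, high])

# Sort the data points into two internally homogeneous clusters.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
for k, centre in enumerate(km.cluster_centers_):
    print(f"cluster {k}: mean spend {centre[0]:.0f}, mean visits {centre[1]:.1f}")
```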
6. Time series analysis: Time series analysis is a statistical technique used to identify trends and cycles over time. Time series data is a sequence of data points which measure the same variable at different points in time (for example, weekly sales figures or monthly email sign-ups). By looking at time-related trends, analysts are able to forecast how the variable of interest may fluctuate in the future.
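A short illustration of trend extraction from time series data, assuming pandas and invented weekly sales figures: a rolling mean smooths out week-to-week noise so the underlying trend becomes visible.

```python
import pandas as pd

# Invented weekly sales figures with an upward trend.
weeks = pd.date_range("2024-01-07", periods=12, freq="W")
sales = pd.Series([110, 98, 120, 130, 125, 142, 150, 149, 161, 170, 168, 182],
                  index=weeks)

# A rolling mean smooths week-to-week noise so the underlying trend shows.
trend = sales.rolling(window=4).mean()
print(trend.dropna().round(1))
```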
7. Sentiment analysis: When you think of data, your mind probably automatically goes to numbers and spreadsheets. Many companies overlook the value of qualitative data, but in reality, there are untold insights to be gained from what people (especially customers) write and say about you. So how do you go about analyzing textual data? One highly useful qualitative technique is sentiment analysis, a technique which belongs to the broader category of text analysis—the (usually automated) process of sorting and understanding textual data.

TYPES OF ANALYSIS - 1. Descriptive analysis (what happened?): The descriptive analysis method is the starting point of any analytic process, and it aims to answer the question of what happened. It does this by ordering, manipulating, and interpreting raw data from various sources to turn it into valuable insights for your business. Performing descriptive analysis is essential, as it allows us to present data in a meaningful way; this analysis on its own will not allow you to predict future outcomes or tell you why something happened, but it will leave your data organized and ready for further analysis. 2. Diagnostic analysis (why it happened?): One of the most powerful types of data analysis, diagnostic analytics empowers analysts and business executives by helping them gain a firm contextual understanding of why something happened. If you know why something happened, as well as how it happened, you will be able to pinpoint the exact ways of tackling the issue or challenge. Designed to provide direct and actionable answers to specific questions, it is one of the most important methods in research and also supports other key organizational functions such as retail analytics. 3. Predictive analysis (what will happen?): The predictive method allows you to look into the future to answer the question of what will happen. To do this, it uses the results of the previously mentioned descriptive, exploratory, and diagnostic analyses, in addition to machine learning (ML) and artificial intelligence (AI). In this way, you can uncover future trends, potential problems or inefficiencies, connections, and causalities in your data. 4. Exploratory analysis (how to explore data relationships?): As its name suggests, the main aim of exploratory analysis is to explore. Prior to it, there is still no notion of the relationship between the data and the variables; once the data is investigated, exploratory analysis enables you to find connections and generate hypotheses and solutions for specific problems. A typical area of application for exploratory analysis is data mining. 5. Prescriptive analysis (how will it happen?): Another of the most effective types of data analysis methods in research. Prescriptive data techniques cross over from predictive analysis in the way that…