IDS- UNIT-1
Uses of Data Science:
1. In Search Engines
One of the most visible applications of Data Science is in search engines. When we want to
search for something on the internet, we mostly use search engines such as Google, Yahoo,
DuckDuckGo and Bing. Data Science is used to return relevant search results faster.
2. In Transport
Data Science is also used in real-time applications in the transport field, such as driverless cars.
Driverless cars can help reduce the number of accidents.
3. In Finance
Data Science plays a key role in the financial industry. Financial institutions constantly face fraud
and the risk of losses, so they need to automate the analysis of loss risk in order to make strategic
decisions. Financial institutions also use Data Science analytics tools to make predictions; for
example, companies can predict customer lifetime value and stock market movements.
4. In E-Commerce
E-commerce websites such as Amazon and Flipkart use Data Science to improve the user
experience with personalized recommendations.
5. In Health Care
In the healthcare industry, Data Science acts as a boon. It is used for:
• Detecting tumors.
• Drug discovery.
• Medical Image Analysis.
• Virtual Medical Bots.
• Genetics and Genomics.
• Predictive Modeling for Diagnosis etc.
6. Image Recognition
Currently, Data Science is also used in Image Recognition.
7. Targeted Recommendation
Targeted recommendation is one of the most important applications of Data Science. Whatever a user
searches for on the Internet, he/she will then see related posts and advertisements on many other sites.
8. Airline Route Planning
Data Science is also helping the airline sector grow. It makes it easier to predict flight delays, and it
helps decide whether a flight should fly directly to its destination or stop in between. For example, a
flight from Delhi to the U.S.A. can take a direct route, or it can halt in between before reaching the
destination.
9. Data Science in Gaming
In most games where a user plays against a computer opponent, Data Science concepts are
combined with machine learning so that the computer improves its performance using data from past
games. Games such as Chess and titles from EA Sports use Data Science concepts.
10. Medicine and Drug Development
Creating a new medicine is very difficult and time-consuming, and it has to be done with full
discipline because people's lives are at stake. Without Data Science, developing a new medicine or
drug takes a great deal of time, resources and money. Data Science makes this easier because the
likely success rate can be estimated from biological data and other factors. Algorithms based on Data
Science can forecast how a drug will react in the human body, reducing the number of lab
experiments needed.
11. In Delivery Logistics
Logistics companies such as DHL and FedEx make use of Data Science. It helps these companies
find the best route for shipping their products, the best time for delivery, the best mode of transport
to reach the destination, and so on.
12. Autocomplete
Autocomplete is an important application of Data Science: the user types just a few letters or
words, and the system suggests how to complete the rest of the line. In Google Mail, for example,
when we are writing a formal email, the autocomplete feature suggests an efficient way to complete
the whole line. Autocomplete is also widely used in search engines, social media and various other
apps.
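The basic idea behind autocomplete can be sketched as ranking earlier inputs that share the typed prefix. This is a minimal sketch that assumes a small, hypothetical query history (`PAST_QUERIES`) and simple frequency ranking; production systems typically use trained language models rather than exact prefix lookup.

```python
from collections import Counter

# Hypothetical history of past queries; a real system would learn from
# very large logs of searches or emails.
PAST_QUERIES = [
    "data science course",
    "data science jobs",
    "data science course",
    "data structures",
    "database design",
]

def autocomplete(prefix: str, history: list[str], k: int = 3) -> list[str]:
    """Return up to k past queries that start with `prefix`, most frequent first."""
    counts = Counter(q for q in history if q.startswith(prefix.lower()))
    # Sort by descending frequency, then alphabetically to break ties.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [query for query, _ in ranked[:k]]

print(autocomplete("data sc", PAST_QUERIES))
# ['data science course', 'data science jobs']
```

Ranking by frequency means the most common completion is offered first, which is why the suggestion tends to be an "efficient choice" for the user.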
3. Facets of Data
Big data and data science generate very large amounts of data. This data comes in various types, and
the main categories are as follows:
a) Structured data
b) Unstructured data
c) Natural language
d) Machine-generated data
e) Graph-based data
f) Audio, image and video
g) Streaming data
Structured Data
• Structured data is arranged in rows and columns, which makes it easy for applications to retrieve and
process. A database management system is used for storing structured data.
• The term structured data refers to data that is identifiable because it is organized in a structure. The
most common form of structured data is a database, where specific information is stored in columns
and rows.
• Structured data can also be searched by data type within its content. It is easily understood by
computers and is also efficiently organized for human readers.
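Because every record in structured data has the same named, typed columns, a database can filter it directly by column. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `customers` table and its rows are hypothetical sample data.

```python
import sqlite3

# Structured data: every record has the same named, typed columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO customers (id, name, city, age) VALUES (?, ?, ?, ?)",
    [(1, "Asha", "Delhi", 34), (2, "Ravi", "Mumbai", 28), (3, "Meena", "Delhi", 41)],
)

# Because the structure is known in advance, a query can filter by column values.
rows = conn.execute(
    "SELECT name, age FROM customers WHERE city = ? AND age > ? ORDER BY name",
    ("Delhi", 30),
).fetchall()
print(rows)  # [('Asha', 34), ('Meena', 41)]
conn.close()
```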
Unstructured Data
• Unstructured data is data that does not follow a specified format; rows and columns are not used.
Unstructured data has no identifiable structure, which makes it difficult to retrieve the required
information.
• Unstructured data can take the form of text (documents, email messages, customer feedback),
audio, video and images. Email is an example of unstructured data.
• Even today, more than 80% of the data in most organizations is unstructured. This data carries a lot
of information, but extracting information from these varied sources is a very big challenge.
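To see why extraction is a challenge, consider pulling a few fields out of a free-text email. This is a minimal sketch with a hypothetical email and hand-written regular expressions; such patterns are fragile and break as soon as people phrase things differently, which is exactly the difficulty described above.

```python
import re

# A hypothetical customer email: free text with no rows, columns or schema.
email_body = """Hi team,
I ordered a phone on 12/03/2024 (order #58213) but it arrived damaged.
Please call me at 98765-43210 or reply to asha.k@example.com.
Thanks, Asha"""

# Hand-written patterns pull a few structured fields out of the text.
order_ids = re.findall(r"order #(\d+)", email_body)
dates = re.findall(r"\b\d{2}/\d{2}/\d{4}\b", email_body)
# Each domain part must contain at least one character, so the final "." is not captured.
emails = re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", email_body)

record = {"order_ids": order_ids, "dates": dates, "emails": emails}
print(record)
```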
Natural Language
• Natural language processing (NLP) enables machines to recognize characters, words and sentences,
and then apply meaning and understanding to that information. This helps machines understand
language as humans do.
• Natural language processing is the driving force behind machine intelligence in many modern
real-world applications. The natural language processing community has had success in entity
recognition, topic recognition, summarization, text completion and sentiment analysis.
• For natural language processing to help machines understand human language, it must go through
speech recognition, natural language understanding and machine translation. It is an iterative process
made up of several layers of text analysis.
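The layers described above (recognizing sentences and words, then attaching meaning) can be illustrated with a toy pipeline. This is a sketch only: the sentence and word splitting use simple regular expressions, and the sentiment step uses a tiny hypothetical word list (`POSITIVE`, `NEGATIVE`) in place of the trained models real NLP systems rely on.

```python
import re
from collections import Counter

# Hypothetical, tiny sentiment lexicon; real systems learn from labelled data.
POSITIVE = {"good", "great", "fast", "love"}
NEGATIVE = {"bad", "slow", "broken", "late"}

def analyze(text: str) -> dict:
    # Layer 1: split into sentences at ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Layer 2: split into lowercase word tokens.
    words = re.findall(r"[a-z']+", text.lower())
    # Layer 3: attach a very rough meaning by counting positive vs negative words.
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        sentiment = "positive"
    elif score < 0:
        sentiment = "negative"
    else:
        sentiment = "neutral"
    return {
        "sentences": sentences,
        "top_words": Counter(words).most_common(2),
        "sentiment": sentiment,
    }

print(analyze("The delivery was late. The phone is great and fast! I love it."))
```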
Machine-Generated Data
• Machine data contains a definitive record of all activity and behavior of our customers, users,
transactions, applications, servers, networks, factory machinery and so on.
• It includes configuration data, data from APIs and message queues, change events, the output of
diagnostic commands, call detail records, sensor data from remote equipment and more.
• Examples of machine data are web server logs, call detail records, network event logs and telemetry.
• It can be either structured or unstructured. In recent years, the volume of machine data has surged.
The expansion of mobile devices, virtual servers and desktops, as well as cloud-based services and
RFID technologies, is making IT infrastructures more complex.
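Web server logs are a typical example of machine data: each line follows a fixed layout, so it can be turned into structured fields. The sketch below assumes lines in the Common Log Format used by many web servers; the sample line and the `parse_log_line` helper are illustrative, not taken from any particular server's tooling.

```python
import re
from typing import Optional

# One line in the Common Log Format (sample data; 203.0.113.0/24 is a documentation range).
LOG_LINE = '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'

# Named groups turn the log line into structured fields.
CLF_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_log_line(line: str) -> Optional[dict]:
    """Hypothetical helper: return the fields of one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None  # malformed lines are skipped rather than crashing the pipeline
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields

print(parse_log_line(LOG_LINE))
```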
Graph-Based Data
• Graphs are data structures that describe relationships and interactions between entities in complex
systems. In general, a graph contains a collection of entities called nodes and another collection of
interactions between pairs of nodes called edges.
• Nodes represent entities, which can be of any object type that is relevant to our problem domain.
By connecting nodes with edges, we will end up with a graph (network) of nodes.
• A graph database stores nodes and relationships instead of tables or documents. Data is stored just
like we might sketch ideas on a whiteboard. Our data is stored without restricting it to a predefined
model, allowing a very flexible way of thinking about and using it.
• Graph databases are used to store graph-based data and are queried with specialized query languages
such as SPARQL.
• Graph databases are capable of sophisticated fraud prevention. With graph databases, we can use
relationships to process financial and purchase transactions in near-real time. With fast graph queries,
we can detect, for example, that a potential purchaser is using the same email address and credit card
as those involved in a known fraud case.
• Graph databases can also help users easily detect relationship patterns, such as multiple people
associated with a personal email address, or multiple people sharing the same IP address but residing
at different physical addresses.
• Graph databases are a good choice for recommendation applications. With graph databases, we can
store relationships between information categories such as customer interests, friends and purchase
history in a graph. We can use a highly available graph database to recommend products to a user
based on which products were purchased by others who follow the same sport and have a similar
purchase history.
• Graph theory was probably the main method of social network analysis in the early history of the
social network concept. It is applied to social network analysis in order to determine important
features of the network, such as its nodes and links (for example, influencers and their followers).
• Influencers on a social network are users who have an impact on the activities or opinions of other
users, either through followership or by influencing the decisions other users make on the network,
as shown in Fig.
• Graph theory has proved to be very effective on large-scale datasets such as social network data,
because its algorithms can run directly on data matrices without first building a visual representation
of the data.
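The graph ideas above can be sketched with plain Python data structures. This is a minimal illustration under stated assumptions: the people, email addresses, card number and follower edges are hypothetical; identifier nodes are marked with a `"kind:"` prefix purely for this example; and follower count (in-degree) is used as a deliberately simple proxy for influence, whereas real analyses often use richer measures such as PageRank or betweenness centrality.

```python
from collections import defaultdict

# Part 1: fraud signals. Hypothetical edges linking people to identifiers they use.
edges = [
    ("alice", "email:a@example.com"),
    ("bob", "email:a@example.com"),   # bob shares alice's email address
    ("bob", "card:4111"),
    ("carol", "card:4111"),           # carol shares bob's credit card
    ("dave", "email:d@example.com"),
]

# Adjacency list: each node maps to the set of nodes it is connected to (undirected).
graph = defaultdict(set)
for u, v in edges:
    graph[u].add(v)
    graph[v].add(u)

def shared_identifiers(graph):
    """Return identifier nodes linked to more than one person (a possible fraud signal)."""
    return {
        node: sorted(graph[node])
        for node in graph
        if ":" in node and len(graph[node]) > 1
    }

print(shared_identifiers(graph))

# Part 2: influencers. Hypothetical directed "follows" edges: (follower, followed).
follows = [("u1", "u3"), ("u2", "u3"), ("u4", "u3"), ("u3", "u2"), ("u4", "u2")]
users = sorted({u for pair in follows for u in pair})
index = {u: i for i, u in enumerate(users)}

# Adjacency matrix: matrix[i][j] = 1 if users[i] follows users[j].
matrix = [[0] * len(users) for _ in users]
for follower, followed in follows:
    matrix[index[follower]][index[followed]] = 1

# Follower count = column sum, computed directly on the matrix (no drawing needed).
followers = {u: sum(row[index[u]] for row in matrix) for u in users}
top_influencer = max(users, key=followers.get)
print(followers, top_influencer)
```

Working on the adjacency matrix directly is what lets graph methods scale to large datasets: the analysis is arithmetic on rows and columns, not on a picture of the network.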
Audio, Image and Video
• Audio, image and video are data types that pose specific challenges to a data scientist. Tasks that
are trivial for humans, such as recognizing objects in pictures, turn out to be challenging for computers.
• The terms audio and video commonly refer to the time-based media storage formats for sound/music
and moving pictures. Digital audio and video recordings are encoded with audio and video codecs
and can be uncompressed, lossless compressed or lossy compressed, depending on the desired
quality and use case.
• It is important to note that multimedia data is one of the most important sources of information
and knowledge; the integration, transformation and indexing of multimedia data bring significant
challenges in data management and analysis. Many challenges have to be addressed, including big
data volumes, the multidisciplinary nature of Data Science and heterogeneity.
• Data Science plays an important role in addressing these challenges in multimedia data.
Multimedia data usually contains various forms of media, such as text, images, video, geographic
coordinates and even pulse waveforms, which come from multiple sources. Data Science can be a
key instrument, covering big data, machine learning and data mining solutions to store, handle and
analyze such heterogeneous data.
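The difference between lossless and lossy compression can be shown on a hypothetical row of grayscale pixel values (0 to 255). This sketch uses run-length encoding as the lossless method and coarse quantization as the lossy one; both are simple illustrative techniques, and real codecs (for example PNG and FLAC for lossless, JPEG and MP3 for lossy) use far more sophisticated methods.

```python
# A hypothetical 1-row grayscale "image": pixel intensities 0-255.
pixels = [255, 255, 255, 255, 12, 12, 13, 200, 200, 200]

def rle_encode(values):
    """Lossless: store (value, run_length) pairs for consecutive equal values."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1
        else:
            encoded.append([v, 1])
    return [tuple(pair) for pair in encoded]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [v for v, count in pairs for _ in range(count)]

def quantize(values, step=16):
    """Lossy: round each value down to a multiple of `step`, discarding fine detail."""
    return [(v // step) * step for v in values]

encoded = rle_encode(pixels)
print(encoded)                          # [(255, 4), (12, 2), (13, 1), (200, 3)]
print(rle_decode(encoded) == pixels)    # True: exact reconstruction
lossy = quantize(pixels)
print(lossy)                            # [240, 240, 240, 240, 0, 0, 0, 192, 192, 192]
print(len(rle_encode(lossy)))           # 3 runs instead of 4
```

Discarding detail (12 and 13 both become 0) creates longer runs, so the lossy version compresses better, at the cost of never recovering the original values.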
Streaming Data
Streaming data is data that is generated continuously by thousands of data sources, which typically
send in data records simultaneously and in small sizes (on the order of kilobytes).
• Streaming data includes a wide variety of data such as log files generated by customers using your
mobile or web applications, ecommerce purchases, in-game player activity, information from social
networks, financial trading floors or geospatial services and telemetry from connected devices or
instrumentation in data centers.
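Because streaming data never "ends", it is processed record by record with bounded memory instead of being loaded all at once. This is a minimal sketch assuming a hypothetical sensor source (`sensor_readings`) and a moving average over the last three readings; stream-processing tools such as Spark and Flink apply the same idea at much larger scale.

```python
from collections import deque
from typing import Iterable, Iterator

def moving_average(stream: Iterable[float], window: int = 3) -> Iterator[float]:
    """Yield the average of the last `window` values as each new record arrives."""
    recent = deque(maxlen=window)  # the oldest value falls out automatically
    total = 0.0
    for value in stream:
        if len(recent) == window:
            total -= recent[0]     # remove the value about to be evicted
        recent.append(value)
        total += value
        yield total / len(recent)

def sensor_readings():
    """Hypothetical telemetry source; a real one would read from a socket or message queue."""
    yield from [20.0, 22.0, 24.0, 30.0, 18.0]

for avg in moving_average(sensor_readings()):
    print(round(avg, 2))  # 20.0, 21.0, 22.0, 25.33, 24.0
```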
Big Data
Big Data refers to the extraction, analysis and management of very large volumes of data. It
revolves around big data as a data type: a collection of a colossal amount of data. The 5 Vs that
define big data are velocity, volume, value, variety and veracity.
Volumes of data that could not be processed earlier, because of limitations in computational
techniques, can now be processed with highly advanced tools and methodologies.
Some of the tools for Big Data are Apache Hadoop, Spark and Flink. Big Data contains a pool of
data that can be both structured and unstructured. By structured data, we mean data in a fixed
format, such as the data that mobile devices, services and websites generate.
Unstructured data is less organized data that users generate themselves, for example emails,
chats, telephone conversations and reviews.
Modern Big Data came into existence after Google published its technical paper on MapReduce,
which brought about a revolution in the data community. The MapReduce model was later
implemented in an open-source framework called Hadoop.
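The MapReduce model can be illustrated with the classic word-count example. This sketch runs all three phases (map, shuffle, reduce) on a single machine with hypothetical documents; in Hadoop, the map and reduce tasks would instead be distributed across many nodes, and the framework would perform the shuffle over the network.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input documents; in Hadoop these would be blocks of large files.
documents = ["big data needs big tools", "data science uses big data"]

def map_phase(doc):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in doc.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key (the word)."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values for each key into a single total."""
    return {word: sum(counts) for word, counts in groups.items()}

mapped = chain.from_iterable(map_phase(d) for d in documents)
word_counts = reduce_phase(shuffle_phase(mapped))
print(word_counts)
```

Each map call depends only on its own document and each reduce call only on one key's values, which is what allows the work to be split across many machines.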
The big data ecosystem refers to the interconnected network of organizations, technology
platforms and applications that support big data. The ecosystem includes companies that
develop and deploy big data solutions, as well as those who use big data to make
business decisions.
The big data ecosystem is growing at a rapid pace, and it will require significant investment
in order to keep up. As the industry continues to mature, businesses will need to find ways
to work with larger data sets and create efficiencies through collaboration. To do this, they
will need to understand the basics of the big data ecosystem and its components.
The big data ecosystem has five key components:
1. Data sources: Every business needs access to reliable and large data sets in order to make
informed decisions. In order to find these sources, businesses need to identify where their
data comes from and how it can be accessed. This can be done through a variety of
methods, such as market research or surveys.
2. Platforms: Businesses use a number of different platforms to store, process and analyze
their data. These platforms can come from traditional technology companies such as
Microsoft or Amazon, or from new entrants such as Google Cloud Platform or Apple's iCloud.
3. Applications: Businesses use a wide range of applications in order to process their data.
These applications can be used for everything from analyzing customer
behavior to manufacturing products.
4. Data management: All businesses require effective ways to manage their data sets so that
they are organized, effective and accessible. This can be done through a number of methods,
including manual or automatic processes, such as assimilating data cubes from various source
datasets into a single report or exporting all tables into an Excel file for analysis.
5. Collaboration: All businesses need effective ways to collaborate with other organizations
in order to share information and make better decisions. This can be done through a variety
of methods, including online surveys or collaborations with outside experts (such as
developers who can help improve the efficiency of your existing solutions).
2. Model: Using statistical models and machine learning algorithms, data scientists create
predictive models that can forecast future trends or behaviors.
3. Interpret: Data scientists translate data findings into actionable business strategies and
decisions.
Differences Between Big Data and Data Science