
Data Mining & Data Warehouse

This document discusses data aggregation, which involves gathering data from various sources and presenting it in a summarized format. It defines data aggregation, describes when it is useful, and provides examples of basic and advanced types of data aggregation, including summing values, averaging values over time periods, and calculating rolling windows of aggregated values.

Institute of Southern Punjab, Multan
04/25/2020
Data Aggregation

Presented to:
• Ma'am Mubashra
• Data Mining & Data Warehouse
Presented by:

• Sajid Abbas MCS-023R18-01
• Ahmad Nasir MCS-023R18-12
• Muhammad Haneef MCS-023R18-20
• Muhammad Waqas MCS-023R18-39

DATA AGGREGATION
What does Data Aggregation mean?
• Data aggregation is a data and information mining process in which
data is searched, gathered, and presented in a report-based,
summarized format, either to achieve specific business objectives
or processes or to support human analysis.
• Data aggregation may be performed manually or through specialized
software.

• Data aggregation is a component of business intelligence (BI)
solutions. Data aggregation personnel or software search databases,
find the data relevant to a query, and present the findings in a
summarized format that is meaningful and useful for the end user or
application.
• Data aggregation generally works on big data or data marts that,
taken as a whole in raw form, provide little information value.
• Data aggregation's key applications are the gathering, use, and
presentation of data that is available on the global Internet.
• Aggregate data is, as the name says, data available only in
aggregate form. Typical examples are:
• Turnout for each canton in federal elections: a count (aggregated
from individual voters) compared to the overall number of citizens
who have the right to vote. Note that because the basis is the
individual, you can aggregate at different levels: voting districts
(usually the lowest published aggregation level), communes,
districts, and so on.
• Unemployment rate: counts of unemployed and employed persons are
compared at some level (communes, cantons, countries, ...). Note
that the definition of "unemployed" varies: it usually does not
include persons who have never worked, and it often also excludes
persons who no longer qualify for an unemployment allowance because
they did not have a job for some period.
• Infant mortality rate in countries: based on counts of children
surviving and dying during a specific interval around birth. Note
that this calls for a very precise definition; there are national
and international definitions that sometimes vary over time. In some
countries only children dying in hospitals are recorded; these
figures are often estimated and sometimes manipulated (for instance,
politically decided figures meant to attract foreign aid). For a
particular country, the figures may not have been available for a
particular year, so the most recently available year has been used
(you will of course find this kind of information in the
documentation).
• These examples illustrate the importance of getting detailed
information on the definition of the measurement, how the data has
been collected, and what the problems were.
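
The remark that individual-based counts can be aggregated at different levels can be sketched in a few lines of Python. This is only an illustration: the canton, commune, and district names and all counts below are invented, and the helper `turnout_by` is a hypothetical name, not part of any library.

```python
from collections import defaultdict

# Hypothetical voting-district records (all names and numbers are made up):
# (canton, commune, district, votes_cast, eligible_voters)
DISTRICTS = [
    ("Canton A", "Commune 1", "D1", 400, 1000),
    ("Canton A", "Commune 1", "D2", 300, 500),
    ("Canton A", "Commune 2", "D3", 450, 900),
    ("Canton B", "Commune 3", "D4", 600, 800),
]

def turnout_by(level):
    """Aggregate district counts to 'canton' or 'commune' level, then compute turnout."""
    key_len = {"canton": 1, "commune": 2}[level]
    totals = defaultdict(lambda: [0, 0])  # key -> [votes_cast, eligible_voters]
    for row in DISTRICTS:
        key = row[:key_len]
        totals[key][0] += row[3]
        totals[key][1] += row[4]
    # Turnout is the ratio of the summed counts, not the mean of district rates.
    return {key: cast / eligible for key, (cast, eligible) in totals.items()}

canton_turnout = turnout_by("canton")
commune_turnout = turnout_by("commune")
```

Summing the counts first and dividing afterwards matters: averaging the district-level rates directly would give small districts the same weight as large ones.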
When Is It Time To Do Data Aggregation?
• Data aggregation tools allow you to look beyond the
two-dimensionality of a row-and-column tool such as Excel. For
example, you can apply calculations across categories and then use
the resulting high-level summary information to present overall
statistics. You might want to use data aggregation tools to bring
together data from your sales regions, product categories, and
customer trouble tickets, all by time.
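
As a rough sketch of bringing sales and trouble-ticket data together by time, the following Python snippet summarizes two invented record sets per month. The record layouts, the sample values, and the function name `summarize_by_month` are assumptions made for this example only.

```python
from collections import defaultdict

# Hypothetical inputs (invented for illustration):
# sales rows are (month, region, product_category, amount),
# ticket rows are (month, region).
SALES = [
    ("2020-01", "North", "Toys", 120.0),
    ("2020-01", "North", "Books", 80.0),
    ("2020-01", "South", "Toys", 50.0),
    ("2020-02", "North", "Toys", 200.0),
]
TICKETS = [
    ("2020-01", "North"),
    ("2020-01", "North"),
    ("2020-02", "North"),
]

def summarize_by_month(sales, tickets):
    """Combine both sources into one summary keyed by month."""
    summary = defaultdict(lambda: {"sales": 0.0, "tickets": 0})
    for month, _region, _category, amount in sales:
        summary[month]["sales"] += amount      # sum across regions and categories
    for month, _region in tickets:
        summary[month]["tickets"] += 1         # count tickets per month
    return dict(summary)

monthly = summarize_by_month(SALES, TICKETS)
```

The same pattern extends to finer keys, for example `(month, region)`, if the summary should keep the regional dimension.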
Basic aggregation

• In most cases, aggregation means summing up the individual values.
In general, aggregation is defined by an aggregation function and
its arguments, the set of values to which this function is applied.
The most common aggregation function is SUM. Other functions might
also make sense, for example AVG or MAX.
• The argument can be the value of a column or a measure from the
input model. If the values to be aggregated are not immediately
available in the input model, you can also compute the argument
values of an aggregation by using an expression over columns and
measures.
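
The idea of an aggregation function applied to an argument, where the argument may be a column or an expression over columns, might be sketched as follows. The rows, the `COST` column name, and the helper `aggregate` are hypothetical and exist only for this illustration.

```python
from statistics import mean

# Hypothetical input rows; the column names are assumptions for this sketch.
ROWS = [
    {"SALES_AMOUNT": 10.0, "COST": 6.0},
    {"SALES_AMOUNT": 20.0, "COST": 15.0},
    {"SALES_AMOUNT": 5.0, "COST": 4.0},
]

# Map the aggregation function names used in the text to Python callables.
AGG_FUNCTIONS = {"SUM": sum, "AVG": mean, "MAX": max}

def aggregate(rows, function, argument):
    """Apply the named aggregation function to argument(row) for every row."""
    values = [argument(row) for row in rows]
    return AGG_FUNCTIONS[function](values)

# Argument taken directly from a column.
total_sales = aggregate(ROWS, "SUM", lambda r: r["SALES_AMOUNT"])
max_sale = aggregate(ROWS, "MAX", lambda r: r["SALES_AMOUNT"])
# Argument computed by an expression over two columns.
avg_margin = aggregate(ROWS, "AVG", lambda r: r["SALES_AMOUNT"] - r["COST"])
```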
The following list shows simple aggregations:
• Total sales (SALES)
• Summing SALES_AMOUNT to the focus levels Store and Day
• Number of sales transactions (SALES_TRX)
• Counting the number of sales for the focus levels
• Total profit (SALES_PROFIT)
• Summing the differences between SALES_AMOUNT and cost to the focus
levels. The summation is done over the expression
(SALES_AMOUNT - cost).
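
A minimal Python sketch of these three simple aggregations, computed for the focus levels Store and Day, could look like this. The transaction tuples and the function name `simple_aggregations` are invented for the example; only the measure names come from the list above.

```python
from collections import defaultdict

# Hypothetical transactions (made-up values): (store, day, SALES_AMOUNT, cost)
TRANSACTIONS = [
    ("S1", "2020-04-20", 10.0, 7.0),
    ("S1", "2020-04-20", 15.0, 9.0),
    ("S1", "2020-04-21", 8.0, 5.0),
    ("S2", "2020-04-20", 30.0, 22.0),
]

def simple_aggregations(transactions):
    """Compute SALES, SALES_TRX and SALES_PROFIT per (store, day)."""
    result = defaultdict(lambda: {"SALES": 0.0, "SALES_TRX": 0, "SALES_PROFIT": 0.0})
    for store, day, amount, cost in transactions:
        measures = result[(store, day)]
        measures["SALES"] += amount                # SUM of SALES_AMOUNT
        measures["SALES_TRX"] += 1                 # COUNT of transactions
        measures["SALES_PROFIT"] += amount - cost  # SUM of (SALES_AMOUNT - cost)
    return dict(result)

by_store_day = simple_aggregations(TRANSACTIONS)
```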
Aggregation to a higher level

• Sometimes, information needs to be aggregated to a level higher
than the focus level. For example, to compare daily results to
weekly sales, it is necessary to first sum the sales amounts for
weeks instead of days:
Total week sales (SALES_WK)

• Summing SALES_AMOUNT to the week level. The week level is higher
in a hierarchy than the focus level day.
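
One way to sketch SALES_WK in Python is to group daily amounts by week. The text does not say how a week is defined, so the use of ISO calendar weeks here is an assumption, as are the sample data and the function name `sales_wk`.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales per store (made-up values): (store, day, SALES_AMOUNT)
DAILY_SALES = [
    ("S1", date(2020, 4, 20), 100.0),
    ("S1", date(2020, 4, 21), 50.0),
    ("S1", date(2020, 4, 27), 70.0),
]

def sales_wk(daily_sales):
    """Sum SALES_AMOUNT to the week level, keyed by (store, ISO year, ISO week)."""
    totals = defaultdict(float)
    for store, day, amount in daily_sales:
        iso = day.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        totals[(store, iso[0], iso[1])] += amount
    return dict(totals)

weekly = sales_wk(DAILY_SALES)
```

Each daily row can then be compared with the weekly total that shares its store and week key.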
Average sales in month (SALES_AVG_MTH)
• Averaging the individual daily store sales in a month. This amounts
to summing SALES_AMOUNT to the day level and the store level, and
then taking the average across the month.
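
The two-step computation of SALES_AVG_MTH (sum to store and day first, then average over the month) might be sketched as below. The data and the function name `sales_avg_mth` are hypothetical, and the sketch assumes the average is taken per store and only over days that have at least one sale.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical transactions (made-up values): (store, day as "YYYY-MM-DD", SALES_AMOUNT)
TRANSACTIONS = [
    ("S1", "2020-04-01", 10.0),
    ("S1", "2020-04-01", 20.0),
    ("S1", "2020-04-02", 50.0),
    ("S1", "2020-05-01", 40.0),
]

def sales_avg_mth(transactions):
    """Average of daily store totals per (store, month)."""
    # Step 1: sum SALES_AMOUNT to the store and day level.
    daily = defaultdict(float)
    for store, day, amount in transactions:
        daily[(store, day)] += amount
    # Step 2: average the daily totals across each month ("YYYY-MM").
    per_month = defaultdict(list)
    for (store, day), total in daily.items():
        per_month[(store, day[:7])].append(total)
    return {key: mean(totals) for key, totals in per_month.items()}

monthly_average = sales_avg_mth(TRANSACTIONS)
```

Averaging the raw transactions directly would give a different result (about 26.67 instead of 40.0 for April in this data), which is why the daily summation has to come first.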
Aggregation for a range of values

• When analyzing sales data, an important input into forecasts is the
sales behavior in comparable earlier periods or in adjacent periods
of time. The extent of such periods depends directly on the value
in the time portion of the focus, because the periods are defined
relative to some point in time. Therefore, values cannot simply be
aggregated to some hierarchy level, but must be computed
individually for each row of data.
Example:

An example of an aggregation on a rolling window looks like this:

• Past 7 day sales (SALES_7_DAYS)
• Computed by summing the values of the daily sales amounts for the
seven shopping days that immediately precede the current day.
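
A small Python sketch of SALES_7_DAYS follows. It assumes the input list holds one total per shopping day in chronological order, so that "the seven shopping days that immediately precede the current day" are simply the seven preceding list entries. The function name `sales_7_days` and the sample amounts are invented, and summing fewer days at the start of the series is a choice made for this sketch, not something the text prescribes.

```python
def sales_7_days(daily_amounts, window=7):
    """For each day, sum the amounts of the `window` days immediately before it."""
    result = []
    for i in range(len(daily_amounts)):
        start = max(0, i - window)  # fewer than `window` days exist early on
        result.append(sum(daily_amounts[start:i]))  # the current day is excluded
    return result

# Hypothetical daily sales totals, one entry per shopping day.
DAILY = [10, 20, 30, 40, 50, 60, 70, 80, 90]
rolling = sales_7_days(DAILY)
```

Because the window moves with every row, each row gets its own value, which matches the point that such aggregations must be computed individually for each row of data.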
