Advanced Users Course

for KNIME Analytics Platform


KNIME AG

Copyright © 2018 KNIME AG
Licensed under a Creative Commons Attribution-Noncommercial-Share Alike license
https://creativecommons.org/licenses/by-nc-sa/4.0/


Flow Variables
Goal of this Session

• What is a Flow Variable?

• Create a Flow Variable

• Use a Flow Variable as a parameter in the node settings

• Use a quickform node to parameterise a Wrapped Metanode

Flow variables: Usage example

• Each month you need to produce a sales report for the most popular product

[Figure: Row Filter configured to keep only rows where Products = Gold Investment]

Flow variables: Usage example

• Each month you need to launch the Analytics Platform, count the rows per product to identify the most popular product, and update the Row Filter accordingly.

• Or do you? Perhaps flow variables can help…

Automatically filter by most popular product

• Count the products and put the most popular one at the top of the list
• Create a flow variable containing the most popular product
• Pass the flow variable to the Row Filter
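The three steps above can be sketched in plain Python to show the logic. This is an illustration, not KNIME code: the sample rows are invented, and a dictionary with the hypothetical key `most_popular` stands in for the flow variable the workflow would create.

```python
from collections import Counter

# Invented sample rows standing in for the sales table.
rows = [
    {"Products": "Gold Investment", "Amount": 120},
    {"Products": "P+B Investment", "Amount": 80},
    {"Products": "Gold Investment", "Amount": 95},
    {"Products": "Savings", "Amount": 40},
]

# Step 1 (count and sort): most frequent product first.
counts = Counter(row["Products"] for row in rows).most_common()

# Step 2 (create a variable): the top entry becomes the "flow variable".
flow_variables = {"most_popular": counts[0][0]}

# Step 3 (pass it to the filter): the filter value is read from the
# variable instead of being typed into the dialog each month.
filtered = [row for row in rows if row["Products"] == flow_variables["most_popular"]]

print(flow_variables["most_popular"], len(filtered))  # → Gold Investment 2
```

Next month, only the data changes; the filter picks up the new most popular product automatically.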

Flow Variable Ports

Apply a Flow Variable (button)

The Flow Variable button

Apply a Flow Variable (advanced)

The Flow Variables tab

List of available Flow Variables

Create a Flow Variable (button)

Name of the new Flow Variable

Create a Flow Variable (advanced)

Converting a setting value into a Flow Variable

Name of the new variable

Summary: Flow Variables

• Flow Variables are workflow parameters used to overwrite existing node settings.
• A Flow Variable is carried along workflow branches (parallel branches don’t share local flow variables).
• Flow Variables can be of type String, Integer, or Double.
• Flow Variables can be created in the “Flow Variables” tab of any node, using the “Table Row to Variable” node, or using Quickforms.
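A toy model can make two of these points concrete: a variable overwrites a node setting, and variables travel along a branch without leaking into parallel branches. This is an assumption-laden sketch, not how KNIME is implemented: variables are modelled as a dictionary copied along each connection, and `run_node` is a hypothetical helper.

```python
def run_node(upstream_vars, settings, overrides=None, new_vars=None):
    """Simulate one node: apply variable overrides, then pass variables on.

    overrides maps a setting name to the flow variable that overwrites it;
    new_vars are variables created by this node.
    """
    effective = dict(settings)
    for setting, var_name in (overrides or {}).items():
        effective[setting] = upstream_vars[var_name]  # variable wins over the dialog value
    downstream = dict(upstream_vars)  # a copy, so parallel branches never share additions
    downstream.update(new_vars or {})
    return effective, downstream

# A node near the start of the workflow creates a variable.
_, root = run_node({}, {}, new_vars={"product": "Gold Investment"})

# Two parallel branches start from the same upstream variables.
_, branch_a = run_node(root, {}, new_vars={"threshold": 100})
_, branch_b = run_node(root, {})

# A filter on branch B takes its value from the variable, not from its dialog.
filter_settings, _ = run_node(
    branch_b,
    {"column": "Products", "value": "Savings"},
    overrides={"value": "product"},
)
```

Branch B never sees `threshold`, and the filter ends up using "Gold Investment" even though its dialog says "Savings".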

New Node: Sorter

• Sorts a table!
• Choice of ascending or descending
• Sort by multiple columns
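Sorting by several columns with different directions can be sketched in Python. The trick is that Python's sort is stable: sort by the least important column first and by the most important column last. The rows and the `multi_sort` helper are made up for illustration.

```python
from operator import itemgetter

rows = [
    {"Products": "Savings", "Count": 3},
    {"Products": "Gold Investment", "Count": 7},
    {"Products": "Savings", "Count": 9},
    {"Products": "Gold Investment", "Count": 2},
]

def multi_sort(rows, keys):
    """keys: (column, ascending) pairs, most important first."""
    result = list(rows)
    # Stable sorts: apply the least important key first, the most important last.
    for column, ascending in reversed(keys):
        result.sort(key=itemgetter(column), reverse=not ascending)
    return result

# Products ascending, then Count descending within each product.
ordered = multi_sort(rows, [("Products", True), ("Count", False)])
```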

New Node: Table Row to Variable

• Takes a table as input and converts the first row to flow variables
– Column names -> Variable names
– Column values -> Variable values
• Only the first row is transformed; additional rows are discarded
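The mapping described above fits in a few lines of Python. `table_row_to_variables` is a hypothetical helper, and how the real node handles an empty input table is not covered; this sketch simply raises an error.

```python
def table_row_to_variables(columns, rows):
    """Column names become variable names, first-row values become values;
    every further row is ignored."""
    if not rows:
        # Empty-table handling of the real node is outside this sketch.
        raise ValueError("input table has no rows")
    return dict(zip(columns, rows[0]))

variables = table_row_to_variables(
    ["Products", "count"],
    [["Gold Investment", 42], ["Savings", 10]],  # the second row is discarded
)
```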

Flow Variables Exercise: Activity I

Start with exercise: Flow Variables


• Create a workflow that filters the products to contain only rows for the
‘Gold Investment’ product.
• Find the most common product type and automatically filter the table
according to that value.

Quickform Nodes for Variable Creation and Output

Quickform Node Configuration

Use Quickforms to create Flow Variables

Default value is selected (here P+B Investment)

Wrapped Metanodes

• Similar to Metanodes
• Differ in key areas:
– Local variable scope (global scope for Metanodes)
– Configurable via Quickform nodes
• Key to advanced functionality in KNIME products:
– Used to build new WebPortal pages

Metanodes vs. Wrapped Metanodes

                               Metanodes                  Wrapped Metanodes*
Quickforms                     Legacy                     Standard
Variable scope                 Global                     Local
WebPortal execution            Old                        New (works with loops/switches)
JavaScript views in WebPortal  Not supported              Supported
WebPortal usage                Quickforms used globally   Views/Quickforms must be embedded
                                                          in a Wrapped Metanode
Recommended uses               Legacy workflows           New developments
Compatibility                  KNIME Server 3.x/4.1.x     KNIME Server 4.2+

* Valid for KNIME Analytics Platform 3.1 and above


Create a Wrapped Metanode

• Select the nodes to wrap
• Right-click one of the selected nodes
• Choose ‘Encapsulate into Wrapped Metanode’

Simple configuration of Wrapped Metanode

• Right-click or double-click a Wrapped Metanode to configure
• Use in WebPortal

Configure Wrapped Metanode Ports

Add ports to a metanode or remove them to adapt to changes made after the metanode was created

Passing Variables from Wrapped Metanodes

• By default, flow variables are only available locally within a wrapped metanode
• If needed, they can be passed to the context outside the metanode
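The scope difference between metanodes and wrapped metanodes can be modelled with dictionaries. This is a sketch under the assumption that variables behave like a name-to-value map; `inner_nodes`, `metanode`, `wrapped_metanode` and the variable names are all hypothetical.

```python
def inner_nodes(incoming):
    """Hypothetical nodes inside the (wrapped) metanode, each creating a variable."""
    inner = dict(incoming)
    inner["selected_product"] = "Gold Investment"  # e.g. from a Quickform
    inner["row_count"] = 42                        # e.g. a helper variable
    return inner

def metanode(incoming):
    # Global scope: everything created inside is visible downstream.
    return inner_nodes(incoming)

def wrapped_metanode(incoming, exported=()):
    # Local scope: only the incoming variables plus explicitly exported ones leave.
    inner = inner_nodes(incoming)
    outgoing = dict(incoming)
    for name in exported:
        outgoing[name] = inner[name]
    return outgoing

outside = {"run_date": "2018-01-31"}
exported = wrapped_metanode(outside, ["selected_product"])
```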

Metanode Templates (1/2)

• Metanode templates can be saved in your KNIME workspace for later re-use
• To do this, simply right-click any
metanode and select “Save as
Template…”
• Linked metanodes are read-only
instances of a metanode template

Metanode Templates (2/2)

• To use a metanode template, drag and drop it into the workflow editor
• A linked template can be updated either manually or when the workflow is opened
• The template can also be unlinked from its original location, which makes it directly editable in the workflow

Flow Variables Exercise: Activity II

Start with exercise: Flow Variables


• Create a Wrapped Metanode that allows users to filter records by
choosing a value from the Products column.

Date/Time Data

Date & Time Overview

• Dedicated data type for date and time data
• Supported in Date&Time nodes
• (and in others: GroupBy, Pivot, Line Plot)
• Completely rewritten in KNIME Analytics Platform 3.4

New Node: String to Date&Time
• Convert date/time data from strings into native Date&Time cells
• Guesses the correct format for many types of date formatting
• Enter the format manually if auto-guessing didn't work
• KNIME automatically adds custom formats to the auto-guess list
• Convert multiple columns with the same date format in one node

[Callouts: Select columns to transform; Select type of output column; Enter date format manually; Click to auto-guess format]
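The auto-guess idea (try known patterns, fall back to a manual one, remember custom patterns) can be sketched with Python's `datetime.strptime`. This is not KNIME's implementation: KNIME uses Java-style patterns such as `yyyy-MM-dd`, while the strptime patterns, `KNOWN_FORMATS`, `guess_format` and `to_datetimes` below are illustrative stand-ins.

```python
from datetime import datetime

# Candidate patterns in Python strptime syntax (stand-ins for KNIME's
# Java-style patterns).
KNOWN_FORMATS = ["%Y-%m-%d %H:%M:%S", "%Y-%m-%d", "%d/%m/%Y %H:%M:%S", "%d/%m/%Y"]

def guess_format(samples, formats=KNOWN_FORMATS):
    """Return the first candidate format that parses every sample, else None."""
    for fmt in formats:
        try:
            for sample in samples:
                datetime.strptime(sample, fmt)
        except ValueError:
            continue
        return fmt
    return None

def to_datetimes(values, custom_format=None):
    """Parse values with a custom format or, if none is given, a guessed one."""
    fmt = custom_format or guess_format(values)
    if fmt is None:
        raise ValueError("no known format matched; enter one manually")
    parsed = [datetime.strptime(v, fmt) for v in values]
    if custom_format and custom_format not in KNOWN_FORMATS:
        KNOWN_FORMATS.insert(0, custom_format)  # remember it for later guesses
    return parsed
```

After `to_datetimes(["01.02.2007"], custom_format="%d.%m.%Y")`, the pattern `%d.%m.%Y` is also found by later guesses.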

Date&Time – Data Types

• Date
• Time
• Date & Time
• Date & Time + Time zone
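Python's standard `datetime` module has close analogues of these four types, which can help when reasoning about them. The mapping is an analogy, not KNIME's internal representation, and a fixed +01:00 offset stands in for a named time zone so the sketch runs without a time zone database.

```python
from datetime import date, datetime, time, timedelta, timezone

d = date(2007, 1, 15)                                    # Date
t = time(17, 24)                                         # Time
dt = datetime.combine(d, t)                              # Date & Time (no zone)
zoned = dt.replace(tzinfo=timezone(timedelta(hours=1)))  # Date & Time + Time zone (fixed offset)

print(dt.isoformat(), zoned.isoformat())  # → 2007-01-15T17:24:00 2007-01-15T17:24:00+01:00
```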

New Node: Date&Time-based Row Filter

• Filter rows from a specified time period
• Range can be limited by an upper bound, a lower bound, or both
• Options for the end point:
• Date&Time: fixed date and time
• Duration: duration string (e.g. 2y 3M)
• Numerical: select granularity from the dropdown and enter a number

New Node: String to Duration

• Takes a string and converts it to a duration cell
• Three different options to format input strings
• Example: Convert 1 year, 2 months, 3 weeks, and 4 days to a duration cell
• ISO-8601: “P1Y2M3W4D”
• Short letter: “1y 2M 3w 4d”
• Long word: “1 year 2 months 3 weeks 4 days”
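Parsing the short-letter form and rendering it in the ISO-8601 form can be sketched with a regular expression. Only the four unit letters from the example above are handled, and `parse_short_duration` / `to_iso8601` are hypothetical helpers; note that case matters, since `M` means months.

```python
import re

# Units taken from the short-letter example above; case matters (M = months).
UNITS = {"y": "years", "M": "months", "w": "weeks", "d": "days"}

def parse_short_duration(text):
    """Parse strings such as '1y 2M 3w 4d' into {unit: amount}."""
    parts = {}
    for amount, letter in re.findall(r"(\d+)\s*([yMwd])", text):
        unit = UNITS[letter]
        parts[unit] = parts.get(unit, 0) + int(amount)
    return parts

def to_iso8601(parts):
    """Render the parsed parts in the ISO-8601 form shown above."""
    order = [("years", "Y"), ("months", "M"), ("weeks", "W"), ("days", "D")]
    return "P" + "".join(f"{parts[unit]}{code}" for unit, code in order if parts.get(unit))

parsed = parse_short_duration("1y 2M 3w 4d")
iso = to_iso8601(parsed)
```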

Duration-based Filtering

• The Date&Time-based Row Filter can extract time periods
• Starting from the start date, select all rows within the defined period
• Hours, days, weeks, months, etc.

[Figure: a period of 1 Month selected from the start date]
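Selecting "one month from the start date" needs calendar-aware month arithmetic, because months differ in length. A minimal sketch, assuming a half-open interval [start, start + period) as an illustrative convention (the node's own inclusive/exclusive behaviour is not taken from the slide); `add_months` and `filter_period` are hypothetical helpers.

```python
from datetime import datetime, timedelta

def add_months(start, months):
    """Move a datetime by whole calendar months, clamping the day if needed
    (e.g. Jan 31 + 1 month -> Feb 28 in a non-leap year)."""
    month_index = start.month - 1 + months
    year, month = start.year + month_index // 12, month_index % 12 + 1
    first_of_next = datetime(year + month // 12, month % 12 + 1, 1)
    last_day = (first_of_next - timedelta(days=1)).day
    return start.replace(year=year, month=month, day=min(start.day, last_day))

def filter_period(timestamps, start, months):
    """Keep timestamps in the half-open interval [start, start + months)."""
    end = add_months(start, months)
    return [ts for ts in timestamps if start <= ts < end]

readings = [
    datetime(2006, 12, 31, 23, 59),
    datetime(2007, 1, 1, 0, 0),
    datetime(2007, 1, 31, 23, 59),
    datetime(2007, 2, 1, 0, 0),
]
january = filter_period(readings, datetime(2007, 1, 1), 1)
```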

New Node: Date&Time Difference

• Choose the desired resolution (days, hours, minutes, etc.)
• Calculate the difference between a time column and…
• Another time column
• Execution time
• User-defined time
• Time from previous row

To calculate the difference to a second column, both columns need to have the same type!
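The options above can be sketched in Python, including the "time from previous row" case and the same-type rule. The sign convention (second minus first) and truncation to whole units are choices of this sketch, not details taken from the slide; `difference` and `difference_to_previous_row` are hypothetical helpers.

```python
from datetime import datetime

SECONDS_PER_UNIT = {"days": 86400, "hours": 3600, "minutes": 60, "seconds": 1}

def difference(first, second, resolution="minutes"):
    """second - first in whole units of the chosen resolution (truncated)."""
    if type(first) is not type(second):
        # Mirrors the rule above: both values must have the same type.
        raise TypeError("both columns must have the same Date&Time type")
    seconds = (second - first).total_seconds()
    return int(seconds / SECONDS_PER_UNIT[resolution])

def difference_to_previous_row(values, resolution="minutes"):
    """The first row has no predecessor, so it gets None (a missing value)."""
    return [None] + [difference(prev, cur, resolution) for prev, cur in zip(values, values[1:])]

timestamps = [
    datetime(2007, 1, 1, 0, 0),
    datetime(2007, 1, 1, 0, 1),
    datetime(2007, 1, 1, 1, 1),
]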

New Node: Date&Time Shift

• Shifts a date or time by either a Duration or a numerical value
• Use Duration:
– Use a duration column
– Or shift by a user-defined value
• E.g. 1y, 2M, 5h, etc.
• Use Numerical in combination with a user-defined granularity:
– Use a numerical column
– Or shift by a user-defined value

[Callout: Select granularity]
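A numerical shift with a chosen granularity can be sketched with `timedelta`, either per row from a numerical column or with one user-defined value for all rows. Only fixed-length granularities are covered; month and year shifts need calendar arithmetic and are deliberately left out. `shift` is a hypothetical helper.

```python
from datetime import datetime, timedelta

FIXED_GRANULARITIES = {"weeks", "days", "hours", "minutes", "seconds"}

def shift(values, amounts, granularity="days"):
    """Shift each value by the matching amount (a numerical column) or, if
    `amounts` is a single number, by that user-defined value for every row."""
    if granularity not in FIXED_GRANULARITIES:
        raise ValueError("months/years need calendar arithmetic; not covered here")
    if isinstance(amounts, (int, float)):
        amounts = [amounts] * len(values)
    return [v + timedelta(**{granularity: a}) for v, a in zip(values, amounts)]

by_constant = shift([datetime(2007, 1, 1)], 5, "hours")
by_column = shift([datetime(2007, 1, 1), datetime(2007, 1, 2)], [1, -1])
```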

New Nodes: Modify Time / Modify Date

• Modify Date&Time columns
• Three options:
• Append time (date) to a date (time) column
• Change time (date) to a fixed value
• Remove time (date) from a Date&Time column
• Column selection shows only columns suitable for the currently selected option
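The three options have direct counterparts in Python's `datetime` module, shown here as an analogy rather than as what the nodes do internally.

```python
from datetime import date, datetime, time

day = date(2007, 1, 15)
stamp = datetime(2007, 1, 15, 17, 24)

appended = datetime.combine(day, time(0, 0))  # append a time to a date
changed = stamp.replace(hour=12, minute=0)    # change the time to a fixed value
removed = stamp.date()                        # remove the time, keep the date
time_only = stamp.time()                      # remove the date, keep the time
```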

New Node: Modify Time Zone

• Similar to Modify Time/Modify Date

• Input: Date&Time
– Set time zone
• Input: Date&Time (Time zone)
– Set time zone
– Shift time zone
– Remove time zone

[Callout: Select time zone from dropdown list]
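The difference between setting and shifting a time zone is easy to get wrong. A sketch, assuming that "set" attaches a zone while keeping the wall-clock time and "shift" keeps the instant while changing the displayed clock time; fixed offsets stand in for named zones so no time zone database is needed.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets stand in for named zones (e.g. from zoneinfo).
CET = timezone(timedelta(hours=1), "CET")
UTC = timezone.utc

local = datetime(2007, 1, 15, 17, 24)        # Date&Time without a zone

with_zone = local.replace(tzinfo=CET)        # set: keep the wall-clock time, attach a zone
in_utc = with_zone.astimezone(UTC)           # shift: same instant, shown in another zone
without_zone = in_utc.replace(tzinfo=None)   # remove: keep the wall-clock time, drop the zone
```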

Date and Time Analysis Exercise, Activity I

Start with exercise: Date and Time Analysis, Activity I


• Read the file: meter_data.csv
• Use String Manipulation to construct a date/timestamp column from the individual date and time columns
• Convert the timestamp column to a KNIME Date&Time column with the String to Date&Time node
• Extract entries from January 2007 using the Date&Time-based Row Filter

New Node: Extract Date&Time Fields

• Extract date fields (year, month, day) or time fields (hour, minute, second) from a Date&Time cell
• Pick and choose which fields to include
• Useful in combination with data aggregation nodes (GroupBy, Pivot, etc.)
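Extracting fields and then aggregating on them (the combination suggested above) can be sketched in Python: year, day of year and hour form the group key, and the mean is the aggregation. The readings are invented and `extract_fields` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Invented (timestamp, intensity) readings.
readings = [
    (datetime(2007, 1, 1, 0, 5), 2.0),
    (datetime(2007, 1, 1, 0, 45), 4.0),
    (datetime(2007, 1, 1, 1, 10), 3.0),
    (datetime(2007, 1, 2, 0, 30), 5.0),
]

def extract_fields(ts):
    """Year, day of year and hour of a timestamp."""
    return ts.year, ts.timetuple().tm_yday, ts.hour

groups = defaultdict(list)
for ts, intensity in readings:
    groups[extract_fields(ts)].append(intensity)

mean_intensity = {key: mean(values) for key, values in groups.items()}
```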

New Node: Moving Average

• Effectively a “smoothing” node
• Smoothing is defined by a window type (centered, forward, or backward) and whether the window is weighted
• Useful when plotting aggregated time-series data to see trends more easily
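The three window types and optional weighting can be sketched in Python. Assumptions of this sketch: positions where the full window does not fit get `None` (a missing value), and `gaussian_weights` uses an arbitrary default spread that is not KNIME's setting.

```python
import math

def moving_average(values, window=3, kind="centered", weights=None):
    """Weighted average over `window` samples.
    centered: i-half..i+half, backward: i-window+1..i, forward: i..i+window-1.
    Positions where the window does not fit get None."""
    weights = weights or [1.0] * window
    result = []
    for i in range(len(values)):
        if kind == "centered":
            start = i - window // 2
        elif kind == "backward":
            start = i - window + 1
        else:  # forward
            start = i
        if start < 0 or start + window > len(values):
            result.append(None)
            continue
        chunk = values[start:start + window]
        result.append(sum(w * v for w, v in zip(weights, chunk)) / sum(weights))
    return result

def gaussian_weights(window, sigma=None):
    """Bell-shaped weights around the middle sample (odd window assumed)."""
    sigma = sigma or window / 4  # illustrative default, not KNIME's
    half = window // 2
    return [math.exp(-((k - half) ** 2) / (2 * sigma ** 2)) for k in range(window)]

smoothed = moving_average([1, 2, 3, 4, 5], 5, weights=gaussian_weights(5))
```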

New Node: Moving Aggregation

• Blend of GroupBy + Moving Average functionality
• Group by a moving window
• Aggregate using standard KNIME methods
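A moving aggregation is a moving window combined with an arbitrary aggregation function. This sketch uses a backward window (the current row plus up to `window - 1` preceding rows) and aggregates partial windows at the start as they are; both choices are assumptions, and `moving_aggregate` is a hypothetical helper.

```python
def moving_aggregate(values, window, func=max):
    """Aggregate the current row and up to window-1 preceding rows."""
    return [func(values[max(0, i - window + 1):i + 1]) for i in range(len(values))]

moving_max = moving_aggregate([3, 1, 4, 1, 5, 9, 2], 3)
moving_mean = moving_aggregate([2, 4, 6], 2, lambda xs: sum(xs) / len(xs))
```

With one row per minute, `window=1440` would give the maximum over the preceding day.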

New Node: Line Plot (JavaScript)

• Line plot with support for Date columns

Date and Time Analysis Exercise, Activity II

Start with exercise: Date and Time Analysis, Activity II


• Read the file: sampled_meter_data.table
• Use Extract Date&Time Fields to pull out the Year, Day of Year, and Hour values from the timestamp
• Use GroupBy to group the data by year, day, and hour, and calculate the mean time and intensity
• Calculate a Gaussian centered moving average for the Intensity column and plot this data in a line plot
• Calculate the maximum of the Intensity column for the preceding day (1440 minutes)
Optional: Plot the raw intensity, moving average, and moving maximum in a Line Chart
Workflow Control
Loops, switches, try-catch

Workflow Control Structures

• Loops
– Iterate over a workflow snippet with variable
inputs.
• Switches
– Direct the path of a workflow by selectively
executing one or more workflow branches.
• Try-Catch
– Handle workflow branches that may fail during execution, where the failure cannot be predicted beforehand

The Loop Block

• A loop block is defined by appropriate loop start and loop end nodes.
• Loop body = Nodes in between and side branches.

[Figure: Loop start node → Loop body → Loop end node]

New Node: Group Loop Start

• Similar to GroupBy, except without the aggregation tab.
• Each iteration of the loop passes the next group of rows.
• You implement the aggregation task. It can be anything from a complex calculation to updating a database.
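A group loop can be sketched as "hand each group to a loop body, then concatenate what each iteration returns". The sketch relies on `itertools.groupby`, which needs sorted input; KNIME's iteration order may differ. `group_loop`, the `group_value` key and the `total_amount` body are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

def group_loop(rows, column, body):
    """Loop start: hand one group of rows per iteration to `body`.
    Loop end: concatenate the rows each iteration returns."""
    key = itemgetter(column)
    collected = []
    for group_value, group in groupby(sorted(rows, key=key), key=key):
        # The current group value is handed over like a flow variable.
        collected.extend(body(list(group), {"group_value": group_value}))
    return collected

# Hypothetical loop body: any per-group task could go here.
def total_amount(group, variables):
    return [{"Products": variables["group_value"], "Total": sum(r["Amount"] for r in group)}]

rows = [
    {"Products": "Savings", "Amount": 40},
    {"Products": "Gold Investment", "Amount": 120},
    {"Products": "Gold Investment", "Amount": 95},
]
result = group_loop(rows, "Products", total_amount)
```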

New Node: Create File Name

• Inputs: Directory, base file name, file extension
• Output: Flow variable -> use as input for e.g. a writer node

Example: Writing aggregated files

• Group Loop Start → Variable Loop End
• Group data by specific column values
• Iterate over all groups of data
• Create an appropriate file name
• Write each group's data to a table with the new file name
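The pattern above can be sketched in Python with CSV files standing in for KNIME tables. `create_file_name` plays the role of the Create File Name node, and since nothing is collected at the end, the loop corresponds to a Variable Loop End. All helper names are hypothetical.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path

def create_file_name(directory, base_name, extension):
    """Stand-in for Create File Name: directory + base name + extension."""
    return Path(directory) / f"{base_name}.{extension.lstrip('.')}"

def write_groups(rows, column, directory):
    groups = defaultdict(list)
    for row in rows:                                  # one group per distinct value
        groups[row[column]].append(row)
    written = []
    for value, group in groups.items():               # one loop iteration per group
        path = create_file_name(directory, value.replace(" ", "_"), "csv")
        with open(path, "w", newline="") as handle:   # the "writer node"
            writer = csv.DictWriter(handle, fieldnames=list(group[0]))
            writer.writeheader()
            writer.writerows(group)
        written.append(path)
    return written  # nothing is concatenated, as with a Variable Loop End

rows = [
    {"Products": "Gold Investment", "Amount": 120},
    {"Products": "Savings", "Amount": 40},
    {"Products": "Gold Investment", "Amount": 95},
]
with tempfile.TemporaryDirectory() as out_dir:
    file_names = sorted(path.name for path in write_groups(rows, "Products", out_dir))
```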

Workflow Control Exercise, Activity I

Start with exercise: Workflow Control, Activity I


• Read the file: CurrentDetailData.table
• Group over all of the values in the Products column
• For each group of data, write a new KNIME table to disk. Give it an
appropriate filename.

(Hint: Group Loop Start creates a flow variable naming the current group)

New Node: List Files

• List all files in a directory
• Restrict to:
• Top-level directory (i.e. not recursive)
• Specific file extensions
• Matching name patterns (regex or wildcard)
• Provides file references as a table of URLs and absolute paths
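The restrictions listed above map naturally onto `pathlib`, `fnmatch` and `re`. This is a sketch with a hypothetical `list_files` helper that returns (URL, absolute path) pairs, mirroring the two kinds of file reference mentioned on the slide; it is not the node's implementation.

```python
import fnmatch
import re
import tempfile
from pathlib import Path

def list_files(directory, recursive=False, extensions=None, pattern=None, use_regex=False):
    """Return (URL, absolute path) pairs for matching files."""
    root = Path(directory)
    candidates = root.rglob("*") if recursive else root.iterdir()
    found = []
    for path in sorted(candidates):
        if not path.is_file():
            continue
        if extensions and path.suffix.lstrip(".").lower() not in extensions:
            continue
        if pattern:
            matched = re.fullmatch(pattern, path.name) if use_regex else fnmatch.fnmatch(path.name, pattern)
            if not matched:
                continue
        absolute = path.resolve()
        found.append((absolute.as_uri(), str(absolute)))
    return found

with tempfile.TemporaryDirectory() as tmp:
    for name in ["a.table", "b.table", "c.csv", "sub/d.table"]:
        target = Path(tmp, name)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    top_level = list_files(tmp, extensions={"table"})
    everything = list_files(tmp, recursive=True, extensions={"table"})
    wildcard = list_files(tmp, pattern="a*")
    by_regex = list_files(tmp, pattern=r"[ab]\.table", use_regex=True)
```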

New Node: TableRow to Variable Loop Start

• Similar to the TableRow to Variable node.
• Each iteration of the loop converts the next row of the input table into flow variables.
• Inject the variables into other nodes (often file readers) to re-execute subflows with a progression of settings

Example: Reading Many Files

• List all files in a directory
• Convert each file to a flow variable (1 per iteration)
• In each iteration, read the file and collect the results
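Reading many files and collecting the results can be sketched as a loop where each file path plays the role of the flow variable injected into a reader, and the rows of all iterations are concatenated. CSV files stand in for KNIME tables; `read_all` and the `source` column are hypothetical.

```python
import csv
import tempfile
from pathlib import Path

def read_all(paths):
    """One iteration per file path (the 'flow variable' fed to a reader);
    the rows of every iteration are collected into a single table."""
    collected = []
    for path in paths:
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle):
                row["source"] = Path(path).name  # hypothetical provenance column
                collected.append(row)
    return collected

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "Gold_Investment.csv").write_text("Products,Total\nGold Investment,215\n")
    Path(tmp, "Savings.csv").write_text("Products,Total\nSavings,40\n")
    combined = read_all(sorted(Path(tmp).glob("*.csv")))
```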

Workflow Control Exercise, Activity II

Start with exercise: Workflow Control, Activity II


• Use List Files to find the names of the files created in Activity I
• Iterate over that list of files, and read them into a single KNIME Table

Switches

• A switch allows you to selectively activate branches of a workflow
• Inactive branches are marked with a red X on their output ports. Inactivity propagates downstream.

[Figure: Active and Inactive branches]

New Node: Single Selection

• Quickform: Select a single value from a list of strings
• Returns the selection as a string-type flow variable
• Choose between different layout options (dropdown, radio buttons...)

New Node: Rule Engine/Rule Engine Variable

• Define custom logic using simple rules.
• Rules like: <Antecedent> => <Consequence>
• (1=1 => "true")
• Available for tables (Rule Engine) and for flow variables (Rule Engine Variable)
• Easiest way to encode logic for switches
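The rule idea can be sketched in Python as an ordered list of (antecedent, consequence) pairs where the first matching antecedent decides the outcome. The lambdas are a stand-in for KNIME's rule syntax, the chart rules are hypothetical, and returning `None` when nothing matches is a choice of this sketch.

```python
def evaluate_rules(rules, variables, default=None):
    """Check rules top to bottom; the first antecedent that holds decides the outcome."""
    for antecedent, consequence in rules:
        if antecedent(variables):
            return consequence
    return default  # no rule matched

# Hypothetical rules turning a chart choice into a switch port index.
chart_rules = [
    (lambda v: v["chart_type"] == "scatter", 0),
    (lambda v: v["chart_type"] == "bar", 1),
]
port_index = evaluate_rules(chart_rules, {"chart_type": "bar"})
```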

New Node: If Switch

• Control which branches of your workflow are active programmatically.
• Controlled with a flow variable set to one of the literal strings “top”, “bottom”, or “both”
• Available for data tables and for flow variables (separate nodes)
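The switch semantics can be sketched with `None` standing in for an inactive port (the red X). This is a model of the behaviour described above, not KNIME code; `if_switch` and `downstream` are hypothetical helpers.

```python
INACTIVE = None  # stands in for an inactive port

def if_switch(data, choice):
    """Route `data` to the top and/or bottom output; the other port is inactive."""
    if choice not in ("top", "bottom", "both"):
        raise ValueError(f"unexpected switch value: {choice!r}")
    top = data if choice in ("top", "both") else INACTIVE
    bottom = data if choice in ("bottom", "both") else INACTIVE
    return top, bottom

def downstream(port, work):
    """A node behind an inactive port is skipped and stays inactive itself."""
    return INACTIVE if port is INACTIVE else work(port)

top, bottom = if_switch([1, 2], "top")
top_result, bottom_result = downstream(top, len), downstream(bottom, len)
```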

New Nodes: Case Switch Data

• Similar to If Switch: takes data from a single input port and passes it to the active output port
• Nodes connected to inactive branches are not executed
• Configure via the node dialog, or pass the port index as a flow variable
– 0, 1, 2 for the top, middle, and bottom port
• Case switches are also available for flow variable and model ports
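A CASE switch pair can be sketched as "send data to exactly one output by index, run only that branch, then pick up the single active result". `None` marks inactive outputs, and this sketch assumes exactly one branch is active at the end; the helpers and branch functions are hypothetical.

```python
def case_switch_start(data, port_index, n_ports=3):
    """Send data to exactly one output (0 = top, 1 = middle, 2 = bottom);
    None marks the inactive outputs."""
    if not 0 <= port_index < n_ports:
        raise ValueError(f"port index {port_index} out of range")
    return [data if i == port_index else None for i in range(n_ports)]

def case_switch_end(*inputs):
    """Pass on the single active branch (this sketch assumes exactly one)."""
    active = [x for x in inputs if x is not None]
    if len(active) != 1:
        raise ValueError("expected exactly one active branch")
    return active[0]

# Hypothetical branches: only the one behind the active port runs.
branches = [lambda d: f"scatter plot of {d}", lambda d: f"bar plot of {d}"]
outputs = case_switch_start("sales", 1, n_ports=2)
result = case_switch_end(*[f(o) if o is not None else None for f, o in zip(branches, outputs)])
```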

The difference between loops and switches

Loops
– A Loop Start node is connected to a Loop End node; they form a pair
Switches
– A Switch Start can be used without a corresponding Switch End, but the two can also be combined.

Workflow Control Exercise, Activity III

• Extend the workflow below with a switch to select the type of visualization.
• Use a Single Selection Quickform to let a user choose the values "scatter" or "bar"
• Use a Rule Engine Variable node to convert the selection into the port index
• Use a CASE Switch Data (Start) to create either a scatter plot or a bar plot depending on the input.
• Combine the two paths with a CASE Switch Data (End)

Try-catch

• A way to catch errors in workflows.
• Useful when it is hard to know in advance whether a node will execute successfully (for example, when connecting to a web service).
• KNIME tries to execute the nodes, but if they fail, it falls back to an alternative branch.
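The try-catch pattern maps directly onto Python's `try`/`except`. In this sketch the failing "web service" and the cached fallback are hypothetical, and the helper returns the error message alongside the result so the fallback branch can report why it ran.

```python
def try_catch(try_branch, catch_branch):
    """Run the try branch; if it raises, run the fallback branch instead.
    Returns (result, error message or None)."""
    try:
        return try_branch(), None
    except Exception as error:  # any failure activates the fallback
        return catch_branch(), str(error)

def call_web_service():
    # Hypothetical flaky step; here it always fails.
    raise ConnectionError("service unavailable")

result, error = try_catch(call_web_service, lambda: "cached result")
```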

Streaming

• Standard execution: node by node. A node processes all the data, finishes, then passes the data to the next node, etc.
• Streaming: nodes are executed concurrently; each node passes data to the next as soon as it is available, i.e. before the node has fully executed
– Faster execution, especially for reading/preprocessing data
• Create a wrapped metanode -> Configure -> Job Manager Selection -> Simple Streaming
– Not available for all nodes (check which nodes are streamable in the node repository)
– Can only execute the entire metanode, not individual nodes
– Intermediate results are not available since nothing is cached
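Python generators give a rough analogy for the difference. Standard execution materializes each step's full output before the next step starts, while chained generators hand each row downstream as soon as it is produced and keep no intermediate tables. Unlike KNIME's streaming executor, generators interleave in a single thread rather than run concurrently; the three steps are hypothetical.

```python
def read_rows(n):
    for i in range(n):
        yield {"id": i, "value": i * 10}  # "reader": emits rows one by one

def add_column(rows):
    for row in rows:
        yield {**row, "double": row["value"] * 2}  # handles each row as it arrives

def keep_even(rows):
    for row in rows:
        if row["id"] % 2 == 0:
            yield row

# Standard execution: every step finishes and stores its full output first.
step1 = list(read_rows(6))
step2 = list(add_column(step1))
batch = list(keep_even(step2))

# Streaming-like: rows flow through the whole chain one at a time.
streamed = list(keep_even(add_column(read_rows(6))))
```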

Streaming

Advanced Data Mining
Random Forest, Tree Ensembles, Parameter Optimization, Cross Validation

Overview

• Advanced analytics nodes
– Random Forest / Tree Ensembles
– Gradient Boosted Trees
• Parameter optimization
• Cross validation
• H2O integration in KNIME

KNIME’s Tree Ensemble models

• The general idea is to take advantage of the “wisdom of the crowd”
• Combining predictions from a large number of weak predictors leads to a more accurate predictor.
• One common way to build such an ensemble is called “bagging”.

[Figure: the input X is passed to many trees, whose predictions P1, P2, … Pn are combined into the final prediction y]
Typically: for classification the individual models vote and the majority wins; for regression, the individual predictions are averaged
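The two ways of combining predictions named above can be written in a few lines of Python. These helpers are illustrative; note that on a tied vote, `Counter.most_common` returns the label encountered first, which is a property of this sketch and not a statement about KNIME's tie-breaking.

```python
from collections import Counter
from statistics import mean

def combine_classification(predictions):
    """Majority vote across the ensemble's class predictions."""
    return Counter(predictions).most_common(1)[0][0]

def combine_regression(predictions):
    """Average of the individual numeric predictions."""
    return mean(predictions)

vote = combine_classification(["yes", "no", "yes"])
average = combine_regression([2.0, 4.0, 6.0])
```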

How does bagging work?

• Pick a different random subset of the training data for each model in the ensemble (bag).

[Figure: a separate tree is built on each bag]
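Drawing one random subset per model can be sketched as bootstrap sampling of row indices, that is, sampling with replacement. Sampling with replacement is the classic bagging choice and an assumption here; the slide only says "random subset". The rows never drawn for a model are its out-of-bag rows. `make_bags` is a hypothetical helper.

```python
import random

def make_bags(n_rows, n_models, fraction=1.0, seed=0):
    """Draw one bootstrap sample (with replacement) of row indices per model.
    Rows never drawn for a model are that model's out-of-bag rows."""
    rng = random.Random(seed)
    size = round(n_rows * fraction)
    bags, out_of_bag = [], []
    for _ in range(n_models):
        bag = rng.choices(range(n_rows), k=size)
        bags.append(bag)
        out_of_bag.append(sorted(set(range(n_rows)) - set(bag)))
    return bags, out_of_bag

bags, out_of_bag = make_bags(n_rows=10, n_models=3)
```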

An extra benefit of bagging: out of bag estimation

• Allows testing the model using the training data: when validating, each
model should only vote on data points that were not used to train it

[Figure: examples X1 and X2 are each voted on only by the trees among P1, P2, … Pn that did not use them for training, giving the out-of-bag predictions y1OOB and y2OOB]
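Out-of-bag estimation can be sketched as a majority vote per row that counts only the models for which that row was out of bag. The per-model predictions, bags and labels below are invented so that the arithmetic is easy to follow; `oob_predictions` is a hypothetical helper.

```python
from collections import Counter

def oob_predictions(per_model_predictions, out_of_bag, n_rows):
    """Majority vote per row, counting only models that did NOT train on that row."""
    result = []
    for row in range(n_rows):
        votes = [preds[row] for preds, oob in zip(per_model_predictions, out_of_bag) if row in oob]
        result.append(Counter(votes).most_common(1)[0][0] if votes else None)
    return result

# Invented example: 3 models, 4 rows.
predictions = [
    ["a", "b", "a", "a"],  # model 1
    ["a", "b", "b", "b"],  # model 2
    ["b", "b", "a", "b"],  # model 3
]
out_of_bag = [[0, 2], [0, 1, 3], [1, 2, 3]]  # rows each model did not train on
labels = ["a", "b", "b", "b"]

oob = oob_predictions(predictions, out_of_bag, n_rows=4)
oob_accuracy = sum(p == y for p, y in zip(oob, labels)) / len(labels)
```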

Random Forests

• Bags of decision trees, but an extra element of randomization is applied when building the trees: each node in the decision tree only “sees” a subset of the input columns, typically √N (where N is the number of input columns).
• Random forests tend to be very robust w.r.t. overfitting (though the individual trees are almost certainly overfit)
• Extra benefit: training tends to be much faster, since each split only evaluates a subset of the columns
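The per-split column sampling can be sketched in Python: at every split, a fresh random subset of about √N columns is drawn. The column names and `columns_for_split` helper are hypothetical, and √N is the typical choice mentioned above rather than a fixed rule.

```python
import math
import random

def columns_for_split(columns, rng):
    """Random-forest style: each split considers only ~sqrt(N) randomly chosen columns."""
    k = max(1, round(math.sqrt(len(columns))))
    return rng.sample(columns, k)

rng = random.Random(42)
columns = [f"col{i}" for i in range(16)]
splits = [columns_for_split(columns, rng) for _ in range(3)]  # e.g. three splits in one tree
```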

New Nodes: Random Forest

• Ensemble learning method for classification and regression tasks
• It consists of a chosen number of decision trees
• Each of the decision tree models is learned on a different set of rows (records) and a different set of columns (describing attributes)

New Nodes: Random Forest

• The output model describes a random forest and is applied in the corresponding predictor node using a simple majority vote.
• The Tree Ensemble Learner node provides more functionality

Tree Ensembles

• Random Forest variant
• More options to set
• Trees may be trained using subsets of rows and/or columns, and this approach may lead to greater accuracy.

Tree Ensembles

• Optimization of a tree ensemble is complex due to the large number of configuration options
• Number of models
• Number of columns
• Number of rows
• Tree depth
• ...

New Nodes: Tree Ensemble Learner/Predictor

• Choose which columns to include
• Configure a prototype tree (depth, split criteria, etc.)
• Set up ensemble parameters (model count, row/column subsampling)

Gradient Boosting

• Another algorithm for creating ensembles of decision trees
• Starts with a tree built on a subset of the data
• Builds additional trees to fit the residual errors
• Typically uses fairly shallow trees
• Can introduce randomness in the choice of data subsets (“stochastic gradient boosting”) and in the choice of variables
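To make "fit the residual errors" concrete, here is a minimal gradient boosting sketch for one-dimensional regression with squared error. It uses depth-1 trees (stumps) as the shallow trees. The learning rate (shrinkage) and the number of rounds are illustrative assumptions. The stochastic variant, which samples rows in each round, is left out.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree: pick the threshold on x that minimises
    the squared error when each side predicts its mean residual.
    Needs at least two distinct x values."""
    pairs = sorted(zip(xs, residuals))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [r for _, r in pairs[:i]]
        right = [r for _, r in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_trees=50, learning_rate=0.1):
    """Squared-error boosting: each new stump is fit to the current residuals
    y - F(x) and added to the model, scaled by the learning rate."""
    trees = []
    predictions = [0.0] * len(xs)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        tree = fit_stump(xs, residuals)  # the first stump is fit to y itself
        trees.append(tree)
        predictions = [p + learning_rate * tree(x) for p, x in zip(predictions, xs)]
    return lambda x: sum(learning_rate * tree(x) for tree in trees)


# Toy data: a step function that a sum of stumps can approximate
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
model = gradient_boost(xs, ys)
print(round(model(2), 3), round(model(7), 3))
```

On this toy data each round removes 10% of the remaining residual, so the predictions approach 1 and 5 as more rounds are added. With fewer rounds or a smaller learning rate, more of the error remains.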

Advanced Data Mining Exercise, Activity I

Start with exercise: Advanced Data Mining, Activity I

• Read the data file CurrentDetailData.table
• Partition the data 50/50 using stratified sampling on the Products column
• Create a Tree Ensemble model to predict the “Products” column
• Use a tree depth of 5, 50 models, and 75% of rows and columns for each iteration.
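Stratified sampling keeps the class proportions of the Products column roughly equal in both partitions. The following is a minimal sketch of the idea. It assumes that rows are Python dicts and uses a fixed random seed. The function name and the data are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(rows, class_of, fraction=0.5, seed=0):
    """Split rows into two partitions that keep each class's share roughly equal.

    Rows are grouped by class, each group is shuffled, and the first
    `fraction` of every group goes to the first partition.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[class_of(row)].append(row)
    first, second = [], []
    for members in by_class.values():
        rng.shuffle(members)
        cut = round(len(members) * fraction)
        first.extend(members[:cut])
        second.extend(members[cut:])
    return first, second


# Hypothetical rows: 6 x "A", 4 x "B", 2 x "C" in the Products column
rows = [{"id": i, "Products": p} for i, p in enumerate("AAAAAABBBBCC")]
train, test = stratified_split(rows, lambda r: r["Products"])
print(sorted(r["Products"] for r in train))  # ['A', 'A', 'A', 'B', 'B', 'C']
```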

Parameter Optimization

• Some modeling approaches are very sensitive to their configuration.
• Calculating optimum settings is not always possible.
• Parameter Optimization loops may help find a good configuration.

New Node: Parameter Optimization Loop Start

• Define the parameters to optimize
• Set upper/lower bounds and step sizes (and flag integer parameters)
• Choose an optimization method:
• Brute force: maximum accuracy, but slower computation
• Hill climbing: faster runtimes, but may get stuck in a local optimum
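The difference between the two search strategies can be sketched on a one-dimensional integer grid, such as a number of models from 10 to 200 in steps of 10. The scoring function below is a made-up, two-peaked stand-in for accuracy, chosen to show how hill climbing can stop at a local optimum. The neighbour rule (one grid step up or down) is an assumption of this sketch and not a description of the node's internals. The `maximize` flag mirrors the Loop End choice between maximizing accuracy and minimizing error.

```python
def brute_force(values, score, maximize=True):
    """Evaluate every candidate and return the best one."""
    return max(values, key=score) if maximize else min(values, key=score)


def hill_climb(values, score, start_index=0, maximize=True):
    """Move from the current candidate to its better grid neighbour (one step
    down or up) until neither neighbour improves; may stop at a local optimum."""
    sign = 1 if maximize else -1
    i = start_index
    current = sign * score(values[i])
    while True:
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(values)]
        if not neighbours:
            return values[i]
        best_j = max(neighbours, key=lambda j: sign * score(values[j]))
        best = sign * score(values[best_j])
        if best <= current:
            return values[i]
        i, current = best_j, best


def toy_accuracy(n_models):
    # Made-up objective with a local peak at 50 and the global peak at 150
    return max(0.80 - abs(n_models - 50) / 1000, 0.85 - abs(n_models - 150) / 1000)


grid = list(range(10, 201, 10))  # min = 10, max = 200, step = 10
print(brute_force(grid, toy_accuracy))                # 150 (20 evaluations)
print(hill_climb(grid, toy_accuracy, start_index=0))  # 50, stuck at the local peak
```

Starting the climb from the other end of the grid (index 19) reaches 150. This illustrates why the result of hill climbing depends on the starting point.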

New Node: Parameter Optimization Loop End

• Collects the value to optimize from a flow variable
• The value may be maximized (accuracy) or minimized (error)

Advanced Data Mining Exercise, Activity II

Start with exercise: Advanced Data Mining, Activity II

• Add a parameter optimization loop to your Tree Ensemble model
• Use hill climbing to determine the optimum number of models (min = 10, max = 200, step = 10, integer = yes)
• Maximize the accuracy in the Loop End node.
• What were the optimal settings?

(Hint: don’t forget to use the flow variable in your learner)

Cross Validation

• Used to evaluate model stability
• Re-execute the modeling process many times using different data partitions
• Collect aggregated statistics on model accuracy

Example: Cross Validation

• X-Partitioner → X-Aggregator
• X-Partitioner replaces Partition
• X-Aggregator replaces Scorer
• Can be used with any learner/predictor
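The loop's logic can be sketched in plain Python: one partitioning step, followed by per-fold training, prediction and error collection. The majority-class learner is a hypothetical placeholder for any learner/predictor pair. Dealing shuffled indices into folds is one simple way to build k folds, not necessarily the one X-Partitioner uses.

```python
import random
import statistics


def k_fold_indices(n_rows, k, seed=0):
    """Shuffle row indices once and deal them into k folds of near-equal size,
    so every row lands in exactly one test fold."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[f::k] for f in range(k)]


def cross_validate(xs, ys, k, train, predict):
    """Train on k-1 folds, test on the held-out fold, and collect one error
    rate per fold (the per-iteration statistics an aggregator would report)."""
    errors = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        model = train([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        wrong = sum(predict(model, xs[i]) != ys[i] for i in test_idx)
        errors.append(wrong / len(test_idx))
    return errors


def train_majority(xs, ys):
    # Hypothetical placeholder learner: remembers the most frequent label
    return max(set(ys), key=ys.count)


def predict_majority(model, x):
    return model


xs = list(range(20))
ys = ["A"] * 15 + ["B"] * 5
errors = cross_validate(xs, ys, 10, train_majority, predict_majority)
print(f"mean error {statistics.mean(errors):.3f}, std dev {statistics.stdev(errors):.3f}")
```

With 15 "A" and 5 "B" labels, this baseline always predicts "A". The mean error over the 10 folds therefore equals the share of "B" rows (0.25). A small standard deviation across folds is the kind of signal that suggests a stable model.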

Advanced Data Mining Exercise, Activity III

Start with exercise: Advanced Data Mining, Activity III


• Create a 10-fold cross validation for your Tree Ensemble Learner.
• Calculate the mean error for the cross validation.
• Does the model seem stable?

H2O Integration

• KNIME integrates the H2O machine learning library
• H2O: open source, focus on scalability and performance
• Supports many different models:
• Generalized Linear Model
• Gradient Boosting Machine
• Random Forest
• k-Means, PCA, Naive Bayes, etc., and more to come!
• Includes support for MOJO model objects for deployment
H2O Integration - Example

[Workflow figure. Annotations: “Starting point: create local H2O context”, “Data import”, “Add data from KNIME to H2O”, “Model training and prediction”, “Scoring”]

Databases

Database Extension

• Visually assemble complex SQL statements (no SQL coding needed)
• Connect to all JDBC-compliant databases
• Harness the power of your database within KNIME

Database Port Types

Database Connection Port (brown)
• Connection information
• SQL statement

Database JDBC Connection Port (red)
• Connection information

Database Connection Ports can be connected to Database JDBC Connection Ports, but not vice versa.

Database Table Selector

• Takes connection information and constructs a query
• Explore DB metadata
• Outputs a SQL query

Database Connection Table Reader

• Executes the incoming SQL query on the database
• Reads the results into a KNIME data table

Input: Database Connection Port; output: KNIME data table
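Outside KNIME, the same pattern can be sketched with Python's built-in `sqlite3` module: send a query, then pull the result set into memory. Here `sqlite3` stands in for a JDBC connection. The table and its contents are hypothetical.

```python
import sqlite3


def read_query(conn, sql):
    """Run a query on the database and load the complete result into memory
    as (column names, list of row tuples)."""
    cursor = conn.execute(sql)
    column_names = [description[0] for description in cursor.description]
    return column_names, cursor.fetchall()


conn = sqlite3.connect(":memory:")  # in-memory database, stand-in for a JDBC source
conn.execute("CREATE TABLE sales (product TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("Gold", 120.0), ("Silver", 80.5)])
print(read_query(conn, "SELECT product, amount FROM sales WHERE amount > 100"))
# (['product', 'amount'], [('Gold', 120.0)])
```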

Database Connectors

• Dedicated nodes to connect to specific databases
– Necessary JDBC driver included
– Easy to use
– Import DB-specific behavior/capability
• Hive and Impala connectors are part of the KNIME Big Data Connectors extension
• General Database Connector
– Can connect to any JDBC source
– Register new JDBC driver via File -> Preferences -> KNIME -> Databases
Dedicated Database Connectors

• MySQL, MS SQL Server, Postgres, SQLite, Amazon Redshift, etc.
• Propagate connection information to other DB nodes

«General» Database Connector node

The database type defines the SQL dialect.

Register JDBC Driver

Open KNIME and go to File -> Preferences:
• Register single jar file JDBC drivers
• Register new JDBC driver with companion files
• Increase connection timeout for long-running database operations

In-Database Processing

• A database manipulation node generates a SQL query on top of its input SQL query (brown square port)
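The query stacking can be illustrated with plain string building. Each helper wraps the incoming SQL as a subquery, and nothing runs until the final query is executed. The helper names and the exact shape of the SQL are assumptions for illustration, and the SQL that KNIME generates may look different. String interpolation is only safe here because all inputs are fixed literals.

```python
import sqlite3


def select_table(table):
    # Starting point, similar to Database Table Selector: a plain SELECT on one table
    return f"SELECT * FROM {table}"


def row_filter(query, condition):
    # Wrap the incoming query as a subquery and filter its rows
    return f"SELECT * FROM ({query}) AS q WHERE {condition}"


def column_filter(query, columns):
    # Wrap the incoming query as a subquery and keep only some columns
    return f"SELECT {', '.join(columns)} FROM ({query}) AS q"


query = column_filter(row_filter(select_table("sales"), "amount > 100"), ["product"])
print(query)

# Only now does the database execute the whole nested query at once
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("Gold", 120.0), ("Silver", 80.5), ("Bronze", 150.0)])
print(sorted(conn.execute(query).fetchall()))  # [('Bronze',), ('Gold',)]
```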

Query Nodes

• Filter rows and columns
• Join tables/queries
• Extract samples
• Bin numeric columns
• Sort your data
• Write your own query
• Aggregate your data

Data Aggregation

Input table:

Rowid  Group  Value
r1     M      2
r2     F      3
r3     M      1
r4     F      5
r5     F      7
r6     M      5

Aggregated on “Group” by method: sum(“Value”)

Rowid     Group  Value
r1+r3+r6  M      8
r2+r4+r5  F      15
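The aggregation above corresponds to a SQL `GROUP BY`. This `sqlite3` sketch rebuilds the example table. It also adds a `COUNT(*)` column, the kind of per-group row count the Database GroupBy node can return. The table name and the `row_id` column name are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "Group" is an SQL keyword, so that column name must be quoted
conn.execute('CREATE TABLE data (row_id TEXT, "Group" TEXT, "Value" INTEGER)')
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [("r1", "M", 2), ("r2", "F", 3), ("r3", "M", 1),
     ("r4", "F", 5), ("r5", "F", 7), ("r6", "M", 5)],
)
result = conn.execute(
    'SELECT "Group", SUM("Value") AS total, COUNT(*) AS n_rows '
    'FROM data GROUP BY "Group" ORDER BY "Group" DESC'
).fetchall()
print(result)  # [('M', 8, 3), ('F', 15, 3)]
```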

Database GroupBy

• Aggregate to summarize data

Database GroupBy

Returns number of rows per group

Database GroupBy – DB Specific Aggregation Methods

SQLite: 7 aggregation functions
PostgreSQL: 25 aggregation functions

Database Joiner

• Combines columns from 2 different tables
• The top port contains the “Left” data table
• The bottom port contains the “Right” data table
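In SQL terms, the left table (top port) and the right table (bottom port) meet in a `JOIN` clause. The following is a small `sqlite3` sketch with hypothetical customers and orders tables. A left outer join keeps every left row and fills the right-side columns with NULL when there is no match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")          # "Left" table
conn.execute("CREATE TABLE orders (customer_id INTEGER, product TEXT)")  # "Right" table
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Bob"), (3, "Cy")])
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "Gold"), (1, "Silver"), (3, "Gold")])

# LEFT OUTER JOIN keeps Bob, who has no orders; his product comes back as None
joined = conn.execute(
    "SELECT c.name, o.product "
    "FROM customers AS c LEFT OUTER JOIN orders AS o ON c.id = o.customer_id "
    "ORDER BY c.name, o.product"
).fetchall()
print(joined)  # [('Ann', 'Gold'), ('Ann', 'Silver'), ('Bob', None), ('Cy', 'Gold')]
```

Switching to `INNER JOIN` would drop Bob, because only matching pairs are kept.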

Database Row Filter

• Filters out rows that do not match the filter criteria
• Use the IS NULL or IS NOT NULL operator to filter missing values
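IS NULL is needed because any comparison with NULL, even `= NULL`, evaluates to unknown, so such rows never pass a `WHERE` clause. Here is a quick `sqlite3` check with a hypothetical items table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, product TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, "Gold"), (2, None), (3, "Silver")])


def count_where(condition):
    # Count the rows that pass a WHERE condition
    return conn.execute(f"SELECT COUNT(*) FROM items WHERE {condition}").fetchone()[0]


print(count_where("product = NULL"))       # 0: comparing with NULL is never true
print(count_where("product IS NULL"))      # 1
print(count_where("product IS NOT NULL"))  # 2
```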

Database Sorter

• Sorts the input data by one or multiple columns

Database Query

• Executes arbitrary SQL queries
• #table# is replaced with the input query
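The #table# substitution can be mimicked with a string replacement. In this sketch, the incoming query is inserted in parentheses so that it acts as a subquery, and the user's SQL supplies the alias. Both points are assumptions about the mechanics and do not describe the node's exact output. The sales table is hypothetical.

```python
import sqlite3


def expand_table_placeholder(user_sql, input_query):
    """Substitute the incoming query for #table#, in parentheses so it acts as a subquery."""
    return user_sql.replace("#table#", f"({input_query})")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("Gold", 120.0), ("Silver", 80.5), ("Bronze", 150.0)])

incoming = "SELECT * FROM sales WHERE amount > 100"
# The alias "t" is written by the user; some databases require one on subqueries
sql = expand_table_placeholder(
    "SELECT product, amount * 2 AS doubled FROM #table# AS t ORDER BY product", incoming)
print(sql)
print(conn.execute(sql).fetchall())  # [('Bronze', 300.0), ('Gold', 240.0)]
```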

Database Connection Port View

Copy SQL statement

Export Data

• Writing data back into the database
• Exporting data into KNIME
• SQL operations are executed on the database!

Database Writing Nodes

• Create table as select
• Insert/append data
• Update values in table
• Delete rows from table

Database Writer

• Writes data from a KNIME data table directly into a database table
• Increase the batch size for better performance
• Append to or drop an existing table
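Batching matters because each round trip to the database has overhead. The following is a minimal sketch using the `executemany` method of `sqlite3`. The batch size, the append/drop choice and the helper name are illustrative assumptions. Because `sqlite3` runs in-process, the performance gain here is much smaller than with a networked database. Table and column names are interpolated directly, so they must come from trusted code.

```python
import sqlite3
from itertools import islice


def write_table(conn, table, columns, rows, batch_size=1000, drop_existing=False):
    """Insert rows into `table`, sending batch_size rows per executemany call.

    With drop_existing=True the table is dropped and recreated first;
    otherwise rows are appended. Returns the number of batches sent.
    """
    column_list = ", ".join(columns)
    if drop_existing:
        conn.execute(f"DROP TABLE IF EXISTS {table}")
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({column_list})")
    placeholders = ", ".join("?" for _ in columns)
    insert_sql = f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"
    remaining = iter(rows)
    batches = 0
    while batch := list(islice(remaining, batch_size)):
        conn.executemany(insert_sql, batch)
        batches += 1
    conn.commit()
    return batches


conn = sqlite3.connect(":memory:")
rows = [(i, f"product_{i % 3}") for i in range(2500)]  # hypothetical data
print(write_table(conn, "sales", ["id", "product"], rows, batch_size=1000))  # 3
```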

Database Update

• Updates all database records that match the update criteria
• Select the columns to update
• Select the columns that identify the records to update
• Increase the batch size for better performance
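The node's settings map onto a parameterized `UPDATE` statement: the columns to update go into `SET`, and the identifying columns go into `WHERE`. The following `sqlite3` sketch uses a hypothetical helper name, batching logic and data.

```python
import sqlite3


def update_rows(conn, table, set_columns, key_columns, rows, batch_size=1000):
    """UPDATE every record whose key columns match an input row, setting the chosen
    columns to that row's values; statements are sent in batches of batch_size."""
    set_clause = ", ".join(f"{c} = ?" for c in set_columns)
    where_clause = " AND ".join(f"{c} = ?" for c in key_columns)
    sql = f"UPDATE {table} SET {set_clause} WHERE {where_clause}"
    # Parameter order must follow the SQL: SET values first, then WHERE keys
    params = [[row[c] for c in set_columns] + [row[c] for c in key_columns] for row in rows]
    for start in range(0, len(params), batch_size):
        conn.executemany(sql, params[start:start + batch_size])
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, segment TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "Ann", "basic"), (2, "Bob", "basic"), (3, "Cy", "basic")])
update_rows(conn, "customers", ["segment"], ["id"],
            [{"id": 2, "segment": "premium"}, {"id": 3, "segment": "gold"}])
print(conn.execute("SELECT id, segment FROM customers ORDER BY id").fetchall())
# [(1, 'basic'), (2, 'premium'), (3, 'gold')]
```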

Database Delete

• Deletes all database records that match the values of the selected columns
• Increase the batch size for better performance
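The same approach works for deletion, using a parameterized `DELETE` whose `WHERE` clause matches the selected columns. The helper name and data are hypothetical. After `executemany`, the cursor's `rowcount` holds the summed number of rows removed in that batch.

```python
import sqlite3


def delete_rows(conn, table, key_columns, rows, batch_size=1000):
    """DELETE every record whose key columns match an input row; returns the
    total number of deleted records."""
    where_clause = " AND ".join(f"{c} = ?" for c in key_columns)
    sql = f"DELETE FROM {table} WHERE {where_clause}"
    params = [[row[c] for c in key_columns] for row in rows]
    deleted = 0
    for start in range(0, len(params), batch_size):
        cursor = conn.executemany(sql, params[start:start + batch_size])
        deleted += cursor.rowcount  # summed over all statements in this batch
    conn.commit()
    return deleted


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, product TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(1, "Gold"), (2, "Silver"), (3, "Gold"), (4, "Bronze")])
print(delete_rows(conn, "items", ["product"], [{"product": "Gold"}, {"product": "Bronze"}]))  # 3
print(conn.execute("SELECT * FROM items").fetchall())  # [(2, 'Silver')]
```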

Database Exercise

• Connect to the H2 database using the H2 Connector node.
• Write the Fully Joined Data into the database as a new table called "adult".
• Use the Database Table Selector to select the adult table.
• Group the data by products and count the number of occurrences.
• Filter the adult data table to contain only products that occur more than 1000 times.
• Read the filtered table into a KNIME data table (triangle ports).

The End