© 2016 MapR Technologies
MapR 5.2: Getting More Value from the MapR
Converged Community Edition
Sep 14, 2016
Today’s Presenters
Deborah Littlefield
Technical Curriculum Developer
Ankur Desai
Sr. Manager, Platform and Products
Today’s Agenda
• Recent updates to the MapR Converged Data Platform
• Latest Ecosystem Support in MapR 5.2
• How to upgrade to the latest version of the Community Edition
• Q&A
The MapR Converged Data Platform
4 Major Additions to the MapR Platform in the past
12 months
• Taking cluster monitoring to the next level with the Spyglass
Initiative
• Real-time streaming with MapR Streams
• MapR-DB JSON document database and application
development with OJAI
• Securing your data with access control expressions (ACEs)
Project Spyglass
MapR Vision: Maximizing User/Operator Productivity
Deep Visibility
Easy Management
Full Control
The MapR Spyglass Initiative
• New approach for increasing user and administrator productivity
– Comprehensive, open, extensible
• Simplifies the management of growing big data deployments
• Starts with 5.2 release
– Phase 1 – MapR Monitoring
– Initial focus on operational visibility
• Helps community innovate faster
– Extensive use of open source visualization and dashboarding tools
Spyglass Initiative Phase 1 - MapR Monitoring
Empower administrators with cluster
monitoring capabilities, including
metric and log collection from nodes,
services, and jobs, with dashboards to
display information in a useful way.
Converged
Customizable
Extensible
MapR Monitoring Architecture
[Diagram: pipeline stages Collection, Aggregation & Storage, Visualization. Labels: Data Sources (Node Environmentals: CPU, Mem, I/O; Service Daemons: YARN, Drill, Hive, etc.), Log Shippers, Metrics Collectors, Alerting, MapR Control System, Future.]
Project Spyglass – Monitoring All You Care About
Node/Infrastructure Monitoring
• Global Aggregates (Average, Min, Max)
Charts (e.g. CPU, Disk utilization)
• Per-node charts (e.g. I/O Throughput
by disk)
• MFS read/writes and throughput
• DB puts, gets, scans and cache metrics
Cluster Space Utilization Monitoring
• Cluster wide storage utilization
• Storage Utilization Trend
• Utilization per volume and per accountable
entity (data, volume, snapshot and total size)
YARN/MR Application Monitoring
• Global YARN trend graphs
• Containers - Pending, Active
• vCores & RAM - Allocated & Used
• Per Queue charts - containers, vCores, RAM
Service Daemon Monitoring
• Per-service charts (CPU usage by type, memory)
• Centralized, searchable logs
• MapR core and ecosystem services
(includes YARN, Drill and Spark)
Customizable
Dashboards
for Visualizing Metrics
Log
Analytics
Destination to Learn and Collaborate
Blog about topics and ideas
Share code snippets and dashboards
View demos, tutorials, and videos
Engage in use case discussion/development
Dashboards are defined with JSON
and easy to export and import in
Grafana and Kibana
Extend/Integrate using REST API
The Exchange
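Because dashboards are plain JSON, sharing one amounts to serializing and parsing a document. The sketch below illustrates that round trip with Python's standard json module. The dashboard fields (title, panels, metric, aggregate) are hypothetical and much simpler than the schemas Grafana and Kibana actually use.

```python
import json

# Hypothetical, simplified dashboard definition (assumption): real Grafana
# and Kibana dashboards use their own, richer JSON schemas.
dashboard = {
    "title": "Cluster CPU overview",
    "panels": [
        {"title": "CPU utilization (avg)", "metric": "cpu.percent", "aggregate": "avg"},
        {"title": "Disk utilization (max)", "metric": "disk.percent", "aggregate": "max"},
    ],
}


def export_dashboard(definition: dict) -> str:
    """Serialize a dashboard definition to a JSON string for sharing."""
    return json.dumps(definition, indent=2, sort_keys=True)


def import_dashboard(text: str) -> dict:
    """Parse a shared JSON string back into a dashboard definition."""
    return json.loads(text)


exported = export_dashboard(dashboard)
restored = import_dashboard(exported)
print(restored == dashboard)
```

The exported string is what would be posted to a sharing site or sent through a REST call; nothing beyond the JSON text is needed to recreate the dashboard.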
Dashboards
can be viewed
on mobile
devices.
Summary
● Data collection and storage infrastructure (packaged
and supported)
○ Collection/storage of metrics & logs across node, storage,
services
● Visualization dashboard (Driven via community)
○ Sample dashboards for Grafana & Kibana
5.2 - Spyglass 1.0 GA
CUSTOMIZABLE, shareable and mobile-ready dashboards
CONVERGED monitoring with deep search
EXTENSIBLE and easy to integrate with REST API
MapR Streams
MapR Streams: Enabling Continuous Data Processing
To enable continuous,
globally scalable streaming of
event data, allowing developers to
create real-time applications
that their business can depend on.
Converged
Continuous
Global
MapR Streams:
Publish-subscribe Event Streaming System for Big Data
Producers publish billions of
messages/sec to a topic in a stream.
Guaranteed, immediate delivery
to all consumers.
Standard real-time API (Kafka).
Integrates with Spark Streaming,
Storm, Apex, and Flink
Direct data access (OJAI API) from
analytics frameworks.
[Diagram labels: Producers, Stream, Topic, Consumers, Topic Replication, Remote sites and consumers, Batch analytics]
Available in the Enterprise Edition Only
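The publish-subscribe model on this slide can be pictured with a toy in-memory stream. This is only a sketch of the idea (topics as append-only logs, each consumer tracking its own read position); the class and method names are invented for illustration and are not the Kafka or MapR Streams API.

```python
from collections import defaultdict


class Stream:
    """Toy in-memory stream: named topics, each an append-only message log."""

    def __init__(self):
        self._topics = defaultdict(list)   # topic name -> list of messages
        self._offsets = defaultdict(int)   # (consumer, topic) -> next offset to read

    def publish(self, topic, message):
        """Append a message to the end of a topic."""
        self._topics[topic].append(message)

    def poll(self, consumer, topic):
        """Return the messages this consumer has not yet seen on the topic."""
        start = self._offsets[(consumer, topic)]
        messages = self._topics[topic][start:]
        self._offsets[(consumer, topic)] = start + len(messages)
        return messages


stream = Stream()
stream.publish("sensor-readings", {"id": 1, "temp": 21.5})
stream.publish("sensor-readings", {"id": 2, "temp": 22.1})

# A real-time consumer reads as messages arrive.
alerts_seen = stream.poll("alerts", "sensor-readings")
stream.publish("sensor-readings", {"id": 3, "temp": 23.0})
alerts_next = stream.poll("alerts", "sensor-readings")

# A later batch consumer still sees every message from the beginning.
batch_seen = stream.poll("batch-analytics", "sensor-readings")
```

Because each consumer keeps its own offset, a slow or late consumer does not affect the others, which is what lets real-time and batch readers share one topic.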
MapR Streams: Building Faster and Simpler Apps
Simpler and
Faster
Architecture
• Converged platform with file storage and database
reduces data movement, data latency, hardware
cost, and administration cost
• Event streaming and stream processing in the same
cluster enables faster processing
• Unified security framework with files and database
tables reduces administration cost around setting
up and enforcing security policies
• Multi-tenant - topic isolation, quotas, data
placement control allows multiple isolated streaming
applications to run on the same cluster reducing
hardware cost and data movement
MapR Streams: Building Faster and Simpler Apps
Scalable
• Ingest more events to enable faster insights
• Hold on to events longer to enable deeper insights
• Develop an app once and apply it to short- and long-term
data (e.g., run analysis on 15 days of data and on 1 year of
data using the same application)
MapR-DB JSON document database
and application development with OJAI
Open Source OJAI API for JSON-Based Applications
Open JSON Application Interface (OJAI)
Databases Streams
MapR-Client
File Systems
{JSON}
MapR-Client
Familiar JSON Paradigm – Similar API Constructs
MapR-DB
Document record = Json.newDocument()
.set("firstName", "John")
.set("lastName", "Doe")
.set("age", 50);
table.insert("jdoe", record);
MongoDB
BasicDBObject doc = new BasicDBObject
("firstName", "John")
.append("lastName", "Doe")
.append("age", 50);
coll.insert(doc);
JSON: Easy Variation with Documents
{
  "_id" : "rp-prod132546",
  "name" : "Marvel T2 Athena",
  "brand" : "Pinarello",
  "category" : "bike",
  "type" : "Road Bike",
  "price" : 2949.99,
  "size" : "55cm",
  "wheel_size" : "700c",
  "frameset" : {
    "frame" : "Carbon Toryaca",
    "fork" : "Onda 2V C"
  },
  "groupset" : {
    "chainset" : "Camp. Athena 50/34",
    "brake" : "Camp."
  },
  "wheelset" : {
    "wheels" : "Camp. Zonda",
    "tyres" : "Vittoria Pro"
  }
}
{
  "_id" : "rp-prod106702",
  "name" : "Ultegra SPD-SL 6800",
  "brand" : "Shimano",
  "category" : "pedals",
  "type" : "Components",
  "price" : 112.99,
  "features" : [
    "Low profile design increases ...",
    "Supplied with floating SH11 cleats",
    "Weight: 260g (pair)"
  ]
}
{
  "_id" : "rp-prod113104",
  "name" : "Bianchi Pride Jersey SS15",
  "brand" : "Nalini",
  "category" : "Jersey",
  "type" : "Clothing",
  "price" : 76.99,
  "features" : [
    "100% Polyester",
    "3/4 hidden zip",
    "3 rear pocket"
  ],
  "color" : "black"
}
[Figure labels: bike, pedal, jersey]
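The point of the three documents above is that items in one collection need not share the same fields. A minimal Python sketch, using trimmed copies of those documents as plain dictionaries (the helper names are illustrative, not part of any MapR API):

```python
# Trimmed copies of the catalog documents shown above.
catalog = [
    {"_id": "rp-prod132546", "category": "bike", "price": 2949.99, "size": "55cm"},
    {"_id": "rp-prod106702", "category": "pedals", "price": 112.99,
     "features": ["Supplied with floating SH11 cleats", "Weight: 260g (pair)"]},
    {"_id": "rp-prod113104", "category": "Jersey", "price": 76.99,
     "features": ["100% Polyester", "3/4 hidden zip"], "color": "black"},
]


def with_field(documents, field):
    """Return the ids of the documents that carry the given top-level field."""
    return [doc["_id"] for doc in documents if field in doc]


def field_names(documents):
    """Return every top-level field name used anywhere in the collection."""
    names = set()
    for doc in documents:
        names.update(doc)
    return sorted(names)


print(with_field(catalog, "features"))
print(field_names(catalog))
```

No schema change is needed when a new product type introduces a new field; code that cares about the field simply checks for its presence.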
Product Catalog - RDBMS
“Entity-Attribute-Value” pattern
To get a single product:
SELECT * FROM (
SELECT
ce.sku,
ea.attribute_id,
ea.attribute_code,
CASE ea.backend_type
WHEN 'varchar' THEN ce_varchar.value
WHEN 'int' THEN ce_int.value
WHEN 'text' THEN ce_text.value
WHEN 'decimal' THEN ce_decimal.value
WHEN 'datetime' THEN ce_datetime.value
ELSE ea.backend_type
END AS value,
ea.is_required AS required
FROM catalog_product_entity AS ce
LEFT JOIN eav_attribute AS ea
ON ce.entity_type_id = ea.entity_type_id
LEFT JOIN catalog_product_entity_varchar AS ce_varchar
ON ce.entity_id = ce_varchar.entity_id
AND ea.attribute_id = ce_varchar.attribute_id
AND ea.backend_type = 'varchar'
LEFT JOIN catalog_product_entity_int AS ce_int
ON ce.entity_id = ce_int.entity_id
AND ea.attribute_id = ce_int.attribute_id
AND ea.backend_type = 'int'
LEFT JOIN catalog_product_entity_text AS ce_text
ON ce.entity_id = ce_text.entity_id
AND ea.attribute_id = ce_text.attribute_id
AND ea.backend_type = 'text'
LEFT JOIN catalog_product_entity_decimal AS ce_decimal
ON ce.entity_id = ce_decimal.entity_id
AND ea.attribute_id = ce_decimal.attribute_id
AND ea.backend_type = 'decimal'
LEFT JOIN catalog_product_entity_datetime AS ce_datetime
ON ce.entity_id = ce_datetime.entity_id
AND ea.attribute_id = ce_datetime.attribute_id
AND ea.backend_type = 'datetime'
WHERE ce.sku = 'rp-prod132546'
) AS tab
WHERE tab.value != '';
Product Catalog - NoSQL/Document
Store the product “as a business object”
{
  "_id" : "rp-prod132546",
  "name" : "Marvel T2 Athena",
  "brand" : "Pinarello",
  "category" : "bike",
  "type" : "Road Bike",
  "price" : 2949.99,
  "size" : "55cm",
  "wheel_size" : "700c",
  "frameset" : {
    "frame" : "Carbon Toryaca",
    "fork" : "Onda 2V C"
  },
  "groupset" : {
    "chainset" : "Camp. Athena 50/34",
    "brake" : "Camp."
  },
  "wheelset" : {
    "wheels" : "Camp. Zonda",
    "tyres" : "Vittoria Pro"
  }
}
To get a single product:
products.findById("rp-prod132546")
Native JSON Support in MapR-DB
{
  "order_num": 5555,
  "products": [
    { "product_id": 348752,
      "quantity": 1,
      "unit_price": 149.99,
      "total_price": 149.99
    },
    { "product_id": 439322,
      "quantity": 1,
      "unit_price": 99.99,
      "total_price": 99.99
    },
    { "product_id": 953923,
      "quantity": 1,
      "unit_price": 49.99,
      "total_price": 49.99
    }
  ]
}
Reads/writes at element level
• Granular disk reads/writes
• Less network traffic
• Higher concurrency
Any new elements added on demand
• No predefined schemas
• Easy to store evolving data
Not all NoSQL databases treat JSON as a native data type.
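Element-level access means addressing one field inside a document rather than rewriting the whole thing. The sketch below mimics that on an in-memory Python dictionary using dotted paths. It only illustrates the addressing idea; the function names and the shipping fields are assumptions, and this is not how MapR-DB stores or updates data.

```python
def set_path(document, path, value):
    """Set one element addressed by a dotted path, creating nested maps as needed."""
    keys = path.split(".")
    node = document
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value


def get_path(document, path, default=None):
    """Read one element addressed by a dotted path, or return default if absent."""
    node = document
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


order = {"order_num": 5555, "products": [{"product_id": 348752, "quantity": 1}]}

# New elements are added on demand; no schema has to be declared first.
set_path(order, "shipping.city", "San Jose")
set_path(order, "shipping.zip", "95134")

print(get_path(order, "shipping.city"))
```

Only the addressed element changes; the rest of the order document, including the products array, is left untouched.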
Leverage the Column Family Construct (Optional)
/
{a:
{a1:
{b1: "v1",
b2: [
{c1: "v1",
c2: "v2"}
]
},
a2:
{
e1: "v1",
e2: <inline jpg>
}
}
}
Column Family 1
Column Family 2
Control layout for faster data access
Different TTL requirements
Separate Table Replication settings
Specific data placement policies
Efficient ACEs
Fine Grained Security for JSON Documents
{
  "fname": "John",
  "lname": "Doe",
  "address": "111 Main St.",
  "city": "San Jose",
  "state": "CA",
  "zip": "95134",
  "credit_cards": [
    {"issuer": "Visa",
     "number": "4444555566667777"},
    {"issuer": "MasterCard",
     "number": "5555666677778888"}
  ]
}
Specify different permission levels within the document:
• Entire document
• Element: "fname"
• Array: "credit_cards"
• Sub-element in array element: "credit_cards[*].number"
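One way to picture field-level permissions is a filter that walks the document and drops every element the caller may not read. The sketch below is an illustration only: the rule format (a field path mapped to a set of allowed users), the inheritance behaviour, and the user names are all assumptions made for this example, not MapR-DB's actual ACE mechanism.

```python
def visible(document, user, rules, default_readers):
    """Return a copy of document containing only the elements user may read.

    rules maps a field path (for example "credit_cards[*].number") to the set
    of users allowed to read it. A path without its own rule inherits the
    decision made for its parent; a denied parent hides everything below it.
    """

    def allowed(path, inherited):
        readers = rules.get(path)
        return inherited if readers is None else user in readers

    def walk(node, path, inherited):
        if isinstance(node, dict):
            result = {}
            for key, child in node.items():
                child_path = f"{path}.{key}" if path else key
                ok = allowed(child_path, inherited)
                if ok:
                    result[key] = walk(child, child_path, ok)
            return result
        if isinstance(node, list):
            return [walk(item, f"{path}[*]", inherited) for item in node]
        return node

    return walk(document, "", user in default_readers)


customer = {
    "fname": "John",
    "lname": "Doe",
    "credit_cards": [
        {"issuer": "Visa", "number": "4444555566667777"},
        {"issuer": "MasterCard", "number": "5555666677778888"},
    ],
}

# Hypothetical policy: support and billing may read the document,
# but only billing may read card numbers.
rules = {"credit_cards[*].number": {"billing"}}
default_readers = {"support", "billing"}

support_view = visible(customer, "support", rules, default_readers)
billing_view = visible(customer, "billing", rules, default_readers)
print(support_view["credit_cards"])
```

The same document therefore yields different views per user, without splitting sensitive fields into a separate table.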
Comprehensive Data Type Support for MapR-DB
• NULL
• Boolean
• String
• Map
• Array
• Float, Double
• Binary
• Byte, Short, Int, Long
• Date
• Decimal
• Interval
• Time
• Timestamp
Examples:
{
  "sample_int": {"$numberLong": 2147483647},
  "sample_date": {"$dateDay": "2016-02-22"},
  "sample_decimal": {"$decimal": "1234567890.23456789"},
  "sample_time": {"$time": "10:26:12.487"},
  "sample_timestamp": {"$date": "2016-02-22T10:26:12.487Z"}
}
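The tagged values in the example carry type information that plain JSON lacks. As a rough illustration, the Python sketch below decodes those five tags into native Python types. The mapping of tags to Python types is an assumption made for this example (it is not the OJAI library), and the timestamp is written in the standard trailing-Z UTC form.

```python
from datetime import date, datetime, time
from decimal import Decimal

# Assumed mapping from type tag to a Python constructor (illustrative only).
DECODERS = {
    "$numberLong": int,
    "$dateDay": date.fromisoformat,
    "$decimal": Decimal,
    "$time": time.fromisoformat,
    # fromisoformat on older Python versions does not accept "Z", so map it to +00:00.
    "$date": lambda text: datetime.fromisoformat(text.replace("Z", "+00:00")),
}


def decode(value):
    """Recursively replace single-key tagged maps with native Python values."""
    if isinstance(value, dict):
        if len(value) == 1:
            (tag, raw), = value.items()
            if tag in DECODERS:
                return DECODERS[tag](raw)
        return {key: decode(child) for key, child in value.items()}
    if isinstance(value, list):
        return [decode(item) for item in value]
    return value


sample = {
    "sample_int": {"$numberLong": 2147483647},
    "sample_date": {"$dateDay": "2016-02-22"},
    "sample_decimal": {"$decimal": "1234567890.23456789"},
    "sample_time": {"$time": "10:26:12.487"},
    "sample_timestamp": {"$date": "2016-02-22T10:26:12.487Z"},
}

decoded = decode(sample)
print(type(decoded["sample_decimal"]).__name__)
```

Keeping the decimal as a Decimal rather than a float preserves all eighteen significant digits, which is the reason a dedicated tag exists.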
Data Security with Access Control
Expressions
File ACEs – Key Features
• Intuitive inheritance: subdirectories and files inherit permissions from the parent directory
• Whole-volume ACEs: volume-level filter, useful in multitenant environments
• Roles: arbitrary grouping of users according to your business needs
• High performance: no performance hit
• Boolean operators: allow ultra-fine-grained permissions
AUTHORIZATION
File ACEs: Whole Volume ACE Example
Whole-volume ACE: r: group:finance
File /finance/final_report.csv ACE: r: user:bob
Jane (Finance) grants read access to Bob (Developer).
Bob cannot read the file /finance/final_report.csv because
the whole-volume ACE allows read access to the finance group only.
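The example can be read as two checks that must both pass: the whole-volume ACE acts as a filter, and the file ACE is consulted only for users who get through it. A minimal Python sketch of that logic follows. Representing each ACE as a set of principals, and adding user:jane to the file ACE, are assumptions made for illustration.

```python
def principals(user, groups):
    """Build the set of principal strings that identify a user."""
    return {f"user:{user}"} | {f"group:{group}" for group in groups}


def can_read(user, groups, volume_ace, file_ace):
    """Read access requires passing the whole-volume ACE and the file ACE."""
    who = principals(user, groups)
    return bool(who & volume_ace) and bool(who & file_ace)


volume_ace = {"group:finance"}          # whole-volume ACE from the example
file_ace = {"user:bob", "user:jane"}    # user:jane added here as an assumption

bob_can_read = can_read("bob", {"developers"}, volume_ace, file_ace)
jane_can_read = can_read("jane", {"finance"}, volume_ace, file_ace)
print(bob_can_read, jane_can_read)
```

Bob is named in the file ACE but is not in the finance group, so the volume-level filter stops him before the file ACE is ever considered.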
POSIX ACLs vs ACEs
Access Control Lists
MapR Access Control Expressions
r : user:sally | (group:dev_team & group:managers)
Which one is easier to set and understand?
Which one allows for higher granularity?
MapR Has ACEs for Files and MapR-DB Records
Example: user:mary | (group:admins & group:VP) & user:!bob
Permissions on files, tables, column families, columns, JSON documents and sub-documents
Use Access Control Expressions (ACEs) to set granular permissions.
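To make the boolean structure of an expression like the example concrete, here is a small evaluator written as a sketch. It accepts the notation used on this slide (user:NAME, group:NAME, user:!NAME, with &, | and parentheses) and assumes that ! binds tighter than &, which binds tighter than |. Both the accepted syntax and the precedence are assumptions for illustration and should not be taken as the exact MapR ACE grammar.

```python
import re

# An operator or parenthesis, or a principal such as user:mary / user:!bob.
TOKEN = re.compile(r"\s*(?:([()&|!])|(\w+):(!?)(\w+))")


def tokenize(text):
    """Split an expression into operator strings and (kind, name) tuples."""
    tokens, pos = [], 0
    text = text.strip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"unexpected input at position {pos}")
        if match.group(1):
            tokens.append(match.group(1))
        else:
            if match.group(3):          # user:!bob is treated as !user:bob
                tokens.append("!")
            tokens.append((match.group(2), match.group(4)))
        pos = match.end()
    return tokens


def evaluate(expression, user, groups):
    """Evaluate an expression for one user (assumed precedence: ! then & then |)."""
    tokens = tokenize(expression)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        return token

    def factor():
        token = take()
        if token == "!":
            return not factor()
        if token == "(":
            value = or_expr()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return value
        if not isinstance(token, tuple):
            raise ValueError(f"unexpected token {token!r}")
        kind, name = token
        if kind == "user":
            return name == user
        if kind == "group":
            return name in groups
        raise ValueError(f"unknown principal type {kind!r}")

    def and_expr():
        value = factor()
        while peek() == "&":
            take()
            right = factor()            # always parse the right-hand side
            value = value and right
        return value

    def or_expr():
        value = and_expr()
        while peek() == "|":
            take()
            right = and_expr()
            value = value or right
        return value

    result = or_expr()
    if pos != len(tokens):
        raise ValueError("unexpected trailing input")
    return result


ACE = "user:mary | (group:admins & group:VP) & user:!bob"
print(evaluate(ACE, "bob", {"admins", "VP"}))
```

Under these assumptions Mary is always allowed, and anyone who is both an admin and a VP is allowed unless that person is Bob.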
Ecosystem Updates
5.2 Ecosystem Support
These are the only component versions that change in MEP 1.0 as of the 5.2 release date,
and all of them have already been released for 5.1.
            Eco on 5.1 today                                      MEP 1.0 on 5.2
Component   Released with 5.1   Subsequently released for 5.1
Drill       1.4                 1.6                               1.6
Spark       1.5.2               1.6.1                             1.6.1 (2.0 in dev preview)
Impala      2.2.0               2.5                               2.5
Storm       0.10.0              0.10.1                            0.10.1
Mahout      0.11.2              0.12.2                            0.12.2
Converging SQL and JSON with Apache Drill 1.6
• Flexible and operational analytics on NoSQL
– MapR-DB plugin allows analysts to perform SQL queries directly on JSON data in MapR-DB tables
– Pushdown capabilities provide optimal interactive experience
• Enhanced query performance
– Provides better query performance via partition pruning, metadata caching and other optimizations
– Delivers 10-60X faster query planning than previous releases of Drill
• Better memory management
– Delivers greater stability and scale, which lets customers run both larger and more numerous SQL
workloads on a MapR cluster
• Improved integration with visualization tools like Tableau
– Introduces client impersonation for end-to-end security from the visualization tool to data in Hadoop.
– Enhanced SQL Window functions
What’s New in Spark 2.0?
• Structured Streaming with Spark SQL
– The ability to perform interactive queries against live streaming data.
– Output can now be aggregated in a stream for continuous applications.
– Pre-computation of analytics in a continuous fashion can occur as the data is generated
• Whole Stage Code-gen
– Provided by the second-generation Tungsten engine.
– Eliminates the need for multiple JVM calls by flattening SQL queries into one single
function evaluated as bytecode at runtime.
• DataFrame APIs
– Run on the same engine as Spark SQL.
– Allow access to data from a variety of data sources.
– Can run database-like operations or accept custom code.
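The whole-stage code generation idea mentioned above can be illustrated outside Spark. In the sketch below, one version of a filter-project-sum query runs as a chain of separate operators, and the other generates the source for a single fused function and compiles it at runtime. This is a toy in plain Python, not Spark's Tungsten engine; the function names and sample rows are invented.

```python
def run_interpreted(rows):
    """Operator-at-a-time plan: each stage is a separate generator."""
    scanned = (row for row in rows)
    filtered = (row for row in scanned if row["price"] > 100)
    projected = (row["price"] * row["qty"] for row in filtered)
    return sum(projected)


def generate_fused(predicate, projection):
    """Generate and compile one function for the whole filter-project-sum stage."""
    source = (
        "def fused(rows):\n"
        "    total = 0\n"
        "    for row in rows:\n"
        f"        if {predicate}:\n"
        f"            total += {projection}\n"
        "    return total\n"
    )
    namespace = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["fused"]


rows = [
    {"price": 150, "qty": 2},
    {"price": 80, "qty": 5},
    {"price": 200, "qty": 1},
]

run_fused = generate_fused('row["price"] > 100', 'row["price"] * row["qty"]')
print(run_interpreted(rows), run_fused(rows))
```

Both versions compute the same answer; the fused one does it in a single loop with no hand-off between operators, which is the saving the slide describes.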
Upgrade to the Latest MapR
Converged Community Edition
Select an Upgrade Method
[Chart: upgrade methods plotted by time and complexity: Offline Installer, Offline Manual, Rolling Manual, Rolling Scripted]
• Rolling upgrades take advantage of high-availability features
• Offline upgrades take the cluster offline during the upgrade
Community Edition and Rolling Upgrades
• Expect interruptions to cluster operations when nodes running the
only copy of a service (for example, CLDB) are upgraded
• Minimize cluster access
• With 10 or fewer nodes, an offline upgrade probably
makes the most sense
Supported Upgrade Methods
From Version Offline Installer Offline Manual Rolling Manual Rolling Scripted
3.x
4.0
4.1
5.0
5.1
* Supported for clusters that were installed using the MapR Installer. This is the only
method that also upgrades ecosystem components.
High-Level Overview
1. Plan!
2. Prepare
3. Upgrade
Plan: Determine What to Include
MapR Core
Ecosystem components not at supported MEP
MapR clients
New features
?
?
Plan: Develop a Test Plan
• Run tests before and after each upgrade step
– Compare results
• Test basic functionality
– Verify cluster access and volumes
– Use maprcli, hadoop fs, MCS
• Test jobs and queries
– Based on the components you use
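Comparing results before and after each step is easy to automate once every test produces a recordable outcome. The sketch below assumes results are kept as a simple name-to-outcome mapping; the test names and outcomes are hypothetical placeholders.

```python
def compare_results(before, after):
    """Return the tests whose outcome differs between two runs."""
    changes = {}
    for name in sorted(set(before) | set(after)):
        old = before.get(name, "missing")
        new = after.get(name, "missing")
        if old != new:
            changes[name] = (old, new)
    return changes


# Hypothetical recorded outcomes from a pre-upgrade and a post-upgrade run.
before_upgrade = {
    "cluster access": "ok",
    "volume list": "12 volumes",
    "sample query": "42 rows",
}
after_upgrade = {
    "cluster access": "ok",
    "volume list": "12 volumes",
    "sample query": "error",
}

changes = compare_results(before_upgrade, after_upgrade)
print(changes)
```

An empty result means the step changed nothing observable; any entry points at the exact test to investigate before moving on.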
Plan: Create an Upgrade Schedule
• What can be done weeks ahead?
• What can be done days ahead?
• What needs to happen the day of the upgrade?
• What needs to happen after the upgrade?
Prepare: Weeks Ahead
• Review Release Notes
• Verify node specifications
– Update the JDK if needed
• Upgrade on a test cluster
– Document surprises
– Prepare configuration files
Critical!
Prepare: Days Ahead
• Download the installer, packages, etc.
• Run tests and record results
• Back up critical data
Prepare: Day of Upgrade
• Verify cluster health and clear alarms
• Empty job queue/terminate jobs
• Stop cross-cluster operations
– Volume mirroring
– Table replication
Upgrade Order
1. MapR core
2. Ecosystem components
• Upgraded manually, unless using MapR Installer
3. MapR clients
4. Enable new features
Upgrade MapR Core
Component Includes
MapReduce binaries
MapR Core
Webserver
maprcli command binaries, MCS, REST API
Other services
New features, performance enhancements (varies by release)
Upgrade MapR Core: Config Files
New default configuration files created:
Active Configuration Files           New Configuration Files
(do not change during upgrade)       (added with upgrade)
/opt/mapr/conf                       /opt/mapr/conf.new
/opt/mapr/conf/conf.d                /opt/mapr/conf.d.new
/opt/mapr/hadoop/hadoop-<ver>/conf   /opt/mapr/hadoop/hadoop-<ver>/conf.new
Important! Merge
required changes into
active configuration files
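Before merging, it helps to see exactly what differs between an active file and its new default. The sketch below uses Python's standard difflib module on two in-memory strings. The file name and the settings are hypothetical; in practice the text would be read from the active and .new directories.

```python
import difflib

# Hypothetical contents of an active file and the new default shipped with the upgrade.
active_text = """example.heap.percent=35
custom.setting=on
"""
new_default_text = """example.heap.percent=35
new.feature.enabled=true
"""


def config_diff(active, new_default):
    """Return a unified diff from the active file to the new default file."""
    return list(difflib.unified_diff(
        active.splitlines(),
        new_default.splitlines(),
        fromfile="conf/example.conf",
        tofile="conf.new/example.conf",
        lineterm="",
    ))


diff = config_diff(active_text, new_default_text)
print("\n".join(diff))
```

Lines starting with + are candidates to merge into the active file; lines starting with - are local customizations that the new defaults do not contain and that should normally be kept.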
Upgrade MapR Core: Hadoop Common Version
1. New Hadoop directory created at:
/opt/mapr/hadoop/hadoop-<version>
2. Existing Hadoop directory moved to:
/opt/mapr/hadoop/OLD_HADOOP_VERSIONS
3. Links updated for new version:
/opt/mapr/lib/*.jar
4. Paths in service configuration files updated:
/opt/mapr/conf/conf.d/warden.<service name>.conf
Upgrade MapR Core: Post-Upgrade Tasks
• If upgrading from 5.0 or earlier, copy new license file into place on each
node:
cp /opt/mapr/conf.new/BaseLicense.txt /opt/mapr/conf/
• After a manual (rolling or offline) upgrade, update the Hadoop configuration
file with the new version:
/opt/mapr/conf/hadoop_version
• Resume cross-cluster operations
– Volume mirroring
– Table replication
Upgrade Ecosystem Components
• Follow pre- and post-upgrade
steps in documentation
• As of MapR 5.2, must upgrade
to ecosystem components that
belong to the same MapR
Ecosystem Pack (MEP)
http://maprdocs.mapr.com/home/InteropMatrix/r_MEP_52.html
Upgrade MapR Clients
MapR Client
(Windows, Mac, Linux)
Cluster
hadoop fs -ls /
maprcli volume list
Upgrade MapR POSIX Clients
• Loopback POSIX client
• FUSE-based POSIX client
– FUSE-based new in MapR 5.1
• Recommend: upgrade to
FUSE-based POSIX client
MapR POSIX Client
(Linux only)
Upgrading from MapR 3.x
• To run MapReduce v1 jobs, change the default MapReduce
mode or submit them with the appropriate command
• May need to recompile MapReduce jobs
• May need to add YARN services to cluster
http://maprdocs.mapr.com/home/UpgradeGuide/RunningMRjobsYarn.html
Other Upgrade Considerations
• Mirroring between clusters
– Volumes must be mirrored to a cluster at the same, or higher, revision
– Upgrade the destination cluster first!
– Consider disabling mirror operations during the upgrades, to avoid
alarms and maximize available bandwidth
• Table replication between clusters
– Clusters involved in table replication can be at different versions
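The mirroring rule above reduces to a version comparison: the destination must be at the same or a higher revision than the source. A small sketch, assuming versions are dotted numeric strings of up to three parts (the function names are invented for illustration):

```python
def parse_version(text):
    """Turn a dotted version such as '5.2' into a comparable 3-part tuple."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (3 - len(parts))     # pad so that 5.1 equals 5.1.0
    return tuple(parts)


def mirror_allowed(source_version, destination_version):
    """Mirroring requires the destination at the same or a higher revision."""
    return parse_version(destination_version) >= parse_version(source_version)


# Upgrading the destination first keeps mirroring valid throughout.
print(mirror_allowed("5.1", "5.2"))
# Upgrading the source first would break the rule until the destination catches up.
print(mirror_allowed("5.2", "5.1"))
```

This is why the destination cluster is upgraded first: at every point in the sequence, the destination is never behind the source.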
Q&A
Engage with us!
• Spyglass Initiative
o https://www.mapr.com/products/spyglass-initiative
• Try out MapR Streams and MapR-DB in the free MapR Community
Edition
o https://www.mapr.com/products/hadoop-download
• Try out MapR Streams and MapR-DB in the MapR Sandbox (virtual
machine)
o https://www.mapr.com/products/mapr-sandbox-hadoop

MapR 5.2: Getting More Value from the MapR Converged Community Edition

  • 1.
    © 2016 MapRTechnologies© 2016 MapR Technologies MapR 5.2: Getting More Value from the MapR Converged Community Edition Sep 14, 2016
  • 2.
    © 2016 MapRTechnologies Today’s Presenters Deborah Littlefield Technical Curriculum Developer Ankur Desai Sr. Manager, Platform and Products
  • 3.
    © 2016 MapRTechnologies 3 Today’s Agenda • Recent updates to the MapR Converged Data Platform • Latest Ecosystem Support in MapR 5.2 • How to upgrade to the latest version of the Community Edition • Q&A
  • 4.
    © 2016 MapRTechnologies 4 The MapR Converged Data Platform
  • 5.
    © 2016 MapRTechnologies 5 4 Major Additions to the MapR Platform in the past 12 months • Taking cluster monitoring to the next level with the Spyglass Initiative • Real-time streaming with MapR Streams • MapR-DB JSON document database and application development with OJAI • Securing your data with access control expressions (ACEs)
  • 6.
    © 2016 MapRTechnologies 6© 2016 MapR Technologies Project Spyglass
  • 7.
    © 2016 MapRTechnologies 7 MapR Vision: Maximizing User/Operator Productivity Deep Visibility Another sample Easy Management Full Control
  • 8.
    © 2016 MapRTechnologies 8 The MapR Spyglass Initiative • New approach for increasing user and administrator productivity – Comprehensive, open, extensible • Simplifies the management of growing big data deployments • Starts with 5.2 release – Phase 1 – MapR Monitoring – Initial focus on operational visibility • Helps community innovate faster – Extensive use of open source visualization and dashboarding tools
  • 9.
    © 2016 MapRTechnologies 9 Spyglass Initiative Phase 1 - MapR Monitoring Empower administrators with cluster monitoring capabilities, including metric and log collection from nodes, services, and jobs, with dashboards to display information in a useful way. Converged Customizable Extensible
  • 10.
    © 2016 MapRTechnologies 10 Collection VisualizationAggregation & Storage MapR Monitoring Architecture Future Data Sources Log Shippers Metrics Collectors Alerting Node Environmentals (CPU, Mem, I/O) Service Daemons (YARN, Drill, Hive, etc.) MapR Control System …
  • 11.
    © 2014 MapRTechnologies 11 Project Spyglass – Monitoring All You Care About Node/Infrastructure Monitoring • Global Aggregates (Average, Min, Max) Charts (e.g. CPU, Disk utilization) • Per-node charts (e.g. I/O Throughput by disk) • MFS read/writes and throughput • DB puts, gets, scans and cache metrics
  • 12.
    © 2014 MapRTechnologies 12 Project Spyglass – Monitoring All You Care About Node/Infrastructure Monitoring • Global Aggregates (Average, Min, Max) Charts (e.g. CPU, Disk utilization) • Per-node charts (e.g. I/O Throughput by disk) • MFS read/writes and throughput • DB puts, gets, scans and cache metrics Cluster Space Utilization Monitoring • Cluster wide storage utilization • Storage Utilization Trend • Utilization per volume and per accountable entity (data, volume, snapshot and total size)
  • 13.
    © 2014 MapRTechnologies 13 Project Spyglass – Monitoring All You Care About Node/Infrastructure Monitoring • Global Aggregates (Average, Min, Max) Charts (e.g. CPU, Disk utilization) • Per-node charts (e.g. I/O Throughput by disk) • MFS read/writes and throughput • DB puts, gets, scans and cache metrics Cluster Space Utilization Monitoring • Cluster wide storage utilization • Storage Utilization Trend • Utilization per volume and per accountable entity (data, volume, snapshot and total size) YARN/MR Application Monitoring • Global YARN trend graphs • Containers - Pending, Active • vCores & RAM - Allocated & Used • Per Queue charts - containers, vCores, RAM
  • 14.
    © 2014 MapRTechnologies 14 Project Spyglass – Monitoring All You Care About Node/Infrastructure Monitoring • Global Aggregates (Average, Min, Max) Charts (e.g. CPU, Disk utilization) • Per-node charts (e.g. I/O Throughput by disk) • MFS read/writes and throughput • DB puts, gets, scans and cache metrics Cluster Space Utilization Monitoring • Cluster wide storage utilization • Storage Utilization Trend • Utilization per volume and per accountable entity (data, volume, snapshot and total size) YARN/MR Application Monitoring • Global YARN trend graphs • Containers - Pending, Active • vCores & RAM - Allocated & Used • Per Queue charts - containers, vCores, RAM Service Daemon Monitoring • Per-service charts with for (CPU Usage by type, Memory) • Centralized, searchable logs • MapR core and ecosystem services (includes YARN, Drill and Spark)
  • 15.
    © 2016 MapRTechnologies 15 Customizable Dashboards for Visualizing Metrics Log Analytics
  • 16.
    © 2016 MapRTechnologies 16 Destination to Learn and Collaborate Blog about topics and ideas Share code snippets and dashboards View demos, tutorials, and videos Engage in use case discussion/development
  • 17.
    © 2016 MapRTechnologies 17 Dashboards are defined with JSON and easy to export and import in Grafana and Kibana Extend/Integrate using REST API The Exchange
  • 18.
    © 2016 MapRTechnologies 18 Dashboards can be viewed on mobile devices.
  • 19.
    © 2016 MapRTechnologies 19 Summary ● Data collection and storage infrastructure (packaged and supported) ○ Collection/storage of metrics & logs across node, storage, services ● Visualization dashboard (Driven via community) ○ Sample dashboards for Grafana & Kibana 5.2 - Spyglass 1.0 GA CUSTOMIZABLE, shareable and mobile-ready dashboards CONVERGED monitoring with deep search EXTENSIBLE and easy to integrate with REST API
  • 20.
    © 2016 MapRTechnologies 20© 2016 MapR Technologies MapR Streams
  • 21.
    © 2016 MapRTechnologies 21 MapR Streams: Enabling Continuous Data Processing To enable continuous, globally scalable streaming of event data, allowing developers to create real-time applications that their business can depend on. Converged Continuous Global
  • 22.
    © 2016 MapRTechnologies 22 MapR Streams: Publish-subscribe Event Streaming System for Big Data Producers publish billions of messages/sec to a topic in a stream. Guaranteed, immediate delivery to all consumers. Standard real-time API (Kafka). Integrates with Spark Streaming, Storm, Apex, and Flink Direct data access (OJAI API) from analytics frameworks. To pi c Stream Producers Remote sites and consumers Batch analytics Topic Replication Consumers Consumers Available in the Enterprise Edition Only
  • 23.
    © 2016 MapRTechnologies 23 MapR Streams: Building Faster and Simpler Apps Simpler and Faster Architecture • Converged platform with file storage and database reduces data movement, data latency, hardware cost, and administration cost • Event streaming and stream processing in the same cluster enables faster processing • Unified security framework with files and database tables reduces administration cost around setting up and enforcing security policies • Multi-tenant - topic isolation, quotas, data placement control allows multiple isolated streaming applications to run on the same cluster reducing hardware cost and data movement
  • 24.
    © 2016 MapRTechnologies 24 Scalable. • Ingest more events to enable faster insights • Hold on to events longer to enable deeper insights • Develop app once and apply to short & long-term data (i.e. run analysis on 15-days data AND 1-year data using same application) MapR Streams: Building Faster and Simpler Apps
  • 25.
    © 2016 MapRTechnologies 25© 2016 MapR Technologies MapR-DB JSON document database and application development with OJAI
  • 26.
    © 2016 MapRTechnologies 26 Open Source OJAI API for JSON-Based Applications Open JSON Application Interface (OJAI) Databases Streams MapR-Client File Systems {JSON} MapR-Client
  • 27.
    © 2016 MapRTechnologies 27 Familiar JSON Paradigm – Similar API Constructs MapR-DB Document record = Json.newDocument() .set("firstName", "John") .set("lastName", "Doe") .set("age", 50); table.insert("jdoe", record); MongoDB BasicDBObject doc = new BasicDBObject ("firstName", "John") .append("lastName", "Doe") .append("age", 50); coll.insert(doc);
  • 28.
    © 2016 MapRTechnologies 28 JSON: Easy Variation with Documents { "_id" : "rp-prod132546", "name" : "Marvel T2 Athena”, "brand" : "Pinarello", "category" : "bike", "type" : "Road Bike”, "price" : 2949.99, "size" : "55cm", "wheel_size" : "700c", "frameset" : { "frame" : "Carbon Toryaca", "fork" : "Onda 2V C" }, "groupset" : { "chainset" : "Camp. Athena 50/34", "brake" : "Camp." }, "wheelset" : { "wheels" : "Camp. Zonda", "tyres" : "Vittoria Pro" } } { "_id" : "rp-prod106702", "name" : " Ultegra SPD-SL 6800”, "brand" : "Shimano", "category" : "pedals", "type" : "Components, "price" : 112.99, "features" : [ "Low profile design increases ...", "Supplied with floating SH11 cleats", "Weight: 260g (pair)" ] } { "_id" : "rp-prod113104", "name" : "Bianchi Pride Jersey SS15”, "brand" : "Nalini", "category" : "Jersey", "type" : "Clothing, "price" : 76.99, "features" : [ "100% Polyester", "3/4 hidden zip", "3 rear pocket" ], "color" : "black" } jerseypedalbike
  • 29.
    © 2016 MapRTechnologies 29 Product Catalog - RDBMS To get a single product“Entity Value Attribute” pattern SELECT * FROM ( SELECT ce.sku, ea.attribute_id, ea.attribute_code, CASE ea.backend_type WHEN 'varchar' THEN ce_varchar.value WHEN 'int' THEN ce_int.value WHEN 'text' THEN ce_text.value WHEN 'decimal' THEN ce_decimal.value WHEN 'datetime' THEN ce_datetime.value ELSE ea.backend_type END AS value, ea.is_required AS required FROM catalog_product_entity AS ce LEFT JOIN eav_attribute AS ea ON ce.entity_type_id = ea.entity_type_id LEFT JOIN catalog_product_entity_varchar AS ce_varchar ON ce.entity_id = ce_varchar.entity_id AND ea.attribute_id = ce_varchar.attribute_id AND ea.backend_type = 'varchar' LEFT JOIN catalog_product_entity_text AS ce_text ON ce.entity_id = ce_text.entity_id AND ea.attribute_id = ce_text.attribute_id AND ea.backend_type = 'text' LEFT JOIN catalog_product_entity_decimal AS ce_decimal ON ce.entity_id = ce_decimal.entity_id AND ea.attribute_id = ce_decimal.attribute_id AND ea.backend_type = 'decimal' LEFT JOIN catalog_product_entity_datetime AS ce_datetime ON ce.entity_id = ce_datetime.entity_id AND ea.attribute_id = ce_datetime.attribute_id AND ea.backend_type = 'datetime' WHERE ce.sku = ‘rp-prod132546’ ) AS tab WHERE tab.value != ’’;
  • 30.
    Product Catalog - NoSQL/Document
    Store the product "as a business object"
    { "_id" : "rp-prod132546", "name" : "Marvel T2 Athena", "brand" : "Pinarello", "category" : "bike", "type" : "Road Bike", "price" : 2949.99, "size" : "55cm", "wheel_size" : "700c", "frameset" : { "frame" : "Carbon Toryaca", "fork" : "Onda 2V C" }, "groupset" : { "chainset" : "Camp. Athena 50/34", "brake" : "Camp." }, "wheelset" : { "wheels" : "Camp. Zonda", "tyres" : "Vittoria Pro" } }
    To get a single product: products.findById("rp-prod132546")
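The contrast between the two catalog slides can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not MapR-DB or OJAI client code: an in-memory dict stands in for the document table, `find_by_id` is a hypothetical helper that mirrors the slide's `products.findById(...)`, and the EAV rows are made-up samples.

```python
# Toy comparison: EAV rows must be reassembled, a document is fetched whole.
# Plain Python only; none of these names are MapR-DB or OJAI APIs.

# EAV layout: one (entity, attribute, value) row per attribute.
eav_rows = [
    ("rp-prod132546", "name", "Marvel T2 Athena"),
    ("rp-prod132546", "brand", "Pinarello"),
    ("rp-prod132546", "price", 2949.99),
    ("rp-prod106702", "name", "Ultegra SPD-SL 6800"),
]


def product_from_eav(rows, sku):
    """Rebuild one product by scanning and pivoting its attribute rows."""
    return {attr: value for entity, attr, value in rows if entity == sku}


# Document layout: the whole product is stored under its _id.
products = {
    "rp-prod132546": {
        "_id": "rp-prod132546",
        "name": "Marvel T2 Athena",
        "brand": "Pinarello",
        "price": 2949.99,
        "frameset": {"frame": "Carbon Toryaca", "fork": "Onda 2V C"},
    }
}


def find_by_id(table, doc_id):
    """Hypothetical stand-in for the slide's products.findById(...)."""
    return table.get(doc_id)


bike_from_eav = product_from_eav(eav_rows, "rp-prod132546")
bike_doc = find_by_id(products, "rp-prod132546")
```

The EAV path has to touch every attribute row and pivot it, while the document path is a single keyed lookup that also returns nested structures such as `frameset` intact.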
  • 31.
    Native JSON Support in MapR-DB
    { "order_num": 5555, "products": [ { "product_id": 348752, "quantity": 1, "unit_price": 149.99, "total_price": 149.99 }, { "product_id": 439322, "quantity": 1, "unit_price": 99.99, "total_price": 99.99 }, { "product_id": 953923, "quantity": 1, "unit_price": 49.99, "total_price": 49.99 } ] }
    Reads/writes at element level
    • Granular disk reads/writes
    • Less network traffic
    • Higher concurrency
    Any new elements added on demand
    • No predefined schemas
    • Easy to store evolving data
    Not all NoSQL databases treat JSON as a native data type.
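As a rough illustration of "reads/writes at element level" and "new elements added on demand", the sketch below updates one nested element and adds a new one without rewriting the rest of the order. It works on an in-memory dict only; `set_path` and its dotted path syntax are hypothetical and are not the OJAI field-path API.

```python
def set_path(doc, path, value):
    """Set one nested element addressed by a dotted path, e.g. 'products.1.quantity'.

    Missing intermediate maps are created on demand; list segments are indexes.
    """
    keys = path.split(".")
    node = doc
    for key in keys[:-1]:
        if isinstance(node, list):
            node = node[int(key)]
        else:
            node = node.setdefault(key, {})
    last = keys[-1]
    if isinstance(node, list):
        node[int(last)] = value
    else:
        node[last] = value


order = {
    "order_num": 5555,
    "products": [
        {"product_id": 348752, "quantity": 1, "unit_price": 149.99},
        {"product_id": 439322, "quantity": 1, "unit_price": 99.99},
    ],
}

# Change a single element inside the second array entry.
set_path(order, "products.1.quantity", 2)

# Add an element that no schema declared in advance (hypothetical field).
set_path(order, "shipping.method", "ground")
```

Only the addressed element changes; the other products and `order_num` are untouched, which is the property the slide attributes to native JSON storage.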
  • 32.
    Leverage the Column Family Construct (Optional)
    / {a: {a1: {b1: "v1", b2: [ {c1: "v1", c2: "v2"} ] }, a2: { e1: "v1", e2: <inline jpg> } } }
    Column Family 1, Column Family 2
    • Control layout for faster data access
    • Different TTL requirements
    • Separate table replication settings
    • Specific data placement policies
    • Efficient ACEs
  • 33.
    Fine Grained Security for JSON Documents
    { "fname": "John", "lname": "Doe", "address": "111 Main St.", "city": "San Jose", "state": "CA", "zip": "95134", "credit_cards": [ {"issuer": "Visa", "number": "4444555566667777"}, {"issuer": "MasterCard", "number": "5555666677778888"} ] }
    Permissions can be set on:
    • Entire document
    • Element: "fname"
    • Array: "credit_cards"
    • Sub-element in array element: "credit_cards[*].number"
    Specify different permission levels within the document.
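One way to picture per-element permissions is a filter that drops the parts of a document a reader is not entitled to see. The sketch below is an assumption-laden toy, not how MapR-DB enforces ACEs: `redact`, the rule table, and the reader names `billing`, `support`, and `guest` are all hypothetical, and only the two path shapes from the slide (a top-level element and `array[*].sub_element`) are handled.

```python
import copy


def redact(doc, reader, read_rules):
    """Return a copy of doc without the elements that reader may not read.

    read_rules maps a path to the set of readers allowed to see it.
    Paths without a rule are treated as readable by everyone.
    """
    out = copy.deepcopy(doc)
    for path, allowed in read_rules.items():
        if reader in allowed:
            continue
        if "[*]." in path:
            # Sub-element inside every entry of an array.
            array_field, sub_field = path.split("[*].")
            for element in out.get(array_field, []):
                element.pop(sub_field, None)
        else:
            # Plain top-level element.
            out.pop(path, None)
    return out


customer = {
    "fname": "John",
    "address": "111 Main St.",
    "credit_cards": [
        {"issuer": "Visa", "number": "4444555566667777"},
        {"issuer": "MasterCard", "number": "5555666677778888"},
    ],
}

# Hypothetical rules: who may read which element.
rules = {
    "address": {"billing", "support"},
    "credit_cards[*].number": {"billing"},
}

support_view = redact(customer, "support", rules)
guest_view = redact(customer, "guest", rules)
```

Here `support` still sees the address and card issuers but no card numbers, while `guest` sees neither the address nor the numbers.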
  • 34.
    Comprehensive Data Type Support for MapR-DB
    • NULL
    • Boolean
    • String
    • Map
    • Array
    • Float, Double
    • Binary
    • Byte, Short, Int, Long
    • Date
    • Decimal
    • Interval
    • Time
    • Timestamp
    Examples:
    { "sample_int": {"$numberLong": 2147483647}, "sample_date": {"$dateDay": "2016-02-22"}, "sample_decimal": {"$decimal": "1234567890.23456789"}, "sample_time": {"$time": "10:26:12.487"}, "sample_timestamp": {"$date": "2016-02-22T10:26:12.487+Z"} }
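The tagged values in the example can be mapped to native types by a client. The following is a minimal sketch using only the Python standard library; it is not the OJAI decoder, and the `DECODERS` table is an assumption built from the tags shown on the slide. The `$date` timestamp is left out because this sketch makes no assumption about its exact string format.

```python
import json
from datetime import date, time
from decimal import Decimal

# Tag -> converter, taken from the tags that appear in the slide's example.
DECODERS = {
    "$numberLong": int,
    "$dateDay": date.fromisoformat,
    "$decimal": Decimal,
    "$time": time.fromisoformat,
}


def decode_tagged(obj):
    """json object_hook: turn {"$tag": raw} into a native Python value."""
    if len(obj) == 1:
        (tag, raw), = obj.items()
        if tag in DECODERS:
            return DECODERS[tag](raw)
    return obj


text = """{
  "sample_int": {"$numberLong": 2147483647},
  "sample_date": {"$dateDay": "2016-02-22"},
  "sample_decimal": {"$decimal": "1234567890.23456789"},
  "sample_time": {"$time": "10:26:12.487"}
}"""

doc = json.loads(text, object_hook=decode_tagged)
```

Using `Decimal` rather than `float` keeps all digits of `sample_decimal`, which is the point of having a dedicated decimal type.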
  • 35.
    Data Security with Access Control Expressions
  • 36.
    File ACEs – Key Features
    • Intuitive Inheritance: subdirectories and files inherit permissions from the parent directory
    • Whole-Volume ACEs: volume-level filter, useful in multitenant environments
    • Roles: arbitrary grouping of users according to your business needs
    • High Performance: no performance hit
    • Boolean Operators: allow for ultra-fine-grained permissions
    AUTHORIZATION
  • 37.
    File ACEs: Whole Volume ACE Example
    Whole-Volume ACE: r: group:finance
    File: /finance/final_report.csv, with file ACE r: user:bob
    Jane (Finance) grants read access to Bob (Developer).
    Bob cannot read the file /finance/final_report.csv because the whole-volume ACE allows read access to the finance group only.
    AUTHORIZATION
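The whole-volume example boils down to two checks that must both pass. The sketch below reduces each ACE to a plain set for illustration; `can_read` and the group name `developers` are hypothetical, and real ACEs are boolean expressions rather than sets.

```python
def can_read(user, groups, volume_groups, file_users):
    """Read is granted only if the whole-volume filter AND the file ACE allow it."""
    passes_volume = bool(groups & volume_groups)  # whole-volume ACE: r: group:finance
    passes_file = user in file_users              # file ACE: r: user:bob
    return passes_volume and passes_file


volume_groups = {"finance"}
file_users = {"bob"}

# Bob is named in the file ACE but is not in the finance group.
bob_as_developer = can_read("bob", {"developers"}, volume_groups, file_users)

# If Bob were also a member of finance, both checks would pass.
bob_in_finance = can_read("bob", {"developers", "finance"}, volume_groups, file_users)
```

The whole-volume ACE acts as an outer filter: a file-level grant cannot widen access beyond it.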
  • 38.
    POSIX ACLs vs ACEs
    Access Control Lists vs MapR Access Control Expressions
    ACE example: r : user:sally | (group:dev_team & group:managers)
    Which one is easier to set and understand? Which one allows for higher granularity?
    AUTHORIZATION
  • 39.
    MapR Has ACEs for Files and MapR-DB Records
    Use Access Control Expressions (ACEs) to set granular permissions.
    Example: user:mary | (group:admins & group:VP) & user:!bob
    Permissions on files, tables, column families, columns, JSON documents and sub-documents
    AUTHORIZATION
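To show how such boolean expressions combine, here is a small evaluator for the notation used on these slides. It is a toy written for this transcript, not MapR's ACE parser. Two assumptions are baked in: `&` binds tighter than `|`, and negation may be written either as a leading `!` or in the slide's `user:!bob` form. The function name `evaluate_ace` is hypothetical.

```python
import re


def evaluate_ace(expr, user, groups):
    """Evaluate a slide-style ACE such as 'user:sally | (group:a & group:b)'."""
    tokens = re.findall(r"[()&|!]|(?:user|group):!?\w+", expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():
        value = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()  # always parse the operand, then combine
            value = value or rhs
        return value

    def parse_and():
        value = parse_term()
        while peek() == "&":
            take()
            rhs = parse_term()
            value = value and rhs
        return value

    def parse_term():
        if peek() == "!":
            take()
            return not parse_term()
        if peek() == "(":
            take()
            value = parse_or()
            take()  # consume the closing ")"
            return value
        kind, name = take().split(":")
        negated = name.startswith("!")
        name = name.lstrip("!")
        matched = (name == user) if kind == "user" else (name in groups)
        return matched != negated

    return parse_or()


ACE_38 = "user:sally | (group:dev_team & group:managers)"
ACE_39 = "user:mary | (group:admins & group:VP) & user:!bob"
```

Under these assumptions, `ACE_38` admits sally or anyone who is in both `dev_team` and `managers`, and `ACE_39` admits mary or any member of both `admins` and `VP` other than bob.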
  • 40.
    Ecosystem Updates
  • 41.
    5.2 Ecosystem Support
    Component | Released with 5.1 (Eco on 5.1 today) | Subsequently released for 5.1 (Eco on 5.1 today) | MEP 1.0 on 5.2
    Drill | 1.4 | 1.6 | 1.6
    Spark | 1.5.2 | 1.6.1 | 1.6.1 (2.0 in dev preview)
    Impala | 2.2.0 | 2.5 | 2.5
    Storm | 0.10.0 | 0.10.1 | 0.10.1
    Mahout | 0.11.2 | 0.12.2 | 0.12.2
    These are the only component version changes in MEP 1.0 as of the 5.2 release date, and all of them had already been released for 5.1.
  • 42.
    Converging SQL and JSON with Apache Drill 1.6
    • Flexible and operational analytics on NoSQL
    – MapR-DB plugin allows analysts to perform SQL queries directly on JSON data in MapR-DB tables
    – Pushdown capabilities provide optimal interactive experience
    • Enhanced query performance
    – Provides better query performance via partition pruning, metadata caching and other optimizations
    – Delivers up to 10-60X performance gains in query planning compared to previous releases of Drill
    • Better memory management
    – Delivers greater stability and scale, which enables customers to run not only larger but also more SQL workloads on a MapR cluster
    • Improved integration with visualization tools like Tableau
    – Introduces client impersonation for end-to-end security from the visualization tool to data in Hadoop
    – Enhanced SQL window functions
  • 43.
    What’s New in Spark 2.0?
    • Structured Streaming with Spark SQL
    – The ability to perform interactive queries against live streaming data.
    – Output can now be aggregated in a stream for continuous applications.
    – Pre-computation of analytics in a continuous fashion can occur as the data is generated.
    • Whole-Stage Code Generation
    – Provided by the second-generation Tungsten engine.
    – Eliminates the need for multiple JVM calls by flattening SQL queries into a single function evaluated as bytecode at runtime.
    • DataFrame APIs
    – Run on the same engine as Spark SQL.
    – Allow access to data from a variety of different data sources.
    – Can run database-like operations or allow for passing in custom code.
  • 44.
    Upgrade to the Latest MapR Converged Community Edition
  • 45.
    Select an Upgrade Method
    Methods (diagram axes: Time, Complexity): Offline Installer, Offline Manual, Rolling Manual, Rolling Scripted
    • Rolling: takes advantage of high-availability features
    • Offline: cluster offline during upgrade
  • 46.
    Community Edition and Rolling Upgrades
    • Expect interruptions to cluster operations when nodes running the only copy of a service (for example, CLDB) are upgraded
    • Minimize cluster access
    • With 10 or fewer nodes, offline upgrade probably makes the most sense
    Methods: Offline Installer, Offline Manual, Rolling Manual, Rolling Scripted
  • 47.
    Supported Upgrade Methods
    Table columns: From Version | Offline Installer | Offline Manual | Rolling Manual | Rolling Scripted
    Table rows (From Version): 3.x, 4.0, 4.1, 5.0, 5.1
    * Supported for clusters that were installed using the MapR Installer. This is the only method that also upgrades ecosystem components.
  • 48.
    High-Level Overview
    1. Plan!
    2. Prepare
    3. Upgrade
  • 49.
    Plan: Determine What to Include
    • MapR Core
    • Ecosystem components not at supported MEP
    • MapR clients
    • New features
  • 50.
    Plan: Develop a Test Plan
    • Run tests before and after each upgrade step
    – Compare results
    • Test basic functionality
    – Verify cluster access and volumes
    – Use maprcli, hadoop fs, MCS
    • Test jobs and queries
    – Based on the components you use
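The "run tests before and after, compare results" step can be as simple as recording named check results and diffing them. The sketch below is only an illustration: the check names and values are hypothetical, and how the results are collected (for example from maprcli or hadoop fs output) is left out.

```python
def compare_runs(before, after):
    """Return checks whose result changed or went missing after an upgrade step."""
    changed = {}
    for name, expected in before.items():
        actual = after.get(name, "<missing>")
        if actual != expected:
            changed[name] = (expected, actual)
    return changed


# Hypothetical results recorded before the upgrade step.
before = {
    "volume_count": 12,
    "hadoop_fs_ls_root": "ok",
    "sample_query_rows": 1500,
}

# Hypothetical results recorded after the upgrade step.
after = {
    "volume_count": 12,
    "hadoop_fs_ls_root": "ok",
    "sample_query_rows": 1498,
}

regressions = compare_runs(before, after)
```

An empty result means the step did not change any recorded check; anything else is worth investigating before moving to the next step.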
  • 51.
    Plan: Create an Upgrade Schedule
    • What can be done weeks ahead?
    • What can be done days ahead?
    • What needs to happen the day of the upgrade?
    • What needs to happen after the upgrade?
  • 52.
    Prepare: Weeks Ahead (Critical!)
    • Review Release Notes
    • Verify node specifications
    – Update the JDK if needed
    • Upgrade on a test cluster
    – Document surprises
    – Prepare configuration files
  • 53.
    Prepare: Days Ahead
    • Download the installer, packages, etc.
    • Run tests and record results
    • Back up critical data
  • 54.
    Prepare: Day of Upgrade
    • Verify cluster health and clear alarms
    • Empty job queue/terminate jobs
    • Stop cross-cluster operations
    – Volume mirroring
    – Table replication
  • 55.
    Upgrade Order
    1. MapR core
    2. Ecosystem components
    • Upgraded manually, unless using MapR Installer
    3. MapR clients
    4. Enable new features
  • 56.
    Upgrade MapR Core
    Component | Includes
    MapR Core | MapReduce binaries
    Webserver | maprcli command binaries, MCS, REST API
    Other services | New features, performance enhancements (varies by release)
  • 58.
    Upgrade MapR Core: Config Files
    New default configuration files created:
    Active Configuration Files (do not change during upgrade) | New Configuration Files (added with upgrade)
    /opt/mapr/conf | /opt/mapr/conf.new
    /opt/mapr/conf/conf.d | /opt/mapr/conf.d.new
    /opt/mapr/hadoop/hadoop-<ver>/conf | /opt/mapr/hadoop/hadoop-<ver>/conf.new
    Important! Merge required changes into active configuration files
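Before merging, it helps to see exactly how a new default file differs from the active one. The sketch below uses Python's standard `difflib` on two in-memory strings; the file name `example.conf` and the settings in it are hypothetical, and reading the real files under the directories listed above is left to the reader.

```python
import difflib


def config_diff(active_text, new_text, name):
    """Unified diff from the active config to the new default config."""
    return list(
        difflib.unified_diff(
            active_text.splitlines(),
            new_text.splitlines(),
            fromfile=f"conf/{name}",
            tofile=f"conf.new/{name}",
            lineterm="",
        )
    )


# Hypothetical contents: the active file carries a local customization.
active = "log.level=INFO\nheap.size=8g"
new_default = "log.level=INFO\nheap.size=4g\nnew.feature.enabled=false"

diff = config_diff(active, new_default, "example.conf")

# Lines only in the new defaults (candidates to merge into the active file).
added = [line for line in diff if line.startswith("+") and not line.startswith("+++")]
# Lines only in the active file (local settings to keep).
removed = [line for line in diff if line.startswith("-") and not line.startswith("---")]
```

In this made-up case the new setting `new.feature.enabled` is a candidate to merge, while the customized `heap.size=8g` is a local value that should not be overwritten blindly by the new default.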
  • 59.
    Upgrade MapR Core: Hadoop Common Version
    1. New Hadoop directory created at: /opt/mapr/hadoop/hadoop-<version>
    2. Existing Hadoop directory moved to: /opt/mapr/hadoop/OLD_HADOOP_VERSIONS
    3. Links updated for new version: /opt/mapr/lib/*.jar
    4. Paths in service configuration files updated: /opt/mapr/conf/conf.d/warden.<service name>.conf
  • 60.
    Upgrade MapR Core: Post-Upgrade Tasks
    • If upgrading from 5.0 or earlier, copy the new license file into place on each node: cp /opt/mapr/conf.new/BaseLicense.txt /opt/mapr/conf/
    • After a manual (rolling or offline) upgrade, update the Hadoop configuration file with the new version: /opt/mapr/conf/hadoop_version
    • Resume cross-cluster operations
    – Volume mirroring
    – Table replication
  • 61.
    Upgrade Ecosystem Components
    • Follow pre- and post-upgrade steps in documentation
    • As of MapR 5.2, you must upgrade to ecosystem components that belong to the same MapR Ecosystem Pack (MEP)
    http://maprdocs.mapr.com/home/InteropMatrix/r_MEP_52.html
  • 62.
    Upgrade MapR Clients
    MapR Client (Windows, Mac, Linux) connects to the cluster and runs commands such as:
    hadoop fs -ls /
    maprcli volume list
  • 63.
    Upgrade MapR POSIX Clients
    MapR POSIX Client (Linux only)
    • Loopback POSIX client
    • FUSE-based POSIX client
    – FUSE-based new in MapR 5.1
    • Recommend: upgrade to FUSE-based POSIX client
  • 64.
    Upgrading from MapR 3.x
    • To run MapReduce v1 jobs, change the default MapReduce mode or submit them with the appropriate command
    • May need to recompile MapReduce jobs
    • May need to add YARN services to cluster
    http://maprdocs.mapr.com/home/UpgradeGuide/RunningMRjobsYarn.html
  • 65.
    Other Upgrade Considerations
    • Mirroring between clusters
    – Volumes must be mirrored to a cluster at the same, or higher, revision
    – Upgrade the destination cluster first!
    – Consider disabling mirror operations during the upgrades, to avoid alarms and maximize available bandwidth
    • Table replication between clusters
    – Clusters involved in table replication can be at different versions
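The mirroring rule (destination at the same or a higher revision) is easy to check up front. The helper below is a hypothetical pre-flight check, not a MapR tool; it assumes plain dotted numeric versions with the same number of parts, such as "5.1" and "5.2".

```python
def parse_version(text):
    """Turn a dotted version such as '5.2' into a comparable tuple (5, 2)."""
    return tuple(int(part) for part in text.split("."))


def mirror_allowed(source_version, destination_version):
    """Mirroring needs the destination at the same or a higher revision."""
    return parse_version(destination_version) >= parse_version(source_version)


# Destination upgraded first: source still on 5.1, destination on 5.2.
destination_first = mirror_allowed("5.1", "5.2")

# Source upgraded first: the destination is now behind the source.
source_first = mirror_allowed("5.2", "5.1")
```

This is why the slide says to upgrade the destination cluster first: at every point in that order the destination is at least as new as the source.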
  • 66.
    Q&A
    Engage with us!
    • Spyglass Initiative
    o https://www.mapr.com/products/spyglass-initiative
    • Try out MapR Streams and MapR-DB in the free MapR Community Edition
    o https://www.mapr.com/products/hadoop-download
    • Try out MapR Streams and MapR-DB in the MapR Sandbox (virtual machine)
    o https://www.mapr.com/products/mapr-sandbox-hadoop