Andrija Panic™
Cloud Architect
andrija.panic@shapeblue.com
Twitter: @AndrijaRS
Ceph with CloudStack
• Cloud Architect @ ShapeBlue
• From Belgrade, Serbia
• Committer and PMC member
• Involved with CloudStack since version 4.0.0-incubating
• Interested in:
• Cloud infrastructure architecture and engineering.
• Virtualization, Storage and SDxx
• Downtime:
• Father to 2 princesses
• Music, gym and hobby electronics
“The name Ceph comes from cephalopod, a class of molluscs that
includes the octopus and squid… the reasoning had something to
do with their high level of intelligence and “many-tentacled”,
“distributed” physiology.”
Sage Weil
Fun facts:
• Cephalopods have the most complex nervous system of all the invertebrates.
• Some can fly up to 50m through the air, squirting water to help propel themselves.
• Most have special coloured pigments on their skin that are used for camouflage.
• Cephalopods have advanced vision, but most are colour blind.
• They have an ink sac that they squirt into the water to confuse predators.
• Open source SDS solution
• Highly scalable (tens of thousands of nodes)
• No single point of failure
• Hardware agnostic, “runs on commodity hardware”
• Self-managed whenever possible
• Built around the CRUSH algorithm
• Provides multiple access methods:
• File
• Block
• Object (S3/Swift)
• NFS gateway (third-party sw.) for backward compatibility
• The Ceph Storage Cluster (RADOS cluster) is the foundation for all
Ceph deployments.
• Based upon RADOS, it consists of three types of daemons:
• Ceph Object Storage Daemon (OSD)
• Ceph Monitor (MON)
• Ceph Metadata Server (MDS) - optional
• A minimal system will have at least one Ceph Monitor and two Ceph OSD Daemons for data replication.
• A production system will have at least 3 monitors (for redundancy) and a minimum of 10 OSD nodes (i.e. 80+ OSDs)
Ceph Storage Cluster (RADOS cluster)
• OSD and MON are mandatory for every cluster
• MDS is required only if using Ceph FS
OSDs:
• 10s to 10000s in a cluster, one per disk (HDD, SSD, NVME)
• Serve stored objects to clients
• Intelligently peer to perform replication/recovery tasks
MONs:
• Maintain a master copy of the Ceph cluster map,
cluster membership and state
• Provide consensus for distributed decision-making via the Paxos algorithm
• Small, odd number, do not serve objects to clients
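A quick way to see this on a running cluster (illustrative; output differs per cluster):
ceph -s          # overall health, MON quorum, number of OSDs and PGs
ceph osd tree    # OSDs laid out over the CRUSH hierarchy (root/host/osd)
ceph mon stat    # MON membership and quorum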
Preparation
• Make sure the time across all servers is synced to less than 0.05 s of difference!
(don’t worry, Ceph will complain if not synced)
• Make sure that “hostname --fqdn” is resolvable between all nodes
• Make sure key-based ssh auth from admin node to all cluster nodes is working (sudo)
• Add proper release repo on the “admin” node, install “ceph-deploy”
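As a sketch, on a CentOS 7 admin node the preparation could look like this (assumes chrony for time sync and root SSH between nodes):
yum install -y chrony && systemctl enable --now chronyd     # keep clocks within tolerance
hostname --fqdn                                             # verify name resolution on every node
ssh-copy-id root@ceph-node1                                 # repeat for ceph-node2, ceph-node3 and KVM hosts
yum install -y ceph-deploy                                  # on the admin node, after adding the release repo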
Installation (using ceph-deploy from the admin node)
• mkdir mycluster; cd mycluster;
• ceph-deploy new ceph-node1 ceph-node2 ceph-node3 (create the cluster definition)
• ceph-deploy install --release nautilus ceph-node1 ceph-node2 ceph-node3 (install binaries only)
• ceph-deploy mon create-initial (create MONs across initially added Ceph nodes)
• ceph-deploy admin ceph-node1 ceph-node2 ceph-node3 (copy ceph.conf and the needed keyrings)
• for n in 1 2 3; do ceph-deploy osd create --data /dev/sdb ceph-node$n; done (deploy single OSD per node)
Ceph dashboard (optional but recommended)
• yum install -y ceph-mgr-dashboard
• ceph config set mgr mgr/dashboard/ssl false
• ceph mgr module enable dashboard
• ceph dashboard ac-user-create admin password administrator
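To verify the result and find the dashboard URL (assuming the dashboard was enabled as above):
ceph -s             # should report HEALTH_OK, 3 mons, 3 osds
ceph mgr services   # prints the dashboard endpoint, e.g. "dashboard": "http://ceph-node1:8080/"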
Create a pool for CloudStack
• ceph osd pool create cloudstack 64 replicated
• ceph osd pool set cloudstack size 3
• rbd pool init cloudstack
• ceph auth get-or-create client.cloudstack mon 'profile rbd' osd 'profile rbd pool=cloudstack'*
Example key:
[client.cloudstack]
key = AQAb6M9cY1epJBAAZgzlOlpZSpBcUpYCBWTFrA==
Configure write-back caching on KVM nodes (set up ssh/name resolution from the admin node first)
• cat << EOM >> /root/mycluster/ceph.conf
[client]
rbd cache = true
rbd cache writethrough until flush = true
EOM
• ceph-deploy --overwrite-conf admin kvm1 kvm2 kvm3
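A quick sanity check from a KVM node, assuming the client.cloudstack key was saved there as /etc/ceph/ceph.client.cloudstack.keyring:
rbd --id cloudstack ls cloudstack          # list images in the "cloudstack" pool
rbd --id cloudstack create cloudstack/smoketest --size 1G && rbd --id cloudstack rm cloudstack/smoketest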
View/Manage OSDs
Manage basic cluster configs
View/Manage pools
View/Manage RBD images
New in Nautilus (based mostly on SUSE’s openATTIC)
• OSD management (mark as down/out, change OSD settings, recovery profiles)
• Cluster config settings editor
• Ceph Pool management (create/modify/delete)
• Erasure Code Profile (ECP) management
• RBD mirroring configuration
• Embedded Grafana Dashboards (derived from Ceph Metrics)
• CRUSH map viewer
• NFS Ganesha management
• iSCSI target management (via ceph-iscsi)
• RBD QoS configuration
• Ceph Manager (ceph-mgr) module management
• Prometheus alert management
• Support for multiple users / roles; SSO (SAMLv2) for user authentication
(Some) Nautilus improvements:
• pg_num can be reduced; can be auto-tuned in the background (see the example after this list)
• OSD and mon report SMART stats; Failure prediction; Optional automatic migration*
• Mon protocol v2, port 6789 → 3300 (IANA); encryption; dual (v1 and v2) support
• osd_memory_target; NUMA mgmt & OSD pinning; misplaced objects no longer trigger HEALTH_WARN
• S3 tiering policy, bucket versioning
• RBD live image migration (librbd only); rbd-mirror got simpler; rbd top and rbd CLI improvements
• CephFS multi-fs support stable; Clustered nfs-ganesha (active/active)
• Run Ceph clusters in Kubernetes (Rook, ceph-ansible)
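As a sketch of the pg_num auto-tuning mentioned above (Nautilus; pool name "cloudstack" from the earlier example):
ceph mgr module enable pg_autoscaler
ceph osd pool set cloudstack pg_autoscale_mode on   # or "warn" to only get recommendations
ceph osd pool autoscale-status                      # shows current vs. target pg_num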
• Add Ceph to CloudStack
• Create offerings for Ceph
• Deploy a VM
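A sketch of the "Add Ceph to CloudStack" step via CloudMonkey (zone id, monitor IP and secret are placeholders; '/' and '+' in the secret may need URL-escaping; the same can be done in the UI with protocol RBD):
create storagepool scope=zone hypervisor=KVM zoneid=<zone-uuid> name=ceph-rbd tags=ceph url=rbd://cloudstack:<client.cloudstack-key>@10.x.x.y/cloudstack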
Let’s check our ACS volume on Ceph
Volume provisioning steps:
• Copy the template from secondary storage (SS) to Ceph: “0a7cd56c-beb0-11e9-b920-1e00c701074a”
• Create a base snapshot and protect it (so it can’t be deleted): “cloudstack-base-snap”
• Create a VM’s volume as the child (clone) of the snap: “feb056c5-72d4-400a-92c2-f25c64fe9d26”
Find all volumes (children) of a specific template (i.e. of the base snapshot of the template image)
[root@ceph1 ~]# rbd children cloudstack/0a7cd56c-beb0-11e9-b920-1e00c701074a@cloudstack-base-snap
cloudstack/feb056c5-72d4-400a-92c2-f25c64fe9d26
cloudstack/8481fcb1-a91e-4955-a7fc-dd04a44edce5
cloudstack/9b8f978b-74d0-48f7-93f6-5e06b9eb6fd3
cloudstack/3f65da05-268f-41fa-99b2-ce5d4e6d6597
…
Manually reproducing the ACS behavior:
rbd create -p cloudstack mytemplate --size 100GB (or “qemu-img” convert, or “rbd import”…)
rbd snap create cloudstack/mytemplate@cloudstack-base-snap
rbd snap protect cloudstack/mytemplate@cloudstack-base-snap
rbd clone cloudstack/mytemplate@cloudstack-base-snap cloudstack/myVMvolume
…and the cleanup:
[root@ceph1 ~]# rbd rm cloudstack/myVMvolume
Removing image: 100% complete...done.
[root@ceph1 ~]# rbd snap unprotect cloudstack/mytemplate@cloudstack-base-snap
[root@ceph1 ~]# rbd snap rm cloudstack/mytemplate@cloudstack-base-snap
Removing snap: 100% complete...done.
[root@ceph1 ~]# rbd rm cloudstack/mytemplate
Removing image: 100% complete...done.
“Hacking” the customer’s volume:
• rbd map myPool/myImage (kernel client)
(will usually fail due to kernel client “rbd.ko” being way behind the cluster version/capabilities)
• rbd-nbd map myPool/myImage (user-space, via librbd)
(requires “yum install rbd-nbd” and “modprobe nbd max_part=15*”)
• qemu-nbd --connect=/dev/nbd0 rbd:myPool/myImage (user-space, via librbd)
(requires “modprobe nbd*”)
Qemu-img:
• qemu-img info rbd:cloudstack/47b1cfe5-6bab-4506-87b6-d85b77d9b69c*
• qemu-img info rbd:cloudstack/47b1cfe5-6bab-4506-87b6-d85b77d9b69c:mon_host=10.x.x.y:auth_supported=cephx:id=cloudstack:key=AQAFSZ……..jEtr/g==
• No support for a full VM snapshot (technically not possible with Ceph/iSCSI/raw block devices)
• No support for the storage heartbeat file (yet…)
• Currently not possible to really restore a volume from a snapshot (old behaviour stays*)
• Two “external” libraries to be aware of – librbd and rados-java
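To see which versions are actually in use on a KVM host (package names and paths are illustrative and may differ per distro/CloudStack version):
rpm -q librbd1                                       # the librbd that qemu/libvirt link against
ls /usr/share/cloudstack-agent/lib/ | grep -i rados  # rados-java jar bundled with the agent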
Not your average NFS:
• Ceph can be a rather complex storage system to comprehend
• Make sure you know the storage system well before relying on it in production
• Make sure to excel at troubleshooting, you’ll need it sooner or later
• Understand how things work under the hood
• Understand recovery throttling to avoid a high impact on customer IO (see the example below)
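For example, recovery/backfill can be throttled like this (illustrative values; Mimic+ centralized config syntax):
ceph config set osd osd_max_backfills 1            # concurrent backfills per OSD
ceph config set osd osd_recovery_max_active 1      # concurrent recovery ops per OSD
ceph config set osd osd_recovery_sleep_hdd 0.1     # pause (seconds) between recovery ops on HDD OSDs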
• “Works on commodity hardware”, but don’t expect miracles
• Writing data to the primary OSD and replicating that write to another 2 OSDs takes time
• Latency is very good with NVME (0.5-1 ms)
• Not so good with an HDD/SSD mix (10-30 ms)
• Never, ever, ever… use consumer SSDs; bench and test specific enterprise SSD models
• Too many parallel streams end up generating a purely random IO pattern on the backend
• Ceph was (unofficially) considered unsuitable for serious random IO workloads (2-3 years ago)*
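A quick way to sanity-check what the cluster itself can deliver, before involving guests (throw-away image; flags as in Nautilus "rbd bench"):
rbd create cloudstack/benchtest --size 10G
rbd bench --io-type write --io-pattern rand --io-size 4K --io-threads 16 --io-total 1G cloudstack/benchtest
rbd rm cloudstack/benchtest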
Things have changed significantly in the last few years (especially with the new BlueStore backend)
• Writing to the raw device (“block”) vs. XFS on FileStore;
• RocksDB (“block.db”, “block.wal”) vs. LevelDB
• Now suitable for pure SSD/NVME clusters
• Increased throughput 40-300%*, reduced latency 30-50%* vs. FileStore
• Explicit memory management* (BlueStore runs in user-space)
• Data and metadata checksums; Compression
• Reads are still served from the primary OSD only
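To confirm which backend an OSD runs, and optionally enable BlueStore compression on a pool (compression is optional; shown only as an example):
ceph osd metadata 0 | grep osd_objectstore          # "bluestore" or "filestore" (OSD id 0 as an example)
ceph osd pool set cloudstack compression_algorithm snappy
ceph osd pool set cloudstack compression_mode aggressive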
Step-by-step guide for Ceph with CloudStack (Mimic):
• https://www.shapeblue.com/ceph-and-cloudstack-part-1/
• https://www.shapeblue.com/ceph-and-cloudstack-part-2/
• https://www.shapeblue.com/ceph-and-cloudstack-part-3/
CloudStack
ShapeBlue.com • @ShapeBlue
Andrija Panic, Cloud Architect • PMC Apache CloudStack
andrija.panic@shapeblue.com • @AndrijaRS