The document discusses different NoSQL databases and how Cassandra compares to them. It notes that Cassandra uses a Dynamo-inspired architecture with Bigtable-style columns. Cassandra provides better write performance than MySQL through its use of consistent hashing and replication across multiple data centers for high availability. It also offers better read performance than MySQL for large datasets through its use of column-oriented storage.
We have been offering many internet services and smartphone applications in Japan for over 20 years, and our services have used Cassandra since 2010. In this presentation, I will explain some issues we faced with Cassandra, how we solved them, and our next-generation infrastructure for Cassandra.
About the Speaker
Satoshi Konno, Technical Manager, Yahoo Japan Corporation
Satoshi Konno is a software engineer with 20 years of experience. He has worked at Yahoo Japan as a programmer for 10 years, the past 4 of them in the NoSQL team. He is currently enrolled in a computer science doctoral course, studying distributed computing.
How you can contribute to Apache Cassandra - Yuki Morishita
Yuki Morishita discusses how to contribute to the Apache Cassandra project, including submitting code patches as a programmer or contributing in other ways such as reporting bugs, testing patches, sharing use cases, and helping others on mailing lists and IRC channels. Programmers are instructed on tools, coding style, testing using ccm and cassandra-dtest, and submitting patches via JIRA. Non-programmers are encouraged to report bugs, test patches, blog/tweet experiences, and assist others on forums.
Cassandra Meetup in Tokyo, Fall 2014
http://datastaxjp.connpass.com/event/9867/
Yahoo! JAPAN's work with Apache Cassandra
Some might think Docker is for developers only, but this is not really the case. Docker is here to stay, and we will only see more of it in the future.
In this session, learn what Docker is and how it works. The session covers core areas such as volumes, and then steps up to a few tips and tricks to help you get the most out of your Docker environment. It also dives into a few examples of how to create a database environment within just a few minutes: perfect for testing, development, and possibly even production systems.
Machine Learning explained with Examples
Everybody is talking about machine learning. What is it actually and how can I use it?
In this presentation we will see some examples of solving real-life use cases using machine learning. We will define tasks and see how each task can be addressed using machine learning.
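SQL Server 2017 added Linux support and, by extension, made it possible to run on Docker and to build high-availability configurations with Kubernetes. SQL Server 2019, whose release is imminent, is slated to offer a Big Data Cluster feature built on Kubernetes, widening the range of uses for containers even further.
This session covers the basics you need to start working with SQL Server containers, along with steps and samples for trying them out yourself.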
6. Services offered
Media
US
Search Video Answer Mail
JP
US
JP
Membership C2C Payment C2C EC B2C EC Local
Search Knowledge search Mail News
YAHUOKU! Premium Wallet Loco
19. Cassandra process goes down when too many clients connect to it
[Diagram: several client machines, each running Apache child processes]
20-23. Cassandra process goes down when too many clients connect to it
• 200 client machines * 128 Apache child processes * 2 (request + heartbeat) = 51,200 connections / node
24. Cassandra process goes down when too many clients connect to it
[Diagram: several client machines, each running Apache child processes]
51,200 connections > 32,768 (maximum number of open files)
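The arithmetic on slides 20-24 can be checked with a few lines of Python. This is only a sketch: the figures come from the slides, while the constant and function names are made up for illustration.

```python
# Figures taken from the slides; all names here are illustrative.
CLIENT_MACHINES = 200
APACHE_CHILDREN_PER_MACHINE = 128
CONNECTIONS_PER_CHILD = 2   # one for requests, one for heartbeats
MAX_OPEN_FILES = 32_768     # open-file limit quoted on slide 24


def connections_per_node(machines: int, children: int, per_child: int) -> int:
    """Connections a single Cassandra node sees if every child connects to it."""
    return machines * children * per_child


total = connections_per_node(
    CLIENT_MACHINES, APACHE_CHILDREN_PER_MACHINE, CONNECTIONS_PER_CHILD
)
exceeds_limit = total > MAX_OPEN_FILES
print(f"{total:,} connections per node; exceeds limit: {exceeds_limit}")
```

Each connection consumes a file descriptor, so once the total passes the open-file limit the node can no longer accept connections or open files.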
42. Remark
• This is a summary of the following tickets:
– https://issues.apache.org/jira/browse/CASSANDRA-11206
– https://issues.apache.org/jira/browse/CASSANDRA-9738
44. High level: read path
[Diagram: Row Cache, Key Cache, SSTables, Memtable]
1. Check the row cache before going to the key cache
2. Check the key cache to get the offsets to the data
3. Find the offsets to the data and retrieve the data
4. Merge data from the SSTables and the memtable
5. Populate the row cache with the newly returned row
http://docs.datastax.com/en/cassandra/3.x/cassandra/dml/dmlAboutReads.html
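The five steps above can be sketched in Python. This is a toy model under stated assumptions, not Cassandra's implementation: the class `ReadPathSketch`, the dict-based caches, and the SSTable layout are all hypothetical, and the merge simply lets later sources overwrite earlier ones instead of reconciling by write timestamp.

```python
class ReadPathSketch:
    """Toy model of the high-level read path; every structure is a plain dict."""

    def __init__(self, sstables, memtable):
        self.row_cache = {}       # partition key -> merged row
        self.key_cache = {}       # (sstable id, partition key) -> data offset
        self.sstables = sstables  # each: {"index": {key: offset}, "data": {offset: row}}
        self.memtable = memtable  # partition key -> row

    def read(self, key):
        # 1. Check the row cache before going to the key cache.
        if key in self.row_cache:
            return self.row_cache[key]

        merged = {}
        for sstable_id, sstable in enumerate(self.sstables):
            # 2. Check the key cache to get the offset to the data.
            offset = self.key_cache.get((sstable_id, key))
            if offset is None:
                # 3. Otherwise find the offset through the SSTable's index.
                offset = sstable["index"].get(key)
                if offset is None:
                    continue  # this SSTable does not hold the partition
                self.key_cache[(sstable_id, key)] = offset
            merged.update(sstable["data"][offset])

        # 4. Merge data from the SSTables and the memtable
        #    (simplification: the memtable always wins).
        merged.update(self.memtable.get(key, {}))

        # 5. Populate the row cache with the newly returned row.
        if merged:
            self.row_cache[key] = merged
        return merged


sstables = [
    {"index": {"k": 0}, "data": {0: {"a": 1}}},
    {"index": {"k": 7}, "data": {7: {"b": 2}}},
]
reader = ReadPathSketch(sstables, memtable={"k": {"a": 9}})
print(reader.read("k"))
```

A second `read("k")` is served from the row cache in step 1 without touching the key cache or the SSTables.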
45. Pattern 1. The row is in row cache
[Diagram: Memtable, Row Cache, Key Cache, Bloom Filter, Partition Summary, Compression Offsets, Partition Index, and Data, arranged across Heap, Off Heap, and Disk]
1. Read request
2. Return the row if it is in the row cache
46. Pattern 2. The key is in key cache
[Diagram: Memtable, Row Cache, Key Cache, Bloom Filter, Partition Summary, Compression Offsets, Partition Index, and Data, arranged across Heap, Off Heap, and Disk]
1. Read request
2. Check the bloom filters
3. Check whether the partition key is in the key cache
4. Find the offset to the result set
5. Access the result set
47. Pattern 3. The key is not cached
[Diagram: Memtable, Row Cache, Key Cache, Bloom Filter, Partition Summary, Compression Offsets, Partition Index, and Data, arranged across Heap, Off Heap, and Disk]
1. Read request
2. Row cache miss -> Check the bloom filters
3. Check whether the partition key is in the key cache
4. Key cache miss -> Binary search for the nearby location in the index
5. Disk scan to find the offsets
6. Find the offset into the result set
7. Access the result set
8. Update the key cache
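The uncached lookup (steps 2 to 8) can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: `SSTableIndexSketch` is a hypothetical class, the bloom filter is modelled as an exact set (a real one can return false positives), the partition index is an in-memory sorted list rather than a file on disk, and keys are ordered as plain strings rather than by token.

```python
import bisect


class SSTableIndexSketch:
    """Toy lookup: bloom filter -> key cache -> summary search -> index scan."""

    def __init__(self, entries, sample_every=4):
        # Stand-in for the on-disk partition index: sorted (key, data offset) pairs.
        self.index = sorted(entries)
        # Stand-in for the bloom filter (exact membership, no false positives).
        self.bloom = {key for key, _ in self.index}
        # Partition summary: every Nth key plus its position in the index.
        self.summary_keys = [key for key, _ in self.index[::sample_every]]
        self.summary_pos = list(range(0, len(self.index), sample_every))
        self.key_cache = {}

    def lookup(self, key):
        # Step 2: check the bloom filter.
        if key not in self.bloom:
            return None
        # Step 3: check the key cache.
        if key in self.key_cache:
            return self.key_cache[key]
        # Step 4: binary search the summary for the last sampled key <= key.
        i = bisect.bisect_right(self.summary_keys, key) - 1
        if i < 0:
            return None
        # Steps 5-6: scan the index forward from that position.
        for candidate, offset in self.index[self.summary_pos[i]:]:
            if candidate == key:
                # Step 8: update the key cache for the next read.
                self.key_cache[key] = offset
                return offset
            if candidate > key:
                break
        return None


idx = SSTableIndexSketch(
    [("a", 0), ("b", 10), ("c", 20), ("d", 30), ("e", 40), ("f", 50)]
)
print(idx.lookup("f"), idx.key_cache)
```

The summary keeps the scan short: only the index entries between two sampled keys are read, at the cost of holding the samples in memory.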
49. Partition Index Recap
• http://distributeddatastore.blogspot.jp/2013/08/cassandra-sstable-storage-format.html
50. RowIndexEntry
• Partition size < 64 KB
– RowIndexEntry
• Position
• Serialized size of data
• Partition size > 64 KB
– IndexedEntry
• Position
• Serialized size of data
• IndexInfo[]
– Serialize method
– Offset
– Width
– Etc.
Approximation for 16-byte values (partition size : index size / number of objects)
1 MB : 3 KB / > 200 objects
4 MB : 11 KB / > 800 objects
64 MB : 180 KB / > 13k objects
512 MB : 1.4 MB / > 106k objects
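The approximation above is consistent with one IndexInfo entry per 64 KB of partition data. The Python sketch below reproduces the figures under that assumption. The per-entry constants (13 objects and 180 bytes) are not stated on the slide; they are back-fitted from its four data points and should be read as rough guesses, not as values taken from the Cassandra source.

```python
KIB = 1024
MIB = 1024 * KIB

INDEX_BLOCK_BYTES = 64 * KIB   # threshold shown on the slide
OBJECTS_PER_INDEX_INFO = 13    # ASSUMPTION: back-fitted from the slide's figures
BYTES_PER_INDEX_INFO = 180     # ASSUMPTION: back-fitted from the slide's figures


def index_info_entries(partition_bytes: int) -> int:
    """Number of IndexInfo entries, assuming one per 64 KB block.

    The slide does not say what happens at exactly 64 KB; this sketch
    treats such a partition as unindexed (plain RowIndexEntry).
    """
    if partition_bytes <= INDEX_BLOCK_BYTES:
        return 0
    return partition_bytes // INDEX_BLOCK_BYTES


def estimate(partition_bytes: int) -> tuple[int, int, int]:
    """Return (entries, approximate heap objects, approximate index bytes)."""
    entries = index_info_entries(partition_bytes)
    return (
        entries,
        entries * OBJECTS_PER_INDEX_INFO,
        entries * BYTES_PER_INDEX_INFO,
    )


for size_mib in (1, 4, 64, 512):
    entries, objects, index_bytes = estimate(size_mib * MIB)
    print(f"{size_mib} MB: {entries} entries, ~{objects} objects, ~{index_bytes / KIB:.1f} KB")
```

The point of the table is that the object count grows linearly with partition size, so a single wide partition can put tens of thousands of short-lived objects on the heap per read.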
51. Pattern 3. The key is not cached
[Diagram: Memtable, Row Cache, Key Cache, Bloom Filter, Partition Summary, Compression Offsets, Partition Index, and Data, arranged across Heap, Off Heap, and Disk]
1. Read request
2. Row cache miss -> Check the bloom filters
3. Check whether the partition key is in the key cache
4. Key cache miss -> Binary search for the nearby location in the index
5. Disk scan to find the offsets
6. Find the offset into the result set
7. Access the result set
8. Update the key cache
9. GC, GC, GC…
52. Current solution
• If partition size < column_index_cache_size_in_kb (configurable)
– IndexedEntry is kept on heap
• Otherwise
– Always read from disk when needed
• https://issues.apache.org/jira/browse/CASSANDRA-11206
• https://www.youtube.com/watch?v=qa84vABqftM
53. Other possible solutions
• IndexInfo is never kept on heap
– Read from disk when needed
– Degrades performance when a small partition is read
54. Other possible solutions
• Migrate key cache to be fully off heap
– https://issues.apache.org/jira/browse/CASSANDRA-9738
– Serialization and deserialization cost a lot when a large partition is read
• Will Birch help us solve this problem?
– https://issues.apache.org/jira/browse/CASSANDRA-9754
55. What we go for
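• Let's work hard so that we are invited to NGCC again next year!
How? By contributing to the Cassandra community
1. Become the company that uses Apache Cassandra the most in Japan.
2. Keep improving Cassandra's code and raising issues.
3. Get to know the people in the Cassandra community.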
60. Appendix: NGCC videos
• Next-Gen Schema
– https://www.youtube.com/watch?v=eAWRj0kqpvU
• Change Data Capture
– https://www.youtube.com/watch?v=Y0fOxa3tC98
• Explicit support for time series data
– https://www.youtube.com/watch?v=CmsQNNdDuSA
• Automated Repair
– https://www.youtube.com/watch?v=8sGUn6Q2bUU
• Storage format and key cache changes to support large partitions
– https://www.youtube.com/watch?v=qa84vABqftM
• SASI update
– https://www.youtube.com/watch?v=yUFoSAg6rA4
• Instagram’s use cases
– https://www.youtube.com/watch?v=VwhovoqavT4
• Lightning Talks
– https://www.youtube.com/watch?v=6y5UV4OTawg