This document provides an overview and introduction to MongoDB, an open-source, high-performance NoSQL database. It outlines MongoDB's features like document-oriented storage, replication, sharding, and CRUD operations. It also discusses MongoDB's data model, comparisons to relational databases, and common use cases. The document concludes that MongoDB is well-suited for applications like content management, inventory management, game development, social media storage, and sensor data databases due to its flexible schema, distributed deployment, and low latency.
ClickHouse Capacity Planning for OLAP Workloads, by Mik Kocikowski of Cloudflare (Altinity Ltd)
Presented at the ClickHouse Meetup, Dec 3, 2019.
Concrete findings and "best practices" from building a cluster sized for 150 analytic queries per second on 100TB of http logs. Topics covered: hardware, clients (http vs native), partitioning, indexing, SELECT vs INSERT performance, replication, sharding, quotas, and benchmarking.
MongoDB is an open-source, document-oriented database that provides high performance and horizontal scalability. It uses a document model in which data is organized in flexible, JSON-like documents rather than rigidly defined rows and tables. Documents can contain multiple types of nested objects and arrays. MongoDB is best suited for applications that need to store large amounts of unstructured or semi-structured data and benefit from horizontal scalability and high performance.
This document provides an overview and introduction to ClickHouse, an open source column-oriented data warehouse. It discusses installing and running ClickHouse on Linux and Docker, designing tables, loading and querying data, available client libraries, performance tuning techniques like materialized views and compression, and strengths/weaknesses for different use cases. More information resources are also listed.
MongoDB World 2019: The Sights (and Smells) of a Bad Query (MongoDB)
“Why is MongoDB so slow?” you may ask yourself on occasion. You’ve created indexes, you’ve learned how to use the aggregation pipeline. What the heck? Could it be your queries? This talk will outline what tools are at your disposal (both in MongoDB Atlas and in MongoDB server) to identify inefficient queries.
MySQL DB redundancy for 24/7/365 service.
A look at MySQL redundancy (HA) options and the operational concerns encountered while running them.
Contents
1. Why DB redundancy is needed
2. Redundancy options
- HW redundancy
- MySQL Replication redundancy
3. Failures while operating redundancy
4. DNS and VIP
5. Comparison of MySQL redundancy solutions
Audience
- Infrastructure engineers running MySQL services
- Developers interested in MySQL redundancy
ClickHouse Deep Dive, by Aleksei Milovidov (Altinity Ltd)
This document provides an overview of ClickHouse, an open source column-oriented database management system. It discusses ClickHouse's ability to handle high volumes of event data in real-time, its use of the MergeTree storage engine to sort and merge data efficiently, and how it scales through sharding and distributed tables. The document also covers replication using the ReplicatedMergeTree engine to provide high availability and fault tolerance.
This document provides an overview of the architectures and internals of Amazon DocumentDB and MongoDB. It discusses how DocumentDB separates computing and storage layers for improved scalability compared to MongoDB, which couples these layers. It also explains key differences in how each handles data reads/writes, replication, sharding, and other functions. The goal is to help users understand the pros and cons of each for their use cases and needs around performance, scalability and management.
[Pgday.Seoul 2018] DB2PG: Migrating to PostgreSQL from Heterogeneous Databases (PgDay.Seoul)
This document discusses DB2PG, a tool for migrating data between different database management systems. It began as an internal project in 2016 and has expanded its supported migration paths over time. It can now migrate schemas, tables, data types and more between Oracle, SQL Server, DB2, MySQL and other databases. The tool uses Java and supports multi-threaded imports for faster migration. Configuration files allow customizing the data type mappings and queries used during migration. The tool is open source and available on GitHub under the GPL v3 license.
ClickHouse Materialized Views: The Magic Continues (Altinity Ltd)
Slides for the webinar, presented on February 26, 2020
By Robert Hodges, Altinity CEO
Materialized views are the killer feature of ClickHouse, and the Altinity 2019 webinar on how they work was very popular. Join this updated webinar to learn how to use materialized views to speed up queries hundreds of times. We'll cover basic design, last point queries, using TTLs to drop source data, counting unique values, and other useful tricks. Finally, we'll cover recent improvements that make materialized views more useful than ever.
Indexes are references to documents that are efficiently ordered by key and maintained in a tree structure for fast lookup. They improve the speed of document retrieval, range scanning, ordering, and other operations by enabling the use of the index instead of a collection scan. While indexes improve query performance, they can slow down document inserts and updates since the indexes also need to be maintained. The query optimizer aims to select the best index for each query but can sometimes be overridden.
These are slides from our Big Data Warehouse Meetup in April. We talked about NoSQL databases: What they are, how they’re used and where they fit in existing enterprise data ecosystems.
Mike O’Brian from 10gen introduced the syntax and usage patterns for a new aggregation system in MongoDB and gave some demonstrations of aggregation using the new system. The new MongoDB aggregation framework makes it simple to do tasks such as counting, averaging, and finding minima or maxima while grouping by keys in a collection, complementing MongoDB’s built-in map/reduce capabilities.
For more information, visit our website at http://casertaconcepts.com/ or email us at [email protected].
We share some of the concerns, ideas, and results from the past year of work on parts of the backend systems of the Baemin Chan service (https://www.baeminchan.com). Terms and tools mentioned in the talk get only brief, general definitions rather than detailed coverage. The focus is on how we built an event-driven distributed system with the tools we used.
Slidedeck presented at http://devternity.com/ around MongoDB internals. We review the usage patterns of MongoDB, the different storage engines and persistency models, as well as the definition of documents and general data structures.
MongoDB .local Toronto 2019: Tips and Tricks for Effective Indexing (MongoDB)
Query performance can either be a constant headache or the unsung hero of an application. MongoDB provides extremely powerful querying capabilities when used properly. I will share the more common mistakes observed and some tips and tricks to avoid them.
MySQL has multiple timeout variables to control its operations. This presentation focuses on the purpose of each timeout variable and how it can be used.
This document provides an introduction and overview of PostgreSQL, an open-source object-relational database management system. It discusses that PostgreSQL supports modern SQL features, has free commercial and academic use, and offers performance comparable to other databases while being very reliable with stable code and robust testing. The architecture uses a client-server model to handle concurrent connections and transactions provide atomic, isolated, and durable operations. PostgreSQL also supports user-defined types, inheritance, and other advanced features.
Effective NoSQL (ElastiCache / DynamoDB) Design and Usage (최유정 & 최홍식, AWS Solutions Architects) (Amazon Web Services Korea)
To process large transaction volumes quickly and flexibly, you have to consider changes in how data is processed and stored. This session examines the varied usage patterns and performance requirements that applications demand, and covers technical considerations, including design and query patterns, for handling them efficiently with NoSQL (ElastiCache Redis, DynamoDB).
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas (MongoDB)
This presentation discusses migrating data from other data stores to MongoDB Atlas. It begins by explaining why MongoDB and Atlas are good choices for data management. Several preparation steps are covered, including sizing the target Atlas cluster, increasing the source oplog, and testing connectivity. Live migration, mongomirror, and dump/restore options are presented for migrating between replicasets or sharded clusters. Post-migration steps like monitoring and backups are also discussed. Finally, migrating from other data stores like AWS DocumentDB, Azure CosmosDB, DynamoDB, and relational databases are briefly covered.
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts! (MongoDB)
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel... (MongoDB)
MongoDB Kubernetes operator and MongoDB Open Service Broker are ready for production operations. Learn about how MongoDB can be used with the most popular container orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications. A demo will show you how easy it is to enable MongoDB clusters as an External Service using the Open Service Broker API for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB (MongoDB)
Are you new to schema design for MongoDB, or are you looking for a more complete or agile process than what you are following currently? In this talk, we will guide you through the phases of a flexible methodology that you can apply to projects ranging from small to large with very demanding requirements.
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T... (MongoDB)
Humana, like many companies, is tackling the challenge of creating real-time insights from data that is diverse and rapidly changing. This is our journey of how we used MongoDB to combine traditional batch approaches with streaming technologies to provide continuous alerting capabilities from real-time data streams.
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data (MongoDB)
Time series data is increasingly at the heart of modern applications - think IoT, stock trading, clickstreams, social media, and more. With the move from batch to real time systems, the efficient capture and analysis of time series data can enable organizations to better detect and respond to events ahead of their competitors or to improve operational efficiency to reduce cost and risk. Working with time series data is often different from regular application data, and there are best practices you should observe.
This talk covers:
Common components of an IoT solution
The challenges involved with managing time-series data in IoT applications
Different schema designs, and how these affect memory and disk utilization – two critical factors in application performance.
How to query, analyze and present IoT time-series data using MongoDB Compass and MongoDB Charts
At the end of the session, you will have a better understanding of key best practices in managing IoT time-series data with MongoDB.
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys] (MongoDB)
Our clients have unique use cases and data patterns that mandate the choice of a particular strategy. To implement these strategies, it is mandatory that we unlearn a lot of relational concepts while designing and rapidly developing efficient applications on NoSQL. In this session, we will talk about some of our client use cases, the strategies we have adopted, and the features of MongoDB that assisted in implementing these strategies.
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2 (MongoDB)
Encryption is not a new concept to MongoDB. Encryption may occur in-transit (with TLS) and at-rest (with the encrypted storage engine). But MongoDB 4.2 introduces support for Client Side Encryption, ensuring the most sensitive data is encrypted before ever leaving the client application. Even full access to your MongoDB servers is not enough to decrypt this data. And better yet, Client Side Encryption can be enabled at the "flick of a switch".
This session covers using Client Side Encryption in your applications. This includes the necessary setup, how to encrypt data without sacrificing queryability, and what trade-offs to expect.
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ... (MongoDB)
MongoDB Kubernetes operator is ready for prime time. Learn how MongoDB can be used with the most popular orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications.
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts! (MongoDB)
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset (MongoDB)
When you need to model data, is your first instinct to start breaking it down into rows and columns? Mine used to be too. When you want to develop apps in a modern, agile way, NoSQL databases can be the best option. Come to this talk to learn how to take advantage of all that NoSQL databases have to offer and discover the benefits of changing your mindset from the legacy, tabular way of modeling data. We’ll compare and contrast the terms and concepts in SQL databases and MongoDB, explain the benefits of using MongoDB compared to SQL databases, and walk through data modeling basics so you feel confident as you begin using MongoDB.
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart (MongoDB)
Join this talk and test session with a MongoDB Developer Advocate where you'll go over the setup, configuration, and deployment of an Atlas environment. Create a service that you can take back in a production-ready state and prepare to unleash your inner genius.
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin... (MongoDB)
The document discusses guidelines for ordering fields in compound indexes to optimize query performance. It recommends the E-S-R approach: placing equality fields first, followed by sort fields, and range fields last. This allows indexes to leverage equality matches, provide non-blocking sorts, and minimize scanning. Examples show how indexes ordered by these guidelines can support queries more efficiently by narrowing the search bounds.
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++ (MongoDB)
Aggregation pipeline has been able to power your analysis of data since version 2.2. In 4.2 we added more power and now you can use it for more powerful queries, updates, and outputting your data to existing collections. Come hear how you can do everything with the pipeline, including single-view, ETL, data roll-ups and materialized views.
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo... (MongoDB)
The document describes a methodology for data modeling with MongoDB. It begins by recognizing the differences between document and tabular databases, then outlines a three step methodology: 1) describe the workload by listing queries, 2) identify and model relationships between entities, and 3) apply relevant patterns when modeling for MongoDB. The document uses examples around modeling a coffee shop franchise to illustrate modeling approaches and techniques.
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive (MongoDB)
MongoDB Atlas Data Lake is a new service offered by MongoDB Atlas. Many organizations store long term, archival data in cost-effective storage like S3, GCP, and Azure Blobs. However, many of them do not have robust systems or tools to effectively utilize large amounts of data to inform decision making. MongoDB Atlas Data Lake is a service allowing organizations to analyze their long-term data to discover a wealth of information about their business.
This session will take a deep dive into the features that are currently available in MongoDB Atlas Data Lake and how they are implemented. In addition, we'll discuss future plans and opportunities and offer ample Q&A time with the engineers on the project.
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang (MongoDB)
Virtual assistants are becoming the new norm when it comes to daily life, with Amazon’s Alexa being the leader in the space. As a developer, not only do you need to make web and mobile compliant applications, but you need to be able to support virtual assistants like Alexa. However, the process isn’t quite the same between the platforms.
How do you handle requests? Where do you store your data and work with it to create meaningful responses with little delay? How much of your code needs to change between platforms?
In this session we’ll see how to design and develop applications known as Skills for Amazon Alexa powered devices using the Go programming language and MongoDB.
MongoDB .local Paris 2020: Realm: the secret ingredient for better app... (MongoDB)
… to Core Data, appreciated by hundreds of thousands of developers. Learn what makes Realm special and how it can be used to build better applications faster.
MongoDB .local Paris 2020: Upply @MongoDB: When Machine Learning... (MongoDB)
It has never been easier to order online and get delivery in under 48 hours, very often for free. This ease of use hides a complex market worth more than $8,000 billion.
Data is well known in the Supply Chain world (routes, information about goods, customs, ...), but the value of this operational data remains largely untapped. By combining business expertise and Data Science, Upply is redefining the fundamentals of the Supply Chain, enabling every player to overcome the volatility and inefficiency of the market.
4. 1.1 Naver Content Search
Handles search queries on a wide range of topics: people, movies, broadcasts, weather, sports, and more.
Provides high-quality search results, not simple document retrieval.
5. 1.2 Naver Content Search and MongoDB
[Architecture diagram: a Spring Batch indexer feeds JSON data into a MongoDB replica set (Primary and Secondary nodes); the Chicago Search API layer, (Apache + Tomcat) * n, reads from the replica set and serves client HTTP GET requests.]
7. 2. Why MongoDB?
FAST + SCALABLE + HIGHLY AVAILABLE = MongoDB
- Naver integrated search has a 1-second rule; the Chicago platform currently averages under 10 ms per response.
- 400-500 million server calls per day on average, around 6,000 per second; MongoDB handles more than 10,000 per second.
9. 3-1. Understanding MongoDB Indexes
MongoDB can create at most 64 indexes per collection.
Adding too many indexes causes side effects:
- Frequent swapping.
- Reduced write performance.
10. 3-2. Use Index Prefixes
db.nbaGame.createIndex({teamCode: 1, season: 1, gameDate: 1})
Index prefixes are the beginning subsets of the indexed fields. The prefixes of the index above are:
- {teamCode: 1, season: 1}
- {teamCode: 1}
12. 3-2. Use Index Prefixes
Not necessary! These are already covered as prefixes of the compound index:
db.nbaGame.createIndex({teamCode: 1, season: 1});
db.nbaGame.createIndex({teamCode: 1});
13. 3-2. Use Index Prefixes
db.nbaGame.createIndex({teamCode: 1, season: 1, gameDate: 1});
= Not covered!
db.nbaGame.find({season: "2018"});
db.nbaGame.find({gameDate: "20180414"});
db.nbaGame.find({season: "2018", gameDate: "20180414"});
None of the queries above include the teamCode field (not an index prefix).
= Half covered!
db.nbaGame.find({teamCode: "gsw", gameDate: "20180414"});
The condition starts with teamCode, but the season field does not follow it.
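As a rough sketch (not MongoDB's actual planner logic), the prefix rule illustrated above can be written as a small helper that checks whether a query's filter fields form a gap-free leading prefix of a compound index:

```javascript
// Sketch of the index-prefix rule (an illustration, not MongoDB's real planner).
// An index can fully serve a query only if the query filters on a leading,
// gap-free prefix of the index's key list.
function prefixCoverage(indexKeys, queryFields) {
  const fields = new Set(queryFields);
  let matched = 0;
  for (const key of indexKeys) {
    if (fields.has(key)) matched++;
    else break; // stop at the first index key the query does not use
  }
  if (matched === 0) return "not covered";       // query lacks the first index key
  if (matched === fields.size) return "covered"; // every query field is in the prefix
  return "half covered";                         // prefix starts, then a gap
}

const idx = ["teamCode", "season", "gameDate"];
prefixCoverage(idx, ["season"]);               // "not covered"
prefixCoverage(idx, ["teamCode", "gameDate"]); // "half covered"
prefixCoverage(idx, ["teamCode", "season"]);   // "covered"
```

The three example calls mirror the queries on this slide.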
14. 3-3. Multi-Field Sorting
Sort keys must be listed in the same order as the index keys.
db.nbaGame.createIndex({gameDate: 1, teamName: 1});
db.nbaGame.find().sort({gameDate: 1, teamName: 1}); // covered
db.nbaGame.find().sort({teamName: 1, gameDate: 1}); // not covered
15. 3-3. Multi-Field Sorting
- With a single-field index, you do not need to worry about sort direction.
- With a compound index, however, sort direction matters:
(the multi-field sort directions must match either the index key pattern or the inverse of the index key pattern)
db.nbaGame.createIndex({gameDate: 1, teamName: 1});
= Covered!
db.nbaGame.find().sort({gameDate: 1, teamName: 1});
db.nbaGame.find().sort({gameDate: -1, teamName: -1}); // the inverse
db.nbaGame.find().sort({gameDate: 1}); // index prefix
db.nbaGame.find().sort({gameDate: -1});
16. 3-3. Multi-Field Sorting
db.nbaGame.createIndex({gameDate: 1, teamName: 1});
Not covered!
db.nbaGame.find().sort({gameDate: -1, teamName: 1}); // matches neither the pattern nor its inverse
db.nbaGame.find().sort({gameDate: 1, teamName: -1});
db.nbaGame.find().sort({teamName: 1}); // not an index prefix
db.nbaGame.find().sort({teamName: -1});
17. 3-3. 멀티 소팅
db.nbaGame.createIndex({gameDate: 1, teamName: 1});
db.nbaGame.createIndex({gameDate: -1, teamName: 1});
db.nbaGame.createIndex({teamName: 1, gameDate: 1});
db.nbaGame.createIndex({teamName: -1, gameDate: 1});
- The four indexes above support all 12 sort combinations of {gameDate, teamName} listed below.
- There is no need to create a separate index for each of the 12 queries.
1. find().sort({gameDate: 1, teamName: 1}); 2. find().sort({gameDate: -1, teamName: -1});
3. find().sort({gameDate: 1, teamName: -1}); 4. find().sort({gameDate: -1, teamName: 1});
5. find().sort({teamName: 1, gameDate: 1}); 6. find().sort({teamName: -1, gameDate: -1});
7. find().sort({teamName: 1, gameDate: -1}); 8. find().sort({teamName: -1, gameDate: 1});
9. find().sort({gameDate: 1}); 10. find().sort({gameDate: -1});
11. find().sort({teamName: 1}); 12. find().sort({teamName: -1});
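The sort-coverage rule from the slides above can be sketched as a single check (an illustration, not MongoDB's real planner): a sort is index-supported when its key/direction list is a non-empty leading prefix of the index key pattern, or of that pattern with every direction flipped.

```javascript
// Sketch of the sort-coverage rule for a compound index.
// Patterns are arrays of [field, direction] pairs, e.g. [["gameDate", 1], ["teamName", 1]].
function sortSupported(indexPattern, sortPattern) {
  const matches = (flip) =>
    sortPattern.length > 0 &&
    sortPattern.length <= indexPattern.length &&
    sortPattern.every(([field, dir], i) =>
      indexPattern[i][0] === field && indexPattern[i][1] * flip === dir);
  // flip = 1: the index pattern itself; flip = -1: its inverse
  return matches(1) || matches(-1);
}

const idx = [["gameDate", 1], ["teamName", 1]];
sortSupported(idx, [["gameDate", 1], ["teamName", 1]]);   // true (same pattern)
sortSupported(idx, [["gameDate", -1], ["teamName", -1]]); // true (the inverse)
sortSupported(idx, [["gameDate", -1], ["teamName", 1]]);  // false (mixed directions)
sortSupported(idx, [["teamName", 1]]);                    // false (not an index prefix)
```

Running the four-index set from slide 17 through this check reproduces the 12 supported combinations.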
20. 4-1. Split One Collection into Several
- When a single collection holds too many documents, the index size grows and the cardinality of the indexed fields is likely to drop.
- This hurts lookup performance and causes slow queries.
- If a collection has too many documents, split it into several collections so the query processor does not have to look up duplicated index keys.
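One way to apply the collection-splitting advice above is to route each document to a smaller per-key collection. This is a hypothetical sketch; the nbaGame_<season> naming scheme is an illustrative assumption, not something from the talk:

```javascript
// Hypothetical routing step: pick a collection name per document so one huge
// collection becomes several smaller ones with smaller, higher-cardinality indexes.
// The "nbaGame_<season>" naming scheme is an illustrative assumption.
function collectionFor(doc) {
  if (!doc.season) throw new Error("document is missing its routing key (season)");
  return "nbaGame_" + doc.season;
}

collectionFor({ season: "2018", teamCode: "gsw" }); // "nbaGame_2018"
// In the mongo shell, the write would then go to, e.g.:
//   db[collectionFor(doc)].insertOne(doc);
```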
21. 4-2. Upsert Large Numbers of Documents with Threads
- Write many documents at once with bulk operations from multiple threads.
- Multi-document transactions and relations are not supported.
- Writing time can be improved dramatically.
[Diagram: Spring Batch worker threads writing in parallel to the MongoDB replica set (Primary and Secondary nodes).]
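The batching step behind such a multi-threaded bulk upsert can be sketched as follows; the batch size and the replaceOne/upsert shape of the bulk call are illustrative assumptions, not details from the talk:

```javascript
// Split the documents into fixed-size batches, one batch per worker bulkWrite() call.
function toBatches(docs, batchSize) {
  const batches = [];
  for (let i = 0; i < docs.length; i += batchSize) {
    batches.push(docs.slice(i, i + batchSize));
  }
  return batches;
}

// Each worker would then issue one bulk call per batch, e.g. (mongo shell):
//   db.nbaGame.bulkWrite(batch.map(d => ({
//     replaceOne: { filter: { _id: d._id }, replacement: d, upsert: true }
//   })), { ordered: false });
```

Using { ordered: false } lets the server apply the operations in a batch without stopping at the first error, which generally suits parallel loads.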
22. 4-3. Upgrade to MongoDB 4.0
- Before MongoDB 4.0, there were no non-blocking secondary reads.
- Until a write applied on the primary had been propagated to all secondaries, the secondaries blocked reads to prevent data from being read in the wrong order.
- This periodically caused high global lock acquire counts and degraded read performance.
- MongoDB 4.0 solves this issue using data timestamps and consistent snapshots:
- these are non-blocking secondary reads.
35. 5-5. Why Indexes Misbehave
It started with the question: why does the planner keep picking the wrong index?
How does MongoDB find the best query plan for an incoming query?
It cannot simply execute every candidate plan in full...
36. 5-5. Why Indexes Misbehave
The answer was in the Query Planner.
It caches the previously executed query plan.
If there is no cached plan,
it evaluates all candidate query plans
and caches the plan that fetches the first batch (101 documents)
with the best performance.
If performance degrades too much, it repeats this process.
* A cached plan is only reused for queries with the same query shape.
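What "same query shape" means can be sketched roughly (a simplification of the real planner's shape computation): the plan-cache key depends on which fields and operators a filter uses, not on the concrete values.

```javascript
// Sketch of a query-shape key: same fields + same operators => same shape,
// regardless of the literal values in the filter.
function queryShape(filter) {
  return Object.keys(filter).sort().map(field => {
    const v = filter[field];
    const ops = (v !== null && typeof v === "object")
      ? Object.keys(v).sort().join(",")  // e.g. "$in" or "$lte"
      : "$eq";                           // a plain value is an equality match
    return field + ":" + ops;
  }).join("|");
}

// Same shape, different values -> same cache entry:
queryShape({ serviceCode: { $in: [2, 3] }, startDate: { $lte: 20190722 } });
queryShape({ serviceCode: { $in: [9] },    startDate: { $lte: 20200101 } });
```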
37. 5-6. What Makes Indexes Misbehave
db.concert.count({
serviceCode: { $in: [ 2, 3 ]}, startDate: { $lte: 20190722 }, endDate: { $gte: 20190722}
});
This query can be slow because it needs a full scan (keys examined: 88,226), but:
db.concert.find({
serviceCode: { $in: [ 2, 3 ]}, startDate: { $lte: 20190722 }, endDate: { $gte: 20190722}
}).limit(101);
fetching only 101 objects returns almost instantly
(it returns as soon as 101 documents are found, so keys examined: 309),
even if it ends up using the { serviceCode: 1, weight: 1 } index.
38. 5-6. What Makes Indexes Misbehave
So when the Query Planner tested first-batch performance
using the count() query's query shape,
an unrelated index like { serviceCode: 1, weight: 1 } could show the best performance.
* In case of a tie, the planner chooses the plan that does not require an in-memory sort.
(MongoDB cannot perform an in-memory sort on result sets larger than 32 MB.)
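The keys-examined gap between count() and find().limit(101) can be simulated with a toy scan (illustrative numbers, not real server counters): a scan with a result limit can stop as soon as its first batch is full, while a count must examine every key in the index bounds.

```javascript
// Toy simulation of "keys examined": scan an index, optionally stopping
// early once `limit` matching documents have been found.
function keysExamined(index, predicate, limit) {
  let examined = 0, found = 0;
  for (const doc of index) {
    examined++;
    if (predicate(doc)) {
      found++;
      if (limit && found >= limit) break; // first batch filled: stop scanning
    }
  }
  return examined;
}

// 10,000 keys, every 3rd one matching:
const keys = Array.from({ length: 10000 }, (_, i) => ({ hit: i % 3 === 0 }));
keysExamined(keys, d => d.hit);      // 10000: count()-style full scan
keysExamined(keys, d => d.hit, 101); // 301: find().limit(101)-style early exit
```

This is why an index that is poor for the full count can still win the planner's first-batch race.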
39. 5-7. Solomon's Choice
Using hint()
- Guarantees a specific query plan
- Stays forcibly pinned even if a more efficient index appears
- Requires continuously tracking the data distribution
- Errors out when a result set over 32 MB has to be sorted
VS
Deleting the misbehaving index
- Adapts automatically if a more efficient index for the data appears
- Another misbehaving case can still occur
- Other queries may be affected by the deletion