NoSQL has reached the stage of practical adoption, yet many people may still not have a clear picture of how it can actually be used. In this session, we explain how MapR-DB (an HBase-compatible NoSQL database) is being used in enterprises, drawing on India's national ID (Aadhaar) case and cases in Japan to give a concrete picture of real-world usage together with the technical background. Presentation material from db tech showcase Tokyo 2015, held June 10-12, 2015.
This session explains a Domain-Driven Design based methodology for designing microservices and walks through Pivotal's recommended approach for migrating monolithic applications to microservices. It also addresses concerns that can arise in real microservice projects, drawing on experience from projects in Korea.
This document provides an overview of WiredTiger, an open-source embedded database engine that achieves high performance through its in-memory architecture, record-level concurrency control based on multi-version concurrency control (MVCC), and compression techniques. It is used as the storage engine for MongoDB and supports key-value data with a schema layer and indexing. The document discusses WiredTiger's architecture, in-memory structures, concurrency control, compression, durability through write-ahead logging, and potential future features including encryption and advanced transactions.
Slides for the 29th study session: "PostgreSQL Recovery for Absolute Beginners"
See also http://www.interdb.jp/pgsql (Coming soon!)
Aimed at beginners. Explains how PostgreSQL's WAL, CHECKPOINT, and online backup mechanisms work.
Once you have read this, continue with → http://www.slideshare.net/satock/29shikumi-backup
With the AWS cloud, you can scale your IT infrastructure architecture along with user traffic. This talk walks through how to evolve the architecture step by step, starting from a simple architecture that handles the small traffic of an early-stage service up to a highly scalable architecture that supports the large-scale traffic of millions of users once the business has grown. It covers a range of best practices, including choosing compute and database services, reducing traffic load as the user base grows, auto scaling, monitoring and automation, database load balancing, and achieving high availability.
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Ethan Guo | Current 2022
Back in 2016, Apache Hudi brought transactions and change capture to data lakes, in what is today referred to as the Lakehouse architecture. In this session, we first introduce Apache Hudi and the key technology gaps it fills in the modern data architecture. Bridging traditional data lakes and warehouses, Hudi helps realize the Lakehouse vision by bringing transactions and optimized table metadata to data lakes, along with powerful storage layout optimizations, moving them closer to today's cloud warehouses. Viewed through a data engineering lens, Hudi also plays a key unifying role between the batch and stream processing worlds by acting as a columnar, serverless "state store" for batch jobs, ushering in what we call the incremental processing model, where batch jobs can consume new data and update/delete intermediate results in a Hudi table, instead of recomputing and rewriting the entire output like old-school big batch jobs.
The rest of the talk is a deep dive into some of the time-tested design choices and tradeoffs in Hudi that help power some of the largest transactional data lakes on the planet today. We start with a tour of the storage format design, including data and metadata layouts and, of course, Hudi's timeline, an event log that is central to implementing ACID transactions and concurrency control. We then delve deeper into the practical concurrency control pitfalls in data lakes, and show how Hudi's hybrid approach, combining MVCC with optimistic concurrency control, lowers contention and unlocks minute-level near-real-time commits to Hudi tables. We conclude with code examples that showcase Hudi's rich set of table services, which perform vital table management such as cleaning older file versions, compacting delta logs into base files, dynamically re-clustering for faster query performance, and the more recently introduced indexing service that maintains Hudi's multi-modal indexing capabilities.
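As a concrete illustration of the incremental processing model described above, here is a minimal, hypothetical sketch of upserting a batch of changed records into a Hudi table through the Spark DataSource API. The table name, storage path, and field names are invented for the example and are not taken from the talk.

```java
// Minimal sketch: upserting a batch of changes into a Hudi table via Spark.
// Paths, table name, and field names are illustrative assumptions.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class HudiUpsertExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("hudi-upsert-example")
                // Hudi relies on Kryo serialization in Spark
                .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                .getOrCreate();

        // New/changed records read from some upstream drop folder (illustrative)
        Dataset<Row> updates = spark.read().json("/tmp/incoming/orders/*.json");

        updates.write().format("hudi")
                .option("hoodie.table.name", "orders")
                // Record key + precombine field let Hudi deduplicate and update in place
                .option("hoodie.datasource.write.recordkey.field", "order_id")
                .option("hoodie.datasource.write.precombine.field", "updated_at")
                .option("hoodie.datasource.write.partitionpath.field", "region")
                .option("hoodie.datasource.write.operation", "upsert")
                .mode(SaveMode.Append)   // Append mode performs upserts into an existing table
                .save("s3a://my-bucket/lake/orders");

        spark.stop();
    }
}
```

Downstream jobs can then read only the commits added since their last run, which is the essence of incremental processing rather than full recomputation.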
Storm is a distributed and fault-tolerant realtime computation system. It was created at BackType/Twitter to analyze tweets, links, and users on Twitter in realtime. Storm provides scalability, reliability, and ease of programming. It uses components like Zookeeper, ØMQ, and Thrift. A Storm topology defines the flow of data between spouts that read data and bolts that process it. Through its reliability API, Storm guarantees that all data is processed, with no loss even during failures.
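To make the spout/bolt model concrete, below is a minimal sketch of wiring a Storm topology in Java. It reuses Storm's bundled TestWordSpout and a tiny example bolt; the topology name and parallelism numbers are arbitrary, and a real deployment would submit via StormSubmitter rather than running a LocalCluster.

```java
// Minimal sketch of a Storm topology: a spout feeding a bolt.
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.testing.TestWordSpout;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class ExclamationTopologyExample {

    // Bolts process tuples emitted by spouts or other bolts
    public static class ExclaimBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            collector.emit(new Values(input.getString(0) + "!!!"));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        // Spouts read data from an external source and emit tuples into the topology
        builder.setSpout("words", new TestWordSpout(), 2);

        // fieldsGrouping routes tuples with the same "word" to the same bolt task
        builder.setBolt("exclaim", new ExclaimBolt(), 4)
               .fieldsGrouping("words", new Fields("word"));

        Config conf = new Config();
        conf.setNumWorkers(2);

        // Local mode for testing; production would use StormSubmitter.submitTopology
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("exclamation-example", conf, builder.createTopology());
        Thread.sleep(10_000);
        cluster.shutdown();
    }
}
```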
The document appears to be a technical paper on graph databases and Amazon Neptune. It discusses challenges in building applications with highly connected data and how Neptune can help by allowing storage and querying of graph data through traversal of relationships between entities. It provides examples of using Neptune to represent social network data and querying graph data through Gremlin.
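For illustration, the following is a hedged sketch of what such a Gremlin traversal could look like from Java using the TinkerPop driver against a Neptune endpoint. The endpoint, vertex labels, and property names are assumptions made for the example, not taken from the document.

```java
// Minimal sketch: traversing a social graph over Gremlin, Neptune-style.
// Endpoint, labels, and property names are illustrative assumptions.
import java.util.List;

import org.apache.tinkerpop.gremlin.driver.Cluster;
import org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;

import static org.apache.tinkerpop.gremlin.process.traversal.AnonymousTraversalSource.traversal;

public class NeptuneFriendsQuery {
    public static void main(String[] args) throws Exception {
        // Neptune exposes a Gremlin endpoint over WebSocket, typically on port 8182
        Cluster cluster = Cluster.build("my-neptune-endpoint.example.com")
                .port(8182)
                .enableSsl(true)
                .create();

        GraphTraversalSource g =
                traversal().withRemote(DriverRemoteConnection.using(cluster));

        // Traverse relationships: who do the people that "alice" follows follow?
        List<Object> secondDegree = g.V().has("person", "name", "alice")
                .out("follows").out("follows")
                .dedup()
                .values("name")
                .toList();

        System.out.println(secondDegree);

        g.close();
        cluster.close();
    }
}
```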
The document discusses local secondary indexes in Apache Phoenix. Local indexes are stored in the same region as the base table data, providing faster index building and reads compared to global indexes. The write process involves preparing index updates along with data updates and writing them atomically to memstores and the write ahead log. Reads scan the local index and retrieve any missing columns from the base table. Local indexes improve write performance over global indexes due to reduced network utilization. The document provides performance results and tips on using local indexes.
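As a rough illustration of the usage side, here is a minimal sketch of creating and querying a local index through the Phoenix JDBC driver. The ZooKeeper quorum, table, and column names are made up for the example.

```java
// Minimal sketch: creating and using a Phoenix local index over JDBC.
// Connection string, table, and column names are illustrative assumptions.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixLocalIndexExample {
    public static void main(String[] args) throws Exception {
        // The Phoenix thick driver connects through the HBase ZooKeeper quorum
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
             Statement stmt = conn.createStatement()) {

            stmt.execute("CREATE TABLE IF NOT EXISTS orders ("
                    + " order_id BIGINT NOT NULL PRIMARY KEY,"
                    + " customer_id BIGINT,"
                    + " amount DECIMAL(10,2))");

            // LOCAL keeps index data in the same region as the base table data,
            // so index writes land on the same region server as the data writes
            stmt.execute("CREATE LOCAL INDEX IF NOT EXISTS idx_orders_customer"
                    + " ON orders (customer_id) INCLUDE (amount)");

            conn.commit();

            // This query can be served from the local index; any uncovered columns
            // would be fetched from the base table row in the same region
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT order_id, amount FROM orders WHERE customer_id = 42")) {
                while (rs.next()) {
                    System.out.println(rs.getLong(1) + " -> " + rs.getBigDecimal(2));
                }
            }
        }
    }
}
```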
This document discusses strategies for optimizing access to large "master data" files in PHP applications. It describes converting master data files from PHP arrays to tab-separated value (TSV) files to reduce loading time. Benchmark tests show the TSV format reduces file size by over 50% and loading time from 70 milliseconds to 7 milliseconds without OPcache. Accessing rows as arrays by splitting on tabs is 3 times slower but still very fast at over 350,000 gets per second. The TSV optimization has been used successfully in production applications.
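The original technique is PHP-specific, but the core idea (keep master data as raw TSV lines and split a line into columns only when that row is accessed) is easy to sketch. Below is a small illustrative Java version, with an invented file name and column layout.

```java
// Illustrative sketch of the TSV master-data idea (the original is PHP):
// keep raw tab-separated lines in memory and split a row lazily on access.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class TsvMasterData {
    private final List<String> lines;   // one raw TSV line per record, kept unsplit

    public TsvMasterData(String path) throws IOException {
        // Loading raw lines is cheap compared to materializing nested arrays up front
        this.lines = Files.readAllLines(Paths.get(path));
    }

    /** Split a single row into its columns only when it is actually requested. */
    public String[] get(int row) {
        return lines.get(row).split("\t", -1);
    }

    public static void main(String[] args) throws IOException {
        TsvMasterData items = new TsvMasterData("item_master.tsv"); // hypothetical file
        String[] first = items.get(0);
        System.out.println("id=" + first[0] + ", name=" + first[1]);
    }
}
```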
Presto is a fast, distributed SQL query engine that allows for ad-hoc queries against data sources like Cassandra, Hive, Kafka and others. It uses a pluggable connector architecture that allows it to connect to different data sources. Presto's query execution is distributed across worker nodes and queries are compiled to Java bytecode for efficient execution. Some limitations of Presto include its inability to handle large joins and lack of fault tolerance.
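As a small illustration of the ad-hoc query side, here is a hedged sketch of querying Presto over its JDBC driver; the coordinator host, catalog, schema, and table are assumptions made for the example.

```java
// Minimal sketch: an ad-hoc query against Presto over JDBC.
// Host, catalog, schema, and table names are illustrative assumptions.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public class PrestoAdHocQuery {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("user", "analyst");   // the Presto JDBC driver requires a user

        // URL format: jdbc:presto://host:port/catalog/schema
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:presto://presto-coordinator:8080/hive/default", props);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT status, count(*) AS cnt FROM orders GROUP BY status")) {
            while (rs.next()) {
                System.out.println(rs.getString("status") + ": " + rs.getLong("cnt"));
            }
        }
    }
}
```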
[db tech showcase Tokyo 2015] C32: "Hitachi's in-memory distributed KVS, built around data consistency - why consistency matters and how it is achieved" (Insight Technology, Inc.)
This talk explains the characteristics and usage of distributed KVSs, and the "consistency" and "availability" criteria used when selecting a distributed KVS product. It also introduces Hitachi Elastic Application Data Store (EADS), an in-memory distributed KVS developed by Hitachi, along with use cases, and discusses why EADS places such emphasis on consistency and the technologies used in the product to achieve it, such as Paxos.
[db tech showcase Tokyo 2015] C16: Oracle Disaster Recovery at New Zealand Stock Exchange (Insight Technology, Inc.)
This document provides an agenda and introduction for a presentation on disaster recovery using physical replication technology. The presentation will include an overview of Dbvisit Standby software, which enables disaster recovery for Oracle Standard Edition databases. It will also present a case study of how the New Zealand Stock Exchange uses Dbvisit Standby to ensure continuous availability of critical trading systems across two data centers.
[db tech showcase Tokyo 2015] D25: The difference between logical and physical replication (Insight Technology, Inc.)
This document discusses the differences between physical and logical database replication in Oracle. It begins with introductions and an overview of Dbvisit Software. The main sections summarize physical replication, logical replication, and compare the two approaches. Physical replication uses complete redo blocks to keep the target database identical to the source. Logical replication mines redo logs and converts the information to SQL statements to replicate the data. The document outlines the advantages and disadvantages of each approach and how they work at a technical level.
This document introduces Hivemall, an open-source machine learning library built as a collection of Hive user-defined functions (UDFs). Hivemall allows users to perform scalable machine learning on large datasets stored in Hive/Hadoop. It supports various classification, regression, recommendation, and feature engineering algorithms. Some key algorithms include logistic regression, matrix factorization, random forests, and anomaly detection. Hivemall is designed to perform machine learning efficiently by avoiding intermediate data reads/writes to HDFS. It has been used in industry for applications such as click-through rate prediction, churn detection, and product recommendation.
Self-Service Data Exploration with Apache Drill - 2014/11/06 Cloudera World Tokyo 2014 lightning talk session (MapR Technologies Japan)
Among the many SQL-on-Hadoop engines, Apache Drill stands out for its standard SQL compliance, flexible and dynamic data interpretation, and support for a wide range of data sources and storage formats. Centered around a demo, this talk shows how to enjoy searching and analyzing data using Drill's convenient features. Presentation material from the lightning talk session at Cloudera World Tokyo 2014, held on November 6, 2014.
Practical Machine Learning - Innovation in Recommendation Using Mahout and Solr - 2014/07/08 Hadoop Conference Japan 2014 (MapR Technologies Japan)
Machine learning is an important tool for making business decisions and producing more accurate predictions and relevance estimates from ever-growing data. Among machine learning applications, recommendation engines are the most widely used. Mahout, a scalable machine learning library, simplifies both generating recommendations and handling the data involved. This talk introduces a recommendation engine design that is simpler to build, and the benefits of its innovative implementation approach. Presentation material from Hadoop Conference Japan 2014, held on July 8, 2014.
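For readers who want a feel for the Mahout side, below is a minimal sketch of item-based recommendations using Mahout's Taste API. The ratings file and item IDs are invented, and the Mahout-plus-Solr serving design discussed in the talk (serving precomputed item indicators through Solr) is not shown here.

```java
// Minimal sketch: item-based recommendations with Mahout's Taste API.
// The ratings file and item IDs are illustrative assumptions.
import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

public class ItemRecommenderExample {
    public static void main(String[] args) throws Exception {
        // ratings.csv: one "userID,itemID,preference" triple per line
        DataModel model = new FileDataModel(new File("ratings.csv"));

        // Log-likelihood similarity works well with simple co-occurrence data
        ItemSimilarity similarity = new LogLikelihoodSimilarity(model);
        GenericItemBasedRecommender recommender =
                new GenericItemBasedRecommender(model, similarity);

        // Items most similar to item 101, usable as "people also liked" output
        List<RecommendedItem> similarItems = recommender.mostSimilarItems(101L, 10);
        for (RecommendedItem item : similarItems) {
            System.out.println(item.getItemID() + " score=" + item.getValue());
        }
    }
}
```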
24. [Architecture diagram] Aadhaar system architecture overview: two data centers (Data Center A and Data Center B), each hosting Auth/BFD/OTP services, an Auth Data server, FMS and portals, backed by template tables and audit tables; reads and writes are served in each center, with the two centers kept in sync.