Study-session #29 material: "An Absolute Beginner's Guide to PostgreSQL Recovery"
See also http://www.interdb.jp/pgsql (Coming soon!)
For beginners: how PostgreSQL WAL, CHECKPOINT, and online backup work.
After this one, see → http://www.slideshare.net/satock/29shikumi-backup
EXPLAIN ANALYZE
SELECT e.empno, d.dname, s.grade
FROM emp e
JOIN dept d ON e.deptno = d.deptno
JOIN salgrade s ON e.sal BETWEEN s.losal AND s.hisal
WHERE e.job = 'SALESMAN';
EMP table
  Column  | Type
 ----------+-----------------------------
  empno    | integer
  ename    | character varying(10)
  job      | character varying(9)
  mgr      | integer
  hiredate | timestamp without time zone
  sal      | integer
  comm     | integer
  deptno   | integer

DEPT table
  Column | Type
 --------+-----------------------
  deptno | integer
  dname  | character varying(14)
  loc    | character varying(13)

SALGRADE table
  Column | Type
 --------+---------
  grade  | integer
  losal  | integer
  hisal  | integer
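The three tables above are the classic SCOTT demo schema ported to PostgreSQL. A minimal DDL sketch for reproducing the example locally, with column types taken from the listings above; the primary keys and the foreign key are assumptions added for illustration, and sample rows are omitted:

```sql
-- Minimal reconstruction of the demo schema shown above.
-- Keys and the REFERENCES clause are illustrative assumptions.
CREATE TABLE dept (
    deptno integer PRIMARY KEY,
    dname  varchar(14),
    loc    varchar(13)
);

CREATE TABLE emp (
    empno    integer PRIMARY KEY,
    ename    varchar(10),
    job      varchar(9),
    mgr      integer,
    hiredate timestamp,
    sal      integer,
    comm     integer,
    deptno   integer REFERENCES dept (deptno)
);

CREATE TABLE salgrade (
    grade integer,
    losal integer,
    hisal integer
);
```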
4. The trump card: EXPLAIN
Nested Loop  (cost=0.00..7.85 rows=1 width=50) (actual time=0.031..0.089 rows=4 loops=1)
  Join Filter: ((emp.sal >= s.losal) AND (emp.sal <= s.hisal))
  Rows Removed by Join Filter: 16
  ->  Nested Loop  (cost=0.00..5.67 rows=1 width=54) (actual time=0.027..0.060 rows=4 loops=1)
        Join Filter: (emp.deptno = d.deptno)
        Rows Removed by Join Filter: 12
        ->  Seq Scan on emp  (cost=0.00..1.18 rows=1 width=12) (actual time=0.014..0.020 rows=4 loops=1)
              Filter: ((job)::text = 'SALESMAN'::text)
              Rows Removed by Filter: 10
        ->  Seq Scan on dept d  (cost=0.00..1.05 rows=4 width=50) (actual time=0.001..0.003 rows=4 loops=4)
  ->  Seq Scan on salgrade s  (cost=0.00..1.04 rows=5 width=8) (actual time=0.001..0.002 rows=5 loops=4)
Output
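When reading output like this, it often helps to ask for more detail. A sketch of re-running the same query with extra options (BUFFERS and VERBOSE are available in PostgreSQL 9.0 and later; note that ANALYZE actually executes the statement, so wrapping it in a transaction is a safe habit for queries with side effects):

```sql
BEGIN;
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT e.empno, d.dname, s.grade
FROM emp e
JOIN dept d ON e.deptno = d.deptno
JOIN salgrade s ON e.sal BETWEEN s.losal AND s.hisal
WHERE e.job = 'SALESMAN';
ROLLBACK;
```

BUFFERS adds shared-buffer hit/read counts per node, which shows whether a node is reading from cache or from disk.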
The execution result as a tree:

Nested Loop                    (cost=7.85, time=0.089)
 ├─ Nested Loop                (cost=5.67, time=0.060)
 │   ├─ Seq Scan on emp        (cost=1.18, time=0.020)
 │   └─ Seq Scan on dept d     (cost=1.05, time=0.003)  × 4
 └─ Seq Scan on salgrade s     (cost=1.04, time=0.002)  × 4
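Note that the actual times above are per loop: the salgrade scan's 0.002 ms runs in 4 loops, so its total contribution is roughly 0.008 ms. A common next step after reading such a tree is to check whether an index changes it. A sketch, assuming tables as small as in this example (the planner may well keep the sequential scans, since scanning a one-page table is already cheap; `emp_job_idx` is a hypothetical name):

```sql
-- Hypothetical index on the filtered column; re-run EXPLAIN to compare plans.
CREATE INDEX emp_job_idx ON emp (job);
ANALYZE emp;  -- refresh planner statistics

EXPLAIN ANALYZE
SELECT e.empno, d.dname, s.grade
FROM emp e
JOIN dept d ON e.deptno = d.deptno
JOIN salgrade s ON e.sal BETWEEN s.losal AND s.hisal
WHERE e.job = 'SALESMAN';
```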