Re: pgbench-ycsb

Lists:	pgsql-hackers

From:	a(dot)bykov(at)postgrespro(dot)ru
To:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	pgbench-ycsb
Date:	2018-07-19 12:46:59
Message-ID:	20180719154659.4becab56@anthony-24-g082ur

Hello, hackers.

It might be a good idea to let users test their applications with
pgbench under different real-life-like loads, so that they can see
what is going to happen in production.

YCSB (Yahoo! Cloud Serving Benchmark) was taken as the concept. YCSB tests
were originally designed to facilitate performance comparisons of
different cloud data serving systems, and they take into account different
application workloads, such as:
workload A - the application does a lot of reads (50%) and
updates (50%).
workload B - the application does reads in 95% of cases
and updates in 5%.
workload C - models the behavior of a read-only application.
workload E - the application requests several neighboring tuples in
95% of cases and does updates in 5% of cases.

In the patch those workloads were implemented to be executed by pgbench:
pgbench -b ycsb-A

--
Anthony Bykov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment	Content-Type	Size
0001-pgbench-ycsb-v3.patch	text/x-patch	6.0 KB

From:	Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To:	a(dot)bykov(at)postgrespro(dot)ru
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: pgbench-ycsb
Date:	2018-07-19 13:35:44
Message-ID:	alpine.DEB.2.21.1807190910380.31076@lancre

Hello Anthony,

> applications with pgbench under different real-life-like load. So that
> they will be able to see what's going to happen on production.
>
> YCSB (Yahoo! Cloud Serving Benchmark) was taken as a concept. YCSB tests
> were originally designed to facilitate performance comparisons of
> different cloud data serving systems and it takes into account different
> application workloads like:
> workload A - assumes that application do a lot of reads(50%) and
> updates(50%).
> workload B - case when application do 95% of cases reads
> and 5% updates
> workload C - models behavior of read-only application.
> workload E - the workload of the applications which in 95% of cases
> requests for several neighboring tuples and in 5% of cases - does
> updates.
>
> In the patch those workloads were implemented to be executed by pgbench:
> pgbench -b ycsb-A

Could you provide a link to the specification?

I cannot find something simple, and I was kind of hoping to avoid diving
into the source code of the java tool on github:-) In particular, I'm
looking for a description of the expected underlying schema and its size
(scale) parameters.

Patch does not include any documentation, nor help, nor tests. It should.

+ "\\set write_weight 0\n"
+ "\\set operation random(1,:total_weight)\n"
+ "\\if (:operation < :write_weight)\n"

This is dead code:-( A lot of copy-paste between the cases, that should be
avoided if possible.

Note that pgbench already has builtin weight management. I'd suggest
that the implementation reuse it instead of reimplementing it
within these duplicated scripts.

Maybe add simple builtins (eg ycsb-read/write/...) for individual
transactions and a new --load=ycsb-A which would set the various
transactions with their expected weights.
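
For illustration, here is a minimal sketch of how a 50/50 read/update mix
(the workload A ratios) might be expressed with pgbench's existing
per-script weights, so that no weight logic lives inside the scripts. The
file names are hypothetical, and targeting the standard pgbench_accounts
table rather than a YCSB table is an assumption of this sketch, not
something taken from the patch.

```shell
# Hypothetical file names; each script holds exactly one transaction type.
cat > read_only.sql <<'EOF'
\set aid random(1, 100000 * :scale)
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
EOF

cat > update_only.sql <<'EOF'
\set aid random(1, 100000 * :scale)
\set delta random(-5000, 5000)
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
EOF
```

Against a database initialized with "pgbench -i -s 10", the mix could then
be run as "pgbench -s 10 -f read_only.sql@50 -f update_only.sql@50 -T 60".
Changing the weights to 95 and 5 would give the workload B ratios.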

A, B, C, E... What is missing to get the D bench as well?

--
Fabien.

From:	Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To:	Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc:	a(dot)bykov(at)postgrespro(dot)ru, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: pgbench-ycsb
Date:	2018-07-19 13:50:59
Message-ID:	CA+q6zcW7Jg1xKe1iJv-z-ApkKBrO0XpuhUEOiREw6ce_L1bmTQ@mail.gmail.com

> On Thu, 19 Jul 2018 at 15:36, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:
>
>
> Hello Anthony,
>
> > applications with pgbench under different real-life-like load. So that
> > they will be able to see what's going to happen on production.
> >
> > YCSB (Yahoo! Cloud Serving Benchmark) was taken as a concept. YCSB tests
> > were originally designed to facilitate performance comparisons of
> > different cloud data serving systems and it takes into account different
> > application workloads like:
> > workload A - assumes that application do a lot of reads(50%) and
> > updates(50%).
> > workload B - case when application do 95% of cases reads
> > and 5% updates
> > workload C - models behavior of read-only application.
> > workload E - the workload of the applications which in 95% of cases
> > requests for several neighboring tuples and in 5% of cases - does
> > updates.
> >
> > In the patch those workloads were implemented to be executed by pgbench:
> > pgbench -b ycsb-A
>
> Could you provide a link to the specification?
>
> I cannot find something simple, and I was kind of hoping to avoid diving
> into the source code of the java tool on github:-) In particular, I'm
> looking for a description of the expected underlying schema and its size
> (scale) parameters.

There are description files for the different workloads, like [1] (with a
custom number of records, of course), and the schema [2]. Would this
information be enough?

[1]: https://github.com/brianfrankcooper/YCSB/blob/master/workloads/workloada
[2]: https://github.com/brianfrankcooper/YCSB/blob/master/jdbc/src/main/resources/sql/create_table.sql

From:	a(dot)bykov(at)postgrespro(dot)ru
To:	Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc:	Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: pgbench-ycsb
Date:	2018-07-19 14:24:10
Message-ID:	[email protected]

On 2018-07-19 16:50, Dmitry Dolgov wrote:
>> On Thu, 19 Jul 2018 at 15:36, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
>> wrote:
>>
>>
>> Hello Anthony,
>>
>> > applications with pgbench under different real-life-like load. So that
>> > they will be able to see what's going to happen on production.
>> >
>> > YCSB (Yahoo! Cloud Serving Benchmark) was taken as a concept. YCSB tests
>> > were originally designed to facilitate performance comparisons of
>> > different cloud data serving systems and it takes into account different
>> > application workloads like:
>> > workload A - assumes that application do a lot of reads(50%) and
>> > updates(50%).
>> > workload B - case when application do 95% of cases reads
>> > and 5% updates
>> > workload C - models behavior of read-only application.
>> > workload E - the workload of the applications which in 95% of cases
>> > requests for several neighboring tuples and in 5% of cases - does
>> > updates.
>> >
>> > In the patch those workloads were implemented to be executed by pgbench:
>> > pgbench -b ycsb-A
>>
>> Could you provide a link to the specification?
>>
>> I cannot find something simple, and I was kind of hoping to avoid
>> diving
>> into the source code of the java tool on github:-) In particular, I'm
>> looking for a description of the expected underlying schema and its
>> size
>> (scale) parameters.
>
> There are the description files for different workloads, like [1],
> (with the
> custom amount of records, of course) and the schema [2]. Would this
> information be enough?
>
> [1]:
> https://github.com/brianfrankcooper/YCSB/blob/master/workloads/workloada
> [2]:
> https://github.com/brianfrankcooper/YCSB/blob/master/jdbc/src/main/resources/sql/create_table.sql

Hi.
Thanks for your feedback, I'll fix it soon.
Actually, I used the article "Brian F. Cooper, Adam Silberstein, Erwin Tam,
Raghu Ramakrishnan and Russell Sears. Benchmarking Cloud Serving Systems
with YCSB. ACM Symposium on Cloud Computing (SoCC), Indianapolis, IN,
USA, 2010".
It is available here:
https://github.com/brianfrankcooper/YCSB/wiki/Papers-and-Presentations

But maybe the article is more complicated than your example.

--
Anthony Bykov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

From:	Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To:	Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc:	a(dot)bykov(at)postgrespro(dot)ru, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: pgbench-ycsb
Date:	2018-07-21 20:40:59
Message-ID:	alpine.DEB.2.21.1807211628450.22035@lancre

>> Could you provide a link to the specification?
>>
>> I cannot find something simple, and I was kind of hoping to avoid diving
>> into the source code of the java tool on github:-) In particular, I'm
>> looking for a description of the expected underlying schema and its size
>> (scale) parameters.
>
> There are the description files for different workloads, like [1], (with the
> custom amount of records, of course) and the schema [2]. Would this
> information be enough?
>
> [1]: https://github.com/brianfrankcooper/YCSB/blob/master/workloads/workloada
> [2]: https://github.com/brianfrankcooper/YCSB/blob/master/jdbc/src/main/resources/sql/create_table.sql

The second link is a start.

I notice that the submitted patch transactions do not apply to this
schema, which is significantly different from the pgbench TPC-B (like)
benchmark.

The YCSB schema is key -> fields[0-9], all of them TEXT, somehow expected
to be 100 bytes each, and update is expected to update one of these
fields.

This suggests that maybe a -i extension would be in order. Possibly

pgbench -i -s 1 --layout={tpcb,ycsb} (or schema ?)

where "tpcb" would be the default?

I'm sceptical about using a textual primary key as it corresponds more to
NoSQL limitations than to an actual design choice. I'd be okay with INT8
as a pkey.

I find the YCSB table name "usertable" especially unhelpful. Maybe
"pgbench_ycsb"?
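
As a rough sketch of what such a layout could look like, following the
description in this message (an INT8 key and ten TEXT fields of 100 bytes
each), the script below generates a hypothetical initialization file. The
column names, the row count and the filler content are assumptions made
for illustration only; they come neither from the YCSB sources nor from
the patch.

```shell
# Hypothetical init script: INT8 key plus field0..field9, each defaulting
# to a 100-character filler string. All names are illustrative.
{
  echo "CREATE TABLE pgbench_ycsb ("
  echo "    ycsb_key INT8 PRIMARY KEY"
  for i in 0 1 2 3 4 5 6 7 8 9; do
    echo "  , field$i TEXT NOT NULL DEFAULT repeat('x', 100)"
  done
  echo ");"
  # Arbitrary table size; only the key is supplied, the fields use defaults.
  echo "INSERT INTO pgbench_ycsb (ycsb_key) SELECT generate_series(1, 100000);"
} > ycsb_init.sql
```

The generated file could be loaded with "psql -f ycsb_init.sql". A real
-i extension would presumably scale the row count with -s, as the
existing initialization does.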

--
Fabien.

From:	Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To:	Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc:	a(dot)bykov(at)postgrespro(dot)ru, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: pgbench-ycsb
Date:	2018-07-22 10:22:45
Message-ID:	CA+q6zcXRqgTD-uG1snC4LiNAsBoujFHbnyZ3OL8q+1QPJso_mw@mail.gmail.com

> On Sat, 21 Jul 2018 at 22:41, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:
>
> >> Could you provide a link to the specification?
> >>
> >> I cannot find something simple, and I was kind of hoping to avoid diving
> >> into the source code of the java tool on github:-) In particular, I'm
> >> looking for a description of the expected underlying schema and its size
> >> (scale) parameters.
> >
> > There are the description files for different workloads, like [1], (with the
> > custom amount of records, of course) and the schema [2]. Would this
> > information be enough?
> >
> > [1]: https://github.com/brianfrankcooper/YCSB/blob/master/workloads/workloada
> > [2]: https://github.com/brianfrankcooper/YCSB/blob/master/jdbc/src/main/resources/sql/create_table.sql
>
> The second link is a start.
>
> I notice that the submitted patch transactions do not apply to this
> schema, which is significantly different from the pgbench TPC-B (like)
> benchmark.
>
> The YCSB schema is key -> fields[0-9], all of them TEXT, somehow expected
> to be 100 bytes each, and update is expected to update one of these
> fields.
>
> This suggests that maybe a -i extension would be in order. Possibly
>
> pgbench -i -s 1 --layout={tpcb,ycsb} (or schema ?)
>
> where "tpcb" would be the default?
>
> I'm sceptical about using a textual primary key as it corresponds more to
> NoSQL limitations than to an actual design choice. I'd be okay with INT8
> as a pkey.
>
> I find the YCSB table name "usertable" especially unhelpful. Maybe
> "pgbench_ycsb"?

Just to clarify: if I understand Anthony correctly, this proposal is not about
implementing YCSB exactly as it is, but rather about using a zipfian
distribution for an id in the regular pgbench table structure, in conjunction
with a read/write balance, to simulate something similar to it.

And instead of implementing the exact YCSB workload inside pgbench, it probably
makes more sense to add PostgreSQL Jsonb as one of the options in the
framework itself (I was in the middle of it a few years ago, but then was
distracted by some interesting benchmarking results).

From:	Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To:	Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc:	a(dot)bykov(at)postgrespro(dot)ru, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: pgbench-ycsb
Date:	2018-07-22 13:56:08
Message-ID:	alpine.DEB.2.21.1807220942050.3848@lancre

> Just to clarify - if I understand Anthony correctly, this proposal is not about
> implementing exactly YCSB as it is, but more about using zipfian distribution
> for an id in the regular pgbench table structure in conjunction with read/write
> balance to simulate something similar to it.

Ok, I misunderstood. My 0.02€: If it does not implement YCSB, and the
point is not to implement YCSB, then do not call it YCSB:-)

Maybe there could be other simpler builtins to use non uniform
distributions: {zipf,exp,...}-{simple,select} and default values
(exp_param, zipf_param?) for the random distribution parameters.

\set id random_zipfian(1, 100000*:scale, :zipf_param)
\set val random(-5000, 5000)
UPDATE pgbench_whatever ...;

Then

pgbench -b zipf-se@1 -b zipf-si@1 [ -D zipf_param=1.1 ... ] -T 10000 ...
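
Until such builtins exist, a similar effect can be approximated with
custom scripts. The sketch below assumes the standard pgbench_accounts
table as the target and uses hypothetical file names; random_zipfian()
is available in pgbench from PostgreSQL 11 on, and zipf_param is expected
to be supplied with -D.

```shell
# Hypothetical script names; :zipf_param must be provided via -D.
cat > zipf_select.sql <<'EOF'
\set id random_zipfian(1, 100000 * :scale, :zipf_param)
SELECT abalance FROM pgbench_accounts WHERE aid = :id;
EOF

cat > zipf_update.sql <<'EOF'
\set id random_zipfian(1, 100000 * :scale, :zipf_param)
\set val random(-5000, 5000)
UPDATE pgbench_accounts SET abalance = abalance + :val WHERE aid = :id;
EOF
```

A possible invocation, with -s matching the scale used at initialization:
"pgbench -s 10 -D zipf_param=1.1 -f zipf_select.sql@1 -f zipf_update.sql@1 -T 60".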

> And probably instead of implementing the exact YCSB workload inside pgbench, it
> makes more sense to add PostgreSQL Jsonb as one of the options into the
> framework itself (I was in the middle of it few years ago, but then was
> distracted by some interesting benchmarking results).

Sure.

--
Fabien.

From:	a(dot)bykov(at)postgrespro(dot)ru
To:	Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc:	Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: pgbench-ycsb
Date:	2018-07-22 17:16:55
Message-ID:	[email protected]

On 2018-07-22 16:56, Fabien COELHO wrote:
>> Just to clarify - if I understand Anthony correctly, this proposal is
>> not about
>> implementing exactly YCSB as it is, but more about using zipfian
>> distribution
>> for an id in the regular pgbench table structure in conjunction with
>> read/write
>> balance to simulate something similar to it.
>
> Ok, I misunderstood. My 0.02€: If it does not implement YCSB, and the
> point is not to implement YCSB, then do not call it YCSB:-)
>
> Maybe there could be other simpler builtins to use non uniform
> distributions: {zipf,exp,...}-{simple,select} and default values
> (exp_param, zipf_param?) for the random distribution parameters.
>
> \set id random_zipfian(1, 100000*:scale, :zipf_param)
> \set val random(-5000, 5000)
> UPDATE pgbench_whatever ...;
>
> Then
>
> pgbench -b zipf-se@1 -b zipf-si@1 [ -D zipf_param=1.1 ... ] -T 10000
> ...
>
>> And probably instead of implementing the exact YCSB workload inside
>> pgbench, it
>> makes more sense to add PostgreSQL Jsonb as one of the options into
>> the
>> framework itself (I was in the middle of it few years ago, but then
>> was
>> distracted by some interesting benchmarking results).
>
> Sure.

Hello,
thank you for your interest. I'm still improving this idea and the patch,
and I'm very happy about the discussion we are having. It really helps.

The idea was to implement the workloads as close to YCSB as possible
using pgbench.

So the schema it should be applied to is the default schema generated by
pgbench -i (pgbench_accounts).

--
Anthony Bykov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

From:	Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To:	a(dot)bykov(at)postgrespro(dot)ru
Cc:	Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: pgbench-ycsb
Date:	2018-07-22 20:42:14
Message-ID:	alpine.DEB.2.21.1807221615000.13768@lancre

>>> Just to clarify - if I understand Anthony correctly, this proposal is
>>> not about implementing exactly YCSB as it is, but more about using
>>> zipfian distribution for an id in the regular pgbench table structure
>>> in conjunction with read/write balance to simulate something similar
>>> to it.
>>
>> Ok, I misunderstood. My 0.02€: If it does not implement YCSB, and the
>> point is not to implement YCSB, then do not call it YCSB:-)
>>
>> Maybe there could be other simpler builtins to use non uniform
>> distributions: {zipf,exp,...}-{simple,select} and default values
>> (exp_param, zipf_param?) for the random distribution parameters.
>>
>> \set id random_zipfian(1, 100000*:scale, :zipf_param)
>> \set val random(-5000, 5000)
>> UPDATE pgbench_whatever ...;
>>
>> Then
>>
>> pgbench -b zipf-se@1 -b zipf-si@1 [ -D zipf_param=1.1 ... ] -T 10000 ...
>>
>>> And probably instead of implementing the exact YCSB workload inside
>>> pgbench, it makes more sense to add PostgreSQL Jsonb as one of the
>>> options into the framework itself (I was in the middle of it few years
>>> ago, but then was distracted by some interesting benchmarking
>>> results).
>>
>> Sure.
>
> Hello,
> thank you for your interest. I'm still improving this idea, the patch
> and I'm very happy about the discussion we have. It really helps.
>
> The idea was to implement the workloads as close to YCSB as possible
> using pgbench.

Basically I'm against having something called YCSB if it is not YCSB;-)

> So, the schema it should be applied to - is default schema generated by
> pgbench -i (pgbench_accounts).

This is a contradiction, because pgbench_accounts table is in no way, even
remotely, conformant to the YCSB benchmark test table.

So for me there are three possibilities:

(1) do nothing, always an option as committers may be against extending
pgbench in this direction anyway. Personally I'm fine with having it.

(2) implement YCSB cleanly, i.e. both initialization and operations, at
least if this is "reasonable" (i.e. it does not result in 2000 lines of
new code). ISTM that it can be done, given that the YCSB schema is very
simple, hence I suggested "pgbench -i --schema ycsb" to trigger a non
default initialization.

(3) if you are interested in demonstrating non uniform distribution on
pgbench_accounts, I'm also fine with it, just do so, but do *NOT* call it
YCSB.

Also it seems that the YCSB bench uses some hashing to mix keys and avoid
having 1 as the most frequent, 2 as the second, and so on. There is a hash
function in pgbench which can be used for this, as YCSB does, although the
solution is not perfect (some values cannot be reached). Otherwise
I'm planning to submit a pseudo-random permutation function to ease this
some day, provided that the size of the table stays constant.
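
A minimal sketch of the hashing approach with pgbench's existing
functions follows. The file name, the fixed zipfian parameter of 1.1 and
the use of pgbench_accounts are assumptions for illustration. As noted
above, hashing followed by a modulo is not a permutation, so some aid
values are never produced while others receive the traffic of several
ranks.

```shell
# Hypothetical script: draw a zipfian rank, then scatter it over the key
# space so that the hottest keys are not simply 1, 2, 3, ...
cat > zipf_hashed_select.sql <<'EOF'
\set r random_zipfian(1, 100000 * :scale, 1.1)
\set aid 1 + abs(hash(:r)) % (100000 * :scale)
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
EOF
```

It could be run like any other custom script, for example
"pgbench -s 10 -f zipf_hashed_select.sql -T 60".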

--
Fabien.

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc:	anthony <a(dot)bykov(at)postgrespro(dot)ru>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: pgbench-ycsb
Date:	2018-07-23 15:34:48
Message-ID:	CA+TgmoZQz0MNuBDPoNGJUWy=gVU_F_h3R-rhtgq=D8RTtJp2oA@mail.gmail.com

On Sun, Jul 22, 2018 at 4:42 PM, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:
> Basically I'm against having something called YCSB if it is not YCSB;-)

Yep, that seems pretty clear.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company