Re: expanding our usage of POSIX_FADVISE

Lists: pgsql-hackers
From: Cédric Villemain <cedric(dot)villemain(at)dalibo(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: expanding our usage of POSIX_FADVISE
Date: 2009-08-12 14:07:04

Hello,

I wonder whether POSIX_FADV_RANDOM and POSIX_FADV_SEQUENTIAL are still
inappropriate for PostgreSQL?

I found:
«A related problem is that the smgr uses the same FD to access the same
relation no matter how many scans are in progress. Think about a complex query
that is doing both a seqscan and an indexscan on the same relation (a
self-join could easily do this). You'd really need to change this if you want
POSIX_FADV_SEQUENTIAL and POSIX_FADV_RANDOM to get set usefully.
» (Tom Lane, 2003)

And also:
«
Surely POSIX_FADV_SEQUENTIAL is the one intended to hint seq scans, and
POSIX_FADV_RANDOM to hint random access. No?
ISTM, _WILLNEED seems just right for small random-access blocks.
Anyway, for those who want to see what they do in Linux,
http://www.gelato.unsw.edu.au/lxr/source/mm/fadvise.c Pretty scary that Bruce
said it could make older linuxes dump core - there isn't a lot of code there.
» (Ron Mayer, 2006)
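
For illustration only, hinting a single small random-access block with
POSIX_FADV_WILLNEED, as the second quote suggests, could look roughly like
this in C. The 8192-byte block size and the helper names are assumptions made
for this sketch; it is not PostgreSQL code.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L     /* expose posix_fadvise() */
#endif
#include <fcntl.h>
#include <sys/types.h>

#define SKETCH_BLCKSZ 8192          /* assumed block size for this sketch */

/* Byte offset of a block within its file (hypothetical helper). */
static off_t
block_offset(off_t blockno)
{
    return blockno * (off_t) SKETCH_BLCKSZ;
}

/*
 * Tell the kernel that one block will be needed soon, so it can start
 * reading it in the background. Returns 0 on success or an error number
 * (posix_fadvise() reports errors through its return value, not errno).
 */
static int
prefetch_block(int fd, off_t blockno)
{
    return posix_fadvise(fd, block_offset(blockno), SKETCH_BLCKSZ,
                         POSIX_FADV_WILLNEED);
}
```

Unlike the SEQUENTIAL and RANDOM hints, this one names a specific byte range,
so it does not depend on how many scans share the file descriptor.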

But those comments seem a bit dated.
----
Cédric Villemain
Database Administrator
Cel: +33 (0)6 74 15 56 53
http://dalibo.com - http://dalibo.org


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Cédric Villemain <cedric(dot)villemain(at)dalibo(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: expanding our usage of POSIX_FADVISE
Date: 2009-08-12 15:28:40

On Wed, Aug 12, 2009 at 3:07 PM, Cédric Villemain
<cedric(dot)villemain(at)dalibo(dot)com> wrote:
>
> I wonder whether POSIX_FADV_RANDOM and POSIX_FADV_SEQUENTIAL are still
> inappropriate for PostgreSQL?
>
> I found:
> «A related problem is that the smgr uses the same FD to access the same
> relation no matter how many scans are in progress. Think about a complex
> query that is doing both a seqscan and an indexscan on the same relation (a
> self-join could easily do this). You'd really need to change this if you
> want POSIX_FADV_SEQUENTIAL and POSIX_FADV_RANDOM to get set usefully.
> » (Tom Lane, 2003)

I had a version of the POSIX_FADV_SEQUENTIAL patch going which set the
appropriate mode before every block read (skipping it if it was the
same mode as last set -- just like we handle lseek). I couldn't
measure any consistent improvement on sequential scans, though; at
least on Linux, they already saturate every I/O system I tested.
Mileage on other operating systems or I/O systems may vary, of course.
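
As a rough illustration of that approach (this is not the actual patch, and
the struct and function names are hypothetical), remembering the last hint per
file and skipping redundant calls might look like this in C:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L     /* expose posix_fadvise() and pread() */
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-file state; not PostgreSQL's actual file bookkeeping. */
typedef struct AdvisedFile
{
    int fd;
    int last_advice;    /* advice currently in effect, or -1 if none yet */
    int advise_calls;   /* posix_fadvise() calls actually issued */
} AdvisedFile;

/*
 * Set the access-pattern hint only if it differs from the one already in
 * effect. Returns 0 on success or the error number from posix_fadvise().
 */
static int
set_access_hint(AdvisedFile *f, int advice)
{
    int rc;

    if (f->last_advice == advice)
        return 0;               /* same mode as last set: skip the call */

    rc = posix_fadvise(f->fd, 0, 0, advice);    /* len 0 = to end of file */
    f->advise_calls++;
    if (rc == 0)
        f->last_advice = advice;
    return rc;
}

/* Read one block, declaring the expected access pattern first. */
static ssize_t
read_block(AdvisedFile *f, int advice, void *buf, size_t blcksz, off_t blockno)
{
    (void) set_access_hint(f, advice);  /* a failed hint is not fatal */
    return pread(f->fd, buf, blcksz, blockno * (off_t) blcksz);
}
```

With one shared descriptor per relation, a seqscan and an indexscan running
over the same relation would keep flipping the cached mode, so the skip would
rarely apply in that case.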

I think the real benefit of this would be avoiding polluting the
filesystem cache with blocks which we have no intention of reading.
That will be a hard benefit to measure though. Especially since just
because we're doing a random i/o doesn't actually mean we won't read
nearby blocks eventually. If we're scanning an index range and the
table is actually mostly clustered then our random i/o won't be so
random after all...

--
greg
http://mit.edu/~gsstark/resume.pdf


From: Cédric Villemain <cedric(dot)villemain(at)dalibo(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Greg Stark <gsstark(at)mit(dot)edu>
Subject: Re: expanding our usage of POSIX_FADVISE
Date: 2009-08-12 16:07:07

On Wednesday 12 August 2009, Greg Stark wrote:
> On Wed, Aug 12, 2009 at 3:07 PM, Cédric Villemain
> <cedric(dot)villemain(at)dalibo(dot)com> wrote:
> > I wonder whether POSIX_FADV_RANDOM and POSIX_FADV_SEQUENTIAL are still
> > inappropriate for PostgreSQL?
> >
> > I found:
> > «A related problem is that the smgr uses the same FD to access the same
> > relation no matter how many scans are in progress. Think about a complex
> > query that is doing both a seqscan and an indexscan on the same relation
> > (a self-join could easily do this). You'd really need to change this if
> > you want POSIX_FADV_SEQUENTIAL and POSIX_FADV_RANDOM to get set usefully.
> > » (Tom Lane, 2003)
>
> I had a version of the POSIX_FADV_SEQUENTIAL patch going which set the
> appropriate mode before every block read (skipping it if it was the
> same mode as last set -- just like we handle lseek). I couldn't
> measure any consistent improvement on sequential scans, though; at
> least on Linux, they already saturate every I/O system I tested.
> Mileage on other operating systems or I/O systems may vary, of course.

Yes. As Greg Smith said earlier, operating systems honor the POSIX_FADV_*
hints to varying degrees, depending on their defaults. Linux already reads
ahead aggressively, so POSIX_FADV_SEQUENTIAL probably brings little benefit
there. I wonder what happens with POSIX_FADV_RANDOM.

>
> I think the real benefit of this would be avoiding polluting the
> filesystem cache with blocks which we have no intention of reading.

And making sure we read ahead when needed, overriding the system default.

> That will be a hard benefit to measure though. Especially since just
> because we're doing a random i/o doesn't actually mean we won't read
> nearby blocks eventually. If we're scanning an index range and the
> table is actually mostly clustered then our random i/o won't be so
> random after all...

Probably, yes... :/

----
Cédric Villemain
Database Administrator
Cel: +33 (0)6 74 15 56 53
http://dalibo.com - http://dalibo.org