Lists: | pgsql-hackers |
---|
From: | Christian Ullrich <chris(at)chrullrich(dot)net> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Crash with old Windows on new CPU |
Date: | 2016-02-12 16:24:24 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hello,
I just found a compatibility issue when I was migrating an elderly VM to
a new host. The VM is running Windows Server 2008 SP2, and it has the
EDB build of PostgreSQL 9.4.5 on it. (9.4.6 behaves the same.) It is
also not dependent on running in a VM; it would fail on the hardware as
well.
Backends (and possibly other processes) crash at the slightest
provocation, such as "SELECT * FROM pg_stat_activity;" or VACUUM. The
log says either "exception 0xC0000005" (segfault) or "exception
0xC000001D" (illegal instruction).
The interesting reason: The old host had a Core-generation CPU, which
does not support the AVX2 instruction set. The new one has a
Haswell-generation one, and this one does. The EDB distribution of 9.4
was built with the Visual Studio 2013 compiler, whose CRT (aka libc) has
a bug where it uses AVX2 instructions if the *CPU* supports them, but
does not care whether the *OS* does, and 2008 doesn't. That support was
added in SP1 for 7/2008R2.
There is a workaround, see
<https://connect.microsoft.com/VisualStudio/feedback/details/811093/visual-studio-2013-rtm-c-x64-code-generation-bug-for-avx2-instructions>.
It consists of a single function call, required only if the OS does not
support AVX2.
I just tried it, and it appears to work. If there is any interest in
fixing this, I'll be happy to prepare a patch. (Where would be the best
place to put a function call from <math.h> that has to be done during
startup of each server process, on Windows only?)
Otherwise, it may be time to update the manual (15.6 Supported
Platforms) where it says PostgreSQL "can be expected to work on these
operating systems: [...] Windows (Win2000 SP4 and later), [...]".
Perhaps we could add "except Windows before 7 SP1/2008R2 SP1 when
running in x64 mode on Intel CPUs introduced after May 2013 (Haswell and
later)"?
--
Christian
From: | Christian Ullrich <chris(at)chrullrich(dot)net> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Crash with old Windows on new CPU |
Date: | 2016-02-13 00:26:00 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
* Christian Ullrich wrote:
> Backends (and possibly other processes) crash at the slightest
> provocation, such as "SELECT * FROM pg_stat_activity;" or VACUUM. The
> log says either "exception 0xC0000005" (segfault) or "exception
> 0xC000001D" (illegal instruction).
>
> The interesting reason: The old host had a Core-generation CPU, which
> does not support the AVX2 instruction set. The new one has a
> Haswell-generation one, and this one does. The EDB distribution of 9.4
> was built with the Visual Studio 2013 compiler, whose CRT (aka libc) has
> a bug where it uses AVX2 instructions if the *CPU* supports them, but
> does not care whether the *OS* does, and 2008 doesn't. That support was
> added in SP1 for 7/2008R2.
> I just tried it, and it appears to work. If there is any interest in
> fixing this, I'll be happy to prepare a patch. (Where would be the best
> place to put a function call from <math.h> that has to be done during
> startup of each server process, on Windows only?)
startup_hacks(), I think. Proposed patch attached.
--
Christian
Attachment | Content-Type | Size |
---|---|---|
avx2-crash.diff | text/plain | 1016 bytes |
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Christian Ullrich <chris(at)chrullrich(dot)net> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Crash with old Windows on new CPU |
Date: | 2016-02-13 03:36:46 |
Message-ID: | CA+TgmobZk_o_KFL1EZy-bJyd_=2r8oyHimZ2rTR41jqdgpzaLw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Fri, Feb 12, 2016 at 7:26 PM, Christian Ullrich <chris(at)chrullrich(dot)net> wrote:
> * Christian Ullrich wrote:
>> Backends (and possibly other processes) crash at the slightest
>> provocation, such as "SELECT * FROM pg_stat_activity;" or VACUUM. The
>> log says either "exception 0xC0000005" (segfault) or "exception
>> 0xC000001D" (illegal instruction).
>>
>> The interesting reason: The old host had a Core-generation CPU, which
>> does not support the AVX2 instruction set. The new one has a
>> Haswell-generation one, and this one does. The EDB distribution of 9.4
>> was built with the Visual Studio 2013 compiler, whose CRT (aka libc) has
>> a bug where it uses AVX2 instructions if the *CPU* supports them, but
>> does not care whether the *OS* does, and 2008 doesn't. That support was
>> added in SP1 for 7/2008R2.
>
>> I just tried it, and it appears to work. If there is any interest in
>> fixing this, I'll be happy to prepare a patch. (Where would be the best
>> place to put a function call from <math.h> that has to be done during
>> startup of each server process, on Windows only?)
>
> startup_hacks(), I think. Proposed patch attached.
Thanks for the report and patch. Regrettably I haven't the Windows
knowledge to have any idea whether it's right or wrong, but hopefully
someone who knows Windows will jump in here.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | Christian Ullrich <chris(at)chrullrich(dot)net> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Crash with old Windows on new CPU |
Date: | 2016-02-13 06:53:22 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
* Robert Haas wrote:
> On Fri, Feb 12, 2016 at 7:26 PM, Christian Ullrich <chris(at)chrullrich(dot)net> wrote:
>> startup_hacks(), I think. Proposed patch attached.
>
> Thanks for the report and patch. Regrettably I haven't the Windows
> knowledge to have any idea whether it's right or wrong, but hopefully
> someone who knows Windows will jump in here.
In commitfest now.
--
Christian
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Christian Ullrich <chris(at)chrullrich(dot)net> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Crash with old Windows on new CPU |
Date: | 2016-02-13 15:10:29 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Christian Ullrich <chris(at)chrullrich(dot)net> writes:
> * Robert Haas wrote:
>> Thanks for the report and patch. Regrettably I haven't the Windows
>> knowledge to have any idea whether it's right or wrong, but hopefully
>> someone who knows Windows will jump in here.
> In commitfest now.
FWIW, I'm a tad suspicious of the notion that it's our job to make this
case work. How practical is it really to run a Windows release on
unsupported-by-Microsoft hardware --- aren't dozens of other programs
going to have the same issue?
I'm also suspicious of the "#if _MSC_VER == 1800" tests, that is,
the code compiles on *exactly one* MSVC version. Maybe that's actually
what's needed, but it sure looks fishy. And what connection does the
build toolchain version have to the runtime environment anyway?
Likewise, how can we know that !IsWindows7SP1OrGreater() is the exactly
right runtime test?
Lastly, I'd like to see some discussion of what side effects
"_set_FMA3_enable(0);" has ... I rather doubt that it's really
a magic-elixir-against-crashes-with-no-downsides. That would
give us some context to estimate the risks of this code executing
when it's not really needed. Without that, I'd be worried that
this cure is worse than the disease because it breaks cases that
weren't broken before.
regards, tom lane
From: | Christian Ullrich <chris(at)chrullrich(dot)net> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Crash with old Windows on new CPU |
Date: | 2016-02-13 15:45:33 |
Message-ID: | AM2PR06MB069042775AAB1D290D95F957D4AA0@AM2PR06MB0690.eurprd06.prod.outlook.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On February 13, 2016 4:10:34 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Christian Ullrich <chris(at)chrullrich(dot)net> writes:
>> * Robert Haas wrote:
>>> Thanks for the report and patch. Regrettably I haven't the Windows
>>> knowledge to have any idea whether it's right or wrong, but hopefully
>>> someone who knows Windows will jump in here.
>
>> In commitfest now.
>
> FWIW, I'm a tad suspicious of the notion that it's our job to make this
> case work. How practical is it really to run a Windows release on
> unsupported-by-Microsoft hardware --- aren't dozens of other programs
> going to have the same issue?
Why would the hardware be unsupported? The problem occurs on new CPUs, not old ones, and even the OS (2008) is still in extended support until next year, IIRC.
> I'm also suspicious of the "#if _MSC_VER == 1800" tests, that is,
> the code compiles on *exactly one* MSVC version.
The bug exists in only that compiler version's CRT, also, that is not the complete version number. There may be different builds somewhere, but they all start with 18.0.
After all, MS is in the business of selling new compilers, not maintaining the old ones.
> Maybe that's actually
> what's needed, but it sure looks fishy. And what connection does the
> build toolchain version have to the runtime environment anyway?
The CRT version is tied to the compiler version. It has mainly to do with matching allocators.
> Likewise, how can we know that !IsWindows7SP1OrGreater() is the exactly
> right runtime test?
Because all sources, including Microsoft, say that AVX2 support was added in 7SP1.
> Lastly, I'd like to see some discussion of what side effects
> "_set_FMA3_enable(0);" has ... I rather doubt that it's really
> a magic-elixir-against-crashes-with-no-downsides.
It tells the math library (in the CRT, no separate libm on Windows) not to use the AVX2-based implementations of log() and possibly other functions. AIUI, FMA means "fused multiply-add" and is apparently something that increases performance and accuracy in transcendental functions.
I can check the CRT source later today and figure out exactly what it does.
Also, if you look at the link I sent, you will find that a member of the Visual C++ Libraries team at MS is the source for the workaround. They probably know what they are doing, present circumstances excepted.
> That would
> give us some context to estimate the risks of this code executing
> when it's not really needed.
Hence all the conditions. The problem is *certain* to occur under these specific conditions (x64 code on Windows before 7SP1 on a CPU with AVX2 when built with VS2013), and under no others, and these conditions flip the switch exactly then.
> Without that, I'd be worried that
> this cure is worse than the disease because it breaks cases that
> weren't broken before.
Isn't that what the buildfarm is (among other things) for?
--
Christian Ullrich
From: | Yury Zhuravlev <u(dot)zhuravlev(at)postgrespro(dot)ru> |
---|---|
To: | <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Crash with old Windows on new CPU |
Date: | 2016-02-13 19:28:19 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Tom Lane wrote:
> Lastly, I'd like to see some discussion of what side effects
> "_set_FMA3_enable(0);" has ... I rather doubt that it's really
> a magic-elixir-against-crashes-with-no-downsides. That would
> give us some context to estimate the risks of this code executing
> when it's not really needed. Without that, I'd be worried that
> this cure is worse than the disease because it breaks cases that
> weren't broken befor
I think I have already proposed to drop MSVC support? :)
Too many problems due to of one compiler...
PS Cry from the heart. I work too much with him. Never mind.
--
Yury Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
From: | Christian Ullrich <chris(at)chrullrich(dot)net> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Crash with old Windows on new CPU |
Date: | 2016-02-13 22:09:28 |
Message-ID: | AM2PR06MB0690415F667B2A8864CEE893D4AA0@AM2PR06MB0690.eurprd06.prod.outlook.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
* From: Christian Ullrich
> On February 13, 2016 4:10:34 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> > Christian Ullrich <chris(at)chrullrich(dot)net> writes:
> > Lastly, I'd like to see some discussion of what side effects
> > "_set_FMA3_enable(0);" has ... I rather doubt that it's really
> > a magic-elixir-against-crashes-with-no-downsides.
>
> It tells the math library (in the CRT, no separate libm on Windows)
> not to use the AVX2-based implementations of log() and possibly
> other functions. AIUI, FMA means "fused multiply-add" and is
> apparently something that increases performance and accuracy in
> transcendental functions.
>
> I can check the CRT source later today and figure out exactly what
> it does.
OK, it turns out that the CRT source MS ships is not quite as complete as I thought it was (up until 2013, at least), so I had a look at the disassembly. When the library initializes, it checks whether the CPU supports the FMA instructions by looking at a certain bit in the CPUID result. If that is set, it sets a flag to use the FMA instructions. Later, in exp(), log(), pow() and the trigonometrical functions, it first checks whether that flag is set, and if so, uses the AVX-based implementation. If the flag is not set, it falls back to an SSE2-based one. So, yes, that function only and specifically disables the use of instructions that do not work in the problematic case.
The bug appears to be that it uses all manner of AVX and AVX2 instructions based only on the FMA support flag in CPUID, even though AVX2 has its own bit there.
To reiterate: The problem occurs because the library only asks the CPU whether it is *able* to perform the AVX instructions, but not whether it is *willing* to do so. In this particular situation, the former applies but not the latter, because the CPU needs OS support (saving the XMM/YMM registers across context switches), and the OS has not declared its support for that.
The downside to disabling the AVX implementations is a performance loss compared to using it. I ran a microbenchmark (avg(log(x) from generate_series(1,1e8))), and the result was that with FMA enabled, it is ~5.5% faster than without.
--
Christian
From: | Craig Ringer <craig(at)2ndquadrant(dot)com> |
---|---|
To: | Christian Ullrich <chris(at)chrullrich(dot)net> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Crash with old Windows on new CPU |
Date: | 2016-02-14 02:39:41 |
Message-ID: | CAMsr+YFzQEuQULk3YixKUiOC0Rp_dZRGx18oG6im3cVDsthnBQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 13 February 2016 at 23:45, Christian Ullrich <chris(at)chrullrich(dot)net>
wrote:
>
> > Maybe that's actually
> > what's needed, but it sure looks fishy. And what connection does the
> > build toolchain version have to the runtime environment anyway?
>
> The CRT version is tied to the compiler version. It has mainly to do with
> matching allocators.
>
In UNIX terms, the compiler and libc are tightly coupled on Windows and
each executable or shared library links to the C library provided by the
compiler that built the DLL or EXE. It's normal to link a program with
multiple C runtime libraries if that program is composed of one or more
DLLs that were each compiled by a different toolchain and therefore linked
to a different C runtime ... which is common in the Windows binary
distribution model.
"msvcrt.dll" is the best-known C runtime, the one bundled in Windows. It's
ancient and buggy and retained only for backward compatibility. mingw has
to pile hacks upon hacks around it to use it.
Microsoft has been bundling the C runtime libraries with the compiler since
at least Visual Studio 7. Application installers are expected to silently
run an installer for the C runtime library before installing the
application. They do not use the C runtime library from the operating
system.
You might have seen "Visual Studio Redistributable" packages installed by
various app installers. That's what they are - they're making sure the
specific C runtime they are built against is installed.
It's a bit like, on UNIX, each shared library expecting to have a different
libc. One expects libc-2.20.so, one expects libc-2.19.so, etc. Each with
linkage private to the shared library that loaded it. (We have this to a
lesser degree with libgcc, but everyone uses one version of gcc for
building everything on a given system so in practice it makes no
difference.)
This is why applications on Windows must be very careful not to malloc()
memory in one DLL and free() it in another, etc. They're effectively
leaking it and freeing unallocated memory at the same time because the two
C libraries have separate memory management structures.
It lets applications be composed of binary-distributed components from
different vendors, using different toolchains. Want to build new parts of
an app with VS 2017 while compiling your legacy components with VS 7,
linking with a libabc built using mingw32 by a vendor who won't show you
the source, and linking a libxyz built with an ancient version of Borland C
from sources you lost 5 years ago? You can do that.
I don't pretend it's a good idea, but it does have advantages sometimes.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From: | Peter Eisentraut <peter_e(at)gmx(dot)net> |
---|---|
To: | Christian Ullrich <chris(at)chrullrich(dot)net>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Crash with old Windows on new CPU |
Date: | 2016-03-09 02:27:51 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 2/12/16 11:24 AM, Christian Ullrich wrote:
> Otherwise, it may be time to update the manual (15.6 Supported
> Platforms) where it says PostgreSQL "can be expected to work on these
> operating systems: [...] Windows (Win2000 SP4 and later), [...]".
> Perhaps we could add "except Windows before 7 SP1/2008R2 SP1 when
> running in x64 mode on Intel CPUs introduced after May 2013 (Haswell and
> later)"?
Wouldn't the fix be for users to upgrade their service packs?
From: | Christian Ullrich <chris(at)chrullrich(dot)net> |
---|---|
To: | Peter Eisentraut <peter_e(at)gmx(dot)net>, <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Crash with old Windows on new CPU |
Date: | 2016-03-09 04:48:08 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
* Peter Eisentraut wrote:
> On 2/12/16 11:24 AM, Christian Ullrich wrote:
>> Otherwise, it may be time to update the manual (15.6 Supported
>> Platforms) where it says PostgreSQL "can be expected to work on these
>> operating systems: [...] Windows (Win2000 SP4 and later), [...]".
>> Perhaps we could add "except Windows before 7 SP1/2008R2 SP1 when
>> running in x64 mode on Intel CPUs introduced after May 2013 (Haswell and
>> later)"?
> Wouldn't the fix be for users to upgrade their service packs?
Windows 2008 and 2008R2 are entirely different things: 2008 is the
server sibling of Vista (internal version 6.0), 2008R2 is that of
Windows 7 (6.1). There is no version of 2008 that supports AVX2.
--
Christian
From: | Magnus Hagander <magnus(at)hagander(dot)net> |
---|---|
To: | Christian Ullrich <chris(at)chrullrich(dot)net> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Crash with old Windows on new CPU |
Date: | 2016-03-09 15:08:37 |
Message-ID: | CABUevEzyfdSLPj0uJ-knbr34XDYNaQTKWMJ+5hGE=RSHp-_6TA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Sat, Feb 13, 2016 at 4:45 PM, Christian Ullrich <chris(at)chrullrich(dot)net>
wrote:
> On February 13, 2016 4:10:34 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> > Christian Ullrich <chris(at)chrullrich(dot)net> writes:
> >> * Robert Haas wrote:
> >>> Thanks for the report and patch. Regrettably I haven't the Windows
> >>> knowledge to have any idea whether it's right or wrong, but hopefully
> >>> someone who knows Windows will jump in here.
> >
> >> In commitfest now.
> >
> > FWIW, I'm a tad suspicious of the notion that it's our job to make this
> > case work. How practical is it really to run a Windows release on
> > unsupported-by-Microsoft hardware --- aren't dozens of other programs
> > going to have the same issue?
>
> Why would the hardware be unsupported? The problem occurs on new CPUs, not
> old ones, and even the OS (2008) is still in extended support until next
> year, IIRC.
>
> > I'm also suspicious of the "#if _MSC_VER == 1800" tests, that is,
> > the code compiles on *exactly one* MSVC version.
>
> The bug exists in only that compiler version's CRT, also, that is not the
> complete version number. There may be different builds somewhere, but they
> all start with 18.0.
>
IIRC, the CRT itself does explicit checks against _MSC_VER == 1800. As in,
they don't actually bump that number in different build numbers.
How does this work wrt mingw, though? Do we have the same problem there?
AIUI this code can never run on mingw, correct?
> Lastly, I'd like to see some discussion of what side effects
> > "_set_FMA3_enable(0);" has ... I rather doubt that it's really
> > a magic-elixir-against-crashes-with-no-downsides.
>
> It tells the math library (in the CRT, no separate libm on Windows) not to
> use the AVX2-based implementations of log() and possibly other functions.
> AIUI, FMA means "fused multiply-add" and is apparently something that
> increases performance and accuracy in transcendental functions.
>
> I can check the CRT source later today and figure out exactly what it does.
>
> Also, if you look at the link I sent, you will find that a member of the
> Visual C++ Libraries team at MS is the source for the workaround. They
> probably know what they are doing, present circumstances excepted.
>
I notice the code checks IsWindows7SP1OrGreater() but the comment refers to
W7SP1 *or* 2008R2 SP1. I assume this is correct, or should there actually
be a separate check for server-windows?
> > That would
> > give us some context to estimate the risks of this code executing
> > when it's not really needed.
>
> Hence all the conditions. The problem is *certain* to occur under these
> specific conditions (x64 code on Windows before 7SP1 on a CPU with AVX2
> when built with VS2013), and under no others, and these conditions flip the
> switch exactly then.
>
Well, it doesn't flip it based on the specifics "running on a CPU with
AVX2". But presumably turning of AVX2 if the CPU doesn't support it is a
no-op.
> Without that, I'd be worried that
> > this cure is worse than the disease because it breaks cases that
> > weren't broken before.
>
> Isn't that what the buildfarm is (among other things) for?
>
The buildfarm doesn't really have a big spread of Windows animals,
unfortunately.
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
From: | Christian Ullrich <chris(at)chrullrich(dot)net> |
---|---|
To: | Magnus Hagander <magnus(at)hagander(dot)net> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Crash with old Windows on new CPU |
Date: | 2016-03-09 15:36:06 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
* Magnus Hagander wrote:
> On Sat, Feb 13, 2016 at 4:45 PM, Christian Ullrich <chris(at)chrullrich(dot)net>
> wrote:
>> On February 13, 2016 4:10:34 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> I'm also suspicious of the "#if _MSC_VER == 1800" tests, that is,
>>> the code compiles on *exactly one* MSVC version.
>>
>> The bug exists in only that compiler version's CRT, also, that is not the
>> complete version number. There may be different builds somewhere, but they
>> all start with 18.0.
> IIRC, the CRT itself does explicit checks against _MSC_VER == 1800. As in,
> they don't actually bump that number in different build numbers.
>
> How does this work wrt mingw, though? Do we have the same problem there?
> AIUI this code can never run on mingw, correct?
Not unless mingw defines _MSC_VER.
(If they do, I suggest we make Yury Zhuravlev cry and drop MinGW support
instead. IMHO everyone should boycott them anyway until they come up
with a working libc of their own instead of doing unspeakable things to
a helpless msvcrt.dll that is not intended for use by non-system
components at all. But I digress.)
> I notice the code checks IsWindows7SP1OrGreater() but the comment refers to
> W7SP1 *or* 2008R2 SP1. I assume this is correct, or should there actually
> be a separate check for server-windows?
No, that is fine. I think it's just to keep the function name from
getting too ridiculously long. The functions in <versionhelpers.h> are
all named for the client versions only, and only check the version
number, not the client/server capability flags. Or, rather, there is a
separate function to determine that.
>>> That would
>>> give us some context to estimate the risks of this code executing
>>> when it's not really needed.
>>
>> Hence all the conditions. The problem is *certain* to occur under these
>> specific conditions (x64 code on Windows before 7SP1 on a CPU with AVX2
>> when built with VS2013), and under no others, and these conditions flip the
>> switch exactly then.
> Well, it doesn't flip it based on the specifics "running on a CPU with
> AVX2". But presumably turning of AVX2 if the CPU doesn't support it is a
> no-op.
Precisely.
>> Isn't that what the buildfarm is (among other things) for?
>
> The buildfarm doesn't really have a big spread of Windows animals,
> unfortunately.
And apparently not a single one with VS 2013. OK, I'll see what I can do
about setting some up soonish, at least with (server) 2008 and (client)
7. FWIW, I have a local build of 9.5.1 with this patch in daily use on
2008 now, with no complaints.
--
Christian
From: | Magnus Hagander <magnus(at)hagander(dot)net> |
---|---|
To: | Christian Ullrich <chris(at)chrullrich(dot)net> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Crash with old Windows on new CPU |
Date: | 2016-03-09 15:39:36 |
Message-ID: | CABUevExtw3Okt0ex3O=Ozb1+DxoaGsdbsUfr=ORM4oO7D7aihw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Wed, Mar 9, 2016 at 4:36 PM, Christian Ullrich <chris(at)chrullrich(dot)net>
wrote:
> * Magnus Hagander wrote:
>
> On Sat, Feb 13, 2016 at 4:45 PM, Christian Ullrich <chris(at)chrullrich(dot)net>
>> wrote:
>>
>
> On February 13, 2016 4:10:34 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>
>>
> I'm also suspicious of the "#if _MSC_VER == 1800" tests, that is,
>>>> the code compiles on *exactly one* MSVC version.
>>>>
>>>
>>> The bug exists in only that compiler version's CRT, also, that is not the
>>> complete version number. There may be different builds somewhere, but
>>> they
>>> all start with 18.0.
>>>
>>
> IIRC, the CRT itself does explicit checks against _MSC_VER == 1800. As in,
>> they don't actually bump that number in different build numbers.
>>
>> How does this work wrt mingw, though? Do we have the same problem there?
>> AIUI this code can never run on mingw, correct?
>>
>
> Not unless mingw defines _MSC_VER.
>
The question is then - should it, under some conditions?
> (If they do, I suggest we make Yury Zhuravlev cry and drop MinGW support
> instead. IMHO everyone should boycott them anyway until they come up with a
> working libc of their own instead of doing unspeakable things to a helpless
> msvcrt.dll that is not intended for use by non-system components at all.
> But I digress.)
Haha, well, at least they're not cygwin...
> I notice the code checks IsWindows7SP1OrGreater() but the comment refers to
>> W7SP1 *or* 2008R2 SP1. I assume this is correct, or should there actually
>> be a separate check for server-windows?
>>
>
> No, that is fine. I think it's just to keep the function name from getting
> too ridiculously long. The functions in <versionhelpers.h> are all named
> for the client versions only, and only check the version number, not the
> client/server capability flags. Or, rather, there is a separate function to
> determine that.
Presumably the link is documented somewhere (the docs don't seem to say
anything about it).
> That would
>>>> give us some context to estimate the risks of this code executing
>>>> when it's not really needed.
>>>>
>>>
>>> Hence all the conditions. The problem is *certain* to occur under these
>>> specific conditions (x64 code on Windows before 7SP1 on a CPU with AVX2
>>> when built with VS2013), and under no others, and these conditions flip
>>> the
>>> switch exactly then.
>>>
>>
> Well, it doesn't flip it based on the specifics "running on a CPU with
>> AVX2". But presumably turning of AVX2 if the CPU doesn't support it is a
>> no-op.
>>
>
> Precisely.
>
> Isn't that what the buildfarm is (among other things) for?
>>>
>>
>> The buildfarm doesn't really have a big spread of Windows animals,
>> unfortunately.
>>
>
> And apparently not a single one with VS 2013. OK, I'll see what I can do
> about setting some up soonish, at least with (server) 2008 and (client) 7.
> FWIW, I have a local build of 9.5.1 with this patch in daily use on 2008
> now, with no complaints.
Having that added to the buildfarm would definitely be very useful!
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
From: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
---|---|
To: | Magnus Hagander <magnus(at)hagander(dot)net> |
Cc: | Christian Ullrich <chris(at)chrullrich(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Crash with old Windows on new CPU |
Date: | 2016-03-09 16:02:15 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Magnus Hagander wrote:
> On Wed, Mar 9, 2016 at 4:36 PM, Christian Ullrich <chris(at)chrullrich(dot)net>
> wrote:
> > And apparently not a single one with VS 2013. OK, I'll see what I can do
> > about setting some up soonish, at least with (server) 2008 and (client) 7.
> > FWIW, I have a local build of 9.5.1 with this patch in daily use on 2008
> > now, with no complaints.
>
> Having that added to the buildfarm would definitely be very useful!
Yeah, ideally we would have one buildfarm member for each MSVC compiler
supported by each Windows version. AFAICS we only have
Windows 8 8.1 Pro Visual Studio 2012 x86_64
Windows Server 2008 R2 (64bit) Visual C++ 2005 AMD64
Windows Server 2003 R2 5.2.3790 MSVC 2005 Express 8.0.50727.42 x86
Vista Ultimate 6.0.6000 MSVC 2005 Pro 8.0.50727.867 x86
Windows XP-PRO SP3 MSVC++ 2008 Express i686
which is pretty unfulfilling.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From: | Christian Ullrich <chris(at)chrullrich(dot)net> |
---|---|
To: | Magnus Hagander <magnus(at)hagander(dot)net> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Crash with old Windows on new CPU |
Date: | 2016-03-09 16:55:21 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
* Magnus Hagander wrote:
> On Wed, Mar 9, 2016 at 4:36 PM, Christian Ullrich <chris(at)chrullrich(dot)net>
> wrote:
>
>> * Magnus Hagander wrote:
>>> How does this work wrt mingw, though? Do we have the same problem there?
>>> AIUI this code can never run on mingw, correct?
>>
>> Not unless mingw defines _MSC_VER.
>
> The question is then - should it, under some conditions?
I have no clue about MinGW, so all I can say is that msvcrt.dll (even in
Windows 10) does not use the problematic instructions, nor does it
export anything with "FMA" in its name. If there is nothing to turn off,
and nothing to turn it off with, then I suspect we are safe there. (This
is based on a minimal test program, just to see where MinGW takes its
log() from; it comes from msvcrt.)
Also, we have confirmation, in the link I sent when I started this
thread, that the bug is only and specifically in the VS2013 CRT, not
anywhere else before or since.
>>> I notice the code checks IsWindows7SP1OrGreater() but the comment refers to
>>> W7SP1 *or* 2008R2 SP1. I assume this is correct, or should there actually
>>> be a separate check for server-windows?
>> No, that is fine. I think it's just to keep the function name from getting
>> too ridiculously long. The functions in <versionhelpers.h> are all named
>> for the client versions only, and only check the version number, not the
>> client/server capability flags. Or, rather, there is a separate function to
>> determine that.
> Presumably the link is documented somewhere (the docs don't seem to say
> anything about it).
<https://msdn.microsoft.com/en-us/library/windows/desktop/dn424960(v=vs.85).aspx>
("IsWindows7SP1OrGreater function"):
> This function does not differentiate between client and server
> releases. It will return true if the current OS version number is
> equal to or higher than the version of the client named in the call.
> For example, a call to IsWindowsXPSP3OrGreater will return true on
> Windows Server 2008. Applications that need to distinguish between
> server and client versions of Windows should call IsWindowsServer.
The internal version numbers (since NT4) are:
Version | Client | Server
---------+--------+--------
4.0 | NT4 | NT4
5.0 | 2000 | 2000
5.1 | XP |
5.2 | | 2003
6.0 | Vista | 2008
6.1 | 7 | 2008R2
6.2 | 8 | 2012
6.3 | 8.1 | 2012R2
10.0 | 10 | [2016]
The relevant SDK header (<sdkddkver.h>), where constants for the version
numbers are defined, also only has entries named for the client
versions, with the sole exception of 2008, with the same value as Vista,
and the special case of 2003 with no equivalent client version (unless
you count the second attempt at an XP x64 ... but I digress again).
If Microsoft wanted the added complexity of separate symbols for what is
basically the same code, I rather think they would have them.
--
Christian
From: | Magnus Hagander <magnus(at)hagander(dot)net> |
---|---|
To: | Christian Ullrich <chris(at)chrullrich(dot)net> |
Cc: | Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Crash with old Windows on new CPU |
Date: | 2016-03-10 13:13:21 |
Message-ID: | CABUevEyEvpLeVxhzY=sYnO+ZDXuGHir8tcVfbrkcGctSYEBRag@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Wed, Mar 9, 2016 at 5:48 AM, Christian Ullrich <chris(at)chrullrich(dot)net>
wrote:
> * Peter Eisentraut wrote:
>
> On 2/12/16 11:24 AM, Christian Ullrich wrote:
>>
>
> Otherwise, it may be time to update the manual (15.6 Supported
>>> Platforms) where it says PostgreSQL "can be expected to work on these
>>> operating systems: [...] Windows (Win2000 SP4 and later), [...]".
>>> Perhaps we could add "except Windows before 7 SP1/2008R2 SP1 when
>>> running in x64 mode on Intel CPUs introduced after May 2013 (Haswell and
>>> later)"?
>>>
>>
> Wouldn't the fix be for users to upgrade their service packs?
>>
>
> Windows 2008 and 2008R2 are entirely different things: 2008 is the server
> sibling of Vista (internal version 6.0), 2008R2 is that of Windows 7 (6.1).
> There is no version of 2008 that supports AVX2.
>
>
Windows 2008 went out of mainstream support in January last year, but is on
extended support until 2020. Extended support means both paid support and
security support is still there (what you don't get is new hardware support
and generic non-security related updates). That means we're going to see
these versions around for a long time. (And Vista is in extended support
until 2017).
And exactly the type of upgrade scenario outlined in the OP is only going
to be come more common as old hardware gets replaced. If we didn't patch
this, the reasonable thing would be to say we don't support Visual Studio
2013, rather than a specific version of Windows, I think.
Given the small and localized change of this (and hey it even goes inside a
function labeled hacks), I definitely think it's worth doing.
I've commited this one, including a reference to the MS bug report where
they say they're not going to fix it in existing versions. I didn't
actually test it on mingw, but as it doesn't define MSC_VER it shouldn't be
affected (by the bug or by the patch).
I did notice the #ifdef's are actually different in the header and body
section of the patch, which seems wrong. I used the one from the actual
implementation (_M_AMD64) for the header includes as, and also merged the
#ifdef's together to a single #if in each section. Please verify!
Thanks for a very good analysis and patch, and for good explanations of the
details! :)
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
From: | Christian Ullrich <chris(at)chrullrich(dot)net> |
---|---|
To: | Magnus Hagander <magnus(at)hagander(dot)net> |
Cc: | Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Crash with old Windows on new CPU |
Date: | 2016-03-10 13:55:19 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
* Magnus Hagander wrote:
> I did notice the #ifdef's are actually different in the header and body
> section of the patch, which seems wrong. I used the one from the actual
> implementation (_M_AMD64) for the header includes as, and also merged the
> #ifdef's together to a single #if in each section. Please verify!
Should be fine. I remember I had a reason for doing it this way, but it
can't have been a very good one. Thanks for cleaning it up.
--
Christian
From: | Christian Ullrich <chris(at)chrullrich(dot)net> |
---|---|
To: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Michael Meskes <meskes(at)postgresql(dot)org> |
Subject: | New Windows animal, mostly ecpg trouble (was: Crash with old Windows on new CPU) |
Date: | 2016-03-10 15:51:17 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
* Alvaro Herrera wrote:
> Magnus Hagander wrote:
>> On Wed, Mar 9, 2016 at 4:36 PM, Christian Ullrich <chris(at)chrullrich(dot)net>
>> wrote:
>>> And apparently not a single one with VS 2013. OK, I'll see what I can do
>>> about setting some up soonish, at least with (server) 2008 and (client) 7.
>>> FWIW, I have a local build of 9.5.1 with this patch in daily use on 2008
>>> now, with no complaints.
>>
>> Having that added to the buildfarm would definitely be very useful!
>
> Yeah, ideally we would have one buildfarm member for each MSVC compiler
> supported by each Windows version. AFAICS we only have
>
> Windows 8 8.1 Pro Visual Studio 2012 x86_64
> Windows Server 2008 R2 (64bit) Visual C++ 2005 AMD64
> Windows Server 2003 R2 5.2.3790 MSVC 2005 Express 8.0.50727.42 x86
> Vista Ultimate 6.0.6000 MSVC 2005 Pro 8.0.50727.867 x86
> Windows XP-PRO SP3 MSVC++ 2008 Express i686
Add to that Windows 7 with Visual Studio 2013 (x64), named woodlouse,
and there will be another one sometime over the weekend. Setting up
Windows animals is a major pain; no wonder so few people run any.
I spent some hours getting rid of random linker errors in ecpg
(kernel32.lib not found, msvcrt.lib not found, or unresolved symbols
that should be in one of the two); they finally went away when I stopped
defining the build environment in build-farm.conf and ran vcvarsall.bat
instead (this was probably pilot error; I think I had a mix of x86 and
x64 paths there). At that point, I managed one successful run on HEAD.
Things started failing in ecpg again after that, with crashes of the
test programs for the thread-thread and thread-prep test cases,
apparently somewhere in setlocale(). I had to --skip-steps=ecpg-check
because the crashes cause blocking UI.
The other troublemaker is plperl; it produces a wall of compiler errors
which I cleverly did not copy, most are about something not being in a
formal parameter list. Some preprocessor trouble, I suspect. This is
with both ActiveState and Strawberry Perl.
I did not try Tcl, and plpython3 apparently builds, but I cannot see any
tests being run.
--
Christian