The problem that led to the workaround in f83f14881c7 was not in fact
a compiler bug, but a failure to zero the upper bits of the vector
register containing the initial scalar CRC value. Fix that and revert
the workaround.
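
As an illustration only, the following portable C sketch models the difference between a cast-style widening, where the lanes above the low 128 bits keep whatever the register held before, and a zero-extending widening. It uses no intrinsics; the type vec512_model and the helpers widen_cast_style, widen_zero_extended, xor512 and lanes_changed are hypothetical names invented for this sketch and are not part of the patch.

```c
#include <stdint.h>
#include <string.h>

#define LANES 16				/* sixteen 32-bit lanes model one 512-bit register */

typedef struct
{
	uint32_t	lane[LANES];
} vec512_model;

/*
 * Model of a zero-extending widen: lane 0 holds the CRC and every other
 * lane is zero.
 */
static vec512_model
widen_zero_extended(uint32_t crc)
{
	vec512_model v;

	memset(&v, 0, sizeof(v));
	v.lane[0] = crc;
	return v;
}

/*
 * Model of a cast-style widen: the low 128 bits (lanes 0..3) are defined,
 * with the CRC in lane 0 and zeros in lanes 1..3.  Lanes 4..15 are not
 * written; "leftover" stands in for whatever the register held before.
 */
static vec512_model
widen_cast_style(uint32_t crc, const vec512_model *leftover)
{
	vec512_model v = *leftover;

	v.lane[0] = crc;
	v.lane[1] = v.lane[2] = v.lane[3] = 0;
	return v;
}

/* Lane-wise XOR of two modeled registers. */
static vec512_model
xor512(vec512_model a, vec512_model b)
{
	vec512_model r;

	for (int i = 0; i < LANES; i++)
		r.lane[i] = a.lane[i] ^ b.lane[i];
	return r;
}

/* Count the lanes that differ between two modeled registers. */
static int
lanes_changed(const vec512_model *before, const vec512_model *after)
{
	int			n = 0;

	for (int i = 0; i < LANES; i++)
		n += (before->lane[i] != after->lane[i]);
	return n;
}
```

In this model, XORing the zero-extended value into the first 64-byte block changes only lane 0, which is what folding in the initial CRC requires. With the cast-style widen, any nonzero leftover in lanes 4..15 also changes the input data, so the result depends on prior register contents.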
Diagnosed-by: Nathan Bossart
Diagnosed-by: Raghuveer Devulapalli
Tested-by: Andy Fan
Tested-by: Soumyadeep Chakraborty
Reviewed-by: Nathan Bossart
Reviewed-by: Raghuveer Devulapalli
Discussion: https://postgr.es/m/PH8PR11MB82866B07AA6758D12F699C00FB70A@PH8PR11MB8286.namprd11.prod.outlook.com
__m512i k;
k = _mm512_broadcast_i32x4(_mm_setr_epi32(0x740eef02, 0, 0x9e4addf8, 0));
- x0 = _mm512_xor_si512(_mm512_castsi128_si512(_mm_cvtsi32_si128(crc0)), x0);
+ x0 = _mm512_xor_si512(_mm512_zextsi128_si512(_mm_cvtsi32_si128(crc0)), x0);
buf += 64;
/* Main loop. */
__cpuidex(exx, 7, 0);
#endif
-#if defined(__clang__) && !defined(__OPTIMIZE__)
- /* Some versions of clang are broken at -O0 */
-#elif defined(USE_AVX512_CRC32C_WITH_RUNTIME_CHECK)
+#ifdef USE_AVX512_CRC32C_WITH_RUNTIME_CHECK
if (exx[2] & (1 << 10) && /* VPCLMULQDQ */
exx[1] & (1 << 31)) /* AVX512-VL */
pg_comp_crc32c = pg_comp_crc32c_avx512;
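
For reference, the feature test in the hunk above can be sketched as a standalone helper. This is a minimal sketch, assuming exx holds the EAX, EBX, ECX and EDX values returned for CPUID leaf 7, subleaf 0, in that order; the enum, the macro names and leaf7_has_avx512_crc_features are hypothetical and do not appear in the patch. Unsigned constants are used so that shifting into bit 31 is well defined.

```c
#include <stdbool.h>
#include <stdint.h>

/* Positions of the four CPUID output registers within the array. */
enum
{
	REG_EAX = 0,
	REG_EBX = 1,
	REG_ECX = 2,
	REG_EDX = 3
};

/* Leaf 7, subleaf 0 feature bits tested by the hunk above. */
#define LEAF7_ECX_VPCLMULQDQ	(UINT32_C(1) << 10)
#define LEAF7_EBX_AVX512VL		(UINT32_C(1) << 31)

/*
 * Return true only if both feature bits are set in the given register
 * values.  The caller is assumed to have filled exx[] from CPUID already.
 */
static bool
leaf7_has_avx512_crc_features(const uint32_t exx[4])
{
	return (exx[REG_ECX] & LEAF7_ECX_VPCLMULQDQ) != 0 &&
		(exx[REG_EBX] & LEAF7_EBX_AVX512VL) != 0;
}
```

Taking the register values as a parameter keeps the bit test separate from the platform-specific CPUID call, so it can be exercised with made-up inputs.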