POD container restarting due to definition changed in 1.14.0 operator after enabling Wiz #2904


Open · Ajith61 opened this issue Apr 21, 2025 · 4 comments


Ajith61 commented Apr 21, 2025

Hi All,

We are facing a container restart issue with the 1.14.0 operator. The issue does not occur in version 1.9.0. The postgres / postgres-exporter container restarts during some of the operator sync intervals (not on every sync).

STS events:

Events:
  Type    Reason   Age                  From     Message
  ----    ------   ----                 ----     -------
  Normal  Killing  43m                  kubelet  Container postgres-exporter definition changed, will be restarted
  Normal  Pulling  43m                  kubelet  Pulling image "docker.com/wrouesnel/postgres_exporter:latest@sha256:54bd3ba6bc39a9da2bf382667db4dc249c96e4cfc837dafe91d6cc7d362829e0"
  Normal  Created  43m (x2 over 3d22h)  kubelet  Created container: postgres-exporter
  Normal  Started  43m (x2 over 3d22h)  kubelet  Started container postgres-exporter
  Normal  Pulled   43m                  kubelet  Successfully pulled image "docker.com/wrouesnel/postgres_exporter:latest@sha256:54bd3ba6bc39a9da2bf382667db4dc249c96e4cfc837dafe91d6cc7d362829e0" in 1.071s (1.071s including waiting). Image size: 33164884 bytes.

State:          Running
  Started:      Mon, 21 Apr 2025 10:37:10 +0530
Last State:     Terminated
  Reason:       Error
  Exit Code:    2
  Started:      Thu, 17 Apr 2025 13:09:10 +0530
  Finished:     Mon, 21 Apr 2025 10:37:09 +0530
Ready:          True

We are also noticing that pods are recreated with the reason "pod not yet restarted due to lazy update".
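For context, that reason string is tied to the operator's lazy Spilo upgrade mode, where a new image is written to the statefulset but running pods are only marked for a later restart. A minimal sketch of the toggle, assuming a CRD-based OperatorConfiguration (the exact field placement is an assumption on my side):

# Sketch only: assumes enable_lazy_spilo_upgrade sits at the top level of
# the configuration block. When enabled, pods still running the old image
# get marked with the "pod not yet restarted due to lazy update" reason.
apiVersion: "acid.zalan.do/v1"
kind: OperatorConfiguration
metadata:
  name: postgresql-operator-configuration
configuration:
  enable_lazy_spilo_upgrade: true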

Operator log:

time="2025-04-21T06:43:04Z" level=debug msg="syncing pod disruption budgets" cluster-name=pg-pgspilotest3/pg-pgspilotest3 pkg=cluster worker=2
time="2025-04-21T06:43:04Z" level=debug msg="syncing roles" cluster-name=pg-pgspilotest3/pg-pgspilotest3 pkg=cluster worker=2
time="2025-04-21T06:43:11Z" level=debug msg="syncing Patroni config" cluster-name=pg-pgspilotest1/pg-pgspilotest1 pkg=cluster
time="2025-04-21T06:43:11Z" level=debug msg="making GET http request: http://192.168.14.210:8008/config" cluster-name=pg-pgspilotest1/pg-pgspilotest1 pkg=cluster
time="2025-04-21T06:43:11Z" level=debug msg="making GET http request: http://192.168.14.210:8008/patroni" cluster-name=pg-pgspilotest1/pg-pgspilotest1 pkg=cluster
time="2025-04-21T06:43:11Z" level=debug msg="syncing pod disruption budgets" cluster-name=pg-pgspilotest1/pg-pgspilotest1 pkg=cluster
time="2025-04-21T06:43:11Z" level=debug msg="syncing roles" cluster-name=pg-pgspilotest1/pg-pgspilotest1 pkg=cluster
time="2025-04-21T06:43:11Z" level=info msg="mark rolling update annotation for pg-pgspilotest2-1: reason pod not yet restarted due to lazy update" cluster-name=pg-pgspilotest2/pg-pgspilotest2 pkg=cluster
time="2025-04-21T06:43:11Z" level=debug msg="syncing Patroni config" cluster-name=pg-pgspilotest2/pg-pgspilotest2 pkg=cluster
time="2025-04-21T06:43:11Z" level=debug msg="making GET http request: http://192.168.38.25:8008/config" cluster-name=pg-pgspilotest2/pg-pgspilotest2 pkg=cluster
time="2025-04-21T06:43:11Z" level=debug msg="making GET http request: http://192.168.14.243:8008/config" cluster-name=pg-pgspilotest2/pg-pgspilotest2 pkg=cluster
time="2025-04-21T06:43:11Z" level=debug msg="making GET http request: http://192.168.38.25:8008/patroni" cluster-name=pg-pgspilotest2/pg-pgspilotest2 pkg=cluster
time="2025-04-21T06:43:11Z" level=debug msg="making GET http request: http://192.168.14.243:8008/patroni" cluster-name=pg-pgspilotest2/pg-pgspilotest2 pkg=cluster
time="2025-04-21T06:43:11Z" level=info msg="performing rolling update" cluster-name=pg-pgspilotest2/pg-pgspilotest2 pkg=cluster
time="2025-04-21T06:43:11Z" level=info msg="there are 2 pods in the cluster to recreate" cluster-name=pg-pgspilotest2/pg-pgspilotest2 pkg=cluster
time="2025-04-21T06:43:11Z" level=debug msg="subscribing to pod "pg-pgspilotest2/pg-pgspilotest2-0"" cluster-name=pg-pgspilotest2/pg-pgspilotest2 pkg=cluster


FxKu commented Apr 23, 2025

@Ajith61 can you find the reason for the rolling update in the operator logs? There must be a diff logged somewhere above around syncing statefulset.


Ajith61 commented Apr 23, 2025

@Ajith61 can you find the reason for the rolling update in the operator logs? There must be a diff logged somewhere above around syncing statefulset.

Thanks @FxKu for the response. I think this issue might be caused by Wiz (https://www.wiz.io/solutions/container-and-kubernetes-security) in my cluster: the problem started once Wiz was enabled. I noticed that a Wiz-related annotation is added to the pod annotations, and I'm not sure whether this is what triggers the container restart / rolling update during the operator sync of the cluster.

apiVersion: v1
kind: Pod
metadata:
  annotations:
    cni.projectcalico.org/containerID: e29feb1637352db0e21600085f40d16eaf7dc094488846425d69bfec297c64c7
    cni.projectcalico.org/podIP: 192.168.35.167/32
    cni.projectcalico.org/podIPs: 192.168.35.167/32
    image-integrity-validator.wiz.io-0: docker.com/dev/platform/postgres/spilocustom/test/carbonspilo:1.16->sha256:6a5a3ad3b10c80dcba8a6a1df359d67d55fec24c4f183662bfa84e2e3ec9eee7
  creationTimestamp: "2025-04-23T06:43:10Z"
  generateName: pg-pgspilotest3-
  labels:
    application: spilo
    apps.kubernetes.io/pod-index: "0"

Observations

  1. During operator sync (intermittently, not on every sync), the Postgres container restart / rolling update happens after Wiz is enabled in the cluster.

  2. We did not face this issue up to operator version 1.12.2 even though Wiz was enabled and running; it occurs only on 1.13.0 and 1.14.0.

  3. After disabling Wiz with the 1.13.0/1.14.0 operator, I don't see any issue, so I think Wiz is causing it here. It seems the operator triggers a rolling update / container restart when the pod annotations change. Could you please let me know how to avoid this issue in the latest operator? Thanks in advance.


FxKu commented May 13, 2025

Sorry for the late reply. This might have to do with how we compare annotations starting from v1.13.0. If Wiz adds an extra annotation which you want to ignore on diff, you have to add it to the ignored_annotations config option.
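For illustration, a minimal sketch of how that could look with the CRD-based OperatorConfiguration; the placement under configuration.kubernetes and the use of bare annotation keys (without values) are assumptions here, so double-check against the docs for your version:

# Sketch only: assumes ignored_annotations sits in the kubernetes section
# of the OperatorConfiguration CRD and takes a list of annotation keys.
apiVersion: "acid.zalan.do/v1"
kind: OperatorConfiguration
metadata:
  name: postgresql-operator-configuration
configuration:
  kubernetes:
    ignored_annotations:
      - image-integrity-validator.wiz.io-0   # key only, no image/digest value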

FxKu changed the title from "POD container restarting due to definition changed in 1.14.0 operator" to "POD container restarting due to definition changed in 1.14.0 operator after enabling Wiz" on May 13, 2025

Ajith61 commented May 16, 2025

@FxKu Thanks for the response.

I added the annotations under the ignored_annotations flag in the operator config file as shown below, but it is not working.

I tried both the annotation key alone (image-integrity-validator.wiz.io-0) and the full annotation.

The annotation value has a "->" character after the image tag, which is converted to \u003e when the operator initializes, so I think the annotations are not being ignored because of this string mismatch. Any idea how we can fix this?

U+003E > Greater-than sign

ignored_annotations:
  - image-integrity-validator.wiz.io-0
  - "image-integrity-validator.wiz.io-0: docker.com/dev/platform/postgres/spilocustom/test/carbonspilo:1.16->sha256:6a5a3ad3b10c80dcba8a6a1df359d67d55fec24c4f183662bfa84e2e3ec9eee7"

Operator log:

time="2025-05-05T09:27:42Z" level=info msg=" "image-integrity-validator.wiz.io-0"," pkg=controller
time="2025-05-05T09:27:42Z" level=info msg=" "docker.com/dev/platform/postgres/spilocustom/test/carbonspilo:1.16-\u003esha256:6a5a3ad3b10c80dcba8a6a1df359d67d55fec24c4f183662bfa84e2e3ec9eee7"," pkg=controller
