POD container restarting due to definition changed in 1.14.0 operator after enabling Wiz #2904


Open · Ajith61 opened this issue Apr 21, 2025 · 4 comments


Ajith61 commented Apr 21, 2025

Hi All,

We are facing a container restart issue with the 1.14.0 operator. The issue does not occur in version 1.9.0. The postgres / postgres-exporter container restarts during some of the operator sync intervals (not on every sync).

STS events:

Events:
  Type    Reason   Age                  From     Message
  ----    ------   ----                 ----     -------
  Normal  Killing  43m                  kubelet  Container postgres-exporter definition changed, will be restarted
  Normal  Pulling  43m                  kubelet  Pulling image "docker.com/wrouesnel/postgres_exporter:latest@sha256:54bd3ba6bc39a9da2bf382667db4dc249c96e4cfc837dafe91d6cc7d362829e0"
  Normal  Created  43m (x2 over 3d22h)  kubelet  Created container: postgres-exporter
  Normal  Started  43m (x2 over 3d22h)  kubelet  Started container postgres-exporter
  Normal  Pulled   43m                  kubelet  Successfully pulled image "docker.com/wrouesnel/postgres_exporter:latest@sha256:54bd3ba6bc39a9da2bf382667db4dc249c96e4cfc837dafe91d6cc7d362829e0" in 1.071s (1.071s including waiting). Image size: 33164884 bytes.

State:          Running
  Started:      Mon, 21 Apr 2025 10:37:10 +0530
Last State:     Terminated
  Reason:       Error
  Exit Code:    2
  Started:      Thu, 17 Apr 2025 13:09:10 +0530
  Finished:     Mon, 21 Apr 2025 10:37:09 +0530
Ready:          True

We are also noticing that pods are recreated with the reason "pod not yet restarted due to lazy update".
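For context, that reason string is tied to the operator's lazy Spilo upgrade mode, where a new image is written to the statefulset but running pods are only marked for a later restart. A minimal sketch of the toggle, assuming a CRD-based OperatorConfiguration (the exact field placement is an assumption on my side):

# Sketch only: assumes enable_lazy_spilo_upgrade sits at the top level of
# the configuration block. When enabled, pods still running the old image
# get marked with the "pod not yet restarted due to lazy update" reason.
apiVersion: "acid.zalan.do/v1"
kind: OperatorConfiguration
metadata:
  name: postgresql-operator-configuration
configuration:
  enable_lazy_spilo_upgrade: true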

Operator log:

time="2025-04-21T06:43:04Z" level=debug msg="syncing pod disruption budgets" cluster-name=pg-pgspilotest3/pg-pgspilotest3 pkg=cluster worker=2
time="2025-04-21T06:43:04Z" level=debug msg="syncing roles" cluster-name=pg-pgspilotest3/pg-pgspilotest3 pkg=cluster worker=2
time="2025-04-21T06:43:11Z" level=debug msg="syncing Patroni config" cluster-name=pg-pgspilotest1/pg-pgspilotest1 pkg=cluster
time="2025-04-21T06:43:11Z" level=debug msg="making GET http request: http://192.168.14.210:8008/config" cluster-name=pg-pgspilotest1/pg-pgspilotest1 pkg=cluster
time="2025-04-21T06:43:11Z" level=debug msg="making GET http request: http://192.168.14.210:8008/patroni" cluster-name=pg-pgspilotest1/pg-pgspilotest1 pkg=cluster
time="2025-04-21T06:43:11Z" level=debug msg="syncing pod disruption budgets" cluster-name=pg-pgspilotest1/pg-pgspilotest1 pkg=cluster
time="2025-04-21T06:43:11Z" level=debug msg="syncing roles" cluster-name=pg-pgspilotest1/pg-pgspilotest1 pkg=cluster
time="2025-04-21T06:43:11Z" level=info msg="mark rolling update annotation for pg-pgspilotest2-1: reason pod not yet restarted due to lazy update" cluster-name=pg-pgspilotest2/pg-pgspilotest2 pkg=cluster
time="2025-04-21T06:43:11Z" level=debug msg="syncing Patroni config" cluster-name=pg-pgspilotest2/pg-pgspilotest2 pkg=cluster
time="2025-04-21T06:43:11Z" level=debug msg="making GET http request: http://192.168.38.25:8008/config" cluster-name=pg-pgspilotest2/pg-pgspilotest2 pkg=cluster
time="2025-04-21T06:43:11Z" level=debug msg="making GET http request: http://192.168.14.243:8008/config" cluster-name=pg-pgspilotest2/pg-pgspilotest2 pkg=cluster
time="2025-04-21T06:43:11Z" level=debug msg="making GET http request: http://192.168.38.25:8008/patroni" cluster-name=pg-pgspilotest2/pg-pgspilotest2 pkg=cluster
time="2025-04-21T06:43:11Z" level=debug msg="making GET http request: http://192.168.14.243:8008/patroni" cluster-name=pg-pgspilotest2/pg-pgspilotest2 pkg=cluster
time="2025-04-21T06:43:11Z" level=info msg="performing rolling update" cluster-name=pg-pgspilotest2/pg-pgspilotest2 pkg=cluster
time="2025-04-21T06:43:11Z" level=info msg="there are 2 pods in the cluster to recreate" cluster-name=pg-pgspilotest2/pg-pgspilotest2 pkg=cluster
time="2025-04-21T06:43:11Z" level=debug msg="subscribing to pod "pg-pgspilotest2/pg-pgspilotest2-0"" cluster-name=pg-pgspilotest2/pg-pgspilotest2 pkg=cluster


FxKu commented Apr 23, 2025

@Ajith61 can you find the reason for the rolling update in the operator logs? There must be a diff logged somewhere above around syncing statefulset.


Ajith61 commented Apr 23, 2025

@Ajith61 can you find the reason for the rolling update in the operator logs? There must be a diff logged somewhere above around syncing statefulset.

Thanks @FxKu for the response. I think this issue might be caused by Wiz (https://www.wiz.io/solutions/container-and-kubernetes-security) in my cluster: the problem started once Wiz was enabled. I noticed that a Wiz-related annotation is added to the pod annotations, and I'm not sure whether this is what triggers the container restart / rolling update during the operator sync of the cluster.

apiVersion: v1
kind: Pod
metadata:
  annotations:
    cni.projectcalico.org/containerID: e29feb1637352db0e21600085f40d16eaf7dc094488846425d69bfec297c64c7
    cni.projectcalico.org/podIP: 192.168.35.167/32
    cni.projectcalico.org/podIPs: 192.168.35.167/32
    image-integrity-validator.wiz.io-0: docker.com/dev/platform/postgres/spilocustom/test/carbonspilo:1.16->sha256:6a5a3ad3b10c80dcba8a6a1df359d67d55fec24c4f183662bfa84e2e3ec9eee7
  creationTimestamp: "2025-04-23T06:43:10Z"
  generateName: pg-pgspilotest3-
  labels:
    application: spilo
    apps.kubernetes.io/pod-index: "0"

Observations

  1. During operator sync (intermittently, not on every sync), the Postgres container restart / rolling update happens after Wiz is enabled in the cluster.

  2. We did not face this issue up to operator version 1.12.2 even though Wiz was enabled and running; it occurs only on 1.13.0 and 1.14.0.

  3. After disabling Wiz with the 1.13.0/1.14.0 operator, I don't see any issue, so I think Wiz is causing it here. It seems the operator triggers a rolling update / container restart when the pod annotations change. Could you please let me know how to avoid this issue in the latest operator? Thanks in advance.


FxKu commented May 13, 2025

Sorry for the late reply. This might have to do with how we compare annotations starting from v1.13.0. If Wiz adds an extra annotation which you want to ignore on diff, you have to add it to the ignored_annotations config option.
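For illustration, a minimal sketch of how that could look with the CRD-based OperatorConfiguration; the placement under configuration.kubernetes and the use of bare annotation keys (without values) are assumptions here, so double-check against the docs for your version:

# Sketch only: assumes ignored_annotations sits in the kubernetes section
# of the OperatorConfiguration CRD and takes a list of annotation keys.
apiVersion: "acid.zalan.do/v1"
kind: OperatorConfiguration
metadata:
  name: postgresql-operator-configuration
configuration:
  kubernetes:
    ignored_annotations:
      - image-integrity-validator.wiz.io-0   # key only, no image/digest value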

FxKu changed the title from "POD container restarting due to definition changed in 1.14.0 operator" to "POD container restarting due to definition changed in 1.14.0 operator after enabling Wiz" on May 13, 2025

Ajith61 commented May 16, 2025

@FxKu Thanks for the response.

I added the annotations under the ignored_annotations flag in the operator config file as shown below, but it is not working.

I tried both the annotation key alone (image-integrity-validator.wiz.io-0) and the full annotation.

The annotation value has a "->" character after the image tag, which is converted to \u003e when the operator initializes, so I think the annotations are not being ignored because of this string mismatch. Any idea how we can fix this?

U+003E > Greater-than sign

ignored_annotations:
  - image-integrity-validator.wiz.io-0
  - "image-integrity-validator.wiz.io-0: docker.com/dev/platform/postgres/spilocustom/test/carbonspilo:1.16->sha256:6a5a3ad3b10c80dcba8a6a1df359d67d55fec24c4f183662bfa84e2e3ec9eee7"

Operator log:

time="2025-05-05T09:27:42Z" level=info msg=" "image-integrity-validator.wiz.io-0"," pkg=controller
time="2025-05-05T09:27:42Z" level=info msg=" "docker.com/dev/platform/postgres/spilocustom/test/carbonspilo:1.16-\u003esha256:6a5a3ad3b10c80dcba8a6a1df359d67d55fec24c4f183662bfa84e2e3ec9eee7"," pkg=controller
