POD container restarting due to definition changed in 1.14.0 operator after enabling Wiz #2904
Comments
@Ajith61 can you find the reason for the rolling update in the operator logs? There must be a diff logged somewhere above, around syncing the statefulset.
Thanks @FxKu for the response. I think this issue might be caused by Wiz (https://www.wiz.io/solutions/container-and-kubernetes-security) in my cluster; the problem started after we enabled Wiz. I noticed a Wiz-related annotation is added to the pod annotations, and I'm not sure whether that is what causes the container restart / rolling update during the operator's sync of the cluster.
Sorry for the late reply. This might have to do with how we compare annotations starting from v1.13.0. If Wiz adds an extra annotation which you want to ignore on diff, you have to add it to the ignored_annotations option.
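For illustration, a minimal sketch of how such an entry could look when the operator is configured through the OperatorConfiguration CRD. The configuration.kubernetes nesting is an assumption here; verify the exact placement against the configuration reference for your operator version, and note that the ConfigMap-based setup flattens list options differently.

configuration:
  kubernetes:
    # annotation keys listed here are skipped when the operator compares
    # annotations, so extra annotations injected by Wiz should not
    # trigger a container restart / rolling update
    ignored_annotations:
      - image-integrity-validator.wiz.io-0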
@FxKu Thanks for the response. I added the annotations under the ignored_annotations flag in the operator config file as below, but it's not working. I added both the annotation key alone (image-integrity-validator.wiz.io-0) and the full annotation. The annotation has a "->" character after the image tag, which is converted to \u003e (U+003E, the greater-than sign ">") when the operator initializes. So I think the annotations are not being ignored because of this string mismatch. Any idea how we can fix this?
ignored_annotations:
Operator log:
time="2025-05-05T09:27:42Z" level=info msg=" "image-integrity-validator.wiz.io-0"," pkg=controller
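As background on the \u003e observation: Go's standard encoding/json encoder escapes '<', '>' and '&' to \u003c, \u003e and \u0026 by default (HTML-safe escaping), which would explain an exact string comparison failing for a value containing "->". A small standalone sketch that reproduces this behavior; it is not taken from the operator code, and the annotation value is a made-up example:

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

func main() {
	// Hypothetical annotation map: the key is from the comment above,
	// the value is an invented example containing "->" after the image tag.
	ann := map[string]string{
		"image-integrity-validator.wiz.io-0": "postgres_exporter:latest->sha256:abc123",
	}

	// json.Marshal escapes '>' to \u003e by default.
	b, _ := json.Marshal(ann)
	fmt.Println(string(b))

	// An encoder with SetEscapeHTML(false) keeps the literal '>'.
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false)
	_ = enc.Encode(ann)
	fmt.Print(buf.String())
}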
Hi All,
We are facing a container restart issue when we use the 1.14.0 operator; the issue does not occur with version 1.9.0. The postgres / postgres-exporter container is restarted during some of the operator sync intervals (it does not happen on every sync).
STS events:
Events:
Type Reason Age From Message
Normal Killing 43m kubelet Container postgres-exporter definition changed, will be restarted
Normal Pulling 43m kubelet Pulling image "docker.com/wrouesnel/postgres_exporter:latest@sha256:54bd3ba6bc39a9da2bf382667db4dc249c96e4cfc837dafe91d6cc7d362829e0"
Normal Created 43m (x2 over 3d22h) kubelet Created container: postgres-exporter
Normal Started 43m (x2 over 3d22h) kubelet Started container postgres-exporter
Normal Pulled 43m kubelet Successfully pulled image "docker.com/wrouesnel/postgres_exporter:latest@sha256:54bd3ba6bc39a9da2bf382667db4dc249c96e4cfc837dafe91d6cc7d362829e0" in 1.071s (1.071s including waiting). Image size: 33164884 bytes.
State: Running
Started: Mon, 21 Apr 2025 10:37:10 +0530
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Thu, 17 Apr 2025 13:09:10 +0530
Finished: Mon, 21 Apr 2025 10:37:09 +0530
Ready: True
We are also noticing that pods are recreated with the reason "pod not yet restarted due to lazy update".
Operator log:
time="2025-04-21T06:43:04Z" level=debug msg="syncing pod disruption budgets" cluster-name=pg-pgspilotest3/pg-pgspilotest3 pkg=cluster worker=2
time="2025-04-21T06:43:04Z" level=debug msg="syncing roles" cluster-name=pg-pgspilotest3/pg-pgspilotest3 pkg=cluster worker=2
time="2025-04-21T06:43:11Z" level=debug msg="syncing Patroni config" cluster-name=pg-pgspilotest1/pg-pgspilotest1 pkg=cluster
time="2025-04-21T06:43:11Z" level=debug msg="making GET http request: http://192.168.14.210:8008/config" cluster-name=pg-pgspilotest1/pg-pgspilotest1 pkg=cluster
time="2025-04-21T06:43:11Z" level=debug msg="making GET http request: http://192.168.14.210:8008/patroni" cluster-name=pg-pgspilotest1/pg-pgspilotest1 pkg=cluster
time="2025-04-21T06:43:11Z" level=debug msg="syncing pod disruption budgets" cluster-name=pg-pgspilotest1/pg-pgspilotest1 pkg=cluster
time="2025-04-21T06:43:11Z" level=debug msg="syncing roles" cluster-name=pg-pgspilotest1/pg-pgspilotest1 pkg=cluster
time="2025-04-21T06:43:11Z" level=info msg="mark rolling update annotation for pg-pgspilotest2-1: reason pod not yet restarted due to lazy update" cluster-name=pg-pgspilotest2/pg-pgspilotest2 pkg=cluster
time="2025-04-21T06:43:11Z" level=debug msg="syncing Patroni config" cluster-name=pg-pgspilotest2/pg-pgspilotest2 pkg=cluster
time="2025-04-21T06:43:11Z" level=debug msg="making GET http request: http://192.168.38.25:8008/config" cluster-name=pg-pgspilotest2/pg-pgspilotest2 pkg=cluster
time="2025-04-21T06:43:11Z" level=debug msg="making GET http request: http://192.168.14.243:8008/config" cluster-name=pg-pgspilotest2/pg-pgspilotest2 pkg=cluster
time="2025-04-21T06:43:11Z" level=debug msg="making GET http request: http://192.168.38.25:8008/patroni" cluster-name=pg-pgspilotest2/pg-pgspilotest2 pkg=cluster
time="2025-04-21T06:43:11Z" level=debug msg="making GET http request: http://192.168.14.243:8008/patroni" cluster-name=pg-pgspilotest2/pg-pgspilotest2 pkg=cluster
time="2025-04-21T06:43:11Z" level=info msg="performing rolling update" cluster-name=pg-pgspilotest2/pg-pgspilotest2 pkg=cluster
time="2025-04-21T06:43:11Z" level=info msg="there are 2 pods in the cluster to recreate" cluster-name=pg-pgspilotest2/pg-pgspilotest2 pkg=cluster
time="2025-04-21T06:43:11Z" level=debug msg="subscribing to pod "pg-pgspilotest2/pg-pgspilotest2-0"" cluster-name=pg-pgspilotest2/pg-pgspilotest2 pkg=cluster