Description
I'm dealing with a server returning broken configuration. There are three breakages, all visible in the log below.
Log:
2021-11-03 08:13:44.018 GMT [port:8080] UNAVAILABLE: Credentials failed to obtain metadata
2021-11-03 08:13:44.019 GMT [xds-client<6>] Retry ADS stream in 21,552,486,369 ns
2021-11-03 08:14:05.573 GMT [xds-client<6>] Sending LDS request for resources: [grpc/server?xds.resource.listening_address=0.0.0.0:8080]
2021-11-03 08:14:05.573 GMT [xds-client<6>] ADS stream started
2021-11-03 08:14:05.575 GMT Sent DiscoveryRequest <snip>
2021-11-03 08:14:05.773 GMT [xds-client<6>] Received LDS response: <snip>
2021-11-03 08:14:06.120 GMT [xds-client<6>] Received LDS Response version 1635927232752380394 nonce 1. Parsed resources: [grpc/server?xds.resource.listening_address=0.0.0.0:8080]
2021-11-03 08:14:06.120 GMT [xds-client<6>] Failed processing LDS Response version 1635927232752380394 nonce 1. Errors:
LDS response Listener 'grpc/server?xds.resource.listening_address=0.0.0.0:8080' validation error: HttpConnectionManager contains invalid HttpFilter: Invalid filter config for HttpFilter [envoy.filters.http.rbac]: Encountered error parsing policy: com.google.re2j.PatternSyntaxException: error parsing regexp: missing argument to repetition operator: `+`
2021-11-03 08:14:06.120 GMT [xds-client<6>] Sending NACK for LDS update, nonce: 1, current version:
2021-11-03 08:14:06.137 GMT [xds-client<6>] Sent DiscoveryRequest <snip; it was the NACK>
2021-11-03 08:14:20.575 GMT [xds-client<6>] LDS resource grpc/server?xds.resource.listening_address=0.0.0.0:8080 initial fetch timeout
2021-11-03 08:14:20.576 GMT [xds-client<6>] Conclude LDS resource grpc/server?xds.resource.listening_address=0.0.0.0:8080 not exist
2021-11-03 08:14:20.579 GMT Exception in thread "main" java.io.IOException: io.grpc.StatusException: UNAVAILABLE: Listener grpc/server?xds.resource.listening_address=0.0.0.0:8080 unavailable
at io.grpc.xds.XdsServerWrapper.start(XdsServerWrapper.java:168)
at io.grpc.testing.integration.XdsTestServer.start(XdsTestServer.java:184)
at io.grpc.testing.integration.XdsTestServer.main(XdsTestServer.java:97)
Caused by: io.grpc.StatusException: UNAVAILABLE: Listener grpc/server?xds.resource.listening_address=0.0.0.0:8080 unavailable
at io.grpc.Status.asException(Status.java:543)
at io.grpc.xds.XdsServerWrapper$DiscoveryState$3.run(XdsServerWrapper.java:428)
at io.grpc.SynchronizationContext.drain(SynchronizationContext.java:95)
at io.grpc.SynchronizationContext.execute(SynchronizationContext.java:127)
at io.grpc.xds.XdsServerWrapper$DiscoveryState.onResourceDoesNotExist(XdsServerWrapper.java:421)
at io.grpc.xds.ClientXdsClient$ResourceSubscriber.onAbsent(ClientXdsClient.java:2226)
at io.grpc.xds.ClientXdsClient$ResourceSubscriber$1ResourceNotFound.run(ClientXdsClient.java:2172)
at io.grpc.SynchronizationContext$ManagedRunnable.run(SynchronizationContext.java:182)
at io.grpc.SynchronizationContext.drain(SynchronizationContext.java:95)
at io.grpc.SynchronizationContext.execute(SynchronizationContext.java:127)
at io.grpc.SynchronizationContext$1.run(SynchronizationContext.java:155)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
2021-11-03 08:14:20.582 GMT Shutting down
2021-11-03 08:14:20.583 GMT java.lang.NullPointerException
at io.grpc.testing.integration.XdsTestServer.stop(XdsTestServer.java:211)
at io.grpc.testing.integration.XdsTestServer.access$000(XdsTestServer.java:49)
at io.grpc.testing.integration.XdsTestServer$1.run(XdsTestServer.java:91)
The first is that the resource is considered not to exist. That is not right: the LDS response did arrive and was NACKed, so the watcher should have been delivered an error and the resource wait timer cancelled. I'll note that in this case we had previously gotten a lot of UNAVAILABLE: Credentials failed to obtain metadata failures, and this was the first response to arrive (not included because that log was painful to copy).
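To make the expected behaviour concrete, here is a minimal sketch of a per-resource subscriber that cancels its initial-fetch timer and reports an error when a response is rejected. Every name in it (SketchWatcher, SketchSubscriber, NackSketchDemo) is hypothetical and only loosely modelled on the classes in the stack trace; it is not the actual grpc-java implementation. The 15 second timeout matches the gap between "ADS stream started" and "initial fetch timeout" in the log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical watcher callbacks; illustrative names, not grpc-java's API. */
interface SketchWatcher {
  void onError(String description);
  void onResourceDoesNotExist();
}

/** Hypothetical per-resource subscriber that owns the initial-fetch timer. */
final class SketchSubscriber {
  private final SketchWatcher watcher;
  private ScheduledFuture<?> fetchTimer;

  SketchSubscriber(SketchWatcher watcher) {
    this.watcher = watcher;
  }

  /** Arms the timer that would eventually report the resource as absent. */
  synchronized void startFetchTimer(ScheduledExecutorService scheduler, long timeoutSeconds) {
    fetchTimer = scheduler.schedule(this::onFetchTimeout, timeoutSeconds, TimeUnit.SECONDS);
  }

  /** Called when a response naming this resource fails validation and is NACKed. */
  synchronized void onResponseRejected(String validationError) {
    // The server did answer, so "does not exist" is no longer a valid conclusion.
    if (fetchTimer != null) {
      fetchTimer.cancel(false);
      fetchTimer = null;
    }
    watcher.onError(validationError);
  }

  private synchronized void onFetchTimeout() {
    fetchTimer = null;
    watcher.onResourceDoesNotExist();
  }

  synchronized boolean isFetchTimerPending() {
    return fetchTimer != null;
  }
}

/** Replays the sequence from the log: subscribe, then receive an invalid response. */
final class NackSketchDemo {
  static List<String> run() {
    List<String> events = new ArrayList<>();
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    SketchSubscriber subscriber = new SketchSubscriber(new SketchWatcher() {
      @Override public void onError(String description) {
        events.add("error: " + description);
      }
      @Override public void onResourceDoesNotExist() {
        events.add("absent");
      }
    });
    subscriber.startFetchTimer(scheduler, 15);
    subscriber.onResponseRejected("invalid regexp in RBAC policy");
    events.add("timerPending=" + subscriber.isFetchTimerPending());
    scheduler.shutdownNow();
    return events;
  }
}
```

With this shape the watcher sees the validation error once and never receives the "does not exist" callback for a resource the server actually sent.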
Even assuming that the resource is properly determined not to exist, that shouldn't cause start() to fail. From A36 xDS for Servers:
XdsServer's start must not fail due to transient xDS issues, like missing xDS configuration from the xDS server.
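One possible reading of that requirement, sketched below: a missing Listener moves the server into a not-serving state and records the problem, rather than surfacing as an exception from start(). The class TolerantServerSketch, its State enum, and the idea of a NOT_SERVING state are assumptions made for illustration; they are not taken from XdsServerWrapper.

```java
/** Hypothetical server wrapper; shows start() surviving a missing xDS resource. */
final class TolerantServerSketch {
  enum State { NOT_STARTED, NOT_SERVING, SERVING }

  private State state = State.NOT_STARTED;
  private String lastXdsProblem = "";

  /** Begins watching for configuration. xDS problems never propagate from here. */
  void start() {
    state = State.NOT_SERVING;
  }

  /** Watcher callback: the Listener is missing, so stay up but do not serve. */
  void onResourceDoesNotExist(String resourceName) {
    lastXdsProblem = "Listener " + resourceName + " unavailable";
    state = State.NOT_SERVING;
  }

  /** Watcher callback: valid configuration arrived later, so begin serving. */
  void onListenerUpdate() {
    lastXdsProblem = "";
    state = State.SERVING;
  }

  State state() {
    return state;
  }

  String lastXdsProblem() {
    return lastXdsProblem;
  }
}
```

The point of the sketch is only that the transient condition is recorded and recoverable: a later valid update flips the state to SERVING without the process having exited.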
And then there's a bug in XdsTestServer if start() throws: server was never assigned, so stop() hits the NullPointerException shown at the end of the log when the shutdown hook runs.
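A null guard in stop() would avoid the secondary exception. The sketch below uses a hypothetical TestServerSketch class and Handle interface as stand-ins; I have not reproduced the real XdsTestServer code, so treat the field and method shapes as assumptions.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a test server whose start() may throw. */
final class TestServerSketch {
  /** Minimal placeholder for the real server handle. */
  interface Handle {
    void shutdown();
  }

  // Stays null when start() throws before the assignment completes.
  private Handle server;

  void start(Supplier<Handle> factory) {
    server = factory.get();
  }

  /** Returns true if a running server was shut down, false if there was nothing to stop. */
  boolean stop() {
    if (server == null) {
      return false;
    }
    server.shutdown();
    return true;
  }
}
```

With the guard in place, a shutdown hook that calls stop() after a failed start() returns quietly and leaves the original startup failure as the only exception in the log.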