Skip to content

Add timeout in aiohttp.close #41374

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 13 commits into
base: main
Choose a base branch
from
Open

Add timeout in aiohttp.close #41374

wants to merge 13 commits into from

Conversation

xiangyan99
Copy link
Member

No description provided.

@xiangyan99 xiangyan99 marked this pull request as ready for review June 2, 2025 15:17
@jabbera
Copy link
Contributor

jabbera commented Jun 2, 2025

This doesn't fix the performance issue but reduced it to 100ms from 30s for every connection close. It looks like aiohttp plans to release a 3.12.7 that moves the error to debug, but I really think all of this is a bad idea because it's just masking 100ms of latency on closing every connection. It sounds like the SDK should be aborting the transport instead of closing it, or the Azure TLS stack should be sending a close_notify back when it gets one from the client.

@xiangyan99
Copy link
Member Author

This work around the performance issue but still adds 100ms of latency to every connection close. It looks like aiohttp plans to release a 3.12.7 that moves the error to debug, but I really think all of this is a bad idea because it's just masking 100ms of latency on closing every connection. It sounds like the SDK should be aborting the transport instead of closing it, or the Azure TLS stack should be sending a close_notify back when it gets one from the client.

Agreed. This solution isn't perfect.

But TLS 1.3 does not seem to require sending one immediately.

The change sets a limit to ensure added latency doesn't exceed 100ms, which helps reduce risk.

@jabbera
Copy link
Contributor

jabbera commented Jun 3, 2025

The change sets a limit to ensure added latency doesn't exceed 100ms, which helps reduce risk.

Since we know Azure is not sending the close_notify why not replace this with a call_soon or something that doesn't wait at all? The TLS spec 1.3 allows for this behavior explicitly. Adding 100ms to every close connection doesn't make any sense even though it's better than 30.

TLS 1.3 RFC

   Both parties need not wait to receive a
   "close_notify" alert before closing their read side of the
   connection, though doing so would introduce the possibility of
   truncation.

@xiangyan99
Copy link
Member Author

The change sets a limit to ensure added latency doesn't exceed 100ms, which helps reduce risk.

Since we know Azure is not sending the close_notify why not replace this with a call_soon or something that doesn't wait at all? The TLS spec 1.3 allows for this behavior explicitly. Adding 100ms to every close connection doesn't make any sense even though it's better than 30.

TLS 1.3 RFC

   Both parties need not wait to receive a
   "close_notify" alert before closing their read side of the
   connection, though doing so would introduce the possibility of
   truncation.

aiohttp is not happy if we close it immediately w/o waiting.

I saw

Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x0000024289BA3F50>
Unclosed connector
connections: ['deque([(<aiohttp.client_proto.ResponseHandler object at 0x000002428AD5A210>, 115033.39)])']
connector: <aiohttp.connector.TCPConnector object at 0x000002428AD84A70>

0.1s seems also the default timeout for aiohttp.

ssl_shutdown_timeout - Grace period for SSL shutdown handshake on TLS
connections. Default is 0.1 seconds. This usually
allows for a clean SSL shutdown by notifying the
remote peer of connection closure, while avoiding
excessive delays during connector cleanup.
Note: Only takes effect on Python 3.11+.

@jabbera
Copy link
Contributor

jabbera commented Jun 3, 2025

ssl_shutdown_timeout - Grace period for SSL shutdown handshake on TLS
connections. Default is 0.1 seconds. This usually
allows for a clean SSL shutdown by notifying the
remote peer of connection closure, while avoiding
excessive delays during connector cleanup.
Note: Only takes effect on Python 3.11+.

That setting was added in 3.12.6 if you can take a minimum dependency on that you can add ssl_shutdown_timeout to the client session args here and set it to something comically small: .000001 for example. aio-libs/aiohttp#11091 (comment)

@xiangyan99
Copy link
Member Author

ssl_shutdown_timeout - Grace period for SSL shutdown handshake on TLS
connections. Default is 0.1 seconds. This usually
allows for a clean SSL shutdown by notifying the
remote peer of connection closure, while avoiding
excessive delays during connector cleanup.
Note: Only takes effect on Python 3.11+.

That setting was added in 3.12.6 if you can take a minimum dependency on that you can add ssl_shutdown_timeout to the client session args here and set it to something comically small: .000001 for example. aio-libs/aiohttp#11091 (comment)

But unfortunately, this parameter does not work for py3.11- and our library needs to support py3.9 & 3.10.

@jabbera
Copy link
Contributor

jabbera commented Jun 3, 2025

But unfortunately, this parameter does not work for py3.11- and our library needs to support py3.9 & 3.10.

That's fine, but even the aiohttp people are saying you want something smaller then 100ms: aio-libs/aiohttp#11091 (comment)

100ms is an astronomical amount of time to wait for something in network land you know will never happen. We have pinned to 3.12.3 so I'm going to drop this issue for now, but I don't know any other way to say that I think this timeout is too long given that you know it will always be triggered.

@xiangyan99
Copy link
Member Author

But unfortunately, this parameter does not work for py3.11- and our library needs to support py3.9 & 3.10.

That's fine, but even the aiohttp people are saying you want something smaller then 100ms: aio-libs/aiohttp#11091 (comment)

100ms is an astronomical amount of time to wait for something in network land you know will never happen. We have pinned to 3.12.3 so I'm going to drop this issue for now, but I don't know any other way to say that I think this timeout is too long given that you know it will always be triggered.

changed to 1ms.

@jabbera
Copy link
Contributor

jabbera commented Jun 3, 2025

changed to 1ms.

Thanks! Here is the perf difference between 100ms and 1ms with raw aiohttp in a loop using the new parameter. I really think this change makes more sense now.

image



@pytest.mark.skipif(
sys.version_info < (3, 11), reason="ssl_shutdown_timeout in aiohttp only takes effect on Python 3.11+"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to skip 3.9/3.10? Ideally, we see no warnings logged for those versions, too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see errors when using aiohttp 3.12.7 with py 3.9.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What errors? Things should still work with Python 3.9+ since we are only adding a timeout to asyncio.wait_for and not using ssl_shutdown_timeout.

Also, let's @pytest.mark.live_test_only this test. Ideally, we use the storage account that's provisioned in test-resources.bicep.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aiohttp hangs in Python 3.9 (compared to a 30-second delay in 3.11). In 3.9, we have to forcefully stop aiohttp, and an error from aiohttp appeared during my test on Python 3.9.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't able to reproduce an error locally with 3.9/3.10. Did you see this on CI?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this on py3.9 w/ timeout in azure-core.aiohttp added:

Exception ignored in: <function _ProactorBasePipeTransport.del at 0x000002255D531670>
Traceback (most recent call last):
File "C:\Users\xiangyan\AppData\Local\Programs\Python\Python39\lib\asyncio\proactor_events.py", line 116, in del
self.close()
File "C:\Users\xiangyan\AppData\Local\Programs\Python\Python39\lib\asyncio\proactor_events.py", line 108, in close
self._loop.call_soon(self._call_connection_lost, None)
File "C:\Users\xiangyan\AppData\Local\Programs\Python\Python39\lib\asyncio\base_events.py", line 751, in call_soon
self._check_closed()
File "C:\Users\xiangyan\AppData\Local\Programs\Python\Python39\lib\asyncio\base_events.py", line 515, in _check_closed
raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I think I'm not seeing it because I'm running on WSL. Looks like that ^ only shows up in windows.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that runtime exception show up in windows/py3.9 even without the timeout changes in this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. aiohttp 3.12.7 does not solve the problem on py 3.9. (I did not test py 3.10)



@pytest.mark.skipif(
sys.version_info < (3, 11), reason="ssl_shutdown_timeout in aiohttp only takes effect on Python 3.11+"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What errors? Things should still work with Python 3.9+ since we are only adding a timeout to asyncio.wait_for and not using ssl_shutdown_timeout.

Also, let's @pytest.mark.live_test_only this test. Ideally, we use the storage account that's provisioned in test-resources.bicep.

@xiangyan99 xiangyan99 requested a review from pvaneck June 3, 2025 23:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants