From: merch-redmine@...
Date: 2020-03-05T04:04:07+00:00
Subject: [ruby-core:97367] [Ruby master Bug#16672] net/http leaves original content-length header intact after inflating response

Issue #16672 has been updated by jeremyevans0 (Jeremy Evans).

jmreid (Justin Reid) wrote in #note-5:

> > So the method appears to be operating exactly as documented.
>
> I'm not saying that method isn't working as intended. It's working as intended and I'm just using it to show the size of the header for the request net/http made. My comment saying that `content-length` needs to match 9995 meant: The `Content-Length` header that net/http returns needs to actually match the content length of the body for that request.

I think I understand your reasoning better now. net/http is currently deleting the `Content-Encoding` header, so deleting or modifying the `Content-Length` header makes sense in that light. Modifying the `Content-Length` header is tricky because you do not know the decoded size until after decoding. Anyway, the current implementation, by deleting `Content-Encoding` and not changing `Content-Length`, is inconsistent.

I don't think net/http should be modifying the response headers. I think the response headers should remain exactly as sent by the server.

The deletion of the `Content-Encoding` header has been present since the initial support was merged in commit:ff7f462bf49a1199b1657de6a73a0dc91deae1fa back in 2007. My guess as to why is so that existing callers that decoded bodies based on the value of the `Content-Encoding` header would not attempt to decode an already decoded body after the support was merged. And that does seem a reasonable way of keeping backwards compatibility while adding transparent decoding of bodies.
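
A minimal sketch of the kind of caller that guess describes. `legacy_decode` is a hypothetical helper written for illustration, not code from net/http, and a plain hash stands in for the response headers. It only shows why keeping the `Content-Encoding` header on an already decoded body could break such a caller:

```ruby
require 'zlib'
require 'stringio'

# Hypothetical legacy helper: gunzip the body only when the response
# still advertises gzip through Content-Encoding.
def legacy_decode(headers, body)
  return body unless headers['content-encoding'] == 'gzip'
  Zlib::GzipReader.new(StringIO.new(body)).read
end

plain   = 'console.log("hello");' * 10
gzipped = Zlib.gzip(plain)

# Before transparent decoding: the caller receives gzip bytes plus the header.
raw_result = legacy_decode({ 'content-encoding' => 'gzip' }, gzipped)

# After transparent decoding with the header deleted: the body is already
# plain text, and the missing header makes the helper leave it alone.
decoded_result = legacy_decode({}, plain)

# Had the header been kept, the helper would try to gunzip plain text.
double_decode_error =
  begin
    legacy_decode({ 'content-encoding' => 'gzip' }, plain)
    nil
  rescue Zlib::GzipFile::Error => e
    e.class
  end

puts "raw body decoded correctly:      #{raw_result == plain}"
puts "decoded body passed through:     #{decoded_result == plain}"
puts "error when header is left intact: #{double_decode_error.inspect}"
```
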
I'm not sure how much code still exists that uses net/http and manually decodes bodies based on the `Content-Encoding` header, but maybe we can consider whether to remove the automatic deletion of the `Content-Encoding` header when decoding, possibly indicating whether decoding happened using a separate method.

----------------------------------------
Bug #16672: net/http leaves original content-length header intact after inflating response
https://bugs.ruby-lang.org/issues/16672#change-84493

* Author: jmreid (Justin Reid)
* Status: Open
* Priority: Normal
* ruby -v: ruby 2.6.5p114 (2019-10-01 revision 67812) [x86_64-darwin19]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
When using net/http to make a request to a resource, the default request headers are the following (when you have ZLIB available):

`"accept-encoding"=>["gzip;q=1.0,deflate;q=0.6,identity;q=0.3"], "accept"=>["*/*"], "user-agent"=>["Ruby"]`

This means that a resource will return a gzipped response if it can provide it.
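
These defaults can be checked without sending anything over the network. The following is a small sketch, assuming zlib is available in the Ruby build; `example.com` is a placeholder host, not the URL from this report:

```ruby
require 'net/http'
require 'uri'

# Build a GET request object only; nothing is sent over the network.
# example.com is a placeholder host used for illustration.
req = Net::HTTP::Get.new(URI('http://example.com/'))

default_headers = req.to_hash
accept_encoding = req['accept-encoding']

puts default_headers.inspect
puts "accept-encoding: #{accept_encoding}"

# decode_content is true when net/http itself added accept-encoding,
# which is what enables transparent inflating of the response body.
puts "decode_content: #{req.decode_content}"
```

As far as I can tell from `lib/net/http/generic_request.rb`, passing an explicit `Accept-Encoding` header when building the request leaves `decode_content` false, so net/http then returns the body and headers without inflating.
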
Take this URL for example: `https://storage.googleapis.com/justin-reid-test/test.js`

This is a JS file that has a `content-length` of `2733` when gzipped and `9995` when inflated:

```
curl "https://storage.googleapis.com/justin-reid-test/test.js" -H "accept-encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3" | wc -c
2733

curl "https://storage.googleapis.com/justin-reid-test/test.js" | wc -c
9995
```

When making a simple request for this asset using net/http:

```
uri = URI('https://storage.googleapis.com/justin-reid-test/test.js')
res = Net::HTTP.get_response(uri)
```

Ruby will (https://github.com/ruby/ruby/blob/f08cd708b11dd5b293986b92bb5e227731665b36/lib/net/http/response.rb#L264-L278):

- Delete the `content-encoding` header
- inflate the body
- return the inflated body

The issue here is that Ruby also leaves the `content-length` header set to the original request's value:

```
require 'net/http'

uri = URI('https://storage.googleapis.com/justin-reid-test/test.js')
res = Net::HTTP.get_response(uri)

puts "Fetching: https://storage.googleapis.com/justin-reid-test/test.js"
puts "Body size using String#bytesize: #{res.body.to_s.bytesize}"
puts "Content-Length response header: #{res.content_length}"
```

Results in:

```
Fetching: https://storage.googleapis.com/justin-reid-test/test.js
Body size using String#bytesize: 9995
Content-Length response header: 2733
```

This means that an incorrect `content-length` header is passed back when net/http makes requests for gzip objects and inflates them.

This issue was noticed when Rack changed their behaviour in how they compute content-length.
They used to compute the content-length for each body, but that changed in 2.0.8:

https://github.com/rack/rack/commit/8c62821f4a464858a6b6ca3c3966ec308d2bb53e#diff-10b933d2c1fdc82ceecade456c64e1c2L92
https://github.com/rack/rack/issues/1472#issuecomment-574362342

Using `Rack::ContentLength` is now the method they prefer if you need to compute the content-length. However, `Rack::ContentLength` will not try to re-compute the value if that header already exists:

https://github.com/rack/rack/blob/6196377654b7ff7ce7abaecea62bb285d77d53aa/lib/rack/content_length.rb#L21

Should Ruby:

- Do a `self.delete 'content-length'` in the inflater?
- Compute the `content-length` itself and update the header? (Hacky example: https://github.com/ruby/ruby/compare/master...jmreid:content-length)

--
https://bugs.ruby-lang.org/

Unsubscribe: