From: "kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core"
Date: 2023-02-11T06:33:38+00:00
Subject: [ruby-core:112356] [Ruby master Feature#19430] Contribution wanted: DNS lookup by c-ares library

Issue #19430 has been updated by kjtsanaktsidis (KJ Tsanaktsidis).

I totally agree that this problem is worth solving, but moving away from platform-native libc-based DNS lookups will definitely have an impact on our use of Ruby (at Zendesk). I'd like to share a bit of information about our use-case.

We have a development environment on macOS based on Docker for Mac. We have a dnsmasq container running inside the Docker for Mac VM, and we configure the macOS DNS resolver (by editing files in `/etc/resolver`) to forward queries for `.docker` (and some other domains, like `.consul` and `.zd-dev`) to the dnsmasq container (which is listening on a port forwarded to the host by Docker). This makes it possible to connect to things like `mysql.docker` etc. from a console on macOS, including from Ruby scripts.

As is, if c-ares is used for DNS resolution, then this kind of domain-specific DNS routing will stop working. c-ares will simply forward all queries to the DNS servers returned by `res_getservers` from `resolv.h`, and they won't know how to handle `.docker` etc. I'd be very curious to hear from other Docker for Mac users if they do something similar, or if our setup at Zendesk is just crazy.

I can think of two ways to fix this problem:

* Implement support for reading the DNS configuration out of the system configuration framework in c-ares, and use that information to implement per-domain DNS dispatch.
* Implement some kind of external DNS resolver which just satisfies all queries by calling `gethostbyname(3)`, and point c-ares at that. This is essentially how systemd-resolved handles this problem; apps _can_ talk to resolved directly over its DBus API (or by using the `nss_resolve` NSS module).
  However, it also exposes a stub DNS resolver at `127.0.0.53:53` and publishes that in `/etc/resolv.conf`; apps which do their own DNS by looking for servers in `/etc/resolv.conf` will thus send their queries to resolved and get the same answers as everybody else. (A few questions about the stub resolver: would it need to be a system or user service? Is it something the Ruby interpreter could fork off itself?)

Anyway, I'm sharing this not to suggest we shouldn't switch to c-ares, but rather to see how we can keep some use-cases working while we do it :)

----------------------------------------
Feature #19430: Contribution wanted: DNS lookup by c-ares library
https://bugs.ruby-lang.org/issues/19430#change-101799

* Author: mame (Yusuke Endoh)
* Status: Open
* Priority: Normal
----------------------------------------
## Problem

At the present time, Ruby uses `getaddrinfo(3)` to resolve names. Because this function is synchronous, we cannot interrupt the thread performing name resolution until the DNS server returns a response.

We can see this behavior by setting blackhole.webpagetest.org (72.66.115.13), which swallows all packets, as the DNS server and then resolving any name:

```
# cat /etc/resolv.conf
nameserver 72.66.115.13
# ./local/bin/ruby -rsocket -e 'Addrinfo.getaddrinfo("www.ruby-lang.org", 80)'
^C^C^C^C
```

As we see, Ctrl+C does not stop ruby.

The current workaround that users can take is to do name resolution in a Ruby thread.

```ruby
Thread.new { Addrinfo.getaddrinfo("www.ruby-lang.org", 80) }.value
```

The thread that calls this code is interruptible. (Note that the newly created thread itself will be stuck until the DNS lookup times out.)

## Proposal

We can solve this problem by using c-ares, which is an asynchronous name resolver, as a backend of `Addrinfo.getaddrinfo`, etc. (@sorah told me about this library, thanks!)
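
For comparison, the thread-based workaround from the Problem section can be given an explicit deadline. The following is only a rough sketch: `resolve_with_deadline` is a hypothetical helper name (it is not part of Ruby's socket library), and it assumes the caller is happy to get `nil` back when the lookup does not finish in time.

```ruby
require "socket"

# Hypothetical helper, not an existing API: run the blocking lookup in a
# separate thread so that the calling thread stays interruptible and can
# stop waiting after `seconds`.
def resolve_with_deadline(host, port, seconds)
  worker = Thread.new do
    # Lookup errors are re-raised in the caller by Thread#join / Thread#value,
    # so the automatic per-thread error report is switched off here.
    Thread.current.report_on_exception = false
    Addrinfo.getaddrinfo(host, port)
  end

  # Thread#join returns nil if the thread has not finished within the limit.
  return worker.value if worker.join(seconds)

  # Giving up only frees the caller; the worker thread may remain blocked in
  # getaddrinfo(3) until the system resolver itself times out.
  nil
end
```

Something like `resolve_with_deadline("www.ruby-lang.org", 80, 3)` would then return either an array of `Addrinfo` objects or `nil`. Note that this does not cancel the lookup; it only bounds how long the caller waits, which is the limitation the proposal is meant to remove.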
c-ares: https://c-ares.org/

I have created a PoC patch.
https://github.com/mame/ruby/commit/547806146993bbc25984011d423dcc0f913b211c

By applying this patch, we can interrupt `Addrinfo.getaddrinfo` by Ctrl+C.

```
# cat /etc/resolv.conf
nameserver 72.66.115.13
# ./local/bin/ruby -rsocket -e 'Addrinfo.getaddrinfo("www.ruby-lang.org", 80)'
^C-e:1:in `getaddrinfo': Interrupt
        from -e:1:in `<main>'
```

## Discussion

### About c-ares

According to the site of c-ares, some major tools including libcurl, Wireshark, and Apache Arrow are already using c-ares. Among language interpreters, Node.js seems to be using c-ares.

I am honestly not sure about the compatibility of c-ares with `getaddrinfo(3)`. I guess there is no major incompatibility because I have not experienced any name resolution problem with curl. @akr (who is the author and maintainer of Ruby's socket library) suggested checking whether OS-specific name resolution, e.g., WINS on Windows, NIS on Solaris, etc., is supported. He also said that it may be acceptable even if they are not supported.

Whether to bundle the c-ares source code with ruby would require further discussion. If this proposal is accepted, then c-ares will become a de facto essential dependency for practical use, like gmp, in my opinion. Incidentally, Node.js bundles c-ares: https://github.com/nodejs/node/tree/main/deps/cares

### Alternative approaches

Recent glibc provides `getaddrinfo_a(3)`, which performs asynchronous name resolution. However, this function has a fatal problem of being incompatible with `fork(2)`, which is heavily used in the Ruby ecosystem. In fact, the attempt to use `getaddrinfo_a(3)` (#17134) was reverted because it failed Rails tests. (#17220)

Another alternative is to have a worker pthread inside Ruby that calls `getaddrinfo(3)`. Instead of calling `getaddrinfo(3)` directly, `Addrinfo.getaddrinfo` would ask the worker to resolve a name and wait for a response. This method should be able to implement cancellation. (Simply put, this means reimplementing `getaddrinfo_a(3)` on our own, taking `fork(2)` into account.)

This has two advantages: it adds no dependencies on external libraries, and it has no compatibility issues with `getaddrinfo(3)`. However, it is considerably more difficult to implement and maintain.
An internal pthread may have a non-trivial impact on execution efficiency and memory usage. Also, we may need to implement a mechanism to dynamically change the number of workers depending on the load.

It would be ideal if we could try and evaluate both approaches. But my current impression is that using c-ares is the quickest and best compromise.

## Contribution wanted

I have made it up to the PoC, but don't have much time to complete this. @naruse suggested that I create a ticket asking for contributions. Is anyone interested in this?

* This patch changes `rsock_getaddrinfo` to accept a timeout argument. There are several places where Qnil is passed as the timeout (marked with `// TODO` in the PoC). We need to consider what timeout we should pass.
* This patch handles only `getaddrinfo`, but we also need to handle `getnameinfo` (and anything else, if any). There may be some issues I'm not aware of.
* I have not yet tested this PoC seriously. It would be great if we could evaluate it with some real apps.

Also, it would be great to hear from someone who knows more about c-ares.

--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/