May 7, 2020
Common mistakes when using libcurl - and how to fix them!
Daniel Stenberg
@bagder
Common libcurl mistakes
Documentation
Return codes
Verbose option
curl_global_init
Redirects
Set the URL
HTTP method
Certificate checks
Zero termination
C++ strings
Threading
CURLOPT_NOSIGNAL
-DCURL_STATICLIB
C++ methods
Callback invokes
Q&A in the end!
Why are these mistakes made?
Humans are lazy
Copied and pasted from questionable sources
Documentation is hard
Internet transfers are complicated
Maybe, just maybe, the curl way isn’t always the smartest...
1. Skipping the documentation
Lots of options have plain English names
Might trick you into thinking you know what they do
They still might not work the way you presume
Copy and paste from random web sites
There are also details
The devil is always in the details
Lots of documentation
We offer man pages for every setopt option
We host over 100 stand-alone examples
Consider which docs you rely on (hello stackoverflow.com)
2. Failure to check return codes
Return codes are useful clues
How to know if the call succeeded?
How to know why something doesn’t do what you expected?
What if the feature isn’t even built-in?
Our own example source code might set a bad example here
3. Forgetting the verbose option
Strange, how come it doesn’t work?
Hm, why does it act like this?
/* please be verbose */
rc = curl_easy_setopt(curl, CURLOPT_VERBOSE, 1L);
Also:
/* provide a buffer to store errors in (at least CURL_ERROR_SIZE bytes) */
rc = curl_easy_setopt(curl, CURLOPT_ERRORBUFFER, errbuf);
libcurl or content?
With verbose output, you can tell whether a message came from libcurl itself
or was actual content delivered by the server!
$ ./app
Error 505: HTTP Version Not Supported
Maybe even in production?
Consider offering it as a debug option
Direct the output somewhere suitable with CURLOPT_STDERR
Alternatively: CURLOPT_DEBUGFUNCTION
4. There's a global init function
It is called implicitly by curl_easy_init() if not done explicitly
Not calling it means relying on default, implicit behavior
It typically also means curl_global_cleanup() never gets called
This may result in not releasing all used memory (“Dear sirs, why does valgrind report that...”)
curl_global_init isn't thread-safe
curl_global_init needs to be called on its own, before any other thread uses libcurl
It is not thread-safe due to legacy and “reasons”
Will hopefully be rectified in the near future
There's a global init function!
Call curl_global_init first
Alone!
Call curl_global_cleanup last
5. Consider the redirects!
HTTP/1.1 301 Moved Permanently
Server: M4gic server/3000
Retry-After: 0
Location: https://curl.haxx.se/
Content-Length: 0
Accept-Ranges: bytes
Date: Thu, 07 May 2020 08:59:56 GMT
Connection: close
Consider the redirects!
Reconsider whether following redirects is a good idea at all
Limit which protocols redirects are allowed to use
Do not set custom HTTP methods on requests that follow redirects
6. Let users set (parts of) the URL
Scheme (maybe even use another protocol?)
Host name (maybe target a malicious server)
Extreme lengths (pass in 2GB of data?)
Also consider other inputs: user name, password etc. risk being abused too
Limit scope!
Set CURLOPT_PROTOCOLS!
Whitelist/filter
Let users set only a limited part of the URL
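A filter in front of libcurl might look like the sketch below. It is meant as a complement to CURLOPT_PROTOCOLS, not a replacement. The function name `url_is_acceptable`, the 2048-byte limit and the HTTPS-only policy are all assumptions for illustration:

```c
#include <stddef.h>
#include <string.h>

#define MAX_URL_LEN 2048 /* arbitrary limit chosen for this sketch */

/* Hypothetical filter: accept only reasonably short https:// URLs.
   Returns 1 if the URL is acceptable, 0 otherwise. The scheme is
   matched in lowercase only, to keep the sketch short. */
int url_is_acceptable(const char *url)
{
  static const char scheme[] = "https://";
  const size_t scheme_len = sizeof(scheme) - 1;
  size_t len = 0;

  if(!url)
    return 0;

  /* Bounded length check: give up early on extreme input */
  while(url[len] != '\0') {
    if(++len > MAX_URL_LEN)
      return 0;
  }

  /* Require the scheme plus at least one more character */
  if(len <= scheme_len)
    return 0;

  return strncmp(url, scheme, scheme_len) == 0;
}
```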
7. Setting the HTTP method
CURLOPT_CUSTOMREQUEST is a footgun
It will be used in follow-up requests as well, when following redirects
It only changes the method string sent in the request, not libcurl's behavior
8. Disabled certificate checks
Widely abused and misunderstood
Only use while experimenting / developing
Never ship in production
This also goes for HTTPS proxies
SCP and SFTP are different
/* this disables the certificate checks: do not ship it! */
curl_easy_setopt(curl, CURLOPT_SSL_VERIFYHOST, 0L);
curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 0L);
Verify server certificates!
Avoid man-in-the-middle attacks
HTTPS is not secure without it!
May require regularly updating the CA store
Alternative: CURLOPT_PINNEDPUBLICKEY
9. Assuming zero terminated data in callbacks
CURLOPT_WRITEFUNCTION and CURLOPT_HEADERFUNCTION set callbacks
libcurl provides data to the application using these callbacks
The data is provided as a pointer to the data plus the length of that data (size * nmemb bytes)
When that data is primarily text oriented, many users wrongly assume that this means the data comes as zero terminated “strings”.
size_t write_callback(char *dataptr, size_t size, size_t nmemb, void *userp);
Typical mistake
size_t cb(char *dataptr, size_t size, size_t nmemb, void *userp)
{
  /* WRONG: all three calls below assume dataptr is zero terminated */
  printf("Incoming data: %s\n", dataptr);
  if(!strncmp("Foo:", dataptr, 4)) {
    ...
  }
  char *pos = strchr(dataptr, '\n');
}
The callback data is binary
The data isn’t text or “string” based
printf("%s", ...), strcpy(), strlen() and similar will not work on this pointer!
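A length-aware version of the same callback could look roughly like this. It is a sketch only: the name `safe_cb` and the convention that `userp` points to an int flag are assumptions for the example:

```c
#include <stdio.h>
#include <string.h>

/* Same signature as a libcurl write or header callback.
   userp is assumed to point to an int that is set to 1 when
   the data starts with "Foo:". */
size_t safe_cb(char *dataptr, size_t size, size_t nmemb, void *userp)
{
  size_t len = size * nmemb; /* number of valid bytes, no terminator follows */
  int *saw_foo = userp;
  char *pos;

  /* Print with an explicit length instead of relying on a zero byte */
  printf("Incoming data: %.*s\n", (int)len, dataptr);

  /* Compare only when enough bytes are actually present */
  if(len >= 4 && !memcmp("Foo:", dataptr, 4))
    *saw_foo = 1;

  /* Search within the bounds of the buffer only */
  pos = memchr(dataptr, '\n', len);
  if(pos)
    printf("First newline at offset %lu\n", (unsigned long)(pos - dataptr));

  return len; /* tell libcurl that all bytes were handled */
}
```

The replacements are mechanical: `%.*s` for `%s`, memcmp() behind a length check for strncmp(), and memchr() for strchr().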
10. C++ strings are not C strings
libcurl provides a C API
C and C++ are similar
C and C++ are also different!
C++ users like their std::string types
C++ strings are not C strings
curl_easy_setopt() takes a vararg, so passing the wrong type is not necessarily caught at compile time
C++ string bad code
// Keep the URL as a C++ string object
std::string url("https://example.com/");
// Pass it to curl
curl_easy_setopt(curl, CURLOPT_URL, url);
C++ string good code
// Keep the URL as a C++ string object
std::string url("https://example.com/");
// Pass it to curl as a C string!
curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
11. Threading mistakes
libcurl is thread-safe but there are caveats:
1) No concurrent use of handles
2) OpenSSL < 1.1.0 needs mutex callbacks set up
3) curl_global_init is not thread-safe yet
12. Understanding CURLOPT_NOSIGNAL
Signals are a Unix concept: “an asynchronous notification sent to a process or to a specific thread within the same process in order to notify it of an event that occurred”
Signals are complicated in a multi-threaded world and when used by a library
What does libcurl use signals for?
When using the synchronous name resolver, libcurl uses alarm() to abort slow name resolves (if a timeout is set), which ultimately sends a SIGALRM to the process that is caught by libcurl
libcurl installs its own signal handler while running and restores the original one again on return, for SIGALRM and SIGPIPE
Closing TLS (with OpenSSL) can trigger a SIGPIPE if the connection is dead
Unless CURLOPT_NOSIGNAL is set!
What does CURLOPT_NOSIGNAL do?
It stops libcurl from triggering signals
It prevents libcurl from installing its own sighandler
Generated signals must then be handled by the libcurl-using application!
13. Forgetting -DCURL_STATICLIB
Creating and using libcurl statically is easy and convenient
Seems especially popular on Windows
Requires the CURL_STATICLIB define to be set when building your application!
Omission causes linker errors:
"unknown symbol __imp__curl_easy_init"
Because Windows needs __declspec to be present or absent in the headers depending on how you link!
Static builds mean chasing deps
libcurl can use many 3rd party dependencies
When linking statically, all of those need to be provided to the linker
The curl build scripts (as well as your application linking) usually need manual help to find them all
14. C++ methods
(Sibling to the C++ strings mistake)
C++ class methods look like functions
Non-static C++ class methods cannot be used as callbacks with libcurl, since they assume a ‘this’ pointer to the current object
Static member functions work!
A C++ method that works
// f is the pointer to your object.
// Declared as "static size_t func(...)" inside class YourClass.
size_t YourClass::func(void *buffer, size_t sz, size_t n, void *f)
{
  // Call non-static member function.
  static_cast<YourClass*>(f)->nonStaticFunction();
  // Tell libcurl that all sz * n bytes were handled.
  return sz * n;
}
// This is how you pass pointer to the static function:
curl_easy_setopt(hcurl, CURLOPT_WRITEFUNCTION, YourClass::func);
curl_easy_setopt(hcurl, CURLOPT_WRITEDATA, this);
15. Write callback invokes
Data is delivered by callback (CURLOPT_WRITEFUNCTION)
It might be called zero, one, two or many times
Never assume you will get a certain number of calls
The number of calls is independent of the amount of data
Because of network, server, kernel or other reasons
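One way to stay independent of the number of calls is to append every chunk to a growing buffer. This is a sketch under assumptions: `struct memory` and `collect_cb` are names made up for this example, and the buffer pointer is assumed to be passed through CURLOPT_WRITEDATA:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growing buffer; initialize it as { NULL, 0 } */
struct memory {
  char *data;
  size_t len;
};

/* Write callback that appends each chunk, however many calls it takes. */
size_t collect_cb(char *dataptr, size_t size, size_t nmemb, void *userp)
{
  struct memory *mem = userp;
  size_t chunk = size * nmemb;
  char *grown = realloc(mem->data, mem->len + chunk + 1);

  if(!grown)
    return 0; /* a count different from chunk makes libcurl abort the transfer */

  if(chunk)
    memcpy(grown + mem->len, dataptr, chunk);
  mem->data = grown;
  mem->len += chunk;
  mem->data[mem->len] = '\0'; /* our own terminator, added for convenience */
  return chunk;
}
```

The complete response is only available once the transfer has finished, not after any particular callback invocation.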
You can help!
https://curl.haxx.se/book.html
Daniel Stenberg
@bagder
https://daniel.haxx.se/
Thank you!
Questions?
License
This presentation and its contents are
licensed under the Creative Commons
Attribution 4.0 license:
http://creativecommons.org/licenses/by/4.0/