Excited to announce Curb 1.2.0 with support for Fiber-aware IO scheduling. It also builds much faster now by using multiple CPUs to configure itself in parallel.
A new, saner default when using Curl.head: don't wait for a body. Many HTTP servers will still respond to a HEAD request with a Content-Length header, so setting CURLOPT_NOBODY=1 is a much better default.
Here's an example making requests with async:
You can still use Curl::Multi, but now if your code is already running in an HTTP server you don't have to do anything to get the benefit of the overlapping IO.
sdwolfz · 1d ago
Congratulations on your release!
Just a few questions:
1. I've never felt the need to use anything apart from Ruby's Net::HTTP, while I see every project I've worked on in the past add in stuff like faraday or httparty for doing JSON REST calls. Apart from the convenience aspect in terms of lines of code, is there any advantage for me to use your gem in such cases? (for example performance?)
2. I'm confused as to why you would need to do something special for fibre-aware scheduling. Is it the case that Ruby considers anything traversing into C world as "CPU bound" even when it sits idle, and you need to instruct it otherwise?
3. How does it behave with ractors? I mean, does it work when called inside a ractor, or when I'm initialising a "client" object in the main body and then trying to pass it to many ractors for them to use in parallel? Prior to Ruby 3.4 I've had issues even trying to use `pp` inside a ractor, so I'm not expecting miracles, just curious how things are progressing in that area.
taf2 · 1d ago
Thanks! It was fun to dive back into the multiplexing loop logic and finally refresh that code with the socket action interface from libcurl and new Ruby APIs...
1. The C parser in libcurl is much faster than Ruby's, and libcurl supports HTTP/2 and multiplexing. Even back in 2010 while at LivingSocial we used curb to improve site throughput (it's improved a lot these days), and with the multi interface at my current company CTM we can push out millions of webhook requests with very little CPU burn.
2. Fiber support is needed because a normal C extension will call a function to open a socket, write some bytes, wait, write some more, wait again, read some, then wait and read some more. All the while the CPU is basically idle and we're just blocking a whole Ruby thread, or even worse a process (a typical Ruby process running Rails can be pretty big, so that's a lot of memory consumed just waiting around doing basically nothing). Ruby's threading model really helped with this, and curb was one of the few libraries, maybe the only one I know of, that played nice with the Ruby global interpreter lock, so multi-threaded web apps could do stuff on other threads while one thread was blocked waiting on IO. With this new curb update a single thread can do multiple HTTP requests with overlapping IO, meaning while one fiber is waiting another fiber can do some busy work like queueing up more HTTP requests or reading something from a DB.
3. The key is we're using rb_wait_for_single_fd, so any reactor (if I'm understanding what a "reactor" really is correctly) can run and do busy work while we're waiting on a file descriptor. It should* work correctly now; if you run into issues I'd love to know about them and help get them fixed.
What I'm hoping for now in Ruby is that either Puma implements a Fiber pool or the Rails DB adapters fully add fiber support; then we as a Ruby community should have a really nice story around multiplexing I/O and be more competitive with node.js / golang for WebSocket and other I/O-heavy workloads. By competitive here I mean we can use less memory, since we'll be able to keep our CPUs busy while we wait for the network or even disk (maybe).
sdwolfz · 1d ago
Thanks for the replies!
1. Good to know, I'll keep this in mind for when I have a bottleneck in HTTP parsing/throughput.
2. OK so curb now yields on fiber IO where it previously did not. Got it!
3. I was referring to Ractors, Ruby's experimental Actor-like concurrency abstraction: https://docs.ruby-lang.org/en/master/ractor_md.html
taf2 · 1d ago
Ah - thanks for the link. Now I have maybe another thing to look at to ensure Ractor is fully supported in curb. (It might be; I just have not tried.)