chunkloris: falcon (on async-http / protocol-http1)

part of the chunkloris per-chunk amplification survey. this page is the per-server record for falcon (on async-http / protocol-http1) under http/1.1 chunked transfer encoding.

at a glance

server: falcon (on async-http / protocol-http1) 0.49.0
runtime: ruby-3.3.11
ecosystem: ruby
concurrency model: fiber-per-connection (async gem)
parser: Protocol::HTTP1::Body::Chunked (pure Ruby, protocol-http1 gem)
delivery granularity: per-chunk-streaming-to-handler
chunk-limit helper: none exposed by the framework
verdict: per-chunk. the parser/dispatcher boundary delivers one event per wire chunk. cpu cost under paced mode b measures roughly 8.8 to 10.4 µs per chunk.
scaling exponent (mode a): 1.00 (wall time vs N, log-log slope across common cells)

measurements

all cells run on a 1-vcpu docker container. cpu cost is derived from the target container’s cgroup v2 cpu.stat usage_usec delta around each cell.

mode	N	wall (s)	server cpu %	µs / chunk	basis	ok
`A-bridge-coalesced`	50,000	0.096	133.4	2.565	server-cpu-cgroup	✓
`A-bridge-coalesced`	100,000	0.176	118.7	2.085	server-cpu-cgroup	✓
`A-bridge-coalesced`	?	0.606	—	2.420	wall	✓
`B-paced-100us`	50,000	5.130	8.8	9.062	server-cpu-cgroup	✓
`B-paced-100us`	100,000	10.245	8.6	8.761	server-cpu-cgroup	✓
`B-paced-100us`	?	26.084	10.0	10.400	server-cpu-overhead	✓
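
the server-cpu-cgroup basis can be illustrated with a short sketch. it assumes that µs / chunk is the usage_usec delta divided by N, and that server cpu % is the same delta divided by wall time; the survey's actual harness is not shown on this page, and the function names below are hypothetical.

```python
def read_usage_usec(cpu_stat_text: str) -> int:
    """Return the usage_usec counter from the text of a cgroup v2 cpu.stat file."""
    for line in cpu_stat_text.splitlines():
        key, _, value = line.partition(" ")
        if key == "usage_usec":
            return int(value)
    raise ValueError("usage_usec not found in cpu.stat text")


def us_per_chunk(before_text: str, after_text: str, n_chunks: int) -> float:
    """Assumed derivation: cpu microseconds consumed during the cell, per chunk."""
    if n_chunks <= 0:
        raise ValueError("n_chunks must be positive")
    delta = read_usage_usec(after_text) - read_usage_usec(before_text)
    return delta / n_chunks


def cpu_percent(before_text: str, after_text: str, wall_seconds: float) -> float:
    """Assumed derivation: cpu time as a percentage of wall time for the cell."""
    delta = read_usage_usec(after_text) - read_usage_usec(before_text)
    return 100.0 * delta / (wall_seconds * 1_000_000)
```

in use, the two snapshots would be the contents of the target container's cpu.stat read immediately before and after a cell; a delta of 100,000 µs across 50,000 chunks gives 2.0 µs / chunk (illustrative values, not measurements).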

parser path — source citations

chunked-decoder — protocol-http1 gem: lib/protocol/http1/body/chunked.rb
rack-input-bridge — protocol-rack gem: lib/protocol/rack/input.rb (#sysread)

what this means

the parser/dispatcher path on this server delivers one event per chunked-transfer-encoding chunk, so an attacker who sends a body as N one-byte chunks consumes roughly N × (mode-b µs/chunk) microseconds of server cpu time on a single core. the amplification scales linearly with N (scaling exponent 1.00) until the framework’s max_request_body_size (or equivalent) is hit.
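
the arithmetic can be made concrete with a small sketch. it shows the wire shape of a body sent as one-byte chunks and the linear cost estimate from the paragraph above. the function names are hypothetical and this is not the survey's prober; it only builds bytes in memory and multiplies.

```python
def one_byte_chunked_body(payload: bytes) -> bytes:
    """Encode payload as HTTP/1.1 chunked transfer coding, one byte per chunk."""
    parts = []
    for i in range(len(payload)):
        # Each chunk is: hex size line "1", CRLF, one data byte, CRLF (6 bytes).
        parts.append(b"1\r\n" + payload[i:i + 1] + b"\r\n")
    # A zero-size chunk followed by an empty trailer section ends the body.
    parts.append(b"0\r\n\r\n")
    return b"".join(parts)


def estimated_cpu_seconds(n_chunks: int, us_per_chunk: float) -> float:
    """Linear estimate: N chunks times the per-chunk cost, in seconds."""
    return n_chunks * us_per_chunk / 1_000_000
```

with the 8.761 µs / chunk figure from the mode b row at N = 100,000, the estimate comes to about 0.88 s of server cpu for 100,000 payload bytes, which occupy 600,005 bytes on the wire (6 bytes per chunk plus the 5-byte terminator).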

what to do today

if this server runs as an origin behind nginx with the default proxy_request_buffering on, the per-chunk attack shape does not reach this server: nginx buffers the whole request body and forwards it to the upstream content-length-framed, so no chunk boundaries arrive here.
if deployed direct-exposed, behind haproxy with default streaming, or behind any reverse proxy with proxy_request_buffering off, the per-chunk cost reaches this server.
there is no framework-level chunk-count limit in the default config; use a frontend buffer, transport-layer rate limiting, or a wrapping middleware that imposes a chunk-count cap before draining the body.
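
what a chunk-count cap means at the decoding layer can be sketched as follows. this is an illustration in python, not ruby, and not code from falcon, async-http, or protocol-http1; it decodes a complete in-memory body rather than a stream, ignores chunk extensions and trailers, and uses hypothetical names throughout. a real cap has to sit at a layer that still sees chunk boundaries.

```python
HEX_DIGITS = b"0123456789abcdefABCDEF"


class ChunkLimitExceeded(Exception):
    """Raised when a body contains more data chunks than the configured cap."""


def decode_chunked(data: bytes, max_chunks: int) -> bytes:
    """Decode a chunked body, refusing bodies with more than max_chunks chunks."""
    body = bytearray()
    pos = 0
    chunks = 0
    while True:
        line_end = data.find(b"\r\n", pos)
        if line_end < 0:
            raise ValueError("truncated chunk-size line")
        # Drop any chunk extension after ';' and validate the hex size token.
        size_token = data[pos:line_end].split(b";", 1)[0].strip()
        if not size_token or any(c not in HEX_DIGITS for c in size_token):
            raise ValueError("invalid chunk size")
        size = int(size_token, 16)
        pos = line_end + 2
        if size == 0:
            return bytes(body)  # last chunk; trailers are ignored in this sketch
        chunks += 1
        if chunks > max_chunks:
            # Reject before copying this chunk's data.
            raise ChunkLimitExceeded(f"more than {max_chunks} chunks")
        chunk_end = pos + size
        if data[chunk_end:chunk_end + 2] != b"\r\n":
            raise ValueError("missing CRLF after chunk data")
        body += data[pos:chunk_end]
        pos = chunk_end + 2
```

the design choice is to count chunks rather than bytes: a body-size limit alone still admits one chunk per byte, whereas a chunk cap bounds the number of per-chunk events directly. a ratio check (for example, rejecting bodies whose average chunk size falls below a threshold) would be an alternative under the same assumptions.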

reproducer

the full reproducer for this server is in the paper repo. the docker image pins falcon (on async-http / protocol-http1) 0.49.0, and the test container is constrained to a single cpu (--cpus=1). the prober script implements mode a (bridge-coalesced) and mode b (paced 100 µs) per the methodology section.

see the draft pdf for the full per-framework discussion.