chunkloris: cowboy

part of the chunkloris per-chunk amplification survey. this page is the per-server record for cowboy under http/1.1 chunked transfer encoding.

at a glance

server: cowboy 2.15.0 (cowlib 2.16.1, ranch 2.2.0)
runtime: erlang-otp-27.1.2
ecosystem: beam
concurrency model: actor
parser: cowlib cow_http_te (hand-rolled chunked-TE state machine)
delivery granularity: per-chunk
chunk-limit helper: none exposed by the framework
verdict: per-chunk. the parser/dispatcher boundary delivers one event per wire chunk, and the per-chunk cpu cost is measurable under paced mode b.
scaling exponent (mode a): 0.89 (wall time vs N, log-log slope across common cells)
scaling exponent (mode b): 1.00
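
as a rough check on the two exponents, the sketch below fits a log-log slope to the wall times listed in the measurements table on this page. the fit method is an assumption: the page only says "log-log slope across common cells", so an ordinary least-squares fit is used here, and the function name loglog_slope is made up for illustration. the result should land close to the reported values, with small differences depending on how the slope is fitted.

```python
import math

def loglog_slope(ns, walls):
    """Least-squares slope of log(wall time) against log(N)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(w) for w in walls]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# N and wall-time values copied from the measurements table on this page.
ns = [50_000, 100_000, 250_000]
mode_a = loglog_slope(ns, [0.135, 0.232, 0.569])
mode_b = loglog_slope(ns, [5.154, 10.299, 25.769])
print(f"mode a slope: {mode_a:.2f}")
print(f"mode b slope: {mode_b:.2f}")
```

a slope near 1.0 means wall time grows in proportion to the chunk count; a slope below 1.0, as in mode a, means the per-chunk cost shrinks somewhat as N grows.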

measurements

all cells run on a 1-vcpu docker container. cpu cost is derived from the target container’s cgroup v2 cpu.stat usage_usec delta around each cell.

mode	N	wall (s)	server cpu %	µs / chunk	basis	ok
`A-bridge-coalesced`	50,000	0.135	75.0	2.000	wall	✓
`A-bridge-coalesced`	100,000	0.232	75.0	1.700	wall	✓
`A-bridge-coalesced`	250,000	0.569	75.0	1.700	wall	✓
`B-paced-100us`	50,000	5.154	20.0	20.000	server-cpu-overhead	✓
`B-paced-100us`	100,000	10.299	12.0	12.000	server-cpu-overhead	✓
`B-paced-100us`	250,000	25.769	14.0	14.000	server-cpu-overhead	✓
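
the cpu columns come from a usage_usec delta taken around each cell. a minimal sketch of that bookkeeping follows, assuming the two cpu.stat snapshots are already available as text. the function names and the snapshot values are illustrative, not taken from the reproducer, and the exact formula behind the "basis" column is defined in the methodology section rather than here.

```python
def parse_usage_usec(cpu_stat_text):
    """Return the usage_usec counter from the text of a cgroup v2 cpu.stat file."""
    for line in cpu_stat_text.splitlines():
        key, _, value = line.partition(" ")
        if key == "usage_usec":
            return int(value)
    raise ValueError("usage_usec not found in cpu.stat text")

def cpu_per_chunk(before_text, after_text, wall_s, n_chunks):
    """Return (cpu percent of one core, cpu microseconds per chunk) for one cell."""
    delta_usec = parse_usage_usec(after_text) - parse_usage_usec(before_text)
    cpu_pct = 100.0 * delta_usec / (wall_s * 1_000_000)
    return cpu_pct, delta_usec / n_chunks

# Synthetic snapshots for illustration only; these are not measured values.
before = "usage_usec 1000000\nuser_usec 600000\nsystem_usec 400000\n"
after = "usage_usec 4500000\nuser_usec 3000000\nsystem_usec 1500000\n"
cpu_pct, us_per_chunk = cpu_per_chunk(before, after, wall_s=25.0, n_chunks=250_000)
print(f"cpu {cpu_pct:.1f}%, {us_per_chunk:.1f} µs/chunk")
```

on a single-cpu container the cpu percentage is simply the usage delta divided by the wall time of the cell.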

parser path — source citations

decoder: cowlib src/cow_http_te.erl, stream_chunked/3
handler-delivery: cowboy src/cowboy_stream_h.erl, data/4

what this means

the parser/dispatcher path on this server delivers one event per chunked-transfer-encoding chunk, so an attacker who sends a body as N one-byte chunks consumes roughly N × (mode-b µs/chunk) of server cpu on a single core. amplification scales linearly with N (mode-b exponent 1.00) until the framework’s max_request_body_size (or equivalent) is hit; with one data byte per chunk, a body limit of S bytes also caps N at S.
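
to make the shape concrete, the sketch below encodes a payload as one-byte chunks in the standard http/1.1 chunked wire format and applies the N × µs/chunk estimate from the paragraph above. the helper names are hypothetical, and the 14 µs figure is the mode-b value from the 250,000-chunk row of the measurements table, used only as an example input.

```python
def one_byte_chunked_body(payload: bytes) -> bytes:
    """Encode payload as a chunked body carrying one data byte per chunk."""
    parts = [b"1\r\n" + payload[i:i + 1] + b"\r\n" for i in range(len(payload))]
    parts.append(b"0\r\n\r\n")  # last-chunk marker followed by an empty trailer section
    return b"".join(parts)

def estimated_cpu_seconds(n_chunks: int, us_per_chunk: float) -> float:
    """Linear estimate: N chunks times the per-chunk cpu cost, in seconds."""
    return n_chunks * us_per_chunk / 1_000_000

body = one_byte_chunked_body(b"abc")
print(body)
print(len(body), "wire bytes for 3 payload bytes")
print(estimated_cpu_seconds(250_000, 14.0), "cpu seconds (estimate)")
```

each one-byte chunk takes six bytes on the wire (size digit, crlf, data byte, crlf), so the per-chunk framing outweighs the payload it carries.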

what to do today

if this server runs as an origin behind nginx with the default proxy_request_buffering on, the per-chunk attack shape does not reach this server: nginx buffers the whole request body and forwards it upstream as a single content-length-framed body, so the chunk framing never reaches the cowboy parser.
if deployed direct-exposed, behind haproxy with default streaming, or behind any reverse proxy that streams request bodies unbuffered (nginx with proxy_request_buffering off, for example), the per-chunk cost reaches this server.
there is no framework-level chunk-count limit in the default config; use a frontend buffer, transport-layer rate limiting, or a wrapping middleware that imposes a chunk-count cap before draining the body.
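
a chunk-count cap can be sketched as a decoder that refuses a body once it has seen more than a fixed number of data chunks. the example below is a minimal illustration in python, not cowboy or cowlib code: the names decode_chunked and ChunkLimitExceeded are invented here, it works on an already-buffered body rather than a stream, and it ignores chunk extensions and trailers. where such a check would hook into a real deployment (frontend, proxy, or parser layer) is left open.

```python
class ChunkLimitExceeded(Exception):
    """Raised when a body contains more data chunks than the configured cap."""

def decode_chunked(data: bytes, max_chunks: int) -> bytes:
    """Decode a complete chunked body, rejecting more than max_chunks data chunks."""
    pos = 0
    chunks = 0
    out = bytearray()
    while True:
        eol = data.find(b"\r\n", pos)
        if eol < 0:
            raise ValueError("truncated chunk-size line")
        # Chunk extensions after ';' are dropped in this sketch.
        size = int(data[pos:eol].split(b";", 1)[0].strip(), 16)
        if size < 0:
            raise ValueError("negative chunk size")
        pos = eol + 2
        if size == 0:
            return bytes(out)  # last chunk; trailers are not parsed here
        chunks += 1
        if chunks > max_chunks:
            raise ChunkLimitExceeded(f"more than {max_chunks} chunks")
        end = pos + size
        if data[end:end + 2] != b"\r\n":
            raise ValueError("missing CRLF after chunk data")
        out += data[pos:end]
        pos = end + 2

print(decode_chunked(b"5\r\nhello\r\n0\r\n\r\n", max_chunks=4))
```

the cap is checked before the chunk data is copied, so a body that exceeds it is rejected without paying the per-chunk cost for the remaining chunks.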

reproducer

the full reproducer for this server is in the paper repo. the docker image pins cowboy 2.15.0 (cowlib 2.16.1, ranch 2.2.0), and the test container is limited to a single cpu (--cpus=1). the prober script implements mode a (bridge-coalesced) and mode b (paced 100 µs) per the methodology section.

see the draft pdf for the full per-framework discussion.