AMD DEVELOPER HACKATHON · MAY 4–10, 2026 · DAY 5 LIVE
🚀 Try the live app on Hugging Face → or read the source

Four Qwen models plus Wan on one MI300X read a 2,468-page bill and tell you what's in it.

This is a working proof-of-concept built in 48 hours on AMD MI300X (192 GB VRAM) for the lablab.ai AMD Developer Hackathon. The story below is the whole pitch: vLLM's Automatic Prefix Caching on ROCm makes multi-agent legislative analysis cheap, because 10 specialist agents share one prefill of the bill text.

94.15×
APC TTFT speedup
(99,727-token prefix)
14.8×
Wall-clock speedup
(232K-tok summarizer)
68.5%
Sustained APC
cache hit rate
5
4 Qwen models concurrent
plus Wan 2.2

The architecture

One MI300X. Four Qwen models plus Wan 2.2 hot at the same time, sharing 192 GB — the entire pipeline (analysis + slide gen + animation + narration + critic) runs on a single GPU with zero model swapping mid-pipeline.

spine     Qwen3-30B-A3B-Instruct-2507-FP8   vLLM      port 8001   ~35 GB   max_model_len=262,144
vision    Qwen3-VL-8B-Thinking-FP8          vLLM      port 8002    ~9 GB   dual-call critic
imagegen  Qwen-Image-2512 FP8 + Lightning   ComfyUI   port 8188   ~22 GB   1280x720 in ~5 s
videogen  Wan 2.2 i2v 14B FP8 + LightX2V    ComfyUI   port 8188   ~30 GB   832x480 81-frame clips
tts       Qwen3-TTS-12Hz-1.7B-CustomVoice   ComfyUI   port 8188    ~5 GB   Alex/Jordan voices

combined  ~150 / 192 GB VRAM (78.1%, with 40+ GB headroom for KV cache)
backend   vllm/vllm-openai-rocm:v0.17.1 on ROCm 7.2.3 + ComfyUI on PyTorch 2.10.0

The spine holds a 250K-token bill chunk in its context. Every specialist agent hits the same chunk. With --enable-prefix-caching, the chunk is prefilled once; every additional agent against that chunk pays only its short instruction overhead. On a 32-80 GB GPU you cannot even fit this stack — the spine alone needs more than a 4090. On smaller GPUs, agents have to spill, re-prefill, or model-swap — so on consumer hardware this whole pattern doesn't work. 192 GB is the unlock.

Demo arc — three real bills

ActBillPagesTokensChunks
IHR 1 — OBBB Act 2025 (enacted)330230,9251
IIHR 5376 — Build Back Better 20212,468927,2926
IIIHR 2670 — FY24 NDAA1,236541,3093

Smart chunker is title-boundary aware: never splits a TITLE / Subtitle mid-section, falls back to Subtitle-level cuts when a single Title overflows the 200K cl100k-token budget (calibrated for Qwen tokenizer's ~16% inflation over cl100k_base).

Headline #1 — APC delivers 94× TTFT speedup

First Day-1 measurement on AMD MI300X. Same 99,727-token prefix from HR 1, two different follow-up questions:

{
  "model": "spine",
  "endpoint": "http://localhost:8001/v1",
  "prefix_tokens": 99727,
  "request_a_cold": {
    "ttft_ms": 51046.1,
    "question": "Summarize the first major Title of the bill in two sentences."
  },
  "request_b_warm": {
    "ttft_ms": 542.2,
    "question": "List three named entities mentioned in the bill text above."
  },
  "speedup_ttft": 94.15,
  "passes_3x_floor": true,
  "passes_5x_target": true
}

That number alone is the elevator pitch. Whatever else this hackathon ships, that ratio is real and reproducible. Source: eval/apc-benchmark-hr1.json.

Headline #2 — Same effect on real agents

Cold call: Plain-English Summarizer on BBB-2021 chunk 1 (TITLE I — AGRICULTURE, 232K Qwen tokens, ~$100 B in forest-restoration funding). 5 minutes.

Warm rerun, identical chunk text, same agent: 21.4 seconds. 14.8× wall-clock speedup, no code changes — just APC reusing the KV cache from the first call.

RunStateelapsedprompt_tokcompletion_tok
Summarizercold prefill316.4 s232,853369
Summarizer (rerun)APC warm21.4 s232,853369
USC Cross-ReferenceAPC partial~95 s232,994~6,000
Pork FinderAPC partial92.4 s232,9942,216
Conflict SpotterAPC partial270.6 s233,01925

Sustained Prefix cache hit rate: 68.5% on the spine across all four agents on the same chunk. This is the core hackathon thesis — 14 agents amortized across one prefill instead of paying for 14 prefills.

Headline #3 — The output is grounded in real statute text

The USC Cross-Reference agent identifies citations the bill makes (e.g. "Section 4(a) of the Food and Nutrition Act of 2008"), then a local LMDB lookup attaches the actual current US Code text to each citation. This stops the LLM from hallucinating about what cited statutes say — the LLM identifies where, the LMDB grounds what.

Five real recovered citations from BBB-2021 chunk 1:

7 U.S.C. 3103
Definitions
National Agricultural Research, Extension, and Teaching Policy Act of 1977 — defines “insular area,” cited in BBB-2021 to scope SNAP benefits.
16 U.S.C. 7303
Collaborative Forest Landscape Restoration Program
Existing forest-restoration grant authority that BBB-2021 augments with $4.5 B in vegetation management.
42 U.S.C. 4321
Congressional declaration of purpose (NEPA)
National Environmental Policy Act — bill cross-references baseline review framework.
16 U.S.C. 6511
Definitions (HFRA)
Healthy Forests Restoration Act — bill amends HFRA project authority for $10 B in hazardous fuels reduction.
16 U.S.C. 7125
Resource advisory committees
RAC structure for Forest Service decision-making cross-referenced for new $9 B grant program governance.

The USC Cross-Reference agent extracts every citation in the bill text, then enriches each one against a local USC LMDB built from Title 1-54 of the U.S. Code (60K+ sections, ~10 us per hit). Sub-paragraph notations like 7 U.S.C. 3103(2) are canonicalized down to their parent section. Multi-chunk bills get their citations concatenated with a chunk provenance tag so the merged report can attribute every reference to the chunk it came from.

Headline #4 — Vision pipeline reads tax tables off bill PDFs

Qwen3-VL-8B-Thinking-FP8 (port 8002) renders a single page of BBB-2021 (page 2222 — a tax bracket schedule for joint filers) and emits structured JSON in 11.4 seconds:

{
  "type": "tax_bracket",
  "applies_to": "MARRIED INDIVIDUALS FILING JOINT RETURNS AND SURVIVING SPOUSES",
  "rows": [{
    "lower_bound_usd": 400000,
    "upper_bound_usd": 450000,
    "tax_rate_pct": 35,
    "notes": "plus 35% of the excess over $400,000"
  }]
}

All values verified exact against the source bill. The model can see the page, read the structure, and emit JSON. No OCR, no tabula post-processing.

Headline #5 — Lipsync avatar mode + hybrid podcast

On Day 7 the Qwen+Wan pipeline gained a fifth output mode: a talking-head podcast where two AI hosts (Alex and Jordan) deliver the bill summary on camera. Audio is Qwen3-TTS (Ryan + Ono_anna voices), faces and lipsync come from InfiniteTalk-on-Wan22 (832×480 / 25 fps, ~10 s pair clips), and a brand overlay composites DeadAir Broadcasting cards on top.

The hybrid compositor alternates slide pairs (Wan-animated stills with Qwen-Image visuals) and talking-head pairs (lipsynced avatars), cutting on dialog boundaries. The result is a slide → talking-head → slide → talking-head hybrid the brain can actually follow at podcast pacing, with a lipsync intro/outro setting tone.

BBB hybrid master   final-bbb-cloud-hybrid-podcast.mp4
                    17.2 MB  /  3:08  /  17 clips (14 body + 3 brand)
                    5 slide pairs + 4 talking-head pairs, alternating
                    -c copy concat (no re-encode) - all clips share
                    h264 High yuv420p 832x480 @25fps + AAC LC mono 24kHz

Lipsync intro/outro overlays add 14 seconds of DeadAir branding without breaking the audio-aligned cut chain. The same compositor wires all three output modes — a single source script can render slides-only, all-talking-head, or the alternating hybrid — so the right format for the bill (data-heavy => lean slides; vibe-heavy => lean avatars) is a one-line argument change.

How to read the artifacts

Every claim on this page links to a real file in the repo:

What's next

Roadmap through Day 7: 10 more specialist agents (Citation Validator, Effective Date Tracker, Stakeholder Tracer, Constitutional/Preemption Analyst, Final Synthesizer); orchestrator that runs them in parallel against one chunk; full-bill end-to-end on BBB-2021 in <8 minutes target time at FP8.