The framing is the right one. Cost per token is the axis, not parameter count, and inference is where the bill actually runs. That part I will defend all day.
The 23.4x is where I would push. 25.7 to 601 tok/s on a 35B-A3B MoE (3B active) on a B200: the ceiling number is aggregate-under-concurrency, but 25.7 reads like single-stream unbatched generate. Those are different axes. Batching alone moves an MoE by an order of magnitude before any VKAE mechanism kicks in. The honest speedup is same-concurrency both sides, same batch, same harness, VKAE on vs off. Otherwise part of what you are measuring is 'we turned batching on.'
Same for 'no quality loss.' Measured on what? Aggregate perplexity or a task average sits still while long-context recall or the rare hard token quietly moves. The loss lives in the tail, not the mean.
When the paper lands, will the headline number be VKAE-on vs VKAE-off at identical concurrency, or optimized-throughput vs a single-stream baseline?