gpt2 · fp16 · 128 · 256 · 64 · 18.06826117899618 · 2025-12-30T18:51:52+08:00 · 2026-01-01T08:00:24+08:00

git_rev=a547c7c, rosellm=0.1.0, vllm=0.7.2, sglang=0.4.6, tensorrt_llm=1.1.0, torch=2.6.0, transformers=4.51.3, python=3.11.11

| backend | req/s | output tok/s | total tok/s | total latency (s) |
|---|---|---|---|---|
| roseinfer | 201.13 | 12872.49 | 64362.44 | 0.636 |
| roseinfer (in-proc) | 204.11 | 13062.83 | 65314.13 | 0.627 |
| roseinfer (+prefill meta, +ragged no-past) | 207.21 | 13261.60 | 66308.00 | 0.618 |
| roseinfer (+pprio1) | 203.70 | 13036.67 | 65183.33 | 0.628 |
| SGLang | 243.20 | 15564.48 | 77822.40 | 0.526 |
| TensorRT-LLM | 248.69 | 15916.24 | 79581.21 | 0.515 |
| vLLM | 140.44 | 8988.14 | 44940.70 | 0.911 |
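
The columns above are related by simple arithmetic, so the workload shape can be recovered from the table itself. The sketch below is a consistency check, not part of the benchmark harness: the function name `per_request_shape` and the dictionary layout are hypothetical, and reading the ratios as "tokens per request" and "request count" assumes every request in a run has the same prompt and generation length.

```python
# Rows copied from the results table:
# backend -> (req/s, output tok/s, total tok/s, total latency in seconds)
ROWS = {
    "roseinfer": (201.13, 12872.49, 64362.44, 0.636),
    "roseinfer (in-proc)": (204.11, 13062.83, 65314.13, 0.627),
    "roseinfer (+prefill meta, +ragged no-past)": (207.21, 13261.60, 66308.00, 0.618),
    "roseinfer (+pprio1)": (203.70, 13036.67, 65183.33, 0.628),
    "SGLang": (243.20, 15564.48, 77822.40, 0.526),
    "TensorRT-LLM": (248.69, 15916.24, 79581.21, 0.515),
    "vLLM": (140.44, 8988.14, 44940.70, 0.911),
}


def per_request_shape(req_s, out_tok_s, total_tok_s, latency_s):
    """Recover the implied workload from one table row (hypothetical helper).

    Assumes uniform requests, so throughput ratios equal per-request token counts.
    """
    output_tokens = out_tok_s / req_s      # generated tokens per request
    total_tokens = total_tok_s / req_s     # prompt + generated tokens per request
    return {
        "requests": req_s * latency_s,     # requests completed in the run
        "input_tokens": total_tokens - output_tokens,
        "output_tokens": output_tokens,
    }


if __name__ == "__main__":
    vllm_req_s = ROWS["vLLM"][0]
    for backend, row in ROWS.items():
        shape = per_request_shape(*row)
        speedup = row[0] / vllm_req_s      # request throughput relative to vLLM
        print(
            f"{backend}: ~{shape['requests']:.1f} requests, "
            f"~{shape['input_tokens']:.1f} in / ~{shape['output_tokens']:.1f} out tokens, "
            f"{speedup:.2f}x vLLM req/s"
        )
```

If a row's recovered shape differs from the others, the backends were probably not measured on the same workload, and the throughput columns for that row are not directly comparable.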