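Run metadata: model `gpt2`, dtype `fp16`, 128 requests with 256 input tokens and 64 output tokens per request; unlabeled value `18.068261178996182`; timestamps `2025-12-30T18:51:52+08:00` and `2026-01-01T08:12:29+08:00`; versions `git_rev=a547c7c`, `rosellm=0.1.0`, `vllm=0.7.2`, `sglang=0.4.6`, `tensorrt_llm=1.1.0`, `torch=2.6.0`, `transformers=4.51.3`, `python=3.11.11`.

| backend | req/s | output tok/s | total tok/s | total latency (s) |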
|---|---|---|---|---|
| roseinfer | 201.13 | 12872.49 | 64362.44 | 0.636 |
| roseinfer (+chunk bucket) | 202.57 | 12964.54 | 64822.70 | 0.632 |
| roseinfer (in-proc) | 204.11 | 13062.83 | 65314.13 | 0.627 |
| SGLang | 243.20 | 15564.48 | 77822.40 | 0.526 |
| TensorRT-LLM | 248.69 | 15916.24 | 79581.21 | 0.515 |
| vLLM | 140.44 | 8988.14 | 44940.70 | 0.911 |
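The columns in this table are not independent. Output tok/s should equal req/s × output length. Total tok/s should equal req/s × (input + output length). Total latency should equal the number of requests ÷ req/s.

The sketch below checks that consistency and computes each backend's request throughput relative to vLLM. It assumes 128 requests, 256 prompt tokens and 64 generated tokens per request. These workload values are inferred from the table's own arithmetic, not from a stated config. The helper names `check_row` and `speedups` are hypothetical.

```python
"""Consistency check and relative speedups for the throughput table.

Assumed workload (inferred from the table, not from a published config):
128 requests, 256 prompt tokens and 64 generated tokens per request.
"""

NUM_REQUESTS = 128  # assumed number of requests per run
INPUT_LEN = 256     # assumed prompt tokens per request
OUTPUT_LEN = 64     # assumed generated tokens per request

# (backend, req/s, output tok/s, total tok/s, total latency in s), copied from the table
ROWS = [
    ("roseinfer", 201.13, 12872.49, 64362.44, 0.636),
    ("roseinfer (+chunk bucket)", 202.57, 12964.54, 64822.70, 0.632),
    ("roseinfer (in-proc)", 204.11, 13062.83, 65314.13, 0.627),
    ("SGLang", 243.20, 15564.48, 77822.40, 0.526),
    ("TensorRT-LLM", 248.69, 15916.24, 79581.21, 0.515),
    ("vLLM", 140.44, 8988.14, 44940.70, 0.911),
]


def check_row(req_s, out_tok_s, total_tok_s, latency_s, rel_tol=5e-3):
    """Return True if the reported metrics agree with the assumed workload.

    rel_tol absorbs the rounding of the published figures.
    """
    checks = [
        (out_tok_s, req_s * OUTPUT_LEN),                 # output tokens per second
        (total_tok_s, req_s * (INPUT_LEN + OUTPUT_LEN)), # prompt + output tokens per second
        (latency_s, NUM_REQUESTS / req_s),               # wall time for the whole batch
    ]
    return all(abs(got - want) <= rel_tol * want for got, want in checks)


def speedups(baseline="vLLM"):
    """Request throughput of every backend divided by the baseline's."""
    base = next(row[1] for row in ROWS if row[0] == baseline)
    return {name: req_s / base for name, req_s, *_ in ROWS}


if __name__ == "__main__":
    for name, req_s, out_tok_s, total_tok_s, latency_s in ROWS:
        ok = check_row(req_s, out_tok_s, total_tok_s, latency_s)
        print(f"{name:28s} consistent={ok}")
    for name, ratio in speedups().items():
        print(f"{name:28s} {ratio:.2f}x vs vLLM")
```

Every row reports `consistent=True` under the assumed workload. The roseinfer variants come out at roughly 1.43x to 1.45x vLLM's request throughput, while SGLang and TensorRT-LLM reach about 1.73x and 1.77x.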