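| model | dtype | requests | input len (tokens) | output len (tokens) | duration (s) | start | end |
|---|---|---|---|---|---|---|---|
| gpt2 | fp16 | 128 | 256 | 64 | 18.068 | 2025-12-30T18:51:52+08:00 | 2025-12-30T18:52:10+08:00 |

Environment: git_rev=a547c7c, rosellm=0.1.0, vllm=0.7.2, sglang=0.4.6, tensorrt_llm=1.1.0, torch=2.6.0, transformers=4.51.3, python=3.11.11

| backend | req/s | output tok/s | total tok/s (input + output) | total latency (s) |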
|---|---|---|---|---|
| roseinfer | 201.13 | 12872.49 | 64362.44 | 0.636 |
| roseinfer (in-proc) | 204.11 | 13062.83 | 65314.13 | 0.627 |
| roseinfer (+pslots, +warmup cg16) | 201.55 | 12899.30 | 64496.48 | 0.635 |
| roseinfer (+warmup cg16) | 202.47 | 12957.92 | 64789.61 | 0.632 |
| SGLang | 243.20 | 15564.48 | 77822.40 | 0.526 |
| TensorRT-LLM | 248.69 | 15916.24 | 79581.21 | 0.515 |
| vLLM | 140.44 | 8988.14 | 44940.70 | 0.911 |
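The throughput columns in each row are not independent. This is a minimal sketch that assumes every request in the run has the same prompt and generation length and that each rate is "work divided by total latency"; that is an inference from the numbers, not something the benchmark output states. Under that assumption, the ratios of the columns recover the workload shape. The helper `implied_workload` is hypothetical and exists only for this illustration.

```python
# Rows copied from the results table:
# (backend, req/s, output tok/s, total tok/s, total latency in s)
ROWS = [
    ("roseinfer", 201.13, 12872.49, 64362.44, 0.636),
    ("roseinfer (in-proc)", 204.11, 13062.83, 65314.13, 0.627),
    ("roseinfer (+pslots, +warmup cg16)", 201.55, 12899.30, 64496.48, 0.635),
    ("roseinfer (+warmup cg16)", 202.47, 12957.92, 64789.61, 0.632),
    ("SGLang", 243.20, 15564.48, 77822.40, 0.526),
    ("TensorRT-LLM", 248.69, 15916.24, 79581.21, 0.515),
    ("vLLM", 140.44, 8988.14, 44940.70, 0.911),
]


def implied_workload(req_s, out_tok_s, total_tok_s, latency_s):
    """Hypothetical helper: back out (requests, input len, output len) from one row.

    Assumes equal-length requests, so
      output tok/s = req/s * output_len
      total tok/s  = req/s * (input_len + output_len)
      requests     = req/s * total latency
    Results are rounded because the table values are rounded.
    """
    out_len = round(out_tok_s / req_s)
    in_len = round(total_tok_s / req_s) - out_len
    n_requests = round(req_s * latency_s)
    return n_requests, in_len, out_len


if __name__ == "__main__":
    for name, *metrics in ROWS:
        print(f"{name:36s} (requests, input len, output len) = {implied_workload(*metrics)}")
```

Every row works out to the same workload (128 requests of 256 prompt tokens and 64 generated tokens), which is a quick sanity check that the backends were measured on identical inputs and that "total tok/s" counts prompt plus generated tokens.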