gpt2, fp16, 128, 256, 64, 18.06826117899618, 2025-12-30T18:51:52+08:00, 2026-01-01T08:00:24+08:00

git_rev=a547c7c, rosellm=0.1.0, vllm=0.7.2, sglang=0.4.6, tensorrt_llm=1.1.0, torch=2.6.0, transformers=4.51.3, python=3.11.11

| backend | req/s | output tok/s | total tok/s | total latency (s) |
|---|---|---|---|---|
| roseinfer | 204.09 | 13061.72 | 65308.61 | 0.627 |
| roseinfer (+fast BT sync) | 203.00 | 12991.97 | 64959.87 | 0.631 |
| roseinfer | 201.53 | 12897.97 | 64489.86 | 0.635 |
| roseinfer (in-proc) | 204.11 | 13062.83 | 65314.13 | 0.627 |
| roseinfer (+no item) | 210.32 | 13460.36 | 67301.82 | 0.609 |
| roseinfer | 201.03 | 12865.71 | 64328.54 | 0.637 |
| roseinfer | 203.70 | 13036.67 | 65183.33 | 0.628 |
| SGLang | 243.20 | 15564.48 | 77822.40 | 0.526 |
| TensorRT-LLM | 248.69 | 15916.24 | 79581.21 | 0.515 |
| vLLM | 140.44 | 8988.14 | 44940.70 | 0.911 |
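
The columns above appear to be mutually consistent, and a short script can check that. The sketch below copies four rows from the table and recomputes the implied request count (req/s × total latency) and the implied tokens per request (tok/s ÷ req/s). The source does not label these quantities, so reading the results as "requests" and "tokens per request" is an assumption; the helper name `derived` is hypothetical.

```python
# Rows copied from the table: (backend, req/s, output tok/s, total tok/s, total latency in s).
ROWS = [
    ("roseinfer", 204.09, 13061.72, 65308.61, 0.627),
    ("SGLang", 243.20, 15564.48, 77822.40, 0.526),
    ("TensorRT-LLM", 248.69, 15916.24, 79581.21, 0.515),
    ("vLLM", 140.44, 8988.14, 44940.70, 0.911),
]


def derived(row):
    """Recompute quantities implied by one table row (interpretation is assumed)."""
    name, req_s, out_tok_s, total_tok_s, latency_s = row
    return {
        "backend": name,
        # Throughput times wall time gives the number of requests served.
        "requests": req_s * latency_s,
        # Token throughput divided by request throughput gives tokens per request.
        "output_tok_per_req": out_tok_s / req_s,
        "total_tok_per_req": total_tok_s / req_s,
    }


if __name__ == "__main__":
    for row in ROWS:
        d = derived(row)
        print(
            f'{d["backend"]:<14} requests~{d["requests"]:.1f} '
            f'out/req~{d["output_tok_per_req"]:.1f} '
            f'total/req~{d["total_tok_per_req"]:.1f}'
        )
```

If the implied values agree across backends, every backend was measured on the same workload, and the throughput columns can be compared directly.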