Online Serving Benchmark Summary

| scale | backend                                     | TTFT p50/p90/p99 (ms) | TPOT p50/p90/p99 (ms) | ITL p50/p90/p99 (ms) | E2E p50/p90/p99 (ms) |
|-------|---------------------------------------------|-----------------------|-----------------------|----------------------|----------------------|
| 0.40  | roseinfer                                   | 9.63/15.60/27.33      | 1.34/1.52/1.98        | 1.28/1.51/2.88       | 174.78/198.08/259.92 |
| 0.40  | roseinfer (+batch32, -gc freeze, -fast SSE) | 9.57/16.39/81.31      | 1.34/1.51/6.90        | 1.28/1.54/3.17       | 174.38/196.30/806.82 |
| 0.40  | roseinfer (-fast SSE)                       | 9.52/15.85/27.48      | 1.34/1.52/2.12        | 1.27/1.52/2.71       | 173.15/196.69/279.92 |
| 0.40  | roseinfer (-gc freeze, -fast SSE)           | 9.45/15.74/62.12      | 1.34/1.52/6.09        | 1.28/1.54/3.28       | 174.73/198.28/709.39 |
| 0.40  | SGLang                                      | 7.67/9.69/14.58       | 1.10/1.23/1.57        | 1.07/1.29/3.06       | 144.10/157.67/197.26 |
| 0.40  | TensorRT-LLM                                | 5.68/6.28/7.60        | 1.38/1.41/1.87        | 1.37/1.51/2.59       | 180.05/184.11/190.06 |
| 0.40  | vLLM                                        | 9.21/10.28/13.09      | 1.59/1.84/1.99        | 1.53/1.86/3.30       | 200.58/235.18/255.43 |
| 0.80  | roseinfer                                   | 5.18/6.01/6.98        | 1.27/1.35/1.39        | 1.24/1.42/1.80       | 161.45/174.80/180.24 |
| 0.80  | roseinfer (+batch32, -gc freeze, -fast SSE) | 5.22/6.26/7.64        | 1.27/1.34/1.37        | 1.24/1.42/1.78       | 162.40/175.06/179.00 |
| 0.80  | roseinfer (-fast SSE)                       | 5.23/6.06/7.18        | 1.27/1.34/1.37        | 1.24/1.42/1.80       | 161.58/175.20/179.64 |
| 0.80  | roseinfer (-gc freeze, -fast SSE)           | 5.28/6.19/7.02        | 1.27/1.35/1.37        | 1.25/1.42/1.79       | 162.33/175.86/180.11 |
| 0.80  | SGLang                                      | 8.50/10.28/15.90      | 1.07/1.17/1.39        | 1.06/1.22/2.14       | 143.34/152.21/161.56 |
| 0.80  | TensorRT-LLM                                | 5.77/6.46/7.66        | 1.37/1.39/1.61        | 1.36/1.48/2.06       | 179.11/182.43/191.21 |
| 0.80  | vLLM                                        | 9.20/10.36/11.11      | 1.45/1.67/1.85        | 1.42/1.69/2.73       | 187.58/213.70/233.04 |
| 1.60  | roseinfer                                   | 5.35/6.05/6.58        | 1.25/1.34/1.41        | 1.23/1.39/1.71       | 161.22/172.66/183.70 |
| 1.60  | roseinfer (+batch32, -gc freeze, -fast SSE) | 5.37/6.02/6.84        | 1.26/1.34/1.43        | 1.23/1.39/1.72       | 161.66/173.47/184.96 |
| 1.60  | roseinfer (-fast SSE)                       | 5.36/6.09/6.59        | 1.25/1.34/1.43        | 1.22/1.39/1.69       | 160.15/171.70/185.17 |
| 1.60  | roseinfer (-gc freeze, -fast SSE)           | 5.37/6.03/6.72        | 1.26/1.35/1.43        | 1.23/1.40/1.70       | 161.57/174.13/185.22 |
| 1.60  | SGLang                                      | 9.12/10.73/15.47      | 1.06/1.16/1.32        | 1.06/1.20/1.87       | 143.18/151.19/175.24 |
| 1.60  | TensorRT-LLM                                | 5.95/6.51/7.43        | 1.37/1.39/1.52        | 1.36/1.48/1.88       | 179.02/182.16/192.10 |
| 1.60  | vLLM                                        | 9.55/10.81/11.40      | 1.37/1.57/1.74        | 1.37/1.60/2.05       | 182.75/202.63/227.41 |
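
Each metric cell above packs three percentiles into one p50/p90/p99 string, which makes tail behaviour hard to compare by eye. The sketch below is one possible way to read those cells, not part of the benchmark tooling: it splits a cell into floats and computes p99/p50 as a rough tail-amplification ratio. The helper names `parse_triplet` and `tail_ratio` are hypothetical; the E2E values are copied from the scale 0.40 rows.

```python
def parse_triplet(cell: str) -> tuple[float, float, float]:
    """Split a 'p50/p90/p99' cell into three floats (milliseconds)."""
    p50, p90, p99 = (float(part) for part in cell.split("/"))
    return p50, p90, p99


def tail_ratio(cell: str) -> float:
    """Return p99 / p50: how much slower the tail is than the median."""
    p50, _, p99 = parse_triplet(cell)
    return p99 / p50


# E2E cells copied from the scale 0.40 rows of the table (illustrative subset).
e2e_at_040 = {
    "roseinfer": "174.78/198.08/259.92",
    "roseinfer (-gc freeze, -fast SSE)": "174.73/198.28/709.39",
    "SGLang": "144.10/157.67/197.26",
    "TensorRT-LLM": "180.05/184.11/190.06",
    "vLLM": "200.58/235.18/255.43",
}

for backend, cell in e2e_at_040.items():
    print(f"{backend:36s} E2E p99/p50 = {tail_ratio(cell):.2f}")
```

A ratio close to 1 means the p99 latency stays near the median. In this subset the two roseinfer rows have nearly identical E2E p50 values, so the ratio isolates the difference in their p99 values.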