Wine & Chord - Page 10

Recent posts

LLM Inference from Scratch: 020. HuggingFace GPT2 Loader

5 minute read

Adds support for loading GPT2 weights from HuggingFace, paving the way for later benchmarks aligned against vLLM/sglang.

LLM Inference from Scratch: 019. Paged Attention

10 minute read

Implements true paged attention.

LLM Inference from Scratch: 018. Performance

7 minute read

Optimizes performance based on performance measurements.

LLM Inference from Scratch: 017. Profiler

6 minute read

Uses the PyTorch profiler to observe performance.

LLM Inference from Scratch: 016. Simple Prefix Caching

8 minute read

Implements simple prefix caching, using a prefix cache to reuse the KV cache from earlier requests.