1
Gemma 4: The 128K Multimodal Powerhouse in Your Terminal
揭秘长上下文推理的内存陷阱:即便模型量化后塞入显存,注意力KV缓存也可能比模型本身更吃内存。
A raw, developer-first look at Google’s new open-weight Gemma 4 family—featuring a hands-on local Python setup, a comparison of the 2B, 9B, and 31B va…