LoKA: Low-precision Kernel Applications for Recommendation Models At Scale
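Does FP8 keep tripping up recommendation models? LoKA takes a system-model co-design approach: LoKA Probe precisely quantifies the numerical error introduced at each layer, and LoKA Mods adjust model components accordingly. Together they let low-precision training in large-scale recommendation models preserve model quality while improving efficiency, overcoming both numerical sensitivity and communication bottlenecks.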
arXiv:2605.10886v2 (Announce Type: replace-cross)
Abstract: Recent GPU generations deliver significantly higher FLOPs using lower-precision arithmetic, …
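The summary says LoKA Probe quantifies each layer's error but does not say how. As an illustration only, here is one plausible way to measure per-layer FP8 error. This is a hypothetical sketch and not LoKA's actual method. It assumes FP8 E4M3 (4 exponent bits, 3 mantissa bits, saturating at 448), per-tensor amax scaling, and a toy model built from dense layers with ReLU. The names `round_to_e4m3`, `fake_quant`, and `probe_layers` are invented for this example. Each layer receives the full-precision activations from the previous layer, so the error it reports belongs to that layer alone and does not include errors accumulated upstream.

```python
import math
import random

E4M3_MAX = 448.0  # largest finite FP8 E4M3 magnitude (assumed format)


def round_to_e4m3(x: float) -> float:
    """Round to the nearest E4M3 value (round-half-even, saturating at +/-448)."""
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    a = abs(x)
    if a >= E4M3_MAX:
        return sign * E4M3_MAX
    _, exp = math.frexp(a)       # a = m * 2**exp with 0.5 <= m < 1
    e = max(exp - 1, -6)         # binade exponent; below 2**-6 values are subnormal
    step = 2.0 ** (e - 3)        # 3 mantissa bits -> 8 representable steps per binade
    return sign * min(round(a / step) * step, E4M3_MAX)


def fake_quant(values):
    """Hypothetical per-tensor scaling: map amax to 448, round to E4M3, rescale."""
    amax = max((abs(v) for v in values), default=0.0)
    scale = amax / E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(v / scale) * scale for v in values]


def fake_quant_matrix(w):
    """Quantize a row-major matrix with one scale for the whole tensor."""
    cols = len(w[0])
    flat = fake_quant([v for row in w for v in row])
    return [flat[i * cols:(i + 1) * cols] for i in range(len(w))]


def matvec(w, x):
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relative_error(ref, approx):
    ref_norm = math.sqrt(sum(r * r for r in ref))
    diff_norm = math.sqrt(sum((r - a) ** 2 for r, a in zip(ref, approx)))
    return diff_norm / ref_norm if ref_norm > 0 else 0.0


def probe_layers(layers, x):
    """Return one relative FP8 error per layer, measured in isolation."""
    errors = []
    for w in layers:
        y_ref = matvec(w, x)                               # full-precision reference
        y_fp8 = matvec(fake_quant_matrix(w), fake_quant(x))  # simulated FP8 inputs and weights
        errors.append(relative_error(y_ref, y_fp8))
        x = [max(0.0, v) for v in y_ref]                   # ReLU on the reference output
    return errors


if __name__ == "__main__":
    random.seed(0)
    dim = 16
    layers = [[[random.gauss(0, 1) for _ in range(dim)] for _ in range(dim)] for _ in range(3)]
    x0 = [random.gauss(0, 1) for _ in range(dim)]
    for i, err in enumerate(probe_layers(layers, x0)):
        print(f"layer {i}: relative FP8 error {err:.4f}")
```

Feeding each layer the reference activations, rather than the previous layer's quantized output, is a deliberate design choice. It attributes error to individual layers, which is the information needed to decide which components should be modified or kept in higher precision.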