LLMForge: Multi-Backend Hardware-Aware Neural Architecture Search with Infinite-Head Attention for Edge Language Models
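The first work to integrate infinite-head attention into hardware-aware neural architecture search, providing an efficient multi-backend deployment path for sub-billion-parameter language models on edge devices.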
arXiv:2605.17653v1 Announce Type: new Abstract: Sub-billion-parameter Transformer language models are increasingly deployed on edge devices, where the…