Seminar Announcement

News title: (2026-05-18)

Topic: (This session has been canceled) Fast Multipole Attention for Transformer Neural Networks
Speaker: Prof. Hans De Sterck (University of Waterloo)
Date: Tuesday, June 9, 2026, 14:00–15:00
Venue:
Abstract:
Transformer-based machine learning models have achieved state-of-the-art performance in many areas. However, the self-attention mechanism in Transformer models has quadratic complexity in the input length, which hinders the application of Transformer-based models to long sequences or large images. To address this, we present Fast Multipole Attention (FMA), a new attention mechanism that uses a divide-and-conquer strategy to reduce the time and memory complexity of attention from $O(n^2)$ to $O(n \log n)$ or $O(n)$ while retaining a global receptive field. The hierarchical approach groups queries, keys, and values into $O(\log n)$ levels of resolution, where groups grow larger with increasing distance and the weights used to compute the group quantities are learned. As a result, interactions between distant tokens are computed at lower resolution, in an efficient hierarchical manner. This multilevel divide-and-conquer strategy is inspired by fast summation methods from n-body physics and the Fast Multipole Method. We evaluate FMA on language modeling and image processing tasks and compare it with other efficient attention variants on medium-size datasets. We find empirically that the Fast Multipole Transformer outperforms other efficient transformers in both memory footprint and accuracy. For large language models, the FMA mechanism has the potential to enable longer sequences, taking the full context into account in an efficient, naturally hierarchical manner both during training and when generating long sequences.
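The abstract describes the hierarchy only at a high level. The sketch below is an illustration of the general multilevel near-field/far-field idea and is not the FMA method from the talk. It assumes a single head, no causal mask, tokens ordered in one dimension, and fixed mean pooling in place of FMA's learned group weights. Tokens in the query's own finest block and in its two neighboring blocks are attended to exactly. Farther tokens are attended to through coarser block summaries, each consisting of a mean key, a mean value, and a token count. A block is used at level l when it is not adjacent to the query's block at that level but its parent block is adjacent to the query's parent block, which is the standard one-dimensional Fast Multipole interaction rule. Under this rule every token is counted exactly once. Each query then touches O(block + log n) items, so the total work is O(n log n). The names `multilevel_attention`, `exact_attention`, and `block` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> Vector:
    # Component-wise mean of a non-empty list of vectors.
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def softmax_combine(logits: List[float], values: List[Vector]) -> Vector:
    # sum_k softmax(logits)_k * values[k], with max-subtraction for stability.
    top = max(logits)
    weights = [math.exp(s - top) for s in logits]
    total = sum(weights)
    return [sum(w * v[c] for w, v in zip(weights, values)) / total
            for c in range(len(values[0]))]


def exact_attention(q: List[Vector], k: List[Vector], v: List[Vector]) -> List[Vector]:
    # Reference O(n^2) softmax attention: single head, no mask.
    scale = 1.0 / math.sqrt(len(q[0]))
    return [softmax_combine([dot(qi, kj) * scale for kj in k], v) for qi in q]


def multilevel_attention(q: List[Vector], k: List[Vector], v: List[Vector],
                         block: int = 2) -> List[Vector]:
    # Hypothetical hierarchical attention sketch (mean pooling, not learned weights).
    n = len(q)
    scale = 1.0 / math.sqrt(len(q[0]))

    # Level l uses blocks of size block * 2**l. Each block is summarized by its
    # mean key, mean value and token count. Only levels with more than one block
    # are kept, because a single block can never be a far-field group.
    levels = []
    size = block
    while size < n:
        groups = []
        for start in range(0, n, size):
            stop = min(start + size, n)
            groups.append((mean(k[start:stop]), mean(v[start:stop]), stop - start))
        levels.append((size, groups))
        size *= 2

    out = []
    for i, qi in enumerate(q):
        logits: List[float] = []
        vals: List[Vector] = []
        # Near field: the query's finest block and its two neighbors, exactly.
        bi = i // block
        for j in range(max(0, (bi - 1) * block), min(n, (bi + 2) * block)):
            logits.append(dot(qi, k[j]) * scale)
            vals.append(v[j])
        # Far field: at each level, blocks that are not adjacent to the query's
        # block but whose parent is adjacent to the query's parent (1D FMM rule).
        for size, groups in levels:
            ci = i // size
            parent = ci // 2
            for b in range(2 * parent - 2, 2 * parent + 4):
                if 0 <= b < len(groups) and abs(b - ci) > 1:
                    kb, vb, count = groups[b]
                    # log(count) lets one summary stand in for `count` tokens.
                    logits.append(dot(qi, kb) * scale + math.log(count))
                    vals.append(vb)
        out.append(softmax_combine(logits, vals))
    return out


if __name__ == "__main__":
    n, d = 64, 4
    q = [[math.sin(0.7 * i + c) for c in range(d)] for i in range(n)]
    k = [[math.cos(0.3 * i - c) for c in range(d)] for i in range(n)]
    v = [[math.sin(0.1 * i * (c + 1)) for c in range(d)] for i in range(n)]
    exact = exact_attention(q, k, v)
    approx = multilevel_attention(q, k, v, block=4)
    err = max(abs(a - b) for r1, r2 in zip(exact, approx) for a, b in zip(r1, r2))
    print(f"max abs deviation from exact attention: {err:.4f}")
```

When `block >= n`, the near field covers every token and the result equals `exact_attention`. With all-zero keys, both functions reduce to the plain mean of the values, which gives a simple sanity check. The method in the abstract differs from this sketch in that it learns the weights used to form group quantities instead of fixing them to plain means.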
Attachment: 演講1150609.png
