Upcoming Events
PhD Defense | On the Efficiency and Steerability of Self-Attention Mechanism of Large Language Models

Title: On the Efficiency and Steerability of Self-Attention Mechanism of Large Language Models
Date: April 9th, 2025
Time: 2:00 pm – 3:30 pm (EST)
Location: Online
Zoom link: https://gatech.zoom.us/j/99605972633?pwd=sXxqHgVu2d3bj129p7kQnqadNk6Xqg.1
Qingru Zhang
Machine Learning PhD Candidate
School of Computational Science and Engineering
Georgia Institute of Technology
Committee
1. Dr. Tuo Zhao (ISYE, Georgia Tech) (Advisor)
2. Dr. Chao Zhang (CSE, Georgia Tech)
3. Dr. Anqi Wu (CSE, Georgia Tech)
4. Dr. Bo Dai (CSE, Georgia Tech)
5. Dr. Xiaodong Liu (Microsoft Research)
Abstract
Large language models (LLMs) have demonstrated exceptional performance across a wide range of real-world tasks. These models leverage the self-attention mechanism to capture intricate dependencies between tokens, yielding precise contextual understanding. However, when handling prompts that contain long background contexts, self-attention faces two challenges: (1) significant memory and computational overhead when processing long sequences, and (2) difficulty in fully comprehending contexts and performing complex reasoning. In this thesis, we focus on two crucial aspects of self-attention, efficiency and steerability, and explore innovative prompting techniques to address these challenges. In the first part, we tackle the computational and memory overhead of long-sequence modeling by introducing mixed attention spans and compressing key-value (KV) caches, achieving near-lossless performance at significantly reduced cost. In the second part, we propose a post-hoc attention steering method that guides LLM attention to better align with contextual information and user instructions. In the final part, we present prompting strategies that enhance LLM reading comprehension via steerable prompting and improve complex reasoning through a parallel decomposition approach. Together, these contributions advance the scalability, controllability, and reasoning capabilities of LLMs.
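The post-hoc attention steering idea mentioned in the abstract can be illustrated with a minimal sketch: downweight the pre-softmax attention scores of non-emphasized token positions so that, after renormalization, emphasized tokens receive more attention mass. The function name, the scaling constant `alpha`, and the NumPy setup below are illustrative assumptions, not the exact method defended in the thesis.

```python
import numpy as np

def steer_attention(scores, emphasized, alpha=0.01):
    """Post-hoc attention steering sketch: multiply the (pre-softmax)
    attention weights of non-emphasized positions by alpha < 1, then
    renormalize. Adding log(alpha) to a logit scales its softmax
    probability by alpha before normalization."""
    steered = np.asarray(scores, dtype=float).copy()
    mask = np.ones(steered.shape[-1], dtype=bool)
    mask[list(emphasized)] = False          # positions to downweight
    steered[..., mask] += np.log(alpha)     # shrink non-emphasized logits
    # numerically stable softmax over the last axis
    e = np.exp(steered - steered.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Example: emphasize positions 1 and 3 of a 4-token context
scores = np.array([0.5, 1.0, 0.2, 0.8])
probs = steer_attention(scores, emphasized=[1, 3])
```

Compared with a plain softmax over the same scores, the emphasized positions end up with a larger share of the attention distribution, which is the steering effect the abstract refers to.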