Upcoming Events
PhD Defense | On the Efficiency and Steerability of Self-Attention Mechanism of Large Language Models

Title: On the Efficiency and Steerability of Self-Attention Mechanism of Large Language Models
Date: April 9th, 2025
Time: 2:00 pm – 3:30 pm (EST)
Location: Online
Zoom link: https://gatech.zoom.us/j/99605972633?pwd=sXxqHgVu2d3bj129p7kQnqadNk6Xqg.1
Qingru Zhang
Machine Learning PhD Candidate
School of Computational Science and Engineering
Georgia Institute of Technology
Committee
1. Dr. Tuo Zhao (ISYE, Georgia Tech) (Advisor)
2. Dr. Chao Zhang (CSE, Georgia Tech)
3. Dr. Anqi Wu (CSE, Georgia Tech)
4. Dr. Bo Dai (CSE, Georgia Tech)
5. Dr. Xiaodong Liu (Microsoft Research)
Abstract
Large language models (LLMs) have demonstrated exceptional performance across a wide range of real-world tasks. These models leverage the self-attention mechanism to capture intricate dependencies between tokens, yielding precise contextual understanding. However, when handling prompts that contain long background contexts, self-attention faces two challenges: (1) significant memory and computational overhead when processing long sequences, and (2) difficulty in fully comprehending contexts and performing complex reasoning. In this thesis, we focus on two crucial aspects of self-attention, efficiency and steerability, and explore innovative prompting techniques to address these challenges. In the first part, we tackle the computational and memory overhead of long-sequence modeling by introducing mixed attention spans and compressing key-value (KV) caches, achieving near-lossless performance at significantly reduced cost. In the second part, we propose a post-hoc attention steering method that guides LLM attention to better align with contextual information and user instructions. In the final part, we present prompting strategies that enhance LLM reading comprehension via steerable prompting and improve complex reasoning through a parallel decomposition approach. Together, these contributions advance the scalability, controllability, and reasoning capabilities of LLMs.
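The post-hoc attention steering idea mentioned in the abstract can be illustrated with a minimal sketch: downweight the pre-softmax attention scores of non-emphasized token positions so that, after renormalization, emphasized tokens receive more attention mass. The function name, the scaling constant `alpha`, and the NumPy setup below are illustrative assumptions, not the exact method defended in the thesis.

```python
import numpy as np

def steer_attention(scores, emphasized, alpha=0.01):
    """Post-hoc attention steering sketch: multiply the (pre-softmax)
    attention weights of non-emphasized positions by alpha < 1, then
    renormalize. Adding log(alpha) to a logit scales its softmax
    probability by alpha before normalization."""
    steered = np.asarray(scores, dtype=float).copy()
    mask = np.ones(steered.shape[-1], dtype=bool)
    mask[list(emphasized)] = False          # positions to downweight
    steered[..., mask] += np.log(alpha)     # shrink non-emphasized logits
    # numerically stable softmax over the last axis
    e = np.exp(steered - steered.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Example: emphasize positions 1 and 3 of a 4-token context
scores = np.array([0.5, 1.0, 0.2, 0.8])
probs = steer_attention(scores, emphasized=[1, 3])
```

Compared with a plain softmax over the same scores, the emphasized positions end up with a larger share of the attention distribution, which is the steering effect the abstract refers to.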