Softmax Attention
2026/3/9 · Less than 1 minute read
Problem Statement
Implement a GPU program that computes the softmax attention operation for a given set of matrices. Given the query matrix Q (M×d), key matrix K (N×d), and value matrix V (N×d), compute Attention(Q, K, V) = softmax(Q K^T / √d) V, where softmax is applied row-wise.
Implementation Requirements
- Use only GPU native features (external libraries are not permitted)
- The solve function signature must remain unchanged
- The final result must be stored in the output matrix `output`
Examples
Example 1:
Input: Q (2×4), K (3×4), V (3×4)
Output: output (2×4)

Example 2:
Input: Q (1×2), K (2×2), V (2×2)
Output: output (1×2)

Constraints
- Matrix Q is of size M×d and matrices K and V are of size N×d
- 1 ≤ M, N ≤ 100,000
- 1 ≤ d ≤ 128
- Performance is measured with M = 512, N = 256
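A minimal sketch of one possible solution is a naive CUDA kernel with one thread per query row: each thread makes two passes over the N keys, first finding the row maximum of the scaled scores (for numerically stable softmax), then accumulating the exponentiated weights and the weighted sum of V rows. The `solve` signature below is an assumption, since the exact signature is not shown in the statement:

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Naive sketch: one thread computes one output row of softmax(Q K^T / sqrt(d)) V.
// Q, K, V, output are assumed to be device pointers in row-major layout.
__global__ void attention_kernel(const float* Q, const float* K, const float* V,
                                 float* output, int M, int N, int d) {
    int m = blockIdx.x * blockDim.x + threadIdx.x;
    if (m >= M) return;

    float scale = rsqrtf((float)d);  // 1 / sqrt(d)

    // Pass 1: row maximum of the scaled scores, for numerical stability.
    float row_max = -INFINITY;
    for (int n = 0; n < N; ++n) {
        float s = 0.0f;
        for (int k = 0; k < d; ++k) s += Q[m * d + k] * K[n * d + k];
        row_max = fmaxf(row_max, s * scale);
    }

    // Pass 2: accumulate exp(score - max) and the weighted sum of V rows.
    float denom = 0.0f;
    for (int k = 0; k < d; ++k) output[m * d + k] = 0.0f;
    for (int n = 0; n < N; ++n) {
        float s = 0.0f;
        for (int k = 0; k < d; ++k) s += Q[m * d + k] * K[n * d + k];
        float w = expf(s * scale - row_max);
        denom += w;
        for (int k = 0; k < d; ++k) output[m * d + k] += w * V[n * d + k];
    }

    // Normalize by the softmax denominator.
    for (int k = 0; k < d; ++k) output[m * d + k] /= denom;
}

// Hypothetical host-side solve wrapper (the real required signature may differ).
void solve(const float* Q, const float* K, const float* V,
           float* output, int M, int N, int d) {
    int threads = 256;
    int blocks = (M + threads - 1) / threads;
    attention_kernel<<<blocks, threads>>>(Q, K, V, output, M, N, d);
    cudaDeviceSynchronize();
}
```

Note that this recomputes Q·K^T in both passes to avoid a per-thread score buffer of size N; at the measured sizes (M = 512, N = 256, d ≤ 128) a tiled, shared-memory version would be substantially faster, but the sketch above is the simplest correct baseline.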