Softmax Attention
2026/3/9 · Less than 1 minute read
Problem Statement
Implement a GPU program that computes the softmax attention operation for a given set of matrices. Given the query matrix Q (M×d), key matrix K (N×d), and value matrix V (N×d), compute Attention(Q, K, V) = softmax(Q K^T / √d) V, where softmax is applied row-wise.
Implementation Requirements
- Use only GPU native features (external libraries are not permitted)
- The solve function signature must remain unchanged
- The final result must be stored in the output matrix `output`
Examples
Example 1:
Input: Q (2×4), K (3×4), V (3×4)
Output: output (2×4)

Example 2:
Input: Q (1×2), K (2×2), V (2×2)
Output: output (1×2)

Constraints
- Matrix Q is of size M×d and matrices K and V are of size N×d
- 1 ≤ M, N ≤ 100,000
- 1 ≤ d ≤ 128
- Performance is measured with M = 512, N = 256
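A minimal sketch of one possible solution is a naive CUDA kernel with one thread per query row: each thread makes two passes over the N keys, first finding the row maximum of the scaled scores (for numerically stable softmax), then accumulating the exponentiated weights and the weighted sum of V rows. The `solve` signature below is an assumption, since the exact signature is not shown in the statement:

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Naive sketch: one thread computes one output row of softmax(Q K^T / sqrt(d)) V.
// Q, K, V, output are assumed to be device pointers in row-major layout.
__global__ void attention_kernel(const float* Q, const float* K, const float* V,
                                 float* output, int M, int N, int d) {
    int m = blockIdx.x * blockDim.x + threadIdx.x;
    if (m >= M) return;

    float scale = rsqrtf((float)d);  // 1 / sqrt(d)

    // Pass 1: row maximum of the scaled scores, for numerical stability.
    float row_max = -INFINITY;
    for (int n = 0; n < N; ++n) {
        float s = 0.0f;
        for (int k = 0; k < d; ++k) s += Q[m * d + k] * K[n * d + k];
        row_max = fmaxf(row_max, s * scale);
    }

    // Pass 2: accumulate exp(score - max) and the weighted sum of V rows.
    float denom = 0.0f;
    for (int k = 0; k < d; ++k) output[m * d + k] = 0.0f;
    for (int n = 0; n < N; ++n) {
        float s = 0.0f;
        for (int k = 0; k < d; ++k) s += Q[m * d + k] * K[n * d + k];
        float w = expf(s * scale - row_max);
        denom += w;
        for (int k = 0; k < d; ++k) output[m * d + k] += w * V[n * d + k];
    }

    // Normalize by the softmax denominator.
    for (int k = 0; k < d; ++k) output[m * d + k] /= denom;
}

// Hypothetical host-side solve wrapper (the real required signature may differ).
void solve(const float* Q, const float* K, const float* V,
           float* output, int M, int N, int d) {
    int threads = 256;
    int blocks = (M + threads - 1) / threads;
    attention_kernel<<<blocks, threads>>>(Q, K, V, output, M, N, d);
    cudaDeviceSynchronize();
}
```

Note that this recomputes Q·K^T in both passes to avoid a per-thread score buffer of size N; at the measured sizes (M = 512, N = 256, d ≤ 128) a tiled, shared-memory version would be substantially faster, but the sketch above is the simplest correct baseline.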