Multi-Head Attention
Problem Statement
Implement multi-head self-attention: MultiHead(Q, K, V) = Concat(head_1, ..., head_h), where head_i = softmax(Q_i K_i^T / sqrt(d_k)) V_i and d_k = d_model / h. Here Q_i, K_i, V_i denote the i-th d_k-wide column slices of the inputs (the formula involves no learned projections). A pure-Python sketch follows the requirements below.
Implementation Requirements
- Use only native language features (external libraries are not permitted)
- The `solve` function signature must remain unchanged
- Write the result to `output`
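
As a concrete reference for the formula above, here is a minimal pure-Python sketch (no external libraries, per the requirements). The page does not show the actual `solve` signature or I/O format, so the `solve(Q, K, V, h)` form and the list-of-lists matrix representation are assumptions for illustration only.

```python
import math

def solve(Q, K, V, h):
    # ASSUMPTION: the real signature and I/O are not shown on this page.
    # Here Q, K, V are N x d_model matrices as lists of lists of floats,
    # h is the number of heads, and the N x d_model result is returned.
    N = len(Q)
    d_model = len(Q[0])
    d_k = d_model // h          # head width; d_model % h == 0 is guaranteed
    scale = 1.0 / math.sqrt(d_k)

    out = [[0.0] * d_model for _ in range(N)]
    for head in range(h):
        lo = head * d_k         # columns [lo, lo + d_k) belong to this head
        for r in range(N):
            # scores[c] = (Q_i K_i^T)[r][c] / sqrt(d_k)
            scores = []
            for c in range(N):
                s = 0.0
                for t in range(d_k):
                    s += Q[r][lo + t] * K[c][lo + t]
                scores.append(s * scale)
            # numerically stable softmax over the row
            m = max(scores)
            exps = [math.exp(s - m) for s in scores]
            z = sum(exps)
            # out row = softmax(scores) @ V_i, written into this head's slice,
            # so the per-head slices of `out` form Concat(head_1..head_h)
            for c in range(N):
                w = exps[c] / z
                for t in range(d_k):
                    out[r][lo + t] += w * V[c][lo + t]
    return out
```

The triple loop is O(N² · d_model) and written for clarity; the stated performance case (N = 1024, d_model = 1024) would likely require flattening the inner loops and pre-extracting each head's K and V slices to pass in pure Python.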
Examples
See the two example sets on this page (with different N, d_model, and h settings).
Constraints
- 1 ≤ N ≤ 10000; 2 ≤ d_model ≤ 1024; 1 ≤ h ≤ d_model; d_model % h == 0
- Values lie in the range −10.0 to 10.0; performance target: N = 1024, d_model = 1024