General Matrix Multiplication (GEMM)
2026/3/9
Problem Statement
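Implement a basic GEMM over FP16 matrices with FP32 accumulation: $C = \alpha (A \times B) + \beta C$, where $A$, $B$, and $C$ are FP16 matrices stored in row-major order and $\alpha$, $\beta$ are FP32 scalars.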
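A minimal CUDA sketch of these semantics follows. It assumes $A$ is $M \times K$, $B$ is $K \times N$, and $C$ is $M \times N$, all row-major and already resident on the device. Each thread computes one output element $C_{ij} \leftarrow \alpha \sum_{k} A_{ik} B_{kj} + \beta C_{ij}$, keeps the running sum in FP32, and rounds to FP16 only once, at write-back. The names `gemm_naive` and `solve` and the parameter order shown are hypothetical stand-ins. The real `solve` signature is fixed by the problem and is not reproduced here. The kernel uses no tiling or WMMA, so treat it as a correctness baseline, not a fast implementation.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Naive GEMM baseline: one thread per element of C.
// Assumed shapes (row-major): A is M x K, B is K x N, C is M x N.
__global__ void gemm_naive(const half* A, const half* B, half* C,
                           int M, int N, int K, float alpha, float beta) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= M || col >= N) return;  // threads outside C do nothing

    float acc = 0.0f;  // FP32 accumulator for the dot product
    for (int k = 0; k < K; ++k) {
        acc += __half2float(A[row * K + k]) * __half2float(B[k * N + col]);
    }

    // Assumed BLAS-style convention: C is not read when beta == 0.
    float cOld = (beta != 0.0f) ? __half2float(C[row * N + col]) : 0.0f;
    // Single FP32 -> FP16 rounding, at write-back.
    C[row * N + col] = __float2half(alpha * acc + beta * cOld);
}

// Hypothetical host wrapper; A, B, C are assumed to be device pointers.
// The real solve signature is fixed by the problem and may differ.
extern "C" void solve(const half* A, const half* B, half* C,
                      int M, int N, int K, float alpha, float beta) {
    dim3 block(16, 16);
    dim3 grid((N + block.x - 1) / block.x, (M + block.y - 1) / block.y);
    gemm_naive<<<grid, block>>>(A, B, C, M, N, K, alpha, beta);
    cudaDeviceSynchronize();  // wait so the caller can read C afterwards
}
```

As a usage example, take the illustrative inputs $A = \begin{pmatrix}1&2&3\\4&5&6\end{pmatrix}$ and $B = \begin{pmatrix}7&8\\9&10\\11&12\end{pmatrix}$ with $\alpha = 1$, $\beta = 0$. These values are chosen here and are not taken from the problem. The kernel then produces $C = \begin{pmatrix}58&64\\139&154\end{pmatrix}$, and every value involved is exactly representable in FP16.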
Implementation Requirements
- Use only native features (external libraries other than WMMA are not permitted)
- The solve function signature must remain unchanged
- Accumulate in FP32, then write the result back to matrix C as FP16
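Because WMMA is the one permitted exception, the next sketch shows how these requirements map onto the `nvcuda::wmma` API from `<mma.h>`. It uses `half` input fragments with a `float` accumulator for FP32 accumulation, followed by a per-element epilogue that rounds to FP16 once. Several assumptions apply: $M$, $N$, and $K$ are multiples of 16, the GPU has compute capability 7.0 or newer (compile with e.g. `-arch=sm_70` or higher), and the pointers come from `cudaMalloc`, which provides the required alignment. The names `gemm_wmma` and `solve_wmma` are hypothetical. Each block is a single warp with no shared-memory reuse of $A$ or $B$ tiles, so this is a starting point, not a tuned kernel.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <mma.h>
using namespace nvcuda;

// WMMA tile shape for half inputs with a float accumulator.
constexpr int WM = 16, WN = 16, WK = 16;

// One warp (32 threads) computes one 16 x 16 tile of C.
// Assumes M, N, K are multiples of 16 and compute capability >= 7.0.
__global__ void gemm_wmma(const half* A, const half* B, half* C,
                          int N, int K, float alpha, float beta) {
    int tileRow = blockIdx.y * WM;
    int tileCol = blockIdx.x * WN;

    wmma::fragment<wmma::matrix_a, WM, WN, WK, half, wmma::row_major> aFrag;
    wmma::fragment<wmma::matrix_b, WM, WN, WK, half, wmma::row_major> bFrag;
    wmma::fragment<wmma::accumulator, WM, WN, WK, float> acc;  // FP32
    wmma::fill_fragment(acc, 0.0f);

    for (int k = 0; k < K; k += WK) {
        // Leading dimension is K for row-major A and N for row-major B.
        wmma::load_matrix_sync(aFrag, A + tileRow * K + k, K);
        wmma::load_matrix_sync(bFrag, B + k * N + tileCol, N);
        wmma::mma_sync(acc, aFrag, bFrag, acc);
    }

    // Epilogue: stage the FP32 tile in shared memory, then apply
    // alpha/beta and round to FP16 once per element.
    __shared__ __align__(32) float tile[WM * WN];
    wmma::store_matrix_sync(tile, acc, WN, wmma::mem_row_major);
    __syncwarp();
    for (int i = threadIdx.x; i < WM * WN; i += 32) {
        int idx = (tileRow + i / WN) * N + (tileCol + i % WN);
        // Assumed BLAS-style convention: C is not read when beta == 0.
        float cOld = (beta != 0.0f) ? __half2float(C[idx]) : 0.0f;
        C[idx] = __float2half(alpha * tile[i] + beta * cOld);
    }
}

// Hypothetical host wrapper (not the problem's solve signature).
extern "C" void solve_wmma(const half* A, const half* B, half* C,
                           int M, int N, int K, float alpha, float beta) {
    dim3 grid(N / WN, M / WM);  // one block per 16 x 16 output tile
    gemm_wmma<<<grid, 32>>>(A, B, C, N, K, alpha, beta);  // one warp per block
    cudaDeviceSynchronize();
}
```

The epilogue goes through shared memory for two reasons. First, a `float` accumulator fragment can only be loaded from `float` memory, not from the FP16 matrix C. Second, the element-to-lane mapping inside a fragment is unspecified, so a row-major staging tile is the simplest way to line up each accumulator value with its element of C.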
Examples
Input: $A$ ($2 \times 3$), $B$ ($3 \times 2$), $C_{initial}$ ($2 \times 2$), $\alpha = 1.0$, $\beta = 0.0$ $\rightarrow$ Output: $C$ (FP16)
Constraints
- Performance is measured with