Reduction
2026/3/9
Problem Statement
Write a GPU program that performs a parallel reduction on an array of 32-bit floating-point numbers to compute their sum. The program takes an input array and produces a single output value containing the sum of all elements.
Implementation Requirements
- Use only native GPU features (external libraries are not permitted)
- The solve function signature must remain unchanged
- The final result must be stored in the output variable
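A common way to implement this parallel reduction in CUDA is a two-level scheme:

- Each thread sums a grid-stride slice of the input.
- Each block combines its threads' partial sums with a shared-memory tree reduction.
- One thread per block adds the block total into the output with `atomicAdd`.

The C++ below is a minimal sketch, not the author's solution and not GPU code. It models that scheme sequentially on the CPU so the combining order is easy to inspect. Some assumptions to note:

- `model_gpu_reduce`, `kBlockSize`, and `kNumBlocks` are hypothetical names and values.
- Comments mark where a real kernel would use `__shared__`, `__syncthreads()`, and `atomicAdd`.
- The required `solve` signature comes from the problem's starter code, which is not shown here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical launch configuration; a real kernel would tune these values.
constexpr int kBlockSize = 256;  // threads per block
constexpr int kNumBlocks = 64;   // blocks in the grid
static_assert((kBlockSize & (kBlockSize - 1)) == 0,
              "the tree phase assumes a power-of-two block size");

// Sequential CPU model of a two-level GPU sum reduction. Each (block, tid)
// iteration stands in for one GPU thread; on a GPU they run in parallel.
float model_gpu_reduce(const std::vector<float>& input) {
    const int n = static_cast<int>(input.size());
    const int total_threads = kBlockSize * kNumBlocks;
    float output = 0.0f;  // GPU: zero the output buffer before launch

    for (int block = 0; block < kNumBlocks; ++block) {
        float shared[kBlockSize];  // GPU: __shared__ float shared[kBlockSize];

        // Phase 1: each thread accumulates a grid-stride slice of the input.
        for (int tid = 0; tid < kBlockSize; ++tid) {
            float partial = 0.0f;
            for (int i = block * kBlockSize + tid; i < n; i += total_threads) {
                partial += input[i];
            }
            shared[tid] = partial;
        }
        // GPU: __syncthreads();

        // Phase 2: tree reduction; threads with tid < stride add the element
        // `stride` slots ahead, halving the number of active threads each step.
        for (int stride = kBlockSize / 2; stride > 0; stride /= 2) {
            for (int tid = 0; tid < stride; ++tid) {
                shared[tid] += shared[tid + stride];
            }
            // GPU: __syncthreads();
        }

        // Phase 3: thread 0 folds the block total into the global result.
        output += shared[0];  // GPU: if (tid == 0) atomicAdd(output, shared[0]);
    }
    return output;
}
```

On Example 1 and Example 2, `model_gpu_reduce` returns 36.0 and 0.0. On a real GPU, the output buffer must be zeroed before the kernel accumulates into it, for example with `cudaMemset`. The atomics also add block totals in no fixed order, so the last bits of large float sums can vary between runs.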
Examples
Example 1:
Input: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
Output: 36.0
Example 2:
Input: [-2.5, 1.5, -1.0, 2.0]
Output: 0.0
Constraints
- 1 ≤ N ≤ 100,000,000
- -1000.0 ≤ input[i] ≤ 1000.0
- The final sum will always fit within a 32-bit float
- Performance is measured with N = 4,194,304
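These constraints rule out overflow. Even at N = 100,000,000 with every element at 1000.0, |sum| ≤ 10^11, far below the float maximum of about 3.4 × 10^38. Rounding is the real concern, and it depends on summation order, so a GPU result will generally not match a serial CPU loop bit for bit.

The illustrative C++ below compares three strategies on the benchmark size N = 4,194,304 = 2^22:

- a naive left-to-right float loop;
- pairwise summation, which combines values in a balanced tree similar to a GPU reduction;
- a double-precision reference.

The helper names are hypothetical, and this is an experiment sketch rather than part of a solution.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Left-to-right accumulation in float, as a naive single-thread loop would do.
float sequential_sum(const std::vector<float>& xs) {
    float acc = 0.0f;
    for (float x : xs) acc += x;
    return acc;
}

// Recursive pairwise summation: splits the range in half at every level,
// giving a balanced combining tree of depth about log2(n), like a GPU tree
// reduction (the exact pairing of elements differs).
float pairwise_range(const float* data, std::size_t n) {
    if (n == 0) return 0.0f;
    if (n == 1) return data[0];
    const std::size_t half = n / 2;
    return pairwise_range(data, half) + pairwise_range(data + half, n - half);
}

float pairwise_sum(const std::vector<float>& xs) {
    return pairwise_range(xs.data(), xs.size());
}

// Reference: accumulate in double, round to float once at the end.
float reference_sum(const std::vector<float>& xs) {
    double acc = 0.0;
    for (float x : xs) acc += x;
    return static_cast<float>(acc);
}
```

Take 4,194,304 copies of `0.1f`. The pairwise order reproduces the double-precision reference exactly, because every level adds two equal values and doubling is exact in binary floating point. The sequential loop instead ends more than 10,000 below the reference, because each small addend is rounded against an ever-larger accumulator. When checking a GPU result locally, comparing against a double-precision reference with a tolerance is therefore safer than testing for exact equality.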