Simple Inference

agicy, 2026/6/6, about a 1-minute read


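Original problem: LeetGPU - Simple Inference

Problem Description

Write a GPU program that runs inference on a PyTorch model. Given an input tensor and a trained torch.nn.Linear model, compute the forward pass and store the result in the output tensor.

The model performs a linear transformation: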

output = input @ weight^T + bias

where $weight$ has shape $[output\_size, input\_size]$ and $bias$ has shape $[output\_size]$.
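
Written out element by element, the same transformation reads as follows. The index names $b$ (batch row), $j$ (output feature), and $i$ (input feature) are introduced here only for illustration:

$$
output_{b,j} = \sum_{i=1}^{input\_size} input_{b,i} \cdot weight_{j,i} + bias_j
$$

Because $weight$ is stored as $[output\_size, input\_size]$, each output feature $j$ reads one row of $weight$, which is what the transpose in `input @ weight^T` expresses.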

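Implementation Requirements

Use PyTorch's built-in functions and operations.
The signature of the solve function must remain unchanged.
The final result must be stored in the output tensor.
The model is already loaded and ready for inference.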

Examples

Example 1

Input:  input = [[1.0, 2.0]]  (batch_size=1, input_size=2)
Model:  Linear(weight=[[0.5, 1.0], [1.5, 0.5]], bias=[0.1, 0.2])
Output: [[2.6, 2.7]]  (batch_size=1, output_size=2)

Example 2

Input:  input = [[1.0], [2.0], [3.0]]  (batch_size=3, input_size=1)
Model:  Linear(weight=[[2.0]], bias=[1.0])
Output: [[3.0], [5.0], [7.0]]  (batch_size=3, output_size=1)
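
Both examples can be checked by hand with a plain-Python reference of the formula. This is only a sketch for verifying the arithmetic, not the GPU solution; the function name `linear_forward` is made up for illustration.

```python
def linear_forward(inp, weight, bias):
    """Reference forward pass: out[b][j] = sum_i inp[b][i] * weight[j][i] + bias[j]."""
    out = []
    for row in inp:                          # one batch row at a time
        out_row = []
        for w_row, b in zip(weight, bias):   # one output feature at a time
            acc = b
            for x, w in zip(row, w_row):     # dot product over input features
                acc += x * w
            out_row.append(acc)
        out.append(out_row)
    return out

# Example 1 and Example 2 from the problem statement.
example1 = linear_forward([[1.0, 2.0]], [[0.5, 1.0], [1.5, 0.5]], [0.1, 0.2])
example2 = linear_forward([[1.0], [2.0], [3.0]], [[2.0]], [1.0])
print(example1)
print(example2)
```

Up to floating-point rounding, the printed values should agree with the expected outputs listed in the two examples.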

Constraints

$1 \le batch\_size \le 1{,}000$.
$1 \le input\_size, output\_size \le 1{,}000$.
$-10.0 \le$ input values $\le 10.0$.
Performance is measured at $batch\_size = 1{,}000$.
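
To get a feel for the largest case these bounds allow, the arithmetic below estimates the work and memory involved. It assumes float32 elements and that input_size and output_size are both at their upper bound; neither is stated in the problem, which only fixes batch_size for the performance test.

```python
# Upper bounds taken from the constraints above.
batch_size = 1_000
input_size = 1_000
output_size = 1_000
bytes_per_element = 4  # assumption: float32

# Each output element needs input_size multiply-adds, plus one bias add.
multiply_adds = batch_size * output_size * input_size
flops = 2 * multiply_adds + batch_size * output_size

# Tensors touched: input, weight, bias, output.
elements = (batch_size * input_size + output_size * input_size
            + output_size + batch_size * output_size)
megabytes = elements * bytes_per_element / 1e6

print(f"{flops / 1e9:.3f} GFLOP, {megabytes:.3f} MB")
```

Under these assumptions the forward pass is roughly 2 GFLOP over about 12 MB of data, which fits in a single matrix-multiply call.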

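Solution Approach

The core of this problem is computing a linear layer's forward pass with PyTorch's vectorized operations. On the GPU, F.linear or a direct call to model(input) is dispatched automatically to an efficient cuBLAS matrix-multiplication implementation. Writing the GPU kernel by hand is still worthwhile as an exercise, since it deepens your understanding of how threads cooperate in a matrix multiplication (GEMM).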
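
The three ways of expressing the forward pass mentioned here can be compared directly. The sketch below uses small, arbitrary layer sizes and random data chosen only for illustration, and falls back to the CPU when no GPU is available.

```python
import torch
import torch.nn.functional as F

# Use the GPU when one is present; fall back to CPU so the sketch still runs.
device = "cuda" if torch.cuda.is_available() else "cpu"

torch.manual_seed(0)
model = torch.nn.Linear(in_features=4, out_features=3).to(device)
x = torch.randn(5, 4, device=device)

with torch.no_grad():  # inference only, no autograd graph needed
    via_module = model(x)                                    # call the module
    via_functional = F.linear(x, model.weight, model.bias)   # functional form
    via_matmul = x @ model.weight.T + model.bias             # explicit formula

same = (torch.allclose(via_module, via_functional, atol=1e-5)
        and torch.allclose(via_module, via_matmul, atol=1e-5))
print(via_module.shape, same)
```

All three compute the same values up to floating-point tolerance, so the choice between them is mostly a matter of readability.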

Code Implementation

CUDA

// Simple Inference is a PyTorch-based challenge.
// Solution uses PyTorch ops, not raw CUDA.
// Equivalent: output = model(input) where model is torch.nn.Linear
// See simple-inference.py for the PyTorch solution.

Triton

import torch

def solve(input_tensor, model):
    """Forward pass (inference) for a simple model."""
    return model(input_tensor)
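
The requirements say the result must be stored in the output tensor, whereas the function above returns it instead. A minimal sketch of an in-place variant follows. It assumes `solve` receives a preallocated `output` tensor as a third argument; the exact signature is not shown in this post, so treat the parameter list as hypothetical and match whatever the platform provides.

```python
import torch

def solve(input_tensor, model, output):
    """Run the forward pass and write the result into `output` in place.

    The three-argument signature is an assumption, not taken from the post.
    """
    with torch.no_grad():  # inference only, so skip autograd bookkeeping
        output.copy_(model(input_tensor))

# Usage with the values from Example 1 (runs on CPU for simplicity).
model = torch.nn.Linear(2, 2)
with torch.no_grad():
    model.weight.copy_(torch.tensor([[0.5, 1.0], [1.5, 0.5]]))
    model.bias.copy_(torch.tensor([0.1, 0.2]))
x = torch.tensor([[1.0, 2.0]])
out = torch.empty(1, 2)
solve(x, model, out)
print(out)
```

Using `copy_` keeps the caller's `output` buffer and fills it with the result, rather than rebinding a local name to a new tensor.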