解析循环冗余校验码——能检能纠的强大能力

agicy2025/10/14大约 7 分钟

解析循环冗余校验码——能检能纠的强大能力

循环冗余校验码（Cyclic Redundancy Check，CRC）是一种常用于信息传输中的编码技术。正如其名，它通常用于检测数据中的错误，体现了良好的检错能力。但实际上，它可谓是「能检能纠，性质十分丰富」。本文重点在于揭示其在纠错方面的功能，并通过实验给出良好的数据凭证。

项目源码已开源

本文将要介绍的方案实际为本人对「北京邮电大学 2023-2024 春季学期《计算机网络》课程实验——数据链路层滑动窗口协议的设计与实现」的最终优化方案，该实验项目源码已托管至 GitHub，仓库地址： https://github.com/agicy/buptLab-datalink。欢迎查阅、Star 或提出改进意见！

从检错到纠错：重新认识 CRC

CRC 的经典用途是检错——发送方计算校验码并附加在报文末尾，接收方重新计算并比对。若两者不一致，则丢弃报文并请求重传（ARQ）。

但 CRC 的能力不止于此。因为它的核心计算定义在 $\text{GF}(2)$ 上，是一种特殊的比特过滤器。

然而现实中有个重要的微妙之处：标准 CRC 实现（如 CRC-32、CRC-8/CCITT）通常带有非零初始值（init）和最终异或值（xorout）。这使得 CRC 不再是纯粹的线性函数，而是一个仿射函数（Affine Function）。

线性 vs 仿射

CRC 可以分解为两部分：

\text{crc}(M) = \text{core}(M) \oplus C_0

其中 $\text{core}(M) = (M \cdot x^k) \bmod P(x)$ 是纯线性部分（满足 $\text{core}(a \oplus b) = \text{core}(a) \oplus \text{core}(b)$ ），而 $C_0$ 是 init 和 xorout 共同引入的常数偏移。

因此：

二项不成立： $\text{crc}(x \oplus y) \neq \text{crc}(x) \oplus \text{crc}(y)$ ，因为常数项 $C_0$ 抵消不掉。
三项成立： $\text{crc}(x \oplus y \oplus z) = \text{crc}(x) \oplus \text{crc}(y) \oplus \text{crc}(z)$ ，三个 $C_0$ 异或后恰好等于一个 $C_0$ 。

证明如下：

\begin{aligned} \text{crc}(x) \oplus \text{crc}(y) \oplus \text{crc}(z) &= (\text{core}(x) \oplus C_0) \oplus (\text{core}(y) \oplus C_0) \oplus (\text{core}(z) \oplus C_0) \\ &= \text{core}(x) \oplus \text{core}(y) \oplus \text{core}(z) \oplus C_0 \oplus C_0 \oplus C_0 \\ &= \text{core}(x \oplus y \oplus z) \oplus C_0 \\ &= \text{crc}(x \oplus y \oplus z) \end{aligned}

一般地，参与运算的项数为奇数时性质成立，为偶数时则多出一个 $C_0$ 。

正是这一三项 XOR 性质，让 CRC 具备了单比特纠错的潜力。本文将详细推导其数学原理并给出实验验证。

单比特纠错的 O(1) 算法

原理

设发送方发送定长 $L$ 比特的报文 $M$ ，正确的 CRC 值 $C_{\text{correct}} = \text{crc}(M)$ 为接收方所知。

假设传输过程中仅发生至多一个比特错误，即接收到的报文为 $M' = M \oplus e_i$ ，其中 $e_i$ 是仅第 $i$ 位为 1 的错误向量。

接收方计算差错模式 $S$ ：

\begin{aligned} S &= \text{crc}(M') \oplus \text{crc}(M) \oplus \text{crc}(\mathbf{0}) \\ &= \text{crc}(M \oplus e_i) \oplus \text{crc}(M) \oplus \text{crc}(\mathbf{0}) \\ &= \text{crc}(M \oplus e_i \oplus M \oplus \mathbf{0}) \quad (\text{由三项 XOR 性质}) \\ &= \text{crc}(e_i) \end{aligned}

关键洞察：差错模式 $S = \text{crc}(e_i)$ 仅取决于错误位置 $i$ ，与原始报文内容 $M$ 无关。式中 $\mathbf{0}$ 表示与 $M$ 等长的全零报文。

哈希表预计算

既然 $\text{crc}(e_i)$ 只与 $i$ 有关，可以提前建好一张查找表：

对于 $i \in [0, L-1]$ ，构造 $e_i$ （仅第 $i$ 位为 1 的 $L$ 比特向量）。
计算 $h_i = \text{crc}(e_i)$ 。
存入哈希表： $\text{lut}[h_i] = i$ 。
计算常数 $C_{\text{zero}} = \text{crc}(\mathbf{0})$ 。

当接收方收到报文后，纠错只需两步：

计算 $S = \text{crc}(M') \oplus C_{\text{correct}} \oplus C_{\text{zero}}$ 。
查表得到 $i = \text{lut}[S]$ ，翻转 $M'$ 的第 $i$ 位即是正确报文。

整个过程只需一次 CRC 计算和一次哈希表查询，时间复杂度为 $O(1)$ 。

碰撞条件

不同错误位置的 $\text{crc}(e_i)$ 是否可能相同（发生碰撞）？这取决于 CRC 多项式的周期。对于 $n$ 位 CRC 使用本原多项式（Primitive Polynomial），其周期为 $2^n - 1$ 。只要报文长度 $L < 2^n - 1$ ，就能保证每个错误位置产生的校验值唯一。以 CRC-32 为例， $2^{32} - 1 \approx 4.3 \times 10^9$ 比特 ≈ 512 MiB，远超实际报文长度，碰撞在单比特纠错场景下几乎不可能发生。

代码验证

完整实现如下。代码使用 Python 标准库 zlib.crc32（CRC-32/ISO-HDLC，本原多项式周期 $2^{32}-1$ ，无需外部依赖），包含完整的类型注解和边界检查。

CRC single-bit error correction experiment
"""CRC single-bit error correction experiment.

Demonstrates O(1) single-bit error correction using CRC-32's
three-term XOR property: crc(x ^ y ^ z) = crc(x) ^ crc(y) ^ crc(z).

CRC-32/ISO-HDLC uses a primitive polynomial with period 2^32 - 1,
ensuring no hash collisions for messages up to ~512 MiB.
"""

import random
import zlib
from typing import Optional


def calc_crc(data: bytes) -> int:
    """Compute CRC-32/ISO-HDLC (zlib built-in, non-zero init and xorout)."""
    return zlib.crc32(data) & 0xFFFFFFFF


def precompute_lut(msg_len: int) -> tuple[dict[int, int], int]:
    """Build lookup table mapping crc(error_vector) -> bit position.

    For each bit position i in [0, msg_len-1], constructs an error
    vector e_i (all zeros except bit i = 1) and stores crc(e_i) as key.

    Args:
        msg_len: Fixed message length in bits.

    Returns:
        (lut, crc_zero) where lut maps CRC-32 syndrome to bit position,
        and crc_zero is the CRC-32 of an all-zero message.

    Raises:
        RuntimeError: If two distinct bit positions yield the same CRC
            (should never happen within the polynomial's period).
    """
    byte_len = msg_len // 8
    zero = bytes(byte_len)
    lut: dict[int, int] = {}

    for i in range(msg_len):
        err = bytearray(zero)
        byte_idx = i // 8
        bit_idx = 7 - (i % 8)  # MSB first within each byte
        err[byte_idx] |= 1 << bit_idx
        syndrome = calc_crc(bytes(err))
        if syndrome in lut:
            raise RuntimeError(
                f"Hash collision: bit {i} and bit {lut[syndrome]} "
                f"both map to CRC 0x{syndrome:08x}. "
                f"CRC polynomial period may be too short for this message length."
            )
        lut[syndrome] = i

    crc_zero = calc_crc(zero)
    return lut, crc_zero


def correct(
    received: bytes,
    expected_crc: int,
    lut: dict[int, int],
    crc_zero: int,
) -> Optional[tuple[bytes, int]]:
    """Attempt to correct a single-bit error in *received*.

    Uses the three-term XOR property to compute the error syndrome:
        S = crc(M') ^ crc(M) ^ crc(0) = crc(e_i)
    then looks up S in *lut* to find the error position.

    Args:
        received: Received message, possibly with a single-bit error.
        expected_crc: Known correct CRC-32 of the original message.
        lut: Precomputed table from crc(error_vector) to bit position.
        crc_zero: CRC-32 of all-zero message at the same length.

    Returns:
        ``None`` if no error was found or correction failed.
        ``(corrected_message, error_position)`` on successful correction.
    """
    received_crc = calc_crc(received)
    if received_crc == expected_crc:
        return None

    syndrome = received_crc ^ expected_crc ^ crc_zero
    pos = lut.get(syndrome)
    if pos is None:
        return None

    corrected = bytearray(received)
    byte_idx = pos // 8
    bit_idx = 7 - (pos % 8)
    corrected[byte_idx] ^= 1 << bit_idx

    return bytes(corrected), pos


def run_test(msg_bytes: int = 1500, trials: int = 10000) -> None:
    """Randomized test: inject a single-bit error and verify correction.

    Args:
        msg_bytes: Message length in bytes.
        trials: Number of random test rounds.
    """
    msg_bits = msg_bytes * 8
    print(f"Message length: {msg_bits} bits ({msg_bytes} bytes)")
    print(f"Trials: {trials}")

    lut, crc_zero = precompute_lut(msg_bits)
    print(f"Lookup table: {len(lut)} entries, CRC(0) = 0x{crc_zero:08x}")

    success = 0
    failed = 0

    for _ in range(trials):
        msg = random.randbytes(msg_bytes)
        expected_crc = calc_crc(msg)
        received = bytearray(msg)

        err_pos = random.randrange(msg_bits)
        byte_idx = err_pos // 8
        bit_idx = 7 - (err_pos % 8)
        received[byte_idx] ^= 1 << bit_idx

        result = correct(bytes(received), expected_crc, lut, crc_zero)
        if result is None:
            failed += 1
            print(f"FAIL: could not correct error at bit {err_pos}")
            continue

        corrected, detected_pos = result
        if corrected == msg and detected_pos == err_pos:
            success += 1
        else:
            failed += 1
            print(f"MISMATCH: actual bit {err_pos}, detected bit {detected_pos}")

    print(f"Success: {success}/{trials} ({100 * success / trials:.2f}%)")
    if failed:
        print(f"Failed: {failed}/{trials}")


if __name__ == "__main__":
    run_test()

实验结果

使用以太网 MTU（1500 字节 = 12000 比特）报文和 CRC-32 进行 10000 轮随机测试：

Message length: 12000 bits (1500 bytes)
Trials: 10000
Lookup table: 12000 entries, CRC(0) = 0x6f246cbf
Success: 10000/10000 (100.00%)

实验结果验证了理论的正确性：在实际的以太网帧规格下，基于 CRC 三项 XOR 性质的哈希表查找能够 100% 地定位并纠正单比特错误。

扩展到多比特纠错

单比特纠错看似局限，但这一技术可以自然地推广到多比特错误场景。核心思想是将报文分块，对每一块独立使用单比特纠错。

给定报文长度 $L$ ，将其等分为 $B$ 个块，每块长度为 $\ell = L / B$ 比特，对每块计算独立的 CRC 值：

C_j = \text{crc}(\text{block}_j), \quad j = 1, \dots, B

接收方对每块执行上述单比特纠错流程。只要各块内不同时出现两个或以上的错误，整个报文即可被完全纠正。尽管当错误恰好聚集在同一块内时该方法会失效，但通过调整块大小可以在纠错能力与冗余开销之间取得平衡。

实际上，CRC 搭配 ARQ 重传机制即可在大多数场景下兼顾效率与可靠性。而本文所述的纠错方法则提供了一种额外的可选路径——当重传延迟不可接受时，利用 CRC 本身的信息冗余直接纠正错误，避免重传带来的时延代价。

总结

CRC 远不止是一个"校验"工具。得益于 $\text{GF}(2)$ 上的仿射结构及其三项 XOR 性质，对于定长报文，可以通过预计算哈希表实现 $O(1)$ 时间复杂度的单比特纠错：

三项 XOR 性质： $\text{crc}(x \oplus y \oplus z) = \text{crc}(x) \oplus \text{crc}(y) \oplus \text{crc}(z)$ 是整个纠错方案的理论基石。标准 CRC 实现中 init 和 xorout 引入的常数偏移 $C_0$ ，使得二项不成立而三项恰好成立。
差错模式独立： $S = \text{crc}(M') \oplus \text{crc}(M) \oplus \text{crc}(\mathbf{0}) = \text{crc}(e_i)$ ，与报文内容无关。
哈希表预计算： $\text{lut}[\text{crc}(e_i)] = i$ 将错误位置映射为 CRC 差值。
O(1) 纠错：接收方只需一次 CRC 计算、两次 XOR 和一次查表即可定位并纠正单比特错误。

在数据链路层协议（如停止-等待协议、滑动窗口协议）中结合此方法，可以在维持 CRC 检错能力不变的前提下，赋予系统额外的无重传纠错能力，从而优化传输效率。

解析循环冗余校验码——能检能纠的强大能力

解析循环冗余校验码——能检能纠的强大能力

从检错到纠错：重新认识 CRC

CRC 的数学本质：GF(2) 上的仿射过滤器

核心性质：三项 XOR

单比特纠错的 O(1) 算法

原理

哈希表预计算

代码验证

实验结果

扩展到多比特纠错

总结