A pedagogical survey · 2023.08–2026.05

A billion Gaussians in a few megabytes

When Kerbl et al. released 3D Gaussian Splatting in August 2023, a scene on disk often took 1.4 GB. Three years later, the same kind of scene can be squeezed to 3 MB with almost no visible degradation, a reduction of nearly three orders of magnitude. This survey explains, from the ground up, how that is done, why it works, and what distinguishes the methods from one another.

Assumed background: basic NeRF, SDF, ML, calculus, linear algebra. If you've never touched 3DGS before, the primer chapters bring you up to speed.

~45 papers · 6 interactive demos · readable pseudo-code · last updated 2026-05-19

Reading path

New to 3DGS: read §1–§5 in order, then the taxonomy in §6, then pick whichever family attracts you.
Familiar with 3DGS: skim §6 for the taxonomy, then go straight to Part 6 / Part 7 (Anchor + Entropy) for the real SOTA.
Just want the latest: jump to Part 9 Frontier.

§1 What's a "3D Gaussian," anyway?

Forget radiance fields for a moment. A 3D Gaussian, in the splatting sense, is just an anisotropic blob of color floating in space. Mathematically:

$$ G(\mathbf{x}) = \exp\!\left(-\tfrac{1}{2}\,(\mathbf{x}-\boldsymbol{\mu})^\top\,\Sigma^{-1}\,(\mathbf{x}-\boldsymbol{\mu})\right) $$

where $\boldsymbol{\mu} \in \mathbb{R}^3$ is the center and $\Sigma \in \mathbb{R}^{3\times3}$ is a symmetric positive-definite covariance that says how the blob is stretched and rotated. The covariance is always parameterized as $\Sigma = R\,S\,S^\top R^\top$, with $R$ the rotation matrix of a unit quaternion $q$ (4 floats) and $S = \operatorname{diag}(s)$ a diagonal anisotropic scale (3 floats). You optimize $q$ and $s$ rather than $\Sigma$ itself, so the resulting $\Sigma$ is symmetric and positive-definite by construction (as long as every scale is nonzero), and training cannot drift to an invalid covariance.
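To make the parameterization concrete, here is a minimal NumPy sketch that builds $\Sigma$ from a quaternion and a scale vector and checks the result. The function names, the (w, x, y, z) quaternion order, and passing the scale as a log (the way §5 describes the .ply storing it) are assumptions made for illustration, not the reference implementation.

```python
import numpy as np

def quat_to_rotmat(q):
    """Rotation matrix from a quaternion (w, x, y, z); q is normalized first."""
    w, x, y, z = np.asarray(q, dtype=np.float64) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

def covariance(q, log_s):
    """Sigma = R S S^T R^T with S = diag(exp(log_s))."""
    R = quat_to_rotmat(q)
    S = np.diag(np.exp(log_s))
    M = R @ S
    return M @ M.T  # (R S)(R S)^T is symmetric by construction

log_s = np.array([-1.0, 0.5, -2.0])
sigma = covariance(q=[0.9, 0.1, -0.3, 0.2], log_s=log_s)
eigenvalues = np.linalg.eigvalsh(sigma)  # ascending order
```

Whatever the rotation, the eigenvalues of $\Sigma$ are the squared scales, so they are strictly positive for any finite log-scale.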

[Interactive demo: drag to orbit, slide to deform.] The wireframe is the $2\sigma$ iso-surface, where the Gaussian's value has fallen to $e^{-2}$ of its peak. The colored blob is what gets splatted on screen.

Mental model

Don't think of a Gaussian as a hard ellipsoid. Think of it as a fuzzy cotton ball: densest at the center, thinning smoothly toward the edges. We render thousands upon thousands of them stacked together.

§2 "Splatting": what is it?

To draw a 3D Gaussian on screen, you don't ray-march or root-find. Take a shortcut:

  1. Project the 3D mean $\boldsymbol\mu$ to a 2D screen point;
  2. Project the 3D covariance $\Sigma$ via the EWA-splatting linearization to a 2D $\Sigma'$;
  3. You get a 2D ellipse with exponential falloff. Draw it into the framebuffer, alpha-blended.

That's it. Step (3) is what people mean by "splat" — the Gaussian gets splattered onto the screen like wet paint. Vastly cheaper than NeRF's 64–256 MLP queries per ray, which is why 3DGS hits 100+ FPS on consumer GPUs.
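Step 2 is usually written $\Sigma' = J\,W\,\Sigma\,W^\top J^\top$, where $W$ is the world-to-camera rotation and $J$ is the Jacobian of the perspective projection evaluated at the camera-space mean. The sketch below shows steps 1 and 2 under stated assumptions: a pinhole camera, hypothetical intrinsics `fx, fy, cx, cy`, and made-up function names.

```python
import numpy as np

def project_gaussian(mu_world, sigma_world, W, t, fx, fy, cx, cy):
    """Project a 3D Gaussian to a 2D mean and 2x2 covariance (EWA linearization)."""
    mu_cam = W @ mu_world + t          # world -> camera coordinates
    x, y, z = mu_cam
    mean_2d = np.array([fx * x / z + cx, fy * y / z + cy])
    # Jacobian of (x, y, z) -> (fx * x / z, fy * y / z) at the camera-space mean
    J = np.array([
        [fx / z, 0.0,    -fx * x / z**2],
        [0.0,    fy / z, -fy * y / z**2],
    ])
    sigma_2d = J @ W @ sigma_world @ W.T @ J.T
    return mean_2d, sigma_2d

mean_2d, sigma_2d = project_gaussian(
    mu_world=np.array([0.0, 0.0, 2.0]),
    sigma_world=np.diag([0.04, 0.01, 0.09]),
    W=np.eye(3), t=np.zeros(3),
    fx=100.0, fy=100.0, cx=50.0, cy=50.0,
)
```

For a Gaussian on the optical axis the depth variance drops out entirely, which is the sense in which the projection is a linearization: only the first-order effect of the perspective map is kept.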

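A pixel's final color comes from depth-sorting the Gaussians that touch it front to back (nearest first), then applying classical Porter-Duff "over" compositing, where the product term is the transmittance left over after the splats in front: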

$$ C_{\text{pixel}} = \sum_i c_i\,\alpha_i\,\prod_{j \lt i}(1-\alpha_j) $$

This is the alpha-compositing formula from 1980s graphics. $\alpha_i$ combines the Gaussian's stored opacity with the value of its 2D footprint at this pixel.
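The sum can be evaluated in one pass per pixel by carrying a running transmittance. The sketch below is a minimal pure-Python illustration with the splats ordered nearest-first, which is the order the product over $j < i$ implies. The function name and the early-exit cutoff are assumptions.

```python
def composite_front_to_back(colors, alphas, min_transmittance=1e-4):
    """Blend splats sorted nearest-first; returns (rgb, leftover transmittance)."""
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0                     # prod_{j<i} (1 - alpha_j), starts at 1
    for color, alpha in zip(colors, alphas):
        weight = alpha * transmittance      # alpha_i * prod_{j<i} (1 - alpha_j)
        for ch in range(3):
            pixel[ch] += weight * color[ch]
        transmittance *= 1.0 - alpha
        if transmittance < min_transmittance:
            break                           # everything behind is effectively hidden
    return pixel, transmittance

# A half-transparent red splat in front of a half-transparent green one.
rgb, leftover = composite_front_to_back(
    colors=[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    alphas=[0.5, 0.5],
)
```

The red splat contributes with weight 0.5, the green one only 0.25 because half the light is already blocked, and a quarter of the background would still show through.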

§3 Spherical harmonics in two minutes

3DGS scenes look photographic — specular highlights move as you orbit — because each Gaussian doesn't store one color but a tiny function of viewing direction: "from above, I'm bright white; from the side, dim gray." That function lives on the unit sphere $S^2$.

How do you compress a function on a sphere? Same way as on a circle (Fourier series) or a line (Taylor series): pick a basis and expand. The natural basis on the sphere is spherical harmonics $Y_\ell^m$ — eigenfunctions of the spherical Laplacian. Up to degree $L$, there are $(L+1)^2$ basis functions:

| Degree | Basis functions added at this degree | Per-channel coeffs (cumulative) | RGB coeffs | What it captures |
|---|---|---|---|---|
| 0 | 1 | 1 | 3 | constant color (average) |
| 1 | 3 | 4 | 12 | linear gradient |
| 2 | 5 | 9 | 27 | broad lobes: soft specular |
| 3 | 7 | 16 | 48 | sharp lobes: crisp highlights |

Vanilla 3DGS uses degree 3, so 48 coefficients per Gaussian for color alone. That's the elephant in the storage budget.
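The coefficient counts follow directly from $(L+1)^2$, and a degree-1 evaluation is short enough to write out. The sketch below is illustrative only: the function names are made up, and the sign convention for the three degree-1 terms is an assumption (real SH conventions differ between codebases).

```python
SH_C0 = 0.28209479177387814   # Y_0^0 = 1 / (2 * sqrt(pi))
SH_C1 = 0.4886025119029199    # sqrt(3 / (4 * pi))

def num_sh_coeffs(degree, channels=3):
    """Number of stored coefficients when expanding up to `degree`."""
    return channels * (degree + 1) ** 2

def eval_sh_deg1(sh, direction):
    """sh: four RGB triples [dc, c1, c2, c3]; direction: unit vector (x, y, z)."""
    x, y, z = direction
    basis = [SH_C0, -SH_C1 * y, SH_C1 * z, -SH_C1 * x]   # assumed sign convention
    return [sum(b * coeff[ch] for b, coeff in zip(basis, sh)) for ch in range(3)]

per_channel = [num_sh_coeffs(d, channels=1) for d in range(4)]
rgb_total = num_sh_coeffs(3)

# A Gaussian whose color depends on the z component of the view direction.
sh = [(1.0, 0.5, 0.0), (0.0, 0.0, 0.0), (0.2, 0.2, 0.2), (0.0, 0.0, 0.0)]
from_above = eval_sh_deg1(sh, (0.0, 0.0, 1.0))
from_below = eval_sh_deg1(sh, (0.0, 0.0, -1.0))
```

With only the DC term the color is the same from every direction; the degree-1 terms are what make `from_above` and `from_below` differ.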

[Interactive demo: radiance lobe at different SH degrees.] Slide the degree from 3 to 0 and the sphere goes from a view-dependent pattern to a flat color: the price of "SH-degree truncation." Many methods drop matte Gaussians to degree 0 and reserve degree 3 for the shiny ones.

§4 The training loop

Compression methods plug into different stages of training. Here's the whole loop in about 20 lines of pseudo-code:

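# Pseudo-code: helpers such as rasterize() and init_from_sfm_points() are not defined here.
# lam and densify_until are assumed defaults, shown only to make the loop self-consistent.
def train_3dgs(cameras, photos, iters=30000, lam=0.2, densify_until=15000):
    G = init_from_sfm_points()            # start from the sparse SfM point cloud
    optimizer = Adam(G.parameters())

    for step in range(iters):
        cam, photo = sample(cameras, photos)
        rendered = rasterize(G, cam)      # differentiable splatting pipeline
        # photometric loss: L1 blended with D-SSIM
        loss = (1 - lam) * L1(rendered, photo) + lam * D_SSIM(rendered, photo)
        optimizer.zero_grad()             # clear gradients from the previous step
        loss.backward()
        optimizer.step()

        # --- adaptive density control (where the magic is) ---
        if step % 100 == 0 and step < densify_until:
            clone_high_gradient_gaussians(G)
            split_oversized_gaussians(G)
            prune_low_opacity_gaussians(G)

    return G   # ~3M Gaussians, ~750 MB on disk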

Three things to internalize:

  • It's gradient descent the whole way. Every per-Gaussian attribute (position, scale, rotation, opacity, SH coefficients) is an Adam-optimized parameter.
  • Adaptive density control grows the population during training: high-gradient Gaussians get cloned or split, low-opacity ones get culled. The final count of ~3M Gaussians is a learned outcome.
  • The rasterizer is differentiable with respect to the 3D parameters. Gradients flow from photo pixels back to 3D positions and shapes, which is the key engineering trick.

Compression methods hook in at three places:

  1. Train-time: change density-control rules (Mini-Splatting, Taming-3DGS) or add a rate-aware loss (HAC, ContextGS).
  2. Post-hoc: after training, prune / quantize / re-encode the .ply (LightGaussian's stage 3, MesonGS, FlexGaussian).
  3. Feed-forward at inference: one pretrained network compresses any scene, no retraining (FCGS).

§5 Anatomy of a 3DGS .ply

Inria's original 2023 code saved scenes as .ply: yes, that 1990s point-cloud format, repurposed. Open a real 3DGS .ply and the header looks like this:

ply
format binary_little_endian 1.0
element vertex 3019823          # ~3M Gaussians
property float x
property float y
property float z                # position (3 floats)
property float nx
property float ny
property float nz               # unused normal — always zero
property float f_dc_0
property float f_dc_1
property float f_dc_2           # SH degree 0 ("DC" mean color)
property float f_rest_0
property float f_rest_1
...
property float f_rest_44        # SH degree 1-3 (45 floats = 15 bands × RGB)
property float opacity          # pre-sigmoid logit, 1 float
property float scale_0
property float scale_1
property float scale_2          # log-scale (3 floats)
property float rot_0
property float rot_1
property float rot_2
property float rot_3            # quaternion (4 floats)
end_header
<raw binary, 62 floats × 4 bytes × 3M Gaussians ≈ 750 MB>

Even the "wasted" stuff is real: the three normal floats are always zero (a legacy .ply convention) and eat ~36 MB of nothing. Format choices to note:

  • Opacity is a logit, not a [0,1] probability. Apply a sigmoid to get $\alpha$. This keeps optimization smooth, and logits can be clipped freely.
  • Scales are stored as logs. The actual $\sigma$ is exp(scale_i). Same reason: optimization friendliness, plus guaranteed positivity.
  • SH splits into DC + rest. DC is degree 0 (average color over all directions); the rest are the 45 higher-order coefficients.
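The first two conventions mean a reader has to undo the storage transforms before rendering. A minimal sketch, with a hypothetical function name; renormalizing the quaternion on load is an assumption rather than something the format description above requires:

```python
import math

def decode_gaussian(raw_opacity, raw_scale, raw_rot):
    """Turn stored .ply values into render-ready alpha, per-axis sigma, unit quaternion."""
    alpha = 1.0 / (1.0 + math.exp(-raw_opacity))       # logit -> (0, 1)
    sigma = [math.exp(s) for s in raw_scale]           # log-scale -> positive scale
    norm = math.sqrt(sum(c * c for c in raw_rot))
    quat = [c / norm for c in raw_rot]                 # assumed: renormalize on load
    return alpha, sigma, quat

alpha, sigma, quat = decode_gaussian(
    raw_opacity=0.0,
    raw_scale=[0.0, math.log(2.0), -1.0],
    raw_rot=[2.0, 0.0, 0.0, 0.0],
)
```

A stored opacity of 0.0 therefore means a half-transparent Gaussian, not an invisible one, and a stored scale of 0.0 means unit size.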

Where the bytes actually live

[Figure: storage breakdown of a 750 MB vanilla 3DGS scene. Chart labels: position xyz ≈45 MB; scale + rotation + opacity ≈45 MB; SH degree 0 (DC color) ≈60 MB; SH degree 1-3 ≈565 MB (75%).] Per Gaussian, not counting the unused normals: 3 pos + 3 scale + 4 rot + 1 opacity + 3 SH-DC + 45 SH-rest = 59 floats (236 B); × 3.2M Gaussians ≈ 750 MB raw binary. Takeaway: about three quarters of the file is high-order SH, so attacking SH alone wins most of the prize.
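The same conclusion can be checked from the header alone by counting floats per field. This is a back-of-the-envelope sketch: the field grouping and names are mine, the Gaussian count is the one in the example header, and the exact shares it yields need not match the chart's rounded labels.

```python
# Floats per Gaussian, read off the example .ply header.
FIELDS = {
    "position": 3,
    "normal (unused)": 3,
    "SH DC": 3,
    "SH rest": 45,
    "opacity": 1,
    "scale": 3,
    "rotation": 4,
}

def storage_breakdown(num_gaussians, bytes_per_float=4):
    """Return total payload bytes and each field's share of the per-Gaussian record."""
    total_floats = sum(FIELDS.values())
    total_bytes = total_floats * bytes_per_float * num_gaussians
    shares = {name: n / total_floats for name, n in FIELDS.items()}
    return total_bytes, shares

total_bytes, shares = storage_breakdown(3_019_823)
normals_bytes = FIELDS["normal (unused)"] * 4 * 3_019_823
```

With 62 floats per Gaussian, the higher-order SH coefficients account for 45/62 of every record, and the always-zero normals alone cost about 36 MB.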

Why so much redundancy?

The format isn't dumb. Hiding in those 750 MB are several flavors of redundancy, and each compression family is good at exposing a different one:

| Redundancy type | What it means | Who exploits it |
|---|---|---|
| Bit-level | Each float is 32 bits, but really only needs 8-12. | Quantization (Compact3D, EAGLES, MesonGS, NSVQ) |
| Vector-level | Many Gaussians have very similar attribute tuples; snap them to a codebook. | Vector quantization (Compact3D, C3DGS, MesonGS) |
| Spatial | Spatial neighbors share color / scale / orientation. | Anchors + MLPs (Scaffold-GS), image-codec sort (SOG), hash-grid context (HAC) |
| Functional | Matte surfaces have constant color-vs-view; degree-3 SH is overkill. | SH-degree adaptation (Reduced-3DGS), SH distillation (LightGaussian) |
| Population | Many Gaussians contribute nothing: grown by density control, never visited again. | Pruning (RadSplat, PUP, Mini-Splatting, Trimming) |
| Statistical | SH coefficients $\approx$ Laplace; scales $\approx$ log-normal. The distribution shapes are predictable. | Entropy coding (HAC, ContextGS, EntropyGS, CodecGS) |

Almost every modern method exploits two or three of these at once. The art of a compression paper is choosing which combination, and in what order.

[Interactive demo: compose your own compression recipe. A toy scene of 800 Gaussians at 32 bits, with readouts for surviving Gaussians, baseline size, current size, compression ratio, and estimated PSNR.]
Note: the PSNR shown is a toy heuristic, not a real PSNR. The qualitative behavior is right, though: pruning hurts geometry, quantization hurts smoothness, SH truncation hurts highlights, and the losses roughly compose. Real methods use a variety of tricks to dodge each failure mode.

§6 The six knobs: overall taxonomy

Both the 3DGS.zip survey and the IEEE 2025 survey draw a subtle but useful distinction:

Compaction: reduce the number of Gaussians (or substitute a stronger primitive). The bit length per primitive stays roughly fixed; there are just fewer of them. Examples: pruning, GES, Mini-Splatting, Reduced-3DGS.

Compression: keep roughly the same number of Gaussians, but spend fewer bits on each. The renderer may decode back to the original count. Examples: quantization, entropy coding, SOG, anchor-based regeneration.

You can do both at once, and most SOTA pipelines do. The six knobs below split the orthogonal levers so you can reason about combinations.

[Figure: taxonomy of 3DGS compression, six families.]

  • Pruning (delete unimportant Gaussians): LightGaussian, Mini-Splatting, RadSplat, PUP, Trimming, Taming, GaussianSpa
  • Quantization (fewer bits per attribute): Compact3D (Navaneet), EAGLES, MesonGS, C3DGS (Niedermayr), NSVQ, FlexGaussian
  • SH attack (replace 48-coeff color with MLP / grid): Compact-3DGS (Lee), SG-Splatting, F-3DGS, GES, EntropyGS, Reduced-3DGS
  • Anchors (store sparse, decode dense at runtime): Scaffold-GS, Octree-GS, GaussianForest, IGS, CompGS (Liu), Smol-GS
  • Entropy coding (arithmetic-code with a learned prior): HAC, HAC++, ContextGS, FCGS, PCGS, CodecGS, LocoGS
  • Standards (PNG/HEVC/WebP/.spz, reuse what exists): SOG (PLAS sort), PlayCanvas SOGS, .spz (Niantic), .ksplat, glTF KHR

Most modern SOTA methods (HAC++, ContextGS, CodecGS) combine 3+ families at once.

The six knobs at a glance

① Pruning: "delete what doesn't matter." The most direct lever. After training you can usually drop 80-90% of the Gaussians with no perceptible loss, because adaptive density control overshoots. The interesting question is the scoring function: opacity, ray hits, Hessian log-determinant, $\max \alpha \cdot \tau$, and so on. Pure pruning typically buys $5\text{-}10\times$.

② Quantization: "fewer bits per number." Float32 is overkill for nearly every attribute. Three flavors: scalar, vector (VQ), and learned latent. Typically adds another $5\text{-}10\times$ on top of pruning.

③ SH attack: "the 75% sub-problem." SH takes up 3/4 of the file. Three sub-strategies: per-Gaussian adaptive degree, distillation to a lower degree, and full replacement with a hash grid, spherical Gaussians (SG), or a factorized representation. Each one is a high-leverage move.

④ Anchors: "decode dense from sparse." Scaffold-GS's foundational move: sparse anchors plus a tiny MLP generate the nearby Gaussians at render time. This is the most architectural reformulation in the literature, and most SOTA papers build on the Scaffold-GS backbone.

⑤ Entropy coding: "spend bits proportional to surprise." Shannon's old idea, combined with learned priors (hash-grid hyperpriors, autoregressive context, closed-form Laplace for SH, and others). Combined with anchors, this is the current SOTA and what gets you below 10 MB without quality loss.

⑥ Industry formats: "reuse what works." SOG/SOGS (image codecs), SPZ (quantize + gzip), glTF KHR_gaussian_splatting (Khronos 2026 standard). The bridge between research methods and real deployment.

Which combinations actually win

| Recipe | Examples | Typical Mip-NeRF 360 size |
|---|---|---|
| Prune + scalar quant | LightGaussian, Trimming, FlexGaussian | 30–70 MB |
| Prune + VQ + SH distillation | LightGaussian (full), Compact3D | 18–45 MB |
| SH→hash grid + residual VQ | Compact-3DGS (Lee), EAGLES | 20–50 MB |
| Sort into 2D grid + image codec | SOG, PlayCanvas SOGS | 16–40 MB |
| Anchors + entropy (hash-grid context) | HAC, HAC++, CompGS (Liu) | 6–16 MB |
| Anchors + AR entropy | ContextGS, PCGS | 7–13 MB |
| Feature planes + HEVC | CodecGS | ~10 MB |
| Aggressive compress + diffusion restore | ExGS / Zip-GS, NiFi | 1–4 MB (+ diffusion model) |
Rules of thumb
  • Don't quantize position aggressively. Offsets as small as 1 mm change the render visibly. All methods keep position at $\geq 16$ bits, or store it differently (anchor offsets, octree codes).
  • VQ + entropy coding almost always beats VQ alone. Index distributions are never uniform; put an arithmetic coder on top and shave off another $1.5\text{-}2\times$.
  • Anchors and image-codec sorting don't stack well, because anchors no longer form a regular grid. Pick one structural reformulation.
  • Diffusion restoration is a different regime. You trade "decoder is a giant generative model" for "splat file is tiny." Whether that wins depends entirely on deployment, for example whether the model can run on a phone at all.
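The second rule is just Shannon entropy at work: a fixed-length index costs $\lceil \log_2 K \rceil$ bits no matter how skewed the usage is, while an ideal entropy coder pays only the empirical entropy. A toy sketch with made-up index data and hypothetical function names:

```python
import math
from collections import Counter

def empirical_entropy_bits(indices):
    """Average bits per index an ideal entropy coder would need."""
    counts = Counter(indices)
    n = len(indices)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def fixed_length_bits(codebook_size):
    """Bits per index when every codebook entry gets the same code length."""
    return math.ceil(math.log2(codebook_size))

# Skewed usage of a 4-entry codebook: probabilities 1/2, 1/4, 1/8, 1/8.
indices = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2
entropy = empirical_entropy_bits(indices)
fixed = fixed_length_bits(4)
saving = fixed / entropy
```

Even this mild skew saves bits; real codebooks with thousands of entries and heavy-tailed usage are where the larger factors quoted above come from.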

§7 The size–PSNR Pareto in one chart

Before diving into individual families, here is the lay of the land: every method's position on Mip-NeRF 360. The x-axis is file size (log scale, further left is better) and the y-axis is PSNR (higher is better), so the best methods sit toward the upper left. [Interactive chart: hover over a point to see the method name.]

A few observations: vanilla 3DGS sits at ~734 MB / 27.4 dB; HAC++, ContextGS, and CodecGS hover at 3–10 MB while matching or exceeding the baseline, so $70\text{--}100\times$ is already achievable today; SOG/SOGS is a notch larger but is what production runtimes actually ship; FCGS sits at the right edge of the entropy cluster, paying a size premium to skip per-scene training.

§ Part 3 Pruning: which Gaussians can we just delete?

Vanilla 3DGS grows the Gaussian population during training: anywhere the loss has a high gradient, the densifier clones or splits. This is intentionally aggressive (better too many than too few), in the hope that the opacity threshold will kill the freeloaders. It doesn't, really. The final 3M Gaussians include a long tail of dim, tiny, or occluded ones that contribute nothing.

The compression question reduces to designing a scoring function.

[Interactive demo: what different scoring functions throw away.] Random pruning destroys the silhouette almost immediately; opacity-only pruning gives up high-frequency detail first. The art is approximating the loss Hessian without computing it.
LightGaussian: Unbounded Compression for 3DGS
NeurIPS 2024 Spotlight · Fan et al. · UT Austin / Nvidia · arXiv:2311.17245

The raw .ply is too big; we want to both prune and shrink the SH on what's left.

Key idea: A three-stage pipeline: (1) global-significance pruning, which scores each Gaussian by opacity $\times$ hit count $\times$ volume and cuts the low scorers; (2) SH distillation, in which a full-SH teacher supervises a degree 3→2 student on pseudo-views; (3) VecTree quantization: a Morton-order octree on positions, a K=8192 codebook on the lowest-significance SH, and float16 on geometry.

The volume normalization (by the 90th-percentile Gaussian volume) is the load-bearing innovation: without it, the score is dominated by large background blobs that cover many rays but encode no detail.

Size: 727 → 42 MB (Mip-NeRF 360, $\sim\!17\times$)
PSNR: 29.13 → 28.45 ($-0.68$ dB)
FPS: 139 → 215
The canonical "prune → distill → quantize" template that almost every later paper compares against.
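The stage-1 score described above can be sketched in a few lines of NumPy. This is a hedged illustration of "opacity × hit count × normalized volume", not the paper's exact formula: the function names, the exponent `beta`, and using the product of the three scales as a volume proxy are all assumptions.

```python
import numpy as np

def global_significance(opacity, hit_count, scales, beta=0.1):
    """Score = opacity * ray hits * (volume normalized by its 90th percentile) ** beta."""
    volume = np.prod(scales, axis=1)                 # proportional to ellipsoid volume
    v90 = np.percentile(volume, 90)
    v_norm = np.clip(volume / v90, 0.0, 1.0)         # huge blobs saturate at 1
    return opacity * hit_count * v_norm ** beta

def keep_mask(score, prune_ratio):
    """Keep Gaussians whose score is above the prune_ratio quantile."""
    return score > np.quantile(score, prune_ratio)

opacity = np.array([0.9, 0.9, 0.1])
hit_count = np.array([100.0, 100.0, 100.0])
scales = np.array([[1.0, 1.0, 1.0], [0.1, 0.1, 0.1], [1.0, 1.0, 1.0]])
score = global_significance(opacity, hit_count, scales)
mask = keep_mask(score, prune_ratio=0.4)
```

Clipping at the 90th percentile is what stops very large blobs from winning on volume alone, and a small exponent keeps tiny but opaque Gaussians from being wiped out.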
Mini-Splatting: Representing Scenes with a Constrained Number of Gaussians
ECCV 2024 · Fang & Wang · arXiv:2403.14166

Many Gaussians are oversized blurry blobs that smear high-frequency regions, and density control has given up on them.

Key idea: Don't just delete; redistribute. Three stages:

  • Blur split: any Gaussian whose screen footprint exceeds a threshold is force-split. These are the smeared blobs in high-frequency regions.
  • Depth re-init: ray-ellipsoid intersection gives dense depth; reseed Gaussians from the depth points to fill geometry holes.
  • Probabilistic simplification: keep each Gaussian with probability $\propto$ its blending weight. This preserves coverage statistics better than hard thresholding.

3.35M → 0.49M Gaussians ($\sim\!7\times$)
PSNR: 27.47 → 27.34
The Mini-Splatting-D variant beats Zip-NeRF on Mip-NeRF 360 SSIM/LPIPS: fewer Gaussians, better renders.
RadSplat: Radiance Field-Informed Gaussian Splatting
3DV 2025 Oral · Niemeyer et al. · Google · arXiv:2403.13806

Training against noisy photos makes Gaussians waste capacity modeling the noise itself.

Key idea: Train a Zip-NeRF first as both teacher and initializer; then fit Gaussians to the teacher's clean renders; finally prune by max-contribution.

Max, not sum, is essential: a Gaussian seen only from one side shouldn't be penalized, since that side might be the whole reason it exists. The score is $h(p_i) = \max_r \alpha_i^r \tau_i^r$ over all training rays $r$, where $\tau_i^r$ is the transmittance accumulated in front of Gaussian $i$ along that ray.

Mip-NeRF 360 PSNR: 28.14 (vs 3DGS 27.20)
3.16M → 0.37M Gaussians (lightweight variant)
907 FPS
The only method covered here whose reported PSNR goes up while pruning aggressively. "NeRF as teacher" became a standard pattern.
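The max-contribution score is easy to state in code once each training ray is reduced to the ordered list of Gaussians it hits. A minimal sketch, assuming a hypothetical input format (per-ray index and alpha lists, nearest splat first) and a made-up pruning threshold:

```python
import numpy as np

def max_contribution(num_gaussians, rays):
    """h_i = max over rays of alpha_i * tau_i; rays yield (ids, alphas), nearest first."""
    score = np.zeros(num_gaussians)
    for ids, alphas in rays:
        alphas = np.asarray(alphas, dtype=np.float64)
        # transmittance reaching each splat: product of (1 - alpha) of the splats in front
        tau = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
        np.maximum.at(score, np.asarray(ids), alphas * tau)
    return score

rays = [
    ([0, 1], [0.5, 0.8]),   # Gaussian 1 sits behind Gaussian 0 on this ray
    ([1], [0.3]),           # and is seen directly, but faintly, on this one
]
score = max_contribution(num_gaussians=3, rays=rays)
keep = score > 0.01          # hypothetical threshold
```

Gaussian 1 keeps the 0.4 it earns on the first ray even though the second ray only gives it 0.3, and Gaussian 2, which no ray touches, scores zero and is dropped.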
Trimming the Fat: Efficient Compression of 3D Gaussian Splats
BMVC 2024 · Ali et al. · arXiv:2406.18214

The goal is to reuse signals 3DGS already computes, with no extra forward passes.

Key idea: Dual signal: keep a Gaussian only if $|\alpha|$ and $|\nabla\alpha|$ both exceed the $\gamma$-quantile. Low $\alpha$ means invisible; low gradient means the loss no longer cares. Both low $\Rightarrow$ truly useless. Iterative pruning plus a brief fine-tune.

734 → 119 MB at $\gamma=0.5$ ($\sim\!6\times$)
Aggressive mode: $\sim\!50\times$
600 FPS
Nearly free pruning, since both signals already exist. The cheapest default baseline in the literature.
PUP 3D-GS: Principled Uncertainty Pruning
CVPR 2025 · Hanson et al. · arXiv:2406.10219

Opacity heuristics don't actually tell you how much a Gaussian matters to the loss landscape.

Key idea: Compute the log-determinant of the Hessian of the L2 loss with respect to each Gaussian's spatial parameters, using the Gauss-Newton form built from image gradients. A high score means the loss is sharp around this Gaussian, so it really matters. $U_i = \log\det(\nabla_{x,s}I_G \cdot \nabla_{x,s}I_G^\top)$. At a converged model this equals the Fisher information restricted to the spatial parameters. One-shot 90% prune plus a brief fine-tune.

PSNR: 26.67 @ 90% prune (vs LightGaussian 26.28)
746 → 74.65 → 14.44 MB (+VecTree, $\sim\!51\times$)
204 FPS
The most theoretically grounded score in the family, straight from second-order optimization. Trade-off: the per-Gaussian Hessian is more expensive to compute than opacity heuristics.
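Given a per-Gaussian Jacobian of rendered pixels with respect to its spatial parameters, the score is one `slogdet` call. The sketch below assumes that Jacobian is already available (for example from autodiff) as an array of shape (num_pixels, 6), with the six columns standing for position and scale; the function name and that layout are illustrative assumptions.

```python
import numpy as np

def pup_score(jacobian):
    """log det(J^T J) for one Gaussian; jacobian has shape (num_pixels, 6)."""
    H = jacobian.T @ jacobian                 # 6x6 Gauss-Newton approximation
    sign, logdet = np.linalg.slogdet(H)       # stable for very small determinants
    return float(logdet) if sign > 0 else float("-inf")

sharp = pup_score(2.0 * np.eye(6))            # pixels react strongly to this Gaussian
flat = pup_score(0.1 * np.eye(6))             # pixels barely react
unseen = pup_score(np.zeros((10, 6)))         # no pixel depends on it at all
```

Using `slogdet` rather than `log(det(...))` avoids underflow, since the determinant of a 6×6 matrix of tiny gradients can fall below floating-point range long before its logarithm does.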
Taming 3DGS: High-Quality Radiance Fields with Limited Resources
SIGGRAPH Asia 2024 · Mallick et al. · arXiv:2406.15643

If density control always overshoots, can we prevent the overshoot instead of pruning afterwards?

Key idea: Budget-aware density control during training. A composite score combines positional gradient$\times 50$ + opacity$\times 100$ + blending weights$\times 50$ + distance-to-center$\times 50$ + view saliency$\times 10$ + scale$\times 25$ + pixel coverage$\times 0.1$ + depth$\times 5$. In each round only the top-scoring Gaussians clone or split, until the user-specified budget is hit. The population never explodes.

0.63M Gaussians vs 3.31M baseline ($\sim\!5\times$)
PSNR: 27.31 vs 27.46
Training time: 11 min vs 43 min
Preventive medicine instead of post-hoc surgery. The pedagogical point: compression can begin at training time, not just at the end.
GaussianSpa: Sparse 3D Gaussian Splatting via Optimization-based Pruning
arXiv 2024 · Zhang et al. · arXiv:2411.06019

Key idea: Frame Gaussian-count reduction as a sparse optimization problem with an explicit $\ell_0$-like penalty on Gaussian "existence," solved by ADMM-style alternating optimization. $10\text{--}15\times$ compression with almost no quality loss.

Mip-NeRF 360: ~17 MB / 27.4 dB
Brings classical sparse-optimization machinery (LASSO, ADMM) to 3DGS. "Compression as $\ell_0$ regularization" is a unifying view.
Practical recipe
# after vanilla 3DGS training (pseudo-code; helper functions are not defined here):
score = compute_score(G, training_rays)      # max α·τ (RadSplat) or PUP Hessian score
keep_mask = score >= percentile(score, 85)   # keep the top 15%, i.e. prune 85%
G = G[keep_mask]
finetune(G, photos, iters=5000)              # let the survivors compensate

The 5k-step fine-tune is essential: even a perfect Hessian-based prune leaves a few percent of residual error on the table, which the survivors absorb in under 10 minutes.

The limits of pruning alone

Returns diminish past $\sim\!10\times$. Each surviving Gaussian still costs 236 bytes, so removing $10\times$ of them saves exactly $10\times$ and no more. To go further you must attack the per-Gaussian bit budget: quantization, SH compression, or anchors. Continue below.

§ Part 4 Quantization: fewer bits per number

Quantization maps a continuous (or high-bit) value to a smaller discrete alphabet. For 3DGS, four flavors dominate:

| Type | How | Used for |
|---|---|---|
| Scalar (SQ) | Round each float independently to N bits | opacity, log-scale, quaternion components |
| Vector (VQ) | K-means clustering, replace with index | SH coefficients, full covariance |
| Residual VQ (RVQ) | Cascade of codebooks; each stage codes the residual | geometry in Compact-3DGS (Lee) |
| Latent-space | Learn an encoder/decoder, quantize the latent | EAGLES |
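The scalar row is the simplest to make concrete. Below is a minimal sketch of uniform N-bit scalar quantization with NumPy, assuming a plain min/max range per attribute; the function names are made up, and real methods typically pick ranges, bit depths, and rounding more carefully.

```python
import numpy as np

def quantize_uniform(values, bits):
    """Map floats to integer codes on a uniform grid between min and max (bits <= 16)."""
    lo, hi = float(values.min()), float(values.max())
    levels = (1 << bits) - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    codes = np.round((values - lo) / step).astype(np.uint16)
    return codes, lo, step

def dequantize_uniform(codes, lo, step):
    """Reconstruct approximate floats from integer codes."""
    return lo + codes.astype(np.float64) * step

values = np.linspace(-3.0, 3.0, 1001)          # stand-in for one attribute column
codes, lo, step = quantize_uniform(values, bits=8)
recon = dequantize_uniform(codes, lo, step)
max_error = float(np.max(np.abs(recon - values)))
```

The reconstruction error is bounded by half a step, so each extra bit halves the worst-case error; that bound is what lets 8-12 bits stand in for a 32-bit float on most attributes.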
Compact3D (Navaneet): Compressing Gaussian Splat Radiance Fields
ECCV 2024 · Navaneet et al. · UCDvision · arXiv:2311.18159

⚠ Naming mess: this is not the Lee Compact-3DGS, nor the Liu CompGS.

Key idea: K-means vector quantization on the SH coefficients and the covariance, run jointly with training (K-means is re-run during the last 10K iterations). Two codebooks: 4096 codes for color (SH) and 16384 codes for covariance (scale + rotation). Position and opacity are not quantized because they are too sensitive. Indices are sorted in Morton order and then run-length encoded (RLE).

Adjacent Gaussians tend to fall into the same cluster, so RLE gets long runs for free.

Mip-NeRF 360: 778 → 19 MB ($\sim\!41\times$)
PSNR: 27.42 → 27.12
$2.5\times$ faster rendering
Compact-3DGS (Lee): Compact 3D Gaussian Representation for Radiance Field
CVPR 2024 · Lee et al. · arXiv:2311.13681

SH dominates 75% of the file. Can we replace it entirely with a shared neural field?

Key idea: Drop SH entirely. View-dependent color comes from a shared Instant-NGP-style hash grid: the feature queried at the Gaussian's position, together with the view direction, goes through a tiny MLP to produce RGB. Scale and rotation get Residual VQ with cascaded codebooks; opacity gets 8-bit scalar quantization; hash-grid parameters get 8-bit quantization plus Huffman coding.

RVQ trick: a 256-code book captures coarse geometry, a second 256-code book captures the residual, and so on. Four stages cost 32 bits, yet are as expressive as a single codebook with $256^4 \approx 4$ billion entries.

$\gt 25\times$ compression on Mip-NeRF 360
~28–48 MB depending on settings
The most structurally aggressive quantization paper. The "shared hash grid for appearance" thread changed how everyone thinks about SH.
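The RVQ trick can be sketched with two tiny hand-written codebooks. This is a toy illustration of residual vector quantization in general, not the paper's training procedure; the function names and codebook values are made up.

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Greedy residual VQ: each stage codes what the previous stages left over."""
    residual = np.asarray(x, dtype=np.float64).copy()
    indices = []
    for book in codebooks:
        idx = int(np.argmin(np.sum((book - residual) ** 2, axis=1)))
        indices.append(idx)
        residual -= book[idx]
    return indices

def rvq_decode(indices, codebooks):
    """Sum the selected entry from every stage."""
    return sum(book[i] for book, i in zip(codebooks, indices))

codebooks = [
    np.array([[0.0, 0.0], [1.0, 1.0]]),       # coarse stage
    np.array([[0.0, 0.0], [0.1, -0.1]]),      # residual stage
]
indices = rvq_encode([1.1, 0.9], codebooks)
recon = rvq_decode(indices, codebooks)

bits_for_four_stages = 4 * 8                   # four 256-entry codebooks, 8 bits each
equivalent_entries = 256 ** 4                  # size of the single codebook they emulate
```

Two stages of two entries each already reach four distinct reconstructions; with four 256-entry stages the same multiplication gives $2^{32}$ combinations while only 1024 vectors are stored.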
EAGLES — Efficient Accelerated 3D Gaussians with Lightweight Encodings
ECCV 2024 Girish et al. · arXiv:2312.04564 · project

关键想法 不是量化原始属性,而是量化一个学到的 latent,再用小 MLP 解码回属性。 SH color → 16-D latent;rotation → 8-D;opacity → 1-D。position、scale、SH-DC 不动(太敏感)。forward 时 round 到整数,用 straight-through estimator 让梯度通过。

类比:VQ 是字典查找,EAGLES 是 autoencoder——用小 MLP 换"低比特 latent 仍能覆盖整个属性流形"。

Key idea Don't quantize raw attributes — quantize a learned latent, decode with a small MLP. SH color → 16-D latent; rotation → 8-D; opacity → 1-D. Position, scale, SH-DC stay full precision (too sensitive). Round latents in the forward pass; use straight-through estimator for backprop.

Analogy: VQ is a dictionary lookup; EAGLES is an autoencoder — trade a small MLP for low-bit integer latents that still cover the full attribute manifold.

Mip-NeRF 360 PSNR 27.15 ($-0.3$ dB)
Quantized attrs: 211 → 6 MB ($\sim\!35\times$)
Whole scene: $10\text{--}20\times$
Compressed 3DGS (Niedermayr) — Sensitivity-Aware Vector Quantization
CVPR 2024 Niedermayr et al. · arXiv:2401.02436 · project

关键想法 标准 K-means 把每个维度等权重对待,但引起渲染明显变化的属性槽位应该更靠近 codebook entry。用 sensitivity = 渲染对参数的梯度,做敏感度加权 K-means,然后 QAT。

Key idea Standard K-means weights all dimensions equally. But a slot whose change visibly distorts rendering should be closer to its codebook entry. Compute per-parameter sensitivity $= \partial\text{image}/\partial\text{param}$, weight K-means by it. Then quantization-aware fine-tune.
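A minimal sketch of what sensitivity weighting does to the centroid update, assuming scalar attributes and hand-picked sensitivities. The function name and the toy numbers are hypothetical; the paper's actual clustering operates on vectors with sensitivities measured from rendering gradients.

```python
def weighted_kmeans(values, weights, centers, iters=20):
    """1-D K-means where each point pulls its centroid in proportion to its weight."""
    centers = list(centers)
    for _ in range(iters):
        sums = [0.0] * len(centers)
        totals = [0.0] * len(centers)
        for v, w in zip(values, weights):
            # Assignment step: plain nearest centroid.
            j = min(range(len(centers)), key=lambda c: (v - centers[c]) ** 2)
            sums[j] += w * v
            totals[j] += w
        # Update step: sensitivity-weighted mean (empty clusters keep their center).
        centers = [s / t if t > 0 else c for s, t, c in zip(sums, totals, centers)]
    return centers

values = [0.0, 0.2, 1.0, 1.2]
uniform = weighted_kmeans(values, [1, 1, 1, 1], centers=[0.0, 1.0])
# Assumption: the 2nd and 4th parameters are 9x more visible in the render.
sensitive = weighted_kmeans(values, [1, 9, 1, 9], centers=[0.0, 1.0])

print("uniform centers:  ", uniform)
print("sensitive centers:", sensitive)
```

With uniform weights the centroids sit midway between their members; with sensitivity weights they move toward the high-sensitivity values, so those parameters incur the smaller quantization error.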

Up to $31\times$ compression (avg $26\times$)
Bicycle: 1.5 GB → 47 MB
Truck: 600 MB → 21 MB
$\sim\!4\times$ faster rendering
敏感度加权 = importance sampling 的自然推广。RadSplat 加权 Gaussian,这篇加权属性槽位 Sensitivity-weighting = natural generalization of importance sampling. RadSplat weights Gaussians, this weights attribute slots.
MesonGS — Post-training Compression of 3D Gaussians via Eulerian + RAHT
ECCV 2024 Xie et al. · arXiv:2409.09756 · project

关键想法 从 MPEG 点云压缩(G-PCC)那边借工具:RAHT 小波变换、四元数→欧拉角(3 个数字代替 4 个)、块标量量化 + 短期 fine-tune。

RAHT (Region Adaptive Hierarchical Transform) 是点云编码标准里的层级小波——把空间相关的属性集中到 DC + 低熵 AC,跟 JPEG 的 DCT 思路一致,只是适配在不规则点云上。

Key idea Borrow tools from MPEG point-cloud compression (G-PCC): RAHT wavelet transform, quaternion→Euler (3 numbers instead of 4), block scalar quantization + brief fine-tune.

RAHT (Region Adaptive Hierarchical Transform) is the layered wavelet from PCC standards — concentrates spatially-correlated attributes into DC + low-entropy AC residuals, like JPEG's DCT but on irregular point sets.
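The quaternion→Euler step is easy to sketch. The roll/pitch/yaw (ZYX) convention and the function names below are assumptions; the text does not say which convention MesonGS uses.

```python
import math

def quat_to_euler(w, x, y, z):
    """Unit quaternion -> (roll, pitch, yaw) in radians: 3 numbers instead of 4."""
    roll = math.atan2(2 * (w * x + y * z), 1 - 2 * (x * x + y * y))
    pitch = math.asin(max(-1.0, min(1.0, 2 * (w * y - z * x))))
    yaw = math.atan2(2 * (w * z + x * y), 1 - 2 * (y * y + z * z))
    return roll, pitch, yaw

def euler_to_quat(roll, pitch, yaw):
    """Inverse mapping, used by the decoder to recover a rotation."""
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    return (cr * cp * cy + sr * sp * sy,
            sr * cp * cy - cr * sp * sy,
            cr * sp * cy + sr * cp * sy,
            cr * cp * sy - sr * sp * cy)

# An arbitrary rotation, normalized to unit length.
q = (0.9, 0.1, -0.3, 0.2)
norm = math.sqrt(sum(c * c for c in q))
q = tuple(c / norm for c in q)

euler = quat_to_euler(*q)
q_back = euler_to_quat(*euler)

# q and -q encode the same rotation, so compare via |dot product|.
dot = abs(sum(a * b for a, b in zip(q, q_back)))
print("euler angles:", euler)
print("|<q, q_back>| =", dot)
```

The saving is one float per Gaussian before any quantization. The price is the usual gimbal-lock degeneracy near pitch = ±90°, where the round trip is no longer unique.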

Mip-NeRF 360: 641.7 → 27.6 MB ($23.2\times$)
PSNR 28.98 → 28.61 ($-0.37$)
T&T: 421.9 → 17.0 MB, PSNR drop $-0.04$
把 3DGS 接上成熟的点云压缩世界——跨领域借力的好例子。 Bridges 3DGS to the mature point-cloud compression world — a clean cross-pollination moment.
NSVQ — Noise-Substituted Vector Quantization
arXiv 2025 · arXiv:2504.03059

关键想法 训练 codebook 时不用 straight-through estimator,而是在前向注入和量化等价的噪声作为代理。梯度自然流过,推理时换回真正的量化。

这是一个小但重要的训练 trick:经典 VQ 训练有先有鸡蛋还是先有鸡的问题(梯度过不了 arg-min);STE 有偏;matched-noise 无偏且实际表现强很多。

Key idea Don't use straight-through estimator. Instead, inject matched noise as a surrogate for the quantization step during training. Gradients flow naturally; at inference, the same codebook quantizes for real.

A small but important trick: classical VQ training has a chicken-and-egg problem (gradients can't flow through arg-min); STE is biased; matched-noise substitution is unbiased and dramatically better in practice.
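One common form of noise substitution replaces the quantization error with random noise of the same magnitude. The sketch below assumes that form and uses a hypothetical 4-entry codebook; it shows only the forward pass, not the paper's exact training recipe.

```python
import math
import random

def nearest_code(x, codebook):
    """Hard quantization: the arg-min that gradients cannot pass through."""
    return min(codebook, key=lambda c: math.dist(x, c))

def nsvq_forward(x, codebook, rng):
    """Training-time surrogate: x plus random noise whose norm equals the
    true quantization error ||x - q(x)||."""
    q = nearest_code(x, codebook)
    err = math.dist(x, q)
    v = [rng.gauss(0.0, 1.0) for _ in x]
    v_norm = math.hypot(*v)
    return [xi + err * vi / v_norm for xi, vi in zip(x, v)]

codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
x = (0.8, 0.3)
rng = random.Random(0)

x_train = nsvq_forward(x, codebook, rng)   # used while training
x_infer = nearest_code(x, codebook)        # used at inference

print("training surrogate:", x_train)
print("inference code:    ", x_infer)
print("error norms:", math.dist(x, x_train), math.dist(x, x_infer))
```

In an autodiff framework the surrogate is an ordinary differentiable expression in both `x` and the selected code (through the error norm), so no hand-written backward rule is needed.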

Up to $45\times$ model-size reduction
Reduced-3DGS — Reducing the Memory Footprint of 3D Gaussian Splatting
High-Perf Graphics 2024 Papantonakis et al. · arXiv:2406.17074

关键想法 每个 Gaussian 拥有自己的 SH degree(0/1/2/3)。哑光墙上的只需 degree 0(一个 RGB triple),高光上的保留 degree 3。然后对残余系数做 codebook 量化。

Key idea Each Gaussian gets its own SH degree from {0,1,2,3}. Matte Gaussians on a wall only need degree 0 (one RGB triple); shiny ones keep degree 3. Then codebook-quantize the remainder.

$\sim\!27\times$ total compression
PSNR drop: 0.21 dB
"不是所有像素都该分到相同比特"的最干净教学例子。后面 anchor 和 SH 章里都会再见。 Cleanest example of "not all pixels deserve the same bitrate." You'll see this idea again in the Anchor and SH chapters.
FlexGaussian — Training-free, On-device 3DGS Compression
ACM MM 2025 · arXiv:2507.06671

关键想法 给一个刚训练好的 3DGS 文件,几秒内压完,不用 fine-tune。属性自适应剪枝 + INT8/INT4 channel-wise 混合精度量化 + 在线适配。手机可部署。

Key idea Given a freshly-trained 3DGS file, compress it in seconds, no fine-tuning. Attribute-discriminative pruning + INT8/INT4 channel-wise mixed-precision quantization + online adaptation. Mobile-deployable.

Up to 96.4% reduction
<1 dB PSNR drop
文献里最快的"实战编码器"——秒级出可用文件。Capture-then-share 场景的关键。 The fastest practical compressor — usable file in seconds. Critical for capture-then-share workflows.
量化章的几个教训Lessons from the quant chapter
  • position 是圣域——上面每篇都把 position $\geq 16$ bit 或换成 anchor offset / octree code。
  • VQ + QAT 几乎总更好——在训练最后 10-20% 重跑 K-means,让网络补偿误差。
  • 给 index 套上熵编码是免费午餐——VQ index 从来不均匀。再省 $1.5\text{-}2\times$。
  • SH 是大头。本章里大部分增益都来自把 SH 压得更聪明。下一章专门讲。
  • Position is sacred — every paper keeps it $\geq 16$ bits or replaces with anchor offset / octree code.
  • VQ + QAT is almost always worth it — re-running K-means in the last 10-20% of training lets the network compensate.
  • Entropy-coding indices is free lunch — VQ indices are never uniform. Another $1.5\text{-}2\times$ for free.
  • SH dominates everything. Most gains in this chapter come from smarter SH compression. The SH chapter is next.

§Part 5球谐压缩 —— 75% 子问题Spherical Harmonics — the 75% sub-problem

回忆:每个 Gaussian 48 个 SH 系数(degree 0 到 3,每通道 16 个基函数 × 3 个 RGB 通道),按 float32 存,3M 个 Gaussian 加起来约 570 MB,约占文件的 75%。在 SH 上做任何工作都有 4-5 倍的杠杆。

有意思的是大多数系数都接近 0。大部分表面接近漫反射——degree 0 就够了。只有高光和视角依赖反射才真的需要高 band。

Recall: $48 \text{ SH coefficients per Gaussian} \times 4 \text{ bytes (float32)} \times 3\text{M Gaussians} \approx$ 570 MB, roughly 75% of the file. Anything you do to SH has $4\text{-}5\times$ the leverage of equivalent work elsewhere.

Interestingly, most coefficients are near zero. Most surfaces are nearly diffuse — degree 0 is enough. Only specular highlights and view-dependent reflections actually need the high bands.

[Figure: distribution of SH coefficient magnitudes in a typical scene. Horizontal axis: |coeff|, from 0 to large. The sharp peak at zero follows a Laplace distribution, which is what EntropyGS exploits.]
高 band SH 系数严重集中在 0 附近——天然适合熵编码或激进剪枝。Higher-band coefficients are peaked near zero — perfect for entropy coding or aggressive pruning.

五条策略,按激进程度递增:

  1. Per-Gaussian degree adaptation — 每个 Gaussian 自己的 degree
  2. SH distillation — 高 degree teacher 蒸馏到低 degree student
  3. 用 neural field 替代 SH — 整个换成共享 hash grid
  4. 换基 — 用 Spherical Gaussian lobe / 因子分解
  5. 对 SH 做参数化熵编码 — 直接利用 Laplace 结构

Five strategies, in increasing aggressiveness:

  1. Per-Gaussian degree adaptation
  2. SH distillation from high-degree teacher to low-degree student
  3. Replace SH with a shared neural field
  4. Change the basis — Spherical Gaussian lobes or factorized
  5. Parametric entropy code — exploit the Laplace shape
Strategy 1: Reduced-3DGS

已在 §Part 4 详细介绍。要点:每个 Gaussian 自适应 degree 0/1/2/3 + codebook 量化剩余。$\sim\!27\times$ 总压缩,PSNR 仅降 0.21 dB。

Covered in §Part 4. Highlight: per-Gaussian adaptive degree 0/1/2/3 + codebook quantize the rest. $\sim\!27\times$ compression at only 0.21 dB drop.

Strategy 2: LightGaussian SH distillation

已在 §Part 3 详细介绍。SH 蒸馏部分:以全 SH 模型为 teacher,pseudo-view jitter 训一个低 degree student。学生学着用低 band 系数"假装"高 band 的高光效果。

跟 Hinton 等人 2015 经典知识蒸馏一回事——只是换到 view-dependent appearance 上。

Covered in §Part 3. SH distillation: take the full-SH model as teacher, train a low-degree student on pseudo-view-jittered renderings. The student learns to fake the high-band specular look with limited coefficients.

Same trick as classical Hinton-style knowledge distillation, ported to view-dependent appearance.

Strategy 3: Compact-3DGS (Lee) — SH → hash grid

已在 §Part 4 详细介绍。要点:完全抛弃 SH,view-dependent color 从共享 hash grid 查询 + 小 MLP 解码。每 Gaussian SH 成本:0 floats。共享成本:~30 MB hash grid + ~100 KB MLP,摊到几百万 Gaussian 上。

本质上 = NeRF appearance + 3DGS geometry。两个世界融合最干净的一例。

Covered in §Part 4. Drop SH entirely; view-dependent color comes from a shared hash grid + tiny MLP. Per-Gaussian SH cost: 0 floats. Shared: ~30 MB hash grid + ~100 KB MLP, amortized over millions of Gaussians.

Effectively = NeRF appearance + 3DGS geometry. The cleanest fusion of the two worlds.

GES — Generalized Exponential Splatting
CVPR 2024 Hamdi et al. · arXiv:2402.10128

关键想法 偷换 primitive:把 $\exp(-x^2)$ 换成 $\exp(-|x|^\beta)$,$\beta$ 可学。锐边能用更少 primitive 表示 → 间接省 SH。

严格说这不是 SH 压缩,是"primitive design 改了之后 SH 总量随之减少"。常和 SH 方法一起讨论但更属于 compaction 维度。

Key idea Swap the primitive: $\exp(-x^2)$ → $\exp(-|x|^\beta)$ with learnable $\beta$. Sharper edges need fewer primitives → indirectly saves SH.

Strictly speaking this isn't SH compression — it's "primitive redesign that shrinks the SH footprint as a side effect." Often filed with SH methods but really belongs to the compaction axis.

Mip-NeRF 360: 734 → 377 MB
186 FPS vs 134
F-3DGS — Factorized 3D Gaussian Splatting
ACM MM 2024 Sun et al. · arXiv:2405.17083

关键想法 把 $(N_\text{Gaussian} \times N_\text{attr})$ 大矩阵当成低秩,CP 分解(rank $\sim\!16$)从几个轴向因子向量恢复每个 Gaussian 的属性。

TensoRF 一脉的延续——3D tensor 可分解成 1D 向量外积之和。3DGS 的 tensor 是 "Gaussian $\times$ attribute",rank 远小于两个维度。

Key idea Treat the giant $(N_\text{gaussians} \times N_\text{attr})$ matrix as low-rank. CP decomposition (rank $\sim\!16$) reconstructs each Gaussian's attributes from a few axis-wise factor vectors.

The TensoRF lineage: a 3D tensor factorizes as a sum of outer products of 1D vectors. For 3DGS, the tensor is "Gaussian $\times$ attribute"; rank is much smaller than either dimension.

~4–7 MB per scene
SG-Splatting — Spherical Gaussian lobes
arXiv Jan 2025 Wang et al. · arXiv:2501.00342

关键想法 用 ~10 个 Spherical Gaussian lobe 替换 48 个 SH 系数。一个 SG 是 $\exp(\lambda(\mathbf{d}\cdot\boldsymbol\mu - 1))$ —— 球面上以 $\boldsymbol\mu$ 为中心、$\lambda$ 为锐度的方向性 Gaussian。

SH 是全局基(每个 band 在整个球面上有支撑),SG 是局部基(只在中心方向附近"亮")。对典型场景外观(少量高光 + 大部分漫反射),SG 是更紧凑的表示。

2000 年代图形学(Tsai & Shih 2006)就用 SG 做 precomputed radiance transfer,3DGS 时代被重新发掘。

Key idea Replace 48 SH coefficients with ~10 Spherical Gaussian lobes. An SG is $\exp(\lambda(\mathbf{d}\cdot\boldsymbol\mu - 1))$ — a directional Gaussian on the sphere centered at $\boldsymbol\mu$ with sharpness $\lambda$.

SH is a global basis (each band has support over the whole sphere); SGs are local (each lobe only lights up near its center). For typical appearance (few highlights, mostly diffuse), SGs are a much more compact representation.

2000s graphics already used SGs for precomputed radiance transfer (Tsai & Shih 2006); this is the rediscovery in the 3DGS era.
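The SG formula above is short enough to evaluate directly. The base color, the single highlight lobe, and the function names below are made-up illustration values, not parameters from SG-Splatting.

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def sg_lobe(d, axis, sharpness):
    """exp(lambda * (d . mu - 1)): equals 1 when d == axis, decays away from it."""
    cos_angle = sum(a * b for a, b in zip(d, axis))
    return math.exp(sharpness * (cos_angle - 1.0))

def sg_color(d, base_rgb, lobes):
    """View-dependent color = diffuse base + sum of weighted SG lobes."""
    rgb = list(base_rgb)
    for axis, sharpness, amplitude in lobes:
        w = sg_lobe(d, axis, sharpness)
        rgb = [c + w * a for c, a in zip(rgb, amplitude)]
    return rgb

base = (0.35, 0.22, 0.10)                      # hypothetical matte brown
highlight_axis = normalize((0.0, 0.0, 1.0))
lobes = [(highlight_axis, 30.0, (0.8, 0.8, 0.8))]

head_on = sg_color(normalize((0.0, 0.0, 1.0)), base, lobes)
side = sg_color(normalize((1.0, 0.0, 0.0)), base, lobes)

print("looking along the lobe axis:", head_on)
print("looking from the side:      ", side)
```

The locality claim shows up in the numbers: away from the lobe axis the contribution is negligible, so one lobe describes one highlight without disturbing the rest of the sphere.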

48 → 10 color params ($\sim\!5\times$)
EntropyGS — Parametric Entropy Coding
arXiv Aug 2025 · arXiv:2508.10227

关键想法 把 SH AC 系数的直方图画出来——几乎完美的 Laplace 分布。那就拟合 Laplace,按这个先验做 arithmetic coding。不需要 context model、不需要 MLP——就一个闭式分布。

其它属性(rotation、scale、opacity)用 Gaussian mixture。per-Gaussian 分布参数估计后驱动 arithmetic coder。

Key idea Plot the SH AC coefficient histogram. It's almost exactly Laplace. So just fit a Laplace, arithmetic-code under it. No context model, no MLP — just a closed-form distribution.

Other attributes (rotation, scale, opacity) use Gaussian mixtures. Per-Gaussian distribution parameters drive the arithmetic coder.
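The "fit a Laplace, then code under it" recipe can be sketched without an actual arithmetic coder by summing the ideal code lengths. The synthetic coefficients, the scale `true_b`, and the quantization step are assumptions for illustration only.

```python
import math
import random
import statistics

def fit_laplace(xs):
    """Maximum-likelihood Laplace fit: location = median, scale = mean |x - median|."""
    mu = statistics.median(xs)
    b = sum(abs(x - mu) for x in xs) / len(xs)
    return mu, b

def laplace_cdf(x, mu, b):
    if x < mu:
        return 0.5 * math.exp((x - mu) / b)
    return 1.0 - 0.5 * math.exp(-(x - mu) / b)

def symbol_bits(k, step, mu, b):
    """Ideal code length of quantized symbol k: -log2 of the probability mass
    the fitted Laplace assigns to the bin [(k-0.5)*step, (k+0.5)*step]."""
    lo, hi = (k - 0.5) * step, (k + 0.5) * step
    p = laplace_cdf(hi, mu, b) - laplace_cdf(lo, mu, b)
    return -math.log2(max(p, 1e-12))

# Synthetic "SH AC coefficients": the difference of two exponentials is Laplace.
rng = random.Random(0)
true_b = 0.05
coeffs = [rng.expovariate(1 / true_b) - rng.expovariate(1 / true_b)
          for _ in range(5000)]

mu, b = fit_laplace(coeffs)
step = 0.01
symbols = [round(c / step) for c in coeffs]
avg_bits = sum(symbol_bits(k, step, mu, b) for k in symbols) / len(symbols)

print("fitted mu, b:", mu, b)
print("average bits per coefficient:", avg_bits)
```

The whole "model" is two numbers per attribute group, which is why decoding needs no network. A real arithmetic coder would land slightly above the summed ideal lengths.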

$\sim\!30\times$ rate reduction
$10\times$ model size
Fast decode (no neural net)
教学宝藏:100 年前的统计学还在管用,不是所有事都要神经网络。 A pedagogical gem: 100-year-old statistics still works. Sometimes you don't need a neural network.
| Strategy | Mechanism | Representative | Leverage |
|---|---|---|---|
| Degree adapt | per-Gaussian degree | Reduced-3DGS | $\sim\!3\times$ |
| Distillation | teacher–student | LightGaussian | $\sim\!2\times$ |
| Replace w/ field | shared hash grid | Compact-3DGS (Lee) | $\sim\!5\times$ |
| Change basis | SG lobes / CP factor | SG-Splatting, F-3DGS | $\sim\!3\text{--}8\times$ |
| Entropy | Laplace / GMM | EntropyGS | $\sim\!3\text{--}4\times$ |

§Part 6锚点法 —— 存稀疏,渲染时生成密集Anchor-based — Store sparse, decode dense

在 3DGS 场景里走一遭,会发现一个明显事实:相邻 Gaussian 的属性高度相似——咖啡桌面被几百个 Gaussian 覆盖,它们都想说同一件事:"我棕色、扁平、哑光。" 每个 Gaussian 都存满 59 floats 是天大的浪费。

那能不能改成:把属性存在稀疏 anchor 点上(voxel 网格),渲染时让一个小 MLP 现场生成邻近的 Gaussian?这就是 Scaffold-GS 的赌注——也是整个领域的转折点。

Walk through a 3DGS scene and one fact is obvious: neighboring Gaussians have nearly identical attributes. A coffee tabletop is covered by hundreds of Gaussians all saying the same thing: "I'm brown, flat, matte." Storing 59 floats per Gaussian for every one of them is staggeringly wasteful.

What if you stored attributes at sparse anchor points (voxel grid) and let a tiny MLP regenerate the local Gaussians on demand? That's the Scaffold-GS bet — and it changed the field.

[Figure: vanilla 3DGS vs. Scaffold-GS. Left: ~3M Gaussians stored explicitly, 750 MB. Right: 9 anchors (■) plus an MLP that regenerates k=4 Gaussians each, 8 MB. Same Gaussians at render time; different bytes on disk.]
Scaffold-GS — Structured 3D Gaussians for View-Adaptive Rendering
CVPR 2024 Highlight Lu, Yu et al. · CUHK / Shanghai AI Lab · arXiv:2312.00109 · code

每个 Gaussian 独立漂浮,冗余巨大;且对视角/光照变化脆弱。 Each Gaussian floats independently — massive redundancy + brittleness to view/lighting changes.

关键想法 不存每个 Gaussian。在 voxel 网格上撒稀疏 anchor,每个 anchor 携带一个 feature vector(如 32-D)和 $k$ 个可学 offset 向量(如 $k=10$)。渲染时,MLP 接 (anchor feature, view direction, view distance) → 预测 $k$ 个 neural Gaussian 在 anchor_pos + offset_i 处的 opacity、颜色(替代 SH 直接出 RGB)、scale、rotation。anchor 沿训练动态 grow/prune。

磁盘存:anchor 位置 + feature + 缩放因子 + $k$ 个 offset + MLP 权重(几百 KB)。推理生成:每个 Gaussian 的所有属性。

View-dependent trick: MLP 直接吃 view direction,不再需要 SH。这也是为什么 Scaffold-GS 经常 PSNR 还更高——MLP 能表达比 degree-3 SH 更平滑的视角依赖。

Key idea Don't store each Gaussian. Scatter sparse anchors on a voxel grid; each anchor carries a feature vector (e.g. 32-D) and $k$ learnable offset vectors (e.g. $k=10$). At render time, an MLP takes (anchor feature, view direction, view distance) → predicts opacity, color (no SH — direct RGB), scale, rotation of $k$ neural Gaussians at anchor_pos + offset_i. Anchors grow/prune dynamically during training.

Stored on disk: anchor positions + features + scaling + $k$ offsets + MLP weights (~few hundred KB). Regenerated at render time: every Gaussian's attributes.

View-dependent trick: the MLP takes view direction as input — no SH needed. That's why Scaffold-GS often improves PSNR: the MLP expresses smoother view-dependence than degree-3 SH.
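A toy version of the anchor → MLP → Gaussians decode step, under heavy simplifying assumptions: a single random linear layer stands in for the trained decoder MLPs, the per-anchor scaling is a scalar, and only opacity and RGB are predicted (scale and rotation are omitted). All names are hypothetical.

```python
import math
import random

K = 10                # neural Gaussians per anchor (example value from the text)
FEAT_DIM = 32         # anchor feature size (example value from the text)
OUT_PER_GAUSSIAN = 4  # toy head: opacity + RGB only

def tiny_mlp(inputs, weights, biases):
    """One linear layer + sigmoid, a stand-in for the real decoder MLPs."""
    return [1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, inputs)) + b)))
            for row, b in zip(weights, biases)]

def decode_anchor(anchor, cam_pos, weights, biases):
    """Regenerate K Gaussians from one anchor for the current camera."""
    delta = [c - p for c, p in zip(cam_pos, anchor["pos"])]
    dist = math.hypot(*delta)
    view_dir = [d / dist for d in delta]
    out = tiny_mlp(anchor["feat"] + view_dir + [dist], weights, biases)
    gaussians = []
    for i in range(K):
        # Position is not predicted: it is anchor_pos + scaled learned offset.
        pos = [p + anchor["scaling"] * o
               for p, o in zip(anchor["pos"], anchor["offsets"][i])]
        opacity, *rgb = out[OUT_PER_GAUSSIAN * i: OUT_PER_GAUSSIAN * (i + 1)]
        gaussians.append({"pos": pos, "opacity": opacity, "rgb": rgb})
    return gaussians

rng = random.Random(0)
in_dim = FEAT_DIM + 3 + 1   # feature + view direction + view distance
weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
           for _ in range(K * OUT_PER_GAUSSIAN)]
biases = [0.0] * (K * OUT_PER_GAUSSIAN)
anchor = {
    "pos": [1.0, 2.0, 3.0],
    "feat": [rng.uniform(-1, 1) for _ in range(FEAT_DIM)],
    "scaling": 0.5,
    "offsets": [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(K)],
}
gaussians = decode_anchor(anchor, [0.0, 0.0, 0.0], weights, biases)

stored_floats = 3 + FEAT_DIM + 1 + 3 * K   # per anchor, in this toy layout
explicit_floats = 59 * K                   # the same K Gaussians stored vanilla-style
print(len(gaussians), "Gaussians from", stored_floats,
      "stored floats (vs", explicit_floats, "explicit)")
```

The MLP weights are shared by every anchor, so their cost is amortized over the scene. The per-anchor float count here reflects this toy layout only, not the real checkpoint format.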

Mip-NeRF 360: 734 → 156 MB ($\sim\!5\times$)
PSNR: 27.4 → 27.50 (improved!)
Deep Blending: 676 → 66 MB
原版 3DGS 之后最重要的一篇。不是压缩论文,是重新定义了"3DGS 场景是什么"。之后几乎每篇 SOTA(HAC, HAC++, ContextGS, CompGS, GaussianForest)都在 Scaffold-GS backbone 上。 The most important paper after vanilla 3DGS. Not really a compression paper — a reformulation of what a 3DGS scene is. Almost every SOTA paper (HAC, HAC++, ContextGS, CompGS, GaussianForest) builds on the Scaffold-GS backbone.
Octree-GS — LOD-Structured 3D Gaussians
TPAMI 2025 Ren et al. · arXiv:2403.17898

关键想法 把 Scaffold-GS anchor 放进 octree,按视距动态决定走多深。远处只解码粗 LOD。每层 anchor 沿 cumulative LOD 累加渲染;可学的 per-anchor LOD bias 补强高频区域。

Key idea Put Scaffold-GS anchors in an octree; choose fractional LOD per pixel based on camera footprint. Distant pixels decode only coarse levels. Cumulative LOD across octree levels; a learnable per-anchor LOD bias supplements high-freq regions.

Mip-NeRF 360 PSNR 28.05
Deep Blending: 30.49 / 112 MB
anchor compression 与"超大场景可扩展渲染"的桥。size 与 Scaffold-GS 接近,主要赢在不同视距下的渲染一致性。 The bridge between anchor compression and "scalable rendering for huge scenes." Size comparable to Scaffold-GS; the win is rendering consistency across view distances.
GaussianForest — Hierarchical-Hybrid 3DGS
arXiv 2024 Zhang et al. · arXiv:2406.08759

关键想法 把 Gaussian 组织成森林(多棵树):叶子显式存快变属性(position, opacity);内部节点持共享 feature,由 MLP 解码出多个叶子的慢变属性(rotation, scale, color)。

教学陈述:变化剧烈的存进来,变化平缓的共享。

Key idea Organize Gaussians into trees: leaves store rapidly-varying attributes (position, opacity) explicitly; internal nodes hold shared features that an MLP decodes into smoothly-varying attributes (rotation, scale, color) for many leaves.

The clean pedagogical statement: store what varies a lot, share what varies smoothly.

Mip-NeRF 360: 827 → 85 MB (GF-Large), PSNR 27.45
Deep Blending: 701 → 98 MB
IGS / Implicit-GS — Multi-level Tri-plane
arXiv 2024 Wu et al. · arXiv:2408.10041

关键想法 用多分辨率 tri-plane(三个 2D feature grid)代替 per-Gaussian 属性;小 MLP 按位置查询解码。tri-plane 是 2D 平滑场,设计上就能被现成图像编码器良好压缩。位置点云另行重排做无损压缩。

Key idea Replace per-Gaussian attributes with a multi-level tri-plane (three 2D feature grids); a tiny MLP decodes each Gaussian by position lookup. The tri-plane is a 2D smooth field, designed to be compressible by off-the-shelf image codecs. Positions are stored separately for lossless coding.

"anchor → tiny MLP → Gaussian" 模板的灵活性体现——anchor 可以是点、voxel、hash 桶、tri-plane 采样点、tree 节点。 Shows how flexible the "anchors → tiny MLP → Gaussians" template is — the "anchor" can be a point, voxel, hash bucket, tri-plane sample, or tree node.
CompGS (Liu) — Compressed Gaussian Splatting via Hybrid Primitives
ACM MM 2024 Liu et al. · arXiv:2404.09458 · ⚠ 与 Compact3D (Navaneet)、Compact-3DGS (Lee) 都是不同的工作。 · ⚠ Different from Compact3D (Navaneet) and Compact-3DGS (Lee).

关键想法 两类 primitive:少量 anchor primitive 持完整几何,大量 coupled primitive 只存从 anchor 预测出的小残差。Rate-distortion loss $\lambda R + D$ + hyperprior Gaussian entropy model 驱动量化。

视频编码的精确类比:anchor = I-frame,coupled = P-frame,只存 delta。

Key idea Two kinds of primitives: a few anchor primitives with full geometry + many coupled primitives that store only tiny residuals predicted from the anchor. Rate-distortion loss $\lambda R + D$ + hyperprior Gaussian entropy models drive quantization.

Exact video codec analogy: anchors = I-frames, coupled = P-frames, only delta stored.

Compression: $45\text{--}175\times$
Mip-NeRF 360 ~17 MB / 27.26 PSNR
Playroom: 550 → 5 MB
Smol-GS — Compact Splat via Octree Positional Encoding
arXiv Dec 2025 · arXiv:2512.00850

关键想法 围绕octree positional encoding + 学到的 per-splat feature 构建紧凑表示。坐标递归 voxel 层级;feature 经熵编码。Mip-NeRF 360 上 4.87 MB / 27.61 PSNR——2025 年末最紧凑之一。

Key idea Compact representation around octree positional encoding + learned per-splat features. Recursive voxel hierarchy for coordinates; entropy-based feature compression. Mip-NeRF 360: 4.87 MB / 27.61 PSNR — among the smallest reported in late 2025.

为什么 anchor 这么有效Why anchors work so well
  1. 局部冗余是主要冗余源。Anchor 把它显式化:一个 feature 服务 $k=10$ 个邻居。
  2. MLP 比 SH 更紧凑地编码视角依赖。一个共享 MLP(吃 view direction)比每 Gaussian $48 \text{ SH coefficients} \times 3\text{M Gaussian}$ 高效得多。
  3. anchor 适合做熵编码靶子。规整的网格结构允许 spatial context model——预测每个 anchor 用它的邻居。这就是 HAC / HAC++ / ContextGS 在下一章做的,把压缩推过 $100\times$。
  1. Local redundancy is the dominant redundancy. Anchors make this explicit — one feature serves $k=10$ neighbors.
  2. The MLP encodes view-dependence more compactly than SH. One shared MLP (taking view direction as input) beats $48 \text{ SH coefficients} \times 3\text{M Gaussians}$.
  3. Anchors are great entropy-coding targets. Sitting on a structured grid (voxel/octree), they enable spatial context models — predict each anchor from neighbors. This is what HAC, HAC++, ContextGS exploit in the next chapter to push compression past $100\times$.

§Part 7熵编码 —— SOTA 所在Entropy coding — where the SOTA lives

如果量化是"把每个数 round 到几比特",熵编码是"按 surprise 付钱"。和 anchor 联手是把 3DGS 推过 $100\times$ 的关键。

If quantization is "round each number," entropy coding is "spend bits proportional to surprise." Combined with anchors, this is what pushes 3DGS past $100\times$ without quality loss.

Shannon 速成A quick Shannon refresher

要传一长串字母,'a' 出现 90%、'b' 出现 10%。一字节一个字母太奢侈了——大部分时候一个字母只需要不到 1 bit,因为罕见的 'b' 才是 surprise。Shannon 定理说每个符号最少需要的比特数等于它的负 log 概率

To transmit a long string where 'a' occurs 90% of the time and 'b' 10%, you wouldn't waste a byte per letter. An ideal code spends about 0.15 bits on each 'a' and about 3.3 bits on each rare, surprising 'b', which averages to under half a bit per letter. Shannon's theorem says the minimum number of bits for a symbol equals its negative log probability:

$$ \text{bits}(s) = -\log_2 p(s) $$

Arithmetic coding 接近最优逼近这个下界。代价:你得有一个概率模型 $p(s)$。模型越好,文件越小。

对 3DGS:「符号」是量化后的属性值;「概率模型」就是这章每篇论文要解决的核心问题。

Arithmetic coding comes within a small overhead of this bound in practice. The catch: you need a probability model $p(s)$. The better the model, the smaller the file.

For 3DGS: the "symbol" is a quantized attribute value; the "probability model" is what every paper in this chapter is fighting about.
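The 90/10 example can be checked numerically. This is plain arithmetic on the formula above; the variable names are the only things introduced here.

```python
import math

def entropy_bits(probs):
    """Average bits per symbol under an ideal code: -sum p * log2(p)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

p = {"a": 0.9, "b": 0.1}
cost = {s: -math.log2(q) for s, q in p.items()}   # bits(s) = -log2 p(s)
avg = entropy_bits(p.values())

for s in p:
    print(f"'{s}': p = {p[s]:.1f}, ideal cost = {cost[s]:.3f} bits")
print(f"average = {avg:.3f} bits per letter (vs 8 for one byte per letter)")
```

The common symbol is nearly free and the rare one is expensive, yet the average stays far below one bit. This is the gap an entropy coder with a good probability model can capture.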

四种拿先验的办法

Four ways to get a prior

| Type | How | Representative |
|---|---|---|
| Parametric (closed form) | Fit Laplace/Gaussian/GMM to the empirical histogram | EntropyGS |
| Hyperprior (learned, coarse-grained) | Jointly learn a coarse-grained feature grid that predicts the fine-grained distribution | HAC, HAC++, CompGS, FCGS |
| Autoregressive context | Predict each anchor from already-coded neighbors (like PixelCNN) | ContextGS, PCGS, FCGS+ |
| Off-the-shelf codec prior | Let the built-in predictors of HEVC/WebP/JPEG-XL handle it | CodecGS, SOG/SOGS |
HAC — Hash-grid Assisted Context for 3DGS Compression
ECCV 2024 Chen et al. · arXiv:2403.14530

Scaffold-GS 把 size 砍到 156 MB 就停了,能再往下吗? Scaffold-GS got down to 156 MB and stopped — can we go further?

关键想法 取 Scaffold-GS anchor,联合训一个多分辨率 binary hash grid。每个 anchor 按位置查 hash 得 feature;小 MLP_c 把 feature 转成每个属性的 Gaussian 分布 $(\mu, \sigma)$;anchor 在这个预测分布下做 arithmetic coding。

这是学到的图像压缩(Ballé 2017/2018)框架移植到 3DGS——hash grid 是 hyperprior,MLP 把 hyperprior 翻译成 per-anchor distribution。训练时显式最小化表示的熵(loss 里加 rate 项 $\lambda R + D$)。

"binary" 是指 hash grid 值量化到 $\{-1, +1\}$——hyperprior 自身基本免费。

Key idea Take Scaffold-GS anchors. Jointly train a multiresolution binary hash grid. Each anchor queries the hash grid at its position → feature; a tiny MLP_c outputs $(\mu, \sigma)$ of a Gaussian distribution for each attribute; arithmetic-code the anchor under that predicted distribution.

This is the learned image compression framework (Ballé 2017/2018) ported to 3DGS — hash grid = hyperprior, MLP translates hyperprior into per-anchor distribution. The entropy is explicitly minimized as a training loss ($\lambda R + D$).
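The rate term $R$ in such a loss is typically estimated from the predicted Gaussian: the probability of a quantized value is the Gaussian mass inside its quantization bin. A minimal sketch, assuming a unit quantization step and made-up $(\mu, \sigma)$ predictions; the function names are hypothetical.

```python
import math

def gaussian_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def rate_bits(x_hat, mu, sigma, step=1.0):
    """Bits to code quantized value x_hat under a predicted N(mu, sigma^2):
    -log2 of the probability mass in [x_hat - step/2, x_hat + step/2]."""
    p = (gaussian_cdf(x_hat + step / 2, mu, sigma)
         - gaussian_cdf(x_hat - step / 2, mu, sigma))
    return -math.log2(max(p, 1e-9))

x_hat = 3.0  # a quantized anchor attribute

# Context model that "knows" the neighborhood: tight, well-centered prediction.
good = rate_bits(x_hat, mu=3.1, sigma=0.5)
# No useful context: broad prior centered far away.
poor = rate_bits(x_hat, mu=0.0, sigma=4.0)

print(f"good prediction: {good:.2f} bits")
print(f"poor prediction: {poor:.2f} bits")
```

Summing this quantity over all anchor attributes gives a differentiable estimate of file size, which is what lets the rate be traded against distortion during training.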

"Binary" = hash grid values quantized to $\{-1, +1\}$ — the hyperprior itself is essentially free.

Mip-NeRF 360: 15.3 / 21.9 MB (low/high)
PSNR: 27.53 / 27.77
Deep Blending: 4.4 MB / 29.98 PSNR
$75\times$ over 3DGS, $11\times$ over Scaffold-GS, 无 PSNR 下降。后续所有 entropy-coding 工作的蓝图。 $75\times$ over 3DGS, $11\times$ over Scaffold-GS, no PSNR drop. The blueprint for everything that followed.
HAC++ — Towards $100\times$ Compression of 3DGS
TPAMI 2025 Chen et al. · arXiv:2501.12255

关键想法 在 HAC 之上加 (1) intra-anchor context——同一 anchor 的 $k$ 个 sibling Gaussian 互相预测;(2) 每属性自适应量化步长;(3) 可学的 mask 训练时丢弃无用 Gaussian。

intra-anchor 部分填上了 HAC 的漏洞:一个 anchor 衍生的 $k=10$ 个 Gaussian 显然相关(共享同一 parent feature),但 HAC 没用上。HAC++ 在 hash-grid prior 上叠加 sibling 间的自回归。

Key idea On top of HAC: (1) intra-anchor context — the $k$ sibling Gaussians of one anchor predict each other; (2) per-attribute adaptive quantization; (3) a learnable mask drops useless Gaussians during training.

Intra-anchor fills HAC's gap: the $k=10$ Gaussians from one anchor are obviously correlated (shared parent feature), but HAC didn't exploit it. HAC++ adds autoregressive coupling among siblings on top of the hash-grid prior.

Mip-NeRF 360: 8.7 MB / 27.60 PSNR
T&T: 5.4 MB / 24.22
Deep Blending: 3.1 MB / 30.16 PSNR
$\gt 100\times$ over 3DGS, $\gt 20\times$ over Scaffold-GS
在 Deep Blending 上压到约 3 MB/scene 同时提升 PSNR,是"高质量区间的压缩问题基本被解决"的最干净证明。 Reached about 3 MB per scene on Deep Blending while improving PSNR: the cleanest "compression is solved in the high-quality regime" demonstration.
ContextGS — Compact 3DGS with Anchor-Level Context Model
NeurIPS 2024 Wang et al. · arXiv:2405.20721

关键想法 HAC 用 hash-grid feature 预测每个 anchor,ContextGS 用已解码的邻居预测——3D anchor 上的 PixelCNN。

把 anchor 分层 (coarse → medium → fine)。最粗一层先 hyperprior 编码;解码出来后作为 context 预测下一层的分布,arithmetic-code;如此递推。直接利用 anchor 间空间冗余——HAC 只通过 hash grid 间接利用。

Key idea Where HAC predicts each anchor from a hash-grid feature, ContextGS predicts each anchor from its already-coded neighbors. Like PixelCNN for 3D anchors.

Split anchors into hierarchical levels (coarse → medium → fine). Coarsest first, coded under a small hyperprior. Once decoded, those anchors serve as context for the next level's distribution, arithmetic-coded. And so on. Directly exploits inter-anchor spatial redundancy — which HAC exploits only indirectly via the hash grid.

Mip-NeRF 360 low: 13.3 MB / 27.62 PSNR
Deep Blending: ~6 MB / 30.09
"3DGS 像图像编码"最干净的写法——经典 autoregressive context 在 3D 稀疏结构上的复刻。 The cleanest "3DGS as image codec" formulation — the classical autoregressive context model applied to a sparse 3D structure.
CodecGS — Feature Planes + HEVC
arXiv Jan 2025 Lee et al. · Fraunhofer HHI · arXiv:2501.03399

关键想法 把 3DGS 属性铺到 2D feature plane,跑一遍 HEVC——就是给你流 Netflix 的那个视频编码器。25 年工程红利免费拿,硬件解码器现成。

HEVC 有 intra prediction、变换编码、CABAC、码率控制——为什么要重造?把 feature plane 训练成 HEVC-friendly(频率域 entropy model 对齐 HEVC 实际行为)就行。

Key idea Lay 3DGS attributes on 2D feature planes; run HEVC — the same codec that streams Netflix. Get 25 years of video-codec engineering for free, with hardware decoders included.

HEVC has intra prediction, transform coding, CABAC, rate control — why reinvent? Train the feature planes to be HEVC-friendly (frequency-domain entropy model aligned with what HEVC will do).

Mip-NeRF 360: 10.3 MB / 27.30 PSNR
Tanks & Temples: 7.8 MB / 23.63
证明压缩可以彻底脱离定制学习编码器。2010 年代中期以来的大多数手机都带 HEVC 硬解。 Demonstrates compression can be fully decoupled from a custom learned codec. Most phones since the mid-2010s ship a hardware HEVC decoder.
FCGS — Fast Feedforward 3DGS Compression
ICLR 2025 Chen et al. · arXiv:2410.08017

关键想法 上面每篇都得给每个新场景重训 neural codec(每场景几分钟)。FCGS 不用:一个预训练网络一次前向压完任何场景——每 100K Gaussian 大约 1 秒。

Entropy 模块用 3 分量 Gaussian Mixture,条件于:(1) hyperprior $h$(全场景粗 latent),(2) inter-Gaussian context $s$(已解码 Gaussian 的 grid interpolation),(3) intra-Gaussian context $c$(同一 Gaussian 内通道分块)。多路径分支把不同属性路由到不同 rate 约束。

Key idea Every method above re-trains a neural codec on each new scene (minutes per scene). FCGS doesn't: a single pretrained network compresses any 3DGS scene in one forward pass — ~1 second per 100K Gaussians.

Entropy module is a 3-component Gaussian Mixture conditioned on: (1) hyperprior $h$ (coarse scene-wide latent), (2) inter-Gaussian context $s$ (grid-interpolated from already-decoded Gaussians), (3) intra-Gaussian context $c$ (within-Gaussian channel chunks). Multi-path entropy module routes each attribute to a different rate-constraint path.

Mip-NeRF 360 low: 36.3 MB / 27.05
$\gt 20\times$ compression in seconds
No per-scene training
"压缩 = 推理"的拐点。比 HAC++ 大 $\sim\!5\times$,但快 $100\times$。和学到的图像编码器从逐图训练走向 amortized 的轨迹一模一样。 The amortization turning point. $\sim\!5\times$ larger than HAC++ but $\sim\!100\times$ faster. Mirrors the trajectory of learned image codecs from per-image training to amortized.
PCGS — Progressive Compression
AAAI 2026 Oral Chen et al. · arXiv:2503.08511

关键想法 一条 bitstream,多档解码质量。客户端可以早停,得到一个 OK 的预览;继续读则细节越来越好。Progressive JPEG 的 3DGS 版。

Progressive masking(逐步加 anchor)+ Progressive quantization(步长逐步减小)。一次训练,多档码率。

Key idea One bitstream, multiple decode quality levels. Clients can stop early for a decent preview; reading more refines. Progressive JPEG for 3DGS.

Progressive anchor masking (add anchors as you decode more) + progressive quantization (step size shrinks as more bits arrive). Single training, multiple bitrates extractable at decode time.

3DGS 第一个真正的渐进式码流——流媒体和带宽自适应 AR/VR 的天然继承者。 The first true progressive 3DGS — natural successor to HAC++ for streaming and bandwidth-adaptive AR/VR.
LocoGS — Locality-aware Gaussian Compression
arXiv Jan 2025 · arXiv:2501.05757

关键想法 沿 Morton (Z-order) 曲线排序 Gaussian——比特流里相邻的就是 3D 里相邻的。然后用 neural field + 自适应 SH 带宽利用产生的相干性。

Key idea Sort Gaussians along a Morton (Z-order) curve: nearby in the bitstream $\Leftrightarrow$ nearby in 3D. Then exploit the resulting coherence with a neural field + adaptive SH bandwidth.

$54.6\times \text{--} 96.6\times$ compression
$2.1\text{--}2.4\times$ rendering speed-up
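Morton 排序可追溯到 1966 年(G. M. Morton,最初用于地理数据的文件排序),后来被图形学用于纹理 cache;加上现代熵编码就是接近 $100\times$ 的杠杆。 Morton ordering dates back to 1966 (G. M. Morton, originally for sequencing geographic data files) and was later adopted in graphics for texture caching. Combined with modern entropy coding it becomes a nearly $100\times$ lever.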
| Need | Use |
|---|---|
| Absolute smallest file, can train per scene | HAC++ or ContextGS |
| Small + hardware decoder | CodecGS (HEVC) or SOG/SOGS |
| Compress in seconds, not minutes | FCGS (feed-forward) |
| Streaming adaptive bitrate | PCGS |
| Minimal code, fast decode | EntropyGS (parametric) |

§Part 8工业格式 —— 你手机真正收到的是什么Industry formats — what your phone actually receives

HAC++ 给你 3 MB 文件,然后呢?真正部署还需要:

  • 跨平台二进制,不假设 CUDA;
  • 解码器能在 WebGL/WebGPU/Metal/Vulkan 上毫秒级跑;
  • 容器能嵌入现有 3D pipeline (glTF, USD);
  • 理想情况是 标准,所有引擎都能读。

这些都不是学术论文交付的东西。2024–2026 工业格式逐渐定型。

HAC++ gives you a 3 MB file. Now what? Real deployment needs:

  • a cross-platform binary that doesn't assume CUDA;
  • a decoder that runs in WebGL/WebGPU/Metal/Vulkan in milliseconds;
  • a container that slots into 3D pipelines (glTF, USD);
  • ideally, a standard all engines agree on.

None of which is what an academic paper ships. 2024–2026 has crystallized a small set of contenders.

SOG — Self-Organizing Gaussians
ECCV 2024 Morgenstern et al. · Fraunhofer HHI · arXiv:2312.13299 · project · code

3DGS 是无结构点云——图像编码器没法直接用。 3DGS is unstructured — image codecs can't be applied directly.

关键想法 把 $N$ 个 Gaussian 排进 $\sqrt N \times \sqrt N$ 2D 网格,让网格邻居有相似属性。每个属性(position-x, scale-y, SH-coeff-k...)就变成一张平滑 2D 图像。然后用 PNG/JPEG-XL/WebP/AVIF 存——让图像编码器做熵编码。

排序算法叫 PLAS (Parallel Linear Assignment Sorting)——一个自定义 GPU 算法,几秒解决"把 $N$ 个高维向量分配到 2D 网格,使邻居相似"的指派问题。

整个领域最具教学意义的想法:"把无结构问题变成结构化的,然后已有工具就能解。"

Key idea Sort $N$ Gaussians onto a $\sqrt N \times \sqrt N$ 2D grid so that grid-neighbors have similar attributes. Each attribute (position-x, scale-y, SH-coeff-k…) becomes a smooth 2D image. Save with PNG/JPEG-XL/WebP/AVIF — let the image codec do the entropy coding.

Sorting algorithm: PLAS (Parallel Linear Assignment Sorting) — a custom GPU algorithm that assigns $N$ high-D vectors to a 2D grid in seconds, optimizing local smoothness.

The most pedagogically elegant idea in the whole field. Make the unstructured problem structured; existing tools solve it.

[Figure: PLAS sort turns noisy attributes into a smooth attribute image]
$19.9\times \text{--} 39.5\times$ compression
Mip-NeRF 360 ~40 MB / 27.64 PSNR
Without SH: up to $123\times$
PlayCanvas SOGS / SOG v2 — Production WebP-based format
Production · PlayCanvas Engine 2.7.5 (2024) + .sog v2 (late 2025)

关键想法 SOG 的工业版。每个属性是 WebP texture,封装在一个 archive 里。浏览器原生解码,按 Morton 顺序直接上 GPU。

SOG v2 (2025 末) 加:Morton ordering 让 GPU 加载友好、WebGPU-only 编码 pipeline(写时不需要 CUDA)、单文件自包含 archive。

Key idea SOG productionized. Each attribute is a WebP texture bundled in an archive. Decoded natively by the browser; loaded straight to the GPU in Morton order.

SOG v2 (late 2025) adds: Morton ordering for GPU-friendly loading, WebGPU-only encoder (no CUDA needed to write .sog), single self-contained archive.

~95% size reduction
1 GB PLY → 42 MB .sog
Deployed in PlayCanvas Engine
SPZ / SPZ 4 — Splat Zip (Niantic)
Production · MIT license · SPZ 4 announced May 2026 · github.com/nianticlabs/spz

关键想法 每个属性 16-bit 定点量化 → gzip 整个 blob。完毕。

SOG 的故意-简单替代品:

  • 16-bit fixed-point positions
  • quantized quaternion / scale / opacity
  • 保留完整 degree-3 SH(其他压缩格式经常不留)
  • 整个 blob gzip

SPZ 4 (2026 May) 加:vendor extensions、$\sim\!3\text{-}5\times$ 编码加速、$\sim\!1.5\text{-}2\times$ 解码加速。

Key idea Quantize each attribute to small int range. Gzip the whole blob. Done.

The deliberately-simple alternative to SOG:

  • 16-bit fixed-point positions
  • quantized quaternion / scale / opacity
  • preserves full degree-3 SH (many compressed formats don't)
  • final gzip pass

SPZ 4 (May 2026) adds: vendor extensions, $\sim\!3\text{-}5\times$ faster encode, $\sim\!1.5\text{-}2\times$ faster decode.
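
The quantize-then-gzip recipe can be sketched in a few lines of Node.js. This illustrates the idea only and is not the SPZ file format: the 16-bit width follows the description above, while the number of fractional bits, the byte layout, and the helper names are assumptions made for the example.

```javascript
const zlib = require('zlib');

// Quantize floats to signed 16-bit fixed point with `fractionalBits`
// bits after the binary point (an assumed parameter for this sketch).
function quantizeFixed16(values, fractionalBits) {
  const scale = 2 ** fractionalBits;
  const out = new Int16Array(values.length);
  for (let i = 0; i < values.length; i++) {
    const q = Math.round(values[i] * scale);
    out[i] = Math.max(-32768, Math.min(32767, q)); // clamp to the int16 range
  }
  return out;
}

// Quantize, then gzip the raw bytes. Byte order is whatever the platform
// uses; a real file format would pin it down explicitly.
function packPositions(positions, fractionalBits) {
  const q = quantizeFixed16(positions, fractionalBits);
  return zlib.gzipSync(Buffer.from(q.buffer, q.byteOffset, q.byteLength));
}

// Gunzip, then convert the fixed-point integers back to floats.
function unpackPositions(blob, fractionalBits) {
  const raw = zlib.gunzipSync(blob);
  // Copy into a fresh, aligned buffer before viewing it as int16.
  const quantized = new Int16Array(new Uint8Array(raw).buffer);
  const scale = 2 ** fractionalBits;
  return Float32Array.from(quantized, (q) => q / scale);
}

const fractionalBits = 8; // resolution of 1/256 units, range of roughly ±128
const positions = Float32Array.from({ length: 3000 }, (_, i) => Math.sin(i * 0.01) * 10);
const blob = packPositions(positions, fractionalBits);
const restored = unpackPositions(blob, fractionalBits);

let maxError = 0;
for (let i = 0; i < positions.length; i++) {
  maxError = Math.max(maxError, Math.abs(positions[i] - restored[i]));
}
console.log(`float32: ${positions.byteLength} B, packed: ${blob.length} B, max error: ${maxError}`);
```

The quantization error is bounded by half a step ($2^{-9}$ here) as long as values stay inside the representable range, which is why the choice of fractional bits has to match the scene's extent.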

$\sim\!10\times$ smaller than .ply
~90% 尺寸减少 | size reduction
"无 ML、就工程做到位"的范例。Scaniverse 设备端捕获采用,glTF KHR_gaussian_splatting_compression_spz 的官方载荷。
The "boring baseline that actually works." Used by Scaniverse on-device. Now the official compression payload of Khronos KHR_gaussian_splatting_compression_spz.
.ksplat — Three.js viewer's home format
Community mkkellogg · GaussianSplats3D

关键想法 一个"瘦身的 PLY"二进制:8-bit SH,struct layout 直接对齐 Three.js 内部 Gaussian 结构。加载即上 GPU,零转换。

多档压缩级别,最激进的会 8-bit 量化 SH。强调解码速度而非绝对体积。

Key idea A "trimmed PLY" binary with quantized SH (down to 8 bits) and a struct layout matching the Three.js renderer's internal Gaussian. Load bytes, point a buffer view, render: zero transformation.

Multiple compression levels; the most aggressive quantizes SH to 8 bits. The format emphasizes decode speed over absolute size.

"磁盘上小"和"GPU 上快"是两件事。KSPLAT 比 SOGS 大,但加载更快。
"Small on disk" $\ne$ "fast on GPU." KSPLAT is bigger than SOGS but loads faster.
glTF KHR_gaussian_splatting — Khronos Industry Standard
Standard · Release candidate Feb 2026 · press release

关键想法 第一个跨厂商 3DGS in glTF 标准。SPZ 是官方载荷;设计与算法无关,未来 codec 可插拔。

两个 extension:

  1. KHR_gaussian_splatting 定义无压缩结构:position、rotation、scale、opacity、SH 分成 diffuse (degree 0) + specular (degree 1-3)。
  2. KHR_gaussian_splatting_compression_spz 把 SPZ-压缩 blob 包装成 glTF buffer。

Niantic、Cesium/Bentley、Esri、OGC、Khronos 联合背书。类似于 JPEG 被写进 web 标准的时刻。

Key idea The first cross-vendor 3DGS-in-glTF standard. SPZ is the official compression payload; design is algorithm-agnostic so future codecs can drop in.

Two extensions:

  1. KHR_gaussian_splatting defines the uncompressed structure: position, rotation, scale, opacity, SH split into diffuse (deg 0) + specular (deg 1-3).
  2. KHR_gaussian_splatting_compression_spz wraps SPZ-compressed blobs as glTF buffers.

Backed by Niantic, Cesium/Bentley, Esri, OGC, and Khronos. The analog of the moment JPEG got written into a web spec.

MPEG-GSC — Gaussian Splatting Coding (ISO Future Standard)
Future ISO Standard · MPEG 153rd meeting Jan 2026

关键想法 制定 MPEG-2、HEVC 的同一标准体把 3DGS 当一类 first-class 媒体。目标:可互操作的 ISO 编码器,HEVC/VVC 作为起点。预计 2027-2028 产出参考标准。

Key idea The same standards body that gave us MPEG-2 and HEVC now treats 3DGS as a first-class media type. Goal: interoperable ISO codec, with HEVC/VVC as starting points. Reference standard expected 2027-2028.

glTF + SPZ 解决今天的 asset distribution;MPEG-GSC 将解决广播/流媒体的 codec 互操作。
glTF + SPZ solves asset distribution today; MPEG-GSC will solve codec interop for broadcast/streaming.
| 需求 | Need | 格式 / Format |
| --- | --- | --- |
| 最小、浏览器直送、可逐场景编 | Smallest, browser delivery, per-scene encoding OK | PlayCanvas .sog / SOGS |
| 标准化 glTF pipeline、简单解码 | Standards-friendly glTF pipeline, simple decode | .spz / SPZ 4 via glTF KHR_gaussian_splatting_compression_spz |
| Three.js viewer、解码速度优先 | Three.js viewer, decode speed first | .ksplat |
| 广播流媒体(未来) | Broadcast streaming (future) | MPEG-GSC |
| 研究 / leaderboard | Research / leaderboard | 方法专用二进制(HAC, ContextGS 等); method-specific binaries (HAC, ContextGS, etc.) |

§Part 9 前沿 2025-2026:30 个月时间线 | Frontier 2025-2026: 30 months in

Aug 2023
3D Gaussian Splatting (Kerbl et al., SIGGRAPH 2023)

起点。场景 ~1.4 GB。
The starting point. Scenes ~1.4 GB.

Nov 2023
LightGaussian · Compact3D (Navaneet) · Compact-3DGS (Lee) · EAGLES · SOG

仅 4 个月后,基本压缩方向就已列齐。大小掉到 20-60 MB。
Four months in, basic directions already mapped. Sizes drop to ~20–60 MB.

Mar 2024
Scaffold-GS · HAC

anchor + entropy 组合成型。HAC 把无损情况下推到 ~15 MB。
Anchor + entropy combination crystallizes. HAC pushes to ~15 MB at no quality loss.

Oct 2024
FCGS — feed-forward compression

amortization 时刻。压缩 = 推理,不再 = 优化。
The amortization moment. Compression becomes inference, not optimization.

Jan 2025
HAC++ · CodecGS · Splatpress · SG-Splatting

sub-10 MB 成为家常。Khronos、MPEG 开始开会。
Sub-10 MB scenes become routine. Khronos and MPEG start meeting.

Mar 2025
PCGS (AAAI 2026 Oral): progressive 3DGS

"progressive JPEG" for splats。一码流多质量。
"Progressive JPEG" for splats. One bitstream, multiple qualities.

Aug–Sep 2025
EntropyGS · ExGS / Zip-GS · MEGS²

三条新支线:参数化熵编码、扩散辅助修复、面向移动端。
Three new branches: parametric entropy, diffusion-assisted restoration, mobile-targeted.

Dec 2025
Smol-GS · Splatwizard · RAVE

Mip-NeRF 360 上 4.87 MB SOTA。统一基准工具包。可变码率单模型解码器。
SOTA at 4.87 MB on Mip-NeRF 360. Unified benchmarking toolkit. Variable-bitrate single-model decoder.

Feb 2026
NiFi: $1000\times$ via diffusion · glTF KHR_gaussian_splatting release candidate

压缩与标准化同时进入成熟期。
Compression and standardization are hitting maturity simultaneously.

May 2026
SPZ 4 · MPEG-GSC exploration ongoing

问题从"能多小"变成"怎么发出去"。
The field moved from "how small can we make a scene?" to "how do we ship it?"

方向 1:前馈式 / amortized codec | Direction 1: Feed-forward / amortized codecs

直到 2024 末,每个 3DGS 压缩器都要每场景训练。FCGS 之后这条路解锁。未来主流编解码器会是单个预训练网络,一次前向压完任何场景。

  • FCGS (ICLR 2025, arXiv 2410.08017) — 第一个前馈式 3DGS 压缩器。秒级 $\sim\!20\times$ 压缩。
  • FCGS+ / Long-Context FCGS (arXiv 2512.00877, 2025) — Morton serialization 构建千-Gaussian 级别 context window。前馈式 SOTA。
  • D-FCGS (arXiv 2507.05859, 2025) — 自由视点视频的前馈动态压缩。

为什么重要:每场景训练是 capture-to-share 工作流(手机端 capture、AR 内容创建)的瓶颈。秒级而不是 10 分钟,用户体验从根本上不同。

Until late 2024, every 3DGS compressor required per-scene training. FCGS broke that. Going forward, expect dominant codecs to be single pretrained networks that compress any scene in one forward pass.

  • FCGS (ICLR 2025, arXiv 2410.08017) — first feed-forward 3DGS compressor. $\sim\!20\times$ compression in seconds.
  • FCGS+ / Long-Context FCGS (arXiv 2512.00877, 2025) — Morton serialization to build thousands-of-Gaussian context windows. SOTA among generalizable codecs.
  • D-FCGS (arXiv 2507.05859, 2025) — feedforward dynamic-3DGS for free-viewpoint video.

Why it matters: per-scene training is the bottleneck for capture-to-share workflows (on-phone capture, AR content creation). Compressing in seconds rather than 10 minutes qualitatively changes the user experience.

方向 2:扩散辅助修复 | Direction 2: Diffusion-assisted restoration

如果你能事后修复渲染坏掉的图,就可以更激进地压。decompression pipeline 变成:

If you can repair a poorly-rendered image after the fact, you can compress your scene more aggressively. The decompression pipeline becomes:

$$ \text{scene}' = \text{compress}(\text{scene},\text{ very\_aggressive}) \;\longrightarrow\; \text{render}(\text{scene}') \;\longrightarrow\; \text{diffusion\_repair}(\text{render}) $$
ExGS / Zip-GS
arXiv 2509.24758 (Sep 2025)

关键想法 激进 training-free pruning (UGC) + mask-guided 单步扩散 (GaussPainter) 修复渲染。实时推理。$100\times$ 压缩,354 MB → 3.31 MB。

Key idea Aggressive training-free pruning (UGC) plus a mask-guided one-step diffusion model (GaussPainter) that restores the renders. Real-time inference. $100\times$ compression, 354 MB → 3.31 MB.

NiFi — Nix-and-Fix
arXiv 2602.04549 (Feb 2026)

关键想法 把压缩推到 $1000\times$——大胆扔掉大部分细节,再用 artifact-aware 单步扩散解码器复活。这条线上"解码 = 生成"已经名副其实。

Key idea Push to $1000\times$ by destroying most detail, then resurrecting with an artifact-aware one-step diffusion decoder. The line between compression and synthesis blurs.

方向 3:4D 动态压缩 | Direction 3: 4D / dynamic compression

4DGS / 3DGStream 让动态场景成为可能后,自然问题:怎么压缩"splat 的视频"?标准视频编码器对像素的解法,splat 版正在 active research。

  • 4DGC (arXiv 2503.18421) — rate-aware streamable 4DGS, $\sim\!16\times$ smaller than 3DGStream
  • GIFStream (arXiv 2505.07539, CVPR 2025) — canonical-space + deformation-field, 30 Mbps real-time on RTX 4090
  • 4DGCPro (arXiv 2509.17513) — hierarchical progressive 4D for volumetric video
  • P-4DGS (arXiv 2510.10030) — predictive 4DGS, $90\times$ compression
  • MEGA (arXiv 2410.13613) — memory-efficient 4DGS
  • CompGS++ (arXiv 2504.13022) — static + dynamic 双兼容
  • D-FCGS (arXiv 2507.05859) — 前馈动态压缩

所有这些的概念套路:canonical-space + deformation。一次性存 $t=0$ 的场景,每帧只存低带宽 deformation field(或 anchor motion)。和 MPEG 的 I-frame + P-frame 思路一模一样。

Once 4DGS / 3DGStream made dynamic scenes possible, the natural question: how to compress a video of splats? Video codecs do this for pixels; the splat analog is active research.

  • 4DGC (arXiv 2503.18421) — rate-aware streamable, $\sim\!16\times$ smaller than 3DGStream
  • GIFStream (CVPR 2025, arXiv 2505.07539) — canonical + deformation field, 30 Mbps real-time on RTX 4090
  • 4DGCPro (arXiv 2509.17513) — hierarchical progressive 4D for volumetric video
  • P-4DGS (arXiv 2510.10030) — predictive 4DGS, $90\times$ compression
  • MEGA (arXiv 2410.13613) — memory-efficient dynamic 4DGS
  • CompGS++ (arXiv 2504.13022) — static + dynamic both
  • D-FCGS (arXiv 2507.05859) — feedforward dynamic codec

Conceptual pattern across all of them: canonical-space + deformation. Store the scene once at $t=0$, then store low-bandwidth deformation fields (or anchor motion) per frame. Same as the MPEG I-frame + P-frame story.
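
The I-frame + P-frame analogy can be made concrete with a toy encoder. A minimal sketch, assuming each frame is a flat array of per-Gaussian position values and that motion between frames is small: it stores frame 0 in float32 and every later frame as int8 deltas against the previously reconstructed frame. The function names and the uniform step size are assumptions for illustration, and none of the listed papers is implemented here (they typically learn deformation fields rather than storing raw deltas).

```javascript
// Encode: frame 0 is kept in full (the "I-frame"); later frames store only
// quantized deltas (the "P-frames"). Deltas are taken against what the
// decoder will have reconstructed, so quantization error does not accumulate.
function encodeSequence(frames, step) {
  const canonical = Float32Array.from(frames[0]);
  const deltas = [];
  let previous = canonical;
  for (let t = 1; t < frames.length; t++) {
    const delta = new Int8Array(canonical.length);
    const reconstructed = new Float32Array(canonical.length);
    for (let i = 0; i < canonical.length; i++) {
      const q = Math.round((frames[t][i] - previous[i]) / step);
      delta[i] = Math.max(-128, Math.min(127, q)); // clamp to the int8 range
      reconstructed[i] = previous[i] + delta[i] * step;
    }
    deltas.push(delta);
    previous = reconstructed;
  }
  return { canonical, deltas, step };
}

// Decode frame t by replaying the first t deltas on top of the canonical frame.
function decodeFrame(encoded, t) {
  const out = Float32Array.from(encoded.canonical);
  for (let k = 0; k < t; k++) {
    for (let i = 0; i < out.length; i++) {
      out[i] += encoded.deltas[k][i] * encoded.step;
    }
  }
  return out;
}

// Two Gaussians (x, y, z each) drifting slowly over three frames.
const frames = [
  [0, 0, 0, 1, 1, 1],
  [0.01, 0, 0, 1, 1.02, 1],
  [0.02, 0, 0, 1, 1.04, 1],
];
const encoded = encodeSequence(frames, 0.005);
const lastFrame = decodeFrame(encoded, 2);
console.log(Array.from(lastFrame));
```

Each later frame costs one byte per value instead of four before any entropy coding, and mostly-zero deltas compress further, which is the same reason P-frames are cheap in video codecs.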

方向 4:移动端 / on-device | Direction 4: Mobile / on-device

当前大部分 SOTA 假设 24 GB 数据中心 GPU。这条线是要让 3DGS 在手机(带电池)上跑好:

  • 3DGauCIM (arXiv 2507.19133) — digital compute-in-memory 加速器
  • StreamingGS (arXiv 2506.09070) — 移动端体素式 streaming
  • MEGS² (arXiv 2509.07021) — SG-based memory-efficient + 统一剪枝
  • FlexGaussian (ACM MM 2025) — INT8/INT4 混合精度,秒级部署
  • Real-Time On-Device 3DGS with Reuse (arXiv 2511.12930)

注意:这条线的论文 headline 不是 PSNR,而是"瓦特/FPS、DRAM 字节、每帧渲染预算"——另一个优化制度。

Most current SOTA assumes a 24 GB datacenter GPU. This direction targets good 3DGS rendering on battery-powered phones:

  • 3DGauCIM (arXiv 2507.19133) — compute-in-memory accelerator
  • StreamingGS (arXiv 2506.09070) — voxel-based mobile streaming
  • MEGS² (arXiv 2509.07021) — SG-based memory-efficient + unified pruning
  • FlexGaussian (ACM MM 2025) — INT8/INT4 mixed-precision, deployable in seconds
  • Real-Time On-Device 3DGS with Reuse (arXiv 2511.12930)

Note: these papers headline "watts per FPS, DRAM bytes, per-frame render budget" rather than PSNR — a different optimization regime.

方向 5:基准 & 工具链 | Direction 5: Tooling & benchmarks

这个领域正逐步规范化。3DGS.zip(Bagdasarian et al., 2025)是一个活的 leaderboard,钉死了跨论文的可比性。两个工具包正在汇聚成统一基准:

  • Splatwizard (arXiv 2512.24742) — 统一评测框架,10+ rasterizer、熵估计、metrics 一站式。
  • GSCodec Studio (arXiv 2506.01822) — 静态+动态模块化框架。
  • RAVE (arXiv 2512.07052) — 单模型输出连续 RD 曲线,解码时选码率不用重训。
  • 3DGS.zip 综述:w-m.github.io/3dgs-compression-survey

The field is starting to grow up. 3DGS.zip (Bagdasarian et al., 2025) is a live leaderboard pinning down cross-paper comparisons. Two toolkits are converging on a standard harness:

  • Splatwizard (arXiv 2512.24742) — unified evaluation harness, 10+ rasterizers, entropy estimation, metrics.
  • GSCodec Studio (arXiv 2506.01822) — modular static + dynamic GS framework.
  • RAVE (arXiv 2512.07052) — single trained model emits a continuous RD curve; pick bitrate at decode time without retraining.
  • 3DGS.zip survey: w-m.github.io/3dgs-compression-survey
大局观 | A speculative big picture

把过去三年挤压看,三条线在同时收敛

  1. 压缩 → 推理(FCGS, FCGS+)
  2. 解码器 → 生成模型(ExGS, NiFi)
  3. 格式 → 标准(glTF KHR → MPEG-GSC)

三者合起来指向 2027–2028 可能的样子:一个前馈编解码器输出 Khronos 标准 glTF asset,超低带宽场景由生成式 prior 解码器兜底。

Three things are happening at once:

  1. Compression → inference (FCGS, FCGS+)
  2. Decoder → generative model (ExGS, NiFi)
  3. Format → standard (glTF KHR → MPEG-GSC)

Together they suggest where 2027–2028 may land: a feed-forward compressor emitting a Khronos-standard glTF asset, with a generative-prior fallback decoder for ultra-low-bandwidth delivery.

§A 交互演示场 | Playground

本文用到的所有交互 demo 集中在这里。前面的章节里都已嵌入;这里只是方便集中把玩。所有代码纯 Canvas2D,零依赖,gauss-engine.js $\approx 280$ 行——可以直接 fork。

All interactive demos from this survey, collected here for convenience. Each is already embedded in its chapter. All pure Canvas2D, zero dependencies — gauss-engine.js $\approx 280$ lines, easy to fork.

使用 GaussEngine | Using GaussEngine
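// Assumes gauss-engine.js is loaded and `ctx` is a Canvas2D context,
// e.g. document.querySelector('canvas').getContext('2d').
const cam = GaussEngine.makeCamera({
  eye: [0, 1, 3], target: [0, 0, 0], fov: 40,
});
const seed = 7; // JavaScript has no keyword arguments, so pass values positionally
const gaussians = GaussEngine.generateBlob(500, seed);
// gaussians = [{position, scale, rotation, color, opacity}, ...]

const K = 64; // presumably the VQ codebook size
const compressed = GaussEngine.applyVQCopy(
  GaussEngine.pruneByImportance(gaussians, 0.3),
  K
);
GaussEngine.render(ctx, compressed, cam);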
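
What an importance-based pruning step might look like is sketched below. This is a hypothetical stand-in, not the source of `GaussEngine.pruneByImportance`: the score (opacity times the product of the three scales as a volume proxy), the interpretation of the second argument as the fraction to drop, and the function names are all assumptions. It also treats `scale` as linear, whereas trained 3DGS checkpoints usually store log-scales.

```javascript
// Hypothetical importance score: opacity times a volume proxy.
function importance(g) {
  const [sx, sy, sz] = g.scale;
  return g.opacity * sx * sy * sz;
}

// Drop the `pruneFraction` lowest-scoring Gaussians and return the
// survivors in their original order.
function pruneLowestImportance(gaussians, pruneFraction) {
  const keepCount = Math.round(gaussians.length * (1 - pruneFraction));
  const keptIndices = gaussians
    .map((g, index) => ({ index, score: importance(g) }))
    .sort((a, b) => b.score - a.score) // highest score first
    .slice(0, keepCount)
    .map((entry) => entry.index)
    .sort((a, b) => a - b); // restore original ordering
  return keptIndices.map((index) => gaussians[index]);
}

const blob = [
  { opacity: 0.9, scale: [1, 1, 1] },       // large and opaque: keep
  { opacity: 0.1, scale: [1, 1, 1] },       // nearly transparent: drop
  { opacity: 0.9, scale: [0.1, 0.1, 0.1] }, // tiny: drop
  { opacity: 0.5, scale: [2, 1, 1] },       // large, semi-opaque: keep
];
const kept = pruneLowestImportance(blob, 0.5);
console.log(kept.length);
```

Swapping `importance` for another criterion (opacity only, or a gradient-based score) changes which Gaussians survive without touching the rest of the pipeline.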

本文用到的 demo(按出现顺序):

  1. 单个 Gaussian 的解剖(§1):6 个 slider 拖动一个 anisotropic Gaussian。
  2. 不同 SH degree 的辐射花瓣(§3):degree 3 → 0 视觉变化。
  3. 压缩配方组合(§5):pruning + 比特数 + VQ + SH 同时调,看 size 与 PSNR 估计。
  4. 不同打分函数下的剪枝(Part 3):opacity / volume / Hessian-ish 等。
  5. SOG 排序前后对比(Part 8):从无结构到 2D 网格。
  6. 大小–PSNR Pareto 散点图(§7):所有方法在 Mip-NeRF 360 上的位置。

Demos used in this survey (in order of appearance):

  1. Anatomy of a single Gaussian (§1): six sliders deforming an anisotropic Gaussian.
  2. Radiance lobe at each SH degree (§3): degree 3 → 0 visual change.
  3. Compression recipe composer (§5): pruning + bits + VQ + SH all at once, watch size and PSNR estimate.
  4. Pruning under different scoring criteria (Part 3): opacity / volume / Hessian-ish, etc.
  5. SOG before/after sort (Part 8): from unstructured to 2D grid.
  6. Size–PSNR Pareto scatter (§7): every method on Mip-NeRF 360.

§B 阅读清单 & 资源 | Reading list & resources

基础 | Foundations

综述 & 活跃 leaderboard | Surveys & live leaderboards

剪枝 | Pruning (Part 3)

量化 | Quantization (Part 4)

球谐压缩 | SH compression (Part 5)

锚点法 | Anchor-based (Part 6)

熵编码 | Entropy coding (Part 7)

工业格式 | Industry formats (Part 8)

前沿 2025-2026 | Frontier 2025-2026 (Part 9)

动态 4D 压缩 | Dynamic / 4D compression

移动端 / on-device | Mobile / on-device

相邻工作 | Adjacent work

引用本综述 | How to cite

@misc{3dgs-compression-survey-2026,
  title  = {Compressing 3D Gaussian Splatting: A Friendly Bilingual Survey},
  year   = {2026},
  month  = {May},
  note   = {Educational survey covering pruning, quantization, SH compression,
            anchor-based, entropy coding, industry formats, and 2025-2026
            frontiers. Bilingual English/Chinese.}
}

更好的做法:引用原始论文 + 3DGS.zip live leaderboard。

Better: cite the underlying papers + the 3DGS.zip live leaderboard.

新鲜度说明 | A note on freshness

3DGS 压缩动得很快——2026 年 1-5 月就有 ~30 篇 arXiv 新论文。本综述编纂于 2026 年 5 月。文中数字来自各论文在标准 benchmark (Mip-NeRF 360 / Tanks & Temples / Deep Blending) 上的报告;部分未独立验证。要权威数据请回到原论文或 3DGS.zip leaderboard。

发现错误、过时主张、遗漏论文——你读到时领域大概又前进了。把本文当作起点。

3DGS compression moves fast — ~30 new arXiv papers in Jan-May 2026 alone. This survey was compiled in May 2026. Numbers come from each paper's reported tables on Mip-NeRF 360, Tanks & Temples, Deep Blending; some are unverified or paraphrased. For definitive figures, go to the original paper or the 3DGS.zip leaderboard.

If you spot an error, an outdated claim, or a missing paper, that is expected: the field will likely have moved on by the time you read this. Treat the survey as a starting point.