When Kerbl et al. released 3D Gaussian Splatting in August 2023, a scene on disk was typically
1.4 GB. Three years later, the same scene can be squeezed to 3 MB
with almost no visible degradation, a reduction of roughly 500×. This survey walks through
how, why it works, and the key distinctions between methods, from the ground up.
Assumed background: basic NeRF, SDF, ML, calculus, linear algebra. If you've never touched 3DGS
before, the primer chapters bring you up to speed.
New to 3DGS: read §1 → §5 in order, then the taxonomy in §6, then pick whichever family attracts you.
Familiar with 3DGS: skim §6 for the taxonomy, then go straight to Part 6/Part 7 (Anchor + Entropy) for the real SOTA.
Just want the latest: jump to Part 9 Frontier.
§1 What's a "3D Gaussian," anyway?
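Forget radiance fields for a moment. A 3D Gaussian, in the splatting sense, is just an
anisotropic blob of color floating in space. Mathematically:
$$G(\mathbf{x}) = \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)$$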
where $\boldsymbol{\mu} \in \mathbb{R}^3$ is the center, $\Sigma \in \mathbb{R}^{3\times3}$ is a
symmetric positive-definite covariance saying how the blob is stretched and rotated. Covariance is
always parameterized as $\Sigma = R\,S\,S^\top R^\top$ with $R$ from a unit quaternion
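(4 floats) and $S$ a diagonal anisotropic scale (3 floats). You optimize the quaternion $q$ and the scale vector $s$;
the resulting $\Sigma$ is symmetric positive semi-definite by construction (positive-definite whenever every scale is nonzero), so no gradient step can produce an invalid covariance.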
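A minimal sketch of that parameterization, in plain Python with no dependencies. The function names and the (w, x, y, z) quaternion ordering are assumptions made for illustration; real implementations do this per Gaussian on the GPU.

```python
import math

def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix, normalizing first."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def covariance(q, s):
    """Sigma = R S S^T R^T with S = diag(s)."""
    R = quat_to_rotmat(q)
    # M = R S scales column j of R by s[j]; then Sigma = M M^T.
    M = [[R[i][j] * s[j] for j in range(3)] for i in range(3)]
    return [[sum(M[i][k] * M[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]

sigma = covariance(q=(0.9, 0.1, 0.3, 0.2), s=(0.5, 0.1, 0.02))
```

Because $\Sigma$ is built as $M M^\top$, it is symmetric with non-negative eigenvalues for any values of the 7 stored floats, which is the point of storing $q$ and $s$ rather than the 6 unique entries of $\Sigma$.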
The wireframe is the $2\sigma$ iso-surface, where the Gaussian's value has fallen to $e^{-2}$ of its peak. The colored blob is what gets splatted on screen.
Don't think of a Gaussian as a hard ellipsoid. Think of it as a fuzzy cotton ball: full density at the center, falling smoothly toward the edges. We render millions of them stacked together.
§2 "Splatting": what is it?
To draw a 3D Gaussian on screen, you don't ray-march or root-find. Take a shortcut:
(1) Project the 3D mean $\boldsymbol\mu$ to a 2D screen point;
(2) Project the 3D covariance $\Sigma$ via the EWA-splatting linearization to a 2D $\Sigma'$;
(3) You get a 2D ellipse with exponential falloff. Draw it into the framebuffer, alpha-blended.
That's it. Step (3) is what people mean by "splat" — the Gaussian gets splattered onto the screen like wet paint. Vastly cheaper than NeRF's 64–256 MLP queries per ray, which is why 3DGS hits 100+ FPS on consumer GPUs.
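The covariance projection in the second step can be sketched as $\Sigma' = J\,W\,\Sigma\,W^\top J^\top$, where $W$ is the world-to-camera rotation and $J$ is the Jacobian of the perspective projection at the Gaussian's center. The helper names and the simple pinhole model with focal lengths `fx`, `fy` are assumptions for this sketch.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def project_covariance(sigma, W, t, fx, fy):
    """2D covariance Sigma' = J W Sigma W^T J^T.

    sigma: 3x3 world-space covariance; W: 3x3 world-to-camera rotation;
    t: Gaussian center in camera space (tz > 0); fx, fy: focal lengths in pixels.
    """
    tx, ty, tz = t
    # Jacobian of the perspective map (x, y, z) -> (fx*x/z, fy*y/z), evaluated at t.
    J = [[fx / tz, 0.0, -fx * tx / tz**2],
         [0.0, fy / tz, -fy * ty / tz**2]]
    T = matmul(J, W)
    return matmul(matmul(T, sigma), transpose(T))

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
sigma = [[0.04, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 0.09]]
cov2d = project_covariance(sigma, identity, t=(0.0, 0.0, 2.0), fx=500.0, fy=500.0)
```

For a Gaussian straight ahead at depth 2 with a 500-pixel focal length, world-space standard deviations of 0.2 and 0.1 become screen-space standard deviations of 50 and 25 pixels; the depth variance drops out because the Gaussian sits on the optical axis.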
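A pixel's final color comes from depth-sorting the Gaussians touching it, then classical Porter-Duff "over" compositing. With the $N$ Gaussians indexed front to back, $c_i$ the color and $\alpha_i$ the alpha of Gaussian $i$ at that pixel:
$$C = \sum_{i=1}^{N} c_i\,\alpha_i\,\tau_i, \qquad \tau_i = \prod_{j<i} (1-\alpha_j)$$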
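Here is a small sketch of that compositing for a single pixel. It accumulates front to back, which gives the same result as back-to-front "over" and allows early termination. The `1e-4` cutoff is an illustrative assumption.

```python
def composite(colors, alphas):
    """Front-to-back 'over' compositing for one pixel.

    colors: list of (r, g, b) per Gaussian, sorted nearest first;
    alphas: the matching per-Gaussian alpha values at this pixel.
    """
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0                 # fraction of light not yet blocked
    for color, alpha in zip(colors, alphas):
        weight = alpha * transmittance  # this Gaussian's contribution
        pixel = [p + weight * c for p, c in zip(pixel, color)]
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:        # pixel is effectively opaque, stop early
            break
    return pixel, transmittance

pixel, remaining = composite([(1, 0, 0), (0, 1, 0), (0, 0, 1)], [0.5, 0.5, 1.0])
```

The product of alpha and transmittance computed as `weight` is the same quantity several pruning scores later in this survey are built on.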
3DGS scenes look photographic — specular highlights move as you orbit — because each Gaussian doesn't store one color but a tiny function of viewing direction: "from above, I'm bright white; from the side, dim gray." That function lives on the unit sphere $S^2$.
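How do you compress a function on a sphere? Same way as on a circle (Fourier series) or a line (Taylor series): pick a basis and expand. The natural basis on the sphere is spherical harmonics $Y_\ell^m$, the eigenfunctions of the spherical Laplacian. Up to degree $L$, there are $(L+1)^2$ basis functions, and the color seen from direction $\mathbf{d}$ is their weighted sum:
$$c(\mathbf{d}) = \sum_{\ell=0}^{L}\sum_{m=-\ell}^{\ell} c_{\ell m}\,Y_\ell^m(\mathbf{d})$$
Vanilla 3DGS uses degree 3, so $(3+1)^2 = 16$ basis functions $\times$ 3 color channels $= 48$ coefficients per Gaussian for color alone. That's the elephant in the storage budget.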
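To make the counting concrete, here is a sketch that counts coefficients per degree and evaluates a degree-1 expansion. The constants are the standard real-SH normalizations; the sign and ordering convention of the degree-1 terms is an assumption (it mirrors what common implementations use) and the function names are made up for this example.

```python
C0 = 0.28209479177387814   # Y_0^0 = 1 / (2 * sqrt(pi))
C1 = 0.4886025119029199    # sqrt(3 / (4 * pi)), magnitude of the degree-1 basis

def num_sh_coeffs(degree, channels=3):
    """(degree + 1)^2 basis functions, one coefficient each per color channel."""
    return channels * (degree + 1) ** 2

def eval_sh_deg1(coeffs, d):
    """coeffs: four (r, g, b) triples (1 for degree 0, 3 for degree 1); d: unit direction."""
    x, y, z = d
    basis = [C0, -C1 * y, C1 * z, -C1 * x]
    return [sum(b * c[ch] for b, c in zip(basis, coeffs)) for ch in range(3)]

per_degree = {L: num_sh_coeffs(L) for L in range(4)}   # {0: 3, 1: 12, 2: 27, 3: 48}
```

A Gaussian whose degree-1 coefficients are all zero returns the same color from every direction, which is why matte surfaces lose nothing when truncated to degree 0.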
Radiance lobe at different SH degrees
Slide the degree from 3 down to 0 and the sphere goes from a view-dependent pattern to a flat color: that is the price of "SH-degree truncation." Many methods drop matte Gaussians to degree 0 and reserve degree 3 only for the shiny ones.
§4 The training loop
Compression methods plug into different stages of training. Here's the loop in 30 lines of pseudo-code:
def train_3dgs(cameras, photos, iters=30000, warmup_iters=15000, lam=0.2):
    # warmup_iters and lam are assumed defaults; tune per scene
    G = init_from_sfm_points()                 # start from the sparse SfM point cloud
    optimizer = Adam(G.parameters())
    for step in range(iters):
        cam, photo = sample(cameras, photos)
        rendered = rasterize(G, cam)           # splatting pipeline
        loss = (1 - lam) * L1(rendered, photo) + lam * D_SSIM(rendered, photo)
        optimizer.zero_grad()                  # clear gradients from the previous step
        loss.backward()
        optimizer.step()
        # --- adaptive density control (where the magic is) ---
        if step % 100 == 0 and step < warmup_iters:
            clone_high_gradient_gaussians(G)
            split_oversized_gaussians(G)
            prune_low_opacity_gaussians(G)
    return G                                   # ~3M Gaussians, ~700 MB on disk
Three things to remember:
It's gradient descent the whole way. Every per-Gaussian attribute (position, scale, rotation, opacity, SH coefficients) is an Adam-optimized parameter.
Adaptive density control grows the population during training: high-gradient Gaussians get cloned/split, low-opacity ones get culled. The final 3M-Gaussian count is a learned outcome.
The image-space rasterizer is differentiable. Gradients flow from photo pixels back to 3D positions and shapes — the key engineering trick.
Compression methods hook in at three places:
Train-time: change density-control rules (Mini-Splatting, Taming-3DGS) or add a rate-aware loss (HAC, ContextGS).
Post-hoc: after training, prune / quantize / re-encode the .ply (LightGaussian's stage 3, MesonGS, FlexGaussian).
Feed-forward at inference: one pretrained network compresses any scene, no retraining (FCGS).
Inria's original 2023 code saved scenes as .ply — yes, that 1990s point-cloud format, repurposed. Open a real 3DGS .ply and the header looks like this:
Even the "wasted" stuff is real — three normal floats always zero (a .ply convention) eat ~36 MB of nothing. Format choices to note:
Opacity is a logit, not a [0,1] probability. Apply sigmoid for $\alpha$. Smoother optimization, and logits can be clipped freely.
Scales are log-scale. Actual $\sigma$ is exp(scale_i). Same reason: optimization friendliness + forced positivity.
SH splits into DC + rest. DC is degree 0 (average color over all directions); the rest are 45 higher-frequency coefficients.
Where the bytes actually live
Most bytes go into SH.
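A back-of-the-envelope budget makes this concrete. The field counts below are assumptions read off the description above (position, three always-zero normals, SH split into DC plus 45 rest coefficients, one opacity logit, three log-scales, a quaternion), all stored as float32, for a round 3M-Gaussian scene.

```python
FIELDS = {            # float32 properties per Gaussian (assumed layout)
    "position": 3,
    "normals": 3,     # always zero, kept by convention
    "sh_dc": 3,
    "sh_rest": 45,
    "opacity": 1,
    "scale": 3,
    "rotation": 4,
}
NUM_GAUSSIANS = 3_000_000
BYTES_PER_FLOAT = 4

total_floats = sum(FIELDS.values())
bytes_per_gaussian = total_floats * BYTES_PER_FLOAT
for name, n in FIELDS.items():
    mb = n * BYTES_PER_FLOAT * NUM_GAUSSIANS / 1e6
    print(f"{name:9s} {mb:6.1f} MB  {100 * n / total_floats:5.1f}%")
sh_share = (FIELDS["sh_dc"] + FIELDS["sh_rest"]) / total_floats
```

Under these assumptions a Gaussian costs 62 floats (248 bytes) with the normals, or 59 floats (236 bytes) without them, and SH takes a bit over three quarters of the file.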
Why so much redundancy?
Not that the format is dumb. Hiding in those 750 MB are several flavors of redundancy, and each compression family is good at exposing a different one:
Redundancy type | What it means | Who exploits it
Bit-level | Each float is 32 bits, but really only needs 8-12. |
Note: the PSNR shown is a toy heuristic, not a real PSNR. The qualitative behavior is right, though: pruning hurts geometry, quantization hurts smoothness, SH truncation hurts highlights, and the losses compound roughly multiplicatively. Real methods are built to dodge each of these failure modes.
§6 The six knobs: overall taxonomy
Both the 3DGS.zip survey and the IEEE 2025 survey draw a subtle but useful distinction:
Compaction: Reduce the number of Gaussians (or substitute a stronger primitive). Bit length per primitive stays roughly fixed; there are just fewer of them. Examples: pruning, GES, Mini-Splatting, Reduced-3DGS.
Compression: Same number of Gaussians (more or less), but fewer bits per Gaussian. The renderer may decode back to the original count. Examples: quantization, entropy coding, SOG, anchor-based regeneration.
You can do both at once — most SOTA pipelines do. The six knobs below split the orthogonal levers so you can reason about combinations.
The six knobs in brief
① Pruning — "delete what doesn't matter." Most direct. Post-training you can usually drop 80-90% with no perceptible loss because adaptive density control overshoots. The interesting question is the scoring function: opacity, ray hits, Hessian log-determinant, $\max \alpha \cdot \tau$… Pure pruning typically buys $5\text{-}10\times$.
② Quantization — "fewer bits per number." Float32 is overkill for nearly every attribute. Main flavors: scalar, vector (VQ, including residual VQ), and learned latent. Adds another $5\text{-}10\times$ on top of pruning.
③ SH attack — "the 75% sub-problem." SH accounts for roughly 3/4 of the file. Three sub-strategies: per-Gaussian adaptive degree, distillation to lower degree, full replacement with a hash grid, SGs, or a factorized representation. Highest-leverage attack in the family.
④ Anchors — "decode dense from sparse." Scaffold-GS's foundational move: sparse anchors + tiny MLP generate nearby Gaussians at render time. The most architectural reformulation in the literature; most SOTA papers build on the Scaffold-GS backbone.
⑤ Entropy coding — "spend bits proportional to surprise." Shannon's old idea, combined with learned priors (hash-grid hyperpriors, autoregressive context, closed-form Laplace for SH...). Combined with anchors, this is what gets you below 10 MB without quality loss.
⑥ Industry formats — "reuse what works." SOG/SOGS (image codecs), SPZ (quantize + gzip), glTF KHR_gaussian_splatting (Khronos 2026 standard). The bridge between research methods and real deployment.
Don't quantize position aggressively. 1mm offsets render very differently. All methods keep position at $\geq 16$ bits, or store it differently (anchor offsets, octree codes).
VQ + entropy coding $\gg$ VQ alone. Index distributions are never uniform; throw an arithmetic coder on top and shave another $1.5\text{-}2\times$.
Anchors and image-codec sort don't stack well — anchors aren't a regular grid. Pick one structural reformulation.
Diffusion restoration is a different regime. You trade "decoder is a giant generative model" for "splat file is tiny." Whether that wins depends entirely on deployment.
§7 The size–PSNR Pareto in one chart
Before diving into individual families, the lay of the land: where each method sits on Mip-NeRF 360. The x-axis is file size (log scale, further left is better); the y-axis is PSNR (higher is better).
A few observations: vanilla 3DGS sits at ~734 MB / 27.4 dB; HAC++, ContextGS, CodecGS hover at 3–10 MB while matching or exceeding the baseline, so $70\text{--}100\times$ is already in production; SOG/SOGS is a notch larger but is what production runtimes actually ship; FCGS sits at the right edge of the entropy cluster, paying a size premium to skip per-scene training.
§Part 3 Pruning: which Gaussians can we just delete?
Vanilla 3DGS grows the Gaussian population during training: anywhere the loss has a high gradient, the densifier clones or splits. Intentionally aggressive — better too many than too few, hoping the opacity prune kills the freeloaders. Except it doesn't, really. The final 3M-Gaussian count includes a long tail of dim, tiny, or occluded ones that contribute nothing.
The compression question reduces to designing a scoring function.
What different scoring functions throw away
Random pruning destroys the silhouette before anything else; opacity-only gives up high-frequency detail first. The art is approximating the loss-Hessian without computing it.
The raw .ply is too big; we want to both prune and shrink the SH on what's left.
Key idea A three-stage pipeline:
(1) global-significance pruning by opacity $\times$ hit count $\times$ volume;
(2) SH distillation—a full-SH teacher supervises a degree 3→2 student on pseudo-views;
(3) VecTree quantization—Morton-order octree on positions, K=8192 codebook on lowest-significance SH, float16 on geometry.
The volume normalization (by the 90th-percentile Gaussian volume) is the load-bearing innovation: without it, the score is dominated by background blobs that cover many rays but encode no detail.
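A hedged sketch of the stage (1) score as described above: opacity times hit count times a volume term normalized by the 90th-percentile volume. The clamp to 1, the exponent `beta`, and all function names are assumptions for illustration, not the paper's exact formulation.

```python
def percentile(values, q):
    """Nearest-rank style percentile, q in [0, 100]."""
    ordered = sorted(values)
    return ordered[round(q / 100 * (len(ordered) - 1))]

def global_significance(opacities, hit_counts, volumes, beta=0.1):
    v90 = percentile(volumes, 90)
    scores = []
    for o, h, v in zip(opacities, hit_counts, volumes):
        v_norm = min(v / v90, 1.0) ** beta   # clamp: huge blobs stop gaining score
        scores.append(o * h * v_norm)
    return scores

def prune_mask(scores, prune_ratio):
    """True = keep. Drops roughly the lowest-scoring prune_ratio fraction."""
    cutoff_index = min(int(prune_ratio * len(scores)), len(scores) - 1)
    cutoff = sorted(scores)[cutoff_index]
    return [s >= cutoff for s in scores]

opacities = [0.9] * 10
hit_counts = [100] * 10
volumes = [1.0] * 9 + [1000.0]               # one huge background blob
scores = global_significance(opacities, hit_counts, volumes)
```

In this toy data the background blob is 1000 times larger than its neighbors yet ends up with the same score, which is the behavior the normalization is meant to buy.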
Many Gaussians are oversized blurry blobs that smear high-frequency regions; density control has given up.
Key idea Train a Zip-NeRF first as both teacher and initializer; then fit Gaussians to the teacher's clean renders; finally prune by max-contribution.
Max — not sum — is essential: a Gaussian seen only from one side shouldn't be penalized; that side might be the whole reason it exists. Score $h(p_i) = \max_r \alpha_i^r \tau_i^r$ over all training rays.
Mip-NeRF 360 PSNR 28.14 (vs 3DGS 27.20)
3.16M → 0.37M Gaussians (lightweight)
907 FPS
The only method that improves PSNR while pruning aggressively. "NeRF as teacher" became a standard pattern.
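The max-contribution score $h(p_i) = \max_r \alpha_i^r \tau_i^r$ is easy to sketch if you assume each training ray is available as a front-to-back list of (Gaussian id, alpha) pairs. That data layout and the 0.02 threshold are assumptions for this example.

```python
def max_contribution_scores(num_gaussians, rays):
    """rays: iterable of per-ray lists [(gaussian_id, alpha), ...], sorted front to back.

    Returns h[i] = max over rays of alpha_i * tau_i, where tau_i is the
    transmittance in front of Gaussian i on that ray.
    """
    h = [0.0] * num_gaussians
    for ray in rays:
        tau = 1.0
        for gid, alpha in ray:
            h[gid] = max(h[gid], alpha * tau)
            tau *= 1.0 - alpha
    return h

rays = [
    [(0, 0.9), (1, 0.8)],      # Gaussian 1 is mostly hidden behind Gaussian 0 here
    [(1, 0.8), (2, 0.05)],     # but fully visible on this ray
]
h = max_contribution_scores(3, rays)
keep = [score >= 0.02 for score in h]
```

Gaussian 1 contributes only 0.08 on the first ray but 0.8 on the second; taking the max keeps it, whereas an average over rays would rank it much lower.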
Trimming the Fat— Efficient Compression of 3D Gaussian Splats
Key idea Dual signal: keep Gaussian iff $|\alpha|$ AND $|\nabla\alpha|$ both exceed the $\gamma$-quantile. Low $\alpha$ = invisible; low gradient = loss no longer cares. Both low $\Rightarrow$ truly useless. Iterative prune + brief fine-tune.
734 → 119 MB at $\gamma=0.5$ ($\sim\!6\times$)
Aggressive mode: $\sim\!50\times$
600 FPS
Nearly free pruning: both signals already exist. The cheapest baseline in the literature.
Opacity heuristics don't actually tell you how much a Gaussian matters to the loss landscape.
Key idea Compute the log-determinant of the Hessian of the L2 loss w.r.t. each Gaussian's spatial parameters. High score = loss is sharp around this Gaussian — it really matters. $U_i = \log\det(\nabla_{x,s}I_G \cdot \nabla_{x,s}I_G^\top)$. At a converged model this equals Fisher information restricted to spatial params. One-shot 90% prune + brief fine-tune.
The most theoretically grounded score in the family, straight from second-order optimization. Trade-off: the per-Gaussian Hessian is more expensive than opacity heuristics.
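A toy version of the score $U_i$ for one Gaussian, assuming you already have the per-pixel gradients of the rendered image with respect to that Gaussian's spatial parameters. The sum of outer products is the Gauss-Newton form of the L2 Hessian; the small `eps` on the diagonal is an assumption added here so the determinant stays finite when few pixels are touched.

```python
import math

def log_det_spd(A):
    """log det of a symmetric positive-definite matrix via Cholesky factorization."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return 2.0 * sum(math.log(L[i][i]) for i in range(n))

def sensitivity_score(pixel_grads, eps=1e-8):
    """pixel_grads: one gradient vector per pixel, d(pixel) / d(spatial params)."""
    d = len(pixel_grads[0])
    # Gauss-Newton approximation of the Hessian: sum of gradient outer products.
    H = [[sum(g[a] * g[b] for g in pixel_grads) + (eps if a == b else 0.0)
          for b in range(d)] for a in range(d)]
    return log_det_spd(H)

strong = sensitivity_score([(2.0, 0.0), (0.0, 3.0)])   # image reacts strongly
weak = sensitivity_score([(0.2, 0.0), (0.0, 0.3)])     # image barely reacts
```

The example uses 2 parameters to stay readable; with position and scale the vectors would have 6 entries. Gaussians are then ranked by this score and the lowest-scoring ones are pruned.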
Taming 3DGS— High-Quality Radiance Fields with Limited Resources
Key idea Budget-aware density control during training. Composite score combines positional gradient$\times 50$ + opacity$\times 100$ + blending weights$\times 50$ + distance-to-center$\times 50$ + view saliency$\times 10$ + scale$\times 25$ + coverage$\times 0.1$ + depth$\times 5$. Each round only the top-scoring Gaussians clone/split until the user budget is hit. Population never explodes.
0.63M Gaussians vs 3.31M baseline ($\sim\!5\times$)
PSNR 27.31 vs 27.46
Training time: 11 min vs 43 min
Preventive medicine instead of post-hoc surgery. Pedagogically: compression can begin at training time, not just at the end.
GaussianSpa— Sparse 3D Gaussian Splatting via Optimization-based Pruning
Key idea Frame Gaussian-count reduction as a sparse optimization problem with an explicit $\ell_0$-like penalty on Gaussian "existence." ADMM-style alternating optimization. $10\text{--}15\times$ compression, no quality loss.
Mip-NeRF 360 ~17 MB / 27.4 dB
Brings classical sparse-optimization machinery (LASSO, ADMM) to 3DGS. "Compression as $\ell_0$ regularization" is a unifying view.
Practical recipe
# after vanilla 3DGS training:
score = compute_score(G, training_rays)        # max α·τ (RadSplat) or the PUP Hessian score
keep_mask = score >= percentile(score, 85)     # keep the top 15% by score
G = G[keep_mask]
finetune(G, photos, iters=5000)                # let survivors compensate
The 5k-step fine-tune is essential: even a perfect Hessian-based prune leaves a few percent on the table that survivors absorb in <10 min.
Diminishing returns past $\sim\!10\times$. Each Gaussian still costs 236 bytes; cut $10\times$, save $10\times$. To go further you must attack the per-Gaussian bit budget — quantization, SH compression, or anchors. Continue below.
§Part 4 Quantization: fewer bits per number
Quantization maps a continuous (or high-bit) value to a smaller discrete alphabet. For 3DGS, four flavors dominate:
Type | How | Used for
Scalar (SQ) | Round each float independently to N bits | opacity, log-scale, quaternion components
Vector (VQ) | K-means clustering, replace with index | SH coefficients, full covariance
Residual VQ (RVQ) | Cascade of codebooks; each stage codes the residual | geometry in Compact-3DGS (Lee)
Latent-space | Learn an encoder/decoder, quantize the latent |
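The simplest row of that table, uniform scalar quantization, fits in a few lines. This is a generic min/max quantizer written for illustration, not any particular paper's scheme; the sample values are made up.

```python
def quantize(values, bits):
    """Uniform scalar quantization to 2**bits levels over the observed range."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / step) for v in values]
    return codes, lo, step

def dequantize(codes, lo, step):
    return [lo + c * step for c in codes]

opacity_logits = [-4.2, -1.0, 0.3, 2.5, 6.1]
codes, lo, step = quantize(opacity_logits, bits=8)
restored = dequantize(codes, lo, step)
max_err = max(abs(a - b) for a, b in zip(opacity_logits, restored))
```

Each value now costs 8 bits instead of 32, plus two floats of side information per attribute, and the reconstruction error is bounded by half a quantization step.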
⚠ Naming mess: this is not the Lee Compact-3DGS, nor the Liu CompGS.
Key idea K-means VQ on SH coefficients and covariance, jointly with training (re-run K-means during the last 10K iters). Two codebooks: 4096 codes for color (SH), 16384 codes for covariance (scale+rotation). Position and opacity are not quantized — too sensitive. Indices Morton-sorted then RLE'd.
Adjacent Gaussians tend to fall into the same cluster, so RLE gets long runs for free.
Mip-NeRF 360 778 → 19 MB ($\sim\!41\times$)
PSNR 27.42 → 27.12
$2.5\times$ faster rendering
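The "Morton-sort then RLE" step can be sketched as follows, assuming positions have already been snapped to an integer grid and each Gaussian already carries a codebook index. The helper names and the tiny data set are illustrative.

```python
def morton3(x, y, z, bits=10):
    """Interleave the bits of three non-negative integer grid coordinates."""
    code = 0
    for b in range(bits):
        code |= ((x >> b) & 1) << (3 * b)
        code |= ((y >> b) & 1) << (3 * b + 1)
        code |= ((z >> b) & 1) << (3 * b + 2)
    return code

def run_length_encode(seq):
    """Collapse consecutive repeats into [value, run_length] pairs."""
    runs = []
    for v in seq:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

# (grid position, codebook index) pairs, listed in arbitrary order
gaussians = [((5, 5, 5), 7), ((0, 0, 0), 3), ((5, 4, 5), 7), ((1, 0, 0), 3), ((0, 1, 0), 3)]
ordered = sorted(gaussians, key=lambda g: morton3(*g[0]))
runs = run_length_encode([idx for _, idx in ordered])
unsorted_runs = run_length_encode([idx for _, idx in gaussians])
```

Sorting along the Morton curve puts spatial neighbors next to each other, so the five indices collapse to two runs instead of four.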
Compact-3DGS (Lee)— Compact 3D Gaussian Representation for Radiance Field
Key idea Drop SH entirely. View-dependent color comes from a shared Instant-NGP-style hash grid queried at the Gaussian's position + view direction through a tiny MLP. Scale & rotation get Residual VQ; opacity gets 8-bit scalar; hash-grid params get 8-bit + Huffman.
RVQ trick: a 256-code book captures coarse geometry, a second 256-code book captures the residual, and so on. Four stages cost $4 \times 8 = 32$ bits per Gaussian but are as expressive as a single codebook with $256^4 \approx 4$ billion entries.
$\gt 25\times$ compression on Mip-NeRF 360
~28–48 MB depending on settings
The most structurally aggressive quantization paper. The "shared hash grid for appearance" thread changed how everyone thinks about SH.
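Residual VQ itself is a short loop: quantize, subtract, repeat. The sketch below uses two hand-written 4-entry codebooks instead of learned 256-entry ones, purely for illustration.

```python
def nearest(codebook, v):
    """Index of the codebook entry closest to v in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - x) ** 2 for c, x in zip(codebook[i], v)))

def rvq_encode(v, codebooks):
    indices, residual = [], list(v)
    for cb in codebooks:
        i = nearest(cb, residual)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, cb[i])]   # what is still unexplained
    return indices

def rvq_decode(indices, codebooks):
    out = [0.0] * len(codebooks[0][0])
    for i, cb in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, cb[i])]
    return out

coarse = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fine = [[-0.2, 0.0], [0.2, 0.0], [0.0, -0.2], [0.0, 0.2]]
codebooks = [coarse, fine]
idx = rvq_encode([0.85, 0.05], codebooks)
approx = rvq_decode(idx, codebooks)
```

Two stages of 4 entries store 8 vectors in total yet can reproduce $4 \times 4 = 16$ distinct sums; the same multiplication is what makes four 256-entry stages so much cheaper than one giant codebook.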
EAGLES— Efficient Accelerated 3D Gaussians with Lightweight Encodings
Key idea Don't quantize raw attributes — quantize a learned latent, decode with a small MLP. SH color → 16-D latent; rotation → 8-D; opacity → 1-D. Position, scale, SH-DC stay full precision (too sensitive). Round latents in the forward pass; use straight-through estimator for backprop.
Analogy: VQ is a dictionary lookup; EAGLES is an autoencoder — trade a small MLP for low-bit integer latents that still cover the full attribute manifold.
Key idea Standard K-means weights all dimensions equally. But a slot whose change visibly distorts rendering should be closer to its codebook entry. Compute per-parameter sensitivity $= \partial\text{image}/\partial\text{param}$, weight K-means by it. Then quantization-aware fine-tune.
Up to $31\times$ compression (avg $26\times$)
Bicycle: 1.5 GB → 47 MB
Truck: 600 MB → 21 MB
$\sim\!4\times$ faster rendering
Sensitivity-weighting = natural generalization of importance sampling. RadSplat weights Gaussians, this weights attribute slots.
MesonGS— Post-training Compression of 3D Gaussians via Eulerian + RAHT
Key idea Borrow tools from MPEG point-cloud compression (G-PCC): RAHT wavelet transform, quaternion→Euler (3 numbers instead of 4), block scalar quantization + brief fine-tune.
RAHT (Region Adaptive Hierarchical Transform) is the layered wavelet from PCC standards — concentrates spatially-correlated attributes into DC + low-entropy AC residuals, like JPEG's DCT but on irregular point sets.
Mip-NeRF 360: 641.7 → 27.6 MB ($23.2\times$)
PSNR 28.98 → 28.61 ($-0.37$)
T&T: 421.9 → 17.0 MB, PSNR drop $-0.04$
Bridges 3DGS to the mature point-cloud compression world: a clean cross-pollination moment.
Key idea Don't use straight-through estimator. Instead, inject matched noise as a surrogate for the quantization step during training. Gradients flow naturally; at inference, the same codebook quantizes for real.
A small but important trick: classical VQ training has a chicken-and-egg problem (gradients can't flow through arg-min); STE is biased; matched-noise substitution is unbiased and dramatically better in practice.
Up to $45\times$ model-size reduction
Reduced-3DGS— Reducing the Memory Footprint of 3D Gaussian Splatting
Key idea Each Gaussian gets its own SH degree from {0,1,2,3}. Matte Gaussians on a wall only need degree 0 (one RGB triple); shiny ones keep degree 3. Then codebook-quantize the remainder.
$\sim\!27\times$ total compression
PSNR drop: 0.21 dB
Cleanest example of "not all pixels deserve the same bitrate." You'll see this idea again in the Anchor and SH chapters.
Key idea Given a freshly-trained 3DGS file, compress it in seconds, no fine-tuning. Attribute-discriminative pruning + INT8/INT4 channel-wise mixed-precision quantization + online adaptation. Mobile-deployable.
Up to 96.4% size reduction
<1 dB PSNR drop
The fastest practical compressor: a usable file in seconds. Critical for capture-then-share workflows.
Lessons from the quant chapter
Position is sacred — every paper keeps it $\geq 16$ bits or replaces with anchor offset / octree code.
VQ + QAT is almost always worth it — re-running K-means in the last 10-20% of training lets the network compensate.
Entropy-coding indices is free lunch — VQ indices are never uniform. Another $1.5\text{-}2\times$ for free.
SH dominates everything. Most gains in this chapter come from smarter SH compression. The SH chapter is next.
§Part 5 Spherical Harmonics: the 75% sub-problem
Recall: $48 \text{ SH coefficients per Gaussian} \times 4 \text{ bytes} \times 3\text{M Gaussians} \approx 576$ MB, about 75% of the file. Anything you do to SH has $4\text{-}5\times$ the leverage of equivalent work elsewhere.
Interestingly, most coefficients are near zero. Most surfaces are nearly diffuse — degree 0 is enough. Only specular highlights and view-dependent reflections actually need the high bands.
Higher-band coefficients are peaked near zero, which makes them perfect for entropy coding or aggressive pruning.
Covered in §Part 3. SH distillation: take the full-SH model as teacher, train a low-degree student on pseudo-view-jittered renderings. The student learns to fake the high-band specular look with limited coefficients.
Same trick as classical Hinton-style knowledge distillation, ported to view-dependent appearance.
Covered in §Part 4. Drop SH entirely; view-dependent color comes from a shared hash grid + tiny MLP. Per-Gaussian SH cost: 0 floats. Shared: ~30 MB hash grid + ~100 KB MLP, amortized over millions of Gaussians.
Effectively = NeRF appearance + 3DGS geometry. The cleanest fusion of the two worlds.
Key idea Swap the primitive: $\exp(-x^2)$ → $\exp(-|x|^\beta)$ with learnable $\beta$. Sharper edges need fewer primitives → indirectly saves SH.
Strictly speaking this isn't SH compression — it's "primitive redesign that shrinks the SH footprint as a side effect." Often filed with SH methods but really belongs to the compaction axis.
Key idea Treat the giant $(N_\text{gaussians} \times N_\text{attr})$ matrix as low-rank. CP decomposition (rank $\sim\!16$) reconstructs each Gaussian's attributes from a few axis-wise factor vectors.
The TensoRF lineage: a 3D tensor factorizes as a sum of outer products of 1D vectors. For 3DGS, the tensor is "Gaussian $\times$ attribute"; rank is much smaller than either dimension.
Key idea Replace 48 SH coefficients with ~10 Spherical Gaussian lobes. An SG is $\exp(\lambda(\mathbf{d}\cdot\boldsymbol\mu - 1))$ — a directional Gaussian on the sphere centered at $\boldsymbol\mu$ with sharpness $\lambda$.
SH is a global basis (each band has support over the whole sphere); SGs are local (each lobe only lights up near its center). For typical appearance (few highlights, mostly diffuse), SGs are a much more compact representation.
2000s graphics already used SGs for precomputed radiance transfer (Tsai & Shih 2006); this is the rediscovery in the 3DGS era.
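Evaluating a Spherical Gaussian lobe takes one dot product and one exponential. The sketch below follows the formula above; combining lobes additively on top of a diffuse base color is an assumption for illustration, as are the function names.

```python
import math

def sg_lobe(d, axis, sharpness):
    """exp(sharpness * (d . axis - 1)); d and axis are unit vectors."""
    cos_angle = sum(a * b for a, b in zip(d, axis))
    return math.exp(sharpness * (cos_angle - 1.0))

def sg_color(d, diffuse, lobes):
    """diffuse: (r, g, b) base color; lobes: list of (axis, sharpness, (r, g, b))."""
    color = list(diffuse)
    for axis, sharpness, rgb in lobes:
        w = sg_lobe(d, axis, sharpness)
        color = [c + w * k for c, k in zip(color, rgb)]
    return color

highlight = ((0.0, 0.0, 1.0), 20.0, (0.6, 0.6, 0.6))
facing = sg_color((0.0, 0.0, 1.0), (0.2, 0.1, 0.1), [highlight])
away = sg_color((0.0, 0.0, -1.0), (0.2, 0.1, 0.1), [highlight])
```

Looking straight down the lobe axis adds the full highlight; looking from the opposite side adds essentially nothing, which is the locality that SH lacks.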
Key idea Plot the SH AC coefficient histogram. It's almost exactly Laplace. So just fit a Laplace, arithmetic-code under it. No context model, no MLP — just a closed-form distribution.
Other attributes (rotation, scale, opacity) use Gaussian mixtures. Per-Gaussian distribution parameters drive the arithmetic coder.
$\sim\!30\times$ rate reduction
$10\times$ model size
Fast decode (no neural net)
A pedagogical gem: 100-year-old statistics still works. Sometimes you don't need a neural network.
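A sketch of the closed-form idea: fit a zero-mean Laplace to already-quantized coefficients, then read off the ideal code length of each symbol from the probability mass in its bin. The zero-mean assumption, the unit bin width, and the toy data are choices made for this example.

```python
import math

def fit_laplace_scale(values):
    """Maximum-likelihood scale b of a zero-mean Laplace: the mean absolute value."""
    return sum(abs(v) for v in values) / len(values)

def laplace_cdf(x, b):
    return 0.5 * math.exp(x / b) if x < 0 else 1.0 - 0.5 * math.exp(-x / b)

def bits_for_symbol(q, b, step=1.0):
    """Ideal code length of integer symbol q under Laplace(0, b) with bin width step."""
    p = laplace_cdf((q + 0.5) * step, b) - laplace_cdf((q - 0.5) * step, b)
    return -math.log2(max(p, 1e-12))

coeffs = [0, 0, 1, 0, -1, 0, 0, 2, 0, 0, -3, 0, 1, 0, 0, 0]
b = fit_laplace_scale(coeffs)
avg_bits = sum(bits_for_symbol(q, b) for q in coeffs) / len(coeffs)
```

An arithmetic coder driven by these probabilities would approach `avg_bits` per coefficient, well below a fixed 8-bit code, because the many zeros are cheap and only the rare large values are expensive.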
Strategy | Mechanism | Representative | Leverage
Degree adapt | per-Gaussian degree | Reduced-3DGS | $\sim\!3\times$
Distillation | teacher–student | LightGaussian | $\sim\!2\times$
Replace w/ field | shared hash grid | Compact-3DGS (Lee) | $\sim\!5\times$
Change basis | SG lobes / CP factor | SG-Splatting, F-3DGS | $\sim\!3\text{--}8\times$
Entropy | Laplace / GMM | EntropyGS | $\sim\!3\text{--}4\times$
§Part 6 Anchor-based: store sparse, decode dense
Walk through a 3DGS scene and one fact is obvious: neighboring Gaussians have nearly identical attributes. A coffee tabletop is covered by hundreds of Gaussians all saying the same thing: "I'm brown, flat, matte." Storing 59 floats per Gaussian for every one of them is staggeringly wasteful.
What if you stored attributes at sparse anchor points (voxel grid) and let a tiny MLP regenerate the local Gaussians on demand? That's the Scaffold-GS bet — and it changed the field.
Scaffold-GS— Structured 3D Gaussians for View-Adaptive Rendering
CVPR 2024 Highlight · Lu, Yu et al. · CUHK / Shanghai AI Lab · arXiv:2312.00109
Key idea Don't store each Gaussian. Scatter sparse anchors on a voxel grid; each anchor carries a feature vector (e.g. 32-D) and $k$ learnable offset vectors (e.g. $k=10$). At render time, an MLP takes (anchor feature, view direction, view distance) → predicts opacity, color (no SH — direct RGB), scale, rotation of $k$ neural Gaussians at anchor_pos + offset_i. Anchors grow/prune dynamically during training.
Stored on disk: anchor positions + features + scaling + $k$ offsets + MLP weights (~few hundred KB).
Regenerated at render time: every Gaussian's attributes.
View-dependent trick: the MLP takes view direction as input — no SH needed. That's why Scaffold-GS often improves PSNR: the MLP expresses smoother view-dependence than degree-3 SH.
Mip-NeRF 360: 734 → 156 MB ($\sim\!5\times$)
PSNR: 27.4 → 27.50 (improved!)
Deep Blending: 676 → 66 MB
The most important paper after vanilla 3DGS. Not really a compression paper but a reformulation of what a 3DGS scene is. Almost every SOTA paper (HAC, HAC++, ContextGS, CompGS, GaussianForest) builds on the Scaffold-GS backbone.
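The render-time decode can be sketched for a single anchor. Everything here is a simplified assumption: the trained MLP is replaced by a hypothetical stand-in `toy_mlp`, attributes are plain dicts, and positions are just anchor plus offset as described above.

```python
import math

def decode_anchor(anchor_pos, feature, offsets, cam_pos, mlp):
    """Regenerate the k neural Gaussians owned by one anchor for one camera."""
    delta = [a - c for a, c in zip(anchor_pos, cam_pos)]
    dist = math.sqrt(sum(d * d for d in delta))
    view_dir = [d / dist for d in delta]
    # The MLP sees (anchor feature, view direction, view distance).
    attrs = mlp(feature + view_dir + [dist])
    gaussians = []
    for off, a in zip(offsets, attrs):
        pos = [p + o for p, o in zip(anchor_pos, off)]
        gaussians.append({"position": pos, **a})
    return gaussians

def toy_mlp(x, k=2):
    """Hypothetical stand-in for the trained MLP; returns k attribute dicts."""
    brightness = 1.0 / (1.0 + math.exp(-sum(x) / len(x)))
    return [{"opacity": brightness, "color": (brightness,) * 3,
             "scale": (0.01, 0.01, 0.01), "rotation": (1.0, 0.0, 0.0, 0.0)}
            for _ in range(k)]

decoded = decode_anchor([0.0, 0.0, 0.0], [0.1, 0.2, 0.3, 0.4],
                        [[0.1, 0.0, 0.0], [0.0, 0.2, 0.0]],
                        cam_pos=[0.0, 0.0, -3.0], mlp=toy_mlp)
```

Only the anchor position, feature, and offsets are stored per anchor; the opacity, color, scale, and rotation of every Gaussian are recomputed for each camera, which is also how view dependence enters without SH.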
Key idea Put Scaffold-GS anchors in an octree; choose fractional LOD per pixel based on camera footprint. Distant pixels decode only coarse levels. Cumulative LOD across octree levels; a learnable per-anchor LOD bias supplements high-freq regions.
Mip-NeRF 360 PSNR 28.05
Deep Blending: 30.49 / 112 MB
The bridge between anchor compression and "scalable rendering for huge scenes." Size comparable to Scaffold-GS; the win is rendering consistency across view distances.
Key idea Organize Gaussians into trees: leaves store rapidly-varying attributes (position, opacity) explicitly; internal nodes hold shared features that an MLP decodes into smoothly-varying attributes (rotation, scale, color) for many leaves.
The clean pedagogical statement: store what varies a lot, share what varies smoothly.
Key idea Replace per-Gaussian attributes with a multi-level tri-plane (three 2D feature grids); a tiny MLP decodes each Gaussian by position lookup. The tri-plane is a 2D smooth field, designed to be compressible by off-the-shelf image codecs. Positions are stored separately for lossless coding.
Shows how flexible the "anchors → tiny MLP → Gaussians" template is: the "anchor" can be a point, voxel, hash bucket, tri-plane sample, or tree node.
CompGS (Liu)— Compressed Gaussian Splatting via Hybrid Primitives
ACM MM 2024
Liu et al. · arXiv:2404.09458 · ⚠ Different from Compact3D (Navaneet) and Compact-3DGS (Lee).
Key idea Two kinds of primitives: a few anchor primitives with full geometry + many coupled primitives that store only tiny residuals predicted from the anchor. Rate-distortion loss $\lambda R + D$ + hyperprior Gaussian entropy models drive quantization.
Exact video codec analogy: anchors = I-frames, coupled = P-frames, only delta stored.
压缩比Compression: $45\text{--}175\times$
Mip-NeRF 360 ~17 MB / 27.26 PSNR
Playroom: 550 → 5 MB
Smol-GS— Compact Splat via Octree Positional Encoding
Key idea Compact representation around octree positional encoding + learned per-splat features. Recursive voxel hierarchy for coordinates; entropy-based feature compression. Mip-NeRF 360: 4.87 MB / 27.61 PSNR — among the smallest reported in late 2025.
Why anchors work so well
Local redundancy is the dominant redundancy. Anchors make this explicit — one feature serves $k=10$ neighbors.
The MLP encodes view-dependence more compactly than SH. One shared MLP (taking view direction as input) beats $48 \text{ SH coefficients} \times 3\text{M Gaussians}$.
Anchors are great entropy-coding targets. Sitting on a structured grid (voxel/octree), they enable spatial context models — predict each anchor from neighbors. This is what HAC, HAC++, ContextGS exploit in the next chapter to push compression past $100\times$.
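A rough back-of-the-envelope calculation shows the scale of the saving on color storage alone. The 3M Gaussians, 48 SH coefficients, and $k=10$ come from the text; the float32 storage, the 32-dimensional anchor feature, and the 10,000-parameter MLP are illustrative assumptions rather than figures from any specific paper.

```python
# Back-of-the-envelope comparison; the feature width and MLP size are
# illustrative assumptions, not figures from any specific paper.

BYTES_PER_FLOAT = 4          # assuming uncompressed float32 storage
NUM_GAUSSIANS = 3_000_000    # "3M Gaussians" from the text
SH_COEFFS = 48               # per-Gaussian SH coefficients from the text
K = 10                       # Gaussians served by one anchor (from the text)
FEATURE_DIM = 32             # hypothetical anchor feature width
MLP_PARAMS = 10_000          # hypothetical size of the shared decoder MLP

sh_bytes = NUM_GAUSSIANS * SH_COEFFS * BYTES_PER_FLOAT
num_anchors = NUM_GAUSSIANS // K
anchor_bytes = (num_anchors * FEATURE_DIM + MLP_PARAMS) * BYTES_PER_FLOAT

print(f"per-Gaussian SH : {sh_bytes / 1e6:.1f} MB")
print(f"anchor features : {anchor_bytes / 1e6:.1f} MB")
print(f"ratio           : {sh_bytes / anchor_bytes:.1f}x")
```

Under these assumptions the color payload drops from 576 MB to about 38 MB, roughly $15\times$, before any quantization or entropy coding is applied.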
§Part 7 Entropy coding: where the SOTA lives
If quantization is "round each number," entropy coding is "spend bits proportional to surprise." Combined with anchors, this is what pushes 3DGS past $100\times$ without quality loss.
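To transmit a long string where 'a' occurs 90% of the time and 'b' 10%, you wouldn't waste a byte per letter. A good coder needs well under 1 bit per letter on average (about 0.47), because only the rare 'b' is surprising. Shannon's source coding theorem makes this precise: the ideal code length for a symbol $x$ is its negative log probability, and the expected value of that length (the entropy $H$) is the minimum average number of bits per symbol:
$$\ell(x) = -\log_2 p(x), \qquad H = -\sum_x p(x)\,\log_2 p(x)$$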
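The two-letter example can be checked numerically. This is a minimal sketch using only the standard library; the function names are chosen here for illustration.

```python
import math


def self_information(p):
    """Ideal code length in bits for a symbol of probability p."""
    return -math.log2(p)


def entropy(probs):
    """Average bits per symbol for a memoryless source (Shannon entropy)."""
    return sum(p * self_information(p) for p in probs if p > 0)


probs = {"a": 0.9, "b": 0.1}
bits_a = self_information(probs["a"])   # common symbol: cheap
bits_b = self_information(probs["b"])   # rare symbol: expensive
avg_bits = entropy(probs.values())

print(f"a: {bits_a:.3f} bits, b: {bits_b:.3f} bits, average: {avg_bits:.3f} bits")
```

The common 'a' costs about 0.15 bits and the rare 'b' about 3.3 bits, for an average near 0.47 bits per letter. Fractional bits per symbol are why arithmetic coding, rather than a fixed per-symbol code, is the usual back end.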
Key idea Take Scaffold-GS anchors. Jointly train a multiresolution binary hash grid. Each anchor queries the hash grid at its position → feature; a tiny MLP_c outputs $(\mu, \sigma)$ of a Gaussian distribution for each attribute; arithmetic-code the anchor under that predicted distribution.
This is the learned image compression framework (Ballé 2017/2018) ported to 3DGS — hash grid = hyperprior, MLP translates hyperprior into per-anchor distribution. The entropy is explicitly minimized as a training loss ($\lambda R + D$).
"Binary" = hash grid values quantized to $\{-1, +1\}$ — the hyperprior itself is essentially free.
Mip-NeRF 360: 15.3 / 21.9 MB (low/high)
PSNR: 27.53 / 27.77
Deep Blending: 4.4 MB / 29.98 PSNR
$75\times$ over 3DGS, $11\times$ over Scaffold-GS, no PSNR drop. The blueprint for all the entropy-coding work that followed.
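To see why a good $(\mu, \sigma)$ prediction saves bits, here is a minimal sketch of the rate estimate used in Ballé-style learned compression: the probability mass of the quantization bin under the predicted Gaussian, converted to bits. The hash grid and MLP_c are not reproduced; `mu` and `sigma` are typed in by hand, and `estimated_bits` is a name made up for this sketch.

```python
import math


def gaussian_cdf(x, mu, sigma):
    """CDF of a normal distribution N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def estimated_bits(value, mu, sigma, step=1.0):
    """Ideal code length for `value` quantized to a bin of width `step`.

    The bin's probability mass under N(mu, sigma^2) is what an arithmetic
    coder would be handed; -log2 of that mass is the cost in bits.
    """
    q = round(value / step) * step                       # quantized value
    mass = gaussian_cdf(q + step / 2, mu, sigma) - gaussian_cdf(q - step / 2, mu, sigma)
    return -math.log2(max(mass, 1e-12))                  # clamp to avoid log(0)


# Same attribute value, two different hand-picked predictions.
good_prediction = estimated_bits(value=3.0, mu=3.1, sigma=0.5)
poor_prediction = estimated_bits(value=3.0, mu=0.0, sigma=0.5)

print(f"good prediction: {good_prediction:.2f} bits")
print(f"poor prediction: {poor_prediction:.2f} bits")
```

A prediction centered near the true value costs well under one bit, while a confident but wrong one costs more than twenty. Summing this quantity over all attributes gives the $R$ term of the $\lambda R + D$ loss; learned-compression pipelines typically replace the hard rounding with additive uniform noise during training so that the term stays differentiable.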
Key idea On top of HAC: (1) intra-anchor context — the $k$ sibling Gaussians of one anchor predict each other; (2) per-attribute adaptive quantization; (3) a learnable mask drops useless Gaussians during training.
Intra-anchor fills HAC's gap: the $k=10$ Gaussians from one anchor are obviously correlated (shared parent feature), but HAC didn't exploit it. HAC++ adds autoregressive coupling among siblings on top of the hash-grid prior.
Mip-NeRF 360: 8.7 MB / 27.60 PSNR
Tanks & Temples: 5.4 MB / 24.22 PSNR
Deep Blending: 3.1 MB / 30.16 PSNR
$\gt 100\times$ over 3DGS, $\gt 20\times$ over Scaffold-GS
First to break 1 MB per scene on Deep Blending while improving PSNR: the cleanest demonstration that compression is essentially solved in the high-quality regime.
ContextGS— Compact 3DGS with Anchor-Level Context Model
Key idea Where HAC predicts each anchor from a hash-grid feature, ContextGS predicts each anchor from its already-coded neighbors. Like PixelCNN for 3D anchors.
Split anchors into hierarchical levels (coarse → medium → fine). Coarsest first, coded under a small hyperprior. Once decoded, those anchors serve as context for the next level's distribution, arithmetic-coded. And so on. Directly exploits inter-anchor spatial redundancy — which HAC exploits only indirectly via the hash grid.
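The benefit of coding coarse levels first can be illustrated on a toy signal. This sketch is not ContextGS: it uses a smooth 1D sequence instead of sparse 3D anchors, two levels instead of three, and plain neighbor averaging instead of a learned context network. It only compares the empirical entropy of fine-level symbols with and without the already-decoded coarse neighbors as context.

```python
import math
from collections import Counter


def empirical_entropy(symbols):
    """Zeroth-order entropy in bits/symbol of an integer sequence."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


# A smooth 1D stand-in for spatially coherent anchor attributes (assumption:
# real anchors live on a sparse 3D grid; 1D keeps the sketch short).
values = [round(40 * math.sin(i / 20.0)) for i in range(400)]

coarse = values[0::2]   # level 0: coded first, without context
fine = values[1::2]     # level 1: coded after level 0 has been decoded

# Predict each fine sample from its two already-decoded coarse neighbors
# and keep only the prediction error.
residuals = []
for j, v in enumerate(fine):
    left = coarse[j]
    right = coarse[j + 1] if j + 1 < len(coarse) else coarse[j]
    residuals.append(v - round((left + right) / 2))

bits_without_context = empirical_entropy(fine)
bits_with_context = empirical_entropy(residuals)

print(f"fine level, no context  : {bits_without_context:.2f} bits/symbol")
print(f"fine level, with context: {bits_with_context:.2f} bits/symbol")
```

Because the decoder sees the coarse level before the fine one, it can form the same prediction, so only the near-zero residuals need to be transmitted.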
Mip-NeRF 360 low: 13.3 MB / 27.62 PSNR
Deep Blending: ~6 MB / 30.09 PSNR
The cleanest "3DGS as image codec" formulation: the classical autoregressive context model applied to a sparse 3D structure.
Key idea Lay 3DGS attributes on 2D feature planes; run HEVC — the same codec that streams Netflix. Get 25 years of video-codec engineering for free, with hardware decoders included.
HEVC has intra prediction, transform coding, CABAC, rate control — why reinvent? Train the feature planes to be HEVC-friendly (frequency-domain entropy model aligned with what HEVC will do).
Mip-NeRF 360: 10.3 MB / 27.30 PSNR
Tanks & Temples: 7.8 MB / 23.63 PSNR
Demonstrates that compression can be fully decoupled from a custom learned codec. Hardware HEVC decoders have shipped in most phones since the mid-2010s.
Key idea Every method above re-trains a neural codec on each new scene (minutes per scene). FCGS doesn't: a single pretrained network compresses any 3DGS scene in one forward pass — ~1 second per 100K Gaussians.
Entropy module is a 3-component Gaussian Mixture conditioned on: (1) hyperprior $h$ (coarse scene-wide latent), (2) inter-Gaussian context $s$ (grid-interpolated from already-decoded Gaussians), (3) intra-Gaussian context $c$ (within-Gaussian channel chunks). Multi-path entropy module routes each attribute to a different rate-constraint path.
Mip-NeRF 360 low: 36.3 MB / 27.05 PSNR
$\gt 20\times$ compression in seconds
no per-scene training
The amortization turning point, where compression becomes inference. $\sim\!4\times$ larger than HAC++ on Mip-NeRF 360 (36.3 MB vs. 8.7 MB) but $\sim\!100\times$ faster. Mirrors the trajectory of learned image codecs from per-image training to amortized inference.
Key idea One bitstream, multiple decode quality levels. Clients can stop early for a decent preview; reading more refines. Progressive JPEG for 3DGS.
Progressive anchor masking (add anchors as you decode more) + progressive quantization (step size shrinks as more bits arrive). Single training, multiple bitrates extractable at decode time.
The first true progressive 3DGS bitstream: a natural successor to HAC++ for streaming and bandwidth-adaptive AR/VR.
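Progressive quantization can be sketched as successive refinement: each layer quantizes whatever the previous layers left over, with half the step size. This is a generic illustration under that assumption, not PCGS's actual scheme, and the progressive anchor masking is left out; `encode_progressive` and `decode_progressive` are names invented for the sketch.

```python
def encode_progressive(value, base_step=1.0, num_layers=4):
    """Split one value into integer layers, halving the step at each layer."""
    layers, approx, step = [], 0.0, base_step
    for _ in range(num_layers):
        q = round((value - approx) / step)   # quantize what is still missing
        layers.append(q)
        approx += q * step
        step /= 2.0
    return layers


def decode_progressive(layers, base_step=1.0):
    """Reconstruct from however many layers have arrived so far."""
    approx, step = 0.0, base_step
    for q in layers:
        approx += q * step
        step /= 2.0
    return approx


value = 2.718
layers = encode_progressive(value)

# Error after receiving 1, 2, 3, 4 layers of the same bitstream.
errors = [abs(value - decode_progressive(layers[:n])) for n in range(1, len(layers) + 1)]
print(layers, [round(e, 3) for e in errors])
```

A client that stops after the first layer already has a usable value, and every extra layer can only keep or reduce the error, which is the property a single-training, multi-bitrate stream relies on.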
Key idea Sort Gaussians along a Morton (Z-order) curve: nearby in the bitstream $\Leftrightarrow$ nearby in 3D. Then exploit the resulting coherence with a neural field + adaptive SH bandwidth.
$54.6\times \text{--} 96.6\times$ compression
$2.1\text{--}2.4\times$ rendering speed-up
Morton ordering dates back to 1966 and is a long-standing graphics trick (for example, in texture caching). Combined with modern entropy coding it becomes a lever worth close to $100\times$.
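For reference, a Morton key is just the bits of the quantized x, y, z coordinates interleaved. The sketch below uses the standard bit-spreading masks for 10 bits per axis; the 10-bit grid and the `quantize` helper are assumptions made for this example, not details taken from any particular paper.

```python
def part1by2(n):
    """Spread the low 10 bits of n so that two zero bits follow each bit."""
    n &= 0x000003FF
    n = (n ^ (n << 16)) & 0xFF0000FF
    n = (n ^ (n << 8)) & 0x0300F00F
    n = (n ^ (n << 4)) & 0x030C30C3
    n = (n ^ (n << 2)) & 0x09249249
    return n


def morton3d(x, y, z):
    """Interleave three 10-bit integer coordinates into one 30-bit key."""
    return part1by2(x) | (part1by2(y) << 1) | (part1by2(z) << 2)


def quantize(p, lo, hi, bits=10):
    """Map a float in [lo, hi] to an integer grid cell in [0, 2**bits - 1]."""
    cells = (1 << bits) - 1
    return min(cells, max(0, int((p - lo) / (hi - lo) * cells)))


# Two clusters of Gaussian centers, deliberately listed out of order.
points = [(0.9, 0.9, 0.9), (0.1, 0.1, 0.1), (0.12, 0.1, 0.1), (0.9, 0.88, 0.9)]
keys = [morton3d(*(quantize(c, 0.0, 1.0) for c in p)) for p in points]
order = sorted(range(len(points)), key=keys.__getitem__)
print(order)
```

After sorting by key, the two points near 0.1 end up adjacent and so do the two near 0.9, so consecutive entries in the bitstream tend to have similar attributes.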
| Need | Use |
| --- | --- |
| Absolute smallest file, can train per scene | HAC++ or ContextGS |
| Small + hardware decoder | CodecGS (HEVC) or SOG/SOGS |
| Compress in seconds, not minutes | FCGS (feed-forward) |
| Streaming adaptive bitrate | PCGS |
| Minimal code, fast decode | EntropyGS (parametric) |
§Part 8 Industry formats: what your phone actually receives
HAC++ gives you a 3 MB file. Now what? Real deployment needs:
a cross-platform binary that doesn't assume CUDA;
a decoder that runs in WebGL/WebGPU/Metal/Vulkan in milliseconds;
a container that slots into 3D pipelines (glTF, USD);
ideally, a standard all engines agree on.
None of which is what an academic paper ships. 2024–2026 has crystallized a small set of contenders.
Key idea Sort $N$ Gaussians onto a $\sqrt N \times \sqrt N$ 2D grid so that grid-neighbors have similar attributes. Each attribute (position-x, scale-y, SH-coeff-k…) becomes a smooth 2D image. Save with PNG/JPEG-XL/WebP/AVIF — let the image codec do the entropy coding.
Sorting algorithm: PLAS (Parallel Linear Assignment Sorting) — a custom GPU algorithm that assigns $N$ high-D vectors to a 2D grid in seconds, optimizing local smoothness.
The most pedagogically elegant idea in the whole field. Make the unstructured problem structured; existing tools solve it.
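The effect of the 2D layout on smoothness can be measured with a toy experiment. PLAS itself is not reproduced here: the sketch swaps in a naive global sort with snake-order filling on a single scalar attribute, purely as a stand-in, and `neighbor_roughness` is a metric defined for this example.

```python
import random


def to_grid(values, side):
    """Reshape a flat list of side*side values into a list of rows."""
    return [values[r * side:(r + 1) * side] for r in range(side)]


def neighbor_roughness(grid):
    """Mean absolute difference between horizontally and vertically adjacent cells."""
    total, count = 0.0, 0
    for r, row in enumerate(grid):
        for c, v in enumerate(row):
            if c + 1 < len(row):
                total += abs(v - row[c + 1])
                count += 1
            if r + 1 < len(grid):
                total += abs(v - grid[r + 1][c])
                count += 1
    return total / count


random.seed(0)
side = 16
attr = [random.random() for _ in range(side * side)]  # one scalar attribute per Gaussian

unsorted_grid = to_grid(attr, side)

# Naive stand-in for PLAS: sort globally, then fill rows in a snake pattern so
# that the end of one row sits next to the start of the following row.
rows = to_grid(sorted(attr), side)
sorted_grid = [row if r % 2 == 0 else row[::-1] for r, row in enumerate(rows)]

rough_before = neighbor_roughness(unsorted_grid)
rough_after = neighbor_roughness(sorted_grid)
print(f"roughness before: {rough_before:.3f}, after: {rough_after:.3f}")
```

A smoother image has less high-frequency content, which is what PNG, WebP, and similar codecs compress well. The real problem is harder because every Gaussian carries many attributes that must all be smooth under the same layout, which is why a dedicated assignment algorithm is needed.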
Key idea SOG productionized. Each attribute is a WebP texture bundled in an archive. Decoded natively by the browser; loaded straight to the GPU in Morton order.
SOG v2 (late 2025) adds: Morton ordering for GPU-friendly loading, WebGPU-only encoder (no CUDA needed to write .sog), single self-contained archive.
SPZ is the "boring baseline that actually works": no ML, just careful engineering. It is used by Scaniverse for on-device capture and is now the official compression payload of Khronos KHR_gaussian_splatting_compression_spz.
Key idea A "trimmed PLY" binary: 8-bit SH, struct layout matching the Three.js renderer's internal Gaussian. Load bytes, point a buffer view, render — zero transformation.
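Quantizing SH coefficients to 8 bits amounts to a uniform scalar quantizer. The sketch below is a generic version under assumed bounds of $[-1, 1]$; it does not reproduce KSPLAT's actual byte layout or value ranges, and the function names are made up for illustration.

```python
def quantize_u8(values, lo, hi):
    """Map floats in [lo, hi] to integers 0..255 (values outside are clamped)."""
    scale = 255.0 / (hi - lo)
    return bytes(min(255, max(0, round((v - lo) * scale))) for v in values)


def dequantize_u8(data, lo, hi):
    """Inverse mapping back to floats; the error is at most half a step."""
    step = (hi - lo) / 255.0
    return [lo + b * step for b in data]


sh = [0.31, -0.72, 0.05, 0.98, -1.0]         # hypothetical SH coefficients
packed = quantize_u8(sh, lo=-1.0, hi=1.0)    # 5 bytes instead of 20 as float32
restored = dequantize_u8(packed, lo=-1.0, hi=1.0)
max_err = max(abs(a - b) for a, b in zip(sh, restored))
print(len(packed), f"{max_err:.4f}")
```

The payload shrinks by $4\times$ relative to float32, and the worst-case error is half a quantization step (about 0.004 for this range). Decoding is a single multiply-add per value, which fits the format's emphasis on load speed.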
Multiple compression levels; the most aggressive level quantizes SH to 8 bits. Emphasizes decode speed over absolute size.
"Small on disk" $\ne$ "fast on GPU." KSPLAT is bigger than SOGS but loads faster.
glTF KHR_gaussian_splatting— Khronos Industry Standard
Standard · Release candidate Feb 2026
Key idea The first cross-vendor 3DGS-in-glTF standard. SPZ is the official compression payload; design is algorithm-agnostic so future codecs can drop in.
Two extensions:
KHR_gaussian_splatting defines the uncompressed structure: position, rotation, scale, opacity, SH split into diffuse (deg 0) + specular (deg 1-3).
KHR_gaussian_splatting_compression_spz wraps SPZ-compressed blobs as glTF buffers.
Backed by Niantic, Cesium/Bentley, Esri, OGC, and Khronos. The analog of the moment JPEG got written into a web spec.
MPEG-GSC— Gaussian Splatting Coding (ISO Future Standard)
Key idea The same standards body that gave us MPEG-2 and HEVC now treats 3DGS as a first-class media type. Goal: interoperable ISO codec, with HEVC/VVC as starting points. Reference standard expected 2027-2028.
glTF + SPZ solves asset distribution today; MPEG-GSC will solve codec interop for broadcast/streaming.
Need
Format
Smallest, browser-delivery, per-scene OK
Until late 2024, every 3DGS compressor required per-scene training. FCGS broke that. Going forward, expect dominant codecs to be single pretrained networks that compress any scene in one forward pass.
FCGS (ICLR 2025, arXiv 2410.08017) — first feed-forward 3DGS compressor. $\sim\!20\times$ compression in seconds.
FCGS+ / Long-Context FCGS (arXiv 2512.00877, 2025) — Morton serialization to build thousands-of-Gaussian context windows. SOTA among generalizable codecs.
D-FCGS (arXiv 2507.05859, 2025) — feedforward dynamic-3DGS for free-viewpoint video.
Why it matters: per-scene training is the bottleneck for capture-to-share workflows. Compressing in seconds rather than around 10 minutes qualitatively changes the UX.
Key idea Push to $1000\times$ by destroying most detail, then resurrecting with an artifact-aware one-step diffusion decoder. The line between compression and synthesis blurs.
Direction 3: 4D / dynamic compression
Once 4DGS / 3DGStream made dynamic scenes possible, the natural question: how to compress a video of splats? Video codecs do this for pixels; the splat analog is active research.
4DGC (arXiv 2503.18421) — rate-aware streamable, $\sim\!16\times$ smaller than 3DGStream
Conceptual pattern across all of them: canonical-space + deformation. Store the scene once at $t=0$, then store low-bandwidth deformation fields (or anchor motion) per frame. Same as the MPEG I-frame + P-frame story.
Direction 4: Mobile / on-device
Most current SOTA methods assume a 24 GB datacenter GPU. This line of work aims to make 3DGS run well on a battery-powered phone:
3DGauCIM (arXiv 2507.19133): a digital compute-in-memory accelerator
The field is starting to grow up. 3DGS.zip (Bagdasarian et al., 2025) is a live leaderboard pinning down cross-paper comparisons. Two toolkits are converging on a standard harness:
Together they suggest where 2027–2028 lands: a feed-forward compressor emitting a Khronos-standard glTF asset, with a generative-prior fallback decoder for ultra-low-bandwidth delivery.
All interactive demos from this survey, collected here for convenience. Each is already embedded in its chapter. All pure Canvas2D, zero dependencies — gauss-engine.js $\approx 280$ lines, easy to fork.
3DGS compression moves fast — ~30 new arXiv papers in Jan-May 2026 alone. This survey was compiled in May 2026. Numbers come from each paper's reported tables on Mip-NeRF 360, Tanks & Temples, Deep Blending; some are unverified or paraphrased. For definitive figures, go to the original paper or the 3DGS.zip leaderboard.
If you spot an error, an outdated claim, or a missing paper, that is expected: the field will have moved on by the time you read this. Treat the survey as a starting point.