§ 0 · The problem§ 0 · 问题在哪
Four artifacts, one root cause 四种瑕疵,同一个病根
3DGS trains on photos at one fixed resolution. Reproduce the training views and it looks photographic. Move the camera even slightly, or change the rendering resolution, and you can see four distinct kinds of damage. Each has its own name in the literature, but most people lump them together as "3DGS popping." They are not the same bug.
3DGS 是在固定分辨率的照片上训练的。回放训练视角时,画面跟照片一样真。哪怕镜头稍微一动,或者换个渲染分辨率,就能看出四种不同的损伤。文献里每种都有自己的名字,但大多数人会一锅端称为"3DGS popping",其实它们不是同一种 bug。
Popping (sort flips)跳变(排序翻转)
As you move the camera, two Gaussians whose centers project to similar depth can swap positions in the front-to-back sort — different pixels see different sort orders. Result: color jumps frame-to-frame in regions that should be stable.
镜头移动时,两颗中心投影到相近深度的高斯会在"从前到后"的全局排序里互换位置——而不同像素本该看到的顺序又各不相同。结果就是本该稳定的区域里颜色会突然跳变。
Dilation (sub-pixel splats)膨胀(亚像素 splat)
When you render at a lower resolution than you trained at, a Gaussian that was 2 pixels wide at training can shrink to 0.4 pixels wide. The renderer clamps the footprint to one pixel, so a dim, sub-pixel blob suddenly shines like a full bright pixel. The whole image shimmers like film grain.
渲染分辨率比训练时低时,一个原本 2 像素宽的高斯就变成了 0.4 像素宽。渲染器会把覆盖范围强制扩到一个像素——结果一个本来很暗的亚像素小团子突然亮得像一整个像素满分。整张图于是像胶片噪点一样在那儿闪。
Boundary aliasing边界锯齿
The 2D Gaussian ellipse has a smooth falloff, but its 3-sigma cutoff is a hard ring. At pixel granularity that ring becomes a jaggy boundary, especially when the splat is large. Magnified scenes look like stained glass.
2D 高斯椭圆本身衰减得很平滑,但它的 3σ 截断是一个硬环。在像素粒度上这个环就是一圈锯齿,splat 越大越明显。放大看场景就像彩绘玻璃。
Pop-in-zoom (LOD)推近时的细节崩坏(LOD)
The reverse of dilation: zooming in stretches a single Gaussian to cover hundreds of pixels, and the seams between neighboring Gaussians become visible as soft cell walls. The scene loses fine detail.
膨胀的反向:镜头推近时,一颗高斯被拉到覆盖几百像素,相邻高斯之间的缝隙开始变成肉眼可见的"软细胞墙"。场景失去细节。
Two of these (popping, boundary jaggies) are sampling problems — we're evaluating a continuous signal at finite pixel positions without a proper prefilter. The other two (dilation, LOD) are scale problems — the optimization picks one Gaussian size per primitive, but the right size depends on the viewer's distance and pixel pitch. Both come back to the same classical idea, the one Zwicker named explicitly in 2001 and the 2023 paper quietly dropped: splatting needs a low-pass filter.
四个里面有两个(跳变、边界锯齿)是采样问题——我们在没有合适前置滤波的情况下,对一个连续信号在有限像素位置上取值。另两个(膨胀、LOD)是尺度问题——优化器为每颗基元挑了一个固定大小,但"正确"的大小取决于观察距离和像素间距。两类问题殊途同归,都要回到 Zwicker 在 2001 年明确写过、2023 年那篇悄悄丢掉的同一个经典想法:splatting 必须配一个低通滤波器。
§ 1 · Popping§ 1 · 跳变
The order isn't a property of the scene 排序顺序不是场景本身的属性
The original rasterizer sorts Gaussians by the depth of their centers, once per frame, globally. But two Gaussians near each other in 3D can project to nearly the same depth for the pixel in row 100 and a different relative depth for the pixel in row 200 — different rays cut the ellipsoids at different points. A global center-sort gets the average right and gets individual pixels wrong.
The artifact you see is sudden: a brown patch of fence becomes red for a frame as you pan, then flips back. The order-sensitivity of alpha compositing is what makes it visible: front-to-back blending is not commutative, so swapping two layers changes the pixel color.
原版光栅化器对高斯按中心点的深度做全局排序,每帧一次。但 3D 里靠得很近的两颗高斯,对第 100 行的像素和第 200 行的像素,相对深度可能完全不同——不同的射线切到椭球的不同位置。全局中心排序把平均值排对了,但每个像素的实际顺序它都搞错了。
你看到的瑕疵很突兀:你 pan 镜头,栅栏的一块原本是棕色的,突然有一帧变红了,又翻回去。α 合成对顺序敏感,所以这种错误是肉眼可见的——"从前往后"不是一种交换运算。
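To make the order-sensitivity concrete, here is a minimal sketch of front-to-back blending for a single pixel. The function name and the color/opacity values are made up for illustration; colors are reduced to one scalar channel.

```python
def composite_front_to_back(layers):
    """Blend (color, alpha) layers listed nearest-first; returns the pixel color."""
    color = 0.0
    transmittance = 1.0  # fraction of light still passing through to the next layer
    for layer_color, alpha in layers:
        color += transmittance * alpha * layer_color
        transmittance *= 1.0 - alpha
    return color


# Two overlapping splats with equal opacity (illustrative values only).
red = (1.0, 0.6)
brown = (0.4, 0.6)

red_in_front = composite_front_to_back([red, brown])
brown_in_front = composite_front_to_back([brown, red])
print(red_in_front, brown_in_front)
```

The two orders give visibly different results (roughly 0.70 versus 0.48 here), so a sort flip between consecutive frames shows up as a color jump even though neither Gaussian changed.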
Interactive · watch the sort flip 交互 · 亲眼看排序翻转
Two overlapping Gaussians; their centers swap depth order halfway through the slider. The left view uses the original "sort once by center depth" — note the discontinuous color jump. The right view sorts per pixel and stays smooth. 两颗叠在一起的高斯,中心点在滑块走到中间时互换深度顺序。左图用原版"按中心深度排一次"——注意那一下不连续的颜色跳。右图按像素排序,过程是平滑的。
The fix · StopThePop (Radl et al., SIGGRAPH 2024) 补救方案 · StopThePop(Radl 等,SIGGRAPH 2024)
StopThePop, which we already mentioned in 3dgs-cuda §7.2, sorts per pixel. Naively that would cost 256× more sorts per tile (one per pixel of a 16×16 tile); the paper's contribution is a hierarchical scheme that the authors report as only about 4% slower than baseline on average. The artifact is gone. Architecturally it is the simplest fix, changing only the sort order seen by the inner loop, but it took the field a year to notice because the popping was usually blamed on "Gaussians moving" rather than "the ordering fluctuating."
StopThePop 我们在 3dgs-cuda §7.2 提过,它做的是按像素排序。直接这么干会让每个 tile 多 256 倍的排序成本(16×16 的 tile 每个像素排一次);论文的贡献是一种分层方案,作者报告平均只比 baseline 慢约 4%。瑕疵直接消失。架构上这是最小改动,只改内层循环看到的排序顺序,但社区花了一年才意识到这条路,因为大多数人把跳变怪罪到"高斯在动",而不是"顺序在抖"。
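// hierarchical per-pixel sort: tile-wide coarse sort + small per-pixel insertion buffer
// (sketch only: helper names are illustrative, and BATCH is assumed to equal the threads per tile)
__shared__ Gaussian batch[BATCH];   // filled cooperatively; already roughly depth-sorted by tile
Gaussian queue[4];                  // each pixel (thread) keeps its own 4-element window
int qlen = 0;
const int tid = threadIdx.y * blockDim.x + threadIdx.x;   // flat thread index inside the tile
for (int b = start; b < end; b += BATCH) {
    if (b + tid < end) batch[tid] = gaussians[sorted_ids[b + tid]];
    __syncthreads();                // batch must be fully loaded before anyone reads it
    for (int k = 0; k < BATCH && b + k < end; ++k) {
        float d = depth_along_pixel_ray(batch[k]);        // depth along this pixel's ray, not center depth
        if (qlen == 4) composite_front(queue, qlen);      // window full: blend the nearest entry and pop it
        insert_sort(queue, qlen, batch[k], d);            // keep the window in per-pixel order
    }
    __syncthreads();                // do not overwrite batch while other threads still read it
}
while (qlen > 0) composite_front(queue, qlen);            // drain the window, nearest first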
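The idea of re-sorting a roughly ordered stream through a small per-pixel window can be modeled on the CPU in a few lines. This is a simplified stand-in, not the paper's hierarchical scheme: `resort_with_window` is a hypothetical helper, and the depths are made-up numbers.

```python
import heapq


def resort_with_window(depths, window=4):
    """Re-sort a roughly depth-ordered stream using a bounded buffer.

    Each incoming depth enters the buffer; once the buffer holds more than
    `window` entries, the nearest one is emitted (this is where a pixel
    would blend it).
    """
    heap, emitted = [], []
    for d in depths:
        heapq.heappush(heap, d)
        if len(heap) > window:
            emitted.append(heapq.heappop(heap))
    while heap:  # drain what is left, nearest first
        emitted.append(heapq.heappop(heap))
    return emitted


# Small local disorder (neighbors swapped) is repaired completely.
nearly_sorted = [1.0, 3.0, 2.0, 4.0, 6.0, 5.0, 7.0]
print(resort_with_window(nearly_sorted))

# An entry that arrives too late cannot be repaired by a 4-entry window.
badly_sorted = [5.0, 6.0, 7.0, 8.0, 9.0, 1.0]
print(resort_with_window(badly_sorted))
```

The second call shows why the coarse tile-level sort still matters: the window only fixes disorder that is local, so the incoming stream has to be nearly right already.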
§ 2 · Dilation§ 2 · 膨胀
What happens when a splat is smaller than a pixel? 当一个 splat 比一个像素还小,会发生什么?
The 2D Gaussian \(G_{\text{2D}}\) is a continuous function — but we evaluate it at a finite set of pixel centers. If the Gaussian's screen extent is much smaller than the pixel grid, the pixel center either lands inside the splat (full brightness) or just outside (zero). There's no interpolation, no falloff, just a binary hit/miss. Render at half resolution and the same Gaussian becomes either invisible or twice as bright as before. Worse, the original code adds a safety dilation that makes every Gaussian at least ~1 pixel wide — meaning the brightness flickers as the camera moves and the splat oscillates between "on-pixel" and "off-pixel."
The textbook fix from rendering theory is a low-pass prefilter: convolve \(G_{\text{2D}}\) with the pixel reconstruction filter (typically a Gaussian or a box) before sampling. Convolution of two Gaussians is another Gaussian, with covariances added. Define a "screen-space anti-aliasing covariance" \(\Sigma_{\text{aa}} = sI\) with \(s\approx 0.3\) (the pixel-filter variance) and use
2D 高斯 \(G_{\text{2D}}\) 是个连续函数——但我们只在有限的像素中心上对它取值。当一颗高斯的屏幕尺寸远小于像素间距,像素中心要么落进这个 splat(全亮),要么落在外面(全黑)。没有插值、没有衰减,二选一。把分辨率降一半,同一颗高斯要么不见、要么亮度翻倍。雪上加霜的是,原版代码还加了一个"安全膨胀",让每一颗高斯至少有 ~1 像素宽——结果就是镜头一动,亮度跟着 splat 在"打到像素"和"打不到像素"之间来回切换地闪。
渲染理论里的教科书答案是低通前置滤波:先把 \(G_{\text{2D}}\) 跟像素的重建核(通常是一颗高斯或一个 box)做卷积,再采样。两颗高斯的卷积仍然是高斯,协方差相加。定义一个"屏幕空间抗锯齿协方差" \(\Sigma_{\text{aa}} = sI\),取 \(s\approx 0.3\)(像素滤波器的方差),然后用:
\[ G_{\text{aa}}(\mathbf{p}) = \sqrt{\frac{|\boldsymbol{\Sigma}'|}{|\boldsymbol{\Sigma}' + \boldsymbol{\Sigma}_{\text{aa}}|}}\; G\!\left(\mathbf{p};\, \boldsymbol{\Sigma}' + \boldsymbol{\Sigma}_{\text{aa}}\right) \]
instead of the bare \(G(\mathbf{p}; \boldsymbol{\Sigma}')\). The splat is always at least as wide as a pixel, and its opacity is diluted to match. This is what Zwicker actually wrote in 2001. The original 3DGS kept the widening (the safety dilation above) but dropped the dilution for speed. Putting it back is two lines of CUDA.
代替直接用 \(G(\mathbf{p}; \boldsymbol{\Sigma}')\)。这样 splat 永远至少跟一个像素一样宽,它的不透明度相应地被稀释。这就是 Zwicker 2001 年写过的做法。3DGS 原版保留了加宽(也就是上面说的"安全膨胀"),却为了速度丢掉了稀释这一步。把它加回去只要 CUDA 里两行代码。
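A one-dimensional sketch shows how much the prefilter calms things down. It assumes a splat with standard deviation 0.2 pixels and the filter variance \(s = 0.3\) from the text; in 1D the opacity factor reduces to \(\sqrt{\sigma^2/(\sigma^2+s)}\). The function names are illustrative.

```python
import math


def naive_sample(offset, sigma):
    """Point-sample an unnormalized 1D Gaussian at `offset` pixels from its center."""
    return math.exp(-offset ** 2 / (2.0 * sigma ** 2))


def prefiltered_sample(offset, sigma, s=0.3):
    """Same sample after adding filter variance `s` and diluting the peak."""
    variance = sigma ** 2 + s
    scale = math.sqrt(sigma ** 2 / variance)  # 1D analogue of the determinant ratio
    return scale * math.exp(-offset ** 2 / (2.0 * variance))


sigma = 0.2  # splat is much narrower than one pixel

# Compare a pixel center that hits the splat with one half a pixel away.
naive_swing = naive_sample(0.0, sigma) / naive_sample(0.5, sigma)
filtered_swing = prefiltered_sample(0.0, sigma) / prefiltered_sample(0.5, sigma)
print(naive_swing, filtered_swing)
```

With naive sampling, a half-pixel shift changes the sampled value by more than 20×, which is the flicker described above. With the prefilter the swing stays below 1.5×.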
Interactive · sub-pixel splat at varying resolution 交互 · 亚像素 splat 在不同分辨率下的样子
A single Gaussian whose physical screen footprint is ~0.5 pixel. Drag the slider to lower the rendering resolution; watch how naive sampling either annihilates the splat or makes it flash. The right column applies a 2D Mip prefilter and stays calm. 一颗物理屏幕足印 ~0.5 像素的高斯。拖滑块把渲染分辨率拉低,看朴素采样是怎么把它要么干掉、要么闪一下的。右边那一列用了 2D Mip 前置滤波,全程稳如老狗。
§ 3 · Mip-Splatting§ 3 · Mip-Splatting
Mip-Splatting (Yu et al., CVPR 2024 Best Student Paper) Mip-Splatting(Yu 等,CVPR 2024 最佳学生论文)
Mip-Splatting bundles the §2 fix with a second, complementary filter that lives in 3D. There are two distinct aliasing sources, and treating one without the other lets the bug leak through.
- 2D Mip filter (screen-space). Adds the pixel-filter covariance to every projected \(\Sigma'\), exactly as in §2. Kills dilation / sub-pixel flicker. Costs: approximately zero — one matrix addition.
- 3D smoothing filter (object-space). If a Gaussian is so small in world space that no training camera ever resolves it, the training procedure overfits to noise. Mip-Splatting tracks, per Gaussian, the smallest scale at which any training view samples it, and adds a tiny 3D Gaussian to its \(\Sigma\) to enforce a minimum world-space size. The Gaussian can no longer be smaller than the sharpest training view's pixel footprint — a sound Nyquist constraint, applied to the scene representation itself.
Mathematically: every primitive has its true covariance \(\boldsymbol{\Sigma}\) plus a per-primitive floor \(\boldsymbol{\Sigma}_{\text{3D-filter}} = s_{\min}(\mathbf{g})^2\,I\), where \(s_{\min}(\mathbf{g})\) is a length derived from the training cameras rather than a free parameter. At render time you use \(\boldsymbol{\Sigma} + \boldsymbol{\Sigma}_{\text{3D-filter}}\) instead. At train time the same sum is what gets rendered, so gradients reach \(\boldsymbol{\Sigma}\) through it; \(s_{\min}\) comes out bigger for primitives far from every camera and near zero for ones in close-up regions.
Mip-Splatting 把 §2 那一招和另一个互补的、活在 3D 里的滤波器打包到一起。走样的源头有两个,只治一个,另一个就会漏出来。
- 2D Mip 滤波(屏幕空间)。给每颗投影后的 \(\Sigma'\) 加上像素滤波器的协方差,就和 §2 一样。直接干掉膨胀 / 亚像素闪烁。开销几乎是零——一次矩阵加法。
- 3D 平滑滤波(物体空间)。如果一颗高斯在世界空间里小到任何训练相机都分辨不出来,训练过程实际上是在过拟合噪声。Mip-Splatting 给每颗高斯记一份"最锐利的训练视角能采样到它的最小尺度",然后给它的 \(\Sigma\) 加一颗很小的 3D 高斯,强制世界空间下有一个最小尺寸下限。这颗高斯不再能小过最锐利训练视角的像素足印——这是把奈奎斯特约束直接施加到场景表示本身。
数学上:每颗基元有它真正的协方差 \(\boldsymbol{\Sigma}\),再加一个逐基元的下限 \(\boldsymbol{\Sigma}_{\text{3D-filter}} = s_{\min}(\mathbf{g})^2\,I\),其中 \(s_{\min}(\mathbf{g})\) 是由训练相机推出来的长度,不是自由参数。渲染时用 \(\boldsymbol{\Sigma} + \boldsymbol{\Sigma}_{\text{3D-filter}}\)。训练时渲染的也是这个和,梯度经由它传到 \(\boldsymbol{\Sigma}\);远离所有相机的基元,\(s_{\min}\) 大;处于近距离区域的基元,\(s_{\min}\) 接近零。
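import torch

def project_and_filter(g, cam):
    # ---- 3D smoothing filter (world space) ----
    s_min = min_pixel_extent_in_training_views(g)   # world-space length, precomputed from the training cameras
    Sigma_3d_filtered = g.Sigma + (s_min ** 2) * torch.eye(3)
    # ---- standard EWA projection ----
    J = perspective_jacobian(cam, g.mu)             # 2x3 Jacobian of the perspective projection at the center
    W = cam.world_to_view[:3, :3]                   # rotation block only; translation does not affect covariance
    Sigma_screen = J @ W @ Sigma_3d_filtered @ W.T @ J.T   # 2x2 screen-space covariance
    # ---- 2D Mip filter (screen space) ----
    Sigma_screen_aa = Sigma_screen + 0.3 * torch.eye(2)
    # dilute the opacity so widening the splat leaves its integral unchanged
    alpha_rescaled = g.alpha * torch.sqrt(torch.det(Sigma_screen) / torch.det(Sigma_screen_aa))
    return Sigma_screen_aa, alpha_rescaled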
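Notice the alpha rescale: when we widen the splat, we have to dilute its peak opacity to preserve the integral. \(\sqrt{|\boldsymbol{\Sigma}'| / |\boldsymbol{\Sigma}' + \boldsymbol{\Sigma}_{\text{aa}}|}\) is the exact correction, since an unnormalized 2D Gaussian integrates to \(2\pi\sqrt{|\boldsymbol{\Sigma}|}\). Without it, applying the prefilter would brighten the image.
注意里面的 alpha rescale:我们把 splat 加宽时,必须相应地稀释它的峰值不透明度,才能保持积分总量不变。\(\sqrt{|\boldsymbol{\Sigma}'| / |\boldsymbol{\Sigma}' + \boldsymbol{\Sigma}_{\text{aa}}|}\) 正是那个精确的修正系数,因为未归一化的二维高斯积分等于 \(2\pi\sqrt{|\boldsymbol{\Sigma}|}\)。没有它的话,加了前置滤波反而会让整张图变亮。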
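The claim that the determinant ratio preserves the integral is easy to check numerically. The sketch below integrates a 2D Gaussian on a grid before and after widening; the covariance values, grid size, and function names are all arbitrary choices for illustration, and the filter variance 0.3 is the one used above.

```python
import math


def det2(cov):
    (a, b), (_, c) = cov
    return a * c - b * b


def gauss2d(x, y, cov):
    """Unnormalized 2D Gaussian exp(-0.5 * p^T cov^-1 p) for a symmetric 2x2 cov."""
    (a, b), (_, c) = cov
    quad = (c * x * x - 2.0 * b * x * y + a * y * y) / det2(cov)
    return math.exp(-0.5 * quad)


def integrate(cov, half_width=6.0, n=240):
    """Midpoint-rule integral over [-half_width, half_width]^2."""
    step = 2.0 * half_width / n
    total = 0.0
    for i in range(n):
        x = -half_width + (i + 0.5) * step
        for j in range(n):
            y = -half_width + (j + 0.5) * step
            total += gauss2d(x, y, cov)
    return total * step * step


cov = ((0.5, 0.2), (0.2, 0.3))                   # projected covariance (made up)
cov_aa = ((0.5 + 0.3, 0.2), (0.2, 0.3 + 0.3))    # after adding 0.3 * I
rescale = math.sqrt(det2(cov) / det2(cov_aa))

bare = integrate(cov)
widened_unscaled = integrate(cov_aa)
widened_rescaled = rescale * widened_unscaled
print(bare, widened_unscaled, widened_rescaled)
```

For these values the widened splat carries twice the energy of the original until the rescale is applied, at which point the two integrals agree to within the accuracy of the grid.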
Interactive · the 3D smoothing filter, in cross-section 交互 · 3D 平滑滤波的截面图
A "true" Gaussian in world space (left) and the same Gaussian after the 3D smoothing floor (right). Move the camera farther away to see how the floor saves you from invisible primitives. The training pass uses the right-hand version; this is what guarantees stable multi-resolution rendering. 左边是世界空间里一颗"真"高斯,右边是同一颗高斯加上 3D 平滑下限后的样子。把相机拉远,看下限是怎么救你于"基元消失不见"的窘境的。训练用的是右边那版;这正是多分辨率渲染稳定的保障。
§ 4 · Analytic-Splatting§ 4 · Analytic-Splatting
Don't sample — integrate 别采样——直接做积分
Mip-Splatting prefilters and then samples at the pixel center. Analytic-Splatting goes one step further and asks: what if we integrate \(G_{\text{2D}}\) over the actual pixel square instead of sampling at its midpoint?
Mip-Splatting 是先做前置滤波再在像素中心采样。Analytic-Splatting 再往前走一步:如果我们干脆对整个像素方块做积分,而不是只在中点采样呢?
For a 1D Gaussian the integral is a difference of error functions — exactly representable. For a 2D Gaussian with arbitrary rotation the closed form gets messier, but Liang et al. derived a cheap approximation using elementary functions that matches the exact integral to four decimal places. The big win is that pixel-area integration handles arbitrarily fine LODs without any explicit Mip filter — the integral already accounts for sub-pixel structure.
Mip-Splatting and Analytic-Splatting are largely orthogonal — you can stack them. Together they give the cleanest multi-resolution renders the field has produced. The downside: the analytic integral isn't free; it's ~1.3× the bare evaluation, which adds up on big tiles.
对于一维高斯,这个积分就是两个误差函数的差——可以精确表达。对于任意旋转的二维高斯,闭式就更乱一点,但 Liang 等用初等函数推了一个便宜的近似,跟精确积分一致到四位小数。最大的好处是:逐像素面积积分可以处理任意精细的 LOD,根本不需要显式的 Mip 滤波——积分本身已经把亚像素结构算进来了。
Mip-Splatting 和 Analytic-Splatting 基本是正交的——你可以叠加用。两者合用能跑出当前最干净的多分辨率渲染。坏处是:解析积分不是免费的,比单纯求值贵 ~1.3 倍,对大 tile 加起来也不小。
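The 1D case mentioned above fits in a few lines using the standard library's `math.erf`. This sketch covers only the exact 1D integral, not the 2D approximation from the paper; it assumes pixel `i` spans `[i, i + 1]`, and the function names are illustrative.

```python
import math


def pixel_integral_1d(center, sigma, left, right):
    """Exact integral of exp(-(x - center)^2 / (2 sigma^2)) over [left, right]."""
    root2_sigma = sigma * math.sqrt(2.0)
    return sigma * math.sqrt(math.pi / 2.0) * (
        math.erf((right - center) / root2_sigma)
        - math.erf((left - center) / root2_sigma)
    )


def total_integrated(center, sigma, pixels=range(-10, 10)):
    """Sum of per-pixel integrals: the energy the image receives from the splat."""
    return sum(pixel_integral_1d(center, sigma, i, i + 1) for i in pixels)


def total_sampled(center, sigma, pixels=range(-10, 10)):
    """Sum of midpoint samples: what point sampling deposits instead."""
    return sum(math.exp(-((i + 0.5) - center) ** 2 / (2.0 * sigma ** 2)) for i in pixels)


sigma = 0.2
on_pixel_center = 0.5   # splat sits exactly on a pixel center
on_pixel_edge = 1.0     # splat sits on the boundary between two pixels

print(total_integrated(on_pixel_center, sigma), total_integrated(on_pixel_edge, sigma))
print(total_sampled(on_pixel_center, sigma), total_sampled(on_pixel_edge, sigma))
```

The integrated energy is the same wherever the splat lands, while the sampled energy changes by more than an order of magnitude between the two positions.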
§ 5 · GES§ 5 · GES
Generalized Exponential Splatting 广义指数 Splatting
A different angle on the same problem: instead of filtering the Gaussian, replace it with a primitive whose falloff is sharper. The Gaussian is just one member of a family of generalized exponentials:
面对同一个问题的另一个切入:与其给高斯加滤波,不如把它换成一种衰减更锐的基元。高斯只是一族广义指数函数里的一个特例:
\[ G_{\beta}(\mathbf{x}) = \exp\!\left(-\tfrac{1}{2}\left[(\mathbf{x}-\boldsymbol{\mu})^{\top}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right]^{\beta/2}\right) \]
\(\beta=2\) is our familiar Gaussian. \(\beta\to\infty\) is a hard ellipsoid (binary mask). \(\beta\approx 1\) is Laplacian-like: a sharper peak with heavier tails. GES makes \(\beta\) a learnable per-primitive parameter and reports that ~2 million GES primitives match ~5 million Gaussians in image quality. Fewer primitives → less popping (less stuff to flip in the sort), less aliasing (sharper edges decay outside the pixel), and faster rendering (the paper reports up to 39%).
\(\beta=2\) 就是我们熟悉的高斯。\(\beta\to\infty\) 就是一个硬椭球(二值 mask)。\(\beta\approx 1\) 接近拉普拉斯:峰更尖、尾更厚。GES 把 \(\beta\) 设成每颗基元可学的参数,发现 ~2 百万颗 GES 基元就能在画质上对齐 ~5 百万颗高斯。基元更少 → 跳变更少(排序里能翻的东西更少)、走样更少(更锐的边缘在像素之外衰减得更彻底),渲染也更快(论文报告最多快 39%)。
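The three regimes of \(\beta\) can be checked directly. The sketch assumes the falloff \(\exp(-\tfrac12 d^{\beta})\), where \(d\) is the Mahalanobis distance from the center; this is one common way to write the family and may differ in detail from what the GES implementation evaluates. The function name is illustrative.

```python
import math


def generalized_exponential(mahalanobis_sq, beta):
    """Falloff exp(-0.5 * d^beta), given the squared Mahalanobis distance d^2."""
    return math.exp(-0.5 * mahalanobis_sq ** (beta / 2.0))


# beta = 2 reproduces the ordinary Gaussian falloff.
print(generalized_exponential(4.0, 2.0), math.exp(-2.0))

# Large beta approaches a hard edge at d = 1: flat inside, nearly zero outside.
print(generalized_exponential(0.9 ** 2, 50.0), generalized_exponential(1.1 ** 2, 50.0))

# beta = 1 has a heavier tail than the Gaussian at the same distance (d = 3).
print(generalized_exponential(9.0, 1.0), generalized_exponential(9.0, 2.0))
```

Every member of the family passes through the same value at \(d = 1\), so \(\beta\) only reshapes how the primitive gets there.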
Interactive · vary β 交互 · 调节 β
Drag β. At β=2 you have a Gaussian. Push β up and the falloff sharpens — at the limit you get a hard ellipse. Push β down and you get fatter tails. The right panel shows what the primitive looks like sampled at pixel grid resolution. 拖动 β。β=2 时是普通高斯。把 β 推大,衰减越来越锐——极限就是一个硬椭圆。把 β 推小,尾巴越来越胖。右边那张图是这颗基元在像素网格上采样的样子。
§ 6 · Multi-Scale 3DGS§ 6 · 多尺度 3DGS
Explicit LOD: a pyramid of Gaussians 显式 LOD:一座高斯金字塔
Mip-Splatting and Analytic-Splatting both modify the rendering of a fixed set of Gaussians. Multi-Scale 3DGS instead modifies the set itself — training maintains several representations of the same scene at different scales, and picks the right one at render time. This is the classic mipmap idea, applied to point clouds.
Concretely: train for a few thousand steps to convergence, then duplicate-and-coarsen — merge spatially-close, similar-color Gaussians into one bigger Gaussian. Now you have two pyramid levels. Repeat once or twice more. At render time, for each pixel, pick the pyramid level whose Gaussian footprint best matches the pixel footprint (~2 Gaussians cover a pixel, no more, no less). Picking is closed-form from the projected covariance and the camera-to-pixel distance.
Mip-Splatting 和 Analytic-Splatting 都只改一组固定高斯的渲染方式。多尺度 3DGS 反其道而行——改的是那一组高斯本身:训练时同时维护同一场景在多个尺度下的表示,渲染时挑对应尺度的那一份。这就是经典 mipmap 思路用在点云上的版本。
具体做法:先训几千步收敛,然后做"复制并粗化"——把空间邻近、颜色相似的高斯合并成一颗大高斯。这样就有了两层金字塔。再重复一两次。渲染时每个像素挑那一层、它的高斯足印最接近像素足印(让 ~2 颗高斯覆盖一个像素,不多不少)。挑选过程从投影协方差和相机到像素距离闭式算出来。
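A level picker along these lines might look like the following. It is a sketch under several assumptions: each level is summarized by one typical world-space radius, the projected footprint is approximated as focal length × radius / distance, and the target footprint of 0.5 pixels encodes "about two Gaussians per pixel". None of these names or numbers come from the paper.

```python
import math


def pick_level(level_radii, distance, focal_px, target_px=0.5):
    """Return the index of the pyramid level whose projected footprint is closest
    (in log scale) to the target footprint in pixels."""
    def mismatch(radius):
        footprint_px = focal_px * radius / distance
        return abs(math.log(footprint_px / target_px))

    return min(range(len(level_radii)), key=lambda i: mismatch(level_radii[i]))


# Fine, medium, and coarse levels (typical radii in meters, made up).
radii = [0.01, 0.04, 0.16]
focal_px = 1000.0

for distance in (2.0, 50.0, 200.0):
    print(distance, pick_level(radii, distance, focal_px))
```

Comparing footprints in log scale treats "twice too big" and "twice too small" as equally bad, which keeps the switch points evenly spaced as the camera pulls back.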
Interactive · pick the level by viewer distance 交互 · 根据观察距离挑层
The same scene at three pyramid levels: 500 tiny Gaussians, 80 medium, 12 big. As you change camera distance the picker chooses the right level — the highlighted ring shows which level is being rendered. Try zooming all the way out: you get the 12-Gaussian level, with no glaring sub-pixel splats. 同一场景的三层金字塔:500 颗小高斯、80 颗中等、12 颗大。改变相机距离,挑选器会自动选层——高亮圈出来的就是当前在渲染的那一层。一直拉远试试:你拿到的是 12 颗那一层,没有任何刺眼的亚像素 splat。
Multi-Scale stacks nicely with Mip-Splatting: the pyramid handles the coarse LOD picker, the Mip-filter handles the residual aliasing inside a level. Together they essentially eliminate the scale and aliasing artifacts in §0; popping still needs the per-pixel sort from §1. The cost is storage: ~1.5× the original scene size for a three-level pyramid.
多尺度跟 Mip-Splatting 很好叠:金字塔负责粗粒度的 LOD 挑选,Mip-filter 负责消除每一层内部的残余走样。两者合起来基本上能干掉 §0 里的尺度和走样类瑕疵;跳变仍然要靠 §1 的按像素排序。代价是存储:三层金字塔大约会让场景大小变成原来的 1.5 倍。
§ 7 · The fix lineup§ 7 · 补救方案的阵营
Eight papers, one moving target 八篇论文,一个会动的靶
The fixes appeared in roughly two waves: 2023–early 2024 attacked the screen-space prefilter and per-pixel sort; mid-2024 onwards attacked the primitive itself and the LOD pyramid.
补丁分两波出现:2023–2024 上半年攻屏幕空间前置滤波和按像素排序;2024 下半年起攻基元本身和 LOD 金字塔。
§ 8 · Open§ 8 · 仍未关上的漏洞
Where the bug isn't quite dead bug 还没彻底死掉的几处
Even with all the above stacked, specular highlights still flicker in distant views: the SH color of a sub-pixel splat is evaluated for the single view direction through the pixel center, not averaged over the directions the pixel footprint spans. The 2D Mip filter doesn't help because it filters position, not direction. Several 2025 papers tackle this with a directional filter applied to SH coefficients before evaluation, a kind of mip filter for spherical harmonics, but no consensus has formed.
On the other end of the LOD scale, very close-up renders still show cell boundaries between neighboring Gaussians, because the primitives are big and the seams are visible. The eventual fix is either much-smaller-primitives-everywhere (expensive) or a per-Gaussian microscale texture (lightly explored). The story is not over.
就算把以上招式全叠上,镜面高光在远视图下仍然会闪——亚像素采样时的 SH 颜色取决于那像素恰好射向哪个方向,而不是像素面积上的方向平均。2D Mip 滤波器对这件事帮不上忙,因为它滤的是位置,不是方向。2025 年好几篇论文用方向上的滤波器在 SH 求值前先平滑 SH 系数——某种"球谐的 MIP"——但还没有统一方案。
LOD 另一端,极近距离渲染时相邻高斯之间的细胞边界仍然能看出来,因为基元太大、缝隙可见。最终的解决方法要么是处处都用更小的基元(很贵),要么是给每颗高斯加一种微尺度纹理(探索得还不多)。故事远没结束。