Show HN: Real-Time Gaussian Splatting
73 points by markisus | 32 comments | 5/15/2025, 1:26:49 PM | github.com
LiveSplat is a system for turning RGBD camera streams into Gaussian splat scenes in real time. The system works by passing all the RGBD frames into a feed-forward neural net that outputs the current scene as Gaussian splats. These splats are then rendered in real time. I've put together a demo video at the link above.
However, I have not baked the size or orientation into the system. Those are "chosen" by the neural net based on the input RGBD frames. The view-dependent effects are also "chosen" by the neural net, but not through an explicit radiance field. If you run the application and zoom in, you will be able to see splats of different sizes pointing in different directions. The system has limited ability to re-adjust the positions and sizes due to the compute budget, which leads to the pixelated effect.
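LiveSplat's network isn't published, but as a rough sketch of what a feed-forward splat predictor can look like (the layer sizes and the 11-channel output parameterization below are illustrative assumptions, not the actual model), a small PyTorch module might map an RGBD frame to per-pixel splat attributes along these lines:

    # Illustrative sketch only -- not LiveSplat's actual network. A small
    # convolutional net maps one RGBD frame to per-pixel Gaussian parameters;
    # LiveSplat fuses multiple camera streams, which is omitted here.
    import torch.nn as nn
    import torch.nn.functional as F

    class SplatPredictor(nn.Module):
        def __init__(self, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(4, hidden, 3, padding=1), nn.ReLU(),   # R, G, B, depth in
                nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
                nn.Conv2d(hidden, 11, 1),  # 3 position offset + 3 log-scale + 4 quaternion + 1 opacity
            )

        def forward(self, rgbd):                          # rgbd: (B, 4, H, W)
            out = self.net(rgbd)
            offset = out[:, 0:3]                          # nudges the point unprojected from depth
            scale = out[:, 3:6].exp()                     # splat size, "chosen" by the net
            rotation = F.normalize(out[:, 6:10], dim=1)   # splat orientation
            opacity = out[:, 10:11].sigmoid()
            return offset, scale, rotation, opacity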
I actually started with pointclouds for my VR teleoperation system, but I hated how ugly they looked. You end up seeing through objects, and objects become unparseable if you get too close. Textures present in the RGB frame also become very hard to make out because everything becomes "pointilized". In the linked video you can make out the wood grain direction in the splat rendering, but not in the pointcloud rendering.
[1] https://youtu.be/-u-e8YTt8R8?si=qBjYlvdOsUwAl5_r&t=14
The depth is helpful to properly handle the parallaxing of the scene as the view angle changes. The system should then ideally "in-paint" the areas that are occluded from the input.
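As a concrete illustration of why depth gives correct parallax (the intrinsics below are placeholder values, not anything from LiveSplat): each pixel can be unprojected to a camera-space 3D point with the pinhole model, so nearby geometry shifts more than distant geometry as the virtual viewpoint moves.

    # Rough sketch with placeholder intrinsics. Unprojecting a depth map gives
    # a 3D point per pixel, which is what makes parallax come out right as the
    # view angle changes.
    import numpy as np

    def unproject(depth, fx, fy, cx, cy):
        """depth: (H, W) metric depth map -> (H, W, 3) camera-space points."""
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        x = (u - cx) / fx * depth
        y = (v - cy) / fy * depth
        return np.stack([x, y, depth], axis=-1)

    points = unproject(np.full((480, 640), 1.5), fx=600.0, fy=600.0, cx=320.0, cy=240.0)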
You can either guess the input depth from matching multiple RGB inputs or just use depth inputs along with RGB inputs if you have them. It's not fundamental to the process of building the splats either way.
I wonder if one can go the opposite route and use Gaussian splatting or (more likely) some other method to generate 3D/4D scenes from cartoons. Cartoons are famously hard to emulate in 3D even entirely manually; just as Gaussian splats are a fundamentally different approach from traditional realistic rendering (polygons, shaders, lighting, post-processing), maybe cartoons need a fundamentally different approach too.
Is there some temporal accumulation?
Supervised learning actually does work. Suppose you have four cameras. You input three of them into the net and use the fourth as the ground truth. The live video aspect just emerges from re-running the neural net every frame.
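A minimal sketch of that held-out-view supervision (not the actual training code; `model` and `render_splats` are hypothetical stand-ins for the splat predictor and a differentiable rasterizer):

    # Hypothetical sketch of held-out-view supervision as described above.
    # `model` predicts splats from three RGBD frames; `render_splats` is an
    # assumed differentiable rasterizer, not a real library call.
    import torch
    import torch.nn.functional as F

    def training_step(model, render_splats, frames, poses, optimizer):
        # frames: four (4, H, W) RGBD tensors; poses: the matching camera poses
        held_out = torch.randint(0, 4, (1,)).item()
        inputs = [f for i, f in enumerate(frames) if i != held_out]
        splats = model(inputs)                            # single feed-forward pass
        pred = render_splats(splats, poses[held_out])     # render from the held-out pose
        target = frames[held_out][:3]                     # RGB of the held-out camera
        loss = F.l1_loss(pred, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()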
That being said, afaict OP's method is 1000x faster, at 33ms.
I'm also following this work https://guanjunwu.github.io/4dgs/ which produces temporal Gaussian splats but takes at least half an hour to learn the scene.
I've considered publishing the source, but the code depends on some proprietary utility libraries from my bigger project and it's hard to fully disentangle. I'm also not sure whether this project has business applications, and I'd like to keep that door open at this time.
This is getting unreal. They're becoming fast and high fidelity. Once we get better editing capabilities and can shape the Gaussian fields, this will become the prevailing means of creating and distributing media.
Turning any source into something 4D and volumetric that you can easily mold like clay, relight, reshape. A fully interactable and playable 4D canvas.
Imagine if the work being done with diffusion models could read and write from Gaussian fields instead of just pixels. It could look like anything: real life, Ghibli, Pixar, whatever.
I can't imagine where this tech will be in five years.
100%. And style-transfer it into steampunk or H.R. Giger or cartoons or anime. Or dream up new fantasy worlds instantaneously. Explore them, play them, shape them like Minecraft-becomes-holodeck. With physics and tactile responses.
I'm so excited for everything happening in graphics right now.
Keep it up! You're at the forefront!
Could you or someone else wise in the ways of graphics give me a layperson's rundown of how this works, why it's considered so important, and what the technical challenges are given that an RGB+D(epth?) stream is the input?
Usually, creating a Gaussian splat representation takes a long time and uses an iterative gradient-based optimization procedure. Using RGBD lets me sidestep this optimization, since much of the geometry is already present in the depth channel, and that is what enables the real-time aspect of my technique.
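For contrast, offline splatting pipelines typically run a per-scene optimization loop along these lines (a toy sketch only; `render` and `sample_view` are hypothetical stand-ins for a differentiable rasterizer and a training-view sampler):

    # Toy sketch of the usual iterative per-scene optimization that the depth
    # channel lets a feed-forward approach skip. `render` and `sample_view` are
    # assumed helpers, not real library calls.
    import torch
    import torch.nn.functional as F

    def optimize_scene(render, sample_view, n_splats=100_000, iters=30_000):
        means = torch.randn(n_splats, 3, requires_grad=True)       # splat centers
        log_scales = torch.zeros(n_splats, 3, requires_grad=True)  # splat sizes
        opt = torch.optim.Adam([means, log_scales], lr=1e-3)
        for _ in range(iters):                       # thousands of steps, minutes to hours
            cam, target = sample_view()              # one captured training view
            pred = render(means, log_scales.exp(), cam)
            loss = F.l1_loss(pred, target)
            opt.zero_grad()
            loss.backward()
            opt.step()
        return means, log_scales.exp()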
When you say "big deal", I imagine you are also asking about business or societal implications. I can't really speak on those, but I'm open to licensing this IP to any companies which know about big business applications :)
I'm not aware of other live RGBD visualizations except for direct pointcloud rendering. Compared to pointclouds, splats are better able to render textures, view-dependent effects, and occlusions.