Show HN: Real-Time Gaussian Splatting

49 points by markisus | 25 comments | 5/15/2025, 1:26:49 PM | github.com
LiveSplat is a system for turning RGBD camera streams into Gaussian splat scenes in real time. The system works by passing all the RGBD frames into a feed-forward neural net that outputs the current scene as Gaussian splats. These splats are then rendered in real time. I've put together a demo video at the link above.

Comments (25)

spyder · 5m ago
Correct me if I'm wrong, but from the video this just looks like a 3D point cloud using equal-sized "gaussians" (soft spheres) for each pixel, which is why it still looks pixelated, especially at the edges. Even at low resolution, real Gaussian splatting artifacts look different, with spikes and soft blobs in the lower-resolution parts. So this isn't really doing the same thing as real Gaussian splatting, which combines different-sized, view-dependent elliptical Gaussian splats to reconstruct the scene, and it also doesn't seem to reproduce the radiance field the way real Gaussian splatting does.
armchairhacker · 11m ago
Gaussian Splatting looks pretty and realistic in a way unlike any other 3D render, except UE5 and some hyper-realistic not-realtime renders.

I wonder if one can go the opposite route and use gaussian splatting or (more likely) some other method to generate 3D/4D scenes from cartoons. Cartoons are famously hard to emulate in 3D even entirely manually; like with traditional realistic renders (polygons, shaders, lighting, post-processing) vs gaussian splats, maybe we need a fundamentally different approach.

yuchi · 33m ago
The output looks terribly similar to what sci-fi movies envisioned as 3D reconstruction of scenes. It is absolutely awesome. Now, if we could project them in 3D… :)
sendfoods · 46m ago
Please excuse my naive question - isn't Gaussian Splatting usually used to create 3D imagery from 2D? How does providing 3D input data make sense in this context?
markisus · 30m ago
Yes, the normal case uses 2D input, but it can take hours to create the scene. Using the depth channel allows me to create the scene in 33 milliseconds, from scratch, every frame. You could conceptualize this as a compromise between raw point cloud rendering and fully precomputed Gaussian splat rendering. With point clouds, you have a lot of visual artifacts due to sparsity (low texture information, seeing "through" objects). With Gaussian splatting, you can transfer a lot more of the 2D texture information into 3D space and render occlusion and view-dependent effects better.
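
For intuition, here's a rough sketch (illustrative only, not the actual LiveSplat code; the pinhole intrinsics fx/fy/cx/cy and metric depth are assumptions) of how an RGBD frame can be unprojected into per-pixel splat centers and colors, which a feed-forward net could then refine into full splat parameters:

    import numpy as np

    def rgbd_to_splat_centers(depth, rgb, fx, fy, cx, cy):
        """depth: HxW in meters, rgb: HxWx3 uint8, pinhole intrinsics fx/fy/cx/cy."""
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        z = depth
        x = (u - cx) * z / fx          # back-project the pixel grid into camera space
        y = (v - cy) * z / fy
        centers = np.stack([x, y, z], axis=-1).reshape(-1, 3)
        colors = rgb.reshape(-1, 3).astype(np.float32) / 255.0
        valid = centers[:, 2] > 0      # drop pixels with no depth reading
        return centers[valid], colors[valid]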
Retr0id · 21m ago
How do the view-dependent effects get "discovered" from only a single source camera angle?
markisus · 8m ago
Actually there are multiple source cameras. The neural net learns to interpolate the source camera colors based on where the virtual camera is. Under the hood it's hard to say exactly what's going on in the mind of the neural net, but I think it's something like "If I'm closer to camera A, take most of the color from camera A."
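
Something like this toy weighting captures that intuition (purely illustrative; the real net learns the mapping implicitly, and the function and its sharpness parameter here are made up):

    import numpy as np

    def blend_source_colors(virtual_pos, cam_positions, cam_colors, sharpness=4.0):
        """virtual_pos: (3,), cam_positions: (N,3), cam_colors: (N,3) color of one splat as seen by each camera."""
        d = np.linalg.norm(cam_positions - virtual_pos, axis=1)  # distance from virtual camera to each source camera
        w = np.exp(-sharpness * d)                               # nearer source cameras get exponentially more weight
        w /= w.sum()
        return (w[:, None] * cam_colors).sum(axis=0)             # blended color for the virtual view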
ttoinou · 40m ago
Well, if you have the D channel, you might as well benefit from it and have better output.
metalrain · 24m ago
How did you train this? I'm thinking there's no reference output mapping live video frames to splats, so supervised learning doesn't work.

Is there some temporal accumulation?

markisus · 21m ago
There is no temporal accumulation, but I think that's the next logical step.

Supervised learning actually does work. Suppose you have four cameras. You input three of them into the net and use the fourth as the ground truth. The live video aspect just emerges from re-running the neural net every frame.
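
A minimal structural sketch of that held-out-view setup (an assumption, not the actual training code; ToySplatNet and toy_render are stand-ins for the real net and differentiable renderer):

    import torch
    import torch.nn as nn

    class ToySplatNet(nn.Module):
        # stand-in for the real feed-forward net: maps stacked RGBD views
        # to per-pixel splat attributes (here just colors, for illustration)
        def __init__(self, n_views=3):
            super().__init__()
            self.conv = nn.Conv2d(4 * n_views, 3, kernel_size=1)
        def forward(self, rgbd_views):            # rgbd_views: (B, 4*n_views, H, W)
            return torch.sigmoid(self.conv(rgbd_views))

    def toy_render(splat_colors):
        # placeholder: a real pipeline would rasterize the predicted Gaussians
        # from the held-out camera's pose
        return splat_colors

    net = ToySplatNet()
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    inputs = torch.rand(1, 12, 64, 64)            # three source RGBD views, stacked
    target = torch.rand(1, 3, 64, 64)             # image from the held-out fourth camera
    for step in range(100):
        pred = toy_render(net(inputs))            # render the predicted scene from the held-out pose
        loss = torch.nn.functional.l1_loss(pred, target)
        opt.zero_grad(); loss.backward(); opt.step()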

mandeepj · 30m ago
Another implementation of splat https://github.com/NVlabs/InstantSplat
jasonjmcghee · 19m ago
The quality is better, no doubt, but that method takes on the order of 10-45s depending on the input, per the table in their paper. Which is much better than 10 minutes, etc.

That being said, afaict OP's method is 1000x faster, at 33ms.

markisus · 16m ago
Note that the method you linked is "splatting in seconds," whereas real-time requires splatting in tens of milliseconds.

I'm also following this work https://guanjunwu.github.io/4dgs/ which produces temporal Gaussian splats but takes at least half an hour to learn the scene.

corysama · 17m ago
So, I see livesplat_realsense.py imports livesplat. Where’s livesplat?
markisus · 12m ago
I've tried to make it clear in the link that the actual application is closed source. I'm distributing it as a .whl full of binaries (see the installation instructions).

I've considered publishing the source, but the source code depends on some proprietary utility libraries from my bigger project and it's hard to fully disentangle. I'm also not sure whether this project has business applications, but I'd like to keep that door open at this time.

IshKebab · 12m ago
The README says it's closed source.
sreekotay · 1h ago
This is realtime capture/display? Presumably (at this stage) for local viewing? Is that right?
markisus · 1h ago
Yes, realtime capture and display. Locality is not required. You can send the source RGBD video streams over IP, and in fact I have that component working in the larger codebase that this was split off from. For that use case, you need to do some sort of compression. RGB stream compression is a pretty solved problem, but the depth channel needs special consideration, since "perceptual loss" in depth space is not a well-researched area.
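
One common workaround, sketched below, is to split 16-bit depth into two 8-bit planes so it can ride along a standard video codec (a generic trick, not necessarily what LiveSplat does; lossy coding of the low byte can introduce depth noise):

    import numpy as np

    def pack_depth16(depth_mm):
        """depth_mm: HxW uint16 depth in millimeters -> two uint8 planes."""
        high = (depth_mm >> 8).astype(np.uint8)   # coarse range
        low = (depth_mm & 0xFF).astype(np.uint8)  # fine detail, sensitive to codec loss
        return high, low

    def unpack_depth16(high, low):
        return (high.astype(np.uint16) << 8) | low.astype(np.uint16)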
echelon · 58m ago
OP, this is incredible. I worry that people might see a "glitchy 3D video" and might not understand the significance of this.

This is getting unreal. They're becoming fast and high fidelity. Once we get better editing capabilities and can shape the Gaussian fields, this will become the prevailing means of creating and distributing media.

Turning any source into something 4D volumetric that you can easily mold as clay, relight, reshape. A fully interactable and playable 4D canvas.

Imagine if the work being done with diffusion models could read and write from Gaussian fields instead of just pixels. It could look like anything: real life, Ghibli, Pixar, whatever.

I can't imagine where this tech will be in five years.

markisus · 51m ago
Thanks so much! Even when I was putting together the demo video I was getting a little self-critical about the visual glitches. But I agree the tech will get better over time. I imagine we will be able to have virtual front row seats at any live event, and many other applications we haven't thought of yet.
echelon · 42m ago
> I imagine we will be able to have virtual front row seats at any live event, and many other applications we haven't thought of yet.

100%. And style-transfer it into steam punk or H.R. Giger or cartoons or anime. Or dream up new fantasy worlds instantaneously. Explore them, play them, shape them like Minecraft-becomes-holodeck. With physics and tactile responses.

I'm so excited for everything happening in graphics right now.

Keep it up! You're at the forefront!

_verandaguy · 48m ago
I know enough about 3D rendering to know that Gaussian splatting's one of the Big New Things in high-performance rendering, so I understand that this is a big deal -- but I can't quantify why, or how big a deal it is.

Could you or someone else wise in the ways of graphics give me a layperson's rundown of how this works, why it's considered so important, and what the technical challenges are given that an RGB+D(epth?) stream is the input?

markisus · 39m ago
Gaussian Splatting allows you to create a photorealistic representation of an environment from just a collection of images. Philosophically, this is a form of geometric scene understanding from raw pixels, which has been a holy grail of computer vision since the beginning.

Usually creating a Gaussian splat representation takes a long time and uses an iterative gradient-based optimization procedure. Using RGBD helps me sidestep this optimization, as much of the geometry is already present in the depth channel and so it enables the real-time aspect of my technique.
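
For contrast, here's a bare-bones sketch of the per-scene optimization that classic Gaussian splatting runs for minutes to hours, where the splat parameters themselves are the variables being fit to the captured images (illustrative only; toy_render stands in for a differentiable rasterizer):

    import torch

    n_splats = 10_000
    params = {                                     # per-scene splat parameters are the variables
        "centers": torch.randn(n_splats, 3, requires_grad=True),
        "colors":  torch.rand(n_splats, 3, requires_grad=True),
        "scales":  torch.rand(n_splats, 3, requires_grad=True),
    }
    opt = torch.optim.Adam(params.values(), lr=1e-2)

    def toy_render(p):
        # placeholder for a differentiable Gaussian rasterizer
        return p["colors"].mean(dim=0).view(3, 1, 1).expand(3, 64, 64)

    target = torch.rand(3, 64, 64)                 # one captured training image
    for it in range(30_000):                       # thousands of gradient steps per scene
        loss = torch.nn.functional.l1_loss(toy_render(params), target)
        opt.zero_grad(); loss.backward(); opt.step()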

When you say "big deal", I imagine you are also asking about business or societal implications. I can't really speak on those, but I'm open to licensing this IP to any companies which know about big business applications :)

corysama · 3m ago
So, is there some amount of gradient-based optimization going on here? I see RGBD input, transmission, RGBD output. But, other than multi-camera registration, it's difficult to determine what processing took place between input and transmission. What makes this different from RGBD camera visualizations from 10 years ago?
patrick4urcloud · 1h ago
nice