If you want to play with this, as in really play, with over a dozen variant models, acceleration LoRAs, and a vibrant community, ya gotta check out:
https://github.com/deepbeepmeep/Wan2GP
And the Discord community: https://discord.gg/g7efUW9jGV
"Wan2GP" is AI video and images "for the GPU poor": get all of this running with as little as 6 GB of VRAM, Nvidia only.
diggan · 1h ago
On the other side, are there any projects focusing on performance instead? I have the VRAM available to run Wan2.1, but it still takes minutes per frame. Basically, something like what vLLM is for running local LLM weights, but for video/WAN?
There are a lot of people focused on performance, via various methods, just as there are a lot of people focused on non-performance issues like fine-tunes that add aspects the models lack: linking professional media terminology to the model, pop-culture terminology the model does not know, accuracy of body posture during fight, dance, gymnastics, and sports activity, and then less flashy but pragmatic actions like proper use of tableware, chopsticks, keyboards, and musical instruments - complex actions that stand out when done incorrectly or never shown at all. The model's knowledge is broad, but it has limits, which people are filling in.
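For a concrete starting point on the performance question, here is a minimal sketch of the usual speed levers using Hugging Face diffusers' WanPipeline, assuming a Wan 2.1 diffusers-format checkpoint; the acceleration-LoRA path below is a placeholder, not a real repo, so substitute an actual step-distillation LoRA:

```python
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Lever 1: a step-distillation / acceleration LoRA lets you cut
# num_inference_steps drastically (placeholder path, hypothetical):
# pipe.load_lora_weights("your/step-distill-lora")

# Lever 2: compile the denoising transformer; the first call pays
# compilation cost, subsequent generations run faster.
pipe.transformer = torch.compile(pipe.transformer)

video = pipe(
    prompt="a corgi surfing a wave at sunset",
    height=480,
    width=832,
    num_frames=81,
    guidance_scale=5.0,
    num_inference_steps=30,  # fewer steps = roughly linear speedup
).frames[0]
export_to_video(video, "out.mp4", fps=16)
```

If VRAM rather than speed is the constraint, pipe.enable_model_cpu_offload() trades throughput for memory.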
bsenftner · 10m ago
There is also a ton of Wan video activity in the ComfyUI community. Every day for a while, starting about two weeks ago, ComfyUI shipped updates specific to Wan 2.2 video integration in the standard installation. ComfyUI is a significantly more complex application than Wan2GP, though.
franky47 · 27m ago
Quick, someone make a UI for this and call it Obi.
ProofHouse · 4h ago
How can they manage that but not the website?
cubefox · 2h ago
Arguably the most interesting facts about the new Wan 2.2 model:
- they are now using a 27B MoE architecture (two 14B experts, one for the high-noise early denoising steps and one for the low-noise detail-refinement steps; toy sketch below), an approach that until now was used almost exclusively for autoregressive LLMs rather than diffusion models
- the smaller 5B model supports up to 720p24 video and runs in 24 GB of VRAM, e.g. an RTX 4090, a consumer graphics card
- if their benchmarks are reliable, the model's performance is SOTA even compared to closed-source models
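For intuition, here is a toy sketch (not Wan's actual code) of how this differs from LLM-style per-token routing: the "router" is simply the denoising timestep, so exactly one full expert is active at any step. The switch threshold and layer shapes are made-up placeholders.

```python
import torch
from torch import nn

class TwoExpertDiffusionMoE(nn.Module):
    """Toy illustration: route each denoising step to one of two
    full-size experts by noise level. In Wan 2.2 terms, the high-noise
    expert handles early steps (global layout) and the low-noise
    expert handles late steps (fine detail)."""

    def __init__(self, make_expert, switch_t: int = 500):
        super().__init__()
        self.high_noise = make_expert()  # early, noisy timesteps
        self.low_noise = make_expert()   # late, nearly-clean timesteps
        self.switch_t = switch_t         # hypothetical switching point

    def forward(self, x: torch.Tensor, t: int) -> torch.Tensor:
        # t is the diffusion timestep: high t = lots of noise remaining
        expert = self.high_noise if t >= self.switch_t else self.low_noise
        return expert(x)

# Only one expert runs per step, so active compute matches a single
# 14B model even though total parameters are ~27B.
moe = TwoExpertDiffusionMoE(lambda: nn.Linear(64, 64))
x = torch.randn(1, 64)
print(moe(x, t=900).shape)  # routed to the high-noise expert
print(moe(x, t=100).shape)  # routed to the low-noise expert
```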
esseph · 5h ago
Ugh, I hate that they used this name.
yorwba · 4h ago
You can call it Wanxiang (万相, ten thousand pictures) if you want. Similarly, Qwen is Qianwen (千问, one thousand questions).
CapsAdmin · 4h ago
Its original name was WanX, but the gen AI community found that too funny / unfortunate, so they changed it to just Wan.
latentsea · 4h ago
They should just pretend it's an acronym. Wide Art Network.