Fix Broken Pipeline
A pipeline for fast autoregressive (AR) video generation based on the Self-Forcing framework (arXiv:2506.08009) is available at https://github.com/moonmath-ai/SiFRiA/tree/wan2.1_1.3bAR. Both training stages, ODE distillation and DMD, use WAN2.1 as the base model.
Pretrained checkpoints from our training run are available on Hugging Face: https://huggingface.co/ik6626/self_forcing_trial/tree/main
expt_2_1_1_3b_ode/: ODE distillation checkpoint
expt_2_1_1_3b_dmd/: DMD checkpoint (trained on top of the ODE checkpoint above)
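To fetch just one of the two checkpoint folders rather than the whole repository, something like the following sketch may help. It assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function with `allow_patterns`. The helper names (`checkpoint_patterns`, `download_checkpoint`) and the `checkpoints` target directory are illustrative choices, not part of the SiFRiA repository, and the file layout inside each folder is not assumed.

```python
from pathlib import Path

# Repository id taken from the Hugging Face URL above.
REPO_ID = "ik6626/self_forcing_trial"

# Folder names as listed above; the short keys are illustrative.
CHECKPOINT_DIRS = {
    "ode": "expt_2_1_1_3b_ode",
    "dmd": "expt_2_1_1_3b_dmd",
}


def checkpoint_patterns(stage: str) -> list[str]:
    """Return the glob patterns that select one checkpoint folder."""
    return [f"{CHECKPOINT_DIRS[stage]}/*"]


def download_checkpoint(stage: str, local_dir: str = "checkpoints") -> Path:
    """Download one checkpoint folder (requires network access)."""
    # Imported lazily so the helpers above work without the package.
    from huggingface_hub import snapshot_download

    root = snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=checkpoint_patterns(stage),
        local_dir=local_dir,
    )
    return Path(root) / CHECKPOINT_DIRS[stage]
```

Calling `download_checkpoint("dmd")` would then place the DMD checkpoint under `checkpoints/expt_2_1_1_3b_dmd`; pass `"ode"` for the ODE distillation checkpoint.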
The problem
Here is a video generated using the DMD checkpoint above, for the prompt:
“A Porsche, sleek and black, races forward swiftly along the asphalt. It weaves through the landscape against a backdrop of destroyed houses and skyscrapers cloaked in moss. As dawn breaks, the crimson sun ascends into the sky.”
The problem: the prompt describes a car racing forward, but in the generated video the Porsche drives backwards. The video is semantically wrong and physically implausible. This is not an isolated failure; the same issue reproduces across prompts and seeds. Something is systemically broken in the pipeline.
Your task
Identify the root cause of the failure, then propose and implement a concrete fix. You do not need to train from scratch.