Fix Broken Pipeline

A pipeline for fast autoregressive (AR) video generation based on the Self-Forcing framework (arXiv:2506.08009) is available at https://github.com/moonmath-ai/SiFRiA/tree/wan2.1_1.3bAR. The pipeline has two training stages, ODE distillation followed by DMD, and both stages use WAN2.1 as the base model.

Pretrained checkpoints from our training run are available on Hugging Face: https://huggingface.co/ik6626/self_forcing_trial/tree/main

  • expt_2_1_1_3b_ode/ — ODE distillation checkpoint
  • expt_2_1_1_3b_dmd/ — DMD checkpoint (trained on top of the ODE checkpoint above)

The problem

Here is a video generated using the DMD checkpoint above, for the prompt:

“A Porsche, sleek and black, races forward swiftly along the asphalt. It weaves through the landscape against a backdrop of destroyed houses and skyscrapers cloaked in moss. As dawn breaks, the crimson sun ascends into the sky.”

The problem is: the Porsche drives backwards. The video is semantically wrong and physically implausible. This is not an isolated failure; the same issue reproduces across prompts and seeds. Something is systemically broken in the pipeline.
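
One hypothesis worth ruling out early is that the output is simply played in reversed temporal order. The sketch below is a toy diagnostic, not part of the repository: it assumes frames are available as plain nested lists of grayscale intensities, and the helper names (`centroid_x`, `net_horizontal_motion`, `reverse_time`) are hypothetical. It measures the net horizontal drift of the bright content in a clip and shows that reversing the frame order flips the sign of that drift. It does not identify the root cause; it only helps tell a pure time reversal apart from other failure modes.

```python
from typing import List, Sequence

# Assumption: one grayscale frame is a list of rows of pixel intensities.
Frame = List[List[float]]


def centroid_x(frame: Frame) -> float:
    """Intensity-weighted mean column index of a frame."""
    total = sum(sum(row) for row in frame)
    if total == 0:
        raise ValueError("frame has no non-zero pixels")
    weighted = sum(x * v for row in frame for x, v in enumerate(row))
    return weighted / total


def net_horizontal_motion(frames: Sequence[Frame]) -> float:
    """Centroid displacement from first to last frame (positive = rightwards)."""
    return centroid_x(frames[-1]) - centroid_x(frames[0])


def reverse_time(frames: Sequence[Frame]) -> List[Frame]:
    """Return the frames in reversed temporal order."""
    return list(reversed(frames))


# Toy clip: a single bright pixel moving left to right across a 1x5 image.
clip = [[[1.0 if x == t else 0.0 for x in range(5)]] for t in range(5)]

forward = net_horizontal_motion(clip)                 # drift of the original clip
backward = net_horizontal_motion(reverse_time(clip))  # drift after time reversal
print(forward, backward)
```

If reversing the frames of a generated video makes the motion match the prompt, the fault is likely in how temporal order is handled somewhere in the pipeline; if not, the cause lies elsewhere.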

Your task

Identify the root cause of the failure, then propose and implement a concrete fix. You do not need to train from scratch.

Submit