LiteRunner: MLOps-Style Tracking Without Touching the Code
LiteRunner is open-source, and contributions are welcome.
TL;DR
- Try LiteRunner: https://github.com/moonmath-ai/LiteRunner
- Running diffusion / video experiments often turns into manual bookkeeping, long CLI commands, scattered outputs, and no reliable record of configs, seeds, or results.
- LiteRunner adds lightweight tracking to any CLI command without changing the model, saving params, outputs, and metrics locally and in W&B so every run stays reproducible and organized.
When running video generation experiments with diffusion models, the workflow quickly turns into bookkeeping. Every run starts with hand-editing long CLI commands, quoting paths, swapping flags manually, and each run produces a different combination of config, output videos, metrics, and debug data. Output files end up scattered across multiple folders and machines with no central record, sometimes even overwriting each other. Moving those files and recording runs becomes tedious, and inevitably the one run that wasn’t properly recorded turns out to be the one that matters. Revisiting an old experiment often means digging through notes just to figure out whether it used seed 10 or 42.
When you own the code, you can wire in an MLOps tool to solve this. But often you’re just a user of someone else’s model, and modifying their source just to get proper tracking isn’t practical. That’s when the idea comes up: instead of changing the model code, bring MLOps-style logging to arbitrary CLI commands, so experiments can be tracked without touching the original implementation.
What LiteRunner is
LiteRunner is a lightweight runner for generative model experiments with built-in local tracking and Weights & Biases integration. Instead of writing long command lines for every test, you write a small run.py file where parameters, outputs, and metrics are defined once.
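To make the idea concrete, here is a minimal sketch of what such a declarative run script might look like. The structure and every name in it (`PARAMS`, `OUTPUTS`, `generate.py`, the metric regex) are illustrative assumptions, not LiteRunner's actual API; see the repo for the real `run.py` format.

```python
# Hypothetical sketch of a declarative run script.
# All names here are illustrative, not LiteRunner's real API.
import shlex

# Everything about the experiment, declared once.
PARAMS = {"prompt": "a cat surfing", "seed": 42, "num_frames": 49}
OUTPUTS = ["out/video.mp4"]
METRIC_PATTERN = r"FVD:\s*([\d.]+)"  # regex applied to stdout

def build_command(params):
    """Turn the parameter dict into a safely quoted CLI command."""
    args = " ".join(f"--{k} {shlex.quote(str(v))}"
                    for k, v in params.items())
    return f"python generate.py {args}"

print(build_command(PARAMS))
```

The point is that the command line is generated from one declaration, so you never hand-edit quoting or flags again.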
Run the script and LiteRunner prompts you for missing parameters, with your defaults pre-filled. It builds the command, executes it as a subprocess, streams stdout in real time, and extracts metrics along the way. After the run, it saves everything to a unique local folder and uploads it all to Weights & Biases, making every run reproducible.[1] (This includes command, parameters, stdout/stderr, input/output files, code, hostname, and duration.)
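The core loop described above (run the command as a subprocess, stream stdout, pull metrics with a pattern, archive everything in a unique run directory) can be sketched in plain Python. This shows the general pattern, under assumed names, not LiteRunner's internals:

```python
# Generic sketch of tracked execution; not LiteRunner's internals.
import json, re, subprocess, time, uuid
from pathlib import Path

def tracked_run(cmd, params, metric_pattern, base_dir="lite_runs"):
    """Run cmd as a subprocess, stream its stdout, extract a metric,
    and archive command, params, log, and metrics in a unique folder."""
    run_dir = Path(base_dir) / f"run_{uuid.uuid4().hex[:8]}"
    run_dir.mkdir(parents=True)
    metrics, lines = {}, []
    start = time.time()
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    for line in proc.stdout:          # stream output in real time
        print(line, end="")
        lines.append(line)
        m = re.search(metric_pattern, line)
        if m:
            metrics["metric"] = float(m.group(1))
    proc.wait()
    # Persist everything needed to reproduce and compare the run.
    (run_dir / "stdout.log").write_text("".join(lines))
    (run_dir / "meta.json").write_text(json.dumps({
        "cmd": cmd, "params": params, "metrics": metrics,
        "duration_s": round(time.time() - start, 2),
        "returncode": proc.returncode,
    }, indent=2))
    return run_dir, metrics
```

For example, `tracked_run(["python", "-c", "print('score: 0.93')"], {"seed": 1}, r"score:\s*([\d.]+)")` would archive the log and capture `0.93` as the run's metric.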
Sweeps can loop over parameter values while grouping runs for comparison, using simple bash or python scripts. The API allows branching runs, overriding parameters, and attaching metadata without rewriting scripts, and each execution gets a structured local run directory to keep experiments organized.
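A sweep then reduces to a loop over a parameter grid that reuses the same run function and tags every run with a shared group id for comparison. Again a generic sketch under assumed names, not the actual API:

```python
# Generic sweep sketch; function and field names are illustrative.
import itertools, uuid

def sweep(run_fn, grid, group=None):
    """Run one experiment per combination in the parameter grid,
    tagging each run with a shared group id for later comparison."""
    group = group or f"sweep_{uuid.uuid4().hex[:6]}"
    keys = list(grid)
    results = []
    for values in itertools.product(*grid.values()):
        params = dict(zip(keys, values))
        results.append({"group": group, "params": params,
                        "result": run_fn(params)})
    return results

# A dummy run function stands in for the real launcher here.
runs = sweep(lambda p: p["seed"] * p["cfg"],
             {"seed": [1, 2], "cfg": [5.0, 7.5]})
```

This produces one tagged record per grid point (four here), which is what makes post-hoc grouping and comparison possible.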
The wrapped code doesn’t need to change. Parameters, outputs, and metrics are declared in the run script, not inside the model. This makes the workflow consistent across repos you maintain and repos you only run, while keeping experiments minimal, repeatable, and easy to compare.
Directions we are exploring
A few features are on the roadmap:
- CLI-driven sweeps: defining parameter grids directly from the command line, without writing Python. The UX needs careful thought, but it would lower the barrier for quick experiments.
- Local run management: searching and browsing past runs from the local ~/lite_runs/ directory, tagging runs after watching the results, and grouping them post hoc.
- Remote run management: a way to see what's on remote machines without downloading full outputs.
- Additional backends: W&B is the default, but if there’s demand we can support other tracking tools.
- Improved storage efficiency.
- WorldJen integration: WorldJen[2] is a high-performance video and world-model evaluation tool, and we plan to integrate LiteRunner to complement the ways AI researchers and engineers can interact with the platform (learn more: https://moonmath.ai/posts/introducing-worldjen/). WorldJen lets developers evaluate video generation models against a standard set of prompts and report quality metrics. LiteRunner could send a model that already has a run script to WorldJen, which could use that script to drive evaluation while keeping metadata about the version evaluated, enabling a fast exploration and feedback loop for the developer.
LiteRunner is still evolving, but the goal is simple: make experiment tracking feel lightweight enough that people actually use it every day.
If this sounds useful for your workflow, check out LiteRunner and feel free to reach out with feedback or requests.