feat: use `ruff analyze graph` to detect files for the code bundle
#7,595 opened on Jun 26, 2026
Description
Why
When you call flyte.run(...) with the default copy_style="loaded_modules", the SDK decides which local files to ship in the code bundle by inspecting sys.modules at runtime (list_imported_modules_as_files). That has two downsides:
- It can miss files. Only modules already imported by the time bundling runs are captured. Lazy imports, conditional imports (if TYPE_CHECKING:, inside-function imports), or modules imported later on the cluster can be silently left out of the bundle, surfacing as a ModuleNotFoundError at execution time.
- It depends on interpreter state, so the bundle contents can vary with how/where flyte run is invoked.
A user asked for a more reliable, static way to determine the dependency set. ruff analyze graph produces a first-party import dependency graph statically (no need to actually import anything) and is fast. This is a nice-to-have, not a blocker — a good community contribution.
Concrete example: works with ruff analyze graph, broken today
# helper.py
def compute() -> int:
    return 42

# main.py (the entrypoint you `flyte.run`)
import flyte

env = flyte.TaskEnvironment(name="demo")

@env.task
async def main() -> int:
    from helper import compute  # lazy, function-level import
    return compute()
Run it: flyte.run(main).
- Today (loaded_modules): at packaging time on your laptop, main() never executes, so the from helper import compute line never runs and helper is not in sys.modules. The bundle ships without helper.py, and the task fails on the cluster with ModuleNotFoundError: No module named 'helper'.
- With ruff analyze graph: static analysis sees the from helper import ... edge regardless of where it appears, so helper.py is included and the run succeeds.
The same gap applies to imports guarded by if TYPE_CHECKING: that are later needed at runtime, and
to any module only reachable through a code path that isn't exercised locally before packaging.
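
The gap can be reproduced without Flyte at all. The following is a minimal sketch, not SDK code: it writes a throwaway module to a temp directory (the name lazy_helper_demo is hypothetical and chosen only for this demonstration), defines a function with a function-level import, and checks sys.modules before and after the function is called.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module name, used only for this demonstration.
MODULE_NAME = "lazy_helper_demo"

# Create a throwaway first-party module and make it importable.
tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, f"{MODULE_NAME}.py").write_text(
    "def compute() -> int:\n    return 42\n"
)
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()


def main() -> int:
    # Function-level import: executes only when main() is called.
    from lazy_helper_demo import compute
    return compute()


# Before main() runs, a sys.modules scan cannot see the helper.
seen_before_call = MODULE_NAME in sys.modules
result = main()
# After main() runs, the helper has been imported and is visible.
seen_after_call = MODULE_NAME in sys.modules
```

A detector that snapshots sys.modules at packaging time is in the "before" state, which is why the helper is left out of the bundle.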
What to change
Note: the code bundling logic lives in the flyteorg/flyte-sdk repo (Python SDK), under
src/flyte/_code_bundle/. The paths below refer to that repo. This issue is tracked here for v2 visibility.
Reimplement the existing loaded_modules detection on top of ruff analyze graph, which emits a JSON map of file -> [files it imports]. No new copy_style value or public API change — just make loaded_modules discover files statically from the import graph instead of from runtime sys.modules. Starting from the entrypoint module(s), walk that graph to collect the transitive set of first-party files, then feed them into the existing bundling path.
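
The graph walk described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the SDK's implementation: the function name collect_transitive_files is hypothetical, and it assumes the JSON keys and values are paths that are either absolute or relative to the source root (worth verifying against the actual ruff output for your layout).

```python
from collections import deque
from pathlib import Path


def collect_transitive_files(graph, entrypoints, source_root):
    """Return the sorted files reachable from the entrypoints under source_root.

    graph maps a file to the list of files it imports (assumed shape).
    """
    root = Path(source_root)
    # Anchor relative paths at the root; absolute paths are kept as-is.
    normalized = {
        root / src: [root / dep for dep in deps] for src, deps in graph.items()
    }
    seen = set()
    queue = deque(root / entry for entry in entrypoints)
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited; also guards against import cycles
        if not current.is_relative_to(root):
            continue  # drop anything outside the source root
        seen.add(current)
        queue.extend(normalized.get(current, []))
    return sorted(seen)
```

A breadth-first walk with a visited set keeps the traversal linear in the size of the graph and terminates on circular imports. Path.is_relative_to requires Python 3.9 or newer.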
- src/flyte/_code_bundle/_utils.py: in ls_files(), the copy_file_detection == "loaded_modules" branch currently calls list_imported_modules_as_files(str(source_path), sys_modules). Replace that implementation so it shells out to ruff analyze graph, parses the JSON, and resolves the transitive imports of the entrypoint under source_path. Keep filtering to files within source_path (drop third-party/stdlib), so the returned file list matches the existing contract.
- The CopyFiles literal stays as-is (Literal["loaded_modules", "all", "none", "custom"]); no new value.
- Handle ruff not being installed: detect it with shutil.which("ruff") and fall back to the current sys.modules approach with a clear log. Don't hard-fail.
- tests/: cover a project where a module is imported lazily/conditionally and assert loaded_modules now includes it.
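
One possible shape for the shell-out and fallback is sketched below. The function name load_import_graph is hypothetical, and returning None as the "use the sys.modules path instead" signal is an assumption about how the caller would be wired. Treating a failed ruff run the same as a missing binary goes slightly beyond what this issue asks for. Some ruff versions may gate this subcommand behind preview mode, so check ruff analyze graph --help for the installed version.

```python
import json
import logging
import shutil
import subprocess

logger = logging.getLogger(__name__)


def load_import_graph(source_path):
    """Return ruff's import graph as a dict, or None if ruff is unusable.

    A None result tells the caller to fall back to sys.modules detection.
    """
    ruff = shutil.which("ruff")
    if ruff is None:
        logger.warning(
            "ruff not found on PATH; falling back to sys.modules detection"
        )
        return None
    try:
        proc = subprocess.run(
            [ruff, "analyze", "graph", str(source_path)],
            capture_output=True,
            text=True,
            check=True,
        )
        return json.loads(proc.stdout)
    except (subprocess.CalledProcessError, OSError, json.JSONDecodeError) as exc:
        # Assumption: a failing ruff run should also fall back, not hard-fail.
        logger.warning("ruff analyze graph failed (%s); falling back", exc)
        return None
```

The caller would then walk the returned mapping when it is not None, and otherwise call the existing list_imported_modules_as_files path.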
Outcome
- copy_style="loaded_modules" (the default) bundles the full transitive first-party import set of the entrypoint, including lazily/conditionally imported local modules
- Third-party and stdlib files are excluded (bundle stays minimal)
- Graceful fallback to the old sys.modules behavior when ruff is not on PATH
- Tests cover the lazy/conditional-import case that loaded_modules previously missed
- No public API change (CopyFiles values unchanged)
Getting started
- See the current logic: list_imported_modules_as_files and ls_files in src/flyte/_code_bundle/_utils.py (the copy_file_detection == "loaded_modules" branch).
- Try the tool: ruff analyze graph path/to/entrypoint.py (outputs JSON). Docs: https://docs.astral.sh/ruff/
- Entry points to read: src/flyte/_code_bundle/bundle.py (build_code_bundle), src/flyte/_run.py (copy_style).
- Test: pytest tests/, adding a fixture project under a tmp dir with a lazy import.
- Setup: see CONTRIBUTING in the flyte-sdk repo.
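
As a starting point for the test fixture, a helper along these lines could write the lazy-import project to a temp directory. This is a sketch: make_lazy_import_project is a hypothetical name, and the generated main.py omits the flyte decorator so the fixture stays free of SDK imports. The assertion against ls_files is left out because its exact signature should be taken from the SDK source.

```python
from pathlib import Path

HELPER_SOURCE = (
    "def compute() -> int:\n"
    "    return 42\n"
)

MAIN_SOURCE = (
    "def main() -> int:\n"
    "    from helper import compute  # lazy, function-level import\n"
    "    return compute()\n"
)


def make_lazy_import_project(root):
    """Write helper.py and main.py (with a lazy import) under root.

    Returns the (main, helper) paths so a test can assert both are bundled.
    """
    root = Path(root)
    main = root / "main.py"
    helper = root / "helper.py"
    helper.write_text(HELPER_SOURCE)
    main.write_text(MAIN_SOURCE)
    return main, helper
```

In a pytest test, tmp_path can be passed as root; the test would then run the loaded_modules detection with main.py as the entrypoint and assert that helper.py appears in the returned file list.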