Package topology target spec¶
Status¶
Draft target architecture. This is a direction-setting spec, not a big-bang rewrite plan.
Goal¶
Evolve dfode_kit/ toward clearer subsystem boundaries that match the user-visible workflow and reduce mixed-responsibility modules.
Target shape:
dfode_kit/
cli/
main.py
command_loader.py
commands/
cases/
init.py
presets.py
sampling.py
deepflame.py
data/
contracts.py
io_hdf5.py
augment.py
label.py
models/
mlp.py
registry.py
training/
config.py
registry.py
supervised_physics.py
train.py
runtime/
config.py
run_case.py
utils/
Why this direction¶
The current package layout has the right broad pieces, but some names are historical and some modules mix concerns:
cli_tools/is clear, but the*_toolsnaming is heavier than needed.data_operations/currently mixes contracts, HDF5 I/O, augmentation, labeling, and some model/integration helpers.df_interface/includes case initialization, case setup, and sampling concerns under a vague name.dfode_core/mixes models, training, preprocessing, and an in-package test directory.- some modules carry more than one abstraction level, especially
h5_kit.py.
The target layout moves toward domain-first naming and narrower module responsibility.
Design principles¶
1. Name packages after stable responsibilities¶
Prefer names that tell a new contributor where code belongs:
cli: command-line entrypoints and dispatch onlycases: canonical case setup, case-local sampling, DeepFlame-facing case operationsdata: data contracts, storage I/O, augmentation, labelingmodels: neural-network architectures and model registrytraining: trainer config, trainer registry, training loopsruntime: machine-local configuration and case execution helpersutils: small generic helpers only
2. Keep orchestration separate from primitives¶
Examples:
- cases.init should assemble a case plan and apply it
- data.io_hdf5 should do HDF5 read/write/validation work
- training.train should orchestrate a training run
- models.mlp should define one architecture, not run the whole workflow
3. Avoid mixed abstraction levels in one module¶
A file named like an I/O helper should not also be the home for model inference or numerical integration orchestration.
4. Keep CLI modules thin¶
CLI command modules should: - define arguments - validate command-level options - call importable library functions - handle output formatting and exit behavior
They should not be the only place where business logic exists.
5. Preserve agent-friendly behavior¶
Refactors in this direction must preserve: - lazy imports for heavy dependencies where possible - deterministic CLI command ordering - clean stdout/stderr behavior - small, verifiable change slices
Target package responsibilities¶
dfode_kit/cli/¶
Owns command parsing, command registration, and command dispatch.
Intended contents¶
main.py: top-level argument parsing and exit-code contractcommand_loader.py: deterministic command specification/loadingcommands/: one module per user-facing subcommand
Non-goals¶
- no domain logic that cannot be reused outside the CLI
- no heavy imports at top level when avoidable
dfode_kit/cases/¶
Owns canonical case planning and DeepFlame/OpenFOAM-facing case operations.
Intended contents¶
init.py: case plan assembly and application helperspresets.py: named case-setup presets and preset metadatasampling.py: sample extraction from completed casesdeepflame.py: DeepFlame/OpenFOAM case-specific helpers
Notes¶
This package replaces the vaguer df_interface/ name with a user-visible workflow concept: cases.
dfode_kit/data/¶
Owns dataset contracts, file I/O, and state transformation steps.
Intended contents¶
contracts.py: HDF5/NPY schema checks and dataset ordering rulesio_hdf5.py: HDF5 reading/writing and stack/reshape helpersaugment.py: augmentation primitives and workflow helperslabel.py: labeling primitives and workflow helpers
Notes¶
This package should not become a second training or inference package. If a function is primarily model inference or time integration orchestration, it likely belongs elsewhere.
dfode_kit/models/¶
Owns model architectures and model registration.
Intended contents¶
mlp.py: current MLP implementationregistry.py: model factory registration and lookup
Notes¶
Keep model creation separate from training-loop concerns.
dfode_kit/training/¶
Owns trainer configuration, trainer registration, and training execution.
Intended contents¶
config.py: typed training configurationregistry.py: trainer registrysupervised_physics.py: current trainer implementationtrain.py: orchestration entrypoint for training runs
Notes¶
Preprocessing that is training-specific may move here later if it is not reusable elsewhere.
dfode_kit/runtime/¶
Owns machine-local runtime config and execution planning.
Intended contents¶
config.py: runtime config persistence and schemarun_case.py: case execution planning/application helpers
Notes¶
This separates workstation/runtime concerns from case-definition concerns.
dfode_kit/utils/¶
Owns small generic helpers.
Rule¶
If a helper is domain-specific enough to obviously belong to cases, data, models, training, or runtime, do not place it in utils.
Migration mapping from current layout¶
Current -> target¶
cli_tools/->cli/df_interface/->cases/data_operations/->data/dfode_core/model/->models/dfode_core/train/->training/runtime_config.py->runtime/config.py
Likely file moves¶
cli_tools/main.py->cli/main.pycli_tools/command_loader.py->cli/command_loader.pycli_tools/commands/*.py->cli/commands/*.pydf_interface/case_init.py->cases/init.pydf_interface/flame_configurations.py->cases/presets.pydf_interface/sample_case.py->cases/sampling.pydf_interface/oneDflame_setup.py->cases/deepflame.pyor split acrossinit.py+deepflame.pydata_operations/contracts.py->data/contracts.pydata_operations/h5_kit.py-> split; pure HDF5 pieces todata/io_hdf5.pydata_operations/augment_data.py->data/augment.pydata_operations/label_data.py->data/label.pydfode_core/model/mlp.py->models/mlp.pydfode_core/model/registry.py->models/registry.pydfode_core/train/config.py->training/config.pydfode_core/train/registry.py->training/registry.pydfode_core/train/supervised_physics.py->training/supervised_physics.pydfode_core/train/train.py->training/train.pyruntime_config.py->runtime/config.pycli_tools/commands/run_case_helpers.py->runtime/run_case.pyor split between CLI and runtime library
Compatibility expectations¶
This refactor direction should preserve current user-facing behavior during migration:
- keep the
dfode-kitCLI command name unchanged - keep subcommand names stable unless there is a strong reason to rename them
- preserve current import paths temporarily via compatibility shims where needed
- prefer deprecation windows over abrupt breakage
What should not happen¶
- do not rewrite all packages in one PR
- do not move files without adding or updating tests for the moved contract
- do not change user-facing CLI behavior incidentally while only intending package moves
- do not put more domain-specific logic into
utils.py - do not mix rename-only moves with unrelated feature work
Success criteria¶
A refactor slice is aligned with this spec if it: - makes package names more domain-specific - reduces mixed-responsibility modules - keeps behavior stable or makes changes explicitly documented - preserves agent-friendly CLI/test behavior - leaves the repository easier to navigate for a new contributor
Deferred questions¶
These can be answered incrementally rather than upfront:
- should preprocess.py become training/preprocess.py, data/preprocess.py, or be split?
- should case sampling stay under cases/ or move partly into data/ after extraction?
- should runtime/ eventually hold more execution-plan/provenance helpers?
- should a future workflows/ package exist for higher-level chained operations?