Data Preparation and Training Workflow

This page documents the currently exposed CLI stages after a case has been initialized and run successfully.

It covers the data pipeline, which proceeds from finished case outputs to trained models:

- finished DeepFlame/OpenFOAM case outputs
- sampled HDF5 state data
- optional HDF5-to-NumPy conversion
- augmented state datasets
- labeled supervised-learning datasets
- trained surrogate model artifacts

Stage boundaries

The current CLI presents the data workflow as a sequence of artifact transformations.

1. `sample`

Input:
- a finished case directory
- a mechanism file

Output:
- an HDF5 file containing sampled scalar fields
- optional mesh datasets, when `--include_mesh` is given

Example:

```bash
dfode-kit sample \
  --mech /path/to/gri30.yaml \
  --case /path/to/run/oneD_flame_CH4_phi1 \
  --save /path/to/run/oneD_flame_CH4_phi1/ch4_phi1_sample.h5 \
  --include_mesh
```

Typical contents include:
- root metadata, such as the mechanism
- `scalar_fields/` datasets keyed by output time
- optional mesh datasets
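To check what `sample` wrote, the file can be inspected with `h5py`. This is a minimal sketch assuming the layout listed above, with a root attribute literally named `mechanism` and a `scalar_fields` group whose dataset names are output times; those exact names, the helper `summarize_sample`, and the synthetic demo file are assumptions for illustration, not dfode-kit's documented schema.

```python
import tempfile
from pathlib import Path

import h5py
import numpy as np


def summarize_sample(path):
    """Summarize a sampled HDF5 file.

    Assumed layout (hypothetical names): a root attribute "mechanism" and a
    "scalar_fields" group whose dataset names are output times.
    Returns (mechanism, sorted times, shape per time, other top-level entries).
    """
    with h5py.File(str(path), "r") as f:
        mechanism = f.attrs.get("mechanism")
        if isinstance(mechanism, bytes):  # some writers store bytes, not str
            mechanism = mechanism.decode()
        fields = f["scalar_fields"]
        # Sort numerically: a plain string sort would put "1e-05" after "0.002".
        times = sorted(fields.keys(), key=float)
        shapes = {t: fields[t].shape for t in times}
        # Anything else at the root, e.g. mesh datasets written with --include_mesh.
        extra = sorted(k for k in f.keys() if k != "scalar_fields")
    return mechanism, times, shapes, extra


# Demo on a synthetic file that mimics the assumed layout (not real DeepFlame output).
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "toy_sample.h5"
    with h5py.File(str(demo), "w") as f:
        f.attrs["mechanism"] = "gri30.yaml"
        group = f.create_group("scalar_fields")
        for t in ("0.002", "0.0005", "1e-05"):
            group.create_dataset(t, data=np.zeros((4, 3)))
        f.create_dataset("mesh_points", data=np.zeros((4, 3)))
    mechanism, times, shapes, extra = summarize_sample(demo)

print(mechanism, times, extra)
```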

2. `h52npy`

Input:
- a sampled HDF5 file

Output:
- a stacked NumPy array of scalar fields

Example:

```bash
dfode-kit h52npy \
  --source /path/to/run/oneD_flame_CH4_phi1/ch4_phi1_sample.h5 \
  --save_to /path/to/data/ch4_phi1_sample.npy
```

Use this when downstream workflows need a single NumPy array rather than time-indexed HDF5 datasets.
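Conceptually, the conversion concatenates the per-time datasets into one array. The sketch below shows one plausible way to do that with NumPy; the row ordering (ascending numeric time, cells stacked along axis 0) and the helper `stack_scalar_fields` are assumptions, not a description of what `h52npy` does internally.

```python
import io

import numpy as np


def stack_scalar_fields(fields_by_time):
    """Stack per-time 2-D arrays (rows = cells, columns = fields) into one array.

    Hypothetical ordering: rows from all output times are concatenated in
    ascending numeric time order. The order used by `h52npy` may differ.
    """
    times = sorted(fields_by_time, key=float)
    arrays = [np.asarray(fields_by_time[t], dtype=float) for t in times]
    if any(a.ndim != 2 for a in arrays):
        raise ValueError("each output time must hold a 2-D (cells x fields) array")
    if len({a.shape[1] for a in arrays}) != 1:
        raise ValueError("all output times must have the same number of fields")
    return np.concatenate(arrays, axis=0)


# Two toy output times with 3 fields each; "0.002" has 2 cells, "0.001" has 3.
fields = {"0.002": np.full((2, 3), 2.0), "0.001": np.full((3, 3), 1.0)}
stacked = stack_scalar_fields(fields)

# Round-trip through the .npy format in memory, as a saved file would be.
buffer = io.BytesIO()
np.save(buffer, stacked)
buffer.seek(0)
loaded = np.load(buffer)
print(stacked.shape)
```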

3. `augment`

Input:
- a sampled HDF5 file
- a mechanism file

Output:
- an augmented NumPy dataset

Example:

```bash
dfode-kit augment \
  --source /path/to/run/oneD_flame_CH4_phi1/ch4_phi1_sample.h5 \
  --mech /path/to/gri30.yaml \
  --save /path/to/data/ch4_phi1_aug.npy \
  --preset random-local-combustion-v1 \
  --target-size 20000 \
  --apply
```

Minimal public contract:
- `--source`
- `--mech`
- `--save` (required with `--apply`)
- `--preset`
- `--target-size`
- `--seed` (optional)
- `--preview`
- `--apply`
- `--json`
- `--write-config`
- `--from-config`

Current note on `augment`

The augmentation CLI is intentionally preset-driven and keeps the public flag surface small. For more advanced tuning, combine `--preview` with `--write-config` to save a configuration, then apply it later with `--from-config`.
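This page does not describe what the `random-local-combustion-v1` preset does internally. As a rough illustration only, the sketch below shows one generic form of random local augmentation: it draws base states with replacement up to a target size, applies small relative perturbations to temperature, pressure, and mass fractions, and renormalizes the mass fractions. The `[T, P, Y_1, ..., Y_n]` column layout, the perturbation bounds, and `augment_states` are hypothetical; `target_size` and `seed` only loosely mirror `--target-size` and `--seed`.

```python
import numpy as np


def augment_states(states, target_size, seed=None,
                   t_scale=0.02, p_scale=0.01, y_scale=0.05):
    """Return `target_size` randomly perturbed copies of rows drawn from `states`.

    Hypothetical column layout: [T, P, Y_1, ..., Y_n]. The scales are relative
    perturbation bounds; none of this reproduces a dfode-kit preset.
    """
    rng = np.random.default_rng(seed)
    states = np.asarray(states, dtype=float)
    # Draw base states uniformly with replacement.
    base = states[rng.integers(0, len(states), size=target_size)]
    out = base.copy()
    # Relative jitter on temperature and pressure.
    out[:, 0] *= 1.0 + t_scale * rng.uniform(-1.0, 1.0, size=target_size)
    out[:, 1] *= 1.0 + p_scale * rng.uniform(-1.0, 1.0, size=target_size)
    # Relative jitter on mass fractions, kept non-negative and renormalized to sum to 1.
    y = base[:, 2:] * (1.0 + y_scale * rng.uniform(-1.0, 1.0, size=base[:, 2:].shape))
    y = np.clip(y, 0.0, None)
    out[:, 2:] = y / y.sum(axis=1, keepdims=True)
    return out


# Two toy states: [T (K), P (Pa), Y_a, Y_b].
base_states = np.array([[300.0, 101325.0, 0.2, 0.8],
                        [1800.0, 101325.0, 0.1, 0.9]])
augmented = augment_states(base_states, target_size=1000, seed=0)
print(augmented.shape)
```

Renormalizing keeps every perturbed composition physically valid (non-negative and summing to 1), which is the main constraint any such scheme has to respect; a fixed seed makes the generated dataset reproducible.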

4. `label`

Input:
- a mechanism file
- a NumPy state dataset
- a reactor advancement time step (`--time`)

Output:
- a labeled NumPy dataset suitable for supervised learning

Example:

```bash
dfode-kit label \
  --mech /path/to/gri30.yaml \
  --time 1e-6 \
  --source /path/to/data/ch4_phi1_aug.npy \
  --save /path/to/data/ch4_phi1_labeled.npy
```

Conceptually, this stage advances each input state by the `--time` step with Cantera/CVODE and writes the paired source/target states.
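The pairing step can be sketched as follows. `label_states` takes a pluggable `advance(state, dt)` function, so the source/target bookkeeping can be shown with a toy decay model; `make_cantera_advance` shows how a Cantera-backed `advance` could look using `IdealGasConstPressureReactor` and `ReactorNet.advance`, which integrates with CVODES. The constant-pressure reactor, the `[T, P, Y_1, ..., Y_n]` layout, and both helper names are assumptions rather than dfode-kit's documented implementation.

```python
import numpy as np


def label_states(states, dt, advance):
    """Pair each input state with the state reached after advancing it by `dt`.

    Returns shape (n, 2 * d): columns are [source state | target state].
    """
    states = np.asarray(states, dtype=float)
    targets = np.array([advance(state, dt) for state in states])
    return np.hstack([states, targets])


def make_cantera_advance(mech_path):
    """Build an `advance` callable backed by Cantera (not exercised in the demo).

    Assumes states laid out as [T, P, Y_1, ..., Y_n] in the mechanism's species
    order and a constant-pressure reactor; both are assumptions, not dfode-kit's
    documented behavior.
    """
    import cantera as ct  # imported lazily so the toy demo runs without Cantera

    gas = ct.Solution(mech_path)

    def advance(state, dt):
        gas.TPY = state[0], state[1], state[2:]
        reactor = ct.IdealGasConstPressureReactor(gas)
        net = ct.ReactorNet([reactor])
        net.advance(dt)  # integrates with CVODES from t = 0 to t = dt
        return np.concatenate([[reactor.T, reactor.thermo.P], reactor.thermo.Y])

    return advance


# Toy stand-in for chemistry: every component decays as exp(-k * dt).
def toy_advance(state, dt, k=1.0e3):
    return state * np.exp(-k * dt)


labeled = label_states([[1.0, 2.0], [3.0, 4.0]], dt=1e-6, advance=toy_advance)
print(labeled.shape)
```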

5. `train`

Input:
- a mechanism file
- a labeled NumPy dataset

Output:
- a trained model artifact written to the requested output path

Example:

```bash
dfode-kit train \
  --mech /path/to/gri30.yaml \
  --source_file /path/to/data/ch4_phi1_labeled.npy \
  --output_path /path/to/models/ch4_phi1_model.pt
```
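Before or alongside `train`, it can be useful to load the labeled array and check its structure. A minimal sketch, assuming source and target states are stored side by side as `[source | target]` columns of equal width; `split_labeled` and `standardize` are hypothetical helpers, and the actual preprocessing, model architecture, and output format of `dfode-kit train` are not documented on this page.

```python
import numpy as np


def split_labeled(labeled, val_fraction=0.1, seed=0):
    """Split a labeled array into shuffled train/validation (inputs, targets) pairs.

    Assumes columns are [source state | target state] with equal widths.
    """
    labeled = np.asarray(labeled, dtype=float)
    if labeled.ndim != 2 or labeled.shape[1] % 2:
        raise ValueError("expected a 2-D array with columns [source | target]")
    d = labeled.shape[1] // 2
    x, y = labeled[:, :d], labeled[:, d:]
    order = np.random.default_rng(seed).permutation(len(labeled))
    n_val = int(round(val_fraction * len(labeled)))
    val_idx, train_idx = order[:n_val], order[n_val:]
    return (x[train_idx], y[train_idx]), (x[val_idx], y[val_idx])


def standardize(train, *others):
    """Scale columns with the training set's mean and std (zero-std columns left unscaled)."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std = np.where(std > 0, std, 1.0)
    return [(a - mean) / std for a in (train, *others)]


# In practice: labeled = np.load("/path/to/data/ch4_phi1_labeled.npy")
labeled = np.arange(40, dtype=float).reshape(10, 4)
(x_train, y_train), (x_val, y_val) = split_labeled(labeled, val_fraction=0.2)
x_train_s, x_val_s = standardize(x_train, x_val)
print(x_train.shape, x_val.shape)
```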

Recommended artifact layout

A practical directory layout is:

```
<project-root>/
  runs/
    oneD_flame_CH4_phi1/
      ch4_phi1_sample.h5
  data/
    ch4_phi1_sample.npy
    ch4_phi1_aug.npy
    ch4_phi1_labeled.npy
  models/
    ch4_phi1_model.pt
```

This keeps:
- case-run artifacts next to the case directory
- derived training datasets in a separate `data/` area
- trained models in a separate `models/` area
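The layout can be set up from a project root and a case name. A minimal sketch with `pathlib`; the helper names and the `tag` file-name prefix (e.g. `ch4_phi1`) are hypothetical conventions inferred from the example paths above. It creates only `data/` and `models/`, leaving the case directory to `init` and `run-case`.

```python
import tempfile
from pathlib import Path


def artifact_paths(project_root, case_name, tag):
    """Map each artifact in the recommended layout to its path.

    `tag` is the dataset prefix used in file names, e.g. "ch4_phi1" (hypothetical helper).
    """
    root = Path(project_root)
    case_dir = root / "runs" / case_name
    data_dir = root / "data"
    return {
        "sample_h5": case_dir / f"{tag}_sample.h5",
        "sample_npy": data_dir / f"{tag}_sample.npy",
        "aug_npy": data_dir / f"{tag}_aug.npy",
        "labeled_npy": data_dir / f"{tag}_labeled.npy",
        "model": root / "models" / f"{tag}_model.pt",
    }


def ensure_output_dirs(paths):
    """Create data/ and models/; the case directory itself is left to init/run-case."""
    for key in ("sample_npy", "aug_npy", "labeled_npy", "model"):
        paths[key].parent.mkdir(parents=True, exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    paths = artifact_paths(tmp, "oneD_flame_CH4_phi1", "ch4_phi1")
    ensure_output_dirs(paths)
    created = sorted(p.name for p in Path(tmp).iterdir())

for name, path in paths.items():
    print(f"{name:12s} {path}")
```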

Current limitations and documentation gaps

The data-pipeline commands are usable, but their CLI surface is not yet as consistent as that of `init` and `run-case`.

Current gaps include:
- limited machine-readable JSON output for `sample`, `label`, and `train`
- older option naming conventions still present on some commands, such as `--source_file`
- less published documentation on training outputs and configuration than on case `init` and `run-case`

These are candidates for future cleanup; the commands above describe the current behavior on `main`.

Validated minimal sequence

For a validated 1D flame workflow, the current practical sequence is:

```bash
dfode-kit init oneD-flame ... --apply
dfode-kit run-case --case /path/to/case --apply --json
dfode-kit sample --mech /path/to/gri30.yaml --case /path/to/case --save /path/to/sample.h5 --include_mesh
```

After sampling, continue with either:

```bash
dfode-kit h52npy --source /path/to/sample.h5 --save_to /path/to/sample.npy
```

or go straight to augmentation, which reads the HDF5 file directly, followed by labeling and training:

```bash
dfode-kit augment ...
dfode-kit label ...
dfode-kit train ...
```
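The augmentation branch of this sequence can be scripted once a case has been initialized and run (the elided `init` arguments are not reproduced). A minimal sketch using Python's `subprocess`, assuming `dfode-kit` is on `PATH`; it uses only the flags shown on this page, the placeholder paths mirror the examples, and it defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess


def pipeline_commands(mech, case, sample_h5, aug_npy, labeled_npy, model_pt,
                      target_size=20000, time_step="1e-6"):
    """Build argv lists for sample -> augment -> label -> train.

    Only flags shown on this page are used; paths and values are placeholders.
    """
    return [
        ["dfode-kit", "sample", "--mech", mech, "--case", case,
         "--save", sample_h5, "--include_mesh"],
        ["dfode-kit", "augment", "--source", sample_h5, "--mech", mech,
         "--save", aug_npy, "--preset", "random-local-combustion-v1",
         "--target-size", str(target_size), "--apply"],
        ["dfode-kit", "label", "--mech", mech, "--time", time_step,
         "--source", aug_npy, "--save", labeled_npy],
        ["dfode-kit", "train", "--mech", mech, "--source_file", labeled_npy,
         "--output_path", model_pt],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; run it only when dry_run is False, stopping at the first failure."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


commands = pipeline_commands(
    mech="/path/to/gri30.yaml",
    case="/path/to/case",
    sample_h5="/path/to/sample.h5",
    aug_npy="/path/to/aug.npy",
    labeled_npy="/path/to/labeled.npy",
    model_pt="/path/to/model.pt",
)
run_pipeline(commands)  # dry run: prints the commands without executing them
```

Passing `dry_run=False` runs each stage in turn; `check=True` makes the script stop at the first stage that exits with a non-zero status, so later stages never consume a missing or partial artifact.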
