Create Small data model.

Parameters:
See dedicated page for more information.
CreateDataModelSmall is a scripted action. Embedded code is accessible and customizable through this tab.
See dedicated page for more information.
CreateDataModelSmall is a “one-click trainer” that launches a compact Anatella/GEL training pipeline and writes all model artefacts into a working directory. It’s ideal for getting a first, good model quickly (proof-of-concept, baselines) before moving on to heavier sweeps.
Note: this action is a controller; it doesn’t emit rows on its output pin. Seeing an empty Data/Records panel after a successful run is normal—the results are files written to disk.
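modelGeneration: the training strategy. Start with insight mode – FAST; move to Normal/performance once you are satisfied with features and target.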
wDir: folder that will store all artefacts (models, metrics, logs, scoring pipeline, feature lists, etc.).
Typical choices:
- recorded data / records/ – persisted with the pipeline run history.
- temporary data – for ephemeral experiments.
Tips:
- When reusing an existing wDir, disable restartScratch to resume/reuse cached steps; enable it to force a clean run.

Free-form extra flags forwarded to the underlying .gel training pipeline.
Use for advanced tuning only (e.g., to pass a time budget or fold count if your GEL supports it). Leave empty otherwise.
mms: minimum number of rows a category must have to be kept as its own level.
Small levels are grouped into “Other” to stabilise models.
- Start with mms=1–5 for small datasets; raise it if you see rare-level overfitting.

rs: deterministic seed for reproducibility.
- Set a fixed integer (e.g., 42) to make runs repeatable.
- Use 0/blank to let the trainer pick a random seed.

tName: the column to predict.

kName: optional, but recommended. A unique identifier per row used for joins, leakage checks and lineage. Supply it if your dataset already contains one (e.g., ID, issue_key, customer_id).
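
The mms rule (levels with fewer than mms rows are grouped into “Other”) can be illustrated with a minimal Python sketch. It only shows the idea and is not the code the training GEL runs; the function name `group_rare_levels` and the sample values are hypothetical.

```python
from collections import Counter

def group_rare_levels(values, mms, other_label="Other"):
    """Keep a level only if it has at least `mms` rows; otherwise map it to `other_label`."""
    counts = Counter(values)
    return [v if counts[v] >= mms else other_label for v in values]

# Hypothetical categorical column: "Nice" appears once, so it is grouped when mms=2.
cities = ["Paris", "Paris", "Paris", "Lyon", "Lyon", "Nice"]
grouped = group_rare_levels(cities, mms=2)
print(grouped)  # ['Paris', 'Paris', 'Paris', 'Lyon', 'Lyon', 'Other']
```

With mms=1 every level is kept, which is why low values suit small datasets and higher values suit columns with many rare categories.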
gelFile: path to the GEL that implements the training flow. In your setup:
assets:/results/for_predictive_modeling.gel_anatella
You can swap this for another GEL if you have a custom training recipe: the controller will run whatever GEL you point to and place its outputs under wDir.

restartScratch: when OFF, cached steps are reused from wDir if present (faster reruns); turn it ON to force a clean run.

The training GEL will read data from your pipeline (usually via the recorded data / records/ mount you selected).
Ensure:
- The dataset contains tName (target) and, if provided, kName.
- … mms.

Exact filenames depend on the GEL you use, but typically you’ll find:
- A scoring pipeline (a .gel that you can drop into production to score new data).
- … .gel into a downstream pipeline.

Action succeeds but “Data/Records” is empty
Normal. This is a controller; artefacts are on disk under wDir.
“Target column not found” / “Unknown field”
Verify tName and kName exactly match the dataset columns available to the training GEL.
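
A quick way to run this check outside Anatella is to compare the two names against the header of an exported sample. The sketch below assumes a CSV export; `missing_columns` and the `priority` column are hypothetical, while `statusResolved` and `issue_key` are the example names used on this page. The match is exact and case-sensitive.

```python
import csv
import io

def missing_columns(header, t_name, k_name=None):
    """Return the required names (target, optional key) that are absent from `header`."""
    required = [t_name] + ([k_name] if k_name else [])
    return [name for name in required if name not in header]

# Stand-in for an exported sample file; replace with open("your_export.csv", newline="").
sample = io.StringIO("issue_key,priority,statusResolved\nA-1,high,1\n")
header = next(csv.reader(sample))

ok = missing_columns(header, "statusResolved", "issue_key")
bad = missing_columns(header, "StatusResolved", "ID")
print(ok)   # []
print(bad)  # ['StatusResolved', 'ID']
```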
Model quality unstable across runs
Fix rs (random seed) to a constant; raise mms if classes have tiny categories.
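
Why a fixed seed stabilises results can be shown with a generic Python sketch. It does not reproduce how the trainer consumes rs; it only demonstrates that a seeded random generator yields the same train/test split on every run. The function `shuffled_split` and its parameters are hypothetical.

```python
import random

def shuffled_split(n_rows, seed, test_fraction=0.25):
    """Shuffle row indices with a seeded generator and split them into train/test."""
    rng = random.Random(seed)  # private generator, independent of global state
    indices = list(range(n_rows))
    rng.shuffle(indices)
    cut = int(n_rows * (1 - test_fraction))
    return indices[:cut], indices[cut:]

first = shuffled_split(20, seed=42)
second = shuffled_split(20, seed=42)
print(first == second)  # True: same seed, same split
```

If the split (and any other random choice) changes between runs, the measured model quality changes with it, which is the instability described above.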
Runs keep using stale results
Enable restartScratch or point wDir to a fresh folder.
It takes too long
Start with insight mode – FAST. Move to Normal/performance only when you’re satisfied with features/target.
- Use a separate wDir per run (e.g., records/run_2025-09-05_fast/), so results are immutable and comparable.
- Record modelGeneration, tName, kName, mms, and rs in your experiment log.

| Setting | Value |
|---|---|
| modelGeneration | insight mode – FAST |
| wDir | recorded data / records/ |
| gelFile | assets:/results/for_predictive_modeling.gel_anatella |
| tName | (set to your target column, e.g., statusResolved) |
| kName | (optional, e.g., issue_key or ID) |
| mms | (start with 1–5; increase if many rare categories) |
| rs | (set to a fixed integer like 42 to reproduce) |
| restartScratch | OFF (turn ON to force a clean run) |
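
A per-run folder name such as records/run_2025-09-05_fast/ and an experiment log of the settings above could be produced by a small helper script. This is a sketch under assumptions: the helper names, the `experiment_log.json` filename, and the JSON layout are hypothetical and not part of Anatella.

```python
import json
import tempfile
from datetime import date
from pathlib import Path

def run_dir_name(run_date, label):
    """Build a folder name like run_2025-09-05_fast."""
    return f"run_{run_date.isoformat()}_{label}"

def write_experiment_log(base_dir, run_date, label, settings):
    """Create a new run folder and store the settings next to the artefacts."""
    run_dir = Path(base_dir) / run_dir_name(run_date, label)
    run_dir.mkdir(parents=True, exist_ok=False)  # fail rather than reuse an earlier run
    log_path = run_dir / "experiment_log.json"   # hypothetical filename
    log_path.write_text(json.dumps(settings, indent=2), encoding="utf-8")
    return log_path

settings = {
    "modelGeneration": "insight mode – FAST",
    "tName": "statusResolved",
    "kName": "issue_key",
    "mms": 5,
    "rs": 42,
}

# A temporary folder stands in for records/ in this example.
with tempfile.TemporaryDirectory() as base:
    log_path = write_experiment_log(base, date(2025, 9, 5), "fast", settings)
    created_name = log_path.parent.name
    logged = json.loads(log_path.read_text(encoding="utf-8"))

print(created_name)  # run_2025-09-05_fast
```

Refusing to reuse an existing folder (`exist_ok=False`) keeps each run's artefacts immutable, in line with the recommendation to use a separate wDir per run.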
