PrisML 0.3.0 is out. It’s a minor version, but the changes are more meaningful than the number suggests — this release is about correctness.

A few defaults were wrong. The runtime dependency was wrong. The schema hash was broader than it needed to be. None of these were visible in the happy path, but they would have caused friction as usage scaled. 0.3.0 fixes all of them.


You no longer need to pick an algorithm

The biggest visible change: algorithm is now optional in defineModel().

Before, you had to write something like:

algorithm: { name: 'forest', version: '1.0.0' },

That’s a reasonable choice if you know your data well. But most of the time, especially early in a project, you don’t. You’re guessing. And often the guess is wrong — a random forest might underperform a simple linear model on your specific dataset.

Now if you omit algorithm, PrisML uses FLAML to pick the best one for you.

FLAML (Fast and Lightweight AutoML) is a library from Microsoft Research that automates algorithm selection. You give it labeled data and a time budget — say 60 seconds — and it tries combinations of algorithms and settings, keeps what works, and returns the best model it found. It’s not magic; it’s a disciplined search over a well-defined space. But it means you don’t have to be an ML expert to get a solid starting model.

In practice, your model definition gets simpler:

export const churnModel = defineModel<User>({
  name: 'churnRisk',
  modelName: 'User',
  output: {
    field: 'willChurn',
    taskType: 'binary_classification',
    resolver: (u) => u.cancelledAt !== null,
  },
  features: {
    daysActive: (u) => (Date.now() - u.createdAt.getTime()) / 86_400_000,
    loginFrequency: (u) => u.logins.length,
    planTier: (u) => u.subscription.tier,
    monthlySpend: (u) => u.totalSpend / u.monthsActive,
  },
  qualityGates: [
    { metric: 'f1', threshold: 0.85, comparison: 'gte' },
  ],
});

prisml train now tells you what it picked:

 Training churnRisk (FLAML AutoML, 60s budget)...
 Best estimator: LGBMClassifier

That name is stored in the artifact metadata. If you want to pin a specific algorithm — say you benchmarked it and gbm consistently wins on your data — the explicit override still works exactly as before.
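Pinning keeps the same shape as before, minus the removed version field. A fragment, reusing the field from the earlier snippet (gbm here is just the example name from above):

```typescript
// Explicit override inside defineModel(): skip AutoML, always train gbm.
// (AlgorithmConfig.version was removed in 0.3.0; it was never read.)
algorithm: { name: 'gbm' },
```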


String and number features were encoded wrong

This one is subtle but matters a lot.

Strings. The previous version mapped string features to integers using a hash function. So "premium" might become 3, "free" might become 7, "enterprise" might become 1. This is effectively label encoding, and it’s problematic: it tells the model that "enterprise" (1) is closer to "premium" (3) than to "free" (7), which is nonsense. The numbers imply an ordering and distances that don’t exist.

The correct approach for categorical strings is one-hot encoding: each possible category becomes its own column, with a 1 where the category matches and 0 everywhere else. "premium" becomes [0, 1, 0], "free" becomes [1, 0, 0], "enterprise" becomes [0, 0, 1]. No false ordering, no implied relationships.

PrisML now builds a category list at training time, stores it in the model’s metadata, and uses it at inference. If a value appears at inference that wasn’t in the training data, it maps to all zeros — no crash, no silent corruption.
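The strategy fits in a few lines. An illustrative sketch, where fitCategories and oneHot are hypothetical names, not PrisML’s actual internals:

```typescript
// Training time: collect distinct categories in first-seen order.
function fitCategories(values: string[]): string[] {
  return [...new Set(values)];
}

// Inference time: 1 in the matching column, 0 elsewhere.
// A value never seen at training time maps to all zeros.
function oneHot(value: string, categories: string[]): number[] {
  return categories.map((c) => (c === value ? 1 : 0));
}

const categories = fitCategories(['free', 'premium', 'enterprise', 'free']);
console.log(oneHot('premium', categories)); // [ 0, 1, 0 ]
console.log(oneHot('trial', categories));   // [ 0, 0, 0 ] (unseen at training time)
```

The category list returned by the fit step is what gets stored in the artifact metadata, so training and inference always agree on column order.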

Numbers. Raw numbers also had a problem. A feature like monthlySpend might range from 5 to 5000. Another like loginFrequency might range from 0 to 30. When these go into a linear model or anything distance-based, the large-scale feature dominates just because of its range — not because it’s more important.

Standard scaling fixes this: compute the mean and standard deviation of each numeric feature at training time, then at inference apply (value - mean) / std. Every feature ends up on roughly the same scale. The stats are stored in metadata alongside the category lists. Nothing to manage manually.
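The fit/apply split looks like this in sketch form; fitScaler and scale are hypothetical names, not PrisML’s actual internals:

```typescript
interface ScalerStats { mean: number; std: number }

// Training time: compute mean and (population) standard deviation once.
function fitScaler(values: number[]): ScalerStats {
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const variance = values.reduce((a, b) => a + (b - mean) ** 2, 0) / values.length;
  // Fall back to 1 for constant features so we never divide by zero.
  return { mean, std: Math.sqrt(variance) || 1 };
}

// Inference time: apply the stored stats.
function scale(value: number, { mean, std }: ScalerStats): number {
  return (value - mean) / std;
}

const spendStats = fitScaler([5, 120, 900, 5000]); // wide-range feature
const loginStats = fitScaler([0, 3, 12, 30]);      // narrow-range feature
console.log(scale(5000, spendStats), scale(30, loginStats)); // both land on a comparable scale
```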

These two changes don’t affect tree-based models much: decision trees and random forests are scale-invariant and can split on arbitrary integer codes tolerably well. But FLAML might pick a linear model or a support vector machine, both of which depend heavily on proper encoding. Getting the encoding right means FLAML can choose freely, because every candidate algorithm now sees features in a form it can use correctly.


The ONNX runtime was wrong

PrisML loaded models using onnxruntime-web. That’s the browser build — it uses WebAssembly and is designed to run in a browser tab.

PrisML runs in Node.js. There’s a separate package for that: onnxruntime-node. It uses native bindings and is substantially faster.

The web runtime technically executes in Node under some conditions, which is why this never surfaced as a hard failure. But it’s the wrong dependency and the inference path was operating outside its intended environment.

0.3.0 depends on onnxruntime-node directly, so inference now runs on the runtime actually built for Node.


The schema hash was too broad

Every PrisML artifact carries a hash of your Prisma schema. When you load a model at runtime, the current schema is hashed and compared against the stored value. If they differ, inference is rejected — this is the schema drift check.

The problem: it was hashing the entire schema file. If you added an unrelated model to your schema — a new Log table, an index on User.email, anything — all your existing PrisML artifacts would fail to load with SchemaDriftError, even though nothing relevant to the model had changed.

0.3.0 changes the hash to cover only the model block the artifact was trained on, plus any enums that model references. Changes to unrelated models don’t invalidate your artifacts anymore.

Old artifacts (produced before 0.3.0) still load correctly. The artifact carries a metadataSchemaVersion field, and the runtime uses that to know which hash strategy to apply. No migration needed.
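In sketch form, a per-model-block hash can be read as a few lines. This is illustrative only: the helper names are assumed, and the real strategy’s handling of referenced enums is omitted for brevity.

```typescript
import { createHash } from 'node:crypto';

// Prisma model blocks have no nested braces, so matching up to the
// first closing brace is enough for this sketch.
function extractModelBlock(schema: string, modelName: string): string {
  const match = schema.match(new RegExp(`model\\s+${modelName}\\s*\\{[^}]*\\}`));
  if (!match) throw new Error(`model ${modelName} not found in schema`);
  return match[0];
}

function hashModelBlock(schema: string, modelName: string): string {
  return createHash('sha256').update(extractModelBlock(schema, modelName)).digest('hex');
}

const before = 'model User { id Int @id email String }';
const after  = 'model User { id Int @id email String }\nmodel Log { id Int @id message String }';

// Adding an unrelated Log model leaves the User hash unchanged:
console.log(hashModelBlock(before, 'User') === hashModelBlock(after, 'User')); // true
```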


What didn’t change

The API is the same. defineModel, PredictionSession, load, predict, predictBatch, quality gates — none of that changed. Models trained with 0.1.x still load. The only breaking changes are in the TypeScript types: AlgorithmConfig.version was removed (it was never read), and a few internal fields in EncodedFeature and ModelMetadata are now required.

The goal of this release was to fix what was quietly wrong before building on top of it.


Install or update:

npm install @vncsleal/prisml@latest

Source on GitHub.