rules bound, judgments decide
every tunable in your system is either a rule or a judgment. most systems confuse the two, and pay for it.
there's a temptation when you're building a system: make everything deterministic. every decision traceable to a rule. every rule traceable to inputs. no guesses, no heuristics, no "well, it depends."
it's a good instinct. it's also wrong.
you can't get to a fully deterministic system. you shouldn't even try. and the shape of what you actually should build becomes obvious the moment you have to put every knob onto a config page.
the config page is the forcing function
when you build a config page for a real operation — every tunable in the system, exposed to the operator, with an explanation of what it does — you're forced to answer a question for each knob: is this a rule, or is this a judgment?
most systems never make this explicit. rules and judgments live side by side in the same code file, in the same database table, in the same operator's head. nothing forces the sorting. and the cost of skipping the sorting is that the system drifts: rules get treated like judgments — operators override them for "just this one case" — and judgments get treated like rules, because nobody revisits them once they feel frozen.
once you sort them, a structure emerges. four classes, and the storage mechanism is the classification.
four classes of tunable
hard constants. baked into code. changing one requires a deploy. these are the things you are most sure about — closing cost formulas, minimum margin floors, outlier-rejection math. if the operator wants to change these, something is probably wrong upstream, and you want that friction.
operator-tunable heuristics. stored in the database. an operator can change them without touching code. these are the things you picked a number for but aren't fully confident about — confidence thresholds for auto-approval, autonomy modes per pipeline stage, market-selection filters. you want these adjustable because you know the right answer drifts.
frozen heuristics. decided once, displayed for reference, but not actively tuned. these are the ones too embedded in the design to surface as live knobs, but also not hard-coded math. they're the shape of the thing — contact cadences, message arcs, sequence structures. most systems don't have a name for this category. most systems need one.
bounded runtime judgment. the decisions that happen in the moment — by a human, or by an ai call — constrained by the rules and thresholds around them. the pricing call on a specific deal. the escalation decision for a specific account. the "does this feel right" moment that can't be reduced to a formula, but can be fenced in.
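the four classes above can be sketched as a literal classification — a minimal sketch, where the enum members mirror the categories and the config-page entries (all the names here are illustrative, not from any real system):

```python
from enum import Enum, auto

class TunableClass(Enum):
    # baked into code; changing one requires a deploy
    HARD_CONSTANT = auto()
    # stored in the database; operator-adjustable without a deploy
    OPERATOR_TUNABLE = auto()
    # decided once, displayed for reference, not actively tuned
    FROZEN_HEURISTIC = auto()
    # decided in the moment, inside bounds the rules set
    BOUNDED_JUDGMENT = auto()

# a hypothetical config page, with each knob sorted into its class
CONFIG_PAGE = {
    "min_margin_floor":        TunableClass.HARD_CONSTANT,
    "auto_approve_confidence": TunableClass.OPERATOR_TUNABLE,
    "contact_cadence_days":    TunableClass.FROZEN_HEURISTIC,
    "offer_price":             TunableClass.BOUNDED_JUDGMENT,
}
```

the point of writing it down this way is that the storage mechanism follows mechanically from the class: code, database row, reference display, or runtime call.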
the pattern: the decision lives in the judgment layer, but the rules bound what the judgment is allowed to do.
rules bound, judgments decide
this is the insight that reorganizes everything.
rules don't decide anything. they draw the edges of what's allowed. never offer below this margin. never auto-approve below this confidence. never consider a comp outside this statistical window. none of these produce an answer. they refuse certain answers.
judgments make the actual call inside the fence. the decision you end up with is always the judgment's. the rules only constrain the space it's allowed to move in.
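in code, "rules refuse certain answers" means the rule layer rejects, and never chooses. a sketch, assuming a hypothetical margin-floor rule around a pricing judgment:

```python
def bounded_offer(judgment_price: float, cost: float, min_margin: float = 0.10) -> float:
    """rules bound, judgments decide: the rule only refuses answers
    outside the fence. the returned decision is always the judgment's."""
    floor = cost * (1 + min_margin)  # rule: never offer below this margin
    if judgment_price < floor:
        # the rule does not substitute its own answer; it refuses this one
        raise ValueError(f"offer {judgment_price:.2f} breaches floor {floor:.2f}")
    return judgment_price
```

notice what the function does not do: it never clamps the price to the floor. clamping would be the rule deciding. refusing keeps the decision in the judgment layer, where someone has to pick a new number.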
the failure mode is confusing the two. you try to promote a judgment to a rule — "just give me a formula for the offer price" — and the formula ossifies the moment the market shifts. or you treat a hard rule as a judgment — "let me override the margin floor on this one deal" — and the guardrail stops guarding anything.
why judgments never go away
the temptation to delete the judgment layer is strong. surely with enough data, you can replace it with a rule? surely it's just a matter of collecting more examples and fitting a function?
no. and the reasons matter.
the environment isn't stationary. last quarter's data doesn't predict this quarter's. a rule fit to history is confidently wrong the moment history changes. an operator exercising judgment reads the new conditions and adjusts in a day.
the dominant signal lives outside your data. the thing that actually moves the decision — the tone of a seller's voice, the phrasing of an inbound message, the condition of an asset that isn't in any database — is not in the inputs you'd feed the rule. you can't encode what you can't observe.
some decisions are value judgments. the confidence at which you auto-approve is not a fact about the world. it's a fact about your risk tolerance. no dataset can decide it for you. it's yours to set, to defend, and to adjust as your appetite changes.
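one way to see that the threshold is a risk-tolerance fact, not a world fact: even a toy break-even calculation takes your costs as inputs. a sketch, with hypothetical names and numbers:

```python
def auto_approve_threshold(cost_bad_approval: float, cost_manual_review: float) -> float:
    """break-even confidence for auto-approval: approve automatically only
    when the expected loss from a mistake is below the cost of routing the
    case to a human.

        (1 - p) * cost_bad_approval < cost_manual_review
            =>  p > 1 - cost_manual_review / cost_bad_approval

    the dataset never appears here. the threshold moves when your costs
    (your appetite) move, not when the data does."""
    return 1 - cost_manual_review / cost_bad_approval
```

with a $1000 downside and a $50 review, the break-even confidence is 0.95; double your fear of the downside and the threshold tightens, with no new data involved.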
trying to delete the judgment layer is trying to delete the part of the system that adapts. it's the wrong goal.
tighten the fence instead
the right goal is not deterministic decisions. the right goal is bounded judgment, with a mechanism that makes the fence tighter over time.
here is the playbook for any tunable you care about:
log the outcome. every time the judgment gets used — every time a number gets picked from inside the fence — record what happened. the value chosen, the context around it, the eventual result. this is the single step most systems skip, and skipping it means nothing that follows is possible. the outcomes log is the real asset. everything else is cleanup.
look at what worked. once you have enough outcomes, compare what you did to what the outcomes looked like. you are not computing the "right" number. you are finding out where your current range was wrong — too high, too low, too wide.
narrow the fence from evidence. a single number becomes a range with a default inside. the judgment still happens, but the operator can't accidentally step outside what the data supports. same judgment, tighter boundary.
split the category when a hidden axis appears. sometimes the data tells you that what looked like one category was really two. the fix is not a better rule — it's a finer classification, each with its own tighter fence.
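the log-then-narrow steps above can be sketched in a few lines. a minimal illustration, assuming a hypothetical outcomes log of (value chosen, did it work out) pairs:

```python
import statistics

# hypothetical outcomes log: the value picked from inside the fence,
# and whether the eventual result was good
outcomes = [
    (0.62, True), (0.71, True), (0.55, False), (0.68, True),
    (0.90, False), (0.74, True), (0.66, True), (0.80, False),
]

def narrow_fence(log):
    """turn an open-ended knob into a range the evidence supports.
    the judgment still picks the number; it just can't step outside
    what has actually worked."""
    wins = sorted(value for value, worked in log if worked)
    lo, hi = wins[0], wins[-1]          # fence: the edges of what worked
    default = statistics.median(wins)   # a default inside the fence
    return lo, hi, default

lo, hi, default = narrow_fence(outcomes)
# here the evidence says 0.55 and 0.90 were mistakes: the fence narrows
# to [0.62, 0.74], with 0.68 as the default inside it
```

a real version would want volume thresholds and recency weighting before it trusts the edges, but the shape is the same: single number in, range with a default out.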
you do this for years. the fences get tighter. the judgment space gets smaller. but it never goes to zero, because the environment keeps moving and the dominant signal stays outside your inputs.
the supervisor goes out of calibration
here is the twist that makes all of this harder than it looks.
the fence-tightening loop depends on a supervisor. somebody — operator, reviewer, whoever sits at the top of the stack — has to look at the outcomes log and decide whether the tightening is working. the whole machine assumes that supervisor's judgment stays sharp.
it doesn't.
when the fence is wide, the supervisor makes many raw decisions. each one is a rep. reps are how calibration gets built and how it gets maintained — pattern recognition for when the usual answer doesn't apply, feel for what normal looks like, instinct for when the model is wrong.
when the fence tightens, the reps change shape. the supervisor stops deciding and starts approving. approvals are a different cognitive act — you're pattern-matching on the approval request, not reading the raw situation. and the set of cases you see also changes: the automation handles the easy middle, so you only ever see the outliers. which sounds like a good thing — human does the hard cases, machine does the rest — except you've lost your view of the full distribution. you can't tell what's actually an outlier anymore because you've forgotten what normal looks like.
this is a known pattern in every heavily automated field. aviation calls it automation complacency. anesthesiologists watching monitors stop reading raw physiological cues. algorithmic traders lose feel for the tape. the usual framing is "automation fails, the human can't take over" — and that's real, but it's not the interesting failure mode for the case we're in.
our failure mode is subtler. the automation doesn't fail. it just drifts in a direction the decayed supervisor can no longer detect. there's no crash. there's only quiet, compounding divergence from what good used to look like. and because the supervisor is the one validating the tightening loop, the loop now optimizes toward what a degraded supervisor would approve. it eats its own tail.
there's a training-data corollary that bites even harder. in most ai-assisted systems, the supervisor's judgment is the training signal. overrides and edits teach the model what right looks like. as the fence tightens and overrides drop, the signal gets thinner — and the signal that does come through is coming from a supervisor whose calibration has been eroding. garbage in, garbage out, except the garbage is being produced by the very loop that was supposed to raise quality.
reversing this requires active practice. the natural dynamics always decay the human layer — the whole point of automation is to take decisions off the human. so any fix has to be intentional:
preserved reps. periodically, show the supervisor raw cases the automation could have handled. have them decide blind. compare. keep the muscle alive.
blind sampling. random audits where the supervisor reviews raw cases without seeing what the automation suggested first. catches drift the automation can't catch by definition.
adversarial probes. inject synthetic edge cases to test whether the supervisor can still spot outliers. training data for the human, not just the model.
rotation. don't let a single supervisor stay at the top of the stack indefinitely. fresh eyes see drift that tenured eyes have normalized to.
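the preserved-reps and blind-sampling practices can share one piece of plumbing: a router that sends a random slice of automatable cases to the supervisor without showing them the automation's answer. a sketch under assumed names — `automation` is whatever callable produces the machine's decision:

```python
import random

def route_cases(cases, automation, blind_rate=0.05, rng=None):
    """route a random slice of automatable cases to the supervisor, blind.
    the automation's answer is still computed and logged for every case,
    so blind human decisions can later be compared against it — that
    comparison is the calibration signal."""
    rng = rng or random.Random()
    for case in cases:
        machine_answer = automation(case)  # logged either way
        if rng.random() < blind_rate:
            # supervisor decides without seeing machine_answer
            yield ("supervisor_blind", case, machine_answer)
        else:
            yield ("automated", case, machine_answer)
```

the design choice that matters is that the machine's answer exists for the blind cases too. without it, you get reps but no measurement; with it, drift between human and machine shows up as a number you can watch.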
none of these happen by default. the default is decay.
the plumbing is the point
the thing i keep noticing: when a team says "we want to make this more deterministic," what they really need is not better rules — it's the outcomes log they never built, and the calibration log they didn't know they needed. the classification debate is interesting. the actual work is the feedback plumbing that lets you tighten anything at all — and the counter-plumbing that keeps the supervisor honest while you do it.
a system with rich rules and no outcomes log is frozen. it will do exactly what it did a year ago, forever.
a system with a rich outcomes log and no calibration preservation is alive but delusional. it compounds confidently in a direction nobody at the top of the stack can still evaluate.
a system with both is the one you want. it will look nothing like itself a year from now, and it will be better, and the person at the top will still be able to tell.
the goal is not to replace the judgment. the goal is to build an environment where the judgment is bounded, logged, moving in the right direction — and where the judge is still calibrated to recognize when it isn't.
rules bound. judgments decide. the feedback loop tightens the bound. and the counter-loop — the one that keeps the supervisor sharp — is what keeps the whole thing honest.