The SliceOps framework
Decisions are the source of truth. Architecture, specs, plans, and execution are consequences of them — governed across twelve canonical principles and one atomic unit of work.
The 12 canonical principles
Remove any one and the result is no longer SliceOps. They hold across any runtime, any language, any team size.
Why
Why agentic, probabilistic construction needs deterministic, auditable control with human authority — so AI leverage never costs correctness, accountability, or trust.
How
The construction mechanisms that make decision integrity hold.
What
The discipline made tangible — bound to no model, no runtime.
The canonical model
Work is organized into three structural levels, plus one view computed from the dependency graph.
Block (BL): a logical grouping of slices with coherent scope. It closes with a retrospective and calibrates velocity afterward. It is not a sprint, epic, or milestone.
SEC: a functional domain inside a block, and the middle segment of a slice's identifier. Stable across blocks: domain is structural, not temporal.
Slice (SL): the atomic, vertical, end-to-end unit of work: one agent chat, one PR, one cohesive outcome. Not a story, ticket, or task.
Stage: a computed view of the dependency graph that shows which slices are mergeable now. Derived, never committed to in a ceremony. Not a sprint or phase.
BL-XX.SEC-XX.SL-XXX
Every slice carries this identifier through its branch, commits, PR title, and decision records. It is the provenance thread that ties the whole system together.
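A minimal sketch of how the identifier could be parsed and recovered from a branch name or PR title. The source gives only the template `BL-XX.SEC-XX.SL-XXX`; reading each `X` as one decimal digit is an assumption, and `SliceId`, `parse_slice_id`, and `find_slice_id` are hypothetical names, not toolkit APIs.

```python
import re
from typing import NamedTuple, Optional

# Assumption: each X in the template BL-XX.SEC-XX.SL-XXX is one decimal digit.
SLICE_ID = re.compile(r"BL-(\d{2})\.SEC-(\d{2})\.SL-(\d{3})")


class SliceId(NamedTuple):
    block: int     # BL segment
    sec: int       # SEC segment
    slice_no: int  # SL segment


def parse_slice_id(text: str) -> SliceId:
    """Parse a string that must be exactly one identifier."""
    match = SLICE_ID.fullmatch(text)
    if match is None:
        raise ValueError(f"not a slice identifier: {text!r}")
    return SliceId(*(int(group) for group in match.groups()))


def find_slice_id(text: str) -> Optional[SliceId]:
    """Find the first identifier embedded in a branch, commit, or PR title."""
    match = SLICE_ID.search(text)
    if match is None:
        return None
    return SliceId(*(int(group) for group in match.groups()))
```

A CI check could call `find_slice_id` on the branch name and the PR title and fail when the two disagree or either returns `None`.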
Every interaction is a Session
The slice is one kind of Session — the development kind that ends in a PR. Most AI work never produces a PR, and SliceOps refuses to leave it unaudited: every interaction is a Session, classified and inside the audit plane.
slice ⊂ session — every slice is a session; not every session is a slice.
Eight Session-Types
Session is the thirteenth entity in the cognitive catalog — the machine-readable model agents reason over: Decision Record, Insight Record, Outcome Record, Capability, Goal, Learning Pattern, Cognitive Framework, Context Pack, Active Priority, Relationship Context, Preference, Value, and Session.
Anatomy of a slice
A slice is the smallest unit that is independent, testable, and useful — and it carries its whole outcome with it.
Mergeable without breaking what exists. One architectural concern only — if it spans more, split it.
Validated by at least one test before it can merge.
Adds incremental value, not refactoring for its own sake.
Scope · decision · code · tests · evidence · merge — one chat, one PR.
A slice has two sizes
Conflating them is the number-one mistake teams make running agents. One axis is throughput — what you pay. The other is peak footprint — whether a model can run the slice at all.
Token size: total work in billed-equivalent tokens (input plus cache, weighted by price), not raw totals, which overstate the work. Governs forecast and budget.
Context size: the most context loaded in any single turn. Decides which models can run the slice at all; a window smaller than the footprint simply cannot.
The axes are orthogonal. A slice can be XL-token / S-context (lots of output, small working set) or S-token / XL-context (little output, a large codebase to load). Cost and viability are different failure modes — they need different bands.
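The two sizes can be sketched as two different reductions over the same per-turn log. Everything concrete here is an assumption for illustration: the `Turn` fields, the price weights, counting output tokens toward billed work, and treating fresh plus cached input as the context loaded in a turn.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Turn:
    input_tokens: int   # fresh (uncached) input tokens in this turn
    cached_tokens: int  # input tokens served from cache in this turn
    output_tokens: int  # tokens the model produced in this turn


# Hypothetical price weights relative to one fresh input token.
CACHE_WEIGHT = 0.25
OUTPUT_WEIGHT = 4.0


def billed_equivalent(turns: Iterable[Turn]) -> float:
    """Throughput axis: price-weighted sum over every turn."""
    return sum(
        t.input_tokens
        + CACHE_WEIGHT * t.cached_tokens
        + OUTPUT_WEIGHT * t.output_tokens
        for t in turns
    )


def peak_context(turns: Iterable[Turn]) -> int:
    """Footprint axis: the largest context loaded in any single turn."""
    return max((t.input_tokens + t.cached_tokens for t in turns), default=0)
```

The first function sums across turns and the second takes a maximum, which is why a slice can land in a high band on one axis and a low band on the other.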
Calibrated, not invented
The bands aren't guesses. The toolkit ships a deterministic script that reads your own agent session logs and computes your bands — same logs, same script, same result. Recalibrate when the model landscape shifts.
Calibration script — opening with the toolkit.
Model Triage
Which model should run a slice? SliceOps answers it the way it answers everything — explicitly, and on the record. Five axes, applied in filter order:
- 01 The primary filter: only models whose window fits the slice's peak footprint are eligible at all.
- 02 Sensitive slices route to a local model automatically, never to an external API, whatever the cost or speed.
- 03 Token-band and slice-type set the reasoning tier the work actually needs.
- 04 Interactive work and background batch have different speed needs.
- 05 Within the eligible set, the cheapest adequate model wins.
Each recommendation pairs a model with an execution mode — frontier API, local-via-API, a flat-rate plan already paid for, or a model embedded in the IDE.
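The filter order above can be sketched as successive narrowing of a candidate list. The `Model` and `SliceProfile` fields and the `triage` function are hypothetical, chosen for illustration rather than taken from the toolkit.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Model:
    name: str
    window: int   # context window, in tokens
    local: bool   # runs locally, so data never leaves the machine
    tier: int     # reasoning tier, higher is stronger
    fast: bool    # quick enough for interactive work
    cost: float   # relative price per unit of work


@dataclass(frozen=True)
class SliceProfile:
    peak_context: int  # largest single-turn footprint
    sensitive: bool    # touches data that must stay local
    min_tier: int      # reasoning tier the work needs
    interactive: bool  # a human is waiting on the result


def triage(s: SliceProfile, models: Sequence[Model]) -> Optional[Model]:
    """Apply the five axes in order; return None when nothing is eligible."""
    eligible = [m for m in models if m.window >= s.peak_context]  # axis 1
    if s.sensitive:
        eligible = [m for m in eligible if m.local]               # axis 2
    eligible = [m for m in eligible if m.tier >= s.min_tier]      # axis 3
    if s.interactive:
        eligible = [m for m in eligible if m.fast]                # axis 4
    return min(eligible, key=lambda m: m.cost, default=None)      # axis 5
```

Returning `None` instead of falling back to an external model keeps the sensitivity rule a hard filter: a sensitive slice with no fitting local model is a decision to escalate, not a silent reroute.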
Compliance by construction
Axis 2 is the one regulated teams feel first. “Sensitive data stays local” stops being a policy you hope people follow and becomes a routing rule that executes — and the model, the mode, and the rationale are recorded on the session. Closed tools hard-code one model or route invisibly; SliceOps makes the decision explicit and auditable.
The Context Router
A serious codebase's context only grows. The question isn't how to need less of it — it's how to use it well. The router loads only what a session actually needs.
Instead of the whole corpus, it activates only the relevant context-experts — the decisions, entities, and modules a session touches. It's MoE-inspired, but at the orchestration level: it governs which context the agent receives, not how the model attends internally.
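One way to picture the routing step, as a sketch only: each context-expert declares what it covers, and a session activates the experts that overlap what it touches. `ContextExpert`, its fields, and `route` are hypothetical names; the source does not specify how relevance is decided, and plain set overlap stands in for it here.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Set


@dataclass(frozen=True)
class ContextExpert:
    name: str
    covers: FrozenSet[str]  # decisions, entities, and modules it documents
    tokens: int             # size of the context it would load


def route(touched: Set[str], experts: Sequence[ContextExpert]) -> List[ContextExpert]:
    """Activate only the experts that overlap what the session touches."""
    return [e for e in experts if e.covers & touched]


def loaded_tokens(active: Sequence[ContextExpert]) -> int:
    """Context the agent actually receives after routing."""
    return sum(e.tokens for e in active)
```

Comparing `loaded_tokens(route(...))` with the total over all experts gives a rough measure of how much context the router kept out of the session.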
Synthesis efficiency
The other half is production. A model that says the same thing in fewer tokens — without losing the idea — keeps tomorrow's footprint from bloating. Verbose creation today is compounding debt later. The goal is density that still clears the acceptance gate, not terseness for its own sake.
This is Context Discipline (P12) made tangible: context as a governed, single source of truth — selectively routed, never assumed.
Evidence by construction
Every slice produces evidence in four mandatory categories, plus a security gate. Un-evidenced slices do not merge.
Tests pass.
Linters and metrics — coverage, format, complexity.
Decision Records and insight records.
Slice ID, agent, timestamps, commit SHA.
Security is its own per-slice gate: secrets scan, SAST, dependency and supply-chain checks — not a periodic audit.
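As a sketch, the merge rule reduces to a completeness check over one evidence bundle per slice. The key names below are hypothetical: the source describes what each category contains, not what it is called.

```python
from typing import Dict, List

# Hypothetical keys for the four mandatory categories.
REQUIRED_EVIDENCE = ("tests", "quality", "records", "provenance")
# Hypothetical keys for the per-slice security gate.
SECURITY_CHECKS = ("secrets_scan", "sast", "dependency_and_supply_chain")


def merge_blockers(evidence: Dict[str, bool]) -> List[str]:
    """Return every missing or failing item; an empty list means mergeable."""
    needed = REQUIRED_EVIDENCE + SECURITY_CHECKS
    return [key for key in needed if not evidence.get(key, False)]
```

Treating a missing key the same as a failing one is the design choice that matters: evidence that was never produced blocks the merge just as a red check does.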
The audit plane: Decision Records
No code-quality tool, runtime monitor, or compliance platform audits the decision plane — what was decided, by whom, why, and with what supersession chain. That layer is the SliceOps wedge.
- 01 Every architectural decision is a Decision Record with frontmatter that maps to a cognitive entity model agents can reason over.
- 02 Records are append-only: never deleted, never silently rewritten. Supersession is a bidirectional, acyclic edge: the new record declares what it replaces, the old one points forward.
- 03 Each record traces back to the slice that produced it and stays reachable from it.
- 04 CI rejects any PR that breaks the schema, orphans a decision, or leaves a supersession chain inconsistent.
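The supersession rule lends itself to a deterministic check. This is a sketch under assumptions: the `Record` shape and the `supersedes` / `superseded_by` field names are hypothetical, not the toolkit's frontmatter schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Record:
    supersedes: List[str] = field(default_factory=list)     # older records replaced
    superseded_by: List[str] = field(default_factory=list)  # forward pointers


def check_supersession(records: Dict[str, Record]) -> List[str]:
    """Return consistency errors; an empty list means the chains are sound."""
    errors: List[str] = []
    for rid in sorted(records):
        rec = records[rid]
        for old in rec.supersedes:
            if old not in records:
                errors.append(f"{rid} supersedes unknown record {old}")
            elif rid not in records[old].superseded_by:
                errors.append(f"{old} lacks forward pointer to {rid}")
        for new in rec.superseded_by:
            if new not in records:
                errors.append(f"{rid} points forward to unknown record {new}")
            elif rid not in records[new].supersedes:
                errors.append(f"{new} does not declare it supersedes {rid}")

    # Depth-first search over supersedes edges; meeting a record that is
    # still being visited means the chain loops back on itself.
    state: Dict[str, int] = {}  # 1 = visiting, 2 = finished

    def has_cycle(node: str) -> bool:
        if state.get(node) == 1:
            return True
        if state.get(node) == 2:
            return False
        state[node] = 1
        for old in records[node].supersedes:
            if old in records and has_cycle(old):
                return True
        state[node] = 2
        return False

    for rid in sorted(records):
        if has_cycle(rid):
            errors.append(f"supersession cycle reachable from {rid}")
            break
    return errors
```

Run in CI, a non-empty result would fail the PR, which matches the rejection rule in the list above.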
proposed → accepted → superseded / deprecated
Stage as a DAG-derived view
Slices declare their dependencies with explicit edges. A Stage is computed by traversing that graph — Stage N is whatever is unblocked at step N. Parallelism falls out of the topology, not out of a planning meeting. Forecasting and retrospectives happen at the block level; there are no burndown charts and no “we committed to N slices this sprint.”
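A minimal sketch of deriving stages from declared edges, assuming dependencies arrive as a mapping from each slice to the set of slices it depends on. `compute_stages` is a hypothetical name, and the short slice labels are illustrative only.

```python
from typing import Dict, List, Set


def compute_stages(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Stage N holds every slice whose dependencies all sit in earlier stages."""
    remaining = {slice_id: set(d) for slice_id, d in deps.items()}
    done: Set[str] = set()
    stages: List[List[str]] = []
    while remaining:
        # Everything unblocked right now can run in parallel.
        ready = sorted(s for s, d in remaining.items() if d <= done)
        if not ready:
            blocked = ", ".join(sorted(remaining))
            raise ValueError(f"cycle or unknown dependency among: {blocked}")
        stages.append(ready)
        done.update(ready)
        for s in ready:
            del remaining[s]
    return stages
```

The width of each inner list is the parallelism available at that step, read straight off the graph rather than agreed in a meeting.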
Fourteen merge gates
The canonical R-rules (R1–R14) are hard CI gates — each one traces back to a principle. Adopters add their own from R15 onward.
Phase 2.5 adds coherence validators on top of the R-rules — principle counts, entity counts, token-band units, and an LLM-cost gate — so denormalized drift can't quietly accumulate. They ship as drop-in CI checks in the toolkit.
The LLM bill is a finite resource
Running agents in CI costs real money, and the failure mode is silent — you find out when the invoice lands. SliceOps treats inference like any other shared resource: enumerated, capped, gated. Five levers:
Cache stable blocks; leaving them uncached can leave 40–60% of cost on the table.
Use the cheapest adequate tier — top-tier “just in case” is roughly a 3× penalty for no gain.
Send the diff and a few lines around it; fetch the whole file only when reasoning needs it.
Audit on open, reopen, and ready — not on every push. Velocity shouldn't multiply the bill.
Skip expensive jobs on drafts, but finish green — a skipped required check blocks the PR forever.
Determinism over regeneration
If something repeats, materialize it once — a script, a validator, an R-rule — instead of asking a model to regenerate it every time. Deterministic code is cheaper, faster, and produces evidence that doesn't drift. Have the AI write the tool once, then reuse it.
Decision-driven, not spec-driven
Spec-first toolchains assume the spec is right up front and any divergence is a bug. SliceOps assumes you learn while you build — so the source of truth is the corpus of decisions and merged code, and the record of why the spec changed is itself the artifact worth keeping.
It wraps your flow — it doesn't compete
Spec-first, test-first, contract-first — those are authoring styles. SliceOps is the discipline plane around any of them. Use your preferred flow to get from intent to code; use SliceOps to make the whole thing atomic, auditable, and self-improving. It sits on top of Spec Kit, not against it.
Acceptance-first, by convention
Convention over configuration: the opinionated default is that each slice declares its acceptance criteria upfront, ideally as executable tests. One artifact anchors the scope at the start and closes the slice as evidence at the end. A default, not a mandate.
Three wedges
The architectural-decision layer no existing tool audits.
DAG-driven — five to thirteen simultaneous agents as a normal operating mode.
Every artifact is structured so the next agent can learn from it.