The SliceOps framework

Decisions are the source of truth. Architecture, specs, plans, and execution are consequences of them, governed by twelve canonical principles and one atomic unit of work.

The 12 canonical principles

Remove any one and the result is no longer SliceOps. They hold across any runtime, any language, any team size.

Why

Why agentic, probabilistic construction needs deterministic, auditable control with human authority — so AI leverage never costs correctness, accountability, or trust.

How

The construction mechanisms that make decision integrity hold.

What

The discipline made tangible — bound to no model, no runtime.

The canonical model

Work is organized into three structural levels, plus one view computed from the dependency graph.

Block

A logical grouping of slices with coherent scope. It closes with a retrospective and calibrates velocity afterward — not a sprint, epic, or milestone.

Section

A functional domain inside a block — the SEC segment of a slice's identifier. Stable across blocks: domain is structural, not temporal.

Slice

The atomic, vertical, end-to-end unit of work: one agent chat, one PR, one cohesive outcome. Not a story, ticket, or task.

Stage

A computed view of the dependency graph — which slices are mergeable now. Derived, never committed to in a ceremony. Not a sprint or phase.

BL-XX.SEC-XX.SL-XXX

Every slice carries this identifier through its branch, commits, PR title, and decision records — the provenance thread that ties the whole system together.
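As a rough illustration of how that thread could be checked mechanically, here is a minimal sketch that parses the identifier and finds it inside a branch name or PR title. It assumes every X in BL-XX.SEC-XX.SL-XXX is a decimal digit; the source does not say so, and the names `SliceId`, `parse_slice_id`, and `find_slice_id` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumption: each X in BL-XX.SEC-XX.SL-XXX stands for one decimal digit.
SLICE_ID = re.compile(r"BL-(\d{2})\.SEC-(\d{2})\.SL-(\d{3})")


class SliceId(NamedTuple):
    block: int
    section: int
    slice_no: int


def parse_slice_id(text: str) -> SliceId:
    """Parse a complete identifier such as 'BL-01.SEC-02.SL-003'."""
    match = SLICE_ID.fullmatch(text)
    if match is None:
        raise ValueError(f"not a slice identifier: {text!r}")
    return SliceId(*(int(group) for group in match.groups()))


def find_slice_id(text: str) -> Optional[str]:
    """Return the first identifier embedded in a branch name, commit, or PR title."""
    match = SLICE_ID.search(text)
    return match.group(0) if match else None
```

A CI step could call `find_slice_id` on the branch name and the PR title and fail when the two disagree or either is missing.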

Every interaction is a Session

The slice is one kind of Session — the development kind that ends in a PR. Most AI work never produces a PR, and SliceOps refuses to leave it unaudited: every interaction is a Session, classified and inside the audit plane.

slice ⊂ session — every slice is a session; not every session is a slice.

Eight Session-Types

Slice: Full end-to-end development that ships a PR.

Artifact: A bounded output (a script, template, config, or doc) short of a full slice.

Support: Incidents and care, internal or customer-facing.

Infra: Infrastructure, deploy, and environment operations.

Meta: Governance (foundations, planning, framework and project decisions).

Audit: Verification, control, and compliance work.

Learning: Exploration and research that feeds the next agent.

Orchestrate: Coordination of other sessions.
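The eight types and the slice ⊂ session relation can be modeled in a few lines. This is only a sketch: the `Session` class and its `summary` and `slice_id` fields are hypothetical, not the catalog's actual schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SessionType(Enum):
    SLICE = "Slice"
    ARTIFACT = "Artifact"
    SUPPORT = "Support"
    INFRA = "Infra"
    META = "Meta"
    AUDIT = "Audit"
    LEARNING = "Learning"
    ORCHESTRATE = "Orchestrate"


@dataclass(frozen=True)
class Session:
    session_type: SessionType
    summary: str
    slice_id: Optional[str] = None  # hypothetical field name

    def __post_init__(self) -> None:
        # Every slice carries an identifier; whether other session types
        # may also carry one is left open in this sketch.
        if self.session_type is SessionType.SLICE and not self.slice_id:
            raise ValueError("a Slice session needs a slice identifier")

    @property
    def is_slice(self) -> bool:
        # Every slice is a session; not every session is a slice.
        return self.session_type is SessionType.SLICE
```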

Session is the thirteenth entity in the cognitive catalog — the machine-readable model agents reason over: Decision Record, Insight Record, Outcome Record, Capability, Goal, Learning Pattern, Cognitive Framework, Context Pack, Active Priority, Relationship Context, Preference, Value, and Session.

Anatomy of a slice

A slice is the smallest unit that is independent, testable, and useful — and it carries its whole outcome with it.

Independent

Mergeable without breaking what exists. One architectural concern only — if it spans more, split it.

Testable

Validated by at least one test before it can merge.

Useful

Adds incremental value, not refactoring for its own sake.

Scope · decision · code · tests · evidence · merge — one chat, one PR.

A slice has two sizes

Conflating them is the number-one mistake teams make running agents. One axis is throughput — what you pay. The other is peak footprint — whether a model can run the slice at all.

Token-band · throughput → cost

Total work in billed-equivalent tokens (input plus cache, weighted by price) — not raw totals, which inflate. Governs forecast and budget.

< 2M tokens

2–5M tokens

5–10M tokens

10–20M tokens

> 20M — a red flag to split

Context-band · peak footprint → viability

The most context loaded in any single turn. Decides which models can run the slice at all — a window smaller than the footprint simply cannot.

< 32K tokens

32–128K tokens

128–200K tokens

200–512K tokens

> 512K tokens

The axes are orthogonal. A slice can be XL-token / S-context (lots of output, small working set) or S-token / XL-context (little output, a large codebase to load). Cost and viability are different failure modes — they need different bands.
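To make the two axes concrete, the sketch below classifies a slice on each one independently, using the thresholds listed above. Two assumptions are mine: K and M are read as decimal (1,000 and 1,000,000), and a value exactly on a boundary falls into the upper band. The function names are hypothetical, and the bands are labeled by their ranges rather than by size names.

```python
from bisect import bisect_right
from typing import List, Tuple

# Upper edges between bands, taken from the two lists above.
TOKEN_EDGES = [2_000_000, 5_000_000, 10_000_000, 20_000_000]
TOKEN_BANDS = ["< 2M", "2-5M", "5-10M", "10-20M", "> 20M"]

CONTEXT_EDGES = [32_000, 128_000, 200_000, 512_000]
CONTEXT_BANDS = ["< 32K", "32-128K", "128-200K", "200-512K", "> 512K"]


def _band(value: int, edges: List[int], labels: List[str]) -> str:
    # bisect_right counts how many edges are <= value, which is the band index.
    return labels[bisect_right(edges, value)]


def token_band(billed_equivalent_tokens: int) -> str:
    """Throughput axis: total billed-equivalent tokens for the slice."""
    return _band(billed_equivalent_tokens, TOKEN_EDGES, TOKEN_BANDS)


def context_band(peak_context_tokens: int) -> str:
    """Viability axis: the most context loaded in any single turn."""
    return _band(peak_context_tokens, CONTEXT_EDGES, CONTEXT_BANDS)


def slice_size(billed_equivalent_tokens: int, peak_context_tokens: int) -> Tuple[str, str]:
    """Return both bands; neither can be derived from the other."""
    return token_band(billed_equivalent_tokens), context_band(peak_context_tokens)
```

A slice with 25 million billed-equivalent tokens and a 20,000-token peak lands in the top token band and the bottom context band, which is the orthogonality the paragraph above describes.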

Calibrated, not invented

The bands aren't guesses. The toolkit ships a deterministic script that reads your own agent session logs and computes your bands — same logs, same script, same result. Recalibrate when the model landscape shifts.

Calibration script: available when the toolkit opens.

Model Triage

Which model should run a slice? SliceOps answers it the way it answers everything — explicitly, and on the record. Five axes, applied in filter order:

1 · Context-band

The primary filter — only models whose window fits the slice's peak footprint are eligible at all.

2 · Sensitivity → locality

Sensitive slices route to a local model automatically — never to an external API, whatever the cost or speed.

3 · Complexity

Token-band and slice-type set the reasoning tier the work actually needs.

4 · Latency

Interactive work and background batch have different speed needs.

5 · Cost

Within the eligible set, the cheapest adequate model wins.

Each recommendation pairs a model with an execution mode — frontier API, local-via-API, a flat-rate plan already paid for, or a model embedded in the IDE.
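The filter order can be sketched as a chain of list filters followed by a cost minimum. Everything concrete here is an assumption for illustration: the `Model` and `SliceProfile` fields, the integer reasoning tiers, the relative cost figures, and the three catalog entries, which are invented placeholders rather than real models.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Model:
    name: str          # placeholder name, not a real product
    window: int        # context window, in tokens
    local: bool        # runs locally rather than behind an external API
    tier: int          # reasoning tier; higher is stronger (assumed scale)
    interactive: bool  # fast enough for interactive work
    cost: float        # relative cost per token (assumed unit)


@dataclass(frozen=True)
class SliceProfile:
    peak_context: int   # context-band input: peak footprint in tokens
    sensitive: bool
    required_tier: int  # derived upstream from token-band and slice type
    interactive: bool


def triage(profile: SliceProfile, catalog: List[Model]) -> Optional[Model]:
    # 1. Context-band: the window must fit the peak footprint.
    eligible = [m for m in catalog if m.window >= profile.peak_context]
    # 2. Sensitivity: sensitive slices only ever see local models.
    if profile.sensitive:
        eligible = [m for m in eligible if m.local]
    # 3. Complexity: keep models at or above the required tier.
    eligible = [m for m in eligible if m.tier >= profile.required_tier]
    # 4. Latency: interactive work needs an interactive-speed model.
    if profile.interactive:
        eligible = [m for m in eligible if m.interactive]
    # 5. Cost: the cheapest adequate model wins; None if nothing is eligible.
    return min(eligible, key=lambda m: m.cost, default=None)


CATALOG = [
    Model("frontier-large", 1_000_000, False, 3, True, 10.0),
    Model("frontier-small", 200_000, False, 2, True, 2.0),
    Model("local-medium", 128_000, True, 2, False, 1.0),
]
```

Returning `None` instead of falling back to an external model keeps the sensitivity rule absolute: a sensitive slice that no local model can fit has to be split, not rerouted.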

Compliance by construction

Axis 2 is the one regulated teams feel first. “Sensitive data stays local” stops being a policy you hope people follow and becomes a routing rule that executes — and the model, the mode, and the rationale are recorded on the session. Closed tools hard-code one model or route invisibly; SliceOps makes the decision explicit and auditable.

The Context Router

A serious codebase's context only grows. The question isn't how to need less of it — it's how to use it well. The router loads only what a session actually needs.

Instead of the whole corpus, it activates only the relevant context-experts — the decisions, entities, and modules a session touches. It's MoE-inspired, but at the orchestration level: it governs which context the agent receives, not how the model attends internally.
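One way to picture the routing step is as a set-overlap selection. The source does not describe the router's algorithm, so the sketch below is a guess at its shape: the `ContextExpert` structure, the string tags, and the token budget are all assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class ContextExpert:
    name: str
    covers: FrozenSet[str]  # decision IDs, entity names, or module paths (hypothetical tags)
    tokens: int             # size of this context chunk


def route(touched: FrozenSet[str], experts: List[ContextExpert], budget: int) -> List[ContextExpert]:
    """Pick only the experts that overlap what the session touches, within a token budget."""
    relevant = [e for e in experts if e.covers & touched]
    # Prefer the largest overlap first; break ties with the smaller chunk.
    relevant.sort(key=lambda e: (-len(e.covers & touched), e.tokens))
    selected: List[ContextExpert] = []
    used = 0
    for expert in relevant:
        if used + expert.tokens <= budget:
            selected.append(expert)
            used += expert.tokens
    return selected
```

Experts with no overlap are never loaded, whatever the budget, which is the point of activating context selectively instead of sending the whole corpus.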

Synthesis efficiency

The other half is production. A model that says the same thing in fewer tokens — without losing the idea — keeps tomorrow's footprint from bloating. Verbose creation today is compounding debt later. The goal is density that still clears the acceptance gate, not terseness for its own sake.

This is Context Discipline (P12) made tangible: context as a governed, single source of truth — selectively routed, never assumed.

Evidence by construction

Every slice produces evidence in four mandatory categories, plus a security gate. Un-evidenced slices do not merge.

Functional

Tests pass.

Quality

Linters and metrics — coverage, format, complexity.

Decision

Decision Records and insight records.

Provenance

Slice ID, agent, timestamps, commit SHA.

Security is its own per-slice gate: secrets scan, SAST, dependency and supply-chain checks — not a periodic audit.
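A merge gate over these categories could look like the following sketch. The dictionary shape and function names are assumptions; the only rule taken from the text is that all four categories must be present and the security gate must pass.

```python
from typing import Dict, List

REQUIRED_CATEGORIES = ("functional", "quality", "decision", "provenance")


def missing_evidence(evidence: Dict[str, List[str]], security_passed: bool) -> List[str]:
    """List what still blocks the merge; an empty list means the slice is evidenced."""
    # A category counts as missing when it is absent or has no entries.
    missing = [name for name in REQUIRED_CATEGORIES if not evidence.get(name)]
    if not security_passed:
        missing.append("security")
    return missing


def can_merge(evidence: Dict[str, List[str]], security_passed: bool) -> bool:
    return not missing_evidence(evidence, security_passed)
```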

The audit plane: Decision Records

No code-quality tool, runtime monitor, or compliance platform audits the decision plane — what was decided, by whom, why, and with what supersession chain. That layer is the SliceOps wedge.

01. Every architectural decision is a Decision Record with frontmatter that maps to a cognitive entity model agents can reason over.
02. Records are append-only: never deleted, never silently rewritten. Supersession is a bidirectional, acyclic edge: the new record declares what it replaces, the old one points forward.
03. Each record traces back to the slice that produced it and stays reachable from it.
04. CI rejects any PR that breaks the schema, orphans a decision, or leaves a supersession chain inconsistent.

Lifecycle: proposed → accepted → superseded / deprecated
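The supersession rule (bidirectional and acyclic) lends itself to a deterministic check. The sketch below assumes each record's frontmatter has been loaded into a dictionary with `supersedes` and `superseded_by` lists; those field names and the `DR-0001` style IDs in the usage are hypothetical.

```python
from typing import Dict, List

Record = Dict[str, List[str]]


def supersession_errors(records: Dict[str, Record]) -> List[str]:
    """Return human-readable problems; an empty list means the chains are consistent."""
    errors: List[str] = []
    for rid, rec in records.items():
        # Backward edge: the new record declares what it replaces.
        for old in rec.get("supersedes", []):
            if old not in records:
                errors.append(f"{rid} supersedes unknown record {old}")
            elif rid not in records[old].get("superseded_by", []):
                errors.append(f"{old} does not point forward to {rid}")
        # Forward edge: the old record points at its replacement.
        for new in rec.get("superseded_by", []):
            if new not in records:
                errors.append(f"{rid} is superseded by unknown record {new}")
            elif rid not in records[new].get("supersedes", []):
                errors.append(f"{new} does not declare that it supersedes {rid}")

    # Depth-first search along 'supersedes' edges to detect cycles.
    state: Dict[str, str] = {}

    def has_cycle(rid: str) -> bool:
        if state.get(rid) == "done":
            return False
        if state.get(rid) == "open":
            return True  # reached a record that is still on the current path
        state[rid] = "open"
        found = any(has_cycle(old) for old in records[rid].get("supersedes", []) if old in records)
        state[rid] = "done"
        return found

    if any(has_cycle(rid) for rid in records):
        errors.append("supersession chain contains a cycle")
    return errors
```

A CI job in the spirit of item 04 would fail the PR whenever this list is non-empty.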

Stage as a DAG-derived view

Slices declare their dependencies with explicit edges. A Stage is computed by traversing that graph — Stage N is whatever is unblocked at step N. Parallelism falls out of the topology, not out of a planning meeting. Forecasting and retrospectives happen at the block level; there are no burndown charts and no “we committed to N slices this sprint.”
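Computing stages from declared edges is a layered topological sort. The sketch below uses Python's standard `graphlib` module (3.9+); the input shape, a mapping from each slice to the slices it depends on, and the short slice names in the usage are assumptions.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def compute_stages(depends_on: Dict[str, Set[str]]) -> List[List[str]]:
    """Group slices into stages: stage N holds everything unblocked at step N."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a cycle
    stages: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all slices with no unmet dependency
        stages.append(ready)
        sorter.done(*ready)  # treat the whole stage as merged, unblocking the next
    return stages
```

With `{"SL-001": set(), "SL-002": {"SL-001"}, "SL-003": {"SL-001"}, "SL-004": {"SL-002", "SL-003"}}` the second stage contains both SL-002 and SL-003, so two agents can work in parallel without anyone having scheduled it.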

Fourteen merge gates

The canonical R-rules (R1–R14) are hard CI gates — each one traces back to a principle. Adopters add their own from R15 onward.

R1: No secrets in markdown or YAML.

R2: No broken cross-references.

R3: Frontmatter required and valid.

R4: Decision-registry consistency: every cited decision has a record.

R5: Lifecycle transitions are atomic: move, status, and both supersession edges in one PR.

R6: No TODO/FIXME/HACK in frozen decisions or specs.

R7: External-source provenance preserved.

R8: SemVer discipline: plan and spec versions move in lockstep.

R9: Cross-repo agent context stays in sync.

R10: Archived files are immutable.

R11: Confidentiality classification present and in range.

R12: Imports restricted to an authorized source tree.

R13: The slice ledger is updated on every closing PR.

R14: No content outside the repo's declared scope.

Phase 2.5 adds coherence validators on top of the R-rules — principle counts, entity counts, token-band units, and an LLM-cost gate — so denormalized drift can't quietly accumulate. They ship as drop-in CI checks in the toolkit.
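To show how small a single gate can be, here is a sketch of the text-scanning half of R6. It assumes the caller has already decided which documents count as frozen and passes in their text; how "frozen" is determined is not specified in the source, and `find_markers` is a hypothetical name, not the toolkit's validator.

```python
import re
from typing import List, Tuple

# Whole-word match, so identifiers such as 'HACKs' or 'TODOS' are not flagged.
MARKER = re.compile(r"\b(TODO|FIXME|HACK)\b")


def find_markers(text: str) -> List[Tuple[int, str]]:
    """Return (line number, marker) pairs for every marker in a frozen document."""
    hits: List[Tuple[int, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in MARKER.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits
```

A CI wrapper would exit non-zero whenever the list is non-empty, printing each line number so the author can fix or unfreeze the document.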

The LLM bill is a finite resource

Running agents in CI costs real money, and the failure mode is silent — you find out when the invoice lands. SliceOps treats inference like any other shared resource: enumerated, capped, gated. Five levers:

Prompt-caching

Cache stable blocks; skipping it can leave 40–60% of cost on the table.

Model-tier

Use the cheapest adequate tier — top-tier “just in case” is roughly a 3× penalty for no gain.

Diff-only context

Send the diff and a few lines around it; fetch the whole file only when reasoning needs it.

Trigger minimalism

Audit on open, reopen, and ready — not on every push. Velocity shouldn't multiply the bill.

Draft gate

Skip expensive jobs on drafts, but finish green — a skipped required check blocks the PR forever.
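The last two levers reduce to a small decision function. The sketch assumes the trigger is a GitHub pull request event, whose activity types include `opened`, `reopened`, `ready_for_review`, and `synchronize` (fired on every push); mapping "open, reopen, and ready" onto those types is my reading, and `plan_audit` and its return values are hypothetical.

```python
# Activity types that justify a paid audit run (assumed mapping to GitHub's names).
AUDIT_ACTIONS = frozenset({"opened", "reopened", "ready_for_review"})


def plan_audit(action: str, is_draft: bool) -> str:
    """Decide what the audit job does for one pull request event.

    Returns 'ignore' (do not trigger at all), 'skip-green' (report success
    without calling the model), or 'run' (perform the audit).
    """
    if action not in AUDIT_ACTIONS:
        return "ignore"  # e.g. 'synchronize': pushes do not multiply the bill
    if is_draft:
        # Finish green: a required check that never reports would block the PR.
        return "skip-green"
    return "run"
```

The distinction between `ignore` and `skip-green` matters: the first means the job never starts, the second means it starts, does nothing expensive, and still reports a passing status.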

Determinism over regeneration

If something repeats, materialize it once — a script, a validator, an R-rule — instead of asking a model to regenerate it every time. Deterministic code is cheaper, faster, and produces evidence that doesn't drift. Have the AI write the tool once, then reuse it.

Decision-driven, not spec-driven

Spec-first toolchains assume the spec is right up front and any divergence is a bug. SliceOps assumes you learn while you build — so the source of truth is the corpus of decisions and merged code, and the record of why the spec changed is itself the artifact worth keeping.

It wraps your flow — it doesn't compete

Spec-first, test-first, contract-first — those are authoring styles. SliceOps is the discipline plane around any of them. Use your preferred flow to get from intent to code; use SliceOps to make the whole thing atomic, auditable, and self-improving. It sits on top of Spec Kit, not against it.

Acceptance-first, by convention

Convention over configuration: the opinionated default is that each slice declares its acceptance criteria upfront, ideally as executable tests. One artifact anchors the scope at the start and closes the slice as evidence at the end. A default, not a mandate.

Three wedges

Audit plane

The architectural-decision layer no existing tool audits.

Multi-agent parallelism

DAG-driven — five to thirteen simultaneous agents as a normal operating mode.

AI-readable engineering

Every artifact is structured so the next agent can learn from it.

The SliceOps framework

Why

P1 — Decision Integrity by Construction

P2 — Audit Plane Discipline

P3 — Human-in-the-Loop Authority

How

P4 — Slice Atomicity

P5 — Stage as DAG-Derived View

P6 — Evidence by Construction

P7 — Security by Construction

P8 — Recursive Learning by Capture

P9 — Shared-Resource Pre-flight

P10 — Infrastructure Continuity