Joy and the Spec-driven Tools: an honest comparison

Joydev Team

Spec-driven development (SDD) is the big topic right now in AI-assisted engineering. Instead of telling a model "build me this" and hoping, you give it clear context first and let it generate code from that. Five tools define the field: BMAD, SpecKit, OpenSpec, GSD and Kiro.

We are often asked how Joy fits in. This article answers that plainly: first short profiles of all five tools and of Joy, then a grouping by structure, and at the end a direct comparison that also shows where Joy does not keep up.

The five tools in short

SpecKit (GitHub). You write a constitution.md with your project principles once. After that, every feature runs through the phases /specify, /plan, /tasks, /implement. Each feature gets its own numbered folder (specs/001-feature/) with several generated files: spec, plan, tasks, and often research and a data model too. Tool-agnostic, with a large ecosystem of more than 70 extensions. The context per feature is thorough, but it lives in its own folder rather than in one continuous overall specification.

OpenSpec (Fission-AI). Separates two areas: specs/ as the living truth about the system's current behavior, organized by domain, and changes/ for proposed changes. A change is written as a "delta spec" (ADDED, MODIFIED, REMOVED) and merged back into the main spec on archive. The result is a continuously maintained, lasting specification. Brownfield-friendly and lean.
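
To make the delta idea concrete, here is a minimal sketch of merging a change back into a main spec. It is not OpenSpec's code or file format: the dictionary shapes, the requirement names and the function `apply_delta` are assumptions chosen for illustration, and only the three section names ADDED, MODIFIED and REMOVED come from the description above.

```python
# Hypothetical model: a spec maps requirement names to their text.
# Only the section names ADDED / MODIFIED / REMOVED are taken from the article.

def apply_delta(spec: dict[str, str], delta: dict) -> dict[str, str]:
    """Return a new spec with the delta merged in; the input spec is not mutated."""
    merged = dict(spec)
    for name, text in delta.get("ADDED", {}).items():
        if name in merged:
            raise ValueError(f"cannot add existing requirement: {name}")
        merged[name] = text
    for name, text in delta.get("MODIFIED", {}).items():
        if name not in merged:
            raise KeyError(f"cannot modify unknown requirement: {name}")
        merged[name] = text
    for name in delta.get("REMOVED", []):
        if name not in merged:
            raise KeyError(f"cannot remove unknown requirement: {name}")
        del merged[name]
    return merged


main_spec = {
    "login": "Users log in with a password.",
    "export": "Users can export data as CSV.",
}
change = {
    "ADDED": {"logout": "Users can log out from any page."},
    "MODIFIED": {"login": "Users log in with a password or a passkey."},
    "REMOVED": ["export"],
}
archived = apply_delta(main_spec, change)
print(sorted(archived))
```

The strict checks (no adding what exists, no modifying or removing what does not) are a design choice of this sketch: they surface a delta that was written against an outdated main spec instead of merging it silently.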

BMAD (BMAD-METHOD). Simulates a whole team of specialized roles (product manager, architect, developer, QA and more, between 12 and 21 depending on the version). It covers the full arc from idea to deployment, with story and architecture documents as its artifacts. Powerful, but with a lot of process and ceremony.

GSD (Open GSD, "Git. Ship. Done."). A deliberately minimal phase loop per milestone: Discuss, Plan, Execute, Verify, Ship. Its trick against "context rot" is to push heavy work into subagents with a fresh context window, while files like STATE.md and CONTEXT.md serve as memory across sessions. No backlog, no status model, just process discipline.
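
The fresh-context idea can be sketched in a few lines. This is a toy model, not GSD's implementation: `run_phase` stands in for a subagent, `run_milestone` is a hypothetical driver, and the only things taken from the description above are the five phase names and the use of a STATE.md file as memory between steps.

```python
import tempfile
from pathlib import Path

PHASES = ["Discuss", "Plan", "Execute", "Verify", "Ship"]


def run_phase(phase: str, state: str) -> str:
    """Stand-in for a subagent: it sees only the persisted state text,
    never the working context of earlier phases, and returns a short note."""
    earlier_notes = len(state.splitlines())
    return f"{phase}: done (saw {earlier_notes} earlier notes)"


def run_milestone(state_path: Path) -> str:
    """Run all phases in order; the state file is the only shared memory."""
    for phase in PHASES:
        state = state_path.read_text() if state_path.exists() else ""
        note = run_phase(phase, state)
        state_path.write_text(state + note + "\n")
    return state_path.read_text()


with tempfile.TemporaryDirectory() as tmp:
    final_state = run_milestone(Path(tmp) / "STATE.md")
print(final_state)
```

Each phase starts from the file content alone, so whatever a phase accumulates while working is discarded and only its short note carries forward.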

Kiro (AWS). An agentic IDE built on VS Code, plus a CLI. For each feature it generates requirements.md, design.md and tasks.md, along with project-wide "steering" documents as lasting context. Changes can be approved step by step in the editor. A mature environment, but tied to the tool.

Joy in the same format

Joy (Joydev). You write three documents once, Vision, Architecture and Contributing, as a shared foundation. They can be brief, or serve as an entry point into a larger set of documents. After that, the work is broken into typed items: epic, story, task, bug, rework, decision, idea. Each item is a plain-text file in the Git repo (.joy/items/), has a status (new, open, in-progress, review, closed) and can have a parent. A decision records an architecture decision, much like an ADR (architecture decision record); a story describes a feature; a task describes a concrete step. Together with the three base documents, a feature becomes a complete implementation brief made of a few items, without a single long spec document.
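
The item model can be pictured as a small data structure. The sketch below is an assumption-laden illustration, not Joy's file format or API: the class `Item`, its field names, the example ids and the helper `collect_brief` are all hypothetical. Only the type names, the status names and the optional parent come from the description above.

```python
from dataclasses import dataclass
from typing import Optional

# Type and status names as listed in the article.
ITEM_TYPES = {"epic", "story", "task", "bug", "rework", "decision", "idea"}
STATUSES = {"new", "open", "in-progress", "review", "closed"}


@dataclass
class Item:
    id: str                       # hypothetical identifier scheme
    type: str
    title: str
    status: str = "new"
    parent: Optional[str] = None  # id of the parent item, if any

    def __post_init__(self) -> None:
        if self.type not in ITEM_TYPES:
            raise ValueError(f"unknown item type: {self.type}")
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status}")


def collect_brief(items: list[Item], root_id: str) -> list[Item]:
    """Return the root item followed by all of its descendants (depth-first)."""
    by_id = {item.id: item for item in items}
    children: dict[str, list[Item]] = {}
    for item in items:
        if item.parent is not None:
            children.setdefault(item.parent, []).append(item)
    brief, stack = [], [by_id[root_id]]
    while stack:
        current = stack.pop()
        brief.append(current)
        stack.extend(reversed(children.get(current.id, [])))
    return brief


items = [
    Item("S1", "story", "Export report as PDF", status="open"),
    Item("T1", "task", "Add PDF renderer", parent="S1"),
    Item("T2", "task", "Wire export button", parent="S1"),
    Item("D1", "decision", "Render server-side", status="closed", parent="S1"),
    Item("B1", "bug", "Unrelated login bug"),
]
brief_ids = [item.id for item in collect_brief(items, "S1")]
print(brief_ids)
```

In this picture, the brief for a feature is simply the story plus the tasks and decisions hanging below it, read together with the three base documents.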

The difference begins where more than one participant is involved. In Joy, an AI is its own project member with its own identity (for example ai:claude@joy). When it runs a Joy command, it authenticates with a delegation token handed to it by a human: the token tells the CLI which AI member is acting and which human delegated. An agent can propose an item, but the new -> open gate can be configured so that only a human can approve it, and AI members cannot perform management actions. Every action lands in the log with its identity and the delegation chain, for example [ai:claude@joy delegated-by:mac@phoenix.org], so that responsibility for each agent action traces back to the human who authorized it. Everything stays plain text in the repo, tied to no tool and no IDE.
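
How the gate and the log tag fit together can be sketched as follows. This is not Joy's CLI or token format: the class `Actor`, the functions `log_tag` and `approve`, the `human_only_gate` switch and the tag shape for a human acting directly are assumptions. The tag for a delegated AI follows the example given above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Actor:
    member: str                          # e.g. "ai:claude@joy" or "mac@phoenix.org"
    delegated_by: Optional[str] = None   # human who issued the delegation

    def __post_init__(self) -> None:
        # Assumption: an AI member may only act on behalf of a human.
        if self.is_ai and self.delegated_by is None:
            raise ValueError("AI member needs a delegating human")

    @property
    def is_ai(self) -> bool:
        return self.member.startswith("ai:")


def log_tag(actor: Actor) -> str:
    """Identity tag for a log entry, including the delegation chain if present."""
    if actor.delegated_by is not None:
        return f"[{actor.member} delegated-by:{actor.delegated_by}]"
    return f"[{actor.member}]"  # assumed shape for a human acting directly


def approve(status: str, actor: Actor, human_only_gate: bool = True) -> str:
    """Move an item through the new -> open gate, or refuse."""
    if status != "new":
        raise ValueError("only items in status 'new' can be approved")
    if human_only_gate and actor.is_ai:
        raise PermissionError(f"{log_tag(actor)} may propose, but not approve")
    return "open"


human = Actor("mac@phoenix.org")
agent = Actor("ai:claude@joy", delegated_by="mac@phoenix.org")
print(log_tag(agent))
print(approve("new", human))
```

Calling `approve("new", agent)` with the gate enabled raises `PermissionError`, which mirrors the rule that an agent can propose an item but a human has to open it.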

Four structural types, and where Joy sits

If you sort the tools by how they keep their truth, four groups emerge. This matters more than any feature list, because it decides whether two tools fit together or get in each other's way.

Throwaway per feature (SpecKit). A feature's spec lives in its own folder and is essentially done once the feature is built. Strong when you want to think a single feature through thoroughly. Weak as a lasting map of the whole system. Weighting: high feature depth, low overall coherence.

Living specification (OpenSpec, Kiro). A continuously maintained truth about the system, into which changes flow back. Strong for long-term clarity. The price is upkeep: the spec has to be kept current. Weighting: high coherence, medium effort.

Phase discipline (GSD). Not a store of truth, but a process that walks the agent through phases in a disciplined way and keeps context fresh. Strong against quality decay in long sessions. But it holds no backlog of its own. Weighting: high execution quality, no structure across features.

Role orchestration (BMAD). A simulated team guides you through the whole lifecycle. Strong when you want guidance across all disciplines. The price is ceremony. Weighting: high guidance, high weight.

Joy does not fit cleanly into any of these four groups, because it serves a different axis. Joy is a typed, git-native store of work and evidence: lasting like the living specification, but organized in items with status rather than in spec documents, and extended with approval, identity and an audit trail, a combination that none of the four groups offers. Joy deliberately does not produce the per-feature depth that SpecKit or Kiro generate.

Joy against the tools: what works, what does not

The overview below shows, for the most important topics, what the tools can do and what Joy can do. It deliberately includes the topics where Joy does not keep up.

| Topic | SpecKit | OpenSpec | BMAD | GSD | Kiro | Joy |
|---|---|---|---|---|---|---|
| Generate deep feature spec (requirements, design, research) | yes | partly | yes | partly | yes | no |
| Living, lasting overall specification | no | yes | partly | no | yes | yes, as items |
| Work in typed units (story, bug, decision …) | no | no | partly | no | no | yes |
| Status lifecycle per work unit | no | no | no | no | partly | yes |
| Approval gate that can block AI | no | no | no | no | partly | yes |
| Authenticated identity per actor, with delegation chain | no | no | no | no | no | yes |
| Audit log of who did what and when | no | no | no | no | no | yes |
| Git-native, plain text, tool-agnostic | partly | yes | partly | yes | no | yes |
| Mature ecosystem, many integrations | yes | partly | yes | partly | yes | no |
| Guided phase process with fresh context | partly | no | yes | yes | partly | no, by choice |

Two rows deserve emphasis, because they show Joy's limits honestly.

First, the deep feature spec. If you expect a requirements.md plus design.md plus research.md per feature, you get that from SpecKit or Kiro, not from Joy. Joy spreads the same information across the three base documents and a few items. For many features that is leaner; for a single, very complex feature with a lot of upfront research, the tools' generated documents can do more.

Second, the ecosystem. SpecKit has more than 70 extensions, and Kiro stands on an established editor. Joy is young, and its surroundings are correspondingly small.

But the table shows something just as clearly: the rows on status, approval, identity and audit trail are fully covered by none of the five tools; only Kiro covers status and approval in part. That is no accident; it is the point. The five tools make an agent productive while working out a feature. Joy makes the collaboration of several agents productive: items with status are clear units of work that can be worked on in parallel without getting in each other's way, and the gate makes sure nothing starts without approval. Because every action carries an identity and lands in the log, this distributed work is traceable at the same time. Productivity and evidence are the same mechanism here, not two separate things.

Should you combine them?

Mostly not. Each of these tools has its own place where the truth lives, and Joy has one too. Run two of them in parallel and you maintain two sources for the same thing, and they drift apart. This holds for Joy plus one of the tools as well. If you use Joy, you do not need a second specification framework for the core loop, because the agent already has enough context from the three documents and the items.

Two exceptions make sense. If a team is already deep into one of the tools and does not want to give up the process, Joy can sit loosely on top and manage approvals and evidence across the whole portfolio, while the other tool works out individual features. And the good habits of the SDD world, fresh context per task, atomic commits, clarify before you plan, can be practiced directly in Joy, without a second tool alongside it.

The bottom line

The five tools make an AI agent productive while working out a feature, each in its own way. Joy makes the collaboration of several agents productive: typed items with status as units of work that can be worked on in parallel and in a coordinated way, and because every action carries an identity, this distributed work is traceable at the same time.

They do not rule each other out, but they do not replace each other either. Which one fits depends on whether your bottleneck is the depth of a single feature, or the productive, traceable collaboration of many participants across everything being worked on.
