GStack: How It Transforms Claude Code into a Virtual Software Development Team

A Complete Breakdown of Its Architecture and Operating Principles


"The future of software development isn't one AI doing everything — it's structured AI roles doing the right thing at the right time." — The design philosophy behind GStack


1. What Is GStack?

GStack is an open-source skill pack created by Garry Tan, CEO of Y Combinator, that transforms Claude Code from a single AI coding assistant into a structured virtual software development team. Instead of treating AI as one amorphous helper, GStack maps distinct human engineering roles — CEO, Architect, QA Lead, Release Manager, and more — into dedicated slash commands, each with its own behavioral constraints and responsibilities.

The project is MIT licensed and has amassed over 16,000 GitHub stars, making it one of the most popular Claude Code extensions in the ecosystem.

What makes GStack notable is the track record its author reports:

  • 600,000 lines of production code written in 60 days
  • 100 pull requests merged in a single 7-day sprint
  • All achieved while Garry Tan continued his full-time duties as YC CEO

These numbers are not from a dedicated engineering team. They represent what a single person accomplished by orchestrating AI through a disciplined, role-based workflow.

What GStack Is Not

Before diving deeper, it is important to clarify what GStack is not:

  • It is not a new AI model or a wrapper around an API
  • It is not a standalone application — it requires Claude Code as its exclusive platform
  • It is not a code generation tool — it is a workflow governance system that shapes how Claude Code behaves
  • It is not a cloud service — everything runs locally, with zero telemetry

GStack is fundamentally a collection of pure Markdown configuration files that instruct Claude Code to assume specific roles and follow specific processes. The genius lies not in the technology, but in the organizational design.


2. Core Philosophy: Why Roles Matter

The Problem with Generic AI Assistants

When you open Claude Code and start asking it to "build a feature" or "fix this bug," something interesting happens: it tries to do everything at once. It might plan, code, test, and deploy in a single stream of consciousness. This sounds efficient until you realize it mirrors one of the most common anti-patterns in human software teams — the developer who wears too many hats simultaneously.

Consider what happens when a single developer acts as architect, coder, tester, and release manager all at once:

  • Architecture decisions get shortcut because the coder wants to start writing
  • Code review is shallow because the reviewer wrote the code
  • Testing is perfunctory because the tester already "knows" the code works
  • Release processes get skipped because "it's just a small change"

Generic AI assistants exhibit the same failure modes, just faster. They produce plausible-looking code without the structural checks that catch systemic problems. GStack's core insight is that the same discipline that makes human teams effective can be applied to AI workflows.

The Role-Based Solution

GStack applies a principle that has been validated across decades of software engineering: separation of concerns at the organizational level. Each slash command locks the AI into a specific discipline:

| Role | Slash Command | Discipline |
|---|---|---|
| CEO / Product Strategist | /plan-ceo-review | Product vision and strategic alignment |
| Architect / Engineering Manager | /plan-eng-review | System design and technical decisions |
| Product Designer | /office-hours | Problem reframing and user-centric thinking |
| Software Engineer | (default Claude Code) | Implementation |
| QA Lead | /qa, /qa-only | Systematic testing and verification |
| Release Manager | /ship | Deployment workflow and PR management |
| Retrospective Facilitator | /retro | Performance analysis and process improvement |

When you invoke /review, the AI does not think about product strategy. When you invoke /plan-ceo-review, the AI does not write code. This is not a limitation — it is the entire point. By constraining what the AI can do in each mode, GStack ensures that each phase of development receives the focused attention it requires.


3. Sprint Workflow Deep Dive

GStack organizes work around a seven-phase sprint cycle. This is not arbitrary — it mirrors the cadence of high-performing engineering teams, compressed into a timeframe that AI can execute in hours rather than weeks.

The Seven Phases

Think → Plan → Build → Review → Test → Ship → Reflect
  │       │       │        │       │       │       │
  ▼       ▼       ▼        ▼       ▼       ▼       ▼
office  plan-*   code    review   qa/*   ship    retro
hours   reviews          audit   qa-only  PR

Phase 1: Think (/office-hours)

Before any code exists, the problem itself needs to be examined. This phase forces the AI to step back and ask: "Is this the right problem to solve? Is there a simpler framing?" Many engineering failures start not from bad code, but from building the wrong thing.

Phase 2: Plan (/plan-ceo-review + /plan-eng-review)

Planning splits into two distinct sub-phases with separate roles:

  • CEO Review evaluates the feature against product strategy, market positioning, and user value. It asks: "Should we build this at all?"
  • Engineering Review produces architecture diagrams, evaluates technical trade-offs, and documents design decisions. It asks: "How should we build this?"

This separation prevents a common failure: technical plans that ignore business context, or business decisions that ignore technical constraints.

Phase 3: Build (standard Claude Code)

With a clear plan approved through both lenses, implementation proceeds using Claude Code's default capabilities. GStack deliberately does not override the build phase — the plan constraints and review gates provide sufficient governance.

Phase 4: Review (/review)

Code review is not about style preferences. GStack's /review command performs structural audits targeting specific categories of production bugs:

  • Race conditions and concurrency issues
  • N+1 query patterns
  • Missing error handling
  • Security vulnerabilities
  • Memory leaks and resource management

Phase 5: Test (/qa + /qa-only)

Testing uses a real browser environment (more on this later) to verify behavior from the user's perspective. The /qa command follows find-fix-verify cycles, while /qa-only generates reports without modifying code.

Phase 6: Ship (/ship)

The release phase automates PR creation, changelog generation, and deployment preparation. It enforces consistent release practices regardless of the developer's energy level or time pressure.

Phase 7: Reflect (/retro)

After shipping, /retro generates engineering retrospectives with performance metrics. What went well? What took longer than expected? What patterns should be codified for future sprints?

Why This Order Matters

The sequence is fixed by design. Each phase produces artifacts that the next phase consumes:

  • Think produces a problem statement that Plan needs
  • Plan produces an architecture document that Build follows
  • Build produces code that Review audits
  • Review produces audit findings that Test verifies are resolved
  • Test produces QA reports that Ship uses for release notes
  • Ship produces a release record that Reflect analyzes

Skipping or reordering phases creates gaps in the chain of evidence. GStack's workflow is fundamentally an audit trail — every decision is traceable back to its origin.
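
The artifact chain above can be modeled as a small ordered pipeline. What follows is a minimal sketch rather than GStack's implementation: the `PHASES` table and the `missing_input` and `run_sprint` helpers are hypothetical names, and the artifact labels are simply taken from the list above.

```python
from typing import Optional

# Hypothetical model of the seven-phase chain: (phase, input needed, output produced).
PHASES = [
    ("think", None, "problem statement"),
    ("plan", "problem statement", "architecture document"),
    ("build", "architecture document", "code"),
    ("review", "code", "audit findings"),
    ("test", "audit findings", "QA report"),
    ("ship", "QA report", "release record"),
    ("reflect", "release record", None),
]


def missing_input(phase: str, artifacts: set) -> Optional[str]:
    """Return the artifact a phase still needs, or None if it can start."""
    for name, needs, _produces in PHASES:
        if name == phase:
            if needs is not None and needs not in artifacts:
                return needs
            return None
    raise ValueError(f"unknown phase: {phase}")


def run_sprint(order: list) -> list:
    """Walk the phases in the given order and fail on the first gap in the chain."""
    produces = {name: out for name, _needs, out in PHASES}
    artifacts = set()
    for phase in order:
        gap = missing_input(phase, artifacts)
        if gap is not None:
            raise RuntimeError(f"{phase} cannot start: missing {gap}")
        if produces[phase] is not None:
            artifacts.add(produces[phase])
    return sorted(artifacts)
```

Running the phases in their documented order succeeds, while `run_sprint(["think", "build"])` raises because Build has no architecture document to follow.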


4. Nine Slash Commands: A Complete Breakdown

Each slash command is a self-contained skill definition written in Markdown. When invoked, it instructs Claude Code to adopt a specific persona, follow a specific process, and produce specific deliverables. Here is a detailed breakdown of all nine commands.

4.1 /office-hours — The Product Designer

Role: Product Designer / Problem Reframing Specialist
When to use: Before any planning or coding begins, when a feature request or idea needs critical examination.

This is the most underestimated command in GStack. Most developers skip it because they want to start building. But /office-hours serves as the first line of defense against building the wrong thing. It forces the AI to:

  • Challenge assumptions in the original request
  • Explore alternative framings of the problem
  • Identify unstated user needs
  • Propose simpler solutions that might eliminate the need for complex code

Example invocation flow:

User: /office-hours I want to add real-time collaborative editing to our note-taking app

AI (as Product Designer):
- What specific collaboration scenarios are users requesting?
- Have you validated that real-time sync is needed vs. async merge?
- What's the user impact of a simpler "share and merge" model?
- Cost analysis: real-time CRDT implementation vs. operational transform

4.2 /plan-ceo-review — The CEO

Role: CEO / Executive Product Strategist
When to use: After /office-hours, when a feature needs strategic alignment before technical planning.

This command evaluates work from a business and product lens:

  • Does this feature align with the product's north star metric?
  • What is the opportunity cost of building this now?
  • How does this position us against competitors?
  • What is the minimum scope that delivers user value?

The output is a product strategy document, not a technical specification.

4.3 /plan-eng-review — The Architect

Role: Architect / Engineering Manager
When to use: After CEO review approves the direction, when technical decisions need to be made.

This is where architecture gets designed. The command produces:

  • System architecture diagrams (in text/ASCII format)
  • API contract definitions
  • Database schema decisions
  • Dependency analysis
  • Performance budget allocations
  • Risk assessment for technical choices

The Architect role explicitly documents trade-offs. It does not simply pick a solution — it explains why alternatives were rejected.

4.4 /review — The Senior Code Reviewer

Role: Senior Engineer / Code Auditor
When to use: After implementation, before testing. This is a structural audit, not a style check.

The /review command targets production-grade concerns:

| Audit Category | What It Checks |
|---|---|
| Concurrency | Race conditions, deadlocks, atomic operations |
| Data Access | N+1 queries, missing indexes, unbounded queries |
| Error Handling | Uncaught exceptions, silent failures, error propagation |
| Security | Injection vectors, authentication gaps, data exposure |
| Resource Management | Memory leaks, connection pool exhaustion, file handle leaks |
| API Design | Breaking changes, backward compatibility, versioning |

This command never suggests stylistic changes like variable naming or formatting. Its scope is strictly limited to issues that could cause production incidents.
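
To make one of the audit categories above concrete, here is a minimal sketch of the N+1 query pattern using Python's built-in `sqlite3` module. The `authors`/`posts` schema and all function names are invented for this example and do not come from GStack. It shows the kind of data-access code a structural audit would flag, next to a single-query alternative.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a tiny in-memory database (hypothetical schema for illustration)."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        INSERT INTO authors VALUES (1, 'Ada'), (2, 'Lin');
        INSERT INTO posts VALUES (1, 1, 'First'), (2, 1, 'Second'), (3, 2, 'Third');
        """
    )
    return db


def titles_n_plus_one(db):
    """Anti-pattern: one query for the authors, then one extra query per author."""
    authors = db.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    queries = 1
    result = {}
    for author_id, name in authors:
        rows = db.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id", (author_id,)
        ).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries


def titles_single_query(db):
    """Alternative: one JOIN returns every (author, title) pair at once.

    Note that an inner JOIN omits authors who have no posts.
    """
    rows = db.execute(
        "SELECT a.name, p.title FROM authors a "
        "JOIN posts p ON p.author_id = a.id ORDER BY a.id, p.id"
    ).fetchall()
    result = {}
    for name, title in rows:
        result.setdefault(name, []).append(title)
    return result, 1
```

Both functions return the same mapping for this data, but the first issues one query per author on top of the initial one, so its cost grows with the number of rows.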

4.5 /ship — The Release Manager

Role: Release Manager
When to use: When code has passed review and testing, and is ready for deployment.

The /ship command automates the release workflow:

  1. Generates a structured PR with summary, test results, and changelog
  2. Ensures commit messages follow conventional commit format
  3. Tags the release appropriately
  4. Validates that all review and QA gates have been passed

It acts as a final checkpoint, ensuring nothing ships without going through the full workflow.
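
Step 2 refers to the Conventional Commits format, whose header has the shape `type(scope)!: description` with the scope and `!` optional. The article does not say how /ship checks this, so the following is only a sketch of one possible check. The `TYPES` tuple (a commonly used set of types) and the `is_conventional` helper are assumptions made for illustration.

```python
import re

# Commonly used commit types; the exact set a project accepts is a local choice.
TYPES = (
    "build", "chore", "ci", "docs", "feat", "fix",
    "perf", "refactor", "revert", "style", "test",
)

# Header shape: type, optional (scope), optional "!", then ": " and a description.
HEADER = re.compile(
    r"^(?P<type>[a-z]+)(\((?P<scope>[^()\s]+)\))?(?P<breaking>!)?: (?P<subject>\S.*)$"
)


def is_conventional(message: str) -> bool:
    """Check only the first line of a commit message against the header shape."""
    lines = message.splitlines()
    if not lines:
        return False
    match = HEADER.match(lines[0])
    if match is None:
        return False
    return match.group("type") in TYPES
```

For example, `is_conventional("feat(auth): add token refresh")` is accepted, while `"Updated stuff"` and `"feat:missing space"` are rejected.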

4.6 /browse — The Browser Automation Engine

Role: Visual QA / Browser Interaction Specialist
When to use: When you need to interact with web pages, verify visual states, or automate browser-based tasks.

This is the command that powers GStack's persistent browser architecture. Unlike traditional browser automation that launches a new instance per command:

  • Connects to a persistent Chromium daemon running in the background
  • Achieves 100-200ms response times per interaction
  • Maintains cookie persistence across sessions
  • Supports authenticated testing workflows

More on the technical architecture in Section 6.

4.7 /qa — The QA Lead (Active)

Role: QA Lead / Test Engineer
When to use: After code review, when systematic testing is needed and the AI should fix issues it finds.

The /qa command follows a rigorous find-fix-verify cycle:

Step 1: FIND — Identify bugs through systematic browser testing
Step 2: FIX  — Apply targeted code patches for each bug
Step 3: VERIFY — Re-test in the real browser to confirm the fix
Step 4: REPEAT — Continue until all identified issues are resolved

This is an active testing mode: the QA Lead both finds problems and fixes them. Each fix is immediately re-verified through the persistent browser, which guards against regressions.
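
The find-fix-verify cycle is essentially a loop that stops once a verification pass comes back clean. The sketch below illustrates that control flow under stated assumptions: `find` and `fix` are hypothetical callables standing in for browser testing and code patching, and the `max_rounds` cap is an addition for safety, not something the article describes.

```python
from typing import Callable


def find_fix_verify(
    find: Callable[[], list],
    fix: Callable[[str], None],
    max_rounds: int = 5,
) -> int:
    """Repeat find -> fix until a pass finds nothing.

    Returns the number of rounds in which fixes were applied.
    Raises RuntimeError if issues remain after max_rounds.
    """
    for rounds_done in range(max_rounds):
        issues = find()          # FIND (doubles as VERIFY on later rounds)
        if not issues:
            return rounds_done   # a clean pass ends the cycle
        for issue in issues:
            fix(issue)           # FIX each issue found in this pass
    if find():                   # final VERIFY after the last allowed round
        raise RuntimeError("issues remain after max_rounds")
    return max_rounds
```

Because each new round starts with a fresh `find()` pass, a fix that introduces a new bug is caught in the next round instead of slipping through.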

4.8 /qa-only — The QA Lead (Audit Mode)

Role: QA Lead / Audit Specialist
When to use: When you want a comprehensive QA report without any code modifications.

This is the read-only counterpart of /qa. It produces:

  • A categorized list of all identified issues
  • Severity classifications (critical, high, medium, low)
  • Steps to reproduce each issue
  • Expected vs. actual behavior documentation

The key distinction: /qa-only never modifies code. This makes it safe to run at any point without worrying about unintended changes.

4.9 /retro — The Retrospective Facilitator

Role: Engineering Retrospective Facilitator
When to use: After shipping, when you want to analyze the sprint's performance and extract lessons.

The /retro command generates engineering retrospectives including:

  • Time-to-completion metrics per phase
  • Code churn analysis (how much code was rewritten during review/QA)
  • Bug density by component
  • Process bottleneck identification
  • Recommended improvements for the next sprint

This creates a feedback loop that improves the process over time. Each retrospective informs the next sprint's approach.


5. Role Isolation and Governance Model

Why Isolation, Not Just Roles

Assigning roles to an AI is easy. Enforcing them is hard. Without enforcement, an AI assigned the "QA Lead" role will inevitably drift into writing features or suggesting architecture changes. GStack addresses this through hard isolation — each slash command activates a skill definition that explicitly constrains the AI's behavior.

The isolation model works at three levels:

Level 1: Behavioral Constraints

Each skill's Markdown definition contains explicit instructions about what the role can and cannot do. For example, the /review skill might contain:

```markdown
## Constraints
- DO NOT suggest feature additions
- DO NOT modify code directly
- DO NOT comment on code style or formatting
- ONLY identify structural issues that could cause production incidents
```
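
Claude Code reads a skill file as natural-language instructions, so no parser is involved when a role is enforced. Still, because the constraints are plain Markdown, they are easy to inspect with ordinary tooling, for instance to lint a customized skill. The sketch below assumes the `## Constraints` heading and the `DO NOT` / `ONLY` prefixes from the example above; `parse_constraints` is a hypothetical helper, not part of GStack.

```python
def parse_constraints(markdown: str) -> dict:
    """Collect bullet items found under a '## Constraints' heading."""
    rules = {"forbidden": [], "only": []}
    in_section = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            # Any level-2 heading starts a new section; track whether it is ours.
            in_section = stripped == "## Constraints"
            continue
        if not in_section or not stripped.startswith("- "):
            continue
        item = stripped[2:]
        if item.startswith("DO NOT "):
            rules["forbidden"].append(item[len("DO NOT "):])
        elif item.startswith("ONLY "):
            rules["only"].append(item[len("ONLY "):])
    return rules
```

Applied to the example above, this yields three forbidden behaviors and one exclusive scope, which makes it simple to confirm that an edited skill still declares its boundaries.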

Level 2: Output Format Enforcement

Each role produces deliverables in a specific format. The CEO role produces strategy documents. The Architect role produces architecture diagrams. The QA role produces test reports. This formatting constraint acts as a second layer of isolation — if the output does not match the expected format, something has gone wrong.

Level 3: Workflow Gates

Certain transitions in the sprint workflow require explicit approval. You cannot invoke /ship without evidence that /review and /qa have been completed. These gates remove the temptation to skip steps when deadline pressure mounts.
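
A gate of this kind reduces to a precondition check. The article does not describe how completion evidence is recorded, so the sketch below assumes a simple mapping from gate name to a completed flag; `REQUIRED_BEFORE_SHIP` and `can_ship` are illustrative names only.

```python
# Gates named in the text: /review and /qa must be complete before /ship.
REQUIRED_BEFORE_SHIP = ("review", "qa")


def can_ship(completed: dict) -> tuple:
    """Return (allowed, missing_gates) for a hypothetical completion record."""
    missing = [gate for gate in REQUIRED_BEFORE_SHIP if not completed.get(gate, False)]
    return (not missing, missing)
```

Returning the list of missing gates, instead of a bare boolean, lets the caller say exactly which step was skipped.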

Safety Tools: /careful and /freeze

GStack includes two additional governance mechanisms:

/careful — Destructive Command Warning System

When active, /careful intercepts commands that could cause irreversible damage:

  • git reset --hard
  • rm -rf on project directories
  • Database migration rollbacks
  • Force pushes to protected branches

Instead of blocking these commands outright, /careful requires explicit confirmation and explains the potential impact before execution.
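
The confirm-and-explain behavior can be pictured as a classifier that maps a command to a reason for pausing. This is a rough sketch, not the skill's actual logic: `destructive_reason` is a hypothetical helper, it covers only three of the cases listed above, and it flags every force push because branch protection cannot be determined from the command line alone. Migration rollbacks are tool-specific and are left out.

```python
import shlex
from typing import Optional


def destructive_reason(command: str) -> Optional[str]:
    """Return why a command needs confirmation, or None if it looks safe."""
    tokens = shlex.split(command)
    if not tokens:
        return None
    # Merge single-dash flags such as "-rf" or "-r -f" into one string of letters.
    short_flags = "".join(
        t[1:] for t in tokens[1:] if t.startswith("-") and not t.startswith("--")
    )
    if tokens[0] == "git" and "reset" in tokens and "--hard" in tokens:
        return "discards uncommitted changes"
    if tokens[0] == "git" and "push" in tokens and (
        "--force" in tokens or "f" in short_flags
    ):
        return "overwrites remote history"
    if tokens[0] == "rm":
        recursive = "r" in short_flags.lower() or "--recursive" in tokens
        force = "f" in short_flags or "--force" in tokens
        if recursive and force:
            return "deletes files recursively without prompting"
    return None
```

A caller would show the returned reason and wait for explicit confirmation, which matches the warn-and-confirm behavior described above.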

/freeze — Edit Boundary Enforcement

/freeze allows you to declare certain files or directories as immutable during a sprint. This is particularly useful when:

  • Core infrastructure code should not be modified during a feature sprint
  • Third-party integration adapters have been manually verified
  • Configuration files have been audited and approved

Any attempt to modify frozen files triggers a warning and requires explicit override.
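
Checking an edit against a freeze list is a path-containment test. The sketch below assumes frozen entries are project-relative paths to files or directories; `is_frozen` is a hypothetical helper. It compares whole path components so that freezing `infra` does not accidentally freeze `infrastructure`.

```python
from pathlib import PurePosixPath


def is_frozen(path: str, frozen: list) -> bool:
    """True if `path` equals a frozen entry or lies inside a frozen directory."""
    target = PurePosixPath(path)
    for entry in frozen:
        root = PurePosixPath(entry)
        # Match on path components, not string prefixes.
        if target == root or root in target.parents:
            return True
    return False
```

With `frozen = ["infra/", "config/app.yaml"]`, an edit to `infra/db/pool.py` would trigger the warning, while `infrastructure/x.py` and `src/app.py` would not.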

Review Gates in Practice

The review gate system creates a formal chain of approval:

/office-hours → Problem statement approved?
       ↓ YES
/plan-ceo-review → Strategic alignment confirmed?
       ↓ YES
/plan-eng-review → Architecture approved?
       ↓ YES
[Build Phase] → Implementation complete?
       ↓ YES
/review → No critical findings?
       ↓ YES
/qa → All tests passing?
       ↓ YES
/ship → Release approved?
       ↓ YES
/retro → Lessons captured?

Each gate is a conscious decision point. The developer (the human) retains authority at every transition. GStack does not automate decisions — it automates execution within human-approved boundaries.


6. Persistent Browser Architecture

The Problem with Traditional Browser Automation

Most AI coding assistants that interact with browsers use one of two approaches:

  1. Headless browser per command — Launch a new browser instance, navigate to the URL, take a screenshot, close the browser. Latency: 3-10 seconds per interaction.
  2. MCP-based browser tools — Use Chrome DevTools Protocol through a Model Context Protocol server. Better, but still involves significant overhead per operation.

Both approaches share a fundamental limitation: they treat browser interactions as stateless, isolated events. Every command starts from scratch. There is no session continuity, no cookie persistence, no concept of "the developer is logged in."

For quick one-off screenshots, this is acceptable. For systematic QA testing that requires navigating authenticated flows, filling multi-step forms, and verifying state transitions across pages, it is painfully slow and unreliable.

GStack's Approach: The Chromium Daemon

GStack takes a radically different approach by maintaining a persistent Chromium daemon that runs as a background process:

┌─────────────────────────────────────────────┐
│                 Claude Code                  │
│                                              │
│  /browse ──→ GStack Browser Client           │
│  /qa    ──→      │                           │
│  /qa-only ──→    │                           │
│                  ▼                            │
│         ┌──────────────────┐                 │
│         │  IPC / Command   │                 │
│         │    Interface     │                 │
│         └────────┬─────────┘                 │
│                  │                            │
└──────────────────┼────────────────────────────┘

         ┌─────────▼─────────┐
         │  Chromium Daemon   │
         │  (Persistent)      │
         │                    │
         │  • Cookie Store    │
         │  • Active Sessions │
         │  • Page State      │
         │  • DOM Cache       │
         └────────────────────┘

Key characteristics of this architecture:

Persistent Process

The Chromium daemon starts once and stays running across all slash commands. It is not launched per-command and not torn down between operations. This eliminates the 3-10 second startup cost that plagues traditional approaches.

Sub-Second Response Times

With the browser already running and pages already loaded, interactions take 100-200ms per command. This is approximately 20x faster than Chrome MCP tools and makes real-time interactive testing practical.
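
A quick calculation shows why the per-interaction figure matters for a full QA pass. The latency ranges below are the ones quoted in this section; the figure of 50 interactions per pass is an assumption chosen purely for illustration.

```python
def total_seconds(interactions: int, per_call_ms: tuple) -> tuple:
    """Scale a (low, high) per-interaction latency in ms to a total in seconds."""
    low, high = per_call_ms
    return (interactions * low / 1000, interactions * high / 1000)


STEPS = 50  # assumed number of browser interactions in one QA pass

cold = total_seconds(STEPS, (3000, 10000))  # launch-per-command: 3-10 s each
warm = total_seconds(STEPS, (100, 200))     # persistent daemon: 100-200 ms each

print(f"launch-per-command: {cold[0]:.0f}-{cold[1]:.0f} s")
print(f"persistent daemon:  {warm[0]:.0f}-{warm[1]:.0f} s")
```

Under these assumptions the launch-per-command approach spends 150 to 500 seconds on browser overhead, while the persistent daemon spends 5 to 10 seconds.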

Cookie Persistence

The daemon maintains its cookie store across sessions. Once you authenticate via /setup-browser-cookies, subsequent /qa and /browse commands can test authenticated flows without re-logging in. This is critical for testing features behind authentication walls.

Compiled Binary Distribution

The browser component ships as a compiled binary (approximately 58 MB) rather than requiring users to install and configure Chromium separately. This ensures consistent behavior across environments and simplifies setup.

How /setup-browser-cookies Works

For authenticated testing, GStack provides a dedicated setup command:

  1. You invoke /setup-browser-cookies
  2. GStack launches a visible browser window
  3. You manually log in to the target application
  4. GStack captures and persists the authentication cookies
  5. All subsequent /browse and /qa commands inherit these cookies

This approach avoids storing credentials in configuration files. The cookies are persisted only in the running daemon's memory and the local cookie store. No credentials are ever written to disk in plain text.

Integration with Playwright

Under the hood, GStack's browser automation is built on Playwright, the cross-browser testing framework. However, instead of using Playwright's standard launch-per-test model, GStack connects Playwright to the persistent daemon through a custom adapter. This gives GStack access to Playwright's full API surface — selectors, assertions, network interception — while maintaining the performance benefits of a persistent browser.


7. Technology Stack and Integration Model

Required Stack

| Component | Requirement | Purpose |
|---|---|---|
| Claude Code | Latest version | Exclusive execution platform |
| Bun | v1.0+ | JavaScript/TypeScript runtime for skill scripts |
| Git | Any modern version | Version control integration |
| Playwright | Bundled | Browser automation engine |
| Chromium | Bundled (~58 MB binary) | Persistent browser daemon |

Why Bun?

GStack uses Bun as its runtime instead of Node.js. Bun's significantly faster startup time (measured in milliseconds vs. hundreds of milliseconds for Node) matters when skills need to execute helper scripts as part of their workflow. For a system that aims for sub-second response times, every millisecond of runtime overhead counts.

Platform Support:

  • macOS (arm64, x64)
  • Linux (arm64, x64)
  • Windows is not officially supported

How Skills Integrate with Claude Code

GStack leverages Claude Code's native custom slash commands feature. This is not a hack or a workaround — it uses the officially supported extension mechanism.

The integration model:

Project Root/
├── .claude/
│   └── skills/
│       └── gstack/
│           ├── office-hours.md
│           ├── plan-ceo-review.md
│           ├── plan-eng-review.md
│           ├── review.md
│           ├── ship.md
│           ├── browse.md
│           ├── qa.md
│           ├── qa-only.md
│           ├── retro.md
│           ├── careful.md
│           ├── freeze.md
│           └── ... (additional skill files)

Each .md file is a pure Markdown skill definition. The skills themselves contain no compiled code, no binary plugins, and no API keys; the only compiled component is the bundled browser binary listed under Required Stack. When you type /review in Claude Code, it reads the corresponding Markdown file and uses its contents as behavioral instructions.
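
The naming convention in the tree above (one file per command) can be sketched as a simple mapping. Claude Code's real lookup is internal and not described in this article; `skillPathFor` and `SKILL_DIR` are hypothetical names that only mirror the directory layout shown.

```typescript
// Hypothetical helper: maps a slash command to the skill file in the tree above.
// This mirrors the naming convention only; it is not Claude Code's own resolver.
const SKILL_DIR = ".claude/skills/gstack";

function skillPathFor(command: string): string {
  // Accept commands such as "/review" or "/plan-ceo-review".
  if (!/^\/[a-z][a-z0-9-]*$/.test(command)) {
    throw new Error("Not a slash command: " + command);
  }
  return SKILL_DIR + "/" + command.slice(1) + ".md";
}

console.log(skillPathFor("/review"));          // .claude/skills/gstack/review.md
console.log(skillPathFor("/plan-ceo-review")); // .claude/skills/gstack/plan-ceo-review.md
```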

This design has several implications:

Full Transparency

Every skill is human-readable. You can open any skill file and see exactly what instructions the AI receives. There is no obfuscation, no proprietary format, no hidden behavior.

Easy Customization

Want to add a domain-specific check to the /review command? Edit the Markdown file. Want to create a new role that does not exist in GStack? Write a new Markdown file. The barrier to customization is zero.
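
Because a skill is plain text, adding a check is ordinary string editing. The sketch below assumes an invented convention (a trailing "## Project-specific checks" section) and an invented sample skill body; neither comes from GStack's actual files.

```typescript
// Appends a project-specific check to a skill definition held as Markdown text.
// The heading is a made-up convention and is assumed to be the last section.
function addCheck(skillMarkdown: string, check: string): string {
  const heading = "## Project-specific checks";
  const base = skillMarkdown.replace(/\s+$/, "");
  const withHeading =
    base.indexOf(heading) !== -1 ? base : base + "\n\n" + heading;
  return withHeading + "\n- " + check + "\n";
}

// Invented sample content standing in for a real skill file.
const original = "# Review\n\nAudit the diff for structural problems.\n";
const updated = addCheck(original, "Flag SQL built by string concatenation.");
console.log(updated);
```

In practice you would make the same change by hand in an editor; the point is that no build step or plugin API sits between you and the instructions.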

Zero Vendor Lock-In

GStack skills are standard Markdown. The slash commands themselves run only in Claude Code, but the content stays portable: if Claude Code's slash command format changes, the instructions can be moved to the new format, and if you want to adapt the role definitions for a different AI tool, they are already written in natural language.

Local-Only Architecture

GStack's security model is notable for what it does not do:

  • No telemetry: zero data sent to external servers
  • No PATH modifications: nothing is added to your system PATH
  • No background services: except the browser daemon, which only runs when explicitly started
  • No network calls of its own: the skills are local files, and only Claude Code itself needs a connection
  • No API keys: GStack itself requires no authentication

The only network activity comes from Claude Code's own API calls to Anthropic (for the AI model) and the browser daemon's connections to localhost (for testing local applications).

7. 技術堆疊與整合模型

必要的技術堆疊

| 元件 | 需求 | 用途 |
| --- | --- | --- |
| Claude Code | 最新版本 | 唯一的執行平台 |
| Bun | v1.0+ | 技能腳本的 JavaScript / TypeScript 執行環境 |
| Git | 任何現代版本 | 版本控制整合 |
| Playwright | 內建 | 瀏覽器自動化引擎 |
| Chromium | 內建(約 58 MB 二進位檔) | 持久化瀏覽器 daemon |

為什麼是 Bun?

GStack 使用 Bun 作為執行環境而非 Node.js。Bun 顯著更快的啟動時間(以毫秒計 vs. Node 的數百毫秒)在技能需要執行輔助腳本作為工作流一部分時很重要。對於一個目標是亞秒級回應時間的系統來說,每一毫秒的執行環境開銷都很重要。

平台支援:

  • macOS(arm64、x64)
  • Linux(arm64、x64)
  • Windows 官方不支援

技能如何與 Claude Code 整合

GStack 利用 Claude Code 原生的自訂 slash command 功能。這不是 hack 或變通方式——它使用官方支援的擴充機制。

整合模型:

專案根目錄/
├── .claude/
│   └── skills/
│       └── gstack/
│           ├── office-hours.md
│           ├── plan-ceo-review.md
│           ├── plan-eng-review.md
│           ├── review.md
│           ├── ship.md
│           ├── browse.md
│           ├── qa.md
│           ├── qa-only.md
│           ├── retro.md
│           ├── careful.md
│           ├── freeze.md
│           └── ...(其他技能檔案)

每個 .md 檔案都是純 Markdown 技能定義。技能本身沒有編譯過的程式碼、沒有二進位外掛、沒有 API 金鑰;唯一的編譯元件是「必要的技術堆疊」中列出的內建瀏覽器二進位檔。當你在 Claude Code 中輸入 /review 時,它會讀取對應的 Markdown 檔案,並將其內容作為行為指示。

這種設計有幾個重要意涵:

完全透明

每個技能都是人類可讀的。你可以打開任何技能檔案,看到 AI 確切接收到什麼指示。沒有混淆、沒有專有格式、沒有隱藏行為。

易於客製化

想在 /review 命令中加入特定領域的檢查?編輯 Markdown 檔案。想建立 GStack 中不存在的新角色?寫一個新的 Markdown 檔案。客製化的門檻是零。

零供應商鎖定

GStack 技能是標準 Markdown。slash command 本身只能在 Claude Code 中執行,但內容是可攜的:如果 Claude Code 的 slash command 格式改變,指示可以搬到新格式;如果你想把角色定義適配到不同的 AI 工具,指示已經用自然語言寫好了。

純本地架構

GStack 的安全模型值得注意的是它不做的事情:

  • 無遙測:零資料發送到外部伺服器
  • 不修改 PATH:不會有任何東西被加入你的系統 PATH
  • 無背景服務:除了瀏覽器 daemon(僅在明確啟動時運行)
  • 本身不發出網路請求:技能是本地檔案,只有 Claude Code 本身需要連線
  • 無 API 金鑰:GStack 本身不需要任何認證

唯一的網路活動來自 Claude Code 本身對 Anthropic 的 API 呼叫(用於 AI 模型)以及瀏覽器 daemon 對 localhost 的連線(用於測試本地應用程式)。


8. Parallelization and Scaling

The Conductor Integration

While the seven-phase sprint workflow is powerful for a single feature, real-world development requires working on multiple features simultaneously. GStack addresses this through Conductor integration, which enables 10-15 simultaneous sprints.

The Conductor model works like this:

                    ┌──────────────┐
                    │  Conductor   │
                    │(Orchestrator)│
                    └──────┬───────┘
                           │
            ┌──────────────┼──────────────┐
            │              │              │
      ┌─────▼──────┐ ┌─────▼──────┐ ┌─────▼──────┐
      │ Sprint #1  │ │ Sprint #2  │ │ Sprint #3  │
      │ Feature A  │ │ Feature B  │ │ Bug Fix C  │
      │            │ │            │ │            │
      │ Think      │ │ Build      │ │ Review     │
      │ ↓          │ │ ↓          │ │ ↓          │
      │ Plan       │ │ Review     │ │ QA         │
      │ ...        │ │ ...        │ │ ...        │
      └────────────┘ └────────────┘ └────────────┘

Each sprint runs independently with its own phase tracking. Sprint #1 might be in the Think phase while Sprint #2 is in Build and Sprint #3 is in Review. The Conductor maintains awareness of all active sprints and their current phases.
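
Per-sprint phase tracking can be pictured as a small state table. The sketch below is an illustration under stated assumptions, not Conductor's implementation: the `Conductor` class, its methods, and the default cap of 15 are hypothetical, and the phase names are taken from the Think-Plan-Build-Review-Test-Ship-Reflect cycle this article describes.

```typescript
// The seven phases named in this article, in order.
const PHASES = ["Think", "Plan", "Build", "Review", "Test", "Ship", "Reflect"] as const;
type Phase = (typeof PHASES)[number];

interface Sprint {
  id: number;
  goal: string;
  phase: Phase;
}

// Hypothetical orchestrator: each sprint's phase is tracked independently.
class Conductor {
  private readonly sprints = new Map<number, Sprint>();
  private readonly maxSprints: number;

  constructor(maxSprints: number = 15) {
    this.maxSprints = maxSprints;
  }

  // Registers a new sprint in the first phase, up to the configured cap.
  start(id: number, goal: string): void {
    if (this.sprints.size >= this.maxSprints) {
      throw new Error("At most " + this.maxSprints + " sprints can run at once");
    }
    this.sprints.set(id, { id: id, goal: goal, phase: "Think" });
  }

  // Moves one sprint to its next phase; the other sprints are untouched.
  advance(id: number): Phase {
    const sprint = this.sprints.get(id);
    if (!sprint) throw new Error("Unknown sprint " + id);
    const next = PHASES[PHASES.indexOf(sprint.phase) + 1];
    if (next === undefined) throw new Error("Sprint " + id + " is already in Reflect");
    sprint.phase = next;
    return next;
  }

  status(): string[] {
    return Array.from(this.sprints.values()).map(
      (s) => "#" + s.id + " " + s.goal + ": " + s.phase
    );
  }
}

const conductor = new Conductor();
conductor.start(1, "Feature A");
conductor.start(2, "Feature B");
conductor.advance(2); // Plan
conductor.advance(2); // Build
console.log(conductor.status().join("\n"));
// #1 Feature A: Think
// #2 Feature B: Build
```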

Why Process-Driven Parallelization Works

The key insight is that GStack's structured workflow makes parallelization manageable. Without structure, 10 simultaneous AI coding sessions quickly devolve into chaos — conflicting changes, duplicate work, and merge disasters.

GStack's sprint structure provides natural guardrails:

  • Phase isolation reduces the chance that two sprints modify the same files at the same time, since concurrent sprints are usually in different phases
  • Explicit deliverables at each phase make it easy to track progress across sprints
  • Review gates catch conflicts before they reach production
  • The retrospective phase identifies cross-sprint issues
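
The article does not say how overlapping changes are detected. One way to spot them before a review gate is to compare the file lists of the active sprints, as in this sketch; `findOverlaps` and the `SprintFiles` shape are hypothetical and not part of GStack.

```typescript
interface SprintFiles {
  sprint: string;
  files: string[]; // paths the sprint has modified or plans to modify
}

// Returns each file touched by more than one sprint, with the sprints involved.
function findOverlaps(sprints: SprintFiles[]): Map<string, string[]> {
  const owners = new Map<string, string[]>();
  for (const entry of sprints) {
    // De-duplicate so one sprint listing a file twice is not counted twice.
    for (const file of Array.from(new Set(entry.files))) {
      const names = owners.get(file) || [];
      owners.set(file, names.concat(entry.sprint));
    }
  }
  const overlaps = new Map<string, string[]>();
  owners.forEach((names, file) => {
    if (names.length > 1) overlaps.set(file, names);
  });
  return overlaps;
}

const overlaps = findOverlaps([
  { sprint: "Feature A", files: ["src/auth.ts", "src/api.ts"] },
  { sprint: "Feature B", files: ["src/ui.ts"] },
  { sprint: "Bug Fix C", files: ["src/api.ts"] },
]);
overlaps.forEach((names, file) => console.log(file + ": " + names.join(", ")));
// src/api.ts: Feature A, Bug Fix C
```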

Practical Limits

While GStack can theoretically manage many parallel sprints, practical limits emerge:

  • Git conflicts increase with the number of simultaneous sprints touching related code
  • Context window pressure grows as Claude Code tracks multiple sprint states
  • Human oversight capacity becomes the bottleneck — reviewing 10 sprints simultaneously requires significant attention
  • 10-15 sprints is the tested upper bound for effective parallel work

8. 平行化與擴展

Conductor 整合

雖然七階段 Sprint 工作流對單一功能很強大,但現實世界的開發需要同時處理多個功能。GStack 透過 Conductor 整合來解決這個問題,它可以支援 10-15 個同步進行的 Sprint。

Conductor 模型的運作方式:

                    ┌──────────────┐
                    │  Conductor   │
                    │ (編排器)   │
                    └──────┬───────┘
                           │
            ┌──────────────┼──────────────┐
            │              │              │
      ┌─────▼──────┐ ┌─────▼──────┐ ┌─────▼──────┐
      │ Sprint #1  │ │ Sprint #2  │ │ Sprint #3  │
      │ 功能 A     │ │ 功能 B     │ │ Bug 修復 C │
      │            │ │            │ │            │
      │ Think      │ │ Build      │ │ Review     │
      │ ↓          │ │ ↓          │ │ ↓          │
      │ Plan       │ │ Review     │ │ QA         │
      │ ...        │ │ ...        │ │ ...        │
      └────────────┘ └────────────┘ └────────────┘

每個 Sprint 獨立運行,有自己的階段追蹤。Sprint #1 可能在 Think 階段,同時 Sprint #2 在 Build,Sprint #3 在 Review。Conductor 維護對所有進行中 Sprint 及其當前階段的感知。

為什麼流程驅動的平行化可行

關鍵洞見是 GStack 的結構化工作流使平行化變得可管理。沒有結構的話,10 個同時進行的 AI 程式撰寫 session 很快就會陷入混亂——衝突的變更、重複的工作和合併災難。

GStack 的 Sprint 結構提供了天然的護欄:

  • 階段隔離降低兩個 Sprint 同時修改相同檔案的機率(同時進行的 Sprint 通常處於不同的階段)
  • 每個階段的明確交付物使跨 Sprint 追蹤進度變得容易
  • Review 閘門在衝突到達正式環境之前就捕獲它們
  • 回顧階段識別跨 Sprint 的問題

實務限制

雖然 GStack 理論上可以管理許多平行 Sprint,但實務限制會浮現:

  • Git 衝突隨著同時觸及相關程式碼的 Sprint 數量增加而增加
  • 上下文視窗壓力隨著 Claude Code 追蹤多個 Sprint 狀態而增長
  • 人類監督能力成為瓶頸——同時審查 10 個 Sprint 需要顯著的注意力
  • 10-15 個 Sprint 代表了有效平行工作的已測試上限

9. Comparison with Traditional Development Workflows

To understand what GStack represents, it helps to compare it with how software is traditionally built.

GStack vs. Traditional Team

| Dimension | Traditional 5-Person Team | Solo Developer + GStack |
| --- | --- | --- |
| Communication overhead | Meetings, Slack, stand-ups | Zero: all in one context |
| Role expertise | Varies by hire quality | Consistent per skill definition |
| Availability | Business hours, PTO, sick days | 24/7 |
| Ramp-up time | Weeks to months | Immediate |
| Institutional knowledge | Spread across people | Codified in skill files |
| Cost | $500K-$1M+ per year in salaries | Claude Code subscription |
| Code review quality | Depends on reviewer's attention | Consistent structural audit |
| Sprint velocity | 1-2 week sprints typical | Hours to days |
| Parallelization | Limited by team size | 10-15 simultaneous sprints |

GStack vs. Vanilla Claude Code

| Dimension | Vanilla Claude Code | Claude Code + GStack |
| --- | --- | --- |
| Role discipline | None (general-purpose) | Strict role isolation |
| Workflow structure | Ad hoc | Seven-phase sprint |
| Code review | Manual prompting | Dedicated /review with structural audit |
| Browser testing | Requires separate tools | Integrated persistent browser |
| Release process | Manual | Automated via /ship |
| Safety guardrails | Basic Claude safety | /careful + /freeze + review gates |
| Retrospectives | None | Built-in /retro |
| Parallelization | Single conversation | Conductor-managed multi-sprint |

GStack vs. Other AI Coding Tools

GStack occupies a unique position in the AI development tool landscape. Tools like GitHub Copilot, Cursor, and Windsurf focus on code generation — making the Build phase faster. GStack focuses on workflow governance — making the entire development lifecycle more disciplined.

This is not a competition. GStack can coexist with code generation tools because it operates at a different level of abstraction. Code generation tools help you write code faster. GStack helps you write the right code by ensuring proper planning, review, and testing surround the implementation.

9. 與傳統開發工作流的比較

要理解 GStack 代表什麼,將它與軟體傳統建構方式進行比較是有幫助的。

GStack vs. 傳統團隊

| 維度 | 傳統 5 人團隊 | 單人開發者 + GStack |
| --- | --- | --- |
| 溝通開銷 | 會議、Slack、站立會議 | 零,全在同一個上下文中 |
| 角色專業度 | 取決於招聘品質 | 按技能定義保持一致 |
| 可用性 | 上班時間、休假、病假 | 全天候 24/7 |
| 上手時間 | 數週到數月 | 立即 |
| 組織知識 | 分散在人員之間 | 制度化在技能檔案中 |
| 成本 | 年薪 $500K-$1M+ | Claude Code 訂閱費 |
| Code Review 品質 | 取決於審查者的注意力 | 一致的結構性審計 |
| Sprint 速度 | 通常 1-2 週的 Sprint | 數小時到數天 |
| 平行化能力 | 受團隊規模限制 | 10-15 個同步 Sprint |

GStack vs. 原生 Claude Code

| 維度 | 原生 Claude Code | Claude Code + GStack |
| --- | --- | --- |
| 角色紀律 | 無(通用型) | 嚴格角色隔離 |
| 工作流結構 | 隨意的 | 七階段 Sprint |
| Code Review | 手動提示 | 專用 /review 搭配結構性審計 |
| 瀏覽器測試 | 需要額外工具 | 整合的持久化瀏覽器 |
| 發布流程 | 手動 | 透過 /ship 自動化 |
| 安全護欄 | 基本 Claude 安全機制 | /careful + /freeze + Review 閘門 |
| 回顧 | 無 | 內建 /retro |
| 平行化 | 單一對話 | Conductor 管理的多 Sprint |

GStack vs. 其他 AI 程式撰寫工具

GStack 在 AI 開發工具的版圖中佔據了獨特的位置。GitHub Copilot、Cursor 和 Windsurf 等工具聚焦於程式碼產生——讓 Build 階段更快。GStack 聚焦於工作流治理——讓整個開發生命週期更有紀律。

這不是競爭關係。GStack 可以與程式碼產生工具共存,因為它在不同的抽象層級上運作。程式碼產生工具幫你更快地寫程式碼。GStack 幫你寫出正確的程式碼,通過確保適當的規劃、審查和測試環繞著實作。


10. Limitations and Considerations

GStack is impressive, but it is not without significant limitations. An honest assessment is important for anyone considering adopting it.

Platform Lock-In

GStack works exclusively with Claude Code. It cannot be used with Cursor, Windsurf, GitHub Copilot, or any other AI coding tool. This is a hard dependency, not a soft preference. If Anthropic discontinues Claude Code or changes its slash command system, GStack breaks.

Context Window Constraints

Every slash command loads its skill definition into Claude Code's context window, and complex sprints with multiple phases accumulate context as they go. Claude's context window is large but finite, so very long sprints or highly parallel work can hit limits that degrade output quality.

The "One Person" Problem

GStack's productivity numbers are real, but they come with a caveat: one person still needs to make all the strategic decisions, review all the outputs, and approve all the gates. The human becomes the bottleneck. There is no delegation mechanism within GStack — it is a force multiplier for a single developer, not a replacement for a team.

Quality vs. Quantity Trade-Off

600,000 lines of code in 60 days is remarkable. But lines of code are a famously poor quality metric. The real question is: what is the defect rate per line? What is the maintenance cost? These numbers are harder to measure and less frequently reported.
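
The headline figures reduce to simple rates, and the missing quality number would normally be expressed as defects per thousand lines (KLOC). The sketch below uses the article's own totals; the defect count passed to `defectsPerKloc` is a made-up input, since the article reports none.

```typescript
// Figures quoted in the article.
const LINES = 600000;
const DAYS = 60;
const PRS = 100;
const SPRINT_DAYS = 7;

const linesPerDay = LINES / DAYS;    // 10000
const prsPerDay = PRS / SPRINT_DAYS; // about 14.3

// Defect density per thousand lines of code.
// The defect count is NOT reported in the article; any value is hypothetical.
function defectsPerKloc(defects: number, lines: number): number {
  if (lines <= 0) throw new Error("lines must be positive");
  return (defects * 1000) / lines;
}

console.log(linesPerDay);                 // 10000
console.log(prsPerDay.toFixed(1));        // 14.3
console.log(defectsPerKloc(1200, LINES)); // 2, for an invented count of 1,200 defects
```

Throughput is easy to compute from public numbers; defect density needs a count that only the project's issue tracker could supply.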

Skill Definition Maintenance

As projects evolve, skill definitions may need updating. A /review skill tuned for a web application may be inadequate for a machine learning pipeline. There is no automatic mechanism for adapting skills to changing project contexts — this requires manual maintenance.

Browser Testing Limitations

The persistent Chromium daemon is powerful but limited to web applications. Native mobile apps, desktop applications, and CLI tools cannot be tested through GStack's browser architecture. The /qa command's find-fix-verify cycle only works for browser-accessible interfaces.

Learning Curve

Despite being "just Markdown files," GStack introduces a workflow that requires discipline to follow. Developers accustomed to ad hoc coding may find the Think-Plan-Build-Review-Test-Ship-Reflect cycle overly rigid, especially for small changes where the overhead exceeds the value.

10. 限制與考量

GStack 令人印象深刻,但它也有不可忽視的限制。對任何考慮採用它的人來說,誠實的評估很重要。

平台鎖定

GStack 只能與 Claude Code 搭配使用。它不能用於 Cursor、Windsurf、GitHub Copilot 或任何其他 AI 程式撰寫工具。這是硬依賴,不是軟偏好。如果 Anthropic 停止支援 Claude Code 或改變其 slash command 系統,GStack 就會失效。

上下文視窗限制

每個 slash command 都會將其技能定義載入 Claude Code 的上下文視窗。複雜的 Sprint 跨多個階段會累積上下文。Claude 的上下文視窗雖大但有限,非常長的 Sprint 或高度平行的工作可能會觸及限制,導致輸出品質下降。

「一個人」的問題

GStack 的生產力數字是真實的,但有一個但書:一個人仍然需要做出所有的策略決策、審查所有的輸出、核准所有的閘門。人類成為瓶頸。GStack 內部沒有委派機制——它是單一開發者的力量倍增器,不是團隊的替代品。

品質 vs. 數量的權衡

60 天 600,000 行程式碼是了不起的。但程式碼行數是出了名的糟糕品質指標。真正的問題是:每行的缺陷率是多少?維護成本是多少?這些數字更難衡量,也更少被報告。

技能定義的維護

隨著專案演進,技能定義可能需要更新。為 Web 應用程式調校的 /review 技能,對機器學習 Pipeline 可能不夠用。沒有自動機制來適配技能到變化的專案脈絡——這需要手動維護。

瀏覽器測試的限制

持久化 Chromium daemon 很強大,但僅限於 Web 應用程式。原生行動 App、桌面應用程式和 CLI 工具無法透過 GStack 的瀏覽器架構進行測試。/qa 命令的「發現——修復——驗證」循環僅對瀏覽器可存取的介面有效。

學習曲線

儘管「只是 Markdown 檔案」,GStack 引入了一個需要紀律來遵循的工作流程。習慣隨意編碼的開發者可能會覺得 Think-Plan-Build-Review-Test-Ship-Reflect 週期過於僵硬,特別是對於小改動來說,開銷可能超過價值。


11. Key Takeaways

GStack represents a significant evolution in how we think about AI-assisted software development. Here are the essential insights:

1. Structure beats intelligence.

GStack does not make Claude Code smarter. It makes Claude Code more disciplined. The same AI model, given structured roles and workflow constraints, produces dramatically better outcomes than the same model used ad hoc. This is the most important lesson.

2. Role isolation prevents generic agent chaos.

By constraining what the AI can do in each mode, GStack avoids the failure mode where AI tries to do everything at once and does nothing well. Each slash command is a focused tool, not a Swiss army knife.

3. The sprint workflow creates an audit trail.

Every phase produces deliverables that feed the next phase. This creates traceability from problem statement to production release — something most ad hoc AI workflows completely lack.

4. Persistent browser architecture is a genuine technical innovation.

The roughly 20x speedup over the Chrome MCP tools is not incremental. It enables an entirely different class of testing workflows that were previously impractical.

5. Pure Markdown skill definitions are brilliant in their simplicity.

No plugins, no SDKs, no build steps. Skills are human-readable configuration files. This maximizes transparency, customizability, and portability.

6. The human remains in the loop — by design.

GStack does not try to remove the human from the development process. It amplifies human decision-making by automating execution within human-approved boundaries. Every phase transition is a conscious human choice.

7. Limitations are real and should be acknowledged.

Platform lock-in to Claude Code, context window constraints, and the single-person bottleneck are genuine concerns. GStack is a powerful tool, not a silver bullet.

The deeper significance of GStack may be less about the tool itself and more about the principle it demonstrates: organizational design patterns from human engineering teams transfer directly to AI workflows. Roles, gates, retrospectives, and sprint structures are not relics of human coordination overhead — they are fundamental to producing reliable software, regardless of whether the builder is human or AI.

11. 關鍵要點

GStack 代表了我們思考 AI 輔助軟體開發方式的重大演進。以下是核心洞見:

1. 結構勝過智能。

GStack 沒有讓 Claude Code 更聰明,而是讓它更有紀律。同一個 AI 模型,在給予結構化角色和工作流約束後,產出的結果比隨意使用同一模型戲劇性地更好。這是最重要的教訓。

2. 角色隔離防止通用型 Agent 的混亂。

藉由約束 AI 在每個模式下能做什麼,GStack 避免了 AI 嘗試同時做所有事卻什麼都做不好的失敗模式。每個 slash command 是一個聚焦的工具,不是瑞士刀。

3. Sprint 工作流建立了審計軌跡。

每個階段產出的交付物會餵給下一個階段。這從問題陳述到正式環境發布建立了可追溯性——這是大多數隨意 AI 工作流完全缺乏的東西。

4. 持久化瀏覽器架構是真正的技術創新。

比 Chrome MCP 工具快約 20 倍的效能提升不是漸進式的。它使一整個以前不切實際的測試工作流類別成為可能。

5. 純 Markdown 技能定義的簡潔是天才之作。

沒有外掛、沒有 SDK、沒有建構步驟。技能是人類可讀的設定檔。這最大化了透明度、可客製化性和可攜性。

6. 人類仍在迴圈中——這是刻意的設計。

GStack 不試圖把人類從開發流程中移除。它透過在人類核准的邊界內自動化執行來放大人類的決策能力。每個階段轉換都是人類有意識的選擇。

7. 限制是真實的,應該被承認。

對 Claude Code 的平台鎖定、上下文視窗限制,以及單人瓶頸都是真正的顧慮。GStack 是一個強大的工具,不是銀彈。

GStack 更深層的意義可能不在於工具本身,而在於它展示的原則:來自人類工程團隊的組織設計模式可以直接遷移到 AI 工作流程。角色、閘門、回顧和 Sprint 結構不是人類協調開銷的遺跡——它們是產出可靠軟體的根本,無論建造者是人類還是 AI。


GStack is open source under the MIT license. For the latest updates, visit the official GitHub repository.

GStack 在 MIT 授權下開源。如需最新更新,請造訪官方 GitHub 儲存庫。