Your Gherkin Specs Are Lying to You

Your Gherkin specs are lying to you.

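Not on purpose. They started out faithful. But six sprints later, someone refactored the checkout flow and forgot to update the "When the user submits payment" step. The .feature file still passes because the step definition still exists. It just calls code that no longer does what the scenario describes. You get green tests and false confidence. That is where BDD drifts by default unless you actively fight it.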

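The problem isn't lazy developers. It's that the link between .feature files and step definitions is inherently loose. Gherkin scenarios are strings. Step definitions match those strings with regular expressions or annotations. No compiler forces a change to a scenario to come with a matching change to the code, or the other way around. The toolchain assumes you will keep the two aligned by hand. You won't.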
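To make the looseness concrete, here is a deliberately simplified model of regex-based step dispatch. It is not Cucumber's implementation; `StepDefinition`, `runStep`, and the `checkout.v2` label are hypothetical. Dispatch only asks whether some pattern fits the string, so a definition that was broadened during a refactor keeps "passing" for wording it was never meant to handle.

```typescript
// Simplified model of regex-based step dispatch (not Cucumber's real code).
type StepDefinition = { pattern: RegExp; run: (...args: string[]) => string };

// Hypothetical definition that was broadened during a refactor so existing
// scenarios kept passing. It now accepts any trailing text.
const submitPayment: StepDefinition = {
  pattern: /^the user submits (.*)$/,
  run: (what: string) => `checkout.v2 called with "${what}"`,
};

// Runs the first definition whose pattern matches the step text, passing the
// capture groups as arguments; throws if no definition matches.
function runStep(text: string, definitions: StepDefinition[]): string {
  for (const def of definitions) {
    const match = def.pattern.exec(text);
    if (match) return def.run(...match.slice(1));
  }
  throw new Error(`Undefined step: ${text}`);
}

// Both steps "pass". Nothing checks that a refund should never reach checkout code.
console.log(runStep('the user submits payment', [submitPayment]));
// → checkout.v2 called with "payment"
console.log(runStep('the user submits a refund request', [submitPayment]));
// → checkout.v2 called with "a refund request"
```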

Why Manual Discipline Breaks Down at Scale

Every team starts the same way: write the spec, implement the steps, update both together. In week one, that works.

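It falls apart at the first refactor. You rename a domain concept in the code, but the Gherkin keeps the old term, because changing it means touching twelve feature files and another review round with product. Or you extract a new validation rule, but existing scenarios quietly depend on the old behavior, and nobody notices because the step definition was quietly generalized just to keep the tests passing. The spec becomes a parallel universe that grows less accurate with every change.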

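The cost isn't just stale documentation. It's trust. Once developers stop believing the feature files describe reality, they stop reading them. Then they stop writing them. And you're back to unit tests with meaningless names and no shared language with stakeholders.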

What "In Sync" Actually Means

Keeping specs in sync doesn't mean keeping the tests green. Green is easy. In sync means three things:

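Every Gherkin step has a matching step definition, and that definition actually does what the spec says.
Every step definition is actually invoked by at least one scenario.
The vocabulary in the specs matches the vocabulary in the codebase.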

Most teams check only the first, and only at runtime. You need all three checked in CI, before code is merged.
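The first two rules reduce to set comparisons, which the validator later in this article performs. The third is the one teams usually skip because it feels like style. Here is a minimal sketch of how it could be checked, assuming you keep a small glossary that maps retired domain terms to the names now used in code. `renamedTerms`, `findStaleVocabulary`, and the example terms are all hypothetical.

```typescript
// Hypothetical check for the third rule: step text must use the codebase's
// current vocabulary. The glossary is an assumed, hand-maintained map from
// retired terms to the names now used in code.
const renamedTerms: Record<string, string> = {
  basket: 'cart',
  'purchase order': 'order',
};

interface VocabularyIssue {
  step: string;
  oldTerm: string;
  newTerm: string;
}

// Escapes regex metacharacters so glossary terms are matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Reports every step containing a retired term as a whole word, ignoring case,
// so "basketball" is not flagged for "basket".
function findStaleVocabulary(
  steps: string[],
  glossary: Record<string, string>,
): VocabularyIssue[] {
  const issues: VocabularyIssue[] = [];
  for (const step of steps) {
    for (const oldTerm of Object.keys(glossary)) {
      const wholeWord = new RegExp(`\\b${escapeRegExp(oldTerm)}\\b`, 'i');
      if (wholeWord.test(step)) {
        issues.push({ step, oldTerm, newTerm: glossary[oldTerm] });
      }
    }
  }
  return issues;
}

const issues = findStaleVocabulary(
  ['the user adds an item to the basket', 'the cart total should be 42'],
  renamedTerms,
);
for (const issue of issues) {
  console.error(`"${issue.step}" says "${issue.oldTerm}"; the code says "${issue.newTerm}"`);
}
```

In CI you would feed it every step text from the feature files and fail the job when `issues` is non-empty. The glossary is still maintained by hand, but a rename now has one obvious place to be recorded.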

Automating Step Validation with Strict Bindings

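The root cause is the loose string matching in tools like Cucumber. You can tighten it so that step definitions become first-class references that are verified at build time.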

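In a TypeScript or JavaScript project, you can replace regex-based step definitions with a step registry that maps each Gherkin step to a real function reference. The key is that the list of steps to check is extracted from the feature files automatically rather than maintained by hand, so when a scenario references a step with no registry entry, the build fails.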

Here is a minimal example built on the official Gherkin parser and an explicit registry. First, parse your .feature files at build time:

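// scripts/validate-steps.ts
// Build-time check: every step used in a .feature file must match a registry
// entry, and every registry entry must be used by at least one step.
import { readFileSync, readdirSync } from 'fs';
import { join } from 'path';
import { AstBuilder, GherkinClassicTokenMatcher, Parser, compile } from '@cucumber/gherkin';
import { IdGenerator } from '@cucumber/messages';
import { CucumberExpression, ParameterTypeRegistry } from '@cucumber/cucumber-expressions';
// Import the actual step registry from your test code
import { stepRegistry } from '../steps/registry';

const FEATURE_DIR = './features';
const newId = IdGenerator.uuid();
const parser = new Parser(new AstBuilder(newId), new GherkinClassicTokenMatcher());

// Collect the text of every step that will actually run (top-level .feature files only).
// compile() turns the AST into "pickles": Background steps are prepended,
// Rules are flattened, and Scenario Outlines are expanded once per Examples row.
const allSteps = new Set<string>();
for (const file of readdirSync(FEATURE_DIR).filter(f => f.endsWith('.feature'))) {
  const uri = join(FEATURE_DIR, file);
  const gherkinDocument = parser.parse(readFileSync(uri, 'utf-8'));
  for (const pickle of compile(gherkinDocument, uri, newId)) {
    for (const step of pickle.steps) allSteps.add(step.text);
  }
}

// Registry keys are Cucumber Expressions (e.g. 'the total should be {int}'),
// so compare by matching rather than by string equality.
const parameterTypes = new ParameterTypeRegistry();
const expressions = Object.keys(stepRegistry).map(source => ({
  source,
  expr: new CucumberExpression(source, parameterTypes),
}));

const undefinedSteps = [...allSteps].filter(
  text => !expressions.some(({ expr }) => expr.match(text) !== null),
);
const orphanedSteps = expressions
  .filter(({ expr }) => ![...allSteps].some(text => expr.match(text) !== null))
  .map(({ source }) => source);

// Report both kinds of drift before failing, so one run shows everything.
if (undefinedSteps.length > 0) console.error('Undefined steps:', undefinedSteps);
if (orphanedSteps.length > 0) console.error('Orphaned step definitions:', orphanedSteps);
if (undefinedSteps.length > 0 || orphanedSteps.length > 0) process.exit(1);

console.log(`Validated ${allSteps.size} steps against ${expressions.length} definitions.`);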

Your step registry maps each step's Gherkin text, written as a Cucumber Expression, to a function:

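// steps/registry.ts
import { given, when, then } from './step-helpers';

// Any step implementation: receives the parsed parameters, may return a Promise.
type StepFn = (...args: any[]) => unknown;

// Keys are Cucumber Expressions: {int} matches an integer and passes it as an argument.
export const stepRegistry: Record<string, StepFn> = {
  'the user is logged in': given.theUserIsLoggedIn,
  'the user adds an item to the cart': when.theUserAddsAnItemToTheCart,
  'the total should be {int}': then.theTotalShouldBe,
};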

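The given, when, and then objects are plain objects of functions exported from a module. No regex magic. If a developer changes the Gherkin text, they have to add a matching registry entry or the build fails. If they delete a scenario, the orphaned-step check catches the leftover definition.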
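The validator runs as a separate script. If you would rather have the TypeScript compiler reject a drifted registry, one option is to emit the step texts as a string-literal union type and annotate the registry with it. This is an assumption layered on top of the setup above, not part of it. `generateStepTypes` and `FeatureStep` are hypothetical names, and the input would be the same step texts the validator already collects.

```typescript
// Hypothetical codegen: turn the step texts collected from .feature files into
// a string-literal union, written to a file such as steps/feature-steps.ts.
function generateStepTypes(stepTexts: string[]): string {
  // Deduplicate and sort so the generated file is stable across runs.
  const unique = Array.from(new Set(stepTexts)).sort();
  if (unique.length === 0) {
    return 'export type FeatureStep = never;\n';
  }
  // JSON.stringify yields a correctly escaped double-quoted string literal.
  const members = unique.map(text => `  | ${JSON.stringify(text)}`).join('\n');
  return [
    '// AUTO-GENERATED from features/*.feature; do not edit by hand.',
    'export type FeatureStep =',
    `${members};`,
    '',
  ].join('\n');
}

console.log(generateStepTypes([
  'the user is logged in',
  'the user adds an item to the cart',
  'the total should be 42',
]));
```

With the generated file in place, change the registry's type from `Record<string, ...>` to `Record<FeatureStep, ...>`. A missing step then becomes a "property is missing" compile error, and a leftover key becomes an excess-property error. The catch is that union members are literal text, so a parameterized step like `the total should be {int}` would appear once per concrete value. This suits suites with few parameters better than ones that lean heavily on {int} or {string}.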

Wire It into CI Before Merge

A script developers have to run locally is a script developers will forget to run. Validation failures have to fail the build.

Add it to your test pipeline:

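# .github/workflows/ci.yml
on:
  pull_request:
  push:
    branches: [main]
jobs:
  validate-specs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      # Fail fast on spec/step drift before spending time on the full suite
      - run: npx ts-node scripts/validate-steps.ts
      - run: npm test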

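The point is that validate-steps.ts runs before the real test suite. If the feature files and step definitions have drifted apart, you want to fail early with a clear error message, not run a hundred Cucumber scenarios only to discover later that they were quietly passing against stale logic.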

Living Documentation Needs Generated Reports

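Validation keeps feature files and step definitions aligned, but it doesn't make the specs readable or useful. For that you need a living-documentation pipeline: generate an HTML report from the feature files and publish it on every merge to main.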

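Tools such as Cucumber Reports or Pickles turn your .feature files into browsable documentation. What matters is that the docs are generated from the same files CI validates. Delete a scenario and it disappears from the docs. Change the wording and the docs update on their own. There is no second source of truth to maintain.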

Publish the report as a CI artifact, or deploy it to a static site:

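# .github/workflows/docs.yml
on:
  push:
    branches: [main]
jobs:
  publish-docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      # cucumber-js's built-in html formatter renders the run as one browsable page
      - run: mkdir -p docs && npx cucumber-js --format html:docs/index.html
      # Uploads ./docs as a Pages artifact; an actions/deploy-pages step would publish it
      - uses: actions/upload-pages-artifact@v3
        with:
          path: ./docs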

Stakeholders don't need to read raw Gherkin. They need a readable page they trust to be current. Automation is what earns that trust.

The Trade-off: Strictness vs. Expressiveness

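The registry approach has a cost. You lose the flexibility of regex patterns such as /^the user adds (\d+) items? to the cart$/. Every variant becomes either an explicit entry or a parameterized step with typed placeholders. That gets verbose.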

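The alternative is to keep the regexes but add a stricter linter that warns when a pattern is too broad or when some step text matches no known pattern. Cucumber's built-in --dry-run flag, combined with a custom check for unused step definitions, gets you 80% of the safety for 20% of the verbosity.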
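Here is a sketch of that linter. It assumes you can collect your regex step definitions as an array of `RegExp` and your step texts as plain strings, for example from the Gherkin parser. `lintSteps` and its heuristics are hypothetical, not a Cucumber feature. A dry run already covers undefined steps, so the new parts are the over-broad and unused checks.

```typescript
// Hypothetical lint pass for teams that keep regex step definitions.
// The heuristics are assumptions, not a built-in Cucumber feature.
interface LintReport {
  tooBroad: string[];  // patterns that are unanchored or capture "(.*)" / "(.+)"
  unmatched: string[]; // step texts no pattern matches (undefined steps)
  ambiguous: string[]; // step texts more than one pattern matches
  unused: string[];    // patterns no step text matches (orphaned definitions)
}

// Patterns must not use the g or y flag: RegExp.test() is stateful with them.
function lintSteps(patterns: RegExp[], stepTexts: string[]): LintReport {
  const report: LintReport = { tooBroad: [], unmatched: [], ambiguous: [], unused: [] };
  const useCount = patterns.map(() => 0);

  patterns.forEach(pattern => {
    const src = pattern.source;
    const anchored = src[0] === '^' && src[src.length - 1] === '$';
    const catchAll = /\(\.[*+]\)/.test(src);
    if (!anchored || catchAll) report.tooBroad.push(src);
  });

  for (const text of stepTexts) {
    let hits = 0;
    patterns.forEach((pattern, i) => {
      if (pattern.test(text)) {
        hits += 1;
        useCount[i] += 1;
      }
    });
    if (hits === 0) report.unmatched.push(text);
    if (hits > 1) report.ambiguous.push(text);
  }

  patterns.forEach((pattern, i) => {
    if (useCount[i] === 0) report.unused.push(pattern.source);
  });
  return report;
}

const report = lintSteps(
  [/^the user is logged in$/, /the user submits (.*)/, /^the user adds (\d+) items? to the cart$/],
  ['the user is logged in', 'the user adds 2 items to the cart', 'the cart is empty'],
);
console.log(JSON.stringify(report, null, 2));
```

Here the unanchored `the user submits (.*)` pattern is flagged as too broad and as unused, and "the cart is empty" is reported as unmatched. Whether a warning should fail CI or only annotate the pull request is a policy choice the sketch leaves open.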

# Dry-run parses all features without executing them, surfacing undefined steps
npx cucumber-js --dry-run

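This is looser than the registry approach. It catches undefined steps but not orphaned ones, and it can't enforce semantic alignment. For teams with a large existing suite, it's a pragmatic starting point. For new projects, the registry approach pays for itself within a month.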

What We Tried That Didn't Work

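We tried generating Gherkin from code annotations. The idea was that developers would tag test methods with annotations and a tool would emit the .feature files. It failed, because Gherkin is supposed to be readable by non-developers, and text generated from method names isn't. It barely qualifies as prose.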

We also tried requiring pair programming for every spec change. It helped, but it doesn't scale. The problem is mechanical, so the fix should be mechanical too.

Start Today with Undefined-Step Detection

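If you already have a Cucumber suite, the smallest useful change is adding --dry-run to your CI pipeline. It takes five minutes and catches the most common drift: a refactored scenario that no longer matches any step definition.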

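If you're starting from scratch, consider the registry approach. The upfront cost of an explicit mapping is repaid by build-time guarantees and the freedom to refactor without worrying that your specs are quietly going stale.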

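Your Gherkin specs should describe what the system does. If you can't trust them to do that, they're just expensive comments. Automate the checks that keep them honest, or accept that sooner or later they will lie to you.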