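Your Tests All Pass, but Your Mutation Score Is Only 40%: What Surviving Mutants Are Telling You

Your tests all pass and the coverage report says 87%, yet your mutation score is only 40%. More than half of the mutants are alive and well.

That 40% does not mean your code is broken. It means your tests are weak. Coverage measures which lines ran while the tests executed; mutation testing measures whether your tests would notice if those lines started doing the wrong thing. A 40% mutation score means that 60% of the small faults the tool injected went undetected, and a real bug of the same shape would stroll straight through CI.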

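What Exactly Is a Surviving Mutant

A surviving mutant is a small bug planted on purpose that your tests failed to catch.

A mutation testing tool applies its predefined mutation rules (mutation operators) to your source code one at a time. It might flip > to >=, change + to -, or replace a condition with true. Each altered version is a mutant. Conceptually, the tool runs your test suite against every mutant. If any test fails, the mutant is "killed"; if every test passes, the mutant "survives".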
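The kill-or-survive loop can be sketched in a few lines. This is a toy illustration, not how any real tool is implemented: the isAdult-style function, the three hand-written mutants, and the two tests are all made up for the example, whereas a real tool derives its mutants from your source automatically.

```javascript
// A toy mutation run: one original function and three hand-written mutants.
const original = (age) => age > 18;

const mutants = [
  { name: '> becomes >=', fn: (age) => age >= 18 },
  { name: '> becomes <', fn: (age) => age < 18 },
  { name: 'condition becomes true', fn: (age) => true },
];

// Each test receives the implementation under test and returns true on pass.
const tests = [
  (isAdult) => isAdult(30) === true,
  (isAdult) => isAdult(5) === false,
];

// A mutant is killed as soon as at least one test fails against it.
function runMutant(mutant) {
  const killed = tests.some((test) => !test(mutant.fn));
  return { name: mutant.name, status: killed ? 'killed' : 'survived' };
}

// The suite must be green on the original code before mutants mean anything.
const baselineGreen = tests.every((test) => test(original));
const results = mutants.map(runMutant);

console.log({ baselineGreen, results });
// The '>=' mutant survives: no test uses the boundary value 18.
```

The survivor points at a specific missing test: an assertion about what happens at exactly 18.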

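A surviving mutant means one of two things: either your tests never verify the behavior that the mutant broke, or the mutant is "equivalent", meaning the mutated code behaves exactly like the original (a well-known hard problem in mutation testing).

Most survivors are not equivalent. They are zombie bugs that are still walking.

A Concrete Example: A Password Validator

Here is a function that checks whether a password meets the policy: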

// password.js
function isValidPassword(password) {
  if (password.length < 8) {
    return false;
  }
  if (!/[A-Z]/.test(password)) {
    return false;
  }
  if (!/[0-9]/.test(password)) {
    return false;
  }
  return true;
}

module.exports = { isValidPassword };

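The following tests all pass and reach 100% line coverage:

// password.test.js
const { isValidPassword } = require('./password');

test('accepts a valid password', () => {
  expect(isValidPassword('Hello12345')).toBe(true);
});

test('rejects a short password', () => {
  expect(isValidPassword('abc')).toBe(false);
});

test('rejects a password without uppercase', () => {
  expect(isValidPassword('hello12345')).toBe(false);
});

test('rejects a password without a digit', () => {
  expect(isValidPassword('HelloWorld')).toBe(false);
});

Look closer, though. The only accepted password is 10 characters long and the only short one is 3, so nothing pins down where the limit actually sits. Every line runs, but the length rule is barely verified.

A mutation testing tool such as Stryker exposes this. One of its mutations flips the < in the length check to <=. That mutant survives, because no test exercises the 8-character boundary: 'Hello12345' is still accepted and 'abc' is still rejected. Another mutation may disable the first check altogether, for example by emptying the if block. That mutant survives too, because the only short password in the suite, 'abc', also lacks an uppercase letter and a digit, so the later checks reject it anyway. The minimum length is never tested in isolation from the other rules.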

Here are tests that actually kill those mutants:

// password.test.js
const { isValidPassword } = require('./password');

test('rejects password shorter than 8 chars', () => {
  expect(isValidPassword('Hello1')).toBe(false);
});

test('accepts password exactly 8 chars with uppercase and digit', () => {
  expect(isValidPassword('Hello1!@')).toBe(true);
});

test('rejects password without uppercase', () => {
  expect(isValidPassword('hello1!@')).toBe(false);
});

test('rejects password without digit', () => {
  expect(isValidPassword('Helloooo')).toBe(false);
});

test('rejects password missing both uppercase and digit', () => {
  expect(isValidPassword('helloooo')).toBe(false);
});

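Now the 8-character boundary is tested explicitly. The <= mutant is killed, because 'Hello1!@' (8 characters) must be accepted. The mutant without the length check is killed too, because 'Hello1' has an uppercase letter and a digit, so only the length check can reject it.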
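To see why particular inputs matter, it can help to write the two mutants out by hand and compare them with the original. This is only a sketch: the mutant function names and the kills helper are invented for illustration, and a real tool generates and runs its mutants for you.

```javascript
// The original validator, followed by two hand-written mutants of it.
function isValidPassword(password) {
  if (password.length < 8) return false;
  if (!/[A-Z]/.test(password)) return false;
  if (!/[0-9]/.test(password)) return false;
  return true;
}

// Mutant 1: `<` replaced with `<=` in the length check.
function boundaryMutant(password) {
  if (password.length <= 8) return false;
  if (!/[A-Z]/.test(password)) return false;
  if (!/[0-9]/.test(password)) return false;
  return true;
}

// Mutant 2: the length check is gone entirely.
function noLengthCheckMutant(password) {
  if (!/[A-Z]/.test(password)) return false;
  if (!/[0-9]/.test(password)) return false;
  return true;
}

// An input can kill a mutant only if the mutant answers differently
// from the original for that input.
const kills = (mutant, input) => mutant(input) !== isValidPassword(input);

console.log(kills(boundaryMutant, 'Hello12345'));      // false: 10 chars, both accept
console.log(kills(boundaryMutant, 'Hello1!@'));        // true: exactly 8 chars
console.log(kills(noLengthCheckMutant, 'abc'));        // false: the uppercase rule rejects it anyway
console.log(kills(noLengthCheckMutant, 'helloooo'));   // false: 8 chars, length check is irrelevant
console.log(kills(noLengthCheckMutant, 'Hello1'));     // true: short but otherwise valid
```

An input that lands far from the boundary, or that fails several rules at once, tells you nothing about the rule you meant to test.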

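What Mutation Testing Actually Does Under the Hood

Mutation testing is computationally expensive, because in its naive form it runs the full test suite against every mutant.

If your codebase has 10,000 lines and the mutation tool generates 3,000 mutants, that is 3,000 full test runs. This made early academic implementations nearly unusable on real projects. Modern tools are much smarter.

Stryker, the most widely used mutation testing framework for JavaScript and TypeScript, applies several optimizations: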

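Mutant scoping: Stryker runs only the subset of tests that can reach the mutated code, based on coverage data collected during an initial dry run.
Parallel execution: Mutants are distributed across multiple worker processes and evaluated concurrently.
Incremental mode: Stryker stores the results of a run and re-evaluates only the mutants affected by changes since the previous run.
Checkers: Checker plugins, such as the TypeScript checker, type-check each mutant up front and discard those that would not compile, so no test time is spent on them.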
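The idea behind mutant scoping can be sketched with a plain lookup table. The coverage data, test names, and testsForMutant helper below are hypothetical and only illustrate the principle; Stryker's internal data structures are not shown here.

```javascript
// Hypothetical per-test coverage from a dry run:
// test name -> set of source line numbers that test executed.
const coverageByTest = new Map([
  ['rejects short password', new Set([2, 3])],
  ['rejects missing uppercase', new Set([2, 5, 6])],
  ['accepts valid password', new Set([2, 5, 8, 11])],
]);

// Only a test that executed the mutated line can possibly kill the mutant,
// so every other test can be skipped for that mutant.
function testsForMutant(mutatedLine) {
  return [...coverageByTest]
    .filter(([, lines]) => lines.has(mutatedLine))
    .map(([name]) => name);
}

console.log(testsForMutant(6));  // [ 'rejects missing uppercase' ]
console.log(testsForMutant(2));  // all three tests
console.log(testsForMutant(99)); // []: no test reaches the line, nothing to run
```

A mutant on a line that no test reaches needs no test run at all: it is undetected by definition.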

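Even with these optimizations, a full mutation test run on a large project can still take 10 to 30 minutes. That is why most teams run mutation testing in CI, on pull requests or in nightly builds, rather than on every save.

The Trade-offs Nobody Tells You About

Mutation testing is not free, and it is not always the right tool.

The equivalent mutant problem is the biggest theoretical limitation. Some mutations do not change observable behavior at all. For example: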

const timeout = 1000 * 60;

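Changing it to 1000 * 61 changes the meaning, but changing it to 60 * 1000 yields an equivalent mutant. No test can kill it, because the value is identical. In the general case, telling equivalent mutants apart from genuine survivors is undecidable. Modern tools use heuristics to skip the obvious cases, but you will still run into some.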
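Equivalent mutants also show up in less trivial code. The sketch below uses a loop condition as an example; the specific mutation (< replaced with !==) is chosen for illustration and is not necessarily one that a given tool generates.

```javascript
function sum(values) {
  let total = 0;
  for (let i = 0; i < values.length; i++) total += values[i];
  return total;
}

// Mutant: `<` replaced with `!==`. Because i starts at 0 and grows by
// exactly 1, it always lands on values.length, so the loop stops at the
// same point and the result cannot differ for any array.
function sumMutant(values) {
  let total = 0;
  for (let i = 0; i !== values.length; i++) total += values[i];
  return total;
}

const samples = [[], [5], [1, 2, 3], [0.1, 0.2, 0.3]];
const allAgree = samples.every((s) => sum(s) === sumMutant(s));
console.log(allAgree); // true
```

No amount of extra testing kills this mutant. The only sensible responses are to mark it as ignored or to accept that it stays in the report.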

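The performance cost is real. On a mid-sized TypeScript project, Stryker might generate 2,000 mutants and take 15 minutes to evaluate them. Enable it on pull requests and that is 15 minutes of CI time for each run. Teams usually start by setting a threshold (for example, fail the build when the mutation score drops below 60%) and move the full analysis to a nightly job.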
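To make the threshold arithmetic concrete, here is a minimal sketch. The tallies are hypothetical, and the formula (detected mutants divided by all valid mutants) is an assumption about how a score is commonly computed; check your tool's documentation for its exact definition.

```javascript
// Hypothetical tallies from one mutation run of 2,000 mutants.
const run = { killed: 800, timedOut: 0, survived: 1100, noCoverage: 100 };

// Assumed definition: mutants that were detected (killed or timed out)
// as a percentage of all mutants that were evaluated.
function mutationScore({ killed, timedOut, survived, noCoverage }) {
  const detected = killed + timedOut;
  const total = detected + survived + noCoverage;
  return (detected * 100) / total;
}

const BREAK_THRESHOLD = 60; // fail the build below this score

const score = mutationScore(run);
const buildPasses = score >= BREAK_THRESHOLD;

console.log(score, buildPasses); // 40 false
```

With these numbers the build fails, which is the point of a break threshold: the score cannot quietly drift down between nightly runs.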

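False confidence cuts both ways. A 100% mutation score does not mean your code is bug-free. It only means that bugs matching the tool's mutation operators will not slip through. Mutation testing cannot check for kinds of bugs it never generates. It will not catch logic errors in the requirements, race conditions it cannot simulate, or integration failures across service boundaries.

How to Actually Get Started with Mutation Testing

If you write JavaScript or TypeScript, Stryker is the place to start.

Install: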

npm install --save-dev @stryker-mutator/core @stryker-mutator/jest-runner

Create stryker.config.mjs:

// @ts-check
/** @type {import('@stryker-mutator/api/core').PartialStrykerOptions} */
const config = {
  packageManager: 'npm',
  reporters: ['html', 'clear-text', 'progress'],
  testRunner: 'jest',
  coverageAnalysis: 'perTest',
  mutate: ['src/**/*.js'],
  thresholds: {
    break: 60,
  },
};

export default config;

Run:

npx stryker run
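If the full run turns out to be too slow for pull requests, one option is a second config tuned for speed. The sketch below is an assumption-laden starting point: the file name stryker.pr.config.mjs is made up, the concurrency value is arbitrary, and the incremental and concurrency options should be checked against the documentation for your Stryker version before you rely on them.

```javascript
// stryker.pr.config.mjs (hypothetical file name)
// @ts-check
/** @type {import('@stryker-mutator/api/core').PartialStrykerOptions} */
const config = {
  testRunner: 'jest',
  coverageAnalysis: 'perTest',
  mutate: ['src/**/*.js'],
  // Reuse results from the previous run where code and tests are unchanged.
  incremental: true,
  // Number of worker processes; an arbitrary value, tune it for your CI machine.
  concurrency: 4,
  // Fail the run when the score drops below `break`.
  thresholds: { high: 80, low: 60, break: 60 },
};

export default config;
```

The nightly job can keep using the full config, so the slow, complete analysis still happens once a day.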

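Start with the HTML report, not the score. The report marks every surviving mutant right next to the source code it affects. Look at the first 10 survivors. For each one, ask yourself: if a bug really appeared here, would it cause a production problem? If so, write a test that catches it; if not, consider whether the code is over-engineered.

Do not chase 100%. On a mature project, 70-80% is already a strong score. Below 50%, your tests are probably executing code without asserting anything meaningful. Above 90%, you are likely into diminishing returns, and the cost of triaging equivalent mutants keeps climbing.

What to Do About 40%

A 40% mutation score is a gift. It tells you exactly which parts of your test suite are decorative.

Pick the three files with the most surviving mutants. Read each survivor and ask which assertion is missing. The fix is usually simple: a test calls a function but never checks its return value; data goes into a parser but the output is never verified; or the happy path is tested three times with three different inputs while the error branch is never tested at all.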

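Mutants are not noise. They are a concrete list of the places where an untested bug is most likely hiding. Start with the files that have the most survivors.

Frequently Asked Questions

What is the difference between code coverage and mutation testing? Code coverage measures which lines were executed. Mutation testing measures whether your tests would fail if those lines contained a bug. 100% coverage with a 40% mutation score means every line ran, yet your tests would miss most of the small faults that could be introduced into those lines.

Can mutation testing find bugs in my existing code? Not directly. Mutation testing evaluates your tests, not your source code. It shows where the tests fall short. It does not tell you whether the code is correct, only whether your tests would catch certain classes of errors.

Which languages have good mutation testing tools? JavaScript/TypeScript (Stryker), Java (PIT), C# (Stryker.NET), Python (mutmut), and Rust (cargo-mutants) all have mature tools. The ecosystems differ in performance and in the mutation operators they support.

Should mutation testing replace code coverage? No. Coverage is cheap and fast, which makes it suitable for quick feedback during development. Mutation testing is a periodic quality gate that exposes the blind spots coverage cannot see.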