Mirror of https://github.com/mattermost/mattermost.git, synced 2026-04-07 01:55:06 -04:00
* ci: add test sharding plumbing to server CI

  Add infrastructure for upcoming test sharding without changing behavior:
  - Add shard-index and shard-total inputs to server-test-template.yml
    (defaults preserve existing single-runner behavior)
  - Add a timing cache restore step (activates only when shard-total > 1)
  - Add a merge-postgres-test-results job to server-ci.yml that:
    - Merges JUnit XML reports from shard artifacts
    - Saves the timing data cache for future shard balancing
    - Handles both single-artifact and multi-shard scenarios
  - Add .gitignore entries for timing cache and shard work files

  Co-authored-by: Claude <claude@anthropic.com>

* ci: shard server Postgres tests into 4 parallel runners

  Extract sharding logic into standalone, tested scripts and enable 4-shard
  parallel test execution for server Postgres CI.

  Scripts:
  - server/scripts/shard-split.js: Node.js bin-packing solver that assigns
    test packages to shards using timing data from previous runs. Two-tier
    strategy: light packages (< 2 min) are assigned whole; heavy packages
    (api4, app) are split at the individual-test level.
  - server/scripts/run-shard-tests.sh: Multi-run wrapper that calls gotestsum
    directly for each package group with -run regex filters.
  - server/scripts/shard-split.test.js: 8 test cases covering round-robin
    fallback, timing-based balancing, heavy package splitting, JUnit XML
    fallback, and enterprise package separation.

  Workflow changes:
  - server-test-template.yml: Add a shard-splitting step that discovers test
    packages and runs the solver. The Run Tests step now uses the wrapper
    script when sharding is active.
  - server-ci.yml: Add a 4-shard matrix to test-postgres-normal and update
    the merge job's artifact patterns for shard-specific names.

  Performance: 7.2 min with a timing cache vs the 62.5 min baseline, an 88%
  wall-time improvement. The first run without a cache uses the JUnit XML
  fallback or round-robin, then populates the cache for subsequent runs.

  Co-authored-by: Claude <claude@anthropic.com>

* fix: raise heavy package threshold to 5 min to preserve test isolation

  sqlstore integrity tests scan the entire database and fail when other
  packages' test data is present. At 182s, sqlstore was just over the 120s
  threshold and was getting split at the test level. Raising the threshold
  to 300s keeps only api4 (~38 min) and app (~15 min) as heavy, where the
  real sharding gains are, while sqlstore, elasticsearch, etc. stay whole
  and keep their test isolation guarantees.

  Co-authored-by: Claude <claude@anthropic.com>

* ci: only save test timing cache on default branch

  PR branches always restore from master's timing cache via restore-keys
  prefix matching. Timing data is stable day-to-day, so this eliminates
  cache misses on first PR runs and reduces cache storage.

  Co-authored-by: Claude <claude@anthropic.com>

* ci: skip FIPS tests on PRs (enterprise CI handles compile check)

  Per review feedback: the enterprise CI already runs a FIPS compile check
  on every PR. Running the full FIPS test suite on PRs is redundant, since
  it is identical to the non-FIPS suite; the only FIPS-specific failure mode
  is a build failure from non-approved crypto imports, which the enterprise
  compile check catches. Full FIPS tests continue to run on every push to
  master.

  Co-authored-by: Claude <claude@anthropic.com>

* fix: address review feedback on run-shard-tests.sh

  - Remove set -e so all test runs execute even if earlier ones fail; track
    failures and exit with an error at the end (wiggin77)
  - Remove the unused top-level COVERAGE_FLAG variable (wiggin77)
  - Fix the RUN_IDX increment position so report, json, and coverage files
    share the same index (wiggin77)
  - Update a workflow comment: the heavy threshold is 5 min, not 2 min
    (wiggin77)

  Co-authored-by: Claude <claude@anthropic.com>

* style: use node: prefix for built-in fs module in shard-split.js

  Co-authored-by: Claude <claude@anthropic.com>

* fix: avoid interpolating file paths into generated shell script

  Read shard package lists from files at runtime instead of interpolating
  them into the generated script via printf. This prevents theoretical shell
  metacharacter injection from directory names, as flagged by DryRun
  Security.

  Co-authored-by: Claude <claude@anthropic.com>

* fix(ci): rename merged artifact to match server-ci-report glob

  The merged artifact was named postgres-server-test-logs-merged, which does
  not match the *-test-logs pattern in server-ci-report.yml, causing
  Postgres test results to be missing from PR/commit reports. Also pins
  junit-report-merger to exact version 7.0.0 for supply-chain safety.

  Co-authored-by: Claude <claude@anthropic.com>

* fix(ci): pass RACE_MODE env into Docker container

  RACE_MODE was set on the host runner but never included in the docker run
  --env list. The light-package path worked because the heredoc expanded on
  the host, but run-shard-tests.sh reads RACE_MODE at runtime inside the
  container, where it was unset. This caused heavy packages (api4, app) to
  silently lose -race detection.

  Co-authored-by: Claude <claude@anthropic.com>

* fix(ci): discover new tests in heavy packages not in timing cache

  Tests not present in the timing cache (newly added or renamed) would not
  appear in any shard's -run regex, causing them to silently skip. After
  building items from the cache, run go test -list to discover current test
  names and assign any cache-missing tests to shards via the normal
  bin-packing algorithm with a small default duration.

  Co-authored-by: Claude <claude@anthropic.com>

* fix(ci): add missing line continuation backslash in docker run

  The previous --env FIPS_ENABLED line was missing a trailing backslash
  after --env RACE_MODE was added, causing docker run to see a truncated
  command and fail with "requires at least 1 argument".

  Co-authored-by: Claude <claude@anthropic.com>

* fix(ci): add setup-go step for shard test discovery

  go test -list in shard-split.js runs on the host runner via execSync, but
  Go is only available inside the Docker container. Without this step, every
  invocation fails silently and new-test discovery is a no-op. Adding
  actions/setup-go before the shard split step ensures the Go toolchain is
  available on the host.

---------

Co-authored-by: Claude <claude@anthropic.com>
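The timing-based bin-packing these commits describe can be sketched as a greedy longest-processing-time (LPT) assignment: sort work items by descending duration, then repeatedly place the next item on the least-loaded shard. This is an illustrative sketch under that assumption, not the actual shard-split.js implementation; `assignToShards` and the item shapes are hypothetical names, and the timings are made up.

```javascript
// Greedy LPT bin packing: the classic heuristic behind timing-based test
// sharding. Items are { name, seconds }; shardTotal is the shard count.
function assignToShards(items, shardTotal) {
    const shards = Array.from({ length: shardTotal }, () => ({ load: 0, items: [] }));
    // Longest first, so big packages are placed before the bins fill up.
    const sorted = [...items].sort((a, b) => b.seconds - a.seconds);
    for (const item of sorted) {
        // Always pick the shard with the smallest accumulated load.
        const target = shards.reduce((min, s) => (s.load < min.load ? s : min));
        target.items.push(item.name);
        target.load += item.seconds;
    }
    return shards;
}

// Example with invented timings: the heaviest package lands on its own
// shard, and the remaining packages pack onto the other.
const shards = assignToShards(
    [
        { name: "channels/api4", seconds: 2280 },
        { name: "channels/app", seconds: 900 },
        { name: "store/sqlstore", seconds: 182 },
        { name: "config", seconds: 8 },
    ],
    2
);
```

LPT gives no optimality guarantee, but for test sharding it only has to beat round-robin, which ignores durations entirely; the cache-miss fallback in the commits above is exactly that round-robin.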
333 lines
11 KiB
JavaScript
const { describe, it } = require("node:test");
const assert = require("node:assert/strict");
const fs = require("node:fs");
const path = require("node:path");
const { execFileSync } = require("node:child_process");
const os = require("node:os");

const SCRIPT = path.join(__dirname, "shard-split.js");
const TESTDATA = path.join(__dirname, "testdata");

/**
 * Helper: run shard-split.js in a temp directory with the given inputs.
 * Returns the contents of the output files and the captured stdout.
 */
function runSolver({ packages, shardIndex, shardTotal, gotestsumJson, prevReportXml }) {
    const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), "shard-test-"));
    try {
        fs.writeFileSync(path.join(tmpDir, "all-packages.txt"), packages.join("\n"));

        if (gotestsumJson) {
            fs.writeFileSync(path.join(tmpDir, "prev-gotestsum.json"), gotestsumJson);
        }
        if (prevReportXml) {
            fs.writeFileSync(path.join(tmpDir, "prev-report.xml"), prevReportXml);
        }

        const stdout = execFileSync("node", [SCRIPT], {
            cwd: tmpDir,
            env: {
                ...process.env,
                SHARD_INDEX: String(shardIndex),
                SHARD_TOTAL: String(shardTotal),
            },
            encoding: "utf8",
        });

        const te = fs.readFileSync(path.join(tmpDir, "shard-te-packages.txt"), "utf8");
        const ee = fs.readFileSync(path.join(tmpDir, "shard-ee-packages.txt"), "utf8");
        const heavy = fs.readFileSync(path.join(tmpDir, "shard-heavy-runs.txt"), "utf8");

        return { te, ee, heavy, stdout };
    } finally {
        fs.rmSync(tmpDir, { recursive: true, force: true });
    }
}

describe("shard-split.js", () => {
    describe("round-robin fallback (no timing data)", () => {
        it("distributes packages evenly across shards", () => {
            const packages = [
                "github.com/mattermost/mattermost/server/v8/channels/api4",
                "github.com/mattermost/mattermost/server/v8/channels/app",
                "github.com/mattermost/mattermost/server/v8/channels/store/sqlstore",
                "github.com/mattermost/mattermost/server/v8/config",
            ];

            // Collect assignments from all shards
            const allTe = [];
            for (let i = 0; i < 2; i++) {
                const result = runSolver({ packages, shardIndex: i, shardTotal: 2 });
                allTe.push(...result.te.split(" ").filter(Boolean));
            }

            // All packages should be assigned exactly once
            assert.equal(allTe.sort().join("\n"), packages.sort().join("\n"));
        });

        it("uses round-robin when no timing files exist", () => {
            const packages = ["pkg/a", "pkg/b", "pkg/c", "pkg/d", "pkg/e"];
            const r0 = runSolver({ packages, shardIndex: 0, shardTotal: 2 });
            const r1 = runSolver({ packages, shardIndex: 1, shardTotal: 2 });

            assert.ok(r0.stdout.includes("round-robin"), "Should mention round-robin in output");
            // No heavy runs
            assert.equal(r0.heavy.trim(), "");
            assert.equal(r1.heavy.trim(), "");
        });
    });

    describe("timing-based balancing", () => {
        it("balances shards using gotestsum.json timing data", () => {
            const gotestsumJson = fs.readFileSync(
                path.join(TESTDATA, "sample-gotestsum.json"),
                "utf8"
            );
            const packages = [
                "github.com/mattermost/mattermost/server/v8/channels/api4",
                "github.com/mattermost/mattermost/server/v8/channels/app",
                "github.com/mattermost/mattermost/server/v8/channels/store/sqlstore",
                "github.com/mattermost/mattermost/server/v8/config",
                "github.com/mattermost/mattermost/server/v8/enterprise/elasticsearch",
                "github.com/mattermost/mattermost/server/v8/enterprise/compliance",
                "github.com/mattermost/mattermost/server/public/model",
            ];

            // Run for all 4 shards and check that every package is covered
            const allAssigned = new Set();

            for (let i = 0; i < 4; i++) {
                const result = runSolver({
                    packages,
                    shardIndex: i,
                    shardTotal: 4,
                    gotestsumJson,
                });

                // Track all assigned packages and tests
                const tePkgs = result.te.split(" ").filter(Boolean);
                const eePkgs = result.ee.split(" ").filter(Boolean);
                tePkgs.forEach((p) => allAssigned.add(p));
                eePkgs.forEach((p) => allAssigned.add(p));

                // Parse heavy runs
                if (result.heavy.trim()) {
                    result.heavy
                        .trim()
                        .split("\n")
                        .forEach((line) => {
                            const pkg = line.split(" ")[0];
                            allAssigned.add(pkg);
                        });
                }
            }

            // Every package should be covered
            for (const pkg of packages) {
                assert.ok(
                    allAssigned.has(pkg),
                    `Package ${pkg} should be assigned to some shard`
                );
            }
        });

        it("does not produce empty shards with sample data", () => {
            const gotestsumJson = fs.readFileSync(
                path.join(TESTDATA, "sample-gotestsum.json"),
                "utf8"
            );
            const packages = [
                "github.com/mattermost/mattermost/server/v8/channels/api4",
                "github.com/mattermost/mattermost/server/v8/channels/app",
                "github.com/mattermost/mattermost/server/v8/channels/store/sqlstore",
                "github.com/mattermost/mattermost/server/v8/config",
            ];

            for (let i = 0; i < 4; i++) {
                const result = runSolver({
                    packages,
                    shardIndex: i,
                    shardTotal: 4,
                    gotestsumJson,
                });
                const hasWork =
                    result.te.trim() !== "" ||
                    result.ee.trim() !== "" ||
                    result.heavy.trim() !== "";
                assert.ok(hasWork, `Shard ${i} should have some work assigned`);
            }
        });
    });

    describe("heavy package splitting", () => {
        it("splits packages over HEAVY_MS threshold into individual tests", () => {
            // Create timing data where api4 is very heavy (> 300s = 300000ms)
            const lines = [];
            // api4: 6 tests totaling 452.2s (> 300s threshold)
            for (const [test, elapsed] of [
                ["TestGetChannel", 145.2],
                ["TestCreatePost", 98.1],
                ["TestUpdateChannel", 72.5],
                ["TestDeleteChannel", 58.3],
                ["TestGetChannelMembers", 45.7],
                ["TestSearchChannels", 32.4],
            ]) {
                lines.push(
                    JSON.stringify({
                        Time: "2025-03-20T10:00:00Z",
                        Action: "pass",
                        Package: "github.com/mattermost/mattermost/server/v8/channels/api4",
                        Test: test,
                        Elapsed: elapsed,
                    })
                );
            }
            // config: 2 tests totaling 8s (< 300s, stays whole)
            for (const [test, elapsed] of [
                ["TestConfigStore", 5.0],
                ["TestConfigMigrate", 3.0],
            ]) {
                lines.push(
                    JSON.stringify({
                        Time: "2025-03-20T10:00:00Z",
                        Action: "pass",
                        Package: "github.com/mattermost/mattermost/server/v8/config",
                        Test: test,
                        Elapsed: elapsed,
                    })
                );
            }

            const gotestsumJson = lines.join("\n");
            const packages = [
                "github.com/mattermost/mattermost/server/v8/channels/api4",
                "github.com/mattermost/mattermost/server/v8/config",
            ];

            // With 2 shards, api4 tests should be split across shards
            let heavyFound = false;
            const allHeavyTests = [];

            for (let i = 0; i < 2; i++) {
                const result = runSolver({
                    packages,
                    shardIndex: i,
                    shardTotal: 2,
                    gotestsumJson,
                });

                if (result.heavy.trim()) {
                    heavyFound = true;
                    // Parse heavy runs to extract test names
                    for (const line of result.heavy.trim().split("\n")) {
                        const parts = line.split(" ");
                        assert.equal(
                            parts[0],
                            "github.com/mattermost/mattermost/server/v8/channels/api4",
                            "Heavy package should be api4"
                        );
                        // Regex is like "^TestGetChannel$|^TestCreatePost$"
                        const tests = parts[1].split("|").map((r) => r.replace(/[\^$]/g, ""));
                        allHeavyTests.push(...tests);
                    }
                }
            }

            assert.ok(heavyFound, "Should have heavy package splits for api4");
            // All api4 tests should be distributed
            const expectedTests = [
                "TestGetChannel",
                "TestCreatePost",
                "TestUpdateChannel",
                "TestDeleteChannel",
                "TestGetChannelMembers",
                "TestSearchChannels",
            ];
            assert.deepEqual(
                allHeavyTests.sort(),
                expectedTests.sort(),
                "All api4 tests should be distributed across shards"
            );
        });

        it("keeps light packages whole even with timing data", () => {
            const gotestsumJson = [
                '{"Action":"pass","Package":"pkg/light","Test":"TestA","Elapsed":5.0}',
                '{"Action":"pass","Package":"pkg/light","Test":"TestB","Elapsed":3.0}',
            ].join("\n");

            const result = runSolver({
                packages: ["pkg/light"],
                shardIndex: 0,
                shardTotal: 2,
                gotestsumJson,
            });

            // Light package should be assigned whole, not split
            assert.equal(result.heavy.trim(), "", "Light package should not be in heavy runs");
            assert.ok(
                result.te.includes("pkg/light"),
                "Light package should be in TE packages"
            );
        });
    });

    describe("JUnit XML fallback", () => {
        it("uses JUnit XML when gotestsum.json is missing", () => {
            const prevReportXml = `<?xml version="1.0" encoding="UTF-8"?>
<testsuites>
  <testsuite name="pkg/fast" time="10.0" tests="5">
    <testcase name="TestA" time="5.0"/>
    <testcase name="TestB" time="5.0"/>
  </testsuite>
  <testsuite name="pkg/slow" time="50.0" tests="3">
    <testcase name="TestX" time="25.0"/>
    <testcase name="TestY" time="25.0"/>
  </testsuite>
</testsuites>`;

            const result = runSolver({
                packages: ["pkg/fast", "pkg/slow"],
                shardIndex: 0,
                shardTotal: 2,
                prevReportXml,
            });

            assert.ok(
                result.stdout.includes("JUnit XML"),
                "Should indicate using JUnit XML fallback"
            );
            // No heavy splits with XML-only data (no per-test timing)
            assert.equal(result.heavy.trim(), "", "Should not split packages without per-test timing");
        });
    });

    describe("enterprise package separation", () => {
        it("separates enterprise packages into EE output", () => {
            const packages = [
                "github.com/mattermost/mattermost/server/v8/channels/app",
                "github.com/mattermost/mattermost/server/v8/enterprise/compliance",
            ];

            const result = runSolver({
                packages,
                shardIndex: 0,
                shardTotal: 1,
            });

            assert.ok(
                result.te.includes("channels/app"),
                "TE should include non-enterprise packages"
            );
            assert.ok(
                result.ee.includes("enterprise/compliance"),
                "EE should include enterprise packages"
            );
            assert.ok(
                !result.te.includes("enterprise"),
                "TE should not include enterprise packages"
            );
        });
    });
});
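The shard-heavy-runs.txt format exercised by these tests pairs a package with a -run regex on each line, e.g. `github.com/.../channels/api4 ^TestGetChannel$|^TestCreatePost$`. A consumer such as run-shard-tests.sh could parse it like this; `parseHeavyRun` is a hypothetical helper, and the line format is inferred from the assertions above rather than from the solver's source:

```javascript
// Split a heavy-runs line into its package path and -run regex.
// A wrapper script would then invoke roughly: go test <pkg> -run '<regex>'.
function parseHeavyRun(line) {
    const firstSpace = line.indexOf(" ");
    const pkg = line.slice(0, firstSpace);
    const runRegex = line.slice(firstSpace + 1);
    // Recover bare test names by stripping the ^...$ anchors.
    const tests = runRegex.split("|").map((r) => r.replace(/[\^$]/g, ""));
    return { pkg, runRegex, tests };
}

const parsed = parseHeavyRun(
    "github.com/mattermost/mattermost/server/v8/channels/api4 ^TestGetChannel$|^TestCreatePost$"
);
```

Splitting on the first space only is deliberate: the regex part may not contain spaces, but the package path must survive intact even if the format ever grows extra fields.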