* [VAULT-41857] pipeline(find-artifact): add support for finding artifacts from branches (#11799)
Add support for finding matching workflow artifacts from branches rather than PRs. This allows us to trigger custom HCP image builds from a branch rather than a PR. It also enables us to build and test the HCP image on a scheduled nightly cadence, which we've enabled as well.
As part of these changes I also added support for specifying which environment you want to test and threaded it through the `cloud` scenario, now that there are multiple variants. We also made the testing workflow `workflow_dispatch`-able so that we can trigger HVD testing for any custom image in any environment without building a new image.
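For illustration, the `workflow_dispatch` trigger takes roughly this shape (the input names and options here are illustrative, not the actual workflow's):

```yaml
# Sketch of a dispatchable HVD test workflow; inputs are illustrative.
on:
  workflow_dispatch:
    inputs:
      environment:
        description: Which HCP environment to test
        required: true
        type: choice
        options: [int, dev, prod]
      image-id:
        description: An existing custom HCP image to test
        required: true
        type: string
```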
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
- actions/cache -> v5.0.2: A bugfix around not retrying cache entries on
429s.
- actions/setup-go -> v6.2.0: NodeJS bump and internal actions/cache
bump. We don't use the caching in setup-go so this ought to have no
impact for us.
- actions/setup-node -> v6.2.0: internal bump of actions/cache.
- pnpm/action-setup -> v4.2.0: Adds support for .npmrc file.
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
Sometimes our CI Slack message reports the wrong information, most
notably a data race failure when only the UI tests ran and failed. While
working to fix this false positive I noticed several error cases we
hadn't considered when creating the notification. Now the message only
reports the failures that were actually detected.
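A minimal sketch of that logic, assuming hypothetical `ui-tests` and `go-tests` job ids in the notify job's `needs`:

```yaml
# Illustrative only: compose the Slack text from the failures we
# actually detected instead of assuming a fixed failure cause.
- name: Compose failure summary
  if: contains(needs.*.result, 'failure')
  shell: bash
  run: |
    summary=""
    if [ "${{ needs.ui-tests.result }}" = "failure" ]; then summary+=" UI tests failed."; fi
    if [ "${{ needs.go-tests.result }}" = "failure" ]; then summary+=" Go tests failed."; fi
    echo "SLACK_SUMMARY=${summary}" >> "$GITHUB_ENV"
```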
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
* move from yarn to pnpm for package management
* remove lodash.template patch override
* remove .yarn folder
* update GHA to use pnpm (see the sketch after this list)
* add @babel/plugin-proposal-decorators
* remove .yarnrc.yml
* add lock file to copywrite ignore
* add @codemirror/view as a dep for its types
* use more strict setting about peerDeps
* address some peerDep issues with ember-power-select and ember-basic-dropdown
* enable TS compilation for the kubernetes engine
* enable TS compilation in kv engine
* ignore workspace file
* use new headless mode in CI
* update enos CI scenarios
* add qs and express resolutions
* run 'pnpm up glob' and 'pnpm up js-yaml' to upgrade those packages
* run 'pnpm up preact' because posthog-js had a vulnerable install. see https://github.com/advisories/GHSA-36hm-qxxp-pg3
* add a workaround for browser timeout errors in tests
* update other references of yarn to pnpm
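For reference, the GHA side of the migration boils down to something like this (versions are illustrative; `pnpm/action-setup` reads the pin from the `packageManager` field in package.json):

```yaml
- uses: pnpm/action-setup@v4
- uses: actions/setup-node@v4
  with:
    node-version: 20
    cache: pnpm  # cache the pnpm store between runs
- run: pnpm install --frozen-lockfile
```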
Co-authored-by: Matthew Irish <39469+meirish@users.noreply.github.com>
This was started to remove a trailing `"` that would show up when UI tests
failed. While I was here, I normalized our emoji to use `flashing-light`
instead of `rotating_light`, because the former renders better in the
new Slack instance.
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
When a pull request is created against a CE branch and it changes any files in the `gotoolchain` group, we'll automatically diff every Go module file in the repo against its equivalent in the corresponding enterprise branch. If there's a delta, the `build/ce-checks` job will automatically fail. A complete explanation of the diff is written to the step output and to the `build/ce-checks` job step summary.
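A hedged sketch of the check (the remote, branch resolution, and pathspecs are illustrative, not the actual job definition):

```yaml
- name: Check Go module drift against enterprise
  env:
    ENT_BRANCH: main  # illustrative; the real job derives this from the CE branch
  run: |
    git fetch enterprise "$ENT_BRANCH"
    if ! git diff --quiet "enterprise/$ENT_BRANCH" -- go.mod go.sum '*/go.mod' '*/go.sum'; then
      echo "### Go module drift detected against $ENT_BRANCH" >> "$GITHUB_STEP_SUMMARY"
      git diff "enterprise/$ENT_BRANCH" -- go.mod go.sum '*/go.mod' '*/go.sum' >> "$GITHUB_STEP_SUMMARY"
      exit 1
    fi
```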
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
Migrate all slack notifications to the `ibm-hashicorp` workspace. This
required creating three new `incoming-webhook` configurations which are
capable of posting into three different Slack channels, depending on the
workflow.
As they all use the `incoming-webhook` event, many of our integrations
had to be migrated from `chat.postMessage` and those changes are
reflected here.
Of note, there are lots of changes to the `release-procedure-ent`
workflow as it has by far the most uses of the Slack integrations. In
some cases it was to appease `actionlint` issues, in others I made small
idiomatic tweaks. I also translated all of the payload messages from JSON
to YAML, both because YAML fits better into our existing workflows and
because most of the payload messages were invalid JSON altogether.
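For reference, a migrated notification is likely shaped like the following, shown here with `slackapi/slack-github-action@v2` for illustration (the secret name and message body are placeholders):

```yaml
- uses: slackapi/slack-github-action@v2
  with:
    webhook: ${{ secrets.FEED_WEBHOOK_URL }}  # placeholder secret name
    webhook-type: incoming-webhook
    payload: |
      text: "Build for ${{ github.ref_name }}: ${{ job.status }}"
```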
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
* [VAULT-39671] tools: use github cache for external tools
We currently have ~13 tools that we need available both locally for
development and in CI for building, linting, formatting, and testing Vault.
Each branch that we maintain uses largely the same set of tools, often pinned
to different versions.
For development, we have a `make tools` target that will execute the
`tools/tools.sh` installation script for the various tools at the correct pins.
This works well enough but is cumbersome if you’re working across many branches
that have divergent versions.
For CI the problem is speed and repetition. Each build job (~10) and Go test
job (16-52) has to install most of the same tools. As we have extremely
limited Github Actions cache we can’t afford to cache the entire Vault Go
build cache, so building the tools from source each time would incur the
penalty of downloading all of the modules and compiling each tool, adding
roughly an extra 2 minutes per job. We’ve
worked around this problem by writing composite actions that download pre-built
binaries of the same tools instead of building them from source. That usually
takes a few seconds. The downside of that approach is rate limiting, which
Github has become much more aggressive in enforcing.
That brings us to where we were before this work:
- For builds in the compatibility docker container: the tools are built from
source and cached as separate builder image layer. (usually fast as we get
cache hits, slow on cache misses)
- For builds that compile directly on the runner: the tools are installed on
each job runner by composite github actions (fast, uses API requests, prone
to throttling)
- For tests, they use the same composite actions to install the tools on each
job. (fast, uses API requests, prone to throttling)
This also leads to inconsistencies, since there are two sources of truth: the
composite actions have their own version pins outside of those in `tools.sh`.
This has led to drift.
We previously tried to save some API requests and move all builds into
the container. That almost worked, but Docker's build container had a hard
time with some esoteric builds. We could special-case them, but that's a
bandaid at best.
A prior version of this work (VAULT-39654) investigated using `go tool`, but
there were some showstopper issues with that workflow that made it a non-starter
for us. Instead, we’ll attempt to use more actions cache to resolve the
throttling. This will allow us to have a single source of truth for tools, their
pins, and afford us the same speed on cache hits as we had previously without
downloading the tools from github releases thousands of times per day.
We add a new composite github action for installing our tools.
- On cache misses it builds the tools and installs them into a cacheable path.
- On cache hits it restores the cacheable path.
- It adds the tools to the GITHUB_PATH to ensure runner-based jobs can find
them.
- For Docker builds it mounts the tools at `/opt/tools/bin`, which is
part of the PATH in the container.
- It uses a cache key composed of the SHA of the tools directory along with
the working directory SHA, which is required to work around actions/cache
issues.
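A minimal sketch of the shape this takes (paths, cache key, and the install invocation are illustrative):

```yaml
runs:
  using: composite
  steps:
    - uses: actions/cache@v4
      id: tools
      with:
        path: .tools/bin
        key: tools-${{ runner.os }}-${{ hashFiles('tools/**') }}
    - if: steps.tools.outputs.cache-hit != 'true'
      shell: bash
      run: ./tools/tools.sh install  # build the tools into the cacheable path
    - shell: bash
      run: echo "$PWD/.tools/bin" >> "$GITHUB_PATH"
```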
This results in:
- A single source of truth for tools and their pins
- A single cache for tools that can be re-used between all CI and build jobs
- No more Github API calls for tooling. _Rate limiting will be a thing of
the past._
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
[VAULT-39160] actions(hcp): add support for testing custom images on HCP (#9345)
Add support for running the `cloud` scenario with a custom image in the
int HCP environment. We support two new tags that trigger new
functionality. If the `hcp/build-image` tag is present on a PR at the
time of `build`, we'll automatically trigger a custom build for the int
environment. If the `hcp/test` tag is present, we'll trigger a custom
build and run the `cloud` scenario with the resulting image.
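The label gating amounts to something like this (job id and step contents are illustrative):

```yaml
build-hcp-image:
  if: |
    contains(github.event.pull_request.labels.*.name, 'hcp/build-image') ||
    contains(github.event.pull_request.labels.*.name, 'hcp/test')
  runs-on: ubuntu-latest
  steps:
    - name: Trigger custom image build
      run: echo "trigger the custom int image build here"
```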
* Fix a bug in our custom build pattern to handle prerelease versions.
* pipeline(hcp): add `--github-output` support to `show image` and
`wait image` commands.
* enos(hcp/create_vault_cluster): use a unique identifier for HVN
and vault clusters.
* actions(enos-cloud): add workflow to execute the `cloud` enos
scenario.
* actions(build): add support for triggering a custom build and running
the `enos-cloud` scenario.
* add more debug logging and query without a status
* add shim build-hcp-image for CE workflows
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
Update our pins to the latest versions. Essentially all of these are
related to actions needing to run on Node 24. Both the self-hosted and the
Github-hosted runners that we use are on a new enough version of
actions/runner that this shouldn't be a problem.
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
* license: add support for publishing artifacts to IBM PAO (#8366)
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: brian shore <bshore@hashicorp.com>
Co-authored-by: Ethel Evans <ethel.evans@hashicorp.com>
Co-authored-by: Ryan Cragun <me@ryan.ec>
* VAULT-34830: enable the new workflow (#8661)
* pipeline: various fixes for the cutover to the enterprise first workflow (#8686)
Various small fixes that were discovered when doing the cutover to the enterprise first merge workflow:
- The `actions-docker-build` action infers enterprise metadata magically from the repository name. Use a branch that allows configuring the repo name until it's merged upstream.
- Fix some CE-In-Enterprise outputs in our metadata job.
- Pass the recurse depth flag correctly when creating backports
- Set the package name when calling the `build-vault` composite action
- Disallow merging changes into `main` and `release/*` when executing in the `hashicorp/vault` repository. This is a hack until PSS-909 is resolved.
- Use self-hosted runners when testing arm64 CE containers in enterprise.
Signed-off-by: Ryan Cragun <me@ryan.ec>
Conflicts:
.github/workflows/backport-automation-ent.yml
.github/workflows/test-run-enos-scenario-containers.yml
---------
Signed-off-by: Ryan Cragun <me@ryan.ec>
Various small changes and tweaks to our CI/CD workflows to allow for running CE branches in the context of `hashicorp/vault-enterprise`.
Signed-off-by: Ryan Cragun <me@ryan.ec>
Ubuntu 20.04 has reached EOL and is no longer a supported runner host distro. Historically we've relied on it for our CGO builds as it contains an old enough version of glibc that we can retain compatibility with all of our supported distros and build on a single host distro. Rather than requiring a new RHEL 8 builder (or some equivalent), we instead build CGO binaries inside an Ubuntu 20.04 container along with its glibc and various C compilers.
I've separated out system package changes, the Go toolchain install, and the external build tools install into different container layers so that the builder container used for each branch is maximally cacheable.
On cache misses these changes result in noticeably longer build times for CGO binaries. That is unavoidable with this strategy. Most of the time our builds will get a cache hit on all layers unless they've changed any of the following:
- .build/*
- .go-version
- .github/actions/build-vault
- tools/tools.sh
- Dockerfile
I've tried my best to reduce the cache space used by each layer. Currently our build container takes about 220MB of cache space. About half of that ought to be shared cache between main and release branches. I would expect total new cache used to be in the 500-600MB range, or about 5% of our total space.
Some follow-up ideas that we might want to consider:
- Build everything inside the build container and remove the github actions that set up external tools
- Instead of building external tools with `go install`, migrate them into build scripts that install pre-built `linux/amd64` binaries
- Migrate external tools to `go tool` and use it in the builder container. This requires us to be on Go 1.24 everywhere, so it ought not to be considered until that is a reality.
Signed-off-by: Ryan Cragun <me@ryan.ec>
* VAULT-34834: pipeline: add better heuristics for changed files
To fully support automated Enterprise to Community backports we need to
have better changed file detection for community and enterprise only
files. Armed with this metadata, future changes will be able to inspect
changed files and automatically remove enterprise only files when
creating the CE backports.
For this change we now have the following changed file groups:
- autopilot
- changelog
- community
- docs
- enos
- enterprise
- app
- gotoolchain
- pipeline
- proto
- tools
- ui
Not included in the change, but something I did while updating our
checkers was generate a list of files that are included only in
vault-enterprise and run every path through the enterprise detection rules
to ensure that they are categorized appropriately after the changes in
VAULT-35431. While it's possible that they'll drift, our changed
file categorization is best effort anyway and changes will always
happen in vault-enterprise and require a developer to approve the
changes.
We've also added a few new files to the various groups and updated
the various workflows to use the new categories. I've also included a
small change to the pipeline composite action whereby we no longer handle
Go module caching. This will greatly reduce work on doc-only branches
that only need to ensure that the pipeline binary is compiled.
Signed-off-by: Ryan Cragun <me@ryan.ec>
* VAULT-34822: Add `pipeline github list changed-files`
Add a new `github list changed-files` sub-command to `pipeline` command and
integrate it into the pipeline. This replaces our previous
`changed-files.sh` script.
This command works quite a bit differently from the full checkout and
diff based solution we used before. Instead of checking out the base ref
and head ref and comparing a diff, we now provide either a pull request
number or git commit SHA and use the Github REST API to determine the
changed files.
This approach has several benefits:
- Not requiring a local checkout of the repo to get the list of
changed files. This yields a significant performance improvement in
`setup` jobs, where we typically determine the changed files list.
- The CLI supports both PRs and commit SHAs.
- The implementation is portable and doesn't require any system tools
like `git` or `bash` to be installed.
- A much more advanced system for adding group metadata to the changed
files. These groupings are going to be used heavily in future
pipeline automation work and will be used to make required jobs
smarter.
The theoretical drawbacks:
- It requires a GITHUB_TOKEN and only works for remote branches or
commits in Github. We could eventually add a local diff sub-command
or option to work locally, but that was not required for what we're
trying to achieve here.
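For reference, the REST calls this relies on, sketched with the `gh` CLI (the pipeline command itself is Go; the env names are illustrative):

```yaml
- name: List changed files
  env:
    GH_TOKEN: ${{ github.token }}
  run: |
    # For a pull request:
    gh api "repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/files" --paginate --jq '.[].filename'
    # For a single commit SHA:
    gh api "repos/$GITHUB_REPOSITORY/commits/$COMMIT_SHA" --jq '.files[].filename'
```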
While the groupings that I added in this change are quite rudimentary,
the system will allow us to add additional groups with very little
overhead. I tried to make this change more or less a port of the old
system to enable future work. I did include one small change of
behavior, which is that we now build all extended targets if the
`go.mod` or `go.sum` files change. We do this to ensure that dependency
changes don't subtly result in some extended platform breakage.
Signed-off-by: Ryan Cragun <me@ryan.ec>
Verify Vault secret integrity in unauthenticated I/O streams (the audit log, STDOUT/STDERR via the systemd journal) by scanning the text with Vault Radar. We search for both known and unknown secrets using an index of KVV2 values as well as Radar's built-in heuristics for credentials, secrets, and keys.
The verification has been added to many scenarios where a slight time increase is allowed, as we now have to install Vault Radar and scan the text. In practice this adds less than 10 seconds to the overall duration of a scenario.
In the in-place upgrade scenario we explicitly exclude this verification when upgrading from a version that we know will fail the check. We also make the verification opt-in so as not to require a Vault Radar license to run Enos scenarios, though it will always be enabled in CI.
As part of this we also update our enos workflow to utilize secret values from our self-hosted Vault when executing in the vault-enterprise repo context.
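As a loose sketch only (the `vault-radar` invocation and secret names below are assumptions, not the scenario's actual implementation):

```yaml
- name: Scan unauthenticated streams with Vault Radar
  env:
    HCP_CLIENT_ID: ${{ secrets.HCP_CLIENT_ID }}          # assumed secret names
    HCP_CLIENT_SECRET: ${{ secrets.HCP_CLIENT_SECRET }}
  run: |
    # Assumed flags: scan a captured audit log and write findings to a file.
    vault-radar scan file -p ./audit.log -o results.json
```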
Signed-off-by: Ryan Cragun <me@ryan.ec>
* VAULT-31402: Add verification for all container images
Add verification for all container images that are generated as part of
the build. Before this change we only ever tested a limited subset of
"default" containers based on Alpine Linux that we publish via the
Docker hub and AWS ECR.
Now we support testing all Alpine and UBI based container images. We
also verify the repository and tag information embedded in each by
deploying them and verifying the repo and tag metadata match our
expectations.
This does change the k8s scenario interface quite a bit. We now take in
an archive image and set image/repo/tag information based on the
scenario variants.
To enable this I also needed to add `tar` to the UBI base image. It was
already available in the Alpine image and is used to copy utilities to
the image when deploying and configuring the cluster via Enos.
Since some images contain multiple tags we also add samples for each
image and randomly select which variant to test on a given PR.
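An illustrative check of the embedded metadata after loading an archived image (the archive and expected values are placeholders):

```yaml
- name: Verify embedded repo/tag metadata
  run: |
    docker load --input "$IMAGE_ARCHIVE"   # e.g. vault-ubi.tar, placeholder
    docker inspect --format '{{ json .RepoTags }}' "$EXPECTED_REPO:$EXPECTED_TAG"
```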
Signed-off-by: Ryan Cragun <me@ryan.ec>
Update hashicorp/actions-packaging-linux to our rewritten version that
no longer requires building a Docker container or relying on code
hosted in a non-hashicorp repo for packaging.
As internal actions are not managed in the same manner as external
actions via the tsccr trusted components db, the tsccr helper is
unable to easily re-pin hashicorp/* actions. As such, we unpin some
pinned hashicorp/* actions to automatically pull in updates that are
compatible.
Signed-off-by: Ryan Cragun <me@ryan.ec>
Update the Github Actions pins to use the next generation of actions
that are supported by CRT.
In some cases these are simply to resolve Node 16 deprecations. In
others, we can now use `actions/upload-artifact@v4` and
`actions/download-artifact@v4`, since the next generation of actions like
`hashicorp/actions-docker-build@v2` and
`hashicorp/actions-persist-metadata@v2` use the `v4` versions of these.
Signed-off-by: Ryan Cragun <me@ryan.ec>
Pin to the latest actions in preparation for the migration to
`actions/upload-artifact@v4`, `actions/download-artifact@v4`, and
`hashicorp/actions-docker-build@v2` on May 6 or 7.
Signed-off-by: Ryan Cragun <me@ryan.ec>