Work in progress, please report problems to @pspacek
Decide what you want to test
- list of versions to be tested
- at least one value must be provided
- accepts commit or branch or tag name
- v9_11_31 is always added on top of user-specified list of versions and serves as reference
SHOTGUN_SCENARIO - udp (default), tcp, dot, doh
SHOTGUN_TRAFFIC_MULTIPLIER - simulated load; if unsure, leave the default value "10", which is roughly 80 k QPS. That is about the maximum v9_11_31 can handle in our setup over UDP
SHOTGUN_DURATION - test duration in seconds; the first 60 seconds (default) are the most interesting because we always start with a fresh instance and an empty cache
SHOTGUN_ROUNDS - number of test rounds; three (default) are recommended so we are not fooled by noise
The list of versions and SHOTGUN_TRAFFIC_MULTIPLIER each accept either a single string or a list of strings in Python syntax, for example
['main', 'v9_16_15', '1234567abcdef']. If multiple parameters contain a list, the Cartesian product of all provided values is tested.
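To get a feel for how many runs a given set of values produces, the expansion can be sketched as below. This is only an illustration: the helper names as_list and expand are hypothetical, and how the pipeline really parses its parameters is not shown on this page. The reference version v9_11_31, which is always added on top, is not modelled here.

```python
import ast
from itertools import product


def as_list(raw):
    """Turn 'main' or "['main', 'v9_16_15']" into a list of strings."""
    raw = raw.strip()
    if raw.startswith("["):
        # Python-syntax list: parse it safely, without eval()
        return [str(item) for item in ast.literal_eval(raw)]
    return [raw]


def expand(versions, multipliers, rounds=3):
    """Return one (version, multiplier, round) tuple per test run."""
    combos = product(as_list(versions), as_list(multipliers))
    return [(v, m, r) for v, m in combos for r in range(1, rounds + 1)]


if __name__ == "__main__":
    # One version, three load values, three rounds each
    for run in expand("main", "[10, 12, 14]"):
        print(run)
```

Giving a list for both parameters multiplies the run count quickly, e.g. two versions and two multipliers with three rounds already mean 12 runs.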
Example #1: Determining maximum load for one version
[10, 12, 14]
Run the specified version and fire "10 x base load", "12 x base load" and "14 x base load" at it over UDP. Repeat three times for each load value. Produces 9 charts (3 loads x 3 runs) with response rates + one for reference v9_11_31. Good for finding the maximum load by determining when the response rate starts dropping.
Example #2: Comparing performance between versions
['main', 'v9_16_15', '1234567abcdef']
Run each version three times and fire "10 x base load" at it over UDP (roughly 10 x 8 k QPS). Produces 9 charts (3 versions x 3 runs) with response rates + one for reference v9_11_31. Good for comparison between versions. Assumes the load (traffic multiplier) is set to a value where at least one version is able to keep up; otherwise the results would be hard to interpret.
Go to https://gitlab.isc.org/isc-projects/bind9/-/pipelines/new and select branch pspacek/ci-aws-integr2. Do not worry, you will enter the versions to be tested later.
Wait a couple of seconds until Gitlab shows you this form:
(If it does not show up, reload the page and click around for a while. This Gitlab form can be flaky.)
Fill in parameters and click on
Go to https://gitlab.isc.org/isc-projects/bind9/-/pipelines and find your new pipeline. Beware, branch will be shown as pspacek/ci-aws-integr2. Open the "job map":
Eventually the job map will dynamically add a "child pipeline". Here do the least obvious thing and click on the little black arrow (denoted by the huge red arrow), and it will magically expand and show all the test jobs and final postprocessing job.
Wait until all jobs are finished. Then download all artifacts from the postproc job; it has all the charts and also profiling flame charts for each run.
SVG charts in the artifacts are various representations of the same data. Depending on what you are trying to find out it might be beneficial to either look at individual runs and study rcodes, or look at summary charts for all runs without rcodes etc.
Obviously DNS Shotgun does not provide information "why" something is happening, that's left to your imagination.
Bear in mind that we are testing against the live Internet, so results are noisy and can change over time. Do not compare "old" and "new" results, it's better to retest than to chase ghosts of non-existing performance regressions.
Obviously you can also ask @pspacek :-)
- User starts Shotgun test job in Gitlab CI with tag
- TODO: describe child pipeline generation magic
- Gitlab directs job to dedicated Docker executor on VM running inside AWS.
- Runner VM needs a couple of configuration tricks to get IPv6 to work inside a Docker container running on an AWS VM
- Docker executor starts .gitlab-ci.yaml script inside dedicated Docker image shotgun-controller
- shotgun_aws.py creates two ephemeral VMs in AWS, dedicated for this test:
- To do that, the script needs AWS permissions to manage VMs and related resources. Runner machine is associated with special AWS ACL
- VMs use AMI (VM image) dedicated for DNS Shotgun tests
- To avoid hardcoding values into shotgun_aws.py, the script uses AWS Launch Template (ID lt-0161f30b78633fdb2) which can be modified in AWS console
- New VMs are tagged with timestamp in isc:remove_after tag, which denotes deadline after which the job has to be finished and all resources in AWS can be deleted. This is intended as guard against unlimited spending when Gitlab CI job is cancelled before it finishes teardown phase.
- Cleanup script cleanup_ephemeral.py is run from cron job on Gitlab CI runner machine.
- When VMs are ready, shotgun_aws.py executes Ansible playbook which orchestrates the test on the two VMs.
- The Ansible playbook connects to test VMs using SSH and runs DNS Shotgun on one machine and resolver under test on the other machine.
- VMs act as Docker hosts, i.e. Shotgun and resolver run inside Docker containers
- Docker networking is disabled using
- VM running DNS Shotgun has extra partition with PCAPs (AWS snapshot ID
- When test is finished, Ansible playbook gathers test results and stores them inside Docker container executed directly by Gitlab CI.
- TODO: Describe result gathering & postprocessing magic.
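
The isc:remove_after guard from the list above boils down to comparing a deadline stored in a tag with the current time. A minimal sketch of that check follows. It is not the code of cleanup_ephemeral.py: the function name is_expired is hypothetical, and the tag value is assumed to be a Unix timestamp in seconds, which this page does not confirm.

```python
from datetime import datetime, timezone

TAG_KEY = "isc:remove_after"  # tag name taken from the description above


def is_expired(tags, now=None):
    """Return True if the deadline stored in the tags has passed.

    tags is a plain dict of tag name -> value for one AWS resource.
    ASSUMPTION: the tag value is a Unix timestamp in seconds.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    value = tags.get(TAG_KEY)
    if value is None:
        # Untagged resources were not created by the test, leave them alone
        return False
    deadline = datetime.fromtimestamp(int(value), tz=timezone.utc)
    return now >= deadline


if __name__ == "__main__":
    # A deadline far in the past, so this prints True
    print(is_expired({TAG_KEY: "0"}))
```

Treating resources without the tag as not expired keeps such a cleanup from touching anything the test did not create.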