Documentation: use of Python 'set' datatype produces non-deterministic results
Summary
The Sphinx-generated HTML documentation for BIND9 is likely to vary between builds, due to the non-deterministic iteration order of the Python set objects that are used to de-duplicate tags appearing in the source documentation.
One of the relevant de-duplication implementations is found here in v9.19.19: https://gitlab.isc.org/isc-projects/bind9/-/blob/v9.19.19/doc/arm/_ext/iscconf.py#L128-130
I have reported this downstream in Debian's bugtracker and provided a possible patch there: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1064782
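To illustrate the underlying cause, here is a minimal sketch (not taken from iscconf.py) that starts fresh Python interpreters with different PYTHONHASHSEED values and prints the iteration order of a set of tag strings. The tag values and the helper name set_order are only illustrative. String hashing is randomized per process by default, so the order may differ from one run to the next, although any two given seeds can happen to produce the same order.

```python
import json
import os
import subprocess
import sys

# Illustrative tag list containing a duplicate, as an author might write it.
TAGS = ["query", "server", "zone", "query"]


def set_order(seed):
    """Return the iteration order of set(TAGS) as seen by a fresh
    interpreter started with the given PYTHONHASHSEED value."""
    code = (
        "import json, sys; "
        "print(json.dumps(list(set(json.loads(sys.argv[1])))))"
    )
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", code, json.dumps(TAGS)],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Each seed stands in for one documentation build.
    for seed in range(5):
        print(seed, ", ".join(set_order(seed)))
```

Fixing PYTHONHASHSEED for the build would make the order stable, but it would still not be the order the author wrote.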
cc @ondrej
BIND version affected
I believe the relevant set(...) de-duplication logic affects branches:
- v9.20.0 onwards
- v9.19.3 onwards
- v9.18.5 onwards
Steps to reproduce
- Create a debian:stable (bookworm) container.
- Install required dependencies
apt update && apt install -y autoconf automake git libcap-dev libssl-dev libtool libuv1-dev liburcu-dev python3-sphinx python3-sphinx-rtd-theme pkg-config
- Retrieve the BIND9 sources
cd && git clone https://gitlab.isc.org/isc-projects/bind9.git/
- Check out an affected release tag (e.g. v9.19.19)
cd bind9 && git checkout v9.19.19
- Configure the build
autoreconf -fi
./configure --disable-doh
- Build the HTML documentation twice
cd doc && make html
sha256sum arm/_build/html/reference.html
mv arm/_build arm/_build.old && make html
sha256sum arm/_build/html/reference.html
- Optionally, inspect detailed differences between the reference.html contents
diff arm/_build*/html/reference.html
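As a complement to the sha256sum and diff steps above, the following is a rough sketch of how the two build trees could be compared file by file. The function names are hypothetical, and the directory paths in the example call simply mirror the ones used in the steps above.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root): sha256_of(p)
        for p in root.rglob("*")
        if p.is_file()
    }


def differing_files(old_root, new_root):
    """List relative paths whose contents differ between the two trees,
    including files that exist in only one of them."""
    old = tree_digests(old_root)
    new = tree_digests(new_root)
    return sorted(
        str(rel) for rel in old.keys() | new.keys()
        if old.get(rel) != new.get(rel)
    )


if __name__ == "__main__":
    # Run from the doc directory after building twice as described above.
    for name in differing_files("arm/_build.old/html", "arm/_build/html"):
        print(name)
```

An empty result would indicate that the two HTML trees are bit-for-bit identical.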
What is the current bug behavior?
The content of reference.html changes between each build; in particular, the order of the :tags: option entries given on namedconf:statement directives in the Sphinx RST source documentation typically varies.
A snippet from the diff above on a recent local build here displays, among other differences:
11689c11689
< <td><p>query, server, zone</p></td>
---
> <td><p>query, zone, server</p></td>
What is the expected correct behavior?
Two properties are desired:
- The contents of arm/reference.html should be bit-for-bit identical for each documentation build -- this allows downstream consumers to confirm that distributed copies are genuine by rebuilding it from source if they wish to (this is a reproducible build goal).
- The order of the tags written by the author(s) should be preserved -- in some cases it could communicate a priority ordering (however, duplicate tags should still be removed).
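One way both properties could be met is sketched below. This is not necessarily the patch attached to the Debian bug, and the parse_tags helper and its comma-separated input format are assumptions made for illustration. It relies on dict preserving insertion order, which Python guarantees from version 3.7 onwards.

```python
def dedupe_preserving_order(items):
    """Remove duplicates while keeping the first occurrence of each item
    in its original position (dict keys preserve insertion order)."""
    return list(dict.fromkeys(items))


def parse_tags(raw):
    """Hypothetical helper: split a comma-separated :tags: value,
    strip whitespace, drop empty entries, and de-duplicate in order."""
    tags = [part.strip() for part in raw.split(",")]
    return dedupe_preserving_order(tag for tag in tags if tag)


if __name__ == "__main__":
    print(", ".join(parse_tags("query, server, zone, query")))
    # prints: query, server, zone
```

Unlike list(set(...)), the result depends only on the input text, so repeated builds would emit the tags in the same, author-chosen order.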
Relevant configuration files
N/A
Relevant logs
The expected output from a successful HTML documentation build is:
Making html in .
make[1]: Entering directory '/root/bind9/doc'
make[1]: Nothing to be done for 'html-am'.
make[1]: Leaving directory '/root/bind9/doc'
Making html in misc
make[1]: Entering directory '/root/bind9/doc/misc'
make[1]: Nothing to be done for 'html'.
make[1]: Leaving directory '/root/bind9/doc/misc'
Making html in man
make[1]: Entering directory '/root/bind9/doc/man'
make[1]: Nothing to be done for 'html'.
make[1]: Leaving directory '/root/bind9/doc/man'
Making html in arm
make[1]: Entering directory '/root/bind9/doc/arm'
SPHINX html-local
make[1]: Leaving directory '/root/bind9/doc/arm'