# Benchmarking the search implementations

This directory contains utilities for benchmarking the PH (perfect
hash) and QP (quadbit patricia trie) implementations, used by the VMOD
for the `.match()` and `.hasprefix()` methods, respectively. They are
meant to aid testing the search algorithms and implementations, and do
not measure any overhead added by the VMOD or Varnish.

The directory also contains files with test data and inputs, some of
which are meant to simulate common use cases for the VMOD.

As documented in [CONTRIBUTING](../../../CONTRIBUTING.rst), the
benchmarks are included in builds when `configure` is invoked with
`--enable-benchmarks`. They are always built when `make` is
invoked in this directory, but this requires that the `ph.o` or `qp.o`
object file is built first.

## `bench_ph` -- benchmark perfect hashing

`bench_ph` reads a set of strings from a file or stdin, runs exact
matches against the strings, and reports statistics about the match
operation:

```
bench_ph [-hs] [-c csvfile] [-d dumpfile] [-i inputfile] [-n iterations] [file]
```

`bench_ph` reads the string set from `file`, or from stdin if no file
is specified. Each line of input forms a string in the set, excluding
the terminating newline.

The string set MUST NOT include the same string more than once.
`bench_ph` does not check for duplicates, and will likely not
terminate if duplicates are present. (The VMOD runs `QP_Insert()`,
which rejects duplicate strings, before building the perfect hash.)
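
Since `bench_ph` does not check for duplicates itself, it can help to
check or clean a set file beforehand. The following is a minimal
sketch using the standard `sort` and `uniq` tools; the function names
`list_duplicates` and `dedup_set` are hypothetical and not part of the
benchmark utilities.

```shell
# Hypothetical helpers, not part of the benchmark utilities.

# Print each string that occurs more than once in the set file "$1".
list_duplicates() {
    LC_ALL=C sort -- "$1" | LC_ALL=C uniq -d
}

# Write a duplicate-free (and byte-wise sorted) copy of the set file
# "$1" to stdout.
dedup_set() {
    LC_ALL=C sort -u -- "$1"
}
```

For example, `dedup_set words.txt | ./bench_ph` passes a
duplicate-free set to the benchmark on stdin (`words.txt` is a
placeholder name). Note that the output is sorted, so the original
order of the strings is not preserved.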

If `-i inputfile` is specified, then test inputs -- strings to be
matched against the set -- are read from `inputfile`, one string per
line (newlines excluded). If there is no `inputfile` then the strings
from the set are also used as test inputs, in which case every lookup
is a successful match. Lookup misses can only be tested by using an
input file.

`-n iterations` specifies the number of times each test input string
is matched against the set, default 1000. If `-n 0` is specified, then
`bench_ph` builds the set and reports statistics about it, and may
generate a dump file if requested, but does not run any matches.

If `-s` is specified, the test inputs are shuffled before each
iteration.  This may reveal effects of locality of reference and
branch prediction on the performance of matches.

Note that the benchmarks may implement an unnatural usage pattern --
every test input is matched against the set exactly the same number of
times. Real-world usage commonly matches some strings more frequently
than others, which may be beneficial for locality and branch
prediction.  Performance differences with and without `-s` show that
there can be an impact -- generally, mean match times tend to be
longer for large string sets and/or for sets with long strings. So it
may be necessary to craft input data to simulate real usage patterns,
for example by including some strings more frequently in the input.
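
One way to craft such input is to repeat some strings more often than
others. The sketch below repeats line `i` of a set file roughly
`top/i` times, which gives a rough Zipf-like distribution. The
function name `skewed_inputs` is hypothetical, and the weighting is an
arbitrary assumption chosen for illustration, not a measured traffic
pattern.

```shell
# Hypothetical helper, not part of the benchmark utilities.
# Emit line i of the set file "$1" int(top/i) times, but at least once.
# "$2" is the repeat count for the first line (default 100).
skewed_inputs() {
    awk -v top="${2:-100}" '{
        n = int(top / NR)
        if (n < 1) n = 1
        for (i = 0; i < n; i++) print
    }' "$1"
}
```

The output is grouped by string, so it would typically be shuffled
before use, for example with `skewed_inputs set.txt 50 | shuf >
skewed.txt` followed by `./bench_ph -i skewed.txt set.txt`
(`skewed.txt` is a placeholder name).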

If `-d dumpfile` is specified, then the text dump of the perfect hash
structure produced by `PH_Dump()` is written to `dumpfile`.

If `-c csvfile` is specified, then `csvfile` is written with data
about each match operation in the benchmark. For consistency, the
format is the same used for `bench_qp` (see below). The CSV file has a
header line with the column names, with the following columns:

- `type`: always `match`

- `matches`: 1 for a match, 0 for a miss

- `exact`: 1 for a match, 0 for a miss

- `t`: time for the match operation in nanoseconds
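
A CSV file in this format can be summarized with standard tools. The
following sketch reports the number of hits and misses and the mean
time for each. It assumes that fields are separated by commas, that
the columns appear in the order listed above, and that there is
exactly one header line; the function name `csv_summary` is
hypothetical. It ignores the `type` column.

```shell
# Hypothetical helper, not part of the benchmark utilities.
# Summarize a CSV file "$1" with columns type,matches,exact,t
# (assumed order) and one header line. Prints, for hits and misses,
# the number of operations and the mean time in nanoseconds.
csv_summary() {
    awk -F, 'NR > 1 {
        key = ($2 > 0) ? "hit" : "miss"
        n[key]++
        sum[key] += $4
    }
    END {
        if (n["hit"])  printf "hit %d %.1f\n",  n["hit"],  sum["hit"]  / n["hit"]
        if (n["miss"]) printf "miss %d %.1f\n", n["miss"], sum["miss"] / n["miss"]
    }' "$1"
}
```

For example, `csv_summary set.csv` after a run with `-c set.csv`.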

If `-h` is specified, `bench_ph` prints a usage message and exits.

A benchmark proceeds as follows, with timings obtained by calling
`clock_gettime(2)` just before and just after each operation to be
measured, using the monotonic clock:

* The string set is read from `file` or stdin, and test inputs are
  read if there is an `inputfile`.

* The perfect hash is generated by calling `PH_Init()` (with seeds
  from `/dev/urandom`) and `PH_Generate()`. The total time for
  `PH_Generate()` is reported, as well as the mean time per string in
  the set.

* If a `dumpfile` was specified, `PH_Dump()` is called and its output
  is written to the file.

* Statistics obtained from `PH_Stats()` are printed, as well as the
  time to run `PH_Stats()`.

* If `-n` is set to 0, `bench_ph` exits at this point.

* Run the benchmark with the specified number of iterations (default
  1000).

* Report results:

  * The number of match operations executed.

  * The number of matches and misses.

  * The cumulative time for all match operations, and the mean time
    per operation.

  * Throughput, as the number of operations per second.

* Report stats from `getrusage(2)` for the complete run of `bench_ph`:

  * user and system time

  * numbers of voluntary and involuntary context switches as `vcsw`
    and `ivcsw`

Since the match operation is CPU- and memory-bandwidth-intensive, mean
times may increase if `ivcsw` is high.

Examples:

```
# Benchmark the set in set.txt, using inputs from inputs.txt, with 100
# iterations, shuffling inputs on each iteration, and recording
# results in set.csv.
$ ./bench_ph -s -i inputs.txt -n 100 -c set.csv set.txt

# Form a set from 5000 strings chosen randomly from the words list,
# using the set as its own test inputs.
$ shuf -n 5000 /usr/share/dict/words | ./bench_ph
```

## `bench_qp` -- benchmark trie matches

Like `bench_ph`, `bench_qp` tests the QP implementation by reading a
set of strings from a file or stdin, running prefix matches and/or
exact matches against the strings, and reporting statistics about the
match operation:

```
bench_qp [-hos] [-c csvfile] [-d dumpfile] [-i inputfile] [-m m|p]
         [-n iterations] [file]
```

The `-h`, `-s`, `-c`, `-d`, `-i`, `-n` options and the optional `file`
argument have the same meaning as described above for `bench_ph`. The
discussion above concerning `-s` and the effects of usage patterns,
locality and branch prediction applies here as well.

By default, `bench_qp` runs both prefix matches and exact matches
against the set, with `QP_Prefixes()` and `QP_Lookup()` respectively.
Note that the VMOD does not use `QP_Lookup()`, since exact matches
with perfect hashing are faster for all but some unusual data sets.

The `-m` option can be set to `p` to run only prefix matches, or `m`
to run only exact matches. So to test only what the VMOD uses, specify
`-m p`.

If `-o` is specified, then the `allow_overlaps` flag for `QP_Insert()`
is set to 0. In that case, a set in which a string is a prefix for
another string in the set is rejected. By default, overlaps are
allowed.
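
To find out ahead of time whether a set contains overlaps, the set
can be sorted byte-wise and each string compared with its successor:
if a string is a prefix of any other string in the set, then it is
also a prefix of the string that immediately follows it in sorted
order. The sketch below uses that property; the function name
`find_overlaps` is hypothetical. It reports only adjacent pairs, which
is enough to detect that overlaps exist, but is not a list of every
overlapping pair.

```shell
# Hypothetical helper, not part of the benchmark utilities.
# Print a line for each string in the set file "$1" that is a prefix
# of its successor in byte-wise sorted order. No output means the set
# has no overlaps.
find_overlaps() {
    LC_ALL=C sort -u -- "$1" |
        awk 'NR > 1 && index($0, prev) == 1 { print prev " is a prefix of " $0 }
             { prev = $0 }'
}
```

For example, `find_overlaps url.txt` prints nothing if `url.txt` can
be used with `-o`.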

The procedure of a benchmark is the same as described above for
`bench_ph`, except as follows:

* The string set is sorted before building the trie, using `qsort(3)`
  as the VMOD does. The time for the sort is reported.

* The trie is built by iterating `QP_Insert()` over the sorted set.

* Stats are obtained from `QP_Stats()`, and an optional dump file is
  written with the contents from `QP_Dump()`.

* The format of a CSV file is:

  * `type`: `prefix` for a prefix match, `match` for an exact match

  * `matches`: the number of matches found, which can be > 1 when
    there are common prefixes. 0 for non-matches.

  * `exact`: 1 if an exact match was found, 0 otherwise. Always 1
    on a benchmark for `QP_Lookup()`.

  * `t`: time for the match operation in ns (as for `bench_ph`)

* Benchmarks iterate the input for `QP_Prefixes()`, `QP_Lookup()`, or
  both.

Example:

```
# Benchmark prefix matches for the set in url.txt, using inputs from
# urlpfx_input.txt, with default 1000 iterations and no shuffling.
$ ./bench_qp -m p -i urlpfx_input.txt url.txt
```

## Test data

The remaining files in the directory contain sample test data and inputs.

`set.txt` contains 8500 words chosen randomly from the words list
(`/usr/share/dict/words`).

`inputs.txt` contains the words in `set.txt` repeated four times, and
8500 random strings, all shuffled randomly. So this benchmark tests PH
with an 80% hit rate and 20% miss rate:

```
$ ./bench_ph -i inputs.txt set.txt
```
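
An input file with a chosen hit rate can be assembled along the same
lines for other sets. The following is a sketch under assumptions, not
necessarily how `inputs.txt` was generated: `make_inputs` repeats a
set file a number of times and appends a file of strings that are not
in the set, and `hit_rate` checks the resulting ratio. Both function
names are hypothetical.

```shell
# Hypothetical helpers, not part of the benchmark utilities.

# Print the set file "$1" repeated "$3" times (default 4), followed by
# the miss strings in "$2". Pipe the output through shuf to randomize
# the order.
make_inputs() {
    reps=${3:-4}
    i=0
    while [ "$i" -lt "$reps" ]; do
        cat -- "$1"
        i=$((i + 1))
    done
    cat -- "$2"
}

# Report how many lines of the input file "$2" occur in the set file
# "$1", as hits/total and as a percentage.
hit_rate() {
    awk 'NR == FNR { seen[$0] = 1; next }
         { total++; if ($0 in seen) hits++ }
         END { if (total) printf "%d/%d %.1f%%\n", hits, total, 100 * hits / total }' "$1" "$2"
}
```

For example, `make_inputs set.txt misses.txt | shuf > my_inputs.txt`
followed by `hit_rate set.txt my_inputs.txt` (`misses.txt` and
`my_inputs.txt` are placeholder names). With as many misses as there
are strings in the set and four repetitions, the reported rate is 80%.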

`url.txt` simulates a set of URL path prefixes of the form `/<string>`
-- 500 random choices from the words list, each with a leading `/`.

`urlmatch_input.txt` and `urlpfx_input.txt` can be used as inputs with
`url.txt` for exact and prefix matches, respectively.
`urlmatch_input.txt` contains the strings in `url.txt` repeated four
times, and 500 URL prefixes with random strings. `urlpfx_input.txt`
contains 2500 URL paths with five path components, 2000 of which have
the same prefixes as in `url.txt`, the rest generated randomly.

So these benchmarks test the set in `url.txt` with 80% hit and 20% miss
rates, for exact matches and prefix matches:

```
# exact URL matches
$ ./bench_ph -i urlmatch_input.txt url.txt

# URL prefix matches
$ ./bench_qp -m p -i urlpfx_input.txt url.txt
```
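
Inputs for prefix matches against other sets of URL prefixes can be
built by extending each prefix with additional path components. The
sketch below is one way to do that, and is not necessarily how
`urlpfx_input.txt` was generated. The function name `extend_paths` is
hypothetical; it takes the additional components round-robin from a
non-empty words file, so that its output is reproducible.

```shell
# Hypothetical helper, not part of the benchmark utilities.
# For each prefix in "$1" (for example /foo), print a path with five
# components, by appending four words taken round-robin from the
# words file "$2". The words file must not be empty.
extend_paths() {
    awk 'NR == FNR { w[n++] = $0; next }
         {
             path = $0
             for (i = 0; i < 4; i++)
                 path = path "/" w[k++ % n]
             print path
         }' "$2" "$1"
}
```

For example, `extend_paths url.txt /usr/share/dict/words | shuf`
yields inputs that all match a prefix in `url.txt`; paths that miss
have to be added separately.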

The set in `hosts.txt` simulates 500 host names, each starting with
`www.` and ending with one of the nine most common TLDs, with random
choices from the words list for the "subdomain" in between.

`hosts_input.txt` contains the strings in `hosts.txt` repeated four
times, and 500 additional simulated hosts, all shuffled:

```
# Benchmark 80% hits and 20% misses for Host matches
$ ./bench_ph -i hosts_input.txt hosts.txt
```

`moz500.txt` contains the "top 500 domains" from moz.com, downloaded
on March 10, 2020. `moz500_input.txt` contains the host names in
`moz500.txt` repeated four times, and 500 additional simulated host
names, all shuffled.

```
# Another "80-20" benchmark for Host matches
$ ./bench_ph -i moz500_input.txt moz500.txt
```

`methods.txt` contains the nine standard HTTP methods (GET, POST, etc.),
and `methods_input.txt` contains 1000 of the methods, with each of the
nine in approximately equal distribution, randomly shuffled.

`allowed_methods.txt` contains only GET, HEAD and
POST. `rest_methods.txt` contains the six standard methods for a REST
API.

These can be used to benchmark matches against the request methods,
for example to generate a 405 "Method Not Allowed" synthetic response
when the method does not match. This would override the logic in
builtin VCL, which checks the method in `vcl_recv` using a sequence of
comparisons, and returns `pipe` if the method is not one of eight
standard methods.

```
# Benchmark matches against the nine standard methods, with 10,000
# iterations
$ ./bench_ph -i methods_input.txt -n 10000 methods.txt

# Benchmark matches against only GET, HEAD and POST
$ ./bench_ph -i methods_input.txt -n 10000 allowed_methods.txt

# Benchmark matches against standard REST methods
$ ./bench_ph -i methods_input.txt -n 10000 rest_methods.txt
```

`compressible.txt` contains prefixes for the Content-Type header for
content that may be compressible. `mediatypes.txt` contains all of the
IANA standard media types as of June 10, 2020.

So this benchmark simulates running a prefix match against the
Content-Type header to decide if compression should be applied to a
backend response:

```
# Benchmark prefix matches against media types, with 10,000 iterations
$ ./bench_qp -m p -i mediatypes.txt -n 10000 compressible.txt
```
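
The number of inputs that should match at least one prefix can be
cross-checked independently of the trie with a naive linear scan. The
following sketch does that with `awk`; the function name `prefix_hits`
is hypothetical. It compares every input against every prefix, so it
is only suitable as a sanity check on small sets.

```shell
# Hypothetical helper, not part of the benchmark utilities.
# Count the lines of the input file "$2" that start with at least one
# string from the set file "$1". Prints hits/total.
prefix_hits() {
    awk 'NR == FNR { pfx[n++] = $0; next }
         {
             total++
             for (i = 0; i < n; i++)
                 if (index($0, pfx[i]) == 1) { hits++; break }
         }
         END { printf "%d/%d\n", hits, total }' "$1" "$2"
}
```

For example, `prefix_hits compressible.txt mediatypes.txt` reports how
many of the media types begin with one of the compressible prefixes.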

`asciipfx.txt` contains all of the ordered sequences of the printable
ASCII characters from length 1 to 95. This is a worst-case test for
prefix matching, as all of the strings in the set have common
prefixes. The trie has no fanout, instead forming a linear structure
of depth 95.  `asciipfx_input.txt` contains test input.

```
# Worst-case test for prefix matching
$ ./bench_qp -m p -i asciipfx_input.txt asciipfx.txt
```
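
A set of this shape can be reproduced with a short `awk` script. The
sketch below assumes that the sequence starts at the space character
(0x20) and ends at `~` (0x7E), which gives 95 printable characters; it
is not necessarily how `asciipfx.txt` was generated. The function name
`ascii_prefixes` is hypothetical.

```shell
# Hypothetical helper, not part of the benchmark utilities.
# Print 95 lines: line k contains the first k printable ASCII
# characters in ascending order, starting with the space character.
ascii_prefixes() {
    awk 'BEGIN {
        s = ""
        for (c = 32; c <= 126; c++) {
            s = s sprintf("%c", c)
            print s
        }
    }'
}
```

Every line is a prefix of all of the lines that follow it, which is
what makes the resulting trie linear.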
