Files
intersection_benchmark/README.md
2026-03-30 15:17:33 +00:00

9.7 KiB

Intersection Benchmark

This project benchmarks several set-intersection strategies in Rust over the same generated input scenarios.

The benchmark output is split into two timed phases:

  • prepare

    • Measures the conversion from the benchmark's raw input format, a normal array of numbers, into the algorithm's prepared internal representation.
    • This phase does not measure the later intersection itself.
    • By default, raw input generation is not included in this time.
    • If TIME_PREPARE_INCLUDE_INPUT_GENERATION=true in .env, raw input generation is included too.
  • native

    • Measures only the intersection step on already prepared inputs.
    • The result is written into the algorithm's native output representation.
    • This phase does not measure preparation from the raw input arrays.
    • This phase does not measure converting the native result into a plain array of numbers.
    • By default, output clearing and result counting are not included in this time.
    • If enabled in .env, TIME_INTERSECTION_INCLUDE_OUTPUT_CLEAR and TIME_INTERSECTION_INCLUDE_RESULT_COUNT move those extra steps into the timed window.

Important notes about the timing model:

  • Output storage is created before timed native samples begin and then reused across samples.
  • Warmup runs are performed before measured samples and are not included in the reported timings.
  • Printing, formatting, statistics aggregation, and scenario planning are not part of the reported algorithm timings.

In short:

  • prepare answers: how long does it take to build the algorithm's working representation?
  • native answers: how long does it take to compute the intersection into the algorithm's own output format?

Intersection benchmark suite

  • Scenarios: 8
  • Universe: 0..=100000000 (100000001 values)
  • Set populations: sparse=4000 (0.0040%) | semi-sparse=40000 (0.0400%) | normal=400000 (0.4000%) | dense=4000000 (4.000%)
  • Overlap targets: low=10.0% | medium=50.0% | high=80.0%
  • Enabled densities: sparse=true | semi-sparse=true | normal=true | dense=true
  • Enabled overlaps: low=false | medium=true | high=false
  • Sampling: min=2 | max=5 | target total=800ms
  • Enabled algorithms: bitset=true | bitset-simd=false | std-hash=true | splitmix-hash=true | sorted-merge=true
  • Enabled phases: prepare=true | intersection=true
  • Timed extras: prepare input generation=false | intersection output clear=false | intersection result count=false
  • Phases per algorithm: prepare=true | native output intersection=true

Scenario: ordered input | sparse set population = 0.0040% of universe | medium overlap percentage = 50.0% of each set

  • Set population: 4000 / 100000001 values
  • Overlap: requested=50.0% | actual=50.0% | shared=2000/4000
algorithm phase samples mean median min max
bitset prepare 5 411.335us 401.487us 393.202us 444.185us
bitset native 5 950.881us 940.646us 840.562us 1.065ms
std-hash prepare 5 80.875us 80.946us 79.932us 81.558us
std-hash native 5 76.875us 75.985us 74.445us 81.305us
splitmix-hash prepare 5 36.794us 36.825us 36.478us 37.109us
splitmix-hash native 5 35.065us 33.227us 30.197us 43.628us
sorted-merge prepare 5 350ns 303ns 295ns 548ns
sorted-merge native 5 3.568us 3.541us 3.533us 3.646us

Scenario: ordered input | semi-sparse set population = 0.0400% of universe | medium overlap percentage = 50.0% of each set

  • Set population: 40000 / 100000001 values
  • Overlap: requested=50.0% | actual=50.0% | shared=20000/40000
algorithm phase samples mean median min max
bitset prepare 5 3.955ms 3.951ms 3.927ms 4.014ms
bitset native 5 1.428ms 1.449ms 1.356ms 1.507ms
std-hash prepare 5 864.919us 860.986us 854.337us 880.726us
std-hash native 5 905.199us 902.901us 898.793us 913.503us
splitmix-hash prepare 5 413.275us 410.172us 408.810us 423.680us
splitmix-hash native 5 469.869us 467.963us 465.287us 477.299us
sorted-merge prepare 5 6.172us 6.124us 5.921us 6.359us
sorted-merge native 5 36.202us 36.045us 36.023us 36.815us

Scenario: ordered input | normal set population = 0.4000% of universe | medium overlap percentage = 50.0% of each set

  • Set population: 400000 / 100000001 values
  • Overlap: requested=50.0% | actual=50.0% | shared=200000/400000
algorithm phase samples mean median min max
bitset prepare 5 7.272ms 7.200ms 6.820ms 7.893ms
bitset native 5 1.830ms 1.821ms 1.815ms 1.868ms
std-hash prepare 5 11.484ms 11.262ms 10.767ms 12.441ms
std-hash native 5 15.501ms 15.262ms 14.570ms 17.524ms
splitmix-hash prepare 5 5.993ms 6.124ms 5.715ms 6.193ms
splitmix-hash native 5 5.363ms 5.381ms 5.333ms 5.383ms
sorted-merge prepare 5 309.850us 265.482us 248.158us 507.857us
sorted-merge native 5 567.072us 543.143us 527.883us 649.634us

Scenario: ordered input | dense set population = 4.000% of universe | medium overlap percentage = 50.0% of each set

  • Set population: 4000000 / 100000001 values
  • Overlap: requested=50.0% | actual=50.0% | shared=2000000/4000000
algorithm phase samples mean median min max
bitset prepare 5 14.176ms 13.606ms 12.950ms 17.112ms
bitset native 5 1.765ms 1.731ms 1.674ms 1.860ms
std-hash prepare 2 441.875ms 441.875ms 436.380ms 447.370ms
std-hash native 2 445.366ms 445.366ms 442.813ms 447.919ms
splitmix-hash prepare 4 243.982ms 239.375ms 237.612ms 259.564ms
splitmix-hash native 5 51.967ms 49.131ms 48.321ms 56.781ms
sorted-merge prepare 5 11.852ms 11.638ms 11.416ms 12.418ms
sorted-merge native 5 5.537ms 5.530ms 5.517ms 5.582ms

Scenario: unordered input | sparse set population = 0.0040% of universe | medium overlap percentage = 50.0% of each set

  • Set population: 4000 / 100000001 values
  • Overlap: requested=50.0% | actual=50.0% | shared=2000/4000
algorithm phase samples mean median min max
bitset prepare 5 2.264ms 882.303us 845.089us 7.840ms
bitset native 5 1.776ms 1.778ms 1.695ms 1.861ms
std-hash prepare 5 83.110us 80.293us 79.781us 93.585us
std-hash native 5 77.430us 77.762us 74.848us 79.988us
splitmix-hash prepare 5 36.017us 35.957us 35.943us 36.129us
splitmix-hash native 5 33.200us 31.326us 28.101us 42.169us
sorted-merge prepare 5 61.613us 60.791us 55.215us 69.401us
sorted-merge native 5 3.617us 3.533us 3.528us 3.943us

Scenario: unordered input | semi-sparse set population = 0.0400% of universe | medium overlap percentage = 50.0% of each set

  • Set population: 40000 / 100000001 values
  • Overlap: requested=50.0% | actual=50.0% | shared=20000/40000
algorithm phase samples mean median min max
bitset prepare 5 2.221ms 1.596ms 1.463ms 3.909ms
bitset native 5 1.770ms 1.761ms 1.715ms 1.829ms
std-hash prepare 5 882.778us 869.598us 865.722us 910.316us
std-hash native 5 917.268us 915.333us 900.514us 935.037us
splitmix-hash prepare 5 417.845us 420.083us 411.847us 422.302us
splitmix-hash native 5 475.443us 473.060us 466.552us 486.611us
sorted-merge prepare 5 866.901us 867.193us 857.034us 877.401us
sorted-merge native 5 49.398us 48.383us 48.283us 53.209us

Scenario: unordered input | normal set population = 0.4000% of universe | medium overlap percentage = 50.0% of each set

  • Set population: 400000 / 100000001 values
  • Overlap: requested=50.0% | actual=50.0% | shared=200000/400000
algorithm phase samples mean median min max
bitset prepare 5 4.712ms 4.803ms 4.476ms 4.920ms
bitset native 5 1.759ms 1.756ms 1.687ms 1.844ms
std-hash prepare 5 10.839ms 10.826ms 10.524ms 11.178ms
std-hash native 5 15.163ms 14.614ms 14.440ms 17.563ms
splitmix-hash prepare 5 5.632ms 5.632ms 5.585ms 5.679ms
splitmix-hash native 5 5.483ms 5.432ms 5.233ms 5.969ms
sorted-merge prepare 5 10.528ms 10.457ms 10.445ms 10.814ms
sorted-merge native 5 544.184us 534.490us 530.891us 581.634us

Scenario: unordered input | dense set population = 4.000% of universe | medium overlap percentage = 50.0% of each set

  • Set population: 4000000 / 100000001 values
  • Overlap: requested=50.0% | actual=50.0% | shared=2000000/4000000
algorithm phase samples mean median min max
bitset prepare 5 58.153ms 57.261ms 53.448ms 63.278ms
bitset native 5 1.805ms 1.782ms 1.652ms 1.985ms
std-hash prepare 2 461.857ms 461.857ms 444.440ms 479.274ms
std-hash native 2 438.653ms 438.653ms 435.414ms 441.891ms
splitmix-hash prepare 4 252.147ms 250.242ms 243.913ms 264.189ms
splitmix-hash native 5 50.829ms 49.904ms 49.156ms 55.716ms
sorted-merge prepare 5 130.853ms 130.469ms 129.970ms 132.748ms
sorted-merge native 5 6.016ms 5.942ms 5.877ms 6.351ms