# Intersection Benchmark This project benchmarks several set-intersection strategies in Rust over the same generated input scenarios. The benchmark output is split into two timed phases: - `prepare` - Measures the conversion from the benchmark's raw input format, a normal array of numbers, into the algorithm's prepared internal representation. - This phase does not measure the later intersection itself. - By default, raw input generation is not included in this time. - If `TIME_PREPARE_INCLUDE_INPUT_GENERATION=true` in `.env`, raw input generation is included too. - `native` - Measures only the intersection step on already prepared inputs. - The result is written into the algorithm's native output representation. - This phase does not measure preparation from the raw input arrays. - This phase does not measure converting the native result into a plain array of numbers. - By default, output clearing and result counting are not included in this time. - If enabled in `.env`, `TIME_INTERSECTION_INCLUDE_OUTPUT_CLEAR` and `TIME_INTERSECTION_INCLUDE_RESULT_COUNT` move those extra steps into the timed window. Important notes about the timing model: - Output storage is created before timed `native` samples begin and then reused across samples. - Warmup runs are performed before measured samples and are not included in the reported timings. - Printing, formatting, statistics aggregation, and scenario planning are not part of the reported algorithm timings. In short: - `prepare` answers: how long does it take to build the algorithm's working representation? - `native` answers: how long does it take to compute the intersection into the algorithm's own output format? # Intersection benchmark suite - Scenarios: 8 - Universe: `0..=100000000` (`100000001` values) - Set populations: sparse=`4000` (0.0040%) | semi-sparse=`40000` (0.0400%) | normal=`400000` (0.4000%) | dense=`4000000` (4.000%) - Overlap targets: low=10.0% | medium=50.0% | high=80.0% - Enabled densities: sparse=`true` | semi-sparse=`true` | normal=`true` | dense=`true` - Enabled overlaps: low=`false` | medium=`true` | high=`false` - Sampling: min=`2` | max=`5` | target total=`800ms` - Enabled algorithms: bitset=`true` | bitset-simd=`false` | std-hash=`true` | splitmix-hash=`true` | sorted-merge=`true` - Enabled phases: prepare=`true` | intersection=`true` - Timed extras: prepare input generation=`false` | intersection output clear=`false` | intersection result count=`false` - Phases per algorithm: prepare=`true` | native output intersection=`true` ## Scenario: ordered input | sparse set population = 0.0040% of universe | medium overlap percentage = 50.0% of each set - Set population: `4000` / `100000001` values - Overlap: requested=50.0% | actual=50.0% | shared=`2000/4000` | algorithm | phase | samples | mean | median | min | max | | --- | --- | ---: | ---: | ---: | ---: | ---: | | bitset | prepare | 5 | 411.335us | 401.487us | 393.202us | 444.185us | | bitset | native | 5 | 950.881us | 940.646us | 840.562us | 1.065ms | | std-hash | prepare | 5 | 80.875us | 80.946us | 79.932us | 81.558us | | std-hash | native | 5 | 76.875us | 75.985us | 74.445us | 81.305us | | splitmix-hash | prepare | 5 | 36.794us | 36.825us | 36.478us | 37.109us | | splitmix-hash | native | 5 | 35.065us | 33.227us | 30.197us | 43.628us | | sorted-merge | prepare | 5 | 350ns | 303ns | 295ns | 548ns | | sorted-merge | native | 5 | 3.568us | 3.541us | 3.533us | 3.646us | ## Scenario: ordered input | semi-sparse set population = 0.0400% of universe | medium overlap percentage = 50.0% of each set - Set population: `40000` / `100000001` values - Overlap: requested=50.0% | actual=50.0% | shared=`20000/40000` | algorithm | phase | samples | mean | median | min | max | | --- | --- | ---: | ---: | ---: | ---: | ---: | | bitset | prepare | 5 | 3.955ms | 3.951ms | 3.927ms | 4.014ms | | bitset | native | 5 | 1.428ms | 1.449ms | 1.356ms | 1.507ms | | std-hash | prepare | 5 | 864.919us | 860.986us | 854.337us | 880.726us | | std-hash | native | 5 | 905.199us | 902.901us | 898.793us | 913.503us | | splitmix-hash | prepare | 5 | 413.275us | 410.172us | 408.810us | 423.680us | | splitmix-hash | native | 5 | 469.869us | 467.963us | 465.287us | 477.299us | | sorted-merge | prepare | 5 | 6.172us | 6.124us | 5.921us | 6.359us | | sorted-merge | native | 5 | 36.202us | 36.045us | 36.023us | 36.815us | ## Scenario: ordered input | normal set population = 0.4000% of universe | medium overlap percentage = 50.0% of each set - Set population: `400000` / `100000001` values - Overlap: requested=50.0% | actual=50.0% | shared=`200000/400000` | algorithm | phase | samples | mean | median | min | max | | --- | --- | ---: | ---: | ---: | ---: | ---: | | bitset | prepare | 5 | 7.272ms | 7.200ms | 6.820ms | 7.893ms | | bitset | native | 5 | 1.830ms | 1.821ms | 1.815ms | 1.868ms | | std-hash | prepare | 5 | 11.484ms | 11.262ms | 10.767ms | 12.441ms | | std-hash | native | 5 | 15.501ms | 15.262ms | 14.570ms | 17.524ms | | splitmix-hash | prepare | 5 | 5.993ms | 6.124ms | 5.715ms | 6.193ms | | splitmix-hash | native | 5 | 5.363ms | 5.381ms | 5.333ms | 5.383ms | | sorted-merge | prepare | 5 | 309.850us | 265.482us | 248.158us | 507.857us | | sorted-merge | native | 5 | 567.072us | 543.143us | 527.883us | 649.634us | ## Scenario: ordered input | dense set population = 4.000% of universe | medium overlap percentage = 50.0% of each set - Set population: `4000000` / `100000001` values - Overlap: requested=50.0% | actual=50.0% | shared=`2000000/4000000` | algorithm | phase | samples | mean | median | min | max | | --- | --- | ---: | ---: | ---: | ---: | ---: | | bitset | prepare | 5 | 14.176ms | 13.606ms | 12.950ms | 17.112ms | | bitset | native | 5 | 1.765ms | 1.731ms | 1.674ms | 1.860ms | | std-hash | prepare | 2 | 441.875ms | 441.875ms | 436.380ms | 447.370ms | | std-hash | native | 2 | 445.366ms | 445.366ms | 442.813ms | 447.919ms | | splitmix-hash | prepare | 4 | 243.982ms | 239.375ms | 237.612ms | 259.564ms | | splitmix-hash | native | 5 | 51.967ms | 49.131ms | 48.321ms | 56.781ms | | sorted-merge | prepare | 5 | 11.852ms | 11.638ms | 11.416ms | 12.418ms | | sorted-merge | native | 5 | 5.537ms | 5.530ms | 5.517ms | 5.582ms | ## Scenario: unordered input | sparse set population = 0.0040% of universe | medium overlap percentage = 50.0% of each set - Set population: `4000` / `100000001` values - Overlap: requested=50.0% | actual=50.0% | shared=`2000/4000` | algorithm | phase | samples | mean | median | min | max | | --- | --- | ---: | ---: | ---: | ---: | ---: | | bitset | prepare | 5 | 2.264ms | 882.303us | 845.089us | 7.840ms | | bitset | native | 5 | 1.776ms | 1.778ms | 1.695ms | 1.861ms | | std-hash | prepare | 5 | 83.110us | 80.293us | 79.781us | 93.585us | | std-hash | native | 5 | 77.430us | 77.762us | 74.848us | 79.988us | | splitmix-hash | prepare | 5 | 36.017us | 35.957us | 35.943us | 36.129us | | splitmix-hash | native | 5 | 33.200us | 31.326us | 28.101us | 42.169us | | sorted-merge | prepare | 5 | 61.613us | 60.791us | 55.215us | 69.401us | | sorted-merge | native | 5 | 3.617us | 3.533us | 3.528us | 3.943us | ## Scenario: unordered input | semi-sparse set population = 0.0400% of universe | medium overlap percentage = 50.0% of each set - Set population: `40000` / `100000001` values - Overlap: requested=50.0% | actual=50.0% | shared=`20000/40000` | algorithm | phase | samples | mean | median | min | max | | --- | --- | ---: | ---: | ---: | ---: | ---: | | bitset | prepare | 5 | 2.221ms | 1.596ms | 1.463ms | 3.909ms | | bitset | native | 5 | 1.770ms | 1.761ms | 1.715ms | 1.829ms | | std-hash | prepare | 5 | 882.778us | 869.598us | 865.722us | 910.316us | | std-hash | native | 5 | 917.268us | 915.333us | 900.514us | 935.037us | | splitmix-hash | prepare | 5 | 417.845us | 420.083us | 411.847us | 422.302us | | splitmix-hash | native | 5 | 475.443us | 473.060us | 466.552us | 486.611us | | sorted-merge | prepare | 5 | 866.901us | 867.193us | 857.034us | 877.401us | | sorted-merge | native | 5 | 49.398us | 48.383us | 48.283us | 53.209us | ## Scenario: unordered input | normal set population = 0.4000% of universe | medium overlap percentage = 50.0% of each set - Set population: `400000` / `100000001` values - Overlap: requested=50.0% | actual=50.0% | shared=`200000/400000` | algorithm | phase | samples | mean | median | min | max | | --- | --- | ---: | ---: | ---: | ---: | ---: | | bitset | prepare | 5 | 4.712ms | 4.803ms | 4.476ms | 4.920ms | | bitset | native | 5 | 1.759ms | 1.756ms | 1.687ms | 1.844ms | | std-hash | prepare | 5 | 10.839ms | 10.826ms | 10.524ms | 11.178ms | | std-hash | native | 5 | 15.163ms | 14.614ms | 14.440ms | 17.563ms | | splitmix-hash | prepare | 5 | 5.632ms | 5.632ms | 5.585ms | 5.679ms | | splitmix-hash | native | 5 | 5.483ms | 5.432ms | 5.233ms | 5.969ms | | sorted-merge | prepare | 5 | 10.528ms | 10.457ms | 10.445ms | 10.814ms | | sorted-merge | native | 5 | 544.184us | 534.490us | 530.891us | 581.634us | ## Scenario: unordered input | dense set population = 4.000% of universe | medium overlap percentage = 50.0% of each set - Set population: `4000000` / `100000001` values - Overlap: requested=50.0% | actual=50.0% | shared=`2000000/4000000` | algorithm | phase | samples | mean | median | min | max | | --- | --- | ---: | ---: | ---: | ---: | ---: | | bitset | prepare | 5 | 58.153ms | 57.261ms | 53.448ms | 63.278ms | | bitset | native | 5 | 1.805ms | 1.782ms | 1.652ms | 1.985ms | | std-hash | prepare | 2 | 461.857ms | 461.857ms | 444.440ms | 479.274ms | | std-hash | native | 2 | 438.653ms | 438.653ms | 435.414ms | 441.891ms | | splitmix-hash | prepare | 4 | 252.147ms | 250.242ms | 243.913ms | 264.189ms | | splitmix-hash | native | 5 | 50.829ms | 49.904ms | 49.156ms | 55.716ms | | sorted-merge | prepare | 5 | 130.853ms | 130.469ms | 129.970ms | 132.748ms | | sorted-merge | native | 5 | 6.016ms | 5.942ms | 5.877ms | 6.351ms |