Intersection Benchmark
This project benchmarks several set-intersection strategies in Rust over the same generated input scenarios.
The benchmark output is split into two timed phases:
-
prepare
- Measures the conversion from the benchmark's raw input format, a normal array of numbers, into the algorithm's prepared internal representation.
- This phase does not measure the later intersection itself.
- By default, raw input generation is not included in this time.
- If
TIME_PREPARE_INCLUDE_INPUT_GENERATION=true in .env, raw input generation is included too.
-
native
- Measures only the intersection step on already prepared inputs.
- The result is written into the algorithm's native output representation.
- This phase does not measure preparation from the raw input arrays.
- This phase does not measure converting the native result into a plain array of numbers.
- By default, output clearing and result counting are not included in this time.
- If enabled in
.env, TIME_INTERSECTION_INCLUDE_OUTPUT_CLEAR and TIME_INTERSECTION_INCLUDE_RESULT_COUNT move those extra steps into the timed window.
Important notes about the timing model:
- Output storage is created before timed
native samples begin and then reused across samples.
- Warmup runs are performed before measured samples and are not included in the reported timings.
- Printing, formatting, statistics aggregation, and scenario planning are not part of the reported algorithm timings.
In short:
prepare answers: how long does it take to build the algorithm's working representation?
native answers: how long does it take to compute the intersection into the algorithm's own output format?
Intersection benchmark suite
- Scenarios: 8
- Universe:
0..=100000000 (100000001 values)
- Set populations: sparse=
4000 (0.0040%) | semi-sparse=40000 (0.0400%) | normal=400000 (0.4000%) | dense=4000000 (4.000%)
- Overlap targets: low=10.0% | medium=50.0% | high=80.0%
- Enabled densities: sparse=
true | semi-sparse=true | normal=true | dense=true
- Enabled overlaps: low=
false | medium=true | high=false
- Sampling: min=
2 | max=5 | target total=800ms
- Enabled algorithms: bitset=
true | bitset-simd=false | std-hash=true | splitmix-hash=true | sorted-merge=true
- Enabled phases: prepare=
true | intersection=true
- Timed extras: prepare input generation=
false | intersection output clear=false | intersection result count=false
- Phases per algorithm: prepare=
true | native output intersection=true
Scenario: ordered input | sparse set population = 0.0040% of universe | medium overlap percentage = 50.0% of each set
- Set population:
4000 / 100000001 values
- Overlap: requested=50.0% | actual=50.0% | shared=
2000/4000
| algorithm |
phase |
samples |
mean |
median |
min |
max |
| bitset |
prepare |
5 |
411.335us |
401.487us |
393.202us |
444.185us |
| bitset |
native |
5 |
950.881us |
940.646us |
840.562us |
1.065ms |
| std-hash |
prepare |
5 |
80.875us |
80.946us |
79.932us |
81.558us |
| std-hash |
native |
5 |
76.875us |
75.985us |
74.445us |
81.305us |
| splitmix-hash |
prepare |
5 |
36.794us |
36.825us |
36.478us |
37.109us |
| splitmix-hash |
native |
5 |
35.065us |
33.227us |
30.197us |
43.628us |
| sorted-merge |
prepare |
5 |
350ns |
303ns |
295ns |
548ns |
| sorted-merge |
native |
5 |
3.568us |
3.541us |
3.533us |
3.646us |
Scenario: ordered input | semi-sparse set population = 0.0400% of universe | medium overlap percentage = 50.0% of each set
- Set population:
40000 / 100000001 values
- Overlap: requested=50.0% | actual=50.0% | shared=
20000/40000
| algorithm |
phase |
samples |
mean |
median |
min |
max |
| bitset |
prepare |
5 |
3.955ms |
3.951ms |
3.927ms |
4.014ms |
| bitset |
native |
5 |
1.428ms |
1.449ms |
1.356ms |
1.507ms |
| std-hash |
prepare |
5 |
864.919us |
860.986us |
854.337us |
880.726us |
| std-hash |
native |
5 |
905.199us |
902.901us |
898.793us |
913.503us |
| splitmix-hash |
prepare |
5 |
413.275us |
410.172us |
408.810us |
423.680us |
| splitmix-hash |
native |
5 |
469.869us |
467.963us |
465.287us |
477.299us |
| sorted-merge |
prepare |
5 |
6.172us |
6.124us |
5.921us |
6.359us |
| sorted-merge |
native |
5 |
36.202us |
36.045us |
36.023us |
36.815us |
Scenario: ordered input | normal set population = 0.4000% of universe | medium overlap percentage = 50.0% of each set
- Set population:
400000 / 100000001 values
- Overlap: requested=50.0% | actual=50.0% | shared=
200000/400000
| algorithm |
phase |
samples |
mean |
median |
min |
max |
| bitset |
prepare |
5 |
7.272ms |
7.200ms |
6.820ms |
7.893ms |
| bitset |
native |
5 |
1.830ms |
1.821ms |
1.815ms |
1.868ms |
| std-hash |
prepare |
5 |
11.484ms |
11.262ms |
10.767ms |
12.441ms |
| std-hash |
native |
5 |
15.501ms |
15.262ms |
14.570ms |
17.524ms |
| splitmix-hash |
prepare |
5 |
5.993ms |
6.124ms |
5.715ms |
6.193ms |
| splitmix-hash |
native |
5 |
5.363ms |
5.381ms |
5.333ms |
5.383ms |
| sorted-merge |
prepare |
5 |
309.850us |
265.482us |
248.158us |
507.857us |
| sorted-merge |
native |
5 |
567.072us |
543.143us |
527.883us |
649.634us |
Scenario: ordered input | dense set population = 4.000% of universe | medium overlap percentage = 50.0% of each set
- Set population:
4000000 / 100000001 values
- Overlap: requested=50.0% | actual=50.0% | shared=
2000000/4000000
| algorithm |
phase |
samples |
mean |
median |
min |
max |
| bitset |
prepare |
5 |
14.176ms |
13.606ms |
12.950ms |
17.112ms |
| bitset |
native |
5 |
1.765ms |
1.731ms |
1.674ms |
1.860ms |
| std-hash |
prepare |
2 |
441.875ms |
441.875ms |
436.380ms |
447.370ms |
| std-hash |
native |
2 |
445.366ms |
445.366ms |
442.813ms |
447.919ms |
| splitmix-hash |
prepare |
4 |
243.982ms |
239.375ms |
237.612ms |
259.564ms |
| splitmix-hash |
native |
5 |
51.967ms |
49.131ms |
48.321ms |
56.781ms |
| sorted-merge |
prepare |
5 |
11.852ms |
11.638ms |
11.416ms |
12.418ms |
| sorted-merge |
native |
5 |
5.537ms |
5.530ms |
5.517ms |
5.582ms |
Scenario: unordered input | sparse set population = 0.0040% of universe | medium overlap percentage = 50.0% of each set
- Set population:
4000 / 100000001 values
- Overlap: requested=50.0% | actual=50.0% | shared=
2000/4000
| algorithm |
phase |
samples |
mean |
median |
min |
max |
| bitset |
prepare |
5 |
2.264ms |
882.303us |
845.089us |
7.840ms |
| bitset |
native |
5 |
1.776ms |
1.778ms |
1.695ms |
1.861ms |
| std-hash |
prepare |
5 |
83.110us |
80.293us |
79.781us |
93.585us |
| std-hash |
native |
5 |
77.430us |
77.762us |
74.848us |
79.988us |
| splitmix-hash |
prepare |
5 |
36.017us |
35.957us |
35.943us |
36.129us |
| splitmix-hash |
native |
5 |
33.200us |
31.326us |
28.101us |
42.169us |
| sorted-merge |
prepare |
5 |
61.613us |
60.791us |
55.215us |
69.401us |
| sorted-merge |
native |
5 |
3.617us |
3.533us |
3.528us |
3.943us |
Scenario: unordered input | semi-sparse set population = 0.0400% of universe | medium overlap percentage = 50.0% of each set
- Set population:
40000 / 100000001 values
- Overlap: requested=50.0% | actual=50.0% | shared=
20000/40000
| algorithm |
phase |
samples |
mean |
median |
min |
max |
| bitset |
prepare |
5 |
2.221ms |
1.596ms |
1.463ms |
3.909ms |
| bitset |
native |
5 |
1.770ms |
1.761ms |
1.715ms |
1.829ms |
| std-hash |
prepare |
5 |
882.778us |
869.598us |
865.722us |
910.316us |
| std-hash |
native |
5 |
917.268us |
915.333us |
900.514us |
935.037us |
| splitmix-hash |
prepare |
5 |
417.845us |
420.083us |
411.847us |
422.302us |
| splitmix-hash |
native |
5 |
475.443us |
473.060us |
466.552us |
486.611us |
| sorted-merge |
prepare |
5 |
866.901us |
867.193us |
857.034us |
877.401us |
| sorted-merge |
native |
5 |
49.398us |
48.383us |
48.283us |
53.209us |
Scenario: unordered input | normal set population = 0.4000% of universe | medium overlap percentage = 50.0% of each set
- Set population:
400000 / 100000001 values
- Overlap: requested=50.0% | actual=50.0% | shared=
200000/400000
| algorithm |
phase |
samples |
mean |
median |
min |
max |
| bitset |
prepare |
5 |
4.712ms |
4.803ms |
4.476ms |
4.920ms |
| bitset |
native |
5 |
1.759ms |
1.756ms |
1.687ms |
1.844ms |
| std-hash |
prepare |
5 |
10.839ms |
10.826ms |
10.524ms |
11.178ms |
| std-hash |
native |
5 |
15.163ms |
14.614ms |
14.440ms |
17.563ms |
| splitmix-hash |
prepare |
5 |
5.632ms |
5.632ms |
5.585ms |
5.679ms |
| splitmix-hash |
native |
5 |
5.483ms |
5.432ms |
5.233ms |
5.969ms |
| sorted-merge |
prepare |
5 |
10.528ms |
10.457ms |
10.445ms |
10.814ms |
| sorted-merge |
native |
5 |
544.184us |
534.490us |
530.891us |
581.634us |
Scenario: unordered input | dense set population = 4.000% of universe | medium overlap percentage = 50.0% of each set
- Set population:
4000000 / 100000001 values
- Overlap: requested=50.0% | actual=50.0% | shared=
2000000/4000000
| algorithm |
phase |
samples |
mean |
median |
min |
max |
| bitset |
prepare |
5 |
58.153ms |
57.261ms |
53.448ms |
63.278ms |
| bitset |
native |
5 |
1.805ms |
1.782ms |
1.652ms |
1.985ms |
| std-hash |
prepare |
2 |
461.857ms |
461.857ms |
444.440ms |
479.274ms |
| std-hash |
native |
2 |
438.653ms |
438.653ms |
435.414ms |
441.891ms |
| splitmix-hash |
prepare |
4 |
252.147ms |
250.242ms |
243.913ms |
264.189ms |
| splitmix-hash |
native |
5 |
50.829ms |
49.904ms |
49.156ms |
55.716ms |
| sorted-merge |
prepare |
5 |
130.853ms |
130.469ms |
129.970ms |
132.748ms |
| sorted-merge |
native |
5 |
6.016ms |
5.942ms |
5.877ms |
6.351ms |