Watch Algorithm R (Vitter, 1985) and Algorithm L pick k uniformly random items from a stream of unknown length. Each item ends up in the reservoir with probability exactly k/n โ verify it with a Monte Carlo run.
After a Monte Carlo run, every item j โ [1, n] should appear in the reservoir k/n of the time. The bars below show empirical inclusion frequency per stream position. A flat line at k/n means the algorithm is unbiased.
Algorithm R โ fill the reservoir with the first k items. For each subsequent item i (1-indexed, i > k), pick an integer j โ [1, i] uniformly. If j โค k, replace slot j. This gives each item probability k/n of being in the final sample.
Algorithm L โ instead of testing every item, draw a geometric "skip" count floor(log(random()) / log(1-W)) where W starts at 1 and gets multiplied by exp(log(random())/k) after each replacement. Same uniformity guarantee, far fewer random draws on huge streams.
P[item i in sample] = k/n for all i โ [1, n]