Reservoir Sampling

Say you have a stream of items of large and unknown length that we can only iterate over once. Create an algorithm that randomly chooses an item from this stream such that each item is equally likely to be selected.

Click through for the algorithm. I imagine this is a common problem. Recently I wanted to do something similar, but I didn't know this technique and it wasn't important that my sample was statistically significant, so I just fudged it.

Previous
Previous

Timelines at Scale

Next
Next

The New Mac Pro