Jes2ica.IO

coding, trading, reading

  1. 1. Problem
  2. 2. Idea
  3. 3. Example
    1. 3.1. Sample size 1
  4. 4. Applications
  5. 5. More info:

Reservoir sampling is a family of randomized algorithms for randomly choosing a sample of k items from a list S containing n items, where n is either a very large or unknown number. Typically n is large enough that the list doesn’t fit into main memory.

Problem

Choose k entries from n numbers. Make sure each number is selected with the probability of k/n

Idea

  • Choose k items and put them into the reservoir.
  • For (k+1)th item, pick it with probability k/(k+1), and randomly replace a number in the reservoir.
  • For (k+i)th item, pick it with probability k/(k+i), and randomly replace a number in the reservoir.
  • Repeat until k+i reaches n.

Example

Sample size 1

  • Choose 1 from [1, 2, 3, 4] => initial reservoir: [1]

  • For 2:

    • with probability 1/2, keep it (discard the old one aka 1)
    • with probability 1 - 1/2 = 1/2, keep the old item (ignore the new one)
    • For 1, the probability that it keeps staying in the reservoir is:
    • P(1 was in the reservoir last time) * P(1 is not replaced by 2)
      • = 1 * (1 - 1/2 * 1)
      • = 1/2
  • For 3:

    • For 1, the probability that it keeps staying in the reservoir is:
    • P(1 was in the reservoir last time) * P(1 is not replaced by 3)
    • P(1 was in the reservoir last time) * (1 - P(3 is selected and replaces 1))
    • = 1/2 * (1 - 1/3 * 1)
    • = 1/2 * 2/3
    • = 1/3
  • Same for 4

Applications

More info:

This article was last updated on days ago, and the information described in the article may have changed.