Here, we discuss sampling with and without replacement in R, covering both equal-probability sampling and weighted sampling with probabilities.

Functions for Sampling in R

Function       Usage
sample()       Sample elements from a set of items
sample.int()   Sample integers

Both functions are part of R's base package, so no installation is needed.

See probability distributions for sampling from distributions.

1 Sampling Numbers without Replacement

To sample numbers, use the sample() or sample.int() function.

Sample 4 numbers from 1 to 10.

sample(1:10, 4)
[1]  3 10  2  8

Or:

sample.int(10, 4)
[1]  3 10  2  8
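Because the draws are random, the output above will differ each time you run the code. The sketch below makes the results reproducible with set.seed(). The seed value 123 is arbitrary, and the names a and b are just illustrative. With the same seed, sample(1:10, 4) and sample.int(10, 4) return the same integers, because sample() on a vector longer than one element indexes that vector with sample.int().

```r
# Fix the random number generator state so the draws are reproducible
set.seed(123)
a <- sample(1:10, 4)

# Reset to the same seed and draw with sample.int() instead
set.seed(123)
b <- sample.int(10, 4)

print(a)
print(b)

# Same seed, same draws; without replacement no number repeats
identical(a, b)          # TRUE
anyDuplicated(a) == 0    # TRUE
```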

2 Sampling Numbers with Replacement

Sample 12 numbers from 1 to 5 with replacement. Because each drawn number is put back before the next draw, the sample size can be larger than the number of values you are sampling from.

sample(1:5, 12, replace = TRUE)
 [1] 3 3 2 2 3 5 4 1 2 3 5 3

Or:

sample.int(5, 12, replace = TRUE)
 [1] 3 3 2 2 3 5 4 1 2 3 5 3
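A quick way to inspect a sample drawn with replacement is to count how often each value occurs. In this sketch the seed is arbitrary and draws is an illustrative name. Wrapping the draws in factor() with levels = 1:5 keeps values that were never drawn in the table with a count of 0.

```r
set.seed(1)
draws <- sample(1:5, 12, replace = TRUE)

# 12 draws from only 5 possible values, so some values must repeat
length(draws)            # 12
any(duplicated(draws))   # TRUE

# Count each value; factor() keeps values drawn 0 times in the table
table(factor(draws, levels = 1:5))
```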

3 Sampling a Set without Replacement

To sample elements from a set of items without replacement, use the sample() function.

Sample 3 items from the set {"A", "B", "C", "D", "E", "F"}:

set = c("A", "B", "C", "D", "E", "F")
sample(set, 3)
[1] "C" "F" "B"
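If the size argument is left out, sample() returns every element of the set in random order, that is, a random permutation. A short sketch, where the seed and the name shuffled are arbitrary choices:

```r
set <- c("A", "B", "C", "D", "E", "F")

set.seed(2024)
shuffled <- sample(set)   # size defaults to length(set)
print(shuffled)

# Same elements as the original set, in a random order
setequal(shuffled, set)   # TRUE
length(shuffled)          # 6
```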

Sample 3 numbers from the numbers set {2, 4, 6, 8, 10}:

nums = c(2, 4, 6, 8, 10)
sample(nums, 3)
[1]  6  4 10
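One caveat described on R's help page for sample(): if the first argument is a single number x with x >= 1, sample() draws from 1:x instead of returning x itself. This can surprise you when the set is built programmatically and ends up with only one element. The examples section of ?sample suggests a small wrapper, resample(), which is not part of base R and is defined here, that always samples from the elements themselves. The filter nums > 8 and the seed are illustrative.

```r
# Wrapper from the examples in ?sample: always index into x,
# never treat a single number as the range 1:x
resample <- function(x, ...) x[sample.int(length(x), ...)]

nums <- c(2, 4, 6, 8, 10)
kept <- nums[nums > 8]   # only one element survives the filter: 10

set.seed(7)
sample(kept, 1)    # draws from 1:10, not from kept
resample(kept, 1)  # always returns 10
```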

4 Sampling a Set with Replacement

To sample with replacement, set the replace argument to TRUE.

Sample 8 items from the set {"A", "E", "I", "O", "U"} with replacement:

set = c("A", "E", "I", "O", "U")
sample(set, 8, replace = TRUE)
[1] "O" "E" "U" "O" "A" "U" "O" "E"
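When prob is not supplied, every element has the same chance of being picked on each draw. The sketch below checks this empirically, assuming a large number of draws (100,000) so that each vowel's observed proportion settles near 1/5 = 0.2. The seed and variable names are arbitrary.

```r
set <- c("A", "E", "I", "O", "U")

set.seed(10)
draws <- sample(set, 1e5, replace = TRUE)

# Proportion of draws for each vowel; each should be close to 0.2
props <- table(factor(draws, levels = set)) / length(draws)
print(round(props, 3))
```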

Sample 8 items from the numbers set {4, 5, 6, 7, 8} with replacement:

sample(4:8, 8, replace = TRUE)
[1] 7 5 8 7 4 8 7 5
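Since 4:8 is simply the integers 1 to 5 shifted up by 3, the same draws can be produced with sample.int() plus an offset. The sketch below shows the equivalence under a fixed, arbitrary seed. It relies on sample() indexing its input with sample.int() when the input has more than one element.

```r
set.seed(99)
a <- sample(4:8, 8, replace = TRUE)

set.seed(99)
b <- 3L + sample.int(5, 8, replace = TRUE)  # shift 1..5 up to 4..8

identical(a, b)   # TRUE
```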

5 Weighted Sampling with Probabilities

To draw a weighted sample, pass a vector of probabilities to the prob argument, with one probability per element of the set you are sampling from (so it must have the same length as the set).

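The probabilities do not need to sum to 1. They are treated as weights and rescaled internally, although they must be non-negative and not all zero. For example, c(7, 2, 1, 0) is a valid prob vector for a set of size 4 and has the same effect as c(0.7, 0.2, 0.1, 0).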
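To see that only the relative sizes of the weights matter, this sketch draws with c(7, 2, 1, 0) and with the same weights doubled, c(14, 4, 2, 0), from the same arbitrary seed. Because R rescales the weights to sum to 1 before sampling, both calls produce identical draws. The doubled vector is only an illustration. Other positive multiples behave the same way up to floating-point rounding.

```r
x <- c(2, 4, 6, 8)

set.seed(5)
a <- sample(x, 20, replace = TRUE, prob = c(7, 2, 1, 0))

set.seed(5)
b <- sample(x, 20, replace = TRUE, prob = c(14, 4, 2, 0))  # same weights, doubled

identical(a, b)   # TRUE: weights are rescaled to sum to 1
8 %in% a          # FALSE: weight 0 means never selected
```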

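For weighted sampling you will usually want to set the replace argument to TRUE. Each draw then uses the full set of weights, so the proportions in a large sample track the weights, and you can draw a sample larger than the set you are sampling from. Without replacement, the weights are applied sequentially: once an item is drawn, the next item is chosen in proportion to the weights of the items that remain.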
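Weighted sampling without replacement is also allowed in R. Each item can appear at most once, so a sample as large as the set is a weighted random ordering, and asking for more items than the set holds is an error. In this sketch, the seed, the 10,000 repetitions and the variable names are arbitrary choices. The first position of the ordering uses the full weights, so "A" should come first roughly 70% of the time.

```r
set <- c("A", "B", "C")
w <- c(0.7, 0.2, 0.1)

# A weighted random ordering of the whole set
set.seed(3)
weighted_order <- sample(set, 3, prob = w)
print(weighted_order)

# Share of orderings that start with "A"; should be close to 0.7
firsts <- replicate(10000, sample(set, 3, prob = w)[1])
mean(firsts == "A")

# Requesting more items than the set holds fails without replacement
msg <- tryCatch(sample(set, 5, prob = w), error = function(e) conditionMessage(e))
print(msg)
```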

Sample 15 items from the set {"A", "B", "C"} with different weights or probabilities:

set = c("A", "B", "C")
sample(set, 15, replace = TRUE, prob = c(0.7, 0.2, 0.1))
 [1] "A" "B" "A" "B" "C" "A" "A" "B" "A" "A" "C" "A" "A" "A" "A"
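With only 15 draws, the observed shares can differ noticeably from the weights. This sketch assumes a much larger sample (100,000 draws, arbitrary seed) and compares the observed proportion of each letter with the probabilities passed to prob.

```r
set <- c("A", "B", "C")
w <- c(0.7, 0.2, 0.1)

set.seed(11)
draws <- sample(set, 1e5, replace = TRUE, prob = w)

# Observed share of each letter; should be close to 0.7, 0.2 and 0.1
props <- as.numeric(table(factor(draws, levels = set))) / length(draws)
print(round(props, 3))
```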

Sample 20 numbers from the numbers set {2, 4, 6, 8} with different weights or probabilities:

set = c(2, 4, 6, 8)
sample(set, 20, replace = TRUE, prob = c(7, 2, 1, 0))
 [1] 2 4 2 4 6 2 2 4 2 2 6 2 2 2 2 4 2 2 2 6

Because 8 has a weight of 0, it is never sampled. The number 2 has the largest weight (7 out of a total of 10, a probability of 0.7), so it is likely to appear most often.
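The weights translate directly into expected counts: dividing each weight by the total weight (7 + 2 + 1 + 0 = 10) gives its selection probability, and multiplying by the sample size of 20 gives the expected number of times each value appears. A small sketch of that arithmetic, followed by one simulated sample for comparison (seed and variable names are arbitrary):

```r
set <- c(2, 4, 6, 8)
w <- c(7, 2, 1, 0)

# Normalize the weights into selection probabilities: 0.7, 0.2, 0.1, 0
p <- w / sum(w)

# Expected number of times each value appears in 20 draws: 14, 4, 2, 0
expected <- 20 * p
names(expected) <- set
print(expected)

# One simulated sample of 20; factor() keeps the level "8" in the table
set.seed(8)
draws <- sample(set, 20, replace = TRUE, prob = w)
observed <- table(factor(draws, levels = set))
print(observed)
```

A single sample of 20 will usually deviate somewhat from the expected counts; the agreement improves as the sample size grows.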

Copyright © 2020 - 2024. All Rights Reserved by Stats Codes