Your Final Interview Question: What does this picture mean? (If you know you know)

Random Number Generation

Encryption algorithms require randomness, and computers suck at generating random numbers.

Cloudflare is a DNS provider, which provides security to millions of websites. Predictable systems are vulnerable to attack, so they need a good way to achieve True Random Number Generation.

Lava lamps are inherently chaotic, so a camera takes photos at regular intervals and converts the pixels in the image into a string that can be then used as a seed to generate secure encryption keys.



What are some good reasons for needing to generate random numbers in the field of data science and machine learning?
  1. Initializing Parameters: in ML and DL, algorithms are often initialized with random weights, biases and parameters.
  2. Monte Carlo Simulation: need to generate observations on various probability distributions
  3. Random Sampling: either simulated random sampling, or generating random numbers to aid real random sampling (e.g I will sample from the th floor where is a randomly generated number.)
  4. Stochastic Optimization: Many optimization algorithms used in ML, such as Stochastic Gradient Descent (SGD), rely on randomness.
  5. Exploration in Reinforcement Learning: can be used to help the model explore random strategies/states/actions, resulting in more robust learning.
  6. Regularization: Such as ‘dropout’ when training neural networks.
  7. Simulating Random Processes: related to Monte Carlo Simulation.
  8. Ensuring Reproducibility: By setting a ‘seed’, we can debug and compare the performance of ML algorithms.


What are the different types of random number generation?

True Random Number Generation (TRNG)

This involves using truly random natural phenomenon

True Random Number Generation (TRNG)

This involves using truly random natural phenomenon



How does random.randint work behind the scenes?

The randint function, commonly found in programming languages like Python, is an implementation of a Pseudo-Random Number Generator (PRNG). Let’s take a closer look at how it typically works behind the scenes, using Python’s randint as an example:

  1. Seed Initialization:

    • PRNGs start with an initial value known as a seed. In Python, if you don’t explicitly set a seed, the system time or another source of randomness is used to generate the seed. This seed is the starting point for generating a sequence of numbers that appears random.
  2. Algorithm for Generating Numbers:

    • Once the seed is set, randint uses a specific algorithm to generate a sequence of numbers. Python’s random module, which includes randint, historically used the Mersenne Twister algorithm, a widely used and well-regarded PRNG. The Mersenne Twister produces numbers that pass many tests for statistical randomness.
    • These algorithms involve a series of mathematical operations (like multiplication, division, addition, modulo, bitwise operations) on the seed or the last generated number to produce the next number in the sequence.
  3. Generating a Specific Range:

    • randint(a, b) in Python generates a random integer N such that a <= N <= b. The underlying PRNG algorithm generates a random number, which is then scaled or transformed to fall within the specified range [a, b].
    • This transformation is done in a way that tries to maintain uniform distribution, meaning each number in the range [a, b] has an equal probability of being selected.
  4. Handling the State:

    • Each call to randint updates the state of the PRNG. This means that the output of one call to randint affects the next call. This sequence is deterministic; given the same initial seed and the same sequence of calls, randint will produce the same sequence of numbers.
  5. Considerations of Randomness:

    • It’s important to note that the numbers generated by randint are pseudo-random. They are not truly random because they are generated by a deterministic process. However, for many applications, pseudo-random numbers are sufficient.
    • For cryptographic purposes or applications where true randomness is required, a Cryptographically Secure Pseudo-Random Number Generator (CSPRNG) or a True Random Number Generator (TRNG) should be used instead.

In summary, randint works by using a deterministic algorithm (like the Mersenne Twister) to generate a sequence of numbers that appear random, starting from an initial seed. The algorithm ensures that these numbers are uniformly distributed across the specified range.



How do computers typically generate observations on different probability distributions?