
Your Final Interview Question: What does this picture mean? (If you know you know)
Random Number Generation
Encryption algorithms require randomness, and computers suck at generating random numbers.
Cloudflare is a web infrastructure company offering CDN, DNS, and security services to millions of websites. Predictable systems are vulnerable to attack, so it needs a good way to achieve True Random Number Generation.
The motion inside a lava lamp is chaotic and effectively impossible to predict, so a camera photographs the lamps at regular intervals and the pixel values of each image are converted into a string that can then be used as a seed to generate secure encryption keys.
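A minimal sketch of the "pixels to seed" idea, assuming the image arrives as raw bytes and that hashing it with SHA-256 is an acceptable way to condense it. The function name seed_from_pixels and the hand-written frames are hypothetical; this is not Cloudflare's actual pipeline.

```python
import hashlib


def seed_from_pixels(pixels: bytes) -> int:
    """Hash raw image bytes into a 256-bit integer seed (illustrative only)."""
    digest = hashlib.sha256(pixels).digest()
    return int.from_bytes(digest, "big")


# Stand-ins for two camera frames; a real system would read actual sensor data.
frame_a = bytes([12, 200, 45, 90, 33, 7])
frame_b = bytes([12, 200, 45, 90, 33, 8])  # a single pixel value differs by 1

seed_a = seed_from_pixels(frame_a)
seed_b = seed_from_pixels(frame_b)
print(seed_a != seed_b)  # a tiny change in the image gives a different seed
```

Hashing is used here so that every pixel influences the whole seed; the unpredictability still has to come from the image itself, not from the hash.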
What are some good reasons for needing to generate random numbers in the field of data science and machine learning?
- Initializing Parameters: in ML and DL, algorithms are often initialized with random weights, biases and parameters.
- Monte Carlo Simulation: requires generating observations from various probability distributions.
- Random Sampling: either simulated random sampling, or generating random numbers to aid real random sampling (e.g. I will sample from the n-th floor, where n is a randomly generated number).
- Stochastic Optimization: Many optimization algorithms used in ML, such as Stochastic Gradient Descent (SGD), rely on randomness.
- Exploration in Reinforcement Learning: can be used to help the model explore random strategies/states/actions, resulting in more robust learning.
- Regularization: techniques such as ‘dropout’ randomly deactivate units when training neural networks.
- Simulating Random Processes: related to Monte Carlo Simulation.
- Ensuring Reproducibility: By setting a ‘seed’, we can debug and compare the performance of ML algorithms.
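Two of the uses above, Monte Carlo simulation and seeding for reproducibility, can be sketched together. This is an illustrative example rather than a recommended estimator; the function name estimate_pi and the sample size are assumptions made for the demo.

```python
import math
import random


def estimate_pi(n_samples: int, seed: int) -> float:
    """Monte Carlo estimate of pi from points drawn uniformly in the unit square."""
    rng = random.Random(seed)  # private generator, so the global state is untouched
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point falls inside the quarter circle
            inside += 1
    return 4.0 * inside / n_samples


first = estimate_pi(100_000, seed=42)
second = estimate_pi(100_000, seed=42)
print(first == second)              # same seed, same estimate
print(abs(first - math.pi) < 0.05)  # the estimate lands close to pi
```

Fixing the seed is what makes the run repeatable, which is the property that helps when debugging or comparing ML experiments.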
What are the different types of random number generation?
True Random Number Generation (TRNG)
This involves measuring truly random natural phenomena, such as the chaotic motion of the lava lamps described above.
How does random.randint work behind the scenes?
Python’s random.randint is not a generator in its own right; it is a function built on top of a Pseudo-Random Number Generator (PRNG). Let’s take a closer look at how it typically works behind the scenes:
- Seed Initialization:
  - PRNGs start with an initial value known as a seed. In Python, if you don’t explicitly set a seed, the generator is seeded from an operating-system randomness source when one is available, and from the system time otherwise. This seed is the starting point for generating a sequence of numbers that appears random.
- Algorithm for Generating Numbers:
  - Once the seed is set, randint uses a specific algorithm to generate a sequence of numbers. Python’s random module, which includes randint, uses the Mersenne Twister algorithm, a widely used and well-regarded PRNG. The Mersenne Twister produces numbers that pass many tests for statistical randomness.
  - These algorithms apply a series of mathematical operations (like multiplication, division, addition, modulo, bitwise operations) to an internal state, which is derived from the seed and updated on every draw, to produce the next number in the sequence.
- Generating a Specific Range:
  - randint(a, b) in Python generates a random integer N such that a <= N <= b. The underlying PRNG algorithm generates a random number, which is then mapped onto the specified range [a, b].
  - This mapping is designed to keep the distribution uniform, meaning each number in the range [a, b] has an equal probability of being selected.
- Handling the State:
  - Each call to randint updates the state of the PRNG. This means that one call to randint affects the result of the next call. The sequence is deterministic; given the same initial seed and the same sequence of calls, randint will produce the same sequence of numbers.
- Considerations of Randomness:
  - It’s important to note that the numbers generated by randint are pseudo-random. They are not truly random because they are generated by a deterministic process. However, for many applications, pseudo-random numbers are sufficient.
  - For cryptographic purposes or applications where true randomness is required, a Cryptographically Secure Pseudo-Random Number Generator (CSPRNG) or a True Random Number Generator (TRNG) should be used instead.
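The steps above (seed, state update, mapping to a range) can be sketched with a toy generator. This is a linear congruential generator, a much simpler algorithm swapped in for illustration; it is not the Mersenne Twister and not how Python implements randint. The class name ToyLCG is hypothetical, and the constants are the widely cited Numerical Recipes values.

```python
class ToyLCG:
    """Toy linear congruential generator; NOT the algorithm Python's random uses."""

    MODULUS = 2**32
    MULTIPLIER = 1664525     # Numerical Recipes constants
    INCREMENT = 1013904223

    def __init__(self, seed: int) -> None:
        # Seed initialization: the seed becomes the starting state.
        self.state = seed % self.MODULUS

    def next_raw(self) -> int:
        # The new state is computed purely from the previous state.
        self.state = (self.MULTIPLIER * self.state + self.INCREMENT) % self.MODULUS
        return self.state

    def randint(self, a: int, b: int) -> int:
        """Return an integer N with a <= N <= b."""
        span = b - a + 1
        if not 0 < span <= self.MODULUS:
            raise ValueError("range must be non-empty and fit in 32 bits")
        # Reject raw values above the largest multiple of span, so that every
        # result in [a, b] is produced by the same number of raw values.
        limit = self.MODULUS - (self.MODULUS % span)
        raw = self.next_raw()
        while raw >= limit:
            raw = self.next_raw()
        return a + raw % span


gen_one = ToyLCG(seed=123)
gen_two = ToyLCG(seed=123)
rolls_one = [gen_one.randint(1, 6) for _ in range(20)]
rolls_two = [gen_two.randint(1, 6) for _ in range(20)]
print(rolls_one == rolls_two)  # identical seeds give identical sequences
```

The rejection loop is one simple way to keep the range mapping uniform; a plain modulo on its own would slightly favour some values whenever the range does not divide the modulus evenly.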
In summary, randint works by using a deterministic algorithm (like the Mersenne Twister) to generate a sequence of numbers that appear random, starting from an initial seed. The algorithm ensures that these numbers are uniformly distributed across the specified range.
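The same behaviour can be checked with the standard library itself. A short sketch, assuming nothing beyond the random and secrets modules; the seed value 2024 and the variable names are arbitrary.

```python
import random
import secrets

# Two independent generators with the same seed produce the same sequence.
rng_a = random.Random(2024)
rng_b = random.Random(2024)
rolls_a = [rng_a.randint(1, 6) for _ in range(10)]
rolls_b = [rng_b.randint(1, 6) for _ in range(10)]
print(rolls_a == rolls_b)

# Saving and restoring the internal state replays the sequence from that point.
saved_state = rng_a.getstate()
next_three = [rng_a.randint(1, 6) for _ in range(3)]
rng_a.setstate(saved_state)
replayed = [rng_a.randint(1, 6) for _ in range(3)]
print(next_three == replayed)

# For security-sensitive values, the secrets module draws from the operating
# system's CSPRNG and offers no way to set a seed.
secure_roll = secrets.randbelow(6) + 1  # integer in [1, 6]
```

Using separate random.Random instances, rather than the module-level functions, keeps each experiment's random stream isolated from other code that may also draw random numbers.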
How do computers typically generate observations on different probability distributions?