Did you just say function? You know what that means: we can use a neural network. - IndividualKex

A true understanding of the fundamentals of gradient descent, backpropagation, and neural networks will give you the capacity to easily understand everything else - all that is required is a few more conceptual 'add-ons'.

Read these - there is genuinely no better resource on the internet for learning neural networks. If these don't teach you neural networks, then nothing will: watch them, along with 3B1B.


Rectified

The term "rectified" is used in signal processing and electronics to describe the conversion of alternating current (AC) to direct current (DC), typically by allowing only one polarity of current to pass through. Similarly, a ReLU function clips any negative input to 0, while passing any non-negative input through unchanged.
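A minimal sketch in NumPy (the vectorised form and the example values are my own, just to illustrate the clipping behaviour):

import numpy as np

def relu(x):
    # Clip negative inputs to 0; pass non-negative inputs through unchanged,
    # mirroring a rectifier that passes only one polarity of current.
    return np.maximum(0, x)

relu(np.array([-2.0, -0.5, 0.0, 3.0]))  # -> array([0., 0., 0., 3.])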

Here are some curveball questions you could get asked, which probe just how deeply you understand the neural network. Interviewers won't be impressed if you only know the definitions; they want to see that you understand the 'why' behind them.

Why is it called a rectified linear unit?

Till you get it.

Also known as multi-layer perceptrons (MLPs). Why?


Structure the neural network chapter like 3B1B

Backpropagation

Andrej’s video, 5 minutes in
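To make the chain rule concrete, here is a toy hand-worked sketch of my own (not taken from the video): backpropagation through a single tanh neuron with a squared-error loss, followed by one gradient-descent step with an assumed learning rate.

import numpy as np

# Forward pass: one neuron, y_hat = tanh(w*x + b), loss = (y_hat - y)^2
x, y = 2.0, 1.0           # input and target
w, b = 0.5, -0.1          # parameters
z = w * x + b             # pre-activation
y_hat = np.tanh(z)        # activation
loss = (y_hat - y) ** 2

# Backward pass: chain rule from the loss back to the parameters
dloss_dyhat = 2 * (y_hat - y)
dyhat_dz = 1 - np.tanh(z) ** 2           # derivative of tanh
dloss_dw = dloss_dyhat * dyhat_dz * x    # dz/dw = x
dloss_db = dloss_dyhat * dyhat_dz * 1.0  # dz/db = 1

# One gradient-descent step (learning rate is an arbitrary choice)
lr = 0.1
w -= lr * dloss_dw
b -= lr * dloss_db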

MLP. Become

Neural networks are so simple that a working version takes only around 100 lines of code to implement.

They are ultimately just functions. 

A neural network can be written down as a symbolic representation: a single closed-form expression.
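As a sketch of the "just a function" point: a two-layer MLP is literally one nested expression. The sizes, seed, and ReLU choice below are arbitrary assumptions for illustration.

import numpy as np

def mlp(x, W1, b1, W2, b2):
    # output = W2 @ relu(W1 @ x + b1) + b2 -- one closed-form expression
    h = np.maximum(0, W1 @ x + b1)   # hidden layer with ReLU
    return W2 @ h + b2               # linear output layer

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # 3 inputs -> 4 hidden units
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # 4 hidden units -> 2 outputs
mlp(np.array([1.0, -0.5, 2.0]), W1, b1, W2, b2)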

Name some hyperparameters in a neural network (a sketch of where these appear follows the list).
  • Learning rate
  • Number of epochs
  • Batch size
  • Number of layers
  • Number of neurons in each layer
  • Type of activation function
  • Dropout rate
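A hedged sketch of how these knobs might be collected in practice; the names and values below are illustrative assumptions, not recommendations.

# Illustrative hyperparameter settings for a small classifier (assumed values)
hyperparameters = {
    "learning_rate": 1e-3,          # step size for gradient descent
    "num_epochs": 20,               # full passes over the training set
    "batch_size": 64,               # examples per gradient update
    "num_layers": 3,                # number of hidden layers
    "hidden_units": [128, 64, 32],  # neurons in each hidden layer
    "activation": "relu",           # type of activation function
    "dropout_rate": 0.2,            # fraction of units dropped during training
}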
In later chapters we'll find better ways of initializing the weights and biases.
The reason, of course, is the ability of deep nets to build up a complex hierarchy of concepts. It’s a bit like the way conventional programming languages use modular design and ideas about abstraction to enable the creation of complex computer programs. Comparing a deep network to a shallow network is a bit like comparing a programming language with the ability to make function calls to a stripped down language with no ability to make such calls. Abstraction takes a different form in neural networks than it does in conventional programming, but it’s just as important.
Implement a softmax function in Python that takes an np.array.

The softmax function is a mathematical function that converts a vector of numbers into a vector of probabilities, where the probability of each value is proportional to the relative scale of that value in the vector. The softmax function is often used in the final layer of a neural-network-based classifier. The standard formula for the softmax function for a vector $\mathbf{z} = [z_1, z_2, \ldots, z_K]$ is:

$$\text{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$

Where:

  • $e^{z_i}$ is the exponential function applied to the $i$-th element of the input vector.
  • $K$ is the total number of elements in the input vector.
  • The denominator is the sum of the exponentials of all elements in the input vector. This ensures that the output probabilities sum to 1.

The implementation in NumPy:

import numpy as np

def softmax(x):
    # Subtract the row-wise max for numerical stability before exponentiating
    e_x = np.exp(x - np.max(x, axis=1, keepdims=True))
    return e_x / e_x.sum(axis=1, keepdims=True)

This implementation does a couple of things:

  1. np.exp(x - np.max(x, axis=1, keepdims=True)): It subtracts each row's maximum from x before applying the exponential function. This is a common trick to improve numerical stability (avoiding very large numbers that can cause overflow).
  2. e_x.sum(axis=1, keepdims=True): It sums the exponentials along axis=1 (across each row). The keepdims=True argument keeps the same dimensionality, which is important for correct broadcasting when dividing by this sum.

The result is that each element in the input vector is transformed into a value between 0 and 1, and the entire output vector sums to 1, making it a probability distribution.
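A quick usage check of the function defined above, on an assumed 2x3 batch of logits:

import numpy as np

logits = np.array([[1.0, 2.0, 3.0],
                   [1.0, 1.0, 1.0]])
probs = softmax(logits)
print(probs)               # each row is a probability distribution
print(probs.sum(axis=1))   # -> [1. 1.]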

Universality Theorem

Michael Nielsen

Universality theorems are a commonplace in computer science, so much so that we sometimes forget how astonishing they are. But it's worth reminding ourselves: the ability to compute an arbitrary function is truly remarkable. Almost any process you can imagine can be thought of as function computation. Consider the problem of naming a piece of music based on a short sample of the piece. That can be thought of as computing a function. Or consider the problem of translating a Chinese text into English. Again, that can be thought of as computing a function (actually, computing one of many functions, since there are often many acceptable translations of a given piece of text). Or consider the problem of taking an mp4 movie file and generating a description of the plot of the movie, and a discussion of the quality of the acting. Again, that can be thought of as a kind of function computation (ditto the remark about translation and there being many possible functions). Universality means that, in principle, neural networks can do all these things and many more.

It's a pity that universality is so rarely explained, since the underlying reasons for it are simple and beautiful.
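As a rough empirical taste of the theorem (a toy setup of my own, not Nielsen's construction): a single hidden layer of random tanh units, with only the output weights fitted by least squares, already approximates a smooth 1-D function well.

import numpy as np

# Target function to approximate on [-3, 3]
x = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(x).ravel()

# One hidden layer of 50 tanh units with random, fixed input weights
rng = np.random.default_rng(0)
W, b = rng.normal(size=(1, 50)), rng.normal(size=50)
H = np.tanh(x @ W + b)                  # hidden activations, shape (200, 50)

# Fit only the output weights by least squares
w_out, *_ = np.linalg.lstsq(H, y, rcond=None)
y_hat = H @ w_out

print(np.max(np.abs(y_hat - y)))        # typically a small approximation error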


Why use deep neural networks when the universal approximation theorem tells us that you can approximate any function using only one hidden layer?
Why use any other model when neural network learning exists?

Interpretability

Many traditional models like decision trees or linear regression are more interpretable than neural networks. In industries where understanding and explaining the decision-making process is crucial, such as healthcare or finance, interpretability can be a significant advantage.

Data Requirements

Neural networks generally require a large amount of data to perform well. In cases where only limited data is available, simpler models like SVMs or logistic regression might be more effective and less prone to overfitting.

Computational Resources

Training neural networks, especially deep learning models, often requires substantial computational resources. This can be a limiting factor in environments with restricted computational power or where rapid model training is needed.

Problem Complexity

For simpler problems, complex models like neural networks might not only be unnecessary but can also lead to overfitting. Simpler models can perform equally well or better in these cases. For example, if there is a true linear relationship between two variables then, holy sh*t, just use linear regression.

Model Robustness

Neural networks can be sensitive to small changes in input data or adversarial attacks. In cases where robustness is a key concern, other models might prove more resilient.

Training Time

Neural networks often take a longer time to train, which can be a bottleneck in rapid development cycles or when frequent model updates are necessary.

Ease of Implementation

In some scenarios, the complexity of implementing and tuning a neural network might not justify the potential improvement in performance, especially for straightforward tasks where simpler models are sufficient and easier to deploy.

Some more creative answers:

  1. Quantifying uncertainty
  2. Algorithmic transparency and fairness
  3. Specific statistical properties - ARIMA, Linear Regression
  4. Deployment context
    1. Online Learning and Adaptation
    2. Resource constrained deployment
  5. Cold start - simpler models or heuristic approaches
  6. Energy efficiency.