  1. What are underflow and overflow?
  2. How do you handle underflow or overflow in the softmax or log-softmax function?
  3. What is poor conditioning?
  4. What is the condition number?
  5. What are grad, div and curl?
  6. What are critical (stationary) points in multiple dimensions?
  7. Why should you use gradient descent when you want to minimize a function?
  8. What is line search?
  9. What is hill climbing?
  10. What is a Jacobian matrix?
  11. What is curvature?
  12. What is a Hessian matrix?
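
As a starting point for question 2, one common remedy is to subtract the largest input before exponentiating (often called the log-sum-exp trick). The sketch below is a minimal pure-Python illustration under that assumption; the function names `log_softmax`, `softmax`, and `naive_softmax` are hypothetical and chosen only for this example.

```python
import math

def log_softmax(xs):
    """Log-softmax of a list of floats, shifted by the maximum for stability."""
    m = max(xs)  # after the shift, the largest exponent is exp(0) = 1
    # log(sum(exp(x))) = m + log(sum(exp(x - m))); no term can overflow,
    # and the sum is at least 1, so the log never sees 0.
    log_sum = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_sum for x in xs]

def softmax(xs):
    """Softmax obtained by exponentiating the stable log-softmax."""
    return [math.exp(v) for v in log_softmax(xs)]

def naive_softmax(xs):
    """Direct formula, shown only for contrast."""
    exps = [math.exp(x) for x in xs]  # overflows for large x, underflows to 0.0 for very negative x
    total = sum(exps)
    return [e / total for e in exps]  # divides by zero if every term underflowed

if __name__ == "__main__":
    print(softmax([1000.0, 1000.0]))        # stays finite
    try:
        naive_softmax([1000.0, 1000.0])
    except OverflowError as err:
        print("naive version failed:", err)
```

The shift does not change the result mathematically, because the factor `exp(-m)` cancels between numerator and denominator; it only keeps the intermediate values in a representable range.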