SGD, for example, randomly selects a subset of data points to compute gradients, which speeds up the training process and can help the model to escape local minima.
Search
Jan 28, 2024, 1 min read
SGD, for example, randomly selects a subset of data points to compute gradients, which speeds up the training process and can help the model to escape local minima.