ATTENTION:

https://www.youtube.com/watch?v=qaWMOYf4ri8&t=0s
https://www.youtube.com/watch?v=OxCpWwDCDFQ&t=0s


Transformers do not make the Markovian assumption that bigram models do: instead of predicting the next token from only the immediately preceding token, self-attention lets every position attend to every other position, so the model can capture long-range dependencies between words regardless of how far apart they are. The original Transformer architecture consists of encoder and decoder blocks that process an input sequence and generate an output sequence.
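To make the contrast with a bigram model concrete, here is a minimal sketch of a single causal self-attention head, assuming PyTorch; the class name, dimensions, and `block_size` are illustrative choices, not taken from any particular implementation. Each position computes attention weights over all earlier positions and takes a weighted sum of their values, rather than looking only one token back.

```python
# Minimal single-head causal self-attention (PyTorch assumed).
# Illustrates how every position attends over the whole preceding context,
# unlike a bigram model that conditions only on the previous token.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttentionHead(nn.Module):
    def __init__(self, embed_dim: int, head_dim: int, block_size: int):
        super().__init__()
        self.key = nn.Linear(embed_dim, head_dim, bias=False)
        self.query = nn.Linear(embed_dim, head_dim, bias=False)
        self.value = nn.Linear(embed_dim, head_dim, bias=False)
        # Causal mask: position t may only attend to positions <= t.
        self.register_buffer("mask", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.shape                        # batch, sequence length, embedding dim
        k = self.key(x)                          # (B, T, head_dim)
        q = self.query(x)                        # (B, T, head_dim)
        v = self.value(x)                        # (B, T, head_dim)
        # Attention scores between all pairs of positions, scaled by sqrt(head_dim).
        scores = q @ k.transpose(-2, -1) / math.sqrt(k.size(-1))   # (B, T, T)
        scores = scores.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
        weights = F.softmax(scores, dim=-1)      # each row sums to 1 over allowed positions
        return weights @ v                       # weighted sum over the whole context

# Usage with illustrative sizes: 4 tokens, 32-dim embeddings, 8-token context window.
x = torch.randn(1, 4, 32)
head = SelfAttentionHead(embed_dim=32, head_dim=16, block_size=8)
out = head(x)
print(out.shape)   # torch.Size([1, 4, 16])
```

The key point of the sketch is the (T, T) score matrix: every query position scores every allowed key position, so information can flow directly between distant tokens in a single layer.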