Unraveling RoPE: Encoding Relative Positions in Transformers with Elegance

Why Positional Encoding? Unlike recurrent neural networks (RNNs), transformers process tokens in parallel, meaning they do not inherently understand the order of words in a sequence. In language, the meaning of a word can depend heavily on its position: compare "Salt has important minerals." with "The food is so bland I had to salt it." Here, "salt" functions as a noun or a verb depending on its position in the sentence. ...

January 24, 2025 · 6 min · 1277 words · Mwaura Collins