Rolling Your Own GPT

Training a GPT model sounds like a moonshot, but it is actually a series of simple, well-defined steps. At its core, a GPT is just a giant text predictor: feed it tons of data and let it guess the next word. The real challenge is not training it to predict the next word, but making it produce something useful, like answers to the assignment due at midnight. 😅 That takes a special kind of training such as Reinforcement Learning from Human Feedback (RLHF). You can get good responses without RLHF, but it is what makes the model behave like a chatbot. ...
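To make the "giant text predictor" idea concrete, here is a minimal sketch of the next-token objective in PyTorch. The embedding plus linear layer standing in for a real transformer stack, and all the sizes, are illustrative assumptions, not the post's actual code:

```python
import torch
import torch.nn.functional as F

vocab_size, d_model = 100, 32
embed = torch.nn.Embedding(vocab_size, d_model)
lm_head = torch.nn.Linear(d_model, vocab_size)

# Toy batch: the target at each step is simply the input shifted left by one token.
tokens = torch.randint(0, vocab_size, (4, 16))        # (batch, seq_len)
inputs, targets = tokens[:, :-1], tokens[:, 1:]

hidden = embed(inputs)                                # stand-in for the transformer stack
logits = lm_head(hidden)                              # (batch, seq_len - 1, vocab)
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()                                       # gradients for one optimizer step
```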

February 4, 2025 · 15 min · 3146 words · Mwaura Collins

Unraveling RoPE: Encoding Relative Positions in Transformers with Elegance

Why Positional Encoding? Unlike recurrent neural networks (RNNs), transformers process tokens in parallel, which means they do not inherently understand the order of words in a sequence. In language, the meaning of a word can depend heavily on its position: compare "Salt has important minerals." with "The food is so bland I had to salt it." Here "salt" works as a noun or a verb depending on its position in the sentence. ...
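As a concrete illustration of the idea the post unravels, here is a minimal RoPE sketch in PyTorch: each half-pair of query/key dimensions is rotated by an angle proportional to the token's position, so dot products between rotated vectors depend only on relative distance. The frequency base of 10000 follows the original RoPE paper's convention; the helper and shapes are illustrative assumptions, not the post's code:

```python
import torch

def rope(x, base=10000.0):
    # x: (seq_len, head_dim) with head_dim even
    seq_len, dim = x.shape
    half = dim // 2
    # Per-pair rotation frequencies: theta_i = base^(-i / (dim/2))
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs  # (seq, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    # 2D rotation of each (x1, x2) pair by its position-dependent angle
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = torch.randn(8, 64)
print(rope(q).shape)  # torch.Size([8, 64])
```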

January 24, 2025 · 6 min · 1277 words · Mwaura Collins