<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="/rss-styles.xsl" type="text/xsl"?>
<rss version="2.0">
  <channel>
    <title>oreoro</title>
    <description>This is my personal website, where I keep the code notes, implementation details, and technical ideas I work through outside normal project work.</description>
    <link>https://oreoro.github.io/</link>
    <lastBuildDate>Sun, 21 Jun 2026 03:56:51 GMT</lastBuildDate>
    <item>
      <title>Data Compression Explained: A Visual Guide to the Whole Book</title>
      <link>https://oreoro.github.io/posts/data-compression-explained-visual-guide/</link>
      <guid isPermaLink="true">https://oreoro.github.io/posts/data-compression-explained-visual-guide/</guid>
      <description>An illustrated whole-book study guide to Matt Mahoney&apos;s Data Compression Explained, covering information theory, benchmarks, coding, modeling, transforms, and lossy media compression.</description>
      <pubDate>Sat, 20 Jun 2026 00:00:00 GMT</pubDate>
      <lastUpdatedTimestamp>Sat Jun 20 2026 08:45:00 GMT+0000 (Coordinated Universal Time)</lastUpdatedTimestamp>
      <category>Personal Notes</category>
      <category>Information</category>
      <category>Guide</category>
      <category>Webtrotion</category>
      <content>&lt;div&gt;
                    &lt;p&gt;
                        &lt;em&gt;Note:&lt;/em&gt; This RSS feed strips out SVGs and embeds. You might want to read the post on the webpage
                        &lt;a href=&quot;https://oreoro.github.io/posts/data-compression-explained-visual-guide/&quot; target=&quot;_blank&quot;&gt;here&lt;/a&gt;.
                    &lt;/p&gt;
                    &lt;hr&gt;
                &lt;div&gt;&lt;p&gt;&lt;time&gt; June 20, 2026 &lt;/time&gt;&lt;/p&gt;&lt;/div&gt;&lt;hr&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Property &lt;/td&gt;&lt;td&gt; Value &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Source &lt;/td&gt;&lt;td&gt;&lt;a href=&quot;https://mattmahoney.net/dc/dce.html&quot; target=&quot;_blank&quot;&gt;Data Compression Explained&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Original author &lt;/td&gt;&lt;td&gt; Matt Mahoney &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Source last update &lt;/td&gt;&lt;td&gt; Apr. 15, 2013 &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Draft type &lt;/td&gt;&lt;td&gt; Visual Notion blog post / study guide &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Audience &lt;/td&gt;&lt;td&gt; Developers, technical writers, ML engineers, compression-curious readers &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Core idea &lt;/td&gt;&lt;td&gt; Compression is prediction plus coding, with transforms and perception models doing the heavy lifting. &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;blockquote&gt;&lt;div&gt; Source note: This post is an original visual study guide based on Matt Mahoney&amp;apos;s book. It paraphrases and organizes the ideas for blog reading. It is not a redistributed copy of the book. Historical benchmark numbers and tool rankings should be read in the context of the source&amp;apos;s 2013 update.  &lt;/div&gt;&lt;/blockquote&gt;&lt;h3&gt;The Whole Book in One Picture&lt;/h3&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;flowchart LR
    A[Raw data] --&amp;gt; B{Can we expose structure?}
    B --&amp;gt;|Yes| C[Transform]
    B --&amp;gt;|No| D[Model]
    C --&amp;gt; D[Model predicts what comes next]
    D --&amp;gt; E[Coder maps probability to bits]
    E --&amp;gt; F[Archive or stream]
    F --&amp;gt; G[Decoder]
    G --&amp;gt; H[Inverse transform]
    H --&amp;gt; I[Original data or acceptable approximation]

    J[Benchmarks] -. measure .-&amp;gt; C
    J -. measure .-&amp;gt; D
    J -. measure .-&amp;gt; E
    K[Human perception] -. lossy path .-&amp;gt; B&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;Compression looks like file shrinkage, but the book frames it as a deeper engineering problem:&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Layer &lt;/td&gt;&lt;td&gt; Question &lt;/td&gt;&lt;td&gt; Main chapters &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Theory &lt;/td&gt;&lt;td&gt; What is compressible at all? &lt;/td&gt;&lt;td&gt; 1 &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Measurement &lt;/td&gt;&lt;td&gt; How do we compare compressors fairly? &lt;/td&gt;&lt;td&gt; 2 &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Coding &lt;/td&gt;&lt;td&gt; Given probabilities, how many bits are needed? &lt;/td&gt;&lt;td&gt; 3 &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Modeling &lt;/td&gt;&lt;td&gt; Where do good probabilities come from? &lt;/td&gt;&lt;td&gt; 4 &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Transforms &lt;/td&gt;&lt;td&gt; How do we rearrange data so simple models work? &lt;/td&gt;&lt;td&gt; 5 &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Lossy compression &lt;/td&gt;&lt;td&gt; What can we throw away without humans noticing? &lt;/td&gt;&lt;td&gt; 6 &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;p&gt;The shortest honest summary:&lt;/p&gt;&lt;blockquote&gt;&lt;div&gt; Compression is the search for shorter descriptions. Coding is mostly solved. Modeling is the hard part. Transforms make modeling easier. Lossy compression adds a model of human perception.  &lt;/div&gt;&lt;/blockquote&gt;&lt;h3&gt;Fast Mental Models&lt;/h3&gt;&lt;h4&gt;1. 
Bits Measure Surprise&lt;/h4&gt;&lt;p&gt;If an event has probability &lt;code&gt;p&lt;/code&gt;, the ideal code length is:&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;ideal bits = log2(1 / p) = -log2(p)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Probability &lt;/td&gt;&lt;td&gt; Surprise &lt;/td&gt;&lt;td&gt; Intuition &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;code&gt;1/2&lt;/code&gt;&lt;/td&gt;&lt;td&gt; 1 bit &lt;/td&gt;&lt;td&gt; A fair yes/no question &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;code&gt;1/4&lt;/code&gt;&lt;/td&gt;&lt;td&gt; 2 bits &lt;/td&gt;&lt;td&gt; One outcome among four equal choices &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;code&gt;1/256&lt;/code&gt;&lt;/td&gt;&lt;td&gt; 8 bits &lt;/td&gt;&lt;td&gt; One byte under a uniform byte model &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Near 1 &lt;/td&gt;&lt;td&gt; Near 0 bits &lt;/td&gt;&lt;td&gt; Almost expected &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Near 0 &lt;/td&gt;&lt;td&gt; Many bits &lt;/td&gt;&lt;td&gt; Very surprising &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;p&gt;Visual rule:&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;common symbol      -&amp;gt; short code
rare symbol        -&amp;gt; long code
unknown pattern    -&amp;gt; expensive code
understood pattern -&amp;gt; tiny description&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;h4&gt;2. Compression Is Prediction&lt;/h4&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;flowchart TD
    H[History already decoded] --&amp;gt; M[Model]
    M --&amp;gt; P[Probability for next bit or symbol]
    P --&amp;gt; C[Coder]
    C --&amp;gt; O[Compressed output]
    O --&amp;gt; D[Decoder repeats same model]
    D --&amp;gt; H2[Recovered next bit or symbol]&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;The compressor and decompressor must make the same predictions from the same history. The compressed file mainly stores the information that the model could not predict.&lt;/p&gt;&lt;h4&gt;3. Lossy Compression Is Perception-Aware Prediction&lt;/h4&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;Lossless: recover exactly the same bits.
Lossy: recover something humans judge close enough.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;That one change moves the problem from pure information theory into psychology, vision, hearing, language, and AI.&lt;/p&gt;&lt;h3&gt;Chapter Map&lt;/h3&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Chapter &lt;/td&gt;&lt;td&gt; Visual handle &lt;/td&gt;&lt;td&gt; What it teaches &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; 1. Information Theory &lt;/td&gt;&lt;td&gt; Limits map &lt;/td&gt;&lt;td&gt; Why random data cannot be compressed and why modeling matters more than coding. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; 2. Benchmarks &lt;/td&gt;&lt;td&gt; Tradeoff dashboard &lt;/td&gt;&lt;td&gt; How size, speed, memory, data set choice, and rules change compressor rankings. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; 3. Coding &lt;/td&gt;&lt;td&gt; Probability-to-bits machine &lt;/td&gt;&lt;td&gt; Huffman, arithmetic coding, asymmetric coding, numeric codes, archives, checksums, encryption. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; 4. Modeling &lt;/td&gt;&lt;td&gt; Prediction engine &lt;/td&gt;&lt;td&gt; Fixed-order models, variable-order models, context mixing, PAQ, ZPAQ, and why modeling is hard. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; 5. Transforms &lt;/td&gt;&lt;td&gt; Pattern-exposure tools &lt;/td&gt;&lt;td&gt; RLE, LZ77, LZW, BWT, filters, executable transforms, precompression. 
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; 6. Lossy Compression &lt;/td&gt;&lt;td&gt; Human sensor model &lt;/td&gt;&lt;td&gt; Images, video, audio, JPEG, MPEG, psychoacoustics, and recompression. &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;hr&gt;&lt;h2&gt;1. Information Theory&lt;/h2&gt;&lt;h3&gt;Compression Starts with a Count&lt;/h3&gt;&lt;p&gt;There are &lt;code&gt;2^n&lt;/code&gt; different binary strings of length &lt;code&gt;n&lt;/code&gt;. There are only &lt;code&gt;2^n - 1&lt;/code&gt; binary strings shorter than &lt;code&gt;n&lt;/code&gt; bits, counting the empty string. Therefore, no lossless compressor can make every &lt;code&gt;n&lt;/code&gt;-bit input shorter while still allowing perfect decompression.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;All n-bit inputs:
[000...000] [000...001] [000...010] ... [111...111]       count = 2^n

Possible shorter outputs:
[] [0] [1] [00] [01] ... [length &amp;lt; n]       count = 2^n - 1

One-to-one decoding cannot map more inputs into fewer outputs.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;The key result:&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Claim &lt;/td&gt;&lt;td&gt; Meaning &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; No universal compression &lt;/td&gt;&lt;td&gt; A compressor that shrinks every file cannot exist. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Some files must expand &lt;/td&gt;&lt;td&gt; If a compressor shrinks some inputs, it must make other inputs longer or refuse them. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Random-looking data is usually incompressible &lt;/td&gt;&lt;td&gt; Most possible strings have no shorter description. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Useful data is often compressible &lt;/td&gt;&lt;td&gt; Human-created data usually has patterns, constraints, formats, repetition, and meaning. 
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;h3&gt;Why Meaningful Data Compresses&lt;/h3&gt;&lt;p&gt;Most possible strings are random. Most strings people store are not:&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Data &lt;/td&gt;&lt;td&gt; Why it has structure &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; English text &lt;/td&gt;&lt;td&gt; Grammar, vocabulary, topic, repeated words, spelling patterns &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Source code &lt;/td&gt;&lt;td&gt; Keywords, syntax, indentation, identifiers, libraries &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Images &lt;/td&gt;&lt;td&gt; Neighboring pixels are correlated &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Audio &lt;/td&gt;&lt;td&gt; Samples are correlated over time and filtered by human hearing &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Executables &lt;/td&gt;&lt;td&gt; Instructions, addresses, headers, imported symbols &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Backups &lt;/td&gt;&lt;td&gt; Files repeat across versions and machines &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;p&gt;Compression works because our data is not drawn uniformly from all possible bit strings. It comes from processes with structure.&lt;/p&gt;&lt;h3&gt;Coding Is Bounded&lt;/h3&gt;&lt;p&gt;If a model says a symbol has probability &lt;code&gt;p&lt;/code&gt;, the best possible code length is approximately &lt;code&gt;-log2(p)&lt;/code&gt; bits. You can choose a bad coder and waste bits, but no coder can beat the model&amp;apos;s information content on average over data drawn from that model.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;flowchart LR
    A[Model says symbol is likely] --&amp;gt; B[Short code]
    C[Model says symbol is rare] --&amp;gt; D[Long code]
    E[Model is wrong] --&amp;gt; F[Compressed size penalty]&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;The lesson is subtle:&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Part &lt;/td&gt;&lt;td&gt; Status &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Turning probabilities into bits &lt;/td&gt;&lt;td&gt; Efficient, well-understood &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Finding the right probabilities &lt;/td&gt;&lt;td&gt; Hard, open-ended, data-dependent &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;h3&gt;Modeling Is Not Computable&lt;/h3&gt;&lt;p&gt;A better model can turn a long string into a tiny description. For example, a million digits of &lt;code&gt;pi&lt;/code&gt; can be treated as random-looking decimal digits, or as &amp;quot;compute the first million digits of pi.&amp;quot; The second description is dramatically shorter, but it requires recognizing the source.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;weak model:314159265358979323846...-&amp;gt; &amp;quot;digits look independent&amp;quot;-&amp;gt; many bitsstrong model:314159265358979323846...-&amp;gt; &amp;quot;this is pi&amp;quot;-&amp;gt; short program or description&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;The book connects this to Kolmogorov complexity: the shortest program that outputs a string is an ideal compressed representation, but there is no general algorithm that can always find it.&lt;/p&gt;&lt;h3&gt;Compression and AI&lt;/h3&gt;&lt;p&gt;Prediction is a sign of understanding:&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; If a system understands... &lt;/td&gt;&lt;td&gt; It can predict... 
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; English &lt;/td&gt;&lt;td&gt; likely next words &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Images &lt;/td&gt;&lt;td&gt; likely neighboring pixels &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Audio &lt;/td&gt;&lt;td&gt; likely future samples &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Code &lt;/td&gt;&lt;td&gt; likely syntax and instruction patterns &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; File formats &lt;/td&gt;&lt;td&gt; likely headers, fields, and constraints &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;p&gt;This is why compression and AI meet. A compressor that understands a data source can describe it more compactly. A perfect general compressor would need a very broad kind of understanding.&lt;/p&gt;&lt;h3&gt;Chapter 1 Takeaways&lt;/h3&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Takeaway &lt;/td&gt;&lt;td&gt; Why it matters &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Random data cannot be compressed &lt;/td&gt;&lt;td&gt; Do not expect magic from encrypted, already-compressed, or random data. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Compression is model plus coder &lt;/td&gt;&lt;td&gt; Separate probability estimation from bit representation. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Coding has mathematical limits &lt;/td&gt;&lt;td&gt; Better coding helps, but only up to the model&amp;apos;s quality. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Modeling is the hard problem &lt;/td&gt;&lt;td&gt; Better compression usually comes from better prediction. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Understanding creates compression &lt;/td&gt;&lt;td&gt; The more structure you can exploit, the shorter the description. &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;hr&gt;&lt;h2&gt;2. 
Benchmarks&lt;/h2&gt;&lt;h3&gt;What Benchmarks Actually Measure&lt;/h3&gt;&lt;p&gt;Compression benchmarks compare compressors on a chosen data set under chosen rules.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;flowchart TD
    A[Benchmark] --&amp;gt; B[Data set]
    A --&amp;gt; C[Rules]
    A --&amp;gt; D[Metrics]
    D --&amp;gt; E[Compressed size]
    D --&amp;gt; F[Compression speed]
    D --&amp;gt; G[Decompression speed]
    D --&amp;gt; H[Memory use]
    C --&amp;gt; I[Can include decompressor?]
    C --&amp;gt; J[Can tune to files?]
    C --&amp;gt; K[Single file or archive?]&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;The big triangle:&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;         smaller output
               /\
              /  \
             /    \
            /      \
           /        \
less memory -------- faster speed&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;You usually cannot optimize all three at once. Maximum compression tools are often slow and memory-hungry. Practical formats often give up ratio to win speed, streaming, random access, or compatibility.&lt;/p&gt;&lt;h3&gt;Bits Per Character&lt;/h3&gt;&lt;p&gt;The book often uses &lt;code&gt;bpc&lt;/code&gt;, or bits per character, for byte-oriented corpora.&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; bpc &lt;/td&gt;&lt;td&gt; Meaning &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; 8.0 &lt;/td&gt;&lt;td&gt; No compression for byte data &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; 6.0 &lt;/td&gt;&lt;td&gt; 25 percent smaller than original &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; 4.0 &lt;/td&gt;&lt;td&gt; Half the original size &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; 2.0 &lt;/td&gt;&lt;td&gt; One quarter of original size &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Lower &lt;/td&gt;&lt;td&gt; Better compression, assuming the same input data &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;h3&gt;Benchmark Landscape&lt;/h3&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Benchmark &lt;/td&gt;&lt;td&gt; What it emphasizes &lt;/td&gt;&lt;td&gt; Why it matters &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Calgary Corpus &lt;/td&gt;&lt;td&gt; Classic mixed small files &lt;/td&gt;&lt;td&gt; Historical baseline for text compression research. 
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Large Text Compression Benchmark &lt;/td&gt;&lt;td&gt; Large Wikipedia XML text &lt;/td&gt;&lt;td&gt; Natural language modeling and long-range structure. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Hutter Prize &lt;/td&gt;&lt;td&gt; Compression as AI research &lt;/td&gt;&lt;td&gt; Rewards improvements on a fixed text corpus with decompressor included. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Maximum Compression &lt;/td&gt;&lt;td&gt; Maximum ratio on mixed files &lt;/td&gt;&lt;td&gt; Encourages aggressive tuning for size. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Generic Compression Benchmark &lt;/td&gt;&lt;td&gt; Untuned universal prediction &lt;/td&gt;&lt;td&gt; Tests generality rather than file-type tricks. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Compression Ratings &lt;/td&gt;&lt;td&gt; Size and speed scoring &lt;/td&gt;&lt;td&gt; Makes tradeoffs adjustable by user preference. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Other public benchmarks &lt;/td&gt;&lt;td&gt; Multiple corpora and rule sets &lt;/td&gt;&lt;td&gt; Shows that rankings depend on test design. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; File system studies &lt;/td&gt;&lt;td&gt; Real-world storage mix &lt;/td&gt;&lt;td&gt; Reveals what data actually exists on machines. &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;h3&gt;Why Rankings Shift&lt;/h3&gt;&lt;p&gt;Two compressors can trade places when any of these changes:&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Variable &lt;/td&gt;&lt;td&gt; Effect &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Data type &lt;/td&gt;&lt;td&gt; Text, images, executables, backups, logs, and audio favor different methods. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; File size &lt;/td&gt;&lt;td&gt; Small files make headers and model startup costs visible. 
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Archive rules &lt;/td&gt;&lt;td&gt; Solid archives can exploit similarity across files. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Decompressor inclusion &lt;/td&gt;&lt;td&gt; Including source or executable rewards simpler decoders. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Memory limit &lt;/td&gt;&lt;td&gt; Large models can dominate if memory is unrestricted. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Speed limit &lt;/td&gt;&lt;td&gt; Slow context mixers may lose to practical LZ-family tools. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Tuning policy &lt;/td&gt;&lt;td&gt; Per-file options can inflate benchmark-specific performance. &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;h3&gt;Visual: Benchmark as a Dashboard&lt;/h3&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;+--------------------------------------------------+
| Compressor: example                              |
+-------------------+------------------------------+
| Size              | 1.95 bpc                     |
| Compression time  | slow                         |
| Decompression     | medium                       |
| Memory            | high                         |
| Decoder included  | yes                          |
| Data set          | text-heavy                   |
| Good use case     | archival / research          |
+-------------------+------------------------------+&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;h3&gt;Chapter 2 Takeaways&lt;/h3&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Takeaway &lt;/td&gt;&lt;td&gt; Why it matters &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Benchmarks are not neutral &lt;/td&gt;&lt;td&gt; They encode assumptions about data and priorities. 
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Size is only one metric &lt;/td&gt;&lt;td&gt; Real systems care about speed, memory, streaming, and compatibility. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Historical leaderboards age &lt;/td&gt;&lt;td&gt; Use the book&amp;apos;s rankings as context, not current product advice. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Data set choice dominates &lt;/td&gt;&lt;td&gt; A compressor can look brilliant on one corpus and ordinary on another. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; A benchmark is a contract &lt;/td&gt;&lt;td&gt; Read the rules before interpreting the chart. &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;hr&gt;&lt;h2&gt;3. Coding&lt;/h2&gt;&lt;h3&gt;Coder Job Description&lt;/h3&gt;&lt;p&gt;A coder receives probabilities from a model and emits bits close to the theoretical ideal.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;flowchart LR
    A[Symbol] --&amp;gt; B[Model probability]
    B --&amp;gt; C[Coder]
    C --&amp;gt; D[Bitstream]
    D --&amp;gt; E[Decoder]
    E --&amp;gt; F[Same symbol]&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;The coder must be:&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Requirement &lt;/td&gt;&lt;td&gt; Reason &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Decodable &lt;/td&gt;&lt;td&gt; The original symbols must be recoverable. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Efficient &lt;/td&gt;&lt;td&gt; Common symbols should use fewer bits. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Synchronized &lt;/td&gt;&lt;td&gt; Decoder must reproduce the same boundaries and model states. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Practical &lt;/td&gt;&lt;td&gt; Real files need headers, error checks, and sometimes encryption. &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;h3&gt;Huffman Coding&lt;/h3&gt;&lt;p&gt;Huffman coding builds a prefix tree. Frequent symbols sit near the root. Rare symbols sit deeper.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;    root
   /    \
common   *
        / \
    medium rare&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;Core idea:&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Symbol probability &lt;/td&gt;&lt;td&gt; Huffman effect &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; High &lt;/td&gt;&lt;td&gt; Shorter integer number of bits &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Low &lt;/td&gt;&lt;td&gt; Longer integer number of bits &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Exact powers of 1/2 &lt;/td&gt;&lt;td&gt; Very efficient &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Awkward probabilities &lt;/td&gt;&lt;td&gt; Wastes some space due to whole-bit code lengths &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;p&gt;Strengths:&lt;/p&gt;&lt;ul&gt;&lt;li&gt; Simple.  &lt;/li&gt;&lt;li&gt; Fast.  &lt;/li&gt;&lt;li&gt; Widely used.  
&lt;/li&gt;&lt;li&gt; Good with static or block models.  &lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Limits:&lt;/p&gt;&lt;ul&gt;&lt;li&gt; Code lengths are whole numbers of bits.  &lt;/li&gt;&lt;li&gt; Binary alphabets cannot be compressed by basic Huffman alone.  &lt;/li&gt;&lt;li&gt; A full table or canonical description may need to be stored.  &lt;/li&gt;&lt;/ul&gt;&lt;h3&gt;Arithmetic Coding&lt;/h3&gt;&lt;p&gt;Arithmetic coding represents an entire message as a subinterval inside &lt;code&gt;[0, 1)&lt;/code&gt;.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;Initial interval:
[0 ------------------------------------------------ 1)

After likely symbol:
[0 -------- 0.7)

After next symbol:
[0.28 --- 0.42)

After more symbols:
[0.314159 ----------------)

Output a binary number inside the final interval.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;Why it matters:&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Feature &lt;/td&gt;&lt;td&gt; Benefit &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Fractional bit efficiency &lt;/td&gt;&lt;td&gt; Avoids Huffman&amp;apos;s whole-bit rounding. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Works well for binary prediction &lt;/td&gt;&lt;td&gt; Ideal for bitwise models. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Adapts naturally &lt;/td&gt;&lt;td&gt; Model can update after every symbol. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Near-Shannon performance &lt;/td&gt;&lt;td&gt; Strong practical coding method. &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;h3&gt;Asymmetric Binary Coding&lt;/h3&gt;&lt;p&gt;Asymmetric binary coding is another way to code predicted bits efficiently. It uses a single integer-like state rather than arithmetic coding&amp;apos;s interval endpoints. 
It matters because the same theory can be implemented with different machine-level tradeoffs.&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Coding family &lt;/td&gt;&lt;td&gt; Mental model &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Huffman &lt;/td&gt;&lt;td&gt; Tree of prefix codes &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Arithmetic/range &lt;/td&gt;&lt;td&gt; Shrinking probability interval &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Asymmetric binary &lt;/td&gt;&lt;td&gt; State machine that packs bits according to probability &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;h3&gt;Numeric Codes&lt;/h3&gt;&lt;p&gt;Some values are not arbitrary symbols. They are counts, offsets, lengths, or prediction errors. Numeric codes exploit common number distributions.&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Code &lt;/td&gt;&lt;td&gt; Good for &lt;/td&gt;&lt;td&gt; Visual shape &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Unary &lt;/td&gt;&lt;td&gt; Very small positive integers &lt;/td&gt;&lt;td&gt;&lt;code&gt;0&lt;/code&gt;, &lt;code&gt;10&lt;/code&gt;, &lt;code&gt;110&lt;/code&gt;, &lt;code&gt;1110&lt;/code&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Rice &lt;/td&gt;&lt;td&gt; Geometric-like distributions with power-of-two parameter &lt;/td&gt;&lt;td&gt; quotient plus remainder &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Golomb &lt;/td&gt;&lt;td&gt; Geometric-like distributions with flexible parameter &lt;/td&gt;&lt;td&gt; quotient plus bounded remainder &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Extra-bit codes &lt;/td&gt;&lt;td&gt; Ranges with extra low bits &lt;/td&gt;&lt;td&gt; length classes and offset details &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;p&gt;These appear in systems where the model says &amp;quot;small numbers are common, large numbers are rare.&amp;quot;&lt;/p&gt;&lt;h3&gt;Archive Formats&lt;/h3&gt;&lt;p&gt;A 
compression algorithm is not the whole file format. Archives also need structure.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;flowchart LR
    A[Archive header] --&amp;gt; B[File metadata]
    B --&amp;gt; C[Compressed payload]
    C --&amp;gt; D[Error check]
    D --&amp;gt; E[Optional encryption metadata]&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;Important archive concerns:&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Concern &lt;/td&gt;&lt;td&gt; Why it matters &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Single-file vs multi-file &lt;/td&gt;&lt;td&gt; Affects metadata and file recovery. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Solid compression &lt;/td&gt;&lt;td&gt; Similar files compressed together can shrink more. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Random access &lt;/td&gt;&lt;td&gt; Solid archives may make one-file extraction slower. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Error detection &lt;/td&gt;&lt;td&gt; Detects corruption after storage or transmission. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Encryption &lt;/td&gt;&lt;td&gt; Protects confidentiality but makes data look random afterward. &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;h3&gt;Error Detection&lt;/h3&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Method &lt;/td&gt;&lt;td&gt; What it catches &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Parity &lt;/td&gt;&lt;td&gt; Simple odd/even bit errors, weak but cheap. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; CRC-32 &lt;/td&gt;&lt;td&gt; Common accidental corruption check. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Adler-32 &lt;/td&gt;&lt;td&gt; Fast checksum used in some compression contexts. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Cryptographic hash &lt;/td&gt;&lt;td&gt; Strong integrity identity, designed against adversarial collisions. 
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;p&gt;Error detection is not compression, but production archives need it.&lt;/p&gt;&lt;h3&gt;Chapter 3 Takeaways&lt;/h3&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Takeaway &lt;/td&gt;&lt;td&gt; Why it matters &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Coding maps probability to bits &lt;/td&gt;&lt;td&gt; It is the final packing step. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Huffman is simple but rounded &lt;/td&gt;&lt;td&gt; Whole-bit lengths are a real limitation. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Arithmetic coding is closer to ideal &lt;/td&gt;&lt;td&gt; It fits adaptive and binary models well. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Numeric codes encode structured integers &lt;/td&gt;&lt;td&gt; They are useful for lengths, offsets, runs, and errors. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Archives are systems &lt;/td&gt;&lt;td&gt; Metadata, checks, encryption, and extraction rules matter. &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;hr&gt;&lt;h2&gt;4. Modeling&lt;/h2&gt;&lt;h3&gt;The Hard Part&lt;/h3&gt;&lt;p&gt;A model estimates what comes next. Once we have a good probability, coding is mechanical. 
The model decides whether compression is mediocre or excellent.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;history:  &amp;quot;the quick brown &amp;quot;
model:    next symbol is likely &amp;quot;f&amp;quot;, &amp;quot;d&amp;quot;, &amp;quot;c&amp;quot;, ...
coder:    short code for likely next symbol&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;Static vs adaptive:&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Model type &lt;/td&gt;&lt;td&gt; How it works &lt;/td&gt;&lt;td&gt; Tradeoff &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Static &lt;/td&gt;&lt;td&gt; Analyze data, send model, then coded data &lt;/td&gt;&lt;td&gt; Good if model cost is small and data is stable. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Adaptive &lt;/td&gt;&lt;td&gt; Update model as data is read &lt;/td&gt;&lt;td&gt; Avoids sending full model, tracks local changes. 
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;h3&gt;Fixed-Order Models&lt;/h3&gt;&lt;p&gt;An order &lt;code&gt;n&lt;/code&gt; model predicts the next symbol from the previous &lt;code&gt;n&lt;/code&gt; symbols.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;Order 0: no context
P(next)
Order 1: one previous symbol
P(next | previous)
Order 3: three previous symbols
P(next | previous_3, previous_2, previous_1)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;Example:&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Context &lt;/td&gt;&lt;td&gt; Next-symbol table &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;code&gt;q&lt;/code&gt;&lt;/td&gt;&lt;td&gt;&lt;code&gt;u&lt;/code&gt; is very likely in English &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;code&gt;th&lt;/code&gt;&lt;/td&gt;&lt;td&gt;&lt;code&gt;e&lt;/code&gt;, &lt;code&gt;a&lt;/code&gt;, &lt;code&gt;i&lt;/code&gt;, &lt;code&gt;o&lt;/code&gt; are plausible &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;code&gt;ing&lt;/code&gt;&lt;/td&gt;&lt;td&gt; space, punctuation, or suffix continuation &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;p&gt;Why fixed order breaks:&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Order too low &lt;/td&gt;&lt;td&gt; Misses useful context. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Order too high &lt;/td&gt;&lt;td&gt; Most contexts are rare or unseen. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Result &lt;/td&gt;&lt;td&gt; Need smoothing, fallback, or variable order. 
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;h3&gt;Bytewise, Bitwise, and Indirect Models&lt;/h3&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Model style &lt;/td&gt;&lt;td&gt; Unit &lt;/td&gt;&lt;td&gt; Useful when &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Bytewise &lt;/td&gt;&lt;td&gt; Predict next byte &lt;/td&gt;&lt;td&gt; Text, simple binary data, byte-aligned formats. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Bitwise &lt;/td&gt;&lt;td&gt; Predict next bit &lt;/td&gt;&lt;td&gt; Arithmetic coding, mixed binary patterns, precise probability updates. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Indirect &lt;/td&gt;&lt;td&gt; Use hashed or transformed contexts &lt;/td&gt;&lt;td&gt; Large context spaces where full tables are too costly. &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;h3&gt;Variable-Order Models&lt;/h3&gt;&lt;p&gt;Variable-order models keep statistics for multiple context lengths and choose or mix them.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;flowchart TD
    A[Long context] --&amp;gt; B{Seen enough?}
    B --&amp;gt;|Yes| C[Use long-context prediction]
    B --&amp;gt;|No| D[Back off]
    D --&amp;gt; E[Medium context]
    E --&amp;gt; F{Seen enough?}
    F --&amp;gt;|Yes| G[Use medium-context prediction]
    F --&amp;gt;|No| H[Short context]&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;h4&gt;DMC&lt;/h4&gt;&lt;p&gt;Dynamic Markov Coding predicts bits with a state machine that grows as it observes data. It can split states when histories diverge.&lt;/p&gt;&lt;p&gt;Visual:&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;state A --0--&amp;gt; state B
state A --1--&amp;gt; state C
if state A is too vague:
state A becomes A1 and A2&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;h4&gt;PPM&lt;/h4&gt;&lt;p&gt;Prediction by Partial Matching uses byte contexts and backs off from longer to shorter contexts. It is a classic text-compression idea.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;Try context &amp;quot;tion&amp;quot;
if unknown, try &amp;quot;ion&amp;quot;
if unknown, try &amp;quot;on&amp;quot;
if unknown, try &amp;quot;n&amp;quot;
if unknown, try order 0&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;The hard detail is handling symbols that have not appeared in a context, often called the escape or zero-frequency problem.&lt;/p&gt;&lt;h4&gt;CTW&lt;/h4&gt;&lt;p&gt;Context Tree Weighting mixes context tree predictions in a principled bitwise way. Instead of choosing only one context, it combines evidence across a tree.&lt;/p&gt;&lt;h3&gt;Context Mixing&lt;/h3&gt;&lt;p&gt;Context mixing uses many predictors at once.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;flowchart LR
    A[Text model] --&amp;gt; M[Mixer]
    B[Match model] --&amp;gt; M
    C[Low-order model] --&amp;gt; M
    D[File-format model] --&amp;gt; M
    E[Image/audio heuristic] --&amp;gt; M
    M --&amp;gt; P[Final probability]
    P --&amp;gt; Coder[Arithmetic coder]&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;The mixer can learn which predictors are useful in each situation.&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Component &lt;/td&gt;&lt;td&gt; Role &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Linear evidence mixing &lt;/td&gt;&lt;td&gt; Combine model outputs with weighted evidence. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Logistic mixing &lt;/td&gt;&lt;td&gt; Mix in probability/logit space for better behavior. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; SSE &lt;/td&gt;&lt;td&gt; Secondary Symbol Estimation adjusts predictions using past calibration. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; ISSE &lt;/td&gt;&lt;td&gt; Indirect SSE uses contexts to select or adapt estimators. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Match model &lt;/td&gt;&lt;td&gt; If current data matches earlier data, predict continuation from the match. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; PAQ models &lt;/td&gt;&lt;td&gt; High-compression family using context mixing. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; ZPAQ &lt;/td&gt;&lt;td&gt; A more configurable/archive-oriented successor in the PAQ family. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Crinkler &lt;/td&gt;&lt;td&gt; Specialized compression/linking for executable code size competitions. 
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;h3&gt;Why Context Mixing Can Beat Single Models&lt;/h3&gt;&lt;p&gt;Different predictors notice different kinds of structure:&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Predictor &lt;/td&gt;&lt;td&gt; Notices &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Low-order byte model &lt;/td&gt;&lt;td&gt; Local byte frequencies &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Word model &lt;/td&gt;&lt;td&gt; Language-level repetition &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Match model &lt;/td&gt;&lt;td&gt; Exact repeated substrings &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Image model &lt;/td&gt;&lt;td&gt; Neighboring pixel relationships &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Executable model &lt;/td&gt;&lt;td&gt; Instruction and address patterns &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; XML model &lt;/td&gt;&lt;td&gt; Tags, attributes, markup rhythm &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;p&gt;Compression improves when the mixer learns which predictor is trustworthy right now.&lt;/p&gt;&lt;h3&gt;Chapter 4 Takeaways&lt;/h3&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Takeaway &lt;/td&gt;&lt;td&gt; Why it matters &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Modeling is prediction &lt;/td&gt;&lt;td&gt; The compressed file stores surprises. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Fixed order is simple but brittle &lt;/td&gt;&lt;td&gt; Context length must match the data. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Variable order handles sparse contexts &lt;/td&gt;&lt;td&gt; Backoff avoids overconfidence in rare histories. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Context mixing is powerful &lt;/td&gt;&lt;td&gt; Many weak specialized models can beat one general model. 
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Better modeling can look like understanding &lt;/td&gt;&lt;td&gt; Language, images, code, and formats all reward domain knowledge. &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;hr&gt;&lt;h2&gt;5. Transforms&lt;/h2&gt;&lt;h3&gt;What a Transform Does&lt;/h3&gt;&lt;p&gt;A transform rewrites data so a simpler model can compress it.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;flowchart LR
    A[Original data] --&amp;gt; B[Transform]
    B --&amp;gt; C[More model-friendly symbols]
    C --&amp;gt; D[Model]
    D --&amp;gt; E[Coder]
    E --&amp;gt; F[Compressed file]
    F --&amp;gt; G[Decoder]
    G --&amp;gt; H[Inverse transform]
    H --&amp;gt; I[Original data]&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;Ideal transform:&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Property &lt;/td&gt;&lt;td&gt; Meaning &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Reversible &lt;/td&gt;&lt;td&gt; Decompression gets the original back exactly. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Structure exposing &lt;/td&gt;&lt;td&gt; Repetition, locality, or predictable errors become obvious. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Cheap enough &lt;/td&gt;&lt;td&gt; Transform cost must be worth the compression gain. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Canonical when possible &lt;/td&gt;&lt;td&gt; Avoid arbitrary choices that add information burden. &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;h3&gt;Run Length Encoding&lt;/h3&gt;&lt;p&gt;RLE replaces repeated symbols with a symbol plus a count.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;AAAAAABBBBCCCCCCCC
becomes
(A,6) (B,4) (C,8)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;Best for:&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Good &lt;/td&gt;&lt;td&gt; Bad &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Long repeated runs &lt;/td&gt;&lt;td&gt; Alternating symbols &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Simple image masks &lt;/td&gt;&lt;td&gt; Natural text &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Zero-filled data &lt;/td&gt;&lt;td&gt; Already transformed data with few runs &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;h3&gt;LZ77 and the Match Family&lt;/h3&gt;&lt;p&gt;LZ77 replaces repeated strings with pointers to previous occurrences.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;Input:
ABRACADABRA
The second &amp;quot;ABRA&amp;quot; can become:
(go back 7, copy 4)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;Visual:&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;sliding history buffer        lookahead
[ABRACAD]                     [ABRA...]
 ^^^^   match reused by pointer&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;Why it is popular:&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Strength &lt;/td&gt;&lt;td&gt; Explanation &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Fast decompression &lt;/td&gt;&lt;td&gt; Decoder mostly copies bytes from earlier output. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; General-purpose &lt;/td&gt;&lt;td&gt; Works on many repeated byte patterns. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Streaming-friendly &lt;/td&gt;&lt;td&gt; Can run with bounded windows. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Foundation format &lt;/td&gt;&lt;td&gt; Deflate, LZMA-like families, and many practical tools build on the idea. &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;h4&gt;LZSS&lt;/h4&gt;&lt;p&gt;LZSS improves practical LZ77 by only emitting pointers when they save space. Short non-saving matches remain literals.&lt;/p&gt;&lt;h4&gt;Deflate&lt;/h4&gt;&lt;p&gt;Deflate combines LZ77-style matches with Huffman coding. It powers common zip/gzip-style compression and survives because it is fast, widely implemented, and compatible.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;flowchart LR
    A[Bytes] --&amp;gt; B[LZ77 literals and matches]
    B --&amp;gt; C[Huffman coding]
    C --&amp;gt; D[Deflate bitstream]&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;h4&gt;LZMA&lt;/h4&gt;&lt;p&gt;LZMA pushes stronger modeling around LZ-style matches, often improving ratio at the cost of more CPU and memory.&lt;/p&gt;&lt;h4&gt;LZX, ROLZ, LZP, Snappy, Deduplication&lt;/h4&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Method &lt;/td&gt;&lt;td&gt; Main idea &lt;/td&gt;&lt;td&gt; Design center &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; LZX &lt;/td&gt;&lt;td&gt; LZ-family compression used in Microsoft contexts &lt;/td&gt;&lt;td&gt; Practical binary/archive compression. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; ROLZ &lt;/td&gt;&lt;td&gt; Restricts match search by recent contexts &lt;/td&gt;&lt;td&gt; Better match relevance. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; LZP &lt;/td&gt;&lt;td&gt; Predicts repeated strings from context &lt;/td&gt;&lt;td&gt; Fast prediction of matches. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Snappy &lt;/td&gt;&lt;td&gt; Prioritizes very high speed &lt;/td&gt;&lt;td&gt; Low latency over maximum ratio. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Deduplication &lt;/td&gt;&lt;td&gt; Replaces repeated chunks across files or systems &lt;/td&gt;&lt;td&gt; Backup and storage efficiency. 
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;h3&gt;LZW and Dictionary Encoding&lt;/h3&gt;&lt;p&gt;Dictionary methods replace strings with dictionary references.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;dictionary:
1 -&amp;gt; the
2 -&amp;gt; compression
3 -&amp;gt; model
text:
the compression model
encoded:
1 2 3&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;Dictionary types:&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Type &lt;/td&gt;&lt;td&gt; Dictionary source &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Fixed &lt;/td&gt;&lt;td&gt; Built into the format or algorithm. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Static &lt;/td&gt;&lt;td&gt; Learned from the file and stored with it. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Dynamic &lt;/td&gt;&lt;td&gt; Built by compressor and decompressor in lockstep. &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;p&gt;LZW is a dynamic dictionary method historically associated with formats such as GIF. Its broader lesson is that both sides can build the same dictionary without transmitting every entry.&lt;/p&gt;&lt;h3&gt;Dictionary Encoding for Text&lt;/h3&gt;&lt;p&gt;Text-specific dictionaries can model:&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Feature &lt;/td&gt;&lt;td&gt; Compression opportunity &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Words &lt;/td&gt;&lt;td&gt; Common words become compact tokens. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Capitalization &lt;/td&gt;&lt;td&gt; Store word identity separately from case pattern. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Newlines &lt;/td&gt;&lt;td&gt; Model paragraph and line structure. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Punctuation &lt;/td&gt;&lt;td&gt; Predict separators and syntax. 
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Word endings &lt;/td&gt;&lt;td&gt; Use morphology and repeated suffixes. &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;p&gt;The stronger the text model, the more the compressor behaves like a small language-aware system.&lt;/p&gt;&lt;h3&gt;Symbol Ranking and Move-to-Front&lt;/h3&gt;&lt;p&gt;Move-to-front keeps a list of symbols ordered by recency. If the same few symbols keep appearing in a context, their ranks stay small.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;alphabet list:
[A B C D E ...]
read C -&amp;gt; output rank 2, move C to front
[C A B D E ...]
read C again -&amp;gt; output rank 0&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;This works well after transforms like BWT, where local neighborhoods tend to reuse a small set of symbols.&lt;/p&gt;&lt;h3&gt;Burrows-Wheeler Transform&lt;/h3&gt;&lt;p&gt;BWT sorts rotations of a block so characters with similar right contexts cluster together. After BWT, a fast local model can often compress well.&lt;/p&gt;&lt;p&gt;Tiny example, conceptually:&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;Original block:
banana&amp;#x24;
Sort rotations:
&amp;#x24;banana
a&amp;#x24;banan
ana&amp;#x24;ban
anana&amp;#x24;b
banana&amp;#x24;
na&amp;#x24;bana
nana&amp;#x24;ba
Take last column:
annb&amp;#x24;aa&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;The output is not obviously shorter, but it groups context-related symbols. It is usually followed by move-to-front, run-length coding, and entropy coding.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;flowchart LR
    A[Input block] --&amp;gt; B[BWT context sort]
    B --&amp;gt; C[Move-to-front]
    C --&amp;gt; D[Run-length coding]
    D --&amp;gt; E[Huffman or arithmetic coding]&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;h3&gt;Predictive Filtering&lt;/h3&gt;&lt;p&gt;Numeric data often compresses better as prediction errors.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;samples:
100, 103, 105, 106, 108
predict next as previous:
100, +3, +2, +1, +2&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;Small errors are easier to encode than raw values.&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Filter &lt;/td&gt;&lt;td&gt; Used for &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Delta coding &lt;/td&gt;&lt;td&gt; Signals, images, ordered numeric data &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Color transform &lt;/td&gt;&lt;td&gt; Separating brightness from color differences &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Linear filtering &lt;/td&gt;&lt;td&gt; Predicting from neighboring samples or pixels &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;h3&gt;Specialized Transforms&lt;/h3&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Transform &lt;/td&gt;&lt;td&gt; Target &lt;/td&gt;&lt;td&gt; Idea &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; E8E9 &lt;/td&gt;&lt;td&gt; x86 executable code &lt;/td&gt;&lt;td&gt; Normalize relative call/jump addresses so repeated code patterns match better. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Precomp &lt;/td&gt;&lt;td&gt; Already-compressed embedded data &lt;/td&gt;&lt;td&gt; Detect and temporarily expand compressed streams inside files so outer compression can work. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Huffman pre-coding &lt;/td&gt;&lt;td&gt; Context-mixing speed &lt;/td&gt;&lt;td&gt; Reduce input size before expensive modeling. 
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;h3&gt;Chapter 5 Takeaways&lt;/h3&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Takeaway &lt;/td&gt;&lt;td&gt; Why it matters &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Transforms do not finish compression &lt;/td&gt;&lt;td&gt; They prepare data for modeling and coding. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; LZ methods exploit repeated strings &lt;/td&gt;&lt;td&gt; This explains many practical formats. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; BWT exploits sorted contexts &lt;/td&gt;&lt;td&gt; It turns context into local symbol clustering. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Filters exploit smooth numeric data &lt;/td&gt;&lt;td&gt; Prediction errors are often small. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Specialized transforms exploit file knowledge &lt;/td&gt;&lt;td&gt; Better compression often comes from knowing the data&amp;apos;s format. &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;hr&gt;&lt;h2&gt;6. Lossy Compression&lt;/h2&gt;&lt;h3&gt;The Big Shift&lt;/h3&gt;&lt;p&gt;Lossless compression asks:&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;Can we reproduce the original bits exactly?&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;Lossy compression asks:&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;Can we reproduce something humans accept as the same?&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;That makes perception the model.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;flowchart LR
    A[Original signal] --&amp;gt; B[Human perception model]
    B --&amp;gt; C[Discard hard-to-notice detail]
    C --&amp;gt; D[Quantize]
    D --&amp;gt; E[Lossless coding of remaining data]
    E --&amp;gt; F[Compressed media]&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;h3&gt;Images&lt;/h3&gt;&lt;p&gt;Digital images are already approximations of continuous light. Lossy image compression removes detail that the visual system is less sensitive to.&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Human visual fact &lt;/td&gt;&lt;td&gt; Compression use &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Limited spatial resolution &lt;/td&gt;&lt;td&gt; Do not store invisible fine detail. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Limited brightness precision &lt;/td&gt;&lt;td&gt; Quantize small intensity differences. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Color sensitivity differs from brightness sensitivity &lt;/td&gt;&lt;td&gt; Store chroma at lower precision than luma. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Local smoothness is common &lt;/td&gt;&lt;td&gt; Predict pixels from neighbors or transform blocks. &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;h3&gt;Image Format Map&lt;/h3&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Format/topic &lt;/td&gt;&lt;td&gt; Role in the chapter &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; BMP &lt;/td&gt;&lt;td&gt; Mostly raw pixels, but still an approximation of continuous light. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; GIF &lt;/td&gt;&lt;td&gt; Palette-based images and simple animation history. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; PNG &lt;/td&gt;&lt;td&gt; Lossless image compression with filtering. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; TIFF &lt;/td&gt;&lt;td&gt; Flexible container used in imaging workflows. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; JPEG &lt;/td&gt;&lt;td&gt; Transform, quantization, and entropy coding for photos. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; JPEG recompression &lt;/td&gt;&lt;td&gt; Attempts to compress JPEGs further without fully losing practical recoverability. 
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;h3&gt;JPEG as a Visual Pipeline&lt;/h3&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;flowchart LR
    A[RGB pixels] --&amp;gt; B[Color transform]
    B --&amp;gt; C[Chroma subsampling]
    C --&amp;gt; D[8x8 blocks]
    D --&amp;gt; E[DCT frequency transform]
    E --&amp;gt; F[Quantization]
    F --&amp;gt; G[Zigzag ordering]
    G --&amp;gt; H[Run-length and entropy coding]
    H --&amp;gt; I[JPEG file]&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;Mental picture:&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;top-left of the 8x8 frequency table     = broad smooth shape (low frequencies)
bottom-right of the table               = fine detail (high frequencies)
JPEG keeps more of the top-left and throws away more of the bottom-right.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;Why artifacts happen:&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Artifact &lt;/td&gt;&lt;td&gt; Cause &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Blocking &lt;/td&gt;&lt;td&gt; Independent 8x8 block decisions become visible. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Ringing &lt;/td&gt;&lt;td&gt; Lost high-frequency detail near sharp edges. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Color bleeding &lt;/td&gt;&lt;td&gt; Reduced chroma detail. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Generational loss &lt;/td&gt;&lt;td&gt; Repeated decode/re-encode compounds quantization. &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;h3&gt;JPEG Recompression&lt;/h3&gt;&lt;p&gt;JPEG files are already compressed. Recompressors look for remaining structure:&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Strategy &lt;/td&gt;&lt;td&gt; What it tries to exploit &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Better entropy coding &lt;/td&gt;&lt;td&gt; JPEG&amp;apos;s stored coefficients may still be coded more compactly. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Coefficient modeling &lt;/td&gt;&lt;td&gt; Predict patterns in quantized DCT coefficients. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Metadata cleanup &lt;/td&gt;&lt;td&gt; Remove or compact non-image payload. 
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Specialized decoding knowledge &lt;/td&gt;&lt;td&gt; Preserve reconstructable JPEG details while storing them differently. &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;p&gt;The chapter surveys historical approaches such as Stuffit, PAQ-based methods, WinZip behavior, and PackJPG. Treat the named results as historical context.&lt;/p&gt;&lt;h3&gt;Video&lt;/h3&gt;&lt;p&gt;Video compression adds time.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;flowchart LR
    A[Frame 1] --&amp;gt; B[Predict frame 2 from frame 1]
    B --&amp;gt; C[Encode motion]
    C --&amp;gt; D[Encode residual error]
    D --&amp;gt; E[Repeat across frames]&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;Why video compresses:&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Structure &lt;/td&gt;&lt;td&gt; Compression opportunity &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Adjacent frames are similar &lt;/td&gt;&lt;td&gt; Store changes instead of full frames. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Objects move &lt;/td&gt;&lt;td&gt; Motion vectors describe block movement. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Human vision tolerates some error &lt;/td&gt;&lt;td&gt; Quantize residuals. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Scenes contain spatial redundancy &lt;/td&gt;&lt;td&gt; Use image-like compression inside frames. &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;h3&gt;NTSC and MPEG&lt;/h3&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Topic &lt;/td&gt;&lt;td&gt; Key idea &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; NTSC &lt;/td&gt;&lt;td&gt; Broadcast video is already shaped by human vision, refresh, interlacing, and color compromises. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; MPEG &lt;/td&gt;&lt;td&gt; Modern-style video coding predicts frames from other frames, stores motion, quantizes transforms, and entropy-codes the result. 
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;p&gt;Frame types as a mental model:&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;I-frame: self-contained image
P-frame: predicted from previous frames
B-frame: predicted from past and future reference frames&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;Tradeoff:&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; More prediction &lt;/td&gt;&lt;td&gt; Better compression, more complexity, more latency &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Less prediction &lt;/td&gt;&lt;td&gt; Easier seeking and editing, larger files &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;h3&gt;Audio&lt;/h3&gt;&lt;p&gt;Audio compression uses psychoacoustics: what the ear can and cannot notice.&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Hearing fact &lt;/td&gt;&lt;td&gt; Compression use &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Limited frequency range &lt;/td&gt;&lt;td&gt; Do not store inaudible frequencies. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Sensitivity varies by frequency &lt;/td&gt;&lt;td&gt; Allocate bits where hearing is sharpest. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Loud sounds mask nearby quiet sounds &lt;/td&gt;&lt;td&gt; Remove masked components. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Perceived loudness is logarithmic &lt;/td&gt;&lt;td&gt; Quantization can follow perception. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Time masking exists &lt;/td&gt;&lt;td&gt; Sounds can hide nearby sounds in time. &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;p&gt;Audio pipeline:&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;flowchart LR
    A[PCM samples] --&amp;gt; B[Frequency analysis]
    B --&amp;gt; C[Psychoacoustic masking model]
    C --&amp;gt; D[Quantization and bit allocation]
    D --&amp;gt; E[Entropy coding]
    E --&amp;gt; F[Compressed audio]&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;h3&gt;Chapter 6 Takeaways&lt;/h3&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Takeaway &lt;/td&gt;&lt;td&gt; Why it matters &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Lossy compression discards information &lt;/td&gt;&lt;td&gt; The hard part is choosing information humans will not miss. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Media compression is perceptual modeling &lt;/td&gt;&lt;td&gt; Vision and hearing are part of the algorithm. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Transform plus quantization is central &lt;/td&gt;&lt;td&gt; Especially for JPEG-like and audio/video systems. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Recompression is difficult &lt;/td&gt;&lt;td&gt; Already-compressed data has little easy redundancy left. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Perfect lossy compression would require understanding &lt;/td&gt;&lt;td&gt; A movie could theoretically be summarized semantically, but practical systems are far from that. &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;hr&gt;&lt;h2&gt;The Grand Unifying Model&lt;/h2&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;flowchart TD
    A[Data source] --&amp;gt; B{Lossless or lossy?}
    B --&amp;gt;|Lossless| C[Preserve every bit]
    B --&amp;gt;|Lossy| D[Preserve perceptual meaning]
    C --&amp;gt; E{Transform useful?}
    D --&amp;gt; F[Perception transform and quantization]
    F --&amp;gt; E
    E --&amp;gt;|Yes| G[Expose patterns]
    E --&amp;gt;|No, model directly| H[Predict next symbol or bit]
    G --&amp;gt; H
    H --&amp;gt; I[Code using probability]
    I --&amp;gt; J[Package with metadata, checks, maybe encryption]&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;Every concrete compressor can be placed in this frame:&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Compressor family &lt;/td&gt;&lt;td&gt; Transform &lt;/td&gt;&lt;td&gt; Model &lt;/td&gt;&lt;td&gt; Coder &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; gzip/deflate &lt;/td&gt;&lt;td&gt; LZ77 matches &lt;/td&gt;&lt;td&gt; Literal/length and distance symbol frequencies &lt;/td&gt;&lt;td&gt; Huffman &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; bzip2-style &lt;/td&gt;&lt;td&gt; BWT, MTF, RLE &lt;/td&gt;&lt;td&gt; Local symbol frequencies &lt;/td&gt;&lt;td&gt; Huffman &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; PAQ-style &lt;/td&gt;&lt;td&gt; Optional file-type-specific preprocessing &lt;/td&gt;&lt;td&gt; Context mixing, often with file-aware contexts &lt;/td&gt;&lt;td&gt; Arithmetic coding &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; PNG &lt;/td&gt;&lt;td&gt; Image filters &lt;/td&gt;&lt;td&gt; Deflate model &lt;/td&gt;&lt;td&gt; Huffman via deflate &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; JPEG &lt;/td&gt;&lt;td&gt; DCT and quantization &lt;/td&gt;&lt;td&gt; Statistics of quantized coefficients &lt;/td&gt;&lt;td&gt; Huffman or arithmetic coding &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; MPEG-like video &lt;/td&gt;&lt;td&gt; Motion prediction and transforms &lt;/td&gt;&lt;td&gt; Residual and motion models &lt;/td&gt;&lt;td&gt; Entropy coding &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; MP3/AAC-like audio &lt;/td&gt;&lt;td&gt; Frequency analysis and masking &lt;/td&gt;&lt;td&gt; Psychoacoustic bit allocation &lt;/td&gt;&lt;td&gt; Entropy coding &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;h2&gt;Practical Reading Paths&lt;/h2&gt;&lt;h3&gt;If You Want to Build a Compressor&lt;/h3&gt;&lt;div&gt;&lt;ol&gt;&lt;li&gt; Understand Chapter 1 so you stop expecting impossible wins.  &lt;/li&gt;&lt;li&gt; Pick a benchmark from Chapter 2 that matches your target data.  
&lt;/li&gt;&lt;li&gt; Implement a simple coder from Chapter 3 or reuse a known one.  &lt;/li&gt;&lt;li&gt; Start with a simple adaptive model from Chapter 4.  &lt;/li&gt;&lt;li&gt; Add one transform from Chapter 5 only when it exposes a pattern you can explain.  &lt;/li&gt;&lt;li&gt; For media, study Chapter 6 before inventing quality knobs.  &lt;/li&gt;&lt;/ol&gt;&lt;/div&gt;&lt;h3&gt;If You Want to Choose a Format&lt;/h3&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Need &lt;/td&gt;&lt;td&gt; Prefer &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Speed and compatibility &lt;/td&gt;&lt;td&gt; Deflate/gzip/zip-style tools &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Archival ratio &lt;/td&gt;&lt;td&gt; Stronger LZMA/context-mixing tools, if time is acceptable &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Text research &lt;/td&gt;&lt;td&gt; PPM/context-mixing/modern language-model-aware approaches &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Backups &lt;/td&gt;&lt;td&gt; Deduplication plus compression &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Photos &lt;/td&gt;&lt;td&gt; JPEG-like formats or newer perceptual image codecs &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Screenshots/graphics &lt;/td&gt;&lt;td&gt; PNG-like lossless image compression &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Audio/video distribution &lt;/td&gt;&lt;td&gt; Perceptual audio/video codecs &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;h3&gt;If You Want to Understand AI Through Compression&lt;/h3&gt;&lt;p&gt;Read Chapter 1 and Chapter 4 together. The essential loop is:&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;understand pattern -&amp;gt; predict better -&amp;gt; encode surprise only -&amp;gt; shorter file&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;Compression is not just storage optimization. 
It is a measurable way to ask how much structure a system has discovered.&lt;/p&gt;&lt;h2&gt;Cheat Sheets&lt;/h2&gt;&lt;h3&gt;Glossary&lt;/h3&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Term &lt;/td&gt;&lt;td&gt; Short definition &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Lossless &lt;/td&gt;&lt;td&gt; Decompression recovers the exact original data. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Lossy &lt;/td&gt;&lt;td&gt; Decompression recovers an acceptable approximation. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Model &lt;/td&gt;&lt;td&gt; Probability estimator for upcoming symbols or bits. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Coder &lt;/td&gt;&lt;td&gt; Converts model probabilities into a bitstream. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Transform &lt;/td&gt;&lt;td&gt; Rewrites data to expose compressible structure. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Entropy &lt;/td&gt;&lt;td&gt; Expected information content under a probability model. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Context &lt;/td&gt;&lt;td&gt; Previously seen data used to predict the next symbol. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Adaptive model &lt;/td&gt;&lt;td&gt; Updates its probability estimates as data is processed. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Static model &lt;/td&gt;&lt;td&gt; Sent or fixed before coding the payload. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Solid archive &lt;/td&gt;&lt;td&gt; Compresses multiple files together to exploit shared structure. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; BWT &lt;/td&gt;&lt;td&gt; Burrows-Wheeler transform: a reversible reordering that sorts symbols by context so symbols from similar contexts cluster together. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Quantization &lt;/td&gt;&lt;td&gt; Reducing precision, usually the irreversible part of lossy coding. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Psychoacoustics &lt;/td&gt;&lt;td&gt; Modeling what humans can hear. 
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;h3&gt;Algorithm Selection Sketch&lt;/h3&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;Mostly repeated bytes?
  -&amp;gt; LZ-style match coding
Mostly text?
  -&amp;gt; PPM, context mixing, dictionary transforms, or modern language-aware modeling
Mostly smooth numeric samples?
  -&amp;gt; predictive filtering plus entropy coding
Mostly photos?
  -&amp;gt; perceptual image codec
Mostly backups or VM images?
  -&amp;gt; deduplication plus compression
Already encrypted or compressed?
  -&amp;gt; expect little or no gain&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;h3&gt;Red Flags When Evaluating Compression Claims&lt;/h3&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Claim &lt;/td&gt;&lt;td&gt; Skeptical question &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Compresses every file &lt;/td&gt;&lt;td&gt; How does it avoid the counting argument? &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Recompresses compressed data repeatedly &lt;/td&gt;&lt;td&gt; Where does the extra information go? &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Beats all compressors &lt;/td&gt;&lt;td&gt; On which benchmark and rules? &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; No quality loss in lossy mode &lt;/td&gt;&lt;td&gt; What exact metric or human test supports that? &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Tiny output with universal recovery &lt;/td&gt;&lt;td&gt; Is the decompressor/model included in the accounting? &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;h2&gt;Closing&lt;/h2&gt;&lt;p&gt;The book&amp;apos;s central message is practical and philosophical at the same time:&lt;/p&gt;&lt;blockquote&gt;&lt;div&gt; To compress data, find structure. To find structure, predict. To predict well, understand the source.  
&lt;/div&gt;&lt;/blockquote&gt;&lt;p&gt;That is why the same field contains Huffman trees, probability intervals, dictionaries, suffix sorting, image transforms, psychoacoustics, benchmark politics, and AI. They are all different ways of answering one question:&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;What is the shortest description that still lets us recover what matters?&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;h2&gt;Source and Further Reading&lt;/h2&gt;&lt;ul&gt;&lt;li&gt; Matt Mahoney, &lt;a href=&quot;https://mattmahoney.net/dc/dce.html&quot; target=&quot;_blank&quot;&gt;Data Compression Explained&lt;/a&gt;, last updated Apr. 15, 2013.  &lt;/li&gt;&lt;li&gt; For current tool rankings, consult up-to-date benchmark leaderboards directly; the rankings in the source are historical.  &lt;/li&gt;&lt;li&gt; For implementation practice, start with a simple RLE or Huffman coder, then build toward adaptive modeling, LZ-style matching, or arithmetic/range coding.  &lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;hr&gt;&lt;aside&gt;&lt;h2&gt;
Interlinked Content
&lt;/h2&gt;&lt;div&gt;&lt;br&gt;&lt;/div&gt;&lt;/aside&gt;&lt;/div&gt;</content>
    </item>
    <item>
      <title>Iran-USA War, Told Through Tweets (2026)</title>
      <link>https://oreoro.github.io/posts/iran-usa-war-timeline-2026-personal-notes/</link>
      <guid isPermaLink="true">https://oreoro.github.io/posts/iran-usa-war-timeline-2026-personal-notes/</guid>
      <description>A polished, personal blog-style archive of the 2026 Iran-USA war as it appeared through X posts: the first strikes, Hormuz, the ceasefires, the memorandum, and the June 20 argument over what peace now means.</description>
      <pubDate>Sat, 20 Jun 2026 00:00:00 GMT</pubDate>
      <lastUpdatedTimestamp>Sat Jun 20 2026 15:08:00 GMT+0000 (Coordinated Universal Time)</lastUpdatedTimestamp>
      <category>Personal Notes</category>
      <category>Information</category>
      <category>🪴 Potted</category>
      <content>&lt;div&gt;
                    &lt;p&gt;
                        &lt;em&gt;Note:&lt;/em&gt; This RSS feed strips out SVGs and embeds. You might want to read the post on the webpage
                        &lt;a href=&quot;https://oreoro.github.io/posts/iran-usa-war-timeline-2026-personal-notes/&quot; target=&quot;_blank&quot;&gt;here&lt;/a&gt;.
                    &lt;/p&gt;
                    &lt;hr&gt;
                &lt;div&gt;&lt;p&gt;&lt;time&gt; June 20, 2026 &lt;/time&gt;&lt;/p&gt;&lt;/div&gt;&lt;hr&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt; &amp;#x1f4dd; &lt;/div&gt;&lt;div&gt; Updated June 20, 2026. This is a blog post, not a wire story: &lt;a href=&quot;https://en.wikipedia.org/wiki/2026_Iran_war&quot; target=&quot;_blank&quot;&gt;Wikipedia&lt;/a&gt; gives me the dated skeleton, and the embedded X posts show the public mood around that skeleton: official confidence, breaking clips, deal spin, Israeli anxiety, and the first argument over who won.  &lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt; &amp;#x26a0;&amp;#xfe0f; &lt;/div&gt;&lt;div&gt; Reading note: I am treating every tweet as an artifact, not as a final fact. The useful question is not only &amp;#x201c;was this post right?&amp;#x201d; but &amp;#x201c;what did this post make the war feel like while people were trying to understand it?&amp;#x201d;  &lt;/div&gt;&lt;/div&gt;&lt;hr&gt;&lt;h3&gt;Why I wanted to save the feed&lt;/h3&gt;&lt;p&gt;A normal timeline makes the 2026 Iran-USA war look more orderly than it felt. It says the war began, escalated, reached Hormuz, hit a ceasefire, then moved into a memorandum. That is accurate enough for a reference page. It is not how the war arrived on my screen.&lt;/p&gt;&lt;p&gt;On X, the same week could look like victory, collapse, diplomacy, propaganda, and panic depending on which post landed first. Official accounts wrote in capital letters. Reporters posted fragments. Analysts tried to turn fragments into shape. Everyone else argued over whether the deal was peace, surrender, humiliation, or just a pause.&lt;/p&gt;&lt;blockquote&gt;&lt;div&gt; The timeline tells me what happened. The feed tells me what people were being asked to believe before the dust settled.  
&lt;/div&gt;&lt;/blockquote&gt;&lt;h3&gt;The first day was already a media war&lt;/h3&gt;&lt;p&gt;Wikipedia places the opening U.S.-Israeli strikes on February 28, 2026, after the order for Operation Epic Fury. The feed had no patience for distance. The war was introduced as a command decision, then immediately as a moral claim, then as retaliation clips, then as a search for off-ramps.&lt;/p&gt;&lt;div&gt;&lt;div&gt;Mentions &lt;a href=&quot;https://twitter.com/user/status/2027654336138924410&quot;&gt;tweet&lt;/a&gt;&lt;/div&gt;&lt;div&gt;The White House presents the opening of U.S. combat operations as a presidential statement.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;Mentions &lt;a href=&quot;https://twitter.com/user/status/2027678826998714473&quot;&gt;tweet&lt;/a&gt;&lt;/div&gt;&lt;div&gt;Al Jazeera English captures Netanyahu framing the joint strikes as removing an existential threat.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;Mentions &lt;a href=&quot;https://twitter.com/user/status/2027705747044233628&quot;&gt;tweet&lt;/a&gt;&lt;/div&gt;&lt;div&gt;Iranian retaliation appears in the feed as video from Bahrain, not as a dry line in a chronology.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;Mentions &lt;a href=&quot;https://twitter.com/user/status/2027827160598065523&quot;&gt;tweet&lt;/a&gt;&lt;/div&gt;&lt;div&gt;Barak Ravid reports Trump already talking about possible off-ramps after the opening strikes.&lt;/div&gt;&lt;/div&gt;&lt;p&gt;That mix matters. The war was not only launched. It was narrated into existence. By the end of the first day, the public had already been given the three tones that would dominate the next months: resolve, retaliation, and &amp;#x201c;there is still a way out.&amp;#x201d;&lt;/p&gt;&lt;h3&gt;Hormuz turned the war into everyone else&amp;#x2019;s problem&lt;/h3&gt;&lt;p&gt;The Strait of Hormuz is why this never stayed as a distant foreign-policy story. 
Once shipping, insurance, oil, and fertilizer entered the conversation, the war moved from strategy pages into grocery bills and fuel prices. This is where the feed became oddly concrete: one chokepoint, one map, one global anxiety.&lt;/p&gt;&lt;div&gt;&lt;div&gt;Mentions &lt;a href=&quot;https://twitter.com/user/status/2035516932498030879&quot;&gt;tweet&lt;/a&gt;&lt;/div&gt;&lt;div&gt;White House warning over the Strait of Hormuz during the pressure phase.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;Mentions &lt;a href=&quot;https://twitter.com/user/status/2064789019016265823&quot;&gt;tweet&lt;/a&gt;&lt;/div&gt;&lt;div&gt;White House claim that the U.S. controlled the Strait of Hormuz.&lt;/div&gt;&lt;/div&gt;&lt;p&gt;But a strait is not controlled by a caption. It is controlled by ships, mines, drones, pilots, insurers, and whether captains believe tomorrow will be calmer than today. That is why the Hormuz posts are the most useful part of this archive: they show confidence running ahead of the actual settlement.&lt;/p&gt;&lt;h3&gt;The ceasefire sounded cleaner than it was&lt;/h3&gt;&lt;p&gt;By April, the word &amp;#x201c;ceasefire&amp;#x201d; started doing too much work. Pakistan pushed diplomacy. The U.S. and Iran talked through intermediaries. Israel and Lebanon sat awkwardly inside the same sentence without being fully contained by it. 
The feed looked like relief, but also like fine print.&lt;/p&gt;&lt;div&gt;&lt;div&gt;Mentions &lt;a href=&quot;https://twitter.com/user/status/2041596151108137363&quot;&gt;tweet&lt;/a&gt;&lt;/div&gt;&lt;div&gt;Pakistan PM Shehbaz Sharif urging a two-week ceasefire window for diplomacy.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;Mentions &lt;a href=&quot;https://twitter.com/user/status/2041665043423752651&quot;&gt;tweet&lt;/a&gt;&lt;/div&gt;&lt;div&gt;Pakistan announces that Iran, the United States, and allies have agreed to a ceasefire.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;Mentions &lt;a href=&quot;https://twitter.com/user/status/2041929940678144097&quot;&gt;tweet&lt;/a&gt;&lt;/div&gt;&lt;div&gt;Iranian foreign minister Abbas Araghchi frames the ceasefire around the unresolved Israel/Lebanon track.&lt;/div&gt;&lt;/div&gt;&lt;p&gt;This is where the post stops being a simple U.S.-Iran story. If Lebanon keeps burning, the ceasefire is not a full stop. It is a bracket.&lt;/p&gt;&lt;h3&gt;The deal became content before it became peace&lt;/h3&gt;&lt;p&gt;The June memorandum had the rhythm of a product launch: leaks, denials, confirmation posts, victory captions, then screenshots and clause analysis. 
It is easy to mock that, but it is also how a lot of people first encountered the deal.&lt;/p&gt;&lt;div&gt;&lt;div&gt;Mentions &lt;a href=&quot;https://twitter.com/user/status/2063943366983725410&quot;&gt;tweet&lt;/a&gt;&lt;/div&gt;&lt;div&gt;White House post saying both Israel and Iran were looking toward an immediate ceasefire.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;Mentions &lt;a href=&quot;https://twitter.com/user/status/2065953102130495701&quot;&gt;tweet&lt;/a&gt;&lt;/div&gt;&lt;div&gt;Al Jazeera English on Trump saying a deal could be signed soon while Tehran urged caution.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;Mentions &lt;a href=&quot;https://twitter.com/user/status/2066272391525802417&quot;&gt;tweet&lt;/a&gt;&lt;/div&gt;&lt;div&gt;White House declaring the deal with Iran complete.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;Mentions &lt;a href=&quot;https://twitter.com/user/status/2066341220604129675&quot;&gt;tweet&lt;/a&gt;&lt;/div&gt;&lt;div&gt;Ro Khanna backing the ceasefire agreement and sovereignty language.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;Mentions &lt;a href=&quot;https://twitter.com/user/status/2066397237727383830&quot;&gt;tweet&lt;/a&gt;&lt;/div&gt;&lt;div&gt;EU foreign policy chief Kaja Kallas welcomes the U.S.-Iran deal and Hormuz reopening.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;Mentions &lt;a href=&quot;https://twitter.com/user/status/2066747601085669436&quot;&gt;tweet&lt;/a&gt;&lt;/div&gt;&lt;div&gt;Al Jazeera English notes U.S. officials framing the memorandum as not yet a full peace deal.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;Mentions &lt;a href=&quot;https://twitter.com/user/status/2067359023892938839&quot;&gt;tweet&lt;/a&gt;&lt;/div&gt;&lt;div&gt;Barak Ravid reports that U.S. 
and Iran signed the MOU remotely and that it is in effect.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;Mentions &lt;a href=&quot;https://twitter.com/user/status/2067360466620272982&quot;&gt;tweet&lt;/a&gt;&lt;/div&gt;&lt;div&gt;Follow-up detail on Trump personally signing the agreement.&lt;/div&gt;&lt;/div&gt;&lt;p&gt;The tension is visible in the posts. One account says complete. Another says interim. Another says signed. Another says not yet peace. Together they read less like contradiction and more like the actual shape of the moment: a war trying to turn itself into paperwork.&lt;/p&gt;&lt;h3&gt;June 20 update: the argument moved to the aftermath&lt;/h3&gt;&lt;p&gt;By June 20, the newest useful posts were not about the signing itself. They were about what the signing meant. That is usually the moment a war starts becoming history: not because everyone agrees it is over, but because everyone starts fighting over the interpretation.&lt;/p&gt;&lt;div&gt;&lt;div&gt;Mentions &lt;a href=&quot;https://twitter.com/user/status/2068212788200382918&quot;&gt;tweet&lt;/a&gt;&lt;/div&gt;&lt;div&gt;June 20: Al Jazeera English thread on the interim peace deal becoming a political flashpoint inside Israel.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;Mentions &lt;a href=&quot;https://twitter.com/user/status/2068317952961913267&quot;&gt;tweet&lt;/a&gt;&lt;/div&gt;&lt;div&gt;June 20: ISW argues Hormuz is reopening in a way that retains Iranian control rather than restoring the pre-war status quo.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;Mentions &lt;a href=&quot;https://twitter.com/user/status/2068124747800400013&quot;&gt;tweet&lt;/a&gt;&lt;/div&gt;&lt;div&gt;June 20: ISW reads Iran as likely seeking to delay nuclear negotiations while keeping leverage.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;Mentions &lt;a href=&quot;https://twitter.com/user/status/2067604741828264025&quot;&gt;tweet&lt;/a&gt;&lt;/div&gt;&lt;div&gt;Israeli political reading of the MOU&amp;#x2019;s 
Lebanon clause and why it alarms Israeli observers.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;Mentions &lt;a href=&quot;https://twitter.com/user/status/2067626141989404821&quot;&gt;tweet&lt;/a&gt;&lt;/div&gt;&lt;div&gt;Foreign Affairs frames the postwar risk as Iran winning the war but possibly losing the peace.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;Mentions &lt;a href=&quot;https://twitter.com/user/status/2067324495845601614&quot;&gt;tweet&lt;/a&gt;&lt;/div&gt;&lt;div&gt;Jake Sherman shares Senator Cassidy calling the outcome a major foreign-policy blunder.&lt;/div&gt;&lt;/div&gt;&lt;p&gt;These are the posts I would add if I were freezing the archive on June 20. They show the war shifting from &amp;#x201c;what happened?&amp;#x201d; to &amp;#x201c;who can live with the settlement?&amp;#x201d; Israel is angry. Iran is reading leverage into Hormuz. U.S. critics are calling the result a blunder. Analysts are already warning that the 60-day window is not a victory lap; it is a countdown.&lt;/p&gt;&lt;h3&gt;What the feed still misses&lt;/h3&gt;&lt;p&gt;Tweets are excellent at tension and terrible at scale. They can show a missile clip, a sentence from a leader, or a sharp argument over a clause. They do not naturally hold grief, repairs, debt, trauma, or the boring administrative work of making ports, hospitals, schools, and power grids usable again.&lt;/p&gt;&lt;p&gt;That is why I do not want this post to pretend the feed is the war. It is the surface of the war. A loud surface, sometimes useful, sometimes manipulative, sometimes ahead of official language, sometimes completely wrong.&lt;/p&gt;&lt;ul&gt;&lt;li&gt; The opening posts made the strikes feel decisive before the consequences were visible.  &lt;/li&gt;&lt;li&gt; The Hormuz posts made a global economic problem easier to understand, but also easier to oversimplify.  &lt;/li&gt;&lt;li&gt; The ceasefire posts showed how quickly diplomacy becomes branding.  
&lt;/li&gt;&lt;li&gt; The June 20 posts show the aftermath beginning before the war has emotionally ended.  &lt;/li&gt;&lt;/ul&gt;&lt;h3&gt;My read as of June 20&lt;/h3&gt;&lt;p&gt;The cleanest version is this: the war has moved from missiles to clauses. That is better than the reverse. But it is not peace in the deep sense. It is a document asking several angry systems to behave long enough for the next document to exist.&lt;/p&gt;&lt;p&gt;The feed will probably keep calling that victory or humiliation depending on the account. I am more interested in whether the strait stays open, whether Lebanon stops being the loophole, whether nuclear talks become real verification, and whether the human cost is still visible once the diplomatic theater moves on.&lt;/p&gt;&lt;blockquote&gt;&lt;div&gt; A ceasefire can stop the clock. It cannot, by itself, repair the time already lost.  &lt;/div&gt;&lt;/blockquote&gt;&lt;h3&gt;Source spine&lt;/h3&gt;&lt;ul&gt;&lt;li&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/2026_Iran_war&quot; target=&quot;_blank&quot;&gt;Wikipedia: 2026 Iran war&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Timeline_of_the_2026_Iran_war&quot; target=&quot;_blank&quot;&gt;Wikipedia: Timeline of the 2026 Iran war&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href=&quot;https://x.com/AJEnglish/status/2068212788200382918&quot; target=&quot;_blank&quot;&gt;Al Jazeera June 20 post on Israeli political fallout&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href=&quot;https://x.com/TheStudyofWar/status/2068317952961913267&quot; target=&quot;_blank&quot;&gt;ISW June 20 post on Hormuz and Iranian control&lt;/a&gt;&lt;/li&gt;&lt;li&gt; All embedded X posts above are saved as public artifacts from the war narrative, not endorsements.  &lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;hr&gt;&lt;aside&gt;&lt;h2&gt;
Interlinked Content
&lt;/h2&gt;&lt;div&gt;&lt;br&gt;&lt;/div&gt;&lt;/aside&gt;&lt;/div&gt;</content>
    </item>
    <item>
      <title>How LLMs Actually Work: A Friendly Map for Humans</title>
      <link>https://oreoro.github.io/posts/how-llms-actually-work-friendly-guide/</link>
      <guid isPermaLink="true">https://oreoro.github.io/posts/how-llms-actually-work-friendly-guide/</guid>
      <description>A plain-English, visual guide to tokenization, embeddings, attention, transformer layers, and next-token prediction, with optional technical notes and tiny code examples.</description>
      <pubDate>Sat, 06 Jun 2026 00:00:00 GMT</pubDate>
      <lastUpdatedTimestamp>Sat Jun 06 2026 07:22:00 GMT+0000 (Coordinated Universal Time)</lastUpdatedTimestamp>
      <category>Personal Notes</category>
      <category>Guide</category>
      <category>Information</category>
      <category>🌲 Evergreen</category>
      <content>&lt;div&gt;
                    &lt;p&gt;
                        &lt;em&gt;Note:&lt;/em&gt; This RSS feed strips out SVGs and embeds. You might want to read the post on the webpage
                        &lt;a href=&quot;https://oreoro.github.io/posts/how-llms-actually-work-friendly-guide/&quot; target=&quot;_blank&quot;&gt;here&lt;/a&gt;.
                    &lt;/p&gt;
                    &lt;hr&gt;
                &lt;div&gt;&lt;p&gt;&lt;time&gt; June 6, 2026 &lt;/time&gt;&lt;/p&gt;&lt;/div&gt;&lt;hr&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt; &amp;#x1f9ed; &lt;/div&gt;&lt;div&gt; LLMs are not magic brains. They are prediction machines built from a few repeatable parts: tokens, vectors, attention, memory-like feed-forward layers, and a loop that keeps choosing the next likely piece of text.  &lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt; &amp;#x270d;&amp;#xfe0f; &lt;/div&gt;&lt;div&gt; Source note: this is an original, beginner-friendly rewrite inspired by Kato&amp;apos;s article &lt;a href=&quot;https://www.0xkato.xyz/how-llms-actually-work/&quot; target=&quot;_blank&quot;&gt;How LLMs Actually Work&lt;/a&gt;, with extra examples, code, tables, and Notion-native structure.  &lt;/div&gt;&lt;/div&gt;&lt;hr&gt;&lt;h3&gt;The whole idea in one minute&lt;/h3&gt;&lt;p&gt;An LLM, or large language model, takes your text, turns it into numbers, runs those numbers through many transformer layers, and predicts what text should come next.&lt;/p&gt;&lt;p&gt;That is the simple version. The useful version is this:&lt;/p&gt;&lt;div&gt;&lt;ol&gt;&lt;li&gt; Your prompt is split into &lt;strong&gt;tokens&lt;/strong&gt;, which are small text pieces.  &lt;/li&gt;&lt;li&gt; Each token becomes a &lt;strong&gt;vector&lt;/strong&gt;, which is a list of numbers that carries learned meaning.  &lt;/li&gt;&lt;li&gt; The model adds information about &lt;strong&gt;order&lt;/strong&gt;, because &lt;code&gt;dog bites man&lt;/code&gt; and &lt;code&gt;man bites dog&lt;/code&gt; do not mean the same thing.  &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Attention&lt;/strong&gt; lets each token decide which earlier tokens matter.  &lt;/li&gt;&lt;li&gt; A &lt;strong&gt;feed-forward network&lt;/strong&gt; does deeper processing for each token.  &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Residual connections and normalization&lt;/strong&gt; keep the many layers stable.  
&lt;/li&gt;&lt;li&gt; The model outputs scores for the next possible token.  &lt;/li&gt;&lt;li&gt; One token is chosen, added to the text, and the loop repeats.  &lt;/li&gt;&lt;/ol&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;pre&gt;flowchart LR
    A[&amp;quot;You type a prompt&amp;quot;] --&amp;gt; B[&amp;quot;Tokenizer&amp;lt;br&amp;gt;text pieces&amp;quot;]
    B --&amp;gt; C[&amp;quot;Embeddings&amp;lt;br&amp;gt;meaning as numbers&amp;quot;]
    C --&amp;gt; D[&amp;quot;Position signal&amp;lt;br&amp;gt;word order&amp;quot;]
    D --&amp;gt; E[&amp;quot;Attention&amp;lt;br&amp;gt;what should matter?&amp;quot;]
    E --&amp;gt; F[&amp;quot;Feed-forward layer&amp;lt;br&amp;gt;deeper processing&amp;quot;]
    F --&amp;gt; G[&amp;quot;Next-token scores&amp;quot;]
    G --&amp;gt; H[&amp;quot;Pick one token&amp;quot;]
    H --&amp;gt; I[&amp;quot;Add it to the text&amp;quot;]
    I --&amp;gt; C&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt; &amp;#x1f4a1; &lt;/div&gt;&lt;div&gt; A good mental model: an LLM is like an autocomplete system that has read a massive library and learned incredibly subtle patterns about what usually follows what.  &lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th scope=&quot;col&quot;&gt; Part &lt;/th&gt;&lt;th scope=&quot;col&quot;&gt; Plain-English job &lt;/th&gt;&lt;th scope=&quot;col&quot;&gt; Why it matters &lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Tokens &lt;/td&gt;&lt;td&gt; Break text into pieces &lt;/td&gt;&lt;td&gt; The model cannot read raw words or letters directly. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Embeddings &lt;/td&gt;&lt;td&gt; Turn pieces into meaning-shaped numbers &lt;/td&gt;&lt;td&gt; Similar ideas can sit near each other in number-space. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Position &lt;/td&gt;&lt;td&gt; Tell the model where each piece appears &lt;/td&gt;&lt;td&gt; Order changes meaning. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Attention &lt;/td&gt;&lt;td&gt; Let tokens look at useful previous tokens &lt;/td&gt;&lt;td&gt; This is how context flows through the sentence. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Feed-forward network &lt;/td&gt;&lt;td&gt; Process each token more deeply &lt;/td&gt;&lt;td&gt; A lot of learned structure lives here. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Next-token prediction &lt;/td&gt;&lt;td&gt; Score likely continuations &lt;/td&gt;&lt;td&gt; This is the generation loop behind every answer. &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;hr&gt;&lt;h3&gt;1. Tokens: the model&amp;apos;s alphabet is not your alphabet&lt;/h3&gt;&lt;p&gt;Models do not see your sentence the way you do. You see words. 
The model sees token IDs.&lt;/p&gt;&lt;p&gt;A tokenizer might split a sentence like this:&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;Text:   &quot;The sleepy robot writes poetry.&quot;
Tokens: [&quot;The&quot;, &quot; sleepy&quot;, &quot; robot&quot;, &quot; writes&quot;, &quot; poetry&quot;, &quot;.&quot;]
IDs:    [791, 47823, 11205, 13004, 24465, 13]&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;Those ID numbers are what enter the model. The specific numbers differ across model families, but the pattern is the same: text becomes a sequence of integers.&lt;/p&gt;&lt;p&gt;Why not just use whole words? Because language is messy. New names, typos, code, slang, and other languages would explode the vocabulary. Tokens sit between letters and words: flexible enough for rare text, efficient enough for common text.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt;Slightly technical: why the strawberry counting problem happens&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;p&gt;When you ask a model how many letters are in a word, the model may not be looking at separate letters. It may see a word as one or a few tokens. 
That means character-level questions can be awkward unless the model deliberately reasons about spelling.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;javascript&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;const vocabulary = {
  &quot;The&quot;: 791,
  &quot; sleepy&quot;: 47823,
  &quot; robot&quot;: 11205,
  &quot; writes&quot;: 13004,
  &quot; poetry&quot;: 24465,
  &quot;.&quot;: 13,
};

const prompt = [&quot;The&quot;, &quot; sleepy&quot;, &quot; robot&quot;, &quot; writes&quot;, &quot; poetry&quot;, &quot;.&quot;];
const tokenIds = prompt.map((piece) =&amp;gt; vocabulary[piece]);

console.log(tokenIds);
// [791, 47823, 11205, 13004, 24465, 13]&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;hr&gt;&lt;h3&gt;2. Embeddings: IDs become meaning-shaped numbers&lt;/h3&gt;&lt;p&gt;A token ID by itself is just a label. ID &lt;code&gt;11205&lt;/code&gt; does not mean robot unless the model has a learned table that says what vector should represent that token.&lt;/p&gt;&lt;p&gt;That table is called the &lt;strong&gt;embedding matrix&lt;/strong&gt;. Think of it as a huge spreadsheet:&lt;/p&gt;&lt;ul&gt;&lt;li&gt; Every token ID gets one row.  &lt;/li&gt;&lt;li&gt; Every row contains many numbers.  &lt;/li&gt;&lt;li&gt; Those numbers are learned during training.  &lt;/li&gt;&lt;li&gt; The row becomes the token&amp;apos;s starting representation.  &lt;/li&gt;&lt;/ul&gt;&lt;p&gt;If two tokens are used in similar situations, their vectors often end up close together. Words like &lt;code&gt;doctor&lt;/code&gt;, &lt;code&gt;nurse&lt;/code&gt;, and &lt;code&gt;hospital&lt;/code&gt; tend to live near related medical concepts. 
This was not hand-labeled by a person; it emerges because those relationships help the model predict text.&lt;/p&gt;&lt;div&gt;&lt;div&gt; &amp;#x1f9e0; &lt;/div&gt;&lt;div&gt; Embeddings are not definitions. They are coordinates learned from usage. The model learns that concepts are related because they appear in related contexts.  &lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt;Slightly technical: vector arithmetic&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;p&gt;An embedding is a vector, meaning a list of numbers. With enough training, directions in vector space can behave like meaning shifts. That is why famous examples like &lt;code&gt;king - man + woman &amp;#x2248; queen&lt;/code&gt; can sometimes work. It is geometry, not a dictionary.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;hr&gt;&lt;h3&gt;3. Position: the model needs word order&lt;/h3&gt;&lt;p&gt;A bag of tokens is not enough. These two sentences contain almost the same pieces but mean very different things:&lt;/p&gt;&lt;blockquote&gt;&lt;div&gt; The dog chased the boy.  &lt;/div&gt;&lt;/blockquote&gt;&lt;blockquote&gt;&lt;div&gt; The boy chased the dog.  &lt;/div&gt;&lt;/blockquote&gt;&lt;p&gt;The model therefore needs a position signal. Older transformers added a position vector to each token embedding. Many modern LLMs use &lt;strong&gt;RoPE&lt;/strong&gt;, short for Rotary Position Embeddings, where position is represented by rotating parts of the vector.&lt;/p&gt;&lt;p&gt;You do not need the math to understand the purpose: position makes the model aware that one token came before another, and roughly how far apart they are.&lt;/p&gt;&lt;div&gt;&lt;div&gt; &amp;#x1f4cc; &lt;/div&gt;&lt;div&gt; Practical takeaway: important context usually works best near the start or end of a long prompt. Many models are weaker at using information buried in the middle.  
&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt;Slightly technical: why long context is still hard&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;p&gt;Even if a model can accept a huge prompt, that does not mean it uses every part equally well. Attention has to compare many tokens, and retrieval quality can drop when the answer is hidden in the middle of a long context window.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;hr&gt;&lt;h3&gt;4. Attention: tokens decide what to pay attention to&lt;/h3&gt;&lt;p&gt;Attention is the heart of the transformer. It lets each token ask: &lt;strong&gt;which previous tokens should shape my current meaning?&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;For each token, the model creates three learned views:&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th scope=&quot;col&quot;&gt; Name &lt;/th&gt;&lt;th scope=&quot;col&quot;&gt; Question it answers &lt;/th&gt;&lt;th scope=&quot;col&quot;&gt; Everyday analogy &lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Query &lt;/td&gt;&lt;td&gt; What am I looking for? &lt;/td&gt;&lt;td&gt; A search request &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Key &lt;/td&gt;&lt;td&gt; What do I match with? &lt;/td&gt;&lt;td&gt; A label on stored information &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Value &lt;/td&gt;&lt;td&gt; What information should be passed along? &lt;/td&gt;&lt;td&gt; The content you copy after finding a match &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;p&gt;Imagine the sentence:&lt;/p&gt;&lt;blockquote&gt;&lt;div&gt; The cat that I saw yesterday was sleeping.  &lt;/div&gt;&lt;/blockquote&gt;&lt;p&gt;When the model reaches &lt;code&gt;was&lt;/code&gt;, it needs to know what was sleeping. 
Attention can give more weight to &lt;code&gt;cat&lt;/code&gt; than to &lt;code&gt;yesterday&lt;/code&gt;, because &lt;code&gt;cat&lt;/code&gt; is more useful for understanding the verb.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;import math

# Illustrative raw attention scores from &amp;quot;was&amp;quot; to earlier tokens.
scores = {&amp;quot;cat&amp;quot;: 3.0, &amp;quot;yesterday&amp;quot;: 0.2, &amp;quot;saw&amp;quot;: 0.7}

# Softmax turns raw scores into weights that add up to 1.
exp_scores = {word: math.exp(score) for word, score in scores.items()}
total = sum(exp_scores.values())
weights = {word: value / total for word, value in exp_scores.items()}

print(weights)
# cat gets most of the weight (roughly 0.86)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt; &amp;#x1f512; &lt;/div&gt;&lt;div&gt; GPT-style models use causal masking: while predicting the next token, they can look backward but not forward. Future text is hidden because it has not been generated yet.  &lt;/div&gt;&lt;/div&gt;&lt;hr&gt;&lt;h3&gt;5. Multi-head attention: many views at once&lt;/h3&gt;&lt;p&gt;One attention pattern is not enough for language. A sentence can contain grammar, references, tone, code syntax, and long-range dependencies at the same time.&lt;/p&gt;&lt;p&gt;Multi-head attention runs several attention operations in parallel. One head might track subject-verb relationships. Another might follow quotation marks. Another might notice that a variable name in code was used earlier.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt;Slightly technical: heads are learned projections, not fixed slices&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;p&gt;Each head learns its own projections from the full token vector into a smaller query/key/value space. So a head is not simply handed a pre-cut piece of the vector. 
It learns its own way to view the whole token representation.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;The model then combines the outputs from all heads and sends the result onward.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;Token representation
   &amp;#x251c;&amp;#x2500; attention head 1: grammar relationship
   &amp;#x251c;&amp;#x2500; attention head 2: nearby phrase structure
   &amp;#x251c;&amp;#x2500; attention head 3: repeated pattern
   &amp;#x2514;&amp;#x2500; attention head 4: reference or pronoun link
        &amp;#x2193;
Combined into one updated token representation&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;A practical detail: during generation, the model stores old key and value vectors in a &lt;strong&gt;KV cache&lt;/strong&gt;. That way it does not need to recompute the entire conversation every time it adds one new token.&lt;/p&gt;&lt;hr&gt;&lt;h3&gt;6. Feed-forward networks: where a lot of learned structure lives&lt;/h3&gt;&lt;p&gt;After attention mixes information between tokens, each token goes through a feed-forward network.&lt;/p&gt;&lt;p&gt;Attention is about tokens communicating. The feed-forward network is more like each token doing private thinking.&lt;/p&gt;&lt;p&gt;The rough pattern is:&lt;/p&gt;&lt;div&gt;&lt;ol&gt;&lt;li&gt; Expand the vector into a larger space.  &lt;/li&gt;&lt;li&gt; Apply a non-linear function.  &lt;/li&gt;&lt;li&gt; Compress it back down.  &lt;/li&gt;&lt;/ol&gt;&lt;/div&gt;&lt;p&gt;The non-linear step matters because it lets the model learn richer patterns. Without it, many stacked layers would collapse into something much simpler.&lt;/p&gt;&lt;div&gt;&lt;div&gt; &amp;#x1f9f1; &lt;/div&gt;&lt;div&gt; A lot of model parameters live in feed-forward layers. This is one reason they are often discussed as the model&amp;apos;s learned store of patterns, facts, and associations.  
&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt;Slightly technical: dense models vs mixture of experts&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;p&gt;In a dense transformer, every token uses the same feed-forward network in a layer. In a mixture-of-experts model, a small router chooses only a few expert networks for each token. This can increase total model capacity without making every token run through every parameter.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;hr&gt;&lt;h3&gt;7. Residual stream and normalization: keeping deep models trainable&lt;/h3&gt;&lt;p&gt;A modern LLM can have dozens or even hundreds of layers. If each layer simply replaced the previous representation, training would be fragile.&lt;/p&gt;&lt;p&gt;Residual connections solve part of that problem. Instead of replacing the vector, a block adds its output back to the existing vector.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;new_vector = old_vector + block_output&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;This creates a running stream of information through the network. Each layer can add a refinement without destroying everything that came before.&lt;/p&gt;&lt;p&gt;Layer normalization keeps the numbers stable. Without it, values can grow too large or shrink too much as they pass through many layers.&lt;/p&gt;&lt;div&gt;&lt;div&gt; &amp;#x1f6e0;&amp;#xfe0f; &lt;/div&gt;&lt;div&gt; The boring-sounding parts matter. Residual connections and normalization are major reasons very deep transformer stacks can actually train.  &lt;/div&gt;&lt;/div&gt;&lt;hr&gt;&lt;h3&gt;8. Next-token prediction: the answer is built one piece at a time&lt;/h3&gt;&lt;p&gt;At the end of the stack, the model turns the final vector into scores for possible next tokens. These raw scores are called logits. 
A softmax converts them into probabilities.&lt;/p&gt;&lt;p&gt;Then a decoding strategy chooses one token.&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th scope=&quot;col&quot;&gt; Setting &lt;/th&gt;&lt;th scope=&quot;col&quot;&gt; Plain-English effect &lt;/th&gt;&lt;th scope=&quot;col&quot;&gt; When useful &lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Temperature &lt;/td&gt;&lt;td&gt; Controls randomness &lt;/td&gt;&lt;td&gt; Lower for precise answers, higher for creative drafts &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Top-k &lt;/td&gt;&lt;td&gt; Only considers the k most likely tokens &lt;/td&gt;&lt;td&gt; Prevents very unlikely choices &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Top-p &lt;/td&gt;&lt;td&gt; Considers the smallest set of most likely tokens whose probabilities add up to at least p &lt;/td&gt;&lt;td&gt; Flexible sampling without fixed k &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;# Pseudocode: tokenize, transformer, unembed, sample, and detokenize stand in
# for real components, and END_OF_TEXT is a placeholder stop token.
text = &amp;quot;The capital of France is&amp;quot;

while True:
    token_ids = tokenize(text)
    vectors = transformer(token_ids)
    next_token_scores = unembed(vectors[-1])  # logits for the last position
    next_token = sample(next_token_scores, temperature=0.7)
    if next_token == END_OF_TEXT:
        break
    text += detokenize(next_token)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;That loop is the machine behind the fluent paragraph. The model writes by repeatedly asking: &lt;strong&gt;given everything so far, what token should come next?&lt;/strong&gt;&lt;/p&gt;&lt;div&gt;&lt;div&gt; &amp;#x26a0;&amp;#xfe0f; &lt;/div&gt;&lt;div&gt; This also explains hallucinations. The base training objective rewards plausible continuation, not guaranteed truth. Post-training, retrieval, tool use, and evaluation are added to make outputs more useful and reliable.  &lt;/div&gt;&lt;/div&gt;&lt;hr&gt;&lt;h3&gt;9. 
Architecture vs weights: why models feel different&lt;/h3&gt;&lt;p&gt;Many modern LLMs share the same broad transformer-family shape. What makes them feel different is usually a combination of:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;strong&gt;Training data:&lt;/strong&gt; what they learned from.  &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Scale:&lt;/strong&gt; how many layers, heads, parameters, and tokens were used.  &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Architecture choices:&lt;/strong&gt; dense or mixture-of-experts, attention variants, context length, tokenizer.  &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Post-training:&lt;/strong&gt; instruction tuning, preference training, safety behavior, tool use, and product-level rules.  &lt;/li&gt;&lt;/ul&gt;&lt;p&gt;So when people compare GPT, Claude, Gemini, Llama, Mistral, Qwen, or Gemma, they are often comparing siblings in a broad transformer family rather than completely unrelated species of model.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt;Slightly technical: modern transformer vocabulary&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;p&gt;&lt;strong&gt;RoPE:&lt;/strong&gt; position through vector rotation.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;RMSNorm:&lt;/strong&gt; a cheaper normalization variant used in many modern open models.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;SwiGLU:&lt;/strong&gt; a popular activation/feed-forward design.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;GQA:&lt;/strong&gt; grouped-query attention, which reduces KV-cache memory.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;MoE:&lt;/strong&gt; mixture of experts, where only selected expert networks run for each token.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;h3&gt;10. GPT-2 and MoE: two useful milestones&lt;/h3&gt;&lt;p&gt;Two research threads make the mechanics above feel more concrete. &lt;strong&gt;GPT-2&lt;/strong&gt; showed how far plain next-token prediction could go when scaled. 
&lt;strong&gt;Mixture of Experts&lt;/strong&gt; shows how a model can grow more capable without forcing every token to use every parameter.&lt;/p&gt;&lt;div&gt;&lt;div&gt; &amp;#x1f9e9; &lt;/div&gt;&lt;div&gt; Plain-English mental model: GPT-2 is like one very large generalist team. MoE is like a building with specialist rooms, where a router sends each token to only the rooms that seem useful.  &lt;/div&gt;&lt;/div&gt;&lt;h4&gt;GPT-2: scaling the next-token game&lt;/h4&gt;&lt;p&gt;OpenAI&amp;apos;s 2019 paper &lt;a href=&quot;https://cdn.openai.com/better-language-models/language-models.pdf&quot; target=&quot;_blank&quot;&gt;&lt;em&gt;Language Models are Unsupervised Multitask Learners&lt;/em&gt;&lt;/a&gt; made a simple bet famous: train a transformer to continue internet text, then test whether that same model can handle many tasks by phrasing them as text continuation.&lt;/p&gt;&lt;ul&gt;&lt;li&gt; It was autoregressive: it generated left to right, one token at a time.  &lt;/li&gt;&lt;li&gt; It was dense: every token passed through the same model weights.  &lt;/li&gt;&lt;li&gt; It helped popularize the idea that scale plus simple training can produce surprisingly general behavior.  &lt;/li&gt;&lt;/ul&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;# Simplified GPT-2-style objective.
# Pseudocode: model and cross_entropy are placeholders, not a real API.
prompt = &amp;quot;Translate to French: hello&amp;quot;
target_next_token = &amp;quot; bon&amp;quot;

# Training nudges the model so this next token becomes more likely.
loss = cross_entropy(model(prompt), target_next_token)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;h4&gt;MoE: not every token needs the whole building&lt;/h4&gt;&lt;p&gt;A dense transformer usually runs every token through the same feed-forward network. In a &lt;strong&gt;Mixture-of-Experts&lt;/strong&gt; model, a small router chooses only a few expert networks for each token. 
The model can have many more total parameters, while each token activates only a subset.&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th scope=&quot;col&quot;&gt; Concept &lt;/th&gt;&lt;th scope=&quot;col&quot;&gt; Dense LLM &lt;/th&gt;&lt;th scope=&quot;col&quot;&gt; MoE LLM &lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Work per token &lt;/td&gt;&lt;td&gt; Uses the same main blocks &lt;/td&gt;&lt;td&gt; Uses selected experts &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Analogy &lt;/td&gt;&lt;td&gt; One big generalist team &lt;/td&gt;&lt;td&gt; Router plus specialist teams &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Tradeoff &lt;/td&gt;&lt;td&gt; Simpler to train and serve &lt;/td&gt;&lt;td&gt; More capacity, more routing complexity &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt;Slightly technical: where the MoE papers fit&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;p&gt;&lt;a href=&quot;https://arxiv.org/abs/2101.03961&quot; target=&quot;_blank&quot;&gt;Switch Transformers&lt;/a&gt; simplified MoE routing by sending each token to one expert. &lt;a href=&quot;https://arxiv.org/abs/2112.10684&quot; target=&quot;_blank&quot;&gt;Efficient Large Scale Language Modeling with Mixtures of Experts&lt;/a&gt; studied autoregressive MoE language models at scale. &lt;a href=&quot;https://arxiv.org/abs/2401.04088&quot; target=&quot;_blank&quot;&gt;Mixtral of Experts&lt;/a&gt; is a modern sparse MoE example where each token is routed to two feed-forward experts.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt; &amp;#x2696;&amp;#xfe0f; &lt;/div&gt;&lt;div&gt; Important nuance: MoE does not automatically mean smarter. Data quality, routing balance, training stability, inference hardware, and post-training still matter.  &lt;/div&gt;&lt;/div&gt;&lt;h3&gt;11. 
The AI ecosystem: MCP, tools, RAG, agents, and evals&lt;/h3&gt;&lt;p&gt;The transformer is the engine, but real AI products usually add a stack around it. That stack gives the model fresh information, lets it take actions, checks its work, and keeps the system observable.&lt;/p&gt;&lt;div&gt;&lt;div&gt; &amp;#x1f5fa;&amp;#xfe0f; &lt;/div&gt;&lt;div&gt; Plain-English map: the LLM is the text brain, tools are the hands, RAG is the open-book notes, MCP is a standard plug for external systems, agents are the loop that decides what to do next, and evals are the tests that tell you if any of it works.  &lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th scope=&quot;col&quot;&gt; Term &lt;/th&gt;&lt;th scope=&quot;col&quot;&gt; Simple meaning &lt;/th&gt;&lt;th scope=&quot;col&quot;&gt; What it helps with &lt;/th&gt;&lt;th scope=&quot;col&quot;&gt; Watch out for &lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Prompt &lt;/td&gt;&lt;td&gt; Instructions and context &lt;/td&gt;&lt;td&gt; Steering behavior without changing weights &lt;/td&gt;&lt;td&gt; Vague prompts create vague answers &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Tool calling &lt;/td&gt;&lt;td&gt; The model asks your app to run a function &lt;/td&gt;&lt;td&gt; Weather, search, payments, calendars, databases &lt;/td&gt;&lt;td&gt; Validate every argument before doing anything real &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; MCP &lt;/td&gt;&lt;td&gt; A shared protocol for connecting AI apps to tools/data &lt;/td&gt;&lt;td&gt; Reusable integrations across different hosts &lt;/td&gt;&lt;td&gt; Permissions, auth, and tool descriptions matter &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; RAG &lt;/td&gt;&lt;td&gt; Retrieve relevant documents before answering &lt;/td&gt;&lt;td&gt; Fresh facts and private knowledge &lt;/td&gt;&lt;td&gt; Bad retrieval creates confident wrong answers &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Embeddings &lt;/td&gt;&lt;td&gt; Meaning as searchable 
vectors &lt;/td&gt;&lt;td&gt; Semantic search and clustering &lt;/td&gt;&lt;td&gt; Similar does not always mean correct &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Agent &lt;/td&gt;&lt;td&gt; A model inside a task loop &lt;/td&gt;&lt;td&gt; Planning, tool use, retries, handoffs &lt;/td&gt;&lt;td&gt; Needs limits, logs, and stop conditions &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Fine-tuning &lt;/td&gt;&lt;td&gt; Training on examples of desired behavior &lt;/td&gt;&lt;td&gt; Style, format, classification, repeated edge cases &lt;/td&gt;&lt;td&gt; Do evals first; do not use it as a fact database &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Evals &lt;/td&gt;&lt;td&gt; Tests for model behavior &lt;/td&gt;&lt;td&gt; Comparing prompts, tools, models, and releases &lt;/td&gt;&lt;td&gt; Tiny demo tests miss real-world messiness &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;h4&gt;MCP: the USB-C idea for AI tools&lt;/h4&gt;&lt;p&gt;MCP stands for &lt;strong&gt;Model Context Protocol&lt;/strong&gt;. Instead of every AI app inventing a custom connector for every service, MCP defines a common client-server pattern. An AI app is the host. It creates an MCP client. That client connects to an MCP server, which exposes things like tools, resources, and prompts.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;AI app / host
  &amp;#x2514;&amp;#x2500; MCP client
       &amp;#x2514;&amp;#x2500; MCP server
            &amp;#x251c;&amp;#x2500; tools: actions the model may request
            &amp;#x251c;&amp;#x2500; resources: files, docs, database records, logs
            &amp;#x2514;&amp;#x2500; prompts: reusable instruction templates&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;p&gt;The key idea is not that MCP makes the model smarter by itself. It makes integrations more standard. 
A coding agent can connect to GitHub, a support assistant can connect to tickets, and a research assistant can connect to document stores using the same basic pattern.&lt;/p&gt;&lt;div&gt;&lt;div&gt; &amp;#x1f510; &lt;/div&gt;&lt;div&gt; Security rule: treat tools like real permissions, not decorations. If a tool can send email, delete files, spend money, or publish content, the app should require clear approval, scoped access, logging, and argument validation.  &lt;/div&gt;&lt;/div&gt;&lt;h4&gt;RAG: giving the model an open book&lt;/h4&gt;&lt;p&gt;RAG means &lt;strong&gt;Retrieval-Augmented Generation&lt;/strong&gt;. The model does not rely only on what it learned during training. Your app first searches a knowledge base, pulls the most relevant chunks into the prompt, and asks the model to answer using that context.&lt;/p&gt;&lt;div&gt;&lt;ol&gt;&lt;li&gt; Split documents into chunks.  &lt;/li&gt;&lt;li&gt; Turn each chunk into an embedding vector.  &lt;/li&gt;&lt;li&gt; Store those vectors in a search index or vector database.  &lt;/li&gt;&lt;li&gt; When the user asks something, search for similar chunks.  &lt;/li&gt;&lt;li&gt; Put the best chunks into the model context and ask for a grounded answer.  &lt;/li&gt;&lt;/ol&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;javascript&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;// Sketch only: vectorSearch, embed, and llm are placeholders for your own
// retrieval and model clients.
const question = &amp;quot;What is our refund policy?&amp;quot;;

// Find the five stored chunks closest to the question.
const hits = await vectorSearch(embed(question), { topK: 5 });

const answer = await llm.generate({
  instructions: &amp;quot;Answer only from the provided policy snippets.&amp;quot;,
  context: hits.map((hit) =&amp;gt; hit.text),
  input: question,
});&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;h4&gt;Agents: the loop around the model&lt;/h4&gt;&lt;p&gt;An agent is not a new kind of brain. 
It is usually an LLM plus an orchestration loop: read the goal, choose a next step, maybe call a tool, inspect the result, update the plan, and continue until done or stopped.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;goal &amp;#x2192; think about next step &amp;#x2192; call tool &amp;#x2192; observe result &amp;#x2192; adjust plan &amp;#x2192; final answer&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt; &amp;#x1f9ea; &lt;/div&gt;&lt;div&gt; Evals are what turn AI from a cool demo into an engineering system. Before shipping a new prompt, model, tool, or agent flow, test it on examples that represent real users, failure cases, and edge cases.  &lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt;Slightly technical: how these pieces fit in one product&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;p&gt;A production assistant might use MCP to discover tools, RAG to fetch private documents, tool calling to take controlled actions, structured outputs to return clean JSON, evals to measure quality, tracing to debug failures, and guardrails to block unsafe or unauthorized actions.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;hr&gt;&lt;h3&gt;A friendly checklist for understanding any LLM answer&lt;/h3&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt;&lt;span&gt;&amp;#x2705;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span&gt;  Did the model receive the right information in the prompt? &lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt;&lt;span&gt;&amp;#x2705;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span&gt;  Was the important context near the beginning or end? &lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt;&lt;span&gt;&amp;#x2705;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span&gt;  Is the task asking for facts, reasoning, creativity, or formatting? 
&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt;&lt;span&gt;&amp;#x2705;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span&gt;  Would retrieval or a tool make the answer more grounded? &lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt;&lt;span&gt;&amp;#x2705;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span&gt;  Should the output be checked against a source before trusting it? &lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt; &amp;#x2705; &lt;/div&gt;&lt;div&gt; If you remember one thing, remember this: LLMs transform text into numbers, let those numbers exchange context through attention, and then predict the next token again and again until an answer appears.  &lt;/div&gt;&lt;/div&gt;&lt;hr&gt;&lt;h3&gt;Further reading&lt;/h3&gt;&lt;ul&gt;&lt;li&gt; Kato, &lt;a href=&quot;https://www.0xkato.xyz/how-llms-actually-work/&quot; target=&quot;_blank&quot;&gt;How LLMs Actually Work&lt;/a&gt;&lt;/li&gt;&lt;li&gt; Vaswani et al., &lt;a href=&quot;https://arxiv.org/abs/1706.03762&quot; target=&quot;_blank&quot;&gt;Attention Is All You Need&lt;/a&gt;&lt;/li&gt;&lt;li&gt; Su et al., &lt;a href=&quot;https://arxiv.org/abs/2104.09864&quot; target=&quot;_blank&quot;&gt;RoFormer: Enhanced Transformer with Rotary Position Embedding&lt;/a&gt;&lt;/li&gt;&lt;li&gt; Liu et al., &lt;a href=&quot;https://arxiv.org/abs/2307.03172&quot; target=&quot;_blank&quot;&gt;Lost in the Middle: How Language Models Use Long Contexts&lt;/a&gt;&lt;/li&gt;&lt;li&gt; Radford et al., &lt;a href=&quot;https://cdn.openai.com/better-language-models/language-models.pdf&quot; target=&quot;_blank&quot;&gt;Language Models are Unsupervised Multitask Learners&lt;/a&gt; (GPT-2)  &lt;/li&gt;&lt;li&gt; Fedus, Zoph, and Shazeer, &lt;a href=&quot;https://arxiv.org/abs/2101.03961&quot; target=&quot;_blank&quot;&gt;Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient 
Sparsity&lt;/a&gt;&lt;/li&gt;&lt;li&gt; Artetxe et al., &lt;a href=&quot;https://arxiv.org/abs/2112.10684&quot; target=&quot;_blank&quot;&gt;Efficient Large Scale Language Modeling with Mixtures of Experts&lt;/a&gt;&lt;/li&gt;&lt;li&gt; Jiang et al., &lt;a href=&quot;https://arxiv.org/abs/2401.04088&quot; target=&quot;_blank&quot;&gt;Mixtral of Experts&lt;/a&gt;&lt;/li&gt;&lt;li&gt; Model Context Protocol, &lt;a href=&quot;https://modelcontextprotocol.io/docs/learn/architecture&quot; target=&quot;_blank&quot;&gt;Architecture overview&lt;/a&gt;&lt;/li&gt;&lt;li&gt; OpenAI, &lt;a href=&quot;https://developers.openai.com/api/docs/guides/function-calling&quot; target=&quot;_blank&quot;&gt;Function calling / tool calling guide&lt;/a&gt;&lt;/li&gt;&lt;li&gt; OpenAI, &lt;a href=&quot;https://openai.com/index/introducing-text-and-code-embeddings/&quot; target=&quot;_blank&quot;&gt;Introducing text and code embeddings&lt;/a&gt;&lt;/li&gt;&lt;li&gt; OpenAI Agents SDK, &lt;a href=&quot;https://openai.github.io/openai-agents-python/agents/&quot; target=&quot;_blank&quot;&gt;Agents guide&lt;/a&gt;&lt;/li&gt;&lt;li&gt; OpenAI, &lt;a href=&quot;https://developers.openai.com/api/docs/guides/supervised-fine-tuning&quot; target=&quot;_blank&quot;&gt;Supervised fine-tuning guide&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;blockquote&gt;&lt;div&gt; Polished enough to read like an essay, structured enough to use as a reference, and simple enough that you can explain it to a friend after one pass.  &lt;/div&gt;&lt;/blockquote&gt;&lt;/div&gt;&lt;hr&gt;&lt;aside&gt;&lt;h2&gt;
Interlinked Content
&lt;/h2&gt;&lt;div&gt;&lt;br&gt;&lt;/div&gt;&lt;/aside&gt;&lt;/div&gt;</content>
    </item>
    <item>
      <title>The 2026 AI Landscape: A Hacker&apos;s Deep Dive 🤖</title>
      <link>https://oreoro.github.io/posts/2026-ai-landscape/</link>
      <guid isPermaLink="true">https://oreoro.github.io/posts/2026-ai-landscape/</guid>
      <description>Everything you need to understand the current AI moment, from transformers to agents, RAG pipelines to MCP, and every buzzword in between.</description>
      <pubDate>Thu, 04 Jun 2026 00:00:00 GMT</pubDate>
      <lastUpdatedTimestamp>Thu Jun 04 2026 15:22:00 GMT+0000 (Coordinated Universal Time)</lastUpdatedTimestamp>
      <category>Personal Notes</category>
      <category>Guide</category>
      <category>Information</category>
      <category>Tools</category>
      <content>&lt;div&gt;
                    &lt;p&gt;
                        &lt;em&gt;Note:&lt;/em&gt; This RSS feed strips out SVGs and embeds. You might want to read the post on the webpage
                        &lt;a href=&quot;https://oreoro.github.io/posts/2026-ai-landscape/&quot; target=&quot;_blank&quot;&gt;here&lt;/a&gt;.
                    &lt;/p&gt;
                    &lt;hr&gt;
                &lt;div&gt;&lt;p&gt;&lt;time&gt; June 4, 2026 &lt;/time&gt;&lt;/p&gt;&lt;/div&gt;&lt;hr&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt; &amp;#x1f916; &lt;/div&gt;&lt;div&gt; This deep dive is structured as a native Notion article: use the table of contents below to jump between architecture, agents, RAG, protocols, frameworks, prompting, vector databases, glossary, and code appendix.  &lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th scope=&quot;col&quot;&gt; Layer &lt;/th&gt;&lt;th scope=&quot;col&quot;&gt; Native Notion treatment &lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Navigation &lt;/td&gt;&lt;td&gt; Built-in table of contents plus semantic headings &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Reference material &lt;/td&gt;&lt;td&gt; Native tables and collapsible glossary sections &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Implementation detail &lt;/td&gt;&lt;td&gt; Language-aware code blocks and equation blocks &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Long-form reading &lt;/td&gt;&lt;td&gt; Callouts, dividers, and structured sections &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;hr&gt;&lt;div&gt;&lt;div&gt; &amp;#x2615; &lt;/div&gt;&lt;div&gt; Everything you need to understand the current AI moment &amp;#x2014; from transformers to agents, RAG pipelines to MCP, and every buzzword in between. Grab a coffee.  &lt;/div&gt;&lt;/div&gt;&lt;hr&gt;&lt;h3&gt;1. The Big Picture: Where AI Stands in 2026&lt;/h3&gt;&lt;p&gt;The AI landscape of 2026 is defined by a single, sweeping shift: &lt;strong&gt;from chat to action&lt;/strong&gt;. The previous era was dominated by raw model intelligence &amp;#x2014; who had the biggest, smartest LLM. The current era prioritizes &lt;strong&gt;orchestration layers&lt;/strong&gt; that unify multiple models and tools to automate complex, end-to-end business workflows. 
We&amp;apos;ve moved from &amp;quot;AI that talks&amp;quot; to &amp;quot;AI that does.&amp;quot;[1][2]&lt;/p&gt;&lt;p&gt;A few landmark data points paint the picture clearly:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;strong&gt;700 million people&lt;/strong&gt; use ChatGPT weekly as of mid-2025[3]  &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Anthropic&lt;/strong&gt; now holds ~40% of enterprise LLM API spend; &lt;strong&gt;OpenAI has dropped to 27%&lt;/strong&gt;, down from ~50% in 2023[4]  &lt;/li&gt;&lt;li&gt;&lt;strong&gt;2026 is the year of autonomous AI agents&lt;/strong&gt; &amp;#x2014; goal-driven systems combining reasoning, planning, and tool use, marking the biggest functional jump since GPT-3[1]  &lt;/li&gt;&lt;li&gt; Open-source models (Meta&amp;apos;s Llama 4 family with 10M token context windows) have &lt;strong&gt;narrowed the gap&lt;/strong&gt; with proprietary models dramatically[4]  &lt;/li&gt;&lt;/ul&gt;&lt;p&gt;The defining theme is the &lt;strong&gt;Agentic Web&lt;/strong&gt; &amp;#x2014; where AI agents serve as the primary gateway to the internet, navigating backends through APIs rather than humans switching between websites.[2]&lt;/p&gt;&lt;hr&gt;&lt;h3&gt;2. 
The GPT Evolution &amp;#x2014; A Complete Timeline&lt;/h3&gt;&lt;p&gt;The journey from GPT-1 to GPT-5.5 is arguably the fastest capability evolution in computing history &amp;#x2014; parameters grew from 117 million to 175+ billion, a &lt;strong&gt;1,495&amp;#xd7; increase in two years&lt;/strong&gt; (GPT-1 to GPT-3).[3]&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th scope=&quot;col&quot;&gt; Model &lt;/th&gt;&lt;th scope=&quot;col&quot;&gt; Date &lt;/th&gt;&lt;th scope=&quot;col&quot;&gt; Params &lt;/th&gt;&lt;th scope=&quot;col&quot;&gt; Key Leap &lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;GPT-1&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; June 2018 &lt;/td&gt;&lt;td&gt; 117M &lt;/td&gt;&lt;td&gt; Proved unsupervised pre-training works[3] &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;GPT-2&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; Feb 2019 &lt;/td&gt;&lt;td&gt; 1.5B &lt;/td&gt;&lt;td&gt; Coherent long-form text; initially &amp;quot;too dangerous to release&amp;quot;[5] &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;GPT-3&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; May 2020 &lt;/td&gt;&lt;td&gt; 175B &lt;/td&gt;&lt;td&gt; First commercially viable model; in-context learning[5] &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;GPT-3.5 / InstructGPT&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; 2022 &lt;/td&gt;&lt;td&gt; ~175B &lt;/td&gt;&lt;td&gt; RLHF introduced; gave us ChatGPT[5] &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;GPT-4&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; March 2023 &lt;/td&gt;&lt;td&gt; Undisclosed &lt;/td&gt;&lt;td&gt; Multimodal (text + image); reasoning at scale[3] &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;GPT-4o&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; May 2024 &lt;/td&gt;&lt;td&gt; Undisclosed &lt;/td&gt;&lt;td&gt; Omnimodal (text, image, audio natively); 2&amp;#xd7; speed[6] &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;GPT-4.5&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; Feb 
2025 &lt;/td&gt;&lt;td&gt; Undisclosed &lt;/td&gt;&lt;td&gt; Stronger world knowledge, fewer hallucinations[6] &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;GPT-5&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; Aug 2025 &lt;/td&gt;&lt;td&gt; Undisclosed &lt;/td&gt;&lt;td&gt; 94.6% on advanced math; 45% fewer hallucinations vs GPT-4o[3] &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;GPT-5.5&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; April 2026 &lt;/td&gt;&lt;td&gt; Undisclosed &lt;/td&gt;&lt;td&gt; Native omnimodal; autonomous computer use; agentic coding[6] &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;h4&gt;The o-Series: Reasoning Models&lt;/h4&gt;&lt;p&gt;Alongside the GPT series, OpenAI launched a separate line of &lt;strong&gt;reasoning-first models&lt;/strong&gt;: o1, o3, and o4-mini. They still generate text token by token, but they are trained to &amp;quot;think before they speak,&amp;quot; producing an internal chain of thought before the final answer. o3 and o4-mini launched in April 2025 with significantly stronger reasoning, particularly in STEM.[5][6]&lt;/p&gt;&lt;hr&gt;&lt;h3&gt;3. How LLMs Actually Work: The Transformer Architecture&lt;/h3&gt;&lt;p&gt;Every modern LLM is built on the &lt;strong&gt;Transformer&lt;/strong&gt;, introduced in the landmark 2017 paper &lt;em&gt;&amp;quot;Attention Is All You Need&amp;quot;&lt;/em&gt;.
Here&amp;apos;s the architecture unwrapped:[7]&lt;/p&gt;&lt;h4&gt;3.1 The Four Building Blocks&lt;/h4&gt;&lt;p&gt;A transformer has four core building blocks:[8]&lt;/p&gt;&lt;div&gt;&lt;ol&gt;&lt;li&gt;&lt;strong&gt;Token Embeddings&lt;/strong&gt; &amp;#x2014; Convert words/subwords into numerical vectors in high-dimensional space  &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Positional Encoding&lt;/strong&gt; &amp;#x2014; Inject information about the order of tokens (since attention has no built-in notion of sequence)  &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Attention&lt;/strong&gt; &amp;#x2014; The magic: lets each token &amp;quot;look at&amp;quot; every other token  &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Feed-Forward Block&lt;/strong&gt; &amp;#x2014; A pair of linear transformations, with a nonlinearity in between, applied position-wise  &lt;/li&gt;&lt;/ol&gt;&lt;/div&gt;&lt;h4&gt;3.2 Self-Attention: The Core Insight&lt;/h4&gt;&lt;p&gt;Self-attention answers the question: &lt;em&gt;&amp;quot;Which other words should I focus on to understand my own meaning?&amp;quot;&lt;/em&gt; For every token, three vectors are computed:[9][10]&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;strong&gt;Q (Query)&lt;/strong&gt; &amp;#x2014; &amp;quot;What am I looking for?&amp;quot;  &lt;/li&gt;&lt;li&gt;&lt;strong&gt;K (Key)&lt;/strong&gt; &amp;#x2014; &amp;quot;What do I contain?&amp;quot;  &lt;/li&gt;&lt;li&gt;&lt;strong&gt;V (Value)&lt;/strong&gt; &amp;#x2014; &amp;quot;What information do I carry?&amp;quot;  &lt;/li&gt;&lt;/ul&gt;&lt;p&gt;The attention output is computed as:&lt;/p&gt;&lt;div&gt;&lt;span&gt;$$\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$&lt;/span&gt;&lt;/div&gt;&lt;p&gt;where &lt;span&gt;$d_k$&lt;/span&gt; is the key dimension. The &lt;span&gt;$\sqrt{d_k}$&lt;/span&gt; scaling keeps the dot products from growing so large that the softmax saturates and its gradients vanish.[11]&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;import math

import torch
import torch.nn.functional as F


def scaled_dot_product_attention(Q, K, V, mask=None):
    &amp;quot;&amp;quot;&amp;quot;
    Core self-attention mechanism.
    Q, K, V: (batch_size, seq_len, d_k), or any shape ending in (seq_len, d_k)
    &amp;quot;&amp;quot;&amp;quot;
    d_k = Q.size(-1)
    # Compute attention scores
    scores = torch.matmul(Q, K.transpose(-2, -1)) / math.sqrt(d_k)
    # Apply optional mask (for decoder / causal attention)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float(&amp;apos;-inf&amp;apos;))
    # Softmax to get attention weights
    attn_weights = F.softmax(scores, dim=-1)
    # Weighted sum of values
    output = torch.matmul(attn_weights, V)
    return output, attn_weights


# Example: 2 sentences, 4 tokens, 8-dim embeddings
batch_size, seq_len, d_model = 2, 4, 8
d_k = 8
Q = torch.randn(batch_size, seq_len, d_k)
K = torch.randn(batch_size, seq_len, d_k)
V = torch.randn(batch_size, seq_len, d_k)
output, weights = scaled_dot_product_attention(Q, K, V)
print(f&amp;quot;Output shape: {output.shape}&amp;quot;)        # (2, 4, 8)
print(f&amp;quot;Attention weights: {weights.shape}&amp;quot;)  # (2, 4, 4)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;h4&gt;3.3 Multi-Head Attention&lt;/h4&gt;&lt;p&gt;Instead of computing attention once,
transformers run &lt;strong&gt;multiple attention heads in parallel&lt;/strong&gt; &amp;#x2014; GPT-3 uses 96 attention heads per block. Each head learns a different &amp;quot;relevance function.&amp;quot; The outputs are concatenated and projected:[7]&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;import torch.nn as nn

# Relies on scaled_dot_product_attention as defined in section 3.2.


class MultiHeadAttention(nn.Module):
    def __init__(self, d_model, num_heads):
        super().__init__()
        assert d_model % num_heads == 0
        self.d_k = d_model // num_heads
        self.num_heads = num_heads
        self.W_q = nn.Linear(d_model, d_model)
        self.W_k = nn.Linear(d_model, d_model)
        self.W_v = nn.Linear(d_model, d_model)
        self.W_o = nn.Linear(d_model, d_model)

    def split_heads(self, x):
        B, T, d = x.shape
        # (B, T, d) -&amp;gt; (B, num_heads, T, d_k)
        return x.view(B, T, self.num_heads, self.d_k).transpose(1, 2)

    def forward(self, x, mask=None):
        Q = self.split_heads(self.W_q(x))
        K = self.split_heads(self.W_k(x))
        V = self.split_heads(self.W_v(x))
        attn_out, _ = scaled_dot_product_attention(Q, K, V, mask)
        # Merge heads: (B, num_heads, T, d_k) -&amp;gt; (B, T, d_model)
        B, H, T, d_k = attn_out.shape
        attn_out = attn_out.transpose(1, 2).contiguous().view(B, T, H * d_k)
        return self.W_o(attn_out)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;h4&gt;3.4 Modern Architecture Improvements&lt;/h4&gt;&lt;p&gt;The vanilla transformer has been significantly optimized:[11]&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th scope=&quot;col&quot;&gt; Innovation &lt;/th&gt;&lt;th scope=&quot;col&quot;&gt; What It Does &lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;FlashAttention&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; Reduces memory traffic 2&amp;#x2013;4&amp;#xd7; by optimizing GPU SRAM access patterns
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;RoPE (Rotary Position Embeddings)&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; Better position encoding enabling longer context windows &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MoE (Mixture of Experts)&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; Activates only a subset of parameters per token &amp;#x2014; enables huge models at lower compute cost &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;GQA (Grouped Query Attention)&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; Multiple query heads share key/value heads, reducing KV-cache memory &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Linear Attention&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; Reduces complexity from O(n&amp;#xb2;) to O(n) for long documents &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;hr&gt;&lt;h3&gt;4. The Agentic AI Revolution&lt;/h3&gt;&lt;p&gt;The single biggest shift in 2026 is from &lt;strong&gt;generative AI&lt;/strong&gt; (creates content) to &lt;strong&gt;agentic AI&lt;/strong&gt; (autonomous systems that plan, decide, and execute). 
Where a generative model answers your question, an agentic AI accomplishes your goal.[12]&lt;/p&gt;&lt;h4&gt;4.1 What Makes an AI Agent?&lt;/h4&gt;&lt;p&gt;An agent has four capabilities that a plain chatbot lacks:[12]&lt;/p&gt;&lt;div&gt;&lt;ol&gt;&lt;li&gt;&lt;strong&gt;Goal Understanding&lt;/strong&gt; &amp;#x2014; Decompose a complex objective into sub-tasks  &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Multi-Step Planning&lt;/strong&gt; &amp;#x2014; Create and revise a plan of action  &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Tool Use&lt;/strong&gt; &amp;#x2014; Execute functions, call APIs, browse the web, write code  &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Self-Correction&lt;/strong&gt; &amp;#x2014; Observe outcomes and adjust behavior in a loop  &lt;/li&gt;&lt;/ol&gt;&lt;/div&gt;&lt;h4&gt;4.2 The ReAct Pattern&lt;/h4&gt;&lt;p&gt;&lt;strong&gt;ReAct (Reasoning + Acting)&lt;/strong&gt; is the foundational pattern for agents, introduced by Yao et al. The model interleaves reasoning traces with actions:[13]&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;Thought: I need to find the current population of Karachi.
Action: search(&amp;quot;Karachi population 2026&amp;quot;)
Observation: Karachi population is approximately 16.5 million.
Thought: Now I can answer the question.
Answer: Karachi has approximately 16.5 million people.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;from openai import OpenAI
import json

client = OpenAI()

# Define tools the agent can use
tools = [
    {
        &amp;quot;type&amp;quot;: &amp;quot;function&amp;quot;,
        &amp;quot;function&amp;quot;: {
            &amp;quot;name&amp;quot;: &amp;quot;web_search&amp;quot;,
            &amp;quot;description&amp;quot;: &amp;quot;Search the web for current information&amp;quot;,
            &amp;quot;parameters&amp;quot;: {
                &amp;quot;type&amp;quot;: &amp;quot;object&amp;quot;,
                &amp;quot;properties&amp;quot;: {
                    &amp;quot;query&amp;quot;: {&amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;, &amp;quot;description&amp;quot;: &amp;quot;Search query&amp;quot;}
                },
                &amp;quot;required&amp;quot;: [&amp;quot;query&amp;quot;]
            }
        }
    },
    {
        &amp;quot;type&amp;quot;: &amp;quot;function&amp;quot;,
        &amp;quot;function&amp;quot;: {
            &amp;quot;name&amp;quot;: &amp;quot;run_python&amp;quot;,
            &amp;quot;description&amp;quot;: &amp;quot;Execute Python code and return the result&amp;quot;,
            &amp;quot;parameters&amp;quot;: {
                &amp;quot;type&amp;quot;: &amp;quot;object&amp;quot;,
                &amp;quot;properties&amp;quot;: {
                    &amp;quot;code&amp;quot;: {&amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;, &amp;quot;description&amp;quot;: &amp;quot;Python code to run&amp;quot;}
                },
                &amp;quot;required&amp;quot;: [&amp;quot;code&amp;quot;]
            }
        }
    }
]


def execute_tool(name: str, arguments: dict) -&amp;gt; str:
    # Placeholder dispatcher (an assumption, not part of the original):
    # route each tool name to your own implementation here.
    raise NotImplementedError(f&amp;quot;No implementation for tool: {name}&amp;quot;)


def run_react_agent(task: str, max_steps: int = 5):
    messages = [{&amp;quot;role&amp;quot;: &amp;quot;user&amp;quot;, &amp;quot;content&amp;quot;: task}]
    for step in range(max_steps):
        response = client.chat.completions.create(
            model=&amp;quot;gpt-4o&amp;quot;,
            messages=messages,
            tools=tools,
            tool_choice=&amp;quot;auto&amp;quot;
        )
        msg = response.choices[0].message
        # No tool call = final answer
        if not msg.tool_calls:
            return msg.content
        # Execute tool calls and feed each result back to the model
        messages.append(msg)
        for tool_call in msg.tool_calls:
            result = execute_tool(tool_call.function.name,
                                  json.loads(tool_call.function.arguments))
            messages.append({
                &amp;quot;role&amp;quot;: &amp;quot;tool&amp;quot;,
                &amp;quot;tool_call_id&amp;quot;: tool_call.id,
                &amp;quot;content&amp;quot;: str(result)
            })
    return &amp;quot;Max steps reached&amp;quot;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;h4&gt;4.3 Agentic Patterns&lt;/h4&gt;&lt;p&gt;Four core patterns drive agent behavior:[14]&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th scope=&quot;col&quot;&gt; Pattern &lt;/th&gt;&lt;th scope=&quot;col&quot;&gt; Description &lt;/th&gt;&lt;th scope=&quot;col&quot;&gt; Example &lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Reflection&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; Agent critiques its own output and revises &lt;/td&gt;&lt;td&gt; Code reviewer that re-checks generated code &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Planning&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; Decompose goal into ordered sub-tasks &lt;/td&gt;&lt;td&gt; Research agent building a structured outline &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Tool Use&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; Call external APIs and functions &lt;/td&gt;&lt;td&gt; Weather agent calling a weather API &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Multi-Agent&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; Spawn specialized sub-agents &lt;/td&gt;&lt;td&gt; Orchestrator delegates to coder + tester agents &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;h4&gt;4.4 Multi-Agent Systems (MAS)&lt;/h4&gt;&lt;p&gt;By 2026, the field has moved beyond single-purpose agents to &lt;strong&gt;Multi-Agent Systems&lt;/strong&gt; &amp;#x2014; AI &amp;quot;teams&amp;quot; where specialized agents collaborate to achieve a shared objective, mirroring microservice architecture in traditional software.[12]&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;                   &amp;#x250c;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2510;
                   &amp;#x2502;  Orchestrator   &amp;#x2502;
                   &amp;#x2502;  (Planner LLM)  &amp;#x2502;
                   &amp;#x2514;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x252c;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2518;
            &amp;#x250c;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x253c;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2510;
            &amp;#x25bc;               &amp;#x25bc;               &amp;#x25bc;
    &amp;#x250c;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2510; &amp;#x250c;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2510; &amp;#x250c;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2510;
    &amp;#x2502;  Researcher  &amp;#x2502; &amp;#x2502;   Coder     &amp;#x2502; &amp;#x2502;   Reviewer   &amp;#x2502;
    &amp;#x2502;   Agent      &amp;#x2502; &amp;#x2502;   Agent     &amp;#x2502; &amp;#x2502;   Agent      &amp;#x2502;
    &amp;#x2514;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2518; &amp;#x2514;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2518; &amp;#x2514;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2518;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;hr&gt;&lt;h3&gt;5. RAG: Retrieval-Augmented Generation Deep Dive&lt;/h3&gt;&lt;p&gt;RAG is how you give an LLM access to your private data without retraining it. In 2026, &lt;strong&gt;RAG is default infrastructure&lt;/strong&gt; for enterprise LLM applications.[1]&lt;/p&gt;&lt;h4&gt;5.1 Naive RAG Pipeline&lt;/h4&gt;&lt;p&gt;The vanilla RAG flow is simple:[15]&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;User Query
    &amp;#x2502;
    &amp;#x25bc;
Embed Query (vector)
    &amp;#x2502;
    &amp;#x25bc;
Similarity Search &amp;#x2192; Vector DB &amp;#x2192; Top-K Documents
    &amp;#x2502;
    &amp;#x25bc;
Inject Context into LLM Prompt
    &amp;#x2502;
    &amp;#x25bc;
LLM generates grounded response&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;from openai import OpenAI
import numpy as np

client = OpenAI()


# Step 1: Embed documents at ingestion time
def embed_text(text: str) -&amp;gt; list[float]:
    response = client.embeddings.create(
        model=&amp;quot;text-embedding-3-small&amp;quot;,
        input=text
    )
    return response.data[0].embedding


# Step 2: Simple cosine similarity search
def cosine_similarity(a: list, b: list) -&amp;gt; float:
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def retrieve(query: str, documents: list[dict], top_k: int = 3):
    # Each document dict is expected to carry a precomputed &amp;quot;embedding&amp;quot;
    query_vec = embed_text(query)
    scored = [
        (doc, cosine_similarity(query_vec, doc[&amp;quot;embedding&amp;quot;]))
        for doc in documents
    ]
    return sorted(scored, key=lambda x: x[1], reverse=True)[:top_k]


# Step 3: Generate grounded answer
def rag_answer(query: str, docs: list[str]) -&amp;gt; str:
    context = &amp;quot;\n\n&amp;quot;.join(docs)
    prompt = f&amp;quot;&amp;quot;&amp;quot;Answer the question using ONLY the context below.
If the answer isn&amp;apos;t in the context, say &amp;quot;I don&amp;apos;t know.&amp;quot;

Context:
{context}

Question: {query}&amp;quot;&amp;quot;&amp;quot;
    response = client.chat.completions.create(
        model=&amp;quot;gpt-4o-mini&amp;quot;,
        messages=[{&amp;quot;role&amp;quot;: &amp;quot;user&amp;quot;, &amp;quot;content&amp;quot;: prompt}]
    )
    return response.choices[0].message.content&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;h4&gt;5.2 Advanced RAG Techniques&lt;/h4&gt;&lt;p&gt;Naive RAG breaks at scale.
Production systems use:[16]&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th scope=&quot;col&quot;&gt; Technique &lt;/th&gt;&lt;th scope=&quot;col&quot;&gt; What It Does &lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Chunking strategies&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; Semantic splitting preserves context better than fixed-size chunks &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Hybrid search&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; BM25 (keyword) + vector search for better recall &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Reranking&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; Cross-encoder reranks top-K results for precision &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;HyDE&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; Generate a hypothetical answer, embed it, then search &amp;#x2014; better for abstract queries &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MMR (Maximal Marginal Relevance)&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; Diversify retrieved documents to avoid redundancy &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Multi-vector retrieval&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; Store summary + detailed chunks separately &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;h4&gt;5.3 Agentic RAG&lt;/h4&gt;&lt;p&gt;&lt;strong&gt;Agentic RAG&lt;/strong&gt; supercharges RAG by adding an agent layer that can iterate, re-retrieve, and validate before answering:[15]&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;import json


class AgenticRAGPipeline:
    &amp;quot;&amp;quot;&amp;quot;
    Agentic RAG: Agent decides WHEN and WHAT to retrieve,
    can iterate multiple retrieval rounds, and validates output.
    &amp;quot;&amp;quot;&amp;quot;

    def __init__(self, retriever, llm_client):
        # retriever.search(query, top_k) is assumed to return a list of strings
        self.retriever = retriever
        self.client = llm_client
        self.retrieved_docs = []

    def should_retrieve_more(self, current_answer: str, query: str) -&amp;gt; tuple[bool, str]:
        &amp;quot;&amp;quot;&amp;quot;Ask the LLM if it needs more context. Returns (needs_more, missing).&amp;quot;&amp;quot;&amp;quot;
        check_prompt = f&amp;quot;&amp;quot;&amp;quot;Query: {query}
Current draft answer: {current_answer}
Is this answer complete and well-supported? Reply with JSON: {{&amp;quot;complete&amp;quot;: true/false, &amp;quot;missing&amp;quot;: &amp;quot;what&amp;apos;s missing&amp;quot;}}&amp;quot;&amp;quot;&amp;quot;
        response = self.client.chat.completions.create(
            model=&amp;quot;gpt-4o-mini&amp;quot;,
            messages=[{&amp;quot;role&amp;quot;: &amp;quot;user&amp;quot;, &amp;quot;content&amp;quot;: check_prompt}],
            response_format={&amp;quot;type&amp;quot;: &amp;quot;json_object&amp;quot;}
        )
        result = json.loads(response.choices[0].message.content)
        return not result.get(&amp;quot;complete&amp;quot;, False), result.get(&amp;quot;missing&amp;quot;, &amp;quot;&amp;quot;)

    def run(self, query: str, max_iterations: int = 3) -&amp;gt; str:
        answer = &amp;quot;&amp;quot;
        missing = &amp;quot;&amp;quot;
        self.retrieved_docs = []  # start each run with a fresh context
        for i in range(max_iterations):
            # Retrieve relevant docs; later rounds target what the check flagged as missing
            search_query = query if i == 0 else f&amp;quot;{query} - focusing on: {missing}&amp;quot;
            new_docs = self.retriever.search(search_query, top_k=5)
            self.retrieved_docs.extend(new_docs)
            # Generate answer with all accumulated context
            context = &amp;quot;\n---\n&amp;quot;.join(self.retrieved_docs)
            answer = self._generate(query, context)
            # Check if we need more info
            needs_more, missing = self.should_retrieve_more(answer, query)
            if not needs_more:
                break
        return answer

    def _generate(self, query: str, context: str) -&amp;gt; str:
        response = self.client.chat.completions.create(
            model=&amp;quot;gpt-4o&amp;quot;,
            messages=[{
                &amp;quot;role&amp;quot;: &amp;quot;user&amp;quot;,
                &amp;quot;content&amp;quot;: f&amp;quot;Context:\n{context}\n\nAnswer: {query}&amp;quot;
            }]
        )
        return response.choices[0].message.content&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;hr&gt;&lt;h3&gt;6. MCP: The USB-C for AI Tools&lt;/h3&gt;&lt;p&gt;&lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt; is an open standard introduced by Anthropic in November 2024 to standardize how AI systems integrate with external tools, data sources, and services. Think of it as the USB-C port for AI &amp;#x2014; one standard connector for everything.[17]&lt;/p&gt;&lt;h4&gt;6.1 Why MCP Matters&lt;/h4&gt;&lt;p&gt;Before MCP, every AI-tool integration was a custom one-off. MCP provides:[18]&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;strong&gt;Tools&lt;/strong&gt; &amp;#x2014; Functions the AI can call (e.g., &lt;code&gt;run_sql&lt;/code&gt;, &lt;code&gt;read_file&lt;/code&gt;)  &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Resources&lt;/strong&gt; &amp;#x2014; Data the AI can read (files, database records, API responses)  &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Prompts&lt;/strong&gt; &amp;#x2014; Reusable prompt templates  &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Sampling&lt;/strong&gt; &amp;#x2014; The server can ask the client to run an LLM query  &lt;/li&gt;&lt;/ul&gt;&lt;div&gt;&lt;div&gt;&lt;pre
data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;&amp;#x250c;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2510;&amp;#x2502;                  MCP Architecture                &amp;#x2502;&amp;#x2502;                                                  &amp;#x2502;&amp;#x2502;  &amp;#x250c;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2510;      MCP Protocol              &amp;#x2502;&amp;#x2502;  &amp;#x2502;  AI Client   &amp;#x2502;&amp;#x25c4;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x25ba;&amp;#x2510;           &amp;#x2502;&amp;#x2502;  &amp;#x2502; (Claude/GPT) &amp;#x2502;                    &amp;#x2502;           &amp;#x2502;&amp;#x2502;  &amp;#x2514;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2518;         &amp;#x250c;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2534;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2510;  
&amp;#x2502;&amp;#x2502;                            &amp;#x2502;    MCP Server     &amp;#x2502;  &amp;#x2502;&amp;#x2502;                            &amp;#x2502;  (your tools)     &amp;#x2502;  &amp;#x2502;&amp;#x2502;                            &amp;#x2502;                   &amp;#x2502;  &amp;#x2502;&amp;#x2502;                            &amp;#x2502;  &amp;#x250c;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2510;  &amp;#x2502;  &amp;#x2502;&amp;#x2502;                            &amp;#x2502;  &amp;#x2502;  Tools      &amp;#x2502;  &amp;#x2502;  &amp;#x2502;&amp;#x2502;                            &amp;#x2502;  &amp;#x2502;  Resources  &amp;#x2502;  &amp;#x2502;  &amp;#x2502;&amp;#x2502;                            &amp;#x2502;  &amp;#x2502;  Prompts    &amp;#x2502;  &amp;#x2502;  &amp;#x2502;&amp;#x2502;                            &amp;#x2502;  &amp;#x2514;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2518;  &amp;#x2502;  &amp;#x2502;&amp;#x2502;                            &amp;#x2514;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2518;  
&amp;#x2502;&amp;#x2514;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2518;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;h4&gt;6.2 Building an MCP Server&lt;/h4&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;# Minimal MCP server using the official Python SDKfrom mcp.server import Serverfrom mcp.server.stdio import stdio_serverfrom mcp.types import Tool, TextContentimport mcp.types as typesapp = Server(&amp;quot;my-mcp-server&amp;quot;)@app.list_tools()async def list_tools() -&amp;gt; list[Tool]:    return [        Tool(            name=&amp;quot;get_weather&amp;quot;,            description=&amp;quot;Get the current weather for a city&amp;quot;,            inputSchema={                &amp;quot;type&amp;quot;: &amp;quot;object&amp;quot;,                &amp;quot;properties&amp;quot;: {                    &amp;quot;city&amp;quot;: {                        &amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;,                        &amp;quot;description&amp;quot;: &amp;quot;City name&amp;quot;                    }                },                &amp;quot;required&amp;quot;: [&amp;quot;city&amp;quot;]            }        ),        Tool(            name=&amp;quot;run_sql&amp;quot;,            description=&amp;quot;Execute a read-only SQL query against our DB&amp;quot;,            inputSchema={                &amp;quot;type&amp;quot;: 
&amp;quot;object&amp;quot;,                &amp;quot;properties&amp;quot;: {                    &amp;quot;query&amp;quot;: {&amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;}                },                &amp;quot;required&amp;quot;: [&amp;quot;query&amp;quot;]            }        )    ]@app.call_tool()async def call_tool(name: str, arguments: dict) -&amp;gt; list[TextContent]:    if name == &amp;quot;get_weather&amp;quot;:        city = arguments[&amp;quot;city&amp;quot;]        # Call your actual weather API here        return [TextContent(type=&amp;quot;text&amp;quot;, text=f&amp;quot;Weather in {city}: 28&amp;#xb0;C, sunny&amp;quot;)]    elif name == &amp;quot;run_sql&amp;quot;:        query = arguments[&amp;quot;query&amp;quot;]        # Execute query safely        results = execute_readonly_query(query)        return [TextContent(type=&amp;quot;text&amp;quot;, text=str(results))]async def main():    async with stdio_server() as (read_stream, write_stream):        await app.run(read_stream, write_stream, app.create_initialization_options())if __name__ == &amp;quot;__main__&amp;quot;:    import asyncio    asyncio.run(main())&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;h4&gt;6.3 MCP vs Traditional APIs&lt;/h4&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th scope=&quot;col&quot;&gt; Dimension &lt;/th&gt;&lt;th scope=&quot;col&quot;&gt; REST API &lt;/th&gt;&lt;th scope=&quot;col&quot;&gt; MCP Server &lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; Discovery &lt;/td&gt;&lt;td&gt; Manual (read docs) &lt;/td&gt;&lt;td&gt; Auto-discovery via &lt;code&gt;list_tools()&lt;/code&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Schema &lt;/td&gt;&lt;td&gt; OpenAPI/Swagger &lt;/td&gt;&lt;td&gt; JSON Schema, AI-readable &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Context sharing &lt;/td&gt;&lt;td&gt; Per-request &lt;/td&gt;&lt;td&gt; Stateful sessions with context &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; AI 
integration &lt;/td&gt;&lt;td&gt; Custom glue code &lt;/td&gt;&lt;td&gt; Native, standardized &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Multi-tool &lt;/td&gt;&lt;td&gt; N integrations &lt;/td&gt;&lt;td&gt; One MCP layer &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;hr&gt;&lt;h3&gt;7. A2A: The Protocol for Agent Collaboration&lt;/h3&gt;&lt;p&gt;While MCP handles &lt;strong&gt;agent-to-tool&lt;/strong&gt; communication, &lt;strong&gt;A2A (Agent2Agent)&lt;/strong&gt; &amp;#x2014; announced by Google in April 2025 &amp;#x2014; handles &lt;strong&gt;agent-to-agent&lt;/strong&gt; communication.[19]&lt;/p&gt;&lt;h4&gt;7.1 The Problem A2A Solves&lt;/h4&gt;&lt;p&gt;Imagine a travel booking agent that needs to coordinate with a payment agent from a different company, a hotel API agent, and an airline agent &amp;#x2014; all built on different frameworks. A2A enables them to discover each other, understand capabilities, and coordinate tasks without sharing internals.[19]&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;MCP Layer (Vertical):          A2A Layer (Horizontal):
Agent &amp;#x25c4;&amp;#x2500;&amp;#x2500;tools&amp;#x2500;&amp;#x2500;&amp;#x25ba; APIs         Agent &amp;#x25c4;&amp;#x2500;&amp;#x2500;collaborate&amp;#x2500;&amp;#x2500;&amp;#x25ba; Agent
Agent &amp;#x25c4;&amp;#x2500;&amp;#x2500;tools&amp;#x2500;&amp;#x2500;&amp;#x25ba; Databases    Agent &amp;#x25c4;&amp;#x2500;&amp;#x2500;delegate&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x25ba; Agent
Agent &amp;#x25c4;&amp;#x2500;&amp;#x2500;tools&amp;#x2500;&amp;#x2500;&amp;#x25ba; Files        Agent &amp;#x25c4;&amp;#x2500;&amp;#x2500;coordinate&amp;#x2500;&amp;#x2500;&amp;#x25ba; Agent&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;h4&gt;7.2 Agent Cards: The Discovery Mechanism&lt;/h4&gt;&lt;p&gt;Every A2A agent publishes an &lt;strong&gt;Agent Card&lt;/strong&gt; &amp;#x2014; a JSON document at a 
well-known URL that describes the agent&amp;apos;s capabilities:[20]&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;json&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;{
  &amp;quot;name&amp;quot;: &amp;quot;payment-processor-agent&amp;quot;,
  &amp;quot;version&amp;quot;: &amp;quot;1.2.0&amp;quot;,
  &amp;quot;description&amp;quot;: &amp;quot;Handles payment processing for e-commerce workflows&amp;quot;,
  &amp;quot;url&amp;quot;: &amp;quot;https://payments.example.com/a2a&amp;quot;,
  &amp;quot;skills&amp;quot;: [
    {
      &amp;quot;id&amp;quot;: &amp;quot;process_payment&amp;quot;,
      &amp;quot;name&amp;quot;: &amp;quot;Process Payment&amp;quot;,
      &amp;quot;description&amp;quot;: &amp;quot;Charge a customer for a transaction&amp;quot;,
      &amp;quot;inputModes&amp;quot;: [&amp;quot;text&amp;quot;, &amp;quot;json&amp;quot;],
      &amp;quot;outputModes&amp;quot;: [&amp;quot;json&amp;quot;]
    },
    {
      &amp;quot;id&amp;quot;: &amp;quot;refund&amp;quot;,
      &amp;quot;name&amp;quot;: &amp;quot;Issue Refund&amp;quot;,
      &amp;quot;description&amp;quot;: &amp;quot;Refund a previously processed payment&amp;quot;
    }
  ],
  &amp;quot;authentication&amp;quot;: {
    &amp;quot;schemes&amp;quot;: [&amp;quot;Bearer&amp;quot;]
  }
}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;h4&gt;7.3 MCP + A2A: The Full Stack&lt;/h4&gt;&lt;p&gt;Google positioned A2A as &lt;strong&gt;complementary to MCP&lt;/strong&gt;, not competitive:[20]&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;pre 
data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;&amp;#x250c;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2510;&amp;#x2502;              Enterprise AI Architecture           &amp;#x2502;&amp;#x2502;                                                   &amp;#x2502;&amp;#x2502;    &amp;#x250c;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2510;   A2A   &amp;#x250c;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2510;               &amp;#x2502;&amp;#x2502;    &amp;#x2502; Agent A &amp;#x2502;&amp;#x25c4;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x25ba;&amp;#x2502; Agent B &amp;#x2502;               &amp;#x2502;&amp;#x2502;    &amp;#x2514;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x252c;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2518;         &amp;#x2514;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x252c;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2518;               &amp;#x2502;&amp;#x2502;         &amp;#x2502; MCP               &amp;#x2502; MCP                 &amp;#x2502;&amp;#x2502;    &amp;#x250c;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2534;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2510;         
&amp;#x250c;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2534;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2510;               &amp;#x2502;&amp;#x2502;    &amp;#x2502; Tools &amp;amp; &amp;#x2502;         &amp;#x2502; Tools &amp;amp; &amp;#x2502;               &amp;#x2502;&amp;#x2502;    &amp;#x2502;  Data   &amp;#x2502;         &amp;#x2502;  Data   &amp;#x2502;               &amp;#x2502;&amp;#x2502;    &amp;#x2514;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2518;         &amp;#x2514;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2518;               &amp;#x2502;&amp;#x2502;                                                   &amp;#x2502;&amp;#x2502;  MCP = vertical (agent &amp;#x2194; tools)                  &amp;#x2502;&amp;#x2502;  A2A = horizontal (agent &amp;#x2194; agent)                &amp;#x2502;&amp;#x2514;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2518;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;hr&gt;&lt;h3&gt;8. AI Agent Frameworks Compared&lt;/h3&gt;&lt;p&gt;Choosing the wrong framework costs weeks. 
Here&amp;apos;s the production-tested ranking for 2026:[21]&lt;/p&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th scope=&quot;col&quot;&gt; Framework &lt;/th&gt;&lt;th scope=&quot;col&quot;&gt; Best For &lt;/th&gt;&lt;th scope=&quot;col&quot;&gt; Architecture Style &lt;/th&gt;&lt;th scope=&quot;col&quot;&gt; MCP/A2A Support &lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;LangGraph&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; Complex stateful production workflows &lt;/td&gt;&lt;td&gt; Graph-based, explicit state machines &lt;/td&gt;&lt;td&gt; &amp;#x2705; MCP &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Claude Agent SDK&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; Anthropic-native production agents &lt;/td&gt;&lt;td&gt; Native Claude hooks + subagents &lt;/td&gt;&lt;td&gt; &amp;#x2705; MCP native &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;CrewAI&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; Fast multi-agent prototypes &lt;/td&gt;&lt;td&gt; Role-based crews &lt;/td&gt;&lt;td&gt; &amp;#x2705; MCP &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;AutoGen / AG2&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; Research-style conversational agents &lt;/td&gt;&lt;td&gt; Conversational multi-agent &lt;/td&gt;&lt;td&gt; &amp;#x2705; MCP &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Semantic Kernel&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; Enterprise / .NET stacks &lt;/td&gt;&lt;td&gt; Plugin-based, Azure-first &lt;/td&gt;&lt;td&gt; &amp;#x2705; &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;LlamaIndex&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; RAG-grounded agents &lt;/td&gt;&lt;td&gt; Data-layer first &lt;/td&gt;&lt;td&gt; &amp;#x2705; &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Pydantic AI&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; Type-safe Python agents &lt;/td&gt;&lt;td&gt; Pydantic validation throughout &lt;/td&gt;&lt;td&gt; &amp;#x2705; &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;h4&gt;8.1 LangGraph: 
Production-Ready State Machines&lt;/h4&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;from langgraph.graph import StateGraph, END
from typing import TypedDict, Annotated
import operator

# NOTE: vector_store and llm are placeholders; create them before running this sketch.

class AgentState(TypedDict):
    messages: Annotated[list, operator.add]
    query: str
    documents: list[str]
    answer: str
    needs_more_info: bool

def retrieve_node(state: AgentState) -&amp;gt; dict:
    &amp;quot;&amp;quot;&amp;quot;Retrieve relevant documents.&amp;quot;&amp;quot;&amp;quot;
    docs = vector_store.search(state[&amp;quot;query&amp;quot;], k=5)
    # Nodes return only the state keys they update
    return {&amp;quot;documents&amp;quot;: docs}

def generate_node(state: AgentState) -&amp;gt; dict:
    &amp;quot;&amp;quot;&amp;quot;Generate answer from retrieved docs.&amp;quot;&amp;quot;&amp;quot;
    context = &amp;quot;\n&amp;quot;.join(state[&amp;quot;documents&amp;quot;])
    # Chat models return a message object; use its .content if you need plain text
    answer = llm.invoke(f&amp;quot;Context: {context}\nQuestion: {state[&amp;apos;query&amp;apos;]}&amp;quot;)
    return {&amp;quot;answer&amp;quot;: answer, &amp;quot;needs_more_info&amp;quot;: False}

def check_node(state: AgentState) -&amp;gt; str:
    &amp;quot;&amp;quot;&amp;quot;Route: done or need more retrieval?&amp;quot;&amp;quot;&amp;quot;
    return &amp;quot;done&amp;quot; if not state[&amp;quot;needs_more_info&amp;quot;] else &amp;quot;retrieve&amp;quot;

# Build the graph
workflow = StateGraph(AgentState)
workflow.add_node(&amp;quot;retrieve&amp;quot;, retrieve_node)
workflow.add_node(&amp;quot;generate&amp;quot;, generate_node)
workflow.set_entry_point(&amp;quot;retrieve&amp;quot;)
workflow.add_edge(&amp;quot;retrieve&amp;quot;, &amp;quot;generate&amp;quot;)
workflow.add_conditional_edges(&amp;quot;generate&amp;quot;, check_node, {
    &amp;quot;done&amp;quot;: END,
    &amp;quot;retrieve&amp;quot;: &amp;quot;retrieve&amp;quot;  # Loop back if needed
})
app = workflow.compile()

# Run it
result = app.invoke({&amp;quot;query&amp;quot;: &amp;quot;What is the capital of Punjab?&amp;quot;, &amp;quot;messages&amp;quot;: []})
print(result[&amp;quot;answer&amp;quot;])&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;h4&gt;8.2 CrewAI: Role-Based Multi-Agent Teams&lt;/h4&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;from crewai import Agent, Task, Crew, Process

# NOTE: web_search_tool and wikipedia_tool are placeholders for tools you define yourself.

# Define specialized agents
researcher = Agent(
    role=&amp;quot;Senior Research Analyst&amp;quot;,
    goal=&amp;quot;Find accurate, up-to-date information on the given topic&amp;quot;,
    backstory=&amp;quot;Expert researcher with access to web search and databases&amp;quot;,
    verbose=True,
    allow_delegation=False,
    tools=[web_search_tool, wikipedia_tool]
)

writer = Agent(
    role=&amp;quot;Technical Writer&amp;quot;,
    goal=&amp;quot;Write clear, engaging blog posts from research findings&amp;quot;,
    backstory=&amp;quot;Experienced tech blogger who makes complex topics accessible&amp;quot;,
    verbose=True,
    allow_delegation=False
)

# Define tasks
research_task = Task(
    description=&amp;quot;Research the latest developments in {topic}. &amp;quot;
                &amp;quot;Find key facts, statistics, and expert opinions.&amp;quot;,
    expected_output=&amp;quot;A structured research brief with citations&amp;quot;,
    agent=researcher
)

writing_task = Task(
    description=&amp;quot;Write a 1000-word blog post based on the research brief. &amp;quot;
                &amp;quot;Make it engaging for a technical audience.&amp;quot;,
    expected_output=&amp;quot;A complete, publication-ready blog post in Markdown&amp;quot;,
    agent=writer,
    context=[research_task]  # Uses output from research_task
)

# Assemble and run the crew
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    process=Process.sequential,
    verbose=True  # Recent CrewAI releases expect a bool here, not a verbosity level
)

result = crew.kickoff(inputs={&amp;quot;topic&amp;quot;: &amp;quot;MCP protocol for AI agents&amp;quot;})
print(result)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;hr&gt;&lt;h3&gt;9. Prompt Engineering Playbook&lt;/h3&gt;&lt;p&gt;Prompt engineering is the art of communicating precisely with LLMs. Here are the techniques every practitioner needs:[22]&lt;/p&gt;&lt;h4&gt;9.1 Core Techniques&lt;/h4&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;# &amp;#x2500;&amp;#x2500;&amp;#x2500; Zero-Shot &amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;
# No examples. Rely on the model&amp;apos;s training.
zero_shot = &amp;quot;Classify the sentiment of this review: &amp;apos;The app crashes constantly.&amp;apos;&amp;quot;

# &amp;#x2500;&amp;#x2500;&amp;#x2500; Few-Shot &amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;
# Provide examples to guide the model
few_shot = &amp;quot;&amp;quot;&amp;quot;Classify sentiment. Examples:
Input: &amp;quot;Love this product!&amp;quot; &amp;#x2192; Positive
Input: &amp;quot;Terrible experience.&amp;quot; &amp;#x2192; Negative
Input: &amp;quot;It&amp;apos;s okay, nothing special.&amp;quot; &amp;#x2192; Neutral
Now classify: &amp;quot;The battery life is surprisingly good.&amp;quot;
&amp;quot;&amp;quot;&amp;quot;

# &amp;#x2500;&amp;#x2500;&amp;#x2500; Chain-of-Thought (CoT) &amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;
# Force step-by-step reasoning for complex tasks
cot = &amp;quot;&amp;quot;&amp;quot;Solve this step by step:
A store sells apples for Rs. 50 each. If Ali buys 12 apples with Rs. 700, how much change does he get?
Think through it step by step before giving the final answer.&amp;quot;&amp;quot;&amp;quot;

# &amp;#x2500;&amp;#x2500;&amp;#x2500; ReAct Pattern &amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;
# Alternate reasoning and action
react_system = &amp;quot;&amp;quot;&amp;quot;You are an agent. For each task:
1. Thought: reason about what to do next
2. Action: choose a tool [search | calculate | respond]
3. Observation: note what the tool returned
4. Repeat until you have the final answer.&amp;quot;&amp;quot;&amp;quot;

# &amp;#x2500;&amp;#x2500;&amp;#x2500; Self-Consistency &amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;
# Sample multiple reasoning paths, pick the majority answer
from collections import Counter

def self_consistent_answer(question: str, client, n_samples: int = 5) -&amp;gt; str:
    answers = []
    for _ in range(n_samples):
        response = client.chat.completions.create(
            model=&amp;quot;gpt-4o-mini&amp;quot;,
            messages=[{
                &amp;quot;role&amp;quot;: &amp;quot;user&amp;quot;,
                &amp;quot;content&amp;quot;: f&amp;quot;{question}\nThink step by step.&amp;quot;
            }],
            temperature=0.7  # Some variability for diversity
        )
        # Treat the last line of the response as the final answer
        text = response.choices[0].message.content
        answers.append(text.strip().split(&amp;quot;\n&amp;quot;)[-1])
    # Return most common answer
    return Counter(answers).most_common(1)[0][0]&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;h4&gt;9.2 System Prompt Architecture&lt;/h4&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;SYSTEM_PROMPT_TEMPLATE = &amp;quot;&amp;quot;&amp;quot;
## Role
You are {role_name}, a {expertise_level} specialist in {domain}.

## Objective
{primary_objective}

## Constraints
- Always cite sources when making factual claims
- If uncertain, say &amp;quot;I&amp;apos;m not sure&amp;quot; rather than guessing
- Keep responses under {max_length} words unless asked for detail
- Output format: {output_format}

## Context
Today&amp;apos;s date: {date}
User&amp;apos;s technical level: {user_level}

## Examples
{few_shot_examples}
&amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;hr&gt;&lt;h3&gt;10. Vector Databases: The Memory Layer&lt;/h3&gt;&lt;p&gt;Vector databases store embeddings &amp;#x2014; dense numerical representations of meaning &amp;#x2014; enabling &lt;strong&gt;semantic search&lt;/strong&gt; (search by meaning, not keywords).[23]&lt;/p&gt;&lt;h4&gt;10.1 How Embeddings Work&lt;/h4&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;from openai import OpenAI
import numpy as np

client = OpenAI()

def get_embedding(text: str) -&amp;gt; list[float]:
    return client.embeddings.create(
        model=&amp;quot;text-embedding-3-small&amp;quot;,
        input=text
    ).data[0].embedding

# Semantic similarity demo
sentences = [
    &amp;quot;Karachi is the largest city in Pakistan&amp;quot;,
    &amp;quot;The metropolitan area of Karachi has 16 million people&amp;quot;,
    &amp;quot;I like to eat biryani&amp;quot;,
    &amp;quot;Python is a programming language&amp;quot;
]
embeddings = [get_embedding(s) for s in sentences]

def cosine_sim(a, b):
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query_emb = get_embedding(&amp;quot;What is the population of Karachi?&amp;quot;)
for s, emb in zip(sentences, embeddings):
    score = cosine_sim(query_emb, emb)
    print(f&amp;quot;{score:.3f} | {s}&amp;quot;)

# Example output (illustrative; exact scores vary by model):
# 0.812 | Karachi is the largest city in Pakistan  &amp;#x2190; high
# 0.798 | The metropolitan area of Karachi...      &amp;#x2190; high
# 0.312 | I like to eat biryani                    &amp;#x2190; low
# 0.289 | Python is a programming language         &amp;#x2190; low&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;h4&gt;10.2 Vector Database Comparison&lt;/h4&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th scope=&quot;col&quot;&gt; DB &lt;/th&gt;&lt;th scope=&quot;col&quot;&gt; Best For &lt;/th&gt;&lt;th scope=&quot;col&quot;&gt; Hosting &lt;/th&gt;&lt;th scope=&quot;col&quot;&gt; Hybrid Search &lt;/th&gt;&lt;th scope=&quot;col&quot;&gt; Notes &lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Pinecone&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; Managed, production scale &lt;/td&gt;&lt;td&gt; Cloud-only &lt;/td&gt;&lt;td&gt; &amp;#x2705; &lt;/td&gt;&lt;td&gt; Easiest setup &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Weaviate&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; GraphQL + semantic queries &lt;/td&gt;&lt;td&gt; Self/Cloud &lt;/td&gt;&lt;td&gt; &amp;#x2705; &lt;/td&gt;&lt;td&gt; MCP support in v3.0 &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Qdrant&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; High-performance, Rust core &lt;/td&gt;&lt;td&gt; Self/Cloud &lt;/td&gt;&lt;td&gt; &amp;#x2705; &lt;/td&gt;&lt;td&gt; Best perf/&amp;#x24; &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Chroma&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; Local dev &amp;amp; testing &lt;/td&gt;&lt;td&gt; Self-host &lt;/td&gt;&lt;td&gt; Limited &lt;/td&gt;&lt;td&gt; Dead-simple Python API &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Milvus&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; Massive scale (billions) &lt;/td&gt;&lt;td&gt; Self/Cloud &lt;/td&gt;&lt;td&gt; &amp;#x2705; &lt;/td&gt;&lt;td&gt; GPU-accelerated &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;pgvector&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; Already using Postgres &lt;/td&gt;&lt;td&gt; Self-host &lt;/td&gt;&lt;td&gt; &amp;#x2705; &lt;/td&gt;&lt;td&gt; No new infra 
needed &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;# Quick start: Chroma (local, perfect for prototyping)
import chromadb
from chromadb.utils import embedding_functions

client = chromadb.Client()
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key=&amp;quot;YOUR_KEY&amp;quot;,
    model_name=&amp;quot;text-embedding-3-small&amp;quot;
)
collection = client.create_collection(
    name=&amp;quot;knowledge_base&amp;quot;,
    embedding_function=openai_ef
)

# Add documents
collection.add(
    documents=[
        &amp;quot;MCP is an open standard by Anthropic for AI tool integration&amp;quot;,
        &amp;quot;RAG stands for Retrieval-Augmented Generation&amp;quot;,
        &amp;quot;LangGraph is a framework for building stateful agent workflows&amp;quot;
    ],
    ids=[&amp;quot;doc1&amp;quot;, &amp;quot;doc2&amp;quot;, &amp;quot;doc3&amp;quot;]
)

# Query
results = collection.query(
    query_texts=[&amp;quot;How do AI agents connect to external tools?&amp;quot;],
    n_results=2
)
print(results[&amp;quot;documents&amp;quot;])
# [[&amp;apos;MCP is an open standard by Anthropic...&amp;apos;,
#   &amp;apos;LangGraph is a framework...&amp;apos;]]&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;hr&gt;&lt;h3&gt;11. 
The Complete AI Dictionary&lt;/h3&gt;&lt;p&gt;A reference for the terms you&amp;apos;ll encounter most often, from beginner-level to deeply technical.&lt;/p&gt;&lt;hr&gt;&lt;div&gt;&lt;div&gt;&lt;span&gt;&lt;h4&gt;&amp;#x1f524; Foundational Concepts&lt;/h4&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;p&gt;&lt;strong&gt;Artificial Intelligence (AI)&lt;/strong&gt; The broad field of building systems that perform tasks that typically require human intelligence &amp;#x2014; reasoning, learning, perception, language understanding.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Machine Learning (ML)&lt;/strong&gt; A subfield of AI where systems learn from data rather than being explicitly programmed. The model improves with experience.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Deep Learning (DL)&lt;/strong&gt; ML using neural networks with many layers (&amp;quot;deep&amp;quot;). Powers all modern LLMs, image models, and speech systems.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Neural Network&lt;/strong&gt; A computational model loosely inspired by biological neurons. Consists of layers of mathematical functions that transform inputs into outputs.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Parameters / Weights&lt;/strong&gt; The learnable numerical values in a neural network. GPT-3 has 175 billion parameters. More parameters do not always make a better model, though capability generally grows with scale.[3]&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Training&lt;/strong&gt; The process of adjusting a model&amp;apos;s parameters on a large dataset to minimize prediction error. Requires massive compute (GPU clusters).&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Inference&lt;/strong&gt; Running a trained model to generate outputs. What happens when you type a prompt into ChatGPT.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Token&lt;/strong&gt; The basic unit of text for LLMs. A token is roughly 0.75 words in English. &amp;quot;Hello, world!&amp;quot; = 4 tokens. 
LLMs process and generate text as token sequences.[16]&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Context Window&lt;/strong&gt; The maximum number of tokens an LLM can &amp;quot;see&amp;quot; at once. GPT-4 Turbo supports 128K tokens; Llama 4 Scout advertises up to 10 million tokens. A larger window fits more context but costs more per request.[4]&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Embedding&lt;/strong&gt; A dense numerical vector (array of floats) representing the semantic meaning of text, images, or other data. Similar meanings cluster together in embedding space.[23]&lt;/p&gt;&lt;hr&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span&gt;&lt;h4&gt;&amp;#x1f9e0; LLM Architecture Terms&lt;/h4&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;p&gt;&lt;strong&gt;Transformer&lt;/strong&gt; The neural network architecture underlying all major LLMs, introduced in 2017. Key innovation: the attention mechanism replaces sequential processing with parallel processing.[7]&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Attention Mechanism&lt;/strong&gt; The core innovation of transformers. Lets each token attend to (learn from) every other token in context, regardless of distance. It is computed from query, key, and value (Q/K/V) matrices.[10]&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Self-Attention&lt;/strong&gt; Attention where the query, key, and value all come from the same sequence. Enables a model to understand words in context of each other.[10]&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Multi-Head Attention&lt;/strong&gt; Running multiple attention operations in parallel, each learning different relationships. 
GPT-3 uses 96 attention heads.[7]&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Positional Encoding&lt;/strong&gt; A mechanism to inject token position information into embeddings, since attention is position-agnostic by default.[11]&lt;/p&gt;&lt;p&gt;&lt;strong&gt;RoPE (Rotary Position Embeddings)&lt;/strong&gt; A modern positional encoding scheme that encodes position through rotation matrices, enabling better generalization to longer contexts than the original model was trained on.[11]&lt;/p&gt;&lt;p&gt;&lt;strong&gt;MoE (Mixture of Experts)&lt;/strong&gt; Architecture where only a subset of model parameters (&amp;quot;experts&amp;quot;) activate per token, enabling models to have far more total parameters at similar inference cost.[11]&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Encoder / Decoder&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;strong&gt;Encoder-only&lt;/strong&gt; (e.g., BERT): Builds rich representations; best for classification, NER  &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Decoder-only&lt;/strong&gt; (e.g., GPT): Generates text autoregressively; best for generation tasks  &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Encoder-Decoder&lt;/strong&gt; (e.g., T5): Good for translation and summarization  &lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Autoregressive Generation&lt;/strong&gt; How LLMs generate text: one token at a time, each new token conditioned on all previous tokens. This is why they can&amp;apos;t &amp;quot;edit&amp;quot; &amp;#x2014; they always predict left-to-right.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Temperature&lt;/strong&gt; Controls randomness in generation. Temperature=0: always pick the most likely token (deterministic). Temperature=1: sample proportionally. Temperature&amp;gt;1: more random/creative.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Top-P (Nucleus Sampling)&lt;/strong&gt; Restricts sampling to the smallest set of tokens whose cumulative probability exceeds P. 
More robust than Temperature alone for controlling output quality.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;KV Cache&lt;/strong&gt; Stores computed key and value matrices for previously processed tokens so they don&amp;apos;t need to be recomputed during autoregressive generation. Critical for inference efficiency.&lt;/p&gt;&lt;hr&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span&gt;&lt;h4&gt;&amp;#x1f3cb;&amp;#xfe0f; Training &amp;amp; Alignment&lt;/h4&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;p&gt;&lt;strong&gt;Pre-training&lt;/strong&gt; Initial training of an LLM on massive unlabeled text data (web, books, code). Learns statistical patterns of language. Requires enormous compute.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Fine-tuning&lt;/strong&gt; Further training on a smaller, task-specific dataset to specialize a pre-trained model. Cheaper than pre-training.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;SFT (Supervised Fine-Tuning)&lt;/strong&gt; Fine-tuning on human-curated input-output pairs. &amp;quot;Given this input, produce this output.&amp;quot;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;RLHF (Reinforcement Learning from Human Feedback)&lt;/strong&gt; The technique that transformed GPT-3 into ChatGPT. Human evaluators rate outputs; a reward model is trained on those ratings; the LLM is fine-tuned to maximize the reward. Dramatically improves alignment and reduces harmful outputs.[24]&lt;/p&gt;&lt;p&gt;&lt;strong&gt;PEFT (Parameter-Efficient Fine-Tuning)&lt;/strong&gt; Fine-tuning techniques that update only a small fraction of parameters (e.g., LoRA, QLoRA), making fine-tuning feasible on consumer hardware.[1]&lt;/p&gt;&lt;p&gt;&lt;strong&gt;LoRA (Low-Rank Adaptation)&lt;/strong&gt; Popular PEFT method that adds small trainable rank-decomposition matrices to existing weight matrices. 
Often comes close to full fine-tuning quality while training only a small fraction of the parameters; the exact gap depends on the task and the chosen rank.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;DPO (Direct Preference Optimization)&lt;/strong&gt; A simpler alternative to RLHF that directly optimizes the model against human preference data without a separate reward model.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Constitutional AI&lt;/strong&gt; Anthropic&amp;apos;s technique for alignment: the model critiques its own responses against a set of principles and revises them.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Hallucination&lt;/strong&gt; When an LLM generates confident, fluent, but factually wrong information. Contributing causes include noisy training data and the autoregressive generation process. Major active research area.[25]&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Alignment&lt;/strong&gt; Ensuring AI systems behave according to human values and intentions. Includes safety, helpfulness, and harmlessness.&lt;/p&gt;&lt;hr&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span&gt;&lt;h4&gt;&amp;#x1f916; Agentic AI Terms&lt;/h4&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;p&gt;&lt;strong&gt;AI Agent&lt;/strong&gt; An AI system that autonomously perceives its environment, makes decisions, executes actions, and pursues goals over multiple steps, rather than just responding to a single prompt.[12]&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Agentic AI&lt;/strong&gt; The paradigm where LLMs act as autonomous agents that plan, use tools, and complete multi-step tasks without constant human direction.[12]&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Tool Use / Function Calling&lt;/strong&gt; The ability of an LLM to request calls to external functions, APIs, and services based on user requests; the host application executes the call and returns the result. Core capability enabling agents to &amp;quot;do things&amp;quot; rather than just &amp;quot;say things.&amp;quot;[26]&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Orchestrator&lt;/strong&gt; The &amp;quot;brain&amp;quot; of a multi-agent system: the component that plans tasks, delegates to sub-agents, and synthesizes results. 
Often a more powerful LLM.[2]&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Sub-agent / Worker Agent&lt;/strong&gt; Specialized agents that execute specific tasks delegated by an orchestrator. Examples: a web-search agent, a code-execution agent, a database agent.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Human-in-the-Loop (HITL)&lt;/strong&gt; A system design where a human can intervene, approve, or redirect an agent at key decision points. Critical for high-stakes workflows.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;ReAct (Reasoning + Acting)&lt;/strong&gt; A foundational prompting/architecture pattern for agents where the model interleaves reasoning (&amp;quot;Thought:&amp;quot;) and actions (&amp;quot;Action:&amp;quot;) in a loop.[13]&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Planning&lt;/strong&gt; The agent&amp;apos;s ability to decompose a goal into an ordered sequence of sub-tasks. Types include: Plan-then-Execute, ReAct (interleaved), and Tree-of-Thought (branching).&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Reflection&lt;/strong&gt; An agentic pattern where the agent reviews and critiques its own previous output to improve it &amp;#x2014; a form of self-correction.[14]&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Memory (Agent Memory)&lt;/strong&gt; How agents retain information:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;strong&gt;In-context&lt;/strong&gt;: Within the current prompt window (ephemeral)  &lt;/li&gt;&lt;li&gt;&lt;strong&gt;External&lt;/strong&gt;: Stored in vector DBs or traditional DBs (persistent)  &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Episodic&lt;/strong&gt;: Records of past interactions  &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Semantic&lt;/strong&gt;: General knowledge/facts  &lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Multi-Agent System (MAS)&lt;/strong&gt; A network of specialized agents collaborating to solve problems that exceed any single agent&amp;apos;s capability.[12]&lt;/p&gt;&lt;hr&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span&gt;&lt;h4&gt;&amp;#x1f4e1; Protocols &amp;amp; 
Infrastructure&lt;/h4&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;p&gt;&lt;strong&gt;MCP (Model Context Protocol)&lt;/strong&gt; Open standard by Anthropic (Nov 2024) for connecting AI agents to external tools and data sources through a standardized, AI-readable interface. The &amp;quot;USB-C for AI.&amp;quot;[17]&lt;/p&gt;&lt;p&gt;&lt;strong&gt;A2A (Agent2Agent Protocol)&lt;/strong&gt; Open protocol by Google (April 2025) enabling AI agents from different vendors to discover, communicate, and collaborate with each other.[19]&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Agent Card&lt;/strong&gt; A JSON document published by an A2A agent that describes its capabilities, endpoint, and authentication requirements &amp;#x2014; enabling other agents to discover it.[20]&lt;/p&gt;&lt;p&gt;&lt;strong&gt;RAG (Retrieval-Augmented Generation)&lt;/strong&gt; Grounding LLM outputs in relevant documents retrieved from a knowledge base at inference time, reducing hallucination and enabling access to private/current data.[15]&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Agentic RAG&lt;/strong&gt; RAG enhanced with agent capabilities &amp;#x2014; the agent can iteratively retrieve, evaluate, and re-retrieve context before generating the final answer.[15]&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Vector Database&lt;/strong&gt; A database optimized for storing and querying high-dimensional vector embeddings via similarity search (ANN algorithms).[23]&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Semantic Search&lt;/strong&gt; Search that finds results by meaning rather than keyword matching, using embedding similarity.[27]&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Hybrid Search&lt;/strong&gt; Combining vector similarity search (semantic) with keyword-based search (BM25) in a single query for better recall and precision.[27]&lt;/p&gt;&lt;p&gt;&lt;strong&gt;HNSW (Hierarchical Navigable Small World)&lt;/strong&gt; A graph-based indexing algorithm used in vector databases for fast approximate nearest neighbor (ANN) 
search.[23]&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Chunking&lt;/strong&gt; Breaking documents into smaller pieces before indexing in RAG systems. Semantic chunking (by meaning) outperforms fixed-size chunking.[16]&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Reranking&lt;/strong&gt; A second-pass step in RAG pipelines where retrieved documents are rescored using a more accurate (but slower) cross-encoder model to improve precision.&lt;/p&gt;&lt;hr&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span&gt;&lt;h4&gt;&amp;#x1f4dd; Prompting &amp;amp; Generation&lt;/h4&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;p&gt;&lt;strong&gt;Prompt Engineering&lt;/strong&gt; The practice of designing input prompts to maximize LLM output quality. A rapidly evolving discipline with significant impact on model performance.[22]&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Zero-Shot Prompting&lt;/strong&gt; Instructing a model to perform a task with no examples &amp;#x2014; relying purely on the model&amp;apos;s pre-trained knowledge.[22]&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Few-Shot Prompting&lt;/strong&gt; Providing a small number of input-output examples in the prompt to guide the model&amp;apos;s behavior.[22]&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Chain-of-Thought (CoT) Prompting&lt;/strong&gt; Instructing the model to show its reasoning step-by-step before giving a final answer. 
Dramatically improves performance on math, logic, and multi-step tasks.[22]&lt;/p&gt;&lt;p&gt;&lt;strong&gt;System Prompt&lt;/strong&gt; A special prompt (hidden from the user) that sets the model&amp;apos;s role, persona, constraints, and behavior for an entire conversation.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Temperature / Sampling&lt;/strong&gt; Parameters controlling the randomness and diversity of LLM outputs.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Prompt Injection&lt;/strong&gt; An attack where malicious content in the environment (e.g., a webpage an agent reads) attempts to override the agent&amp;apos;s instructions.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Jailbreaking&lt;/strong&gt; Attempts to bypass an LLM&amp;apos;s safety guardrails through cleverly crafted prompts.&lt;/p&gt;&lt;hr&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span&gt;&lt;h4&gt;&amp;#x1f4ca; Evaluation &amp;amp; Safety&lt;/h4&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;p&gt;&lt;strong&gt;Benchmark&lt;/strong&gt; A standardized test for measuring model capability. 
Examples: MMLU (knowledge), HumanEval (coding), MATH (mathematics), GPQA (PhD-level science).&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Hallucination&lt;/strong&gt; LLM-generated content that is factually incorrect but stated with confidence.[25]&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Groundedness&lt;/strong&gt; The extent to which an LLM&amp;apos;s outputs are supported by provided context (e.g., retrieved documents in RAG).&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Red-teaming&lt;/strong&gt; Adversarially probing an AI system to find safety vulnerabilities, jailbreaks, and failure modes before deployment.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;AI Safety&lt;/strong&gt; The field studying how to build AI systems that reliably do what humans intend and avoid unintended harmful behaviors.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Guardrails&lt;/strong&gt; Programmatic constraints applied to LLM inputs and outputs to enforce safety, content policies, and format requirements.&lt;/p&gt;&lt;hr&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span&gt;&lt;h4&gt;&amp;#x1f3e2; Model Families (2026 Landscape)&lt;/h4&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th scope=&quot;col&quot;&gt; Family &lt;/th&gt;&lt;th scope=&quot;col&quot;&gt; Company &lt;/th&gt;&lt;th scope=&quot;col&quot;&gt; Notable Models &lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;GPT&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; OpenAI &lt;/td&gt;&lt;td&gt; GPT-5, GPT-5.5, o3, o4-mini[6] &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Claude&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; Anthropic &lt;/td&gt;&lt;td&gt; Claude 3.5 Sonnet, Claude 4[28] &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Gemini&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; Google &lt;/td&gt;&lt;td&gt; Gemini 2.0 Flash, Gemini Ultra 2[28] &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Llama&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; Meta &lt;/td&gt;&lt;td&gt; Llama 4 (10M context, multimodal)[4] 
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Mistral&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; Mistral AI &lt;/td&gt;&lt;td&gt; Mistral Large, Codestral, Mixtral MoE &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Phi&lt;/strong&gt;&lt;/td&gt;&lt;td&gt; Microsoft &lt;/td&gt;&lt;td&gt; Phi-4 (small, surprisingly capable)[4] &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;hr&gt;&lt;/div&gt;&lt;/div&gt;&lt;h3&gt;12. Code Appendix: Build It Yourself&lt;/h3&gt;&lt;div&gt;&lt;div&gt;&lt;span&gt;&lt;h4&gt;12.1 A Complete RAG + Agent System&lt;/h4&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;&amp;quot;&amp;quot;&amp;quot;Full-stack RAG + Agent system.
Stack: OpenAI GPT-4o + ChromaDB + Function Calling
&amp;quot;&amp;quot;&amp;quot;
import os
import json
import chromadb
from openai import OpenAI
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

client = OpenAI(api_key=os.environ[&amp;quot;OPENAI_API_KEY&amp;quot;])

# &amp;#x2500;&amp;#x2500;&amp;#x2500; 1. Setup Vector Store &amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;
ef = OpenAIEmbeddingFunction(
    api_key=os.environ[&amp;quot;OPENAI_API_KEY&amp;quot;],
    model_name=&amp;quot;text-embedding-3-small&amp;quot;
)
chroma = chromadb.Client()
kb = chroma.get_or_create_collection(&amp;quot;knowledge&amp;quot;, embedding_function=ef)

def ingest_documents(docs: list[dict]):
    &amp;quot;&amp;quot;&amp;quot;docs: [{&amp;quot;id&amp;quot;: &amp;quot;...&amp;quot;, &amp;quot;text&amp;quot;: &amp;quot;...&amp;quot;, &amp;quot;metadata&amp;quot;: {...}}]&amp;quot;&amp;quot;&amp;quot;
    kb.upsert(
        ids=[d[&amp;quot;id&amp;quot;] for d in docs],
        documents=[d[&amp;quot;text&amp;quot;] for d in docs],
        # Some Chroma versions reject empty metadata dicts, so pass None instead
        metadatas=[d.get(&amp;quot;metadata&amp;quot;) or None for d in docs]
    )

def search_knowledge_base(query: str, n_results: int = 5) -&amp;gt; str:
    results = kb.query(query_texts=[query], n_results=n_results)
    docs = results[&amp;quot;documents&amp;quot;][0]
    return &amp;quot;\n\n---\n\n&amp;quot;.join(docs) if docs else &amp;quot;No relevant documents found.&amp;quot;

# &amp;#x2500;&amp;#x2500;&amp;#x2500; 2. Define Agent Tools &amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;
TOOLS = [
    {
        &amp;quot;type&amp;quot;: &amp;quot;function&amp;quot;,
        &amp;quot;function&amp;quot;: {
            &amp;quot;name&amp;quot;: &amp;quot;search_knowledge_base&amp;quot;,
            &amp;quot;description&amp;quot;: &amp;quot;Search internal knowledge base for relevant information&amp;quot;,
            &amp;quot;parameters&amp;quot;: {
                &amp;quot;type&amp;quot;: &amp;quot;object&amp;quot;,
                &amp;quot;properties&amp;quot;: {
                    &amp;quot;query&amp;quot;: {&amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;},
                    &amp;quot;n_results&amp;quot;: {&amp;quot;type&amp;quot;: &amp;quot;integer&amp;quot;, &amp;quot;default&amp;quot;: 5}
                },
                &amp;quot;required&amp;quot;: [&amp;quot;query&amp;quot;]
            }
        }
    },
    {
        &amp;quot;type&amp;quot;: &amp;quot;function&amp;quot;,
        &amp;quot;function&amp;quot;: {
            &amp;quot;name&amp;quot;: &amp;quot;calculate&amp;quot;,
            &amp;quot;description&amp;quot;: &amp;quot;Evaluate a mathematical expression&amp;quot;,
            &amp;quot;parameters&amp;quot;: {
                &amp;quot;type&amp;quot;: &amp;quot;object&amp;quot;,
                &amp;quot;properties&amp;quot;: {
                    &amp;quot;expression&amp;quot;: {
                        &amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;,
                        &amp;quot;description&amp;quot;: &amp;quot;Python math expression, e.g. &amp;apos;2 ** 10&amp;apos;&amp;quot;
                    }
                },
                &amp;quot;required&amp;quot;: [&amp;quot;expression&amp;quot;]
            }
        }
    }
]

def execute_tool(name: str, args: dict) -&amp;gt; str:
    if name == &amp;quot;search_knowledge_base&amp;quot;:
        return search_knowledge_base(args[&amp;quot;query&amp;quot;], args.get(&amp;quot;n_results&amp;quot;, 5))
    elif name == &amp;quot;calculate&amp;quot;:
        try:
            # Demo only: eval() with emptied builtins is not a real sandbox
            return str(eval(args[&amp;quot;expression&amp;quot;], {&amp;quot;__builtins__&amp;quot;: {}}, {}))
        except Exception as e:
            return f&amp;quot;Error: {e}&amp;quot;
    return f&amp;quot;Unknown tool: {name}&amp;quot;

# &amp;#x2500;&amp;#x2500;&amp;#x2500; 3. ReAct Agent Loop &amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;
SYSTEM = &amp;quot;&amp;quot;&amp;quot;You are a helpful AI assistant with access to a knowledge base.
Use the search_knowledge_base tool to look up relevant information before answering.
Think through problems step by step. Always cite which documents informed your answer.&amp;quot;&amp;quot;&amp;quot;

def chat(user_message: str, history: list = None) -&amp;gt; str:
    if history is None:
        history = []
    messages = [{&amp;quot;role&amp;quot;: &amp;quot;system&amp;quot;, &amp;quot;content&amp;quot;: SYSTEM}]
    messages.extend(history)
    messages.append({&amp;quot;role&amp;quot;: &amp;quot;user&amp;quot;, &amp;quot;content&amp;quot;: user_message})

    while True:
        response = client.chat.completions.create(
            model=&amp;quot;gpt-4o&amp;quot;,
            messages=messages,
            tools=TOOLS,
            tool_choice=&amp;quot;auto&amp;quot;
        )
        msg = response.choices[0].message
        messages.append(msg)

        # No tool calls = final answer
        if not msg.tool_calls:
            return msg.content

        # Execute each tool call
        for tc in msg.tool_calls:
            args = json.loads(tc.function.arguments)
            result = execute_tool(tc.function.name, args)
            print(f&amp;quot;[Tool: {tc.function.name}] &amp;#x2192; {result[:100]}...&amp;quot;)
            messages.append({
                &amp;quot;role&amp;quot;: &amp;quot;tool&amp;quot;,
                &amp;quot;tool_call_id&amp;quot;: tc.id,
                &amp;quot;content&amp;quot;: result
            })

# &amp;#x2500;&amp;#x2500;&amp;#x2500; Usage &amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;&amp;#x2500;
if __name__ == &amp;quot;__main__&amp;quot;:
    # Ingest some documents
    ingest_documents([
        {&amp;quot;id&amp;quot;: &amp;quot;1&amp;quot;, &amp;quot;text&amp;quot;: &amp;quot;MCP (Model Context Protocol) was released by Anthropic in November 2024.&amp;quot;},
        {&amp;quot;id&amp;quot;: &amp;quot;2&amp;quot;, &amp;quot;text&amp;quot;: &amp;quot;LangGraph is the #1 ranked AI agent framework for production stateful workflows in 2026.&amp;quot;},
        {&amp;quot;id&amp;quot;: &amp;quot;3&amp;quot;, &amp;quot;text&amp;quot;: &amp;quot;GPT-5 was launched on August 7, 2025, achieving 94.6% on advanced math benchmarks.&amp;quot;},
    ])

    answer = chat(&amp;quot;What agent framework should I use for a production workflow, and when was GPT-5 released?&amp;quot;)
    print(f&amp;quot;\nAnswer:\n{answer}&amp;quot;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span&gt;&lt;h4&gt;12.2 Minimal MCP Client&lt;/h4&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;&amp;quot;&amp;quot;&amp;quot;Consuming an MCP server from a Python 
client.
Requires: pip install mcp anthropic
&amp;quot;&amp;quot;&amp;quot;
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
import anthropic

async def run_with_mcp(user_query: str):
    # Connect to an MCP server (e.g., filesystem, database)
    server_params = StdioServerParameters(
        command=&amp;quot;python&amp;quot;,
        args=[&amp;quot;my_mcp_server.py&amp;quot;]
    )

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # List available tools
            tools_response = await session.list_tools()
            tools = [
                {
                    &amp;quot;name&amp;quot;: t.name,
                    &amp;quot;description&amp;quot;: t.description,
                    &amp;quot;input_schema&amp;quot;: t.inputSchema
                }
                for t in tools_response.tools
            ]
            print(f&amp;quot;Available tools: {[t[&amp;apos;name&amp;apos;] for t in tools]}&amp;quot;)

            # Use Claude with MCP tools
            anthropic_client = anthropic.Anthropic()
            messages = [{&amp;quot;role&amp;quot;: &amp;quot;user&amp;quot;, &amp;quot;content&amp;quot;: user_query}]

            while True:
                response = anthropic_client.messages.create(
                    model=&amp;quot;claude-3-5-sonnet-20241022&amp;quot;,
                    max_tokens=4096,
                    tools=tools,
                    messages=messages
                )

                # Any stop reason other than tool_use ends the loop:
                # return the text of the response
                if response.stop_reason != &amp;quot;tool_use&amp;quot;:
                    return &amp;quot;&amp;quot;.join(
                        b.text for b in response.content if b.type == &amp;quot;text&amp;quot;
                    )

                # Record the assistant turn once, then answer every
                # tool_use block in a single user message
                messages.append({
                    &amp;quot;role&amp;quot;: &amp;quot;assistant&amp;quot;,
                    &amp;quot;content&amp;quot;: response.content
                })
                tool_results = []
                for block in response.content:
                    if block.type == &amp;quot;tool_use&amp;quot;:
                        # Execute tool via MCP
                        result = await session.call_tool(block.name, block.input)
                        tool_results.append({
                            &amp;quot;type&amp;quot;: &amp;quot;tool_result&amp;quot;,
                            &amp;quot;tool_use_id&amp;quot;: block.id,
                            &amp;quot;content&amp;quot;: str(result.content)
                        })
                messages.append({
                    &amp;quot;role&amp;quot;: &amp;quot;user&amp;quot;,
                    &amp;quot;content&amp;quot;: tool_results
                })

asyncio.run(run_with_mcp(&amp;quot;List all files in the current directory and summarize their contents&amp;quot;))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span&gt;&lt;h4&gt;12.3 A2A Agent Discovery &amp;amp; Collaboration&lt;/h4&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;&amp;quot;&amp;quot;&amp;quot;Simplified A2A protocol implementation.
Real A2A uses JSON-RPC 2.0 over HTTP with SSE for streaming.
&amp;quot;&amp;quot;&amp;quot;
import httpx
import asyncio
from dataclasses import dataclass

@dataclass
class AgentCard:
    name: str
    endpoint: str
    skills: list[dict]
    version: str = &amp;quot;1.0&amp;quot;

class A2AClient:
    &amp;quot;&amp;quot;&amp;quot;Discovers and calls remote A2A agents.&amp;quot;&amp;quot;&amp;quot;

    async def discover_agent(self, agent_url: str) -&amp;gt; AgentCard:
        &amp;quot;&amp;quot;&amp;quot;Fetch agent card from a known URL.&amp;quot;&amp;quot;&amp;quot;
        async with httpx.AsyncClient() as client:
            response = await client.get(f&amp;quot;{agent_url}/.well-known/agent.json&amp;quot;)
            response.raise_for_status()
            data = response.json()
            return AgentCard(
                name=data[&amp;quot;name&amp;quot;],
                endpoint=data[&amp;quot;endpoint&amp;quot;],
                skills=data[&amp;quot;skills&amp;quot;],
                version=data.get(&amp;quot;version&amp;quot;, &amp;quot;1.0&amp;quot;)
            )

    async def send_task(self, agent: AgentCard, skill_id: str,
                        message: str) -&amp;gt; str:
        &amp;quot;&amp;quot;&amp;quot;Send a task to a remote agent and get the result.&amp;quot;&amp;quot;&amp;quot;
        payload = {
            &amp;quot;jsonrpc&amp;quot;: &amp;quot;2.0&amp;quot;,
            &amp;quot;method&amp;quot;: &amp;quot;tasks/send&amp;quot;,
            &amp;quot;id&amp;quot;: &amp;quot;req-1&amp;quot;,
            &amp;quot;params&amp;quot;: {
                &amp;quot;skill&amp;quot;: skill_id,
                &amp;quot;message&amp;quot;: {&amp;quot;role&amp;quot;: &amp;quot;user&amp;quot;, &amp;quot;parts&amp;quot;: [{&amp;quot;text&amp;quot;: message}]}
            }
        }
        async with httpx.AsyncClient() as client:
            response = await client.post(
                agent.endpoint,
                json=payload,
                headers={&amp;quot;Authorization&amp;quot;: &amp;quot;Bearer my-token&amp;quot;}
            )
            response.raise_for_status()
            result = response.json()
            return result[&amp;quot;result&amp;quot;][&amp;quot;output&amp;quot;][&amp;quot;message&amp;quot;][&amp;quot;parts&amp;quot;][0][&amp;quot;text&amp;quot;]

class OrchestratorAgent:
    &amp;quot;&amp;quot;&amp;quot;
    An orchestrator that discovers and delegates to specialist agents.
    This is the heart of a multi-agent A2A system.
    &amp;quot;&amp;quot;&amp;quot;

    def __init__(self):
        self.a2a = A2AClient()
        self.registry: dict[str, AgentCard] = {}

    async def register_agent(self, url: str):
        card = await self.a2a.discover_agent(url)
        self.registry[card.name] = card
        print(f&amp;quot;Registered agent: {card.name} with skills: {[s[&amp;apos;id&amp;apos;] for s in card.skills]}&amp;quot;)

    async def handle_request(self, user_request: str) -&amp;gt; str:
        # In a real system, an LLM would decide which agent to use
        # Here we do simple keyword routing
        if &amp;quot;payment&amp;quot; in user_request.lower():
            agent = self.registry.get(&amp;quot;payment-agent&amp;quot;)
            skill = &amp;quot;process_payment&amp;quot;
        elif &amp;quot;weather&amp;quot; in user_request.lower():
            agent = self.registry.get(&amp;quot;weather-agent&amp;quot;)
            skill = &amp;quot;get_forecast&amp;quot;
        else:
            return &amp;quot;I don&amp;apos;t have a specialist agent for this request.&amp;quot;

        # Guard against routing to an agent that was never registered
        if agent is None:
            return &amp;quot;The specialist agent for this request is not registered.&amp;quot;
        return await self.a2a.send_task(agent, skill, user_request)

async def main():
    orchestrator = OrchestratorAgent()

    # Discover available agents (in production: from a registry service)
    await orchestrator.register_agent(&amp;quot;https://payments.example.com&amp;quot;)
    await orchestrator.register_agent(&amp;quot;https://weather.example.com&amp;quot;)

    result = await orchestrator.handle_request(
        &amp;quot;Process a &amp;#x24;50 payment for order #1234&amp;quot;
    )
    print(f&amp;quot;Result: {result}&amp;quot;)

asyncio.run(main())&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;hr&gt;&lt;p&gt;&lt;em&gt;Built with care for the Rawalpindi/Islamabad dev community &amp;#x2014; and every hacker reading this on a Thursday afternoon. Go build something.&lt;/em&gt;&lt;/p&gt;&lt;hr&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;hr&gt;&lt;/div&gt;</content>
    </item>
    <item>
      <title>API pagination: cursor vs offset</title>
      <link>https://oreoro.github.io/posts/api-pagination-cursor-vs-offset/</link>
      <guid isPermaLink="true">https://oreoro.github.io/posts/api-pagination-cursor-vs-offset/</guid>
      <description>When to use cursor pagination and how to implement it safely.</description>
      <pubDate>Thu, 04 Jun 2026 00:00:00 GMT</pubDate>
      <lastUpdatedTimestamp>Thu Jun 04 2026 12:17:00 GMT+0000 (Coordinated Universal Time)</lastUpdatedTimestamp>
      <category>Guide</category>
      <category>Tools</category>
      <content>&lt;div&gt;
                    &lt;p&gt;
                        &lt;em&gt;Note:&lt;/em&gt; This RSS feed strips out SVGs and embeds. You might want to read the post on the webpage
                        &lt;a href=&quot;https://oreoro.github.io/posts/api-pagination-cursor-vs-offset/&quot; target=&quot;_blank&quot;&gt;here&lt;/a&gt;.
                    &lt;/p&gt;
                    &lt;hr&gt;
                &lt;div&gt;&lt;p&gt;&lt;time&gt; June 4, 2026 &lt;/time&gt;&lt;/p&gt;&lt;/div&gt;&lt;hr&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;h3&gt;Cursor pagination (recommended)&lt;/h3&gt;&lt;ul&gt;&lt;li&gt; Stable under inserts/deletes  &lt;/li&gt;&lt;li&gt; Uses an opaque cursor (e.g., last seen id + sort key)  &lt;/li&gt;&lt;li&gt; Easy to cache and resume  &lt;/li&gt;&lt;/ul&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;SELECT *
FROM items
WHERE (created_at, id) &amp;lt; (:created_at, :id)
ORDER BY created_at DESC, id DESC
LIMIT 50;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;h3&gt;Offset pagination (avoid at scale)&lt;/h3&gt;&lt;ul&gt;&lt;li&gt; Can skip/duplicate rows when data changes  &lt;/li&gt;&lt;li&gt; Gets slower as offset grows  &lt;/li&gt;&lt;/ul&gt;&lt;div&gt;&lt;div&gt; If you need &amp;#x201c;page numbers&amp;#x201d;, store cursors per page server-side.  &lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;hr&gt;&lt;/div&gt;</content>
    </item>
    <item>
      <title>TLS in plain English</title>
      <link>https://oreoro.github.io/posts/tls-in-plain-english/</link>
      <guid isPermaLink="true">https://oreoro.github.io/posts/tls-in-plain-english/</guid>
      <description>What happens during a TLS handshake, without the math.</description>
      <pubDate>Thu, 04 Jun 2026 00:00:00 GMT</pubDate>
      <lastUpdatedTimestamp>Thu Jun 04 2026 12:17:00 GMT+0000 (Coordinated Universal Time)</lastUpdatedTimestamp>
      <category>Information</category>
      <content>&lt;div&gt;
                    &lt;p&gt;
                        &lt;em&gt;Note:&lt;/em&gt; This RSS feed strips out SVGs and embeds. You might want to read the post on the webpage
                        &lt;a href=&quot;https://oreoro.github.io/posts/tls-in-plain-english/&quot; target=&quot;_blank&quot;&gt;here&lt;/a&gt;.
                    &lt;/p&gt;
                    &lt;hr&gt;
                &lt;div&gt;&lt;p&gt;&lt;time&gt; June 4, 2026 &lt;/time&gt;&lt;/p&gt;&lt;/div&gt;&lt;hr&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;h3&gt;The handshake&lt;/h3&gt;&lt;ul&gt;&lt;li&gt; Client says: &amp;#x201c;Here are the cipher suites I support&amp;#x201d;  &lt;/li&gt;&lt;li&gt; Server replies with its certificate (which carries its public key)  &lt;/li&gt;&lt;li&gt; Client verifies the certificate chain  &lt;/li&gt;&lt;li&gt; They agree on session keys (usually via ECDHE)  &lt;/li&gt;&lt;li&gt; After that: traffic is encrypted + authenticated  &lt;/li&gt;&lt;/ul&gt;&lt;h3&gt;What you get&lt;/h3&gt;&lt;ul&gt;&lt;li&gt;&lt;strong&gt;Confidentiality&lt;/strong&gt; (encryption)  &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Integrity&lt;/strong&gt; (tamper detection)  &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Authenticity&lt;/strong&gt; (you&amp;#x2019;re talking to the right server)  &lt;/li&gt;&lt;/ul&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt;Common gotchas&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt; Expired certs  &lt;/li&gt;&lt;li&gt; Wrong hostname (SAN mismatch)  &lt;/li&gt;&lt;li&gt; Missing intermediate certs  &lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;hr&gt;&lt;/div&gt;</content>
    </item>
    <item>
      <title>Docker layers: 6 rules for smaller images</title>
      <link>https://oreoro.github.io/posts/docker-layers-6-rules-for-smaller-images/</link>
      <guid isPermaLink="true">https://oreoro.github.io/posts/docker-layers-6-rules-for-smaller-images/</guid>
      <description>A tiny checklist to cut build time and image size.</description>
      <pubDate>Thu, 04 Jun 2026 00:00:00 GMT</pubDate>
      <lastUpdatedTimestamp>Thu Jun 04 2026 12:17:00 GMT+0000 (Coordinated Universal Time)</lastUpdatedTimestamp>
      <category>Tools</category>
      <content>&lt;div&gt;
                    &lt;p&gt;
                        &lt;em&gt;Note:&lt;/em&gt; This RSS feed strips out SVGs and embeds. You might want to read the post on the webpage
                        &lt;a href=&quot;https://oreoro.github.io/posts/docker-layers-6-rules-for-smaller-images/&quot; target=&quot;_blank&quot;&gt;here&lt;/a&gt;.
                    &lt;/p&gt;
                    &lt;hr&gt;
                &lt;div&gt;&lt;p&gt;&lt;time&gt; June 4, 2026 &lt;/time&gt;&lt;/p&gt;&lt;/div&gt;&lt;hr&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;h3&gt;6 rules&lt;/h3&gt;&lt;div&gt;&lt;ol&gt;&lt;li&gt; Use a small base (alpine/distroless when possible)  &lt;/li&gt;&lt;li&gt; Copy only what you need (use &lt;code&gt;.dockerignore&lt;/code&gt;)  &lt;/li&gt;&lt;li&gt; Install deps before copying app source  &lt;/li&gt;&lt;li&gt; Combine commands to reduce layers  &lt;/li&gt;&lt;li&gt; Use multi-stage builds  &lt;/li&gt;&lt;li&gt; Pin versions to avoid surprise rebuilds  &lt;/li&gt;&lt;/ol&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;pre data-language=&quot;docker&quot;&gt;&lt;code&gt;&lt;span&gt;&lt;span&gt;FROM node:24-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
FROM gcr.io/distroless/nodejs24-debian12
COPY --from=build /app/dist /app
CMD [&amp;quot;/app/index.js&amp;quot;]&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;hr&gt;&lt;/div&gt;</content>
    </item>
  </channel>
</rss>
