

LLMs don’t ‘remember the entire contents of each book they read’. The data are used to train the LLM’s ability to predict sequences of words (or, more accurately, tokens). In a sense, an LLM develops a lossy model of its training data, not a literal database. Generating text is a stochastic process, which means you’ll get different results each time you ask a given question, not a deterministic regurgitation of ‘read texts’. This is why it’s a transformative process, and also why LLMs can hallucinate nonsense.
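Here’s a toy sketch in Python of what ‘stochastic’ means in practice: the model assigns probabilities to candidate next tokens, and one is sampled at random rather than the top one always being picked. The prompt and the probabilities below are invented purely for illustration, not taken from any real model:

```python
import random

# Hypothetical probabilities a model might assign to the next token
# after the prompt "The cat sat on the". These numbers are made up.
next_token_probs = {
    "mat": 0.55,
    "sofa": 0.20,
    "floor": 0.15,
    "moon": 0.10,
}

def sample_next_token(probs: dict[str, float]) -> str:
    """Pick a token at random, weighted by the model's probabilities."""
    tokens = list(probs)
    weights = list(probs.values())
    return random.choices(tokens, weights=weights, k=1)[0]

# Run it a few times: the output varies from run to run, which is why
# asking the same question twice can produce different completions.
for _ in range(5):
    print(sample_next_token(next_token_probs))
```

Most of the time you’ll get ‘mat’, but sometimes ‘sofa’ or even ‘moon’; nothing is being looked up or copied, just sampled from a learned distribution.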
This stuff is counter-intuitive. Below is a very good, in-depth explanation that really helped me get a sense of how these things work. Highly recommended if you can spare the 3 hours (!):
https://www.youtube.com/watch?v=7xTGNNLPyMI&list=PLMtPKpcZqZMzfmi6lOtY6dgKXrapOYLlN
I’ve long been an enthusiast of unpopular punctuation—the ellipsis, the em-dash, the interrobang‽
The trick to using the em-dash is not to surround it with spaces, which tend to break up the text visually. So, this feels good—to me—whereas this — feels unpleasant. I learnt this approach from typographer Erik Spiekermann’s book, *Stop Stealing Sheep & Find Out How Type Works*.