Here’s a workflow I keep seeing. Someone has a database full of rankings data. They point an LLM at it with a schema file and ask “what changed week over week?” The model reads the tables, compares the numbers, and spits out a summary.
This works. For a while.
Then the model gets updated and the output format shifts. Or it hallucinates a trend that isn’t there. Or you can’t reproduce last week’s analysis because the model was feeling different that day. Or you’re paying per token for what amounts to arithmetic.
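To make "what amounts to arithmetic" concrete, here is a minimal sketch of a week-over-week comparison done in plain code. The data shape (a mapping from item name to rank), the function name `week_over_week`, and the sample values are all assumptions made for illustration; a real rankings database will look different.

```python
from typing import Dict, List, Tuple

# Hypothetical row shape: (item, previous_rank, current_rank, delta)
Change = Tuple[str, int, int, int]


def week_over_week(prev: Dict[str, int], curr: Dict[str, int]) -> List[Change]:
    """Compare two weekly snapshots of ranks.

    Only items present in both weeks are compared. A positive delta means
    the item moved toward position 1; a negative delta means it dropped.
    """
    changes: List[Change] = []
    for item in prev.keys() & curr.keys():
        delta = prev[item] - curr[item]
        if delta != 0:  # skip items whose rank did not move
            changes.append((item, prev[item], curr[item], delta))
    # Biggest movers first; ties are broken alphabetically so the
    # output is identical on every run.
    changes.sort(key=lambda row: (-abs(row[3]), row[0]))
    return changes


# Made-up sample data, purely illustrative.
last_week = {"blue widgets": 4, "red widgets": 9, "widget repair": 2}
this_week = {"blue widgets": 7, "red widgets": 5, "widget repair": 2}

report = week_over_week(last_week, this_week)
for item, before, after, delta in report:
    print(f"{item}: {before} -> {after} ({delta:+d})")
```

The same inputs produce the same report every time, and nothing here costs a token.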