Scattered Thoughts on LLM Technology

In a sense, this is maybe part of the blog post I promised back in January 2023 but never wrote, because I couldn't ultimately wrap my head around LLMs enough to form coherent thoughts.

I am less skeptical of LLM technology than I used to be, and I find a number of the arguments against it less compelling than I once did. For a longer, more compelling, less scattershot version of this post (not all of it, but some of it), please read The average AI criticism has gotten lazy, and that's dangerous by Danilo Campos.

  1. On the energy consumption case, I have to remind myself sometimes that I am not inherently against energy consumption. It must be done thoughtfully, and renewably, but energy use by itself is not disqualifying. Furthermore, there are going to be huge financial and competitive pressures for LLM technology to get leaner and use less compute.

    This is a contrast to blockchain, where in proof-of-work systems resource consumption is the point. Efficiencies are automatically consumed by difficulty scaling: individual actors can find efficiencies and profit from them, but the system as a whole cannot.
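    To make that contrast concrete, here is a toy sketch of proof-of-work difficulty retargeting, heavily simplified from real systems (Bitcoin, for instance, adjusts over a 2016-block window): when miners get more efficient, difficulty scales up to match, so the work per block stays the same.

```python
def adjusted_difficulty(difficulty: float,
                        target_block_time: float,
                        observed_block_time: float) -> float:
    """Simplified proof-of-work retargeting: scale difficulty so that
    average block times return to the target after hash rate changes."""
    return difficulty * (target_block_time / observed_block_time)

# Suppose miners find a 2x efficiency gain: effective hash rate doubles,
# so blocks arrive in half the target time (300s instead of 600s)...
difficulty = 100.0
new_difficulty = adjusted_difficulty(
    difficulty, target_block_time=600.0, observed_block_time=300.0
)
# ...and difficulty doubles in response, so total work (and resource
# consumption) per block is unchanged for the system as a whole.
```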

  2. On the training data problems: this is a legitimate concern, yet it strikes me as not impossible to overcome. Just as LLMs will face strong downward pressure on their compute and energy consumption, they are experiencing downward pressure on their data requirements. While huge data and huge compute are required today, I think it's short-sighted to assume the same will be true tomorrow. The idea of a small model specialized to mozilla-central is kind of appealing!

  3. It feels like there has to be a bit of a cost reckoning. If OpenAI isn't profitable (and I don't know that we actually know, one way or the other), we could see costs climb for the deployment of AI models. Spend some time playing with the OpenAI cost calculator, and it's pretty clear that at scale it's already reasonably expensive to use. I think we will see a few trends here: 1) Companies that are just thin wrappers around OpenAI's APIs will slowly disappear; why go third party when OpenAI will serve you, almost certainly for cheaper? 2) You'll see at least two hardware startups blow up on their ability to dramatically reduce the cost of running open models; e.g., play with the Groq demo.

  4. Partially driven by cost, I also think the "just throw a chatbot on X" model of deployment we're seeing now will fade away; a lot more usage will be LLM-as-API. I expect this also means API-focused models: ones where output is always JSON, and that are task-trained more than general purpose. I'll bet we even see input stop being "prompts" as the technology adapts to traditional programming over time.
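    A minimal sketch of what that might look like on the consuming side (the function, the sentiment task, and the JSON schema here are all hypothetical, made up for illustration): treating an LLM's output as structured data to be validated, not prose to be read.

```python
import json

def parse_structured_output(raw: str, required_keys: set[str]) -> dict:
    """Parse an LLM response that is expected to be JSON.

    Models sometimes wrap JSON in markdown fences, so strip those
    before parsing. Raises ValueError if required keys are missing,
    which a caller could use to trigger a retry.
    """
    text = raw.strip()
    if text.startswith("```"):
        # Drop leading ```json and trailing ``` fence lines.
        lines = [ln for ln in text.splitlines() if not ln.startswith("```")]
        text = "\n".join(lines)
    data = json.loads(text)
    missing = required_keys - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data

# Example: a hypothetical task-trained "sentiment" endpoint whose
# contract is to always return JSON with these two fields.
reply = '```json\n{"label": "positive", "confidence": 0.93}\n```'
result = parse_structured_output(reply, {"label", "confidence"})
```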

  5. This space is going to change remarkably over the next few years. I've no idea how it's going to go, but I suspect it will look wildly different in five years. I work at Mozilla, and the Mozilla Foundation has been thinking about this a lot.

  6. I worry so much about bandwagoning, and bad choices driven by FOMO. I think so many of the "rub-generative-AI-on-it" projects are gross, and many of them undermine the value proposition of the organizations deploying them. Yet I am increasingly convinced that I can't put a blanket over my head and hope this all blows over... it's not blowing over.

For more, be sure to also read Campos' What if Bill Gates is right about AI.