The reason why Langchain is pointless is that it's trying to solve problems on top of technical foundations that just cannot support it. Over the last several months, my team has been building several features using highly sophisticated LLM chains that do all manner of reasoning. The ultimate outputs are very human-like, to the point where there is some private excitement that we've built an AGI.

The #1 learning is that there is no reusability with the current generation of LLMs. Each feature requires very custom handwritten prompts. Each step in the chain requires handwritten prompts. The input data has to be formatted in a very specific way to generate good outputs for that feature/chain step. The part around setting up a DAG orchestration to run these chains is like 5% of the work; 95% is really just the prompt tuning and the data serialization formats. Langchain is attempting to set up abstractions to reuse everything, but what we end up with is a mediocre DAG framework where all the instructions/data passing through it is just garbage. The longer the chain, the more garbage you find at the output.

We briefly made our own internal Langchain. Again, it's not that our library or Langchain was bad engineering; it's just not feasible on top of the foundation models we have right now.

Yeah, I never know where memory goes exactly in Langchain; it's not exactly clear all the time. But sure, the main insight I remember is this: take a look at their MULTI_PROMPT_ROUTER_TEMPLATE. It's a lot of instructions for an LLM. They seem to forget that an LLM is an auto-completion machine, and what data it is trained on. Using ">" for sections is not a normal thing; it's not markdown, which is probably the thing read way more often on the internet. Instead of JSON with comments, why not type signatures? Instead of so many rules, why not give it examples? It is an autocomplete machine! They are relying too much on the LLM being smart, probably because they only test stuff on GPT-4 and 3.5; with GPT4All models this prompt was not working at all, so I had to rewrite it. For simple routing we don't even need JSON, and carrying the `next_inputs` here is weird if you don't need it. My version is so basic it's dumb, yet it is more powerful because it does not rely on GPT-4-level intelligence. It's just what I needed.
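To give a flavor of what I mean, here is a minimal sketch of that kind of rewrite, not the actual prompt I shipped: few-shot examples instead of a wall of rules, and a bare route name instead of JSON. The destination names and the llm() callable are made up for illustration.

    # Minimal example-driven router: the model just autocompletes the
    # pattern with a single route name. No JSON, no next_inputs.
    # Assumes llm(prompt) -> str is whatever completion function you use.

    ROUTER_PROMPT = """\
    Classify each question into one destination: physics, math, or general.

    Question: Why does ice float on water?
    Destination: physics

    Question: What is the integral of x^2?
    Destination: math

    Question: What should I cook tonight?
    Destination: general

    Question: {question}
    Destination:"""

    DESTINATIONS = {"physics", "math", "general"}

    def route(question: str, llm) -> str:
        """Complete the few-shot pattern, then validate the answer."""
        words = llm(ROUTER_PROMPT.format(question=question)).strip().split()
        answer = words[0].lower().strip(".,") if words else ""
        return answer if answer in DESTINATIONS else "general"

Even small models can usually complete a pattern like this, because it looks like the text they were trained on, and the validation step catches the cases where they can't.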
There are a few increasingly harder things when it comes to prompt customization:

1. Prompts ask the LLM to generate input for the next step
2. Prompts ask the LLM to generate instructions for the next step
3. Prompts ask the LLM to generate the next step

Doing #3 across multiple steps is the promise of Langchain, AutoGPT et al. It's pretty much impossible to do with useful quality: attempting #3 very often either ends up completing the chain too early, or just spinning in a loop. It's not the kind of thing you can optimize iteratively to good enough quality at production scale, and "Retry" as a user-facing operation is just stupid IMO. Either it works well, or we don't offer it as a feature.

The features now have a narrow use case and a fully-defined DAG shape upfront. We feed some context on what all the steps are to every step, so it can understand the overall purpose (see the sketch at the end of this comment). #2, we tune these prompts internally within the team; even things like newlines affect quality too much. #1, we've found, is doable for non-tech folks; in some of the features, we expose this to the user somewhat as additional context and mix that in with the pre-built instructions. So #2 is where it's both hard to get right and still solvable.

Every prompt change has to be tested with a huge number of full-chain invocations on real input data before it can be accepted and stabilized. The evaluation of quality is all human, manual work. We tried some other semi-automated approaches, but they just weren't feasible.

All of this is why there is no way Langchain or anything like it is currently useful to build actually valuable user-facing features at production scale.
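To make the "fully-defined DAG shape" point concrete, here is a minimal sketch of the pattern, assuming a generic llm(prompt) -> str completion function. The step names, the plan text, and the user_context parameter are illustrative, not our production code.

    # Fixed chain shape, defined upfront in code: the LLM never chooses
    # the next step (#3 never happens). Every step sees the overall plan.

    from dataclasses import dataclass

    # The overall purpose is serialized once and fed to every step,
    # so each prompt can "see" the whole pipeline.
    PLAN = (
        "Overall task: summarize a support ticket and draft a reply.\n"
        "Steps: 1) extract facts 2) classify issue 3) draft reply"
    )

    @dataclass
    class Step:
        name: str
        prompt_template: str  # hand-tuned per step; even newlines matter

    STEPS = [
        Step("extract", "{plan}\n\nStep 1. Extract the key facts:\n{data}\n\nFacts:"),
        Step("classify", "{plan}\n\nStep 2. Classify the issue given these facts:\n{data}\n\nCategory:"),
        Step("draft", "{plan}\n\nStep 3. Draft a reply.\n{user_context}\nFacts and category:\n{data}\n\nReply:"),
    ]

    def run_chain(ticket: str, llm, user_context: str = "") -> str:
        """Run the fully-defined chain, passing each output to the next step."""
        data = ticket
        for step in STEPS:
            data = llm(step.prompt_template.format(
                plan=PLAN,
                data=data,
                user_context=user_context,  # the #1 knob exposed to users
            )).strip()
        return data

The shape of the DAG is fixed in code, the plan gives every step the overall purpose (#2 is the internal tuning of those templates), and user_context is the one knob (#1) that can safely be handed to end users.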