LLMs are the best Zero-Shot Classifiers (if used correctly)
A bit of context
So last week I was working on a problem I could not find many resources for. There was no existing embedding model for me to plug and play with my data, nor an existing dataset I could use to train my own model. And while I could still create a synthetic dataset using LLMs, the problem remained: how do you account for every corner case that exists?
After spending most of the week racking my brain and still getting nowhere close to solving it, I was almost out of ideas. But first, let us take a look at the problem I was trying to solve (technically not the exact problem, but something similar).
The problem
I had a pool of 400–500 categories that needed to be assigned to incoming queries. Since the categories were constantly expanding, fine-tuning a classifier wasn’t an option. I experimented with an LLM, hoping it could automatically assign the appropriate labels. However, due to hallucination, it would either return the wrong label or miss some significant ones. I thought RAG would help ground the tags, but even with that the hallucination problem was not resolved: I still got made-up categories.
And after trying hundreds of different prompts, even resorting to extreme measures like threatening the extinction of the human race and warning the model that incorrect outputs would cost me my job, I still couldn’t get reliable results. With each failed attempt, I felt like I was losing my sanity and my will to live altogether. It seemed absurd that I couldn’t make an LLM do something seemingly simple; perhaps it was time to abandon the idea.
But then it finally hit me: I could use the hallucination to my advantage. With just a couple of changes to the pipeline, I could create a simple classifier for myself.
The solution that worked for me
So, as I mentioned earlier, the problem at hand was to classify an incoming query into multiple classes from a pre-defined set. The only issue was that even with RAG, the model would create classes of its own. So, rather than asking the model to pick tags from the list I had defined, I asked it to create its own classes (following the constraints I needed) along with a description for each class, returned as a Python dictionary. The setup was now simple: keep a key-value pair for every class in my pre-defined bucket, where the key was the class name and the value was a description of the class. Then, for each incoming query, ask the LLM to create whatever bucket of classes it sees fit, with descriptions for those classes. A rough sketch of that generation step is shown below.
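Here is a minimal sketch of what that could look like, assuming the OpenAI Python client; the model name, prompt wording, and example classes are all placeholders, not what I actually used.

```python
# A minimal sketch of the generation step, assuming the OpenAI Python client.
# The model name, prompt wording, and example classes are placeholders.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The pre-defined bucket: class name -> short description.
predefined_bucket = {
    "billing_issue": "Problems with invoices, charges, or refunds.",
    "account_access": "Login failures, password resets, locked accounts.",
    "feature_request": "Suggestions for new functionality or improvements.",
    # ... imagine 400-500 of these
}

PROMPT = """Read the user query below and invent 2-5 short class names that
describe it, each with a one-sentence description. Do NOT try to match any
existing taxonomy; describe the query in your own words. Return a JSON
object mapping each class name to its description.

Query: {query}"""

def generate_query_bucket(query: str) -> dict[str, str]:
    """Let the model hallucinate its own classes; it never picks from our list."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": PROMPT.format(query=query)}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)
```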
All that remained was to use a simple language encoder model to create embeddings for these two buckets, run a cosine similarity match between the tags, and get the desired classes out. A sketch of this matching step follows.
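Something along these lines, assuming sentence-transformers as the encoder; the model choice and the 0.6 threshold are illustrative, not tuned values:

```python
# A minimal sketch of the matching step, assuming sentence-transformers.
# The encoder model and the similarity threshold are illustrative choices.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def match_classes(query_bucket: dict[str, str],
                  predefined_bucket: dict[str, str],
                  threshold: float = 0.6) -> list[str]:
    """Map the hallucinated classes back onto the pre-defined ones via cosine similarity."""
    # Embed "name: description" strings so both the label and its meaning count.
    pre_names = list(predefined_bucket)
    pre_texts = [f"{k}: {v}" for k, v in predefined_bucket.items()]
    gen_texts = [f"{k}: {v}" for k, v in query_bucket.items()]

    pre_emb = encoder.encode(pre_texts, convert_to_tensor=True, normalize_embeddings=True)
    gen_emb = encoder.encode(gen_texts, convert_to_tensor=True, normalize_embeddings=True)

    # Cosine similarity between every generated class and every pre-defined class.
    sims = util.cos_sim(gen_emb, pre_emb)  # shape: (n_generated, n_predefined)

    matched = set()
    for row in sims:
        best = int(row.argmax())
        if float(row[best]) >= threshold:
            matched.add(pre_names[best])
    return sorted(matched)
```

Embedding the class name together with its description gives the encoder more signal than the bare label, and the threshold lets a hallucinated class map to nothing rather than to a bad match. Glued to the earlier sketch, usage would look something like:

```python
query = "I was charged twice for my subscription this month"
query_bucket = generate_query_bucket(query)
print(match_classes(query_bucket, predefined_bucket))
# e.g. ['billing_issue'] (illustrative output; depends on the model and threshold)
```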
I will be adding a fuller demonstration of the end-to-end pipeline on a dummy problem soon.
Conclusion
In summary, my journey navigating the complexities of language models led me to a pivotal realisation: instead of battling against the model’s tendencies, we could harness them to our advantage. By reframing the problem and adapting our approach, we can easily discover a novel solution that aligns the model’s outputs with our requirements.
This experience underscored the importance of persistence and creativity in problem-solving, showcasing the transformative power of thinking outside the box. Through perseverance and a willingness to explore unconventional paths, I transformed frustration into innovation, ultimately finding a practical and effective solution. In a field as dynamic as artificial intelligence, where challenges abound and boundaries are continually pushed, it’s our ability to adapt and innovate that paves the way for progress and discovery. In embracing the unknown and exploring new possibilities, we unlock the true potential of technology to shape the world around us.
(Yeah, I used an LLM to write the conclusion for me, ’cause I know the exact prompt for that one; I didn’t need a hacky way to make it work.)