When we deliver a model, we make sure we don't exceed X seconds of latency in our API. Before even getting into the performance of LLMs for classification, I can tell you that with the currently available tech they are simply infeasible.
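For concreteness, here is a minimal sketch of what such a latency gate might look like. The `predict` stub, the `LATENCY_BUDGET_S` constant, and the p95 criterion are all assumptions for illustration; the comment only says "X seconds" and doesn't describe the actual check.

```python
# Minimal sketch of a latency gate on a delivered classifier.
# predict() and LATENCY_BUDGET_S are hypothetical stand-ins; the original
# comment only mentions an unspecified budget of "X seconds".
import time
import statistics

LATENCY_BUDGET_S = 0.2  # assumed SLA budget, not from the thread

def predict(text: str) -> str:
    """Stand-in for the real classification endpoint (hypothetical)."""
    time.sleep(0.01)  # simulate model inference
    return "label"

def passes_latency_gate(samples: list[str],
                        budget_s: float = LATENCY_BUDGET_S) -> bool:
    """Time each request and require p95 latency to stay under budget."""
    latencies = []
    for text in samples:
        start = time.perf_counter()
        predict(text)
        latencies.append(time.perf_counter() - start)
    p95 = statistics.quantiles(latencies, n=20)[-1]  # 95th percentile
    return p95 < budget_s

if __name__ == "__main__":
    ok = passes_latency_gate(["example request"] * 50)
    print("latency gate passed" if ok else "latency gate failed")
```

Gating on a tail percentile rather than the mean is one common choice here, since a slow tail is usually what breaks an API SLA even when average latency looks fine.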
LinuxSpinach • 5h ago
^ this. And especially classification as a task, because businesses don't want to pay LLM buck...