At 2501.ai, we tackled the instability of function calling across LLM platforms and inference methods by building a universal adapter.
At 2501.ai, we're dedicated to pushing the boundaries of Large Language Model (LLM) capabilities to maximize performance. One significant challenge we've repeatedly encountered while building our native LLM orchestration framework is the instability and inconsistent support of function calling across multiple platforms and inference methods. This inconsistency restricts how LLMs can be integrated and used in complex environments.
Today, we're sharing how we overcame this barrier: how we built an in-house universal function call adapter to improve the compatibility and reliability of the different models and inference platforms we use.
Function calling mechanisms in LLMs are powerful tools that enable dynamic interactions and extend a model's capabilities. However, we kept running into three problems:
Each platform implements function calling differently in its models, which leads to unpredictable behavior and unstable performance. We had to figure out how to make every model behave consistently, or ideally find a single reliable function calling implementation to standardize on.
Despite efforts by the big players to standardize function calling, there is still no universally accepted standard, which causes compatibility issues. Most LLM providers and inference engines use considerably different conventions for configuring function calls in their APIs and SDKs.
Not all LLMs or inference engines support function calling natively. This limitation restricts the deployment of advanced functionality and degrades the overall user experience. It's a shame to see a high-performance model penalized simply because its SDK or API lacks native function calling.
To provide the best possible service and keep our LLM orchestration robust and flexible, we needed a system that behaves consistently across models and inference platforms, works even when function calling isn't supported natively, and adds as little overhead as possible.
We developed a universal function call adapter that operates within the system prompt of the LLM. Here’s how it addresses the challenges:
Embedding the adapter directly into the system prompt means that it doesn’t rely on external functions or platform-specific features. This approach increases compatibility and reduces dependencies.
Our adapter is designed to be minimalistic. It provides essential instructions without adding unnecessary complexity, ensuring efficient operation.
The core of our function call adapter is an add-on template that we inject into our final system prompt.
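In broad strokes, it looks like the sketch below, written here as a TypeScript template string. The exact wording, the {{FUNCTIONS}} placeholder, and the name/arguments field names are illustrative; the ##FUNCTIONS_JSON## markers are the delimiters we rely on for extraction further down.

```typescript
// Sketch of the add-on template. The wording and field names are
// illustrative; {{FUNCTIONS}} stands for the list of available functions
// and their parameters, filled in at orchestration time.
const FUNCTION_CALL_ADAPTER = `
You have access to the following functions:
{{FUNCTIONS}}

When you decide to call one or more functions, answer with a JSON array
wrapped exactly between these markers, and nothing else between them:

##FUNCTIONS_JSON##
[{"name": "<function_name>", "arguments": {"<param>": "<value>"}}]
##END_FUNCTIONS_JSON##
`;

// The add-on is simply appended to whatever system prompt the agent
// already uses, regardless of the underlying provider.
const basePrompt = 'You are a helpful assistant.';
const systemPrompt = `${basePrompt}\n${FUNCTION_CALL_ADAPTER}`;
```

Because the adapter is plain text, the same injection works with any provider, whether or not its API exposes native function calling.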
Following lightweight principles, the template deliberately includes only the minimum it needs to work properly: unambiguous delimiters (##FUNCTIONS_JSON## and ##END_FUNCTIONS_JSON##) so the calls can be extracted reliably, and a strict JSON structure for the calls themselves.
To extract the function calls from the LLM's response, we use a regular expression.
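In TypeScript, it looks roughly like this (the extractFunctionCalls helper around the pattern is a simplified sketch):

```typescript
// Capture the payload between the two delimiters. [\s\S]*? matches across
// newlines, and the lazy quantifier stops at the first closing marker.
const FUNCTIONS_REGEX = /##FUNCTIONS_JSON##([\s\S]*?)##END_FUNCTIONS_JSON##/;

// Simplified sketch of the extraction step.
function extractFunctionCalls(response: string): unknown[] | null {
  const match = response.match(FUNCTIONS_REGEX);
  if (!match) return null;            // the model made no function call
  return JSON.parse(match[1].trim()); // the JSON array of function calls
}
```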
This regex captures everything between ##FUNCTIONS_JSON## and ##END_FUNCTIONS_JSON##, allowing us to isolate the JSON containing the function calls.
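For example, given a response that follows the template (the get_weather function here is purely illustrative):

```typescript
const response = `
Let me check that for you.
##FUNCTIONS_JSON##
[{"name": "get_weather", "arguments": {"city": "Paris"}}]
##END_FUNCTIONS_JSON##
`;

console.log(extractFunctionCalls(response));
// -> [ { name: 'get_weather', arguments: { city: 'Paris' } } ]
```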
After extensive testing of this system prompt add-on, we're happy to share the results.
Implementing a universal function call adapter has significantly improved the stability and compatibility of our LLM orchestration at 2501.ai. By adhering to JSON standards and integrating directly with the system prompt, we developed a solution that’s both efficient and widely compatible.
This approach not only solves our immediate challenges, but also sets a foundation for future developments. We hope that sharing our experience can help others facing similar issues in the LLM space.
At 2501.ai, we’re committed to continuous innovation and collaboration. If you’re interested in learning more about our work or have any questions, feel free to reach out. Together, we can push the boundaries of what’s possible with AI.