
With AI models getting larger and larger, the strain on already stretched networks could become catastrophic in the coming years. One emerging startup is looking to help developers scale down model sizes while maintaining performance.
San Jose, California-based Clika recently secured seed funding from investors, including Accenture’s venture capital arm and IQT, the not-for-profit strategic investor for the U.S. national security community. Clika will use the funds to fuel its development of tools that reduce AI model sizes.
Clika has built a platform that automatically compresses and compiles AI models, shrinking them while maintaining their core capabilities. This allows the models to run in edge and on-premises environments without sacrificing performance.
“We built Clika to solve the last-mile problem of AI, getting models out of the lab and into production, quickly and securely,” explained Clika co-founder and CEO Nayul Kim.
Kim told SDxCentral that, using its SDK on an enthusiast-class graphics card such as Nvidia’s RTX 3090, Clika has compressed a model like Meta’s Llama 3.1 8B by approximately 60% in around 20 minutes.
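Clika hasn’t published the internals of its SDK, but reductions of that scale are consistent with post-training quantization, which stores each weight in fewer bits. As a rough sketch only (this is not Clika’s tooling, and the model ID is the standard Hugging Face one), here is how a developer might load Llama 3.1 8B with 8-bit weights via the widely used transformers and bitsandbytes libraries, roughly halving the 16-bit footprint; 4-bit loading cuts it by about 75%, bracketing the ~60% figure:

```python
# Illustrative sketch only: NOT Clika's SDK. Shows a common post-training
# quantization recipe using Hugging Face transformers + bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "meta-llama/Llama-3.1-8B"  # gated model; requires access approval

# Store each weight in 8 bits instead of 16, roughly halving memory use;
# BitsAndBytesConfig(load_in_4bit=True) would cut it by about 75%.
quant_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # fits on a single 24 GB card such as an RTX 3090
)

print(f"Approx. memory footprint: {model.get_memory_footprint() / 1e9:.1f} GB")
```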
Kim added that the toolkit acts as a unified platform where users can compress a model once and have it ready for deployment across hardware from multiple vendors, including Nvidia, Intel, AMD and Qualcomm.
“Before, this type of work process was unimaginable because of the hurdles and bottlenecks that engineering teams had to bear, like understanding target hardware limitations and what types of inference engines are running on the device,” Kim said, adding that with the compression SDK, Clika users can compress a model, compile it for the target hardware, then leave the tool to run and return once it’s complete.
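Whether Clika’s SDK works this way is an assumption, but a common route to “compress once, deploy on many vendors’ hardware” is exporting to a vendor-neutral format such as ONNX, which inference engines from Nvidia (TensorRT), Intel (OpenVINO), AMD and Qualcomm can all consume. A minimal sketch with a toy PyTorch model:

```python
# Illustrative only: exporting to ONNX as one possible mechanism for
# vendor-neutral deployment. Not confirmed as Clika's approach.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)).eval()
example_input = torch.randn(1, 128)

torch.onnx.export(
    model,
    example_input,
    "model.onnx",  # a single artifact consumable by multiple inference engines
    input_names=["input"],
    output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}},  # allow variable batch size
)
```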
Small but mighty?
The ever-increasing demand for AI is already causing network operators headaches. Recent Ciena research found that just 16% of network operators said their optical networks were “very ready” for the influx of AI, while nearly one-third (29%) expect AI to contribute more than half of all long-haul traffic over the next three years.
Potentially compounding concerns is the rise of agentic AI – small-scale agents autonomously performing tasks – which could cause yet further strain on networks. A Cisco exec recently suggested to SDxCentral that AI agent deployments could soon add network bandwidth equivalent to “80 billion” users.
While flagship foundation models, such as Meta's Llama 3.1, one of the largest open-source AI models to date, continue to grow in scale, smaller-scale AI systems have recently gained momentum. The shift is driven by growing developer interest in models that can run on more common hardware, which is typically not the largest or most powerful available.
Google’s Gemma line of AI models weighs in at just a few billion parameters (the adjustable variables within an AI model that help it learn from input data) and, at the time of writing, has been downloaded more than 200 million times. Microsoft has its own line of small language models (SLMs), including its Phi series. And to counterbalance its mammoth model, Meta published a series of small-scale Llama models last September, with the smallest capable of fitting onto mobile and edge devices.
Earlier this week, Google unveiled a version of its Gemma 3 family of AI models that stands at just 270 million parameters. But a small parameter count doesn’t guarantee a small file size. As Clika co-founder and CTO Ben Asaf notes, the assumption that a smaller model always demands less powerful hardware breaks down when a model like Meta’s Llama 3.1 8B carries a file size of around 16 gigabytes.
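The 16-gigabyte figure follows from simple arithmetic: on-disk size is roughly the parameter count multiplied by the bytes used to store each weight, so 8 billion parameters at 16-bit (2-byte) precision is about 16 GB. A quick sketch:

```python
def model_file_size_gb(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Rough on-disk size: parameter count x storage width per weight."""
    return num_params * bytes_per_param / 1e9

print(model_file_size_gb(8e9))       # Llama 3.1 8B at 16-bit -> ~16 GB
print(model_file_size_gb(270e6))     # Gemma 3 270M at 16-bit -> ~0.54 GB
print(model_file_size_gb(8e9, 0.5))  # same 8B model at 4-bit  -> ~4 GB
```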
“[Google Gemma 3 270M] is not an impressive model, but it does understand English, which is pretty damn impressive for a model of that size,” Asaf said, adding that despite its language skills, its limited size means it would struggle with some tasks.
The team at Clika sees one to three billion parameters as the sweet spot for SLMs, with its technology able to compress systems of that size to power applications ranging from physical AI, such as robotics, to eventual edge deployments.
Such smaller-scale AI systems, shrunk further by tools like those Clika is developing, could ease network operators' concerns about exploding demand.
“AI is still in the early adoption stage, meaning workloads will eventually bombard data centers and network systems," Kim said. "If we keep deploying larger models, we cannot sustain power consumption and communication levels. Some optimization needs to happen as it has a direct impact.”
Asaf explained to SDxCentral that the underlying architecture for some AI models has struggled with scalability challenges.
"The Transformer architecture, which is used pretty much everywhere now, was developed in 2017, but it took until 2021-2022 to start seeing it deployed widely," Asaf said. "Part of the problem was that it took hardware companies time to optimize the inference frameworks to a point where it's actually practical, not just theoretical.”
Asaf added that Clika's solution aims to "enable developers to try out ideas and test them on different hardware platforms.”
To support its vision of democratizing AI by reducing model sizes, Clika will join Accenture Ventures’ Project Spotlight accelerator, gaining access to the consulting giant’s domain expertise and its enterprise clients.
“The rapid growth of AI, and the complexity that comes with [it], demands faster, simpler, and smarter deployment,” Tom Lounibos, global lead for Accenture Ventures, explained. “With this investment, we are providing Clika with strategic market access to enterprise clients, teaming to deploy advanced models at the edge and put intelligence where it matters most.”
