CIO OPINION
Chief among these are large language models (LLMs), which require training, inferencing, fine-tuning and optimisation techniques to ground Generative AI. Some organisations struggle with AI skills gaps, making it difficult to use such technologies effectively.
To ease these burdens, some organisations choose to consume LLMs managed by public cloud providers, or to run LLMs of their choosing on those third-party platforms.
As attractive as the public cloud is for shrinking launch timelines, it also comes with trade-offs. Variable costs, higher latency, and data security and sovereignty concerns can make running AI workloads there unappealing or even untenable.
AI workloads also present more variables for IT decision makers to consider. As attractive as more choice is, it can also compound complexity.
Accordingly, running infrastructure, including compute, storage and GPUs, on-premises gives organisations control over every aspect of deployment. On-premises infrastructure offers particular value for large, predictable AI workloads, keeping processing close to where the data lives and helping organisations stay within budget.
Strong organisational controls are essential for safeguarding AI models, inputs, and outputs – which may include sensitive IP – from malicious actors and data leakage.
To comply with local regulations, some organisations must prioritise data security and data sovereignty mandates requiring that data remains in specific geographic locales. By running AI workloads where the data exists, organisations can remain compliant while also avoiding duplicate transfers between systems and locations.
To that end, many organisations today are customising open-source LLMs using retrieval-augmented generation (RAG). With RAG, organisations can tailor chatbot responses to specific use cases by grounding them in their own data.
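For readers unfamiliar with the mechanics, the sketch below illustrates the basic RAG flow: retrieve the most relevant organisation-owned documents, fold them into the prompt, and pass the grounded prompt to a locally hosted model. The document store, the keyword-overlap retriever and the `generate` callable are illustrative assumptions rather than any specific product; production deployments typically use embedding models and a vector database instead.

```python
# Minimal sketch of the RAG pattern, assuming a generic locally hosted LLM.
# The retriever is a naive keyword-overlap scorer for illustration only.

from collections import Counter

# Hypothetical organisation-specific documents kept on-premises.
DOCUMENTS = [
    "Expense claims above 500 USD require director approval.",
    "Remote employees must connect through the corporate VPN.",
    "Customer data may not leave the EU region per policy DS-12.",
]

def retrieve(query: str, docs: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k documents sharing the most words with the query."""
    q_words = Counter(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: sum((Counter(d.lower().split()) & q_words).values()),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(query: str, context: list[str]) -> str:
    """Ground the model's answer in retrieved, organisation-owned context."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{joined}\n\nQuestion: {query}\nAnswer:"
    )

def answer(query: str, generate) -> str:
    """`generate` stands in for any on-premises or edge-hosted LLM call."""
    context = retrieve(query, DOCUMENTS)
    return generate(build_prompt(query, context))

if __name__ == "__main__":
    # Stand-in for a local model; echoes the grounded prompt for illustration.
    print(answer("Can customer data be stored outside the EU?", lambda p: p))
```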
Moreover, as LLMs continue to downsize while maintaining high performance and reliability, more models are running on devices such as AI PCs and workstations, on-premises and at the edge.
These factors underscore why 73% of organisations prefer to self-deploy LLMs on infrastructure running in data centres, on devices and at edge locations, according to Enterprise Strategy Group.
Empirical data comparing the value of on-premises deployments to the public cloud are scarce. However, ESG's recent study compared the expected costs of delivering inferencing for a text-based chatbot fuelled by a 70B-parameter open-source LLM using RAG on-premises with those of a comparable public cloud solution from Amazon Web Services.
The analysis, which estimated the cost of infrastructure and system administration to support between 5,000 and 50,000 users over a four-year period, found that running the workload on-premises was as much as 62% more cost-effective for inferencing than the public cloud.
Moreover, the same on-premises implementation was as much as 75% more cost-effective than running