Enabling Enterprise AI in a Multicloud World: The Infrastructure Imperative

By Lee Larter, Pre-sales Director, Dell Technologies.

Artificial intelligence (AI) is changing how businesses operate, from enabling real-time fraud detection in financial services to driving predictive maintenance in manufacturing. The UK AI market alone is worth more than £21 billion, according to the US International Trade Administration, and is expected to grow to £1 trillion by 2035. But for UK enterprises to fully harness AI’s potential, they need more than sophisticated algorithms and talented data scientists. They need resilient, adaptable infrastructure that responds quickly and copes well with changing demands.

As each technological breakthrough reshapes the business landscape, we move further away from the days when IT operated solely within the confines of its own data centres. Now, in almost every sector, businesses blend multiple platforms, cloud providers and environments to sharpen efficiency, boost resilience and keep pace with diverse and compute-heavy workloads. As such, multicloud strategies have evolved beyond simply saving money or avoiding vendor lock-in. Multicloud is now about creating an infrastructure that can support ever more ambitious AI initiatives. 

According to industry research, 75% of organisations see AI as central to their business strategy. Yet the technological environment in which they operate is highly complex and presents significant development and management challenges. This may explain why only a few early adopters have broken out of the test-and-learn phase to achieve full business impact.

For businesses looking to scale AI, each infrastructure approach and management method requires careful consideration. To succeed, innovators need infrastructure that offers flexibility and precision, as well as immense performance. At Dell, we recommend focusing on four key pillars during development and deployment: Computing Power, Data Management, Storage and Operational Efficiency. These elements form the backbone of scalable AI operations in a multicloud environment.

Computing Power: Scaling and Networking for AI Workloads

Performance is a necessity when it comes to enterprise AI. Training large models, combing through vast datasets, generating real-time insights and remaining agile in the face of unpredictable demand all require the appropriate accelerated compute applied to enterprise data sets. This level of power isn’t achieved simply by stacking high-performance GPUs; it comes from smart, strategic choices across the infrastructure. Compute hardware decisions must be intentional: AI-specific hardware such as GPUs, NPUs and dedicated accelerators has become a staple for enterprises pushing the limits of what AI can do. To complement these hardware decisions, AI applications require fast, uninterrupted data transfer: high-bandwidth, low-latency connections between cloud environments, as well as intra-rack connectivity, to ensure smooth AI operations. Businesses should use software-defined networking (SDN) and network optimisation tools for seamless connectivity.

Data Management: Ensuring Seamless AI Data Flow

Once a robust hardware foundation has been layered with a high-speed network, there needs to be an adaptable system for data management that powers each AI model. AI thrives on high-quality, accessible data, but managing data across multiple clouds can be intricate, technical and risky. For UK businesses especially, effective data management is of even greater importance not just for operational success but for maintaining trust and compliance in a heavily regulated environment.

An AI data platform is distinguished by its ability to effectively manage data through strategic placement, federation, efficient processing, and secure protection:

Data Placement: This involves the real-time ingestion of large volumes of data from various sources, utilising scalable file, structured and object storage solutions to support high performance, especially for GPU-intensive workloads.

Data Processing: A platform approach enhances data discoverability through data curation and enriched metadata. It supports the classification and tagging of data, the creation of product sets, and the indexing of unstructured data, enabling seamless integration and faster, more efficient data retrieval for line-of-business users.

Data Protection: Organisations must ensure comprehensive data protection with features like access control, data masking, threat detection, and encryption. A coordinated approach safeguards data by preventing unauthorised access and ensuring compliance with regulatory requirements.
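To make the data protection pillar concrete, the sketch below shows field-level masking applied before records reach an AI pipeline. It is a minimal illustration: the field names, masking rules and record shape are assumptions for the example, not a specific product's API.

```python
import hashlib
import re

# Illustrative masking rules; field names and formats are hypothetical.
MASKING_RULES = {
    # Keep first character and domain, hide the rest: jane.doe@x.com -> j***@x.com
    "email": lambda v: re.sub(r"(^.).*(@.*$)", r"\1***\2", v),
    # Replace the identifier with a short, deterministic hash (pseudonymisation)
    "customer_id": lambda v: hashlib.sha256(v.encode()).hexdigest()[:12],
}

def mask_record(record: dict, sensitive_fields=("email", "customer_id")) -> dict:
    """Return a copy of the record with sensitive fields masked."""
    masked = dict(record)
    for field in sensitive_fields:
        if field in masked and field in MASKING_RULES:
            masked[field] = MASKING_RULES[field](str(masked[field]))
    return masked

record = {"customer_id": "C-1001", "email": "jane.doe@example.com", "spend": 420.0}
print(mask_record(record))  # email becomes j***@example.com; ID is pseudonymised
```

Because the hash is deterministic, the same customer maps to the same pseudonym across datasets, which preserves joinability while preventing casual re-identification.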

An AI data platform architecture should be open, flexible, and secure to avoid vendor lock-in and support an extensive ecosystem of tools and standards, making it adaptable to the evolving needs of AI and data teams. This ensures that UK enterprises maintain compliance with regulations such as GDPR and CCPA, while addressing concerns like data bias and privacy in AI models.

Storage: The Backbone of AI Scalability

Now that a powerful base is processing high-quality, well-managed data, this information needs to be stored in a way that avoids additional cost and performance bottlenecks. Businesses must store data securely, while also enabling easy access. Crucially, though, easy to access does not necessarily mean instantly available. Consider a tiered architecture: for datasets that do require instant access, choose high-speed storage systems such as flash, while archival data can be stored in more cost-effective options. This ensures optimal performance without unnecessary expense.
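A tiering policy like the one just described can be expressed as a simple placement rule. The sketch below is illustrative only: the tier names, thresholds and dataset attributes are assumptions for the example, not any vendor's defaults.

```python
from dataclasses import dataclass

@dataclass
class Dataset:
    name: str
    accesses_per_day: float      # observed access frequency
    required_latency_ms: float   # latency the consuming workload can tolerate

def choose_tier(ds: Dataset) -> str:
    """Map a dataset to a storage tier based on access pattern and latency need.
    Thresholds are illustrative and would be tuned per environment."""
    if ds.required_latency_ms < 5 or ds.accesses_per_day > 1000:
        return "nvme-flash"      # hot: active training data, feature stores
    if ds.accesses_per_day > 10:
        return "hybrid"          # warm: recent inference logs, staging data
    return "object-archive"     # cold: compliance archives, raw history

print(choose_tier(Dataset("training-set", 5000, 2)))   # → nvme-flash
print(choose_tier(Dataset("audit-logs", 0.1, 500)))    # → object-archive
```

The value of codifying the rule is that placement decisions become auditable and repeatable as data volumes grow, rather than ad hoc per team.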

Similarly, distributed storage systems and object storage are ideal for managing the unstructured data generated by AI applications. These solutions allow businesses to scale seamlessly as data volumes grow, and customers should consider a hybrid cloud object storage solution to balance cost and data gravity priorities aligned to their use cases.

Finally, on-demand storage solutions are increasingly popular for those looking for flexibility and minimum upfront capital investment. These models align well with the unpredictable growth of AI data.

Because AI models thrive on fresh, relevant data, automating processes such as archiving, deletion and migration ensures efficient storage use while maintaining compliance with data retention policies.

Driving Operational Efficiency and Sustainability

Finally, with an infrastructure powerful enough for advanced AI applications and a strong system for data management and storage, a business’s AI ambitions are primed to scale. But as AI adoption grows, so does its environmental impact. Addressing this is the fourth and final consideration.

Implementing energy-efficient hardware configurations, environmentally responsible cooling methods and drawing on management software tools can significantly reduce power usage and extend hardware lifespan. Power management tools that use telemetry provide valuable insights to optimise power and thermal management in real-time, while identifying potential hardware issues early.
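One way telemetry surfaces potential hardware issues is by flagging servers whose power draw deviates sharply from their peers. The sketch below is a minimal statistical check; the host names, wattages and threshold are invented for the example and do not come from any specific management tool.

```python
import statistics

def power_anomalies(readings_watts: dict, z_threshold: float = 1.5) -> dict:
    """Flag hosts whose power draw deviates from the fleet mean by more than
    z_threshold standard deviations. With small fleets, sample z-scores are
    bounded, so the threshold here is deliberately modest."""
    values = list(readings_watts.values())
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return {
        host: round((watts - mean) / stdev, 2)
        for host, watts in readings_watts.items()
        if stdev and abs(watts - mean) / stdev > z_threshold
    }

# Hypothetical telemetry snapshot: node 5 is drawing far more than its peers.
fleet = {"gpu-node-1": 710, "gpu-node-2": 705, "gpu-node-3": 715,
         "gpu-node-4": 708, "gpu-node-5": 1190}
print(power_anomalies(fleet))  # flags gpu-node-5 only
```

In practice the readings would stream from platform telemetry rather than a literal dictionary, and a flagged node would trigger thermal inspection or workload rebalancing.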

Scaling AI from concept to reality requires intentional action and a robust infrastructure designed to empower innovation. By focusing on the four foundational pillars of Computing Power, Data Management, Storage and Operational Efficiency, businesses can lay the groundwork for scalable, high-performing AI initiatives while navigating the realities of a complex multicloud environment. These aren’t abstract concepts but practical steps that will allow organisations to move from proof-of-concept projects to fully operational AI systems that deliver measurable value.
