MosaicML Trains Generative AI Models Faster with Oracle

MosaicML, a software development provider that offers infrastructure and tools for building large-scale machine learning models, selected Oracle Cloud Infrastructure (OCI) as its preferred cloud infrastructure to help enterprises extract more value from their data. With OCI’s high-performance AI infrastructure, MosaicML states that it has seen up to 50 percent faster performance and cost savings of up to 80 percent compared to other cloud providers.

“Hundreds of organizations rely on MosaicML’s platform to develop and train large, complex generative AI models. We provide the complex systems and hardware so our customers can focus on building and deploying their own high-performing custom models,” said Naveen Rao, CEO and co-founder, MosaicML. “We selected OCI as we believe it is the best foundation for MosaicML. When training models with massive troves of data in the cloud, every minute counts – and with OCI, we pay less than with other cloud providers and can scale almost linearly because of the way Oracle configured its interconnects.”

MosaicML’s model training capabilities help organizations make training and inference of AI models more efficient and accessible. To scale its business and meet growing demand for AI services, MosaicML selected OCI. With OCI, MosaicML has gained access to the latest NVIDIA GPUs, a very high-bandwidth interconnect between nodes, and large compute block sizes for scaling to thousands of GPUs. This has enabled MosaicML to help enterprises and startups, including Twelve Labs, operationalize AI models.

Twelve Labs is an AI startup building foundation models for multimodal video understanding. By running MosaicML’s platform on OCI’s AI infrastructure, Twelve Labs has been able to efficiently scale and deploy its AI models, helping users search, classify, and make better use of their video data across a range of applications.

“The combination of MosaicML and Oracle have given us the perfect collaboration to help us handle large capacities at high speeds and to keep up with our growth long-term,” said Jae Lee, founder and CEO, Twelve Labs. “MosaicML enables us to efficiently manage our large AI clusters, while OCI’s AI infrastructure ensures we don’t have to compromise on speed, which has saved us thousands of hours and tens of thousands of dollars in efficiencies.”

OCI offers several capabilities for AI, including AI infrastructure. OCI Compute virtual machines and bare metal GPU instances can power applications for computer vision, natural language processing, recommendation systems, and more. For training large, complex models at scale, such as large language models (LLMs), OCI Supercluster provides ultra-low latency cluster networking, HPC storage, and OCI Compute bare metal instances powered by NVIDIA GPUs. OCI Compute instances are connected by a high-performance Ethernet network using RoCE v2 (RDMA over Converged Ethernet v2). The bandwidth available to A100 GPUs on OCI exceeds that of both AWS and GCP by 4x to 16x, which in turn reduces the time and cost of machine learning training.
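To illustrate why network bandwidth matters for distributed training, here is a minimal sketch of the standard bandwidth-only cost model for a ring all-reduce, the collective commonly used to synchronize gradients across workers. Everything in it is an assumption made for illustration: the function name, the 16-bit gradient size, the worker count, and the baseline bandwidth are hypothetical and are not measurements from OCI or any other provider. The model also ignores latency, overlap of communication with compute, and gradient compression.

```python
def ring_allreduce_seconds(payload_bytes: float, num_workers: int,
                           bandwidth_bytes_per_s: float) -> float:
    """Bandwidth-only estimate of one ring all-reduce (latency ignored).

    Each worker sends and receives 2 * (N - 1) / N of the payload.
    """
    if num_workers < 2:
        return 0.0  # nothing to synchronize with a single worker
    volume = 2 * (num_workers - 1) / num_workers * payload_bytes
    return volume / bandwidth_bytes_per_s


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the trend.
    params = 10e9            # a 10 billion-parameter model
    grad_bytes = params * 2  # assumes 2 bytes per gradient value (16-bit)
    workers = 64             # hypothetical number of workers
    base_bw = 12.5e9         # hypothetical baseline: 12.5 GB/s per worker

    for multiplier in (1, 4, 16):
        t = ring_allreduce_seconds(grad_bytes, workers, base_bw * multiplier)
        print(f"{multiplier:>2}x bandwidth: {t:.3f} s per gradient sync")
```

Under this simplified model, synchronization time is inversely proportional to bandwidth, so a 4x or 16x bandwidth increase cuts the communication portion of each training step by the same factor. Real end-to-end speedups depend on how much of each step is spent communicating.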

“We are seeing an influx of AI companies come to OCI to run generative AI models, because we can run them faster and more economically than other cloud providers. It is not uncommon to train a 10 billion-parameter model within a few hours on OCI versus a few days on other platforms,” said Greg Pavlik, senior vice president, Oracle. “OCI’s architecture and non-blocking, low latency network design is fundamentally different than anything on the market.”

MosaicML selected Oracle in Q3 FY2022.

About Shakthi

I am a tech blogger, disability activist, keynote speaker, startup mentor, and digital branding consultant, as well as a McKinsey Executive Panel member. I am also known as @v_shakthi on Twitter and have been around tech for two decades.
