Anyscale teams with Nvidia to boost LLM efficiency

Large language models used for generative AI tools often greatly increase demand for processors, which are usually expensive and supply-constrained. Even cloud resources don’t always solve the problem for businesses trying to scale up and take advantage of the latest generative AI advances.

“Sooner or later, scaling of GPU chips will fail to keep up with increases in model size,” said Avivah Litan, a vice president and distinguished analyst with Gartner Research, speaking originally to Computerworld.

"So, continuing to make models bigger and bigger is not a viable option."

The biggest GPU maker on the market, Nvidia, clearly sees the value of open source software in improving AI development and efficiency. On Monday, Anyscale announced it is bringing Nvidia AI software to the open source Ray framework and the Anyscale Platform. Nvidia AI will also run in Anyscale Endpoints, a service that helps application developers embed LLMs in their apps using popular open source models such as Code Llama, Falcon, Llama 2, SDXL and more.
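Services like Anyscale Endpoints typically expose hosted models through a chat-completion API. As an illustration only (the model name and parameters below are hypothetical placeholders, not Anyscale's documented schema), a request payload in the widely used OpenAI-compatible style can be assembled like this:

```python
import json

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completion payload.

    The model identifier passed in is illustrative, not a documented value.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("llama-2-7b-chat", "Summarize Ray in one sentence.")
print(json.dumps(payload))
```

The payload would then be POSTed to the service's HTTP endpoint with an API key; the appeal of the OpenAI-compatible convention is that swapping between hosted open source models usually means changing only the model string.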

Nvidia TensorRT-LLM, recently announced, will support Anyscale as well as the Nvidia AI Enterprise software platform. It can be used to automatically scale inference to run models in parallel over multiple GPUs, providing up to 8x higher performance when running on Nvidia H100 Tensor Core GPUs, Nvidia said in a blog post.
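Running a model "in parallel over multiple GPUs" generally means sharding: each GPU holds a slice of a layer's weights and computes its slice of the output independently. A toy sketch of the idea, with plain Python lists standing in for GPU shards (no Nvidia API is shown here):

```python
def matvec(rows, x):
    """Plain matrix-vector product: one output element per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in rows]

def parallel_matvec(weight_rows, x, num_devices):
    """Split the weight rows into contiguous chunks, one per 'device'.

    Each shard's partial result could be computed on a separate GPU;
    here the shards run serially, then are concatenated back together.
    """
    chunk = (len(weight_rows) + num_devices - 1) // num_devices
    shards = [weight_rows[i:i + chunk] for i in range(0, len(weight_rows), chunk)]
    partials = [matvec(shard, x) for shard in shards]
    return [y for part in partials for y in part]

W = [[1, 0], [0, 1], [2, 2], [3, 1]]
x = [4, 5]
# Sharded and unsharded results agree.
assert parallel_matvec(W, x, num_devices=2) == matvec(W, x)
```

Real tensor-parallel runtimes such as TensorRT-LLM do this at the level of attention and feed-forward blocks, with high-speed interconnects handling the gather step that the list concatenation stands in for above.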

RELATED: Nvidia offers new software to tame LLMs, improve AI inference

Also, Nvidia Triton Inference Server software supports inference across cloud, data center, edge and embedded devices on GPUs, CPUs and other processors, Nvidia said. When integrated with Ray, developers can boost efficiency for AI models from various frameworks including TensorRT, TensorFlow, PyTorch, ONNX, OpenVINO, Python, RAPIDS XGBoost and others. And Nvidia NeMo, a cloud-native framework, can be used by Ray developers to customize LLMs.
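Ray's role in this pairing is scheduling: fanning inference requests out across a pool of workers, each of which fronts a model server. The shape of that pattern can be sketched with Python's standard `concurrent.futures` standing in for Ray tasks (this is not Ray's actual API, and `fake_infer` is a placeholder for a real call to a Triton or TensorRT-LLM backend):

```python
from concurrent.futures import ThreadPoolExecutor

def fake_infer(prompt: str) -> str:
    """Stand-in for a model's forward pass; a real deployment would
    send the prompt to a Triton or TensorRT-LLM serving backend."""
    return prompt.upper()

prompts = ["hello", "ray", "triton"]

# Fan requests out across a worker pool -- the same shape Ray uses,
# except Ray schedules tasks across a whole cluster, not one process.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(fake_infer, prompts))

print(results)  # ['HELLO', 'RAY', 'TRITON']
```

The point of the sketch is the separation of concerns: the scheduler (Ray, here a thread pool) knows nothing about the model, and the inference function knows nothing about how requests are distributed.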

“Our collaboration with Nvidia will bring even more performance and efficiency to Anyscale’s portfolio so that developers everywhere can create LLMs and generative AI applications with unprecedented speed and efficiency,” said Robert Nishihara, CEO and co-founder of Anyscale, in a statement.

Anyscale claims its Ray is the world’s fastest-growing open-source unified framework for scalable computing.