Hugging Face has unveiled its latest offering, Hugging Face Generative AI Services (HUGS), which aims to simplify the deployment and scaling of generative AI applications using open source models.
Based on Hugging Face technologies such as Transformers and Text Generation Inference (TGI), HUGS promises optimized performance on various hardware accelerators.
For developers using AWS or Google Cloud, the service is available for $1 per hour per container, with a five-day free trial on AWS to help users get started.
AI Optimization with Zero Configuration Inference
HUGS offers developers a solution to run AI models on their own infrastructure without manual configuration. One of the main challenges when implementing large language models (LLMs) is optimizing them for specific hardware environments. Each accelerator, whether an NVIDIA GPU or an AMD GPU, requires tuning to extract maximum performance.
With HUGS, these optimizations are managed automatically, delivering high performance right out of the box. In addition to NVIDIA and AMD GPUs, the company promises that its support will soon extend to AWS Inferentia and Google TPU.
Hugging Face aims to facilitate the transition from black box APIs to open, self-hosted solutions with support for a wide range of models, including well-known LLMs such as Llama and Gemma, with plans to soon introduce multimodal models such as Idefics and Llava. . In the future, the company says it will include integrated models such as BGE and Jina., giving developers even more options to customize their AI applications.
This service uses standardized APIs compatible with OpenAI model interfaces, so developers can migrate their own code.
For startups in particular, HUGS provides the opportunity to create AI applications without incurring the high costs associated with proprietary platforms. The availability of one-click deployments in DigitalOcean makes it even easier for small teams to experiment with generative AI technologies.
Meanwhile, larger businesses can leverage HUGS to scale their applications without being locked into a single cloud provider or proprietary API. At DigitalOcean, HUGS is included at no additional charge beyond the standard cost of GPU Droplets. Hugging Face also offers custom deployment solutions for businesses through its Enterprise Hub.