AWS Batch

Introducing AWS Batch Support for Amazon SageMaker Training jobs

Picture this: your machine learning (ML) team has a promising model to train and experiments to run for their generative AI project, but they’re waiting for GPU availability. The ML scientists spend time monitoring instance availability, coordinating with teammates over shared resources, and managing infrastructure allocation. Simultaneously, your infrastructure administrators spend significant time trying to […]

How Rufus doubled their inference speed and handled Prime Day traffic with AWS AI chips and parallel decoding

Large language models (LLMs) have revolutionized the way we interact with technology, but their widespread adoption has been held back by high inference latency, limited throughput, and the high costs associated with text generation. These inefficiencies are particularly pronounced during high-demand events like Amazon Prime Day, where systems like Rufus—the Amazon AI-powered shopping assistant—must handle massive scale […]
