AWS Trainium

How Amazon scaled Rufus by building multi-node inference using AWS Trainium chips and vLLM

At Amazon, our team builds Rufus, a generative AI-powered shopping assistant that serves millions of customers. Deploying Rufus at that scale, however, introduces significant challenges that must be carefully navigated. Rufus is powered by a custom-built large language model (LLM). As the model’s complexity increased, we prioritized developing scalable multi-node inference capabilities that […]
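The post centers on serving a single model across multiple nodes. As a rough illustration of the idea, here is a minimal multi-node setup using vLLM’s generic distributed API; the model name, parallelism degrees, and Ray backend choice are placeholder assumptions, and the Trainium/Neuron integration the Rufus team built is not shown.

```python
# A minimal sketch of multi-node inference with vLLM's offline API.
# Everything here (model, parallel degrees) is illustrative, not
# Rufus's actual configuration.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # placeholder model
    tensor_parallel_size=8,   # shard each layer across 8 devices per node
    pipeline_parallel_size=2, # split the layer stack across 2 nodes
    distributed_executor_backend="ray",  # Ray coordinates workers across nodes
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Recommend a waterproof hiking boot."], params)
print(outputs[0].outputs[0].text)
```

Tensor parallelism keeps per-device memory within budget, while pipeline parallelism is what lets a model too large for one node span several; the post describes the Trainium-specific implementation of that split.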

Boost cold-start recommendations with vLLM on AWS Trainium

Cold start in recommendation systems goes beyond the familiar new-user and new-item problems: it’s the complete absence of personalized signals at launch. When someone first arrives, or when fresh content appears, there’s no behavioral history to tell the engine what they care about, so everyone ends up in broad, generic segments. That not only dampens […]
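One widely used mitigation, plausibly related to what the post explores with vLLM, is to have an LLM enrich a new item’s sparse metadata and embed the result, so the item can be retrieved before any behavioral data exists. A hedged sketch follows; the models and prompt are placeholders, not the post’s configuration.

```python
# A sketch of LLM-assisted cold start: enrich a brand-new item's
# metadata with generated text, then embed it so the item can sit
# in the same retrieval space as items with behavioral history.
# Both model choices below are illustrative assumptions.
from vllm import LLM, SamplingParams
from sentence_transformers import SentenceTransformer

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder
embedder = SentenceTransformer("all-MiniLM-L6-v2")   # placeholder

new_item = "Trail-running shoe, waterproof, released today."
prompt = (
    "Describe the likely audience and use cases for this product "
    f"in two sentences: {new_item}"
)
enriched = llm.generate([prompt], SamplingParams(max_tokens=96))
profile = new_item + " " + enriched[0].outputs[0].text

# The embedding stands in for missing behavioral signals: nearest
# neighbors in this space seed the item's first recommendations.
vector = embedder.encode(profile)
```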

Enabling customers to deliver production-ready AI agents at scale

AI agents will change how we all work and live. Our AWS CEO, Matt Garman, shared a vision of a technological shift as transformative as the advent of the internet. I’m energized by this vision because I’ve witnessed firsthand how these intelligent agent systems are already beginning to solve complex problems, automate workflows, and create […]

How Rufus doubled their inference speed and handled Prime Day traffic with AWS AI chips and parallel decoding

Large language models (LLMs) have revolutionized the way we interact with technology, but their widespread adoption has been hindered by high inference latency, limited throughput, and the high costs associated with text generation. These inefficiencies are particularly pronounced during high-demand events like Amazon Prime Day, when systems like Rufus, the Amazon AI-powered shopping assistant, must handle massive scale […]
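Parallel decoding drafts several tokens per step and has the model verify them in a single forward pass, so accepted drafts cut the number of sequential decode steps. As a loose sketch of this family of techniques, here is vLLM’s draft-free n-gram speculative decoding; Rufus’s implementation is custom, and the `speculative_config` keys shown are from recent vLLM releases and vary across versions.

```python
# A sketch of draft-free speculative ("parallel") decoding via
# vLLM's n-gram prompt lookup -- related in spirit to the parallel
# decoding the post describes, though Rufus's method is custom.
# Config keys follow recent vLLM releases; model is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    speculative_config={
        "method": "ngram",            # propose tokens by prompt lookup
        "num_speculative_tokens": 4,  # draft 4 tokens per step
        "prompt_lookup_max": 4,       # longest n-gram to match
    },
)

# The target model verifies all drafted tokens in one forward pass.
out = llm.generate(["Summarize the return policy:"],
                   SamplingParams(max_tokens=128))
print(out[0].outputs[0].text)
```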

Cost-effective AI image generation with PixArt-Σ inference on AWS Trainium and AWS Inferentia

PixArt-Sigma is a diffusion transformer model capable of image generation at 4K resolution. It shows significant improvements over previous-generation PixArt models such as PixArt-Alpha, and over other diffusion models, through dataset and architectural improvements. AWS Trainium and AWS Inferentia are purpose-built AI chips that accelerate machine learning (ML) workloads, making them ideal for […]
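For orientation, here is what plain PixArt-Sigma inference looks like with Hugging Face diffusers on a generic GPU; the Neuron compilation steps that target Trainium and Inferentia in the post are omitted, and the checkpoint and sampler settings below are illustrative.

```python
# A minimal PixArt-Sigma inference sketch with Hugging Face
# diffusers on a generic accelerator; the Trainium/Inferentia
# (Neuron) compilation path from the post is not shown here.
import torch
from diffusers import PixArtSigmaPipeline

pipe = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",  # public checkpoint
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="A watercolor lighthouse at dawn",
    num_inference_steps=20,  # illustrative sampler settings
    guidance_scale=4.5,
).images[0]
image.save("lighthouse.png")
```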
