Amazon Elastic Container Service

Build a scalable containerized web application on AWS using the MERN stack with Amazon Q Developer – Part 1

The MERN (MongoDB, Express, React, Node.js) stack is a popular JavaScript web development framework. The combination of technologies is well-suited for building scalable, modern web applications, especially those requiring real-time updates and dynamic user interfaces. Amazon Q Developer is a generative AI-powered assistant that improves developer efficiency across the different phases of the software development […]

How Amazon scaled Rufus by building multi-node inference using AWS Trainium chips and vLLM

At Amazon, our team builds Rufus, a generative AI-powered shopping assistant that serves millions of customers at immense scale. However, deploying Rufus at scale introduces significant challenges that must be carefully navigated. Rufus is powered by a custom-built large language model (LLM). As the model’s complexity increased, we prioritized developing scalable multi-node inference capabilities that […]

Accelerating AI innovation: Scale MCP servers for enterprise workloads with Amazon Bedrock

Generative AI has been moving at a rapid pace, with new tools, offerings, and models released frequently. According to Gartner, agentic AI is one of the top technology trends of 2025, and organizations are prototyping how to use agents in their enterprise environments. Agents depend on tools, and each tool might have its […]

How Rufus doubled their inference speed and handled Prime Day traffic with AWS AI chips and parallel decoding

Large language models (LLMs) have revolutionized the way we interact with technology, but their widespread adoption has been hindered by high inference latency, limited throughput, and the high costs associated with text generation. These inefficiencies are particularly pronounced during high-demand events like Amazon Prime Day, when systems like Rufus, the Amazon AI-powered shopping assistant, must handle massive scale […]
