Architecture

How Amazon scaled Rufus by building multi-node inference using AWS Trainium chips and vLLM

At Amazon, our team builds Rufus, a generative AI-powered shopping assistant that serves millions of customers at immense scale. However, deploying Rufus at scale introduces significant challenges that must be carefully navigated. Rufus is powered by a custom-built large language model (LLM). As the model’s complexity increased, we prioritized developing scalable multi-node inference capabilities that […]

Build a serverless audio summarization solution with Amazon Bedrock and Whisper

Recordings of business meetings, interviews, and customer interactions have become essential for preserving important information. However, transcribing and summarizing these recordings manually is often time-consuming and labor-intensive. With the progress in generative AI and automatic speech recognition (ASR), automated solutions have emerged to make this process faster and more efficient. Protecting personally identifiable information (PII) […]

Build a scalable AI assistant to help refugees using AWS

This post is co-written with Taras Tsarenko, Vitalil Bozadzhy, and Vladyslav Horbatenko. As organizations worldwide seek to use AI for social impact, the Danish humanitarian organization Bevar Ukraine has developed Victor, a generative AI-powered virtual assistant aimed at addressing the pressing needs of Ukrainian refugees integrating into Danish society. This post details our […]
