Amazon announced a new series of multimodal generative artificial intelligence (AI) models, named "Nova," on Tuesday. The models will be offered through the Amazon Bedrock model library on Amazon Web Services (AWS).
The company introduced four text-generating models: Micro, Lite, Pro, and Premier. Micro, Lite, and Pro are currently available only to AWS customers, while Premier is slated for release in early 2025.
In addition to the text-generating models, Amazon also unveiled two new generative models: Nova Canvas for image generation and Nova Reel for video generation.
Micro, Lite, Pro, and Premier
The text-generating Nova models are optimised for 15 languages and vary in size and capability. Micro, the smallest, handles text only and offers rapid processing with a 128,000-token context window.
Meanwhile, the Lite and Pro models can process text, images, and video. Pro is designed to offer a balance of precision, speed, and cost, making it suitable for a wide range of tasks.
Premier, on the other hand, is geared toward more complex tasks, including custom model creation. Amazon says the Nova models' context windows will expand to two million tokens in 2025.
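Because the models ship through Bedrock, AWS customers can reach them with the standard Bedrock runtime interface. The sketch below shows what a call to the smallest model might look like using boto3's Converse API; the model identifier is an assumption for illustration, as the announcement does not list exact IDs.

```python
import boto3

# Bedrock runtime client; the Nova models are served through Amazon Bedrock
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="amazon.nova-micro-v1:0",  # assumed ID, not confirmed in the announcement
    messages=[
        {"role": "user", "content": [{"text": "Summarise AWS re:Invent in one sentence."}]}
    ],
    inferenceConfig={"maxTokens": 256, "temperature": 0.7},
)

# The Converse API returns the reply as a list of content blocks
print(response["output"]["message"]["content"][0]["text"])
```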
Canvas and Reel
Canvas and Reel extend the Nova line into generative media, and AWS is positioning them as its most capable image and video models to date. Canvas allows users to create and edit images from text prompts; it can also extend existing images and insert user-specified objects or scenes, making it well suited to creative work.
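Canvas is also invoked through Bedrock. Below is a minimal text-to-image sketch, assuming the request schema follows Bedrock's existing image-generation models; the model ID and field names are illustrative assumptions.

```python
import base64
import json

import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Assumed model ID and request schema, modelled on Bedrock's image APIs
body = json.dumps({
    "taskType": "TEXT_IMAGE",
    "textToImageParams": {"text": "A lighthouse on a cliff at dawn, watercolour style"},
    "imageGenerationConfig": {"numberOfImages": 1, "width": 1024, "height": 1024},
})

response = client.invoke_model(modelId="amazon.nova-canvas-v1:0", body=body)
payload = json.loads(response["body"].read())

# Generated images come back base64-encoded
with open("lighthouse.png", "wb") as f:
    f.write(base64.b64decode(payload["images"][0]))
```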
Reel, meanwhile, generates videos of up to six seconds from text prompts. AWS has also hinted that a variant capable of producing two-minute videos is "coming soon."
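Since video generation takes longer than a single request-response cycle, Reel would plausibly run through Bedrock's asynchronous invocation API, which writes results to S3. The sketch below is a hedged illustration: the model ID, input schema, and bucket path are all assumptions.

```python
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# All identifiers below are assumptions for illustration
job = client.start_async_invoke(
    modelId="amazon.nova-reel-v1:0",
    modelInput={
        "taskType": "TEXT_VIDEO",
        "textToVideoParams": {"text": "Waves rolling onto a beach at sunset"},
        "videoGenerationConfig": {"durationSeconds": 6, "fps": 24, "dimension": "1280x720"},
    },
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": "s3://my-bucket/nova-reel-output/"}},
)

# Poll until the job finishes; the completed video lands under the S3 prefix above
status = client.get_async_invoke(invocationArn=job["invocationArn"])["status"]
print(status)
```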
Upcoming Nova models
In addition to the existing models, Amazon CEO Andy Jassy announced plans for a speech-to-speech model, which will accept speech input and respond with generated speech. It is expected to be released in Q1 2025.
Furthermore, an "any-to-any" model is due around mid-2025. It will allow users to input text, speech, images, or video and receive output in any of those formats. "This is the future of how frontier models are going to be built and consumed," Jassy stated.