Las Vegas
AWS re:Invent is just around the corner, and Amazon has already started to pre-announce some of the new services and features.
In keeping with the tradition of releasing predictions and wishlists prior to re:Invent, here is my list of the top 5 generative AI announcements to expect from this massive conference:
1. AWS users will get an official AI assistant for cloud operations
Microsoft has Copilot, and Google has Duet AI, but what about AWS? Amazon CodeWhisperer may be the answer, but it only caters to developers targeting code completion within IDEs. How about an AWS Ops Whisperer?
I expect AWS to unveil a comprehensive AI assistant strategy at re:Invent this year, allowing users to interact with the cloud platform through conversational AI and a chatbot interface. It may also reveal the platform and a set of tools required to develop custom AI assistants that can be integrated with external data and software.
Consider a chatbot in the AWS Console that accepts prompts like “launch an EC2 instance in Singapore optimized for running my NGINX web server based on the same configuration used with the Ireland region launched yesterday.” This is a highly contextual and effective method of implementing DevOps, CloudOps, and even FinOps on AWS.
Imagine performing a post-mortem and root cause analysis of an incident by simply asking the AI assistant the right questions, which it can answer by aggregating logs and metrics from CloudWatch and CloudTrail. Users can interact with the AWS AI assistant to find out which region consumed the most expensive service in the previous month. While these queries may look simple, the idea could change how operations teams work. AWS may even open up a marketplace for assistants that users can publish and even monetize.
The same concept can be easily extended to the AWS CLI and CloudFormation to make automation intelligent. The AWS CLI could automatically recommend additional parameters and configurations based on best practices for cost, security and performance optimization. You could interact with the AWS CLI through simple prompts, and an AI model running in the cloud would turn them into a complex AWS CLI command, CloudFormation template or CDK script, saving DevOps engineers hours of toil.
Eventually, AWS may build dedicated AI assistants for each job function, such as infrastructure provisioning, storage operations, database operations, security operations and finance operations. These assistants will have a deep understanding of customer environments and historical data, allowing them to recommend the optimal way of using AWS services.
The AWS AI assistant could be the ultimate solution to dealing with the most complex and ever-growing cloud services platform of our time.
2. A new category of managed database services based on vector databases
The success of LLM-based applications is dependent on vector databases. They provide LLMs with long-term memory by remembering the history of the conversations and also providing contextual inputs to avoid hallucinations.
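To make the idea of "long-term memory" concrete, here is a minimal sketch of what a vector store does: it stores text alongside a vector and returns the entries closest to a query. Everything in it is an illustrative assumption. The `embed` function is a toy bag-of-words stand-in for a real embedding model, and `ToyVectorStore` is a hypothetical in-memory class, not any AWS API.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory store; a real service would use an approximate index."""

    def __init__(self):
        self._items = []  # list of (text, vector) pairs

    def add(self, text: str) -> None:
        self._items.append((text, embed(text)))

    def search(self, query: str, k: int = 1) -> list:
        query_vector = embed(query)
        ranked = sorted(
            self._items,
            key=lambda item: cosine_similarity(query_vector, item[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore()
store.add("the user prefers the Singapore region for web servers")
store.add("last month's invoice was dominated by data transfer charges")

# The retrieved text is what would be passed to the LLM as context.
top = store.search("which region does the user prefer", k=1)
print(top)
```

A production vector database replaces the brute-force scan with an approximate nearest-neighbor index and the toy `embed` with a learned embedding model, but the store-then-retrieve contract is the same.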
AWS has already added vector support for PostgreSQL running on the Amazon RDS and Amazon Aurora platforms. It also introduced a vector engine for Amazon OpenSearch Serverless to index, search and retrieve embeddings.
While it makes sense to add vector capabilities to existing databases, customers require a dedicated and cost-effective vector database that serves as a single source of truth for storing and retrieving data. A centralized vector database is preferable to a co-located vector database attached to each source in a scenario where customers’ structured and unstructured data is distributed across object storage, NoSQL, relational databases and the data warehouse.
Amazon also has the opportunity to introduce additional features, including efficient similarity search algorithms, inbuilt text embedding models and provisioned throughput to add value to the vector database.
Finally, I expect AWS to add vector support to Amazon Neptune, the graph database that can bring knowledge graphs to search, making the context rich and relevant.
3. Serverless RAG pipelines connecting various AWS data services to LLMs
Amazon Bedrock has a feature called knowledge base that connects data sources with vector databases to help you build agents. However, the service, which is still in beta, is too basic and leaves a lot to the developer. It does not, for example, provide enough options for choosing different embedding models, vector databases, and LLMs.
One of the knowledge base service’s major drawbacks is its inability to keep the vector database in sync with the data source. When a PDF is deleted from an S3 bucket, it is unclear whether the associated vectors are also deleted. Furthermore, the user experience of creating a knowledge base leaves a lot to be desired. Configuring the required IAM role, data sources, and target LLMs is clumsy and time-consuming.
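The sync problem can be illustrated with a small reconciliation routine. This is only a sketch under stated assumptions, not how Bedrock works internally: `source_docs` stands in for the current contents of a bucket (document key mapped to text), `index` stands in for the vector database (document key mapped to a fingerprint of the version that was indexed), and `sync_index` is a hypothetical helper name.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Content hash used to detect that a document changed since it was indexed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def sync_index(source_docs: dict, index: dict) -> dict:
    """Reconcile `index` in place with `source_docs` and report what changed."""
    # Documents that vanished from the source must lose their vectors too.
    removed = [key for key in index if key not in source_docs]
    for key in removed:
        del index[key]  # real system: delete every vector tagged with this key

    # New or modified documents need to be (re-)embedded.
    upserted = []
    for key, text in source_docs.items():
        digest = fingerprint(text)
        if index.get(key) != digest:
            index[key] = digest  # real system: re-chunk, re-embed and upsert
            upserted.append(key)

    return {"removed": sorted(removed), "upserted": sorted(upserted)}


bucket = {"faq.pdf": "v2 of the FAQ", "pricing.pdf": "pricing"}
index = {
    "faq.pdf": fingerprint("v1 of the FAQ"),   # stale version
    "old.pdf": fingerprint("stale"),           # deleted from the bucket
    "pricing.pdf": fingerprint("pricing"),     # unchanged
}
report = sync_index(bucket, index)
print(report)
```

In practice the trigger would more likely be an event emitted when an object is created or deleted than a full scan, but the invariant is the same: every vector should trace back to a document that still exists in its current version.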
Amazon is likely to develop a serverless RAG pipeline that combines the best of Amazon Bedrock, AWS Glue, and AWS Step Functions. Customers should be able to begin with a blank canvas and then add one or more data sources, which the service monitors for changes in order to keep the vector database up to date. The same user interface should be used to select embedding models, vector databases, semantic search algorithms, prompt templates, and, finally, the target LLM. The retrieval-augmented generation (RAG) pipeline would be implemented behind the scenes using AWS IAM, AWS Lambda, Amazon Bedrock and other services. The service’s knowledge base and agent capabilities should be combined into a single, unified serverless infrastructure and developer experience.
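The shape of such a pipeline, with swappable retrieval, prompt template and model, might look roughly like the sketch below. All names here (`RagPipeline`, `keyword_retrieve`, `stub_llm`) are hypothetical, the retriever is a naive keyword matcher rather than a vector search, and the model is a stub so the example runs without any cloud service.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

DOCS = [
    "Knowledge bases connect data sources to a vector database.",
    "Step Functions orchestrates multi-step workflows.",
    "PartyRock is a no-code playground.",
]


def tokens(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_retrieve(question: str, k: int) -> List[str]:
    """Stand-in retriever: rank documents by words shared with the question."""
    ranked = sorted(DOCS, key=lambda d: len(tokens(d) & tokens(question)), reverse=True)
    return ranked[:k]


def stub_llm(prompt: str) -> str:
    """Stand-in for a model call: echoes the first retrieved passage."""
    return "stub answer from: " + prompt.splitlines()[1]


@dataclass
class RagPipeline:
    retrieve: Callable[[str, int], List[str]]  # swappable retrieval step
    generate: Callable[[str], str]             # swappable target LLM
    prompt_template: str = (
        "Answer using only this context:\n{context}\n\nQuestion: {question}"
    )
    top_k: int = 1

    def build_prompt(self, question: str) -> str:
        passages = self.retrieve(question, self.top_k)
        context = "\n".join(f"- {p}" for p in passages)
        return self.prompt_template.format(context=context, question=question)

    def answer(self, question: str) -> str:
        return self.generate(self.build_prompt(question))


pipeline = RagPipeline(retrieve=keyword_retrieve, generate=stub_llm)
prompt = pipeline.build_prompt("What orchestrates workflows?")
reply = pipeline.answer("What orchestrates workflows?")
print(reply)
```

Because each stage is just a callable, changing the embedding model, the vector database or the LLM means passing a different function, which is the kind of choice a managed pipeline could surface in its user interface.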
By connecting the agents to AWS Lambda and the API Gateway, this serverless platform can be extended to expose the agents as REST endpoints. Customers can gain insights into how the agents interact with the LLMs and the identity associated with each call by extending the observability to CloudWatch. This could be modeled after LangChain’s LangServe and LangSmith, which have similar capabilities.
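As a rough illustration of exposing an agent as a REST endpoint, the sketch below shows a Lambda handler written for API Gateway's Lambda proxy integration, where the request body arrives as a JSON string in `event["body"]` and the response is a dictionary with `statusCode`, `headers` and `body`. The `run_agent` function is a hypothetical placeholder for whatever actually invokes the agent, and the `prompt` field name is an assumption.

```python
import json


def run_agent(prompt: str) -> str:
    """Hypothetical placeholder; a real handler would invoke the agent here."""
    return f"echo: {prompt}"


def _response(status: int, payload: dict) -> dict:
    # Response shape expected by API Gateway's Lambda proxy integration.
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def lambda_handler(event, context):
    try:
        payload = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return _response(400, {"error": "request body must be valid JSON"})

    prompt = payload.get("prompt") if isinstance(payload, dict) else None
    if not isinstance(prompt, str) or not prompt.strip():
        return _response(400, {"error": "missing 'prompt' field"})

    answer = run_agent(prompt)
    # Anything printed by a Lambda function ends up in CloudWatch Logs,
    # so a structured line per call gives a basic audit trail.
    print(json.dumps({"prompt_chars": len(prompt), "answer_chars": len(answer)}))
    return _response(200, {"answer": answer})


# Local invocation with a hand-built event, no AWS account required.
ok = lambda_handler({"body": json.dumps({"prompt": "hello"})}, None)
bad = lambda_handler({"body": "not json"}, None)
```

The caller's identity is not handled in this sketch; it would come from whatever authorizer is configured on the API.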
This service can become the foundation for building an enterprise-grade no-code or low-code tool to build agents and AI assistants in the future.
4. A new large language model that improves on Titan
Although Amazon has Titan, its own LLM, available as a Bedrock foundation model, its performance is not comparable to proven models such as GPT-4.
Instead of Titan, the majority of the AWS generative AI documentation, tutorials, and reference architectures use Claude 2 from Anthropic. Even the most recent no-code fun tool, PartyRock, is based on Claude (Anthropic), Jurassic (AI21) and Command (Cohere) LLMs rather than Titan, indicating the level of confidence in using the homegrown model in production.
There are rumors that Amazon is working on a better LLM codenamed “Olympus”, which has 2 trillion parameters. The group in charge of shipping Amazon’s own foundation models is under the leadership of Rohit Prasad, a former Alexa executive who now reports directly to CEO Andy Jassy. In his capacity as Amazon’s lead scientist for artificial general intelligence (AGI), Prasad brought in scientists from the Amazon Science team and those working on Alexa AI to focus on model training, providing specialized resources to tie together the company’s AI initiatives.
The new LLM may include a text embedding model that provides efficient techniques to vectorize text while preserving the context and semantic meaning.
The rumored LLM could become the de facto standard for all LLM-powered applications developed by Amazon and its ecosystem, such as agents and chatbots.
5. Multimodal AI in Amazon Bedrock
Finally, Amazon is expected to enhance Bedrock’s multimodal capabilities through the integration of LLMs and diffusion models. It may either develop its own models or bring in existing multimodal models, such as the Large Language and Vision Assistant, or LLaVA. This model would be similar to OpenAI’s GPT-4V, which accepts an image along with a textual prompt to respond to a user query.
Amazon may develop tools and wrappers to help with multimodal AI prompt engineering, which may become a significant feature of Amazon Bedrock.
I intend to publish a detailed analysis of the AWS re:Invent 2023 news and announcements. Stay tuned.