Hey there! In this article, we're going to talk about using AWS Bedrock with Rust. By the end, you'll have an API that takes a JSON prompt from an HTTP request and returns an answer from AWS Bedrock, either streamed or as a full response.
Interested in the full code? You can find the repository here.
What is AWS Bedrock?
AWS Bedrock is one of the AI-based services that Amazon offers. It allows you to use models directly for inference and generative AI.
Compared to other AWS offerings like SageMaker, you only pay per API call. This makes it much cheaper to use in a real application than SageMaker, which charges based on instance uptime and can snowball costs very quickly. Bedrock also comes with tools like guardrails, which let you customise topic and word filters (to mitigate model abuse), and lets you add your own training data.
Getting Started
Setting up your foundation model
Before we get started, you'll need to set up access to the foundation model you want to use, as well as an IAM user. We'll use the Titan Text G1 Express model as an example.
To request access to a model, do the following:
- Log into AWS Console and go to the AWS Bedrock section (it can also be found using the search bar)
- Click "Model Access" on the left hand side - it's somewhere near the bottom.
- Click "Manage Model access" (top right hand side of the table).
- Find the model(s) you want access to, click the appropriate checkbox then click Request Access.
Note that some models require you to elaborate on your use case before AWS will approve access; the Titan Text models are generally approved immediately. Once done, you'll need to find (and save) the model ID that you're using!
You can find the endpoint URL you need here, as well as the model ID here.
Setting up an IAM user
You'll also need an access key ID and secret access key from an IAM user:
- `AWS_ACCESS_KEY_ID` (your Access Key)
- `AWS_SECRET_ACCESS_KEY` (your Secret Access Key)
Both can be found in your IAM user credentials if you've already set a user up. If you don't have an appropriate user with policies, you can get started quickly by doing the following:
- Go to the Users menu
- Start creating a user and go to the "Attach policies" section (then search "Bedrock")
- Here you can either use the "AmazonBedrockFullAccess" policy, which gives that user full access to Bedrock, or create a custom policy. Select one and finish creating your user - without a Bedrock policy attached, the user won't be able to use the service!
- Go back to the Users menu and click on your newly created user
- Go to "Access keys" and follow the prompt (clicking "Application outside of AWS"). Don't forget to store your Access Key ID and Secret Access Key!
In production, you may want to go a step further and create a Group that you can then attach policies and users to.
Initialisation
To get started, we're going to use `shuttle init` (requires `cargo-shuttle` to be installed) to initialise our template, picking Axum as the framework.
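If you don't have the CLI yet, the whole setup is two commands (select Axum when the framework prompt appears):

```bash
cargo install cargo-shuttle
shuttle init
```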
Next, we're going to add our dependencies:
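The exact versions will depend on when you read this, but on top of the template's Axum and Shuttle dependencies, a plausible set of additions looks like this:

```bash
cargo add aws-config aws-sdk-bedrockruntime serde_json futures
cargo add serde --features derive
```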
You'll also want to store your secrets in a `Secrets.toml` file in the root of your project, like so:
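The key names are up to you; this article assumes the two AWS variables from the IAM section (values shown are placeholders):

```toml
# Secrets.toml
AWS_ACCESS_KEY_ID = "your-access-key-id"
AWS_SECRET_ACCESS_KEY = "your-secret-access-key"
```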
Setting up the AWS config
To get started, we'll create a function that takes secrets from the `Secrets.toml` file we created earlier.
This can be done by adding the `#[shuttle_runtime::Secrets]` macro to our main function:
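A sketch of the signature (we'll fill in the body as we go):

```rust
#[shuttle_runtime::main]
async fn main(
    #[shuttle_runtime::Secrets] secrets: shuttle_runtime::SecretStore,
) -> shuttle_axum::ShuttleAxum {
    // we'll wire up the AWS config and router in the next steps
    todo!()
}
```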
Whether running locally or deployed, the secrets macro allows the Shuttle runtime to read the `Secrets.toml` file!
Next we'll grab our secrets, create an AWS `Credentials` struct, and then build an `aws_config` config. This will allow us to create the AWS Bedrock Runtime client, as well as any other client from the AWS Rust SDK that we need.
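A sketch of that function, assuming the secret names from `Secrets.toml` above (the helper name `create_aws_config` and the hard-coded region are illustrative):

```rust
use aws_config::{BehaviorVersion, Region, SdkConfig};
use aws_sdk_bedrockruntime::config::Credentials;
use shuttle_runtime::SecretStore;

async fn create_aws_config(secrets: &SecretStore) -> SdkConfig {
    let access_key_id = secrets
        .get("AWS_ACCESS_KEY_ID")
        .expect("AWS_ACCESS_KEY_ID not set in Secrets.toml");
    let secret_access_key = secrets
        .get("AWS_SECRET_ACCESS_KEY")
        .expect("AWS_SECRET_ACCESS_KEY not set in Secrets.toml");

    // Static credentials from our secrets (no session token or expiry)
    let credentials =
        Credentials::new(access_key_id, secret_access_key, None, None, "shuttle-secrets");

    aws_config::defaults(BehaviorVersion::latest())
        .region(Region::new("us-east-1")) // pick the region that hosts your model
        .credentials_provider(credentials)
        .load()
        .await
}
```

From the returned config, `aws_sdk_bedrockruntime::Client::new(&config)` gives us the Bedrock Runtime client.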
Onto using the runtime itself!
Using the Bedrock runtime
Pre-requisites
Before we start using the Bedrock runtime, you'll need:
- a model ID (for a model that you have access to)
- the identifier of the guardrail you want to use (if you're using one)
- the content type (JSON by default)
You can check out the model IDs here.
Getting a Prompt Response
For this example, we'll be using the Titan Text G1 Express model. Although some models may differ in the shape of their request and/or response bodies, the process is largely the same.
Before we can write our endpoint, we'll need to define a few structs:
- A JSON input (that contains the prompt)
- A struct that models the HTTP response from Bedrock for our model
- A struct that models the HTTP request to Bedrock for our model
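For Titan Text, those structs might look like this (the Bedrock-facing field names follow Titan's camelCase JSON; verify them against the docs for your chosen model):

```rust
use serde::{Deserialize, Serialize};

// The JSON body our own endpoint accepts
#[derive(Deserialize)]
pub struct Prompt {
    pub prompt: String,
}

// The request body Bedrock expects for Titan Text models
#[derive(Serialize)]
#[serde(rename_all = "camelCase")]
pub struct TitanRequest {
    pub input_text: String,
}

// The response body Bedrock returns for Titan Text models
#[derive(Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct TitanResponse {
    pub results: Vec<TitanResult>,
}

#[derive(Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct TitanResult {
    pub output_text: String,
}
```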
Now onto making our prompt handler! We'll set up a function like so (note that we use destructuring to get access to the inner variable from our JSON prompt struct):
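Assuming a `Clone`-able `AppState` that holds the Bedrock client (we'll define it in the wrap-up section), the handler might start like this:

```rust
use axum::{extract::State, response::IntoResponse, Json};

async fn prompt(
    State(state): State<AppState>,
    Json(Prompt { prompt }): Json<Prompt>,
) -> impl IntoResponse {
    // send the request to Bedrock and return the text - see below
}
```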
Next, we'll need to use our client from shared state to send a request to Bedrock.
After this, we need to get the response, convert the response body to a `&[u8]`, and deserialize it back into our response body struct. Because the text results come back as a `Vec`, we'll then use `.first()` to get the first result and immediately return its text as an HTTP string:
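A sketch of those steps inside the handler (the model ID matches the Express model requested earlier; the `expect`s stand in for real error handling):

```rust
use aws_sdk_bedrockruntime::primitives::Blob;

// Serialize our Titan request body to JSON bytes
let body = serde_json::to_vec(&TitanRequest { input_text: prompt })
    .expect("failed to serialize request body");

// Send the request to Bedrock
let response = state
    .client
    .invoke_model()
    .model_id("amazon.titan-text-express-v1")
    .content_type("application/json")
    .body(Blob::new(body))
    .send()
    .await
    .expect("failed to invoke model");

// The response body is a Blob; view it as bytes and deserialize it
let output: TitanResponse = serde_json::from_slice(response.body().as_ref())
    .expect("failed to deserialize response body");

// Titan returns a Vec of results - take the first one and return its text
match output.results.first() {
    Some(result) => result.output_text.clone(),
    None => String::from("No response was returned."),
}
```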
Streamed Prompt Responses
Often, it can be better to get a streamed response from a model. Models can take a long time to formulate a full answer, so streaming greatly improves the user experience: instead of waiting for Bedrock to finish processing all the tokens, users see the text as it's generated.
As you can see below, adding the method for a response stream does not require much change for requesting something from Bedrock:
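The request side is nearly identical to before - we just swap in `invoke_model_with_response_stream`:

```rust
// Same request body as the non-streamed version; only the client method changes
let body = serde_json::to_vec(&TitanRequest { input_text: prompt })
    .expect("failed to serialize request body");

let response = state
    .client
    .invoke_model_with_response_stream()
    .model_id("amazon.titan-text-express-v1")
    .content_type("application/json")
    .body(Blob::new(body))
    .send()
    .await
    .expect("failed to invoke model");
```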
However, for our response we do need to make some changes. To be able to create a stream, we need a value that implements the `futures::stream::Stream` trait. We can use the `stream::unfold` function, which takes an initial state and a closure that owns that mutable state; inside the closure we can do whatever we like, as long as we return the item we want to output along with the state itself (so the stream can progress).
This would look something like this:
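Here's a sketch using the SDK's event receiver as the `unfold` state. Each Titan streaming event carries a JSON payload; the `TitanStreamChunk` struct below is an assumption based on Titan's streaming format (an `outputText` field per chunk), so check it against your model's documentation:

```rust
use axum::body::Body;
use futures::stream;

// Assumed shape of one Titan streaming payload
#[derive(serde::Deserialize)]
#[serde(rename_all = "camelCase")]
struct TitanStreamChunk {
    output_text: String,
}

// The event receiver is our mutable state; unfold yields one text chunk per event
let stream = stream::unfold(response.body, |mut receiver| async move {
    match receiver.recv().await {
        // A chunk event: decode its bytes and yield the text
        Ok(Some(aws_sdk_bedrockruntime::types::ResponseStream::Chunk(part))) => {
            let bytes = part.bytes.expect("chunk had no payload").into_inner();
            let chunk: TitanStreamChunk =
                serde_json::from_slice(&bytes).expect("failed to deserialize chunk");
            Some((Ok::<_, std::convert::Infallible>(chunk.output_text), receiver))
        }
        // Any other event type: yield nothing but keep the stream alive
        Ok(Some(_)) => Some((Ok(String::new()), receiver)),
        // End of stream or an error: stop
        _ => None,
    }
});

Body::from_stream(stream)
```

Returning `Body::from_stream(stream)` from the handler hands the chunks straight to Axum, which forwards them to the client as they arrive.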
Interacting with your API
To make sure that the previous endpoint works, you can use curl on your service:
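Assuming the non-streaming handler is mounted at `/prompt` (as in the router later on) and the app is running locally via `shuttle run` on the default port 8000:

```bash
curl -X POST http://localhost:8000/prompt \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "Why is the sky blue?"}'
```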
Note that the above snippet is for a non-streamed response. If you want to receive a streamed response using curl, you need to add the `--no-buffer` flag:
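Assuming the streaming handler is mounted at `/prompt/stream`:

```bash
curl -X POST http://localhost:8000/prompt/stream \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "Why is the sky blue?"}' \
  --no-buffer
```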
This is needed because curl buffers the response by default; disabling the buffer means you receive the text immediately as it comes.
If you want to serve the response from an HTML page, you'll need to set up a `TextDecoder` in JavaScript. This is the whole frontend HTML file that you need:
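A minimal sketch of such a page, assuming the `/prompt/stream` route from above; the `TextDecoder` turns each received byte chunk back into text as it arrives:

```html
<!DOCTYPE html>
<html>
  <body>
    <input id="prompt" placeholder="Ask something..." />
    <button onclick="ask()">Send</button>
    <p id="output"></p>
    <script>
      async function ask() {
        const prompt = document.getElementById("prompt").value;
        const response = await fetch("/prompt/stream", {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({ prompt }),
        });
        // Decode each chunk of the response body as it streams in
        const decoder = new TextDecoder();
        const reader = response.body.getReader();
        while (true) {
          const { done, value } = await reader.read();
          if (done) break;
          document.getElementById("output").textContent += decoder.decode(value);
        }
      }
    </script>
  </body>
</html>
```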
If you want to add this file to your Axum service, you'll want to install `tower-http` with the `fs` feature enabled:
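```bash
cargo add tower-http --features fs
```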
This will allow you to serve a whole directory (or file) on your web service!
We can do this by adding the HTML file to a subfolder of our project root and then declaring it in `Shuttle.toml`. Generally, we can use a wildcard: if we have a folder called `static` (aptly named to hold all our static assets), we can declare it like so:
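At the time of writing, that declaration looks like this (check the Shuttle docs in case the key has changed in your version):

```toml
# Shuttle.toml
assets = ["static/*"]
```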
Then in our router, we would mount `tower_http::services::ServeDir` as a service in our Axum application:
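One way to wire it up, assuming the two handlers are named `prompt` and `prompt_stream`; here `ServeDir` is the fallback service, so any route that doesn't match is served from the `static` folder:

```rust
use axum::{routing::post, Router};
use tower_http::services::ServeDir;

let router = Router::new()
    .route("/prompt", post(prompt))
    .route("/prompt/stream", post(prompt_stream))
    .fallback_service(ServeDir::new("static"))
    .with_state(state);
```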
Wrapping it all up
Now it's time to hook it all up! We'll change our `axum::Router` so that it has the prompt routes, as well as the static-asset service we talked about earlier (you can remove that if you're not using it).
Your Shuttle main function should look like this:
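Here's a sketch of the finished main function, assuming the hypothetical `create_aws_config` helper and handler names used throughout:

```rust
// Shared state holding the Bedrock client; derive Clone so Axum can share it
#[derive(Clone)]
pub struct AppState {
    pub client: aws_sdk_bedrockruntime::Client,
}

#[shuttle_runtime::main]
async fn main(
    #[shuttle_runtime::Secrets] secrets: shuttle_runtime::SecretStore,
) -> shuttle_axum::ShuttleAxum {
    // Build the AWS config from our secrets (the helper sketched earlier)
    let config = create_aws_config(&secrets).await;

    let state = AppState {
        client: aws_sdk_bedrockruntime::Client::new(&config),
    };

    let router = axum::Router::new()
        .route("/prompt", axum::routing::post(prompt))
        .route("/prompt/stream", axum::routing::post(prompt_stream))
        .fallback_service(tower_http::services::ServeDir::new("static"))
        .with_state(state);

    Ok(router.into())
}
```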
Deployment
To deploy, all you need to do is run `shuttle deploy` (adding the `--allow-dirty` flag if you're working from a Git branch with uncommitted changes) and watch the magic happen! Once finished, you'll get a message containing information about your deployment, as well as the URL where you can reach it.
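With uncommitted changes, that's simply:

```bash
shuttle deploy --allow-dirty
```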
Shuttle will also cache your dependencies, so if you need to re-deploy you won't have to worry about waiting to re-compile!
Finishing up
Thanks for reading! By integrating Rust with AWS Bedrock, you can harness the power of both technologies to build scalable, reliable, and maintainable systems.
Read more: