In this article, we're going to talk about building and deploying an uptime-monitoring web service in Rust!
Interested in just deploying your uptime monitor? You can do that in 2 steps:
- Open your terminal and run shuttle init --from joshua-mo-143/shuttle-monitoring-template (requires cargo-shuttle installed) and follow the prompt
- Run shuttle deploy --allow-dirty and watch the magic happen!
For everyone who wants to learn how to build it, let's get started - if you get lost, you can find the repo with the final code here.
Getting Started
Firstly, we'll initialize our project using shuttle init (requires cargo-shuttle to be installed). Make sure you pick Axum as the framework!
Now you'll want to install all of your dependencies. You can do that with the following shell snippet:
Once that's done, we will want to install sqlx-cli to handle adding migrations. We can then run sqlx migrate add init and it will create a new migrations folder, along with an SQL file in it that we can use for migrations. Add the following text into the migration file:
Note that created_at defaults to the current time truncated to the minute; this helps with timing later on, as we want to be able to split the requests down into per-minute records. Additionally, including both website_alias and created_at in the unique constraint means the constraint is only violated when a new record duplicates an existing combination of website alias and timestamp.
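To see how the minute truncation and the composite unique constraint work together, here is a minimal sketch in plain Rust. It is not the migration itself: the LogTable type and the use of Unix timestamps in seconds are assumptions made for illustration, and a HashSet stands in for the database constraint.

```rust
use std::collections::HashSet;

/// Truncate a Unix timestamp (seconds) down to the start of its minute.
/// Unix seconds are an assumption; the real column is a SQL timestamp.
fn truncate_to_minute(unix_secs: u64) -> u64 {
    unix_secs - unix_secs % 60
}

/// Hypothetical in-memory stand-in for the logs table. The set plays the
/// role of the unique constraint on (website_alias, created_at).
#[derive(Default)]
struct LogTable {
    seen: HashSet<(String, u64)>,
}

impl LogTable {
    /// Returns true if the row was inserted, or false if a row with the
    /// same alias and the same truncated minute already exists.
    fn insert(&mut self, alias: &str, unix_secs: u64) -> bool {
        self.seen
            .insert((alias.to_string(), truncate_to_minute(unix_secs)))
    }
}
```

Two checks of the same alias within one minute collide, while the same alias in the next minute, or a different alias in the same minute, is accepted.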
Next, we'll want to add a database annotation to our program and add it as shared state to our Axum web service:
With one line of code, we've now given ourselves a database! Locally, Shuttle will use Docker to provision a Postgres container for us. In production, Shuttle's servers automatically provision one for us with no input required on our part. Setting this up manually would normally be a bit of a pain, so this saves us some time.
Lastly, here is the list of imports we'll be using - make sure to add this to the top of your main.rs file:
Building
There are three main parts to this: the frontend, the backend, and the actual monitoring task itself. Let's start with the monitoring task.
Monitoring task
The monitoring task itself is simple enough: we fetch the list of websites from the database, then sequentially send an HTTP request to each of them and record the results in Postgres. Firstly, we'll create the struct that represents a Website:
Here we instantiate a reqwest client and then fetch all of the websites that we want to check:
We could use fetch_all(), but by using fetch() we get a Stream back and can skip dealing with RowNotFound SQL errors by checking whether or not there are any rows to fetch.
Now that we have our list of websites (or lack thereof), we can send a request to each website in turn, then store the result of each request in our database:
While we don't store the response body, at this point there's not much reason to do so. The response status should give us all the information we need.
In the full function, we want to use tokio::time::interval so that each pass of the loop starts on a fixed schedule. See below for the code block which includes this:
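As a side illustration of why a fixed-period interval keeps the loop on schedule, here is a minimal sketch in plain std Rust. It is neither tokio's implementation nor the monitoring function itself; next_tick is a hypothetical helper that only shows the idea of pinning ticks to start + n * period instead of sleeping for a fixed time after each pass.

```rust
use std::time::{Duration, Instant};

/// Return the next tick that lies strictly after `now`.
/// Ticks sit at start + n * period, so a slow pass through the loop
/// never pushes later ticks back. `period` must be non-zero.
fn next_tick(start: Instant, period: Duration, now: Instant) -> Instant {
    // Time since the loop started (zero if `now` is before `start`).
    let elapsed = now.saturating_duration_since(start);
    // Number of whole periods that have already passed.
    let n = elapsed.as_nanos() / period.as_nanos();
    start + period * (n as u32 + 1)
}
```

With a 60-second period, a pass that finishes 5 seconds in still waits for the 60-second mark, and a pass that overruns to 61 seconds waits for the 120-second mark, so the schedule does not drift.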
Backend
Before we start our backend, we should probably add an error handling enum. Let's add an enum with one arm, implement axum::response::IntoResponse for it (currently aliased in our program as AxumIntoResponse) and then implement From<sqlx::Error> for easy error propagation:
Now when we use any SQL queries, we can propagate the error by using ? instead of unwrapping or having to manually deal with error handling! Of course, there will be certain situations where we can handle errors manually, but this will save a lot of pattern matching (or alternatively, .unwrap() or .expect()) within our code. We can also use it as a return type in our handler functions.
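The propagation pattern can be sketched without any dependencies. In the sketch below, DbError is a hypothetical stand-in for sqlx::Error, find_url is a pretend query, and the IntoResponse implementation is left out because it needs axum; only the From conversion and the use of ? are shown.

```rust
use std::fmt;

/// Hypothetical stand-in for sqlx::Error so the sketch has no dependencies.
#[derive(Debug)]
struct DbError(String);

/// Application error with a single variant.
#[derive(Debug)]
enum ApiError {
    Sql(DbError),
}

/// This conversion is what lets `?` turn a DbError into an ApiError.
impl From<DbError> for ApiError {
    fn from(e: DbError) -> Self {
        ApiError::Sql(e)
    }
}

impl fmt::Display for ApiError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ApiError::Sql(e) => write!(f, "SQL error: {}", e.0),
        }
    }
}

/// Pretend query: succeeds for one known alias and fails otherwise.
fn find_url(alias: &str) -> Result<String, DbError> {
    if alias == "example" {
        Ok("https://example.com".to_string())
    } else {
        Err(DbError(format!("no row for alias '{alias}'")))
    }
}

/// Handler-style function: no match or unwrap is needed, because `?`
/// converts the DbError through the From implementation above.
fn handler(alias: &str) -> Result<String, ApiError> {
    let url = find_url(alias)?;
    Ok(format!("monitoring {url}"))
}
```

Calling handler("example") returns the Ok value, while any other alias comes back as ApiError::Sql without handler containing any explicit error handling.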
To start working on our backend, we can create an initial route to add a URL to monitor. You may have noticed earlier that we added the Validate derive macro. This allows us to validate the form data using preset rules and automatically return an error if the validation fails. In this case, we used #[validate(url)] - so if the string isn't in a URL format, validation will fail:
Now let's write a route to grab all the websites we're monitoring, as well as get a quick report on what the uptime was like in the last recorded 24 hours (filling in any gaps where required) for each website.
As with the monitoring task, we need to grab a list of all of the websites we're currently tracking and add them to a vector of website data. This time we will assume there are results; if there are none, askama will handle it for us by simply not rendering any records:
Once this is done, we'll need to make a new vector. This will hold all the website URLs (and their aliases) as well as a list of timestamps with the uptime percentage over the last 24 hours, calculated per hour from how many HTTP requests returned 200 OK. We then need to check whether there are any gaps and fill them in with None. This lets the person viewing the data know that no data was recorded at that time (for example, if we're developing locally, or if the service itself had an outage and was unable to record data).
To start with, let's write a function for getting the daily stats for a given URL:
Although the SQL query looks quite complicated, it basically retrieves a timestamp truncated to the hour and an uptime percentage: the share of recorded requests in that hour that returned a 200 OK response status. The query aggregates the rows by timestamp and limits the results to the last 24 hours.
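The same aggregation can be sketched in plain Rust to make the arithmetic concrete. This is an illustration of the calculation rather than the query itself: the Check struct, the use of Unix timestamps in seconds, and rounding down to a whole percent are all assumptions, and the 24-hour limit is left out.

```rust
use std::collections::BTreeMap;

/// One recorded check: Unix timestamp in seconds and the HTTP status code.
struct Check {
    unix_secs: u64,
    status: u16,
}

/// Group checks by hour and return (hour_start, uptime_percent) pairs,
/// sorted by time. A check counts as "up" only when the status is 200.
fn hourly_uptime(checks: &[Check]) -> Vec<(u64, u8)> {
    // hour_start -> (number of 200 responses, total responses)
    let mut buckets: BTreeMap<u64, (u32, u32)> = BTreeMap::new();
    for c in checks {
        // Truncate the timestamp down to the start of its hour.
        let hour = c.unix_secs - c.unix_secs % 3600;
        let entry = buckets.entry(hour).or_insert((0, 0));
        if c.status == 200 {
            entry.0 += 1;
        }
        entry.1 += 1;
    }
    buckets
        .into_iter()
        // Integer division rounds down; total is never zero here because
        // a bucket only exists once at least one check has been added.
        .map(|(hour, (ok, total))| (hour, (ok * 100 / total) as u8))
        .collect()
}
```

Hours with no checks at all simply do not appear in the output, which is exactly the kind of gap that needs filling afterwards.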
Of course, this can lead to some awkward gaps in our data - for example, what if our monitoring web service goes down? We can't exactly just go back in time and record the data! We can, however, make up for this by filling in the gaps with a new fill_data_gaps function that will inform the webpage viewer that there are no data points for a given time.
We can do this by declaring an enum:
This enum will allow us to differentiate what period we want data over. We can abstract this block into its own function:
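As a rough sketch of what such an enum and a gap-filling function could look like, here is a dependency-free version. The SplitBy name, its variants, the function signature, and the use of Unix timestamps in seconds are all assumptions for illustration; only the fill_data_gaps name comes from the text above.

```rust
/// Hypothetical enum naming the bucket size we want data split by.
#[derive(Clone, Copy)]
enum SplitBy {
    Hour,
    Day,
}

impl SplitBy {
    /// Bucket length in seconds.
    fn secs(self) -> u64 {
        match self {
            SplitBy::Hour => 3_600,
            SplitBy::Day => 86_400,
        }
    }
}

/// Return `count` buckets ending at the bucket that contains `now`,
/// oldest first. Buckets that have no matching row in `rows` get None.
/// Assumes `now` is at least `count` buckets past the Unix epoch.
fn fill_data_gaps(
    rows: &[(u64, u8)],
    split: SplitBy,
    count: u64,
    now: u64,
) -> Vec<(u64, Option<u8>)> {
    let step = split.secs();
    // Start of the bucket that `now` falls into.
    let latest = now - now % step;
    (0..count)
        .rev()
        .map(|i| {
            let bucket = latest - i * step;
            // Look up the recorded percentage for this bucket, if any.
            let value = rows
                .iter()
                .find(|(ts, _)| *ts == bucket)
                .map(|(_, pct)| *pct);
            (bucket, value)
        })
        .collect()
}
```

For the 24-hour view this would be called with SplitBy::Hour and a count of 24; a monthly view split by day would use SplitBy::Day with a count matching the number of days wanted.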
Once done, we'll want to create a dynamic route for getting more information about a monitored URL. We can use this page to display things like past incidents/alerts. This function will follow most of the previous handler function except we're fetching one URL and additionally grabbing any records of HTTP requests that didn't return 200 OK and labelling them as "Incidents".
As you can see below, it is mostly the same as grabbing data for all websites, except now we also have last month's data (split by day) to add to our return results:
Finally, we need to create a route for deleting a website. This is a two-step process: we delete all of the website's logs, then the URL itself, and return 200 OK if everything went well. We can use a transaction to roll back if there's an error anywhere in this process and manually return ApiError:
Frontend
Now that we've written our backend, we can use askama with htmx to write our frontend! If you'd like to skip over this part and just grab the files, you can do so from the repo - but make sure you don't forget to write the styles handler function below so your web server can find it! This function can be found at the bottom of this subsection or in the repo.
We'll want to make four files:
- A base.html file that will hold the head of our HTML so we don't need to write it in every file.
- An index.html file.
- A single_website.html file (for grabbing information about an individual monitored URL).
- A styles.css file.
We'll first want to make our base.html file:
This will be extended in the rest of the templates so that we won't need to constantly copy and paste the HTML head every time we want to use HTMX or the Google fonts.
Here is the HTML for our main page:
As you can see here, we extend the base.html template and then loop through the websites we found earlier in our SQL query. We then display the timestamps as coloured circles depending on what the uptime percentage is (note that None means there's a data gap). Although we unwrap the uptime percentages here, we match on them beforehand to make sure each one is a Some variant.
You may have noticed we're using tooltips to display the time of day that the request was taken as well as displaying the uptime percentage on the webpage. In the CSS file, we add styling so that when you hover over a circle, a tooltip will display the timestamp the circle represents as well as the exact uptime percentage.
The single URL HTML webpage is also mostly the same, except we're also adding an incident list:
Now it's time to add the CSS styling! The CSS file is extremely long; for the sake of not overwhelming you with code blocks, you can find it here. However, if you'd like to add your own styling, you're free to do so! We also need to add a route that serves the CSS file - otherwise, our HTML won't be able to find it. We can do this like so:
Then when we add the route to our Router, we need to specify the route as /styles.css.
Hooking everything up
Now it's time to hook everything up! All you need to do is to create the AppState, spawn the monitoring request loop as a tokio task and then create your Router:
Deploying
Now it's time to deploy! Type in shuttle deploy (add --allow-dirty if on a dirty Git branch) and watch the magic happen. When your program has deployed, it'll give you the URL of your deployment where you can try it out, as well as the deployment ID and other details like your database connection (the password is hidden until you use the --show-secrets flag).
Finishing up
Thanks for reading! I hope you have learned a little bit more about Rust by writing an uptime monitoring service.