Cutting $350K/year in hosting costs

TL;DR

This is a short story about how we managed to reduce the estimated hosting costs of a Kubernetes cluster, just before going live in production. In short, we had to resolve a memory leak issue and migrate the HTTP server framework (from Next.js routes to a Bun server).

About the project

This story takes place during the mainnet-launch period of Walrus and Walrus Sites, in early March 2025.

Walrus Sites are websites whose resources are stored on Walrus. They are accessed through "portals" that translate user requests into HTTP calls to Walrus. For example, https://wal.app is the domain of the server portal. To access your site, you request https://name-of-your-site.wal.app, and the wal.app portal fetches the name-of-your-site resources and serves them to the end user.
My colleagues and I were responsible for building the core features of the product, as well as optimizing for both speed (more requests per second) and cost. Let's explore how things turned out.

Ready, set... wait! Can we even launch?

At that time, we were three weeks away from launching on mainnet, which would mean a sharp increase in traffic.
We thought we were ready, until the frontend of a famous NFT airdrop was hosted on our Walrus Sites testnet (staging) portal, which was live at the time. Airdrops are infamous for creating traffic spikes, and this one was no exception. Our portal was flooded with requests until it became unresponsive. Users were not very happy, and neither were stakeholders.

The two culprits

During the post-mortem, we took a look at the charts inside our Kubernetes cluster. The results were the following:

  1. During the airdrop, which lasted one day, we received approximately 113K requests. At the peak of the event, we had 180 req/sec, and things started looking bad at around 100 req/sec. We had three Kubernetes pods then, which means roughly 33 req/sec were enough to overload a single pod.

  2. Looking at the memory usage of each container, the amount of RAM used kept increasing regardless of the load, which is a telltale sign of a memory leak.

After the incident, we made some adjustments and scaled out to 15 pods per region, which is 45 pods in total. This let us handle 150 req/sec in each region for the time being. Each Kubernetes node could take about 12-16 pods.
But this was not enough. Once Walrus Sites became popular, that capacity would not withstand future, unforgiving airdrops. To provision the cluster to serve 1500 req/sec, we would need to spin up 30 Kubernetes nodes, costing $30k per month.
Evidently, we needed to move fast and fix things.
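
The capacity numbers above can be checked with a quick back-of-the-envelope calculation. The sketch below is one reading that reproduces the 30-node figure. It assumes that the 1500 req/sec target applies per region and that 15 pods fit on a node (the text says "about 12-16"). Neither assumption is stated explicitly, and all constant names are made up for illustration.

```typescript
// Back-of-the-envelope capacity planning with the figures from the text.
const REGIONS = 3;                    // 45 pods in total / 15 pods per region
const PODS_PER_REGION = 15;
const REGION_CAPACITY_RPS = 150;      // what 15 pods handled in one region
const TARGET_RPS_PER_REGION = 1500;   // ASSUMPTION: the target is per region
const PODS_PER_NODE = 15;             // ASSUMPTION: within the stated 12-16 range
const MONTHLY_ESTIMATE_USD = 30000;

// Sustainable throughput of a single pod after the post-incident adjustments.
const safeRpsPerPod = REGION_CAPACITY_RPS / PODS_PER_REGION;

// Pods and nodes needed to reach the target in every region.
const podsNeeded = Math.ceil(TARGET_RPS_PER_REGION / safeRpsPerPod) * REGIONS;
const nodesNeeded = Math.ceil(podsNeeded / PODS_PER_NODE);

// Monthly price per node implied by the $30k estimate.
const impliedCostPerNodeUsd = MONTHLY_ESTIMATE_USD / nodesNeeded;

console.log({ safeRpsPerPod, podsNeeded, nodesNeeded, impliedCostPerNodeUsd });
// { safeRpsPerPod: 10, podsNeeded: 450, nodesNeeded: 30, impliedCostPerNodeUsd: 1000 }
```

With these assumptions, each pod sustains about 10 req/sec, far below the roughly 33 req/sec that overloaded a pod during the airdrop, so scaling out alone was an expensive way to buy headroom.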

Fixing the memory leak

To fix a memory leak, you first need to understand what it is. By definition, a memory leak occurs when a program allocates memory but fails to release it after use. Most of you might know this term from C or C++, since those languages don't have a garbage collector (GC). But in our case, we were using a JavaScript runtime, which does have one. So why was it possible to have a memory leak here?
The fact that high-level languages have GC is a frequent source of confusion: it can give developers the false impression that they don't need to worry about memory management.

But they do. The reason is lingering references. Let me explain:

The most common GC algorithm is Mark-and-Sweep. In essence, the runtime maintains a root set, which contains references to root objects. Root objects are the starting points of the program, such as global variables and stack variables: anything the program can reach directly, without following pointers. Given this, the algorithm is very simple:

  1. When an object is created, something references it: either a root or another object (its parent).
  2. Deleting an object means removing its reference from the parent. Nothing is freed at that moment.
  3. Periodically, the GC traverses the reference graph starting from the roots and marks every object it reaches as in use. It then sweeps the heap, deallocating the unreachable (unmarked) objects.
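
To make the steps above concrete, here is a toy Mark-and-Sweep over an explicit object graph. It is an illustrative sketch only: the HeapObject and ToyHeap names are invented for this example, and real JavaScript engines are far more sophisticated. It shows that whatever a root can reach survives a collection, while everything else is swept, including two objects that only point at each other.

```typescript
// Toy mark-and-sweep collector (illustration only, not how a real engine works).
class HeapObject {
    marked = false;
    refs: HeapObject[] = [];
    constructor(readonly name: string) {}
}

class ToyHeap {
    roots: HeapObject[] = [];
    private objects: HeapObject[] = [];

    allocate(name: string): HeapObject {
        const obj = new HeapObject(name);
        this.objects.push(obj);
        return obj;
    }

    // Mark phase: flag every object reachable from the roots.
    private mark(): void {
        const stack = this.roots.slice();
        while (stack.length > 0) {
            const obj = stack.pop()!;
            if (obj.marked) continue;
            obj.marked = true;
            for (const child of obj.refs) stack.push(child);
        }
    }

    // Sweep phase: drop unmarked objects, then reset marks for the next run.
    collect(): string[] {
        this.mark();
        const freed = this.objects.filter((o) => !o.marked).map((o) => o.name);
        this.objects = this.objects.filter((o) => o.marked);
        for (const o of this.objects) o.marked = false;
        return freed;
    }

    liveNames(): string[] {
        return this.objects.map((o) => o.name);
    }
}

const heap = new ToyHeap();
const globalList = heap.allocate("globalList");
heap.roots.add?.call; // no-op guard removed below
```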

Therefore, to have a memory leak, an object we no longer need must stay reachable from a root. The cases where this holds true are:

  1. Circular references that a root can still reach: an object A points to an object B, and at the same time B points to A. As long as a root still reaches either of them, both stay marked as "in use" by the GC. (A cycle that nothing else points to is collected by Mark-and-Sweep. It only leaks under plain reference counting.)
  2. Adding objects to root objects without removing them, like a list that keeps growing.

In our case, the issue lay in the latter. At that time, we were tracking requests using Sentry. Because Sentry did not provide the details we needed by default, we had to add custom tags to each log message:

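import logger from "@lib/logger";
import * as Sentry from "@sentry/bun";

// Copies every logging argument (except the message itself) onto Sentry as a tag.
function addLoggingArgsToSentry(args: { [key: string]: any }) {
    Object.entries(args).forEach(([key, value]) => {
        if (key !== "message") {
            // Called on every request: the tag is stored on the global Sentry object.
            Sentry.setTag(key, value); // memleak🚰
        }
    });
}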

The Sentry object exported by the dependency was a global. Calling Sentry.setTag(key, value) increased the size of that object on each request, resulting in a memory leak! Migrating our logs to Grafana solved this issue. Easy peasy.
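
The pattern behind the leak can be reproduced without Sentry. The sketch below uses a plain module-level object as a stand-in for "a global that every request writes to". It is not Sentry's implementation, and the function names and the per-request key format are hypothetical. It contrasts that with tags kept in an object owned by the request.

```typescript
// Stand-in for process-wide state. NOT Sentry's internals, only an illustration.
const globalTags: Record<string, string> = {};

// Leaky variant: every request adds entries that are never removed.
// ASSUMPTION: keys are unique per request (hypothetical "requestId:key" format).
function tagGlobally(requestId: string, args: Record<string, string>): void {
    for (const key of Object.keys(args)) {
        if (key !== "message") {
            globalTags[requestId + ":" + key] = args[key];
        }
    }
}

// Scoped variant: tags live in an object owned by the request, so they become
// unreachable (and collectable) once the caller drops the result.
function tagPerRequest(args: Record<string, string>): Record<string, string> {
    const tags: Record<string, string> = {};
    for (const key of Object.keys(args)) {
        if (key !== "message") {
            tags[key] = args[key];
        }
    }
    return tags;
}

// Simulate 1000 requests carrying two tags each.
for (let i = 0; i < 1000; i++) {
    const args = { message: "GET /", site: "name-of-your-site", status: "200" };
    tagGlobally("req-" + i, args);
    tagPerRequest(args); // the returned object is dropped after each iteration
}

const leakedEntries = Object.keys(globalTags).length;
console.log(leakedEntries); // 2000, and it keeps growing with every request
```

The global object is reachable from a root for the lifetime of the process, so the GC can never reclaim its entries, no matter how old the requests are.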

Improving the HTTP server's performance

Fixing the memory leak was good, but we could do better. While Next.js is convenient for quickly bootstrapping and deploying proofs of concept through its excellent integration with Vercel, we had to consider the following:

  1. Unneeded weight: our HTTP server included just a few lines of code, yet Next.js comes with its own “batteries included” handlers in /_next, plus a lot of security baggage.
  2. Awkward routing: we had to find "hacky" ways to override the file-based routing logic when handling subdomains like https://docs.wal.app.
  3. No Bun support: at the time, Vercel did not support Bun, an alternative to Node.js with extraordinary performance.
  4. A lot of security issues. Remember CVE-2025-29927?

So migrating to Bun as a package manager, runtime, and HTTP server improved both our performance and the readability of our code. The migration was simple, because Bun is compatible with most Node.js APIs and with the npm package ecosystem.
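
To illustrate why subdomain handling gets simpler outside file-based routing, here is a minimal sketch of host-based routing as plain functions. The helper names (extractSiteName, route) and the response bodies are hypothetical, and this is not the actual portal code. Only the "name-of-your-site.wal.app" convention comes from the text.

```typescript
// Hypothetical helpers; only the subdomain-to-site idea comes from the article.
const PORTAL_DOMAIN = "wal.app";

// Returns the site name for "name-of-your-site.wal.app", or null when the
// hostname is the bare portal domain, a nested subdomain, or another domain.
function extractSiteName(hostname: string, portalDomain: string = PORTAL_DOMAIN): string | null {
    const host = hostname.toLowerCase();
    const suffix = "." + portalDomain;
    if (host.length <= suffix.length) return null;
    if (host.slice(-suffix.length) !== suffix) return null;
    const site = host.slice(0, host.length - suffix.length);
    return site.indexOf(".") === -1 ? site : null;
}

interface RouteResult {
    status: number;
    body: string;
}

function route(hostname: string, pathname: string): RouteResult {
    const site = extractSiteName(hostname);
    if (site === null) {
        return { status: 404, body: "Unknown site" };
    }
    // A real portal would fetch the site's resources from Walrus here.
    return { status: 200, body: "Would serve " + pathname + " for site " + site };
}

// Wiring into Bun's built-in HTTP server (commented out so the sketch also
// runs outside Bun):
//
// Bun.serve({
//     port: 3000,
//     fetch(req) {
//         const url = new URL(req.url);
//         const { status, body } = route(url.hostname, url.pathname);
//         return new Response(body, { status });
//     },
// });

console.log(route("docs.wal.app", "/index.html").status); // 200
console.log(route("wal.app", "/").status);                // 404
```

Because the whole server is a single fetch handler, the hostname is just another input to a function, with no routing conventions to work around.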

Results

Following these updates, we noticed the following improvements:

  1. During the first 12 hours after mainnet launch, we received 845.31k hits.
  2. CPU usage stayed stable at ~2%, while memory usage stabilized at ~16-18%.
  3. Our backend servers could now handle 3000 req/sec (compared to the 1500 req/sec we had initially targeted for production).
  4. Average response time dropped to 0.5 sec, with p99 under 1 second.
  5. Costs: each pod uses 1 CPU core and 1G of RAM, and our node had 16 cores and 128G of RAM. At the c4-highram-16 Compute Engine price of $1.04/hr, a node averaged around $760 a month, so 12 pods cost about 12/16 * 760 = $570 a month (compared to the $30,000 we had initially estimated). That is 12 * ($30,000 - $570) = $353,160 saved in a year!
  6. We experienced no downtime whatsoever, even many months after launch!
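
The cost figures in item 5 can be reproduced in a few lines. This is only a sketch of the arithmetic: the 730 hours per month is an assumption (8760 hours in a year divided by 12), and the constant names are made up for illustration.

```typescript
// Reproducing the cost arithmetic from the results list.
const HOURLY_NODE_PRICE_USD = 1.04;
const HOURS_PER_MONTH = 730;          // ASSUMPTION: 8760 h per year / 12
const CORES_PER_NODE = 16;
const PODS = 12;                      // each pod uses 1 core
const PREVIOUS_MONTHLY_ESTIMATE_USD = 30000;

// One node for a month, before the article rounds it to about $760.
const nodeMonthlyUsd = HOURLY_NODE_PRICE_USD * HOURS_PER_MONTH;
const roundedNodeMonthlyUsd = 760;

// 12 pods occupy 12 of the 16 cores, so they are charged 12/16 of a node.
const podsMonthlyUsd = (PODS / CORES_PER_NODE) * roundedNodeMonthlyUsd;

// Savings over a year compared to the earlier estimate.
const yearlySavingsUsd = 12 * (PREVIOUS_MONTHLY_ESTIMATE_USD - podsMonthlyUsd);

console.log(Math.round(nodeMonthlyUsd)); // 759
console.log(podsMonthlyUsd);             // 570
console.log(yearlySavingsUsd);           // 353160
```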

With these fixes in place, we were finally set to launch—no longer worried about NFT airdrop traffic spikes or dreaded hosting invoices!

Why didn't we do this from the start?

Because serverless! Our initial setup was running on Vercel, which was expensive and left infrastructure management outside our control. Because we wanted to be in complete control of (and responsible for) how our servers scaled, we decided to migrate to a Kubernetes cluster. This meant that we migrated from a stateless environment (serverless) to a stateful one (Kubernetes). In a long-lived process, leaked memory accumulates across requests instead of disappearing when a short-lived function instance is torn down.

Overall, this was an interesting experience. Tackling challenges like these is what keeps our work exciting!