System design use-case: URL Shortener

Kadek Chresna Kharisma
7 min read · Jan 2, 2025


In this article, I wanted to share my journey in learning system design and my attempt at designing a URL Shortener system (let's call it paste.ly).

Scopes

These are the use cases of the URL shortener system,

  1. Users can enter text as the content of the shortened link
  2. Updates and deletions by users are out of scope
  3. Users are anonymous
  4. A shortened link has an expiration, with a default value of 5 minutes
  5. Users enter the shortened URL to view the content
  6. Users can get time-based analytics of visits to a shortened URL
  7. The service deletes expired shortened links
  8. The service has high availability

Implementation

The high-availability requirement is based on the following traffic assumptions,

  • 5 million users
  • 5 million paste writes per month
  • 50 million paste reads per month

From the assumptions above, we can calculate the requests per second,

  • ~2.5 million seconds per month
  • 1 request per second ≈ 2.5 million requests per month
  • 2 writes per second ≈ 5 million writes per month
  • 20 reads per second ≈ 50 million reads per month

From this calculation, we know the service gets more reads than writes, roughly a 10:1 read-to-write ratio, and it needs to handle at least 2 writes per second and 20 reads per second. Assuming the average paste content is ~5 KB, we will need ~25 GB of object storage per month.
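As a quick sanity check, the back-of-the-envelope numbers above can be reproduced in a few lines of Go (a minimal sketch; the monthly figures and the ~5 KB paste size are the assumptions stated earlier),

package main

import "fmt"

func main() {
	const (
		secondsPerMonth = 2_500_000  // ~30 days, rounded for estimation
		writesPerMonth  = 5_000_000  // assumed paste writes per month
		readsPerMonth   = 50_000_000 // assumed paste reads per month
		avgPasteSizeKB  = 5          // assumed average paste content size in KB
	)

	fmt.Println("writes per second:", writesPerMonth/secondsPerMonth)                    // 2
	fmt.Println("reads per second:", readsPerMonth/secondsPerMonth)                      // 20
	fmt.Println("object storage GB per month:", writesPerMonth*avgPasteSizeKB/1_000_000) // 25
}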

High-Level Design

HLD pastely V1

The current design consists of

  • the paste.ly service
  • a database for analytics
  • the paste.ly DB for storing paste metadata
  • object storage for the paste content files

The paste.ly service will handle all incoming requests received from the load balancer. The service also covers creating and retrieving paste data from the paste.ly DB, which uses PostgreSQL as its main database. For each paste created, a text file holding the content of the paste is uploaded to object storage such as AWS S3 (for local testing we use MinIO), and the file’s object key is saved into the paste.ly DB [1]. For analytics, we will use TimescaleDB since we need time-series data.

The paste.ly service is structured as below,

.
├── cmd
├── config
├── deploy
│   ├── chart
│   │   ├── minio-pastely
│   │   ├── pastely
│   │   ├── postgres-operator
│   │   └── postgres-pastely
├── driver
│   ├── cache
│   ├── db
│   └── file-storage
│       ├── minio
│       └── s3
├── env.example
├── go.mod
├── go.sum
├── helper
│   ├── constant
│   ├── env
│   ├── logger
│   ├── pprof
│   ├── prometeus
│   └── transaction
├── internal
│   ├── v1
│   │   ├── model
│   │   ├── repository
│   │   ├── usecase
│   │   └── web
│   └── v2
│       ├── model
│       ├── repository
│       ├── usecase
│       └── web
├── main.go
├── migration

Here’s the full code of the paste.ly service [2].

Create Paste

When creating a paste, the request comes from the load balancer and is received by the paste.ly service, where it is handled by the Create API. Generating a unique short URL is also handled inside the use case layer of the Create API. After generating the short URL, the content of the paste is uploaded to object storage and the object key is saved into the paste.ly database.
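To make that flow concrete, here is a minimal sketch of what the create use case could look like. The PasteRepository and FileStorage interfaces and their method names are illustrative assumptions, not the actual code from the repository [2]; the sketch relies on the Paste struct and GenerateShortURLBase62 shown later. Because the shortlink is derived from the serial ID, the sketch inserts the row first and then updates it,

import (
	"context"
	"time"
)

// Hypothetical interfaces for the create flow described above; the real
// repository layers in the paste.ly codebase may look different.
type PasteRepository interface {
	// Insert stores a new paste row and returns it with the serial ID filled in.
	Insert(ctx context.Context, p *Paste) (*Paste, error)
	// UpdateShortlinkAndURL persists the generated shortlink and object key.
	UpdateShortlinkAndURL(ctx context.Context, p *Paste) error
}

type FileStorage interface {
	// Upload writes the paste content and returns the object key.
	Upload(ctx context.Context, key string, content []byte) (string, error)
}

type CreateUsecase struct {
	repo    PasteRepository
	storage FileStorage
}

func (u *CreateUsecase) Create(ctx context.Context, content []byte, ttl time.Duration) (*Paste, error) {
	// 1. Insert a row first so the serial ID is available for short-URL generation.
	p, err := u.repo.Insert(ctx, &Paste{Status: "active", ExpiredAt: time.Now().Add(ttl)})
	if err != nil {
		return nil, err
	}

	// 2. Generate the base-62 shortlink from timestamp + ID.
	p.GenerateShortURLBase62()

	// 3. Upload the content to object storage and keep the returned object key.
	key, err := u.storage.Upload(ctx, p.Shortlink+".txt", content)
	if err != nil {
		return nil, err
	}
	p.PasteURL = key

	// 4. Save the shortlink and object key into the paste.ly DB.
	return p, u.repo.UpdateShortlinkAndURL(ctx, p)
}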

The paste.ly DB will have a paste table. Each short URL must be unique for each paste; this is ensured by the table’s unique index and by the algorithm we will be using.


CREATE TABLE IF NOT EXISTS public.paste (
    id bigserial NOT NULL,
    shortlink varchar(7) NOT NULL,
    paste_url varchar(255) DEFAULT ''::character varying NOT NULL,
    created_at timestamp DEFAULT now() NOT NULL,
    status varchar DEFAULT 'active'::character varying NULL,
    expired_at timestamp NULL,
    CONSTRAINT paste_pk PRIMARY KEY (id)
);
CREATE UNIQUE INDEX IF NOT EXISTS paste_shortlink_idx ON public.paste USING btree (shortlink);

Besides the unique index, the paste.ly service ensures the short URL is unique by using division and modulo in base 62. Why base 62? Our short URL uses the standard URL character set [A-Za-z0-9], excluding special characters, and that range contains 62 characters [3].

Using base 62 with a length of up to 7 characters gives 62⁷, around 3.5 trillion combinations. We take a number in base 10 and convert it to base 62. This base-10 number comes from adding the UNIX timestamp to the serial primary key field id.

The paste data struct is defined below,


type Paste struct {
	ID        int64
	Shortlink string
	PasteURL  string
	CreatedAt time.Time
	Status    string
	ExpiredAt time.Time
}

// GenerateShortURLBase62 converts the sum of the current UNIX timestamp and
// the serial primary key ID into base 62 to produce the shortlink.
func (p *Paste) GenerateShortURLBase62() {
	alphabet := "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

	num := time.Now().Unix() + p.ID
	for num > 0 {
		remainder := num % 62
		p.Shortlink = string(alphabet[remainder]) + p.Shortlink
		num = num / 62
	}
}

For example, say the timestamp is 1735733728 and the current increment of the serial primary key is 35. The base-10 number to be converted into base 62 is then 1735733763. When the function above is executed, it returns b3C7N1, with the details as below,

1  * 62**5
55 * 62**4
28 * 62**3
59 * 62**2
39 * 62**1
53 * 62**0
------------- +
1735733763

Each remainder is mapped into the range of the character set we defined.

1  > b
55 > 3
28 > C
59 > 7
39 > N
53 > 1
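To double-check the mapping above, here is a small decode helper (not part of the actual service, just a verification sketch) that converts the shortlink back into its base-10 value,

import "strings"

// DecodeBase62 converts a shortlink back to its base-10 value. It is only a
// verification helper for the worked example above, not part of the service.
func DecodeBase62(s string) int64 {
	const alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
	var num int64
	for _, c := range s {
		num = num*62 + int64(strings.IndexRune(alphabet, c))
	}
	return num
}

// DecodeBase62("b3C7N1") returns 1735733763.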

View Paste

When the user tries to access the short URL, the load balancer passes the request to the paste.ly service, where it is handled by the Read API. The paste data is fetched using the unique short URL. After the paste data is fetched, the content of the paste is retrieved from object storage. Once all of that is complete, the data is cached in the background and the request is logged for analytics.
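Sketched in the same hypothetical style as the create flow, the read path could look roughly like the code below; the interface names are again illustrative, not the repository’s actual code, and caching is left to the cache-aside sketch later in the article,

import (
	"context"
	"time"
)

// Hypothetical collaborators for the read flow described above.
type PasteReader interface {
	GetByShortlink(ctx context.Context, shortlink string) (*Paste, error)
}

type FileDownloader interface {
	Download(ctx context.Context, key string) ([]byte, error)
}

type VisitLogger interface {
	LogVisit(shortlink string, at time.Time) error
}

type ReadUsecase struct {
	repo    PasteReader
	storage FileDownloader
	logger  VisitLogger
}

func (u *ReadUsecase) Read(ctx context.Context, shortlink string) ([]byte, error) {
	// 1. Resolve the shortlink to a paste row via the unique index.
	p, err := u.repo.GetByShortlink(ctx, shortlink)
	if err != nil {
		return nil, err
	}

	// 2. Fetch the paste content from object storage using the stored object key.
	content, err := u.storage.Download(ctx, p.PasteURL)
	if err != nil {
		return nil, err
	}

	// 3. Log the visit for analytics in the background so the response is not blocked.
	go func() {
		_ = u.logger.LogVisit(shortlink, time.Now())
	}()

	return content, nil
}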

The analytics will be stored in PostgreSQL using the TimescaleDB extension, which turns the table into time-series data with partitioning handled by TimescaleDB.

CREATE TABLE IF NOT EXISTS paste_log (
    time TIMESTAMPTZ NOT NULL,
    shortlink TEXT NOT NULL
);

SELECT create_hypertable('paste_log', by_range('time'));
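With that hypertable in place, the time-based visit analytics from the requirements can be answered with TimescaleDB’s time_bucket aggregation. Below is a hedged sketch using database/sql; the per-minute bucket size and the VisitsPerMinute function are assumptions, while the table is the paste_log defined above,

import (
	"context"
	"database/sql"
	"time"
)

// VisitsPerMinute returns per-minute visit counts for one shortlink, assuming
// a *sql.DB connected to the same PostgreSQL/TimescaleDB instance as paste_log.
func VisitsPerMinute(ctx context.Context, db *sql.DB, shortlink string) (map[time.Time]int64, error) {
	// time_bucket groups the raw visit rows into fixed one-minute intervals.
	const q = `
		SELECT time_bucket('1 minute', time) AS bucket, count(*)
		FROM paste_log
		WHERE shortlink = $1
		GROUP BY bucket
		ORDER BY bucket`

	rows, err := db.QueryContext(ctx, q, shortlink)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	visits := make(map[time.Time]int64)
	for rows.Next() {
		var bucket time.Time
		var count int64
		if err := rows.Scan(&bucket, &count); err != nil {
			return nil, err
		}
		visits[bucket] = count
	}
	return visits, rows.Err()
}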

Delete Paste

Paste deletion runs every midnight, when traffic is usually low. The implementation uses the CronJob object in Kubernetes. The CronJob hits a Delete API, which marks the expired pastes.
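The work behind that Delete API can be a single UPDATE against the paste table. Here is a minimal sketch, assuming the schema above; the handler wiring is omitted and the 'expired' status value is an assumption, not taken from the repository,

import (
	"context"
	"database/sql"
)

// ExpirePastes marks pastes whose expiry time has passed. This is a sketch of
// the query the Delete API hit by the Kubernetes CronJob could run.
func ExpirePastes(ctx context.Context, db *sql.DB) (int64, error) {
	res, err := db.ExecContext(ctx, `
		UPDATE public.paste
		SET status = 'expired'
		WHERE status = 'active' AND expired_at < now()`)
	if err != nil {
		return 0, err
	}
	return res.RowsAffected()
}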

Even though the implementation above works, we can do better. We will scale the current design in an attempt to increase its availability while maintaining consistency.

Deep dive

HLD pastely v2

To enhance the performance there are a few steps we can take. First, we add a cache to the Read API. For our use case, we can implement a cache-aside strategy that only caches pastes that are fetched often.
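Here is a minimal sketch of that cache-aside read, assuming a generic key-value cache interface (for example backed by Redis) and reusing the hypothetical ReadUsecase from the read-flow sketch above; the key prefix and TTL are illustrative,

import (
	"context"
	"time"
)

// Cache is an assumed key-value interface; a Redis client would satisfy it
// with a thin wrapper.
type Cache interface {
	Get(ctx context.Context, key string) ([]byte, bool)
	Set(ctx context.Context, key string, value []byte, ttl time.Duration)
}

// GetContent implements cache-aside: try the cache first, fall back to the
// normal read path on a miss, then populate the cache for the next readers.
func GetContent(ctx context.Context, c Cache, read *ReadUsecase, shortlink string) ([]byte, error) {
	if content, ok := c.Get(ctx, "paste:"+shortlink); ok {
		return content, nil // cache hit: skip the database and object storage
	}

	// Cache miss: load from the database and object storage as before.
	content, err := read.Read(ctx, shortlink)
	if err != nil {
		return nil, err
	}

	// Keep the cache TTL no longer than the paste's own 5-minute expiration.
	c.Set(ctx, "paste:"+shortlink, content, 5*time.Minute)
	return content, nil
}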

Second, we add a primary-replica (master-replica) setup to our main database. Since we deployed our service and its dependencies on Kubernetes, we can use the Postgres Operator to maintain availability and consistency [4]. The Postgres Operator handles load balancing, connection pooling, and the replication stream for our main database.

To enhance the performance even further, we can utilize pre-signed URLs [5]. Rather than having the paste.ly service upload or retrieve files directly from object storage, the service provides a secured link that the client uses to download or upload the file. Pre-signed URLs also let us handle large files; the downside of this enhancement is that we need extra handling when we later want to put a CDN in front of the storage.
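Here is a sketch of generating those pre-signed links with the MinIO Go client (minio-go v7, which speaks the S3 API); the bucket name and expiry durations are assumptions,

import (
	"context"
	"net/url"
	"time"

	"github.com/minio/minio-go/v7"
)

// PresignPasteURLs returns short-lived links the client can use to upload and
// download the paste content directly, bypassing the paste.ly service.
func PresignPasteURLs(ctx context.Context, mc *minio.Client, objectKey string) (uploadURL, downloadURL *url.URL, err error) {
	const bucket = "pastely" // assumed bucket name

	// Pre-signed PUT: the client uploads the content straight to object storage.
	uploadURL, err = mc.PresignedPutObject(ctx, bucket, objectKey, 10*time.Minute)
	if err != nil {
		return nil, nil, err
	}

	// Pre-signed GET: the client downloads the content straight from object storage.
	downloadURL, err = mc.PresignedGetObject(ctx, bucket, objectKey, 5*time.Minute, url.Values{})
	if err != nil {
		return nil, nil, err
	}
	return uploadURL, downloadURL, nil
}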

With this implementation, we’ve met our availability requirements for the v2 design.

The load test was executed on my local machine with the specifications below,

OS: MacOS
Chip: Apple M2
Memory: 8GB
Storage: SSD

Conclusion

To handle 20 reads per second, we need to guard against traffic spikes on a specific popular paste. The cache handles most of this, and combined with reads from the SQL replica it should absorb cache misses as long as replica lag does not become a problem. To handle 2 writes per second, a single SQL primary (master) should suffice. For object storage, AWS S3 can comfortably handle 25 GB of traffic per month, and there are options to enhance performance further, such as a CDN with CloudFront, or S3 Transfer Acceleration if we need even faster uploads and downloads over long distances, which of course costs more [7].

I hope this article can give you a little bit of insight into developing similar use cases. Thank you and have a nice day!

Reference

  1. https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html
  2. https://github.com/kadekchresna/pastely
  3. https://stackoverflow.com/a/1856809/6953158
  4. https://postgres-operator.readthedocs.io/en/stable/
  5. https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-presigned-url.html
  6. https://stackoverflow.com/questions/742013/how-do-i-create-a-url-shortener
  7. https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance-design-patterns.html#optimizing-performance-acceleration
