Multipart uploads to Cloudflare R2 + Workers
This tutorial will show you how to implement a multipart upload to a Cloudflare R2 bucket using Cloudflare Workers and the AWS SDK. You will find this useful if you need to upload files larger than 5GB to Cloudflare R2.
Github repo with the code for this tutorial can be found here
- Configure R2 bucket
- Setup a basic R2 worker
- Server side logic for multipart upload
- Client side logic for multipart upload
- Testing it out
Configure your R2 bucket⌗
Before writing any application logic we first need to set up a bucket in Cloudflare R2.
Create a bucket⌗
- Navigate to the Cloudflare dashboard and click on R2 in the left hand side bar.
- On the overview page click on Create Bucket.
- Give the bucket a name like example and click Create Bucket.
Create API credentials⌗
Next we need to create an API token to access the bucket. If you go back to the overview tab you should see your new bucket, and on the right a Manage R2 API Tokens button.
Click Manage R2 API Tokens > Create API Token. Then in the form give your token a name like example token and set the permissions to Admin Read & Write. This is only for demonstration purposes and you should do your own research on what permissions you need. Scroll down and click Create API Token.
This will take you to a page with your API token values. For this tutorial you only need the Access Key ID, the Secret Access Key and the url under Use jurisdiction-specific endpoints for S3 clients, although it is wise to save the Token value if you plan on using it later.
Create R2 CORS policy⌗
Next we need to add a CORS policy to the bucket. This will allow you to serve the files from the bucket later on.
- Navigate to your bucket in the R2 dashboard.
- Select the Settings tab.
- Scroll down to the CORS Policy section and click Add CORS Policy.
- Paste the following JSON into the text box and save.
[
  {
    "AllowedOrigins": [
      "*"
    ],
    "AllowedMethods": [
      "GET",
      "PUT",
      "POST",
      "DELETE"
    ],
    "AllowedHeaders": [
      "*"
    ]
  }
]
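The wildcard origin is convenient for a tutorial. If you later serve the upload page from a known domain, you would likely tighten the policy; a minimal sketch, assuming a hypothetical site at https://app.example.com:
[
  {
    "AllowedOrigins": [
      "https://app.example.com"
    ],
    "AllowedMethods": [
      "GET",
      "PUT",
      "POST"
    ],
    "AllowedHeaders": [
      "*"
    ]
  }
]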
Setup a basic R2 worker⌗
Make a folder called example-r2-multipart-upload and install wrangler and the AWS SDK S3 client with the following commands
mkdir example-r2-multipart-upload
cd example-r2-multipart-upload
npm init -y
npm install wrangler --save-dev
npm install @aws-sdk/client-s3
Authenticate wrangler with cloudflare with the following command. This should open a browser window.
npx wrangler login
After that verify that you are logged in with the following command. It should print your account email and id to the console.
npx wrangler whoami
Next create a folder called src with an index.js file inside it.
mkdir src
touch src/index.js
Add the following code to src/index.js
export default {
  async fetch(request, env, ctx) {
    return new Response('Hello World!');
  },
};
Next create a wrangler.toml
file with the following contents
# must be the same as the folder name of the project
name = "example-r2-multipart-upload"
main = "src/index.js"
# use current date
compatibility_date = "2024-05-07"
compatibility_flags = ["nodejs_compat"]
[vars]
R2_ACCESS_KEY_ID = "<your access key id>"
R2_SECRET_ACCESS_KEY = "<your secret access key>"
# under "jurisdiction-specific endpoints for S3 clients" when creating the api key
CF_ENDPOINT = "<your endpoint url>"
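As an aside, putting credentials under [vars] keeps this tutorial simple, but it stores them in plain text in wrangler.toml. For anything beyond a demo you would likely use Wrangler secrets instead; a minimal sketch, using the same variable names the worker reads from env (for local development the values can also go in a .dev.vars file):
npx wrangler secret put R2_ACCESS_KEY_ID
npx wrangler secret put R2_SECRET_ACCESS_KEY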
Your project file structure should look like this (excluding the node_modules folder).
.
├── package-lock.json
├── package.json
├── src
│ └── index.js
└── wrangler.toml
This is a basic hello world worker and to test it locally run the following command.
npx wrangler dev
This should start a local server at http://localhost:8787. If you navigate to that url you should see Hello World! printed.
Next we need to deploy the worker to cloudflare with the following command.
npx wrangler deploy
This should print the url of the deployed worker in the format https://example-r2-multipart-upload.username.workers.dev
If you open this in the browser it should look exactly like the local version.
If you navigate to the cloudflare dashboard you should see the worker deployed.
Server side logic for multipart upload⌗
Multipart uploads are a three part process on the server side.
- Initiating the upload
- Uploading the individual parts
- Finalizing the upload
Each of these operations maps to a specific aws-sdk command implemented at a url endpoint of our worker. I will go into detail about how to implement the upload from the client side after we have the worker logic in place.
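Before wiring these into Worker routes, here is a condensed sketch of the three aws-sdk calls in order. The bucket, key and chunk values are placeholders, and S3 is the client we create below:
// Condensed sketch of the three-step flow, without any routing
const { UploadId } = await S3.send(new CreateMultipartUploadCommand({ Bucket: "example", Key: "big-file.bin" }));
const { ETag } = await S3.send(new UploadPartCommand({ Bucket: "example", Key: "big-file.bin", UploadId: UploadId, PartNumber: 1, Body: chunk }));
await S3.send(new CompleteMultipartUploadCommand({ Bucket: "example", Key: "big-file.bin", UploadId: UploadId, MultipartUpload: { Parts: [{ PartNumber: 1, ETag: ETag }] } }));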
Setup the ‘router’ for the worker⌗
Add the following code to src/index.js.
import {
  S3Client,
  CreateMultipartUploadCommand,
  UploadPartCommand,
  CompleteMultipartUploadCommand
} from "@aws-sdk/client-s3";
export default {
  async fetch(request, env, ctx) {
    const S3 = new S3Client({
      region: "auto",
      endpoint: env.CF_ENDPOINT,
      credentials: {
        accessKeyId: env.R2_ACCESS_KEY_ID,
        secretAccessKey: env.R2_SECRET_ACCESS_KEY,
      },
    });
    const headers = {
      "Access-Control-Allow-Origin": "*",
      "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
      "Access-Control-Allow-Headers": "*"
    }
    const url = new URL(request.url);
    const base = url.pathname.split('/').slice(1)[0];
    if (request.method === 'GET') {
      if (base === 'getMultiPartUpload') {
        return getMultiPartUpload(S3, request, headers);
      }
    }
    if (request.method === 'POST') {
      if (base === 'uploadPart') {
        return uploadPart(S3, request, headers);
      }
      if (base === 'completeMultipartUpload') {
        return completeMultipartUpload(S3, request, headers);
      }
    }
    // Fall through: always return a Response so unknown routes don't error
    return new Response('Not found', { status: 404, headers: headers });
  }
};
Explanation of the above code
1
We import the necessary aws-sdk commands for the multipart upload and the S3Client constructor.
import {
  S3Client,
  CreateMultipartUploadCommand,
  UploadPartCommand,
  CompleteMultipartUploadCommand
} from "@aws-sdk/client-s3";
2
We create a new S3 client with the endpoint and credentials from our wrangler.toml file. env is an object that contains the values under [vars] in the wrangler.toml file.
const S3 = new S3Client({
  region: "auto",
  endpoint: env.CF_ENDPOINT,
  credentials: {
    accessKeyId: env.R2_ACCESS_KEY_ID,
    secretAccessKey: env.R2_SECRET_ACCESS_KEY,
  },
});
3
We define the headers that will be returned with each response. Without these headers the browser will get a CORS error.
const headers = {
  "Access-Control-Allow-Origin": "*",
  "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
  "Access-Control-Allow-Headers": "*"
}
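Note that the requests made later in this tutorial (a plain GET, a FormData POST and a text-body POST) are "simple" requests, so the browser never sends a CORS preflight. If you ever add custom headers or an explicit application/json Content-Type on the client, the browser will first send an OPTIONS request, and the router would need something like this minimal sketch near the top of fetch:
if (request.method === 'OPTIONS') {
  // Answer CORS preflight requests with the same headers and an empty body
  return new Response(null, { status: 204, headers: headers });
}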
4
We parse the url and get the first part of the path. This will
serve as a router for the worker.
const url = new URL(request.url);
const base = url.pathname.split('/').slice(1)[0];
5
We check the method and the path of the request and call the appropriate function: either getMultiPartUpload, uploadPart or completeMultipartUpload. The specific parameters for each function will be stored either in the request body or the url query parameters, so we just need to pass the S3, request and headers to each function.
if (request.method === 'GET') {
  if (base === 'getMultiPartUpload') {
    return getMultiPartUpload(S3, request, headers);
  }
}
if (request.method === 'POST') {
  if (base === 'uploadPart') {
    return uploadPart(S3, request, headers);
  }
  if (base === 'completeMultipartUpload') {
    return completeMultipartUpload(S3, request, headers);
  }
}
Implement the getMultiPartUpload function⌗
In this function we parse out the bucket and key from the url query parameters and call the CreateMultipartUploadCommand.
async function getMultiPartUpload(S3, request, headers) {
  try {
    const url = new URL(request.url);
    const params = url.searchParams;
    const bucket = params.get('bucket');
    const key = params.get('key');
    const command = new CreateMultipartUploadCommand({
      Bucket: bucket,
      Key: key
    });
    const response = await S3.send(command);
    return new Response(JSON.stringify({
      msg: 'Success: /getMultiPartUpload',
      response: response
    }), {
      status: 200,
      headers: headers
    });
  } catch (err) {
    return new Response(JSON.stringify({
      msg: 'Error: /getMultiPartUpload',
      error: err
    }), {
      status: 500,
      headers: headers
    });
  }
}
This function should return a response with the UploadId that we will store on the client side and pass to all requests to uploadPart and completeMultipartUpload.
{
  "Bucket": "examplebucket",
  "Key": "largeobject",
  "UploadId": "ibZBv_75gd9r8lH_gqXatLdxMVpAlj6ZQjEs.OwyF3953YdwbcQnMA2BLGn8Lx12fQNICtMw5KyteFeHw.Sjng--"
}
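With npx wrangler dev running you can sanity-check this endpoint before writing any client code; a rough example using curl, where the bucket and key values are just placeholders:
curl "http://localhost:8787/getMultiPartUpload?bucket=example&key=test.bin"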
Implement the uploadPart function⌗
In this function we parse out the bucket, key, partNumber and uploadId from the url query parameters and the file from the request body. We then call the UploadPartCommand.
It should return a response with the ETag of the uploaded part. We will store this on the client side so we can pass it to the completeMultipartUpload function, which will finalize the upload.
{
  "ETag": "\"d8c2eafd90c266e19ab9dcacc479f8af\""
}
Add this function to src/index.js
async function uploadPart(S3, request, headers) {
  try {
    const url = new URL(request.url);
    const params = url.searchParams;
    const bucket = params.get('bucket');
    const key = params.get('key');
    const partNumber = params.get('partNumber');
    const uploadId = params.get('uploadId');
    const formData = await request.formData();
    const fileData = formData.get('file');
    const input = {
      "Body": fileData,
      "Bucket": bucket,
      "Key": key,
      "PartNumber": partNumber,
      "UploadId": uploadId
    };
    const command = new UploadPartCommand(input);
    const response = await S3.send(command);
    return new Response(JSON.stringify({
      msg: 'Success: /uploadPart',
      response: response
    }), {
      status: 200,
      headers: headers
    });
  } catch (err) {
    return new Response(JSON.stringify({
      msg: 'Error: /uploadPart',
      error: err
    }), {
      status: 500,
      headers: headers
    });
  }
}
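As with the previous endpoint, you can poke at this one with curl while wrangler dev is running; a rough example, assuming chunk1.bin is a local file and <uploadId> is the value returned by /getMultiPartUpload:
curl -X POST \
  -F "file=@chunk1.bin" \
  "http://localhost:8787/uploadPart?bucket=example&key=test.bin&partNumber=1&uploadId=<uploadId>"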
Implement the completeMultipartUpload function⌗
Finally, in this function we parse out the bucket, key and uploadId from the query parameters and the parts from the request body. We then call the CompleteMultipartUploadCommand.
async function completeMultipartUpload(S3, request, headers) {
  try {
    const url = new URL(request.url);
    const params = url.searchParams;
    const bucket = params.get('bucket');
    const key = params.get('key');
    const uploadId = params.get('uploadId');
    const partsData = await request.json();
    const parts = partsData.parts;
    const input = {
      "Bucket": bucket,
      "Key": key,
      "UploadId": uploadId,
      "MultipartUpload": {
        "Parts": parts
      }
    }
    const command = new CompleteMultipartUploadCommand(input);
    const response = await S3.send(command);
    return new Response(JSON.stringify({
      msg: 'Success: /completeMultipartUpload',
      response: response
    }), {
      status: 200,
      headers: headers
    });
  } catch (err) {
    return new Response(JSON.stringify({
      msg: 'Error: /completeMultipartUpload',
      error: JSON.stringify(err)
    }), {
      status: 500,
      headers: headers
    });
  }
}
This function should return a response with the Bucket, ETag, Key and Location of the uploaded object. You can use the Location to serve the file from the bucket, although we did not cover that in this tutorial so the Location will not be a valid url.
{
  "Bucket": "<your_bucket>",
  "ETag": "\"4d9031c7644d8081c2829f4ea23c55f7-2\"",
  "Key": "<object_key>",
  "Location": "<base_cloudflare_endpoint>/<your_bucket>/<object_key>"
}
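The final endpoint can be exercised the same way; a rough example, with placeholder uploadId and ETag values taken from the earlier responses:
curl -X POST \
  -d '{"parts":[{"PartNumber":1,"ETag":"\"d8c2eafd90c266e19ab9dcacc479f8af\""}]}' \
  "http://localhost:8787/completeMultipartUpload?bucket=example&key=test.bin&uploadId=<uploadId>"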
Client side logic for multipart upload⌗
The client side logic for a multipart upload is also a three part process.
- Initiate the upload with a post request to the worker
- Split the file into parts and upload each part with a post request
- Finalize the upload with a post request to the worker
This is a basic html file with a form and an embedded javascript function to handle the multipart upload. There's no styling, error handling or progress bar; you can add that later.
Create a new html file called index.html in the root of the project and add the following code.
<html>
  <body>
    <div>
      <input type="file" id="fileUpload" />
      <button id="uploadButton" onclick="uploadFile()">Upload</button>
    </div>
    <script>
      async function uploadFile() {
        // 1 configure the base url and bucket name
        const BASE_CF_URL = 'http://localhost:8787';
        const bucketName = 'example';
        const file = document.getElementById('fileUpload').files[0];
        const key = `${file.name}`;
        const CHUNK_SIZE = 5 * 1024 * 1024; // 5MB
        const totalParts = Math.ceil(file.size / CHUNK_SIZE);
        // 2 initiate the upload
        const getMultiPartUploadUrl = new URL(`${BASE_CF_URL}/getMultiPartUpload`);
        getMultiPartUploadUrl.searchParams.append('bucket', bucketName);
        getMultiPartUploadUrl.searchParams.append('key', key);
        const uploadIdResponse = await fetch(getMultiPartUploadUrl, {
          method: 'GET'
        });
        const multiPartUploadJson = await uploadIdResponse.json();
        console.log('multiPartUploadJson', multiPartUploadJson);
        const uploadId = multiPartUploadJson.response.UploadId;
        console.log('uploadId', uploadId);
        // 3 upload the parts
        const partsData = [];
        const uploadPartUrl = new URL(`${BASE_CF_URL}/uploadPart`);
        uploadPartUrl.searchParams.append('bucket', bucketName);
        uploadPartUrl.searchParams.append('key', key);
        uploadPartUrl.searchParams.append('uploadId', uploadId);
        for (let i = 0; i < totalParts; i++) {
          const start = CHUNK_SIZE * i;
          const end = Math.min(file.size, start + CHUNK_SIZE);
          const blob = file.slice(start, end);
          const partNumber = i + 1;
          const formData = new FormData();
          formData.append('file', blob);
          uploadPartUrl.searchParams.set('partNumber', partNumber);
          const uploadPartResponse = await fetch(uploadPartUrl, {
            method: 'POST',
            body: formData
          });
          const uploadPartJson = await uploadPartResponse.json();
          const eTag = uploadPartJson.response.ETag;
          partsData.push({ PartNumber: partNumber, ETag: eTag });
          console.log(`${partNumber}/${totalParts}: Uploaded part ${partNumber} with eTag ${eTag}`);
        }
        // 4 finalize the upload
        const completeUploadUrl = new URL(`${BASE_CF_URL}/completeMultipartUpload`);
        completeUploadUrl.searchParams.append('bucket', bucketName);
        completeUploadUrl.searchParams.append('key', key);
        completeUploadUrl.searchParams.append('uploadId', uploadId);
        console.log('partsData', partsData);
        const completeUploadResponse = await fetch(completeUploadUrl, {
          method: 'POST',
          body: JSON.stringify({ parts: partsData })
        });
        console.log(await completeUploadResponse.json());
      }
    </script>
  </body>
</html>
Explanation of the above uploadFile function
1.1
We define the base url of the Cloudflare worker. This should be the url that was printed when you deployed the worker. You can also use the local url if you are testing locally. You also need to define the bucketName and the key of the file you are uploading.
1.2
We also define the chunk size of the file to be uploaded. This needs to be relatively small to keep the worker cpu time less than 7ms. If the worker takes longer than 7ms you will get a Worker exceeded CPU time limit error.
1.3
We calculate the total number of parts that the file will be split into. The maximum number of parts is 10000, so you should factor that into your chunk size (a helper for this is sketched after the snippet below).
const BASE_CF_URL = 'http://localhost:8787';
const bucketName = 'example';
const file = document.getElementById('fileUpload').files[0];
const key = `${file.name}`;
const CHUNK_SIZE = 5 * 1024 * 1024; // 5MB
const totalParts = Math.ceil(file.size / CHUNK_SIZE);
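To make the 10000-part limit concrete, here is a small hypothetical helper (not part of the tutorial code) that grows the chunk size for very large files while keeping the 5MB size used above as a floor:
const MIN_CHUNK_SIZE = 5 * 1024 * 1024; // the 5MB chunk size used in this tutorial
const MAX_PARTS = 10000;                // maximum number of parts in a multipart upload

function pickChunkSize(fileSize) {
  // smallest chunk size that keeps the part count at or under MAX_PARTS
  const needed = Math.ceil(fileSize / MAX_PARTS);
  return Math.max(MIN_CHUNK_SIZE, needed);
}

// e.g. a 100GB file needs roughly 10.7MB parts to stay under 10000 parts
const CHUNK_SIZE = pickChunkSize(file.size);
const totalParts = Math.ceil(file.size / CHUNK_SIZE);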
2
We initiate the upload with a GET request to the worker. We pass the bucket and key as query parameters. The worker will return a response with the uploadId that we will use for the rest of the upload process.
const getMultiPartUploadUrl = new URL(`${BASE_CF_URL}/getMultiPartUpload`);
getMultiPartUploadUrl.searchParams.append('bucket', bucketName);
getMultiPartUploadUrl.searchParams.append('key', key);
const uploadIdResponse = await fetch(getMultiPartUploadUrl, {
  method: 'GET'
});
const multiPartUploadJson = await uploadIdResponse.json();
console.log('multiPartUploadJson', multiPartUploadJson);
const uploadId = multiPartUploadJson.response.UploadId;
console.log('uploadId', uploadId);
3
We upload the parts of the file. We loop through the file, slice it into chunks and upload each chunk with a POST request to the worker. Each chunk is stored in a FormData object and passed as the body of the request. The worker will return the ETag of the uploaded part. We store the ETag and the partNumber in an array partsData for the final request to completeMultipartUpload.
const partsData = [];
const uploadPartUrl = new URL(`${BASE_CF_URL}/uploadPart`);
uploadPartUrl.searchParams.append('bucket', bucketName);
uploadPartUrl.searchParams.append('key', key);
uploadPartUrl.searchParams.append('uploadId', uploadId);
for (let i = 0; i < totalParts; i++) {
  const start = CHUNK_SIZE * i;
  const end = Math.min(file.size, start + CHUNK_SIZE);
  const blob = file.slice(start, end);
  const partNumber = i + 1;
  const formData = new FormData();
  formData.append('file', blob);
  uploadPartUrl.searchParams.set('partNumber', partNumber);
  const uploadPartResponse = await fetch(uploadPartUrl, {
    method: 'POST',
    body: formData
  });
  const uploadPartJson = await uploadPartResponse.json();
  const eTag = uploadPartJson.response.ETag;
  partsData.push({ PartNumber: partNumber, ETag: eTag });
  console.log(`${partNumber}/${totalParts}: Uploaded part ${partNumber} with eTag ${eTag}`);
}
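The loop above uploads one part at a time, which is simple but slow for big files. Here is a rough sketch (not part of the tutorial code) of uploading the parts concurrently instead; each part builds its own URL object, and in practice you may want to cap concurrency rather than firing every part at once:
// Upload all parts concurrently and collect { PartNumber, ETag } in order
const partsData = await Promise.all(
  Array.from({ length: totalParts }, async (_, i) => {
    const partNumber = i + 1;
    const blob = file.slice(CHUNK_SIZE * i, Math.min(file.size, CHUNK_SIZE * (i + 1)));
    const partUrl = new URL(`${BASE_CF_URL}/uploadPart`);
    partUrl.searchParams.set('bucket', bucketName);
    partUrl.searchParams.set('key', key);
    partUrl.searchParams.set('uploadId', uploadId);
    partUrl.searchParams.set('partNumber', partNumber);
    const formData = new FormData();
    formData.append('file', blob);
    const res = await fetch(partUrl, { method: 'POST', body: formData });
    const json = await res.json();
    return { PartNumber: partNumber, ETag: json.response.ETag };
  })
);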
4
Finally we finalize the upload with a POST request to the worker. We pass the bucket, key, uploadId and the partsData array as a json object in the body of the request. The worker will return a response with the Bucket, ETag, Key and Location of the uploaded object.
const completeUploadUrl = new URL(`${BASE_CF_URL}/completeMultipartUpload`);
completeUploadUrl.searchParams.append('bucket', bucketName);
completeUploadUrl.searchParams.append('key', key);
completeUploadUrl.searchParams.append('uploadId', uploadId);
console.log('partsData', partsData);
const completeUploadResponse = await fetch(completeUploadUrl, {
  method: 'POST',
  body: JSON.stringify({ parts: partsData })
});
console.log(await completeUploadResponse.json());
Testing it out⌗
To test this out just open index.html in a browser and select a file to upload.
It will also help to have the browser console open to see the progress of the upload.
If everything works correctly you should be able to see the uploaded object in the cloudflare dashboard.
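You can also check the object from the command line; wrangler has r2 object subcommands, and something along these lines should download the file (the exact syntax may vary by wrangler version, so check npx wrangler r2 object get --help):
npx wrangler r2 object get example/<object_key> --file downloaded.bin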
Github repo with the code for this tutorial can be found here.
If you found this helpful please consider donating to my BuyMeACoffee