Introduction
Uploading files to remote servers is an essential operation for many applications. For a developer tool like Appwrite, it is crucial to handle file uploads efficiently, so that developers can use them in their applications without worrying about efficiency and reliability. Managing file uploads becomes challenging mainly when dealing with huge files. We need to think about efficient resource usage on both the client and the server, as well as a reliable upload process. This article discusses what we learned from adding support for huge file uploads in Appwrite’s 0.13 release. So let’s get started.
Appwrite is an open source backend-as-a-service that abstracts all the complexity involved in building a modern application by providing you with a set of REST APIs for your core backend needs. Appwrite handles user authentication and authorization, databases, file storage, cloud functions, webhooks, and much more! If anything is missing, you can extend Appwrite using your favorite backend language.
Overview
First, let us understand the basic process of uploading a file over HTTP from the client and server perspectives. To upload a file from the client, we first load the file into memory and send it to the server in an HTTP request. The server receives the file and saves it in a temporary location, after which we can move it to the appropriate location. The problem with this process is that the request keeps the server busy until the file is completely uploaded. If we want to handle huge files, say gigabytes in size, we need correspondingly more memory and CPU to handle them, and the server takes much longer to respond to requests. If the upload fails partway through, the client has to restart the upload from the beginning, which wastes time and resources.
Also, imagine a server where hundreds of users are trying to upload gigabytes of files simultaneously. The server would need considerable resources to handle those requests, and if its resources are limited, many requests will have to wait a long time before the server can respond to them. So this method is suitable only for reasonably small files. Now that we know the challenges of uploading files, the next section discusses the options available for handling them.
Handling File Uploads
While researching ways to handle file uploads to web servers, we found two main approaches. One is the basic upload process we discussed in the previous section. With this approach, it is recommended to set a maximum upload size of around 20-30 MB to keep things simple and effective. But for us, that was not an option: for a developer tool, such a limit would restrict our users and make Appwrite unsuitable for various applications.
The other approach we found was the chunked upload process. It is efficient, less resource-intensive, and supports resuming an upload after a failure. This meant we could use it for small as well as huge files, and it is what we decided to implement in Appwrite 0.13 to empower our developers. In the next section, we will look into how this method works and why it is efficient.
Chunking and Content Range
Up to a certain file size, it is okay to use the simple file upload approach. However, as the file size grows, it is recommended to use chunked upload. So how do we upload files using chunks? In this method, we don’t send the whole file in a single request; instead, we split it into chunks (parts of the file) and send each chunk in its own request. The benefit is that the server doesn’t have to receive the whole file at once but rather receives small chunks, which keeps both the upload process and the server responsive. Also, if some chunks fail, we can re-upload only those chunks; we don’t have to upload the whole file again. This makes uploading large files efficient, even over unstable connections. It also opens up the possibility of uploading multiple chunks in parallel, making the upload much faster if the network bandwidth allows. While sending each chunk, we set the associated Content-Range header. The server uses the Content-Range header to determine how much of the file has been uploaded and which part of the file the chunk belongs to. The Content-Range header is defined in the HTTP specification, with the following syntax:
Content-Range: <unit> <range-start>-<range-end>/<size>
Content-Range: <unit> <range-start>-<range-end>/*
Content-Range: <unit> */<size>
We need to set the unit, which is generally bytes. The range-start is the start offset of this chunk, the range-end is its end offset, and the size is the total size of the file. For the bytes unit, both offsets are zero-based and inclusive, so the first 100 bytes of a file are the range 0-99. The values of range-start, range-end, and size must be expressed in the given unit. An asterisk stands for a value that is unknown or not applicable; for chunked uploads, we use the first form, with all three values present. These headers in each chunk tell the server exactly how to combine the chunks to get the original file. Now that we know how the chunked upload process works, let’s see how we can make the upload process resumable in the next section.
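To make the header concrete, here is a minimal sketch of how a client might compute the Content-Range value for each chunk and how a server might parse it back. The names chunk_ranges, content_range_header, and parse_content_range are hypothetical helpers written for this illustration; they are not part of Appwrite or of any HTTP library, and the chunk size is an arbitrary assumption.

```python
import re
from typing import Iterator, NamedTuple


class ChunkRange(NamedTuple):
    start: int  # offset of the first byte of the chunk (inclusive)
    end: int    # offset of the last byte of the chunk (inclusive)
    size: int   # total size of the complete file, in bytes


def chunk_ranges(file_size: int, chunk_size: int) -> Iterator[ChunkRange]:
    """Yield the byte range covered by each chunk of a file (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, file_size, chunk_size):
        # The last chunk may be shorter than chunk_size.
        end = min(start + chunk_size, file_size) - 1
        yield ChunkRange(start, end, file_size)


def content_range_header(r: ChunkRange) -> str:
    """Format a range using the first syntax form, with bytes as the unit."""
    return f"bytes {r.start}-{r.end}/{r.size}"


_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+)$")


def parse_content_range(value: str) -> ChunkRange:
    """Parse a 'bytes start-end/size' header value, as a server might."""
    match = _CONTENT_RANGE.match(value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    start, end, size = (int(group) for group in match.groups())
    if start > end or end >= size:
        raise ValueError(f"inconsistent Content-Range: {value!r}")
    return ChunkRange(start, end, size)


if __name__ == "__main__":
    # A 10-byte file split into 4-byte chunks gives three chunks.
    for r in chunk_ranges(10, 4):
        print(content_range_header(r))
    # bytes 0-3/10
    # bytes 4-7/10
    # bytes 8-9/10
```

Because every header carries the total size along with the chunk's own offsets, the server can place each chunk at the right position even when chunks arrive out of order, and it knows the upload is complete once the received ranges cover the whole file.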
Resumable Upload
With chunked upload in place, it is relatively easy to implement resumable uploads. To make it happen, we track which chunks have been uploaded successfully and which have failed. Whenever the upload stops partway through, we have a record of all the successfully uploaded chunks, so next time we can restart the upload for the remaining chunks only. This makes it much easier to upload large files over an unstable connection, as we upload small chunks and retry only the failed and remaining ones as many times as needed.
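The bookkeeping described above can be sketched in a few lines. This is only an illustration under stated assumptions, not how Appwrite implements it: upload_resumable is a hypothetical function, send_chunk stands in for whatever performs the real HTTP request, the set of completed chunk indices is kept in memory (a real client would persist it), and a failed request is assumed to raise ConnectionError.

```python
from typing import Callable, Set


def upload_resumable(
    data: bytes,
    chunk_size: int,
    send_chunk: Callable[[bytes, str], None],  # hypothetical transport function
    done: Set[int],                            # indices of chunks already uploaded
    max_retries: int = 3,
) -> bool:
    """Upload only the chunks not yet recorded in `done`.

    Returns True once every chunk has been uploaded. Calling the function
    again with the same `done` set resumes where the last attempt stopped.
    """
    total = len(data)
    chunk_count = (total + chunk_size - 1) // chunk_size  # ceiling division
    for index in range(chunk_count):
        if index in done:
            continue  # uploaded in an earlier attempt, skip it
        start = index * chunk_size
        end = min(start + chunk_size, total) - 1  # inclusive end offset
        header = f"bytes {start}-{end}/{total}"
        for _ in range(max_retries):
            try:
                send_chunk(data[start:end + 1], header)
            except ConnectionError:
                continue  # assumed failure signal: retry this chunk
            else:
                done.add(index)  # record the success so we never resend it
                break
    return len(done) == chunk_count
```

If a call returns False, the caller can keep the done set and call the function again later; only the missing chunks are sent. Capping retries per chunk is a design choice made here so that one bad chunk does not block the rest of the file.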
Conclusion
Handling file uploads reliably and efficiently can be challenging. We saw in this article that there are two ways of uploading files: uploading the whole file in a single request, or uploading it in smaller chunks. We should use the first approach for smaller files, maybe up to 20-30 MB, and for anything larger than that, it's best to implement a chunked upload to make the upload process reliable and efficient. We hope this article will be helpful to anyone trying to build an upload service. However, if you are using Appwrite, you will not have to think about any of this. Appwrite handles everything internally to provide a reliable and efficient storage service for uploading and managing files.