Better File Uploads with Shrine: Direct Uploads

This is the 6th part of a series of blog posts about Shrine. The aim of this series is to show the advantages of using Shrine over other file attachment libraries.


So far we have been talking about the server side of handling file uploads. However, there is a lot we can also do on the client side to improve user experience and performance.

Let’s say we have a Photo model with an #image attachment attribute handled by an ImageUploader:

class Photo < Sequel::Model
  include ImageUploader::Attachment.new(:image)
end

class ImageUploader < Shrine
  # ...
end

The simplest file upload workflow is a vanilla form with a file field for selecting files, plus a hidden field for retaining uploaded files in case of validation errors.

# for retaining selected files across form redisplays
Shrine.plugin :cached_attachment_data

<form action="/photos" method="post" enctype="multipart/form-data">
  <input type="hidden" name="photo[image]" value="<%= photo.cached_image_data %>" class="attachment-field" />
  <input type="file" name="photo[image]" class="attachment-field" />

  <input type="submit" value="Submit" />
</form>

This alone provides a basic upload experience, but this static approach has some obvious limitations:

  • When the user submits the form with selected files, there is no indicator telling them when the upload will finish.

  • When the user is uploading multiple files at once and the request happens to get aborted, it’s not possible to keep the files that were uploaded so far, because all files are sent in a single request. In other words, multiple uploads are all-or-nothing.

  • Files are validated only after they have been uploaded, which means the user needs to wait until the upload finishes before they can know whether their file was even valid.

We can improve that by asynchronously starting to upload files on the client side as soon as they’re selected. This also gives users the ability to continue filling in other fields while files are being uploaded, because the UI isn’t blocked during the upload.

There are many popular JavaScript file upload libraries out there – jQuery-File-Upload, Dropzone.js, FineUploader etc. – but the one you should use with Shrine is definitely Uppy 🐶. Uppy is a modular library that knows how to upload files to a custom endpoint on your app, to Amazon S3, or even to a resumable endpoint, providing progress bars, drag & drop functionality, image previews, file validations etc., all while making as few assumptions as possible.

var uppy = Uppy.Core({ /* ... */ })
  .use(Uppy.FileInput,   { /* ... */ }) // adds a pretty file field
  .use(Uppy.ProgressBar, { /* ... */ }) // displays a progress bar
  .use(Uppy.Informer,    { /* ... */ }) // displays validation errors

// ...

Theory

The idea is to have a generic endpoint which accepts file uploads and saves the uploads to a temporary storage. The reason for uploading to a separate temporary storage is security; if we allowed users to upload directly to the primary storage, they would be able to flood it if they wanted to, as directly uploaded files don’t necessarily end up being attached to a record.

POST http://example.com/upload HTTP/1.1

[... file content ...]

On the client side we would asynchronously upload selected files to this endpoint, and then send only the information about the cached files on form submission. Once validation has passed and the record has been successfully saved, Shrine would automatically upload the cached file to permanent storage.

In fact, Shrine already handles attachments this way. When a file is attached, Shrine first uploads it to temporary storage, and then once the record has been successfully saved the cached file is uploaded to permanent storage.

photo.image = File.open("...") # file gets uploaded to temporary storage
photo.image # cached file
photo.save
photo.image # stored file

With direct uploads the only difference is that files are uploaded to temporary storage prior to attachment, and then the information about the cached file is assigned instead of an actual file, in which case Shrine skips the caching step.

photo.image = '{"id":"...","storage":"cache","metadata":{...}}'
photo.image # cached file that we assigned
photo.save
photo.image # stored file
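
Server-side validations still run at the moment the cached file data is assigned, so invalid or tampered submissions get caught regardless of what happened on the client. Here is a minimal sketch using Shrine’s validation_helpers plugin (the limits are just examples):

class ImageUploader < Shrine
  plugin :validation_helpers

  Attacher.validate do
    # example limits; adjust to your own requirements
    validate_max_size 10*1024*1024
    validate_mime_type_inclusion %w[image/jpeg image/png image/gif]
  end
end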

1. Simple upload

The simplest way we could enable direct uploads is to create an upload endpoint in our app. Shrine comes with an upload_endpoint plugin which allows you to create a Rack application that accepts file uploads and forwards them to the specified storage. The only thing we need to do is mount the app to our preferred path:

Shrine.plugin :upload_endpoint

Rails.application.routes.draw do
  mount ImageUploader.upload_endpoint(:cache) => "/images/upload"
end

The above gives our application a POST /images/upload endpoint which accepts the file multipart parameter and returns uploaded file data:

POST /images/upload HTTP/1.1
Content-Type: multipart/form-data

[... file content ...]

HTTP/1.1 200 OK
Content-Type: application/json

{
  "id": "70be82a657ba9ef892ef5182a1a18bde.jpg",
  "storage": "cache",
  "metadata": {
    "size": 3942146,
    "filename": "nature.jpg",
    "mime_type": "image/jpeg"
  }
}

Because Shrine’s upload endpoint is a pure Rack application, it can be run inside any Rack-based Ruby web framework (Sinatra, Roda, Cuba, Hanami, Grape etc.), not just Rails. The :cache argument given to upload_endpoint specifies that incoming files will be uploaded to the configured temporary storage.
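
For illustration, here is a minimal sketch of routing the same endpoint in a Roda app (the App class is hypothetical):

require "roda"

class App < Roda
  route do |r|
    # dispatch everything under /images/upload to Shrine's Rack app
    r.on "images/upload" do
      r.run ImageUploader.upload_endpoint(:cache)
    end
  end
end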

Client side

On the client side we just need to add the XHRUpload Uppy plugin and point it to this endpoint.

// ... other plugins ...

uppy.use(Uppy.XHRUpload, {
  endpoint: "/images/upload",
  fieldName: "file",
  headers: { 'X-CSRF-Token': document.querySelector('meta[name=_csrf]').content }
})

uppy.run()

uppy.on('upload-success', function (fileId, data) {
  var uploadedFileData = JSON.stringify(data)

  var hiddenField = document.querySelector('.attachment-field[type=hidden]')
  hiddenField.value = uploadedFileData
})

Notice that the response of Shrine’s upload endpoint already contains the uploaded file data in a format that can be directly assigned, so the only thing left to do is convert it to JSON and write it to the hidden attachment field to be submitted as the attachment.
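
On the server the submitted value is then assigned just like a raw file would be; a minimal sketch of the controller side, assuming a plain params hash:

photo = Photo.new
photo.image = params["photo"]["image"] # the JSON data from the hidden field

if photo.valid?
  photo.save # the cached file gets uploaded to permanent storage
else
  # re-render the form; the hidden field retains the cached file data
end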

2. Amazon S3

Uploading files to our app isn’t always the most suitable option. In most cases we don’t want to store uploaded files on disk, but rather on a cloud service like Amazon S3, Google Cloud Storage or Microsoft Azure Storage, especially if our app is running on Heroku or on multiple servers. In that case, instead of uploading files to our app and then to the cloud service, it’s more performant to skip the app and upload directly to the cloud.

First we tell Shrine that we’ll be using S3 both for temporary and permanent storage, but specify separate directories:

# Gemfile
gem "aws-sdk-s3", "~> 1.2"
require "shrine/storage/s3"

s3_options = {
  bucket:            "<your bucket>",
  region:            "<your region>",
  access_key_id:     "<your access key>",
  secret_access_key: "<your secret key>",
}

Shrine.storages = {
  cache: Shrine::Storage::S3.new(prefix: "cache", **s3_options),
  store: Shrine::Storage::S3.new(prefix: "store", **s3_options),
}

The client side flow mostly stays the same, except that the browser now needs to ask the server for the upload URL and request parameters on each upload. Shrine has the presign_endpoint plugin which provides a Rack application that generates direct upload parameters, which we can mount in our application:

Shrine.plugin :presign_endpoint

Rails.application.routes.draw do
  mount Shrine.presign_endpoint(:cache) => "/presign"
end

Our application has now gained a GET /presign endpoint which will return the URL and parameters for a direct upload:

GET /presign HTTP/1.1

HTTP/1.1 200 OK
Content-Type: application/json

{
  "url": "https://your-bucket.s3.us-east-1.amazonaws.com",
  "fields": {
    "key": "cache/df65ee371b42b87463b1840d69331692.jpg",
    "policy": "eyJleHBpcmF0aW9uIjoiMjFrs0wMS0wM1QxNzo0MjoxNloiLCJjb25kaXRpb25zIjpbeyJidWNrZXQiOiJzaHJpbmUtdGVzdGluZy0yIn0seyJrZXkiOiJjYWNoZS9kZjY1ZWUzNzFiNDJiODc0NjNiMTg0MGQ2OTMzMTY5Mi5qcGcifSx7IngtYW16LWNyZWRlbnRpYWwiOiJBS0lBSU1ESDJIVFNCM1JLQjRXUS8yMDE4MDEwMy9ldS1jZW50cmFsLTEvczMvYXdzNF9yZXF1ZXN0In0seyJ4LWFtei1hbGdvcml0aG0iOiJBV1M0LUhNQUMtU0hBMjU2In0seyJ4LWFtei1kYXRlIjoiMjAxODAxMDNUMTY0MjE2WiJ9XX0=",
    "x-amz-credential": "AKIAI8FLFDSB3RKB4WQ/20180103/eu-central-1/s3/aws4_request",
    "x-amz-algorithm": "AWS4-HMAC-SHA256",
    "x-amz-date": "20180103T164216Z",
    "x-amz-signature": "6003f73624724fd2e116620ddc77f1073b434c677ddf7070a67445016c62a263"
  }
}
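
The presign endpoint can also restrict what users are allowed to upload directly, which helps with the flooding concern mentioned earlier. Here is a minimal sketch, assuming the plugin’s :presign_options option forwards these values to the S3 storage’s presign call:

Shrine.plugin :presign_endpoint, presign_options: -> (request) {
  # limit directly-uploaded files to 10 MB (example value)
  { content_length_range: 0..(10*1024*1024) }
}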

Client side

On the client side we have the AwsS3 Uppy plugin (instead of XHRUpload). The AwsS3 plugin requires that we fetch the upload parameters ourselves; we can use window.fetch to make a request to our presign endpoint, which already happens to return data in the format that Uppy expects:

// ... other plugins ...

uppy.use(Uppy.AwsS3, {
  getUploadParameters: function (file) {
    return fetch('/presign?filename=' + file.name)
      .then(function (response) { return response.json() })
  }
})

uppy.run()

uppy.on('upload-success', function (fileId, data) {
  var file = uppy.getFile(fileId)

  var uploadedFileData = JSON.stringify({
    id: file.meta['key'].match(/^cache\/(.+)/)[1], // remove the Shrine storage prefix
    storage: 'cache',
    metadata: {
      size:      file.size,
      filename:  file.name,
      mime_type: file.type,
    }
  })

  // ...
})

You’ll notice that, unlike our simple upload endpoint which generated the uploaded file data for us, in the S3 case we need to construct the uploaded file data ourselves on the client side.

Fetching direct upload parameters dynamically like this is much more flexible than creating a static S3 upload form on page render, which is the approach that the CarrierWave and Paperclip ecosystems seem to prefer. A static S3 form won’t work with multiple uploads, as S3 requires a new set of upload parameters for each file.

Finally, this approach is not specific to Amazon S3; you can use it with any service that supports direct uploads, such as Google Cloud Storage, Cloudinary, Transloadit and others.

3. Resumable upload

For most use cases, direct upload to a custom endpoint or a cloud service should be everything you need. This is because the majority of applications are dealing only with images, documents and other small files. If your application happens to deal with large files such as videos, things get a bit more interesting.

If you’ve ever used a service where you needed to upload 500MB, 1GB or 5GB files, you know how frustrating it is when your upload is 80% complete and then fails, because you briefly lost your internet connection, had to change locations, or your browser/OS crashed. With a slow and/or flaky internet connection it might not even be possible to upload larger files, because every time the upload fails it has to be retried from the beginning.

Tus.io is an open protocol for resumable file uploads built on HTTP. It specifies how the client and the server should communicate and behave during a file upload so that the upload can be automatically resumed if a request fails. Try their demo to see this in action.
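
Roughly, the exchange looks like this (simplified, with illustrative URLs and sizes): the client creates an upload resource, streams the content in one or more PATCH requests, and after an interruption asks the server how many bytes it already has so it can resume from that offset.

POST /files HTTP/1.1
Tus-Resumable: 1.0.0
Upload-Length: 1073741824

HTTP/1.1 201 Created
Location: http://localhost:9000/files/24e533e0

PATCH /files/24e533e0 HTTP/1.1
Tus-Resumable: 1.0.0
Content-Type: application/offset+octet-stream
Upload-Offset: 0

[... file content ...]

HTTP/1.1 204 No Content
Upload-Offset: 104857600

HEAD /files/24e533e0 HTTP/1.1
Tus-Resumable: 1.0.0

HTTP/1.1 200 OK
Upload-Offset: 104857600
Upload-Length: 1073741824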

There are many server implementations of the tus protocol out there for various languages; in our case we’re interested in tus-ruby-server.

tus-ruby-server + goliath-rack_proxy + shrine-tus

Tus-ruby-server is implemented using the Roda web framework, and can be run inside your application or standalone, though for best performance it should be run standalone on Goliath, as Goliath is the only Ruby web server that supports streaming uploads and downloads without sacrificing throughput (thanks to EventMachine).

Now, Goliath is intended as both a web server and a web framework; it doesn’t know how to run Rack applications on its own, so I’ve written goliath-rack_proxy, which makes it possible to use Goliath purely as a web server for Rack applications. At first I didn’t know anything about Goliath or EventMachine, so it took a few rewrites to get it right, and a PR to Goliath to support streaming downloads, so you’re welcome! 😄

Ok, let’s set up tus-ruby-server and run it on port 9000:

# Gemfile
gem "tus-server", "~> 2.0"
gem "goliath-rack_proxy", "~> 1.0"

# tus.rb
require "tus/server"
require "goliath/rack_proxy"

class GoliathTusServer < Goliath::RackProxy
  rack_app Tus::Server
  rewindable_input false
end

$ ruby tus.rb --stdout --port 9000

The idea is that on the client side we’ll upload files directly to the tus-ruby-server instance, which uses its own storage (filesystem by default, but it can also be configured to use S3). Once the file has been successfully uploaded to tus-ruby-server, we would submit its tus URL as the attachment, which Shrine would then use to download the file from the tus server, perform any user-defined processing, and upload the file to permanent Shrine storage (preferably in a background job). In other words, tus-ruby-server would be an endpoint that saves uploads to temporary storage, just like with regular direct uploads.

At first it might seem that going to all that effort to upload the file to tus-ruby-server is a bit pointless if the app will just end up downloading it and re-uploading it to another storage anyway. However, the latter will be orders of magnitude faster, because servers have much faster and more stable internet connections than users (and if that’s not enough, shrine-tus can also do a smart copy directly from the tus storage). Also, you need to be able to periodically clear expired or unfinished uploads, which wouldn’t be possible if tus-ruby-server and Shrine used the same storage.

To enable assigning tus URLs as Shrine attachments we’ll add the handy shrine-tus gem:

# Gemfile
gem "shrine-tus", "~> 1.1"
require "shrine/storage/tus"
require "shrine/storage/s3"

Shrine.storages = {
  cache: Shrine::Storage::Tus.new,
  store: Shrine::Storage::S3.new(...),
}
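
With this in place, assigning a tus URL works just like assigning regular cached file data, except that the id is the tus URL (values below are illustrative):

photo.image = '{"id":"http://localhost:9000/...","storage":"cache","metadata":{...}}'
photo.save # shrine-tus downloads the file from the tus URL and uploads it to :store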

Client side

On the client side Uppy has our backs again with the Tus plugin (instead of XHRUpload or AwsS3), which internally uses tus-js-client. We need to give the Tus plugin the URL to our tus server, and after the upload we need to construct the uploaded file data as we did with direct uploads to S3.

// ... other plugins ...

uppy.use(Uppy.Tus, {
  endpoint: 'http://localhost:9000/'
})

uppy.run()

uppy.on('upload-success', function (fileId, data) {
  var file = uppy.getFile(fileId)

  var uploadedFileData = JSON.stringify({
    id: data.url, // Shrine will later use this tus URL to download the file
    storage: "cache",
    metadata: {
      filename:  file.name,
      size:      file.size,
      mime_type: file.type,
    }
  })

  // ...
})

That’s it; now uploads will automagically be resumed in case of temporary failures, without the user even noticing that something happened.

Conclusion

Uploading files asynchronously greatly improves the user experience, and there are many ways to do that, each suitable for different requirements (storage, file size etc.). In order to implement direct uploads, both the client and the server side need to do their part.

Regardless of whether you’re just uploading to a simple endpoint in your app, directly to the cloud, or doing something as advanced as resumable uploads, with Shrine & Uppy the setup is streamlined and is basically just a matter of swapping plugins.

In the next post I will talk about using background jobs with Shrine, so stay tuned!

  • Demo with direct upload to app and S3 – Roda & Rails
  • Demo with resumable upload – Roda

Janko Marohnić

A passionate Ruby backend developer who fell in love with Roda & Sequel, and told Rails “it’s not me, it’s you”. He enjoys working with JSON APIs and SQL databases, while prioritizing testing, and always tries to find the best library for the job. Creator of Shrine and test.vim.
