31 March 2014

Any time you think about "unlimited," you probably think about the cloud, and that is about the only way to do this. With PHP, any standard file upload must stay under both post_max_size and upload_max_filesize. There are client-side solutions that can chunk a file to make it uploadable without raising those limits, but even then you have to pay close attention to your disk space and your temp directory space. Chunking the upload and storing the chunks on cloud-backed storage addresses most of those concerns. In particular, the combination of Amazon S3, Amazon IAM, and Plupload provides a compelling stack that allows unlimited uploads to secure cloud-based storage without any impact on your web server. Below are some of the key steps for leveraging these technologies.
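
As a quick illustration of the ceiling you are working around: the effective cap on a standard upload is the smaller of those two ini values. The ini_bytes() helper below is my own, not a PHP built-in:

// Illustrative only: a standard PHP upload is capped by the smaller of
// upload_max_filesize and post_max_size. Shorthand values such as "8M"
// must be expanded before comparing.
function ini_bytes($val) {
  $n = (int) $val;
  switch (strtoupper(substr($val, -1))) {
    case 'G': $n *= 1024; // fall through
    case 'M': $n *= 1024; // fall through
    case 'K': $n *= 1024;
  }
  return $n;
}
$cap = min(ini_bytes(ini_get('upload_max_filesize')),
           ini_bytes(ini_get('post_max_size')));
echo "Largest possible standard upload: $cap bytes\n";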

Overview

  1. Setup
    1. Create a new bucket that auto-expires all uploaded content after a period of time (30 minutes to 30 days).
    2. Create IAM credentials that can only upload to that newly-created bucket.
    3. Create a script to sign requests to upload to the bucket. This example uses AJAX, since pre-authorizing an upload when the page is rendered leads to a stale data problem (e.g., the authorization times out) if the user just leaves the page open.
    4. Create a script to move an uploaded file to its final location after validation is performed.
  2. Example User Workflow for Upload
    1. Visit upload page.
    2. Select file(s) using Plupload.
    3. JavaScript: Before starting an upload, request the authorization from your server.
    4. JavaScript: Upload using Plupload.
    5. User sees a progress bar (optional).
    6. JavaScript: Notify server that the upload is complete and perform any action. Alternately, the upload id can be embedded in the POST data for a broader form, and the server will take the submission of that form as notification that the upload can be processed.
    7. Notify the user that the upload is complete.
  3. From the Server Perspective
    1. Server provides the upload form.
    2. Server authorizes the upload.
    3. Server is notified that the upload is complete.

How is this different?

This is different in almost every way from a standard file upload. It is more flexible, but it takes more setup time. One of the biggest differences is the handling of the file upload itself: if you notify the server as part of a POST form, then you are replacing the $_FILES variable with the object key where the file is stored on S3. Assuming your site is hosted on EC2 or otherwise has excellent bandwidth to S3, you should be able to download extremely large files from S3 without ever seeing a spike in memory usage and without wasting any disk space on partial uploads.
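
For example, here is a minimal sketch of that download step, assuming the AWS SDK for PHP 2.x; the s3_key POST field and the local path are placeholders:

// Sketch: stream an S3 object straight to disk so memory stays flat
// even for very large files (AWS SDK for PHP 2.x assumed).
require 'vendor/autoload.php';

$client = Aws\S3\S3Client::factory(array(
  'key'    => 'KEY',
  'secret' => 'SECRET',
));
$client->getObject(array(
  'Bucket' => 'plupload-example-bucket-name',
  'Key'    => $_POST['s3_key'], // the key that replaced $_FILES; validate it first
  'SaveAs' => '/tmp/pending-validation', // written to disk, not held in memory
));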

Piece 1: Configuring the Amazon Web Services

Many of these tips are derived from this page, which links to an official upload example and is extremely helpful. This post adds some steps, like IAM setup and on-demand signatures, to address some lingering concerns.

  1. Do NOT use any dots in your bucket name (they break the wildcard SSL certificate match on bucket-name subdomains). Use hyphens for namespacing. See an explanation.
  2. In bucket properties, expand Lifecycle and add a rule with these settings (a scripted equivalent appears after this list):
    1. Enabled: Yes
    2. Name: Remove by Default (you can rename)
    3. Apply to Entire Bucket: No
    4. Prefix: upload
    5. Time Period Format: Days from the creation date
    6. [Click + Expiration]
    7. Expiration (Delete Objects): 1 day from the object's creation date
  3. In bucket properties, expand Permissions and Add CORS Configuration (see Piece 2).
  4. Upload crossdomain.xml (Piece 3) and clientaccesspolicy.xml (Piece 4) to the bucket.
  5. Go to IAM and create a new user (name "plupload" or "user-upload" or similar). Make sure that you store the credentials.
  6. On the IAM user's Permissions tab, click "Attach User Policy". Choose "Custom Policy" and then use the policy provided in Piece 5.
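
If you would rather script the lifecycle rule from step 2 than click through the console, something along these lines should work with the AWS SDK for PHP 2.x. Treat it as a sketch; the rule fields mirror the S3 lifecycle REST API, and the credentials are placeholders:

// Sketch: programmatic equivalent of the console lifecycle rule above
// (AWS SDK for PHP 2.x assumed).
require 'vendor/autoload.php';

$client = Aws\S3\S3Client::factory(array('key' => 'KEY', 'secret' => 'SECRET'));
$client->putBucketLifecycle(array(
  'Bucket' => 'plupload-example-bucket-name',
  'Rules'  => array(
    array(
      'ID'         => 'Remove by Default',
      'Prefix'     => 'upload',
      'Status'     => 'Enabled',
      'Expiration' => array('Days' => 1),
    ),
  ),
));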

Piece 2: CORS Configuration

The following generic configuration should work, although you can definitely lock it down more (for example, by replacing the AllowedOrigin wildcard with your site's origin).

<CORSConfiguration>
    <CORSRule>
        <AllowedOrigin>*</AllowedOrigin>
        <AllowedHeader>*</AllowedHeader>
        <AllowedMethod>GET</AllowedMethod>
        <AllowedMethod>POST</AllowedMethod>
        <MaxAgeSeconds>3000</MaxAgeSeconds>
    </CORSRule>
</CORSConfiguration>

Piece 3: crossdomain.xml

Adjust this for your domain name and upload it to the root of the S3 bucket.

<cross-domain-policy>
  <allow-access-from domain="*.example.com" secure="false"/>
  <allow-access-from domain="example.com" secure="false"/>
</cross-domain-policy>

Piece 4: clientaccesspolicy.xml

Upload this to the root of the S3 bucket.

<?xml version="1.0" encoding="utf-8" ?>
<access-policy>
  <cross-domain-access>
    <policy>
      <allow-from http-request-headers="*">
        <domain uri="*"/>
      </allow-from>
      <grant-to>
        <resource path="/" include-subpaths="true"/>
      </grant-to>
    </policy>
  </cross-domain-access>
</access-policy>

Piece 5: IAM Policy

Replace the bucket name below and then attach it to the IAM user.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "s3:PutObject",
        "s3:PutObjectAcl"
      ],
      "Sid": "Stmt1373830917000",
      "Resource": [
        "arn:aws:s3:::plupload-example-bucket-name/upload/*"
      ],
      "Effect": "Allow"
    }
  ]
}

Piece 6: Configuring Plupload

Plupload is a JS utility, but you can still configure it from PHP. Build the array shown below and then make it available to your JS using your framework's preferred method. For example, Drupal users should use drupal_add_js() to add it to the Drupal.settings array (see the snippet after the settings).

The example below is triggered by clicking a button. Depending on how you want Plupload to appear, you will need settings similar to the following:

    $plupload_settings = array(
      // 'container' => 'container',
      'max_file_size' => '10mb',
      'runtimes' => 'html5,flash,silverlight',
      'flash_swf_url' => '/path/to/plupload/js/Moxie.swf',
      'silverlight_xap_url' => '/path/to/plupload/js/Moxie.xap',
      'url' => "https://plupload-example-bucket-name.s3.amazonaws.com:443/",
      'multi_selection' => FALSE,
      // optional, but better be specified directly
      'file_data_name' => 'file',
      'filters' => array(
        array(
          'title' => "Image files",
          'extensions' => "jpg"
        ),
      ),
      'browse_button' => 'HTML ELEMENT ID OF "BROWSE" BUTTON',
    );
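
For instance, in Drupal 7 the array can be exposed to the JS in Piece 7 like this; the 'plupload' settings key is what the settings.plupload reference there expects:

    // Drupal 7: expose the array as Drupal.settings.plupload.
    drupal_add_js(array('plupload' => $plupload_settings), 'setting');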

Piece 7: Plupload Events

To use on-demand request authorization, you need to tie into the various Plupload events. These base functions assume that you want the server to control the upload name precisely, show a progress bar, and start uploading as soon as a file is selected. All of those options can be changed with some careful scripting.

Here are the key JS events to look at:

// You'll want to upgrade this very-basic progress bar.
var $status_bar = $('<div>').appendTo('#somewhere-in-your-doc');
function setStatusPercent(percent) {
  if ($status_bar == null) {
    return;
  }
  if (percent == 100) {
    $status_bar.html('');
  }
  else {
    // "var" keeps $status out of the global scope.
    var $status = $('<div style="background-color: #00f"></div>').width((1 * percent) + '%').text(percent + '%');
    $status_bar.html('').append($status);
  }
}
// If you are using AJAX, then the upload should not be 100% complete
// until after your server is notified. This allows you to configure
// where the progress bar should be after uploading to S3 and before
// notifying your server. If you just store the key in your form, then set
// this to 1 or eliminate references to it throughout.
var preProcessPercent = 0.95;
// settings.plupload is the array built in Piece 6 (e.g., Drupal.settings.plupload).
var pluploader = new plupload.Uploader(settings.plupload);
pluploader.bind('Init', function(up, params) {
  $('#filelist').html("<div>Current runtime: " + params.runtime + "</div>");
});
$('#uploadfiles').click(function(e) {
  pluploader.start();
  e.preventDefault();
});
pluploader.init();
pluploader.bind('FilesAdded', function(up, files) {
  $.each(files, function(i, file) {
    $status_bar.append(
        '<div id="' + file.id + '">file: ' +
        file.name + ' (' + plupload.formatSize(file.size) + ') <b></b>' +
    '</div>');
    $.getJSON('/path/to/authorize/uploads', function(data) {
      if (typeof data.result != 'undefined') {
        up.settings = $.extend(up.settings, data.result.settings);
        up.start();
      }
    });
  });
  up.refresh(); // Reposition Flash/Silverlight
});
pluploader.bind('UploadFile', function(up, file) {
  // Placeholder: per-file work (e.g., updating the UI) can go here.
});
pluploader.bind('UploadProgress', function(up, file) {
  setStatusPercent(Math.round(preProcessPercent * file.percent));
});
pluploader.bind('Error', function(up, err) {
  $status_bar.append("<div>Error: " + err.code +
      ", Message: " + err.message +
      (err.file ? ", File: " + err.file.name : "") +
      "</div>"
  );
  up.refresh(); // Reposition Flash/Silverlight
});
pluploader.bind('FileUploaded', function(up, file) {
  // Update your form and the progress bar.
  setStatusPercent(100);
  // Alternately, set to preProcessPercent and make AJAX request to notify server
  // In either case, you can access the S3 key at:
  // up.settings.multipart_params.key
});

Piece 8: Authorize Uploads Script

This server-side script is requested from the FilesAdded handler above. You can adjust the data model, as long as you adjust the JS as well.

$s3_key = 'KEY';
$s3_secret = 'SECRET';
$s3_bucket = 'plupload-example-bucket-name';
// Determine the extension server-side; it must match your Plupload filters.
$ext = 'jpg';
// Create a UUID (PECL uuid extension) - or generate a sufficiently
// unique string with your user id plus uniqid().
$uuid = uuid_create();
$filename = 'upload/' . $uuid . '.' . $ext;
$acl = 'private';
$policy = base64_encode(json_encode(array(
  // Use gmdate(): the trailing "Z" marks the timestamp as UTC.
  'expiration' => gmdate('Y-m-d\TH:i:s.000\Z', strtotime('+1 day')),
  'conditions' => array(
    array('bucket' => $s3_bucket),
    array('acl' => $acl),
    array('key' => $filename),
    array('success_action_status' => '201'),
    array('starts-with', '$name', ''),
    array('Filename' => $filename),
  ),
)));
$signature = base64_encode(hash_hmac('sha1', $policy, $s3_secret, true));
$plupload_settings = array(
  'multipart_params' => array(
    'key' => $filename,
    'Filename' => $filename,
    'success_action_status' => '201',
    'acl' => $acl,
    'AWSAccessKeyId' => $s3_key,
    'policy' => $policy,
    'signature' => $signature,
  ),
);
$plupload = array(
  'settings' => $plupload_settings,
);
// Adjust for JSON-RPC response structure, regardless of request format.
$json = array(
  'id' => 1,
  'result' => $plupload,
);
echo json_encode($json);
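
The overview also calls for a script that moves the validated upload out of the auto-expiring upload/ prefix (Setup step 4). Here is a minimal sketch of that step, again assuming the AWS SDK for PHP 2.x. The destination prefix and the key validation are application-specific, and note that this script needs credentials with copy/delete rights, unlike the upload-only IAM user from Piece 5:

// Sketch: after validating the upload, copy it out of the expiring
// "upload/" prefix and remove the original.
require 'vendor/autoload.php';

$client = Aws\S3\S3Client::factory(array('key' => 'ADMIN_KEY', 'secret' => 'ADMIN_SECRET'));
$s3_bucket = 'plupload-example-bucket-name';
$key = $_POST['s3_key'];
// Accept only keys shaped like the ones the authorize script issues.
if (!preg_match('#^upload/[0-9a-fA-F-]+\.[A-Za-z0-9]+$#', $key)) {
  exit('Invalid key');
}
$final = 'files/' . basename($key); // application-specific destination
$client->copyObject(array(
  'Bucket'     => $s3_bucket,
  'Key'        => $final,
  'CopySource' => $s3_bucket . '/' . $key,
));
$client->deleteObject(array('Bucket' => $s3_bucket, 'Key' => $key));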

Summary

This is obviously not a plug-and-play solution. Even if the JS and PHP are packaged up, the AWS setup is still extensive, and the file handling on the backend is still application-dependent. Regardless, I hope that some useful tips can be gleaned from the example configurations and code snippets provided above.


