18 July 2012

This wave of optimizations received some extra funding, which made an enormous difference. I upgraded the underlying virtual hardware beyond a very important threshold at Amazon.

The Changes

  1. Upgraded EC2 instance type to a level that included "I/O Performance: High".
    1. While the I/O was a driving force for the upgrade, it also came with 4x CPU and 4x RAM compared to the previous configuration.
    2. Upgraded to 64-bit architecture and started with a current Ubuntu server template.
    3. The CPU utilization has dropped to 1/6 of previous levels under normal load.
  2. Upgraded RDS instance class to a level that included "High I/O Capacity".
    1. While the I/O was a driving force for the upgrade, it also came with 8x CPU and 9x RAM compared to the previous configuration.
    2. Previously, we would spin up a read-only replica during work hours, so the effective increase is lower: about 4x CPU and 4x RAM (since we were doubling up). Of course, those numbers ignore replication and overhead.
    3. The CPU utilization has dropped below 1/10 of previous levels under normal load.
  3. Reconfigured varnish to use "-s malloc" rather than "-s file" to take advantage of extra RAM now available.
    1. Cuts disk writes on instance storage to about 1/3 of previous levels.
    2. Maximizes speed of cached pages.
    3. Shifted static files into varnish so that static images could be served without hitting the disk at all. Very fast.
  4. Drupal module consolidation.
    1. Reduced the number of modules by upgrading to consolidated versions, disabling less important modules, and merging small custom modules that did not need to be distinct (we were unlikely to contribute them publicly).
    2. Deleted unused modules from the file system to reduce entries in the system table.
  5. Improved PHP's handling of path resolution based on this post.
    1. realpath_cache_size=4096k
      Note: It was surprising how much higher than the default (16k) this had to be set to have an impact. 4096k is much higher than we actually utilize, but avoiding a full-cache issue down the line justified the extra high number for us.
    2. realpath_cache_ttl=3600
      Note: This is potentially dangerous when dealing with temporary files. As long as our temporary files rely on tempnam, we should be fine since it makes naming collisions unlikely.
  6. Upgraded the EBS volume for the main PHP files from ext3 to ext4 with data=writeback journaling (metadata is still journaled; file data is not).
    1. Saw a 6% reduction in the Write Bandwidth on the device.
    2. Upgrade process:
      1. Create a second volume (/dev/xvdm) from an accurate snapshot of the EBS volume (/dev/xvdf).
      2. service apache2 stop; umount /go; mount -o noatime -t ext3 /dev/xvdm /go; service apache2 start
      3. tune2fs -o journal_data_writeback /dev/xvdf
      4. tune2fs -O extents,uninit_bg,dir_index /dev/xvdf
      5. e2fsck -fDC0 /dev/xvdf
      6. service apache2 stop; umount /go; mount -o noatime,data=writeback,commit=100,nouser_xattr -t ext4 /dev/xvd /go; service apache2 start
    3. References:
      1. noatime,data=writeback
      2. Upgrade ext3 to ext4
      3. data=writeback
      4. writeback details and "barrier"
      5. Reasons to avoid using barrier=0 and nobh
      6. Reiteration that raid optimizations are good on EBS
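
As a rough sketch, the upgrade process in item 6 could be wrapped in a small script that prints each command before anything is run for real. This is not the script that was used here; the function names run and convert_ext3_to_ext4 and the DRY_RUN switch are hypothetical, and the devices and mount point are passed in as arguments rather than hard-coded. It assumes the second volume has already been created from the snapshot and attached.

```shell
#!/bin/sh
# Sketch only: prints the conversion commands unless DRY_RUN=0 is set.
# The function names and DRY_RUN are illustrative, not from the post.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"      # dry run: show the command, do nothing
    else
        "$@"                      # real run: execute and return its status
    fi
}

# Usage: convert_ext3_to_ext4 ORIGINAL_DEVICE COPY_DEVICE MOUNT_POINT
convert_ext3_to_ext4() {
    orig_dev="$1"; copy_dev="$2"; mount_point="$3"

    # Serve from the snapshot copy while the original is offline.
    run service apache2 stop &&
    run umount "$mount_point" &&
    run mount -o noatime -t ext3 "$copy_dev" "$mount_point" &&
    run service apache2 start &&

    # Convert the unmounted original in place.
    run tune2fs -o journal_data_writeback "$orig_dev" &&
    run tune2fs -O extents,uninit_bg,dir_index "$orig_dev" &&
    # e2fsck exits with 1 when it corrected something, which is expected here.
    { run e2fsck -fDC0 "$orig_dev" || [ "$?" -le 1 ]; } &&

    # Switch back to the converted original, now mounted as ext4.
    run service apache2 stop &&
    run umount "$mount_point" &&
    run mount -o noatime,data=writeback,commit=100,nouser_xattr -t ext4 "$orig_dev" "$mount_point" &&
    run service apache2 start
}
```

Calling convert_ext3_to_ext4 /dev/xvdf /dev/xvdm /go with the default DRY_RUN=1 only lists the eleven commands, which makes it easy to compare them against the steps above before setting DRY_RUN=0. The chain stops at the first failing command, so a failed conversion step leaves the site running from the copy.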

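The two realpath settings from item 5 can be kept in a small drop-in file rather than edited into the main php.ini. The following is a minimal sketch, assuming a PHP build that scans a conf.d-style directory for extra .ini files; the function name write_realpath_ini and the file name realpath-cache.ini are hypothetical, and the directory has to be supplied by the caller.

```shell
#!/bin/sh
# Write the two realpath settings from the post into a drop-in ini file.
# The target directory is an argument; the file name is illustrative only.
write_realpath_ini() {
    conf_dir="$1"
    [ -d "$conf_dir" ] || { echo "no such directory: $conf_dir" >&2; return 1; }
    cat > "$conf_dir/realpath-cache.ini" <<'EOF'
; Larger cache for resolved paths (the post notes the default as 16k).
realpath_cache_size=4096k
; Keep entries for an hour; see the caveat about temporary files.
realpath_cache_ttl=3600
EOF
}
```

After restarting Apache, the PHP function realpath_cache_size() reports how much of the cache is actually in use, which is one way to judge whether 4096k is as generous as expected. Note that the command-line PHP may read a different configuration than the Apache module does.
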
The Results

  1. Significantly less disk access.
  2. Faster network communications with database.
  3. Under normal load, non-cached Drupal pages load at least 10% faster and with less variability.
  4. Under heavy load, non-cached Drupal pages load more than 50% faster, and the upper limit of supported users is much higher.

