How to Get More Bang for Your Heroku Buck While Making Your Rails Site Super Snappy [Redux]

I first wrote about how to get the most bang for your Heroku buck a year ago. Since then a few things have changed and we’ve learnt even more about how to deliver great performance from our Heroku hosted sites. Some of the advice remains the same, but there are some important changes. There is also an important caveat at the end. While this is written primarily for Rails developers using Heroku, much of it is applicable to any site hosted on any platform.

We love Heroku. It makes deployment easy and quick. However, it gets pricey when you add additional dynos at $35 per month. With a bit of work you can get a lot more out of your Heroku dynos whilst drastically improving the performance of your site for your users and providing better scalability. You might need to spend a bit on other services, but a lot less than if you simply moved the dyno slider.

There are two sides to site performance: how many requests your site can handle, and how long it takes to display in the browser. These are intimately connected but ultimately your users only care about the latter, while your boss or client probably cares more about the former. Shaving 50ms from your response time will increase your throughput, but it won’t help your users if they have 2MB of JavaScript to download.

0. Before you Dive in: Measure Your Performance [New]

Remember the golden rule:

Premature optimization is the root of all evil
- Donald Knuth

You don’t have a performance problem until you can show me a graph and some numbers. Luckily for you, that’s easily done on Heroku. The performance monitoring service New Relic is available as a free add-on for all Heroku users. Add it to your app and start digging. Not only will it help you work out the problem areas, it will also confirm whether your efforts are actually paying off (or not). Other useful tools live in your browser: Chrome’s Developer Tools (the other browsers have equivalents) include the Network view, which shows you exactly what happens when you load your page, and the Audit view, which suggests ways to speed up your site. The audits are especially useful for spotting caching problems.

1. Use Phusion Passenger [New]

Use Phusion Passenger for Heroku. Really, it’s awesome. Phusion Passenger is a multi-threaded application server that now runs on Heroku using Nginx. On average we manage three or four concurrent threads per dyno, depending on memory use. Passenger has several advantages over the other application servers available for Rails on Heroku.

  • It’s consistently fast. I’m not convinced that it’s significantly faster than Unicorn, but it does seem to be more consistent. This may be related to its second advantage…
  • It’s more memory efficient than the alternatives. While it won’t drastically reduce the memory footprint of your app, it does seem to have shrunk at least one of our apps’ total footprint by 10-15%. That’s not masses, but on Heroku, with its 512MB limit, that can make all the difference. If you breach the 512MB limit Heroku will start swapping memory out to disk, at which point performance will get much less consistent as parts of your application are moved in and out of RAM.
  • Assets are served directly by Nginx, not Rails. While we still don’t want to serve lots of assets from our Heroku instance, doing so through Nginx is significantly better than doing so through the application stack.
  • Finally, and significantly for your users, Passenger/Nginx support HTTP compression out of the box, for both assets and application responses. You don’t have to do anything. If the browser sends the correct Accept-Encoding header the server will respond appropriately. This can radically reduce the size of the HTML, CSS, and JavaScript sent.
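
If you want to try it, the basic setup (as described in Phusion’s Passenger-for-Heroku instructions at the time of writing) is just a Gemfile entry and a Procfile change; the pool size shown here is an assumption you should tune against your own memory use:

```ruby
# Gemfile: swap your existing app server (thin, unicorn, etc.) for Passenger
gem "passenger"

# Procfile (plain text, reproduced here as a comment for reference):
#   web: bundle exec passenger start -p $PORT --max-pool-size 3
```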

2. Keep Within the Memory Limits: Put Your App on a Diet and Don’t Get Greedy with Threads [New]

One of the main limitations of a Heroku dyno is the 512MB RAM limit (1GB if you pay for a 2X dyno). Once you hit that things start getting swapped out to disk, significantly affecting performance. Requests get slower on average, and response times get more unpredictable.

New Relic can give you an insight into your memory use, on a per instance (in our case Passenger threads) and total basis. You might even be able to squeeze in an extra thread to handle more requests.

Always keep your total memory footprint below the 512MB limit if you want consistently good performance.

There are three main approaches to reducing the size of your application:

First and most obvious, remove unused code from your app and Gems from your Gemfile. If you don’t need it, it shouldn’t be there.

Secondly, be fastidious about your Gemfile groups. Make sure that gems only used in test, development, or asset compilation are in the relevant groups. Don’t just dump everything in the default group, or all of it will be automatically required at startup, consuming memory. The Rails 4 default project has done away with the :assets Gemfile group, but you can easily add it back in by editing application.rb and changing

    Bundler.require(:default, Rails.env)

to

    Bundler.require(*Rails.groups(:assets => %w(development test)))

Finally, if there are any gems that are used solely for background workers or rake tasks, you should manually require them where you need them, don’t auto-require them at startup.
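
A sketch of the pattern (the gem and task here are just examples; substitute whatever your workers actually use):

```ruby
# Gemfile: keep the gem out of Bundler's automatic require at boot
gem "prawn", :require => false

# lib/tasks/reports.rake -- require it only where it is used, so web
# dynos never pay its memory cost
task :generate_reports => :environment do
  require "prawn"
  # ... build the PDF reports here ...
end
```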

Don’t be tempted to use too many Passenger threads if it means going over the memory limit. The increase in concurrency will probably be outweighed by an overall reduction in performance of all the threads.

The graph below shows what happened when we reduced the number of threads on an application so that its memory consumption dropped from about 530MB to about 390MB. Throughput on the site was roughly comparable. Notice how much more consistent the performance is afterwards.

Application response times over six hours, compared with the same time the previous day, showing the effect of reducing the number of Passenger threads to fit within the Heroku memory limits.

3. Serve Static Assets and Uploads from a CDN on Multiple Subdomains – but Don’t Use asset_sync [Updated]

Last year I recommended using asset_sync to move your assets to S3, removing the need for your Heroku dyno to serve them. With the arrival of Passenger on Heroku this is no longer good advice. Because Passenger serves assets through Nginx, and serves the compressed versions where appropriate, serving your assets from your dyno through a CDN (content delivery network) such as Amazon CloudFront will give your users a much better experience than asset_sync, while not increasing the load on your dyno. Because the cache expiry of your assets is set, by default, to a very long time, the number of requests that actually hit your dyno will be tiny (around one per asset per year).
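
On Rails 3.2/4 the relevant production settings look something like this (a sketch; recent Rails versions may set suitable defaults already, so verify against your actual response headers):

```ruby
# config/environments/production.rb
# Digested (fingerprinted) filenames change whenever the file content
# changes, so each individual file can be cached essentially forever.
config.assets.digest = true
config.static_cache_control = "public, max-age=31536000"  # one year
```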

To really juice up the load times of your site, configure four subdomains for your assets, numbered from 0 to 3, e.g. assets0.myapp.com to assets3.myapp.com, pointing at your asset CDN and set the following in your production configuration:

    config.action_controller.asset_host = "assets%d.myapp.com"

Rails will cycle through each of these subdomains when it generates asset links. Browsers limit the number of concurrent requests they will make to a single host name, so spreading your assets across several host names lets the browser download more of them in parallel. Page load speeds will now be constrained only by the speed of your user’s connection. If your user has a good connection they will be able to download most of your assets in parallel.

Heroku have documentation walking you through the CloudFront setup.

4. Turbo-Charge your Application with Memcache backed View Caching and In-app Caches [Updated]

If you’ve not encountered caching in Rails, stop reading this article right now, go read the Rails Guide to Caching and then DHH’s short guide to key based cache expiry. Caching in Rails 4 is even better, with improved support for “Russian Doll” caching.
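
The idea behind key-based expiry fits in a few lines of plain Ruby. This is a toy in-memory sketch, not the Rails implementation: because the cache key embeds the record’s updated_at, updating the record changes the key, and the stale entry is simply never read again.

```ruby
# Toy sketch of key-based cache expiry: no explicit invalidation needed.
class Record
  attr_reader :id, :updated_at

  def initialize(id)
    @id = id
    @updated_at = Time.at(0)
  end

  def touch
    @updated_at += 1  # pretend the record was just updated
  end

  def cache_key
    "records/#{id}-#{updated_at.to_i}"  # changes whenever the record does
  end
end

CACHE = {}

# Compute the value only on a cache miss, like Rails.cache.fetch
def fetch(record)
  CACHE[record.cache_key] ||= yield
end

record = Record.new(1)
first  = fetch(record) { "rendered v1" }
again  = fetch(record) { "rendered v1" }  # hit: block not evaluated
record.touch                              # new key => old entry ignored
second = fetch(record) { "rendered v2" }
```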

View caching in Rails can have a profound effect on your application’s response time. In the past we have found that rendering pages, especially complex ones with lots of partials, can easily account for two-thirds of the total processing time, much more than you might expect. Use New Relic to guide your improvements.

A Memcache store is shared between your dynos, so they all benefit from any cached item. The MemCachier add-on gives you 25MB for free, and is pretty reasonably priced from there on up. Even a small 25MB cache store can make a significant difference to the load time of your pages.
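
With the dalli gem the MemCachier wiring looks roughly like this (a sketch based on the ENV variables the add-on documents; check the add-on’s current docs for the recommended options):

```ruby
# Gemfile
gem "dalli"

# config/environments/production.rb
config.cache_store = :dalli_store,
  (ENV["MEMCACHIER_SERVERS"] || "").split(","),
  { :username => ENV["MEMCACHIER_USERNAME"],
    :password => ENV["MEMCACHIER_PASSWORD"] }
```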

Don’t be afraid to de-normalise some of your data, where appropriate. Sometimes storing a precomputed value in a model, especially one based on complex transitive relationships with other models, makes up in performance improvement what it loses in programming purity and elegance. The most common example of this approach is ActiveRecord counter caches, but you can easily add your own.
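
For example (the models and the popularity formula here are hypothetical), the built-in counter cache and a hand-rolled precomputed column look like this:

```ruby
# Built-in: Rails keeps posts.comments_count up to date automatically,
# provided the posts table has an integer comments_count column
class Comment < ActiveRecord::Base
  belongs_to :post, :counter_cache => true
end

# Hand-rolled: store a precomputed value rather than recalculating it
# from associated records on every request
class Post < ActiveRecord::Base
  has_many :comments

  def recalculate_popularity!
    # update_column skips validations/callbacks -- fine for derived data
    update_column(:popularity, comments.sum(:votes))
  end
end
```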

5. Offload Complex Search to a Dedicated Provider [Unchanged]

If you have an application that needs to perform complex searches over large datasets, don’t do it in your application directly. If searches regularly take a long time, consider using something like Solr (available as a Heroku add-on), Amazon CloudSearch, or one of the many search-as-a-service providers. You’ll not only get faster search performance, but you’ll save vast amounts of development time trying to optimise your in-app search. If search is a significant aspect of your site, the cost of a good search service will probably be better value than just scaling your database.

6. Use Background Processing the Smart Way with Delayed::Job and HireFire [Unchanged]

Background processing with Delayed::Job is a great way of speeding up your web requests. Potentially slow tasks like image processing or sending signup emails can happen outside of the request-response cycle, making it much snappier and freeing up your dyno to handle more requests. The downside is that you need to run a worker dyno at $35/month.
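
With Delayed::Job’s delay proxy, moving work out of the request cycle is a one-word change (the mailer and method names here are made up):

```ruby
# Inline: the user's request waits for the SMTP round-trip to finish
SignupMailer.welcome_email(user).deliver

# Queued: returns immediately; a worker dyno sends the email later
# (Delayed::Job calls deliver for you on queued mailers)
SignupMailer.delay.welcome_email(user)
```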

Michael van Rooijen’s HireFire modifies Delayed::Job and Resque to automatically scale the number of worker dynos based on the jobs in the queue. Because Heroku charges by the dyno-second, spinning up 10 workers for one minute costs the same as one worker for ten minutes, so with HireFire you can potentially get things done quicker while paying less than you would if you ran a dedicated worker dyno.

HireFire does have one limitation: it only works for jobs scheduled for immediate execution. If that is an issue, Michael runs a hosted HireFire service that will monitor your application for you, so jobs scheduled in the future will also be run.

7. Don’t Upload and Process Files with your Web Dynos [Unchanged]

If you use something like CarrierWave or Paperclip, by default the uploading and processing of images is done by your dyno. While this is happening your dyno thread is completely tied up, unable to handle requests from any other user.

Decouple the upload process from your dyno using something like CarrierWave Direct. With a bit of client-side magic it uploads files to S3 directly, rather than through the dyno. The images then get resized by background processes using Delayed::Job or Resque. This obviously has the downside that you’ll need a worker running.

Another option, which we’ve used recently, is the awesome Cloudinary service. They provide direct image uploading, on-demand image processing (including face detection, which even seems to work on cats) and a worldwide CDN all in one package. There is a free tier to get you started, and for $39 a month (slightly more than one Heroku dyno) their Basic plan will be more than enough for many sites.

Putting it all Together

At the end of all this we’ve freed up our Heroku dyno from doing things it’s not very good at, like serving static files and uploads, and juiced up its performance at what it is great at: serving Rails application requests, with no sys-admin in sight.

Each technique can be easily applied to your existing applications, but if you develop with them in mind from the start you get all the benefits with almost no additional work. On its own each one will help the performance of your application, but combined they will significantly extend the amount of time before you have to start forking out for lots more dynos, and when you do you’ll get much more bang for each of your thirty-five Heroku bucks.

If you’ve got any other tips for getting the most out of a Rails application, whether or not it’s on Heroku, we’d love to hear about them!

Postscript: Caveat Developer

Heroku is fantastic for reducing developer overhead and with a bit of work you can serve large and popular sites on it for relatively little. We use it for many of the sites we build. However we also use other hosting platforms, especially Amazon AWS, so we can compare our experiences of the two and we’ve noticed a couple of issues.

We frequently see significant performance drops after deploying a new version. Response times sometimes treble, with all parts of the stack slowing by the same factor. Scaling the application down and then back up will often fix the problem. This is not a code issue; it can happen after deploying a change to some CSS.

No matter how minimal an app is, the best response time I’ve ever seen in the browser is about 150ms, and even that’s not consistent; it’s frequently longer. Now, 150ms is pretty quick, about the blink of an eye, but applications we’ve hosted on single Small EC2 instances have shown consistently better performance without any optimisation. Both of these issues are probably due to a combination of Heroku’s routing infrastructure and the way your dyno shares resources with others on the same host hardware.

The differences are only in the order of 100ms or so, less than the blink of an eye, so how much it matters will depend on your use case. Constant monitoring of your application is key.

Obviously, while you get by on a single free Heroku dyno you can’t complain too much, but once you start forking out for extra dynos you might want to look at Amazon Elastic Beanstalk as an alternative. It’s still quite immature compared to Heroku (but improving all the time), and you’ll have to get your hands a bit dirty setting it up, but it gives you most of the ease of maintenance of Heroku. If you are prepared to pay up front, the cost of a single Small EC2 instance is on a par with (or less than) a Heroku dyno, but gives you more memory and more consistent performance. You also get the advantages of AWS’s other services, like Auto Scaling for those busy periods.

As with all such decisions, how and where you host will depend on what you need and how you want to spend your cash. With a bit of work Heroku can form the core of a really good setup that will scale effortlessly, but it’s always worth keeping an eye on the other options.

By Paul Leader

Paul is our senior developer, and is often favourably compared to C-3PO in the office - not only for the fact that he is worshipped as a deity by small, furry creatures but also in his broad knowledge of programming languages, which he is able to wield expertly for extraordinary results in all of our projects.

paulleader.co.uk →

16 comments

  1. :)

    That’s always an option, but there’s always the compromise between price, performance and convenience. For us Heroku is pretty much fire-and-forget. The cost of a couple of Heroku dynos would be completely outweighed by the cost of having a developer maintain one or more servers. It’s the same reason we use AWS, it’s by no means the cheapest option, but the convenience (especially RDS) wins pretty much every time.

    Most of the advice in this article is as applicable if you host on Linode or any other platform. The memory issue is likely to be less of a problem, but keeping bloat down is good advice on any platform.

  2. And that’s a good point about the CDN stuff. I haven’t gone into too much detail, otherwise the post would have turned into a small book :)

    But I’ll add a link to the Heroku docs.

  3. This is a great article, but I wanted to point out that both Delayed::Job and Resque are single threaded.
    If you’re using a worker to access 3rd party services, a lot of the time, these worker processes are waiting around for the service to respond.

    I’ve had great improvements just from using Sidekiq for doing things like processing images uploaded to S3 and retrieving information from APIs. Because it’s multi-threaded, you need to make sure that your application is thread-safe, but Rails 4 is and most gems are these days.

    Just thought I’d mention it! Great article. Thanks!

  4. “Browsers are generally restricted to only two concurrent requests per host name, so having assets served from four allows the browser to make eight concurrent requests” => Not been true for years – http://www.stevesouders.com/blog/2008/03/20/roundup-on-parallel-connections/

    Most browsers default to 6 connections per host. Sharding across two domains is ideal. More than that will impact your performance negatively. http://calendar.perfplanet.com/2013/reducing-domain-sharding/

  5. Thanks for the heads-up Manu, I’ll update the article. Do you have a more recent article with numbers of connections per browser? I’ve looked, but I can’t find anything more recent than that 2008 one.

    The network congestion article is great, interesting to see how easily more is not always better. It’s a good example of just how complex the interactions between all the parts of an application can be. I’m just looking into ways to control this in Rails as it’s not a configurable option.

  6. I’m afraid Linode comes out on top again. Sure, there is more manual labour involved, but the result is much more control over your system, and crucial experience in managing the guts of your application.

    These are great performance tips, nonetheless, and, like you say, applicable to most applications regardless of the platform.

    I recently moved an app on Heroku (it was costing $110/month) to a single Linode ($20/month). Heroku had unfair limitations on every layer (particularly Redis, where they use Redis To Go with a 20MB limit). The Linode server performs faster than Heroku, with less memory, and no stupid limitations per service.

    It will be more difficult to scale, but I’m eager to learn how, rather than offload my ‘problems’ to Heroku for a premium.

    Saying that, after I’ve learned a sufficient amount about scaling an application manually, I wouldn’t mind throwing money at my problems to make them go away.

  7. I think it entirely depends on your context and how you account for your time. Working at an agency tends to make you very aware of the time/money equivalence.

    In many situations something like Heroku may not be the right option, but for us, and many of our clients it is frequently the cheapest way to go, especially at the start.

    You saved $90 a month moving from Heroku to Linode, which with an average developer salary of about £30,000 (according to Glassdoor) equals about 3.15 hours of just the salary.

    My working: $90 = £52 at the current exchange rate. There are about 227 working days in the UK once you deduct weekends, national holidays and annual leave. That makes a daily salary of about £132, or £16.50 an hour on an 8-hour day.

    Most companies’ cost-recovery rate (i.e. the total cost of employing someone with all the overheads) is somewhere around 2x the salary. At one place I worked it was 2.2; other places will be less depending on the management structure, office, location, tax regime, utility costs etc. So that 3.15 hours is probably about 2 hours or less in most cases.

    If you are working at a product company where you are your own customer then if you do any more than about 2 hours sys-admin a month (averaged over the life of the product), you are running at a loss. If, like us, you are an agency then you will be charging your clients more than your cost recovery rate in order to make a profit, so the losing money point is even shorter.

    Every application, every client, and every company has different needs. At the low value end where the time cost of dev/dev-ops people is low (or effectively zero for many startups with no working income and who work insane hours), spending more time doing that sort of thing can be worth it. Likewise, at the high end it makes sense to have your own infrastructure with your own dev-ops, sys-ops and networking people as this will probably work out cheaper than the alternative (although Netflix would appear to be one counter-example to this). There’s a large area in the middle where cloud providers of all types offer services that might not be cheap, but are cheaper than doing it all yourself.

    It’s no different from book-keeping. If you run a small company of a few people you probably do your own books. At some point as you grow it might make sense to contract it out to someone. As you grow even more you bring it in-house and employ some full-time accountants and book-keepers.

    Each to their own. The important thing is to always add up the total cost of use, not just the headline figure.

  8. Hi Jim,

    In this particular case Passenger is nice because it handles static assets and compressed responses out of the box. As far as I am aware, if you use Puma you would have to handle that within your app. That approach is not much more work, and with a CDN in front of your assets it would probably make little difference to your application performance.

    Hopefully next week I’m going to take one of our applications, fork it on Heroku, and do a performance analysis of Passenger vs Puma, as I’d like to see a) whether there is any performance improvement and b) whether it’s significant enough to justify a switch.

  9. Thanks a LOT for all this amazing advice.
    Used some of the tips, but certainly not most of them, and they are really helpful.
    Wondering about something though.

    I tried to move all the asset library gems into the :assets group (on Rails 3.2), e.g.

        group :assets do
          gem 'jquery-rails'
        end

    I thought that would mean the gem is used for asset precompilation, but not at runtime. In practice, it did not work any more.
    Is there a way to achieve what I’m describing, or would you recommend using raw library files instead of a gem uselessly consuming memory?
    (I asked the question on Stackoverflow : http://stackoverflow.com/questions/22049058/heroku-assets-precompilation-and-gem-memory-usage but did not get an answer yet…)