Making our site faster with Varnish

In this post I’m going to talk about how we recently made some performance improvements to www.nzpost.co.nz to help us scale into the future, while offering a better experience to our current users as well. By employing whole-page caching with Edge-Side Includes and Varnish, we’ve reduced the average page render time for a logged-in user on the homepage from 2.5 seconds to just 10 milliseconds, while cutting our server load by 40%. I’m going to describe how we accomplished this with our Drupal CMS, and some of the hurdles we had to circumvent to do it in a reliable and compatible way. Hopefully much of this information will be of use to anyone considering a more complex web caching strategy, whether or not you use Varnish or Drupal.

In this post we’ll look at:

  • background to our situation and why we needed better performance
  • the solution: Varnish and how we set it up
  • some gotchas we encountered and solutions
  • the outcome: how it improved our performance

Why?

Websites that perform well make users happy, and keep them coming back. It’s that simple. Many companies have been getting into the web performance game recently; Google, for instance, have released tools and guidelines over the past few years to encourage web developers to build snappier websites. Local web optimisation company Aptimize also provides similar optimisation software, used by the likes of Trade Me and Telecom in New Zealand, as well as a multitude of overseas organisations.

Search engines now increasingly prioritise search results based on how fast they found your site to load when they came to visit. They do this because they know that users are turned off by a site that doesn’t offer up what they want quickly, especially when there are other sites that could give them the same thing faster.

There are many different areas to focus on when looking to improve a website’s performance, some to do with the client-side and browser behaviour, and others to do with how the server handles and processes requests. Client-side render-time improvements tend to be long-tail fixes that will yield good incremental improvements, but will not make up for a server that simply takes too long to turn around requests. This is what we’ve been working on improving recently, and this post will focus on how we applied Varnish server-side caching to help us out with this.

We’re on Drupal

As you may already know, the New Zealand Post site (and some subsites) are powered by Drupal, a flexible and modular content management framework that allows us to iterate quickly to keep up with the changing needs of our customers. While there are many benefits to using Drupal, some of them unfortunately come at the cost of decreased runtime performance.

The main site (www.nzpost.co.nz) currently runs well over 150 off-the-shelf modules, handling everything from e-commerce to content editor conveniences and all sorts in between. Each module we install, however, counts against our efforts to make the website perform up to the expectations of our customers. Often the problem is not that the third-party code is badly written, but simply the sheer amount of it that runs; combined with Drupal’s hook-based architecture, this often sees us suffering reduced performance with every new feature we add.

Without having a silver bullet to solve these bloat-related issues, many large Drupal sites turn to heavy use of caching to keep their sites running fast. We’re no exception, and as a mostly static site we can optimise quite heavily by simply not invoking Drupal unless we have to. Until recently, we were using the very common Boost module for Drupal. This works by saving the output of a page rendered by Drupal to a static file on disk. Under ideal circumstances, the web server will then look for such a file before resorting to cranking up a full Drupal instance to serve the request. This works really well for users who haven’t done anything that makes them different from 95% of other visitors, but as soon as they add something to their cart or log in, we need to personalise every page of the site for them, and Boost no longer helps us at all.

One of our goals is to offer more value to users who log in, by customising pages and offering to remember preferences such as tracking numbers. We’re also part of a single-sign-on ecosystem together with other sites that require you to sign on to do anything at all, for example mail redirections. This means that we’re increasingly encouraging users to login, and stay logged in (via “remember me” functionality). This goes counter to Boost’s strengths as a caching layer, and so we have found ourselves needing to tread into the niche area of authenticated Drupal page caching in order to continue offering good performance to our users.

Drupal itself has no built-in support for authenticated page caching. There have been projects such as Authcache attempting to address this over the years, but such generic approaches tend to fail due to the sheer diversity of sites that use Drupal. Each site tends to have a different set of constraints and requirements that affect its ability to cache things for logged-in users. We have therefore gone down a custom path, plugging together some best-of-breed off-the-shelf modules with the popular Varnish caching server.

What we needed

Before describing in depth what Varnish does and how we’ve set it up, it pays to introduce how our site is put together. Specifically, I’ll show what changes globally for a logged-in user.

The main www.nzpost.co.nz site is loosely divided into three areas: static(ish) content, e-commerce, and tools. For the static pages, and parts of the store that are not part of the actual ordering process, pretty much everything on the page is the same whether you’re logged in or out. The exceptions to this are the shopping cart block and the login/logout links in the top-right corner.

There is actually nothing else that you can see on an average page that needs to change based on whether or not someone is logged in. There are a few things that you can’t see, but I’ll cover those off in a later section of this post.

When we were running on Boost, all we could do was set a cookie that bypassed the page cache when someone put something in their cart or logged in. This meant every page was built from scratch for these users, and some pages (like the homepage) were quite slow to generate. This was really bad, because it both taxed our servers unnecessarily and penalised users for doing things we wanted and encouraged them to do! The problem was exacerbated on our other major Drupal site, Stamps and Collectables, where most users were permanently logged in, so Boost was not doing much at all to help.

Recognising that we have relatively little personalised content outside of the tools, it makes sense that we could cache everything except for the bits that change. Unfortunately, Drupal doesn’t make this easy. While sub-page caching is easy enough to accomplish with Panels (and believe me we do this heavily!), merely pulling up an instance of Drupal at all is still quite slow (a bit under a second for us), and as we later discovered, unnecessary.

Varnish + Drupal

So that brings us to the technology we use to do this. As I mentioned earlier, Drupal does not provide a generic solution for doing authenticated caching, so a lot of how we accomplished this is custom to our site.

Varnish itself is an in-memory caching reverse proxy server, otherwise known as a web accelerator. It sits in front of the web server that does the work and keeps hold of assets for a specified amount of time, reducing the load on the backing web server. In our case, it also performs some content manipulation in order to splice in ESI blocks just-in-time.

Varnish works much like a per-request finite state machine, accepting connections and performing actions in a well-defined sequence, with outcomes such as “restart” (go back to the beginning, probably with some instructions to do something different the next time), “pass” (send the request to the backend server), “lookup” (look for the asset in cache), and “deliver” (send the response to the client). The configuration for Varnish breaks with standard convention and resembles a procedural programming language more than a traditional configuration DSL like Apache might use. This language is called VCL, and is designed to easily translate into C code, which is exactly what happens when you fire up Varnish.

Here is an example of how you might achieve something with VCL:

sub vcl_recv {
  # Serve common static assets from a dedicated backend.
  if (req.url ~ "\.(js|css|jpg|png)$") {
    set req.backend = static_assets;
  }
  # Flag very old browsers for special treatment later on.
  if (req.http.user-agent ~ "MSIE 5") {
    set req.http.x-dinosaur = "1";
  }
  return (lookup);
}

Hopefully that gives you an idea of the type of syntax VCL uses, as well as the sorts of capabilities available. There are plenty of examples available on the Varnish wiki showing how to accomplish most things you might be considering doing with it.

Edge-Side Includes (ESI)

The first thing we did to improve our cache hit rate with Varnish was to switch to using dynamic Edge-Side Includes inside completely static cached pages to handle just the bits that change. While Varnish only implements a limited part of the ESI specification, it includes everything we need. Instead of re-rendering the entire page for just a small content delta (like before), we now render the page once and cache it, then graft on the dynamic parts using <esi:include> tags. This means the HTML for the shopping cart in the parent page now looks something like this:

<div id="header">...<esi:include src="/esi/shopping-cart"/></div>

And therefore contains nothing user-specific, allowing us to cache it for everyone! If you feel like it, you can have a look at the source code for a page on our site, and you’ll see we’ve placed helpful <!-- esi -->…<!-- /esi --> comments around everything that’s ESI-loaded.
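For completeness, ESI processing has to be switched on per response in the VCL itself. A minimal sketch of how that can look in Varnish 3 (the Content-Type check is an illustrative guard to avoid scanning binary assets, not necessarily our exact production rule):

```vcl
sub vcl_fetch {
  # Only ask Varnish to parse ESI tags out of HTML responses;
  # scanning images and other binaries would be wasted work.
  if (beresp.http.content-type ~ "text/html") {
    set beresp.do_esi = true;
  }
}
```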

Cache variants in Varnish

Two common concepts in key-value store caches are the use of keys to identify a particular cached item (variant), and having a time-to-live (TTL) for that variant, after which the cache will no longer hold on to it. Varnish gives a fair degree of freedom in how you determine these two things, by allowing you to create your own cache key (called a hash) based on anything the client sends, then allowing you to write rules (based on either the request or response HTTP headers) to determine when that variant can be used, and for how long (the TTL) it can last for before needing to be fetched again.

An example of how you might use the custom cache hash would be for a site that is available in multiple languages. You may want to store a different version of the page for each language preference the user’s browser sends. In Varnish, you’d do that like this:

sub vcl_hash {
  # Base of the hash should be the host and URL
  hash_data(req.http.host);
  hash_data(req.url);
  if (req.http.accept-language) {
    hash_data(req.http.accept-language);
  }
  return (hash);
}

While this is perfectly fine, people experienced with the Accept-Language header may note that there can be many permutations of languages that a browser may support. They are typically listed in order of preference like this:

Accept-Language: en-NZ,en-GB,en-US;q=0.8,en;q=0.6

With each browser potentially having different scores and preferences for even the same locale. This means you may end up storing a multitude of English copies of your page in Varnish with the above Varnish code, as some browsers may specify just “en”, others “en-NZ”, etc. The way to work around this in Varnish is to pre-process the headers in vcl_recv and extract just what you care about, like this:

# … (in vcl_recv)
if (req.http.accept-language ~ "^en") {
  set req.http.x-preferred-language = "english";
}
else if (req.http.accept-language ~ "^mi") {
  set req.http.x-preferred-language = "maori";
}

# … (later, in vcl_hash)
hash_data(req.http.x-preferred-language);

You’ll commonly find this kind of code in Varnish examples around the web, but there is actually a neater way to accomplish this in Varnish that you may not come across as often: the HTTP “Vary” header. Your application can instruct the cache to automatically create an additional variant of a cache entry (on top of any hash) based on changes in any request HTTP header (including ones you fabricate in vcl_recv, as we did above with X-Preferred-Language), simply by listing the header name inside a “Vary” header in the response.

Even though Vary support is awful in browsers, and not much better in most proxies, it’s a gem in Varnish, because it allows our application logic to dynamically construct new cache variants without having to hardcode the logic for them into static VCL. This could be useful if, for example, not all your pages were available in multiple languages. Instead of enumerating the list of applicable URLs in VCL, you can put some logic in your application so that it writes a Vary: Accept-Language (or, in our revised case above, Vary: X-Preferred-Language) header only where a choice is available. This reduces duplication of cache entries, and therefore increases the cache hit rate.
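On the Drupal side, emitting the Vary header is a one-liner. A hedged sketch for Drupal 6/Pressflow follows (in Drupal 7 the equivalent call is drupal_add_http_header('Vary', …); the $page_has_translations flag is hypothetical, standing in for whatever check your site uses):

```php
<?php
// Only advertise a language variant when this page actually has
// translations; otherwise Varnish stores a single global copy.
if ($page_has_translations) {
  drupal_set_header('Vary: X-Preferred-Language');
}
```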

How we use Varnish variants on our site

We’ve found that using Vary headers with specially crafted variables set in vcl_recv is much more flexible than using vcl_hash, so we reduced our hash to just the hostname, path, and an SSL flag. All other variation instructions come from Drupal through explicit Vary headers where appropriate.

As an example, let’s look again at our shopping cart block in the context of the homepage. The homepage looks the same no matter who you are, so there are no Vary headers (except the default gzip selector) supplied. We call this a “global” cache entry, as nothing the client sends us will affect the version we send back. The shopping cart, on the other hand, depends on who you are (unless the cart is empty – a special case).

Let’s say the user’s session ID is stored in a cookie called NZPOST_SESSION. Varnish can read the value of that cookie (unfortunately you have to use a regex to do this) when the request comes in, and copy it into, say, req.http.X-Client-SessionID. This means that if the server sends a Vary: X-Client-SessionID header, that page will be cached separately for each different session ID. This is exactly what we need for the shopping cart ESI!
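The cookie extraction looks something like this in VCL (regsub with a capture group, using the cookie name described above):

```vcl
sub vcl_recv {
  if (req.http.Cookie ~ "NZPOST_SESSION=") {
    # Copy just the session ID out of the Cookie header into a
    # custom header that the backend can name in a Vary response.
    set req.http.X-Client-SessionID =
      regsub(req.http.Cookie, ".*NZPOST_SESSION=([^;]+).*", "\1");
  }
}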

When Varnish retrieves the homepage from the global cache (assuming it’s there), it discovers the ESI tags within it that tell it to perform a sub-request to get the shopping cart, then looks to see if it’s cached a copy of that user’s cart. Because the cart has the above Vary header, it’ll fetch the right version of the cart for that user, showing their cart contents.

The ESI module for Drupal supports specifying blocks as per-user, per-role, non-cacheable or globally cacheable, which turns out to be pretty much all we need. By default it assumes that you use vcl_hash to add the role name or session ID to the Varnish hash, but we’ve modified it to send Vary headers instead. We also re-use the ESI module’s roles cookie to provide per-role caching of certain pages. This is especially useful because we can set a Vary header on the role (we use Vary: X-Client-Roles for this, a custom header parsed out of the cookie), and then set different TTLs for different roles. An example of this is our Tracking tool, which is currently cacheable for anonymous users, but not for authenticated ones. To do this we set the TTL to 0 if the user is logged in, or a day if they’re logged out, then vary on the roles hash so Varnish will treat the two types of user separately.

Event-based purging and TTLs

While it’s great to have users hitting the cache all the time, sometimes you’ll find the need to remove something from Varnish before it’s due to expire naturally. This is possible using purge and ban requests. As an example, we discovered that the user’s shopping cart doesn’t usually change very rapidly, so we can set a TTL of a few minutes on that ESI and not have to re-render it on every page view. This means a logged in user with stuff in their cart can still browse around without invoking Drupal too much, even though the page is personalised for them. In order for this to still work when they do add something else to their cart, we need to be able to selectively remove just that user’s cart block from the cache.

There are two standard Drupal modules that work in tandem to expire entries: Expire for firing events when things change, and the Varnish module to catch those events and transmit them as Varnish “bans” over a special backend connection to the caching servers. Because a ban can be based on the contents of any header, we can send a ban that affects only one user’s cart block like this:

ban req.http.X-Client-SessionID == "abc123" && req.url == "/esi/cart"

If Varnish matches those conditions on the next request from that user, it knows that their cart block is no longer valid, and it fetches (and caches) a new version.

We supply the TTL for the Varnish variant using a custom X-Varnish-TTL header, rather than the standard Cache-Control: max-age support built into Varnish, as we wanted to preserve that header for the client. We borrowed the VCL for this from the New York Times blog. A custom module decides what mode a non-ESI page should be cached in, and sets the appropriate TTL and Vary headers. While it would be nice to contribute this to drupal.org as a generic module, unfortunately it’s highly specific to our requirements at the moment. If you’d like a peek, leave a comment on this post and I’ll share what I can.
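A sketch of how a header like this can be honoured in vcl_fetch, using the std vmod bundled with Varnish 3 (the 120-second fallback is an assumption, not our actual default):

```vcl
import std;

sub vcl_fetch {
  if (beresp.http.X-Varnish-TTL) {
    # Let the application control the cache lifetime without
    # touching the Cache-Control header the client will see.
    set beresp.ttl = std.duration(beresp.http.X-Varnish-TTL, 120s);
    # Strip the internal header before the response is stored.
    unset beresp.http.X-Varnish-TTL;
  }
}
```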

Dealing with special cases

The final piece to our puzzle is the Cookie Cache Bypass Advanced module, which automatically sets a special NO_CACHE cookie whenever the user submits a POST form on the site, including things like the login form. Our Varnish is configured to bypass the page cache (but not the ESI cache) when it sees this cookie. The need for this module arises from the assumption by Drupal modules that they can set a session message (with drupal_set_message()) and have it displayed on the next page loaded. Obviously if the next page comes from cache, they won’t see that message until they browse to an uncached page later. We added some custom code that explicitly unsets the NO_CACHE cookie if the user has no messages pending in their session, as otherwise they would continue to browse around not seeing cached pages until the cookie expires (set to 5 minutes right now).
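The Varnish side of this bypass is a small test in vcl_recv. A sketch under our conventions (the /esi/ prefix matches how we address ESI fragments; your paths may differ):

```vcl
sub vcl_recv {
  # After a POST, skip the page cache for this user, but keep
  # serving ESI fragments from cache.
  if (req.http.Cookie ~ "NO_CACHE=" && req.url !~ "^/esi/") {
    return (pass);
  }
}
```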

Statically caching with GZip

As well as caching Drupal output, we have also enlisted Varnish to cache static assets such as Javascript and CSS for a short amount of time. We do this because it allows us to crank on-the-fly gzip up to the maximum level on the backend web server without suffering the per-request CPU cost, since Varnish simply holds on to these assets in memory and serves them quickly from cache. This lets us serve much smaller JS and CSS than we otherwise could. Page HTML is similarly compressed to the maximum level as it enters the cache.

Gotchas (with solutions)

Implementing authenticated caching in Drupal is not without its difficulties, and we came across quite a few problems that needed solving along the way. I hope the following lessons we learned will save someone else from having to discover them the hard way!

Module assumptions

As I mentioned earlier, Drupal was not designed with authenticated caching in mind, so many modules (and indeed Drupal core itself) sometimes assume that their code will run on every page view. Some of our own custom modules made similar assumptions. This means we now need to test new modules more thoroughly, and occasionally we need to put exceptions in our Varnish config to account for things like cookies that have special meaning.

One way we have side-stepped many initial problems is by emulating Boost: our default cache strategy is to only cache for anonymous users. Because Boost usage is quite widespread in the Drupal community, module maintainers have learned to account for it when designing their code. By keeping our default behaviour the same as Boost, we avoided having to re-test a lot of modules that we rely on. As we learn of things that are safe for global (authenticated + anonymous) caching, we enable it for them. One early example we considered safe was most of our static content and product pages, as these look the same no matter who you are. As we write new code, we make sure it sets appropriate caching modes (using Vary headers and the TTL header).

Some modules, like Google Analytics, have been more painful to support. The GA module writes out a block of Javascript into the bottom of the HTML including things like the current user’s roles. Because that HTML is part of a page that may be cached globally, there’s a good chance of an anonymous visitor reporting to Analytics as a logged-in one. The solution to this is to override the source of the roles data. We write some data about the current user into the Drupal.settings object using a per-session cacheable Javascript ESI. This means we can write Javascript code that reacts to the current user’s roles, even if the page came from the global cache.

Form tokens

Drupal has native XSRF protection for logged-in users in the form of a hidden token inserted by Form API that is tied to the current user’s session. Because of this, it’s dangerous to allow caching of a page generated by an authenticated user that includes a form, as any other user who gets the cached copy will be unable to submit the form.

In cases where you’re sure that XSRF protection is not important, such as a product add-to-cart form, Drupal provides a mechanism to disable form tokens, like this:

$form['#token'] = FALSE;

or in a form_alter hook:

unset($form['#token']);

Note that you can’t set #token to FALSE in a form_alter hook and have it work; it must be unset as above. This is poorly documented.

We have a global #after_build attached to all forms that guards against pages having a positive TTL when a form with a token is present on that page. This safety net protects us against any forms that we have inadvertently left tokens enabled for, preventing users getting the dreaded “Validation error, please try again. If this error persists, please contact the site administrator.” XSRF error.
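In outline, such an #after_build guard might look like this (the function and TTL helper names are illustrative, not our actual code):

```php
<?php
// Attached to every form via hook_form_alter():
//   $form['#after_build'][] = 'example_form_after_build';
function example_form_after_build($form, &$form_state) {
  if (!empty($form['#token'])) {
    // A session-bound token is present: this page must never be
    // shared between users, so force the response to be uncacheable.
    example_set_page_ttl(0); // hypothetical TTL-setting helper
  }
  return $form;
}
```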

Varnish ban failure

If your site relies on Varnish bans to remove content on events, you need to be very sure that bans don’t fail. We encountered a problem with the Drupal Varnish module where it would randomly time out while writing long bans to Varnish, which could lead to one of the Varnish servers purging something but the others not. This leads to nasty inconsistencies showing to the user, like their shopping cart being out of sync with reality on alternating page loads. It’s important to log and deal with any ban failures. Always keep an eye on Drupal’s logs.

Varnish bugs

Varnish is relatively young software, and it has a few quirks.

We came across a really dangerous gzip bug in the ESI handling code that is not fixed in the current stable release (3.0.2) as of writing, but is fixed in trunk. When Varnish receives a request where the client specifies that it accepts gzip (using Accept-Encoding), Varnish assumes that for a given URL the server will either always send a gzip response (say, for a CSS file or HTML page), or never send the response gzipped (e.g. a JPEG image). Unfortunately, web servers often have conditional gzip logic that will prevent, for example, MSIE6 from ever receiving gzip content. This means that Varnish may cache something requested by an IE6 user that doesn’t get gzipped, but it’ll cache it against the gzip variant anyway.

On its own this is harmless, and would just cause some users not to receive a gzipped page even if they requested one, but combined with ESI the problem is very serious. Varnish knows how to splice a gzipped ESI into a gzipped page without decompressing either of them, but it fails to check whether what it’s splicing is actually gzipped, instead assuming that it must be if it’s stored against the gzip variant. If an IE6 user originally requested the parent page, but an ESI was generated and cached (gzipped) by a non-IE6 user, Varnish will blindly splice a gzipped ESI block into a non-gzipped page, causing lots of binary gobbledygook (as our testing team put it) to be output on the page.

The solution: tell Varnish to perform backstop gzip if the backend refuses to, by specifying something like:

if (beresp.http.content-type ~ "^text/.+") {
  set beresp.do_gzip = true;
}

in your vcl_fetch block. This ensures text responses are stored gzipped even when the backend declined to compress them.

To work around the IE6 gzip bug (only early versions of IE6 had it), you could just do:

if (req.http.user-agent ~ "MSIE 6") {
  unset req.http.accept-encoding;
}

in vcl_recv, to prevent anything that IE6 requests from getting stored against a gzip variant.

We must trust cookies

One major pitfall with our setup is that we rely on cookies working properly for Varnish to know which variant to serve. We have found that the ESI module’s cookie storing the roles hash is a particular source of problems: if it gets out of sync with the user’s actual roles, it allows per-role content to be cached for the wrong role, then shown to other users who do have the correct role. Varnish doesn’t allow us to override anything the client sent when doing a Vary, so we have to trust that their roles cookie is right. This assumption simply did not work out well for us.

We worked around this problem by verifying that the roles cookie matches their actual roles on every request that actually hits Drupal. If there’s a mismatch, we set the TTL to 0 to prevent caching of the response, and attempt to send a corrected roles cookie to the client. The cookie set will fail if the request is for an ESI, so we have to wait until the user hits an uncacheable parent page to fix their cookie. By setting the TTL to 0, we at least contain the wrong-content problem to the user that has the wrong roles cookie, rather than polluting a cached copy of a page or ESI that other users will see.
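In outline, the verification looks something like this (the cookie name, hash scheme, and TTL helper here are all illustrative, not the ESI module’s actual internals):

```php
<?php
// Run early on every request that actually reaches Drupal.
function example_verify_roles_cookie() {
  global $user;
  // Compute what the roles cookie should contain for this user.
  $expected = md5(implode(',', array_keys($user->roles)));
  if (!isset($_COOKIE['ESI_ROLES']) || $_COOKIE['ESI_ROLES'] !== $expected) {
    // Mismatch: never cache this response, and try to repair
    // the client's cookie for subsequent requests.
    example_set_page_ttl(0); // hypothetical TTL-setting helper
    setcookie('ESI_ROLES', $expected, 0, '/');
  }
}
```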

A more advanced method that colleagues have been trialling with other sites is to misuse the Varnish restart functionality to implement a “cache router”, which makes every request loop twice. The first loop rewrites the URL to a special path (eg. /cacherouter) and passes the request through to Drupal. Drupal then reads the session for that user and writes back some HTTP headers specifying the user’s roles. Varnish writes these back into the original request object and restarts the request, this time performing a normal cache lookup. Since the first loop can be cached for a few minutes, this shouldn’t cause too much extra latency, and the role information comes from a trusted source, rather than a cookie. We may look at rolling this out in the future once it’s matured a bit more. There are other alternatives, such as using a database driver within Varnish itself to discover additional user data in vcl_recv, but these are more experimental in nature.

Overcaching

A common problem when caching is added to a system is overcaching. Consider the example of a page that is customised by something a user submits, for instance a search results page. It’s relatively unlikely that the same search query will be entered twice, so caching that search results page for a day is likely a complete waste of cache storage space.

With Varnish, the entire cache resides in memory (for speed reasons), and because memory is a finite resource, filling it up with extraneous cached pages is not good. It’s therefore important to identify variations of a page that are not likely to be requested again within the standard TTL, and either drop the TTL to 0, or reduce it to something very short (like 5 minutes) if there is a small chance the same URL may come up again soon. This needs to be done both for special page types, like search results, and for regular pages requested with GET parameters that may be random in nature. Sometimes you do want to cache pages with GET parameters present, such as pagination variables, but in the common case you don’t, so it makes sense to start by setting the TTL to 0 whenever GET params are present and whitelisting any exceptions.
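A default-deny rule of that shape might look like this in vcl_fetch (the pagination whitelist regex is an example, not our full list):

```vcl
sub vcl_fetch {
  # Pages with query strings default to uncacheable, with an
  # exception for known-safe parameters such as pagination.
  if (req.url ~ "\?" && req.url !~ "[?&]page=[0-9]+$") {
    set beresp.ttl = 0s;
  }
}
```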

Respecting RFCs

The HTTP/1.1 RFC (section 14.9.1) specifies that clients should be able to bypass caches by sending a Cache-Control: no-cache header in their request. Debate is ongoing as to whether web acceleration proxies are supposed to respect this, but sometimes it’s nice to have for debugging.

Allowing a client to purge your cache (e.g. with a browser force refresh) can be a potential DoS vector if not properly secured. We discovered that there are some badly behaved bots around that send a Cache-Control: no-cache request header, and were therefore bypassing our Varnish cache and putting a lot of load on our backend servers.

Our current solution is simply to blacklist certain known user agents that abuse this privilege, but in future we may consider removing support for this entirely. Security requirements sometimes overrule standards.
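A sketch of that blacklist in vcl_recv (the user-agent pattern is a placeholder, not our real list):

```vcl
sub vcl_recv {
  # Ignore no-cache from clients known to abuse it, so they hit
  # the cache like everyone else.
  if (req.http.Cache-Control ~ "no-cache" &&
      req.http.User-Agent ~ "(BadBot|AnotherBot)") {
    unset req.http.Cache-Control;
  }
}
```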

Where we’re at now

Through implementing Varnish with ESIs across the main Post site and Stamps and Collectables, we’ve gone from the homepage taking around 2.5 seconds to load for an authenticated user at peak time to around 10 milliseconds now, an improvement of 250 times! Of course this assumes everything is in cache already, but we’re aiming for that to be the common case for everyone. We still have some configuration to roll out to make authenticated caching more widespread, and we have work to do on making our uncacheable tools go faster, but we’re well on the way. We are now ready for most users to be signed in all the time while still receiving good performance.

Freeing up server resources to serve only the requests that really need Drupal actually makes those requests faster than they were before too, as the server processes requests quicker when there are fewer of them. The homepage now takes only around 900 milliseconds to render fresh, under half of what it used to take during peak traffic.

We hope that our lessons learned with Varnish will be useful to others heading down this path with Drupal, and prove that Drupal is perfectly capable of giving authenticated users a good experience with just a little bit of tinkering and a bucket full of buzzwords.

If you have any questions, comments or advice regarding anything in this post, please feel free to leave a comment below.

Thank you for reading this far, and good luck!


13 Responses to Making our site faster with Varnish

  1. Trev says:

    Thanks for posting this. Very interesting to see what goes on under the hood for a large website.

  2. typhonius says:

    > Security requirements sometimes overrule standards.
Very true, and so too are cases where rigidly following standards causes a reduction in service level, site speed or user experience.

    Have you tried combining memcache with varnish at all (like Drupal 6 Pressflow [http://pressflow.org/] has done)? I’ve had some experience with varnish, memcache and pound but would be interested to know whether you’d thought of other services to use and why to not use them.

    • neilnz says:

      Hi typhonius,

      We do use Pressflow for our site, but not with Memcache. There is a Varnish Memcache VMod available (https://www.varnish-cache.org/vmod/memcached) that allows VCL to directly pull bits from Memcache (including ESI content, with tricky use of vcl_error and synthetic), but more commonly Memcache is just used as a cache backend for Drupal itself, independent of the page cache. In our benchmarks we’ve found Memcache isn’t much faster than running the Drupal cache on MySQL with query cache notched up a bit, but in some scenarios offloading cache traffic to Memcache can be beneficial to free up the database to do real work.

      In general though there is not really a tie between using Varnish and using Memcache with Drupal. They serve different roles in this context.

  3. Pingback: Bookmarks for October 30th | Chris’s Digital Detritus

  4. lasse says:

    Some very good info and pointers; however, I can’t seem to get it working all the way. What solution would you suggest for a site that simply sets a boolean cookie, which the site then uses to choose which blocks to display (if the boolean is true, one block with some content is displayed; if it is false, another block with some other content is shown)? I can’t seem to pass the cookie value on to an ESI, or include one ESI based on the cookie value, while still having Varnish kick in.

    • neilnz says:

      You should be able to accomplish this using something similar to what we do, using a custom “Vary” header from the server when the ESI is served. This assumes that it’s the same block being displayed, but what is rendered inside that block depends on that cookie value (otherwise the logic would need to sit with Drupal to swap the blocks based on visibility rules).

      What you need to do is extract the cookie value (using something like https://www.varnish-cache.org/trac/wiki/VCLExampleCachingLoggedInUsers) and stick it preferably in the request object. If it’s just a boolean, you could do something as simple as (in vcl_recv):

      if (req.http.Cookie ~ "mycookie=1") {
          set req.http.X-Special-Cookie = "1";
      } else {
          set req.http.X-Special-Cookie = "0";
      }

      You now have a variable set as a request header, independent from the cookie itself (which Varnish doesn’t make it easy to vary on directly).

      Now you can choose to add the value of that cookie to the hash (in vcl_hash), but only when the path matches the ESI URL for that block, or do as we do and set “Vary: X-Special-Cookie” as a response header from the block’s render function, and Varnish will do the rest of the work for you. If the block isn’t custom, you can still use a hook_init() to detect the block’s ESI path and add the header that way. Sadly the ESI module doesn’t support Vary headers yet though 😦
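      For the vcl_hash route, here is a hypothetical minimal sketch (Varnish 3 syntax; “/esi/myblock” is a placeholder for your block’s ESI path, and X-Special-Cookie is the request header set in vcl_recv above):

```vcl
sub vcl_hash {
    # Mix the extracted cookie value into the cache key, but only
    # for the ESI fragment that actually varies on it. All other
    # URLs fall through to the default hash.
    if (req.url ~ "^/esi/myblock") {
        hash_data(req.http.X-Special-Cookie);
    }
}
```

      With the Vary-header approach instead, Varnish stores one object per X-Special-Cookie value automatically, so no vcl_hash change is needed.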

      I hope that’s helpful! Your problem is quite a common one, and not tightly limited to Drupal, so you may find further help on the Varnish wiki or mailing list.

      Thanks for the question, and good luck!

  5. Francisco says:

    Varnish was printing strange characters when using ESI, but once I understood what this was for, I used it and voilà! It worked perfectly!

    This line solved my problem, placed after the fetch:
    if (beresp.http.content-type ~ "^text/.+") {
        set beresp.do_gzip = true;
    }

    The strange characters are gone.
    Thanks!!!

  6. fivenoom says:

    To stop Varnish printing strange characters when using ESI, you only need to add this at the beginning of vcl_fetch:
    if (beresp.http.content-type ~ "^text/.+") {
        set beresp.do_gzip = true;
    }

    =D Greetings to my Spanish-speaking brothers!!

  7. fivenoom says:

    Thanks to the author of this site for this post!!!!

    • neilnz says:

      No problem! Happy to have helped solve your problem. You may want to test with a browser with GZip disabled, though, or with curl (without --compressed), to make sure that Varnish doesn’t serve gzip content to browsers that don’t request it.

      Eventually the fix from https://www.varnish-cache.org/trac/ticket/1029 should make it into a stable release and this kind of thing won’t be needed anymore…

  8. David Gil says:

    Hi neilnz,
    Really helpful post! I am trying out these ideas, and it would be great to take a look at your Varnish VCL config file and custom TTL module.

    Could you send me anything? Best, and thanks in advance!

    • neilnz says:

      Hi David,

      I’ll post the bits I can.

      http://pastebin.com/XKkevWCE is our default.vcl which is generic for multiple sites
      http://pastebin.com/N7x81zwY is nzpost_local.vcl which does the session/role cookie extraction from the ESI module’s cookies (has lots of domains it understands, this is just an example)
      http://pastebin.com/urCTyUFc is the nzpost_cache module, which sits in between the ESI module, the Expire module and the Varnish module to control our particular Varnish config. This is used on multiple sites that share this Varnish config.

      Some of the above have some alterations and omissions from what we actually run, for security reasons (or to not bore you with site-specific stuff). As you can see, it’s highly custom, tightly coupled and not necessarily easily reusable, but it’s evolved that way due to our complex requirements around cache control and variants of pages for authenticated users in particular.
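      As a flavour of how the pieces fit together, here is a hypothetical fragment (Varnish 3 syntax; the X-Do-ESI header name is a placeholder, not necessarily what we run): the backend opts individual pages into ESI processing via a header, which Varnish then strips before caching and delivery.

```vcl
sub vcl_fetch {
    # Only run the ESI parser on responses that ask for it, so we
    # avoid scanning every page for <esi:include> tags.
    if (beresp.http.X-Do-ESI == "1") {
        set beresp.do_esi = true;
        unset beresp.http.X-Do-ESI;
    }
}
```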
