PHP and Web Cache: The State of The Art


PHP cache, object cache, client cache, CDN: web caching happens at every level of the connection. This guide lists and explains the different cache options and software available for PHP applications.

 

What This Guide Will Cover

I will cover the different caching solutions on the server side, for PHP applications.

[Figure: Web server cache, 9 levels. Every possible level of web cache: client, server and CDN]

As you may have guessed, this guide is for IaaS users only. On classic shared web hosting, you cannot tune what you don’t have access to.

Also, this guide focuses on nginx. Apache can use fastCGI too, but I switched to nginx for many good reasons and there is no turning back. In my experience, nginx outperforms Apache at every level.

 

Why I Decided to Write This Guide

Allow me to save you some time by giving you the conclusion right away: when you want to learn new skills in web technologies, always look at the date of the posts you read, and discard the ones older than 2 years. On the other hand, if you have the time, please be my guest and read on 😉

So, when I started optimizing my blog 5 months ago, I thought I knew a lot about web technologies. I was smug and ignorant.

Having some years of experience with Apache httpd, I knew the difference between memory and disk caching. Yet, I stumbled upon a lot of information and names that were completely new to me: APC, APCu, OPcache, Redis, etc. Everything was confusing and soon enough, I entered a software installation frenzy, excited as I was by a hyped term I had just discovered: object caching.

OMG. A lot of blog and Stack Overflow reading later, I thought I was smart enough to understand that I needed an object cache and a PHP code cache, but I couldn’t decide which technology to use. Worse, I realized that some of them were incompatible or overlapped, especially when I started adding the related plugins to WordPress. Even more confusing, some plugins such as W3 Total Cache can handle caching at multiple, mixed levels and with different software: Redis, disk cache, Memcached, OPcache…

The cherry on the cake: I was using PHP 7. Therefore, a lot of the advice and guides I read just didn’t make any sense when I tried to apply them, since the software they mentioned was not available in the usual apt/PECL repositories of my Ubuntu 16.04 server.

Turns out, that’s because Google and others do not discard irrelevant and outdated information. They rank content based on its popularity. Therefore, as long as people continue to look for and write about outdated technologies, old posts and guides about deprecated software will continue to lurk around!

 

What Is Caching?

Caching is a method for improving server performance by allowing commonly requested content to be temporarily stored in a way that allows for faster access. This speeds up processing and delivery by cutting out some resource intensive operations, given that the cache medium is fast enough.
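To make the idea concrete, here is a minimal sketch in Python (chosen for brevity; the principle is the same in any language). The names SimpleCache and render_page are made up for illustration, and the 50 ms sleep simply stands in for any resource-intensive operation.

```python
import time


class SimpleCache:
    """A tiny in-memory cache: keep the result of slow work, keyed by request."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key, compute):
        # Serve from memory when possible, otherwise do the work once and keep it.
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        value = compute()
        self._store[key] = value
        return value


def render_page():
    # Hypothetical stand-in for a resource-intensive operation.
    time.sleep(0.05)
    return "<html>hello</html>"


cache = SimpleCache()
first = cache.get_or_compute("/home", render_page)   # miss: pays the full cost
second = cache.get_or_compute("/home", render_page)  # hit: served from memory
```

The second call skips render_page entirely. That only pays off if looking up the key is cheaper than redoing the work, which is the "fast enough cache medium" condition mentioned above.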

Every piece of connected software you use does some caching. Think about it: the RAM in your desktop or tablet is also a form of cache for your operating system. You know how fast RAM is, right? You don’t want your CPU to wait for a slow hard drive to get its data, do you? Conceptually, your computer could run without RAM and communicate only with the storage media. Thankfully, that’s not what it does.

The same goes for the internet: when you refresh a page, it loads faster than the first time. It’s less noticeable today with our modern cable and fiber connections, but on pages with big pictures, you can certainly feel it. Edge, Chrome, Opera, Firefox, Android, iOS, Safari: they all use a local, temporary file cache. The benefits of a cache are obvious. Do I need to list them? Files already downloaded are cached, so they are accessible at the speed of your hard drive instead of being re-downloaded every time. Less bandwidth used, and less lag while surfing. Lag is due to the asynchronous connection mechanisms: IP packets can be lost, duplicated, retransmitted, delayed, take different routes between you and the server, etc.

Therefore, client caching is implemented everywhere, at different levels. You can now understand that caching at the server side can also be potentially beneficial. The reasons are different, though.

 

Why On Earth Would You Want to Enable Server Caching?

Because unless you live in the 80’s, you are serving dynamic content. Caching static content is of little interest: unless you have a super busy site, you will not notice any performance gain from caching disk content in memory. Disk access times are quite low, even though RAM is several orders of magnitude faster. Debates here and here forget to mention what they are comparing: access time? Throughput? Bandwidth? Goodput? Modern disk access times are in the range of milliseconds at worst, and many other factors mean that serving content from RAM is NOT as fast as the advertised nanoseconds suggest. And what are you serving? Static web pages are often smaller than 50KB. A 2015 study reported that the median web page was around 66 KB!

The problem is not the HTML code itself, it’s all the junk carried along with it: CSS, JS, images, etc. HTTP Archive gives up-to-date results from millions of popular websites. In 2018, the median web page (HTML only) is 30KB… That’s so small. The total median size on the other hand, all included, for desktops, is much bigger at 1.7MB.

[Figure: Median web page size is 30KB in 2018]

Add the network latency, server latency, and transmission delays on top of that, and I doubt you can see the difference. And correct me if I’m wrong, but isn’t caching static content in memory a waste of money? Is the price of many gigabytes of RAM to serve static images worth it? That’s not what people do.

What some people do is cache the fully rendered page in memory with Redis or Memcached. Why? Maybe because they didn’t run load tests to compare it against the fastCGI disk cache? It’s pointless for many reasons, which I will demonstrate later on.

 

Types of Server Caching

There are 3 categories of caching systems on the server side: Memory code caching, Memory object caching, and Disk file caching. Objects can be anything though, including files. By file I mean any file: the generated HTML code that makes a page, a CSS, an image…

Memory Code caching

For PHP: Xcache, OPcache, APC, and others are PHP extensions that cache the generated OpCode, to avoid recompiling the same PHP files every time they are accessed. Only Zend OPcache is currently maintained.

For other platforms such as Java or ASP, other third party solutions may be available.

Memory Object Caching

Redis, APCu, Memcached, are advanced in-memory caching systems. They can cache anything, since objects can be anything.

  • Redis is often used for object caching: it is an in-memory key-value store, so the results of long queries can be served from it instead of asking MySQL again.
  • APCu is a stripped-down version of APC that keeps only the memory caching system
  • Memcached is an older memory caching system, as fast as Redis but with fewer options
  • Apache only offers memory cache for SSL sessions with socache_*.so

Currently, Redis is the most popular and powerful solution. Memcached offers fewer options, and APCu is local to each PHP server, so it cannot be shared across a cluster.
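The difference between a local store and a shared one can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: AppInstance is a made-up class, a private dict plays the role of an APCu-like local cache, and a single dict handed to every instance stands in for a Redis or Memcached server.

```python
class AppInstance:
    """One application server, with either a private or a shared cache store."""

    def __init__(self, name, shared_store=None):
        self.name = name
        # No shared store given: the cache lives inside this instance only.
        self.store = shared_store if shared_store is not None else {}

    def cache_set(self, key, value):
        self.store[key] = value

    def cache_get(self, key):
        return self.store.get(key)


# Local caches: what instance "a" stores is invisible to instance "b".
local_a = AppInstance("a")
local_b = AppInstance("b")
local_a.cache_set("menu", "v2")
local_seen_by_b = local_b.cache_get("menu")

# Shared cache: both instances talk to the same store.
shared = {}
shared_a = AppInstance("a", shared)
shared_b = AppInstance("b", shared)
shared_a.cache_set("menu", "v2")
shared_seen_by_b = shared_b.cache_get("menu")
```

With local caches, the second instance gets nothing back and has to rebuild the value itself. With the shared store, it reads what the first instance wrote.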

Disk File caching

  • FastCGI is a binary protocol for interfacing interactive programs with a front web server. In practice, the web server’s FastCGI module can also cache the dynamically generated pages on disk as static files, stored by their URL. The cache can therefore hold redundant content, as different query strings may generate the same page. Including the query string in the cached URL is optional. A list of web servers that implement FastCGI is found on Wikipedia.
  • Varnish is a caching proxy for static files; it works much like the fastCGI cache and can do more
  • Apache has different extensions available for static disk caching, but I’m not sure they are maintained.

Varnish is reportedly quite hard to set up and requires some tricks to be used properly with SSL. I cannot confirm this first-hand, but I’ve seen many posts complaining about it.
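The remark about URLs and query strings can be illustrated with a small Python sketch. It is not how any particular web server builds its keys: the cache_key function, the example.com host and the choice of an MD5 digest are assumptions made for the example.

```python
import hashlib
from urllib.parse import urlsplit


def cache_key(method, host, url, include_query=True):
    """Derive a cache key for a page (hypothetical helper for illustration)."""
    parts = urlsplit(url)
    raw = method + host + (parts.path or "/")
    if include_query and parts.query:
        raw += "?" + parts.query
    # A digest gives a fixed-length name that is safe to use as a file name.
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


# Same page, different tracking parameters: two distinct cache entries.
a = cache_key("GET", "example.com", "/post/42?utm_source=feed")
b = cache_key("GET", "example.com", "/post/42?utm_source=mail")

# Ignoring the query string collapses them into a single entry.
c = cache_key("GET", "example.com", "/post/42?utm_source=feed", include_query=False)
d = cache_key("GET", "example.com", "/post/42?utm_source=mail", include_query=False)
```

Here a and b differ although the page is the same, which is the redundancy mentioned above, while c and d are identical. Dropping the query string is only safe when it does not change the generated page.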

 

PHP Code Caching Explained

What is OpCode?

In computing, an OpCode (abbreviated from operation code) is the portion of a machine language instruction that specifies the operation to be performed.

PHP is an object-oriented scripting language that is compiled on the fly, at each request. The result is not machine code executed directly by the CPU: the PHP engine (Zend Engine) compiles the script into its own OpCodes, which are instructions for the Zend virtual machine. In that sense, PHP OpCode is closer to the Bytecode produced by a Java compiler, whose processor is also a program (example: JVM), than to the hardware OpCode defined above. The practical difference is that Java code is compiled once, ahead of time, whereas PHP code is compiled again on every request. Is that clear?

Long story short:

  • Java code → javac → Bytecode → JVM → CPU
  • PHP script → PHP compiler → OpCode → Zend VM → CPU

Now, you know that every time you call a PHP file, PHP will compile the code again before it can be executed.

Comparison between OpCode and Bytecode:

             | OpCode | Bytecode
Type of      | Machine language instruction | Machine language instruction
Description  | A type of code that provides the computer with instructions indicating what to do with the data provided. | A form of instruction set designed for efficient execution by a software interpreter.
What it does | Instructions for operations on data | Instructions indicating what to do
Run in       | Run by the machine | Run in a virtual machine
Used by      | Hardware | Software-based interpreter like Java or CLR

 

Rationale For OpCode Caching

PHP accelerators work by caching the compiled OpCode of PHP files, to avoid the overhead of parsing and compiling the source code on each request. This can speed up PHP applications by a factor of 3 to 7, and more! Caching the compiled code in memory, instead of compiling it over and over again every time a PHP file is called, has 3 benefits:

  • it minimizes disk I/O
  • it minimizes the memory delays for accessing, reading and loading the files
  • therefore, it minimizes the CPU time required by the application, since the compiled code is already cached.

Benchmarks that prove its efficiency with Drupal are found here and here and here.
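The principle can be sketched with Python, which also compiles source files into an intermediate code before running them. This is an analogy, not how OPcache is implemented: the CompiledCodeCache class is hypothetical, and the modification-time check only resembles in spirit the timestamp validation OPcache offers through opcache.validate_timestamps.

```python
import os
import tempfile


class CompiledCodeCache:
    """Compile each script once and reuse the compiled code until the file changes."""

    def __init__(self):
        self._cache = {}  # path -> (mtime_ns, compiled code object)
        self.compilations = 0

    def load(self, path):
        mtime = os.stat(path).st_mtime_ns
        entry = self._cache.get(path)
        if entry is not None and entry[0] == mtime:
            return entry[1]  # cached: no file read, no parsing, no compiling
        with open(path, "r", encoding="utf-8") as f:
            source = f.read()
        code = compile(source, path, "exec")
        self.compilations += 1
        self._cache[path] = (mtime, code)
        return code

    def run(self, path):
        scope = {}
        exec(self.load(path), scope)
        return scope


with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "page.py")
    with open(script, "w", encoding="utf-8") as f:
        f.write("result = sum(range(10))\n")

    opcache = CompiledCodeCache()
    # Three "requests" for the same script: compiled once, executed three times.
    results = [opcache.run(script)["result"] for _ in range(3)]
    compilations = opcache.compilations
```

Only the first request pays for reading and compiling the file. The following ones still execute the code, since the cache saves the compilation and not the execution.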

PHP OpCode Caching Solutions

An exhaustive list is available on Wikipedia, so I am not going to rewrite it. Just FYI, Microsoft also provides an OpCode/file cache extension, but it’s for Windows only, with a special build of PHP paired with IIS.

Below are the most famous ones I read about and some quick facts about them:

  • APC: Formerly used by Facebook, dead since 2012 / PHP 5.4
  • Xcache: dead since 2013
  • Turck MMCache for PHP: dead since 2003 / PHP 4.3
  • eAccelerator: 2004 fork of Turck MMCache, dead since 2010 / PHP 5.3
  • Zend OPcache: renamed from Zend Optimizer+ in 2013, officially shipped with PHP since 5.5

What to remember: all the OpCode cache comments and wikis you will find on the internet in 2018 are outdated, since nearly all the software they describe has been discontinued. As of 2018, there is only one OpCode cache that is cross platform, seriously maintained, and shipped with PHP, and it is the one you should use: Zend OPcache. Period.

Why the Confusion Around APC/APCu/OPcache?

The mix-up usually happens because these extensions cover two unrelated technologies: OpCode caching and key-value data stores. For CMS like WordPress/Joomla/Drupal you preferably want both, or at least OPcache.

  • OpCode caching is really the “normal” way to run PHP; running without it is essentially the crippled, shared hosting way
  • Data stores can be used by CMS object cache plugins for better persistent caching, for heavily dynamic applications with lots of user interactions

So to summarize:

  • APC is OpCode cache and data store
  • APCu is a stripped version of APC, it’s only a data store (like Redis or Memcached)
  • OPcache is only OpCode cache

Since APC is older and the other projects are defunct, at the moment you want OPcache as your only OpCode cache, together with some data store, not necessarily APCu (although it is a perfectly fine choice).

 

Object Caching

Whereas OpCode caching is transparent at the source code level, data caching is not. Your application needs to be coded explicitly to use it. Standard PHP applications such as WordPress, Drupal, vBulletin, MediaWiki, phpBB, etc. include this support by default.

What Is Object Caching?

That’s the first question to ask, before trying to enable object caching. Personally, I didn’t know what I was doing months ago, and I tried APCu, W3 Total Cache + objects cache option, then I tried Redis Object Cache, because I wanted to be in the “hype”. People around were asking questions about how to enable this and that, and I thought I should do it too.

This is stupid. I was stupid, and I don’t want you to be stupid.

So, let’s take a CMS like WordPress as an example. What does WordPress cache as objects? The WordPress codex gives a list of functions your code can call to cache objects, but doesn’t say what the objects are. The WordPress developer wiki gives a partial answer:

The WordPress Object Cache is used to save on trips to the database. The Object Cache stores all of the cache data to memory and makes the cache contents available by using a key, which is used to name and later retrieve the cache contents.

One Stack Exchange answer later, you have a better one. Calling the stats() method of the WP_Object_Cache class outputs what’s in the cache:


Cache Hits: 110
Cache Misses: 98
Group: options - ( 81.03k )
Group: default - ( 0.03k )
Group: users - ( 0.41k )
Group: userlogins - ( 0.03k )
Group: useremail - ( 0.04k )
Group: userslugs - ( 0.03k )
Group: user_meta - ( 3.92k )
Group: posts - ( 1.99k )
Group: terms - ( 1.76k )
Group: post_tag_relationships - ( 0.04k )
Group: category_relationships - ( 0.03k )
Group: post_format_relationships - ( 0.02k )
Group: post_meta - ( 0.36k )

 

As you can see, all this data comes from queries to the database and is now stored as key/value pairs in the Object Cache system. It can be the fully generated page, an author name, a post ID, etc. It can be anything.

Other CMS will structure their Object Cache differently, but the same principles apply. The ultimate goal is to save your code from issuing SQL queries to the database.
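Here is a minimal sketch of that principle in Python, with an in-memory SQLite database standing in for MySQL. The ObjectCache class, the options table and the get_option/update_option helpers are invented for the example and only loosely mirror the group/key layout shown in the stats output above.

```python
import sqlite3


class ObjectCache:
    """Key/value store organised by group, with hit and miss counters."""

    def __init__(self):
        self._data = {}
        self.hits = 0
        self.misses = 0

    def get(self, group, key):
        try:
            value = self._data[(group, key)]
        except KeyError:
            self.misses += 1
            return None
        self.hits += 1
        return value

    def set(self, group, key, value):
        self._data[(group, key)] = value

    def delete(self, group, key):
        self._data.pop((group, key), None)


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE options (name TEXT PRIMARY KEY, value TEXT)")
db.execute("INSERT INTO options VALUES ('blogname', 'My Blog')")
stats = {"queries": 0}  # counts the trips to the database


def get_option(cache, name):
    cached = cache.get("options", name)
    if cached is not None:
        return cached  # no SQL issued
    stats["queries"] += 1
    row = db.execute("SELECT value FROM options WHERE name = ?", (name,)).fetchone()
    value = row[0] if row else None
    if value is not None:
        cache.set("options", name, value)
    return value


def update_option(cache, name, value):
    db.execute("UPDATE options SET value = ? WHERE name = ?", (value, name))
    cache.delete("options", name)  # invalidate, or readers keep the old value


cache = ObjectCache()
titles = [get_option(cache, "blogname") for _ in range(3)]
update_option(cache, "blogname", "Renamed")
after_update = get_option(cache, "blogname")
```

Four reads cost two SQL queries: one for the first read and one after the invalidation. Note how much application code is needed around a single SELECT to get there.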

Is Object Cache Relevant For CMS?

  • For single instances of WordPress, NO. Absolutely NOT.
  • For clustered, distributed applications, with load balancing, YES.

Let me ask you this. What controls the cache access? Is it some sort of magic that stores and delivers objects to and from the cache, so your PHP application spares itself the time to process SQL requests? What about object management and cache invalidation? PHP processing time is needed to communicate with the cache system. And a lot of it.

Let’s see which objects you can cache with WordPress:

  • SQL results
  • transients (temporary objects, options, data, or queries)
  • Any data/array you like

So you need to code some PHP to create, manage, retrieve, and handle expiration for these cached objects. Do you think it’s CPU free? No, it’s not. Jump to the load test results to see how much it costs.
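To give an idea of the bookkeeping involved, here is a small Python sketch of expiring entries, in the spirit of transients. TransientStore and FakeClock are hypothetical names; the fake clock is only there so the example does not have to wait for real time to pass.

```python
class TransientStore:
    """Temporary values that expire after a time to live (in seconds)."""

    def __init__(self, clock):
        self._clock = clock
        self._items = {}  # key -> (expires_at, value)

    def set(self, key, value, ttl):
        self._items[key] = (self._clock() + ttl, value)

    def get(self, key):
        item = self._items.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self._clock() >= expires_at:
            # Expired: clean up and report a miss, the caller must rebuild the value.
            del self._items[key]
            return None
        return value


class FakeClock:
    """Controllable time source, used instead of the system clock."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


clock = FakeClock()
store = TransientStore(clock)
store.set("feed", ["post-1", "post-2"], ttl=60)
fresh = store.get("feed")   # still valid
clock.now = 61.0
stale = store.get("feed")   # expired in the meantime
```

Every read involves a lookup, a time comparison and possibly a cleanup, all executed by the application itself. None of it is heavy, but it runs on each access, which is the cost discussed above.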

As for the argument about SQL queries and saved database trips, please consider MySQL or MariaDB. Do you really think every request is run through the database engine? No. Both offer a memory cache area called the Query Cache, which stores the results of SELECT queries and serves them as long as the underlying tables didn’t change (MariaDB still ships it; MySQL deprecated it in 5.7.20 and removed it in 8.0). Therefore, with the Query Cache enabled, the database uses practically zero CPU for these queries: every frequently used SELECT query is cached already.

So you want to add another cache layer in between, for which you really need PHP CPU time, and you claim it makes your application faster? Please… It makes your application slower, but in distributed environments, it’s just a necessary evil.

What Is Object Cache Relevant For?

  • For applications with large object collections, such as Dropbox, Nextcloud, etc.
  • For large web galleries with large collections of thumbnails, such as Pinterest, yeah. This is where you want to store the static images in Redis and deliver maximum throughput to a large number of users.
  • For clustered applications. Cached objects must be shared between all instances.

These are the cases where using Redis makes sense. But you need servers with plenty of RAM, and that’s a cost. And you cannot avoid pairing Redis with APCu for file locking, because it’s faster and every millisecond counts.

For everyday WordPress? Nope. Unless you want to serve and cache a large collection of thumbnails and can afford to pay for gigabytes of memory. RAM is always more expensive than disk space. Even then, I’m not sure it would be faster anyway. Today’s SSD performance has improved so much that disks are not the bottleneck they used to be.