I made a nice little shelf to get my computer off the floor to help the machine avoid sucking cat hair in (after a dead video card). I, quite foolishly, built it to lay the case sideways.
A depressing sight, if I've ever seen one.
In one of my projects I need to create and use several fairly large, non-changing data structures: hash tables of string values, arrays of objects, nested object definitions several layers deep, etc. For example, my Battlefield 2 stats website has several very large arrays that describe every award you can get. A sample:
'9' => array(
    'short' => 'MgySgt',
    'long' => 'Master Gunnery Sergeant',
    'unlock' => true,
    'notes' => '',
    'thisrank' => '9',
    'nextrank' => '12',
    'requires' => array(
        'awards' => '',
        'rank' => '7',
        'score' => '50000',
        'round' => ''
    )
),
So the question arises, how do you store this data, load it, and use it?
Of course, I found a page on StackOverflow about PHP storage. Perusing it, you’ll see most people break the caching theory down into three options: JSON (json_encode/json_decode), serialize/unserialize, and var_export with include.
One responder (who should have more votes, if it were up to me) went pretty far and did a serious amount of benchmarking on the PHP array storage issue.
I won’t go into much detail on each method, as the linked articles cover them pretty well, but I want to add two other options and redo the benchmarks, because something smelled fishy when JSON kept coming out the winner…
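For reference, the three baseline approaches can be sketched roughly as follows. This is a minimal illustration, not the benchmarked code: the sample data, the file names, and the use of the system temp directory are all assumptions made for the example.

```php
<?php
// Hypothetical sample data and file names, for illustration only.
$data = array(
    '9' => array('short' => 'MgySgt', 'long' => 'Master Gunnery Sergeant'),
);
$dir = sys_get_temp_dir();

// 1. JSON: write with json_encode(), read back with json_decode().
//    The second argument asks for associative arrays instead of objects.
file_put_contents("$dir/ranks.json", json_encode($data));
$fromJson = json_decode(file_get_contents("$dir/ranks.json"), true);

// 2. serialize: write with serialize(), read back with unserialize().
file_put_contents("$dir/ranks.ser", serialize($data));
$fromSerialize = unserialize(file_get_contents("$dir/ranks.ser"));

// 3. var_export: write real PHP source, read it back with include.
//    This is the "PHP wrapper code" the cons list complains about.
file_put_contents("$dir/ranks.php", '<?php return ' . var_export($data, true) . ';');
$fromVarExport = include "$dir/ranks.php";
```

Only the third variant produces a file that PHP itself parses, which is why it is the only one of the three an opcode cache can help with as-is.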
JSON certainly has some advantages: the cached file sizes are smaller (especially when you are saving a numerically indexed array), it is usually faster to write, and if you have a highly JS-oriented site architecture, being able to make HTTP requests for your JSON data files can be quite useful.
What irked me is that, for my purposes, none of those things really matter. File size isn’t a real concern because my largest cached file weighs in at about 50KB. I won’t be distributing the data to JS applications, and writes only happen once.
So, what does matter? Read speed. And with that in mind, what I found out is that JSON can’t hold a candle to the other two methods.
I’d first like to rebut the cons list that var_export was given on Procurious.nl:
- Needs PHP wrapper code.
- Can not encode Objects of classes missing the __set_state method.
- When using an opcode cache your cache file will be stored in the opcode cache. If you do not need a persistant cache this is useless, most opcode caches support storing values in the shared memory. If you don’t mind storing the cache in memory, use the shared memory without writing the cache to disk first.
- Another disadvantage is that your stored file has to be valid PHP. If it contains a parse error (which could happen when your script crashes while writing the cache) your application will not work anymore.
For stdClass objects, the __set_state complaint is one line away: str_replace('stdClass::__set_state(', '(object)(', var_export($data, true)). Done and fixed.
I find it odd that the opcode cache was so readily dismissed; it’s a surefire way to speed up ANY of these operations. To that point, let’s introduce two other caching methods that can make unserialize and json_decode even faster. How? Just wrap the data up inside a PHP file!
json_data_cache.php:
<?php return json_decode('{"your": "json", "goes": "here"}');
The same treatment can be applied to unserialize. By adding this boilerplate to your serialized or json_encode’d data, you can take advantage of the opcode cache. What you’ll see later is that even if you decide to go with JSON, doing this leads to a significant speedup.
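A sketch of how such wrapper files might be generated. The helper names write_json_php and write_serialize_php are hypothetical, as are the sample data and file paths; only the "<?php return ...;" wrapper idea comes from the text above. Running the encoded string through var_export() is one way to get a correctly quoted PHP string literal, so quotes and backslashes in the data do not break the generated file.

```php
<?php
// Hypothetical helpers: wrap encoded data in a PHP file so that an
// opcode cache can keep the compiled file (and its string literal) in memory.

function write_json_php($file, $data) {
    // var_export() on a string yields a safely escaped PHP string literal.
    $literal = var_export(json_encode($data), true);
    // The "true" flag (arrays instead of objects) is a choice made for this sketch.
    file_put_contents($file, '<?php return json_decode(' . $literal . ', true);');
}

function write_serialize_php($file, $data) {
    $literal = var_export(serialize($data), true);
    file_put_contents($file, '<?php return unserialize(' . $literal . ');');
}

// Usage: the quote and backslash exercise the escaping.
$data = array('name' => "O'Brien", 'path' => 'C:\\temp', 'n' => 3);
$dir = sys_get_temp_dir();

write_json_php("$dir/json_data_cache.php", $data);
write_serialize_php("$dir/serialize_data_cache.php", $data);

$fromJson = include "$dir/json_data_cache.php";
$fromSerialize = include "$dir/serialize_data_cache.php";
```

Reading is then a plain include in both cases, exactly as with a var_export file.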
Here you will find the test script for benchmarking php caching with json, serialize, and var_export. A quick overview of how this is intended to be used:
The data generated by this script is fairly comprehensive. It tests arrays of strings, ints, and objects, indexed by both numeric and string keys. Additionally, it tests two object collections. In all cases, the script creates a Large, Medium, and Small set of data based on the $setsize value (100%, 50%, and 10% of that size).
When you are running the test, I highly recommend you use the same $setsize and $itterations values for three runs: APC off, APC enabled (apc.stat=1), and finally APC enabled (apc.stat=0), restarting your webserver between each execution. apc.stat instructs APC to check the file for modifications to see if it should be recompiled before using the cached opcode version. Turning this off ekes a bit more performance out of APC, at the cost of needing to manually clear the APC cache if you update files. The testing, however, shows that you may or may not think this is valuable: the performance difference is very, very minimal.
Also, for the APC-enabled test runs, perform them twice to ensure that your APC cache is primed.
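To give a feel for the kind of read-speed measurement being described, here is a much-reduced timing loop. It is a hypothetical stand-in, not the linked test script: it covers only one string-keyed array of strings, and the file names and the time_reads helper are made up for the example. The $setsize and $itterations names are borrowed from the text.

```php
<?php
// Hypothetical, simplified stand-in for the benchmark described above.
$setsize = 2500;
$itterations = 200; // spelling follows the variable name used in the text

// Build a string-keyed array of $setsize strings.
$data = array();
for ($i = 0; $i < $setsize; $i++) {
    $data["key$i"] = "value $i";
}

// Write one cache file per method (file names are assumptions).
$dir = sys_get_temp_dir();
$files = array(
    'json'       => "$dir/bench_json.txt",
    'serialize'  => "$dir/bench_serialize.txt",
    'var_export' => "$dir/bench_var_export.php",
);
file_put_contents($files['json'], json_encode($data));
file_put_contents($files['serialize'], serialize($data));
file_put_contents($files['var_export'], '<?php return ' . var_export($data, true) . ';');

// One loader per method; each takes a file name and returns the decoded data.
$loaders = array(
    'json'       => function ($f) { return json_decode(file_get_contents($f), true); },
    'serialize'  => function ($f) { return unserialize(file_get_contents($f)); },
    'var_export' => function ($f) { return include $f; },
);

// Time $itterations reads of $file with $loader; returns elapsed seconds.
function time_reads($loader, $file, $itterations) {
    $start = microtime(true);
    for ($i = 0; $i < $itterations; $i++) {
        $result = $loader($file);
    }
    return microtime(true) - $start;
}

$timings = array();
foreach ($loaders as $name => $loader) {
    $timings[$name] = time_reads($loader, $files[$name], $itterations);
    printf("%-10s %.4f s\n", $name, $timings[$name]);
}
```

Run from the command line, this typically exercises the "no opcode cache" case only; the APC comparisons need the script to be served through the webserver with the settings described above.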
I ran this test on a Fedora 15 virtual box on top of a Win7 install on a Core i7 machine with the settings $setsize=2500 and $itterations=200. The webserver was Apache/2.2.21 with PHP 5.3.8 and APC 3.1.9.
Here’s a graph comparing the best runs of each decoding situation:

And an ODS (Open Office Calc) file of the raw results and a few other graphs.
What you are looking at here is the decoding of each type of data structure by method, as compared to the fastest method, which is var_export with APC.
var_export without opcode caching is far and away the slowest method.
If you have any suggestions, feedback, or wish to contest the results, feel free to leave a comment or submit a patch.
1 If you have a small enough data cache, you should be reading up on Redis, and this whole discussion becomes pointless very quickly.
:D :D :D :D
Had a nice long post about this, got eaten by expired session. Engage tl;dr
"Bonerz!" :D
Who doesn’t like a little weekend project, right?
A combination of things has recently pushed me towards moving away from Wordpress, primarily the ever-present threat of a high-profile suite of software requiring constant upgrades to keep safe. Because of how rarely I post (sometimes less often than Wordpress releases updates), I run a fairly high risk of having security holes in active code on my webserver. Between that and the fact that I simply don’t require such up-to-the-refresh dynamic flexibility (typically I only make layout changes once every 2 years or so), it just doesn’t make sense to use something so dynamic. Thus I am ready to stop playing the upgrade game [1].
Because I try to stay exposed to new technologies, the Jekyll project has come across my path a few times. Between that and wanting to play around with the Phar format, I decided to have a go at the same concept written in PHP.
First, I looked around for some alternatives, and I did find one: Phrozn. However, while Phrozn is geared towards static site generation, it lacks some functionality unique to blogs: namely “tags” and chronological pagination.
So I figured, if the world could use one more PHP MVC framework [2], then why not another static site generation script?
Thus, Alkemy was born.
Alkemy, like Jekyll, will take a list of posts, parse them up (Textile, Markdown, HTML, or just plaintext), smash them through some template files, and smoosh out a website. The website is fully generated and completely static. You serve nothing but set-in-stone HTML files and other non-code-interpreted content. This has 2 really big advantages:
Of course, one must eat his own dog food, so this blog is now managed by Alkemy. I have a local copy, versioned using Mercurial, where I get things written and tested. After I’m happy with my latest post, I commit locally, push to the server, and have the server run an update. My webserver has the repo’s “site” folder set up as the docroot and “TADA!” the site is updated.
You can find Alkemy at http://alkemy.info/ along with documentation and examples.
1 I have often wished Wordpress offered a LTS version of their software for this sole reason.
2 Not really.
UPDATE: Turns out static blogging like this kind of sucks. The overhead in producing the content, editing it in the right area, re-running the generation, and getting it synced up is too much of a burden. Compared to something like “Login, type, post” or, even easier, bookmarklets (à la tumblr/posterous), there’s just no comparison. As such, this blog never even got off the ground with that code (not externally anyway) but instead is running some custom tumblresque software. Alkemy will stay released, in any case.
I rewatched Firefly last weekend, got to the 3rd disc, and shed a single tear when there were no more discs to load into the machine.