Monitoring File Changes in PHP
I’ve been working on a small BDD test framework, and I found myself wanting to implement a --watch option. When the flag is set, the test runner would watch the current directory and re-run all specs when a change occurs. Though PHP offers the inotify extension, I wanted this option to be cross-platform and work without a PECL extension. So I decided to write my own implementation.
Monitoring directory changes can be done using filemtime() to get the last modification time for the directory. That timestamp is updated when files are renamed, added or deleted in the folder. By polling each directory every second, for example, we can see whether or not such a change has been made to the directory tree.
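The directory polling described above can be sketched as follows. This is only a minimal illustration, assuming PHP 7 or later; the function names directoryMtimes and watchDirectories are hypothetical. Because a directory's modification time only reflects changes to its own entries, the sketch records one timestamp per directory in the tree.

```php
<?php
// Hypothetical helper: map every directory under $root (including $root) to its mtime.
function directoryMtimes(string $root): array
{
    clearstatcache(); // drop PHP's cached stat results so the timestamps are fresh
    $mtimes = [$root => filemtime($root)];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::SELF_FIRST // visit directories, not only files
    );
    foreach ($iterator as $path => $info) {
        if ($info->isDir()) {
            $mtimes[$path] = $info->getMTime();
        }
    }
    return $mtimes;
}

// Hypothetical polling loop: call $onChange whenever any directory's mtime differs
// from the previous poll, or a directory appears or disappears.
function watchDirectories(string $root, callable $onChange, int $intervalSeconds = 1): void
{
    $previous = directoryMtimes($root);
    while (true) {
        sleep($intervalSeconds);
        $current = directoryMtimes($root);
        if ($current != $previous) {
            $onChange();
            $previous = $current;
        }
    }
}
```

Calling watchDirectories(getcwd(), $callback) would block and invoke $callback after files are added, renamed or deleted, but not after edits to a file's contents.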
However, it doesn’t cover modifications to the contents of individual files. For that purpose, I have to track every file in the directory and its sub-directories. I thought of two solutions for doing this, each with its pros and cons. The first solution would be to call stat() on each file to get both its modification time and size. I could then store these values, and simply check against them on subsequent polls. If either the modification time or size differed, I would re-run the tests.
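A rough sketch of this first approach, assuming PHP 7 or later and using the hypothetical helper names statSnapshot and statSnapshotChanged, might look like this:

```php
<?php
// Hypothetical helper: record [mtime, size] for every .php file under $root.
function statSnapshot(string $root): array
{
    clearstatcache(); // stat() results are cached by PHP, so clear them before each poll
    $snapshot = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $path => $info) {
        if ($info->isFile() && $info->getExtension() === 'php') {
            $stat = stat($path);
            if ($stat !== false) { // the file may have vanished mid-scan
                $snapshot[$path] = [$stat['mtime'], $stat['size']];
            }
        }
    }
    return $snapshot;
}

// True if any tracked file was added, removed, or has a new mtime or size.
function statSnapshotChanged(array $previous, array $current): bool
{
    return $previous != $current;
}
```

On each poll, a new snapshot is compared with the stored one, and the tests are re-run when statSnapshotChanged() returns true.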
Though efficient enough, this method would miss modifications that occur within the same 1-2 second window without changing the file size. This would be the case on the ext3 and FAT filesystems, which record modification times with a resolution of one and two seconds respectively. Though it might be rare, it would be an annoyance when making quick edits to fix typos.
A slower, more effective alternative would be to calculate and store the sha1 digest of the contents of each file. This way we’re no longer relying on the time resolution of the file system, but rather each individual bit of the file. This would definitely be slower, but by how much?
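The digest-based alternative can be sketched in the same way. Again this is only an illustration, assuming PHP 7 or later; hashSnapshot is a hypothetical name.

```php
<?php
// Hypothetical helper: map every .php file under $root to the sha1 digest of its contents.
function hashSnapshot(string $root): array
{
    $snapshot = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $path => $info) {
        if ($info->isFile() && $info->getExtension() === 'php') {
            $digest = sha1_file($path);
            if ($digest !== false) { // skip files that could not be read
                $snapshot[$path] = $digest;
            }
        }
    }
    return $snapshot;
}
```

Two snapshots taken on consecutive polls differ whenever a tracked file's contents differ, even if its size and modification time are unchanged.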
I hoped to answer that with a benchmark. I decided to write a script that would test both methods against a directory. It measures the time taken to recurse the directory and get the modified time and size of .php files, as well as calculate their sha1 digest. The test input consisted of the contents of Symfony_Standard_Vendors_2.3.6.tgz, a copy of Symfony 2.3.6 with necessary vendors already installed. The folder is 22.8 MB in size, and contains 7,169 files. The test runner would only be tracking .php files, so the number of tracked files would be smaller: 3,836.
Among those PHP files, there are a reported 108,488 lines, coming in at over 13 MB.
And now for the code:
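What follows is a minimal sketch of a benchmark of this kind, not necessarily the exact script behind the numbers in this post. It assumes PHP 7 or later, the helper names (listPhpFiles, measure, benchmark) are hypothetical, and it assumes the file list is gathered once, outside the timed loop.

```php
<?php
// Sketch only: helper names are hypothetical.

// Collect the paths of all .php files under $root.
function listPhpFiles(string $root): array
{
    $files = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $path => $info) {
        if ($info->isFile() && $info->getExtension() === 'php') {
            $files[] = $path;
        }
    }
    return $files;
}

// Run $task $iterations times; return average, minimum and maximum duration in seconds.
function measure(callable $task, int $iterations): array
{
    if ($iterations < 1) {
        throw new InvalidArgumentException('At least one iteration is required');
    }
    $times = [];
    for ($i = 0; $i < $iterations; $i++) {
        $start = microtime(true);
        $task();
        $times[] = microtime(true) - $start;
    }
    return [
        'avg' => array_sum($times) / count($times),
        'min' => min($times),
        'max' => max($times),
    ];
}

// Time both change-detection methods over the same set of files.
function benchmark(string $root, int $iterations): array
{
    $files = listPhpFiles($root);
    return [
        'files' => count($files),
        'stat' => measure(function () use ($files) {
            clearstatcache(); // force stat() to hit the filesystem on every pass
            foreach ($files as $file) {
                stat($file);
            }
        }, $iterations),
        'sha1' => measure(function () use ($files) {
            foreach ($files as $file) {
                sha1_file($file);
            }
        }, $iterations),
    ];
}

// Command-line usage (assumed): php benchmark.php /path/to/project 1000
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    $result = benchmark($argv[1], max(1, (int) ($argv[2] ?? 1000)));
    printf("%d php files\n", $result['files']);
    foreach (['stat', 'sha1'] as $method) {
        printf(
            "%-5s avg %.6fs  min %.6fs  max %.6fs\n",
            $method,
            $result[$method]['avg'],
            $result[$method]['min'],
            $result[$method]['max']
        );
    }
}
```

Timings from a sketch like this depend heavily on the operating system's file cache, so the first pass over a cold directory is likely to be slower than the rest.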
The script calculates the average, minimum and maximum time spent iterating over all 3,836 PHP files and applying both methods. The average is calculated over a sample of 1000 iterations. The benchmark was run on a 1.7GHz dual-core i7 MacBook Air with an SSD.
According to the results, using sha1_file is ~108% slower than using clearstatcache() followed by calls to stat(), so it takes roughly twice as long. For the intended use, I might be willing to sacrifice that bit of performance for greater accuracy. And though both are much less efficient than using inotify to listen for events, polling means simpler installation and use.
If anyone has other ideas, I’d be grateful if you could share them!