This is a log of a conversation between myself and bharat, a developer on the Gallery2 project, about testing software in general, with an eye toward setting up unit testing for WordPress (WP). It took place in #wordpress on 30 December 2005.
<bharat> hallo :-)
<davidhouse> bharat, what do you use for it? phpunit?
<bharat> we use a hacked up version of one of the forks of phpunit
<bharat> when we started unit testing ~3 years ago there weren't any mature phpunit frameworks yet
<davidhouse> SteamedPenguin, snap.
<davidhouse> hmm
<davidhouse> i tried out phpunit, but it seemed too complex.
<davidhouse> all this messing around with test harnesses and enough classes and reflection to sink the titanic.
<bharat> davidhouse: is your goal to do unit testing or regression testing?
<davidhouse> err, we haven't decided yet. this is very informal discussion.
<davidhouse> we just want it to Not Suck.
<bharat> unit testing and regression testing are very different beasts
<bharat> they help you at different levels of the product development cycle
<davidhouse> what's regression testing?
<bharat> http://www.devbistro.com/articles/Testing/Testing-Terminology-Glossary
<bharat> davidhouse: in that glossary, I'm referring to a "system test"
<skippy> "The objective of system test is to measure the effectiveness and efficiency of the system in the "real-world" environment."
<davidhouse> right, that makes sense
<bharat> system tests are not particularly useful in helping you to refactor the code
<davidhouse> you use them at the release-candidate stage?
<bharat> we don't have many system tests
<bharat> we have ~1800 unit tests
<bharat> I need to go purchase a clover license to figure out what our code coverage is
<skippy> how many developers have commit access?
<bharat> but our goal is to have unit tests for all the model/controller code
<skippy> davidhouse: I meant in gallery2
<bharat> let's see
<skippy> is there a split between developer and tester, or test writer and test executor?
<bharat> roughly 10
<bharat> give or take
<bharat> no, the developer writes the unit test as he writes the code it's testing
<skippy> ok
<davidhouse> how does it compare in size to wordpress?
<bharat> http://codex.gallery2.org/index.php/Main_Page#id3006022
<davidhouse> i really should be a lot more familiar with gallery2 than i actually am
<bharat> http://fisheye.gallery2.org/viewrep/gallery
<bharat> this is better: http://codex.gallery2.org/index.php/Gallery2:Developers
<h0bbel> The codebase of G2 is way bigger than WP's
<davidhouse> okay.
<davidhouse> so the number of unit tests for WP shouldn't be out of control.
<davidhouse> like, 1500 at max?
<bharat> I'm currently refactoring our main data representation, which is resulting in roughly 400 files changed. It's helpful to make a fundamental change then run all 1800 unit tests
<davidhouse> i don't really know
<bharat> roughly how many lines of code is wp?
<bharat> expect to have about the same amount of lines of unit test code
<bharat> a 50/50 ratio is reasonable
<davidhouse> yeah
<bharat> it'll probably be a lot more though unless you've got suitable abstractions or another way to create a seam
<bharat> since in order to effectively unit test you need to be able to mock up the code you're not covering in a particular test
<bharat> that's usually challenging if you're starting writing the tests after the code/design is more or less complete
<bharat> this is a useful intro to mock objects: http://mockobjects.com/Faq.html
<bharat> davidhouse: there's a lot of really good material out there to help you figure out how to start
<davidhouse> yeah, i get that impression :)
<bharat> but from what I know of the WP situation, it sounds like you should start with some characterization tests
<bharat> that will help you establish your initial invariants so that you can develop unit tests and begin refactoring as necessary
<davidhouse> could you sum up characterization tests quickly?
<davidhouse> i should really get to know this side of software development in more detail.
<bharat> sure. they are simple functional tests (see glossary link above) that let you measure what your current codebase does
<bharat> so for example, if you have some code that generates a menu, you could write a characterization test that exercises the menu, captures the output and compares it to a "golden file"
<bharat> when you write the test, you capture the initial output and save it as the golden file
<davidhouse> right
<bharat> now you've characterized your code. if you make a change, your test may fail because you've introduced a difference from the golden file, so you can examine the difference and determine whether or not this is an expected change
<bharat> and update the golden file
<bharat> this lets you know for sure if you're changing your behavior
<bharat> which gives you the freedom to go in and hack things without worrying about weird side-effects
<davidhouse> right
<bharat> that's the upside. the downside is that they typically are brittle because they're testing many levels of functionality. unit testing each individual level will give you more stability
<bharat> so usually when I go into a legacy codebase and want to introduce testing I start with a characterization test
<bharat> then I start refactoring the code to introduce abstraction so that I can write unit tests
<bharat> once I have good unit tests and have achieved the coverage level I want I may delete the characterization test because it's no longer necessary
<davidhouse> bharat, right, so you write the characterization tests to make sure you're not breaking anything in a major way whilst writing the unit tests?
<bharat> davidhouse: right
<davidhouse> bharat, is unit testing the only testing you do?
<bharat> davidhouse: we have a lot of consumers of our nightlies so we get a lot of manual testing
<davidhouse> yes, that's true.
<bharat> usually a 2 week interval on an alpha/beta/release-candidate is enough to shake out most of the issues
<davidhouse> we need to expand our manual test userbase.
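The golden-file workflow bharat describes above can be sketched in a few lines. This is a minimal illustration in Python, not Gallery2's or WordPress's actual test code: `render_menu` is a hypothetical stand-in for legacy menu-generating code, and `check_against_golden` is a hypothetical helper. The first run captures the current output as the golden file; later runs return a diff that a human reviews before deciding whether to update the golden file.

```python
import difflib
import os
import tempfile


def render_menu(items):
    """Hypothetical stand-in for legacy code that renders a menu as HTML."""
    lines = ["<ul>"]
    for label, url in items:
        lines.append(f'  <li><a href="{url}">{label}</a></li>')
    lines.append("</ul>")
    return "\n".join(lines) + "\n"


def check_against_golden(actual, golden_path):
    """Compare output with a saved golden file (a characterization check).

    If no golden file exists yet, the current output is saved as the
    reference and an empty list is returned. Otherwise a unified diff is
    returned; an empty list means the behavior is unchanged.
    """
    if not os.path.exists(golden_path):
        with open(golden_path, "w", encoding="utf-8") as f:
            f.write(actual)
        return []
    with open(golden_path, encoding="utf-8") as f:
        expected = f.read()
    return list(
        difflib.unified_diff(
            expected.splitlines(keepends=True),
            actual.splitlines(keepends=True),
            fromfile="golden",
            tofile="actual",
        )
    )


if __name__ == "__main__":
    golden = os.path.join(tempfile.mkdtemp(), "menu.golden")
    items = [("Home", "/"), ("About", "/about")]
    check_against_golden(render_menu(items), golden)  # first run: saves the golden file
    print(check_against_golden(render_menu(items), golden))  # unchanged: empty diff
    # A behavior change shows up as a diff to examine before updating the golden file:
    print("".join(check_against_golden(render_menu(items + [("Blog", "/blog")]), golden)))
```

Because the check compares whole rendered output, it exercises every layer behind `render_menu` at once, which is exactly why bharat calls such tests brittle compared to per-level unit tests.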
<bharat> I'd estimate that we have 3-400 people using CVS and nightlies
<davidhouse> we only have a limited set of people covering a limited set of functionality
<bharat> our issues are almost always rendering problems because the rest of the code is covered pretty well by the tests
<bharat> those are usually very easily fixed
<davidhouse> what kind of level do you unit test at? individual functions?
<bharat> yes
<bharat> back when I started doing this I didn't know enough so I didn't mock out the database
<davidhouse> right.
<bharat> which is unfortunate because it means that our tests use the db which means they run slower than I'd like
<davidhouse> you mean you didn't abstract to a db access layer you could swap at will? ;)
<bharat> heh
<bharat> we did, actually
<bharat> right now we have enough abstractions that we support mysql, postgres, oracle, with db2, firebird and sqlite in the works
<davidhouse> and the API stays the same
<bharat> but for our testing we don't swap out the db layer.
<bharat> it's on my list though.
<davidhouse> yeah. :)
<bharat> because we have an abstraction at the right level it's probably on the order of a week or so
<bharat> there are many things to like about unit testing (and some things that many don't like) but one thing that I find is that it drives the right level of abstraction
<davidhouse> bharat, this has been very valuable advice.
<davidhouse> along with 'get the damn php debugger working', i think 'learn how to test right' is going to be a new years resolution.
<bharat> davidhouse: I'm always around. we've been doing this for a couple of years now (and I do a lot of test driven design at work) so come on by #gallery any time you want to talk
<davidhouse> thanks. i appreciate it :)
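bharat's point about mocking out the database can be illustrated with a small sketch, assuming the model code talks to storage only through an abstraction layer. All names here (`Storage`, `InMemoryStorage`, `album_summary`) are hypothetical and written in Python for illustration; they are not part of Gallery2's or WordPress's API. The test swaps in a hand-written in-memory fake for the real database, so it runs without a DB connection and can also check which lookups the code under test made.

```python
from typing import Dict, List, Protocol


class Storage(Protocol):
    """Hypothetical DB access layer: the seam a unit test can swap out."""

    def fetch_titles(self, album_id: int) -> List[str]: ...


class InMemoryStorage:
    """Hand-written fake used in tests in place of a real database."""

    def __init__(self, data: Dict[int, List[str]]):
        self.data = data
        self.calls: List[int] = []  # records each lookup so tests can check it

    def fetch_titles(self, album_id: int) -> List[str]:
        self.calls.append(album_id)
        return list(self.data.get(album_id, []))


def album_summary(storage: Storage, album_id: int) -> str:
    """Hypothetical model code under test; it sees only the Storage interface."""
    titles = sorted(storage.fetch_titles(album_id))
    if not titles:
        return "empty album"
    return f"{len(titles)} items: " + ", ".join(titles)


if __name__ == "__main__":
    fake = InMemoryStorage({1: ["sunset", "beach"]})
    print(album_summary(fake, 1))  # prints "2 items: beach, sunset"
    print(fake.calls)  # prints [1]
```

Because `album_summary` depends only on the interface, the same function could run against a real database layer in production and against the fake in tests; that seam is what keeps function-level unit tests fast.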