What are the units to test in an ontology?

Every now and again someone suggests that we need or should have “unit tests” for ontologies. This isn’t surprising since ontology engineering generally looks to software engineering for inspiration and unit testing is still reasonably hot. We don’t know a lot about testing ontologies, so picking a hot testing methodology is an easy way to go.

(Of course, ontologies have some testing built in, e.g., checks for consistency and class satisfiability. So it’s not all that dire.)

It is, however, not clear that unit tests, per se, make sense for ontologies, and even less clear that the “test first” methodology associated with them is appropriate. The main issue that cropped up for me is this: what is the unit we should test? The Art of Unit Testing says that a unit is a function or method. The Wikipedia article says that a unit is “the smallest testable part of an application.” These definitions seem to line up. A class or module is too big for focused tests and a line of code is too small to test at all, but a function or method is just right. Or right enough.

What’s the smallest testable part of an ontology? I don’t know! Perhaps an entailment? But entailments seem more like program outputs or behaviors. A term? (In OWLspeak, an entity?) I don’t know how testable terms are. We do test classes for satisfiability systematically. That’s good. Is there anything else to test about them?

I suppose we could test classes for key entailments, e.g., whether a class is subsumed by some other class. Of course, classification already “tests” for this. But really classification merely determines, for every possible atomic subsumption, whether it holds. It doesn’t check for non-atomic subsumptions, nor does it throw a warning if a desired subsumption fails to hold or an undesired subsumption turns out to hold.

So, perhaps we should write our desired subsumptions (and non-subsumptions, though we’ll want a shorthand for those) separately, and match the results of classification against the “should subsume” and “should not subsume” lists?
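To make the matching step concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `classify` is a toy stand-in for a reasoner that only takes the reflexive-transitive closure of asserted atomic subclass pairs (a real OWL reasoner does far more), and the names `classify`, `check_expectations`, and the animal classes are made up.

```python
from itertools import product


def classify(asserted):
    """Toy stand-in for classification (NOT a real reasoner).

    Takes asserted (sub, sup) pairs between named classes and returns
    their reflexive-transitive closure.
    """
    names = {name for pair in asserted for name in pair}
    inferred = set(asserted) | {(name, name) for name in names}
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so we can add to the set as we go.
        for (a, b), (c, d) in product(list(inferred), repeat=2):
            if b == c and (a, d) not in inferred:
                inferred.add((a, d))
                changed = True
    return inferred


def check_expectations(inferred, should_subsume, should_not_subsume):
    """Compare classification results with the two hand-written lists."""
    failures = []
    for pair in should_subsume:
        if pair not in inferred:
            failures.append(("missing", pair))
    for pair in should_not_subsume:
        if pair in inferred:
            failures.append(("unwanted", pair))
    return failures


# Hypothetical ontology: each pair reads "first is a subclass of second".
asserted = {("Dog", "Mammal"), ("Mammal", "Animal"), ("Cat", "Mammal")}
inferred = classify(asserted)

failures = check_expectations(
    inferred,
    should_subsume=[("Dog", "Animal")],
    should_not_subsume=[("Dog", "Cat"), ("Animal", "Dog")],
)
print(failures)
```

An empty list means every expectation matched. A desired subsumption that classification did not find comes back tagged "missing", and a subsumption we wanted to rule out comes back tagged "unwanted".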

If this is how we test, then coverage is going to fly out the window, I think. It’s not just that we won’t, as a matter of course, write enough tests, but that if we did, we’d wreck the ontology development process and its key advantages. Essentially, we’d destroy discoverability. We’d have to specify up front exactly all the things we generally hope the reasoner will tell us!

This isn’t to say that supporting unit tests of key entailments for key classes isn’t worth automating. After all, we do sanity check key subsumptions, just not in a very nice way (i.e., by remembering to inspect, or redoing queries in the DL Query tab in Protege 4). Having a format for tests and a nice test harness would allow us to rerun our tests more easily and to share them with other people.

The simplest format I can think of is just another OWL ontology where each test entailment is written as an axiom in that ontology with an annotation “should hold” or “should not hold”. The test runner simply loops over those axioms and checks to see whether they follow from the tested ontology.
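A rough sketch of such a runner, in Python, might look like the following. It is only an illustration under stated assumptions: axioms are plain strings rather than real OWL axioms, `ExpectedAxiom` and `run_tests` are hypothetical names, and the `entails` callback is a stub where a real harness would ask a reasoner whether the tested ontology entails the axiom.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class ExpectedAxiom:
    """One test entailment plus its annotation (hypothetical format)."""
    axiom: str        # e.g. "Dog SubClassOf Animal"
    expectation: str  # "should hold" or "should not hold"


def run_tests(
    tests: Iterable[ExpectedAxiom],
    entails: Callable[[str], bool],
) -> List[Tuple[ExpectedAxiom, bool]]:
    """Loop over the test axioms and record whether each one passes."""
    results = []
    for test in tests:
        if test.expectation not in ("should hold", "should not hold"):
            raise ValueError(f"unknown annotation: {test.expectation!r}")
        holds = entails(test.axiom)
        passed = holds == (test.expectation == "should hold")
        results.append((test, passed))
    return results


# Stub entailment check: a fixed set of axioms we pretend the tested
# ontology entails. A real runner would delegate this to a reasoner.
entailed = {"Dog SubClassOf Mammal", "Dog SubClassOf Animal"}

tests = [
    ExpectedAxiom("Dog SubClassOf Animal", "should hold"),
    ExpectedAxiom("Dog SubClassOf Cat", "should not hold"),
    ExpectedAxiom("Cat SubClassOf Animal", "should hold"),
]

results = run_tests(tests, lambda axiom: axiom in entailed)
for test, passed in results:
    status = "PASS" if passed else "FAIL"
    print(f"{status}: {test.axiom} ({test.expectation})")
```

Passing the entailment check in as a function keeps the runner independent of any particular reasoner, which is the part of this design I'd most want to keep.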

Eventually, we may want to write more sophisticated tests using, e.g., SPARQL/OWL.