Always, always, always check the data

How many anomalies does it take to get a computer scientist to check the data? Too damn many, it seems.

(I thought about off by one errors and overflow jokes, but meh.)

I’ve just tracked down the source of a bunch of nice little conflicts between various tests of a system (and between some of our tests and our expectations). What was the deal? The deal was that in several long-running tests we were overwhelmingly testing cache behavior, which was very fast. Since the cache was the same between different inputs, once you hit the cache all the time, even wildly different inputs looked the same.
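The effect is easy to reproduce. Here's a minimal sketch (with made-up names and a `time.sleep` standing in for the real workload, not our actual system): once the cache is warm, a timing loop reports nearly identical numbers for cheap and expensive inputs alike, so the benchmark is measuring cache lookups, not the work.

```python
import time

def expensive(x):
    """Hypothetical stand-in for the real computation under test."""
    time.sleep(0.005)  # simulate real work
    return x * x

cache = {}

def cached(x):
    # Same retrieval cost for every key, no matter how it got in.
    if x not in cache:
        cache[x] = expensive(x)
    return cache[x]

def bench(fn, inputs, reps=10):
    """Total wall-clock time for reps passes over inputs."""
    start = time.perf_counter()
    for _ in range(reps):
        for x in inputs:
            fn(x)
    return time.perf_counter() - start

small = [1, 2, 3]
large = [10**6, 10**7, 10**8]

# Only the first pass pays for expensive(); every later call is a
# cache hit, so "wildly different" inputs time out nearly the same.
t_small = bench(cached, small)
t_large = bench(cached, large)
t_uncached = bench(expensive, small)
```

With the cache on, `t_small` and `t_large` are dominated by the one cold pass and are nearly indistinguishable; `t_uncached` pays for every call, which is the same kind of gap a long-running test hides once warm-up is over.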

What’s the worst part? The worst part is that I saw this earlier, but discounted it as “warm up” time. Which it was. Of the cache. D’oh.

So, now I can explain why we’re seeing a two-order-of-magnitude slowdown in our new tests: we turned off the cache.


Oh well, at least we didn’t break the system. OTOH, I’d rather have those cache times as our real times…

1 thought on “Always, always, always check the data”

  1. “Cache was the same” in the sense that retrieving stuff from the cache has the same performance no matter how it got into the cache. There was no cache flushing, so basically the cache made all models look the same, performance-wise.
