Sunday Baking: No Knead, “5 minute” Whole Wheat Bread

Zoe wanted an incentive to get up earlier in the morning and I’d been baking a lot of bread. Many doughs can be kept in the fridge overnight and even benefit from doing so. But the brioche I’d been making still took a lot of time the night before.

Then I remembered I’d explored the “bread in 5 minutes a day” approach. Basically, you make ≈5lbs of no knead bread dough and keep it in the fridge. When you want a loaf, cut off a chunk (usually about 1 lb), shape it, proof it, and bake it.

It works out pretty well!

The main issue I had was my container was too small. So it overflowed in the first warm fermentation and the first overnight cold ferment. Oh well!

If you can spare the fridge space, I really recommend this approach. It ain’t no five minutes, in my experience, even only counting the “active” time. But it’s not a lot of time. And you can have a fresh loaf any day with considerably less effort than starting from scratch.

One trick I came up with is using a cast iron frying pan as a “baking stone” instead of my dutch oven. Much easier to manage and has been working out well.

A Bioethical Dilemma in the UK Vaccine Rollout


The government and NHS England has ruled that GPs must move the second dose of the vaccine to 12 weeks after the first, instead of the originally planned three-week gap.

And from Monday, January 11 they were informed that they had to postpone any second dose vaccines they had originally booked within the timeframe.

Since December 15, 2,000 people in Tameside and Glossop received their second dose before the change in policy.

Dr Alan Dow, who works at in Cottage Lane Surgery in Glossop told a meeting of the primary care committee that some GPs feel they have been put in an ‘impossible position’ by the new national mandate.

“The breaking of the consent for the three week promise was a big thing,” he said.

“I personally told people that face to face as I vaccinated them on the day and then suddenly we can’t do it. That’s a big thing in general practice actually, breaking consent with our patients.

“It is close to moral injury, if not actually. I already think we’re out on a limb in terms of evidence.”

Moral injury is defined as the profound psychological distress which results from actions, or the lack of them, which violate one’s moral or ethical code.

Dr Dow added that he could understand the position of the medical chiefs to delay the second dose, but did not know how much that took into account ‘breaking consent which has been the model of personal care in the NHS for 70 plus years’.

The Pfizer vaccine has only been validated for its original 2 dose protocol (with that specific timing). We do not know how a 4x increase in the spacing affects the protection afforded by the vaccine. Perhaps you end up with the same protection (≈90%). Perhaps it’s a bit better. Perhaps it’s a lot worse! (There’s evidence of some protection from a single dose, but we probably only have data from three weeks after.) Lots of things could happen. We haven’t studied it so we don’t know. We don’t, as far as I know, have a great theoretical model that would help us make an educated guess.

Thus, a bunch of people are being subjected to an experimental treatment. It’s likely to be safe (i.e., the risks from the treatment itself are low…though that’s more theoretical; mRNA clears in a few weeks; exposure to vaccine levels of spike protein doesn’t trigger cytokine storms; thus, from a safety perspective, get as many shots as you like). But we don’t know how effective it is and we won’t know until we see how many people in the new arm of the experiment contract severe COVID.

This would be fine…if the people involved gave informed consent in advance. They, of course, did not give such consent. The GPs told their patients they were going to get a certain treatment and then had to tell them they were going to get an untested alternative instead.

Ordinarily, this is no dilemma. It’s straight up wrong. Oh sure, if the second doses were destroyed or contaminated, etc., it would be unfortunate but not a wrongdoing in the simple scenarios. If only some were, then we’d have an allocation problem. Of course, assuming stocks can be replenished we could always have an interrupted treatment and then do the full treatment when possible.

But there is a dilemma! The UK is in the midst of a very bad outbreak with the whole country experiencing high rates of infection. The whole healthcare system is near the breaking point.

Things will get worse before getting better due to the time lag between exposure and manifestation of illness as well as the lack of transmission control and asymptomatic spreaders. Some people who will get sick are already infected. Some people who will get sick in the current trajectory are not yet infected. If we can get some of the second group at least one dose of the vaccine it could perhaps halve the number of that second group getting hospitalised for COVID.

From a bioethics instruction perspective this is a pedagogically well structured dilemma. In the first framing, you have deontological considerations (lack of consent) in tension with utilitarian ones (saving lives). If you poke a little further you have consequentialist issues buttressing the deontological ones (some of the people who relied on the normal course might get seriously ill; the “moral injury” is also a factor). There are a lot of potential second order effects.

I wish it were a hypothetical instead of this grim reality.

I don’t envy any of the decision makers.

Ideally, everyone would eventually get the normal two dose protocol (for a total of three doses). It’d be best if everyone who’s received a dose got a choice whether to finish their treatment or postpone it, but it would depend a bit on the numbers. I think doing one dose, then a delayed two dose course, is better than an off label 2 dose (unless we get excellent results from wide spaced 2 dose trials).

The damage to doctor/patient relations needs addressing as well. The government so routinely squanders everything that it’s hard to know the marginal effect of this overall, but there are direct relations that need restitution as well.

Update: Hmm:

For the Pfizer vaccine, the impact of stretching out the two doses hasn’t been tested in clinical trials. Pfizer cautioned that its trial only investigated giving two doses 21 days apart – far less than 12 weeks. But evidence increasingly suggests that spacing out doses of the AstraZeneca/Oxford vaccine may be more effective at protecting people.

The main risk is that people’s level of immunity falls before they receive their second dose, putting them at risk of Covid-19 – although this risk would still be lower than if they’d received no vaccine, and would be boosted when they eventually received their second shot.

However, a consensus statement by the British Society for Immunologists said that delaying the booster dose by eight or nine weeks was unlikely make much difference in the longer term.

Hmmmmm (following the AZ link):

Evidence now suggests that spacing out doses of the AstraZeneca/Oxford vaccine may be more effective at protecting people. Clinical trials revealed the efficacy of the vaccine was substantially higher, at 90%, in a subgroup of people who received half a dose followed by a full dose, rather than two full doses, which had an efficacy of 62%.

But Prof Wei Shen Lim, the chair of the Covid-19 immunisation group of the JCVI, told MPs further analysis by AstraZeneca showed the improved protection came from spacing out the doses.

“People who had the half dose then full dose were those who were vaccinated at a longer time interval, roughly six to 12 weeks, and what they’ve seen in their data is that people who have the second dose later probably have a three times higher antibody level than those who were vaccinated earlier. So if anything, it suggests that increasing the dose interval is beneficial,” he said.

Sir Mene Pangalos, the executive vice-president of biopharmaceuticals research and development at AstraZeneca, told the committee the first vaccine shot was more protective over time.

“What we’re seeing with our data so far is that as you go to the eight- to 12-week interval, you actually increase vaccine efficacy. People are protected enough with the first dose, to around 70%, but we see that within that eight- to 12-week interval is actually the sweet spot,” he said.

Again, I’ve not delved, but this reporting is worrisome. It is suggestive of motivated reasoning.

The brutal fact is that AstraZeneca and Oxford botched their trial. By accident, they didn’t do what they said they would do. Maybe this will be a happy accident, but now you have, “Oh it’s the half dose, no, Oh it’s the spacing, no, oh, maybe it’s not the booster”. That’s not good!

It doesn’t mean that they are wrong but it’s all rather untrustworthy.


I’m sitting in Yet Another Meeting. Everyone in the meeting is being lovely. There are loads of interesting bits of discussion and info.

But it’s a lot. Just this meeting is a lot. There’s a lot of info and translating it into action is a lot of work.

This is a monthly meeting. It’s a union meeting. I’m doing other union meetings.

Then there are all the work meetings.

It’s a lot. Many people don’t thrive on this stuff. (I don’t.)

Our faculty reorganization added more levels of structure. Which means, nominally, I should be attending department forum meetings and school board meetings (instead of just the CS school board meetings). Plus there are faculty meetings I could attend. And Dept internal meetings. Plus all the other work meetings.

It’s too much. People tune out. There’s a bit of “oh let’s divide work up” but without “now we have more things to figure out and coordinate”.

A fair bit of bureaucratic complexity is inevitable esp for large and complex organisations. But if we want people to tune in we have to make that complexity manageable. Which often means reducing it or at least reducing the significance of a lot of it. It also means clearly indicating priorities and making sure that they are flexibly attuned to people’s individual priorities.

Starting to Analyse UCU NEC Minutes

I’m running for the UCU National Executive Committee (NEC) for Higher Education.

I, and many of my union friends in the Dept of Computer Science of the University of Manchester, were surprised, confused, and distressed by the round of strikes in 2020 (second round in 2019-2020). They blindsided us.

We (mostly) showed up because of some residual trust we had in UCU. But then came the levy, which broke me a bit. It made no sense and seemed to be a governance problem. And I couldn’t figure out how and why these things were decided and how my voice, as a member, would be able to influence these decisions.

In addition to running, I’m trying to tease out what’s going on from public records. It’s hard! I have not yet found a page listing the minutes (and people tell me that the minutes strongly undercapture what happens, even in terms of motions). I did find some recent minutes via the search box:

These were all I could find last week. They aren’t super recent (March and May 2020). They aren’t from the period most critical to my interests. (One problem is that there are lots of minutes for many different things—congresses, committees, etc. and “NEC” doesn’t filter hard. Alas, I forgot my earlier magic search string.)

In any case, these are recent, and I expect fairly typical.

FIRST BIG THING UCU!!! Can we please have a well organised subsite that lists all the meetings and their minutes in a nice browseable and appropriately searchable way?
SECOND BIG THING UCU!!! This seems to be a long time to have not even draft minutes.

I did a preliminary extraction of the motion and votes from these two minutes and put them in a Google Spreadsheet:

My structure is rather coarse grained and motions get complex quickly. E.g., a motion might have successful and unsuccessful amendments. It may or may not be seconded. We might not know who the seconder is. Most of these votes have some sort of consensus but here’s the set of passing statuses:

  • AGREED 53-0 with 1 abstention
  • AGREED nem con.

(The plain “AGREED” perplexes me!)
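To make the “coarse grained” problem concrete, here is a minimal sketch of how a motion and its recorded status might be represented. Everything in it is an assumption for illustration: the `Motion` and `Vote` classes, their field names, and the `parse_status` helper are hypothetical and are not the layout of the spreadsheet. The parser only handles the status strings of the shape quoted above.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Vote:
    outcome: str                          # first word of the status, e.g. "AGREED"
    votes_for: Optional[int] = None       # None when the minutes give no tally
    votes_against: Optional[int] = None
    abstentions: Optional[int] = None
    nem_con: bool = False                 # "nem con." = nobody dissented


@dataclass
class Motion:
    title: str
    proposer: str
    seconder: Optional[str] = None        # frequently not recorded
    amendments: List["Motion"] = field(default_factory=list)
    vote: Optional[Vote] = None


# Matches tallies such as "53-0" or "53-0 with 1 abstention".
_TALLY = re.compile(r"(\d+)\s*-\s*(\d+)(?:\s+with\s+(\d+)\s+abstentions?)?")


def parse_status(text: str) -> Vote:
    """Turn a status string from the minutes into a Vote record."""
    words = text.split()
    vote = Vote(outcome=words[0] if words else "")
    tally = _TALLY.search(text)
    if tally:
        vote.votes_for = int(tally.group(1))
        vote.votes_against = int(tally.group(2))
        vote.abstentions = int(tally.group(3) or 0)
    vote.nem_con = "nem con" in text.lower()
    return vote


counted = parse_status("AGREED 53-0 with 1 abstention")
uncounted = parse_status("AGREED nem con.")
print(counted)
print(uncounted)
```

Treating amendments as nested motions with their own votes is one way to record which amendments passed and which fell, at the cost of a structure that no longer fits neatly in one spreadsheet row.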

Let’s consider a motion from Mark Pendelton:

Defending Trans Members and Students
NEC notes:
The government’s announcement of three principles guiding responses to Gender Recognition Act reform – ‘protecting’ single sex spaces; making sure that transgender adults can live lives without fear of persecution; and ‘protecting’ young people from ‘irreversible’ decisions
This union’s longstanding commitment to inclusion, and the extension of trans and nonbinary people’s rights, including revision of the GRA
NEC believes:
That these principles are based on factual inaccuracies and place trans people’s health and wellbeing at risk, particularly young people
That they also pose grave risks to Gillick competence, which allows young people some capacity to consent to medical treatment, including abortion and contraception
NEC resolves:
To publicly oppose further delay to GRA reform;
To publicly condemn the attempted erosion of the rights of children and young people; To provide branches with updated guidance on how to support trans and nonbinary members and students and campaign against these proposals.

I’m not sure of the difference between noting and believing, but ok. The resolutions sound like they are actionable…but I’m confused as to who performs them and how they are tracked. It’s hard to see how I, a member, will know whether this motion resulted in any specific action by the union, much less whether that action was effective. I like Mark. I like the motion (a lot!). I agree strongly with the motion. But I don’t know if it was an effective motion.

Now, not all motions need to be “effective”. Sometimes we don’t win. Sometimes we take a symbolic stand. Sometimes we need to show our values and communicate with those we share them with, including our members.

Still, it’d be nice to know what the NEC thinks should happen in order to understand what sorts of motions we push going forward.

Anyhoo, this was far too much work to find out a set of motions and the information about them doesn’t seem remotely complete. Most people won’t make the effort (I never have!).

Esp when we come to elections, it’d be great to have easy access to candidate track records and clear fodder for competing candidates to discuss. If a candidate wouldn’t support some motion, that might be a reason to vote for or against them!

Tracing things through the complex union structure is daunting, but I think some structuring and simple linking could help a lot.

Oh 2021

Thus far, not looking great.

Thus I shall recommence with the blogging!

Robert Farley pointed out that the Monkees’ song, “Daydream Believer”, is actually a dark piece about a dissolving, shallow, materialistic marriage:

It’s not clear from the lyrics:

Oh, I could hide ‘neath the wings
Of the bluebird as she sings
The six o’clock alarm would never ring
But it rings, and I rise
Wipe the sleep out of my eyes
My shavin’ razor’s cold and it stings

Cheer up, sleepy Jean
Oh, what can it mean
To a daydream believer
And a homecoming queen

You once thought of me
As a white knight on his steed
Now, you know how happy I can be
Whoa, and our good times start and end
Without dollar one to spend
But how much, baby, do we really need

(OMG…the code block editor is unmitigated garbage…JUST LET ME HAVE A BLOCKQUOTE!!!!)

I mean:

Stewart was beginning to explore a more personal side to his writing when the Kingston Trio disbanded in 1967. In a 2006 interview, he recalled that he envisioned “Daydream Believer” as “part of a suburbia trilogy” that focused on the growing distance in a couple’s marriage. Its comparatively serious subject matter and its setting echoed “Pleasant Valley Sunday.”

The song’s oblique lyrics focused on the endgame of a comfy but increasingly distant relationship. The narrator is caught in mid-gaze before the bathroom mirror, reflecting on the quiet dissolution of his materialistic marriage – a union between “a daydream believer and a homecoming queen,” now curdled and (as originally written) “funky,” driven more by money than by romance.

It was surprisingly mature subject matter for ‘60s pop consumption, but the tune’s blissfully melodic, irresistible chorus screamed “major hit potential,” and overrode the vaguely sketched darkness at its narrative

Oblique is fucking right! It needs a third verse at least. The last verse line suggests a non materialistic relationship or hopes of one!

Perhaps something like:

It turns out we need lots
More than each of us will give
And it is the money not our love
No, there’s more to life and love
Than the emptiness of us
A facade cannot hide wood that rots

Ok, that’s just overt!

I listen to a song
That was ours and was all wrong
Promises we broke before they were even made
The razor’s work is done
Off to ride the train again
And to dream of bluebirds singing all day long

Then the last chorus could shift to:

Good bye, sleepy Jean
Don’t know what it’ll mean
To a daydream believer
And a homecoming queen

I think there’s still loads of ambiguity, but it’s not utterly obscure.

(Oh god WordPress…why why why why why why why why?!??!)

When Supposedly Smart People…

Sometimes I do something silly.

I got some fermenting gear. I’d wanted to make ginger beer for ages. But you really need appropriately strong bottles for the carbonation phase (the “secondary fermentation”). But I got some! But then I had no ginger.


But hey I had some strawberries and rhubarb and found a recipe for strawberry-rhubarb fermented soda and why not?

The primary fermentation went pretty well. It was easy. I ran it about 3 days and then decanted it into my flip top secondary fermentation bottle. There were bubbles!

I left it 24 hrs and put it in the fridge this morning. All is fine.

I was going to make baps but I wanted to try the soda. I wasn’t superconvinced by the bubbles. I had the bright idea of videoing the uncorking.

You can see the result.

So yeah. I got that on film.

Feel free to laugh. Zoe did when she saw the video. I haven’t seen her laugh so hard or so helplessly for a while so really, I did this for her. That’s the story and we’re sticking with it.

There wasn’t much soda left.

But it was delicious.

And I’ve learned my lesson: don’t make food videos!

Been a While

And not for lack of stuff to write about! I’ve still been v tired (exercise/going out levels have been v low) and I had grading (bleah) and eye troubles (triple bleah…I’m going to need to massively adjust my setups; dark mode for me).

BUT, my promotion case to Professor was successful so as of Aug 1, 2020, I will be a Professor of Computer Science. I never again have to think about a promotion! I won’t get the customary pay rise (due to COVID belt tightening).

There are still things I can apply for (pay rises in the future where they exist again, fellowships, a higher doctorate, etc.) but…they are all nice to have, not need to have. For some reason, making it to Professor was need to have for me. At least, in the sense that it felt obligatory to go for. Each year that I failed to make my promotion case felt like a real failure and cost me. I hope I’m now free of all that.

(Part was due to the weird detour my career took and feeling “behind”. Part was the unfortunate instillation of the feeling that if you didn’t make “full professor”/Professor you somehow didn’t have a proper career. <– This is super toxic! In any case, I feel relief.)

There’s a bunch of other cool stuff including that we have a first draft of a feature complete Imperial Model…which desperately needs optimising just to run with 50,000 agents. The ball is back in my court and I’ve already had some interesting results which I’ll write up (I hope) shortly (I hope). I’m feeling pretty good about our chances of doing a full UK run by the end of the week.

(And I’m feeling SO GOOD about my adhering to the “measure first” philosophy. It’s been winning hard for me right now and we’ve had some nice object lessons for some students.)

It’s Obvious and Everyone Knows It

Dominic Cummings should resign.

We all know he broke the lockdown rules and rather dangerously. It’s immediate. This isn’t arguable or questionable.

The law has been more permissive than all the guidance (and this has been a problem), but there’s no question at all that he violated the guidance and that’s more than sufficient to require his resignation. And really, he pretty clearly violated the law. The only excuse that is even remotely relevant:

(j) in relation to children who do not live in the same household as their parents, or one of their parents, to continue existing arrangements for access to, and contact between, parents and children, and for the purposes of this paragraph, “parent” includes a person who is not a parent of the child, but who has parental responsibility for, or who has care of, the child;

isn’t appropriate as none of his extended family have parental responsibility or care of their child.

The pathetic orchestrated lies are particularly disgusting, of course. It is utterly bizarre to me. I hate Cummings but it’s not like I thought Ferguson should get a pass. I do not understand why all these politicians would go down this path of repugnant wagon circling.

Obviously, I would be happy for this inept government to be replaced but…in the middle of a crisis it’d be better if they just got more apt!

Quick Hit on Replication

“Quick”…hopefully quicker than the last one as I’m fading fast.

I can tell you right now that there are parts of the Imperial model we’re just going to borrow, not rederive…at least for version 1. Maybe version all unless someone new steps up.

Why? Let’s consider a simple example: The formula for workplace size distribution. My current plan is just to use their reported equation instead of tracking down the source data (or comparable source data) and applying the same fitting and modelling that they did.

Why not? First, there have to be limits. I’m not going to try to rederive the Landscan data either. We only have so much time and expertise. My team’s current focus and expertise is on agent based epidemiological model engineering. We don’t have a ton of time either.

Second, it’s not clear that this will make a difference. Since we have the equation, we can use it and see if everything works out. We can easily try more ad hoc models to do some sensitivity testing. If I were going to go further, I might chase data on real workplaces.

Third, extending the first reason, there’s so much more to explore. We see lots of things in the Imperial model that we find a bit uncongenial and workplace size distributions ain’t one of them.

That being said, we will extract as much detail as we can and put it in a nice clear place so someone who is interested in doing such work can easily find what’s needed and how to plug it into code.

A replication can serve many purposes. For us we want to:

  1. understand the Imperial model
  2. raise confidence in the results (or identify problems in a clear and rectifiable way)
  3. build out our epidemiological framework and expertise; in particular, we really want to establish a best of class infrastructure where reuse of models and parts of models is easy; unlike Ferguson et al, we’re less concerned with individual results or experiments and more with enabling such experiments
  4. learn more about the replicability of such models and build some tools for facilitating it

I certainly hope once we get this replication stood up that epidemiologists will want to use it and build on it in a variety of ways.

I’m not as concerned about “speed to market” though that’s an issue. At the moment we’re backward looking. We’re trying to understand a published and highly consequential model. One thing that will up the need for speed is if people and policy makers start rejecting epidemiological advice based on Ferguson’s personal issues and thus far rather silly complaints about the released code. I’d be very very surprised if our replication revealed any shenanigans about the results. This is, in part, because the Imperial results or, rather, the policy that was triggered by the results, are pretty straightforwardly correct and justifiable on a wide range of model outcomes. Simply put, you don’t need a complex model to know that lockdowns were necessary and came too late.

However, if people use facts about Ferguson to try to open too soon, that’d be a problem and a successful replication could help there.

So there’s some policy responsibility here. But again, it’s best served by us doing a careful job.

It’d be great to have a few more experienced Python programmers (it’s really me and one other with a bunch of CS students getting up to speed) and some Cython mentorship would be welcome. Any epidemiologist who wants to help with the extraction and validation is more than welcome.

(He wrote on a blog that is, to a first approximation, read by no one.)

Ok, I just spent a few hours I could have spent working on Cython or extraction or grading or…sleeping!

Imperial Model Quick Update

It’s been just over a week since I blogged last and a lot has been going on, not just in this replication effort. Zoe had a remote, prerecorded Zoom gig (which I’ve been helping out with) and tons of other things. Still haven’t heard about the promotion (though I think the latest they could tell me would be the end of June). I had some exercise energy early in the week but that’s waned.

Oh, and I’ve been fighting with two things: measuring Python memory consumption and wrestling with Cython. There’s lots of good stuff to write about from those efforts…but it’s another thing and a tiring one. I need a smoother format than WordPress…maybe some Markdowny thingy.

First bit, other team members have made good progress on population density. We can load the appropriate Landscan data and a rasterisation derived from ONS local authority based census data. So, getting closer.

Second bit, whoof, it’s super hard to measure the “deep” object memory weight of a Python object. I’ve tried 3 installed things, some manual stuff, and it’s not only not consistent between techniques, even within techniques, things are weird. Really weird.
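For a sense of why the numbers wobble, here is a minimal sketch of one common manual approach: walk the object graph and add up `sys.getsizeof`. It is not one of the tools mentioned above and not necessarily what was tried here; `deep_sizeof` and `ToyPerson` are hypothetical names made up for the example. It only follows dicts, lists, tuples, sets and instance `__dict__`s, and it ignores `__slots__` and extension types.

```python
import sys


def deep_sizeof(obj, seen=None):
    """Sum sys.getsizeof over obj and what it references, counting each object once."""
    if seen is None:
        seen = set()
    if id(obj) in seen:          # already counted (shared or cyclic reference)
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_sizeof(k, seen) + deep_sizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_sizeof(item, seen) for item in obj)
    elif hasattr(obj, "__dict__"):
        size += deep_sizeof(vars(obj), seen)
    return size


class ToyPerson:
    """Stand-in for an agent; the attribute names are illustrative only."""

    def __init__(self, ident):
        self.id = ident
        self.age = 34.5
        self.workplace = "workplace_%d" % ident
        self.timetable = [1, 2, 3]


person = ToyPerson(1)
print("shallow:", sys.getsizeof(person), "deep:", deep_sizeof(person))
```

Part of the weirdness comes from sharing: small ints and interned strings are referenced by many objects, so a per-object deep size charges them to every owner, while a whole-heap tool charges them once. The two kinds of measurement will not agree.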

I quickly found that our strings are almost certainly interned, so no big wins there. I tried the bit twiddling consolidation of booleans and saw memory consumption shrink at 10x the access time and a ton of ugly code. (Would want macros.) I quickly decided that the only productive way forward was Cython. Since a lot of our attribute values are essentially primitive, we could hope that they’d be captured as thin C types and popped in the struct Cython uses to represent extension object instance variables. That also would open the way for sub-word width representation with C speed bit operations. I found this cool feature of Cython where you can leave your Python file untouched and “just” add Cython declarations in a separate file…this seems perfect!
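For readers who haven’t seen the bit twiddling trick, a minimal pure Python sketch follows. It is not the code from the model: `PackedFlags`, `FLAG_NAMES` and `FLAG_MASKS` are hypothetical names, and only the six flag names themselves are taken from the Person attributes. It shows where both the saving and the ugliness come from.

```python
# One bit per boolean attribute; the bit position is the index in this tuple.
FLAG_NAMES = (
    "is_infected",
    "acquired_infection",
    "dead",
    "is_immune",
    "timetable_may_change",
    "is_at_hospital",
)
FLAG_MASKS = {name: 1 << bit for bit, name in enumerate(FLAG_NAMES)}


class PackedFlags:
    """Six booleans stored in the low bits of a single int."""

    __slots__ = ("bits",)

    def __init__(self):
        self.bits = 0

    def get(self, name):
        return bool(self.bits & FLAG_MASKS[name])

    def set(self, name, value):
        if value:
            self.bits |= FLAG_MASKS[name]    # switch the bit on
        else:
            self.bits &= ~FLAG_MASKS[name]   # switch the bit off


flags = PackedFlags()
flags.set("dead", True)
print(flags.get("dead"), flags.get("is_immune"), bin(flags.bits))
```

Every read now costs a dictionary lookup, a mask and a `bool()` call instead of a plain attribute access, which is the sort of overhead that makes the pure Python version slow.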

Well, “perfect” heh. Let’s say that I haven’t quite got it all working, esp with inheritance. (It’s a messy story for another day.) But I got it working well enough to see significant benefit in memory consumption. Here’s a guppy “relative” heap dump:

Partition of a set of 9000002 objects. Total size = 980889350 bytes.
Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
    0 1000000  11 408000000  42 408000000  42 dict of model.AgentsPlain.Person
    1 1000000  11 240000000  24 648000000  66 dict (no owner)
    2 1000000  11 104000000  11 752000000  77 list
    3 3000001  33 84000028   9 836000028  85 int
    4 1000000  11 64888890   7 900888918  92 str
    5 1000000  11 56000000   6 956888918  98 model.AgentsPlain.Person
    6 1000000  11 24000000   2 980888918 100 float
    7      1   0      432   0 980889350 100 types.FrameType

Here I just created 1 million Person objects. (Remember for the UK we need 60 million or so.) To a first approximation, the total size is probably about right. So 1GB for a million Persons. Index 5 gives us the “base” overhead of an object, about 56 bytes. Index 0 is the dictionary which holds all the instance variables, which weighs in at around 400 bytes. Anything that has a count of 1,000,000 is probably part of our Person object. (I created 1 distinct string per object, just to see. This makes the total size a bit of an overstatement but not as much as you might think, as schools, workplaces, and social centres were each one string and there would normally be more. It’s only 64 bytes for a total of 0.06GB/million or 3.6GB for 60 million. It’s currently a rounding error.)

After a naive Cython conversion, we see:

Partition of a set of 4000002 objects. Total size = 576889350 bytes.
Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
    0 1000000  25 240000000  42 240000000  42 dict (no owner)
    1 1000000  25 168000000  29 408000000  71 Agents.Person
    2 1000000  25 104000000  18 512000000  89 list
    3 1000000  25 64888890  11 576888890 100 str
    4      1   0      432   0 576889322 100 types.FrameType
    5      1   0       28   0 576889350 100 int


Only 60% or so of the memory needed! That was totally worth the Cython learning curve.

Index 0 can be driven out just by a simple Python level refactor (that makes things a lot nicer) and perhaps the lists can too. If both of those work out (the dict is 100%, the list less so) then we get to around 0.16GB per million agents or 9.6GB for the UK population. This is with almost no changes to the substantive Python code. It looks the same. It runs the same. You can run it with or without the Cython version.
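The projections are simple scaling from the 1 million agent sample. A small sketch of the arithmetic, using only byte counts copied from the two heap dumps: the decimal gigabyte, the 60 million population figure, and the `project` helper are assumptions of the sketch.

```python
GB = 1e9                      # decimal gigabytes, to match the rough figures here
SAMPLE_SIZE = 1_000_000       # Person objects created for each heap dump
UK_POPULATION = 60_000_000    # "60 million or so"

# Byte counts copied from the two heap dumps.
plain_total = 980_889_350     # plain Python, whole heap
cython_total = 576_889_350    # naive Cython conversion, whole heap
cython_person = 168_000_000   # Agents.Person row alone (dict and list driven out)


def project(total_bytes, sample=SAMPLE_SIZE, population=UK_POPULATION):
    """Return (bytes per agent, projected GB for the whole population)."""
    per_agent = total_bytes / sample
    return per_agent, per_agent * population / GB


for label, total in [("plain", plain_total),
                     ("cython", cython_total),
                     ("cython, Person only", cython_person)]:
    per_agent, uk_gb = project(total)
    print(f"{label:>20}: {per_agent:7.1f} bytes/agent, {uk_gb:5.1f} GB for the UK")

print(f"cython / plain = {cython_total / plain_total:.2f}")
```

The Person-only row lands at about 10GB, in line with the rounded 0.16GB per million figure above. The shared strings are left out here, as in the text.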

This is still about twice our overall model budget (if we’re going to run on certain sections of Manchester’s Condor setup) and we have some other stuff to represent. Not a lot, though and a lot of them are covered by our too many strs. 4GBs is a tight tight budget. 10GB for a model is low enough to run on my development laptop.

It does mean that the team’s efforts at tessellating the map so we can work on parts aren’t so critical. I think it still can help in a variety of ways and for places like the US it will be critical. But assuming I’ve not messed up the measurement, we might be ok.

I do think it’s worth pressing on a bit with layout and alignment. I’m pretty sure all the fields of the Cython struct are 64bit and…we don’t need that for almost anything except population and even there, 32 bits gets us 4 billion. Unless we want to literally run 1 model with the entire world population, we have a lot of wasted space.

Just to hint, here’s the Cython declaration:

cdef class Person:
   cdef public bint is_infected, acquired_infection, dead, is_immune, timetable_may_change, is_at_hospital
   cdef public str  social_center, workplace, school, location, symptoms
   cdef public int id, hours_since_infection, incubation_length, infection_length, timetable_group
   cdef public float age, p_infection, p_death
   cdef public tuple  timetable
   cdef public dict hospital
   cdef public int household

(That household declaration is a bit of an experiment.)

Here’s the corresponding struct:

struct __pyx_obj_6Agents_Person {
 int is_infected;
 int acquired_infection;
 int dead;
 int is_immune;
 int timetable_may_change;
 int is_at_hospital;
 PyObject *social_center;
 PyObject *workplace;
 PyObject *school;
 PyObject *location;
 PyObject *symptoms;
 int id;
 int hours_since_infection;
 int incubation_length;
 int infection_length;
 int timetable_group;
 float age;
 float p_infection;
 float p_death;
 PyObject *timetable;
 PyObject *hospital;
 int household;
};


So, it’s pretty clear that all the PyObjects except timetable and hospital are dispensable. social_center et al are just ids and could be ints with almost no loss of code clarity. You’d have to add a bit of pretty printing code but that’s it. The names are all generated by interpolating an incremented integer into a name like ‘social_center’.

Most of the ints have very short ranges, even beyond the booleans. We should be able to pack them quite tightly. I’m not sure if alignment is even worth all that much given how much memory we’d drop. I’m no C programmer, even at the novice level. So I’m going to have to research a bit on bit twiddling vs. packed and unaligned struct access.

Back of the envelope suggests that we could comfortably get down to 32 bytes a Person which would be around 2GB for all of the UK.
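To play with layouts before touching any C, Python’s `struct` module can report the size of a candidate record. The sketch below is a guess, not a design: the field widths (a flag byte, 32 bit ids, 16 bit durations, and so on) are assumptions chosen for illustration, and the `timetable` and `hospital` objects are left out entirely.

```python
import struct

# Scalar fields of the struct shown earlier, with native alignment:
# 6 ints, 5 pointers, 5 ints, 3 floats, 2 pointers, 1 int (no object header).
NATIVE = struct.Struct("@6i5P5i3f2Pi")

# Hypothetical packed layout ("<" means standard sizes and no padding):
#   B   six boolean flags in one byte
#   4I  social_center, workplace, school, location as 32 bit ids
#   B   symptoms as a small code
#   I   id
#   3H  hours_since_infection, incubation_length, infection_length
#   B   timetable_group
#   3f  age, p_infection, p_death
#   I   household
PACKED = struct.Struct("<B4IBI3HB3fI")

record = PACKED.pack(0b000101, 12, 340, 7, 12, 0, 42,
                     36, 120, 240, 3, 34.5, 0.25, 0.0, 9001)
fields = PACKED.unpack(record)

print("native scalar layout:", NATIVE.size, "bytes (platform dependent)")
print("packed layout:       ", PACKED.size, "bytes")
print("id round-trips as", fields[6])
```

This fairly conservative guess comes to 45 bytes per record. Getting down to the 32 bytes estimated above would need narrower ids or quantised probabilities, and whether unaligned access costs anything noticeable is exactly the thing that would need measuring.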

There is a super modest speed win at the moment at least for tests comfortably in memory. Just compiling Person doesn’t gain all that much and I suspect there’s not huge gains from Cythonising Person alone. But there are places where we iterate over the whole population doing simple stat gathering. That seems like an easy win if we can push the whole thing into C.

Ok, this went on a bit!

I have to get back to the spec extraction. I’m trying to keep the extraction ahead of the team and they are catching up!

But all in all, I’m feeling pretty confident that it will work out ok. I’m going to push, at our Monday meeting, to open the repo, though it won’t be “nice” yet. (I mean, it’s pretty good right now, but the master branch is the ad hoc model we threw together. There’s some but nowhere near enough documentation. Etc.) It just bugs me. But maybe opening it early will bug others! I don’t think we can absorb a lot of contributions at the moment (though I don’t see people champing at the bit…I’ve gotten no feedback or help on the extraction).

I have thoughts there, but that’s another post.