*Caveat: These are working blog posts with a lot of “thinking aloud”. I’m a computer science academic studying the engineering of agent based models, not an epidemiologist. Critique and comment welcomed.*

Going on to the next paragraph of supplementary data for 5,

Terminology is always an issue 🙂 I’d tend to say that

- **school generation** is the process of creating the right number and type of schools
- **school placement** (or distribution) is the process of assigning schools to specific locations
- **school allocation** is the association of a student with a school

these together form **school initialisation**.

In the Imperial work there are two basic approaches to generation and placement:

- use a database of existing schools and addresses
- generate a synthetic set of schools based on whole unit stats and placement with a probability proportionate to tile density
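The second approach is easy to sketch. A minimal version (names and data shapes are mine, not the paper's): given per-tile densities, each synthetic school picks its tile with probability proportional to that tile's density.

```python
import random

def place_synthetic_schools(tile_densities, n_schools, rng=None):
    """Place n_schools across tiles, choosing each school's tile with
    probability proportional to the tile's population density.
    tile_densities maps tile id -> density. Illustrative sketch only."""
    rng = rng or random.Random()
    tiles = list(tile_densities)
    weights = [tile_densities[t] for t in tiles]
    return [rng.choices(tiles, weights=weights, k=1)[0] for _ in range(n_schools)]

# Example: a dense tile attracts most of the schools.
densities = {"urban": 10_000, "suburban": 2_000, "rural": 100}
placement = place_synthetic_schools(densities, 50, random.Random(42))
```

(`random.choices` does the proportional sampling for us; a real implementation would also have to decide school *type* and size at this point.)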

CALLOUT: When looking at country-level statistics, I doubt that there’s *that* much difference between the approaches, especially if you do multiple initialisations and average them out. If we want to assess risk from a school perspective (or, in general, do more local analysis) then actual schools are helpful, but how helpful without real attendance figures is unclear to me. Might be worth some experiments.

CALLOUT: To be clear, let me give an example. If I am a Greater Manchester official, I might want to use a simulation to know which schools are higher risk for spreading the disease to riskier populations. This will depend on the particular age/health structure of the populations interacting with the school populations. Let’s assume we won’t have real enrolment data and certainly not family data. So we have to generate the population. So…does it matter that we also generate the schools? Might we just case match schools in our community with schools in our model to get a reasonable estimate? Probably! I would take a profile of a school (size, surrounding population density, etc.) and extract all relevantly similar artificial schools in my model and do some averaging and exploration of that set. That might give me better insight than looking at a location matched school in my model.

Since we have a synthetic population, we need some function to assign students (and teachers!) to schools (whether the schools are synthetic or real) (bopping back to the supplementary data for 5):

> These methods allocate children to schools using a free selection algorithm; each child ‘picks’ a school at random from the nearest 3 (for primary) or 6 (for secondary) schools of the appropriate type, with a probability weighted by the distance kernel derived from UK census data on commuting behaviour (see below). Restricting school choice to the nearest 3 or 6 schools avoids unrealistic long tails to the distribution of distances travelled to school, albeit at the cost of slightly underestimating mean distances travelled to school by approximately 30% [8]. Staff student ratios were used to determine the proportion of adults to schools rather than other workplaces.

There’s a lot of detail to pick out.

For each child, we need their student type (primary or secondary), presumably determined by their age (I guess we can assume faster and slower progressors balance out and we can use strict age bands?). Then we need their nearest 3 or 6 appropriate schools, with distances. And then we need to select one using a distance-weighted probability.
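The "strict age bands" assumption is trivial to code. A sketch (the exact cutoffs here are illustrative, not the paper's):

```python
def student_type(age, primary_band=range(5, 11), secondary_band=range(11, 19)):
    """Classify a child as a primary or secondary student by strict
    age band. Bands are illustrative defaults, not the paper's values."""
    if age in primary_band:
        return "primary"
    if age in secondary_band:
        return "secondary"
    return None  # not of school age

# e.g. student_type(7) -> "primary", student_type(14) -> "secondary"
```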

The distance distribution function is…this?

> The parameters and functional form of the choice kernel used in the worker-to-workplace assignment algorithm were then adjusted until the model fit the empirical distributions well. In the case of GB a 2 parameter offset-power law kernel fit well: f(d) ~ 1/[1 + (d/a)^b], where a = 4 km and b = 3. For the US, the change in slope of the tail of the distribution (on a log scale) meant that a 4 parameter function was needed: f(d) ~ 1/[1 + d/a]^b + k/[1 + d/a]^c, where a = 35 km, b = 6.5, k = 0.0004 and c = 2.2. Figure SI4 illustrates how well the model fitted the data.

OK, we have a function `f(d) ~ 1/[1 + (d/a)^b]`, where a = 4 km and b = 3. I think, certainly for v1 of our replication, we’re not going to attempt to re-derive this distribution. Actually, I suspect a simpler choice model, like equal weighting, would be fine. Our goal here is to make sure that our model uses a choice function (and indeed their choice function), and we can always re-derive it if that seems important (or someone wants to!).
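For reference, both kernels translate directly into code (parameter values straight from the quoted text; any normalising constant is omitted since we only ever use the kernel as a relative weight):

```python
def gb_kernel(d, a=4.0, b=3.0):
    """GB offset-power-law choice kernel: f(d) ~ 1/(1 + (d/a)^b).
    d is distance in km; parameters from the supplementary data."""
    return 1.0 / (1.0 + (d / a) ** b)

def us_kernel(d, a=35.0, b=6.5, k=0.0004, c=2.2):
    """US 4-parameter kernel: f(d) ~ 1/(1 + d/a)^b + k/(1 + d/a)^c."""
    return 1.0 / (1.0 + d / a) ** b + k / (1.0 + d / a) ** c

# Sanity checks: weight is 1 at d=0 and falls off with distance;
# the GB kernel is exactly 0.5 at d = a = 4 km.
```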

So the school allocation algorithm:

```
# This has age-stratified people at each pixel!
# It also has any schools.
pixels = load_current_map()
for p in pixels:
    primary_students = p.get_people(primary_age_band)
    secondary_students = p.get_people(secondary_age_band)
    # We only have pixel-to-pixel resolution on distance,
    # so all students at a pixel have the same "nearest" schools.
    # Note that we might well have equidistant schools, so we might
    # want to be a bit clever, e.g., pull the wider pool and take a subset.
    primary_schools = nearest_schools('primary', p, pixels, 3)
    secondary_schools = nearest_schools('secondary', p, pixels, 6)
    for s in primary_students:
        # Uses the distance-weighted probability function!
        s.school = choose_school_from(primary_schools)
    for s in secondary_students:
        s.school = choose_school_from(secondary_schools)
    # Note this assignment should probably also add each student to
    # their school, but that's a convenience implementation detail.
```
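`choose_school_from` is the interesting bit, and it's short. A minimal concrete version, assuming the GB kernel and that `nearest_schools` returns `(school, distance_km)` pairs (that interface is my assumption, not the paper's):

```python
import random

def gb_kernel(d, a=4.0, b=3.0):
    # Paper's GB choice kernel: f(d) ~ 1/(1 + (d/a)^b)
    return 1.0 / (1.0 + (d / a) ** b)

def choose_school_from(schools_with_distances, rng=None):
    """Pick one school from [(school, distance_km), ...], with
    probability weighted by the distance kernel."""
    rng = rng or random.Random()
    schools = [s for s, _ in schools_with_distances]
    weights = [gb_kernel(d) for _, d in schools_with_distances]
    return rng.choices(schools, weights=weights, k=1)[0]

# Quick check that nearer schools really are picked more often:
candidates = [("A", 1.0), ("B", 5.0), ("C", 12.0)]
counts = {s: 0 for s, _ in candidates}
rng = random.Random(0)
for _ in range(10_000):
    counts[choose_school_from(candidates, rng)] += 1
```

Swapping in equal weighting (the simpler choice model mentioned above) is just replacing `gb_kernel(d)` with `1`, which is a nice property for the experiments suggested earlier.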