By now, you’ve heard that Canada’s long form “mandatory” census is on its way back. It was replaced in the 2011 census year with the voluntary National Household Survey (NHS), with Prime Minister Stephen Harper arguing the importance of allowing citizens to opt out of intrusive government data collection. This decision was roundly criticized by scientists and researchers at the time, and perhaps surprisingly, this relatively esoteric issue of statistics and data accuracy became something of a campaign issue in the lead-up to this year’s election.
But beyond the bluster and politics of this issue (and, to be sure, there was a lot of politics around the census), here’s a quick primer on why the return of the long form census matters.
It costs less for the same amount of data
Some on the Conservative side were surprised to learn that the voluntary NHS cost $22M more than the long form census. How can that be? StatsCan typically sends out the long form census to about 1 in 5 households, and in 2006 achieved a response rate of 93.5%. In anticipation of a lower response rate to the NHS, StatsCan was forced to send out the survey to substantially more households — about 1 in 3, or 4.5 million total households. That costs a lot more money. In fact, more Canadian households responded to the NHS than initially anticipated, but not by much, as the rate still came in at a modest 68.6%.
Could StatsCan have saved that $22M by sending out the NHS to fewer households? Sure. But as we describe below, the data quality from the NHS was already bad, so it would only get commensurately worse.
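The trade-off between sample size and response rate can be checked with a little arithmetic. The sketch below uses only the figures quoted above. The helper names are ours, and it assumes the number of forms sent out is a rough stand-in for cost, which ignores follow-up, processing and everything else StatsCan pays for.

```python
# Figures quoted in the article.
LONG_FORM_SAMPLING_FRACTION = 1 / 5   # about 1 in 5 households
LONG_FORM_RESPONSE_RATE = 0.935       # 2006 long form census
NHS_SAMPLING_FRACTION = 1 / 3         # about 1 in 3 households
NHS_RESPONSE_RATE = 0.686             # 2011 NHS
NHS_HOUSEHOLDS_SURVEYED = 4_500_000


def responding_share(sampling_fraction, response_rate):
    """Share of all households that end up returning a form."""
    return sampling_fraction * response_rate


def mailouts_per_response(response_rate):
    """Forms that must be sent out to get one completed form back."""
    return 1 / response_rate


long_form_share = responding_share(LONG_FORM_SAMPLING_FRACTION, LONG_FORM_RESPONSE_RATE)
nhs_share = responding_share(NHS_SAMPLING_FRACTION, NHS_RESPONSE_RATE)
nhs_forms_returned = round(NHS_HOUSEHOLDS_SURVEYED * NHS_RESPONSE_RATE)

print(f"Long form: {long_form_share:.1%} of households return a form, "
      f"{mailouts_per_response(LONG_FORM_RESPONSE_RATE):.2f} mailouts per response")
print(f"NHS:       {nhs_share:.1%} of households return a form, "
      f"{mailouts_per_response(NHS_RESPONSE_RATE):.2f} mailouts per response")
print(f"NHS forms returned: {nhs_forms_returned:,}")
```

On these numbers, the NHS needed roughly 36% more mailouts for every completed form than the long form did, which is one plausible piece of the extra cost.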
A huge chunk of the country doesn’t even exist, according to the NHS
The census aggregates data across a hierarchy of regions, starting at the country and provincial levels. The country is then further divided into so-called census divisions (which correspond to large cities, regions or counties, and which we use in our language map, for example), then into even smaller census subdivisions (corresponding to municipalities), and sometimes even further down into census tracts (which may have just a few thousand residents).
As Maclean’s pointed out earlier this year, the small town (and census subdivision) of Melville, Saskatchewan has, for all intents and purposes, ceased to exist as a part of Canada. We know almost nothing about it, aside from a guess as to how many residents call it home. If a census subdivision had a response rate less than 50% for the NHS, StatsCan simply did not publish data about it, because the data was essentially meaningless. As a result, about a quarter of Canada’s 4,556 census subdivisions don’t exist in the way Melville doesn’t exist. Manitoba was hit especially hard by this 50% threshold; about a third of its rural communities are now dead zones on the map. No understanding of the languages spoken there, no insight into the aboriginal or immigrant population, no sense of the way folks get around, how or where they work, and how old they are.
As a side note, this is why our language map of Canada uses such a coarse division of the country. Creating a map with finer detail — say, dividing up cities like Toronto or Montreal into slightly less massive chunks of humanity — would have required using census subdivisions, which are simply unreliable when looking across the country.
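The 50% rule described above amounts to a simple filter. Here is a minimal sketch of it. The subdivision names and response rates are made up for illustration, and treating a rate of exactly 50% as publishable is our assumption, not a documented StatsCan rule.

```python
SUPPRESSION_THRESHOLD = 0.50  # response rate below which data is withheld


def split_by_publication(response_rates, threshold=SUPPRESSION_THRESHOLD):
    """Split subdivisions into published and suppressed by response rate."""
    published = {name: rate for name, rate in response_rates.items() if rate >= threshold}
    suppressed = {name: rate for name, rate in response_rates.items() if rate < threshold}
    return published, suppressed


# Hypothetical response rates, not real NHS figures.
hypothetical_rates = {
    "Subdivision A": 0.72,
    "Subdivision B": 0.48,
    "Subdivision C": 0.50,
    "Subdivision D": 0.31,
}

published, suppressed = split_by_publication(hypothetical_rates)
print("Published: ", sorted(published))
print("Suppressed:", sorted(suppressed))

# "About a quarter" of the 4,556 census subdivisions quoted in the article.
TOTAL_SUBDIVISIONS = 4556
approx_suppressed = round(TOTAL_SUBDIVISIONS * 0.25)
print(f"Roughly {approx_suppressed:,} subdivisions with no published data")
```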
Even where we have enough data, it’s probably not reliable
The statistical reason why it matters that a census is mandatory rather than voluntary is called sample bias. When the data you get from a survey, or poll, is somehow not representative of the population as a whole, then your conclusions are simply not accurate. What’s amazing in statistics is that you don’t actually need to ask everyone in the country to get an accurate picture — you can ask quite a small percentage, in fact — but you do need to ask a representative sample.
This is a big problem. Western Canada has been plagued by a spate of polling disasters in several recent elections due to sample bias, while the results of many recent high-profile academic studies have been called into question by this same issue.
The voluntary NHS is no different, but at least we knew beforehand that there would be a problem. StatsCan knows that certain groups are more or less likely to fill out a voluntary survey than others. Its own estimates suggest that the Filipino-Canadian population, for example, has been overcounted by a whopping 7.3%, while the black-Canadian population has been undercounted by more than 3%. Our immigrant map, which shows an impressive but surprising wave of Filipino-born Canadians in the Prairies and the territories, is perhaps just one victim of data inaccuracy.
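To see how unequal response rates turn into over- and undercounts, consider a toy calculation. Every number in it is hypothetical, chosen only to show the mechanism, and it ignores any weighting or adjustment a statistical agency would apply afterwards.

```python
def observed_share(true_share, group_rate, other_rate):
    """Share of a group among respondents when response rates differ."""
    group_responses = true_share * group_rate
    other_responses = (1 - true_share) * other_rate
    return group_responses / (group_responses + other_responses)


# Hypothetical: a group that makes up 10% of the population.
true_share = 0.10

# Voluntary survey: the group responds more readily than everyone else.
biased = observed_share(true_share, group_rate=0.80, other_rate=0.65)

# Everyone responds at the same rate: the sample mirrors the population.
unbiased = observed_share(true_share, group_rate=0.935, other_rate=0.935)

relative_error = biased / true_share - 1

print(f"True share:              {true_share:.1%}")
print(f"Share with equal rates:  {unbiased:.1%}")
print(f"Share with unequal rates: {biased:.1%} ({relative_error:+.1%} relative error)")
```

The overall response rate matters less here than the gap between groups: when everyone responds at the same rate, the share comes out right whatever that rate is.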
We have enough trouble getting people to respond to the “mandatory” form
The mandatory form had a response rate of 93.5% in 2006, which is high, but not as high as the short form census, which was answered by 96.5% of all Canadian households that year. That’s a gap of three percentage points, made up of households that presumably found the long form questionnaire so intrusive that they were willing to risk the consequences. Indeed, there has been a nominal penalty of $500 or three months in prison for not filling it out, but the new Liberal government’s messaging on whether the penalty may apply next year has been ambiguous. Moreover, a decade of opposition to the census from on high has emboldened many who might otherwise merely have hesitated over such a form to put up even more resistance. Expect low response rates to continue next year. If rates dip below 90%, we may not be out of the woods when it comes to poor data quality for another census cycle or two.