
Recall bias and cost data

I’ve been working on costing a couple of programmes recently where the intervention happened between 3 and 10 years ago. Both used household surveys asking people what they spent (in cash and in kind) towards the original infrastructure output (CapEx), towards regular operation and maintenance (OpEx) and towards irregular capital maintenance (CapManEx). It’s got me thinking about the various recall bias issues involved.

Look at the graph below, which is completely hypothetical. Let’s assume it’s for a sanitation intervention amongst 1,000 households which happened in 2008. The y-axis is some measure of ‘data quality’ when you ask those 1,000 households about expenditure. If you ask in 2008 how much they spent (in cash and in kind) to construct a toilet (CapEx – blue line), they’ll probably still have a good idea because it was very recent. However, as time goes by, they’ll forget exactly what they spent, so the blue line drops. Data quality will drop fast at first, but then plateau after a while because people are likely to remember the order of magnitude of what they paid.

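[Figure: recall bias. Hypothetical ‘data quality’ over time since the 2008 intervention, with lines for CapEx (blue), OpEx (orange) and CapManEx (grey).]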

For OpEx (orange line), which is by definition recurrent expenditure occurring at least once a year, it’s a different problem. If you ask people in 2009 about their OpEx, they won’t have that good an idea because they only have one year’s experience of using the toilet. Maybe they’ve cleaned it regularly but not spent much more money so far. Over time, they build up more experience of how much they tend to spend on OpEx, and data quality improves.

CapManEx (grey line) is the hardest. It is recurrent expenditure occurring less often than once a year (e.g. pit or tank emptying costs). So any time you ask people about it, they’re less likely to have experienced it recently than with OpEx. Stuff normally works well when it’s new, so people are unlikely to experience CapManEx for a fair few years. With a toilet, for example, you’re only likely to need to empty the pit or septic tank 3-8 years after installation, depending on numerous factors. So you only start having a decent chance of getting good data on CapManEx many years after the intervention, but even then it’s never going to be brilliant because many people may not have incurred it recently, and you have the same recall bias issues.
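For anyone who wants to play with the shapes, here is a rough Python sketch of the three hypothetical curves described above. The functional forms and every parameter (the floor, the ceilings, the rates, the 5-year midpoint for CapManEx) are illustrative assumptions chosen only to mimic the story in the text. They are not estimates from any survey.

```python
import math

# All shapes and parameters below are hypothetical, chosen only to mimic
# the qualitative story: nothing here is estimated from real data.

def capex_quality(years_since, floor=0.4, decay=0.6):
    """Starts high, drops fast, then plateaus at `floor` (order of magnitude remembered)."""
    return floor + (1.0 - floor) * math.exp(-decay * years_since)

def opex_quality(years_since, ceiling=0.9, rate=0.8):
    """Starts low and rises as households build up experience of regular spending."""
    return ceiling * (1.0 - math.exp(-rate * years_since))

def capmanex_quality(years_since, ceiling=0.6, midpoint=5.0, steepness=1.0):
    """Stays low until emptying events start to occur, then levels off below the others."""
    return ceiling / (1.0 + math.exp(-steepness * (years_since - midpoint)))

INTERVENTION_YEAR = 2008

for t in range(0, 11):
    print(
        f"{INTERVENTION_YEAR + t}  "
        f"CapEx={capex_quality(t):.2f}  "
        f"OpEx={opex_quality(t):.2f}  "
        f"CapManEx={capmanex_quality(t):.2f}"
    )
```

Changing the assumed midpoint or ceilings shifts the year in which each cost type is best measured, which is the trade-off at stake when choosing a survey date.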

So when should you do your cost data collection? It depends on your objectives. If you’re most interested in CapEx, then do it ASAP. But if the lifecycle element is more important to your study, then it’s probably best to wait at least 3 years, maybe even 6-8 years if it’s the CapManEx you’re most interested in. You can always impute the CapEx from other sources or ask a different sample of people who constructed more recently. Of course the best case would be to get regular data with the same panel of households at intervals, but who is going to fund that…

2 thoughts on “Recall bias and cost data”

  1. Not sure I entirely agree with OpEx, as you have another bias towards more recent expenditure rather than a beautiful average forming in your head. I mean, I know how much my water costs now, but honestly can’t remember for 3 years ago in a different home without looking at the bills again. So your only way may be to ask regularly rather than assume that OpEx response quality is fine after X years?


    1. Good point. It depends on how the question is asked, and I appreciate I wasn’t very clear on that. There are two main ways, probably more:

      1. “typically, how much do you spend on X in a 30-day period”.
      – I suppose that’s what I’m implying in the post, i.e. someone’s knowledge of the typical experience would improve over time.
      – But here your point about bias towards recent experience is valid. So maybe the question should be: “in the past year, how much did you typically spend on X in a 30-day period”. In theory that is clearer, but then you’re giving people two different timeframes to think about at the same time, which is quite a cognitive burden…

      2. “in the past 30 days, how much did you spend on X”.
      – Under this design, it’s not quite a recall issue, but more an issue of whether the last 30 days are representative of the mean over the time horizon of the analysis (10 years in this case).
      – I.e. if you asked about monthly OpEx in year 1, you would probably get an underestimate, because less is likely to be needed than in year 3+.

      Asking regularly would always be better, but that costs money…
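The representativeness point under the second question design can be made concrete with some made-up numbers. In the Python sketch below, the monthly OpEx figures (2, 3, then 4 currency units from year 3 onwards) are purely hypothetical, as is the assumption that respondents report the last 30 days perfectly. It compares what a “past 30 days” question would pick up in a given year with the mean over a 10-year horizon.

```python
# Hypothetical monthly OpEx (arbitrary currency units) by year since construction.
# These values are invented for illustration only.
EARLY_YEARS = {1: 2.0, 2: 3.0}
STEADY_STATE = 4.0   # assumed monthly OpEx from year 3 onwards
HORIZON_YEARS = 10   # time horizon of the analysis

def monthly_opex(year):
    """True (assumed) monthly OpEx in a given year since construction."""
    return EARLY_YEARS.get(year, STEADY_STATE)

# Mean monthly OpEx over the whole horizon.
horizon_mean = sum(monthly_opex(y) for y in range(1, HORIZON_YEARS + 1)) / HORIZON_YEARS

def survey_bias(survey_year):
    """Gap between a perfectly recalled 'past 30 days' answer and the horizon mean."""
    return monthly_opex(survey_year) - horizon_mean

for year in (1, 3, 6):
    print(f"survey in year {year}: bias = {survey_bias(year):+.2f}")
```

With these assumed numbers a year-1 survey understates the horizon mean, while a survey in year 3 or later slightly overstates it, because the cheap early years are missing from what respondents report.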

