
Value of good questions – lies in numbers

In several previous pieces (e.g. data-driven design tools, data-driven service design, and what is data-driven design) I’ve called for designers, and their organizations, to challenge themselves and increase data-driven decision making. This short story emphasizes the importance of carefully defining the questions you are trying to answer before even thinking about any statistical analysis of quantitative data. I use a case study of the OKCupid dating site and men’s partner age preferences to illustrate the interpretation issue.

Numbers can mislead

Using more data does not guarantee success. Interpreting numeric data is tricky, and this has been known for ages. There are at least three common caveats with data:

  1. Having wrong or insufficient data
  2. Making wrong interpretations of data
  3. Torturing data so it is transformed to match our wishes

Similar data can be used to argue for very different ends. In the rear-view mirror, things look much clearer and more stable, up to the point when some radical disruption happens! This is something that past data alone can never fully account for. It is also most inconvenient if you’re trying to innovate on data, as I’ve highlighted before. Given that people are naturally resistant to change, defending any major change on this basis is not going to be easy.

All of these problems are real in data-driven design. Any readily available data is subject to these issues. Traditional design research, which relies on qualitative insights, is naturally very much subject to the same issues. But with qualitative research, everyone dealing with it is usually well aware of the subjectivity and the risk of misinterpretation.

Lies, damned lies, and statistics

Overall, I’d like to emphasize that whenever you deal with quantitative data and “insights” or “answers”, it is even more critical to define what your initial question was.

Let’s have a look at an interpretation problem that surfaces when you have several competing, but otherwise decently prepared, numerical results.

Questions matter

Some might say that the numbers are always true, but your question might be wrong. A bit like the legendary answer ‘42’ in Douglas Adams’s The Hitchhiker’s Guide to the Galaxy.

To facilitate data interpretation, it is essential to formulate your questions as precisely as possible. If you don’t do this, you risk falling victim to a deliberate choice of data or a purposeful interpretation.

Let me highlight this issue with an example from the wealth of (aggregate and anonymized) user data presented in the OKCupid blog. OKCupid is a major online dating service operating around the world. One of its founders, Christian Rudder, has for years written a blog about how users behave, as well as a great book, Dataclysm, which illustrates the interesting world of digital matchmaking.

Let me take the question of what relative age of women men are into in an online dating environment. You understand what I’m getting at? Great, let’s get the answer.

Three relevant bits of data can be found on the topic.

Who do men search for?

 

Who do men message?

 

How attractive are the women they browse?

 

Which one of these is the truth? Well… Where did I go wrong? With the question, naturally.

The lesson is: don’t let yourself be fooled, and be careful in formulating the question you wish to answer with quantitative data. Typically, you have to include some segmentation criterion (e.g. European men aged 30 to 45) and a specific operationalization (e.g. interest measured as whom they choose to message) of what you wish to measure.
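To make the point concrete, here is a minimal Python sketch of what a precisely formulated question could look like in code. Everything in it is an assumption for illustration: the `Event` record, its field names, the `median_age_gap` helper and the handful of records are all made up, and none of it is OKCupid data. The idea is simply that the segmentation criterion (region and age range) and the operationalization (which kind of behaviour counts as "interest") become explicit parameters that you cannot forget to choose.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Event:
    """One hypothetical interaction record (invented for this sketch)."""
    man_age: int
    region: str      # e.g. "EU" or "US"
    kind: str        # how interest is operationalized: "search" or "message"
    woman_age: int


# Made-up records, for illustration only. They are NOT real dating-site data.
EVENTS = [
    Event(35, "EU", "search", 27),
    Event(35, "EU", "message", 33),
    Event(40, "EU", "search", 30),
    Event(40, "EU", "message", 38),
    Event(32, "EU", "message", 31),
    Event(50, "US", "message", 45),
    Event(28, "EU", "search", 24),
]


def median_age_gap(events, *, region, min_age, max_age, kind):
    """Median of (woman's age - man's age) for one segment and one metric.

    The keyword-only arguments force the caller to state the segment
    (region, age range) and the operationalization (kind) explicitly.
    """
    gaps = [
        e.woman_age - e.man_age
        for e in events
        if e.region == region
        and min_age <= e.man_age <= max_age
        and e.kind == kind
    ]
    if not gaps:
        raise ValueError("no events match this segment and operationalization")
    return median(gaps)


if __name__ == "__main__":
    # Same segment, two operationalizations, two different "answers".
    for kind in ("search", "message"):
        gap = median_age_gap(EVENTS, region="EU", min_age=30, max_age=45, kind=kind)
        print(f"{kind}: median age gap = {gap}")
```

With these invented records, the same segment yields a different median age gap depending on whether "interest" means searching or messaging. Neither number is wrong; they answer different questions.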

Recap

Digest numbers with care. Defining the right questions is a valuable first step towards finding any useful answer.