The field of data science deals with large amounts of website usage data. Even though these count as personal data by virtue of user IDs, working with them is usually shaped by anonymity and pseudonymity: no human faces, no actual names, no further socio-demographic information. Machine learning algorithms are trained to detect behavioral patterns in historical data and make assumptions about future actions. From predicting subscriptions and churn to personalizing editorial content or product information, casting the appropriate logic into self-learning systems is bread and butter to most data scientists, and for this purpose our craft primarily works with colleagues from analytics and IT departments.
Nevertheless, there are actual humans behind pseudonymized user IDs, whose behavior is driven by motives and attitudes. But these topics rarely become a subject of discussion, at least in the field of data science. They are rather the concern of user research departments and are therefore handled by experts on surveys and interviews, who are still hardly intertwined with the data science world, where we rarely talk about "qualitative data" and even less about the people from whom we get the data. Everything is supposed to be automated, preferably in large quantities. No scalability is not an option.
The team of Spiegel Research is set up in such a way that it includes colleagues from qualitative and quantitative user research as well as analytics, data science, and testing. In addition to working on machine learning problems, audience research plays a significant role in our work, and we are starting to discover the synergies of having various research disciplines on board. The possibilities for blending methods are multifaceted, and we are only beginning to find promising combinations and suitable workflows for a more holistic view of the audience. Scientific papers¹ and industry use cases² helped us get started and develop a vision for our work.
The “What-Why Framework,” highlighting some methodologies in each quadrant. Source: Simultaneous Triangulation: Mixing User Research & Data Science Methods
How we combine User Research & Data Science
To get started with mixed-methods research, one of our first projects combined website usage and survey data to shed light on the behavior and needs of spiegel.de's young readership. Based on this use case, the following paragraphs explain our steps toward a more integrated user research strategy. I will outline an approach that gave us deep insights into:
- the desires and attitudes of our young readers (through surveys)
- as well as their reading behavior (by linking their survey answers to their cookie-based reading profiles)
It worked as follows: we conducted a survey on spiegel.de that is particularly addressed to young readers. Through this survey, we collect age and other socio-demographic factors as well as attitudes and desires regarding spiegel.de. To take part in the survey, readers must agree to our data protection form, so we are later able to link their reading profiles with their survey answers. Following this approach, we could link several months of usage behavior to the age indicated in the survey, along with every other answered question. A cookie-based reading profile includes a reader's page and article views from her browser or app cookie. All data is deleted 90 days after the survey.
The acceptance among our users surprised us: over 90% of our readers were willing to donate their reading history for this project. For now, the remaining 10% were unable to take part in the survey, but as soon as it is technically possible, we naturally want to give these participants the opportunity to take part as well, to avoid any bias in the research results.
So, you wonder how this works technically? If a reader is shown a survey on spiegel.de, she needs to agree to donate her reading history via a data protection agreement. If she does, her cookie ID, to which her reading behavior is mapped, is automatically passed to our survey tool, where she answers the survey and her responses are connected to that cookie ID. This way we can later connect the reading behavior to the survey answers. By agreeing to the data protection form, participants allowed us to access their reading profiles for the following 90 days.

A total of almost 2,000 readers were willing to donate their reading profiles to our project. After we had sorted out incomplete or sparse reading profiles, we were left with around 800 rich reading profiles, most with several months of reading history. These 800 young readers alone accounted for about 2 million page views on spiegel.de. To evaluate behavior not only within the young readership group but also against our average readership, we also created a comparison group of another 800 reading profiles with a comparable structure in terms of usage intensity and device usage, so that only the age indication differed between the two groups. A few hundred reading profiles might not sound like a lot, but they hold deep information that would be impossible to gain at a large scale.
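To make the linking step concrete, here is a minimal sketch in pandas. All names and numbers (`surveys`, `pageviews`, the cookie IDs, the `MIN_VIEWS` threshold) are hypothetical illustrations, not our actual pipeline: it joins survey answers to page views via the cookie ID and then drops sparse reading profiles.

```python
import pandas as pd

# Hypothetical inputs: survey responses and raw page-view logs,
# both keyed by the participant's cookie ID.
surveys = pd.DataFrame({
    "cookie_id": ["a1", "b2", "c3"],
    "age": [19, 24, 22],
})
pageviews = pd.DataFrame({
    "cookie_id": ["a1", "a1", "b2", "a1", "c3"],
    "article_id": [101, 102, 101, 103, 104],
})

# Link each page view to the survey answers via the cookie ID.
linked = pageviews.merge(surveys, on="cookie_id", how="inner")

# Sort out sparse reading profiles, e.g. fewer than 3 page views
# (an illustrative threshold, not the one used in the project).
MIN_VIEWS = 3
counts = linked.groupby("cookie_id")["article_id"].transform("size")
rich = linked[counts >= MIN_VIEWS]
```

In practice the join key would come from the survey tool's export rather than an in-memory frame, but the shape of the operation is the same.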
We used the resulting dataset to retrospectively evaluate which editorial topics and formats were particularly frequented by young readers. For example, we were able to determine which op-ed pieces (columnists, guest writers, and comments) or topics (core political spiegel.de topics or softer service journalism) are particularly valued by young readers. We also got interesting information about audio and video usage. Furthermore, answers from the survey, like motives and thoughts around our subscription model, were used for deeper segmentation and analysis.
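A retrospective comparison like this boils down to comparing topic shares between the two groups. A minimal sketch, with made-up group labels and topic names standing in for the real editorial taxonomy:

```python
import pandas as pd

# Hypothetical linked dataset: one row per page view, with a topic
# label and a flag marking the young-reader vs. comparison group.
views = pd.DataFrame({
    "group": ["young", "young", "young", "comparison", "comparison"],
    "topic": ["politics", "opinion", "politics", "politics", "sports"],
})

# Share of page views per topic within each group (rows sum to 1).
shares = pd.crosstab(views["group"], views["topic"], normalize="index")
```

Reading across a row of `shares` shows how a group's attention distributes over topics; comparing the "young" row against the "comparison" row highlights which formats are over- or under-represented among young readers.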
Aside from analyzing usage data, we created hypotheses to be tested afterwards. Using our Adobe infrastructure to run A/B tests, we validated whether formats that had worked particularly well with young readers in the past also work under clean testing conditions in the present (live tested on our front page). For this purpose, we targeted and intensified certain articles on the spiegel.de homepage for both the young target group and the comparison segment. Through this process of A/B testing, the whole research process of finding young readers' preferences became an ongoing cycle in which we constantly evaluate their current needs.
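Evaluating such a test typically means checking whether the click-through rates of the two segments differ by more than chance. A stdlib-only sketch of a two-proportion z-test; the click and impression counts are invented for illustration and are not results from our tests:

```python
from math import sqrt, erf

# Hypothetical A/B result: clicks and impressions for a teaser
# shown to the young-reader group vs. the comparison group.
clicks_a, n_a = 480, 10_000   # young readers (assumed numbers)
clicks_b, n_b = 420, 10_000   # comparison group (assumed numbers)

p_a, p_b = clicks_a / n_a, clicks_b / n_b
p_pool = (clicks_a + clicks_b) / (n_a + n_b)

# Standard error under the pooled null hypothesis (equal rates).
se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_a - p_b) / se

# Two-sided p-value from the standard normal CDF.
p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
```

A testing platform like Adobe Target reports this kind of significance out of the box; the sketch only shows what the comparison computes under the hood.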
Where are we (going)?
As initially stated, a larger question stands behind our project on young readers: do traditional qualitative and quantitative user research methods and state-of-the-art data science approaches pursue the same goals, and must they therefore be thought of together? In most media companies, we are seeing an increasing focus on data and algorithms. Ever more data points and more complex models allow accurate predictions of user behavior. But analyzing and predicting usage data is only one approach to the audience. In my experience, the missing answers to simple questions like young readers' preferences stem from the team structures of web analytics departments at news websites, which are strongly oriented toward the pure analysis of website behavior data. Other teams that also deal with audience data, like user research, are often located in other departments, and their ideas are rarely thought of together.
We have since tried more combinations of research methods, involving even more qualitative aspects of user research, like integrating results from user interviews with survey answers and usage patterns. I hope to share more about our journey into user research and the role of data science in the near future.
Thank you for reading this article!
I hope you liked it. If so, give it a clap and follow me on Twitter.