Wednesday, May 10, 2017

Things unsaid at #mcb17 #rp17

Forty-five minutes went by pretty fast and many things have been left unsaid. So here is a collection of thoughts and responses to yesterday's panel at #mcb17 #rp17 and the event in general.

As I said at the end of the panel, I think we all have to learn to differentiate when we talk about "data". It seems that when people say "data", they mean "user data that is used for advertising purposes". But optimising ads is only one minor part of what user data can be used for. Let me explain.


How does the advertising business work? Well, in the end it's solving a matching problem: getting "the right" ads to "the right" user. This is, in principle, not bad at all, as both advertiser and user have a clear benefit from it: the advertiser makes good use of its marketing bucks, and the user (best case) gets product information that is actually of interest. So how do advertisers know where to allocate their marketing budget? Which user might be most interested in their product? That is where big global companies and ad networks come into play, whose business model is to "get to know" the user in order to predict (based on statistical arguments) which ad might be most interesting to that user. For those companies, having as much data as possible, and especially data that is as diverse ("multi-dimensional") as possible, is a clear market advantage: the more dimensions and the more complete the data, the better the statistical models that predict which ad might be interesting to the user. In short: user data is an integral part of their business model. I do not want to discuss whether this business model is legitimate or not; I just want to state what it is, in order to come to the next point.

Data as a means to learn and provide service
Every shop owner, to use an offline real-life example, observes her shop. She observes how people enter, what they look at, which corners they might never go to and, in the end, what they buy. In short: she gathers data. I think it's fair to say that no one would expect the shop owner not to use that data to optimise her business: to order more of item A, to drop item B and maybe to re-arrange the shop in such a way that her customers have a better shopping experience and, of course, her profit increases. When it was said at the panel that it's "to some extent" understandable that website providers might utilise data to provide better service, but that still(!) in the end it's about making more profits, the only thing to reply is: "yes, of course". It's always about increasing profits in business. But is this bad? No, because increasing your profits means that you were apparently able to provide a product or a service that consumers want and are willing to pay for. (Clearly, if you are not into markets and business in general, you will not buy this argument.) In short: in this scenario user data is used to optimise another underlying business model.

Data in a (news) publishing company

And this is also how we see data. We are not an ad network. Our business model is not optimising ads. Our business model is to provide excellent high-quality journalism and to help our readers to be universally informed in a world of ever increasing complexity (other former content providers might see this differently and have actually shifted to the ad-network-based business model outlined above). Data on how our users use our products might help us to transform the news media business into a "digitally native business". This is the real issue that (news) media companies face: what does news consumption look like in a digital-only world? In a world where there is an overabundance of information, mostly for free, a world where I could keep myself busy 24-7 simply reading news? What value can a (formerly traditional) newspaper-publishing company provide? We strongly believe there is value in what we do, as understanding and analysing a complex world is becoming not less, but more important. And yes, we believe that there are people willing to pay for that service. Still, data-enabled innovation is (in my personal opinion) an important key to succeeding in re-defining, or let's say extending, this very old business model into a more digital world. Specifically, I am convinced that we will see more "smart news products", news products that help us get the news and information we need in order to be up-to-date.

Personalisation --> Smart News Products

I just introduced the phrase "smart news products". Why? Sure, I could say "personalisation", but as the word "personalisation" seems to be emotionally charged and negatively associated with "personalised ads", let's stick to "smart news products". So what in general makes a product "smart"? Mostly it's about being adaptive to the user who uses it. Example: my smart phone is, well, smart, because it's not a one-size-fits-all product. It has apps that customise my personal experience. Google's search is smart. Out of potentially millions of search results, it shows me the ones that are most likely important to me. So when I am in Berlin and I search for the phrase "bakery", what I am actually looking for (most likely) is "bakery in Berlin", and Google deduces this from my location. It's "smart", and it uses data (in this case the implicit information that I am in Berlin) to be so. In a similar way, I imagine "smart news products": news products that are to some extent adaptive to the user, using data. So now I already hear you thinking "filter bubble!". This is a tricky issue and worth a separate blog post. The only thing I want to say here is that "the filter bubble" is not an inevitable effect of using "algorithms" and data to build smart products. It can be avoided, and it comes down to the question of how I, as a data scientist, design the recommendation algorithm for the smart news product. Sure, I could go ahead and say "you always read articles tagged X, I show you everything I have on X", but that's only *one* choice (and frankly a pretty bad one). I could also go ahead and say, "you always read articles tagged X and because I have this data, I will go ahead and show you articles that are not tagged X in order to increase the diversity of information you see". That is another completely valid (and possible) design choice for the underlying algorithm. Technically it's not as simple as it sounds, but it's doable.
Clearly, for a smart news product the reality has to lie somewhere in the middle, or we offer both as two products, one named "more on X" and one named "things you usually don't read". Would this be a valuable added service to our readers? I don't know, but we will try.
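
To make the two design choices a bit more tangible, here is a minimal sketch in Python. It is purely illustrative and not how any real system of ours works: I assume that articles are plain dictionaries with a list of tags, that the reading history is simply the list of articles a user has opened, and that all candidates are unread. The function names (`more_on_x`, `things_you_usually_dont_read`) and the scoring by raw tag counts are made up for this example.

```python
from collections import Counter


def tag_counts(history):
    """Count how often each tag appears in the articles a reader has opened."""
    return Counter(tag for article in history for tag in article["tags"])


def familiarity(article, counts):
    """Score an article by how often the reader has already read its tags."""
    return sum(counts[tag] for tag in article["tags"])


def more_on_x(history, candidates, k=2):
    """Design choice 1: show the candidates closest to what the reader already reads."""
    counts = tag_counts(history)
    ranked = sorted(candidates, key=lambda a: familiarity(a, counts), reverse=True)
    return ranked[:k]


def things_you_usually_dont_read(history, candidates, k=2):
    """Design choice 2: show the candidates furthest from the reader's usual tags."""
    counts = tag_counts(history)
    ranked = sorted(candidates, key=lambda a: familiarity(a, counts))
    return ranked[:k]


# Hypothetical example data: a reader who mostly reads politics.
history = [
    {"id": "h1", "tags": ["politics"]},
    {"id": "h2", "tags": ["politics", "economy"]},
    {"id": "h3", "tags": ["politics"]},
]
candidates = [
    {"id": "a", "tags": ["politics"]},
    {"id": "b", "tags": ["science"]},
    {"id": "c", "tags": ["economy"]},
    {"id": "d", "tags": ["culture", "politics"]},
]

print([a["id"] for a in more_on_x(history, candidates)])
print([a["id"] for a in things_you_usually_dont_read(history, candidates)])
```

The same data (the reader's tag counts) feeds both functions; the only difference is the direction of the sort. That is the point: whether a product narrows or widens what you see is a design decision, not a property of "the algorithm". A real recommender would of course need much more than tag counts, which is why I said it's not as simple as it sounds.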