It appears that we’re truly reaching the next phase of big data: data transparency. Certainly, open data is a movement that has been around for some time. But it feels like the act of demanding access to the data that companies are tracking is only now entering the general consciousness, spurred by the Facebook-Cambridge Analytica saga, and maybe also GDPR (I do still need to set aside time to truly understand GDPR).
There are several levels of data transparency:
There’s the first level of transparency where the company tells you that data is being collected, without going into detail on which data.
There’s the second level, where you understand exactly what data is being collected, but you do not have access to that collected data. This is the level of transparency most tech companies maintained pre-Cambridge Analytica.
At the third level, companies give you access to some or all of the data that has been collected about you. This is what many companies, like Google and Facebook, are now putting in place, with tools to export your data.
It occurs to me that there is a fourth level of data transparency: transparency not just into the data that’s been collected about you, the individual user, but into the data that’s been aggregated across the whole user base. This data would have to be anonymized and distilled into aggregate insights, which is how it should be analyzed within these tech companies anyway. This level of transparency tells the user: you have access to the same overall insights that we have.
This may be transparency that oversight organizations such as governmental authorities can demand. But it isn’t a level of transparency that the individual user should feel entitled to. What I’m wondering is this: is this something that individual users can be offered for a fee? This fee would be fair payment for the company’s investment in collecting, anonymizing, analyzing, and packaging the general insights.
This thought was triggered by my previous post about aggregating the best of the best content out there. I could see people being willing to pay for access to the same granular insights that content creators have about their body of content. I might be willing to pay to receive a filtered list of only the top content from a single publication, and to be able to define “top” any way I want, e.g. most shared, most seen in past __ days, or most visited.
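To make the idea concrete, here is a rough sketch of what a reader-defined “top content” filter might look like. Everything in it is hypothetical: the `Post` fields, the `top_posts` function, and the sample numbers are invented for illustration and don’t reflect any publisher’s real data or API. One simplification to note: the optional time window filters by publication date, as a stand-in for “most seen in the past N days,” which would really require per-day view counts.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Post:
    # Hypothetical per-post metrics a publisher might expose to paying readers.
    title: str
    published: date
    shares: int
    views: int
    visits: int


def top_posts(
    posts: List[Post],
    metric: str,
    n: int = 3,
    within_days: Optional[int] = None,
    today: Optional[date] = None,
) -> List[Post]:
    """Return the n posts ranked highest by the reader's chosen metric."""
    if within_days is not None:
        # Keep only posts published inside the window (a simplification, see above).
        today = today or date.today()
        cutoff = today - timedelta(days=within_days)
        posts = [p for p in posts if p.published >= cutoff]
    # "Top" is whatever attribute the reader names: shares, views, or visits.
    return sorted(posts, key=lambda p: getattr(p, metric), reverse=True)[:n]


# Invented sample data, purely for illustration.
posts = [
    Post("A", date(2018, 4, 1), shares=120, views=900, visits=400),
    Post("B", date(2018, 4, 20), shares=300, views=700, visits=650),
    Post("C", date(2018, 5, 2), shares=45, views=1500, visits=500),
]

most_shared = top_posts(posts, "shares", n=1)
most_viewed = top_posts(posts, "views", n=2)
recent_most_visited = top_posts(
    posts, "visits", within_days=15, today=date(2018, 5, 5)
)
```

The point of the sketch is that the reader, not the publisher, picks the metric and the window; the publisher only has to expose the aggregate counts.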
It also occurs to me that the public “like” and comment counts on posts and curated “best of” lists are essentially a free version of this offering. But it isn’t in the publisher’s interest to make it easy to filter out only the most popular content – readers/followers would then ignore the majority of the content created.
Who’s written/talked about this before? What conclusions have they come to?