The New York Times recently reported that Facebook gave tech giants like Amazon and Microsoft greater access to users' personal data than it had previously acknowledged. Although Facebook issued a denial, the episode could again send shivers down the spines of privacy-conscious social media users.
As tech-based systems have become all but indispensable, many institutions might assume user data will be reliable, meaningful and, most of all, plentiful. But what if this data became unreliable, meaningless or even scarce? Seemingly lost in the debates over data ownership and privacy is the possibility that data availability could dwindle, rather than continuing to grow—and change the booming business of data aggregation as we know it.
Most of the estimated 500 million people whose sensitive personal information was stolen in the Marriott data breach revealed last month had no say in how the information had been stored and protected, or how widely it might have been shared with other companies. Repeated data breaches have led Americans to be increasingly pessimistic that the personal data they give government and commercial entities is secure, a development that could cause people to stop sharing their data or to share it more judiciously. Public outrage over data breaches and distrust of government and tech companies also could diminish tech workers' appetite to develop technological innovations and corresponding systems that are built from and rely on large amounts of data.
If the public broadly opts out of using tech tools, or makes fewer efforts to build them, insufficient or unreliable user data could destabilize the data aggregation business model that powers much of the tech industry. Developers of technologies such as artificial intelligence, as well as businesses built on big data, could no longer count on ever-expanding streams of data. Without this data, machine learning models would be less accurate, and targeted ads would be less precise and thus less lucrative.
Institutions that enjoy privileged access to data (such as Facebook, with its trove of personal information) or the capacity to organize it (such as China, with its vast data-tagging workforce) could see these built-in advantages begin to erode. Such a shift might, however, give rise to competitors with new business practices or new methods of capturing user information, such as relying on data from voice assistants or smart connected devices.
Most Internet users seem to understand they are creating massive amounts of easily accessible data, either deliberately or inadvertently through “digital exhaust”—the valuable personal data organizations capture and analyze for their own purposes, whether those are profit-driven, to support policy making, or for national security. Internet giants like Facebook and Google are built to leverage seemingly ceaseless flows of user-generated data, culled from web browsing or smartphone apps. Governments are exploring how to use social media or ride-sharing data to inform policy and decision making.
But the proliferation of hacking incidents appears to have altered the public conversation around online privacy, as well as online behavior. More than 50 percent of data-breach victims surveyed for a RAND study had taken steps to protect their online privacy, such as changing passwords. Tech users and developers appear increasingly willing to take principled stances against what they view as unethical or objectionable business practices. Populist attitudes that convey suspicion of the institutions holding and exploiting this data, whether those organizations are based in Silicon Valley or Washington, D.C., could encourage people to curb their tech usage. For instance, an estimated 200,000 Uber users deleted their accounts with the ride-hailing service in early 2017 to protest perceived misbehavior by the company and its CEO. And Silicon Valley employees have staged walkouts and in some cases resigned to protest the actions of their employers.
Certain groups could end up opting out, skewing user data to make it unrepresentative of the broader population. Technologies and systems that are based on biased data can exacerbate inequities. The potential trend could be made worse if large numbers of minorities or entire communities declined to use tools and provide data to train algorithms.
Policymakers could explore how to restore trust in the public and private institutions that collect personal data, for instance by promoting transparency and accountability or by strengthening data security.
Institutions built on the assumption of an unending flow of data should consider how to prepare for users to shut off the spigot. Approaches that account for these behavioral shifts could include artificial intelligence methods that do not require large amounts of real-world training data, such as training models on synthetic data instead.
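One simple version of the synthetic-data idea mentioned above can be sketched as follows: fit summary statistics to a small sample of real measurements, then draw as many synthetic records as a model needs from the fitted distribution. This is a minimal illustrative sketch, not a method described in the article; the data, variable names, and the choice of a normal distribution are all assumptions for demonstration.

```python
import random
import statistics

# Illustrative sketch: suppose only 20 real measurements are available
# (e.g., session lengths in minutes). These values are fabricated here
# purely for demonstration.
random.seed(42)
real_data = [random.gauss(30.0, 5.0) for _ in range(20)]

# Fit a simple parametric model (a normal distribution) to the real data.
mu = statistics.mean(real_data)
sigma = statistics.stdev(real_data)

def synthesize(n):
    """Draw n synthetic records from the distribution fitted above."""
    return [random.gauss(mu, sigma) for _ in range(n)]

# A downstream model could now train on 10,000 synthetic records
# instead of the original 20 real ones.
synthetic_data = synthesize(10_000)
print(len(synthetic_data))
```

In practice, synthetic-data pipelines use far richer generative models than a single Gaussian, but the principle is the same: the scarce real data shapes a generator, and the generator supplies the volume that training requires.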
Ensuring the long-term sustainability of data-based technologies will require that the data they rely on is accessible and representative of the population. That won't be possible if those who desire to acquire more of that data keep giving people reasons to opt out.
Douglas Yeung is a behavioral scientist at the nonprofit, nonpartisan RAND Corp. and serves on the faculty of the Pardee RAND Graduate School.
This commentary originally appeared on United Press International on December 28, 2018. Outside View © 2018 United Press International.
Commentary gives RAND researchers a platform to convey insights based on their professional expertise and often on their peer-reviewed research and analysis.