Data for the Public Good
Data tells important stories about our country. We should treat it with more respect.
by Abdullah Shihipar
https://www.nytimes.com/2019/10/24/opinion/data-privacy-research.html
These days, the very word “data” elicits fear and suspicion in many of us — and with good reason. DNA-testing companies are sharing genetic information with the government. A firm hired by the Trump campaign gained access to the private information of 50 million Facebook users. Hotels, hospitals, and a consumer credit reporting agency have admitted to major breaches. But while many of us are rightfully concerned about the misuse of our personal data by private entities, we should be just as worried about the important national stories that aren’t told when our fellow citizens don’t feel secure enough to share theirs with researchers.
Part of the reason so many of us are nervous about our data and who has access to it is that pieces of our data can be combined to paint a detailed picture of our lives: how much money we make, what we’re interested in, what car we drive. But in a similar way, individual experiences become data points in sets that shape our understanding of what’s happening in this country.
This is especially true in the public health context. One death of a black woman in a maternity ward may be dismissed as an isolated case, until it is combined with thousands of other cases and compared to white maternal morbidity rates. When residents of Flint, Mich., repeatedly complained about getting sick from orange-tinged tap water, they were largely ignored and dismissed as paranoid, only to be vindicated when Dr. Mona Hanna-Attisha published a study showing that children in Flint had elevated levels of lead in their blood after the city’s water source had been switched.
Data collected by the Census Bureau — including on education status, employment, housing status, food security, income — is particularly important. It informs decisions about resource allocation and, crucially, political redistricting.
Without this kind of data, our ability to understand the world around us is restricted. Canada, where publicly available data is relatively limited in size and scope, provides a cautionary tale. Dr. Arjumand Siddiqi of the University of Toronto tried to conduct a study similar to one done in the United States, which showed that middle-aged white Americans without a college degree were dying at higher rates in recent years, especially from causes related to alcohol and drugs. But her efforts were frustrated because death records in Canada, shockingly, do not record information on race or education.
Even in the United States, there are gaps in the data. The artist Mimi Onuoha curates “On Missing Data Sets” — a list of pieces of information that are absent from the public record. This includes things like the number of people who are excluded from public housing due to a criminal record, poverty statistics that incorporate incarcerated people and the number of police departments that use stingray technology.
When it comes to data collected by the state, those who do not appear in data sets are often marginalized people who decline to share their data because of fear and distrust. This was the concern when, in 2018, the Trump administration announced that it would put a citizenship question on the census: Critics worried that the question would stoke fear and depress responses from noncitizens as well as their family members. The administration ultimately walked back the proposal after it was struck down by the Supreme Court.
To encourage data collection that protects the rights of people, we need basic restrictions on data sharing between agencies. While there are some laws that govern the collection of data, few place restrictions on access to existing databases. Those of us who do research on health care, for instance, are governed by the Health Insurance Portability and Accountability Act (HIPAA), which dictates how we can use and gain access to sensitive patient information, while keeping the privacy of patients paramount. Law enforcement agencies are generally exempt from provisions in HIPAA and, in those cases, can view data rather easily.
In abandoning his plan to use the census to collect information on citizenship, Mr. Trump directed federal agencies to mine their databases for citizenship data. Across the country, law enforcement agencies have connected to databases that are not under their jurisdiction. ICE officers routinely had access to a database of license plates run by the Washington State Department of Licensing. In Massachusetts, state police ran thousands of searches through the state’s prescription data monitoring program, which contains millions of sensitive records on people’s prescription histories.
All of this has to stop if the government is to be trusted with citizens’ data. We need legislation that restricts agencies’ access to data for reasons other than the purpose for which it was collected. Since 2018, Massachusetts, along with 12 other states, has restricted access to patient data, requiring law enforcement to obtain a warrant in order to retrieve it. Similar laws need to be passed for all identifiable data existing in government data sets. If an agency needs access to data, it should have to obtain a warrant.
Resources should be devoted to ensuring that ordinary people know how to download data sets for their own research and curiosity. Already, community organizers and activists have used data to hold government agencies accountable. Mijente, a Latinx-led immigrant rights group, has used multiple Freedom of Information Act requests to reveal the details behind high-profile ICE raids.
In addition, agencies should make it easier for people to know when someone else has viewed their records.
Finally, researchers and those tasked with collecting data need to think like the advocates who were successful in defeating the Trump administration’s attempts to add a citizenship question to the census, and ask themselves whether the data they are collecting will put people at risk. Is the potential harm caused by asking a question greater than its research value? Sometimes, the answer will be yes.
By providing a picture of Americans’ experiences, data collection can in fact serve the public good — but only if the people who provide the data are treated with the respect they deserve.