The other day I read a case study on the Patreon data hack. For those of you who don’t know, Patreon is, in short, a platform that provides tools for creators to run a subscription content service. The website got hacked.
The authors of the case study research crowdfunding, the practice of funding a project with money raised from the public over the Internet. The owner of an idea puts it forward to the general public on a crowdfunding website, and the idea’s financial supporters are then entitled to some reward when the project succeeds. This kind of exchange creates a lot of digital data, some of it publicly available and the rest private. The authors needed public data from Patreon. However, even after putting a lot of effort into scraping the website for the relevant data, they came up empty. Their email inquiries went unanswered as well.
When the exact data you wanted but couldn’t get presents itself to you thanks to hackers, can you use the data?
A few months later, the website in question was hacked. Almost 15 GB of its data was exposed, including the project data the authors wanted to study as well as private messages and code. This put them in a real dilemma. Some members of the team considered the data unusable simply because it had been obtained by unethical means. Others compared the data to a newspaper and argued that, since it was now public, they could safely use it.
In their discussion, the authors analyzed the concept of hacked public data by studying its similarities to controversial journalism. Edward Snowden’s leak of classified documents and the phone-hacking scandal at Rupert Murdoch’s newspapers were among the prominent examples they discussed at length. They considered the idea of unethical means being used for a greater good. In the same way, the hacked data gave the researchers access to data they could not have obtained otherwise. They also cited computer security researcher Mark Burnett, who released a huge database of login information for research purposes. Burnett gathered this information from hacked data dumps that had been released on the Internet. He maintained that the noble aim of gaining knowledge outweighs the illegal nature of the process by which the knowledge was obtained.
Using private data such as passwords, addresses, and user preferences violates privacy and breaches the law, which makes the use of the Patreon data problematic. Sometimes even the use of public data can cause outrage, as when a New York state newspaper published the names and addresses of gun owners in its readership area, information it had obtained through a freedom of information request. Making data widely public is often viewed as inappropriate by the people the data concerns.
The relatively young field of data science does not have the kind of well-established ethical code that journalism has. For example, journalists sometimes use data that the people involved do not want made public, as in the case of WikiLeaks. This is an accepted practice in journalism if it is done carefully and for the greater good. Data scientists, however, lack both a peer-group consensus and public goodwill.
The authors took guidance from the ethics statement of the Association of Internet Researchers. Their discussion with other such researchers led to no proper conclusion even though most participants admitted to having reservations about the use of such data.
By then, an underlying principle had emerged: people are not anonymous, dehumanized research subjects that can be treated as a line of numbers in a file, so ethics must be applied to user data in order to respect their humanity. There are some paradoxical differences between online data and more traditional data types, such as magazines. Journalistic content is viewed as fair game in research because it’s considered public. Online data is easily found by anyone. Yet just because content is publicly accessible doesn’t mean it was meant to be consumed by just anyone. The authors concluded that the same data can be obtained ethically. Hence, they did not use the hacked data for their research.