Big publishers are getting into large-scale user data collection that – without sufficient privacy protection – enables public surveillance. This change in business model puts academic and intellectual freedom at risk by making people reluctant to read or share publications for fear of government or commercial reprisal. One solution is more collective attention to, and pushback on, contract terms that permit use of library users’ personal data.
With library collections becoming increasingly digital, content vendors have growing opportunity to collect information about how digital content is used. Vendors describe these practices as important for security and usage analytics, but a standard contract says little about what types of data may be collected or how they may be used.
Trusting vendors to “do the right thing” with user data is not sufficient. According to Callan Bignoli and colleagues, “library vendors do not have a track record of sharing library workers’ values about patron privacy, and they build their platforms accordingly.”
“We have accepted vendor proposals and practices as they are written, and we imagine a walled garden between library data and the massive corpus of data collected about all people where there likely is none,” say the authors.
One daunting possibility is that patron data could feed into large-scale surveillance. Both Thomson Reuters (owner of the Westlaw legal database) and RELX Group (formerly Reed Elsevier) have had contracts to provide data to U.S. law enforcement, including Immigration and Customs Enforcement (ICE).
It’s not clear that patron data has ever flowed directly from vendors to police. But according to CUNY School of Law Professor Sarah Lamdan, neither Thomson Reuters nor RELX has denied the possibility, and their privacy statements would not rule it out.
“We imagine a walled garden between library data and the massive corpus of data collected about all people where there likely is none.”– Callan Bignoli et al.
Even so, it is disconcerting that library subscriptions to these vendors help fund big-data public surveillance, which has been shown to recapitulate bias and disproportionately impact minoritized groups.
“These corporations are no longer the publishers that librarians are used to dealing with, the kind that focus on particular data types…” writes Lamdan at In the Library with the Lead Pipe. “Instead, the companies are data barons, sweeping up broad swaths of data to repackage and sell.”
Sharing user data from academic libraries also has chilling implications for academic freedom.
In an interview published last month by zbw-mediatalk, Felix Reda from the Berlin-based Society for Civil Rights notes that the Chinese government has pressured some publishers to block access to certain articles for researchers deemed problematic, and placed sanctions on scientists and researchers working in specific fields. He warns this could easily extend to monitoring researchers’ reading habits.
“The resulting ‘scissors in the head’ (self-censoring) that begins before the actual restrictions of scientific freedom even occur, is particularly dangerous,” said Reda in the interview.
Academic publishing platforms already track user behavior in ways that run counter to libraries’ stated commitment to patron privacy, according to work by University of Minnesota Libraries Director of Information Technology Cody Hanson.
Studying the most frequently accessed article from each of fifteen publishing platforms available at the University of Minnesota Libraries, Hanson found third-party code loaded alongside the articles: an average of eighteen, and a median of ten, code scripts or “assets” per article. The third parties included Google, Adobe Audience Manager, Oracle Marketing Cloud, and Neustar.
“The resulting ‘scissors in the head’ (self-censoring) that begins before the actual restrictions of scientific freedom even occur, is particularly dangerous.”– Felix Reda, Society for Civil Rights
These third-party assets can access the full content of the page on which the article appears and track every action a user takes – such as the search terms the library user entered. In theory, combined with browser fingerprinting, IP addresses, first-party data like user account information, and third-party information from data brokers, they could identify an individual and link that identity to their library behavior.
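To make “third-party assets” concrete, here is a minimal sketch of the kind of tally an audit like Hanson’s performs. It is illustrative only: the page HTML and domains below are made up (one real-looking tracker host included for flavor), and Hanson’s actual methodology examined live pages in a browser rather than static HTML.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class ScriptCollector(HTMLParser):
    """Collect the src attribute of every <script> tag on a page."""
    def __init__(self):
        super().__init__()
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.scripts.append(src)

def third_party_hosts(html, first_party="publisher.example"):
    """Return hosts serving scripts from domains other than the publisher's own."""
    collector = ScriptCollector()
    collector.feed(html)
    hosts = {urlparse(s).netloc for s in collector.scripts if urlparse(s).netloc}
    return sorted(h for h in hosts if not h.endswith(first_party))

# A hypothetical article page: one first-party script, two third-party trackers.
page = """
<html><body>
<script src="https://cdn.publisher.example/reader.js"></script>
<script src="https://www.googletagmanager.com/gtag.js"></script>
<script src="https://cdn.krxd.net/controltag.js"></script>
</body></html>
"""

print(third_party_hosts(page))
# → ['cdn.krxd.net', 'www.googletagmanager.com']
```

Every host in that output represents code a party other than the publisher is running in the reader’s browser – with access to the page content and the reader’s actions on it.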
And last month, University of Oregon neuroscience PhD candidate Jonny Saunders noted in a tweet that Elsevier embeds a unique code in PDF metadata every single time a paper is downloaded, which could make it possible to identify the source of any PDF shared among researchers.
In a follow-up Vice article by Lorenzo Franceschi-Bicchierai, Elsevier did not deny the practice but said it was meant to prevent ransomware, not piracy. However, a tweet by social researcher Sunil Rodger points out that tracking individual downloads in metadata could deter researchers from sharing articles through aggregators like ResearchGate.
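The privacy concern is easy to demonstrate in miniature. The sketch below is a crude illustration using made-up byte strings: the “Subject” key and the “dl-000x” tags are hypothetical, not Elsevier’s actual scheme (the identifiers Saunders found sit in XMP metadata, which a real PDF library such as pypdf would be needed to read). It simply scans raw bytes for information-dictionary strings and shows how two downloads of the “same” paper become distinguishable.

```python
import re

def find_embedded_tags(pdf_bytes):
    """Crudely collect /Key (value) string pairs from raw PDF bytes.
    Real PDFs may compress or encode metadata, so this is a toy parser."""
    return dict(re.findall(rb"/(\w+)\s*\(([^)]*)\)", pdf_bytes))

# Two toy stand-ins for the same downloaded article, differing only in a
# hypothetical per-download identifier stamped into the metadata.
copy_a = b"%PDF-1.7\n1 0 obj\n<< /Title (Some Article) /Subject (dl-0001) >>\nendobj"
copy_b = b"%PDF-1.7\n1 0 obj\n<< /Title (Some Article) /Subject (dl-0002) >>\nendobj"

tags_a = find_embedded_tags(copy_a)
tags_b = find_embedded_tags(copy_b)

print(tags_a[b"Title"] == tags_b[b"Title"])      # → True  (same article)
print(tags_a[b"Subject"] != tags_b[b"Subject"])  # → True  (distinguishable copies)
```

If each download carries a unique tag like this, any copy that later surfaces – on ResearchGate, in an email, in a shared folder – can in principle be traced back to the account that downloaded it.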
Whether or not vendors and third parties create workable systems to surveil in this fashion, the possibility is enough to change researchers’ behavior.
The new role of vendor as data broker leaves librarians in an ethical quandary. By not scrutinizing vendor privacy practices, we’re not just complicit in surveillance; we’re technically paying them to collect and sell data on our own patrons.
This directly conflicts with the ALA’s Code of Ethics on privacy rights.
Yet librarians have little leverage to demand changes, since we currently rely on these vendors to provide digital content. That leverage shrinks further as those vendors find new revenue streams in user data.
Even getting transparency around vendors’ privacy practices is a challenge. Katy DiVittorio and Lorelle Gianelli describe the disheartening experience of a library task force that paid for, and independently gathered, only scant data on its vendors’ ethical business practices.
Bignoli and colleagues remind us that even incremental changes are helpful, and invite us to “take a harm reduction approach to crisis surveillance capitalism, rooted in the understanding that it can be difficult to make massive change quickly.”
One way to start is by shifting expectations about what privacy protections should be standard in contracts. We can make our values clear to vendors and push back on excessively permissive contract language. We can ask vendors to make their privacy policies clear, and be upfront with patrons about how any terms we accept could affect them.
ALA has produced multiple guides to help libraries adopt good privacy practices, including one specific to Vendors and Privacy on how to negotiate with vendors for better privacy and what to watch out for in contract language.
Reda offers four principles for seeking privacy-friendly vendors and contract terms:
- Bid so that different companies have to compete
- Avoid “lock-in effects” such as proprietary platforms that leave libraries permanently dependent on a specific provider
- Let licenses allow unlimited further use on any platform, for any purpose
- Prohibit search tracking at the level of individual researchers and run software in-house wherever possible
Not all of these will be practical or feasible in every case – for instance, often only a single vendor offers the content required – but they offer a good place to start.
Emily Cukier is a Science Librarian at Washington State University. Her interests include biology/life sciences, chemistry, human health and pharmacotherapy, data librarianship, and research ethics. Before coming to WSU, she worked as a Senior Writer for BioCentury, a pharmaceutical trade publication, and as a nonproprietary naming consultant to the pharmaceutical industry.