By Adam Chandler
Cornell University Libraries
Director, Automation, User Experience, and Post-Cataloging Services
Piwik (piwik.org) as a replacement for GA. Piwik is free, open source, and perhaps most importantly, it supports local data collection. In this brief blog post, I will summarize what some in the library literature say about web analytics tools, explain why we selected Piwik, and describe what is involved when migrating from GA to Piwik.
This blog post is an abridged version of a much longer article I co-authored with Melissa Wallace.1 In researching that article, we found librarian-authored recommendations to use Google Analytics (GA) in every year back to 2007. It is clear from these articles that librarians like GA. What is odd is the extent to which the authors are disconnected from the reader privacy tradition in libraries. Privacy is occasionally mentioned as a consideration, but not with enough weight to change the recommendation to use Google Analytics. The most explicit statement against the use of GA in libraries that we found is a blog post by Susanna Galbraith, published by the Ontario Library Association. Galbraith writes:
Many of us in the library community who have a responsibility to assess the usage of our library’s websites have become very familiar with the popular Google Analytics. Google Analytics is free and robust, and yet the data it collects belongs to Google and is housed on U.S. servers, where data may be subject to the legislation of that country. While many may see this as inconsequential (hey, Canada.ca uses Google Analytics, why can’t we?), those of us in the library community who wish to uphold the longstanding tradition in our profession of protecting user privacy, may wish to seek other alternatives.2
We agree with Galbraith. For privacy-related reasons alone, Piwik is a better web analytics solution for libraries. It is also a powerful open source web analytics tool that is, feature for feature, on par with GA. The table below is a high-level summary of the two products.
| Feature | Piwik | Google Analytics |
|---|---|---|
| Data storage | Library-controlled server | Google-controlled server |
| Data can be collected by ingesting Apache log files | yes | no |
| Command-line SQL access to database | yes | no |
| Aggregate IP addresses to location-based groups defined by library | yes | no |
| Management of logins | Centralized | Decentralized |
| Segment or filter data | yes | yes |
| Goal conversion tracking | yes | yes |
| Reporting features (email, export, etc.) | yes | yes |
| IP and URL exclusion | yes | yes |
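To make the "location-based groups" row more concrete, here is a minimal Python sketch of the general idea: individual IP addresses are mapped to groups that the library defines, and only the group counts are kept. This is not Piwik's implementation. The group names, the `LOCATION_GROUPS` mapping, and the network ranges (reserved documentation addresses) are all hypothetical and chosen only for illustration.

```python
import ipaddress
from collections import Counter

# Hypothetical location groups. The ranges are reserved documentation
# networks (RFC 5737), not real library networks.
LOCATION_GROUPS = {
    "Main library": ipaddress.ip_network("192.0.2.0/24"),
    "Branch library": ipaddress.ip_network("198.51.100.0/24"),
}


def group_for(ip_text):
    """Return the name of the first group whose network contains the address."""
    address = ipaddress.ip_address(ip_text)
    for name, network in LOCATION_GROUPS.items():
        if address in network:
            return name
    # Anything outside the defined networks falls into a catch-all group.
    return "Off campus"


def count_visits_by_group(ip_texts):
    """Count visits per location group; individual addresses are not kept."""
    return Counter(group_for(ip) for ip in ip_texts)


visits = ["192.0.2.10", "192.0.2.77", "198.51.100.5", "203.0.113.9"]
print(count_visits_by_group(visits))
```

Reporting at the group level rather than the address level is one way a library might balance usage assessment against reader privacy.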
Piwik installation was relatively simple: a library systems administrator followed the steps outlined in Piwik’s online documentation. Our installation is hosted on a Cornell University server. The university’s standard security profile is in place, with periodic scans and monitoring by Cornell central IT. We chose a user-friendly, product-agnostic URL (webanalytics.library.cornell.edu), at which the installation could be completed through an easy point-and-click process. In addition to the default installation, we set up the recommended automated cron task to process reports periodically; without this task, the system would recalculate statistics on the fly and would be considerably slower. Last, we used Piwik’s log import script to parse our Apache logs. This process was also straightforward, and once configured, it runs automatically and does not require much day-to-day maintenance.
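For readers unfamiliar with what parsing Apache logs involves, the following Python sketch shows the kind of work a log import step does. It is not Piwik's import script and does not reproduce its behavior. It assumes the logs use Apache's "combined" format, and the `parse_line` helper and the sample log line are hypothetical.

```python
import re
from datetime import datetime

# Apache "combined" log format (assumed here): host, identity, user, time,
# request line, status, response size, referer, user agent.
COMBINED_LOG = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = COMBINED_LOG.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # Apache timestamps look like 16/Nov/2016:09:15:02 -0500
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record


# Invented sample line using a reserved documentation IP address.
sample = ('192.0.2.10 - - [16/Nov/2016:09:15:02 -0500] '
          '"GET /hours HTTP/1.1" 200 5120 '
          '"https://www.example.org/" "Mozilla/5.0"')
record = parse_line(sample)
print(record["path"], record["status"], record["time"].isoformat())
```

Because everything needed for a page view record is already in the server log, this style of collection works even for pages that carry no tracking JavaScript.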
We would be remiss if we failed to acknowledge that not every institution has the IT resources of a library like Cornell. Before Piwik can see widespread adoption across libraries, the gap in IT support might need to be filled by a privacy-sensitive non-profit.
1. Adam Chandler and Melissa Wallace, “Using Piwik Instead of Google Analytics at the Cornell University Library,” The Serials Librarian 71, no. 3–4 (November 16, 2016): 173–79, doi:10.1080/0361526X.2016.1245645.