Piwik, An alternative to Google Analytics
By Adam Chandler
Cornell University Libraries
Director, Automation, User Experience, and Post-Cataloging Services
Cornell University Library (CUL) selected Piwik (piwik.org) as a replacement for Google Analytics (GA). Piwik is free and open source and, perhaps most importantly, supports local data collection. In this brief blog post, I will summarize what some in the library literature say about web analytics tools, explain why we selected Piwik, and describe what is involved when migrating from GA to Piwik.
This blog post is an abridged version of a much longer article I co-authored with Melissa Wallace.1 In researching that article, we found recommendations to use Google Analytics written by librarians in every year back to 2007. It is clear from the librarian-authored articles advocating the use of GA that librarians like it, but what is odd is the extent to which the authors are disconnected from the reader privacy tradition in libraries. Privacy is occasionally mentioned as a consideration, but never weighed heavily enough to change the recommendation to use Google Analytics. The most explicit statement against the use of GA in libraries that we found is a blog post by Susanna Galbraith, published by the Ontario Library Association. Galbraith writes:
Many of us in the library community who have a responsibility to assess the usage of our library’s websites have become very familiar with the popular Google Analytics. Google Analytics is free and robust, and yet the data it collects belongs to Google and is housed on U.S. servers, where data may be subject to the legislation of that country. While many may see this as inconsequential (hey, Canada.ca uses Google Analytics, why can’t we?), those of us in the library community who wish to uphold the longstanding tradition in our profession of protecting user privacy, may wish to seek other alternatives.2
We agree with Galbraith. For privacy-related reasons alone, Piwik is a better web analytics solution for libraries. It is also a powerful open source web analytics tool that is, feature for feature, on par with GA. The table below is a high-level summary of the two products.
Functionality | Piwik | Google Analytics |
Data storage | Library controlled server | Google controlled server |
Data may be collected by JavaScript widget embedded on page | yes | yes |
Data may be collected by ingesting Apache log files | yes | no |
Command line SQL access to database | yes | no |
Aggregate IP addresses to location-based groups defined by library | yes | no |
Management of logins | Centralized | Decentralized |
API | yes | yes |
Real-time data | yes | yes |
Event tracking | yes | yes |
Segment or filter data | yes | yes |
Customizable dashboard | yes | yes |
E-commerce support | yes | yes |
Goal conversion tracking | yes | yes |
Search keywords | yes | yes |
Geolocation | yes | yes |
Heat mapping | yes | yes |
Reporting features (email, export, etc.) | yes | yes |
IP and URL exclusion | yes | yes |
Plugins/CMS integration | yes | yes |
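To make the "aggregate IP addresses to location-based groups" row concrete, here is a minimal Python sketch of the idea. It is not how Piwik implements the feature internally. The group names and network ranges are hypothetical (the ranges are reserved documentation networks, not Cornell addresses), and `group_for` and `visits_by_group` are names invented for this illustration.

```python
import ipaddress
from collections import Counter

# Hypothetical library-defined groups. The ranges are reserved documentation
# networks (RFC 5737), not real campus addresses.
LOCATION_GROUPS = {
    "Main library": ipaddress.ip_network("192.0.2.0/24"),
    "Branch library": ipaddress.ip_network("198.51.100.0/24"),
}


def group_for(ip: str) -> str:
    """Return the first group whose network contains ip, else 'Off-site'."""
    address = ipaddress.ip_address(ip)
    for name, network in LOCATION_GROUPS.items():
        if address in network:
            return name
    return "Off-site"


def visits_by_group(ips):
    """Count visits per location group; individual addresses are not kept."""
    return Counter(group_for(ip) for ip in ips)


if __name__ == "__main__":
    sample = ["192.0.2.10", "192.0.2.77", "198.51.100.5", "203.0.113.9"]
    print(visits_by_group(sample))
```

Reporting by group rather than by address fits the privacy argument above: the report shows where use happens without listing individual IP addresses.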
Piwik installation was relatively simple, with a library systems administrator following the steps outlined in Piwik’s online documentation. It is hosted on a Cornell University server. The university’s standard security profile is in place, with periodic scans and monitoring by Cornell central IT. We chose a user-friendly, product-agnostic URL (webanalytics.library.cornell.edu), at which the installation could be completed through an easy point-and-click process. In addition to the default installation, we set up a recommended automated cron task to process reports periodically; without this task the system would recalculate statistics on the fly and would be considerably slower. Last, we used Piwik’s log import script to parse our Apache logs. This process was also straightforward, and once configured, it runs automatically and does not require much day-to-day maintenance.
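For readers unfamiliar with log-based collection, the sketch below shows the kind of information an Apache log line carries. It is an illustration only, not Piwik's log import script. It assumes the logs use Apache's "combined" format, and the sample line, the `COMBINED` pattern, and the `parse_line` helper are all made up for this example.

```python
import re
from datetime import datetime

# Apache "combined" log format: host, identity, user, time, request line,
# status, response size, referrer, user agent.
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line: str):
    """Return a dict of fields for one combined-format line, or None."""
    match = COMBINED.match(line)
    if match is None:
        return None
    hit = match.groupdict()
    hit["status"] = int(hit["status"])
    # Apache writes "-" when no response body was sent.
    hit["size"] = 0 if hit["size"] == "-" else int(hit["size"])
    hit["time"] = datetime.strptime(hit["time"], "%d/%b/%Y:%H:%M:%S %z")
    return hit


if __name__ == "__main__":
    sample = ('192.0.2.10 - - [16/Nov/2016:10:15:32 -0500] '
              '"GET /hours HTTP/1.1" 200 5123 '
              '"https://www.example.org/" "Mozilla/5.0"')
    print(parse_line(sample))
```

Each parsed line yields a page, a timestamp, a referrer, and a user agent, which is enough for basic page view reporting without adding any code to the pages themselves.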
In addition to data collection by Apache logs, CUL also collects web statistics via JavaScript. While JavaScript embed code must be manually added to websites, it allows for greater customization and additional features, such as a real-time map of visitors and the tracking of exit links. The JavaScript option also allows us to collect statistics on sites that are hosted by third parties, such as ILLiad and 360 Link.
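Because the embed code has to be added to each site by hand, it can help to generate it from a single template. The Python sketch below does that. The template is modeled on the general shape of Piwik's asynchronous tracking code, so treat it as an approximation; the snippet your own Piwik installation generates is authoritative. The `render_snippet` helper, the protocol-relative URL, and the site ID of 3 are assumptions made for illustration.

```python
import json

# Approximation of a Piwik asynchronous tracking snippet. Doubled braces are
# literal braces in the JavaScript; single braces are str.format placeholders.
SNIPPET_TEMPLATE = """<script type="text/javascript">
  var _paq = _paq || [];
  _paq.push(['trackPageView']);
  _paq.push(['enableLinkTracking']);
  (function() {{
    var u = {base_url};
    _paq.push(['setTrackerUrl', u + 'piwik.php']);
    _paq.push(['setSiteId', {site_id}]);
    var d = document, g = d.createElement('script'),
        s = d.getElementsByTagName('script')[0];
    g.async = true; g.src = u + 'piwik.js';
    s.parentNode.insertBefore(g, s);
  }})();
</script>"""


def render_snippet(base_url: str, site_id: int) -> str:
    """Return embed code for one site; values are quoted as JS strings."""
    if not base_url.endswith("/"):
        base_url += "/"
    return SNIPPET_TEMPLATE.format(
        base_url=json.dumps(base_url),
        site_id=json.dumps(str(site_id)),
    )


if __name__ == "__main__":
    # Hypothetical site ID; the hostname is the one chosen for the install.
    print(render_snippet("//webanalytics.library.cornell.edu", 3))
```

In this sketch, the `enableLinkTracking` call is what turns on the exit-link tracking mentioned above.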
We would be remiss if we failed to acknowledge that not every institution has the IT resources of a library like Cornell. Before Piwik can see widespread adoption across libraries, IT support is a gap that might need to be filled by a privacy-sensitive non-profit.
References
1. Adam Chandler and Melissa Wallace, “Using Piwik Instead of Google Analytics at the Cornell University Library,” The Serials Librarian 71, no. 3–4 (November 16, 2016): 173–79, doi:10.1080/0361526X.2016.1245645.
2. Susanna Galbraith, “Piwik: Breaking Away from Google Analytics,” Open Shelf, http://www.open-shelf.ca/160215-piwik/ (accessed February 15, 2016).