Tag Archives: statistics

Analysing logs #anzreg2018

How to work with EZproxy logs in Splunk. Why; how; who
Linda Farrall, Monash University

Monash uses EZproxy for all access, both on and off campus, and manages EZproxy themselves. They use the logs for resource statistics and for preventing unauthorised access. Splunk is a log-ingestion tool – they could use anything.

Notes that you can’t rely just on country changes, though this is important, as people use VPNs a lot. Eg people in China especially appear to be elsewhere; and people often use a US VPN to watch Netflix and then forget to turn it off. Similarly, total downloads isn’t a very reliable signal, as illegal downloading often happens bit by bit.

Number of events per session ID can be an indicator, as can the number of sessions per user. And then there are suspicious referrers, eg SciHub! But some users do a search on SciHub because it’s more user-friendly, and then come to get the article legally through their EZproxy.
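As a rough sketch of what those indicators look like in practice – this assumes a simplified, made-up log format of "user session referrer url" per line (real EZproxy logging is configurable, and Monash computes this in Splunk, not Python):

```python
from collections import Counter, defaultdict

# Assumed simplified log format: "user session referrer url" per line.
# Real EZproxy logs are configurable and usually Apache-style.
SUSPICIOUS_REFERRERS = ("sci-hub",)

def summarise(lines):
    """Count events per session and sessions per user, and flag
    requests that arrived via a suspicious referrer."""
    events_per_session = Counter()
    sessions_per_user = defaultdict(set)
    flagged = []
    for line in lines:
        user, session, referrer, url = line.split()
        events_per_session[session] += 1
        sessions_per_user[user].add(session)
        if any(s in referrer.lower() for s in SUSPICIOUS_REFERRERS):
            flagged.append((user, referrer))
    sessions = {u: len(s) for u, s in sessions_per_user.items()}
    return events_per_session, sessions, flagged
```

A user with an unusually high session count, or a session with an unusually high event count, would then be a candidate for manual investigation rather than automatic blocking.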

https://github.com/prbutler/EZProxy_IP_Blacklist – they don’t use this directly, as they don’t want to encourage offenders to just move to another IP.

A report of users who seem to be testing accounts with different databases.

Splunk can send alerts based on queries. Splunk is also doing work with machine learning, so could theoretically identify ‘normal’ behaviour and alert on abnormal behaviour.

But currently Monash does no automated blocking – investigates anything that looks unusual first.


Working with Tableau, Alma, Primo and Leganto
Sabrina Alvaro, UNSW, and Megan Lee, Monash University

Tableau server: self-hosted or Tableau-hosted (these two give you more security options to make reports private), and public (free) version.

Tableau desktop: similarly enterprise vs public.

UNSW using self-hosted server and enterprise desktop, with 9 dashboards (or ‘projects’)

For Alma/Primo they can’t use the Ex Libris web data connector, so they extract Analytics data manually – it may be a server version issue.

Easy interface to create report and then share with link or embed code.

UNSW is still learning. They want to join sources together, identify correlations, and capture user stories.

E-resource usage analytics in Alma #anzreg2018

“Pillars in the Mist: Supporting Effective Decision-making with Statistical Analysis of SUSHI and COUNTER Usage Reports”
Aleksandra Petrovic, University of Auckland

Increasing call for evidence-based decision making, in combination with the rising importance of e-resources (from 60% to 87% of the collection in the last ten years), in the context of decreasing budgets and changes in user behaviour.

Options: EBSCO usage consolidations, Alma analytics or Journal Usage Statistics Portal (JUSP). Pros of Alma: no additional fees; part of existing system; no restrictions for historical records; could modify/enhance reports; could have input in future development. But does involve more work than other systems.

Workflow: data is harvested by manual methods, by automatic receipt of reports (mostly COUNTER), or by email. All go into Alma Analytics; then they create reports, analyse, and make subscription decisions.

They use the Pareto Principle, eg 20% of vendors are responsible for 80% of usage. Similarly, the 80% of project time spent on data gathering creates 20% of the business value, while the 20% spent on analysis creates 80% of the value.
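The 20/80 check is easy to compute from a vendor-to-usage table. A minimal sketch (the function and the example numbers are my own illustration, not from the talk):

```python
def pareto_share(usage_by_vendor, target=0.80):
    """Return the fraction of vendors that together account for `target`
    of total usage, counting from the highest-usage vendor down."""
    totals = sorted(usage_by_vendor.values(), reverse=True)
    grand_total = sum(totals)
    running, vendors = 0, 0
    for t in totals:
        running += t
        vendors += 1
        if running >= target * grand_total:
            break
    return vendors / len(totals)
```

Running this over a year of consolidated COUNTER usage would show how concentrated usage really is across your vendors.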

Some vendors were slow to respond (asking at renewal time increased their motivation…). There were harvesting bugs, eg an issue with JR1, reporting failures (especially in the move from http to https), and issues tracking the harvesting. Important to monitor what data is being harvested before basing decisions on it! Alma provides a “Missing data” view, but it can’t be exported into Excel for filtering, so they created a similar report in Alma Analytics (which they’re willing to share).

So far have 106 SUSHI, 45 manual COUNTER vendors and 17 non-COUNTER vendors. Got stats from 85% of vendors.

Can see trends in open access usage. Can compare whether users are using recent vs older material – drives decisions around backfiles vs rolling embargos. Can look at usage for titles in package – eg one where only three titles had high usage so just bought those and cancelled package.

All reports in one place. Can be imported into Tableau for display/visualisation: a nice cherry on the top.

Cancelling low-use items / reducing duplication has saved money. Hope more vendors will use SUSHI to increase data available. If doing it again would:

  • use a generic contact email for gathering data
  • use the dashboard earlier in the project

Cost per use is trickier to get out – especially with exchange-rate issues, but it also sounds like the reports don’t quite match up in Alma.

Alma plus JUSP
Julie Wright, University of Adelaide

Moved from using Alma Analytics to JUSP, and then to both. Timeline:

  • Manual analysis of COUNTER: very time intensive: 2-3 weeks each time and wanted to do it monthly…
  • UStat better but only SUSHI, specific reports, and no integration with Alma Analytics
  • Alma Analytics better still but still needs monitoring (see above-mentioned https issues)
  • JUSP – only COUNTER/SUSHI; reports easy and good, but can’t make your own

Comparing the two:

  • Alma Analytics: much work; complex analyses available; only has 12 months of data
  • JUSP: easy; only simple reports; data back to 2014; benchmarking; works with vendors on issues; quality control of data

JUSP also has its own SUSHI server – so can harvest from here into Alma. This causes issues with duplicate data when the publishers don’t match exactly. Eg JUSP shows “BioOne” when there are actually various publishers; or “Wiley” when Alma has “John Wiley and Sons”. Might need to delete all Alma data and use only JUSP data.
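One way to spot those mismatches before merging the two sources is to normalise publisher names through a hand-built alias table. A sketch – the alias entries here are from the examples above, but a real table would need to be catalogued by hand:

```python
# Hand-built alias table mapping names as Alma records them to the
# names JUSP uses (lower-cased). Entries here are illustrative only.
ALIASES = {
    "john wiley and sons": "wiley",
    "john wiley & sons": "wiley",
}

def normalise(publisher):
    """Lower-case and collapse known aliases to one canonical name."""
    key = publisher.strip().lower()
    return ALIASES.get(key, key)

def find_duplicates(alma_rows, jusp_rows):
    """Flag (publisher, title) pairs reported under differently-named
    publishers in the two sources, which would be double-counted if
    the data were naively merged."""
    alma = {(normalise(p), t) for p, t in alma_rows}
    return [(p, t) for p, t in jusp_rows if (normalise(p), t) in alma]
```

Aggregators like “BioOne”, which cover many actual publishers, would still need case-by-case decisions rather than a simple alias.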

Tracking usage of QR codes

QR code for this blog

I have to admit to scepticism about QR codes in libraries. I see them everywhere, but (almost) the only time I hear success stories they turn out to be “Oh my goodness someone actually used it!” stories.

On the other hand, recently I am hearing more anecdata about students using QR codes in other contexts, so perhaps it’s just a matter of motivation and/or accustomisation.

Besides which, QR codes are pretty ridiculously easy to create and slap on a poster – it’s not like the time investment, say, Second Life required. Even so, you don’t want to clutter up important real estate (and potentially look a bit try-hard) if no-one’s actually going to use the code.

So how to track the usage of your QR codes? It’s just about as easy: if you create the QR code through a site like bitly.com (there are probably others, this is just what I’m familiar with) then it’ll keep count for you of how many people are following the code.

For example, a bitly URL gives you a QR code at https://bitly.com/107etxA.qrcode and a stats page at https://bitly.com/107etxA+. (The QR code and the original link will both take you to the target webpage and will both be tracked in the stats, but the “referrer” stats will tell you which visits came from a website and which came via the QR code.)
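In code terms, the two derived URLs are just suffixes on the short link. A trivial helper sketching the patterns described above:

```python
def bitly_links(short_url):
    """Given a bitly short URL, derive the QR-code image URL and the
    public stats page URL using the '.qrcode' and '+' suffix patterns."""
    return {
        "qr_code": short_url + ".qrcode",
        "stats": short_url + "+",
    }
```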

Best practice: If you are slapping these on a poster, spare a thought for those who don’t have a smartphone and include a human-readable URL as well. You could use that bitly one, in which case you might want to customise it when you first create it so it’s also human-memorable. 🙂

Out of curiosity: Is anyone out there already using QR codes and has their own success stories (whether quantitative or qualitative)?

Possible topics for crowd-sourced research

Since first talking about this I’ve been pondering what topics would make good candidates to try out the model. I think it should be something that:

  1. is of interest to as many people as possible;
  2. can be contributed to by as many people as possible; and
  3. as easily as possible.

With these criteria in mind I’ve come up with two possible ideas:

A. Trends in patrons’ use of electronic equipment in the library
This is basically an extension of the article that inspired my thinky thoughts to start with, which did headcounts to measure laptop use in their library. We could extend this to, say, a headcount of

  • total people, of course;
  • users of library computers;
  • users of personal laptops;
  • users of PDAs;
  • users of cellphones;
  • and a handy ‘other’ category.

We could decide what time(s)/day(s) to run the headcount on, set up an online spreadsheet, and anyone wanting to participate could do their headcount and enter the data into the spreadsheet. Whether people can only participate once, or can do it recurrently, there’ll be value either way. It’s simple and quantitative and easy.

B. Librarians’ perceptions of the quality of vendor training
(ie training provided by vendors to librarians in the use of their products, in case that’s not clear)
This is, perhaps, a delicate topic. I’ve been thinking for a while about blogging my own perceptions, all aggregated and anonymised, but it still feels a bit “bite the hand that holds all our resources”, because my perceptions are not good. But perhaps it would be less awkward if it came from a whole lot of librarians. And vendors are responding more and more to concerns raised on social media, so maybe it would actually get some attention and help vendors provide better training.

OTOH this would be an inherently messy topic to research. It’d be a good test of whether crowdsourcing a qualitative research topic could work, but perhaps not a good test of whether crowdsourcing research per se is workable. There’d need to be a lot of discussion about what exactly we want to research:

  • Likert scales of measures on eg amount of new info, amount of info already known, familiarity of trainer with database, ability of trainer to answer questions…?
  • more freeform answers about problems with presentations eg slides full of essays, trainer bungles example searches…?
  • surveying trainers themselves to find out what kind of training they get in how to give a good presentation?

So, for anyone interested in going somewhere with this — or just interested in reading the results — what do you think? Topic A, topic B, topic C (insert your own topic here), or all of the above?

Links of interest 11/8/10 – open access, accessibility, statistics and more

Open Access


  • Char Booth writes about e-texts and library accessibility including a great quote that “ebooks were created by the blind, then made inaccessible by the sighted.”
  • NZETC has just posted about the 1064 works in DAISY format available in their collection for people with print-related disabilities. (DAISY = “Digital Accessible Information SYstem”)

Library statistics


  • The first year of research on the Researchers of Tomorrow (pdf) study finds that “in broad approaches to information‐seeking and use of research resources, there are no marked differences between Generation Y doctoral students and those in older age groups. Nor are there marked differences in these behaviours between doctoral students of any age in different years of their study. The most significant differences revealed in the data are between subject disciplines of study irrespective of age or year of study.”
  • Assessments of Information Literacy collects links to infolit tests, assessments, rubrics and tutorials available online.
  • Christina Pikas lists a Rundown of the new [database etc] interfaces this summer. There were some surprises, including a ScienceDirect/Scopus merger apparently due August 28…

[Edited 12/8 to fix broken links]

Crowdsourcing library research

Reading Snapshots of Laptop Use in an Academic Library crystallised some thinky thoughts I’ve vaguely had for a while about the possibility of libraries working together on library research.

The very short version of the article is that in their library “28% of students used laptops in existing spaces in 2005, while 62% of students used laptops in the same spaces in 2008”. But of course they’re not sure exactly what’s causing the change. Is it just the changing times? Changing university policy? Changing library spaces? Something in the water? When you’ve only got one datapoint – your own library – it’s hard to see what the real trend is.

But if you had the same data from a whole bunch of libraries then you’d be able to get a better idea of the nationwide/global trends. And if your data was different from that trend, you’d be able to get a better idea of how your local circumstances are affecting what’s going on.

I’ve had thinky thoughts in the past about libraries sharing their statistics and research and stuff and part of the problem I recognised then was that everyone counts different statistics, so results aren’t always comparable.

But. What if, when we want to do this kind of research, instead of doing it in-house, we open it up:

  1. stick up a wiki where we can collaborate with a pile of other libraries on deciding the methodology,
  2. stick up a Google spreadsheet where participating libraries can enter their stats,
  3. ???
  4. ~~profit~~ Publish!

Potential for awesomesauce, yes/yes? Does anyone have any burning research questions they’d like to try this with? Because my burning research question is currently “Let’s do it!” which, um, technically isn’t a question.

Mobile vs Smartphones & other links of interest 14/4/10

Mobile vs Smartphones
Roy Tennant suggests not making any more mobile websites, as research suggests more people (in the US) are getting smartphones that can support anything a normal web browser can support. (Though I don’t know of any smartphone that supports a 1024×768 screen size…) Smartphone applications seem to be trending instead. The iLibrarian rounds up her Top 30 Library iPhone Apps (part 2 and part 3). Why an application when you’ve already got a website? Phil Windley points out that “If my bank can get me to download an app, then they have a permanent space on my app list.” The trade-off is that whereas a website should work on any browser, smartphone apps often need to be in proprietary formats (the Librarian in Black particularly complains about Apple’s iPhone in this respect).

Web 2.0
Common Craft has a 3-minute video explaining “Cloud Computing in Plain English”.

The Metropolitan Museum of Art Libraries and Brown University Library provide a “dashboard” of widgets on their websites displaying current statistics about library usage.

View from the top 🙂
The University Librarian at McMaster University Library blogs results from their laptop survey. Apparently laptop circulation now accounts for about a third of their total circulation stats; their survey looks into how students are using the laptops.

The Director of Libraries at the State University of New York at Potsdam blogs about “What I’ve Learned” in the first 10 months of her job there.

Scandal of the week…
Barbara Fister summarises recent discussion about EBSCO as the “New Evil Empire” in her Library Journal article “Big vendor frustrations, disempowered librarians, and the ends of empire”.

Alice for the iPad – one of the ways technology can enhance the book.