
Analysing logs #anzreg2018

How to work with EZproxy logs in Splunk. Why; how; who
Linda Farrall, Monash University

Monash uses EZproxy for all access, whether on or off campus, and manages EZproxy itself. Uses logs for resource statistics and preventing unauthorised access. Splunk is a log-ingestion tool – could use anything.

Notes you can’t rely on country changes alone, though they are important, because people use VPNs a lot. Eg people in China especially appear to be elsewhere; and people often use a US VPN to watch Netflix and then forget to turn it off. Similarly total downloads aren’t a very useful measure, as illegal downloading often happens bit by bit.

Number of events by sessionid can be an indicator; as can number of sessions per user. And then there’s suspicious referrers eg SciHub! But some users do a search on SciHub because it’s more user-friendly and then come to get the article legally through their EZproxy.
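As a rough illustration of these two indicators outside Splunk, here is a minimal Python sketch. It assumes the log lines have already been parsed into dicts; the keys `user` and `session_id`, the sample records and the threshold are all hypothetical, since real field names depend on each site's EZproxy log format and "unusual" depends on local patterns.

```python
from collections import Counter, defaultdict


def session_indicators(events):
    """Return (events per session, distinct sessions per user).

    `events` is an iterable of dicts with the hypothetical keys
    'user' and 'session_id'.
    """
    events_per_session = Counter()
    sessions_by_user = defaultdict(set)
    for event in events:
        events_per_session[event["session_id"]] += 1
        sessions_by_user[event["user"]].add(event["session_id"])
    sessions_per_user = {u: len(s) for u, s in sessions_by_user.items()}
    return events_per_session, sessions_per_user


def flag_outliers(counts, threshold):
    """Names whose count exceeds a locally chosen threshold."""
    return sorted(name for name, n in counts.items() if n > threshold)


# Made-up sample records, for illustration only.
sample = [
    {"user": "alice", "session_id": "s1"},
    {"user": "alice", "session_id": "s1"},
    {"user": "bob", "session_id": "s2"},
    {"user": "bob", "session_id": "s3"},
    {"user": "bob", "session_id": "s4"},
]
per_session, per_user = session_indicators(sample)
print(flag_outliers(per_user, threshold=2))
```

Anything flagged this way is only a prompt for a closer look, not evidence of misuse.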

https://github.com/prbutler/EZProxy_IP_Blacklist – Monash doesn’t use this directly, as they don’t want to encourage attackers to just move to another IP.

Also has a report of users who seem to be testing accounts against different databases.

Splunk can send alerts based on queries. Also is doing work with machine learning so could theoretically identify ‘normal’ behaviour and alert for abnormal behaviour.

But currently Monash does no automated blocking – investigates anything that looks unusual first.


Working with Tableau, Alma, Primo and Leganto
Sabrina Alvaro, UNSW; Megan Lee, Monash University

Tableau server: self-hosted or Tableau-hosted (these two give you more security options to make reports private), and public (free) version.

Tableau desktop: similarly enterprise vs public.

UNSW using self-hosted server and enterprise desktop, with 9 dashboards (or ‘projects’)

For Alma/Primo, can’t use the Ex Libris web data connector, so extract Analytics data manually; this may be a server version issue.

Easy interface to create report and then share with link or embed code.

UNSW still learning. Want to join sources together, identify correlations, capture user stories.

EZproxy log monitoring with Splunk for security management #anzreg2018

Ingesting EZproxy logs into Splunk. Proactive security breach management and generating rich eResource metrics
Linda Farrall, Monash University

Use Alma analytics for usage, but also using EZproxy logs.

EZproxy is locally hosted and administered by library/IT. On- and off-campus access is through EZproxy where possible, and Monash has always used EZproxy logs to report on access statistics. (For some vendors it’s the only stats available.) Used a Python script to generate HTML and CSV files.

Maintenance was hard, logs got bigger so execution took longer, Python libraries were no longer supported, and statistics were skewed by EZproxy misuse/compromised accounts. So moved to Splunk (already had enterprise version at university) to ingest logs; can then enrich with faculty data, and improve detection of compromised accounts.

EZproxy misuse – mostly excessive downloads, eg using a script or browser plugin – is usually related to study, but the volume triggers vendor concerns (ie they may block all university access) – in this case check in with the user to make sure it was them and sort out the issue. Or compromised accounts due to phishing. Have created a process to identify issues and block the account until ITS educates the user (because phishing emails will get sent to the same person who fell for it last time).

Pre-Splunk, it was time-consuming to monitor logs and investigate. Python script monitoring downloads no longer worked due to change of file size/number involved in typical download.

Most compromised accounts from Canada, US, Europe – in Splunk can look at reports where a user has bounced between a few countries within one week. Can look at total download size (file numbers, file size) – and can then join these two reports to look for accounts downloading a lot from a lot of countries.
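The joined report could be approximated as below. This is a sketch only, not Monash's Splunk queries: the record keys (`user`, `time`, `country`, `bytes`), the sample data and both thresholds are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def countries_in_window(events, window=timedelta(days=7)):
    """Largest number of distinct countries seen per user in any window."""
    by_user = defaultdict(list)
    for e in events:
        by_user[e["user"]].append((e["time"], e["country"]))
    result = {}
    for user, rows in by_user.items():
        rows.sort()
        best = 0
        # Simple quadratic scan: fine for a sketch, slow on large logs.
        for i, (start, _) in enumerate(rows):
            seen = {c for t, c in rows[i:] if t - start <= window}
            best = max(best, len(seen))
        result[user] = best
    return result


def bytes_per_user(events):
    """Total bytes downloaded per user."""
    totals = defaultdict(int)
    for e in events:
        totals[e["user"]] += e["bytes"]
    return dict(totals)


def suspicious(events, min_countries, min_bytes):
    """Join the two reports: many countries AND a large download total."""
    countries = countries_in_window(events)
    totals = bytes_per_user(events)
    return sorted(u for u in countries
                  if countries[u] >= min_countries and totals[u] >= min_bytes)


# Made-up sample records, for illustration only.
events = [
    {"user": "u1", "time": datetime(2018, 3, 1), "country": "AU", "bytes": 500},
    {"user": "u1", "time": datetime(2018, 3, 3), "country": "CA", "bytes": 9000},
    {"user": "u1", "time": datetime(2018, 3, 5), "country": "DE", "bytes": 9000},
    {"user": "u2", "time": datetime(2018, 3, 1), "country": "AU", "bytes": 20000},
    {"user": "u2", "time": datetime(2018, 3, 20), "country": "US", "bytes": 100},
]
print(suspicious(events, min_countries=3, min_bytes=10_000))
```

In the sample, u2 downloads a lot but changes country only once across nearly three weeks, so only u1 meets both conditions.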

To investigate, have to go into identity management accounts – but can then see all their private data. Once they integrate faculty information into Splunk they won’t have to look users up, so it can actually enhance privacy – eg can see a user is downloading lots of engineering data but is actually in the engineering faculty, so it’s probably okay.

In 2016 had 10 incidents with resources blocked by vendors for 26 days. In 2017 16 incidents (all before August when started using Splunk). In 2018, 0 incidents of blocking – because they’re staying on top of compromised accounts (identifying an average of 4 a week) and taking pre-emptive action (see an issue, block the account, notify the vendor). Also now have a very good relationship with IEEE! (Notes that when IEEE alerts you to an issue it’s always a compromised account, there’s never any other explanation.)

Typically account compromised; tested quietly over several days; then sold on and used heavily. If a university hasn’t been targeted yet, it will be. By detecting accounts downloading data, are also protecting the university from other damage they can cause to university systems.

Notes that each university will have different patterns of normal use: you get to know your own data.

Lots of vendors moving to SSO. Plan to do SSO through EZproxy – haven’t done it yet so not sure whether it’ll work, but testing it within a couple of months. ITS will implement SSO logging for the university, so hopefully they’ll pick up issues before it gets to EZproxy. Actively asking vendors to do it through IP recognition/EZproxy.

E-resource usage analytics in Alma #anzreg2018

“Pillars in the Mist: Supporting Effective Decision-making with Statistical Analysis of SUSHI and COUNTER Usage Reports”
Aleksandra Petrovic, University of Auckland

Increasing call for evidence-based decision making in combination with rising importance of e-resources (from 60% to 87% of collection in last ten years), in context of decreasing budget and changes in user behaviour.

Options: EBSCO usage consolidations, Alma analytics or Journal Usage Statistics Portal (JUSP). Pros of Alma: no additional fees; part of existing system; no restrictions for historical records; could modify/enhance reports; could have input in future development. But does involve more work than other systems.

Workflow: harvest data by manual methods; automatic receipt of reports, mostly COUNTER; receipt by email. All go into Alma Analytics, then create reports, analyse, make subscription decisions.

Use the Pareto Principle eg 20% of vendors responsible for 80% of usage. Similarly 80% of project time spent in data gathering creates 20% of business value; 20% of time spent in analysis for 80% of value.
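To check how closely a collection follows the 80/20 pattern, one could find the smallest set of top vendors that covers a given share of usage. The sketch below assumes usage has already been totalled per vendor; the vendor names and counts are invented.

```python
def vendors_covering(usage, share=0.8):
    """Smallest list of top vendors whose usage reaches `share` of the total."""
    total = sum(usage.values())
    covered = 0
    chosen = []
    ranked = sorted(usage.items(), key=lambda kv: kv[1], reverse=True)
    for vendor, count in ranked:
        if covered >= share * total:
            break
        chosen.append(vendor)
        covered += count
    return chosen


# Invented totals, for illustration only.
usage = {
    "Vendor A": 700,
    "Vendor B": 150,
    "Vendor C": 80,
    "Vendor D": 50,
    "Vendor E": 20,
}
top = vendors_covering(usage)
print(top, f"{len(top) / len(usage):.0%} of vendors")
```

With these invented numbers, two of the five vendors are needed to reach 80% of usage.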

Some vendors slow to respond (asking at renewal time increased their motivation…). Harvesting bugs, eg issue with JR1. There were reporting failures (especially in the move from http to https) and issues tracking the harvesting. Important to monitor what data is being harvested before basing decisions on it! Alma provides a “Missing data” view but can’t export into Excel to filter, so created a similar report in Alma Analytics (which they’re willing to share).
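The idea behind a missing-data check is simple enough to sketch in Python. This is not the Alma view or the Auckland report; it assumes you can list the (vendor, month) pairs that were actually loaded, and the vendor names and months below are invented.

```python
def missing_months(harvested, vendors, months):
    """Map each vendor to the expected months with no harvested data.

    `harvested` is a set of (vendor, 'YYYY-MM') pairs that were loaded.
    Vendors with no gaps are left out of the result.
    """
    gaps = {}
    for vendor in vendors:
        absent = [m for m in months if (vendor, m) not in harvested]
        if absent:
            gaps[vendor] = absent
    return gaps


# Invented harvest record, for illustration only.
harvested = {
    ("Vendor A", "2018-01"),
    ("Vendor A", "2018-02"),
    ("Vendor B", "2018-01"),
}
gaps = missing_months(
    harvested,
    vendors=["Vendor A", "Vendor B", "Vendor C"],
    months=["2018-01", "2018-02"],
)
print(gaps)
```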

So far have 106 SUSHI, 45 manual COUNTER vendors and 17 non-COUNTER vendors. Got stats from 85% of vendors.

Can see trends in open access usage. Can compare whether users are using recent vs older material – drives decisions around backfiles vs rolling embargos. Can look at usage for titles in package – eg one where only three titles had high usage so just bought those and cancelled package.

All reports in one place. Can be imported into Tableau for display/visualisation: a nice cherry on the top.

Cancelling low-use items / reducing duplication has saved money. Hope more vendors will use SUSHI to increase data available. If doing it again would:

  • use a generic contact email for gathering data
  • use the dashboard earlier in the project

Cost per use trickier to get out – especially with exchange rate issues but also sounds like reports don’t quite match up in Alma.

Alma plus JUSP
Julie Wright, University of Adelaide

Moved from using Alma Analytics to JUSP – to both. Timeline:

  • Manual analysis of COUNTER: very time intensive: 2-3 weeks each time and wanted to do it monthly…
  • UStat better but only SUSHI, specific reports, and no integration with Alma Analytics
  • Alma Analytics better still but still needs monitoring (see above-mentioned https issues)
  • JUSP – only COUNTER/SUSHI, reports easy and good, but can’t make your own

Alma Analytics | JUSP
much work | easy
complex analyses available | only simple reports
only has 12 months data | data back to 2014
benchmarking | works with vendors on issues
 | quality control of data

JUSP also has its own SUSHI server – so can harvest from there into Alma. This causes issues with duplicate data when the publisher names don’t match exactly. Eg JUSP shows “BioOne” when there are actually various publishers; or “Wiley” when Alma has “John Wiley and Sons”. Might need to delete all Alma data and use only JUSP data.
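Outside Alma, the simple name-variant case (“Wiley” vs “John Wiley and Sons”) could be handled by mapping names to a canonical form before deduplicating. This sketch rests on assumptions: the alias table, row keys and sample rows are hypothetical, and it does not address the BioOne case, where one label stands in for several real publishers.

```python
# Hypothetical alias table: variant name (lower case) -> canonical name.
ALIASES = {"john wiley and sons": "wiley"}


def canonical(name):
    """Lower-case, collapse whitespace, then apply known aliases."""
    key = " ".join(name.lower().split())
    return ALIASES.get(key, key)


def deduplicate(rows):
    """Keep the first row per (canonical publisher, title, month)."""
    seen = set()
    kept = []
    for row in rows:
        key = (canonical(row["publisher"]), row["title"], row["month"])
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Invented usage rows, for illustration only.
rows = [
    {"publisher": "John Wiley and Sons", "title": "Journal X", "month": "2018-01", "uses": 40},
    {"publisher": "Wiley", "title": "Journal X", "month": "2018-01", "uses": 40},
    {"publisher": "Wiley", "title": "Journal X", "month": "2018-02", "uses": 35},
]
unique_rows = deduplicate(rows)
print(len(unique_rows))
```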