Monthly Archives: June 2017

Perverse incentives and the reward structures of academia #or2017

Perverse incentives: how the reward structures of academia are getting in the way of scholarly communication and good science
by Sir Timothy Gowers

Abstract:

The internet has been widely used for the last 20 years and has revolutionized many aspects of our lives. It has been particularly useful for academics, allowing them to interact and exchange ideas far more rapidly and conveniently than they could in the past. However, much of the way that science proceeds has been affected far less by this development than one might have expected, and the basic method of communication of ideas — the journal article — is not much different from how it was in the seventeenth century.

It is easy to imagine new and better methods of dissemination, so what is stopping them from changing the way scientists communicate? Why has the journal system proved to be far more robust than, say, the music industry, in the face of the new methods of sharing information?

The dream is that all information is available on tap, accessible through a few clicks. We’ve got a bit of that via Google, Wikipedia, YouTube, map sites, travel sites, news sites. But a lot of content is for subscribers only: you can find content in Google Books – but you only get a page or so before it’s cut off due to copyright.

Of course copyright holders need incentives to create, to cover costs, etc. Academics aren’t directly paid though – actually the barriers to content are the bigger problem. Covering costs? Maybe, “if you insist on antiquated methods of publication”.

What could we share? The not-yet-complete idea, that others can build on. OTOH if everyone shared everything they thought it could end up a complete mess. But we can make order from chaos.

In maths the revolution has started: his library’s transition from painfully closed stacks to open stacks unfortunately coincided with not even needing to go to the library for content anyway. Wikipedia for basic concepts; arXiv.org for preprints (don’t need journals at all in maths); OEIS (database of sequences of whole numbers along with formulae for generating them); MathOverflow – for questions at the research level – usually get a useful answer within a few hours.

Traditional way of doing things is the “lone genius” model. But he thought it’d be interesting to solve a problem in public. So he posted some initial thoughts on his blog and invited contributions. Traditionally there’s a fear of getting scooped – but doing it completely in the open, timestamps mean no-one else can claim credit; in fact it rewards putting your comment up quickly before someone else does. The problem was solved in just 6 weeks.

Perverse incentives in maths:

  • personal ambition
  • reward for being first (not for being inspiration, or for being second but with a better solution)
  • primacy of journal article while expository and enabling activities are downplayed – when you start writing textbooks instead of journal articles this is seen as your career slowing down
  • little recognition for incomplete ideas

These are obstacles to efficiency.

Paradox of paywalls: mathematicians write, peer review, edit; dissemination costs almost nothing; almost all interesting recent content is on arXiv (which can include final accepted manuscript – and anyway it’s not much different from the preprint); and still libraries pay huge subscription fees. The problem is the internet came along very quickly while we’re still doing things the old way.

Some initiatives:

  • his blog post about personal Elsevier boycott which inspired someone to set up a pledge which thousands signed
  • Open Library of Humanities set up when the journal Lingua left Elsevier and became Glossa
  • Discrete Analysis (arXiv overlay journal) set up as proof of concept for cheap journal publication, with US$10 submission charge – and a nice user interface
  • No.Big.Deal – trying without success to get Cambridge and JISC to bargain better
  • Freedom of Information Act requests to UK universities for how much unis are paying Elsevier (contra confidentiality clauses)

Perverse incentives are held up by the whole network of publishers, editors, writers, readers (subdivides into people actually reading it, and people scanning it to judge the writer eg hiring committees), librarians (who have the power to cancel – but subject to academic criticism), scholarly societies (who often derive income from publishing journals), consortium negotiators, funders (in a good position to create mandates) – creating a situation where it’s very difficult to change things.

He feels like he’s had little effect, but it’s important to have lots of little initiatives which together build to pull the wall down.

Electronic Poster Display #or2017

There were lots of fantastic posters; these are just the ones I wanted to refer back to as they sparked thoughts I want to follow up on. In no particular order:

  • Governmental Educational Repository in Health – they have 7000+ open access learning objects. [We have 175. Which isn’t nothing. But actually what I’m still mostly interested in is whether anyone’s ever going to develop an aggregator for OA learning objects….]

  • Extending the value of the institutional repository with metrics integration – they’ve got individual researcher profiles showing metrics. [We’ve got some of this in Elements. To get the rest though would require coding, and dealing with authentication to keep it private to the researcher. I recently wrote an authentication module for a hand-coded php/sql app using EZproxy which I could adapt to something like this – or any other homegrown personalisation effort.]

  • COAR Resource Type Controlled Vocabulary: Dspace Prototype implementation – [I saw (and gave feedback on) a draft of this a while back; should have a look at the latest version (v1.1) and check how it maps (or doesn’t) to PBRF types]

  • The PLACE Toolkit: exposing geospatial ready digital collections – [what value would there be, in our own collections, of adding time/location metadata to content to enable eg map/timeline exploration? (and therefore would it outweigh the cost?)]

  • COR(E)CID: Analysing the use of unique author identifiers in repositories via CORE to support the uptake of ORCID iDs – this gives repositories a dashboard to check how many ORCIDs are in their repository. [I wasn’t clear though on whether it’s available for public use or requires a sign-up. Further investigation shows the CORE Repository Dashboard does require registration and is specifically for repositories submitting data to CORE, which makes sense.]

  • International Image Interoperability Framework (IIIF) – [this is beyond my expertise but I want to check it’s on the radar of our non-research-output repository vendor]

  • If you digitise them they will come: creating a discoverable and accessible thesis collection – U of Tasmania made their theses open access retrospectively if at least 10 years old, with a disclaimer. [I’ve heard of a number of universities doing similarly; we’ve been more conservative, only making them available to staff and students unless we can secure permission. I’d like to push for the more open model.]

  • Strategies for increasing the amount of open access content in your repository – one tip they suggest is to set up a ScienceDirect email alert for ‘accepted manuscripts’ at your institution. When you get the email, download it immediately before it gets replaced by the ScienceDirect-branded ‘in press’ version. [This. Is. Genius.]

  • Enabling collaborative review with the DSpace configurable workflow – [I did some javascript hacking of the workflow, to sort the items by age and allow other sorting, but there’s very limited information still.] This poster shows improvements like displaying extra metadata fields (eg item type – author/publisher/year might be useful for us), adding statuses (eg questions for the researcher), and adding other notes. [This is Relevant To Our Interests.]

 

FOLIO

hosted by EBSCO; summary at Eventbrite. Disclaimers: there was a free lunch; I love open access; I’m appropriately suspicious of vendors and vapourware; and (I didn’t think this would be relevant before attending, but…) I like zebras.

FOLIO is “a community collaboration to develop an open source Library Services Platform (LSP) designed for innovation”.

Introduction from EBSCO
Community

  • Vendors – EBSCO, ByWater, SirsiDynix
  • ‘Open’ orgs – Koha, Index Data, Open Library Environment
  • Universities – Cornell, University of Sydney, Aberdeen, Glasgow, Newcastle, Università di Roma, National Széchényi

Platform

  • Will support ILS functions but broader – a ‘library services platform’ [à la Alma etc]
  • Each function as its own app – so can create completely new apps eg data mining, IR integration, learning management, research data, predictive analytics, grant management

Marketplace

  • Apps from around the world built by commercial vendors who may charge, and by libraries who probably won’t. Can buy professional services.

Introduction from Peter Murray (open source community advocate for Index Data)
“an open source Library Services Platform built to support ILS functions and to encourage community innovation”

LSP

  • a platform intended for people to build on – a healthy platform depends on how much people contribute to it, which depends on the platform making this easy
  • made up of services
  • geared towards libraries – patrons, bibliographic records, authority records

Goals

  • create community where libraries can come together to innovate
  • leverage open source to reduce the “free as in kittens” costs
  • improve products by involving libraries more in development
  • bring more choice to libraries – eg multiple circulation apps you can switch between if one doesn’t suit; replace the fines app with a demerits app

Technical stuff:

  • “APIs all the way down”; inspired by microservices so can interface with the core through standard HTTP/REST, JSON/XML, etc; cloud-ready: scalable, ready for deployment on cloud but not bound to a particular vendor. Building with AWS as reference but could be run on Azure, on private VMware, etc.
  • Middleware inspired by the API Gateway pattern. (Core Okapi [this is where the zebras come in: the okapi has zebra-like stripes, though it’s actually a relative of the giraffe] is mostly complete, developers starting to work on functionality.)
  • Multi-tenant capability built-in
  • Vert.x; RESTful style, JSON for data format; request/response pipelines eg first request routed to authentication module then sent to next module; Event Bus that can be exposed with various protocols (eg STOMP, AMQP)
  • Dynamic binding – dependencies are interfaces, not implementations – allows you to replace circ module with another one that respects the same interface
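The dynamic-binding idea above can be sketched in a few lines of TypeScript. This is purely illustrative – all the names are invented, not real FOLIO code – but it shows why depending on an interface rather than an implementation lets you swap one circulation module for another:

```typescript
// Hypothetical sketch of FOLIO-style dynamic binding (names invented):
// consumers depend on the *interface* a module provides, not on the
// concrete module, so any implementation respecting it can be swapped in.

interface CirculationService {
  checkout(userId: string, barcode: string): string;
}

// One possible circulation module...
class BasicCirculation implements CirculationService {
  checkout(userId: string, barcode: string): string {
    return `item ${barcode} checked out to ${userId}`;
  }
}

// ...and a drop-in replacement that respects the same interface
// (eg the "demerits instead of fines" idea mentioned earlier).
class DemeritsCirculation implements CirculationService {
  checkout(userId: string, barcode: string): string {
    return `item ${barcode} lent to ${userId} (demerits scheme)`;
  }
}

// The caller only knows about the interface, so either module works.
function runCheckout(circ: CirculationService): string {
  return circ.checkout("user-42", "BC-001");
}

console.log(runCheckout(new BasicCirculation()));
console.log(runCheckout(new DemeritsCirculation()));
```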

Modules

  • self-contained http services (programming-language agnostic) – small, fast, do one thing very well
  • Okapi gateway requirements – hooks for lifecycle management, strong REST/JSON preference (some libraries hosting hackathons with their comp.sci. department students)
  • might be grouped into applications (with dependencies) eg cataloguing, circulation

Client-side

  • Stripes – a user interface toolkit to let you quickly build the UIs you need to speak to the backend

Metadata – the FOLIO Codex

  • Takes concepts from FRBR (work, instance, holdings).
  • Format-agnostic (MARC, MODS, DC, whatever): core metadata “enough for other modules to understand”; native metadata “for apps that understand it” (eg the circ module needs a title but doesn’t care about all the MARC subfields or alternate titles, etc.)
  • Original format gets derived into FOLIO Codex (with work, instance, holdings) which gets used in modules. Current debate in the community about whether the original format should also be part of the codex.
  • Support multiple bib utilities and knowledge bases. Maintain list of local changes. Automated and semi-automated processes for updating local records with changes from source. “Cataloguing by reference”.
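A speculative TypeScript sketch of the codex derivation as I understood it – every name and field here is invented for illustration, not the actual FOLIO data model. The point is that the native record is kept as-is, while a small format-agnostic core is derived from it for other modules to use:

```typescript
// Speculative sketch of the FOLIO Codex idea (all field names invented):
// a source record in its native format gets distilled into a small,
// format-agnostic core (work / instance / holdings) that modules share.

interface CodexRecord {
  work: { title: string };         // FRBR-ish "work" level
  instance: { format: string };    // the concrete publication
  holdings: { barcode: string }[]; // copies the library holds
}

// A MARC-ish source record, kept as native metadata for apps that
// understand it in full.
const marcish = {
  "245a": "Dispatches from OR2017",
  "338": "volume",
  items: [{ barcode: "BC-001" }, { barcode: "BC-002" }],
};

// Derive the core: the circ module needs a title, not every subfield.
function deriveCodex(src: typeof marcish): CodexRecord {
  return {
    work: { title: src["245a"] },
    instance: { format: src["338"] },
    holdings: src.items.map((i) => ({ barcode: i.barcode })),
  };
}

console.log(deriveCodex(marcish).work.title);
```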

Progress
Timeline: Aug 2016 opened GitHub repositories; Sept 2016 Open Library Foundation created to hold IP but licensed Apache; phase 1 Aug 2016–2018 (availability of FOLIO apps to run a library (ILS)), followed by extended apps.
Project plan:

  • 2016 built gateway, sample app, UI toolkit, but also SIGs
  • Jan-Mar 2017 built circ, resource management, user&rights management, but also documenting
  • Apr-Jun 2017 acquisitions, system ops, knowledgebase, and onboarding dev teams
  • Jul-Dec 2017 apps marketplace and certification, discovery integration
  • 2018 vendor services and hosting, implementation, migration, data conversion, support

Websites

Community engagement
Lots happening on Slack channels, many meetups

Governance / lazy consensus
Open Library Foundation > Folio > Folio product council > SIGs > Development

OLF – 501(c)(3) (took a lot of time to get this status as had to prove EBSCO resources it but doesn’t control it) – mission to help libraries develop open stuff to support libraries, research, learning and teaching. Board inc Texas A&M, Duke, California Inst of Tech, EBSCO, JISC, CALIS (China).

Dev cycle:
SIGS >(Design process)> Design Teams >(Requirements process)> Analytics Teams >(Development process)> Dev Teams >(Review & feedback process)> SIGs

SIGs currently on topics like metadata management, resource access, user management, internationalisation

When OLE got libraries to map requirements, they got 6000+; went back and said we need to cut this down, and they came back with only 3000+. There are processes for the FOLIO project to identify which ones are needed by July 2018.

Dev team – anyone can join in (biweekly check-ins, open tools with wiki, forums, Slack, GitHub) but it takes time/effort to really engage and contribute

Lots of other companies build something then demo and ask for feedback – by which time it’s too late to provide really meaningful feedback. FOLIO is getting the feedback during/before the dev process.

Demo
This was on a working FOLIO instance. UI still very(!) sketchy but nav bar along the top with apps, eg users, items, scan. Demo’d searching/filtering users; switching to items and back and the search results still display; search for an item to copy barcode; switch to scan, lookup user, paste in barcode, click ‘checkout’ button, switch back to users and can see user now has book borrowed; switched to items and can see item now checked out.

[My current thoughts: this is clearly not production-ready at present, and even assuming everything stays on track for the rest of phase 1 I wouldn’t consider implementing it in 2018 – but I think it’s worth keeping an eye on. And the open nature of the development makes keeping an eye on its progress easy.

One risk I see in the architecture is that it’d be quite possible for every library to be running a different set of modules which may complicate community troubleshooting. This is by design and also a strength (so public libraries don’t get forced into an academic mode of thinking, or vice versa, or both get forced into some terrible compromise), and the requirement that everything be built around core APIs and data structures probably mitigates much of the mess it could otherwise turn into.

Relatedly, a proliferation of similar but subtly different modules which are each used by only a few libraries could also be a problem. At the moment for example in the user module, the data fields are fixed. If you wanted to add eg preferred language for communications, you’d have to create an entirely new module. But it sounds like there’ll be some work in future to allow a certain amount of customisation so you could still use the same basic module.

I also see a risk in the marketplace potentially getting full of pay-for modules. Hopefully it gets populated with enough free modules to start with to keep things on an even keel – or even tilted towards open as vendors find limited demand for a pay-for module when there are so many free competitors. I could see a freemium model develop… The fact that there are so many libraries and open-friendly organisations involved from the start is promising.]

Getting started with Angular UI development for Dspace #OR2017

by Tim Donohue and Art Lowel; session overview; Wiki with instructions for setup; demo site

[So my experience started off inauspiciously because I have an ancient version of MacOS so installing Node.js and yarn ran into issues with Homebrew and Xcode developer tools and I don’t know what else, but after five and a half hours I got it working. I then left all my browser and Terminal windows open for the next several days on the “if it ain’t broke” principle….]

Angular in DSpace

Angular 4.0 came out March 2017 – straight after the conference there’ll be a sprint to get that into DSpace. (Angular tutorial) How Angular works with DSpace:

  • user goes to website
  • server returns first page in pre-compiled html, and javascript
  • user requests data via REST
  • API (could be hosted on different server than Angular) returns JSON data

With Angular Universal (available in Angular 2, packaged in Angular 4), it can still work with a browser that doesn’t have javascript (search engine crawler, screen reader, etc). Essentially if the Angular app doesn’t load, your browser requests the page instead of the JSON, so the server will return the pre-compiled html again.
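That fallback decision can be boiled down to a toy TypeScript sketch – purely illustrative, not Angular Universal’s actual code: the same route serves pre-rendered HTML to clients that can’t run the app, and JSON to the booted client-side app.

```typescript
// Toy sketch of the server-side fallback described above: a client
// without JavaScript (crawler, screen reader) requests the page itself,
// while the running Angular app asks for JSON data explicitly.
function respondWith(acceptHeader: string): "html" | "json" {
  return acceptHeader.includes("application/json") ? "json" : "html";
}

console.log(respondWith("text/html"));        // crawler / no-JS browser
console.log(respondWith("application/json")); // booted Angular app
```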

Caches (API replies and objects) on client-side in your browser so very quick to return to previously seen pages.

Building/running Angular apps

  • node.js – server-side JS platform (can provide pre-compiled html)
  • npm – Node’s package manager (pulls in dependencies from registry)
  • yarn – third-party Node package manager (same config, faster)
  • TypeScript language – extension of ES6 (latest javascript – adds types instead of generic ‘var’) – gets compiled down by Angular to ES5 javascript before it gets sent to the browser
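As a trivial illustration of the typing TypeScript adds on top of plain JavaScript (my own example, not from the session):

```typescript
// Plain ES5 would happily accept add("2", 3) and concatenate strings;
// TypeScript's annotations catch the mismatch at compile time, then
// compile down to ordinary javascript for the browser.
function add(a: number, b: number): number {
  return a + b;
}

const total: number = add(2, 3); // fine
// add("2", 3); // compile error: string is not assignable to number
console.log(total);
```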

You write angular applications by

  • composing html templates with angularized markup – almost all html is valid; can load other components via their selector; components have their own templates
  • writing component classes to manage those templates – lets you create new html tags that come with their own code and styling; consist of view (template) and controller. Implements interfaces eg onInit; extends another component; has a <selector>; has a constructor defining inputs; has a template. Essentially a component has a class and a template.
  • adding app logic in services – retrieve data for components, or operations to add or modify data – created once, used globally by injecting into component
  • boxing component(s) and optionally service(s) in modules – useful for organising app into blocks of functionality – would use this for supporting 3rd-party DSpace extensions (however business logic would be dealt with in REST API not in the Angular UI)

DSpace-angular folder structure

  • config/
  • resources/ – static files eg i18n, images
  • src/app/ – each feature in its own subfolder
    • .ts – component class
    • .html – template
    • .scss – component style
    • .spec.ts – component specs/test
    • .module.ts – module definition
    • .service.ts – service
  • src/backend/ – mock REST data
  • src/platform/ – root modules for client/server
  • src/styles/ – global stylesheet
  • dist/ – compiled code

Hands-on

[Here we got into the first couple of steps from the Wiki/gitHub project linked from there.]

Session 1 #CAULRD2017 – reports from groups/uni implementations

The CAUL Research Repositories Day programme

COAR presentation by Kathleen Shearer

The scholarly communication field is skewed towards the northern hemisphere, etc – exacerbated by impact factors. Big publishers play the game to make sure we subscribe only to their journals. Open access has arrived – the transition is proving interesting in some cases as organisations try to live up to their open access goals. The Germans, Finns and Dutch have all negotiated hard with Elsevier. Germany’s Elsevier access got cut off – then Elsevier gave them a grace period because they were worried researchers wouldn’t care…

However, flipping to APCs doesn’t help the skew [it just means the skew is in who can publish rather than who can read] or long-term sustainability – so another option is to strengthen repositories. The MIT Future of Libraries report envisions libraries as an “open global platform”. This means a dual mission for repositories: to showcase and provide access (the past focus) but now also to be a node in a global knowledge commons.

Need to create a network supporting global nature of science. Aligning Repository Networks International Accord signed 8 May 2017. To support a distributed system – cross-regional harvesting, interoperability. COAR Controlled Vocabularies v.1.1 coming soon.

Looking at next generation repositories – how do we go beyond just including copies of full-text previously published by Big Publishers? Needs to have distribution of control, be inclusive, be for the public good, intelligently open. And need to think beyond articles: open data, [open code/methods], open notebook science, citizen science. Include services like metrics, comments, peer reviews (not in the repository but in the network layer on top), links between resources, notifications, global sign-on. Need to expose the content, not just the metadata.

“A vision in which institutions, universities and their libraries are the foundational nodes in a global scholarly communication system.”

Q on progress in different regions
A: Europe is probably farthest along; South America has a strong network but individual institutions often need support; the USA has lots of strong individual repositories but no national network, so it’s rather siloed.

[On breaking into group discussions, the New Zealand contingent discussed the possibility of formalising the currently-grassroots New Zealand Institutional Repositories Community to have more impact in this kind of initiative.]

Repository interoperability standards – Australasian Repository Working Group feedback by various

Metadata standards and vocabularies. NISO “Free_to_read” tag and “License_ref” tag (linking to license). Deakin and UNE are using these and Deakin has output both to Trove. rioxx is a metadata profile for the UK to provide guidelines (but there are still people in the UK without standard metadata).

Challenge is lack of national harvester feeding into international harvesters. Harvested by Trove and others but would be good to be linked to an international aggregator and then be able to get data back to enrich our repositories.

Currently we have a well-established community of practice but would be helpful to formalise this as a repository network. Elsewhere there are professional networks; institutional networks; technical service networks. They have governance, terms of reference, membership (either personal or institutional), funding streams (varying models including ad hoc on a project basis). Would allow a defined way to share info between annual events; strength in numbers when talking with government etc; better informed on global developments.

Asked if people present were interested in the group seeking membership of COAR and/or forming a formal group – show of hands showed support for both.

New institutional repository for Curtin by Janice Chan

Wanted sustainability, flexibility, interoperability. Used Atmire for installation and ongoing support (as local IT staff had competing priorities). Decided to delay integration with Elements to take advantage of Repository Tools 2 – so currently doing ad hoc bulk ingest. Don’t use collections for faculty – is in metadata instead so searchable as a facet.

Repository Skill Set Survey 2017 by Natasha Simons and Joanna Richardson

2011 survey was about what training/skills people wanted/needed; advertised via CAIRSS – a response per individual, not per institution. Findings written up as New Roles, New Responsibilities: Examining Training Needs of Repository Staff. Reporting and copyright were big themes; time and staffing were common challenges – lots of people only work part-time and there’s a specific skillset. The survey was quoted as inspiration for Bepress’s repository manager certification course. Will rerun in November to see how much things have changed, or not….

Research publications workflow at Deakin University by Michelle Watson

Deakin Research Online was created in 2008, using Fez/Fedora – half open access, half dark archive. Counted as the point of truth, and its data is fed into the Research Office. Mostly manual processes:

  • Faculty views each publication and adds HERDC classification
  • Library adds additional metadata, checks copyright/OA, publishes

Okay for 100 records a week. But late 2014 added Elements and backlog increased dramatically because:

  • more records coming in
  • no guidelines to prioritise material
  • no clear owner of workflow
  • same level of checking for all kinds

Researchers unhappy with delays and convoluted workflow process. So working on improvements based on assumptions that Faculty and library vetting is still needed – but non-reportable outputs don’t need the same level of analysis. Value of repository is to preserve research, make it discoverable, and make it openly accessible (to increase citation rates).

New “smart ingest” approach where no additional checking is needed if outputs meet certain criteria – run reports which faculty download and filter so they can confidently assign as C1. Want to work with Symplectic to automate this (eg using the API to add the C1 tag).

Elements is now the source of truth, and feeds data to Research Master and staff profiles. Have developed guidelines, procedures, and a revamped wiki space to provide guidance to researchers – especially for inducting new researchers, but also downloadable infographics to provide an at-a-glance overview.

UWA Research repository: how collaboration contributed to the development of a CRIS by Kate Croker

Looked into whether interdepartmental collaboration helped – discovered it was crucial. Interviewed key participants. Findings:

  • built/cemented relationships
  • collaboration improved the product – influenced a change in direction
  • facilitated better understanding of other sections’ work
  • changed views on issues for researchers and potential for research
  • collaborating early clarifies business requirements better
  • builds a shared vision – asked interviewees what was next and they all answered similarly