Session 1 #CAULRD2017 – reports from groups/uni implementations

The CAUL Research Repositories Day programme

COAR presentation by Kathleen Shearer

The scholarly communication field is skewed towards the northern hemisphere, etc – exacerbated by impact factors. Big publishers play the game to make sure we subscribe only to their journals. Open access has arrived – the transition is proving interesting in some cases as organisations try to live up to their open access goals. Germans, Finns and Dutch have all been negotiating strongly with Elsevier. Germany’s Elsevier access got cut off – then Elsevier gave them a grace period because they were worried researchers wouldn’t care…

However, flipping to APCs doesn’t help the skew [it just means the skew is at who can publish rather than at who can read] or long-term sustainability – so another option is to strengthen repositories. The MIT Future of Libraries report envisions libraries as an “open global platform”. This means a dual mission for repositories: to showcase and provide access (past focus) but now also to act as a node in a global knowledge commons.

Need to create a network supporting global nature of science. Aligning Repository Networks International Accord signed 8 May 2017. To support a distributed system – cross-regional harvesting, interoperability. COAR Controlled Vocabularies v.1.1 coming soon.

Looking at next generation repositories – how do we go beyond just including copies of full-text previously published by Big Publishers? Needs to have distribution of control, be inclusive, be for the public good, intelligently open. And need to think beyond articles: open data, [open code/methods], open notebook science, citizen science. Include services like metrics, comments, peer reviews (not in the repository but in the network layer on top), links between resources, notifications, global sign-on. Need to expose the content, not just the metadata.

“A vision in which institutions, universities and their libraries are the foundational nodes in a global scholarly communication system.”

Q on progress in different regions
A: Europe probably farthest along; South America has a strong network but individual institutions often need support; USA has lots of strong individual repositories but no national network so rather siloed.

[On breaking into group discussions, the New Zealand contingent discussed the possibility of formalising the currently-grassroots New Zealand Institutional Repositories Community to have more impact in this kind of initiative.]

Repository interoperability standards – Australasian Repository Working Group feedback by various

Metadata standards and vocabularies. NISO “Free_to_read” tag and “License_ref” tag (linking to license). Deakin and UNE are using these and Deakin has output both to Trove. RIOXX is a metadata profile for the UK that provides guidelines (but there are still people in the UK without standard metadata).
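[A rough sketch of what those two tags might look like when generated in Python. The element names and namespace follow my understanding of the NISO Access and License Indicators recommendation, where the tags are lowercase (free_to_read, license_ref); the "record" wrapper element, the function name, the licence URL and the date are all made up for illustration and aren't from the session.]

```python
import xml.etree.ElementTree as ET

# Namespace of the NISO Access and License Indicators (ALI) recommendation
# (assumed here; check the published recommendation before relying on it).
ALI_NS = "http://www.niso.org/schemas/ali/1.0/"
ET.register_namespace("ali", ALI_NS)


def build_access_metadata(license_url, start_date):
    """Return a wrapper element carrying free_to_read and license_ref."""
    record = ET.Element("record")  # hypothetical wrapper, not part of ALI
    # An empty free_to_read element flags the item as free to read.
    ET.SubElement(record, f"{{{ALI_NS}}}free_to_read")
    # license_ref holds the licence URI; start_date says when it applies.
    license_ref = ET.SubElement(
        record, f"{{{ALI_NS}}}license_ref", {"start_date": start_date}
    )
    license_ref.text = license_url
    return record


record = build_access_metadata(
    "https://creativecommons.org/licenses/by/4.0/", "2017-08-01"
)
print(ET.tostring(record, encoding="unicode"))
```

[A real repository would emit these inside whatever metadata format it already exposes, rather than a standalone wrapper like this one.]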

Challenge is lack of national harvester feeding into international harvesters. Harvested by Trove and others but would be good to be linked to an international aggregator and then be able to get data back to enrich our repositories.
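[For anyone unfamiliar with harvesting: the session didn't name a protocol, but assuming it happens over OAI-PMH (the usual choice for repositories), a harvester's request is just a URL with a few parameters. A minimal sketch follows; the base URL is a placeholder and the function name is my own.]

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Only ask for records added or changed since this date.
        params["from"] = from_date
    if set_spec:
        # Restrict the harvest to one set (e.g. a collection).
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint; substitute a real repository's OAI base URL.
url = list_records_url(
    "https://repository.example.edu/oai/request", from_date="2017-01-01"
)
print(url)
```

[A national or international aggregator would run requests like this against every member repository, which is why consistent metadata matters so much.]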

Currently we have a well-established community of practice but would be helpful to formalise this as a repository network. Elsewhere there are professional networks; institutional networks; technical service networks. They have governance, terms of reference, membership (either personal or institutional), funding streams (varying models including ad hoc on a project basis). Would allow a defined way to share info between annual events; strength in numbers when talking with government etc; better informed on global developments.

Asked if people present were interested in the group seeking membership of COAR and/or forming a formal group – show of hands showed support for both.

New institutional repository for Curtin by Janice Chan

Wanted sustainability, flexibility, interoperability. Used Atmire for installation and ongoing support (as local IT staff had competing priorities). Decided to delay integration with Elements to take advantage of Repository Tools 2 – so currently doing ad hoc bulk ingest. Don’t use collections for faculty – faculty is recorded in metadata instead, so it is searchable as a facet.

Repository Skill Set Survey 2017 by Natasha Simons and Joanna Richardson

2011 survey was about what training/skills people wanted/needed; advertised via CAIRSS – a response per individual, not per institution. Findings written up as New Roles, New Responsibilities: Examining Training Needs of Repository Staff. Reporting and copyright were big themes; time and staffing were common challenges – lots of people only work part-time and there’s a specific skillset. Survey quoted as inspiration for Bepress’s repository manager certification course. Will rerun in November to see how much things have changed, or not….

Research publications workflow at Deakin University by Michelle Watson

Deakin Research Online was created in 2008, using Fez/Fedora – half-half open access/dark archive. Counted as point of truth, and its data is fed into the Research Office. Mostly manual processes:

  • Faculty views each publication and adds HERDC classification
  • Library adds additional metadata, checks copyright/OA, publishes

Okay for 100 records a week. But late 2014 added Elements and backlog increased dramatically because:

  • more records coming in
  • no guidelines to prioritise material
  • no clear owner of workflow
  • same level of checking for all kinds

Researchers unhappy with delays and convoluted workflow process. So working on improvements based on assumptions that Faculty and library vetting is still needed – but non-reportable outputs don’t need the same level of analysis. Value of repository is to preserve research, make it discoverable, and make it openly accessible (to increase citation rates).

New “smart ingest” approach where no additional checking is needed if outputs meet certain criteria – run reports which faculty download and filter so they can confidently assign as C1. Want to work with Symplectic to automate this (eg using the API to add the C1 tag).
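[The talk didn't spell out the actual criteria, so the following is only a sketch of the general idea, with invented criteria (journal article, peer reviewed, has a DOI) and invented field names: outputs that meet every criterion skip manual checking, and the rest go to the usual vetting queue.]

```python
from dataclasses import dataclass


@dataclass
class Output:
    """A research output as it might arrive from the publications system."""
    title: str
    output_type: str   # e.g. "journal article"
    peer_reviewed: bool
    has_doi: bool


def needs_manual_check(output):
    """Return True if the output fails any (hypothetical) smart-ingest criterion."""
    meets_criteria = (
        output.output_type == "journal article"
        and output.peer_reviewed
        and output.has_doi
    )
    return not meets_criteria


def split_for_ingest(outputs):
    """Split outputs into (auto-ingest, manual-check) lists, preserving order."""
    auto, manual = [], []
    for output in outputs:
        (manual if needs_manual_check(output) else auto).append(output)
    return auto, manual
```

[Calling split_for_ingest on a week's records would give faculty a short list to vet and a longer list that can be assigned as C1 with confidence, assuming the criteria are ones everyone trusts.]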

Elements is now the source of truth, and feeds data to Research Master and staff profiles. Have developed guidelines, procedures, revamped wiki space to provide guidance to researchers – especially for inducting new researchers but also downloadable infographics to provide an at-a-glance overview.

UWA Research repository: how collaboration contributed to the development of a CRIS by Kate Croker

Looked into whether interdepartmental collaboration helped – discovered it was crucial. Interviewed key participants. Findings:

  • built/cemented relationships
  • collaboration improved the product – influenced a change in direction
  • facilitated better understanding of other sections’ work
  • changed views on issues for researchers and potential for research
  • collaborating early clarifies business requirements better
  • builds a shared vision – asked interviewees what was next and they all answered similarly

Institutional repositories for data?

Via my Twitter feed:

University researcher sites lack of “institutional repositories” where data can be published as a reason more data isn’t online. #nethui

— Jonathan Brewer (@kiwibrew) July 8, 2013

(And discussion ensuing.)

I’m not an expert in data management. A year ago it was top of my list of Things That Are Clearly Very Important But Also Extremely Scary, Can Someone Else Please Handle It? But then I got a cool job which includes (among other things) investigating what this data management stuff is all about, so I set about investigating.

Sometime in the last half year I dropped the assumption that we needed to be working towards an institutional data repository. In fact, I now believe we need to be working away from that idea. Instead, I think we should be encouraging researchers to deposit their datasets in the discipline-specific (or generalist) data repositories that already exist.

I have a number of reasons for this:

  • My colleague and I, with a certain amount of outsourcing, already have to run a catalogue, the whole rickety edifice of databases and federated searching and link resolving and proxy authentication, library website and social media account, institutional repository, community archive, open journal system, etc etc. Do we look like we need another system to maintain?
  • An institutional archive is kind of serviceable for pdfs. But datasets come in xls, csv, txt, doc, html, xml, mp3, mp4, and a thousand more formats, no, I’m not exaggerating. They can be maps, interviews, 3D models, spectral images, anything. They can be a few kilobytes or a few petabytes. Yeah, you can throw this stuff into DSpace, but that doesn’t mean you should. That’s like throwing your textbooks, volumes of abstracts, Kindles, Betamax, newspapers, murals, jigsaw puzzles, mustard seeds, and Broadway musicals (not a recording, the actual theatre performance) onto a single shelf in a locked glass display cabinet and making people browse by the spine labels.
  • If you want a system that can do justice to the variety of datasets out there, you’d better have the resources of UC3 or DCC or Australia or PRISM. Because you’re either going to have to build it or you’re going to have to pay someone to build it, and then you’re going to have to maintain it. And you’re going to have to pay for storage and you’re going to have to run checksums for data integrity and you’re going to have to think about migrating the datasets as time marches on and people forget what the current shiny formats are. And you’re going to have to wonder if and how Google Scholar indexes it (and hope Google Scholar lasts longer than Google Reader did) or no-one will ever find it. And a whole lot more besides.
  • If anything’s in it. Do you know how hard it is to get researchers to put their conference papers into institutional repositories? My own brother flatly refuses. He points out that his papers are already available via his discipline’s open access repository. That’s where people in his discipline will look for it. It’s indexed by Google. Why put it anywhere else? I conceded the point for the sake of our family dinner, and I haven’t brought it up again because on reflection he’s right. (He’s ten years younger than me; he has no business being right, dammit.) And because it’s hard enough to get researchers to put their conference papers into institutional repositories even when their copy is the only one in existence.
  • Do you know how hard it is to convince most researchers that they should put their datasets anywhere online other than a private Dropbox account? (Shameless plug: Last week another colleague and I did a talk responding to 8 ‘myths’ or reasons why many researchers hesitate – slides and semi-transcript here. That’s summarised from a list we made of 23 reasons, and other people have come up with more objections and necessary counters.) The lack of an institutional repository for data doesn’t even rate.
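To give a feel for just one of the chores in that list, here is roughly what “run checksums for data integrity” means in practice. This is my own minimal sketch, not how any particular repository does it: the function names and the idea of a manifest mapping file paths to expected SHA-256 digests are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large datasets don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_changed(manifest):
    """Return the paths in manifest (path -> expected digest) that are missing or altered."""
    return [
        path
        for path, expected in manifest.items()
        if not Path(path).exists() or sha256_of(path) != expected
    ]
```

You’d record a digest for every file at deposit, then rerun find_changed on a schedule, forever, and have a plan for what to do when it returns something. And that’s the easy part.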

No, forget creating institutional data repositories. What we need to be doing is getting familiar with the discipline data repositories and data practices that already exist, so when we talk to a researcher we can say “Look at what other researchers in your discipline are doing!”

This makes it way easier to prove that this data publishing thing isn’t just for Those Other Disciplines, and that there are ways for them to deal with [confidentiality|IP issues|credibility|credit]. And it makes sure the dataset is where other researchers in that discipline are searching for it. And it makes sure the datasets are deposited according to that discipline’s standards and that discipline’s needs, not according to the standards and needs of whoever was foremost in the mind of the developer who created the generic institutional data repository – so the search interface will be more likely to work reasonably for that discipline. And it means the types of data will be at least a little more homogeneous (in some cases a lot more) so there’s more potential for someone to do cool stuff with linked open data.

And it means we can focus on what we do best, which is helping people find and search and understand and use and cite and publish these resources. Trust me, there is plenty more to do in data management than just setting up an institutional data repository.