Tag Archives: institutional repositories

Open discussion on ORCID by Liz Krznarich

Identifier, registry, set of standard procedures for connecting researchers to affiliations and activities – to simplify reporting and analysis. What’s new?

collect and connect program to enable two-way syncing between ORCID apps
API v2.0 – simpler, faster, scalable
institutional sign-in via eduGAIN – NZ Tuakiri eduGAIN membership in progress
NZ ORCID Hub to launch this month – will enable inviting researchers to connect/create ORCID; collect and store; add authoritative affiliations – being made open source and will continue to develop the platform. Test site

What’s next?

Continued focus on automation, interoperability, getting more (trustworthy) data in. 2016 focused on works; 2017 on peer review, API, affiliations; 2018 on funding
online tools to manage membership, API
new training materials for researchers
printable record view
ID widget for personal websites
2-factor authentication option

Usage@Deakin by Bernadette Houghton

Tips for using Omeka

create your own theme plugin, but minimise use of other plugins – stick to those available on omeka.org
3rd-party tools – externally hosted (eg Timelines, Tag Clouds) vs locally hosted (eg pdf.js)

ISO 16363 for self-assessment of repositories by Bernadette Houghton

Deakin did a self-assessment using ISO 16363. Another tool would be fine – but review the criteria at the start and recognise its conceptual nature. You’ll want to prefer local knowledge over ISO suggested documentation. And be ready to allocate resources to address identified areas of improvement. [This is perhaps the most challenging part!]

Comparing Apples with Apples: A repository output health check by Julia Hickie

All Australian unis send theses to Trove; most send research outputs; some include cultural or course materials. Ran out of funding last year; 9 months later got new funding, but came back to a large backlog, with fresh eyes on the metadata issues.

Identifiers: these are proliferating – being implemented in repositories everywhere, but huge variations in what’s actually getting sent to Trove, eg ORCIDs almost invisible – less than 1% of records include an ORCID. [At Lincoln we’ve avoided putting these in our OAI feed to avoid cluttering up harvesters’ author listings as we haven’t found any best practice for what metadata field to include it in – may need to revisit this.] Grant IDs; DOIs (various forms have been recommended at different times by different people – CrossRef currently recommends https://doi.org/10.1234/asdfb and Trove prefers a url).

Standards: everyone sends some form of Dublin Core. Some use the NISO Access and License Indicators Recommended Practice (2015). Creative Commons licences – helps people find reusable material.

Standards checkup:

Output a link to your repository in the dc.identifier field
ORCID in dc.relation (as full url)
grant identifiers in dc.relation (as full url)
DOI in either dc.relation or dc.identifier (as full url)
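
A minimal sketch of what that checkup could look like in code, assuming identifiers arrive in the mixed forms people actually type (bare, `doi:` prefixed, old `dx.doi.org` links). The function names, the example handle and grant URLs, and the choice to put the DOI in dc.identifier rather than dc.relation are all illustrative assumptions, not anything Trove or a particular repository platform prescribes.

```python
import re

# Accepts bare DOIs, "doi:" prefixes, and http(s)://(dx.)doi.org/ links.
DOI_PATTERN = re.compile(
    r"^(?:https?://(?:dx\.)?doi\.org/|doi:)?\s*(10\.\d{4,9}/\S+)$", re.IGNORECASE
)
# Accepts a bare ORCID iD or one already in URL form.
ORCID_PATTERN = re.compile(
    r"^(?:https?://orcid\.org/)?(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$", re.IGNORECASE
)


def doi_url(value):
    """Return the DOI as a full https://doi.org/ URL, or None if it does not look like a DOI."""
    match = DOI_PATTERN.match(value.strip())
    return f"https://doi.org/{match.group(1)}" if match else None


def orcid_url(value):
    """Return the ORCID iD as a full https://orcid.org/ URL, or None if malformed."""
    match = ORCID_PATTERN.match(value.strip())
    return f"https://orcid.org/{match.group(1).upper()}" if match else None


def build_dc_fields(repository_url, doi=None, orcids=(), grant_urls=()):
    """Map identifiers onto Dublin Core fields, all as full URLs (hypothetical helper)."""
    fields = {"dc.identifier": [repository_url], "dc.relation": []}
    if doi and doi_url(doi):
        # Assumption: DOI goes in dc.identifier; dc.relation is the other option.
        fields["dc.identifier"].append(doi_url(doi))
    for orcid in orcids:
        url = orcid_url(orcid)
        if url:
            fields["dc.relation"].append(url)
    # Grant identifiers are assumed to be full URLs already.
    fields["dc.relation"].extend(grant_urls)
    return fields


fields = build_dc_fields(
    "https://repository.example.edu/handle/123/456",  # placeholder repository link
    doi="doi:10.1234/asdfb",
    orcids=["0000-0002-1825-0097"],
    grant_urls=["https://example.org/grant/ABC123"],  # placeholder grant URL
)
```

Anything that fails the pattern is silently dropped here; in practice you would probably want to log those records for someone to fix by hand.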

Also check:

how are you doing open access indicators? eg <free_to_read/>
creative commons licenses in full url form in dc.rights or ali.license_ref
rights statements in full url form in dc.rights or ali.license_ref
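
For illustration, here is a rough sketch of a record carrying both the Dublin Core rights field and the NISO access and license indicators, built with Python's standard `xml.etree.ElementTree`. The `build_record` helper and the example values are hypothetical, and emitting the `ali` elements side by side with Dublin Core inside one container is an assumption; check with your harvester what metadata format it actually expects.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
ALI_NS = "http://www.niso.org/schemas/ali/1.0/"

# Register prefixes so the serialised XML uses dc:, oai_dc: and ali:.
ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("ali", ALI_NS)


def build_record(title, repository_url, license_url, free_to_read=True):
    """Build a small record with rights and open access indicators (hypothetical helper)."""
    record = ET.Element(f"{{{OAI_DC_NS}}}dc")
    ET.SubElement(record, f"{{{DC_NS}}}title").text = title
    ET.SubElement(record, f"{{{DC_NS}}}identifier").text = repository_url
    # The notes say dc.rights OR ali.license_ref; this sketch outputs both.
    ET.SubElement(record, f"{{{DC_NS}}}rights").text = license_url
    if free_to_read:
        # Empty element: its presence is the open access indicator.
        ET.SubElement(record, f"{{{ALI_NS}}}free_to_read")
    ET.SubElement(record, f"{{{ALI_NS}}}license_ref").text = license_url
    return record


record = build_record(
    "Example thesis",
    "https://repository.example.edu/handle/123/456",  # placeholder repository link
    "https://creativecommons.org/licenses/by/4.0/",
)
xml_text = ET.tostring(record, encoding="unicode")
```

The point is only that the licence appears as a full URL in a machine-readable field, rather than as free text like "CC BY".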

Considering NTROs in a new repository infrastructure presented by Robin Burgess (lead investigator Marissa Cassin)

Non-traditional research outputs have been completely separate from the repository – they wanted to fix this. Types include original creative works (mostly visual arts, then musical, then textual, then others); live performances; recorded/rendered; exhibitions/events; research reports for an external body.

Researcher concerns around copyright and time, but also saw value. Keen on metrics (example of Altmetrics widget display in Primo brief record), can show relationships between researchers, provide context for output, etc

Need a clear interface that feeds into the reporting tool as well as the open access repository. Need a flexible metadata schema. Need ability to create metadata-only entries and to upload multiple large files. Also interested in linking to external profiles and pulling in various metrics.

Supporting peer review of creative works by Kate Sergeant and Avonne Newton

Creative works as the ‘problem child’ of repositories. Instead of getting the output itself you might get a photo plus a form and photocopy of a programme. This didn’t meet ERA standards for research statements, metadata, or evidence requirements. In 2015 responsibility moved to the library and they could make changes. New process:

tailored, tiered submission forms
metadata and evidence entered by researchers and processed by library staff (in Alma)
flows through to staff activity reports
NTRO working group assesses
library updates source data once review completed

This has enabled more timely and consistent reporting, providing a better foundation for ERA submission. Next steps:

continuous process improvement
Alma Digital migration – getting appropriate people access to dark archive content
evolving requirements

Session 1 #CAULRD2017 – reports from groups/uni implementations

The CAUL Research Repositories Day programme

COAR presentation by Kathleen Shearer

The scholarly communication field is skewed towards the northern hemisphere, etc – exacerbated by impact factors. Big publishers play the game to make sure we subscribe only to their journals. Open access has arrived – the transition is proving interesting in some cases as organisations try to live up to their open access goals. Germans, Finns and Dutch have all been strong negotiating with Elsevier. Germany’s Elsevier access got cut off – then Elsevier gave them a grace period because they were worried researchers wouldn’t care…

However, flipping to APCs doesn’t help the skew [just means the skew is at who can publish rather than at who can read] or long-term sustainability – so another option is to strengthen repositories. The MIT Future of Libraries report envisions libraries as an “open global platform”. This means a dual mission for repositories: to showcase and provide access (past focus) but now also to be a node in a global knowledge commons.

Need to create a network supporting global nature of science. Aligning Repository Networks International Accord signed 8 May 2017. To support a distributed system – cross-regional harvesting, interoperability. COAR Controlled Vocabularies v.1.1 coming soon.

Looking at next generation repositories – how do we go beyond just including copies of full-text previously published by Big Publishers? Needs to have distribution of control, be inclusive, be for the public good, intelligently open. And need to think beyond articles: open data, [open code/methods], open notebook science, citizen science. Include services like metrics, comments, peer reviews (not in the repository but in the network layer on top), links between resources, notifications, global sign-on. Need to expose the content, not just the metadata.

“A vision in which institutions, universities and their libraries are the foundational nodes in a global scholarly communication system.”

Q on progress in different regions
A: Europe probably farthest along; South America has a strong network but individual institutions often need support; USA has lots of strong individual repositories but no national network so rather siloed.

[On breaking into group discussions, the New Zealand contingent discussed the possibility of formalising the currently-grassroots New Zealand Institutional Repositories Community to have more impact in this kind of initiative.]

Repository interoperability standards – Australasian Repository Working Group feedback by various

Metadata standards and vocabularies. NISO “Free_to_read” tag and “License_ref” tag (linking to license). Deakin and UNE are using these and Deakin has output both to Trove. rioxx is a metadata profile for the UK to provide guidelines (but there are still people in the UK without standard metadata).

Challenge is lack of national harvester feeding into international harvesters. Harvested by Trove and others but would be good to be linked to an international aggregator and then be able to get data back to enrich our repositories.

Currently we have a well-established community of practice but would be helpful to formalise this as a repository network. Elsewhere there are professional networks; institutional networks; technical service networks. They have governance, terms of reference, membership (either personal or institutional), funding streams (varying models including ad hoc on a project basis). Would allow a defined way to share info between annual events; strength in numbers when talking with government etc; better informed on global developments.

Asked if people present were interested in the group seeking membership of COAR and/or forming a formal group – show of hands showed support for both.

New institutional repository for Curtin by Janice Chan

Wanted sustainability, flexibility, interoperability. Used Atmire for installation and ongoing support (as local IT staff had competing priorities). Decided to delay integration with Elements to take advantage of Repository Tools 2 – so currently doing ad hoc bulk ingest. Don’t use collections for faculty – is in metadata instead so searchable as a facet.

Repository Skill Set Survey 2017 by Natasha Simons and Joanna Richardson

2011 survey was about what training/skills people wanted/needed; advertised via CAIRSS – a response per individual, not per institution. Findings written up as “New Roles, New Responsibilities: Examining Training Needs of Repository Staff”. Reporting and copyright were big themes; time and staffing were common challenges – lots of people only work part-time and there’s a specific skillset. Survey quoted as inspiration for Bepress’s repository manager certification course. Will rerun in November to see how much things have changed, or not….

Research publications workflow at Deakin University by Michelle Watson

Deakin Research Online was created in 2008, using Fez/Fedora – half-half open access/dark archive. Counted as point of truth, and its data is fed into the Research Office. Mostly manual processes:

Faculty views each publication and adds HERDC classification
Library adds additional metadata, checks copyright/OA, publishes

Okay for 100 records a week. But late 2014 added Elements and backlog increased dramatically because:

more records coming in
no guidelines to prioritise material
no clear owner of workflow
same level of checking for all kinds

Researchers unhappy with delays and convoluted workflow process. So working on improvements based on assumptions that Faculty and library vetting is still needed – but non-reportable outputs don’t need the same level of analysis. Value of repository is to preserve research, make it discoverable, and make it openly accessible (to increase citation rates).

New “smart ingest” approach where no additional checking is needed if outputs meet certain criteria – run reports which faculty download and filter so they can confidently assign as C1. Want to work with Symplectic to automate this (eg using the API to add the C1 tag).
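
To make the idea concrete, here is a toy sketch of that kind of triage. The talk did not spell out Deakin's criteria, so the ones below (journal article, peer reviewed, has a DOI) are invented placeholders, as are the `Output` fields and function names; this is not the Elements API or Deakin's actual report.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Output:
    """A research output as it might appear in an exported report (hypothetical fields)."""
    title: str
    output_type: str
    peer_reviewed: bool
    doi: Optional[str] = None


def needs_manual_check(output):
    """True unless the output meets every (placeholder) smart-ingest criterion."""
    meets_criteria = (
        output.output_type == "journal-article"
        and output.peer_reviewed
        and bool(output.doi)
    )
    return not meets_criteria


def triage(outputs):
    """Split outputs into those that can skip extra checking and those that cannot."""
    auto, manual = [], []
    for output in outputs:
        (manual if needs_manual_check(output) else auto).append(output)
    return auto, manual


outputs = [
    Output("A refereed article", "journal-article", True, "10.1234/asdfb"),
    Output("A commissioned report", "report", False),
]
auto, manual = triage(outputs)
```

Only the `manual` pile would go through the full faculty and library vetting; the `auto` pile is the set faculty could confidently classify straight from the report.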

Elements is now the source of truth, and feeds data to Research Master and staff profiles. Have developed guidelines, procedures, revamped wiki space to provide guidance to researchers – especially for inducting new researchers but also downloadable infographics to provide an at-a-glance overview.

UWA Research repository: how collaboration contributed to the development of a CRIS by Kate Croker

Looked into whether interdepartmental collaboration helped – discovered it was crucial. Interviewed key participants. Findings:

built/cemented relationships
collaboration improved the product – influenced a change in direction
facilitated better understanding of other sections’ work
changed views on issues for researchers and potential for research
collaborating early clarifies business requirements better
builds a shared vision – asked interviewees what was next and they all answered similarly

Institutional repositories for data?

Via my Twitter feed:

University researcher sites lack of “institutional repositories” where data can be published as a reason more data isn’t online. #nethui

— Jonathan Brewer (@kiwibrew) July 8, 2013

(And discussion ensuing.)

I’m not an expert in data management. A year ago it was top of my list of Things That Are Clearly Very Important But Also Extremely Scary, Can Someone Else Please Handle It? But then I got a cool job which includes (among other things) investigating what this data management stuff is all about, so I set about investigating.

Sometime in the last half year I dropped the assumption that we needed to be working towards an institutional data repository. In fact, I now believe we need to be working away from that idea. Instead, I think we should be encouraging researchers to deposit their datasets in the discipline-specific (or generalist) data repositories that already exist.

I have a number of reasons for this:

My colleague and I, with a certain amount of outsourcing, already have to run a catalogue, the whole rickety edifice of databases and federated searching and link resolving and proxy authentication, library website and social media account, institutional repository, community archive, open journal system, etc etc. Do we look like we need another system to maintain?
An institutional archive is ~~great~~ kind of serviceable for pdfs. But datasets come in xls, csv, txt, doc, html, xml, mp3, mp4, and a thousand more formats, no, I’m not exaggerating. They can be maps, interviews, 3D models, spectral images, anything. They can be a few kilobytes or a few petabytes. Yeah, you can throw this stuff into DSpace, but that doesn’t mean you should. That’s like throwing your textbooks, volumes of abstracts, Kindles, Betamax, newspapers, murals, jigsaw puzzles, mustard seeds, and Broadway musicals (not a recording, the actual theatre performance) onto a single shelf in a locked glass display cabinet and making people browse by the spine labels.
If you want a system that can do justice to the variety of datasets out there, you’d better have the resources of UC3 or DCC or Australia or PRISM. Because you’re either going to have to build it or you’re going to have to pay someone to build it, and then you’re going to have to maintain it. And you’re going to have to pay for storage and you’re going to have to run checksums for data integrity and you’re going to have to think about migrating the datasets as time marches on and people forget what the current shiny formats are. And you’re going to have to wonder if and how Google Scholar indexes it (and hope Google Scholar lasts longer than Google Reader did) or no-one will ever find it. And a whole lot more else.
If anything’s in it. Do you know how hard it is to get researchers to put their conference papers into institutional repositories? My own brother flatly refuses. He points out that his papers are already available via his discipline’s open access repository. That’s where people in his discipline will look for it. It’s indexed by Google. Why put it anywhere else? I conceded the point for the sake of our family dinner, and I haven’t brought it up again because on reflection he’s right. (He’s ten years younger than me; he has no business being right, dammit.) And because it’s hard enough to get researchers to put their conference papers into institutional repositories even when their copy is the only one in existence.
Do you know how hard it is to convince most researchers that they should put their datasets anywhere online other than a private Dropbox account? (Shameless plug: Last week another colleague and I did a talk responding to 8 ‘myths’ or reasons why many researchers hesitate – slides and semi-transcript here. That’s summarised from a list we made of 23 reasons, and other people have come up with more objections and necessary counters.) The lack of an institutional repository for data doesn’t even rate.

No, forget creating institutional data repositories. What we need to be doing is getting familiar with the discipline data repositories and data practices that already exist, so when we talk to a researcher we can say “Look at what other researchers in your discipline are doing!”

This makes it way easier to prove that this data publishing thing isn’t just for Those Other Disciplines, and that there are ways for them to deal with [confidentiality|IP issues|credibility|credit]. And it makes sure the dataset is where other researchers in that discipline are searching for it. And it makes sure the datasets are deposited according to that discipline’s standards and that discipline’s needs, not according to the standards and needs of whoever was foremost in mind of the developer who created the generic institutional data repository – so the search interface will be more likely to work reasonably for that discipline. And it means the types of data will be at least a little more homogeneous (in some cases a lot more) so there’s more potential for someone to do cool stuff with linked open data.

And it means we can focus on what we do best, which is helping people find and search and understand and use and cite and publish these resources. Trust me, there is plenty more to do in data management than just setting up an institutional data repository.

Deborah Fitchett
