Scholarly workflows #or2017


Supporting Tools in Institutional Repositories as Part of the Research Data Lifecycle by Malcolm Wolski, Joanna Richardson

Have been working on research data management in context of the whole research data lifecycle. Started asking question: once research data management is under control, what will be the next focus? Their answer was research tools. Produced two journal articles:

  • Wolski, M., Howard, L., & Richardson, J. (2017). The importance of tools in the data lifecycle. Digital Library Perspectives, 33(3), in press
  • Wolski, M., Howard, L., & Richardson, J. (2017). A trust framework for online research data services. Publications, 5(2), article 14

Research life cycle: Data creation and deposit (plan and design, collect and capture) -> Managing active data (Collect and capture, collaborate and analyse) -> Data repositories and archives (manage, store, preserve; share and publish) -> Data catalogues and registries

Research data repositories vary a lot. Collection or ecosystem? Open or closed? End point or part of workflow? Why is it hard to build them? Push-and-pull between re-usability and preservation:

  • technical aspects
  • interoperability
  • legal/regulatory/ethical constraints
  • one-off activity or continuous
  • diversity of accessibility issues
  • diversity of re-usability issues

The average number of research tools used was 22 per person (ranging from Word, ResearchGate and email through SurveyMonkey, Dropbox and Figshare to R and really specialised tools). Kramer and Bosman (2016) divided tools into assessment, outreach, publication, writing, analysis, discovery and preparation phases. Tools are exploding as research activity scales up and collaboration increases. Large-capacity projects are being funded. Data science courses are upskilling researchers.

Researchers use lots of tools as part of the data workflow. The institution may manage the data, but has no ownership of the workflow. Since data has to move seamlessly between tools, interoperability is key – but how do we build these interoperable workflows and infrastructures?

Need to remember repository is only part of the research ecosystem. Need to take an institutional approach – or approaches rather than a single design solution. Look at main workflows and tools used – check out research communities who may already have the solutions – focus must be meeting the researchers’ needs.

Q: Will we see researchers use fewer tools as disciplinary workflows develop? A: Probably not, but we will see more integration between them, eg Qualtrics adding an R connector.

Research Offices As Vital Factors In The Implementation Of Research Data Management Strategies by Reingis Hauck

Have a full-text repository on DSpace, building data repository on CKAN. What if we build something (at great expense) and they don’t come? We need cultural change. Eg UK seems far ahead but only 16% of respondents are accessing university RDM support services in 2016.

They have a data repository, and provide a support service via the research office, library and IT services.

The research office provides support in grant writing; advocates on policies; helps with internal research funding; reports to senior leadership. Their toolkit:

  • need to win research managers over – explain how important it is
  • embedded an RDM-expert
  • upskilled research office staff about data management planning and how to make a case for data management.

Look out for game changers:

  • eg large collaborative research projects – produce lots of data and need to share it to be successful so more likely to listen
  • DMP review as standard procedure for proposal review, plus training on proposal writing. (Want data management planning to be like brushing your teeth: you do it every day and if you forget you can’t sleep.)
  • adapt incentives – eg internal funding for early career researchers requires data management plans
  • use existing networks – researchers go to lots of boards and meetings already so feed this as a topic like any other topic
  • engage with members of the DFG [German research foundation] review board – to get them to draw up criteria to reward researchers doing it

Cultural change towards open science can be supported by your research office. Let’s team up more!

Towards Researcher Participation in Research Information Management Systems by Dong Joon Lee, Besiki Stvilia, Shuheng Wu

RIMS – include ResearchGate, Academia, Google Scholar; ORCID, ImpactStory; PURE, Elements

ResearchGate sends out a flood of emails – good for some, a put-off for others. How can we improve our RIMS to improve researcher engagement?

Interviewed 15 researchers then expanded to survey 412 participants; also analysed metadata on 126 ResearchGate profiles of participants. Preliminary findings:

  • Variety of different researcher activities in RIMS eg write manuscripts, interact with peers, curate, evaluate, look for jobs, monitor literature, identify collaborators, disseminate research, find relevant literature.
  • Different levels of participation: readers may have a profile but don’t maintain it or interact with people; record managers maintain their profile, but don’t interact with others; community members maintain profiles but also interact with others etc.
  • Different motivations to maintain profile: to share scholarship (most popular); improve status, enjoyment, support evaluation, quality of recommendations, external pressure (least popular)
  • Different use of metadata categories: people tend to use the person, publication, and research subject categories. Maybe research experience, but rarely education, awards, teaching experience, or other.
    • In Person, most people put in first name, last name, affiliation, dept;
    • In Publication, most use most of the fields – except only 30% of readers share the file, versus about 80% of record managers and community members.

Want to develop design recommendations to enable RIMS to increase participation.

Research and non-publications repositories, Open Science #or2017


OpenAIRE-Connect: Open Science as a Service for repositories and research communities by Paolo Manghi, Pedro Principe, Anthony Ross-Hellauer, Natalia Manola

Project 2017-19 with 11 partners (technical, research communities, content providers) to extend technological services and networking bridges – creating open science services and building communities. Want to support reuse/reproducibility and transparent evaluation around research literature and research data, during the scientific process and in publishing artefacts and packages of artefacts.

Barriers – repositories lack support (eg integration, links between repositories). OpenAIRE want to facilitate new vision so providing “Open Science as a Service” – research community dashboard with variety of functions and catch-all broker service.

RDM skills training at the University of Oslo by Elin Stangeland

Researchers using random storage solutions and don’t really know what they’re doing. Need to improve their skills. Have been setting up training for various groups in organisation. Software Carpentry for young researchers to make their work more productive and reliable. 2-day workshops which are discipline-specific and well-attended. Now running their own instructor training which allows expanding service. Author carpentry, data carpentry, etc.

Training for research support staff, who are the first port of call on data management plans, data protection, and basic data management. The Dept of GeoSciences recently made attending DMP training mandatory.

Expanding library carpentry to national level.

IIIF Community Activities and Open Invitation by Sheila Rabun

Global community that develops shared APIs for web-based image delivery and implements them in software, to expose interoperable image content.

Many image repositories are effectively silos. The IIIF APIs provide a layer that lets servers talk to each other, allowing easier management and better functionality for end-users. Lots of image servers and clients are around now, so you can mix-and-match your front- and back-ends. Can have deep zoom, compare images and more.
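The interoperability is very concrete: the IIIF Image API addresses every image through one templated URL — {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format} (Image API 2.x) — which is why any compliant client can talk to any compliant server. A small sketch of building such URLs; the base URL and identifier below are invented examples:

```typescript
// Build an IIIF Image API 2.x request URL from its template:
//   {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
// The base URL and identifier used below are made-up examples.
function iiifImageUrl(
  base: string,                 // e.g. "https://example.org/iiif"
  identifier: string,           // image identifier (URL-encoded)
  region: string = "full",      // "full", "x,y,w,h", or "pct:x,y,w,h"
  size: string = "full",        // "full", "w,", ",h", "pct:n", or "w,h"
  rotation: string = "0",       // degrees clockwise; "!n" mirrors first
  quality: string = "default",  // "default", "color", "gray", "bitonal"
  format: string = "jpg"
): string {
  return `${base}/${encodeURIComponent(identifier)}/${region}/${size}/${rotation}/${quality}.${format}`;
}

// Deep zoom tiles, thumbnails and detail views are just parameter choices:
const thumbnail = iiifImageUrl("https://example.org/iiif", "page-1", "full", "150,");
// a 150px-wide thumbnail: .../page-1/full/150,/0/default.jpg
const detail = iiifImageUrl("https://example.org/iiif", "page-1", "pct:10,10,30,30");
```

Because the template is fixed by the spec, the mix-and-match of front- and back-ends mentioned above needs no per-vendor glue code.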

Everything created by global community so always looking for more participants. Community groups, technical specification groups eg extending to AV resources, discovery, text granularity (in text annotations). Also a consortium to provide leadership and communication channels.

Data Management and Archival Needs of the Patagonian Right Whale Program Data by Harish Maringanti, Daureen Nesdill, Victoria Rowntree

Importance of curating legacy datasets. World’s longest continuous study of large whale species: 47 years and counting of data. Two problems:

  • to identify whales – found the callosities of right whales were unique (number, position, shape) and pattern remained same despite slight change over time. So can take aerial photos when they surface. Data analysed with complicated computer system and compared with existing photos.
  • to gather data over a period of time – where to find whales regularly. Discovered whales gather in three places: 1) mothers and calves; 2) males and females; 3) a free-for-all.

The collection has tens of thousands of b&w negatives; color slides; analysis notebooks; field notes; Access 1996 database records; sightings maps.

Challenges: heterogeneity of data; metadata – including how much can be displayed publicly; outdated databases.

Why should libraries care? We can provide continuity beyond life of individual researchers. Legacy data is as important as current data in biodiversity type fields and generally isn’t digitised yet.

Repository driven by the data journal: real practices from China Scientific Data by Lili Zhang, Jianhui Li, Yanfei Hou

China Scientific Data is a multidisciplinary journal publishing data papers – raw data and derived datasets. Submission (of paper and dataset), review (paper review and curation check), peer review, editorial voting.

How to publish:

  • Massive data? – on-demand sample data publication: can’t publish the whole set, so publish a sample (typical, minimum-sized) to announce the dataset’s existence
  • Complex data? – publish data and supplementary materials together eg background info, software, vocabulary, code, etc. Eg selected font collections for minority languages
  • Dynamic data? – eg when updating with new data using same methodology and data quality control. Could publish as new paper but it’s duplicative so published instead as another version with same DOI. Can be good for your citations!

Encourage authors to store data in their repository so its long-term availability is more reliable.

RDM and the IR: Don’t Reuse and Recycle – Reimplement by Dermot Frost, Rebecca Grant

We all have IRs and they’re designed for PDF publications. Research Data Management is largely driven by funder mandates; some disciplines are very good at it, some less so (eg historians claiming “I have no data” – having just finished a large project including land ownership registries from 17th century, georectified etc!)

FAIR (findable, accessible, interoperable, reusable) data concept (primarily machine-oriented ie findable by machines). IRs can’t do this well enough. Technically uploading a zip file is FAIR but time-costly to user.

Instead should find a domain-specific repository (and read the terms and conditions carefully especially around preservation!) Or implement your own institutional data repository (but different scale of data storage can take serious engineering efforts). Follow the Research Data Alliance.

Developing a university wide integrated Data Management Planning system by Rebecca Deuble, Andrew Janke, Helen Morgan, Nigel Ward

Need to help researchers across the lifecycle. U of Queensland identified an opportunity to support researchers around funding/journal requirements. They used DMPonline but saw poor uptake due to lack of mandate. The UQ Research Data Manager system:

  • Record developed by the researcher – an active record (not a plan, though it includes project info) which can change over the course of the project. Simple dynamic form, tailored to researchers, with guidance for each field.
  • Storage auto-allocated by storage providers for working data – researchers are given a mapped drive accessible by national collaborators (hopefully international soon) using a code provided on completing the form.
  • [Working on this part] Publish and manage selected data to managed collection (UQ eSpace). Currently manual process filling in form with metadata fields in eSpace. Potential to transfer metadata from RDM system to eSpace.
  • Developing procedures to support the system.

Benefits include uni oversight of research in progress; researcher-centric design; improved impact/citation; public access to data.

Preserving and reusing high-energy-physics data analyses by Sünje Dallmeier-Tiessen, Robin Lynnette Dasler, Pamfilos Fokianos, Jiří Kunčar, Artemis Lavasa, Annemarie Mattmann, Diego Rodríguez Rodríguez, Tibor Šimko, Anna Trzcinska, Ioannis Tsanaktsidis

Data is very valuable – data is still being published even 15 years after funding stopped, and although they are always building new and bigger colliders, data remains relevant even decades after it was collected.

Projects involve 3000 people, including a high turnover of young researchers. CERN needs to capture everything needed to understand and rerun an analysis years later – data, software, environment, workflow, context, documentation.

  • Invenio (JSON schema with lots of domain-specific fields) to describe analysis
  • Capture and preserve analysis elements
  • Reusing – need to reinstantiate the environment and execute the analysis on the cloud.

REANA = REusable ANAlyses supports collaboration and multiple scenarios.

Perverse incentives and the reward structures of academia #or2017

Perverse incentives: how the reward structures of academia are getting in the way of scholarly communication and good science
by Sir Timothy Gowers


The internet has been widely used for the last 20 years and has revolutionized many aspects of our lives. It has been particularly useful for academics, allowing them to interact and exchange ideas far more rapidly and conveniently than they could in the past. However, much of the way that science proceeds has been affected far less by this development than one might have expected, and the basic method of communication of ideas — the journal article — is not much different from how it was in the seventeenth century.

It is easy to imagine new and better methods of dissemination, so what is stopping them from changing the way scientists communicate? Why has the journal system proved to be far more robust than, say, the music industry, in the face of the new methods of sharing information?

The dream is that all information is available on tap, accessible through a few clicks. We’ve got a bit of that via Google, Wikipedia, YouTube, map sites, travel sites, news sites. But a lot of content is for subscribers only: you can find content in Google Books – but only a page or so before being cut off due to copyright.

Of course copyright holders need incentives to create, to cover costs, etc. Academics aren’t directly paid though – actually the barriers to content are the bigger problem. Covering costs? maybe “if you insist on antiquated methods of publication”.

What could we share? The not-yet-complete idea, that others can build on. OTOH if everyone shared everything they thought it could end up a complete mess. But we can make order from chaos.

In maths the revolution has started: his library’s transition from painfully closed stacks to open stacks coincided unfortunately with not even needing to go to the library for content anyway. Wikipedia for basic concepts; arXiv for preprints (don’t need journals at all in maths); OEIS (database of sequences of whole numbers along with formulae for generating them); MathOverflow for questions at the research level – you usually get a useful answer within a few hours.

The traditional way of doing things is the “lone genius” model. But he thought it’d be interesting to solve a problem in public, so he posted some initial thoughts on his blog and invited contributions. Traditionally there’s a fear of getting scooped – but doing it completely in the open, timestamps mean no-one can take credit; in fact it rewards putting your comment up quickly before someone else does. The problem was solved in just 6 weeks.

Perverse incentives in maths:

  • personal ambition
  • reward for being first (not for being inspiration, or for being second but with a better solution)
  • primacy of journal article while expository and enabling activities are downplayed – when you start writing textbooks instead of journal articles this is seen as your career slowing down
  • little recognition for incomplete ideas

These are obstacles to efficiency.

Paradox of paywalls: mathematicians write, peer review, edit; dissemination costs almost nothing; almost all interesting recent content is on arXiv (which can include final accepted manuscript – and anyway it’s not much different from the preprint); and still libraries pay huge subscription fees. The problem is the internet came along very quickly while we’re still doing things the old way.

Some initiatives:

  • his blog post about personal Elsevier boycott which inspired someone to set up a pledge which thousands signed
  • Open Library of Humanities, set up when the journal Lingua left Elsevier and became Glossa
  • Discrete Analysis (arXiv overlay journal) set up as proof of concept for cheap journal publication, with US$10 submission charge – and a nice user interface
  • No.Big.Deal – trying without success to get Cambridge and JISC to bargain better
  • Freedom of Information Act requests to UK universities for how much unis are paying Elsevier (contra confidentiality clauses)

Perverse incentives are held up by the whole network of publishers, editors, writers, readers (subdivides into people actually reading it, and people scanning it to judge the writer eg hiring committees), librarians (who have the power to cancel – but subject to academic criticism), scholarly societies (who often derive income from publishing journals), consortium negotiators, funders (in a good position to create mandates) – creating a situation where it’s very difficult to change things.

It feels like he has had little effect, but it’s important to have lots of little initiatives which together build to pull the wall down.

Electronic Poster Display #or2017

There were lots of fantastic posters; these are just the ones I wanted to refer back to as they sparked thoughts I want to follow up on. In no particular order:

  • Governmental Educational Repository in Health – they have 7000+ open access learning objects. [We have 175. Which isn’t nothing. But actually what I’m still mostly interested in is whether anyone’s ever going to develop an aggregator for OA learning objects….]

  • Extending the value of the institutional repository with metrics integration – they’ve got individual researcher profiles showing metrics. [We’ve got some of this in Elements. To get the rest though would require coding, and dealing with authentication to keep it private to the researcher. I recently wrote an authentication module for a hand-coded php/sql app using EZproxy which I could adapt to something like this – or any other homegrown personalisation effort.]

  • COAR Resource Type Controlled Vocabulary: Dspace Prototype implementation – [I saw (and gave feedback on) a draft of this a while back; should have a look at the latest version (v1.1) and check how it maps (or doesn’t) to PBRF types]

  • The PLACE Toolkit: exposing geospatial ready digital collections – [what value would there be, in our own collections, of adding time/location metadata to content to enable eg map/timeline exploration? (and therefore would it outweigh the cost?)]

  • COR(E)CID: Analysing the use of unique author identifiers in repositories via CORE to support the uptake of ORCID iDs – this gives repositories a dashboard to check how many ORCIDs are in their repository. [I wasn’t clear though on whether it’s available for public use or requires a sign-up. Further investigation shows the CORE Repository Dashboard does require registration and is specifically for repositories submitting data to CORE, which makes sense.]

  • International Image Interoperability Framework (IIIF) – [this is beyond my expertise but I want to check it’s on the radar of our non-research-output repository vendor]

  • If you digitise them they will come: creating a discoverable and accessible thesis collection – U of Tasmania made their theses open access retrospectively if at least 10 years old, with a disclaimer. [I’ve heard of a number of universities doing similarly; we’ve been more conservative, only making them available to staff and students unless we can secure permission. I’d like to push for the more open model.]

  • Strategies for increasing the amount of open access content in your repository – one tip they suggest is to set up a ScienceDirect email alert for ‘accepted manuscripts’ at your institution. When you get the email, download it immediately before it gets replaced by the ScienceDirect-branded ‘in press’ version. [This. Is. Genius.]

  • Enabling collaborative review with the DSpace configurable workflow – [I did some javascript hacking of the workflow, to sort the items by age and allow other sorting, but there’s very limited information still.] This poster shows improvements like displaying extra metadata fields (eg item type – author/publisher/year might be useful for us), adding statuses (eg questions for the researcher), and adding other notes. [This is Relevant To Our Interests.]



FOLIO workshop

Hosted by EBSCO; summary at Eventbrite. Disclaimers: there was a free lunch; I love open access; I’m appropriately suspicious of vendors and vapourware; and (I didn’t think this would be relevant before attending, but…) I like zebras.

FOLIO is “a community collaboration to develop an open source Library Services Platform (LSP) designed for innovation”.

Introduction from EBSCO

  • Vendors – Ebsco, ByWater, SirsiDynix
  • ‘Open’ orgs – Koha, Index Data, Open Library Environment
  • Universities – Cornell, University of Sydney, Aberdeen, Glasgow, Newcastle, Università di Roma, National Széchényi


  • Will support ILS functions but broader – a ‘library services platform’ [à la Alma etc]
  • Each function as its own app – so can create completely new apps eg data mining, IR integration, learning management, research data, predictive analytics, grant management


  • Apps from around the world built by commercial vendors who may charge, and by libraries who probably won’t. Can buy professional services.

Introduction from Peter Murray (open source community advocate for Index Data)
“an open source Library Services Platform built to support ILS functions and to encourage community innovation”


  • a platform intended for people to build on – a healthy platform depends on how much people contribute to it, which depends on the platform making this easy
  • made up of services
  • geared towards libraries – patrons, bibliographic records, authority records


  • create community where libraries can come together to innovate
  • leverage open source to reduce the “free as in kittens” costs
  • improve products by involving libraries more in development
  • bring more choice to libraries – eg multiple circulation apps you can switch between if one doesn’t suit; replace the fines app with a demerits app

Technical stuff:

  • “APIs all the way down”; inspired by microservices so can interface with the core through standard HTTP/REST, JSON/XML, etc; cloud-ready: scalable, ready for deployment on cloud but not bound to a particular vendor. Building with AWS as reference but could be run on Azure, on private VMware, etc.
  • Middleware inspired by the API Gateway pattern. (Core Okapi [this is where the zebras come in: the okapi is in the zebra family] is mostly complete, developers starting to work on functionality.)
  • Multi-tenant capability built-in
  • Vert.x; RESTful style, JSON for data format; request/response pipelines eg first request routed to authentication module then sent to next module; Event Bus that can be exposed with various protocols (eg STOMP, AMQP)
  • Dynamic binding – dependencies are interfaces, not implementations – allows you to replace circ module with another one that respects the same interface
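From a client’s point of view, “APIs all the way down” plus multi-tenancy looks like ordinary HTTP with a couple of extra headers. X-Okapi-Tenant and X-Okapi-Token are the headers Okapi uses; the endpoint in the usage comment is a made-up example:

```typescript
// Every request to the Okapi gateway is plain HTTP with JSON bodies; the
// tenant travels in a header so one installation can serve many libraries.
// X-Okapi-Tenant / X-Okapi-Token are Okapi's headers; the example host and
// path below are invented for illustration.
function okapiHeaders(tenant: string, token?: string): Record<string, string> {
  const headers: Record<string, string> = {
    "X-Okapi-Tenant": tenant,     // which library/tenant this request is for
    "Accept": "application/json", // modules exchange JSON end to end
  };
  if (token) {
    headers["X-Okapi-Token"] = token; // issued earlier by the auth module
  }
  return headers;
}

// e.g. fetch("https://okapi.example.org/users?query=active==true",
//            { headers: okapiHeaders("diku", myToken) })
// Okapi then routes the request through its pipeline: auth module first,
// then whichever module has registered the matching interface.
```

Because dependencies are interfaces rather than implementations, the client neither knows nor cares which module ultimately answers the request.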


  • self-contained http services (programming-language agnostic) – small, fast, do one thing very well
  • Okapi gateway requirements – hooks for lifecycle management, strong REST/JSON preference (some libraries hosting hackathons with their comp.sci. department students)
  • might be grouped into applications (with dependencies) eg cataloguing, circulation


  • Stripes – a user interface toolkit to let you quickly build the UIs you need to speak to the backend

Metadata – the FOLIO Codex

  • Takes concepts from FRBR (work, instance, holdings).
  • Format-agnostic (MARC, MODS, DC, whatever): core metadata “enough for other modules to understand”; native metadata “for apps that understand it” (eg the circ module needs a title but doesn’t care about all the MARC subfields, alternate titles, and so on)
  • Original format gets derived into FOLIO Codex (with work, instance, holdings) which gets used in modules. Current debate in the community about whether the original format should also be part of the codex.
  • Support multiple bib utilities and knowledge bases. Maintain list of local changes. Automated and semi-automated processes for updating local records with changes from source. “Cataloguing by reference”.
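The core-vs-native split can be sketched roughly like this (the field names are my own illustration, not the actual Codex schema):

```typescript
// Illustrative only: a format-agnostic core that any module can read, with
// the richer native record carried alongside for apps that understand its
// format. All field names here are invented for the sketch.
interface CodexInstance {
  id: string;
  title: string;           // core metadata: "enough for other modules to understand"
  contributors: string[];
  source: {                // native metadata, "for apps that understand it"
    format: "marc" | "mods" | "dc";
    record: unknown;       // the original record, kept as-is
  };
}

const instance: CodexInstance = {
  id: "inst-0001",
  title: "Moby-Dick",
  contributors: ["Melville, Herman"],
  source: { format: "marc", record: { "245": { a: "Moby-Dick" } } },
};

// A circulation module only ever touches instance.title; a cataloguing app
// can drill down into instance.source.record.
```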

Timeline: Aug 2016 opened github repositories; Sept 2016 Open Library Foundation created to hold IP but licensed Apache; phase 1 Aug 2016-2018 (availability of FOLIO apps to run library (ILS)) followed by extended apps.
Project plan:

  • 2016 built gateway, sample app, UI toolkit, but also SIGs
  • Jan-Mar 2017 built circ, resource management, user&rights management, but also documenting
  • Apr-Jun 2017 acquisitions, system ops, knowledgebase, and onboarding dev teams
  • Jul-Dec 2017 apps marketplace and certification, discovery integration
  • 2018 vendor services and hosting, implementation, migration, data conversion, support


Community engagement
Lots happening on Slack channels, many meetups

Governance / lazy consensus
Open Library Foundation > Folio > Folio product council > SIGs > Development

OLF is a 501(c)(3) – it took a lot of time to get this status as they had to prove EBSCO resources it but doesn’t control it. Mission: to help libraries develop open stuff to support libraries, research, learning and teaching. Board includes Texas A&M, Duke, California Inst of Tech, EBSCO, JISC, CALIS (China).

Dev cycle:
SIGS >(Design process)> Design Teams >(Requirements process)> Analytics Teams >(Development process)> Dev Teams >(Review & feedback process)> SIGs

SIGs currently on topics like metadata management, resource access, user management, internationalisation

When OLE got libraries to map requirements, they got 6000+; went back and said we need to cut this down, and they came back with only 3000+. The FOLIO project has processes to identify which ones are needed by July 2018.

Dev team – anyone can join in (biweekly check-ins, open tools with wiki, forums, Slack, GitHub) but it takes time/effort to really enjoy and contribute.

Lots of other companies build something then demo and ask for feedback – by which time it’s too late to provide really meaningful feedback. FOLIO is getting the feedback during/before the dev process.

This was on a working FOLIO instance. UI still very(!) sketchy but nav bar along the top with apps, eg users, items, scan. Demo’d searching/filtering users; switching to items and back and the search results still display; search for an item to copy barcode; switch to scan, lookup user, paste in barcode, click ‘checkout’ button, switch back to users and can see user now has book borrowed; switched to items and can see item now checked out.

[My current thoughts: this is clearly not production-ready at present, and even assuming everything stays on track for the rest of phase 1 I wouldn’t consider implementing it in 2018 – but I think it’s worth keeping an eye on. And the open nature of the development makes keeping an eye on its progress easy.

One risk I see in the architecture is that it’d be quite possible for every library to be running a different set of modules which may complicate community troubleshooting. This is by design and also a strength (so public libraries don’t get forced into an academic mode of thinking, or vice versa, or both get forced into some terrible compromise), and the requirement that everything be built around core APIs and data structures probably mitigates much of the mess it could otherwise turn into.

Relatedly, a proliferation of similar but subtly different modules which are each used by only a few libraries could also be a problem. At the moment for example in the user module, the data fields are fixed. If you wanted to add eg preferred language for communications, you’d have to create an entirely new module. But it sounds like there’ll be some work in future to allow a certain amount of customisation so you could still use the same basic module.

I also see a risk in the marketplace potentially getting full of pay-for modules. Hopefully it gets populated with enough free modules to start with to keep things on an even keel – or even tilted towards open as vendors find limited demand for a pay-for module when there are so many free competitors. I could see a freemium model develop… The fact that there are so many libraries and open-friendly organisations involved from the start is promising.]

Getting started with Angular UI development for Dspace #OR2017

by Tim Donohue and Art Lowel; session overview; Wiki with instructions for setup; demo site

[So my experience started off inauspiciously because I have an ancient version of MacOS so installing Node.js and yarn ran into issues with Homebrew and Xcode developer tools and I don’t know what else, but after five and a half hours I got it working. I then left all my browser and Terminal windows open for the next several days on the “if it ain’t broke” principle….]

Angular in DSpace

Angular 4.0 came out March 2017 – straight after conference there’ll be a sprint to get that into DSpace. (Angular tutorial) How Angular works with DSpace:

  • user goes to website
  • server returns first page in pre-compiled html, and javascript
  • user requests data via REST
  • API (could be hosted on different server than Angular) returns JSON data

With Angular Universal (available for Angular 2, packaged into Angular 4), it can still work in a client that doesn’t have javascript (search engine crawler, screen-reader, etc). Essentially, if the Angular app doesn’t load, your browser requests the page instead of the JSON, so the server returns the pre-compiled html again.

Caches (API replies and objects) on client-side in your browser so very quick to return to previously seen pages.
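The caching idea can be sketched as a simple memoising wrapper (synchronous here for brevity; this is the concept only, not the actual DSpace UI implementation, which uses a central store):

```typescript
// Sketch of client-side response caching: API replies are keyed by URL, so
// returning to a previously seen page costs no network round-trip.
class ResponseCache {
  private cache = new Map<string, unknown>();

  get(url: string, fetcher: (u: string) => unknown): unknown {
    if (!this.cache.has(url)) {
      this.cache.set(url, fetcher(url)); // miss: fetch once and remember
    }
    return this.cache.get(url);          // hit: served instantly from memory
  }
}

// Demonstration with a fetcher that counts how often it is really called:
let calls = 0;
const cache = new ResponseCache();
const fetcher = (u: string) => { calls++; return { url: u }; };
cache.get("/api/items/1", fetcher);
cache.get("/api/items/1", fetcher); // second visit is a cache hit
// calls === 1: the network was only touched once
```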

Building/running Angular apps

  • node.js – server-side JS platform (can provide pre-compiled html)
  • npm – Node’s package manager (pulls in dependencies from registry)
  • yarn – third-party Node package manager (same config, faster)
  • TypeScript language – extension of ES6 (latest javascript – adds types instead of generic ‘var’) – gets compiled down by Angular to ES5 javascript before it gets sent to the browser

You write angular applications by

  • composing html templates with angularized markup – almost all html is valid; can load other components via their selector; components have their own templates
  • writing component classes to manage those templates – lets you create new html tags that come with their own code and styling; consist of view (template) and controller. Implements interfaces eg onInit; extends another component; has a <selector>; has a constructor defining inputs; has a template. Essentially a component has a class and a template.
  • adding app logic in services – retrieve data for components, or operations to add or modify data – created once, used globally by injecting into component
  • boxing component(s) and optionally service(s) in modules – useful for organising app into blocks of functionality – would use this for supporting 3rd-party DSpace extensions (however business logic would be dealt with in REST API not in the Angular UI)
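A sketch of how those pieces fit together – the names and the stubbed-in Component() helper are illustrative, not the real Angular API (real Angular uses an @Component decorator):

```typescript
// Illustrative sketch of the component/service split described above.
interface ComponentMeta { selector: string; template: string; }

// Stand-in for Angular's @Component decorator: attaches view metadata
// (selector + template) to a component class.
function Component(meta: ComponentMeta, ctor: Function): void {
  (ctor as any).meta = meta;
}

// A service holds app logic and data access; created once, injected
// into any component that needs it.
class ItemService {
  getTitles(): string[] { return ["First item", "Second item"]; }
}

// The component class manages its template; other templates can use it
// via its selector, as if it were a new html tag.
class ItemListComponent {
  titles: string[] = [];
  constructor(private itemService: ItemService) {} // service injected here
  ngOnInit(): void {                               // lifecycle hook
    this.titles = this.itemService.getTitles();
  }
}
Component(
  {
    selector: "ds-item-list",
    template: '<ul><li *ngFor="let t of titles">{{ t }}</li></ul>',
  },
  ItemListComponent
);
```

So a component is essentially a class plus a template, while the service it depends on can be shared across the whole app.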

DSpace-angular folder structure

  • config/
  • resources/ – static files eg i18n, images
  • src/app/ – each feature in its own subfolder
    • .ts – component class
    • .html – template
    • .scss – component style
    • .spec.ts – component specs/test
    • .module.ts – module definition
    • .service.ts – service
  • src/backend/ – mock REST data
  • src/platform/ – root modules for client/server
  • src/styles/ – global stylesheet
  • dist/ – compiled code


[Here we got into the first couple of steps from the Wiki/gitHub project linked from there.]

Session 1 #CAULRD2017 – reports from groups/uni implementations

The CAUL Research Repositories Day programme

COAR presentation by Kathleen Shearer

The scholarly communication field is skewed towards the northern hemisphere, etc – exacerbated by impact factors. Big publishers play the game to make sure we subscribe only to their journals. Open access has arrived – the transition is proving interesting in some cases as organisations try to live up to their open access goals. The Germans, Finns and Dutch have all negotiated hard with Elsevier. Germany’s Elsevier access got cut off – then Elsevier gave them a grace period because they were worried researchers wouldn’t care…

However, flipping to APCs doesn’t help the skew [just means the skew is at who can publish rather than at who can read] or long-term sustainability – so another option is to strengthen repositories. MIT’s Future of Libraries report envisions libraries as an “open global platform”. This means a dual mission for repositories: to showcase and provide access (the past focus) but now also to act as a node in a global knowledge commons.

Need to create a network supporting the global nature of science. The Aligning Repository Networks International Accord was signed 8 May 2017, to support a distributed system – cross-regional harvesting, interoperability. COAR Controlled Vocabularies v.1.1 coming soon.

Looking at next generation repositories – how do we go beyond just including copies of full-text previously published by Big Publishers? Needs to have distribution of control, be inclusive, be for the public good, intelligently open. And need to think beyond articles: open data, [open code/methods], open notebook science, citizen science. Include services like metrics, comments, peer reviews (not in the repository but in the network layer on top), links between resources, notifications, global sign-on. Need to expose the content, not just the metadata.

“A vision in which institutions, universities and their libraries are the foundational nodes in a global scholarly communication system.”

Q on progress in different regions
A: Europe is probably farthest along; South America has a strong network but individual institutions often need support; the USA has lots of strong individual repositories but no national network, so it’s rather siloed.

[On breaking into group discussions, the New Zealand contingent discussed the possibility of formalising the currently-grassroots New Zealand Institutional Repositories Community to have more impact in this kind of initiative.]

Repository interoperability standards – Australasian Repository Working Group feedback by various

Metadata standards and vocabularies. NISO’s “free_to_read” tag and “license_ref” tag (the latter linking to the licence). Deakin and UNE are using these, and Deakin outputs both to Trove. RIOXX is a metadata profile providing guidelines for the UK (but there are still people in the UK without standard metadata).

The challenge is the lack of a national harvester feeding into international harvesters. We’re harvested by Trove and others, but it would be good to be linked to an international aggregator and then be able to get data back to enrich our repositories.

Currently we have a well-established community of practice but would be helpful to formalise this as a repository network. Elsewhere there are professional networks; institutional networks; technical service networks. They have governance, terms of reference, membership (either personal or institutional), funding streams (varying models including ad hoc on a project basis). Would allow a defined way to share info between annual events; strength in numbers when talking with government etc; better informed on global developments.

Asked if people present were interested in the group seeking membership of COAR and/or forming a formal group – show of hands showed support for both.

New institutional repository for Curtin by Janice Chan

Wanted sustainability, flexibility, interoperability. Used Atmire for installation and ongoing support (as local IT staff had competing priorities). Decided to delay integration with Elements to take advantage of Repository Tools 2 – so currently doing ad hoc bulk ingest. Don’t use collections for faculty – faculty is recorded in metadata instead, so it’s searchable as a facet.

Repository Skill Set Survey 2017 by Natasha Simons and Joanna Richardson

The 2011 survey was about what training/skills people wanted/needed; advertised via CAIRSS – a response per individual, not per institution. Findings were written up as New Roles, New Responsibilities: Examining Training Needs of Repository Staff. Reporting and copyright were big themes; time and staffing were common challenges – lots of people work only part-time and there’s a specific skillset. The survey was quoted as inspiration for Bepress’s repository manager certification course. Will rerun in November to see how much things have changed, or not….

Research publications workflow at Deakin University by Michelle Watson

Deakin Research Online was created in 2008, using Fez/Fedora – half open access, half dark archive. Counted as the point of truth, and its data is fed into the Research Office. Mostly manual processes:

  • Faculty views each publication and adds HERDC classification
  • Library adds additional metadata, checks copyright/OA, publishes

Okay for 100 records a week. But late 2014 added Elements and backlog increased dramatically because:

  • more records coming in
  • no guidelines to prioritise material
  • no clear owner of the workflow
  • same level of checking for all kinds of output

Researchers were unhappy with the delays and the convoluted workflow. So working on improvements, based on the assumption that Faculty and library vetting is still needed – but non-reportable outputs don’t need the same level of analysis. The value of the repository is to preserve research, make it discoverable, and make it openly accessible (to increase citation rates).

New “smart ingest” approach where no additional checking is needed if outputs meet certain criteria – run reports which faculty download and filter so they can confidently assign as C1. Want to work with Symplectic to automate this (eg using the API to add the C1 tag).

Elements is now the source of truth, and feeds data to Research Master and staff profiles. Have developed guidelines, procedures, and a revamped wiki space to provide guidance to researchers – especially for inducting new researchers but also downloadable infographics for an at-a-glance overview.

UWA Research repository: how collaboration contributed to the development of a CRIS by Kate Croker

Looked into whether interdepartmental collaboration helped – discovered it was crucial. Interviewed key participants. Findings:

  • built/cemented relationships
  • collaboration improved the product – influenced a change in direction
  • facilitated better understanding of other sections’ work
  • changed views on issues for researchers and potential for research
  • collaborating early clarifies business requirements better
  • builds a shared vision – asked interviewees what was next and they all answered similarly

Dataset published on access to conference proceedings – thank you!

Thanks to all who’ve helped —

(Andrea, apm, Catherine Fitchett, Sarah Gallagher, Alison Fields, KNB, Manja Pieters, Brendan Smith, Dave, Hadrian Taylor, Theresa Rielly, Jacinta Osman, Poppa-Bear, Richard White, Sierra de la Croix, Christina Pikas, Jo Simons, and Ruth Lewis, plus some anonymous benefactors)

— all the conferences I was investigating have been investigated. 🙂  I’ve since checked everything for consistency and link rot, added in a set of references that I had to research myself as I couldn’t anonymise them sufficiently in the initial run; deduplicated a few more times – conference names vary ridiculously – and finally ended up with a total of 1849 conferences which I’ve now published at

The immediately obvious stats from this dataset include:

Access to proceedings

  • 23.36% of conferences in the dataset had some form of free online proceedings – full-text papers, slides, or audiovisual recordings.
  • 21.85% had a non-free online proceedings
  • 30.72% had a physical proceedings available – printed book, CD/DVD, USB stick, etc, but not including generic references to proceedings having been given to delegates
  • 45.27% had no proceedings identifiable

(Percentages don’t add to 100% as some conferences had proceedings in multiple forms.)

Access to free online proceedings by year

This doesn’t seem to have varied much over the 6 years most of the conferences took place in:

2006: 39 / 173 = 22.54%
2007: 39 / 177 = 22.03%
2008: 62 / 258 = 24.03%
2009: 63 / 284 = 22.18%
2010: 105 / 428 = 24.53%
2011: 123 / 520 = 23.65%
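Each yearly figure above is just count ÷ total; a quick script reproduces the percentages (data copied from the list):

```typescript
// Recompute the yearly free-proceedings percentages from the raw counts.
// [year, conferences with free online proceedings, total conferences]
const byYear: Array<[number, number, number]> = [
  [2006, 39, 173], [2007, 39, 177], [2008, 62, 258],
  [2009, 63, 284], [2010, 105, 428], [2011, 123, 520],
];

for (const [year, free, total] of byYear) {
  const pct = (100 * free / total).toFixed(2); // percentage to 2 d.p.
  console.log(year + ": " + free + " / " + total + " = " + pct + "%");
}
```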

Conferences attended by country

Conferences attended took place in 75 different countries; those with more than 20 conferences were:

New Zealand: 429
USA: 297
Australia: 286
UK: 130
Canada: 67
China: 66
Germany: 44
France: 41
Italy: 35
Portugal: 31
Japan: 29
Spain: 28
Netherlands: 27
Singapore: 25

I won’t break down access to proceedings here, because this data is inherently skewed by the nature of the sample: conferences attended by New Zealand researchers. This means that small conferences in or near New Zealand are much more likely to be included than small conferences in other parts of the world. If a small conference is less resourced to put together and maintain a free online proceedings – or conversely a large society conference is prone to more traditional (non-free) publication options – this variation by conference size/type could easily outweigh any actual variation by country. So I need to do some thinking and discussing with people to see if there’s any actual meaning that can be pulled from the data as it stands. If you’ve got any thoughts on this I’d love to hear from you!

Further analysis now continues….

Progress report on how you’ve helped my research

At this point at least 20 people have helped me look for conference proceedings (some haven’t left a name so it’s somewhere between 20 and 42), which is awesome: thank you all so much! Last week saw us pass the halfway mark, an exciting moment. As of this morning, statistics are:

  • 1187 out of 1958 conferences investigated = 59% done
  • 312 have proceedings free online (26%)
  • of those without free proceedings, 292 have non-free proceedings online
  • of those without any online proceedings, 109 have physical proceedings (especially books or CDs)
  • 472 have no identifiable proceedings (40%)

I’ve got locations for all 1958, pending some checking. Remember this is out of conferences that New Zealand researchers presented at and nominated for their 2012 PBRF portfolio.

The top countries are:
New Zealand    492
Australia    315
USA    304
UK    133
Canada    69
(with China close behind at 68)

In New Zealand, top cities are predictably:
Auckland    154
Wellington    98
Christchurch    53
Dunedin    38
Hamilton    35

Along the way I’ve noticed some things that make the search harder:

  • sometimes authors, or the people verifying their sources, made mistakes in the citation
  • or sometimes people cited the proceedings instead of the conference itself – this isn’t a mistake in the context of the original data entry but makes reconciling the year and the city difficult.
  • or sometimes their citation was perfectly clear, but my attempt to extract the data into tidy columns introduced… misunderstandings (aka terrible, terrible mistakes).
  • or we’ve ended up searching for the same conference a whole pile of times because various people call it the Annual Conference of X, the Annual X Conference, the X Annual Conference, the International Conference of X, the Annual Meeting of X, etc etc.

On the other hand I’ve also noticed some things that make the search easier – either for me:

  • having done so many, I’m starting to recognise titles, so I can search the spreadsheet and often copy/paste a line
  • when all else fails I have access to the source data, so I can look up the title of the paper if I need to figure out whether I’m trying to find the 2008 or 2009 conference.

And things that could be generally helpful:

  • if a conference makes any mention of ACM, whether in the title or as a sponsor, then chances are the proceedings are listed in
  • if it mentions IEEE, try  If it’s there, then on the page for the appropriate year, scroll down and look on the right for the “Purchase print from partner” link – chances are you’ll get a page with an ISBN for the print option; plus confirming the location which is harder to find on IEEEXplore itself.
  • if it’s about computer science in any way, shape or form, then can probably point you to the source(s). This is the best way to find anything published as a Lecture Notes in Computer Science (LNCS) because Springer’s site doesn’t search for conferences very well.
  • if you do a web search and see a search result for, this will confirm the year/title/location of a conference, and give you an event website (which may or may not still be around, but it’s a start). Unfortunately I haven’t found a way to search the site directly for past conferences.
  • a search result for WorldCat will usually confirm year/title/location and (if you scroll down past the holding libraries) often give you the ISBN for the print proceedings.

And two things that have delighted me:

  • Finding some online proceedings in the form of a page listing all the papers’ DOIs – which resolve to the papers on Dropbox.
  • Two of the conferences in the dataset have no identifiable city/country – because they were held entirely online.

I am of course still eagerly soliciting help, if anyone has 10 minutes here or there over the next month (take a break from the silly season? 🙂). Check out my original post for more, or jump straight to the spreadsheet.