Scholarly workflows #or2017

Supporting Tools in Institutional Repositories as Part of the Research Data Lifecycle by Malcolm Wolski, Joanna Richardson

Have been working on research data management in context of the whole research data lifecycle. Started asking question: once research data management is under control, what will be the next focus? Their answer was research tools. Produced two journal articles:

Wolski, M., Howard, L., & Richardson, J. (2017). The importance of tools in the data lifecycle. Digital Library Perspectives, 33(3), in press
Wolski, M., Howard, L., & Richardson, J. (2017). A trust framework for online research data services. Publications, 5(2), article 14 https://doi.org/10.3390/publications5020014

Research life cycle: Data creation and deposit (plan and design, collect and capture) -> Managing active data (Collect and capture, collaborate and analyse) -> Data repositories and archives (manage, store, preserve; share and publish) -> Data catalogues and registries

Research data repositories vary a lot. Collection or ecosystem? Open or closed? End point or part of workflow? Why is it hard to build them? Push-and-pull between re-usability and preservation:

technical aspects
interoperability
legal/regulatory/ethical constraints
one-off activity or continuous
diversity of accessibility issues
diversity of re-usability issues

Researchers used an average of 22 research tools per person (from Word, ResearchGate and email through SurveyMonkey, Dropbox and Figshare to R and really specialised ones). Kramer and Bosman (2016) divided tools into assessment, outreach, publication, writing, analysis, discovery, preparation phases. Tools exploding as research activity scales up, collaboration increases. Large-capacity projects being funded. Data science courses upskilling researchers.

Researchers use lots of tools as part of the data workflow. The institution may manage data, but has no ownership of the workflow. Since data has to move seamlessly between tools, interoperability is key – but how do we build these interoperable workflows and infrastructures?

Need to remember repository is only part of the research ecosystem. Need to take an institutional approach – or approaches rather than a single design solution. Look at main workflows and tools used – check out research communities who may already have the solutions – focus must be meeting the researchers’ needs.

Q: Will we see researchers use fewer tools as disciplinary workflows develop?
A: Probably not, but will see more integration between them eg Qualtrics adding an R connector.

Research Offices As Vital Factors In The Implementation Of Research Data Management Strategies by Reingis Hauck

Have a full-text repository on DSpace, building data repository on CKAN. What if we build something (at great expense) and they don’t come? We need cultural change. Eg UK seems far ahead but only 16% of respondents are accessing university RDM support services in 2016.

They have data repository, and provide support service by research office, library and IT services.

The research office provides support in grant writing; advocates on policies; helps with internal research funding; reports to senior leadership. Their toolkit:

need to win research managers over – explain how important it is
embedded an RDM-expert
upskilled research office staff about data management planning and how to make a case for data management.

Look out for game changers:

eg large collaborative research projects – produce lots of data and need to share it to be successful so more likely to listen
DMP preview as standard procedure for proposal review and training on proposal writing. (Want data management planning to be like brushing your teeth: you do it every day and if you forget you can’t sleep.)
adapt incentives – eg internal funding for early career researchers requires data management plans
use existing networks – researchers go to lots of boards and meetings already so feed this as a topic like any other topic
engage with members of DFG[German science foundation] review board – to get them to draw up criteria to reward researchers doing it

Cultural change towards open science can be supported by your research office. Let’s team up more!

Towards Researcher Participation in Research Information Management Systems by Dong Joon Lee, Besiki Stvilia, Shuheng Wu

RIMS – include ResearchGate, Academia, Google Scholar; ORCID, ImpactStory; PURE, Elements

ResearchGate sends out a flood of emails – good for some, a put-off for others. How can we improve our RIMS to improve researcher engagement?

Interviewed 15 researchers then expanded to survey 412 participants; also analysed metadata on 126 ResearchGate profiles of participants. Preliminary findings:

Variety of different researcher activities in RIMS eg write manuscripts, interact with peers, curate, evaluate, look for jobs, monitor literature, identify collaborators, disseminate research, find relevant literature.
Different levels of participation: readers may have a profile but don’t maintain it or interact with people; record managers maintain their profile, but don’t interact with others; community members maintain profiles but also interact with others etc.
Different motivations to maintain profile: to share scholarship (most popular); improve status, enjoyment, support evaluation, quality of recommendations, external pressure (least popular)
Different use of metadata categories: people tend to use the person, publication, and research subject categories. Maybe research experience, but rarely education, award, teaching experience, or other.
- In Person most people put in first, last name, affiliation, dept;
- Publication: most use most of these, except that only 30% of readers share the file, compared with about 80% of record managers and community members

Want to develop design recommendations to enable RIMS to increase participation.

Research and non-publications repositories, Open Science #or2017

Abstracts

OpenAIRE-Connect: Open Science as a Service for repositories and research communities by Paolo Manghi, Pedro Principe, Anthony Ross-Hellauer, Natalia Manola

Project 2017-19 with 11 partners (technical, research communities, content providers) to extend technological services and networking bridges – creating open science services and building communities. Want to support reuse/reproducibility and transparent evaluation around research literature and research data, during the scientific process and in publishing artefacts and packages of artefacts.

Barriers – repositories lack support (eg integration, links between repositories). OpenAIRE want to facilitate new vision so providing “Open Science as a Service” – research community dashboard with variety of functions and catch-all broker service.

RDM skills training at the University of Oslo by Elin Stangeland

Researchers using random storage solutions and don’t really know what they’re doing. Need to improve their skills. Have been setting up training for various groups in organisation. Software Carpentry for young researchers to make their work more productive and reliable. 2-day workshops which are discipline-specific and well-attended. Now running their own instructor training which allows expanding service. Author carpentry, data carpentry, etc.

Training for research support staff who are first port of call on data management plans, data protection, basic data management. Dept of GeoSciences recently made attending DMP training mandatory.

Expanding library carpentry to national level.

IIIF Community Activities and Open Invitation by Sheila Rabun

Global community that develops shared APIs for web-based image delivery; implements them in software; and exposes interoperable image content.

Many image repositories are effectively silos. The IIIF APIs provide a layer that lets servers talk to each other, allowing easier management and better functionality for end-users. Lots of image servers and clients around now so you can mix-and-match your front and back-ends. Can have deep zoom; compare images and more.
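To make the "shared API" idea concrete: the IIIF Image API addresses every image request with the same URL pattern (identifier, region, size, rotation, quality, format), which is what lets any compliant client talk to any compliant server. A minimal sketch of building such a URL follows; the server address and identifier in the usage note are made up, and `iiifImageUrl` is just an illustrative helper, not part of any IIIF library.

```typescript
// Parameters of a IIIF Image API request, in the order they appear in the URL.
interface IiifImageRequest {
  region: string;   // "full" or "x,y,w,h"
  size: string;     // "max", "w,", ",h" or "w,h"
  rotation: string; // degrees clockwise, e.g. "0"
  quality: string;  // e.g. "default" or "gray"
  format: string;   // e.g. "jpg" or "png"
}

// Request for the whole image at the largest size the server offers.
const fullImage: IiifImageRequest = {
  region: "full",
  size: "max",
  rotation: "0",
  quality: "default",
  format: "jpg",
};

// baseUrl is the image server plus any path prefix (hypothetical here).
function iiifImageUrl(
  baseUrl: string,
  identifier: string,
  req: IiifImageRequest = fullImage
): string {
  const base = baseUrl.replace(/\/+$/, "");  // drop trailing slashes
  const id = encodeURIComponent(identifier); // identifiers are URI-encoded
  return `${base}/${id}/${req.region}/${req.size}/${req.rotation}/${req.quality}.${req.format}`;
}
```

For example, `iiifImageUrl("https://images.example.org/iiif", "whale 42")` yields a URL ending in `/whale%2042/full/max/0/default.jpg`; a deep-zoom viewer would issue many such requests with different `region` and `size` values.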

Everything created by global community so always looking for more participants. Community groups, technical specification groups eg extending to AV resources, discovery, text granularity (in text annotations). Also a consortium to provide leadership and communication channels.

Data Management and Archival Needs of the Patagonian Right Whale Program Data by Harish Maringanti, Daureen Nesdill, Victoria Rowntree

Importance of curating legacy datasets. World’s longest continuous study of large whale species: 47 years and counting of data. Two problems:

to identify whales – found the callosities of right whales were unique (number, position, shape) and pattern remained same despite slight change over time. So can take aerial photos when they surface. Data analysed with complicated computer system and compared with existing photos.
to gather data over a period of time – where to find whales regularly. Discovered whales gather in three places: 1) mothers and calves; 2) males and females; 3) free-for-all.

Collection has tens of thousands of b&w negatives; color slides; analysis notebooks; field notes; Access 1996 database records; sightings maps.

Challenges: heterogeneity of data; metadata – including how much can be displayed publicly; outdated databases.

Why should libraries care? We can provide continuity beyond life of individual researchers. Legacy data is as important as current data in biodiversity type fields and generally isn’t digitised yet.

Repository driven by the data journal: real practices from China Scientific Data by Lili Zhang, Jianhui Li, Yanfei Hou

China Scientific Data is a multidisciplinary journal publishing data papers – raw data and derived datasets. Submission (of paper and dataset), review (paper review and curation check), peer review, editorial voting.

How to publish:

Massive data? – on-demand sample data publication: can’t publish the whole set, but publishes a sample (typical, minimum sized) to announce the dataset’s existence
Complex data? – publish data and supplementary materials together eg background info, software, vocabulary, code, etc. Eg selected font collections for minority languages
Dynamic data? – eg when updating with new data using same methodology and data quality control. Could publish as new paper but it’s duplicative so published instead as another version with same DOI. Can be good for your citations!

Encourage authors to store data in their repository so its long-term availability is more reliable.

RDM and the IR: Don’t Reuse and Recycle – Reimplement by Dermot Frost, Rebecca Grant

We all have IRs and they’re designed for PDF publications. Research Data Management is largely driven by funder mandates; some disciplines are very good at it, some less so (eg historians claiming “I have no data” – having just finished a large project including land ownership registries from 17th century, georectified etc!)

FAIR (findable, accessible, interoperable, reusable) data concept (primarily machine-oriented ie findable by machines). IRs can’t do this well enough. Technically uploading a zip file is FAIR but time-costly to user.

Instead should find a domain-specific repository (and read the terms and conditions carefully especially around preservation!) Or implement your own institutional data repository (but different scale of data storage can take serious engineering efforts). Follow the Research Data Alliance.

Developing a university wide integrated Data Management Planning system by Rebecca Deuble, Andrew Janke, Helen Morgan, Nigel Ward

Need to help researchers across the life-cycle. U of Queensland identifying opportunity to support researchers around funding/journal requirements. Used DMP Online but poor uptake due to lack of mandate. UQ Research Data Manager system:

Record developed by researcher – active record (not plan though includes project info) which can change over course of project. Simple dynamic form, tailored to researchers, with guidance for each field.
Storage auto-allocated by storage providers for working data – given a mapped drive accessible by national collaborators (hopefully international soon) using code provided in completing the form.
[Working on this part] Publish and manage selected data to managed collection (UQ eSpace). Currently manual process filling in form with metadata fields in eSpace. Potential to transfer metadata from RDM system to eSpace.
Developing procedures to support the system.

Benefits include uni oversight of research in progress, researcher-centric, improves impact/citation, provides public access to data.

Preserving and reusing high-energy-physics data analyses by Sünje Dallmeier-Tiessen, Robin Lynnette Dasler, Pamfilos Fokianos, Jiří Kunčar, Artemis Lavasa, Annemarie Mattmann, Diego Rodríguez Rodríguez, Tibor Šimko, Anna Trzcinska, Ioannis Tsanaktsidis

Data very valuable – data published even 15 years after funding stopped, and although always building new and bigger colliders, data is still relevant even decades after collected.

Projects involve 3000 people, including high turnover of young researchers. CERN needs to capture everything needed to understand and rerun an analysis years later – data, software, environment, workflow, context, documentation.

Invenio (JSON schema with lots of domain-specific fields) to describe analysis
Capture and preserve analysis elements
Reusing – need to reinstantiate the environment and execute the analysis on the cloud.
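As a rough illustration of what "capture everything" could mean in a structured record, here is a sketch that lists the six elements named in the notes (data, software, environment, workflow, context, documentation) and reports which ones a record still lacks. The field names and types are my own assumptions for illustration; this is not the actual Invenio JSON schema used at CERN.

```typescript
// Hypothetical shape for an analysis record: one field per element
// that the notes say must be captured. Not the real CERN schema.
interface AnalysisRecord {
  data?: string[];          // e.g. dataset identifiers
  software?: string[];      // e.g. code repositories or releases
  environment?: string;     // e.g. a container image reference
  workflow?: string;        // e.g. a workflow description file
  context?: string;         // e.g. free-text physics context
  documentation?: string[]; // e.g. internal notes
}

const requiredElements = [
  "data",
  "software",
  "environment",
  "workflow",
  "context",
  "documentation",
] as const;

// Returns the elements that are absent or empty, i.e. what would still
// need capturing before the analysis could be rerun later.
function missingElements(record: AnalysisRecord): string[] {
  return requiredElements.filter((key) => {
    const value = record[key];
    return value === undefined || value.length === 0;
  });
}
```

A check like this could run at deposit time, so that gaps are flagged while the people who know the analysis are still on the project.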

REANA = REusable ANAlyses supports collaboration and multiple scenarios.

Perverse incentives and the reward structures of academia #or2017

Perverse incentives: how the reward structures of academia are getting in the way of scholarly communication and good science
by Sir Timothy Gowers

Abstract:

The internet has been widely used for the last 20 years and has revolutionized many aspects of our lives. It has been particularly useful for academics, allowing them to interact and exchange ideas far more rapidly and conveniently than they could in the past. However, much of the way that science proceeds has been affected far less by this development than one might have expected, and the basic method of communication of ideas — the journal article — is not much different from how it was in the seventeenth century.

It is easy to imagine new and better methods of dissemination, so what is stopping them from changing the way scientists communicate? Why has the journal system proved to be far more robust than, say, the music industry, in the face of the new methods of sharing information?

The dream that all information is available on-tap, accessible through a few clicks. We’ve got a bit of that via Google, Wikipedia, YouTube, map sites, travel sites, news sites. But a lot of content is for subscribers only: you can find content in Google Books – but only a page before cut off due to copyright.

Of course copyright holders need incentives to create, to cover costs, etc. Academics aren’t directly paid though – actually the barriers to content are the bigger problem. Covering costs? maybe “if you insist on antiquated methods of publication”.

What could we share? The not-yet-complete idea, that others can build on. OTOH if everyone shared everything they thought it could end up a complete mess. But we can make order from chaos.

In maths the revolution has started: his library transitioning from painfully closed stacks to open stacks coincided unfortunately with not even needing to go to the library for content anyway. Wikipedia for basic concepts; arXiv.org for preprints (don’t need journals at all in maths); OEIS (database of sequences of whole numbers along with formulae for generating them); MathOverflow – for questions at the research level – usually get a useful answer within a few hours.

Traditional way of doing things is the “lone genius” model. But thought it’d be interesting to solve a problem in public. So posted some initial thoughts on his blog and invited contribution. Traditionally there’s a fear of getting scooped – but doing it completely in the open, timestamps mean no-one can take credit; in fact it rewards putting your comment up quickly before someone else can. Problem was solved in just 6 weeks.

Perverse incentives in maths:

personal ambition
reward for being first (not for being inspiration, or for being second but with a better solution)
primacy of journal article while expository and enabling activities are downplayed – when you start writing textbooks instead of journal articles this is seen as your career slowing down
little recognition for incomplete ideas

These are obstacles to efficiency.

Paradox of paywalls: mathematicians write, peer review, edit; dissemination costs almost nothing; almost all interesting recent content is on arXiv (which can include final accepted manuscript – and anyway it’s not much different from the preprint); and still libraries pay huge subscription fees. The problem is the internet came along very quickly while we’re still doing things the old way.

Some initiatives:

his blog post about personal Elsevier boycott which inspired someone to set up a pledge which thousands signed
Open Library of Humanities set up when Journal Lingua left Elsevier and became Glossa
Discrete Analysis (arXiv overlay journal) set up as proof of concept for cheap journal publication, with US$10 submission charge – and a nice user interface
No.Big.Deal – trying without success to get Cambridge and JISC to bargain better
Freedom of Information Act requests to UK universities for how much unis are paying Elsevier (contra confidentiality clauses)

Perverse incentives are held up by the whole network of publishers, editors, writers, readers (subdivides into people actually reading it, and people scanning it to judge the writer eg hiring committees), librarians (who have the power to cancel – but subject to academic criticism), scholarly societies (who often derive income from publishing journals), consortium negotiators, funders (in a good position to create mandates) – creating a situation where it’s very difficult to change things.

Feels like has had little effect, but it’s important to have lots of little initiatives which together build to pull the wall down.

Electronic Poster Display #or2017

There were lots of fantastic posters; these are just the ones I wanted to refer back to as they sparked thoughts I want to follow up on. In no particular order:

Governmental Educational Repository in Health – they have 7000+ open access learning objects. [We have 175. Which isn’t nothing. But actually what I’m still mostly interested in is whether anyone’s ever going to develop an aggregator for OA learning objects….]
Extending the value of the institutional repository with metrics integration – they’ve got individual researcher profiles showing metrics. [We’ve got some of this in Elements. To get the rest though would require coding, and dealing with authentication to keep it private to the researcher. I recently wrote an authentication module for a hand-coded php/sql app using EZproxy which I could adapt to something like this – or any other homegrown personalisation effort.]
COAR Resource Type Controlled Vocabulary: Dspace Prototype implementation – [I saw (and gave feedback on) a draft of this a while back; should have a look at the latest version (v1.1) and check how it maps (or doesn’t) to PBRF types]
The PLACE Toolkit: exposing geospatial ready digital collections – [what value would there be, in our own collections, of adding time/location metadata to content to enable eg map/timeline exploration? (and therefore would it outweigh the cost?)]
COR(E)CID: Analysing the use of unique author identifiers in repositories via CORE to support the uptake of ORCID iDs – this gives repositories a dashboard to check how many ORCIDs are in their repository. [I wasn’t clear though on whether it’s available for public use or requires a sign-up. Further investigation shows the CORE Repository Dashboard does require registration and is specifically for repositories submitting data to CORE, which makes sense.]
International Image Interoperability Framework (IIIF) – [this is beyond my expertise but I want to check it’s on the radar of our non-research-output repository vendor]
If you digitise them they will come: creating a discoverable and accessible thesis collection – U of Tasmania made their theses open access retrospectively if at least 10 years old, with a disclaimer. [I’ve heard of a number of universities doing similarly; we’ve been more conservative, only making them available to staff and students unless we can secure permission. I’d like to push for the more open model.]
Strategies for increasing the amount of open access content in your repository – one tip they suggest is to set up a ScienceDirect email alert for ‘accepted manuscripts’ at your institution. When you get the email, download it immediately before it gets replaced by the ScienceDirect-branded ‘in press’ version. [This. Is. Genius.]
Enabling collaborative review with the DSpace configurable workflow – [I did some javascript hacking of the workflow, to sort the items by age and allow other sorting, but there’s very limited information still.] This poster shows improvements like displaying extra metadata fields (eg item type – author/publisher/year might be useful for us), adding statuses (eg questions for the researcher), and adding other notes. [This is Relevant To Our Interests.]

Getting started with Angular UI development for DSpace #OR2017

by Tim Donohue and Art Lowel; session overview; Wiki with instructions for setup; demo site

[So my experience started off inauspiciously because I have an ancient version of MacOS so installing Node.js and yarn ran into issues with Homebrew and Xcode developer tools and I don’t know what else, but after five and a half hours I got it working. I then left all my browser and Terminal windows open for the next several days on the “if it ain’t broke” principle….]

Angular in DSpace

Angular 4.0 came out March 2017 – straight after conference there’ll be a sprint to get that into DSpace. (Angular tutorial) How Angular works with DSpace:

user goes to website
server returns first page in pre-compiled html, and javascript
user requests data via REST
API (could be hosted on different server than Angular) returns JSON data

With Angular Universal (available in Angular 2, packaged in Angular 4), it can still work with a browser that doesn’t have javascript (search engine browser, screen-reader, etc). Essentially if Angular app doesn’t load, your browser requests the page instead of the json, so the server will return the pre-compiled html again.

Caches (API replies and objects) on client-side in your browser so very quick to return to previously seen pages.
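The request-then-cache behaviour described above can be sketched without any framework: the client asks the REST API for JSON once per URL and reuses the reply on later visits. This is a minimal sketch only; `CachedRestClient`, the `Fetcher` type and the example paths are hypothetical names of mine, not DSpace-angular code, and a real client would also need expiry and error handling.

```typescript
// A function that retrieves JSON for a URL. In a browser this could wrap
// the standard fetch API; it is injected so the sketch runs anywhere.
type Fetcher = (url: string) => Promise<unknown>;

class CachedRestClient {
  // Replies are cached as promises, keyed by full URL, so concurrent
  // requests for the same resource share one round trip.
  private readonly cache = new Map<string, Promise<unknown>>();
  private readonly baseUrl: string;
  private readonly fetchJson: Fetcher;

  constructor(baseUrl: string, fetchJson: Fetcher) {
    this.baseUrl = baseUrl;
    this.fetchJson = fetchJson;
  }

  get(path: string): Promise<unknown> {
    const url = this.baseUrl + path;
    let reply = this.cache.get(url);
    if (reply === undefined) {
      reply = this.fetchJson(url); // first visit: go to the REST API
      this.cache.set(url, reply);
    }
    return reply;                  // repeat visit: served from the cache
  }
}
```

Because the API base URL is just a constructor argument, the same client works whether the REST API sits on the same server as the UI or a different one. Note that this sketch also caches failed requests, which a real implementation would want to evict.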

Building/running Angular apps

node.js – server-side JS platform (can provide pre-compiled html)
npm – Node’s package manager (pulls in dependencies from registry)
yarn – third-party Node package manager (same config, faster)
TypeScript language – extension of ES6 (latest javascript – adds types instead of generic ‘var’) – gets compiled down by Angular to ES5 javascript before it gets sent to the browser
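For anyone who hasn't seen TypeScript, a tiny sketch of what "adds types" means in practice. The `Item` interface and its fields are made up for illustration and are not taken from DSpace.

```typescript
// An interface describes the shape an object must have.
interface Item {
  id: string;
  title: string;
  year?: number; // the question mark marks an optional field
}

// Parameter and return types are declared and checked at compile time.
function describeItem(item: Item): string {
  const year = item.year === undefined ? "n.d." : String(item.year);
  return `${item.title} (${year})`;
}

const example: Item = { id: "1", title: "Example thesis", year: 2017 };
const label: string = describeItem(example);

// The compiler would reject the following call, because id must be a string:
// describeItem({ id: 1, title: "Example" });
```

The types exist only at compile time; what reaches the browser is plain javascript with the annotations stripped out.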

You write angular applications by

composing html templates with angularized markup – almost all html is valid; can load other components via their selector; components have their own templates
writing component classes to manage those templates – lets you create new html tags that come with their own code and styling; consist of view (template) and controller. Implements interfaces eg OnInit; extends another component; has a <selector>; has a constructor defining inputs; has a template. Essentially a component has a class and a template.
adding app logic in services – retrieve data for components, or operations to add or modify data – created once, used globally by injecting into component
boxing component(s) and optionally service(s) in modules – useful for organising app into blocks of functionality – would use this for supporting 3rd-party DSpace extensions (however business logic would be dealt with in REST API not in the Angular UI)
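The component/service split above can be sketched in plain TypeScript. To keep this runnable without installing Angular, the decorators are only mentioned in comments and the "injection" is done by hand; the class names, the `ds-item-title` selector and the sample items are all hypothetical, not taken from the DSpace-angular code.

```typescript
interface RepositoryItem {
  id: string;
  title: string;
}

// Service: app logic that retrieves data for components. In Angular this
// class would be marked @Injectable() and created once by the injector.
class ItemService {
  private readonly items: RepositoryItem[] = [
    { id: "1", title: "First example item" },
    { id: "2", title: "Second example item" },
  ];

  findById(id: string): RepositoryItem | undefined {
    return this.items.find((item) => item.id === id);
  }
}

// Component class: manages one template. In Angular it would carry
// @Component({ selector: "ds-item-title", template: "..." }) and the
// service would arrive through the constructor via dependency injection.
class ItemTitleComponent {
  itemId = ""; // would be an @Input() set from the parent template
  title = "";  // value the template would display
  private readonly itemService: ItemService;

  constructor(itemService: ItemService) {
    this.itemService = itemService;
  }

  // Lifecycle hook from the OnInit interface, called once inputs are set.
  ngOnInit(): void {
    const item = this.itemService.findById(this.itemId);
    this.title = item ? item.title : "Item not found";
  }
}

// Stand-in for what Angular does: one shared service, injected into a component.
const sharedService = new ItemService();
const component = new ItemTitleComponent(sharedService);
component.itemId = "2";
component.ngOnInit();
```

After `ngOnInit()` runs, `component.title` holds the title the template would render. A module would then simply declare `ItemTitleComponent` and provide `ItemService` as one block of functionality.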

DSpace-angular folder structure

config/
resources/ – static files eg i18n, images
src/app/ – each feature in its own subfolder
- .ts – component class
- .html – template
- .scss – component style
- .spec.ts – component specs/test
- .module.ts – module definition
- .service.ts – service
src/backend/ – mock REST data
src/platform/ – root modules for client/server
src/styles/ – global stylesheet
dist/ – compiled code

Hands-on

[Here we got into the first couple of steps from the Wiki/GitHub project linked from there.]