Tag Archives: research data management

Open data? Perceptions of barriers to research data-sharing – Jo Simons #open17

Many aspects of open data – today focusing on research data, ie created by research projects at an institution.

Research workflow is very complex but to really simplify: researchers start a project, get lots of data, and summarise results in journals.  But it’s not the data – it’s a summary of the data with maybe a few key examples. The rest goes to places where only the researcher can access it.

Why do we care?

  • for the good of all
  • expensive to generate so want to maximise use eg validate, meta-analyses, used in different ways
  • much funded by government therefore taxpayer – so they should be able to access it

Used to work in a group which shared greenhouse space but had no idea what else was in there. Proposed sharing basic information about what was there and what to do in case of emergency – and was shocked when some said no. Supervisor said don’t let it stop you asking the question but that’ll happen, yeah.

When requesting data, the odds of it being extant decrease by 17% for each year since publication. (cite: Vines (2013) 10.1016/j.cub.2013.11.014)
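
As a rough worked example of what a 17% yearly drop in odds adds up to: assuming the decline compounds year on year, the odds after n years are the starting odds times 0.83^n. The baseline odds used below (9:1) are a hypothetical starting point for illustration, not a figure from the talk or the paper.

```python
def odds_after(years, baseline_odds=1.0, annual_decline=0.17):
    """Odds that a dataset is still extant after `years`, assuming a compounding decline."""
    return baseline_odds * (1.0 - annual_decline) ** years


def odds_to_probability(odds):
    """Convert odds (p / (1 - p)) back into a probability p."""
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Hypothetical baseline: odds of 9:1 (a 90% chance) at the time of publication.
    for years in (0, 5, 10, 20):
        odds = odds_after(years, baseline_odds=9.0)
        print(f"{years:>2} years: odds {odds:.2f}, probability {odds_to_probability(odds):.0%}")
```

Note that it is the odds, not the probability, that shrink by 17% a year, so the probability falls slowly at first and faster later.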

This is where academic libraries come in – getting the data off the USB drives. So need to understand why they might not want to share. Did interviews to inform survey construction to get info from more people. 102 responses from researchers across 10 disciplines; 18 from librarians (about 20% response rate).

Do librarians and researchers agree on the major drivers that determine whether researchers choose to share their data?

Is data-sharing part of the research culture? Librarians: 7% said common/essential; researchers 26%

Factors influencing data-sharing

  • agreement in some areas eg ability to publish, inappropriate use, copyright and IP pretty high; then resources, interest to others, system structure and data access
  • differences: librarians thought institutional policy, system integration very important; funder policy, system usability somewhat important – all very low for researchers. What mattered to researchers was: ethics (>40%); culture, research quality (10-15%); data preservation, publisher policy (5-10%)

Are there differences across major disciplines in what those drivers are?

5 disciplines with 10+ responses: business, medicine/health, phys/chem/earth; life sci/bio; soc sci/education. Ethics important for most but not a high-ranking factor for phys/chem/earth due to nature of their data. Whereas data preservation/archiving is more important for them (and med/health), somewhat important for life sci and soc sci, while business barely cared.

Take home

So consult with your community to find out what’s worrying them. Target those concerns in promotion and training. Eg we know system usability is important so definitely fix it – but don’t waste your communication opportunities talking about it when they’re worried about other things.

Scholarly workflows #or2017


Supporting Tools in Institutional Repositories as Part of the Research Data Lifecycle by Malcolm Wolski, Joanna Richardson

Have been working on research data management in context of the whole research data lifecycle. Started asking question: once research data management is under control, what will be the next focus? Their answer was research tools. Produced two journal articles:

  • Wolski, M., Howard, L., & Richardson, J. (2017). The importance of tools in the data lifecycle. Digital Library Perspectives, 33(3), in press
  • Wolski, M., Howard, L., & Richardson, J. (2017). A trust framework for online research data services. Publications, 5(2), article 14 https://doi.org/10.3390/publications5020014

Research life cycle: Data creation and deposit (plan and design, collect and capture) -> Managing active data (Collect and capture, collaborate and analyse) -> Data repositories and archives (manage, store, preserve; share and publish) -> Data catalogues and registries

Research data repositories vary a lot. Collection or ecosystem? Open or closed? End point or part of workflow? Why is it hard to build them? Push-and-pull between re-usability and preservation:

  • technical aspects
  • interoperability
  • legal/regulatory/ethical constraints
  • one-off activity or continuous
  • diversity of accessibility issues
  • diversity of re-usability issues

The average number of research tools was 22 per person (includes Word, ResearchGate, email through to SurveyMonkey, Dropbox, Figshare, through to R and really specialised ones). Kramer and Bosman (2016) divided tools into assessment, outreach, publication, writing, analysis, discovery, preparation phases. Tools exploding as research activity scales up, collaboration increases. Large-capacity projects being funded. Data science courses upskilling researchers.

Researchers use lots of tools as part of the data workflow. The institution may manage data, but has no ownership of the workflow. Since data has to move seamlessly between tools, interoperability is key – but how do we build these interoperable workflows and infrastructures?

Need to remember repository is only part of the research ecosystem. Need to take an institutional approach – or approaches rather than a single design solution. Look at main workflows and tools used – check out research communities who may already have the solutions – focus must be meeting the researchers’ needs.

Q: Will we see researchers use fewer tools as disciplinary workflows develop? A: Probably not but will see more integration between them eg Qualtrics adding an R connector.

Research Offices As Vital Factors In The Implementation Of Research Data Management Strategies by Reingis Hauck

Have a full-text repository on DSpace, building data repository on CKAN. What if we build something (at great expense) and they don’t come? We need cultural change. Eg UK seems far ahead but only 16% of respondents were accessing university RDM support services in 2016.

They have a data repository, with a support service provided by the research office, library and IT services.

The research office provides support in grant writing; advocates on policies; helps with internal research funding; reports to senior leadership. Their toolkit:

  • need to win research managers over – explain how important it is
  • embedded an RDM-expert
  • upskilled research office staff about data management planning and how to make a case for data management.

Look out for game changers:

  • eg large collaborative research projects – produce lots of data and need to share it to be successful so more likely to listen
  • DMP preview as standard procedure for proposal review and training on proposal writing. (Want data management planning to be like brushing your teeth: you do it every day and if you forget you can’t sleep.)
  • adapt incentives – eg internal funding for early career researchers requires data management plans
  • use existing networks – researchers go to lots of boards and meetings already so feed this as a topic like any other topic
  • engage with members of DFG [German Research Foundation] review board – to get them to draw up criteria to reward researchers doing it

Cultural change towards open science can be supported by your research office. Let’s team up more!

Towards Researcher Participation in Research Information Management Systems by Dong Joon Lee, Besiki Stvilia, Shuheng Wu

RIMS – include ResearchGate, Academia, Google Scholar; ORCID, ImpactStory; PURE, Elements

ResearchGate sends out a flood of emails – good for some, a put-off for others. How can we improve our RIMS to improve researcher engagement?

Interviewed 15 researchers then expanded to survey 412 participants; also analysed metadata on 126 ResearchGate profiles of participants. Preliminary findings:

  • Variety of different researcher activities in RIMS eg write manuscripts, interact with peers, curate, evaluate, look for jobs, monitor literature, identify  collaborators, disseminate research, find relevant literature.
  • Different levels of participation: readers may have a profile but don’t maintain it or interact with people; record managers maintain their profile, but don’t interact with others; community members maintain profiles but also interact with others etc.
  • Different motivations to maintain profile: to share scholarship (most popular); improve status, enjoyment, support evaluation, quality of recommendations, external pressure (least popular)
  • Different use of metadata categories: people tend to use the person, publication, and research subject categories. Maybe research experience, but rarely education, award, teaching experience, other.
    • In Person most people put in first, last name, affiliation, dept;
    • Publication: Most use most of these, except that only 30% of readers share the file, compared with about 80% of record managers and community members
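
The three participation levels above can be sketched as a simple classification. This is only an illustration of the distinction drawn in the talk: the `Profile` fields and the extra "non-member" label are hypothetical, and treating someone who interacts without maintaining a profile as a reader is an assumption, since the talk did not cover that case.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    has_profile: bool
    maintains_profile: bool      # keeps publications, affiliation etc. up to date
    interacts_with_others: bool  # follows, comments, answers questions, and so on


def participation_level(p: Profile) -> str:
    """Map a profile onto the reader / record manager / community member levels."""
    if not p.has_profile:
        return "non-member"  # hypothetical label, not from the talk
    if p.maintains_profile and p.interacts_with_others:
        return "community member"
    if p.maintains_profile:
        return "record manager"
    return "reader"


if __name__ == "__main__":
    print(participation_level(Profile(True, True, False)))
```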

Want to develop design recommendations to enable RIMS to increase participation.

Research and non-publications repositories, Open Science #or2017


OpenAIRE-Connect: Open Science as a Service for repositories and research communities by Paolo Manghi, Pedro Principe, Anthony Ross-Hellauer, Natalia Manola

Project 2017-19 with 11 partners (technical, research communities, content providers) to extend technological services and networking bridges – creating open science services and building communities. Want to support reuse/reproducibility and transparent evaluation around research literature and research data, during the scientific process and in publishing artefacts and packages of artefacts.

Barriers – repositories lack support (eg integration, links between repositories). OpenAIRE want to facilitate new vision so providing “Open Science as a Service” – research community dashboard with variety of functions and catch-all broker service.

RDM skills training at the University of Oslo by Elin Stangeland

Researchers using random storage solutions and don’t really know what they’re doing. Need to improve their skills. Have been setting up training for various groups in organisation. Software Carpentry for young researchers to make their work more productive and reliable. 2-day workshops which are discipline-specific and well-attended. Now running their own instructor training which allows expanding service. Author carpentry, data carpentry, etc.

Training for research support staff who are first port of call on data management plans, data protection, basic data management. Recently made mandatory by Dept of GeoSciences to attend DMP training.

Expanding library carpentry to national level.

IIIF Community Activities and Open Invitation by Sheila Rabun

Global community that develops shared APIs for web-based image delivery; implements them in software; to expose interoperable image content.

Many image repositories are effectively silos. IIIF APIs provide a layer that lets servers talk to each other and allows easier management and better functionality for end-users. Lots of image servers and clients around now so you can mix-and-match your front and back-ends. Can have deep zoom; compare images and more.
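
As a rough illustration of why any client can talk to any server: in the IIIF Image API an image request is just a URL built from a fixed sequence of parameters (region, size, rotation, quality, format). The sketch below builds such URLs. The server address and identifiers are hypothetical, and `size="max"` follows version 3 of the specification (older 2.x servers may expect `full` instead).

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL.

    Pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    The identifier is percent-encoded so that slashes inside it survive.
    """
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base, identifier):
    """URL of the image information document (sizes, tiles) for an image."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"


if __name__ == "__main__":
    base = "https://images.example.org/iiif"  # hypothetical server
    # Whole image at the largest size the server offers.
    print(iiif_image_url(base, "whale-0042"))
    # A deep-zoom style tile: a 1024x1024 pixel region scaled to 512 pixels wide.
    print(iiif_image_url(base, "whale-0042", region="0,0,1024,1024", size="512,"))
    print(iiif_info_url(base, "whale-0042"))
```

Because every compliant server answers the same URL pattern, a viewer written against one repository can request tiles from another without custom integration.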

Everything created by global community so always looking for more participants. Community groups, technical specification groups eg extending to AV resources, discovery, text granularity (in text annotations). Also a consortium to provide leadership and communication channels.

Data Management and Archival Needs of the Patagonian Right Whale Program Data by Harish Maringanti, Daureen Nesdill, Victoria Rowntree

Importance of curating legacy datasets. World’s longest continuous study of large whale species: 47 years and counting of data. Two problems:

  • to identify whales – found the callosities of right whales were unique (number, position, shape) and pattern remained same despite slight change over time. So can take aerial photos when they surface. Data analysed with complicated computer system and compared with existing photos.
  • to gather data over a period of time – where to find whales regularly. Discovered whales gather in three places: 1) mothers and calves; 2) males and females; 3) free-for-all.

Collection has tens of thousands of b&w negatives; color slides; analysis notebooks; field notes; Access 1996 database records; sightings maps.

Challenges: heterogeneity of data; metadata – including how much can be displayed publicly; outdated databases.

Why should libraries care? We can provide continuity beyond life of individual researchers. Legacy data is as important as current data in biodiversity type fields and generally isn’t digitised yet.

Repository driven by the data journal: real practices from China Scientific Data by Lili Zhang, Jianhui Li, Yanfei Hou

China Scientific Data is a multidisciplinary journal publishing data papers – raw data and derived datasets. Submission (of paper and dataset), review (paper review and curation check), peer review, editorial voting.

How to publish:

  • Massive data? – on-demand sample data publication: can’t publish the whole set, but publishes a sample (typical, minimum sized) to announce the dataset’s existence
  • Complex data? – publish data and supplementary materials together eg background info, software, vocabulary, code, etc. Eg selected font collections for minority languages
  • Dynamic data? – eg when updating with new data using same methodology and data quality control. Could publish as new paper but it’s duplicative so published instead as another version with same DOI. Can be good for your citations!

Encourage authors to store data in their repository so its long-term availability is more reliable.

RDM and the IR: Don’t Reuse and Recycle – Reimplement by Dermot Frost, Rebecca Grant

We all have IRs and they’re designed for PDF publications. Research Data Management is largely driven by funder mandates; some disciplines are very good at it, some less so (eg historians claiming “I have no data” – having just finished a large project including land ownership registries from 17th century, georectified etc!)

FAIR (findable, accessible, interoperable, reusable) data concept (primarily machine-oriented ie findable by machines). IRs can’t do this well enough. Technically uploading a zip file is FAIR but time-costly to user.
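
To make "findable by machines" a little more concrete, here is a minimal sketch of a machine-readable dataset description that could be published alongside the files themselves. The talk did not name a metadata vocabulary; using schema.org's `Dataset` type as JSON-LD is an assumption made for illustration, and the DOI, URLs and title are placeholders.

```python
import json


def dataset_jsonld(name, description, doi, license_url, download_url, encoding_format):
    """Describe a dataset with schema.org terms so that crawlers can index it."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "identifier": f"https://doi.org/{doi}",  # persistent identifier (findable)
        "license": license_url,                  # explicit reuse terms (reusable)
        "distribution": [
            {
                "@type": "DataDownload",
                "contentUrl": download_url,         # direct link (accessible)
                "encodingFormat": encoding_format,  # open format (interoperable)
            }
        ],
    }


if __name__ == "__main__":
    record = dataset_jsonld(
        name="Land ownership registries (placeholder title)",
        description="Georectified transcriptions of 17th-century land registries.",
        doi="10.1234/example",  # placeholder DOI
        license_url="https://creativecommons.org/licenses/by/4.0/",
        download_url="https://repository.example.org/files/registries.csv",
        encoding_format="text/csv",
    )
    print(json.dumps(record, indent=2))
```

A zip file with no such description is still downloadable, but a machine has nothing to index beyond the file name.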

Instead should find a domain-specific repository (and read the terms and conditions carefully especially around preservation!) Or implement your own institutional data repository (but different scale of data storage can take serious engineering efforts). Follow the Research Data Alliance.

Developing a university wide integrated Data Management Planning system by Rebecca Deuble, Andrew Janke, Helen Morgan, Nigel Ward

Need to help researchers across the life-cycle. UofQueensland identifying opportunity to support researchers around funding/journal requirements. Used DMP Online but poor uptake due to lack of mandate. UQ Research Data Manager system:

  • Record developed by researcher – active record (not plan though includes project info) which can change over course of project. Simple dynamic form, tailored to researchers, with guidance for each field.
  • Storage auto-allocated by storage providers for working data – given a mapped drive accessible by national collaborators (hopefully international soon) using code provided in completing the form.
  • [Working on this part] Publish and manage selected data to managed collection (UQ eSpace). Currently manual process filling in form with metadata fields in eSpace. Potential to transfer metadata from RDM system to eSpace.
  • Developing procedures to support the system.

Benefits include uni oversight of research in progress, researcher-centric, improves impact/citation, provides public access to data.

Preserving and reusing high-energy-physics data analyses by Sünje Dallmeier-Tiessen, Robin Lynnette Dasler, Pamfilos Fokianos, Jiří Kunčar, Artemis Lavasa, Annemarie Mattmann, Diego Rodríguez Rodríguez, Tibor Šimko, Anna Trzcinska, Ioannis Tsanaktsidis

Data very valuable – data published even 15 years after funding stopped, and although always building new and bigger colliders, data is still relevant even decades after collected.

Projects involve 3000 people, including high turnover of young researchers. CERN need to capture everything needed to understand and rerun an analysis years later – data, software, environment, workflow, context, documentation.

  • Invenio (JSON schema with lots of domain-specific fields) to describe analysis
  • Capture and preserve analysis elements
  • Reusing – need to reinstantiate the environment and execute the analysis on the cloud.
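
A minimal sketch of the "capture everything" idea: a record describing an analysis, checked against the six elements listed above (data, software, environment, workflow, context, documentation). This is not the actual schema or any Invenio API; the field contents and the helper name are hypothetical.

```python
import json

# The six elements the talk says are needed to rerun an analysis years later.
REQUIRED_ELEMENTS = ("data", "software", "environment", "workflow", "context", "documentation")


def missing_elements(record):
    """Return the required elements that are absent or empty in an analysis record."""
    return [key for key in REQUIRED_ELEMENTS if not record.get(key)]


if __name__ == "__main__":
    # Hypothetical record; every value here is a placeholder.
    analysis = {
        "data": ["dataset-a-placeholder", "dataset-b-placeholder"],
        "software": {"repository": "https://git.example.org/analysis", "commit": "placeholder"},
        "environment": {"container_image": "example/analysis-env:1.0"},
        "workflow": ["skim", "fit", "plot"],
        "context": "Placeholder description of the physics question.",
        "documentation": "",
    }
    print(json.dumps(analysis, indent=2))
    print("Missing before the record is complete:", missing_elements(analysis))
```

A check like this could run at deposit time, so that gaps are caught while the people who know the answers are still on the project.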

REANA = REusable ANAlyses supports collaboration and multiple scenarios.