State of Open Data 2019 #FigshareFestNZ

State of Open Data 2019 report
Dr Mark Hahnel, Figshare

EU reckons it could save 10.2 billion euros a year using FAIR data. In the US the OPEN Government Data Act has been signed. NIH has said it requires everyone to make all their data available – but where are they going to put it?

FAIR – at least start with F and don’t name it “dataset.xlsx” because that’s hardly findable.

Figshare/Zenodo/Dryad – increasing uptake over time 20-30% year on year. Effects of EPSRC mandate in UK, NSFC mandate in China are significant. NZ is actually following the same trend even without mandates – global culture seems to be having an effect.

arXiv:1907.02565 found that linking a paper to its dataset is associated with a 25% increase in citations

Annual survey – 8500 responses (three times as many as last few times)

  • 74.5% responses were extremely/somewhat likely to use other people’s data
  • 2/3 say funders should mandate data sharing, and 2/3 say that funders should withhold funding if people don’t
  • 66.4% think they don’t get enough credit for datasets – they want full citation, co-authorship (of paper based on their data – technically against norms but many do get it), consideration in job reviews, financial reward
  • Awareness of FAIR – slightly up but still <20% familiar, <30% heard of them

South African data repository (for compliance with a funder there)

NZ Research Information System #FigshareFestNZ

NZ Research Information System (NZRIS) update.
Chris Dangerfield, MBIE

Concept model covering goal/purpose, resources, requests for resources, asset pools (eg funds), awards, projects, activities, people, teams, proposals, organisations…

Data model divided into two sides “asset pool managers” (funders) and “research, science and innovation managers” (eg unis, CRIs)

Implementation Phase 1 – plan to release in March with publicly available historic data (from 2009 to present), with mostly mandatory data resulting from HRC proposals, Marsden Fund, Endeavour Fund, Sustainable Farming Fund, Partnerships, etc.

Phase 2 – bringing on more funding organisations and bringing in optional data

Review of phase 1 – data sovereignty, privacy, engaging with Māori stakeholders, creating engaging visualisations, improving data quality

Phase 3 – bring on unis, CRIs, etc in 2021-2022 – looking at projects, outputs (including datasets)

More information

Preparing the healthcare workforce #FigshareFestNZ

Preparing the healthcare workforce to deliver the digital future – the good, the bad and the ugly.
Dr Ehsan Vaghefi

Lots of lessons learned through commercialisation involving AI.

The Good

  • Great for science – IBM’s Watson Oncology can provide evidence-based treatment options, generate notes and reports, etc: the oncologist then audits this. Enables the hospital to increase capacity as AI is doing heavy lifting. Would it replace radiologists? Some yes; but other jobs have been created to work with the AI.
    • linking diseases to different genetic profiles
    • predicting possible treatments/vaccines for testing
    • AI-assisted cardiac imaging system
  • Gift of time – clinicians will have more time to focus on interacting with patients
  • (Good reads: “The patient will see you now”)
  • Ophthalmology/optometry relying heavily on pattern recognition eg AI often more accurate detecting cataracts; can match accuracy detecting glaucoma (which you otherwise don’t know you have until too late); can match accuracy in diabetic screening

The Bad

  • Implementation – customer request, design, documentation, customer actual needs often all very different!
    • Eg in one case, giving clinicians more information slowed them down and made them worse. Clinicians are scared of AI so start second-guessing themselves. They do get faster with the AI with more practice – but never reach their unassisted screening rates! Similar study in Thailand – when gathering data, clinicians only passed on the good data that they were confident about. So when AI tried to deal with ambiguous situations it didn’t cope.

The Ugly

  • DeepMind Health got more than 1 million anonymised eye scans with related medical records – then sold themselves to Google. (In 2017 UK ruled that the NHS had broken the law in providing medical records.) Microsoft partnering with Prasad Eye Institute in India. IBM acquiring Merge Healthcare and IBM Watson analysing glaucoma images for deep learning. Streams medical diagnosis app to help you self-manage your health – and provides the results to hospital and your insurance company… Zorgprisma Publiek “pre-clinical decision-maker” helps “avoid unnecessary hospitalizations” – in practice the hospital can see in advance that you’ll be a costly patient and not admit you.
  • Re-identification – based on a single photo you can guess so much about a person you can start to work out who the person is.
  • AI bias – racism – based on incomplete datasets. Eg police using AI to assign risk factor based on risk background and face but because it’s got lots of racially biased data, it produces racially-biased risk factors. Eg a health-care algorithm where only 17.7% of patients receiving extra care were black (should have been 46.5%). Vital to be very careful about data collection – who’s contributing and not contributing – and invest more in diversifying the AI field itself.

Is ethical AI an oxymoron? Need to work out data ownership, governance, custodianship, security, impact on future.

Five pillars of ethical AI

  • Transparency (informed consent etc)
  • Justice and fairness (make sure you’re not missing parts of the community)
  • Non-maleficence
  • Responsibility
  • Privacy

Is ethical AI a bargain/contract? A bargain struck between data sources and data users. Science needs data so it must be shared – but what benefit does the data source receive? Next evolution of big data in healthcare is “learning health systems” so instead of just holding your information the system can learn about you and give you better treatment.

Is privacy always beneficial? Sometimes sharing the data with an AI lets you get a better treatment plan.

A roadmap: “First do no harm”. Choose the right problem, don’t go fishing for data, and make sure that when gathering data the population understands everything about the research.

Removing barriers to sharing for the benefit of Māori #FigshareFestNZ

Data for whom? Removing barriers to sharing for the benefit of Māori
Dr Kiri Dell, Ngapera Riley, CEO figure.nz

Ref Decolonising Methodologies by Linda Tuhiwai Smith

The academy privileges a certain type of knowing, but indigenous people have other ways as well (which we all use to some degree) eg

  • Sense perception – I felt it
  • Imagination – I envisioned it
  • Memory – I remembered it
  • Inherited – My nanny told me
  • Faith – God told me

Example of using data badly: MOTU (economic research centre) put out research a few years ago comparing Māori and other ethnicities and concluding that collectivist beliefs were holding Māori back from economic success. This was not taken well by Māori…. Researchers made sweeping statements about Māori culture where they had no research; compared Māori to completely different groups (eg African Americans, Chinese) with different histories and belief systems; interpreted through white male lens.

Basho haiku
Pull the wings off a dragonfly and look – you get a red pepperpod!
Add wings to a pepperpod, and look – you get a red dragonfly

Figure.nz set up as charity to democratise data – aims to provide valid and ethical data. Data is important but so is context and people behind it. Draws licensed data from over a hundred sources. Exists for the benefit of Aotearoa so believe if they can get data right for Māori they can get it right for all. Kaupapa that data is for everyone not just experts.

Partnering with Te Mana Raraunga – Māori Data Sovereignty Network; and with nine government agencies who’ve got a lot of data that was never meant to be shared so navigating benefits and dangers of sharing. How will data be used, for whom, why?

Data is never perfect – it’s just one tool alongside experience and connections. Māori data has traditionally not been collected well – cf especially the latest census – so have to be careful about conclusions drawn.

Figure.NZ – over 44,000 charts and datasets (CSVs and images) around people, travel, health, education, employment, economy, environment, social welfare, technology, broken down by geographic area. Very careful to publish metadata around sources etc.

Original data was a mess so have been working hard to tidy it up. Check sources, make sure it’s statistically valid (no small datasets) – have a robust process to work with source to make sure the metadata explains methodology and context.

Focused on public aggregate data but starting to use other sources. Wondering how to safely share research data. Excited to see people have started publishing theses etc with CC licensing.


Rolling out research for public good #FigshareFestNZ

The bumpy road to rolling out the Index of Multiple Deprivation
Assoc Prof Daniel Exeter

IMD looks at deprivation in different areas eg employment, education, income, housing, access to services. Most rural areas in the South Island are not very deprived, but the east and north of the North Island are. In Christchurch there are also various quintile 5 areas surrounded by quintile 3 areas – very sharp north-west/south-east divide. Can drill down to see what’s driving deprivation in each area.

Want to maximise the impact of the public good money so put up website and made all datasets and reports freely available, easy to download, with full attribution. But first hurdle was getting data out of StatsNZ IDI who are very cautious – had to work closely with them pointing out their data was very aggregated. Published papers with journal who wanted data and wouldn’t accept institutional site they’d painstakingly created. [In fairness, institutions do mess up their websites frequently so it’s not good for long-term storage!] So quickly created a Figshare website – one advantage is other institutions can use API to access this data.

Bumpy ride = mostly interesting challenge. Eg

  • someone creating a survey wanting to know people’s geographical area without asking for actual address to make it easier to manage data. Created something where users could enter address and it’d convert it to the area along with IMD data
  • looking at poor oral health of children per deprivation per ethnicity
  • getting it into policy – mostly from getting emails from people asking how to interpret something. One DHB in North Island interacts with 5 local govt authorities and was asked to create one per authority – lots of work to recreate this at that level but achieved
  • someone using it for climate change – heat vulnerability index including IMD, social isolation, age bands
  • noted some areas ended up not included in census which affects accuracy
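
A minimal sketch of the survey idea in the first bullet: swap a respondent's address for an area code plus its IMD quintile, so the address itself never needs to be stored. This is not the team's actual tool. The lookup tables, zone codes and function names are all hypothetical, and a real version would need proper geocoding rather than exact string matching.

```python
def normalise(address: str) -> str:
    """Lower-case, drop commas and collapse whitespace so near-identical entries match."""
    return " ".join(address.lower().replace(",", " ").split())


# Hypothetical lookup tables; a real tool would geocode against official boundaries.
ADDRESS_TO_ZONE = {
    normalise("1 Example Street, Exampleton"): "ZONE-A",
    normalise("22 Sample Road, Sampleville"): "ZONE-B",
}
ZONE_TO_QUINTILE = {"ZONE-A": 5, "ZONE-B": 2}  # made-up quintiles


def area_for_address(address: str):
    """Return the area code and IMD quintile for an address, or None if unknown."""
    zone = ADDRESS_TO_ZONE.get(normalise(address))
    if zone is None:
        return None
    return {"zone": zone, "imd_quintile": ZONE_TO_QUINTILE[zone]}


# A survey response: the address is removed and replaced by area-level fields.
response = {"age_band": "25-34", "address": "1 example street,  EXAMPLETON"}
area = area_for_address(response.pop("address"))
response.update(area or {"zone": None, "imd_quintile": None})
print(response)  # {'age_band': '25-34', 'zone': 'ZONE-A', 'imd_quintile': 5}
```

The point of the design is that only the area-level fields survive in the stored response, which is what makes the survey data easier to manage.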

Funded by Health Research Council but most uptake through govt, eg also Canterbury Wellbeing Index; Ministry of Education, Alcohol Health Watch lets you type in an address and get a report to make a submission against liquor licenses

Are big data and data sharing the panacea? It’s fantastic but there are big issues around too: attribution, ethics, risk of data gerrymandering, need to use theory to inform methods. Sometimes people use the IMD in ways that are useful but potentially misleading. Institutions are playing catch-up, not yet sure how to deal with this. Hopes the community being built around Figshare etc will help develop solutions and best practices.


Developing research data services #FigshareFestNZ

I talked about our implementation and various integrations, including glitches along the road, eg

  • being “piggy in the middle” between Figshare and our ITS trying to troubleshoot CNAME issues without knowing what a CNAME is
  • setting up emails so that Figshare system emails come from a Lincoln address – while our ITS is cracking down on phishing and Figshare emails look to our system like they’re spoofed.
  • our first attempt at an HR feed only sent through academics, not postgrads; now redoing this from scratch with new extract scripts, and Figshare is working out how to deal with the ~ we use to signify someone doesn’t have a first name
  • indexing in LibrarySearch (Primo) using the OAI-PMH feed – also in data.govt.nz and DigitalNZ
  • Next year implementing Repository Tools 2 in Elements which will let researchers deposit datasets from there – doing pre-work to make sure we can have two user IDs in the HR feed: one to integrate with the login system, the other one to integrate with Elements
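
For anyone wondering what harvesting an OAI-PMH feed involves, here is a rough sketch using only the Python standard library. It builds a standard `ListRecords` request and pulls identifiers and titles out of an `oai_dc` response. The base URL and the sample response are made up for illustration; this is not how Primo itself does the harvest.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH and Dublin Core specifications.
OAI_NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text):
    """Extract the identifier and first dc:title from each record in a response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", OAI_NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=OAI_NS)
        title = record.findtext(".//dc:title", namespaces=OAI_NS)
        records.append({"identifier": identifier, "title": title})
    return records


# Hypothetical response, trimmed to the parts the parser reads.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:123</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://repository.example.org/oai"))
print(parse_records(SAMPLE_RESPONSE))
```

A real harvester would also follow the `resumptionToken` that OAI-PMH uses to page through large result sets, which this sketch leaves out.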

Salila Bryant from ESR talked about the background behind implementing Figshare as an institutional repository, and some of the challenges (including getting support from the other side of the world; getting researchers on board with publishing data; covering privacy, metadata requirements etc).

Katy Miller from Victoria University of Wellington used to have Rosetta for everything but researchers couldn’t interact with it. Ended up choosing Figshare as it interacts with Elements; easy for academic staff to deposit; modern interface; ‘just right’ – looks, well indexed etc; proven solution. Lots of decisions needed to be made to set it up: thought it’d all be simple but turns out not to be as simple as it looks.

  • Groups – most places use academic units, but this is greatly in flux at VUW so are probably going to use publication types as the groups (eg a group for theses etc)
  • Metadata – keeping it light to focus on their key aims
  • Mediation – trying for no mediation between deposit and availability
  • Scope – starting with journal articles and conference papers as this is mainly to support OA mandate. Theses will be next (and will need to add postgrads into the system) – then think about research data down the track.
  • DOIs – grappling with how we handle situations where we publish a preprint with a DOI and later the published version becomes available with a different DOI? Currently planning to use the publisher DOI field in Figshare when the published DOI becomes available.

Laura Armstrong, UofAuckland talking about engaging with researchers. Talk about their “data publishing and discovery service” but still at https://auckland.figshare.com Currently 236 datasets in data.govt.nz; 666 DOIs minted year-to-date.

  • Researchers want DOIs (often not knowing why); discoverability (via Google); browser preview; branding/match website; metrics
  • Have group sites for different use cases eg conferences, research groups
  • Process – usually researcher requests; then discuss use case, what they want to achieve, who’s the owner, etc; then create a group site based on this, and figshare uploads branding overnight; they make it live but not public; once at least one item is published, the site is made public.
  • User types – most people only assigned to one group and can only publish there. Can set people up as admins for multiple groups, they can then publish to any of those groups. For external users, an internal person in group X can create an unpublished project and invite the external person to it: project items then get published to group X.

Simon Porter and Jared Watts, Digital Science looking at visualisation of collaboration networks based on Jupyter Notebook using Figshare API. Making available raw data in CSVs, visualisations, and technical report “This Poster is Reproducible”.
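
This is not the presenters' notebook, just a small sketch of the underlying idea: turn a list of items with author lists into weighted co-authorship edges that a network visualisation could consume. The sample records are invented, and the `authors` / `full_name` field names are my assumption about the shape of the item metadata.

```python
from collections import Counter
from itertools import combinations


def coauthor_edges(articles):
    """Count how often each pair of authors appears on the same item."""
    edges = Counter()
    for article in articles:
        # Deduplicate and sort names so (A, B) and (B, A) count as one edge.
        names = sorted({author["full_name"] for author in article.get("authors", [])})
        for pair in combinations(names, 2):
            edges[pair] += 1
    return edges


# Invented sample records standing in for metadata fetched from an API.
sample_articles = [
    {"title": "Dataset A", "authors": [{"full_name": "Ana"}, {"full_name": "Ben"}]},
    {"title": "Poster B", "authors": [{"full_name": "Ben"}, {"full_name": "Ana"}, {"full_name": "Cy"}]},
    {"title": "Thesis C", "authors": [{"full_name": "Cy"}]},
]

edges = coauthor_edges(sample_articles)
for (first, second), weight in sorted(edges.items()):
    print(first, second, weight)
# Ana Ben 2
# Ana Cy 1
# Ben Cy 1
```

Single-author items contribute no edges, so they would appear in a visualisation only if nodes are added separately.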

National DOI consortium #FigshareFestNZ

National DOI consortium
Andrea Goethals, National Library of New Zealand

DOIs: persistent and unique identifiers for digital objects. Support the sharing of research as part of creating FAIR data (eg principle F1 “(meta)data are assigned a globally unique and persistent identifier”). Mostly assigned by either DataCite (especially for data and grey literature) or Crossref (especially for journal articles etc).

DataCite’s model requires being a member – either direct member, or consortial – New Zealand has done the latter. Consortial is cheaper, can leverage each others’ skills and expertise, have more influence, and share strategies. Each consortium member might have one or more repositories. Responsibilities: full autonomy creating your own DOIs using their tools; communicating with institution’s researchers; and paying annual fees.

Consortial fees:

  • Membership fee: 2000 euros / number of members
  • Repository fee: 500 euros per repository
  • DOI fee: 500 euros for up to 10,000 DOIs
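
As a rough worked example of the fee structure above: a hypothetical consortium of 10 members, where one member runs 2 repositories and mints a few hundred DOIs a year. Reading the DOI fee as a flat 500 euros per member is my assumption; the notes don't say whether it applies per member or per repository, or what happens above 10,000 DOIs.

```python
def annual_fee_eur(n_members: int, n_repositories: int, n_dois: int) -> float:
    """Estimate one consortium member's annual fee in euros."""
    if n_dois > 10_000:
        raise ValueError("the notes only give the DOI fee for up to 10,000 DOIs")
    membership_share = 2000 / n_members    # membership fee split across members
    repository_fee = 500 * n_repositories  # charged per repository
    doi_fee = 500                          # assumed flat fee up to 10,000 DOIs
    return membership_share + repository_fee + doi_fee


print(annual_fee_eur(n_members=10, n_repositories=2, n_dois=300))  # 1700.0
```

Here the membership share is 200 euros, the two repositories cost 1000 euros and the DOI fee is 500 euros, so the fixed 2000 euro membership fee gets cheaper per institution as more members join.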

Minting DOIs – at UoA can either create manually through a form or uploading a file; or by API automatically.
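
To give a feel for the API route, here is a sketch that builds the kind of JSON body the DataCite REST API accepts for creating a DOI. It only builds the request and doesn't send anything. The prefix, URL and metadata are placeholders, and this is not UoA's actual integration. As I understand it, supplying only a prefix lets DataCite generate the suffix, but check that against their documentation.

```python
import json

DATACITE_DOIS_URL = "https://api.datacite.org/dois"  # DataCite REST API endpoint


def build_doi_request(prefix, url, title, creators, publisher, year):
    """Build a JSON:API body asking DataCite to create and publish a DOI."""
    return {
        "data": {
            "type": "dois",
            "attributes": {
                "prefix": prefix,    # suffix left for DataCite to generate (assumption)
                "event": "publish",  # make the DOI findable straight away
                "url": url,          # landing page the DOI resolves to
                "titles": [{"title": title}],
                "creators": [{"name": name} for name in creators],
                "publisher": publisher,
                "publicationYear": year,
                "types": {"resourceTypeGeneral": "Dataset"},
            },
        }
    }


# Placeholder values for illustration only.
body = build_doi_request(
    prefix="10.1234",
    url="https://repository.example.org/datasets/42",
    title="Example dataset",
    creators=["Researcher, Example"],
    publisher="Example University",
    year=2019,
)
print(json.dumps(body, indent=2))
```

Sending it would be a POST to `DATACITE_DOIS_URL` with the repository's credentials and a `Content-Type` of `application/vnd.api+json`.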

Also have an NZDOI Interest Group – open to anyone to join.

Have more information at National Library site.

Building research data services at NeSI #FigshareFestNZ

Building research data services at NeSI
Brian Flaherty, NeSI

NeSI is a collaboration supporting researchers to tackle large problems (ie super-computers). Core services around HPC, consultancy, training. Two supercomputers Mahuika and Māui, a couple dozen staff, covering a range of disciplines.

2011-2014 mostly about computer infrastructure with a little storage and consultancy

2014-2019 shifting

2019-future looking at research platforms, virtual labs, scientific gateways

Data management – mostly want to deal with the active data: collection, pre-processing, analysis and modelling, repurposing pre-publication.

Refreshing its offering around transferring data – looking at a national data transfer platform. Nodes at UoA, NIWA in Wellington, AgResearch in Christchurch, and Dunedin. (There’s a big need for an Australasian platform as lots of data gets sent back eg to the Garvan Institute sequencing laboratory. Lots of hard drives still being shipped.) transfer.nesi.org.nz has a point-and-click transfer interface.

Automated workflows: Genomics Aotearoa doing a project sequencing taonga species eg the kākāpō. DoC was storing data in the cloud in Australia; Ngāi Tahu weren’t happy so have brought it back into NZ stored with NeSI. Genomics Aotearoa Data Repository being developed at NeSI – starting small (“don’t try to boil the ocean”), with downloading, storing, sharing data with group-based access control and group membership management. FAIR – so far at findable (just) and accessible (in that it’s sharable) but still working towards interoperable and reusable.

Indigenous data: a kāhui Māori working to make sure Māori data managed within a Māori context so need to map this into security, auditing, permissions process, etc. Work in progress. Underpinned by “Te Mata Ira” guidelines for genomic research with Māori.

Increasingly researchers want the data to be stored in the same place as the compute so it doesn’t have to be transferred backwards and forwards.

Security around sensitive data: have firewalls, multifactor authentication, need to look more at privacy policy and standards around health information security frameworks etc.

Curation: “maintaining, preserving, and adding value to digital data/object through its life cycle”. Eg transforming formats, evaluating for FAIRness. Knowing what to get rid of when storage is stretched.

Metadata: Creating README files – getting stuff out of people’s heads. RO-Crate as a way to package it in a human- and computer-readable way.
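
To make the RO-Crate idea concrete, here is a sketch of generating a minimal `ro-crate-metadata.json` following the RO-Crate 1.1 layout as I understand it: a metadata descriptor, a root dataset, and one entry per file. The dataset name, file names and licence are invented, and this is not NeSI's implementation.

```python
import json


def minimal_ro_crate(name, description, date_published, license_url, files):
    """Build a minimal RO-Crate 1.1 metadata document.

    `files` maps each relative file path to a human-readable label.
    """
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {   # descriptor: says this file describes the root dataset "./"
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {   # root dataset: the package as a whole
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "description": description,
                "datePublished": date_published,
                "license": {"@id": license_url},
                "hasPart": [{"@id": path} for path in files],
            },
            # one entity per packaged file
            *[{"@id": path, "@type": "File", "name": label} for path, label in files.items()],
        ],
    }


# Invented example content.
crate = minimal_ro_crate(
    name="Example sequencing run",
    description="Raw reads and a README describing how they were produced.",
    date_published="2019-11-01",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    files={"README.md": "Human-readable notes", "reads.fastq": "Raw sequence reads"},
)
print(json.dumps(crate, indent=2))
```

The README stays the human-readable part, and the JSON file sitting next to it is what lets software work out what each file in the package is.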