Building research data services at NeSI
Brian Flaherty, NeSI

NeSI (New Zealand eScience Infrastructure) is a collaboration that supports researchers in tackling large problems using supercomputers. Its core services are HPC, consultancy and training. It runs two supercomputers, Mahuika and Māui, and has a couple of dozen staff covering a range of disciplines.

2011-2014: mostly about computing infrastructure, with a little storage and consultancy.

2014-2019: a shift in focus.

2019 onward: looking at research platforms, virtual labs and scientific gateways.

Data management: NeSI mostly wants to deal with active data, i.e. collection, pre-processing, analysis and modelling, and repurposing before publication.

NeSI is refreshing its data transfer offering and looking at a national data transfer platform, with nodes at UoA (University of Auckland), NIWA in Wellington, AgResearch in Christchurch, and Dunedin. The platform has a point-and-click transfer interface. (There’s a big need for an Australasian platform, as lots of data is exchanged with sequencing labs such as the Garvan Institute. Lots of hard drives are still being shipped.)

Automated workflows: Genomics Aotearoa is running a project sequencing taonga species, e.g. the kākāpō. DoC was storing the data in the cloud in Australia; Ngāi Tahu weren’t happy with this, so the data has been brought back to NZ and is now stored with NeSI. The Genomics Aotearoa Data Repository is being developed at NeSI, starting small (“don’t try to boil the ocean”) with downloading, storing and sharing data, group-based access control, and group membership management. In FAIR terms it is so far findable (just) and accessible (in that it’s shareable), and is still working towards interoperable and reusable.
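To make "group-based access control and group membership management" concrete, here is a minimal in-memory sketch. It is not the Genomics Aotearoa Data Repository's implementation; the class and method names (`Repository`, `create_group`, `add_member`, `share`, `can_read`) are hypothetical. The assumed model is that datasets are shared with groups rather than individual users, and only a group's managers can change its membership or share data with it.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Hypothetical model: datasets are shared with groups, never directly with users."""
    members: dict[str, set[str]] = field(default_factory=dict)         # group -> users
    managers: dict[str, set[str]] = field(default_factory=dict)        # group -> managers
    dataset_groups: dict[str, set[str]] = field(default_factory=dict)  # dataset -> groups

    def create_group(self, group: str, manager: str) -> None:
        # The creator becomes both a member and the first manager.
        self.members[group] = {manager}
        self.managers[group] = {manager}

    def _require_manager(self, actor: str, group: str) -> None:
        if actor not in self.managers.get(group, set()):
            raise PermissionError(f"{actor} cannot manage group {group}")

    def add_member(self, actor: str, group: str, user: str) -> None:
        self._require_manager(actor, group)
        self.members[group].add(user)

    def remove_member(self, actor: str, group: str, user: str) -> None:
        # Removing a user from a group revokes access to everything shared with it.
        self._require_manager(actor, group)
        self.members[group].discard(user)

    def share(self, actor: str, dataset: str, group: str) -> None:
        self._require_manager(actor, group)
        self.dataset_groups.setdefault(dataset, set()).add(group)

    def can_read(self, user: str, dataset: str) -> bool:
        # A user can read a dataset if they belong to any group it is shared with.
        return any(
            user in self.members.get(group, set())
            for group in self.dataset_groups.get(dataset, set())
        )


repo = Repository()
repo.create_group("kakapo-team", "alice")
repo.add_member("alice", "kakapo-team", "bob")
repo.share("alice", "kakapo-genomes", "kakapo-team")
print(repo.can_read("bob", "kakapo-genomes"))    # True
print(repo.can_read("carol", "kakapo-genomes"))  # False
```

Granting access through groups keeps permissions manageable: adding or removing one person from a group updates their access to every dataset shared with it, and all membership changes pass through a single place where they could be audited.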

Indigenous data: a kāhui Māori is working to make sure Māori data is managed within a Māori context, so this needs to be mapped into security, auditing, permissions processes, etc. This is a work in progress, underpinned by the “Te Mata Ira” guidelines for genomic research with Māori.

Increasingly researchers want the data to be stored in the same place as the compute so it doesn’t have to be transferred backwards and forwards.

Security around sensitive data: NeSI has firewalls and multifactor authentication, but needs to look more at privacy policy and at standards such as health information security frameworks.

Curation: “maintaining, preserving, and adding value to digital data/object through its life cycle”, e.g. transforming formats and evaluating data for FAIRness. It also means knowing what to get rid of when storage is stretched.

Metadata: creating README files to get knowledge out of people’s heads, and using RO-Crate to package it in a human- and machine-readable way.
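As an illustration of RO-Crate packaging, the sketch below writes a minimal `ro-crate-metadata.json` for a directory that contains a README, using only the Python standard library. It assumes the RO-Crate 1.1 layout: a metadata descriptor entity, a root `Dataset` entity (`./`), and one `File` entity per packaged file. The function names and example file names are hypothetical, and a real crate would normally add further properties (e.g. license and publication date) that the specification recommends.

```python
import json
from pathlib import Path

# Assumed RO-Crate 1.1 identifiers.
RO_CRATE_CONTEXT = "https://w3id.org/ro/crate/1.1/context"
RO_CRATE_SPEC = "https://w3id.org/ro/crate/1.1"


def build_crate(name: str, description: str, files: dict[str, str]) -> dict:
    """Return a minimal RO-Crate metadata document.

    `files` maps a path relative to the crate root (e.g. "README.md")
    to a short human-readable description of that file.
    """
    file_entities = [
        {"@id": path, "@type": "File", "description": desc}
        for path, desc in files.items()
    ]
    return {
        "@context": RO_CRATE_CONTEXT,
        "@graph": [
            # Metadata descriptor: states that this JSON file describes the root dataset.
            {
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": RO_CRATE_SPEC},
                "about": {"@id": "./"},
            },
            # Root data entity: the directory being packaged.
            {
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "description": description,
                "hasPart": [{"@id": entity["@id"]} for entity in file_entities],
            },
            *file_entities,
        ],
    }


def write_crate(directory: Path, crate: dict) -> Path:
    """Write the metadata file into the root of the crate directory."""
    out = directory / "ro-crate-metadata.json"
    out.write_text(json.dumps(crate, indent=2), encoding="utf-8")
    return out


crate = build_crate(
    "Example sequencing run",
    "Raw reads plus a README explaining how they were produced.",
    {"README.md": "How the data was collected and processed",
     "reads.fastq.gz": "Raw sequencing reads"},
)
print(json.dumps(crate, indent=2))
```

Because the README stays an ordinary file listed in `hasPart`, people can still read it directly, while tools can parse the JSON-LD to find out what the crate contains.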

