Flexible, Secure and Sharable Storage for Researchers (abstract)
Andrew Nielson and Stephen McGregor
Talked to researchers a lot. A quarter of them didn’t know how much storage they needed. Few needed more than 10TB. Built http://research-storage.griffith.edu.au/
Found existing services were uni-focused – hard to give access to external collaborators. Need to be competitive with cloud services. Want to let people collaborate with everyone, but not everyone. So there’s a form that lets researchers invite other users to sign in using a uni, Google, or LinkedIn account.
Needed multiple ways to share. Internal sharing – share with people by name. External sharing – provide a web URL with password protection / expiration date.
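A minimal sketch of how an external share link with a password and an expiry date might work. This is not the Griffith service's actual implementation, which the talk did not describe: the `ShareLink` class, the `create_share_link` and `check_access` functions, and the `storage.example.org` URL are all hypothetical names made up for illustration.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ShareLink:
    """Hypothetical record for one externally shared file or folder."""
    path: str
    token: str            # unguessable part of the URL
    salt: bytes
    password_hash: bytes  # the password itself is never stored
    expires_at: datetime

    @property
    def url(self):
        # Placeholder host, not the real service address.
        return f"https://storage.example.org/s/{self.token}"


def _hash_password(password, salt):
    # PBKDF2 from the standard library; the iteration count is an arbitrary choice.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def create_share_link(path, password, days_valid, now=None):
    """Create a link that stops working after `days_valid` days."""
    now = now or datetime.now(timezone.utc)
    salt = secrets.token_bytes(16)
    return ShareLink(
        path=path,
        token=secrets.token_urlsafe(16),
        salt=salt,
        password_hash=_hash_password(password, salt),
        expires_at=now + timedelta(days=days_valid),
    )


def check_access(link, password, now=None):
    """Allow access only before expiry and with the right password."""
    now = now or datetime.now(timezone.utc)
    if now >= link.expires_at:
        return False
    candidate = _hash_password(password, link.salt)
    # Constant-time comparison avoids leaking how much of the hash matched.
    return hmac.compare_digest(candidate, link.password_hash)
```

Usage would look like `link = create_share_link("project/data.csv", "s3cret", days_valid=7)`, then sending `link.url` and the password to the external collaborator through separate channels.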
Device support: web interface plus apps including desktop sync apps.
Project spaces – you get 5GB storage by default but set up a project and storage space is unlimited. Space is a folder / “logical grouping of data”. When creating one, have to include metadata for admin purposes (owner, project name, funder, backup contact). Instant approval and provisioning – don’t want to get in the way. Old / unaccessed data isn’t deleted unless they’re told to delete it; it’s just moved to cheaper storage – effectively archived off.
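A rough sketch of the project-space flow described above: require the four admin metadata fields, provision immediately, and move untouched spaces to a cheaper tier rather than deleting them. The field names come from the talk; everything else (the `ProjectSpace` class, the function names, the tier labels, and the one-year threshold) is an assumption made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

REQUIRED_FIELDS = ("owner", "project_name", "funder", "backup_contact")
ARCHIVE_AFTER = timedelta(days=365)  # assumed threshold; the talk did not give one


@dataclass
class ProjectSpace:
    owner: str
    project_name: str
    funder: str          # assumption: unfunded projects record e.g. "unfunded"
    backup_contact: str
    last_accessed: date
    tier: str = "primary"


def create_project_space(metadata, today):
    """Provision a space immediately if all admin metadata is present."""
    missing = [f for f in REQUIRED_FIELDS if not metadata.get(f)]
    if missing:
        raise ValueError(f"missing metadata: {', '.join(missing)}")
    fields = {f: metadata[f] for f in REQUIRED_FIELDS}
    return ProjectSpace(last_accessed=today, **fields)


def apply_tiering(space, today):
    """Move long-unaccessed spaces to cheaper storage; nothing is deleted."""
    if today - space.last_accessed > ARCHIVE_AFTER:
        space.tier = "archive"
    return space.tier
```

There is no approval step in the sketch on purpose: validation of the metadata is the only gate, which matches the "don't want to get in the way" goal.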
Block-level deduplication (basically store a reference to previously stored data instead of a second copy) is better than single-instancing and has lower overhead than compression. Have managed to save 46% of space this way. This is needed because the software stores an entire new version of a file each time, instead of a diff. “Don’t keep backups” but do replicate/sync between their geographically separated datacenters.
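To illustrate the idea of block-level deduplication, here is a toy sketch using fixed-size blocks and SHA-256 digests as references. The real storage system's block size and chunking method were not given in the talk; the `BlockStore` class and the 4-byte block size are assumptions chosen only to keep the example small.

```python
import hashlib

BLOCK_SIZE = 4  # tiny, for illustration only; real systems use much larger blocks


class BlockStore:
    """Toy content-addressed store: each unique block is kept once."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = {}  # digest -> block bytes (stored once)
        self.files = {}   # file name -> ordered list of digests

    def put(self, name, data):
        digests = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            # Only store the block if no identical block exists already.
            self.blocks.setdefault(digest, block)
            digests.append(digest)
        self.files[name] = digests

    def get(self, name):
        # Rebuild the file by following its block references in order.
        return b"".join(self.blocks[d] for d in self.files[name])

    def logical_size(self):
        # Bytes the files would occupy without deduplication.
        return sum(len(self.blocks[d]) for ds in self.files.values() for d in ds)

    def physical_size(self):
        # Bytes actually stored.
        return sum(len(b) for b in self.blocks.values())


store = BlockStore()
store.put("report_v1", b"AAAABBBBCCCCDDDD")
store.put("report_v2", b"AAAABBBBXXXXDDDD")  # full new version, one block changed
saving = 1 - store.physical_size() / store.logical_size()
```

Here the two versions total 32 logical bytes but only 20 bytes are stored, because the second version adds just one new block. Single-instancing would save nothing in this case, since the two files are not identical as a whole.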
Used by Sciences but also Arts/Ed/Law, Business, and Health.
30% of projects (18 researchers) unfunded – data that would otherwise be on hard drives and uni wouldn’t even know it exists.
Future:
Developing and piloting more services including storage for use by instruments.
Currently administrators need to be hands-on to set up the service – want to automate.
Q: Mandate?
A: If you force it people get annoyed. Providing it as an option.
Q: Funding going forward given that new data probably bigger?
A: Yeah… basically want to build it well, get data off hard drives, show popularity, and then write business case if/when new space needed. Nowhere near this need yet.
Audience comment: fantastic usability for researchers.
A: Getting feedback from researchers has helped this.
Q: Any data publication service in development?
A: Project focused on working storage. eResearch Services department are working on a system for post-publication storage.
Q: Is it accessible to computational services?
A: Another project in early stages working on computational needs. Data in this format isn’t ideal for putting on servers – technically possible but usually when people are doing stuff on a server they want their storage there too.