Flexible, Secure and Sharable Storage for Researchers #theta2015

Flexible, Secure and Sharable Storage for Researchers (abstract)
Andrew Nielson and Stephen McGregor

Talked a lot to researchers. A quarter of researchers didn’t know how much storage they needed. Few needed more than 10TB. Built http://research-storage.griffith.edu.au/

Found existing services were uni-focused – hard to give access to external collaborators. Need to be competitive with cloud services. Want to let people collaborate with anyone they choose, without opening the data up to everyone. So there’s a form that lets researchers invite other users to sign in using a uni, Google, or LinkedIn account.

Needed multiple ways to share. Internal sharing – share with people by name. External sharing – provide a web URL with password protection / expiration date.
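
To make the external-sharing idea concrete, here is a minimal sketch of a password-protected link with an expiration date. It is not the Griffith implementation: the `ShareLink` class, the `create_share_link` and `can_access` functions, and the PBKDF2 iteration count are all assumptions made for illustration.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ShareLink:
    """Hypothetical record behind an external share URL."""
    token: str            # random, unguessable part of the URL
    salt: bytes           # per-link salt for the password hash
    password_hash: bytes  # never store the password itself
    expires_at: datetime  # link stops working at this time


def _hash_password(password: str, salt: bytes) -> bytes:
    # PBKDF2 from the standard library; the iteration count is an assumption.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def create_share_link(password: str, valid_days: int, now: datetime) -> ShareLink:
    salt = secrets.token_bytes(16)
    return ShareLink(
        token=secrets.token_urlsafe(16),
        salt=salt,
        password_hash=_hash_password(password, salt),
        expires_at=now + timedelta(days=valid_days),
    )


def can_access(link: ShareLink, password: str, now: datetime) -> bool:
    # Reject expired links first, then compare hashes in constant time.
    if now >= link.expires_at:
        return False
    return hmac.compare_digest(link.password_hash, _hash_password(password, link.salt))
```

Passing `now` in explicitly, rather than reading the clock inside the functions, keeps the expiry check easy to test.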

Device support: web interface plus apps including desktop sync apps.

Project spaces – you get 5GB storage by default but set up a project and storage space is unlimited. Space is a folder / “logical grouping of data”. When creating, have to include metadata for admin purposes (owner, project name, funder, backup contact). Instant approval and provision – don’t want to get in the way. Unless told to delete old / unaccessed data, just move to cheaper storage – effectively archiving off.
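
The "archive rather than delete" rule could be sketched as a small policy function. The function name `choose_tier`, the tier labels, and the one-year idle threshold are hypothetical; the talk did not say how long data must sit unaccessed before it is moved.

```python
from datetime import datetime, timedelta


def choose_tier(
    last_accessed: datetime,
    now: datetime,
    delete_requested: bool = False,
    max_idle: timedelta = timedelta(days=365),  # assumed threshold
) -> str:
    """Decide where a project's data should live.

    Recently used data stays on primary storage. Idle data is moved to
    cheaper storage unless the owner has explicitly asked for deletion.
    """
    if now - last_accessed < max_idle:
        return "primary"
    return "delete" if delete_requested else "archive"
```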

Block-level deduplication (basically store a reference to previously stored data) is better than single-instancing and has lower overhead than compression. Have managed to save 46% space this way. This is needed because the software stores an entire new version of a file instead of a diff. “Don’t keep backups” but do replicate/sync between their geographically separated datacenters.
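
A toy illustration of why block-level deduplication helps when every save stores a whole new version: split each file into blocks, key each block by its hash, and keep only a list of references per file. This is a sketch with fixed-size blocks and an in-memory dictionary, not how the Griffith storage system works internally; the `DedupStore` class and the tiny block size are assumptions for illustration.

```python
import hashlib


class DedupStore:
    """Toy block-level deduplicating store (fixed-size blocks, in memory)."""

    def __init__(self, block_size: int = 4) -> None:
        self.block_size = block_size  # unrealistically small, for readability
        self.blocks = {}              # digest -> block bytes, stored once
        self.files = {}               # file name -> ordered list of digests

    def put(self, name: str, data: bytes) -> None:
        refs = []
        for start in range(0, len(data), self.block_size):
            block = data[start:start + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            # Only store the block if this content has not been seen before.
            self.blocks.setdefault(digest, block)
            refs.append(digest)
        self.files[name] = refs

    def get(self, name: str) -> bytes:
        # Rebuild the file by following its references in order.
        return b"".join(self.blocks[d] for d in self.files[name])

    def stored_bytes(self) -> int:
        return sum(len(block) for block in self.blocks.values())


store = DedupStore(block_size=4)
store.put("report_v1", b"AAAABBBBCCCC")
store.put("report_v2", b"AAAABBBBDDDD")  # new version, only last block differs
```

The two versions total 24 bytes, but `store.stored_bytes()` is 16, because the shared `AAAA` and `BBBB` blocks are stored once. Single-instancing would save nothing here, since the two files are not identical as a whole.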

Used by Sciences but also Arts/Ed/Law, Business, and Health.
30% of projects (18 researchers) unfunded – data that would otherwise be on hard drives and uni wouldn’t even know it exists.

Developing and piloting more services including storage for use by instruments.
Currently administrators need to be hands-on to set up the service – want to automate.

Q: Mandate?
A: If you force it people get annoyed. Providing option.

Q: Funding going forward given that new data probably bigger?
A: Yeah… basically want to build it well, get data off hard drives, show popularity, and then write business case if/when new space needed. Nowhere near this need yet.

Audience comment: fantastic usability for researchers.
A: Getting feedback from researchers has helped this.

Q: Any data publication service in development?
A: Project focused on working storage. eResearch Services department are working on a system for post-publication storage.

Q: Is it accessible to computational services?
A: Another project in early stages working on computational needs. Data in this format isn’t ideal for putting on servers – technically possible but usually when people are doing stuff on a server they want their storage there too.