
eResearchNZ2013 Day 1 Wrap-up #nzes

Selected notes from the audience inspired by today’s sessions:

  • Synergies between sectors and between Australia and New Zealand. Ability to move to researcher-centric rather than infrastructure-centric approaches.
  • No connections apparent to government systems which are needed by digital humanities.
  • From experience, researchers need lots of help. The Australian ideal seems to be that it’s all there and easy to use on the desktop. A nice ideal, but how practical?
  • Data management and data curation are still “dragons in a swamp. We know there’s dragons there, don’t know what they look like, but we’re planning to kill them anyway.”
  • Need data management policy and a national solution. And if going to invest all this money in research don’t want to delete all the data so need to work on preservation too.
  • Good to see REANNZ looking at service level and tools. Lots to learn from Australia about where we need to put our efforts.
  • There is a policy direction from government around access and reuse of data. The challenge is how to implement this most effectively. Especially for publicly funded research (cf commercially sensitive research) there’s an expectation that there’d be access to the results and, where possible, the data. But still work to do.
  • Users who don’t get help can get something out of the system; but users who do get help can do a whole lot more. Hence software carpentry sessions. [Cf this blog post about software carpentry I coincidentally read today.]
  • Peer instruction becomes very important – need someone who’s doing similar things to come in and teach researchers and students.
  • Can embed slides, photos, etc into ‘abstract’ pages linked from the conference programme.
  • Many tools and skills are great to instil in people but don’t always fit with packages – eg version control doesn’t really work with MATLAB 🙁 (see the sketch after this list).
  • Therefore “the less software researchers write, the better”. There’s a limit to how much we can afford to maintain.
  • The benefit of software carpentry is that people can collaborate on software rather than each writing their own. The best software is what lots of people work on.
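
On the MATLAB point above: the friction is mostly with binary artifacts (.mat data, .fig figures, Simulink .slx models), since .m source files are plain text. A minimal .gitattributes sketch (my example, not from the talk) that at least stops Git trying to diff and merge the binary formats:

```
# .gitattributes – mark MATLAB binary artifacts so Git won't diff/merge them
*.mat binary
*.fig binary
*.slx binary
# .m source files are plain text and diff normally
*.m   text
```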

Design patterns for lab #labpatterns; Research cloud – #nzes

A pattern language for organising laboratory knowledge on the web #labpatterns
Cameron McLean, Mark Gahegan & Fabiana Kubke, The University of Auckland
Google Site

Lots of lab data hard to find/reuse – big consequences for efficiency, reproducibility, quality.
Want to help researchers locate, understand, reuse, and design data. Particularly focused on describing semantics of the experiment (rather than semantics of data).

Design pattern concept originated in field of architecture. Describes a solution to a problem in a context. Interested in idea of forces – essential/invariant concepts in a domain.

Kitchen recipe as example of design pattern for lab protocol.
What are the recurring features and elements? Forces for a cake-like structure include: structure providers (flour), structure modifiers (egg), flavours, aeration, and heat transfer.
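
As a concrete sketch of that anatomy, here’s how a pattern record might be modelled in Python – the field names are my own shorthand for illustration, not the authors’ schema:

```python
from dataclasses import dataclass, field

# A minimal sketch of a design pattern record, using the cake example above.
@dataclass
class Pattern:
    name: str
    context: str
    problem: str
    solution: str
    forces: list[str] = field(default_factory=list)  # invariant domain concepts

cake = Pattern(
    name="Cake-like structure",
    context="Kitchen baking",
    problem="Produce an aerated, stable baked structure",
    solution="Combine structure providers and modifiers, then apply heat",
    forces=["structure provider (flour)", "structure modifier (egg)",
            "flavours", "aeration", "heat transfer"],
)
```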

Apply this to lab science, in a linked science setting. Take a “Photons alive” design pattern (using light to visualise biological processes in an animal). See example paper. Can take a sentence of methodology and annotate eg “imaging” as a diagnostic procedure. Annotating with current ontologies gives you the What but not the Why. Need to tag with a “Force” concept eg “immobilisation”. Deeper understanding of process – with the role of each step. And can start thinking about what other methods of immobilisation there might be.

So how can we make these patterns? Need to use semantic web methods.
A wiki for lab semantics. (Wants to implement this.) Semantic forms on the wiki – a template. The wiki serves for attribution, peer review, publication – and as an endpoint to an RDF store.
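
A rough illustration, using Python’s rdflib, of what annotating a protocol step with both an ontology term (the What) and a Force concept (the Why) could look like – the namespaces, property names, and ontology identifier below are invented for illustration, not from the talk:

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS

# Invented namespaces; the talk did not specify which vocabularies to use.
LAB = Namespace("http://example.org/labpatterns/")
OBO = Namespace("http://purl.obolibrary.org/obo/")

g = Graph()
step = LAB["step/zebrafish-imaging-01"]

# The What: annotate the step against an existing ontology term
# (illustrative identifier, not checked against a real ontology).
g.add((step, RDF.type, OBO["OBI_0000185"]))

# The Why: tag the step with the Force it realises.
g.add((step, LAB.realisesForce, LAB["force/immobilisation"]))
g.add((LAB["force/immobilisation"], RDFS.label, Literal("immobilisation")))

print(g.serialize(format="turtle"))
```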

Q: How easy is this to use for a domain expert?
A: Semantic modelling is an iterative process and not easy. But a semantic wiki can hide complexity from the end user, so domain experts can just enter data.

Q: We spend lots of time pleading with researchers to fill out webforms. How else can we motivate them, eg to do it during the process rather than at the end?
A: Certain types of people are motivated to use a wiki. This is a first step, a proof of concept. Need a critical mass before it’s self-sustaining.

Q: How much use would this actually be for domain experts? Would people without implicit knowledge gain from it?
A: Need to survey this and evaluate. It’s valuable as a democratising process.

Q: What about patent/commercial knowledge?
A: Personally taking an open science / linked science approach – this is for research that’s intended to be maximally shared.

A “Science Distribution Network” – Hadoop/ownCloud synchronised across the Tasman
Guido Aben, AARNet; Martin Feller, The University of Auckland; Andrew Farrell, New Zealand eScience Infrastructure; Sam Russell, REANNZ

Have preferred to do one-to-few applications rather than Google-style one-to-billions. Now changing, because they’ve been experiencing trouble sending large files themselves. Scraped together their own file transfer system, marketed as cloudstor though it’s not in the cloud and doesn’t store things. Expected a couple of hundred users; got 6,838. Why linear growth? “Apparently word of mouth is a linear thing…” Seem to be known by everyone who has file-sharing issues.

FAQs:
Can we keep files permanently?
Can I upload multiple files?
Why called cloudstor when it’s really for sending?

“cloudstor+ beta” – looks like Dropbox, so why do this if that already exists? Because existing services are slow (hosted in Singapore or the US): cloudstor+ gets 30 MB/s cf 0.75 MB/s as a maximum for the other systems. Their pricing models aren’t geared towards large datasets. And they’re subject to PRISM etc.

Built on a stack:
Anycast | AARNet
ownCloud – best OSS they’ve seen/tested so far – has plugin system and defined APIs
MariaDB
Hadoop – but looking at substituting XtreemFS, which seems to cope with the latencies.

Distributed architecture – can be extended internationally. Would like one in NZ, Europe, US, then scale up.

Bottleneck is from desktop to local node. Only way they can address this is to get as close to researcher as possible – want to build local nodes on campus.

Official statistics; NeSI; REANNZ; Australian eResearch infrastructure – #eResearchNZ2013

Don’t know how long I’ll be live-blogging, but here’s the start of the eResearchNZ 2013 conference:

Some thoughts on what eResearch might glean from official statistics
Len Cook
* Research-based info competes with other sources of info people use to make decisions
Politicians are like weathercocks – they have to respond to the wind. Sources of info include: official stats, case studies, anecdote, and ideology/policy frameworks. More likely to hear anecdotes than research. NZ is data-rich but poor at getting access to existing data. Confidentiality issues: “Statisticians spend half the time collecting data and the other half preventing people from accessing it.” Need to shift ideas – recent shifts in legislation are a step towards this.

* Official statistics has evolved over the last few centuries
19th century: measurement developed to challenge policy. Florence Nightingale wanted to measure wellbeing in military hospitals, because the status quo was like taking hundreds of young men, lining them up and shooting them. Mass computation and ingenuity of graphical presentation – all by hand.
20th century: development of sampling, reliability, meshblocks. Common classifications, frameworks.
1990s and beyond: mass monitoring of transactions. Politics of info access/ownership important. Obligations created when data collected. Registers and identifiers now central. Importance of investing in metadata to categorise and integrate information.

* Managing data not just about technology – probably the reverse.

* Structural limitations. Need strong sectoral leadership. Need a chief information officer for a sector, not for government as a whole.

NeSI’s Experience as National Research Infrastructure
Nick Jones, New Zealand eScience Infrastructure
NZ is very good at scientific software. Also significant national investments in data (GeoNet, NZSSDS, StatsNZ, DigitalNZ, cellML, LRIS, LERNZ, CEISMIC, OBIS). But also significant (unintended) siloisation and no investment to break down barriers and integrate. However do have good capability. NeSI wants to enhance existing capabilities but also help people meet each other. Build up shared mission, collegiality.

Heterogeneous systems improve ability to deal with specific datasets, but increasingly need ability to adapt software. NeSI gives capability to support maturing of existing scientific computing capabilities.

CRIs are widespread. So are research universities. All connected by REANNZ (KAREN). Research becoming more highly connected, collaborative. National Science Challenges targeted to building collaboration too. But sector still fragmented and small-scale. “Each project creates, and destroys, its own infrastructure.”

Research eInfrastructure roadmap 2012 includes NZ Genomics Ltd (->Bioinformatics Cloud); BeSTGRID, BlueFern, NIWA (->NeSI); BeSTGRID Federation (Tuakiri); KAREN->REANNZ. There is a big gap in the area of research data infrastructure.

Need government investment to overcome coordination failure. Institutions should support national infrastructure. NeSI to create scalable computing infrastructure; provide middleware and user-support; encourage cooperation; contribute to high quality research outputs. In addition to infrastructure have team of experts to support researchers.

REANNZ: An Instrument for Data-intensive Science
Steve Cotter, REANNZ
Move from experimental -> theoretical -> computational sciences, and now to data-intensive science (see “The Fourth Paradigm”). Exponential data growth. Global collaboration and requirement for data mobility. “Science productivity is directly proportional to the ease with which we can move data.” Trend towards cloud-based services.

And a trend towards needing lossless networking. It’s easy to predict capacity for YouTube etc. But when simulating global weather patterns, datasets are giant and traffic is unpredictable – big peaks and troughs. TCP copes with small amounts of loss, but throughput can be crushed by packet loss over long paths – an 80x reduction in data transfer rates at NZ-type distances. So can’t rely on commercial networks.
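
That distance sensitivity is the classic TCP behaviour captured by the Mathis et al. model, where throughput is bounded by MSS / (RTT · √loss): the same loss rate hurts roughly in proportion to round-trip time. A quick sketch (the packet size, RTT, and loss figures are illustrative, not from the talk):

```python
import math

def tcp_throughput_bps(mss_bytes: int, rtt_s: float, loss: float) -> float:
    """Mathis et al. bound: throughput <= MSS / (RTT * sqrt(p)), in bits/s."""
    return 8 * mss_bytes / (rtt_s * math.sqrt(loss))

# Same 0.1% loss, very different outcomes on a short vs NZ-length path.
for label, rtt in [("campus, 2 ms", 0.002), ("trans-Pacific, 160 ms", 0.160)]:
    print(f"{label}: ~{tcp_throughput_bps(1460, rtt, 0.001) / 1e6:,.0f} Mbit/s")
```

With identical loss, the long path comes out about 80x slower – the kind of reduction mentioned above.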

Higgs boson work an example of the network as part of the scientific instrument and workflow.

Working on customisation, flexibility. Optimising end-to-end: Data transfer node; Science DMZ (REANNZ working with NZ unis, CRIs etc to deploy); perfSONAR.

Firewalls are harmful to large data flows and unnecessary. Not as effective as they once were.

If you can’t move 1TB in 20 minutes, talk to REANNZ – they’ll raise your expectations.
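
For scale (my arithmetic, not a figure from the talk), 1 TB in 20 minutes works out to roughly 7 Gbit/s sustained:

```python
# Back-of-envelope: sustained rate needed to move 1 TB in 20 minutes.
bits = 1e12 * 8           # 1 TB (decimal units) in bits
seconds = 20 * 60
print(f"~{bits / seconds / 1e9:.1f} Gbit/s")  # ~6.7 Gbit/s
```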

Progressing to work with services above the network.

Australian Research Informatics Infrastructure
Rhys Francis, eResearch Coordination Project
Sustained strategic investment over a decade into tools, data, computation, networks, and buildings (for computation). (Personnel hidden in all of these.) Tools are mission critical, data volumes explode, systems grow to exascale, global bandwidth scales up. High ministerial turnover; each one takes about six months then realises we need this infrastructure. Breaking it down into these areas helps explain it to people.

OTOH volume of well-curated data is not exploding.

National capabilities: Want extended bandwidth, better HPC modelling, larger data collections. Shared access highly desirable but very hard to get agreement on how.
Research integration: Want research data commons and problem-oriented digital laboratories.

The whole is hard to explain, and when you chop it up into bits people think “any university could have done that bit.” But you need expertise and need to share it.

In last 7 years added fibre and super-computing infrastructure. Many software tools and lab integration projects. Hundreds of data and data flow improvement projects. Single sign-on. Data commons for publication/discovery. Recruit overseas but still only so much they can resource.

These things are hard, and it was data slowing things down, because they didn’t know where collections would physically be. If you’re dealing with petabytes, the only way to move it is by forklift.

eResearch infrastructure brings capabilities to the researcher.
NCI and Pawsey: do computational modeling, data analysis, visualise results
NeCTAR: use new tools and apps, work remotely and collaborate in the cloud
ANDS and RDSI: keep data and observations, describe, collect, share, etc.

Current status (I’m handpicking data-related bulletpoints):
* 50,000 collections published in research data commons
* coordination project to work with implementation projects to deploy data and tools as service for key national data holdings

Looking to 2014:
* data publication delivered
* Australian data discoverable
* 50 petabytes of quality research data online
* colocation of data and computing delivered

Need to focus on content (including people/knowledge, data, tools) as infrastructure. Datasets and skillsets. Fewer and fewer bespoke tools; more and more open-source or commercial products.

Need to support and fund infrastructure as business-as-usual.