Tag Archives: cloud computing

Cloud gazing #vala14 #s8 and #s9

Michelle McLean, Residing in the cloud: looking at the forecast now and into the future
Service models:
Software as a service (LibGuides, Office365, HathiTrust)
Platform as a service (eg Yahoo Pipes, OCLC Web Services, Google App Engine)
Infrastructure as a service (British Library, Library of Congress, My Kansas Library)

Deployment models:
Private cloud
Community cloud
Hybrid cloud
Public cloud

Essential characteristics:
Resource pooling
Rapid elasticity
On-demand self-service
Measured service
Broad network access

Pros

  • Scale and cost
  • Change management done for you – you don’t have to worry about upgrades
  • Choice and agility – if you want something new just pay and you get it
  • Next-generation architecture
  • IT isn’t a library core business – let the experts do it. Better security, better sustainability, better reliability

Cons

  • Security – when people leave you need to remove their access right away, because access is through the web. All the big companies have had failures
  • Lock-in. Need to be sure you can take your data with you if you leave
  • Lack of control. If the website is down where is the problem?
  • Financial savings mightn’t be as good as predicted.
  • If you outsource you lose your IT expertise, and with it your first point of troubleshooting.

Preparing for the cloud
Consider security, privacy, access, law, lock-in, whether it’s right for your business.
Cloud computing services are only marginally more reliable than IT departments (99% vs 98% uptime – and even 99% uptime still means over three days of downtime a year). So make sure you have backup systems.

Derek Whitehead, All on the ground: there is no cloud
Metaphor of cloud as fluffy, friendly, faraway – slideshows never show stormclouds!
Behind the metaphor, nothing’s actually in the cloud: our data sits on servers in a building on the ground, in a legal jurisdiction (not always ours).

There are four basic perspectives on the cloud:

  • Technology
  • Content – “information located remotely” but information is rarely independent of computation
  • Personal – companies want us to locate our info elsewhere than our own computers so they can ‘develop a relationship’ with us [lovely euphemism there! -Deborah]
  • Legal – jurisdiction makes a difference though not quite as simple as “in Australia = free of PATRIOT Act”. Frequently mirrored, moved around, using redundancy to safeguard info. People mostly concerned about privacy legislation – strong in Australia and Europe.

Swinburne’s policy is to externally host/manage most systems where possible – “opportunistic vendor hosting”: student email, HR, learning management system, library system, OJS, etc.

What do we want the cloud people to do for us? Vendor cloud hosting vs service aggregator provision. Huge range of hybrid or multisource options. But services have to be efficient, reliable, high quality, fast to access, and cost-effective.

Why would we do it? As a kid he generated his own electricity – not a great way to live. He thinks IT will one day look back on the idea of having your own server in the basement in the same way. Cost minimisation, efficiency, economies of scale – all of these are factors. Security is an issue because clouds are bigger targets for hackers, but they also have bigger resources to defend against them.

There will need to be a realignment in skillsets. The ability to read, write, and negotiate contracts is vital.
But libraries are leaders. Remember when we moved from print to CD-ROMs? (Okay, this was the wrong direction…)
Exit strategies where possible – harder in monopoly situations.
Helped by clear customer benefits and freeing up buildings. Libraries have access to economies of scale, we’re comfortable with automation, it benefits collaboration.

Q: What’s the customer experience of change to the cloud?
A: Infrastructure/management should be invisible to customers. But having info in the cloud brings huge benefits: eg huge increase in number of articles used by academics when they can get them from their desktop.

Q: What if things go wrong?
A: With an external host you’ll have remedies in the contract if things go wrong – no such remedy if you stuff up yourself!

U of Washington eScience Institute #nzes

eScience and Data Science at the University of Washington eScience Institute
“Hangover” Keynote by Bill Howe, Director of Research, Scalable Data Analytics, eScience Institute Affiliate Assistant Professor, Department of Computer Science & Engineering, University of Washington

Scientific process getting reduced to database problem – instead of querying the world we download the world and query the database…

The UW eScience Institute aims to be at the forefront of research in eScience techniques/technology, and in the fields that depend on them.

3Vs of big data:
volume – this gets lots of attention, but…
variety – …this is the bigger challenge
velocity

Shows a long-tail image from Carole Goble: lots of data in Excel spreadsheets, lab books, etc. is simply lost.
Types of data stored – especially data and some text. 87% of the time it’s on “my computer”; 66% a hard drive…
Mostly people are still in the gigabytes (or megabytes) range, less so terabytes (but a few are in petabytes).
No obvious relationship between funding and productivity. Need to support small innovators, not just the science stars.

Problem – how much time do you spend handling data as opposed to doing science? The general answer is 90%.
People may spend a week doing manual copy-paste to match data because they’re not familiar with tools that would do the same job with a simple SQL JOIN query in seconds (see the sketch after this paragraph).
The Sloan Digital Sky Survey was incredibly productive because they put the data online in database format and thousands of other people could run queries against it.
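As a hedged aside (these table and column names are hypothetical, not from the talk), the week of copy-paste record-matching becomes a single JOIN statement:

    -- Hypothetical tables: samples(sample_id, site, value) from the lab,
    -- and sites(site, latitude, longitude) from a colleague's spreadsheet.
    -- Matching them by hand means scanning for each site code; a JOIN does it in one go.
    SELECT s.sample_id, s.value, t.latitude, t.longitude
    FROM samples AS s
    JOIN sites AS t ON s.site = t.site;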

SQLShare: Query as a service
Want people to upload data “as is”. Cloud-hosted. Immediately start writing queries, share results, and others write their queries on top of your queries. Various access methods – REST API -> R, Python, Excel add-in, spreadsheet crawler, VizDeck, an app on EC2.

Metadata
He has been recommending throwing non-clean data up there. He claims that comprehensive metadata standards represent a shared consensus about the world, but at the frontier of research this shared consensus by definition doesn’t exist, or will change frequently, and data found in the wild will typically not conform to standards. So he modifies Maslow’s hierarchy of needs:
Usually storage > sharing > curation > query > analytics
Recommends: storage > sharing > query > analytics > curation
Everything can be done in views – cleaning, renaming columns, integrating data from different sources while retaining provenance (see the sketch below).
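A minimal sketch of that idea, again with hypothetical names: the raw upload stays untouched as provenance, the view does the cleaning, and others can query the view – which is also how “queries on top of queries” works:

    -- The raw upload (raw_samples) is never modified; the view is a cleaned layer over it.
    CREATE VIEW clean_samples AS
    SELECT sample_id,
           TRIM(site) AS site_code,               -- strip stray whitespace
           CAST(value AS REAL) AS measured_value  -- type the text column on the fly
    FROM raw_samples
    WHERE value IS NOT NULL;                      -- hide obviously bad rows

    -- Someone else's query can now build on the view rather than on the raw data.
    SELECT site_code, MAX(measured_value) AS peak
    FROM clean_samples
    GROUP BY site_code;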

Bring the computation to the data. We don’t want just fetch-and-retrieve – we need a rich query service, not a data cemetery. “Share the soup and curate incrementally as a side-effect of using the data”.

Convert scripts to SQL and lots of problems go away. He tested this by sending a postdoc to a meeting and doing “SQL stenography” – real-time analytics as the discussion went on. Not a controlled study – no one was trying to do it in Python or R at the same time – but he would challenge anyone to do it as quickly! Quotes (a student?): “Now we can accomplish a 10-minute, 100-line script in 1 line of SQL.” Non-programmers can write very complex queries rather than relying on staff programmers and feeling ‘locked out’.
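For flavour, a hedged example of the kind of one-liner that quote describes (using the hypothetical view from the sketch above; the actual script wasn’t shown) – a per-group summary that would otherwise be a hand-rolled read-parse-accumulate-print script:

    -- One statement replaces a loop over files and dictionaries.
    SELECT site_code, COUNT(*) AS n, AVG(measured_value) AS mean_value
    FROM clean_samples
    GROUP BY site_code;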

Data science
Taught an intro-to-data-science MOOC with tens of thousands of students. (The power of the discussion forum to fix a sloppy assignment!)

Lots of students are more interested in building things than publishing, and are lost to industry. So they’re working on ‘incubator’ projects and reverse internships pulling people back in from industry.

Q: Have you experimented with auto-generating views to clean up?
A: Yes, but less with cleaning and more with deriving schemas and recommending likely queries people will want. Cf. Stanford’s “Data Wrangler” tool.

Q: Once again people using this will think of themselves as ‘not programmers’ – isn’t this actually a downside?
A: Originally humans wrote queries, then apps wrote queries; now humans are doing it again and there’s no good support for development in SQL. There’s a risk of giving people power without teaching programming, but mostly they’re trying to get people more productive right now.

Design patterns for lab #labpatterns; Research cloud – #nzes

A pattern language for organising laboratory knowledge on the web #labpatterns
Cameron McLean, Mark Gahegan & Fabiana Kubke, The University of Auckland
Google Site

Lots of lab data is hard to find/reuse – with big consequences for efficiency, reproducibility, and quality.
They want to help researchers locate, understand, reuse, and design data. They’re particularly focused on describing the semantics of the experiment (rather than the semantics of the data).

Design pattern concept originated in field of architecture. Describes a solution to a problem in a context. Interested in idea of forces – essential/invariant concepts in a domain.

Kitchen recipe as example of design pattern for lab protocol.
What are recurring features and elements? Forces for a cake-like structure include: structure providers (flour), structure modifier (egg), flavours; aeration and heat transfer.

Apply this to lab science, in a linked science setting. Take a “Photons alive” design pattern (using light to visualise biological processes in an animal). See the example paper. You can take a sentence re methodology and annotate eg “imaging” as a diagnostic procedure. Using current ontologies, this gives you the What but not the Why. You need to tag with a “Force” concept, eg “immobilisation”. That gives a deeper understanding of the process – with the role of each step – and you can start thinking about what other methods of immobilisation there may be.

So how can we make these patterns? Need to use semantic web methods.
A wiki for lab semantics. (He wants to implement this.) A semantic form on the wiki – a template. The wiki serves for attribution, peer review, publication – and as an endpoint to an RDF store.

Q: How easy is this to use for a domain expert?
A: Semantic modelling is an iterative process and not easy. But a semantic wiki can hide the complexity from the end user, so the domain expert can just enter data.

Q: We spend lots of time pleading with researchers to fill out webforms. How else can we motivate them, eg to do it during the process rather than at the end?
A: Certain types of people are motivated to use a wiki. This is a first step, a proof of concept. It needs a critical mass before it becomes self-sustaining.

Q: How much use would this actually be for domain experts? Would people without implicit knowledge gain from it?
A: Need to survey this and evaluate. It’s valuable as a democratising process.

Q: What about patent/commercial knowledge?
A: He’s personally taking an open science / linked science approach – it’s aimed at research that’s intended to be maximally shared.

A “Science Distribution Network” – Hadoop/ownCloud synchronised across the Tasman
Guido Aben, AARNet; Martin Feller, The University of Auckland; Andrew Farrell, New Zealand eScience Infrastructure; Sam Russell, REANNZ

They have preferred to do one-to-few applications rather than Google-style one-to-billions. Now that’s changing, because they’ve been experiencing trouble sending large files themselves. They scraped together their own file transfer system, marketed as CloudStor though it’s not in the cloud and doesn’t store things. They expected a couple of hundred users; they got 6838. Why linear growth? “Apparently word of mouth is a linear thing…” They seem to be known by everyone who has file-sharing issues.

FAQs:
Can we keep files permanently?
Can I upload multiple files?
Why is it called CloudStor when it’s really for sending?

“CloudStor+ beta” – it looks like Dropbox, so why build this if Dropbox already exists? Because those services are slow (hosted in Singapore or the US): CloudStor+ gets 30MB/s, versus a maximum of around 0.75MB/s for the other systems – so a 100GB dataset transfers in under an hour rather than about a day and a half. Their pricing models aren’t geared towards large datasets. And they’re subject to PRISM etc.

Built on a stack:
Anycast | AARNet
ownCloud – best OSS they’ve seen/tested so far – has plugin system and defined APIs
MariaDB
Hadoop – but they’re looking at substituting XtreemFS, which seems to cope with the latencies.

Distributed architecture – can be extended internationally. Would like one in NZ, Europe, US, then scale up.

The bottleneck is from the desktop to the local node. The only way they can address this is to get as close to the researcher as possible – they want to build local nodes on campus.

Mobile vs Smartphones & other links of interest 14/4/10

Mobile vs Smartphones
Roy Tennant suggests not making any more mobile websites, as research suggests more people (in the US) are getting smartphones that can support anything a normal web browser can support. (Though I don’t know of any smartphone that supports a 1024×768 screen size…) Smartphone applications seem to be trending instead. The iLibrarian rounds up her Top 30 Library iPhone Apps (part 2 and part 3). Why an application when you’ve already got a website? Phil Windley points out that “If my bank can get me to download an app, then they have a permanent space on my app list.” The trade-off is that whereas a website should work in any browser, smartphone apps often need to be in proprietary formats (the Librarian in Black particularly complains about Apple’s iPhone in this respect).

Web 2.0
Common Craft has a 3-minute video explaining “Cloud Computing in Plain English”.

The Metropolitan Museum of Art Libraries and Brown University Library provide a “dashboard” of widgets on their websites displaying current statistics about library usage.

View from the top 🙂
The University Librarian at McMaster University Library blogs results from their laptop survey. Apparently laptop circulation now accounts for about a third of their total circulation stats; their survey looks into how students are using the laptops.

The Director of Libraries at the State University of New York at Potsdam blogs about “What I’ve Learned” in the first 10 months of her job there.

Scandal of the week…
Barbara Fister summarises recent discussion about EBSCO as the “New Evil Empire” in her Library Journal article “Big vendor frustrations, disempowered librarians, and the ends of empire”.

Fun
Alice for the iPad – one of the ways technology can enhance the book.