Tag Archives: data curation

Developing eResearch@Flinders #vala14 #s35

Amanda Nixon, Liz Walkley Hall, Ian McBain, Richard Constantine and Colin Carati We built it and they are coming: the development of eResearch@Flinders

“eResearch” – use of info/comm technologies in a research space:
* in data management
* high performance computing
* collaboration tools
* visualisation / haptics (tactile sense of using computing)

Operating at Flinders Uni since April 2012, started with ANDS/uni funding and longstanding relationship with academics. Using core library skills:
* liaison with researchers
* liaison with service providers
* metadata creation
* service ethic

Structure is partnership between library/ICT. Includes statistical consultant, metadata stores project officers, eresearch support librarian, open scholarship and data management librarian. Because they’re new, they do a lot of reporting: to library senior staff committee, info services executive, eResearch advisory committee.

Primarily dealing with data storage (big or small, complex or simple), high performance computing, collaboration skills. Identify tools and services, refer researchers to service providers, prepare info on return on investment, do outreach to researchers.

ReDBox software (had a ReDBox community day for all institutions using/developing this)
Planning, coordination of data management services – set up ReDBox, got it running, and right on cue ARC are requiring data management plans. Have done lots of outreach but now as result of ARC rule changes researchers are coming to them.

Statistical consultant – individual consultations and workshops, covering use of SPSS, NVivo, handles licensing

Mapping old skills:
staff management -> staff management
researcher liaison -> researcher liaison
vendor relationships -> eResearch service provider relationships
assessing value of resources -> assessing value of eResearch tools
referral to services -> referral to eResearch tools
metadata creation re publications -> metadata creation re research data

New skills:
business analysis
social media
event management
managing software development
having an ear to the ground to make connections
matchmaking

Why does it work?
* we come from the library which is well-respected so good PR
* we do good liaison
* building on existing skills
* building on institutional knowledge
* don’t know all the answers but can find them
* most importantly: there was an unfulfilled need

Launch by vice-chancellor, 8-session staff development programme to introduce library staff to what they do. Since ARC rule change haven’t had to do any cold-calling because people are calling them. Brokering more access to federally funded data storage. In uni Research Strategic Plan and Info Services Strategic Plan.

Q: How to show you’re successful?
A: Want to collate list of new relationships built because of matchmaking, successful grant applications where they’ve given advice, publications coming out of things. Don’t know how to pull it together yet but probably a matter of following up and keeping relationships going.

Q: What KPIs do you have?
A: Strategic plans very high-level – getting people involved in things. Usage stats of data storage. Further down the track as business model changes, more cost, might be harder to create useful KPIs.

Innovate #vala14 #s13 #s14 #s15

Hue Thi Pham and Kerry Tanner Influences of technology on collaboration between academics and librarians

Interrelationships between collaboration, institutional structure, and technology.
Things like Google Apps tend to be used within departments – less use on smaller campuses because more casual face-to-face interaction. Level of use varies by discipline, faculty, campus.
Social technologies like Twitter used in lectures
Learning management system (eg Moodle) most important technology mentioned in interviews.
Institutional repository common space for depositing resources

Technology facilitating transition from traditional to digital library – more electronic resources, communicating over telephone, email, Skype. But purely online interaction means a reduced mutual understanding of partners’ contributions, and an old perception of librarians’ roles.

Divide between library system and learning management system leads to a divide between the two communities around these. Librarians complain they can’t do a workshop about an assignment without Moodle access to see the assignment. Academics say they think librarians could have a role but they don’t understand why they would need access or what they would do with it. Lack of coordination can be a problem – means LMS people and library people each make decisions that the other isn’t aware of. Siloisation.

Library staff need to consider roles of interpersonal interaction with technology – value of tech, value of face-to-face interaction, importance of space design / architecture. Get automatic access to learning management system but avoid resulting workload. Need to find ways to integrate library management system with learning management system.

Audience comment: Involvement of librarian in discussion boards can be useful – some topics the academics are relieved to leave to librarian. But important to have awareness of mutual roles.

Lisa Ogle and Kai Jin Chen Just accept it! Increasing researcher input into the business of research outputs

Implementing Symplectic Elements at UoNewcastle. (37,000 students, 1000 academics plus 1500 professional staff) HERDC is reporting exercise to Australian government to secure funding – sounds similar to New Zealand’s PBRF. Work managed by research division but most data entry done by admin folk. Issues include duplicate data entry, variance in data quality, many publications never reported – funding missed out on. Library asked to assist from 2005 – centralised model addresses many issues.

Various identification mechanisms: scholarly databases, researchers, conference lists, uni website, library orders. All put manually into Endnote library, then manually copy/pasted into Callista database. Labour-intensive and would often be a 2-6 month delay for researchers, very frustrating.

Getting Elements. Loved harvesting from databases (based on search settings: “We think this is your publication, please log in to claim or reject it”). Originally not keen on opening up to researchers, but after demos got convinced researchers could add manual entry without compromising data quality as library/research staff can verify and lock it.

Benefits: database searches can be customised to minimise false positives/negatives. Can delegate others to act on researchers’ behalf. Publications appear on profile within 48 hours. Can upload Endnote libraries. Can include ‘in press’ publications without messing up workflow. Easily generate publication lists. Capture of bibliometric data. Pretty graphs on user’s dashboard.

Have been running 4 months; two-thirds of publishing academics have logged in and interacted with the system. (800 in first two weeks, and a lull over summer). 2900 publications in the system from current collection year (usually 3500).

Challenges: early adopter in Australian market. Development module took longer than expected – learned that everyone does HERDC differently.

Most negative feedback so far is from people who haven’t yet logged into the system. Someone complaining it was too hard – talked her through it over the phone and now fine.

Need to investigate further repository integration.

Malcolm Wolski and Joanna Richardson Terra Nova: a new land for librarians?
Big issues emerging around vast amounts of data and trying to connect it. Global connectedness another impact.

Researchers needing a “dry lab” to work with data instead of hands-on wet-lab. Seeing this in many areas.
Researchers can’t afford to work solo any more. Many infrastructure costs are beyond the reach of an individual researcher or individual centre. Problems are too much for one person.
Can get storage and computing power – but may need to work with data for ten years so need to be able to retain it and keep working on it through changing technology. Lots of outputs are governmental reports not journal articles.
Most large research projects these days involve communities – even incorporated bodies.
80% of papers in the EU are by people collaborating with people outside their institution.

NeCTAR have invested heavily in virtual laboratories because it’s not just about creating data but using it – of course this creates more data.
In theory nothing stops a researcher going to Research Data Storage Infrastructure for storage without their university knowing.
Various community solutions like Tropical Data Hub, Australian National Corpus – slide lists a pile and he points out that for each of these, some institution has put their hand up to take responsibility for maintenance.

Approach of institutions keeping their own data but having to share metadata. Requires lots of discussion around data schemas – what you expect to find in data descriptions. Eg Research Data Australia from 85 participating organisations and growing. Goal to get more data, better connected data, more findable/usable.

Two impacts around:
Research tools: New suite from NeCTAR and ANDS eg virtual laboratories, discipline-specific tools. Need to choose which we’ll support, which data collection schemes we’ll be involved in. May need to develop our own tools for specific disciplines.
Library/research collaboration: Moving more to a partnership model.

Libraries provide support for data management plans and citing data, but there’s huge demand for archiving/preserving data.

Impact on university libraries:

  • New jobs coming out for the “databrarian”.
  • Need research services to help develop common data structures
  • Participation in cross-disciplinary teams bringing librarian skills
  • Development of legal frameworks for acquiring, generating, storing and sharing data
  • Assisting with development of tools – lots of disciplines have different ways of exploring/analysing data so national collections/communities may have specific search (eg maps, chemical structure, vs facets) or visualisation tools.
  • Archiving and preservation services

Librarian support roles

  • Sourcing relevant data sets
  • Consultancy – identify faculty needs, refer back to experts
  • Targeted outreach services re data citation or data repositories
  • New support service tools and processes

Want to be able to offer a service to researchers and them not have to worry about where it’s stored, whether on campus or Amazon Web Services or whatever.

Institutional repositories for data?

Via my Twitter feed:

University researcher sites lack of “institutional repositories” where data can be published as a reason more data isn’t online. #nethui

— Jonathan Brewer (@kiwibrew) July 8, 2013

(And discussion ensuing.)

I’m not an expert in data management. A year ago it was top of my list of Things That Are Clearly Very Important But Also Extremely Scary, Can Someone Else Please Handle It? But then I got a cool job which includes (among other things) investigating what this data management stuff is all about, so I set about investigating.

Sometime in the last half year I dropped the assumption that we needed to be working towards an institutional data repository. In fact, I now believe we need to be working away from that idea. Instead, I think we should be encouraging researchers to deposit their datasets in the discipline-specific (or generalist) data repositories that already exist.

I have a number of reasons for this:

  • My colleague and I, with a certain amount of outsourcing, already have to run a catalogue, the whole rickety edifice of databases and federated searching and link resolving and proxy authentication, library website and social media account, institutional repository, community archive, open journal system, etc etc. Do we look like we need another system to maintain?
  • An institutional archive is kind of serviceable for pdfs. But datasets come in xls, csv, txt, doc, html, xml, mp3, mp4, and a thousand more formats, no, I’m not exaggerating. They can be maps, interviews, 3D models, spectral images, anything. They can be a few kilobytes or a few petabytes. Yeah, you can throw this stuff into DSpace, but that doesn’t mean you should. That’s like throwing your textbooks, volumes of abstracts, Kindles, Betamax, newspapers, murals, jigsaw puzzles, mustard seeds, and Broadway musicals (not a recording, the actual theatre performance) onto a single shelf in a locked glass display cabinet and making people browse by the spine labels.
  • If you want a system that can do justice to the variety of datasets out there, you’d better have the resources of UC3 or DCC or Australia or PRISM. Because you’re either going to have to build it or you’re going to have to pay someone to build it, and then you’re going to have to maintain it. And you’re going to have to pay for storage and you’re going to have to run checksums for data integrity and you’re going to have to think about migrating the datasets as time marches on and people forget what the current shiny formats are. And you’re going to have to wonder if and how Google Scholar indexes it (and hope Google Scholar lasts longer than Google Reader did) or no-one will ever find it. And a whole lot else besides.
  • If anything’s in it. Do you know how hard it is to get researchers to put their conference papers into institutional repositories? My own brother flatly refuses. He points out that his papers are already available via his discipline’s open access repository. That’s where people in his discipline will look for it. It’s indexed by Google. Why put it anywhere else? I conceded the point for the sake of our family dinner, and I haven’t brought it up again because on reflection he’s right. (He’s ten years younger than me; he has no business being right, dammit.) And because it’s hard enough to get researchers to put their conference papers into institutional repositories even when their copy is the only one in existence.
  • Do you know how hard it is to convince most researchers that they should put their datasets anywhere online other than a private Dropbox account? (Shameless plug: Last week another colleague and I did a talk responding to 8 ‘myths’ or reasons why many researchers hesitate – slides and semi-transcript here. That’s summarised from a list we made of 23 reasons, and other people have come up with more objections and necessary counters.) The lack of an institutional repository for data doesn’t even rate.
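
To give a feel for just one item on that maintenance list, "run checksums for data integrity", here is a minimal sketch in Python. It is an illustration only, not how any particular repository does it: the function names (`sha256_of`, `build_manifest`, `find_problems`) and the idea of keeping the manifest as a plain dictionary are assumptions made for the example. A real service would also need to store the manifest somewhere safe, run the check on a schedule, and decide what to do when a file fails.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (by relative path) to its checksum."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_problems(root: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are missing or whose checksum no longer matches."""
    problems = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(target) != expected:
            problems.append(f"changed: {rel}")
    return problems
```

The intended use is to call `build_manifest` once when a dataset is deposited, keep the result, and call `find_problems` against it later. An empty list means every recorded file is still present and byte-for-byte unchanged. Note that this only detects damage; it does nothing about format migration or discoverability, which are separate items on the list.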

No, forget creating institutional data repositories. What we need to be doing is getting familiar with the discipline data repositories and data practices that already exist, so when we talk to a researcher we can say “Look at what other researchers in your discipline are doing!”

This makes it way easier to prove that this data publishing thing isn’t just for Those Other Disciplines, and that there are ways for them to deal with [confidentiality|IP issues|credibility|credit]. And it makes sure the dataset is where other researchers in that discipline are searching for it. And it makes sure the datasets are deposited according to that discipline’s standards and that discipline’s needs, not according to the standards and needs of whoever was foremost in mind of the developer who created the generic institutional data repository – so the search interface will be more likely to work reasonably well for that discipline. And it means the types of data will be at least a little more homogeneous (in some cases a lot more) so there’s more potential for someone to do cool stuff with linked open data.

And it means we can focus on what we do best, which is helping people find and search and understand and use and cite and publish these resources. Trust me, there is plenty more to do in data management than just setting up an institutional data repository.

Links of Interest 30/3/2012 – article linker, impact factors of open access journals, and more

Customer service
UConn Discovers What Students Want From Their Library – too complex for a pull quote, follow the link for a summary.

Two solutions for increasing the usability of that blasted Article Linker page:

Open Access
JQ at the University of Oregon writes about High-impact open access journals and includes some invaluable tables of OA journals ranked by SJR, SciMago, and Eigenfactor impact factors. These (sorted by subject) could be useful for promoting OA to departments and to students graduating from university who still want to keep up with research.

Positioning Open Access Journals in a LIS Journal Ranking looks at OA journals in the library science field:
This research uses the h-index to rank the quality of library and information science journals between 2004 and 2008. Selected open access (OA) journals are included in the ranking to assess current OA development in support of scholarly communication. It is found that OA journals have gained momentum supporting high-quality research and publication, and some OA journals have been ranked as high as the best traditional print journals. The findings will help convince scholars to make more contributions to OA journal publications, and also encourage librarians and information professionals to make continuous efforts for library publishing.

Data curation
Demystifying the data interview: Developing a foundation for reference librarians to talk with researchers about their data
As libraries become more involved in curating research data, reference librarians will need to be trained in conducting data interviews with researchers to better understand their data and associated needs. This article seeks to identify and provide definitions for the basic terms and concepts of data curation for librarians to properly frame and carry out a data interview using the Data Curation Profiles (DCP) Toolkit.

Subscription statistics
Subscriptions in Context (powerpoint) is a clear and elegant presentation for University of Central Oklahoma library faculty liaisons on all the factors the Serials department considers when evaluating subscriptions.

Just for fun
A Library Society of the World thread began, “Gregor Samsa awoke from uneasy dreams to find he had been transformed into a monstrous librarian” and went on from there.