Dataset published on access to conference proceedings – thank you!

Thanks to all who’ve helped:

(Andrea, apm, Catherine Fitchett, Sarah Gallagher, Alison Fields, KNB, Manja Pieters, Brendan Smith, Dave, Hadrian Taylor, Theresa Rielly, Jacinta Osman, Poppa-Bear, Richard White, Sierra de la Croix, Christina Pikas, Jo Simons, and Ruth Lewis, plus some anonymous benefactors)

All the conferences I was investigating have now been investigated. 🙂 I’ve since checked everything for consistency and link rot; added in a set of references that I had to research myself, as I couldn’t anonymise them sufficiently in the initial run; deduplicated a few more times (conference names vary ridiculously); and finally ended up with a total of 1849 conferences, which I’ve now published at https://dx.doi.org/10.6084/m9.figshare.3084727.v1

The immediately obvious stats from this dataset include:

Access to proceedings

  • 23.36% of conferences in the dataset had some form of free online proceedings – full-text papers, slides, or audiovisual recordings.
  • 21.85% had a non-free online proceedings
  • 30.72% had a physical proceedings available – printed book, CD/DVD, USB stick, etc, but not including generic references to proceedings having been given to delegates
  • 45.27% had no proceedings identifiable

(Percentages don’t add to 100% as some conferences had proceedings in multiple forms.)
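To see why the figures can add up to more than 100% (the four above sum to roughly 121%), here is a small Python sketch. The conferences in it are made up for illustration and are not rows from the dataset; the point is only that a conference with proceedings in two forms gets counted in two categories.

```python
# Toy example (hypothetical conferences, NOT taken from the dataset):
# each conference maps to the set of proceedings forms found for it.
forms = {
    "Conf A": {"free online", "physical"},
    "Conf B": {"non-free online", "physical"},
    "Conf C": {"free online"},
    "Conf D": set(),  # no proceedings identifiable
}

categories = ["free online", "non-free online", "physical"]
total = len(forms)

# Share of conferences having each form; one conference can count in several.
percentages = {
    c: 100 * sum(c in found for found in forms.values()) / total
    for c in categories
}
# Conferences with no form at all.
percentages["none"] = 100 * sum(not found for found in forms.values()) / total

for label, pct in percentages.items():
    print(f"{label}: {pct:.0f}%")
print(f"Sum of percentages: {sum(percentages.values()):.0f}%")
```

Only the "none" category is exclusive; the other three overlap, which is what pushes the sum past 100%.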

Access to free online proceedings by year

This doesn’t seem to have varied much over the 6 years most of the conferences took place in:

2006: 39 / 173 = 22.54%
2007: 39 / 177 = 22.03%
2008: 62 / 258 = 24.03%
2009: 63 / 284 = 22.18%
2010: 105 / 428 = 24.53%
2011: 123 / 520 = 23.65%
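
For anyone wanting to recompute these shares, a minimal Python sketch follows. The dictionary simply re-enters the counts listed above; the variable and function names are mine for illustration and are not column names from the figshare dataset.

```python
# year -> (conferences with free online proceedings, all conferences that year)
# Figures re-entered from the list in this post.
counts = {
    2006: (39, 173),
    2007: (39, 177),
    2008: (62, 258),
    2009: (63, 284),
    2010: (105, 428),
    2011: (123, 520),
}

def share(free, total):
    """Percentage of conferences with free online proceedings."""
    return 100 * free / total

for year, (free, total) in sorted(counts.items()):
    print(f"{year}: {free} / {total} = {share(free, total):.2f}%")

# How far apart are the lowest and highest years?
shares = [share(free, total) for free, total in counts.values()]
spread = max(shares) - min(shares)
print(f"Spread: {spread:.2f} percentage points")
```

The spread between the lowest and highest year comes out under three percentage points, which is the sense in which access "doesn’t seem to have varied much".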

Conferences attended by country

Conferences attended were in 75 different countries. Those with more than 20 conferences were:

New Zealand: 429
USA: 297
Australia: 286
UK: 130
Canada: 67
China: 66
Germany: 44
France: 41
Italy: 35
Portugal: 31
Japan: 29
Spain: 28
Netherlands: 27
Singapore: 25

I won’t break down access to proceedings here, because this data is inherently skewed by the nature of the sample: conferences attended by New Zealand researchers. This means that small conferences in or near New Zealand are much more likely to be included than small conferences in other parts of the world. If a small conference is less resourced to put together and maintain a free online proceedings – or conversely a large society conference is prone to more traditional (non-free) publication options – this variation by conference size/type could easily outweigh any actual variation by country. So I need to do some thinking and discussing with people to see if there’s any actual meaning that can be pulled from the data as it stands. If you’ve got any thoughts on this I’d love to hear from you!

Further analysis now continues….

Progress report on how you’ve helped my research

At this point at least 20 people have helped me look for conference proceedings (some haven’t left a name so it’s somewhere between 20 and 42), which is awesome: thank you all so much! Last week saw us pass the halfway mark, an exciting moment. As of this morning, statistics are:

  • 1187 out of 1958 conferences investigated = 61% done
  • 312 have proceedings free online (26%)
  • of those without free proceedings, 292 have non-free proceedings online
  • of those without any online proceedings, 109 have physical proceedings (especially books or CDs)
  • 472 have no identifiable proceedings (40%)

I’ve got locations for all 1958, pending some checking. Remember this is out of conferences that New Zealand researchers presented at and nominated for their 2012 PBRF portfolio.

The top countries are:
New Zealand    492
Australia    315
USA    304
UK    133
Canada    69
(with China close behind at 68)

In New Zealand, top cities are predictably:
Auckland    154
Wellington    98
Christchurch    53
Dunedin    38
Hamilton    35

Along the way I’ve noticed some things that make the search harder:

  • sometimes authors, or the people verifying their sources, made mistakes in the citation
  • or sometimes people cited the proceedings instead of the conference itself – this isn’t a mistake in the context of the original data entry but makes reconciling the year and the city difficult.
  • or sometimes their citation was perfectly clear, but my attempt to extract the data into tidy columns introduced… misunderstandings (aka terrible, terrible mistakes).
  • or we’ve ended up searching for the same conference a whole pile of times because various people call it the Annual Conference of X, the Annual X Conference, the X Annual Conference, the International Conference of X, the Annual Meeting of X, etc etc.
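
The name-variant problem in the last point can be partly tamed by reducing each name to a rough key before searching. This is a minimal Python sketch, not what I actually did in the spreadsheet: the filler-word list and the function name `name_key` are assumptions for illustration, and a key like this will sometimes merge conferences that are genuinely different, so matches still need a human check.

```python
import re

# Words that tend to vary between citations of the same conference series.
# This list is an illustrative assumption, not derived from the dataset.
FILLER = {"the", "of", "on", "for", "annual", "international",
          "conference", "meeting"}

def name_key(name):
    """Reduce a conference name to a sorted tuple of its distinctive words."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return tuple(sorted(w for w in words if w not in FILLER))

variants = [
    "Annual Conference of X",
    "The Annual X Conference",
    "X Annual Conference",
    "International Conference of X",
    "Annual Meeting of X",
]

# All five variants collapse to the same key.
keys = {name_key(v) for v in variants}
print(keys)
```

Sorting the spreadsheet by such a key would put likely duplicates next to each other, which is usually enough to spot them by eye.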

On the other hand I’ve also noticed some things that make the search easier – either for me:

  • having done so many, I’m starting to recognise titles, so I can search the spreadsheet and often copy/paste a line
  • when all else fails I have access to the source data, so I can look up the title of the paper if I need to figure out whether I’m trying to find the 2008 or 2009 conference.

And things that could be generally helpful:

  • if a conference makes any mention of ACM, whether in the title or as a sponsor, then chances are the proceedings are listed in http://dl.acm.org/proceedings.cfm
  • if it mentions IEEE, try http://ieeexplore.ieee.org/browse/conferences/title/ – if it’s there, then on the page for the appropriate year, scroll down and look on the right for the “Purchase print from partner” link. Chances are you’ll get a page with an ISBN for the print option; it also confirms the location, which is harder to find on IEEE Xplore itself.
  • if it’s about computer science in any way, shape or form, then http://dblp.uni-trier.de/search/ can probably point you to the source(s). This is the best way to find anything published as a Lecture Notes in Computer Science (LNCS) because Springer’s site doesn’t search for conferences very well.
  • if you do a web search and see a search result for www.conferencealerts.com, this will confirm the year/title/location of a conference, and give you an event website (which may or may not still be around, but it’s a start). Unfortunately I haven’t found a way to search the site directly for past conferences.
  • a search result for WorldCat will usually confirm year/title/location and (if you scroll down past the holding libraries) often give you the ISBN for the print proceedings.
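
The first three tips amount to a small lookup from keywords to starting points. Here is a minimal Python sketch of that idea; the URLs are the ones given above, but the keyword matching is a deliberately crude assumption of mine (the tips also apply when ACM or IEEE is only a sponsor, which a name match cannot see).

```python
# (keyword, starting point) pairs taken from the tips above.
# Matching on substrings of the conference name is an illustrative
# simplification and can produce false hits.
HINTS = [
    ("acm", "http://dl.acm.org/proceedings.cfm"),
    ("ieee", "http://ieeexplore.ieee.org/browse/conferences/title/"),
    ("comput", "http://dblp.uni-trier.de/search/"),
]

def suggest_sources(conference_name):
    """Return the starting points whose keyword appears in the name."""
    name = conference_name.lower()
    return [url for keyword, url in HINTS if keyword in name]

for url in suggest_sources("ACM Conference on Computer Supported Cooperative Work"):
    print(url)
```

An empty result just means none of the shortcuts apply and it is back to a general web search.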

And two things that have delighted me:

  • Finding some online proceedings in the form of a page listing all the papers’ DOIs – which resolve to the papers on Dropbox.
  • Two of the conferences in the dataset have no identifiable city/country – because they were held entirely online.

I am of course still eagerly soliciting help, if anyone has 10 minutes here or there over the next month (take a break from the silly season? 🙂). Check out my original post for more, or jump straight to the spreadsheet.

Help me research conference proceedings and open access

I’ve been interested for a while in the amount of scientific/academic knowledge that gets lost to the world due to conference proceedings not being open access / disappearing off the face of the internet. My main question at the moment is, just how much is lost and how much is still available?

Unfortunately googling 1,955 conferences will rapidly give me RSI, so I’m hoping I can convince you to do a few for me – in the interests of science!

Background: I’ve written elsewhere about Open Access to conference literature (short version: conferences are where a huge amount of research gets its first public airing, yet conference papers are notoriously hard to track down after the fact) and Open Access and the PBRF (short version: if conference papers were all OA, PBRF verification/auditing would become a lot easier). Here I’m wanting to quantify the situation.

The data: The original dataset was sourced from TEC, from the list of conference-related NROs (nominated research outputs) from the 2012 PBRF round. There are obvious and non-obvious limitations, but basically I feel this makes it a fairly good listing of conferences between 2006 and 2011 that New Zealand academics presented at and felt that presentation was worthy of being included among their best work for the period. The original dataset is confidential, but I’ve received permission to post a derived, anonymised dataset publicly for collaborative purposes, and in due course publish it on figshare.

How you can help:
(Note: by contributing to the spreadsheet you’re agreeing to licence your contribution under a Creative Commons Zero licence, meaning anyone can later reuse it in any way with or without attribution. (Though I’ll be attributing it in the first instance – see below.))

  1. Go to the spreadsheet containing the list of conferences
  2. Pick a conference that doesn’t have any URLs/notes/name-to-credit
  3. Search Google/DuckDuckGo/your search engine of choice for the conference name, year, and city to find a conference website. Assuming you find one:
  4. Correct any details that are wrong or missing: eg expand the acronym; add in missing locations; if the website says it’s the 23rd annual conference put “23” in the “No.” column, etc.
  5. Browse on the website for proceedings, list of papers, table of contents, etc. If you find:
    • a list of papers including links to the full text of each paper freely accessible, paste the URL in “Proceedings URL: free online”
    • a list of papers including links to the full text but requiring a login (including in a database or special journal issue), paste the URL in “Proceedings URL: non-free online”
    • information about offline proceedings eg a CD or book, paste the URL in “Proceedings URL/info re print/CD/etc”
    • none of the above, paste the URL of the conference website for that year in “Other URL: conference website”
  6. If you can’t find any conference website at all, write that in “Any notes” so others don’t try endlessly repeating the futile search!
  7. Sign with a “Name to credit” for your work. If you’d prefer to remain anonymous, put in n/a.
  8. If you like, return to step 2. 🙂
  9. Share this link around!

What I’ll do with it:
First I’ll check it all! And obviously I’ll pull it back into my research and finish that up. I’ll also publish the final checked dataset on figshare under Creative Commons Zero licence so others can use it in their research. I’ll acknowledge everyone who helps and provides a name, in the creation of the dataset and in the paper I’m working on. And if someone wants to do a whole pile and/or be otherwise involved in the research then talk to me about coauthorship!

Why don’t I just use…

  • Mechanical Turk: I’m boycotting Amazon, for various reasons. Plus I consider a fair price for the work would be at least US$0.50 a conference (possibly double that) and as that’s a bit harder to afford I feel more ethical being upfront about asking folk to do it for free.
  • Library assistants: I am doing this a bit but there’s a limited period where they’re still working before summer hours and things have got quiet enough that they have time.
  • Something else: Ask me, I may want to!

Other questions
Please comment or email me.

Innovate #vala14 #s13 #s14 #s15

Hue Thi Pham and Kerry Tanner Influences of technology on collaboration between academics and librarians

Interrelationships between collaboration, institutional structure, and technology.
Things like Google Apps tend to be used within departments – less use on smaller campuses because more casual face-to-face interaction. Level of use varies by discipline, faculty, campus.
Social technologies like Twitter used in lectures
Learning management system (eg Moodle) most important technology mentioned in interviews.
Institutional repository common space for depositing resources

Technology facilitating transition from traditional to digital library – more electronic resources, communicating over telephone, email, Skype. But purely online interaction means a reduced mutual understanding of partners’ contributions, and an old perception of librarians’ roles.

Divide between library system and learning management system leads to a divide between the two communities around these. Librarians complain they can’t do a workshop about an assignment without Moodle access to see the assignment. Academics say they think librarians could have a role but they don’t understand why they would need access or what they would do with it. Lack of coordination can be a problem – means LMS people and library people make decisions that each other isn’t aware of. Siloisation.

Library staff need to consider roles of interpersonal interaction with technology – value of tech, value of face-to-face interaction, importance of space design / architecture. Get automatic access to learning management system but avoid resulting workload. Need to find ways to integrate library management system with learning management system.

Audience comment: Involvement of librarian in discussion boards can be useful – some topics the academics are relieved to leave to librarian. But important to have awareness of mutual roles.

Lisa Ogle and Kai Jin Chen Just accept it! Increasing researcher input into the business of research outputs

Implementing Symplectic Elements at UoNewcastle. (37,000 students, 1000 academics plus 1500 professional staff) HERDC is reporting exercise to Australian government to secure funding – sounds similar to New Zealand’s PBRF. Work managed by research division but most data entry done by admin folk. Issues include duplicate data entry, variance in data quality, many publications never reported – funding missed out on. Library asked to assist from 2005 – centralised model addresses many issues.

Various identification mechanisms: scholarly databases, researchers, conference lists, uni website, library orders. All put manually into Endnote library, then manually copy/pasted into Callista database. Labour-intensive and would often be a 2-6 month delay for researchers, very frustrating.

Getting Elements. Loved harvesting from databases (based on search settings: “We think this is your publication, please log in to claim or reject it”). Originally not keen on opening up to researchers, but after demos got convinced researchers could add manual entry without compromising data quality as library/research staff can verify and lock it.

Benefits: database searches can be customised to minimise false positives/negatives. Can delegate others to act on researchers’ behalf. Publications appear on profile within 48 hours. Can upload Endnote libraries. Can include ‘in press’ publications without messing up workflow. Easily generate publication lists. Capture of bibliometric data. Pretty graphs on user’s dashboard.

Have been running 4 months, 2 thirds of publishing academics have logged in and interacted with system. (800 in first two weeks, and a lull over summer). 2900 publications in the system from current collection year (usually 3500).

Challenges: early adopter in Australian market. Development module took longer than expected – learned that everyone does HERDC differently.

Most negative feedback so far is from people who haven’t yet logged into the system. Someone complaining it was too hard – talked her through it over the phone and now fine.

Need to investigate further repository integration.

Malcolm Wolski and Joanna Richardson Terra Nova: a new land for librarians?
Big issues emerging around vast amounts of data and trying to connect it. Global connectedness another impact.

Researchers needing a “dry lab” to work with data instead of hands-on wet-lab. Seeing this in many areas.
Researchers can’t afford to work solo any more. Much infrastructure costs beyond reach of individual researcher or individual centre. Problems are too much for one person.
Can get storage and computing power – but may need to work with data for ten years so need to be able to retain it and keep working on it through changing technology. Lots of outputs are governmental reports not journal articles.
Most large research projects these days involve communities – even incorporated bodies.
80% of papers in the EU are of people collaborating with people outside their institution.

NeCTAR have invested heavily in virtual laboratories because it’s not just about creating data but using it – of course this creates more data.
In theory nothing stops a researcher going to Research Data Storage Infrastructure for storage without their university knowing.
Various community solutions like Tropical Data Hub, Australian National Corpus – slide lists a pile and he points out that for each of these, some institution has put their hand up to take responsibility for maintenance.

Approach of institutions keeping their own data but having to share metadata. Requires lots of discussion around data schemas – what you expect to find in data descriptions. Eg Research Data Australia from 85 participating organisations and growing. Goal to get more data, better connected data, more findable/usable.

Two impacts around:
Research tools: New suite from NeCTAR and ANDS eg virtual laboratories, discipline-specific tools. Need to choose which we’ll support, which data collection schemes we’ll be involved in. May need to develop our own tools for specific disciplines.
Library/research collaboration: Moving more to a partnership model.

Libraries provide support for data management plans and citing data, but there’s huge demand for archiving/preserving data.

Impact on university libraries:

  • New jobs coming out for the “databrarian”.
  • Need research services to help develop common data structures
  • Participation in cross-disciplinary teams bringing librarian skills
  • Development of legal frameworks for acquiring, generating, storing and sharing data
  • Assisting with development of tools – lots of disciplines have different ways of exploring/analysing data so national collections/communities may have specific search (eg maps, chemical structure, vs facets) or visualisation tools.
  • Archiving and preservation services

Librarian support roles

  • Sourcing relevant data sets
  • Consultancy – identify faculty needs, refer back to experts
  • Targeted outreach services re data citation or data repositories
  • New support service tools and processes

Want to be able to offer a service to researchers and them not have to worry about where it’s stored, whether on campus or Amazon Web Services or whatever.

Open Access cookies

Creative Commons Aotearoa New Zealand are running a series of blogposts for Open Access Week, and I’ve contributed Levelling up to open research data.

I also, for Reasons, had an urge tonight to make Open Access biscuits. (I know my title says ‘cookies’, but the real word is of course ‘biscuits’, and I shall use it throughout the rest of this post along with real measurements and real temperatures. Google can convert for you, should you need it to.) The following instructions I hereby license as Creative Commons Zero, which should not be taken as a reflection on their calorie count.

First I started with a standard biscuit base recipe. You could use your own. I used the base for my family’s recipe for chocolate chip biscuits, which probably means it ultimately derives from Alison Holst, but I think I’ve modified it sufficiently that it’s okay to include here:

  1. Cream 125 grams of butter and 125 grams of sugar. The longer you beat it, the lighter and crisper the biscuits will be.
  2. Beat in 2 tablespoons sweetened condensed milk (or just milk will do, at a pinch) and 1 teaspoon vanilla essence.
  3. Sift in 1.5 cups of flour and 1 teaspoon of baking powder and mix to a dough.

Now we diverge from the chocolate chip recipe by not adding 90 grams of chocolate chips. We also divide the mixture in half, dyeing one half orange by using a few drops of red colouring and three times as many drops of yellow colouring:

[Photo: Open Access biscuits step 1]

The plain lot should then be divided into halves, each half rolled long and flat.
The orange lot should have just a small portion taken off and rolled into a fat spaghetto (a bit thinner than I did would be ideal), and the rest rolled into a large rectangle.

Then start rolling it together into our shape. The orange spaghetto gets rolled up into one of the plain rectangles. In this photo I’m doing two steps at once – most of the orange hasn’t been properly rolled out yet:

[Photo: Open Access biscuits step 2]

Then roll the rest of the orange around that with enough hanging off the top that you can fit some more plain stuff in to keep the lock open:

[Photo: Open Access biscuits step 3]

The ends will be raggedy. Don’t worry, this is all part of the plan.

At this point, put your roll of dough into the fridge to firm up a bit while you do the dishes. You could also consider feeding the cat, cooking dinner, etc. Or you can skip this step (or shorten it as I did) and it won’t hurt the biscuits, you’ll just have to do more shaping with your fingers because cutting the slices squashes them into rectangles:

[Photo: Open Access biscuits step 4]

These slices are about half a centimetre thick. I got about 38 off this roll, plus the raggedy ends. Remember I said those were part of the plan? Right, now – listen carefully, because this is very important – what you need to do is dispose of all the raggedy ends that won’t make pretty biscuits by eating the raw dough. I know, I know, but somebody’s got to do it.

The rest of the biscuits you put on a tray in the oven on a slightly low setting, say 150 Celsius, while you do the dishes that you missed last time because they were under things, and generally tidy up. 10 minutes or so, but whatever you do don’t go and start reading blogs because once these start to burn they burn quickly. Take them out when the ones in the hottest part of the oven are just starting to brown, and turn out onto a cooling rack.

Et voilà, open access biscuits:

[Photo: Open Access biscuits step 5]

Evidence-based librarianship #lianza11 #keynote7

Andrew Booth
Evidence based library and information practice: harnessing professional passions to the power of research

Wants to talk about a passion for continually monitoring, evaluating and improving our practice.

Four-square of:
Research   | Practice-based research
Practice     | Research-based practice

Need to be using research and researching our own service. Not just research that gets into journals, but local data.


  • EBLIP has a bit of a “cold fish” reputation. Wants to get away from this idea. Really, initiative and enthusiasm are vital.
  • The Librarian knows best (aka the Divine Right of Librarians). Our passion may colour our view of what’s best for users. Have to be cautious about thinking we know what our users want – doesn’t always match up. See also cognitive biases (eg primacy effects, recency effects, stereotypes, perseverance of belief, selective perception). “And we all suffer from Question Framing Bias, don’t we?” Librarians keep assuming there’s a right way of searching, rather than showing users how to harness the skills they have – eg “the dreaded Google Search typically displays a PubMed Abstract on page one”.

Can harness evidence-based practice and passion together. Quotes an informant from Partridge (2007) saying they’re passionate about things so wanted to have things to “back up my passion”.

We need to be evidence-based. Difference between barber (just the same inventory of skills as ever) and surgeon (body of knowledge continually built on) – also talks about the “Oops” factor: how do they behave when things go wrong? Note that abuse of evidence-based practice can be as dangerous as cutting off the wrong leg.

Using research = best practice + best use of resources. Lets professionals add value to work practices. Need to evaluate both ourselves and our professional practice. At the professional level it can inform practice, help see where we’re going, raise profile of librarianship, and improve status of library.

If we’re not practicing EBLIP we might be deferring to (Isaacs & Fitzgerald, 1999): eminence-based LIP, vehemence-based LIP, eloquence-based LIP, providence-based LIP, diffidence-based LIP, nervousness-based LIP, confidence-based LIP, propaganda-based LIP.

Must align evidence, profession and passion.

Role of evidence-based library and info practice
“Difference between research and using evidence-based practice to make workplace decisions”. Quotes someone saying there’s nothing wrong with reinventing the wheel – it’s reinventing the flat tyre you want to avoid.

Not just about research. About integrating user-reported, practitioner-observed, and research-derived evidence. Not undervaluing what staff say, but restoring balance to value users too.

Start starting (innovation) | Start stopping (discontinuing ineffective practice)
Stop starting (not introducing ineffective practice) | Stop stopping (continuing effective practice)

The 5 As:
Ask a focused question
Acquire the evidence
Appraise the studies
Apply the findings
Assess the impact

Reflection for, before, in, on action and Re-action.

EBLIP comes from medicine and is suitable for healthcare but less so for other systems. So must adapt the model, not adopt it uncritically.

Eg doctors are often autonomous; librarians work together.

So rewriting 5 As:
Articulating the problem
Assembling the evidence base
Assessing the evidence
Agreeing the actions
Adapting the implementation

Q: Your last slide is about ultimate goal of EBLIP to create a toolbox we can dip into, and thus to write itself out of existence – that was the last slide six years ago so how long will it take?
A: Did think about it! There’s been progress. Still people see EBLIP as a project to stop and start, not to sustain.

Q: Can you give examples?
A: One workshop he does is called “Walking the Walk”. Some great examples around developing webpages – many poorly designed – Cancer Library in the UK came up with webdesign guidelines backed up by evidence. Much has been done esp in Canada around evidence-based collection development.