Tag Archives: crowdsourcing

Dataset published on access to conference proceedings – thank you!

Thanks to all who’ve helped —

(Andrea, apm, Catherine Fitchett, Sarah Gallagher, Alison Fields, KNB, Manja Pieters, Brendan Smith, Dave, Hadrian Taylor, Theresa Rielly, Jacinta Osman, Poppa-Bear, Richard White, Sierra de la Croix, Christina Pikas, Jo Simons, and Ruth Lewis, plus some anonymous benefactors)

— all the conferences I was investigating have been investigated. 🙂 I’ve since checked everything for consistency and link rot; added in a set of references that I had to research myself, as I couldn’t anonymise them sufficiently in the initial run; deduplicated a few more times (conference names vary ridiculously); and finally ended up with a total of 1849 conferences, which I’ve now published at https://dx.doi.org/10.6084/m9.figshare.3084727.v1

The immediately obvious stats from this dataset include:

Access to proceedings

  • 23.36% of conferences in the dataset had some form of free online proceedings – full-text papers, slides, or audiovisual recordings.
  • 21.85% had non-free online proceedings
  • 30.72% had physical proceedings available – printed book, CD/DVD, USB stick, etc, but not including generic references to proceedings having been given to delegates
  • 45.27% had no proceedings identifiable

(Percentages don’t add to 100% as some conferences had proceedings in multiple forms.)

Access to free online proceedings by year

This doesn’t seem to have varied much over the 6 years most of the conferences took place in:

2006: 39 / 173 = 22.54%
2007: 39 / 177 = 22.03%
2008: 62 / 258 = 24.03%
2009: 63 / 284 = 22.18%
2010: 105 / 428 = 24.53%
2011: 123 / 520 = 23.65%
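
For anyone wanting to check the arithmetic, here is a minimal sketch that recomputes the yearly percentages from the counts listed above. The variable and function names are my own for illustration and aren't part of the published dataset.

```python
# Counts copied from the list above:
# year -> (conferences with free online proceedings, conferences that year).
counts = {
    2006: (39, 173),
    2007: (39, 177),
    2008: (62, 258),
    2009: (63, 284),
    2010: (105, 428),
    2011: (123, 520),
}


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimal places."""
    return round(100 * part / whole, 2)


rates = {year: percent(free, total) for year, (free, total) in counts.items()}

for year, rate in rates.items():
    print(f"{year}: {rate:.2f}%")

# Gap between the best and worst year, in percentage points.
spread = max(rates.values()) - min(rates.values())
print(f"spread: {spread:.2f} percentage points")
```

The gap between the highest and lowest year comes out at about 2.5 percentage points, which is what “hasn’t varied much” means here.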

Conferences attended by country

Conferences attended were in 75 different countries. Those with more than 20 conferences were:

New Zealand: 429
USA: 297
Australia: 286
UK: 130
Canada: 67
China: 66
Germany: 44
France: 41
Italy: 35
Portugal: 31
Japan: 29
Spain: 28
Netherlands: 27
Singapore: 25

I won’t break down access to proceedings here, because this data is inherently skewed by the nature of the sample: conferences attended by New Zealand researchers. This means that small conferences in or near New Zealand are much more likely to be included than small conferences in other parts of the world. If a small conference is less resourced to put together and maintain a free online proceedings – or conversely a large society conference is prone to more traditional (non-free) publication options – this variation by conference size/type could easily outweigh any actual variation by country. So I need to do some thinking and discussing with people to see if there’s any actual meaning that can be pulled from the data as it stands. If you’ve got any thoughts on this I’d love to hear from you!

Further analysis now continues….

Progress report on how you’ve helped my research

At this point at least 20 people have helped me look for conference proceedings (some haven’t left a name so it’s somewhere between 20 and 42), which is awesome: thank you all so much! Last week saw us pass the halfway mark, an exciting moment. As of this morning, statistics are:

  • 1187 out of 1958 conferences investigated = 59% done
  • 312 have proceedings free online (26%)
  • of those without free proceedings, 292 have non-free proceedings online
  • of those without any online proceedings, 109 have physical proceedings (especially books or CDs)
  • 472 have no identifiable proceedings (40%)

I’ve got locations for all 1958, pending some checking. Remember this is out of conferences that New Zealand researchers presented at and nominated for their 2012 PBRF portfolio.

The top countries are:
New Zealand    492
Australia    315
USA    304
UK    133
Canada    69
(with China close behind at 68)

In New Zealand, top cities are predictably:
Auckland    154
Wellington    98
Christchurch    53
Dunedin    38
Hamilton    35

Along the way I’ve noticed some things that make the search harder:

  • sometimes authors, or the people verifying their sources, made mistakes in the citation
  • or sometimes people cited the proceedings instead of the conference itself – this isn’t a mistake in the context of the original data entry but makes reconciling the year and the city difficult.
  • or sometimes their citation was perfectly clear, but my attempt to extract the data into tidy columns introduced… misunderstandings (aka terrible, terrible mistakes).
  • or we’ve ended up searching for the same conference a whole pile of times because various people call it the Annual Conference of X, the Annual X Conference, the X Annual Conference, the International Conference of X, the Annual Meeting of X, etc etc.
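
The name-variant problem in the last point can be partly tamed with a crude matching key. What follows is a rough sketch only, and not what I actually did to deduplicate the dataset: the filler-word list and the function names are assumptions made up for illustration, and the “Widget Society” in the usage note is invented.

```python
import re

# Hypothetical list of words that vary between citations of the same
# conference; a real list would need tuning against the actual data.
FILLER = {
    "the", "of", "on", "for", "and", "annual", "international",
    "conference", "meeting", "symposium", "congress",
}


def name_key(title):
    """Reduce a conference title to a crude, order-independent key."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return " ".join(sorted(w for w in words if w not in FILLER))


def group_by_key(titles):
    """Group titles that share a key, as candidates for manual checking."""
    groups = {}
    for title in titles:
        groups.setdefault(name_key(title), []).append(title)
    return groups
```

With this, “Annual Conference of the Widget Society”, “International Conference of the Widget Society” and “Widget Society Annual Meeting” all land under the same key. A key this blunt will also merge things that are genuinely different conferences, so the groups are only candidates for a human to check, not an automatic merge.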

On the other hand I’ve also noticed some things that make the search easier – either for me:

  • having done so many, I’m starting to recognise titles, so I can search the spreadsheet and often copy/paste a line
  • when all else fails I have access to the source data, so I can look up the title of the paper if I need to figure out whether I’m trying to find the 2008 or 2009 conference.

And things that could be generally helpful:

  • if a conference makes any mention of ACM, whether in the title or as a sponsor, then chances are the proceedings are listed in http://dl.acm.org/proceedings.cfm
  • if it mentions IEEE, try http://ieeexplore.ieee.org/browse/conferences/title/  If it’s there, then on the page for the appropriate year, scroll down and look on the right for the “Purchase print from partner” link. Chances are you’ll get a page with an ISBN for the print option, plus confirmation of the location, which is harder to find on IEEE Xplore itself.
  • if it’s about computer science in any way, shape or form, then http://dblp.uni-trier.de/search/ can probably point you to the source(s). This is the best way to find anything published as a Lecture Notes in Computer Science (LNCS) because Springer’s site doesn’t search for conferences very well.
  • if you do a web search and see a search result for www.conferencealerts.com, this will confirm the year/title/location of a conference, and give you an event website (which may or may not still be around, but it’s a start). Unfortunately I haven’t found a way to search the site directly for past conferences.
  • a search result for WorldCat will usually confirm year/title/location and (if you scroll down past the holding libraries) often give you the ISBN for the print proceedings.
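
The first three tips boil down to a lookup from words in the conference details to a place to start searching. Here is a small sketch of that idea. The URLs are the ones given in the tips; the keyword rules (an “acm” or “ieee” token, any word starting with “comput”, or “lncs”) are my own guesses rather than a tested heuristic.

```python
import re

# URLs as given in the tips above.
ACM = "http://dl.acm.org/proceedings.cfm"
IEEE = "http://ieeexplore.ieee.org/browse/conferences/title/"
DBLP = "http://dblp.uni-trier.de/search/"


def suggest_sources(description):
    """Suggest where to start, given a conference title and/or sponsor text."""
    words = set(re.findall(r"[a-z]+", description.lower()))
    suggestions = []
    if "acm" in words:
        suggestions.append(ACM)
    if "ieee" in words:
        suggestions.append(IEEE)
    # Assumed rule of thumb for "about computer science in any way".
    if "lncs" in words or any(w.startswith("comput") for w in words):
        suggestions.append(DBLP)
    return suggestions
```

Calling `suggest_sources("ACM Symposium on Applied Computing")` returns the ACM and dblp links; a title with none of the keywords returns an empty list, in which case it’s back to an ordinary web search.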

And two things that have delighted me:

  • Finding some online proceedings in the form of a page listing all the papers’ DOIs – which resolve to the papers on Dropbox.
  • Two of the conferences in the dataset have no identifiable city/country – because they were held entirely online.

I am of course still eagerly soliciting help, if anyone has 10 minutes here or there over the next month (take a break from the silly season? 🙂). Check out my original post for more, or jump straight to the spreadsheet.

Help me research conference proceedings and open access

I’ve been interested for a while in the amount of scientific/academic knowledge that gets lost to the world due to conference proceedings not being open access / disappearing off the face of the internet. My main question at the moment is, just how much is lost and how much is still available?

Unfortunately googling 1,955 conferences will rapidly give me RSI, so I’m hoping I can convince you to do a few for me – in the interests of science!

Background: I’ve written elsewhere about Open Access to conference literature (short version: conferences are where a huge amount of research gets its first public airing, yet conference papers are notoriously hard to track down after the fact) and Open Access and the PBRF (short version: if conference papers were all OA, PBRF verification/auditing would become a lot easier). Here I’m wanting to quantify the situation.

The data: The original dataset was sourced from TEC, from the list of conference-related NROs (nominated research outputs) from the 2012 PBRF round. There are obvious and non-obvious limitations but basically I feel this makes it a fairly good listing of conferences between 2006 and 2011 that New Zealand academics presented at and felt that presentation was worthy of being included among their best work for the period. The original dataset is confidential, but I’ve received permission to post a derived, anonymised dataset publicly for collaborative purposes, and in due course publish it on figshare.

How you can help:
(Note: by contributing to the spreadsheet you’re agreeing to licence your contribution under a Creative Commons Zero licence, meaning anyone can later reuse it in any way with or without attribution. (Though I’ll be attributing it in the first instance – see below.))

  1. Go to the spreadsheet containing the list of conferences
  2. Pick a conference that doesn’t have any URLs/notes/name-to-credit
  3. Search Google/DuckDuckGo/your search engine of choice for the conference name, year, and city to find a conference website. Assuming you find one:
  4. Correct any details that are wrong or missing: eg expand the acronym; add in missing locations; if the website says it’s the 23rd annual conference put “23” in the “No.” column, etc.
  5. Browse on the website for proceedings, list of papers, table of contents, etc. If you find:
    • a list of papers including links to the full text of each paper freely accessible, paste the URL in “Proceedings URL: free online”
    • a list of papers including links to the full text but requiring a login (including in a database or special journal issue), paste the URL in “Proceedings URL: non-free online”
    • information about offline proceedings eg a CD or book, paste the URL in “Proceedings URL/info re print/CD/etc”
    • none of the above, paste the URL of the conference website for that year in “Other URL: conference website”
  6. If you can’t find any conference website at all, write that in “Any notes” so others don’t try endlessly repeating the futile search!
  7. Sign with a “Name to credit” for your work. If you’d prefer to remain anonymous, put in n/a.
  8. If you like, return to step 2. 🙂
  9. Share this link around!
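
If it helps to see steps 5 and 6 as a decision rule, here is one way to sketch them. The column names are the ones in the spreadsheet; the function and its parameter names are hypothetical. The sketch also assumes a conference can have more than one kind of proceedings, in which case every matching column gets filled.

```python
# Column names copied from the steps above.
FREE = "Proceedings URL: free online"
NON_FREE = "Proceedings URL: non-free online"
OFFLINE = "Proceedings URL/info re print/CD/etc"
WEBSITE = "Other URL: conference website"
NOTES = "Any notes"


def columns_to_fill(found_website, free_full_text=False,
                    login_full_text=False, offline_info=False):
    """Return the spreadsheet columns to fill in for one conference."""
    if not found_website:
        # Step 6: record the failed search so nobody repeats it.
        return [NOTES]
    columns = []
    if free_full_text:
        columns.append(FREE)
    if login_full_text:
        columns.append(NON_FREE)
    if offline_info:
        columns.append(OFFLINE)
    # "None of the above": just record the conference website itself.
    return columns or [WEBSITE]
```

For example, `columns_to_fill(True, login_full_text=True, offline_info=True)` points you at both the non-free online column and the print/CD column.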

What I’ll do with it:
First I’ll check it all! And obviously I’ll pull it back into my research and finish that up. I’ll also publish the final checked dataset on figshare under Creative Commons Zero licence so others can use it in their research. I’ll acknowledge everyone who helps and provides a name, in the creation of the dataset and in the paper I’m working on. And if someone wants to do a whole pile and/or be otherwise involved in the research then talk to me about coauthorship!

Why don’t I just use…

  • Mechanical Turk: I’m boycotting Amazon, for various reasons. Plus I consider a fair price for the work would be at least US$0.50 a conference (possibly double that) and as that’s a bit harder to afford I feel more ethical being upfront about asking folk to do it for free.
  • Library assistants: I am doing this a bit but there’s a limited period where they’re still working before summer hours and things have got quiet enough that they have time.
  • Something else: Ask me, I may want to!

Other questions
Please comment or email me.

Exploring OER repositories

I’ve been doing a very introductory exploration of what people are doing out there in terms of repository software/platforms for OER (Open Educational Resources). These are my preliminary notes-to-self, which I’m posting primarily in the hopes that someone more knowledgeable will come along and correct any fundamental misunderstandings / point me to useful resources.

So, after a very brief scan of a few sites, I get the impression that:

  • Where the aim is to just put up course outlines, lecture slides, handouts, and the like – maybe the occasional multimedia file but no wholesale recordings of lectures etc – something like a prettified DSpace would be quite suitable. For example this is used by Jorum (a JISC-funded service).
  • However another factor would be the intended audience for the resources. Jorum seems primarily aimed at educators sharing resources with each other. By contrast, sites aimed at prospective students tend to be more complex, often based on Drupal (an open-source content management system) eg Open University (Drupal/Moodle) or Michigan’s OERbit (open-source software based on Drupal). Of course these also tend to include a lot of multimedia content, especially lectures.
  • A third option would be to put material directly into an existing repository – www.oercommons.org, lemill.net, www.curriki.org, cnx.org and many others curate OER. This gets the material out there without having to maintain a platform yourself. But a lot of educational material might make best sense in a national rather than international context (cf OERAfrica)

I came across mention of Equella; this is digital repository software designed such that “faculty and instructional designers can search and find the best learning content for the desired outcome or activity at hand, whether that content is OER, licensed (paid-for) content, or user-generated content”. That is, it seems focused on internal users, not prospective students, and OER is only one part of the intended content; hence a prominent feature of the sites using it is that they require a login (or the workaround of “guest access”). From a glance over the highlighted example, I don’t see its offerings as an outwards-focused repository for OER being substantially superior to (open source) DSpace’s.

Really useful resources:

Getting further along, licensing and following standards that would allow harvesting are really important. But thinking for now just about platforms, what else should I be looking at and thinking about?

Links of interest 25/3/10

Resources
C-SPAN Video Library “indexes, and archives all C-SPAN programming for historical, educational, research, and archival uses.” (Content is primarily US politics but see here for overlap with other subject areas.) All programs since 1987 can be viewed online for free.

Twitter
Following in the popular footsteps of the Fake AP Stylebook Twitter account (“Use a hyphen to join words together, a dash to separate two words that really don’t like each other.”) come rival accounts Fake AACR2 (“2.17B1. Describe an illustrated item as instructed in 2.5C. Optionally, add woodcuts, metal cuts, paper cuts, etc., as appropriate.”) and Fake RDA (“2.3.3 When attempting to parallel title, line title up to proper title, put title in reverse, turn left, shift into drive, turn right.”)

Neat stuff