Auditing your digital repository(ies): the U-M Library migration experience by Kat Hagedorn
Have over 275 collections to handle: needed to audit prior to migrating. Also needed to know how the repositories interact and how the systems work.
Determined minimum and maximum factors for the audit, and divided them into qualitative and quantitative. Quantitative factors were easy to fill in; harder to figure out their relevance.
Ran a pilot – including some problematic collections. Findings:
- Even “number of objects” needs conversation in some collections. May need carefully thought out words – do you count the object or the record?
- “Collection staleness” is not ‘last updated’ but ‘last used’. And ‘update dates’ had all been changed at one point to the date of a migration…
- Some information about collections is only in email. Data often unclear even when locatable.
- Technical issues – broken script meant collection usage appeared to be nil. Options for format types didn’t originally include XML.
- Sometimes a stakeholder has been non-responsive in providing better images.
Mind the gap! Reflections on the state of repository data harvesting by Simeon Warner
A long time ago, when 10GB was a lot… OAI-PMH was formed. It works, scales, is easy, and is widely deployed. Harvested into aggregators, discovery layers etc. But it’s not RESTful, it’s clunky, focused on metadata, and pull-based.
So we hate it, but don’t know what to do instead!
New approach has to meet existing use cases; support content as well as metadata, scale better, follow standards, make developers happy. Need to be able to push for more frequent updates.
Wants to use ResourceSync – ANSI/NISO Z39.99-2017 – has WebSub (was PubSubHubbub) companion standard.
CORE is looking at replacing OAI-PMH with ResourceSync. Work with Hyku & DPLA. Samvera (was Hydra) is building native ResourceSync support.
The community should agree on ResourceSync as a new shared approach. Have to support it as primary harvesting support, OAI-PMH as secondary for transition.
(In Q&A: Haven’t yet talked to discovery layers – need to get consensus in the community first. Trove has switched some harvesting to Sitemaps (the basic level of ResourceSync).
Audience suggestion to create user stories of migration.)
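ResourceSync builds its resource lists on the Sitemap XML format, adding an `rs:md` capability element. A minimal sketch of parsing one on the harvester side, using only Python’s standard library (the repository URLs and the embedded document are hypothetical illustrations, not from the talk):

```python
import xml.etree.ElementTree as ET

# Namespaces used by ResourceSync (ANSI/NISO Z39.99-2017), which
# extends the Sitemap XML format with an <rs:md> capability element.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.resourcesync.org/ns/",
}

# A hypothetical resource list, as a harvester might fetch it from
# a repository's ResourceSync endpoint.
RESOURCE_LIST = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.resourcesync.org/ns/">
  <rs:md capability="resourcelist" at="2017-06-27T00:00:00Z"/>
  <url>
    <loc>http://repo.example.org/item/1/metadata.xml</loc>
    <lastmod>2017-06-01T10:00:00Z</lastmod>
  </url>
  <url>
    <loc>http://repo.example.org/item/1/content.pdf</loc>
    <lastmod>2017-06-01T10:00:00Z</lastmod>
  </url>
</urlset>"""

def parse_resource_list(xml_text):
    """Return (capability, [(url, lastmod), ...]) from a resource list."""
    root = ET.fromstring(xml_text)
    capability = root.find("rs:md", NS).get("capability")
    resources = [
        (u.findtext("sm:loc", namespaces=NS),
         u.findtext("sm:lastmod", namespaces=NS))
        for u in root.findall("sm:url", NS)
    ]
    return capability, resources

capability, resources = parse_resource_list(RESOURCE_LIST)
print(capability)      # resourcelist
print(len(resources))  # 2
```

Note that, unlike OAI-PMH, the same list format covers content files (the PDF) as well as metadata records – one of the use cases the talk says a replacement must support.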
Does Curation Impact the Quality of Metadata in Data Repositories? by Amy Elizabeth Neeser, Linda Newman
Various research questions about metadata options and curation. Compared 4 institutions. Looked at the most recent 20 datasets in each. Each institution’s metadata is very different, eg ‘author’ vs ‘creator’; some automatically generated (which they discounted). Findings:
- Sizable variation of metadata ‘options’ per institution
- Choice to curate doesn’t necessarily guarantee more metadata. More than minimum is available regardless
- Documentation is far less common in self-submission repositories, usually only a readme
- Institutions who curate can ensure that each dataset gets a DOI, others currently leave the choice to the user. (May just be related to policy though)
- Not sure whether placement of input form or curation is the bigger factor in number of keywords
(In Q&A: As a community we think curation is good but want some proof of this to justify all the hours! Also as a result of the study have made changes to own practices eg better input forms.)
Leading the Charge: Supporting Staff and Repository Development at the University of Glasgow by Susan Ashworth
What does repository success look like? Enlighten is a recognised brand that covers all the services. Multiplying repositories as users keep asking for them. Recognition of value of data – populates web pages, research evaluation, benchmarking, KPIs.
How did they get there? Early and ongoing engagement with deans of research and research admins. Surface data publicly on researcher profiles. Say yes (and panic afterwards), eg improving reporting from the repository. Adapt quickly to external forces (funding and govt requirements).
What does this mean for libraries? Cross-library services, appointing new staff. Have 6 OA staff in various teams. Developing new skills in data management, licensing, metrics, etc. Lots to do leading to lots of opportunities for staff. Staff can easily see the contribution they make to institution. Clear when service has to deliver on high expectations.
UK-wide adoption of the “UK Scholarly Communications License”, used/adapted from Harvard’s model – universities retain some rights over outputs so they can be made available at the point of publication instead of after embargoes. [Seems to be adapted from CC-BY??]
(In Q&A: 70 UK institutions are discussing adoption of this license – there may be declarations in Open Access Week. Some pushback from publishers, but Harvard has been able to work with its version for 10 years!)
Most research into OA usefulness is focused on use by researchers. The speaker worked in govt and wanted to use research evidence but couldn’t get access; did a PhD on how to help govt use research evidence in decision-making. Increasingly important in the context of govt impact assessments, eg REF, PBRF, ERA. Access (and knowing the research exists!) is the biggest, structural barrier to using academic information.
“The Conversation” website aims to help researchers communicate research in a way that’s easy for lay people to understand. Free to read and share under Creative Commons. Four sectors reading it – research and academia; teaching and education; govt and policy; health and medicine – together represented 50% of survey participants. Readers value academic expertise, research findings, clarity of writing, no commercial agenda, and editorial independence. They discuss it with friends or colleagues afterwards; many share on social media. Used in discussions and debate; may change behaviour; some use it to inform decision-making (or to support an existing decision….)
(In Q&A: To do more research with pop-up surveys on downloads from IRs to find out why people are using the content.)
Batch processes and outreach for faculty work by Colleen Elizabeth Lyon
UT at Austin – research-intensive. IR on DSpace. Big campus, competing priorities = lots of missing content. Wanted to increase access to content and improve outreach skills using existing repository staff (2 full time plus a few grad students to upload content).
Use CC licenses and publisher policies to identify which publishers allow deposit without having to ask faculty permission, and create an automated process:
Export content from WoS to Endnote -> csv -> Google Sheets -> check against SHERPA/RoMEO via API -> download articles -> use SAFCreator to get into the right format for batch import to DSpace (followed by generating usage reports for faculty). Results:
- Filtering and deduping took more time than expected.
- Faculty didn’t respond to notifications – not sure if they just ignored, didn’t think it needed a response – but at least no complaints!
- Added almost 2500 items in between other projects.
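The filtering and deduping step that took more time than expected might look roughly like this sketch, assuming a CSV export with DOI and publisher columns (column names, the sample rows, and the allow-list are hypothetical; the real workflow used Endnote exports, Google Sheets, and the SHERPA/RoMEO API):

```python
import csv
import io

# Hypothetical CSV export of candidate articles; column names are assumptions.
CSV_EXPORT = """doi,title,publisher
10.1000/xyz123,First Article,Open Publisher
10.1000/XYZ123,First Article (duplicate),Open Publisher
10.1000/abc456,Second Article,Restrictive Publisher
,Third Article (no DOI),Open Publisher
"""

# Publishers whose policies allow deposit without asking faculty
# permission -- an illustrative allow-list, not real policy data.
ALLOWED_PUBLISHERS = {"Open Publisher"}

def filter_and_dedupe(csv_text):
    """Keep one row per DOI, restricted to deposit-friendly publishers."""
    seen = set()
    kept = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        doi = row["doi"].strip().lower()  # normalise case before comparing
        if not doi or doi in seen:        # drop no-DOI rows and duplicates
            continue
        if row["publisher"] not in ALLOWED_PUBLISHERS:
            continue
        seen.add(doi)
        kept.append(row)
    return kept

rows = filter_and_dedupe(CSV_EXPORT)
print([r["title"] for r in rows])  # ['First Article']
```

Even in this toy form, most of the fiddliness is in normalisation (case, missing DOIs) rather than the happy path – consistent with the finding that this stage ate the most time.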
Under the DuraSpace Umbrella: A Framework for Open Repository Project Support by Carol Minton Morris, Valorie Hollister, Debra Hanken Kurtz, Andrew Woods, David Wilcox
DuraSpace is a not-for-profit org whose mission is to support open tech projects that provide long-term durable access and discovery. Fosters projects, eg DSpace, Fedora, VIVO. Offers services, eg tech/business expertise, membership/governance frameworks, infrastructure, marketing/comms. Samvera is an affiliate project.
Ecosystem is larger so want to expand support to ensure community/financial/technical sustainability. Criteria include: philosophical alignment; strategic importance to community; financially viable; technical pieces in place. So if you know of a project needing support, contact them.
A Simple Method for Exposing Repository Content on Institutional Websites by Gonzalo Luján Villarreal, Paula Salamone Lacunza, María Marta Vila, Marisa Raquel De Giusti, Ezequiel Manzur
2 institutions with active IRs but questionable web dissemination practices – lack of interest/staff/time/money to maintain web presence for staff profiles.
- Improved existing sites with Choique CMS, WordPress, Joomla
- Designed and developed new sites with WordPress multisite – hosted if research centre didn’t have their own site
- Gave advice about web publishing
- Used IRs to boost websites: OpenSearch to retrieve contents, then a software library to fetch/filter/organise/deliver and share results, with software addons created for each CMS
- Easy configuration
- flexible usage eg “all research centre publications” or “theses from the last 5 years” or “all researcher A’s content”
Results: 7 new websites published, 14 in development. Researchers want to deposit in IRs to keep the website updated. Dev work to continue eg flushing cache; multi-repository retrieval?
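The filter/organise step behind views like “theses from the last 5 years” or “all researcher A’s content” could be sketched like this, assuming records have already been retrieved from the IR via OpenSearch and parsed (the field names and sample records are hypothetical, not the papers’ actual schema):

```python
from datetime import date

# Hypothetical records as the OpenSearch retrieval layer might return
# them after parsing the IR's response.
RECORDS = [
    {"title": "Thesis on X", "type": "thesis", "year": 2016,
     "authors": ["Researcher A"]},
    {"title": "Old Thesis", "type": "thesis", "year": 2005,
     "authors": ["Researcher B"]},
    {"title": "Article on Y", "type": "article", "year": 2017,
     "authors": ["Researcher A", "Researcher B"]},
]

def theses_from_last_n_years(records, n, today=date(2017, 6, 27)):
    """E.g. the 'theses from the last 5 years' view on a centre's site."""
    return [r for r in records
            if r["type"] == "thesis" and r["year"] >= today.year - n]

def by_researcher(records, name):
    """E.g. the 'all researcher A's content' view on a profile page."""
    return [r for r in records if name in r["authors"]]

print([r["title"] for r in theses_from_last_n_years(RECORDS, 5)])
# ['Thesis on X']
print([r["title"] for r in by_researcher(RECORDS, "Researcher A")])
# ['Thesis on X', 'Article on Y']
```

Because the views are just queries over IR content, the websites stay current whenever the repository does – which is the incentive the speakers report: researchers deposit in the IR to keep their website updated.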