
Social media as an agent of socio-economic change #vala14 p2

Johan Bollen: Social media as an agent of socio-economic change: analytics and applications

The world we live in is increasingly about online connections. His first computer had 1KB of RAM and was programmed in BASIC; now he can wake up his parents in Belgium via FaceTime. 2012 data: 2.4 billion internet users worldwide, with internet penetration ranging from 15.6% in Africa to 78.6% in North America (67.6% in Oceania/Australia). The amount of online content is staggering.

Facebook, LiveJournal, Twitter… We’re not using these networks to broadcast; we’re using them to collaborate socially, many-to-many, generating content and establishing social relations collaboratively.

He showed an xkcd cartoon about the ubiquity of phones and a map of Twitter and Flickr usage. Other examples: visualising the languages spoken and what is being downloaded; using Twitter to map discussion of beer vs church; and using it to monitor flu outbreaks.

Wikipedia uses collaboration to create content; Estimize uses it to predict markets.
“Prevailing pessimism about large groups collaborating in a productive manner, absent central authority, may not be justified.” We’ve gone from the “madness of crowds” (wacky ideas) to “the wisdom of crowds”. On “Who Wants to Be a Millionaire”, asking an expert gets the right answer 65% of the time; asking the audience gets it 91% of the time. When you ask people questions they have to guesstimate an answer to, “the average of two guesses from one individual was more accurate than either guess alone”.

Galton (1907), Nature 75(1949):450-451: aggregating people’s judgements of the weight of a dressed ox gave an estimate within 1% of the true weight.
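To see why aggregating works, here’s a minimal Python sketch with a simulated crowd. The true weight, the spread of the guesses and the crowd size are all made-up assumptions, not Galton’s data. It leans on a simple guarantee: the error of the averaged guess can never be worse than the average individual error.

```python
import random
import statistics

# Hypothetical crowd: 800 noisy, independent guesses around a made-up true weight.
TRUE_WEIGHT = 1200.0
rng = random.Random(1907)
guesses = [rng.gauss(TRUE_WEIGHT, 150.0) for _ in range(800)]

crowd_mean = statistics.mean(guesses)
crowd_median = statistics.median(guesses)
crowd_error = abs(crowd_mean - TRUE_WEIGHT)
avg_individual_error = statistics.mean(abs(g - TRUE_WEIGHT) for g in guesses)

# The error of the averaged guess can never exceed the average individual error
# (triangle inequality); with independent errors it is usually far smaller.
print(f"crowd mean {crowd_mean:.0f}, crowd median {crowd_median:.0f}")
print(f"crowd error {crowd_error:.1f} vs average individual error {avg_individual_error:.1f}")
```

The guarantee holds for any set of guesses; how much better the crowd does depends on the errors being independent rather than everyone being biased the same way.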
Condorcet Jury Theorem (1785): if each juror independently has even a slightly better than 50% chance of being right, the probability that a majority vote is right approaches unity as the jury grows.
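The theorem is easy to check numerically. A minimal sketch, assuming independent jurors who are each right with the same probability p and an odd number of jurors n (so there are no ties): the majority is right when more than half the jurors are, which is the binomial tail sum P = Σ_{k > n/2} C(n, k) p^k (1 − p)^(n − k).

```python
from math import comb


def majority_correct(p: float, n: int) -> float:
    """Probability that a strict majority of n independent jurors is right,
    when each juror is right with probability p. n must be odd (no ties)."""
    if n % 2 == 0:
        raise ValueError("use an odd number of jurors to avoid ties")
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))


# Compare slightly-better-than-chance jurors with slightly-worse-than-chance ones.
for n in (1, 11, 101, 1001):
    print(n, round(majority_correct(0.55, n), 3), round(majority_correct(0.45, n), 3))
```

With p = 0.55 the majority gets more reliable as the jury grows; with p = 0.45 it gets less reliable, which is why the “better than chance” condition matters.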
Collective intelligence – birds flocking, ants finding food.

We have telescopes to look at huge things and microscopes to look at tiny things; we need a macroscope to look at really complex things. That macroscope is computational social science, studying the data generated by social media with tools like network analysis and natural language processing.

Epictetus “Men are disturbed, not by things, but by the principles and notions which they form concerning things”.

Sentiment analysis: e.g. the “Affective Norms for English Words” (ANEW), which rates words along valence, arousal and dominance; OpinionFinder; SentiWordNet. We understand individual emotions well, but not so much collective emotions. He showed a diagram charting fluctuations in collective mood based on Twitter feeds and correlated them with market fluctuations: the ‘calm’ mood on Twitter predicted movements in the Dow Jones Industrial Average three days in advance with 85% accuracy. Other results have largely confirmed this, using Google Trends and a dataset of LiveJournal posts.
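Lexicon-based scoring like ANEW’s boils down to averaging per-word ratings. Here’s a minimal Python sketch of that idea, plus a lagged correlation between daily mood and market changes. The lexicon values are invented (they are NOT real ANEW ratings), the function names are hypothetical, and this is a generic illustration rather than the method the study actually used.

```python
import re
from typing import Optional, Sequence

# Toy lexicon: valence on a 1 (very negative) to 9 (very positive) scale.
# These numbers are invented for illustration; they are NOT real ANEW ratings.
TOY_VALENCE = {"happy": 8.0, "calm": 7.0, "sad": 2.0, "angry": 2.5, "crash": 1.5}


def text_valence(text: str, lexicon: dict = TOY_VALENCE) -> Optional[float]:
    """Mean valence of the lexicon words found in text, or None if none match."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[w] for w in words if w in lexicon]
    return sum(scores) / len(scores) if scores else None


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def lagged_correlation(mood: Sequence[float], market: Sequence[float], lag: int) -> float:
    """Correlate mood on day t with the market change on day t + lag."""
    if len(mood) != len(market):
        raise ValueError("series must cover the same days")
    return pearson(mood[: len(mood) - lag], market[lag:])
```

To use it, score each day’s tweets with text_valence, then compare lagged_correlation(mood, market, lag) across several lags; a peak at lag 3 would be the kind of pattern described above. Correlation alone doesn’t establish prediction, of course.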

Where does collective emotion come from? Is it more than the sum of individual emotions? Do sad people flock together, or do they make each other sad? Homophily (“birds of a feather”) is prevalent in social networks: people connected to lots of people tend to be connected to other people who are connected to lots of people (i.e. the popular kids hang out with each other). He showed an image of political homophily on Twitter. So does mood act in the same way? Looking at reciprocal following on Twitter, they found a small cluster of negative-emotion users and a larger cluster of positive-emotion users (though it’s not known which way the causation runs). The closer the friendship, the more reliably this held.
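One simple way to ask “do people with similar moods cluster?” is a scalar assortativity coefficient: the Pearson correlation of mood between the two ends of each edge. A minimal Python sketch, assuming a list of (follower, followed) pairs and a precomputed mood score per user (both hypothetical inputs), and keeping only reciprocal follows as in the study described above:

```python
def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def reciprocal_edges(follows):
    """Undirected edges for pairs who follow each other (a -> b and b -> a)."""
    follow_set = set(follows)
    return {tuple(sorted(pair)) for pair in follow_set
            if pair[0] != pair[1] and (pair[1], pair[0]) in follow_set}


def mood_assortativity(edges, mood):
    """Correlation of mood between the two ends of each edge.

    Each undirected edge is counted in both directions so the result is symmetric.
    Values near +1 mean similar moods cluster; near 0, no pattern.
    """
    xs, ys = [], []
    for a, b in edges:
        xs += [mood[a], mood[b]]
        ys += [mood[b], mood[a]]
    return pearson(xs, ys)
```

Like the study, this measures clustering only; it can’t say whether sad people find each other or make each other sad.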

Application to bibliometrics: when one of his papers was rejected by journals, he published it on arXiv, where it was massively read and cited within a month. So he looked at arXiv papers and found a weak correlation between Twitter mentions and early citations. But there’s a problem with altmetrics: the biggest nodes are the media, big blogs etc. The number of mentions doesn’t matter as much as who is doing the mentioning.
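A toy illustration of why “who” matters: compare a raw mention count with one crude way of weighting mentions by the mentioner’s reach. The follower-count weighting, the function names and the account names are all hypothetical, not a method from the talk.

```python
def raw_mentions(mentioners):
    """Altmetrics-style count: every mention counts once, whoever makes it."""
    return len(mentioners)


def reach_weighted_mentions(mentioners, followers):
    """Hypothetical alternative: weight each mention by the mentioner's follower count."""
    return sum(followers.get(user, 0) for user in mentioners)


# Hypothetical accounts: twenty small accounts versus one big blog.
followers = {"big_blog": 500_000, **{f"student{i}": 50 for i in range(20)}}
paper_a = [f"student{i}" for i in range(20)]
paper_b = ["big_blog"]

print(raw_mentions(paper_a), raw_mentions(paper_b))
print(reach_weighted_mentions(paper_a, followers), reach_weighted_mentions(paper_b, followers))
```

The two measures rank the papers in opposite orders. Follower count is only one stand-in for “who”; network centrality or a log weighting would be other plausible choices.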

Radical proposal for funding science (developed over alcohol-fuelled Christmas-party grumbling about writing funding proposals). (Motto: “What would the aliens say?”) Fund people, not projects. Treat science as a gift economy. Encourage innovation. Change scholarly incentives for the better. Congress should give the money to the scientific community: every scientist gets an equal chunk, but has to donate a certain percentage to anyone they want (and the recipients in turn have to donate a percentage of what they’ve received). This would lead to an uneven “but fair” distribution. [My criticism: this would be susceptible to implicit bias against women, people of colour, etc. However, I don’t know whether it would be more or less susceptible to these problems than the current system is.] They ran a simulation using network data: when the donated fraction F = 0.5, the resulting distribution matches that of NSF and NIH funding.
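The redistribution rule can be simulated as a fixed-point iteration. A minimal Python sketch, assuming F is the fraction each scientist must pass on (called `fraction` here), that each scientist splits their donations according to fixed shares, and using a toy three-person network rather than the real network data:

```python
def simulate_funding(donations, base=1.0, fraction=0.5, rounds=200):
    """Toy model of the 'fund people, not projects' proposal.

    donations[giver] maps each recipient to the share of giver's donation they
    receive (shares sum to 1). Each round every scientist gets `base` plus
    whatever others passed on, and must pass on `fraction` of that total.
    Returns what each scientist keeps per round once the process has settled.
    """
    for giver, shares in donations.items():
        if abs(sum(shares.values()) - 1.0) > 1e-9:
            raise ValueError(f"{giver}'s shares must sum to 1")
    total = {person: base for person in donations}
    for _ in range(rounds):
        received = {person: 0.0 for person in donations}
        for giver, shares in donations.items():
            for recipient, share in shares.items():
                received[recipient] += fraction * total[giver] * share
        total = {person: base + received[person] for person in donations}
    # Each scientist keeps the part of their income they don't have to pass on.
    return {person: (1 - fraction) * t for person, t in total.items()}


# Hypothetical network: a gives to b, b gives to c, c gives back to b.
print(simulate_funding({"a": {"b": 1.0}, "b": {"c": 1.0}, "c": {"b": 1.0}}))
```

Because each round adds N × base of new money and only a fraction F < 1 is passed on, the totals settle. What scientists keep sums back to N × base, so the budget is unchanged; only its distribution depends on who donates to whom.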

Q: Risk of feedback loops?
A: Yes. He cited the hacking of a Twitter account to post about bombs at the White House, which led to massive market shorting: not just people getting freaked out, but algorithms getting freaked out. Positive feedback loops are bad news; hopefully things can be set up so that you get negative feedback loops instead, which lead to homeostasis. We can only mitigate problems by understanding how things work.

Bibliographic analysis for fun and collection development

You know how you get a brand new hammer and suddenly you notice all these nails sticking out?

So I’ve been working more with Ref2RIS. Meanwhile, some of my colleagues and I were talking about analysing researchers’ bibliographies for nefarious purposes, and I suddenly realised that doing so might also help me get the handle I desperately need on one of the subject areas I’m attempting to be a liaison librarian for, despite having had no handover and no background in it.

And then I realised that, instead of staring glumly at some PhD thesis bibliography and having my eyes glaze over, I could just run it through Ref2RIS, pull all the references into Endnote, and sort by journal title.
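The sort-and-count step can also be scripted. A minimal Python sketch, assuming a standard RIS export (two-letter tags followed by “  - ”) and that the journal title sits in one of the JF, JO, T2 or JA tags; which one depends on the exporting software, so check your own file.

```python
import re
from collections import Counter

RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")
# Assumption: the journal title is in one of these tags, tried in this order.
JOURNAL_TAGS = ("JF", "JO", "T2", "JA")


def parse_ris(text):
    """Split RIS text into records, each a dict of tag -> list of values."""
    records, current = [], None
    for line in text.splitlines():
        m = RIS_LINE.match(line.strip())
        if not m:
            continue
        tag, value = m.group(1), m.group(2).strip()
        if tag == "TY":  # TY starts a new record
            current = {}
        if current is None:
            continue
        current.setdefault(tag, []).append(value)
        if tag == "ER":  # ER ends the record
            records.append(current)
            current = None
    return records


def journal_of(record):
    for tag in JOURNAL_TAGS:
        if tag in record:
            return record[tag][0]
    return None


def tally(records):
    """Count records by type, and journal articles by journal title."""
    types = Counter(r["TY"][0] for r in records)
    journals = Counter(journal_of(r) for r in records if r["TY"][0] == "JOUR")
    journals.pop(None, None)  # drop articles with no journal tag
    return types, journals
```

Something like tally(parse_ris(open("thesis.ris", encoding="utf-8").read()))[1].most_common() would then list journals by usage; the file name is just a placeholder.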

It did take me two hours to create the conversion file, but on the other hand I’m getting quicker at that. And then I sorted, and did a quick count, and came up with the following data:

The bibliography for this thesis contained 133 references, of which 1 was a website, 9 were books/reports/manuals, and the bulk of 123 were journal articles from 27 different journals.

16 journals were used for only 1 reference each;
2 journals for 2 references;
2 journals for 3 references;
1 journal for 4;
2 journals for 5;
1 journal for 12;
1 journal for 18;
1 journal for 19;
1 journal for 34 references (over a quarter of the entire bibliography).
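As a quick sanity check on the list above, here’s a short Python sketch that rebuilds the distribution from the per-journal counts (the journal names are placeholders) and confirms the totals come to 27 journals and 123 articles.

```python
from collections import Counter

# References per journal, taken from the list above; the journal names are placeholders.
counts = [34, 19, 18, 12, 5, 5, 4, 3, 3, 2, 2] + [1] * 16
per_journal = {f"journal_{i}": n for i, n in enumerate(counts)}


def count_of_counts(per_journal):
    """Map 'references per journal' -> 'number of journals cited that many times'."""
    return Counter(per_journal.values())


dist = count_of_counts(per_journal)
n_journals = len(per_journal)
n_articles = sum(per_journal.values())
top_share = max(per_journal.values()) / 133  # share of the whole bibliography
top_four_share = sum(sorted(per_journal.values(), reverse=True)[:4]) / n_articles
print(sorted(dist.items()))
print(n_journals, n_articles, f"{top_share:.1%}", f"{top_four_share:.1%}")
```

The top four journals turn out to account for about two-thirds of the journal articles.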

I also discovered that this last journal is one that our library doesn’t hold…. (We do hold everything that was used 4 or more times; I got bored before checking the less-used journal titles.)
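Checking the less-used titles against holdings is the tedious bit, so here’s a minimal Python sketch. It assumes you have a per-journal count and a list of held titles (the names here are hypothetical); real matching would probably also need ISSNs or handling of abbreviated titles, since this only normalises case and whitespace.

```python
def normalise(title):
    """Crude title-matching key: case-folded, whitespace collapsed."""
    return " ".join(title.casefold().split())


def holdings_gaps(per_journal, holdings, min_uses=1):
    """Journals cited at least min_uses times that aren't held, most-cited first."""
    held = {normalise(t) for t in holdings}
    gaps = [(j, n) for j, n in per_journal.items()
            if n >= min_uses and normalise(j) not in held]
    return sorted(gaps, key=lambda item: (-item[1], item[0]))


# Hypothetical titles and holdings.
per_journal = {"Journal A": 34, "Journal B": 19, "Journal C": 2}
holdings = ["journal  b"]
print(holdings_gaps(per_journal, holdings))
```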

Obviously more research is required

  • to find out if this is a significant gap in our collection or a fluke of this particular thesis; and
  • to figure out if there are any other interesting patterns in usage;

but if the researchers have had the courtesy to all use the same citation style then it should be pretty quick research.
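To tell a real gap from a one-thesis fluke, one option is to pool several bibliographies and look at how many theses cite each journal as well as the total number of references. A minimal Python sketch with hypothetical data:

```python
from collections import Counter


def journal_spread(bibliographies):
    """For each journal: (number of bibliographies citing it, total references across them).

    Each bibliography is a dict mapping journal title -> references in that bibliography.
    """
    theses, refs = Counter(), Counter()
    for per_journal in bibliographies:
        for journal, n in per_journal.items():
            if n > 0:
                theses[journal] += 1
                refs[journal] += n
    return {journal: (theses[journal], refs[journal]) for journal in refs}


# Hypothetical counts from three theses.
bibliographies = [{"Journal A": 34, "Journal B": 1}, {"Journal B": 3}, {"Journal B": 2, "Journal C": 1}]
for journal, (n_theses, n_refs) in sorted(journal_spread(bibliographies).items()):
    print(f"{journal}: cited in {n_theses} theses, {n_refs} references")
```

A journal with 34 references from one thesis and nothing elsewhere looks like a fluke; one cited a few times across many theses looks more like a genuine gap.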