Stop Wasting Science

So you do a lot of amazing research, whatever. Your research will not matter to anyone else on Earth – at least, not until you make it accessible to them. If we’re not making it available, we’re just wasting science.

The number of research projects that sit in desk drawers waiting to be written up and published, or that get published but remain behind paywalls, is saddening. But with the boom of open-access journals, that is rapidly changing. There are some growing pains – including the high rate of fake and falsified papers.

If you do a lot of amazing research and publish it in an open-access journal, there is still a chance that a lot of your work is being wasted. Looking through a few papers I recently read (this is called a biased sample), the average journal article has roughly 5-10 tables and figures. I’ve seen enough of other researchers’ Excel sheets to know that this summary is barely the tip of the iceberg. This isn’t the print era anymore; publishing data is entirely possible. But, well, where is all the data?


In most cases, it is sitting on aging hard drives under file names that quickly fade into obscurity. Some lucky files make their way onto websites like FigShare and ResearchGate, while some Big Datasets (like genomics data) are too big to have a home anywhere on the internet.

There are a number of astonishing recent meta-studies that use the results from hundreds or thousands of papers to come to fascinating conclusions. These papers are just a glimpse of what the future of meta-analysis has in store, and of how essential making data accessible is going to be in just a few years.

Researchers are all about getting publications, and that is understandable, given the pressures that they are under. However, a lot of signs indicate that those pressures are changing. We are on the brink of a revolution in science. If you want to stay competitive you would just be silly not to start making your data available now.

You and I Just Love the Time!



‘You and I Just Love the Time’ could be the next big pop music hit. After all, the title is made up of seven of the 25 most frequent words in this analysis of music lyrics. The following is some of that analysis – and the beginning of my strange new obsession with the numerical investigation of human language.


I selected six extremely popular* artists from different genres, and selected albums or parts of albums from each of them.

Artist:          Album:
Garth Brooks     1989 **
Justin Bieber    Misc
Lady Gaga        Misc
Metallica        Master of Puppets
Nirvana          In Utero
Talking Heads    Speaking in Tongues

Then, I used a free program (TextSTAT 2.9) to count the occurrences of each word in each artist’s lyrics*** (each artist’s collection of lyrics is called a corpus). Sample size for each artist was between 1,200 and 2,200 words. I put that data in an Excel sheet and made graphs.
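For anyone who would rather script the counting step, here is a minimal sketch in Python. It is not what TextSTAT does internally; the function names and the sample text are made up for illustration, and the tokenizing rule (letters and apostrophes only) is an assumption. Because the corpora differ in size, it also reports counts per 1,000 words so artists can be compared on the same scale.

```python
import re
from collections import Counter


def word_frequencies(lyrics: str) -> Counter:
    """Count how often each word occurs, ignoring case."""
    # Normalize curly apostrophes so "don’t" and "don't" count as one word.
    text = lyrics.lower().replace("’", "'")
    # Assumed tokenizer: a word is a run of letters and apostrophes.
    words = re.findall(r"[a-z']+", text)
    return Counter(words)


def per_thousand(counts: Counter) -> dict:
    """Rescale raw counts to occurrences per 1,000 words."""
    total = sum(counts.values())
    return {word: 1000 * n / total for word, n in counts.items()}


# Placeholder text, not real lyrics from any of the artists.
sample = "You and I just love the time. I love the time, and you love it too."
counts = word_frequencies(sample)
rates = per_thousand(counts)

print(counts.most_common(5))
print(round(rates["love"], 1))
```

Running the same two functions over one text file per artist would give tables that can be pasted straight into a spreadsheet for graphing.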


Vocabulary for multiple artists

pronouns as used by different musical artists


Though most of these results are not surprising, it is interesting that an analysis of a few thousand words could tell you a lot about an artist or a genre. With multivariate statistics (factor analysis, for example), it may be possible to group like artists, or even distinguish genres, based entirely on word usage. It might therefore be possible for a computer to tell a lot about the artist (such as genre, date, country of origin, etc.) just from the lyrics.
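As a rough illustration of how a computer could compare artists by word usage, here is a small sketch using cosine similarity between word-count profiles. This is a simpler stand-in for the statistics mentioned above, not a method used in this analysis, and the counts below are hypothetical numbers invented for the example.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors.

    Close to 1.0 means the two profiles use words in the same
    proportions; 0.0 means they share no words at all.
    """
    shared = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical word counts for illustration only; not from the real data.
artist_a = Counter({"love": 10, "you": 8, "game": 2})
artist_b = Counter({"love": 5, "you": 4, "game": 1})
artist_c = Counter({"master": 6, "pain": 4, "you": 1})

similar = cosine_similarity(artist_a, artist_b)
different = cosine_similarity(artist_a, artist_c)
print(similar, different)
```

Artists whose profiles score close to each other could then be grouped together, which is the kind of clustering that might separate genres.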

A few interesting case-study finds came from looking at the different ways certain words were used when only a few of the artists used them, such as “Wild” or “Game”.

Garth Brooks: “Wild horses keep draggin’ me away”
Talking Heads: “I get wild, whizzing up”

Lady Gaga: “Let’s play a love game”
Garth Brooks: “Just the other night at a hometown football game”
Metallica: “Infection is the game”

*Okay, so, the Talking Heads aren’t insanely popular, but they are popular with me.
**I also tried No Fences, and it looked the same. The real/next version will have a larger dataset and be stats-heavy.
***Damn am I ever glad we have computers

How Frequently Do Students Drop Out?

‘Dropping out’ is something that is frequently discussed in academia, often in the context of “don’t [insert verb here], or you will end up as a dropout”. Since it is such a commonplace term/warning/fear, it surprises me that people generally don’t know how likely it is to occur. What are the dropout rates?

It doesn’t help that dropout numbers aren’t easy to pin down. Universities don’t always track or flaunt their dropout rates – perhaps especially when the numbers would reflect badly on the university. The most widely available information is on ‘completion rates’, that is: the share of students who started and finished a program at a particular university. This rate does not distinguish between students who flunked out, switched universities, were promoted from one category to the next without graduating from the first, or met some more tragic end – it isn’t the exact opposite of the dropout rate, but it is related, and it’ll have to do for now.
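To make the distinction concrete, here is a small sketch of the arithmetic. The cohort below is entirely hypothetical (ten made-up students with made-up outcomes); it only shows why one minus the completion rate lumps together several different kinds of non-completion.

```python
from collections import Counter

# Hypothetical cohort: one outcome label per student who started the program.
cohort = (
    ["completed"] * 6
    + ["flunked out"] * 1
    + ["switched universities"] * 2
    + ["promoted without graduating"] * 1
)

outcomes = Counter(cohort)
started = len(cohort)

# Completion rate: students who started AND finished at this university.
completion_rate = outcomes["completed"] / started

# Everything else, regardless of the reason the student did not finish here.
non_completion_rate = 1 - completion_rate

# Only one of the non-completion categories, for comparison.
flunked_rate = outcomes["flunked out"] / started

print(f"completion: {completion_rate:.0%}")
print(f"non-completion: {non_completion_rate:.0%}")
print(f"flunked out: {flunked_rate:.0%}")
```

In this made-up cohort most of the students who did not complete never actually quit their studies, which is why the published completion rates can only serve as a rough proxy.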

In Canada, it seems that the higher the level of education you are in, the less likely you are to complete your program:

completion rates for different academic levels
Sources: High School, Undergrad, Masters, Doctorate.

That trend does not suggest that the longer you stay in a program the more likely you are to drop out – you’re actually more likely to drop out near the start. Of course, the rule isn’t that simple – the numbers change dramatically with the demographic. Countries, schools, and programs all have different rates of program completion. This is just one way to view a portion of the data. The numbers are different in American universities – especially for undergrads, where the average freshman is just a coin flip away from either graduating or dropping out. There are also some exemplary universities, like the University of St. Andrews, where purportedly only 0.05% of students drop out. There are also some horror-story settings, such as one university where the average time until dropping out for doctoral students was 6 years, meaning that the average ‘quitter’ there leaves after 6 years’ worth of work – with no degree.

Discipline also has some impact on the completion rate. In doctoral and undergraduate programs, students of the physical sciences are more likely to complete than humanities or social science majors. For instance, the completion rate for doctorates in Canada ranges from 52% to 83% in the sciences, but only from 40% to 59% in the arts. The same trend didn’t show up for Masters students, but there are certainly still differences.

completion rates for various graduate programs

Based on one study, the reasons doctoral students give for throwing in the towel generally fall into three groups: personal problems, ‘departmental issues’ (i.e. bad advising), or the wrong fit. I would be curious to see if the different disciplines have different rates at least in part due to differences (or perceived differences) in funding and hire-ability (any info or thoughts on this in the comments would be appreciated).

There is some good news, at least for Canada, where some of the dropout rates are… dropping. Fewer high-schoolers drop out each year:

dropout rates for high school students are improving

Whether this trend is similar for other levels of academic study, I am not sure. One thing you probably noticed from the above graph is that males still drop out more often than females. Stay in school, kids.

There are also some interesting lists of college dropouts who turned out more or less alright. Instead of going into a big discussion about what I think these numbers all mean for education, science, the economy, and the future, I will introduce you to the comments section and leave that up to you!