Friday, May 2, 2014

Postmortem: Bioinformatics at Peking University

The Bioinformatics course at Beijing (or is that Peking? I never know) University is over. It's still being graded, but having scored 10/10 on each homework and 97% on the final, it's not an overly wild guess to say it'll be my first Coursera certificate (or “Statement of Achievement” in Courserese).

So, what is Bioinformatics?

Bioinformatics is “the application of computer science to solve biological problems”. To say it's a growing field would be an understatement: the growth of biological data has been exponential, necessitating innovative data management and data analysis techniques; it's fair to say that nowadays, in a lot of fields of biological research, more work is being done with computers than with, say, Petri dishes or lab mice. Taken from the opposite angle, biological research labs are at the forefront of the “big data” revolution. Lists of innovative big data companies include institutions such as Mount Sinai's Icahn School of Medicine (they are also well-known as a big MongoDB customer, if I'm not mistaken).
Strictly speaking, “bioinformatics” comprises an understanding of the types of problems bioinformaticians face, and the algorithms they use to solve them. By extension, it also includes the major databases of publicly-available bioinformatics data on the Internet − and truth be told, I didn't think there were so many of them!
As is usual in fields associated with academic research, bioinformatics is a field in which open source is prevalent, both in terms of the software itself (indeed, algorithms and tools that are not properly described in a peer-reviewed paper have little chance of being widely used) and in terms of the actual data. To repeat myself: the amount of data in freely-available databases hosted by such organizations as the National Center for Biotechnology Information or the European Bioinformatics Institute is staggering. Theoretically, any private individual, or indeed any company, with an Internet connection could do effective biological research in silico; I'm guessing we're just at the beginning of a wave of bioinformatics startups following the lead of companies such as 23andMe. It'll be interesting to see.
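To give an idea of how low the barrier to entry is, here is a minimal sketch of what querying one of these public databases might look like, assuming NCBI's E-utilities `esearch` endpoint. It is not something taken from the course: the helper names (`build_esearch_url`, `search_ids`) and the example gene query are purely illustrative, and the response layout is assumed to be the usual `esearchresult`/`idlist` JSON structure.

```python
import json
import urllib.parse
import urllib.request

# NCBI E-utilities search endpoint (public; subject to NCBI's usage limits).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db, term, retmax=5):
    """Build an esearch URL for the given Entrez database and query term.

    Hypothetical helper for illustration; it only assembles the URL.
    """
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH + "?" + urllib.parse.urlencode(params)


def search_ids(db, term, retmax=5):
    """Run the search over the network and return the matching record IDs.

    Assumes the JSON response contains esearchresult -> idlist.
    """
    url = build_esearch_url(db, term, retmax)
    with urllib.request.urlopen(url, timeout=30) as resp:
        data = json.load(resp)
    return data["esearchresult"]["idlist"]


# Example call (needs an Internet connection, so it is left commented out):
# search_ids("gene", "BRCA1[Gene Name] AND Homo sapiens[Organism]")
```

The URL building is kept separate from the network call so that the query can be inspected, or pasted into a browser, before anything is actually sent to NCBI.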

What does the course cover?

The Peking course covers quite a lot, from a rather comprehensive (and welcome) review of the history of bioinformatics, the present situation of the field, and what we can expect in the near future, to alternating descriptions of the major algorithms and/or techniques in use in the field and exploration of the available resources. It ends with a couple of case studies highlighting how the various techniques and databases surveyed in the course were integrated by actual researchers to investigate real-world issues.

Who is the teaching team?

There are two main professors: Drs Liping Wei and Ge Gao. Dr Wei generally handles the high-level stuff (e.g. background information, overviews of databases, etc.) while Dr Gao dives into the details of algorithms. Both are evidently highly qualified; the course draws tightly on their own research and contributions, which is very welcome as it makes the course so much more concrete.

What about logistics?

The course lasts for 6 weeks. Two topics are covered each week, each with a series of video lectures and a quiz, except for the last week, when the lectures are about case studies and the quizzes are replaced by a 40-question final exam.
It is notable that the course is bilingual; that is to say, it is offered simultaneously in Chinese and English. The Chinese lectures have slides with classroom shots inserted (meaning you actually see the professor speaking); the English lectures are slides-only.
In addition to the lectures there are supplementary videos such as lab visits and student presentations. These tend to be Chinese-only, so I skipped over most of them. There are also a couple of interviews with famous bioinformaticians, but I didn't find these so interesting.

My impressions

It's undeniable that the team take pains to be welcoming. It's also undeniable that the content is actually of a pretty good level. But there are a couple of problematic aspects with this course.
The first is − and it's horrible to say this, not being a native English speaker myself − that the speakers' English is not so great. It's not so much that they make mistakes, but rather, their intonation and rhythm are… well, they're obviously reading from a transcript. Dr Gao's voice droning formulae (ecks-aye-jay-plus-ecks-jay-plus-one-aye-equals-ecks-aye-plus-one-jay) for long minutes was almost enough to take me out of the course altogether. Thankfully, the subject matter is interesting, so I could stick to it with a little effort. In the end, I viewed the lectures at 1.5x speed to compensate for the speakers' slow diction, and I referred back to the transcripts when I had doubts.
The second problem is perhaps due to Coursera's platform: the quizzes are, well, just quizzes. It feels strange, and generally wrong, to have an algorithmics course in which you do no coding at all. Generally speaking, you have three tries to answer questions which mostly have three or four options… At one point I was so immensely tired I almost dropped the course; instead I deprioritized it and spent a minimal amount of time on it. So, while I did learn a bunch of stuff about bioinformatics, I can hardly say that my final grade (which will be in the high nineties) reflects my mastery of the subject.
Strangely enough, the final exam was possibly the most interesting part of the course, as some questions required us to go and search for information by ourselves (which means first identifying the right database to query, finding how to query it, etc.). Possibly, a vast improvement to the course would be to scrap about half the quizzes and replace them with practical case studies: give students a set of data (gene names, diseases, etc.) and send them off to search for information using the tools discussed in the lectures. As they stand, the quizzes help little in teaching.

Do not, however, let that discourage you. If you're interested in learning about bioinformatics, certainly there could be much worse options than this course. It may not make you a bioinformatics researcher, but it will at least give you a handle on where to get started.
