Tuesday, November 11, 2014

MOOC status, November 2014

Hi.

Yeah, I know, I haven't kept this very well-updated. But let's see... The last post was just about two months ago. What's happened since then?

I've (successfully) finished the following courses:

  • Introduction to Systems Biology - Mount Sinai
  • Introductory Human Physiology - Duke
  • Statistical Inference - Johns Hopkins
  • Dinosaur paleobiology - U. Alberta (well, the course is still running, but I've done all the activities, so I'm done with it)
I've dropped the following courses:


  • Fundamentals of Neuroscience part 2 - Harvard
  • Musculoskeletal Anatomy - Harvard
  • Data Analysis and Statistical Inference - Duke
I have started the following courses (some of which weren't quite planned for):
  • Immunology part 2 - Rice
  • Astrophysics part 3 - ANU
  • Exploring Neural Data - Brown
  • Experimental methods in systems biology - Mount Sinai
  • Functional Programming - Delft
By and large, I won't be starting any other "Big MOOCs" this year. I am considering signing up for The Neuroscience of Vision from MIT, as that's a short, 4-week course. If it's too heavy-going, I can always drop it - I find I am increasingly doing that kind of thing: sign up to a course to give it a go, then drop it if it doesn't quite fit what I want, or if it really doesn't fit in my schedule.

All in all, that's 29 courses I've finished. 11 are biology / life sciences, 9 are statistics / data science, 5 are regular computer science, the rest are a smattering of economics, physics, humanities, etc. By the end of the year, barring disasters, I should have at least five more.

I'll do (maybe) detailed writeups of the courses I finished, so let's just do a quick recap of the ones I dropped:
  • Neuroscience from Harvard: this course is actually pretty good and has some stunning graphics. Unfortunately, the focus is very much on visuals, animations, etc. It certainly works for some. As for myself, I really can't find it in me to watch cartoons about house parties as a metaphor for the synapse.
  • Musculoskeletal Anatomy: I don't know what to think of this one. Either the course runners are incompetent and/or have lost interest, or something terrible has happened (like a disease, an accident, something). The first couple of weeks were pretty sleek, with professional-looking videos. It's gone downhill since then: the syllabus has been truncated (each week was initially supposed to finish with a wrap-up about the "case" under examination), the content is released late and only consists of pages of text, the quizzes have glaring mistakes that are not corrected, and all the professors and TAs have fled the forums (not a single post from a member of staff in over three weeks). It looks like they're scrambling to put up some content every week but are improvising with very limited resources. It's the first time I've dropped a course when it's almost over, but I just can't find the motivation to keep going. It feels like standing on a sinking ship.
  • Data Analysis and Statistical Inference - I've dropped this one purely for scheduling reasons. It's a very good course, perhaps too introductory at times for me, but it broaches many subjects like ANOVA and such. The course is offered again next March, so I'll be taking it then.
As for the current ones:
  • Immunology is hard on memorization, but very interesting and the professor is great. We're onto T-cells now, pretty complicated stuff.
  • Astrophysics is a lot of fun. Dabbling in relativity and quantum mechanics without the hard-core maths. It's actually pretty relaxing.
  • Exploring Neural Data is a pretext for doing scientific computing in Python (instead of R, for instance). The lectures are engaging and the assignments are pretty thorough. Unfortunately, it's rather short: there's only a unit every other week, to accommodate students without a programming background, and so there are only 5 units altogether, each with an assignment that takes me, I don't know, a few hours to complete. So it stays pretty basic.
  • Experimental Methods in Systems Biology - a follow-up to the Introduction to Systems Biology class. It's the part I am the least interested in of all the Sys Bio courses, but it's understandably a requirement to take them all. Anyway, it's a description of the major technologies used in major biology labs today: Illumina sequencing, mass spectrometry, etc.
  • Functional Programming - I didn't plan on taking this one - I mean, functional programming is fun but I've already dabbled in it (and still do in a limited way, thanks to Java 8 streams). I simply chanced on a video of the professor, who is kind of a heavyweight in the field (used to be a principal scientist or something at Microsoft Research, author of a ton of papers, etc. - still an open source fan, as far as I can tell, despite having worked at MS) and decided to take it just for kicks.

Tuesday, September 16, 2014

MOOC status, September edition

It's been a while since I've blogged; basically I've been busy with a lot of MOOCs on top of a lot of work. So, what's up?

First, let's do the numbers thing. I've racked up something like 25 course certificates in just about a year. Of these, 7 are verified. If we break them down by general topic, we have:

  • statistics / data science: 9
  • biology / medicine / life sciences: 7
  • computer science / programming: 5
  • astrophysics: 2
  • economics: 2

(Of course, the categories are somewhat arbitrary: much of the data science thing could be classified as computer science; Introduction to Bioinformatics counts as CS but Quantitative Biology − which was about using Python and Matlab and R to analyse biological data − counts as biology. So don't take the count as an absolute, more as an indication of where I'm going.)

I've currently embarked (with paid certificates and all) on two Coursera specializations: the Johns Hopkins Data Science one (which I'm halfway through already) and the Mount Sinai Systems Biology one (which I'm only three weeks into, out of about a year, all told). I guess I'll blog some more about these; generally speaking I'm finding the Data Science one pretty good once one gets to the meat of it (the introductory courses about R are… introductory, but the projects are okay; the more mathematical "statistical inference" course is quite good) and while the Systems Biology course has a very tough start, it gets easier once you hit your stride. It's quite advanced, which is what I'm looking for, and that's pretty satisfactory.

Anyway, I'm currently enrolled in the following courses:

  • Systems Biology
  • Introductory Human Physiology (a great backgrounder for sys.bio.)
  • Data analysis and statistical inference (to keep doing stats, but I'm really auditing only)
  • Dinosaur paleobiology
  • Fundamentals of Neuroscience part 2: Neurons and Networks
Also, I'm enrolled in JH's Statistical Inference, but it's a repeat from last month: then I didn't know if I could afford the time to complete this course so I did it "for play", without certification. This month I'm only redoing it with certification. So I did spend a few hours polishing up my course project, but that's really all.

And the future brings the following:

  • Immunology part 2
  • Astrophysics part 3: the violent universe
  • Anatomy
  • Neural data analysis
  • Another 4 modules of Data Science (out of 9)
  • Another 4 modules of Systems Biology (out of 5)
So… planning things out, we see a very hard last week of October (when some courses haven't quite finished and others have just started) with 10 courses altogether. But I guess I'll survive.

Thursday, August 28, 2014

Back to school (sort of)

So the most rainy month of the year (in Paris − not that that's usual, mind, and not that I complain: I prefer wet Augusts to sweltering ones, to be sure) is drawing to a close. The academic year is starting, my former local sub-mayor is now education minister, and MOOCs are starting left and right. Not that they've stopped, really… Anyway, time for a sum-up:

Courses that are ending

Astro2, Exoplanets (Australian National University)

Well, Astro1 was great fun, Astro2 almost as much. I say “almost” not because the course itself is less good, but because by necessity it's a lot about the technology behind exoplanet discovery − not something I'm very interested in. Still, the staff at ANU made it a lot of fun, so there.

(It'll be my 15th edX certificate, 20th overall!)

The Emergence of Life (University of Illinois, Urbana-Champaign)

A big disappointment. Broadly, the course is supposed to be a quick run through the history of life as we know it (and not so much about its emergence, really, but at least that's up front). The problem is that it's dumbed-down and inaccurate, and the lectures are quite confused. I'm sticking to it because there are chunks I don't know about (“spot-the-fossil” and the skeletal morphology criteria for classification, mostly), but more often than not I find I'm shaking my head. It's less like a class and more like a

No certificates here, they're not free and definitely not worth paying for.

A bunch of Data Science courses (Johns Hopkins university)

I know I wrote I wouldn't tackle the Specialization… but I had second thoughts, so I registered for Signature track on the first four modules of the Specialization, plus auditing the Statistical Inference one (which I had heard many people complain about, saying it's hard and obtuse.) Actually… once one does the projects and goes beyond the first week or so of each course, they're getting pretty good. The Statistical Inference course tries to run through a lot of unintuitive material in really too little time, but − after having done UC Berkeley's Introduction to Statistics course − I find it very interesting, very stimulating. I'm glad I'm only auditing it − this way when I take it “seriously” it'll be a review and hopefully by then I'll understand it better.

In any case, I expect four verified certificates to land in my pocket in the coming few weeks.

So if I'm counting right, I'm virtually the proud owner of something like 24 certificates. 20 obtained in 2014. Not bad… 


Upcoming plans

I've rearranged my planning a bit, ditching a number of accessory courses that I couldn't seriously fit in alongside the rest. Still, next week is a busy one, with no fewer than 5 courses starting at the same time.

Exploring Neural Data, Brown

Data analytics + neurology. In Python. Cool.

Fundamentals of Neuroscience part 2, Harvard

More neuroscience! Actually I'm mostly taking this one because I took the first part. Not sure I'll keep both neuro courses (then again, at the time I thought little of it, but after a while I find I keep using the concepts of Neuro 1; so it's been a good use of my time.)

Dino 101, U. Alberta

I don't expect much from this one, a lightweight dino course to pass the time.

Introduction to systems biology, Mount Sinai

I tried this a while ago and dropped it after a week, thinking it too hard. Hopefully, I've learned a bit since then, and MIT's 7.QBWx rekindled my interest in systems biology.

Introductory Human Physiology, Duke

No, I don't want to be a doctor. But yes, physiology and anatomy are interesting. This promises to be a heavy-workload class; we'll see if I keep it through.

Astro3, The Violent Universe, ANU

I can't stop halfway through the series! Sadly, it starts in October and I may be too busy to give it my full attention. Hopefully things will pan out all right.

Fundamentals of Immunology part 2, Rice

Ditto. I did the first part, which was great (though hard work). The timing of this second part isn't so good, but we'll do what we can.

Next

That's September and October pretty much spoken for. With luck, I'll be able to sneak in a Data Science course in there... I still have five full courses to do in the Specialization (Statistical Inference, Regression Models, Reproducible Research, Developing Data Products, Practical Machine Learning). I've pencilled in one in October and two each in November and December. This way I still have some leeway until the next capstone project (expected in February).

I have also registered for the Open University's Start Writing Fiction course. Not sure I'll stick with it, but a writing class is interesting to say the least (even though English isn't my first language).

Saturday, August 9, 2014

The Coursera-Johns Hopkins Data Science specialization

Quick recap: the department of Biostatistics at Johns Hopkins School of Public Health offers a full-on specialization in “Data Science” through Coursera, consisting of nine courses and a “capstone project”. The specialization certificate is supposed to testify that students are proficient in getting data, formatting it, graphing it, extracting useful knowledge from it, drawing and communicating conclusions from it, and so on. With an emphasis on using R, although the skills are supposed to be broadly applicable to other systems.

In detail, the sequence is made of nine courses:

  • The Data Scientist's Toolbox
  • R Programming
  • Getting and Cleaning Data
  • Exploratory Data Analysis
  • Reproducible Research
  • Statistical Inference
  • Regression Models
  • Practical Machine Learning
  • Developing Data Products
The courses are free, but if one shells out for them ($49 or 35€ depending on your currency zone of residence) one gains access to a capstone project and a specialization certificate.

I haven't yet made up my mind about doing the whole specialization, or simply taking the free courses. I have a handful of days to decide.

What's in it for me?

Well, I sort of know a lot of this stuff from before. Using git and github for collaboration is something I do daily, and programming (though not in R) is my main living; plus of course I've taken MIT's The Analytics Edge (a very intense, hands-on, business-oriented course on using R for analytics) and UC Berkeley's Introduction to Statistics, so I know my way around most of the material.

I am therefore not in a “first time learner” situation − rather, the cursus is more about consolidating the knowledge I do have, formalizing it, and getting an overall certification to somehow “prove” my mastery of it (the acceptability of this proof − by potential employers and / or academics − remains to be assessed).

The courses themselves

So far − in one week! − I've taken five courses out of nine, and completed two. This may sound impressive, but it's not − as I said, I'm hardly a first-time learner.

The Data Scientist's Toolbox is a very short introduction to the overall specialization. The main point is to install RStudio and create a Github account. Doable (including the quizzes and project) in two hours, and generally dispensable (and part of the reason why I balk at doing the specialization − 35€ is very expensive for three clicks on a website).

R Programming gets a bad rap on review sites. I sort of understand why; it's a rather heavy-handed introduction to R, and I guess it's pretty incomprehensible to those who have never written a line of code in their life and pretty abstract for most who have never toyed with R.

For me, as someone who has used R but never been formally introduced to it (like, I never figured that everything was a vector and I had difficulty wrapping my head around the difference between single and double square brackets, to say nothing of the scoping rules and the notion of “environments”), it was a nice crisp clarification of the essential concepts of the language. I guess having it as a prerequisite for the rest of the sequence isn't a great idea: really, one can use R without understanding it, and understanding is better approached after a degree of use. Laying it out like this is very bottom-up, very French I would say: first slog through abstract concepts then learn to apply them − I have come to prefer the other way around: build an intuition then consolidate the knowledge and learn how to go further. Anyway; I did the whole course in a day and gained a good understanding of how R works in the process, so that's no bad thing.
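To make the “everything is a vector” and single-versus-double-bracket points concrete, here is a minimal sketch in plain R. The `course` list is a made-up example for illustration, not something taken from the course material.

```r
# A "single number" in R is really a vector of length 1.
x <- 42
length(x)      # 1
is.vector(x)   # TRUE

# Arithmetic is applied element by element across a vector.
doubled <- c(1, 2, 3) * 2   # 2 4 6

# A hypothetical list, used only to show the two kinds of brackets.
course <- list(name = "R Programming", weeks = 4)

# Single brackets keep the container: the result is still a list.
sub <- course["weeks"]
class(sub)     # "list"

# Double brackets extract the element itself.
elem <- course[["weeks"]]
class(elem)    # "numeric"
```

The practical consequence is that `sub + 1` fails (you can't add a number to a list), while `elem + 1` gives 5.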

Getting and Cleaning data is broadly a walkthrough of R's data gathering libraries. How to connect to a database, how to download a file, etc. It suffers from the lecture-then-exercise syndrome. I'm not sure how I would tackle this, really, except by drawing pictures of what the different input formats are, what the general target is (a clean data frame in R), pointers to the documentation and hand-holding exercises on real data rather than a slow demonstration of each function on made-up data. The Coursera platform is a hurdle there: similar endeavours were much easier on edX, where you can have long exercises with multiple questions, each with immediate feedback − I'm thinking of the very time-consuming exercise sets in The Analytics Edge, which had much, much better learning value than the simplistic, submit-all-at-once quizzes that Coursera provides.

That said, the course isn't very challenging but it's useful stuff to know. I'm more or less taking the course on schedule, with a bit of an advance (working up the energy to do the week 2 quiz).

Exploratory Data Analysis is, similarly, a walkthrough of R's graphics libraries, and has the same pros and cons as the previous course. Similarly, having already been exposed to most of it, I find the course a crisp recap of everything. I doubt I would enjoy it very much if it was the first time I was exposed to it.

Statistical Inference is the most-decried course of the sequence, so in order to decide whether to take the overall specialization or not, I registered for it. I understand its detractors: it's very fast, very abstract. Basically in four weeks, Prof. Caffo runs through the same curriculum as Prof. Adhikari did (albeit annoyingly slowly at times) in UC Berkeley's fifteen-week Introduction to Statistics, with the same issues as the rest of the course: it's quite technical and abstract, and rather difficult to connect to (though it's hard to be practical when explaining mathematical constructs). I guess I'll be referring to Adhikari's slides more than this course's native ones, but I don't really expect the course to be very challenging.

These were the five courses I've sampled. The rest are:

Reproducible Research is about communicating research by using R markdown and knitr to create live R-embedding documents. Interesting stuff generally, and I guess useful skills to have if one intends to do statistics professionally, but not burning enough that I prioritized the course, so I'll take it later. Four weeks seems kind of long to do that kind of thing.

Regression Models is the other mathematically-grounded course in the sequence, and the other stumbling block for students. I think I'll be okay with it, but realistically I can't sample it this month. We'll see in September or October.

Practical Machine Learning will likely be − again − a formalization of stuff I know from The Analytics Edge.

Developing Data Products sounds like another course about communicating results, half about “good practices” and half about using Shiny (R's own web framework − why is it that all languages must have their web development framework? Even Fortran has a CGI interface…)

The Capstone project seems to be about wrapping this up in a real-life situation.

So… why am I considering taking the whole shaboodle?

Based on the courses I've taken so far (about half), the sequence is pedagogically deficient − or rather, it's traditional in its approach of stuffing lots of science in the face of students then expecting them to go through with it. I expect they have a high dropout rate (even higher than the baseline MOOC rate). Comprehensive as it is, the sequence is ill-suited to people approaching the subject for the first time. The course page says there are no prerequisites in terms of analytics or programming, but I don't find it so: it's more a recap / advanced course than an introduction to the field.

In that way, it's broadly what I'm after. I don't feel like I know a subject until I've studied the theory a bit†. Since I am vaguely considering reinventing myself as a biostatistician or bioinformatician‡, or at least keeping my options open, it may be worthwhile investing a bit of time (and some euros) into it.

Oh well. It's a subjective arithmetic. If the course were stellar, I wouldn't hesitate long before paying. As it is, I dither.

Postscript

I slept (well, napped) on it. Re-reading myself, it's kind of obvious I'm not really interested in pursuing the specialization certificate; there are better ways to spend 350€ than to rehash mostly-known subjects.

I may decide otherwise at another time − all I need to do is retake the courses, which means a few days doing quizzes and projects again. Doubt it'll be difficult.

If I need some certification or other there are probably better ones around, starting with Duke's Data Analysis and Inference.


† “A bit” as in, I don't need to know how to prove a theorem to use it, but I need to know I am using a theorem rather than a pre-baked recipe I'm not comfortable improvising with.

‡  On the premise that it's more useful to the human race than general web-based development, and anyway the kids who've done a week of Node.js and therefore know everything there is worth knowing about computer science are taking the fun out of general programming.

Wednesday, August 6, 2014

Mopping up on Exoplanets

It's a bit unfair to say I'm “mopping up” − there are two full weeks of the course left, about direct imaging (at last!) and Earth-like planets − but it's clearly on the way out, and it's becoming possible to start looking back on the course.

This has been a surprisingly (or maybe not, I'm not at all a student of astronomy) technical, more than scientific, course. I mean, Paul Francis did whip out his tablet to perform some calculations, but they were fairly simple, by and large, much more so than the (already not very advanced) physics of The Greatest Mysteries of the Universe. Here, instead of big questions about gamma ray bursts and Type Ia supernovae, what we have is a celebration of the ingenuity of the engineers making possible something as staggeringly complex as detecting planets orbiting distant stars.

The engineer in me is happy − and it's true these are fantastic achievements.

The course itself follows the same format as Greatest Mysteries: every week has a topic (“radial velocities”, “gravitational microlensing”), which Paul Francis and Brian Schmidt discuss in a Socratic manner, which is an impressive way of saying they convey all the knowledge through dialogue, bouncing questions off each other. Schmidt takes something of a backseat here, often playing the naive novice who asks questions of Francis; maybe he's less comfortable with the topic than with cosmology (or maybe it's a subliminal message: you may have a Nobel Prize, you're still − always − in a position to receive wisdom from your peers). Both lecturers' enthusiasm (especially Francis') is still communicative. Besides the video lectures, we have each week a link to the papers discussed, a text summary of the lesson, a worked example, a graded problem (generally very easy) and a new episode of the Mystery.

Last time around, the mystery had us figure out a weird bouncing parallel universe. This time, we're still in a strange cosmos, but the issues are more technical: a red star seems on course to collide with the world, and we have to find a likely destination for the world's population. But of course, there's a twist…

I have to admit I haven't been as interested in this course's mystery as the last. Maybe it's the lack of bubbles, or maybe I'm just not very entranced by the nitty-gritty detail of surveying the sky, taking radial-velocity measurements, etc. I'll be happy to have the solution for the Mystery through the final exam, but I'm not really motivated enough to go beyond and investigate on my own.

That's perfectly all right. I'm not destined to be an astrophysicist (if I were, I guess I'd be more involved in finding a new haven for the Moggians), I'm there to have fun learning about stuff; and as far as fun is concerned, this course delivers.

Quick notes about The Emergence of Life

So we're in Week 4 of this U. Illinois course over at Coursera, which aims at reconstructing the history of life throughout geological time. Midway between “taxonomy for dummies” and “introduction to evolutionary biology”.

So far, it's… uneven. It's notable that the teaching staff are all geologists rather than biologists, so they're on their home ground when discussing fossil formation, perhaps less so when they're talking about molecular biology. In any case, I like the fossil-discussing segments, they're informative and help drive the geological time-scales into my head; plus I like weird beasts.

Where I'm less enthusiastic is that the lectures are disjointed and often imprecise (like mixing up the terms eukaryote, metazoan and multi-cellular organism − y'know, plants are multi-cellular organisms, but they're not metazoans; likewise, there are these things called Saccharomyces, amoeba, Giardia, etc.: not all eukaryotes are multi-cellular). Sometimes they'll use an inappropriate picture to illustrate what's being discussed (illustrating armored jawless fish with a toothy placoderm isn't a great idea!) There's little logic in how a segment connects to the ones before and after. It's a bit annoying that the clearest segments are the ones from the very young PhD student introducing taxonomy, while the segments from the official professor are somewhat confused (and confusing).

But I can live with that. Playing spot-the-fossil in the quizzes is fun.

Another thing I find upsetting is that the forums are basically drowned in two kinds of posts:

  • corrections for approximations made in the lectures
  • creationist crap (multiple threads discussing intelligent design, “global warming: fact or fiction”, etc.)
Huh. Okay. I'll just steer away from the forums, then.

That, plus the outright idolizing of Carl Woese (can't we grow up beyond the “great man single-handedly upsetting the establishment” type of narratives?) means the course isn't all it's meant to be… oh well. It's still something to do in an otherwise quiet summer.

(That said, I like the funky music and titles.)

Monday, August 4, 2014

Johns Hopkins' data science specialization, round two

Two days ago I noted I ran through the first course of JHSPH's Data Science specialization in a handful of hours.

In fact, yesterday I did the same for the second course in the series, R Programming. But this time I didn't feel “cheated” (although that's a strong word): I found the course easy as pie because I'm an experienced programmer and I've already used R quite a lot in MIT's The Analytics Edge; however, I lacked any formal(ish) introduction to the language from a computer scientist's point of view. It's not enough to know that you should type lm(x ~ y + z, data=mydata); I find it necessary to know that it's a functional language where the basic data type is the vector and where every function carries with it its own environment, with such-and-such scoping semantics.

Such an introduction needn't be long. But having it, I'm a lot more confident that I understand how R works, and therefore that I can use it correctly.
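As an illustration of the “every function carries its own environment” point, here is a minimal sketch in plain R. The names `make_counter` and `counter` are hypothetical, chosen for this example only; nothing here comes from the course itself.

```r
# make_counter() returns a function. That function remembers the
# environment it was created in, including the variable `count`.
make_counter <- function() {
  count <- 0
  function() {
    # `<<-` assigns to `count` in the enclosing environment
    # (lexical scoping), not to a new local variable.
    count <<- count + 1
    count
  }
}

counter <- make_counter()
counter()   # 1
counter()   # 2

# A global variable with the same name is a different variable:
# the function keeps using the one in its own environment.
count <- 100
counter()   # 3

# The enclosing environment can be inspected directly.
environment(counter)$count   # 3
```

This is the sort of behaviour that looks like magic when one only knows the recipes, and becomes predictable once the scoping rules are spelled out.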

All this to say − yeah, I ran through the 4-week course in a day, but it doesn't mean it deserves its poor reviews.