Commentary on Guo, Kim and Rubin’s Study on MOOC Videos and Student Engagement

Guo, Kim and Rubin’s “How Video Production Affects Student Engagement: An Empirical Study of MOOC Videos” is significant in that it is purportedly the first such research based on big data.

I offer commentary on this piece, juxtaposed to each of the seven findings and recommendations given in Table 1 of the article. My reflections are based in part on incidental anecdotal evidence garnered in 35 years of teaching at all levels (K-12 through graduate school in computer science and educational technology). I’ve also read about and researched related aspects of gamification, human-computer interaction, graphical user-interface design and instructional design. In addition, I’ve run two iterations of a widely subscribed MOOC (12K+ participants and 2M+ click-ins).

Finding #1

“Short videos are much more engaging.”


“Invest heavily in pre-production lesson planning to segment videos into chunks shorter than six minutes.”


The authors’ metric for engagement is “length of time a student spends on a video.” Therein lies a pitfall in this research, which they acknowledge as a “limitation” of the study. I concur that length of time spent allowing a video to play to its conclusion is at best a crude measure of engagement and, more strongly, a non-measure of learning. I do agree with their premise (originated by Chris Dede of HGSE, BTW) that there can be engagement without learning, but no learning without engagement.

Ruth Clark, a leading expert on instructional design, argues that learning is a function of retention and transfer. Retention is mentally encoding information (aka “knowledge” in its two forms—declarative and procedural) garnered from a lesson. Transfer is the ability of a student to demonstrate application of that knowledge in solving problems in domains that have both obvious similarity to the lesson paradigm (near transfer) and less obvious similarity (far transfer).

Here’s an example: If you can drive a car, there is near transfer to the problem of driving a taxicab and far transfer to the problem of driving an 18-wheel truck. Most adult drivers could probably drive a taxi. Though the truck has familiar accoutrements, such as a steering wheel, accelerator and brake pedals, novices would be hard-pressed to drive one out of a lot.

Now, assume the sum total of a student’s driving experience was watching a YouTube video. Would viewing time as a measure of engagement correlate with retention, let alone transfer, and thus enable her to drive YOUR car? Maybe yours, but not mine! Moreover, and contrary to the authors’ speculation, I’ve seen short videos that bored me to tears, and long videos that kept me engaged for hours. Go watch, for example, a one-hour lecture by Christopher Hitchens on YouTube, come back, and let me know what you think.

I do agree that short videos can help facilitate chunking, that is, breaking units of instruction into cognitively palatable doses that facilitate student encoding. Students with good metacognitive skills can do this themselves, however, and someone ought to do a study as to whether there’s any evidence of their intentionally pausing long videos at key topical transition points. I’ve watched Coursera and Udacity videos that actually embed periodic quizzes in long videos to facilitate reflection, a metacognitive skill known to improve retention.

Nonetheless, this research still leaves open the question of whether the students learned anything.

The authors mention an additional criterion for measuring engagement: whether the students clicked into a post-test after the viewing experience. They did not look at correlations of viewing time versus post-test performance. Moreover, if the students had prior knowledge there would be no evidence that they learned from the video just because they got a high score. A controlled study incorporating pre- and post-tests would be needed to determine the efficacy of the viewing experience.
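The kind of analysis suggested here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the viewing times and test scores are invented, not taken from the study, and the helper names are hypothetical. It computes each student's gain (post-test minus pre-test) and the Pearson correlation between viewing time and gain.

```python
from math import sqrt

# Hypothetical data, one entry per student (NOT from the study).
viewing_minutes = [2.0, 4.5, 6.0, 3.0, 5.5]
pre_scores = [40, 55, 50, 70, 45]
post_scores = [50, 70, 75, 72, 65]


def learning_gains(pre, post):
    """Return post-test minus pre-test for each student."""
    return [after - before for before, after in zip(pre, post)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


gains = learning_gains(pre_scores, post_scores)
r = pearson(viewing_minutes, gains)
print("gains:", gains)
print("correlation of viewing time with gain:", round(r, 2))
```

Using the gain rather than the raw post-test score is what controls for prior knowledge: a student who scored high on both tests shows little gain, however long she watched. Even a strong correlation would not establish that watching caused the gain; that still requires the controlled design described above.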

An outcome of cognitive research is that students with good metacognitive skills regulate their learning by taking a break on average every twenty minutes or so, which facilitates encoding. I believe this is called the settling effect. Breaking video coverage of a thematic unit of instruction into six-minute modules is a form of chunking that offers viewers (casual or engaged) the affordance of down-time to reflect.

Since 2012, when the data sets for this study were garnered, the Coursera/Udacity practice of embedding interactive question/response opportunities in instructional videos has become more pervasive. Interventions of this sort seem to enhance attention, especially when there are cues in the timeline for the assessments (usually indicated by a brightly colored keyframe).

James Gee in his What Video Games Have To Teach Us About Learning And Literacy argues that continuous embedded assessments of this sort are the only true measures of learning. He also argues that they provide the occasion for learner feedback, reflection and encoding. Z-scores from pre-, embedded, and post-testing would make an even stronger case for learning effects.
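One possible reading of the z-score suggestion, sketched in Python: express each later test score in units of the pre-test standard deviation, so that the embedded and post-test results show how far students have moved from their starting point. The scores are invented for illustration, the function name is hypothetical, and the sketch assumes all three tests are scored on the same scale.

```python
from statistics import mean, pstdev

# Hypothetical scores for four students (NOT from any study).
pre_test = [40, 40, 60, 60]
embedded_quiz = [50, 60, 60, 70]
post_test = [60, 70, 70, 80]


def z_against_baseline(scores, baseline):
    """Standardize scores against the baseline's mean and std. deviation."""
    mu = mean(baseline)
    sigma = pstdev(baseline)  # population standard deviation
    return [(score - mu) / sigma for score in scores]


embedded_z = z_against_baseline(embedded_quiz, pre_test)
post_z = z_against_baseline(post_test, pre_test)
print("mean embedded z:", mean(embedded_z))
print("mean post-test z:", mean(post_z))
```

A mean z-score that rises from the embedded quizzes to the post-test would be consistent with learning over the course of the video, though without a control group it would not rule out other explanations.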

Finding #2

“Videos that intersperse an instructor’s talking head with slides are more engaging than slides alone.”


“Invest in post-production editing to display the instructor’s head at opportune times in the video.”


Curiously, Clark in Building Expertise reports a contrary finding: that talking heads embedded picture-in-picture in slide-show videos act as distractors that degrade post-test performance. Ironically, earlier research by Mayer, whose book Multimedia Learning is cited by the authors, also reports this effect.

When I juxtapose an instructor to a slide presentation in an instructional video, I prefer whole-body shots to disembodied heads. I think what’s missing in a lot of instructional videos is (whole) body language. Instructors tell as much with hand gestures and poses as they do with a glance. Why crop out these features?

The earlier Clark studies also used lower-resolution videos. Perhaps today’s students are preconditioned to expect high-resolution, fully embodied presenters on screen to cue key points in their viewing experience. A well-timed walk-on cameo appearance by an instructor, PIP, should be investigated as an opportunity to enhance engagement and learning effects.

Finding #3

“Videos produced with a more personal feel could be more engaging than high-fidelity studio recordings.”


“Try filming in an informal setting; it might not be necessary to invest in big-budget studio productions.”


This resonates with me for several reasons. There are two key components to classroom interaction, or the semblance of it as conveyed via instructional videos. First, there is the sense of presence, that is, the sense that there is an instructor or moderator and students, all present and engaged. I think if a depiction of that interaction is missing, the sense of live classroom engagement is diminished.

Second, there are the gestalt aspects of witnessing teacher-student and student-student interactions. There’s another Clark study of students observing a lecture asynchronously that reported nearly the same satisfaction as students participating in a class synchronously, because 95% of the time, the questions the asynchronous students had (and might have asked) were posed by students present during the recording. The upshot of this research was that while live presence in a classroom experience may be good, the viewing of a recording of the experience is almost as good.

I’ve watched training videos done quick and dirty that included lots of extemporaneous um-ing and ah-ing that I felt should have been edited out. But then I watched a squeaky-clean version of the same series redone with a professional model and found it too sterile to hold my interest. So, a little um-ing and ah-ing is OK. It makes the presentation more human, and sure costs a lot less to produce. As my mother, a kindergarten art teacher, once said, “It’s OK to color outside the lines.” I also think this is why Flash never caught on in education (even before Steve Jobs banished it). Making Hollywood-caliber lesson animations is beyond the time and expertise limitations of most teachers. Instead, most default to doing quick and dirty screen captures with applications such as Camtasia or ScreenFlow.

Finding #4

“Khan-style tablet drawing tutorials are more engaging than PowerPoint slides or code screencasts.”


“Introduce motion and continuous visual flow into tutorials, along with extemporaneous speaking.”


This makes sense, as a dynamic illustration shows the progression in the development of an idea that a static visual cannot. In effect, the static graphic depicts a snapshot of a thought after the thinking has taken place. Dynamic development of an illustration is in keeping with the chunking principle of cognition. This harkens back to the 1960s TV series, My World and Welcome to It, in which the lead character John Monroe explains life to his daughter Lydia using James Thurber-esque sketches.

The first round of tablet computers back in 2004 failed to gain traction, but the popularity of the iPad and its successors may bring about a revaluation of the hand, returning drawing to its rightful place as a way of dynamically illustrating talking points. Low-cost Wacom tablets can be used with conventional laptop or desktop computers to give instructors the same functionality. I also had a professor years ago who said that having to draw what he was talking about slowed him down, so students could keep up with his presentations. Most instructors have access to whiteboarding functionality on their desktops or in learning management systems, but don’t make use of it because they are provisioned with a mouse rather than a digital drawing tablet. Perhaps you’ve been rankled witnessing an instructor trying to draw with a mouse rather than a pen.

The newer “light boarding” technologies seem promising because they allow instructors to face the audience while drawing, maintaining the sense of presence and gestalt of a classroom interaction.

The Minerva Project incorporates this technology in its conferencing software, which also allows for direct manipulation.

2U also offers light boarding in its toolkit.

Finding #5

“Even high quality pre-recorded classroom lectures are not as engaging when chopped up for a MOOC.”


“If instructors insist on recording classroom lectures, they should still plan with the MOOC format in mind.”


We used to break lengthy lectures up into segments for practical reasons. Video hosting services imposed a time limit and, worse, storage quotas meant that high-resolution videos had to be compressed to fit, which diminished production values.

This is no longer the case, as hosting services like YouTube automatically step down uploaded videos to a spectrum of resolutions that the viewer can choose among.

I have found, quite to the contrary, that preference for continuous versus segmented formats is learner-dependent. I determined this, incidentally, by doing the one thing our authors didn’t do: I asked.

Thus, I’ll put up uncut, full-length captures of live lectures for the sense of presence and gestalt they portray *and* polished snippets (complete with um and ah reduction), appealing to both learning styles.

Finding #6

“Videos where instructors speak fairly fast and with high enthusiasm are more engaging.”


“Coach instructors to bring-out their enthusiasm and reassure that they do not need to purposefully slow down.”


Studies cited by Ruth Clark and Edward Tufte concur with this finding. Speak fast and let students rewind and review in order to back-fill gaps in their understanding. The old adage, “speak to the middle of the class” does not seem to hold in today’s onsite or online classrooms.

Finding #7

“Students engage differently with lecture and tutorial videos.”


“For lectures, focus more on the first-watch experience; for tutorials, add support for rewatching and reviewing.”


The authors’ observation, that lectures tend to be more declarative in nature and tutorials more procedural, is false, in my opinion. I’ve done short breakdown videos of conceptual (that is, “declarative”) information and long lecture-style videos of tutorial (that is, “procedural”) information.

Tutorials sometimes work better in contiguous extents with an audience both present and posing questions at key points of the presentation. The circa-2012 xMOOCs (as in edX) were broadcast in orientation. More recently, there are cMOOCs (the “c” being for “connectivist”) that offer greater affordances for student interaction.

So-called “think-aloud” protocols, as advocated by Koedinger and Banning (among others), are widely regarded as effective pedagogical techniques. That style of video capture was not explored by the authors of the MOOC study.

I personally play videos at double or triple speed when the instructor is particularly tedious, or when reviewing material I already know.

Lastly, and the authors do acknowledge this, the study left out one critical component: talking to the students themselves.

