Despite their great potential, Massive Open Online Courses (MOOCs) face major challenges such as low retention rates, limited feedback, and a lack of personalization. In this paper, we report the results of a longitudinal study of AttentiveReview2, a multimodal intelligent tutoring system optimized for MOOC learning on unmodified mobile devices. AttentiveReview2 continuously monitors learners’ physiological signals, facial expressions, and touch interactions during learning and recommends personalized review materials by predicting each learner’s perceived difficulty of each learning topic. In a 3-week study involving 28 learners, we found that AttentiveReview2 improved learning gains by 21.8% on average in weekly tests. Follow-up analysis shows that multimodal signals collected during the learning process can also benefit instructors by providing rich, fine-grained insights into learning progress. Using such signals also improves the accuracy of emotion and test-score prediction compared with clickstream analysis.