Bias in Student Evaluations: Myths and Facts

Myths

1. Instructors who require more work receive lower ratings:

In fact, research shows the reverse: students give higher ratings to difficult courses. However, if the work is deemed unnecessary or unrelated to the course (i.e., "busy work"), ratings decline. Students give higher ratings to instructors who demand work that, in the students' view, directly relates to instructional objectives (Benton & Cashin, 2012). High demands with low support result in lower ratings and little learning, while the combination of demand and support results in increased ratings and learning; the research suggests that requiring multiple assignments, as long as they are directly related to the objectives, makes the course more manageable for students and increases involvement (Bain, 2004; Keegan, 1998). Furthermore, instructors who give higher grades tend to elicit less effort from their students, while instructors who give slightly lower grades elicit more (Greenwald & Gillmore, 1997).

2. The style of the instructor is more important than the content:

Research shows that students can distinguish between style, or expressiveness, and content (Theall & Franklin, 2001). While content is essential to learning, coupling content with expressiveness increases overall ratings (Perry, Abrami, & Leventhal, 1979; Perry, Magnusson, Parsonson, & Dickens, 1986). Specifically, expressiveness by itself may raise "instructor's enthusiasm" ratings, but content raises "instructor's knowledge" ratings and actual exam performance; when expressiveness and content are combined, overall ratings, learning, and achievement all increase (Perry, Abrami, & Leventhal, 1982; Benton & Cashin, 2012).

3. Popularity affects ratings:

The research shows no correlation between popularity and ratings; likewise, the assumptions that "popular" instructors lack knowledge or make low demands of students are without support (Theall & Franklin, 2001). If popularity is defined as reputation, then a number of factors (self-esteem, enthusiasm, expressiveness, knowledge of subject matter, cultural background, humility, and even personal appearance) combine to create a strong reputation, which is not as groundless as the term "popularity" implies (Abrami, Rosenfield, & Dedic, 2007; Benton & Cashin, 2012; Bain, 2004; Aleamoni, 1999; Hayes, 1971; Melland, 1996).

4. Gender has a predictable impact on student evaluation of teaching (SET) results:

A great deal of research has been devoted to examining the relationship between gender and SET results.  Several comprehensive studies of SET results drawn from multiple disciplines find no statistically significant differences based on gender when controlling for other possible sources of bias (Centra & Gaubatz, 1998; Feldman, 1992; Theall & Franklin, 2001; Benton & Cashin, 2012; Wright & Jenkins-Guarnieri, 2012). Similar results have been found when studying individual institutions (Kogan, Schoenfeld-Tacher, & Hellyer, 2010).  

However, other researchers argue that SET results show a bias against women faculty (Amin, 1994; Sprague & Massoni, 2005; Abel & Meltzer, 2007; MacNell et al., 2015; Boring, Ottoboni, & Stark, 2016). Still others find that female faculty members receive higher SET ratings than do male faculty members (Wigington, Tollefson, & Rodriguez, 1989; Rowden & Carlson, 1996; Whitworth, Price, & Randall, 2002). Several studies suggest complicated correlations between the gender of the students completing the SET and the gender of the instructor (Basow & Distenfield, 1985; Basow & Silberg, 1987; Atamian & Ganguli, 1993; Goldberg & Callahan, 1991; Bachen, McLoughlin, & Garcia, 1999).

5. Younger instructors receive lower ratings:

The correlation, though slight, actually runs the other way: older faculty receive lower ratings (Feldman, 1983). Research is less clear about the shape of the curve. One study that followed the same instructors for 13 years revealed no change in their ratings. Some research suggests that students reward both youthful instructors and "seasoned" instructors over 55, with ratings dipping between those two poles (McPherson & Jewell, 2007). First-year and very early career instructors consistently receive lower ratings, but this most likely does not reflect a bias (Benton & Cashin, 2012).

6. Student ratings are unreliable:

Research shows that student ratings, over time, are remarkably consistent (Theall & Franklin, 2001; Aleamoni, 1987).

7. Students are not qualified to rate instructors:

Students have the benefit of long-term and frequent exposure to the instructor. They can judge how much they have learned and can report the frequency of teaching behaviors, the amount of work required, the difficulty of the material, the clarity of lectures, the value of assignments, and so on (Arreola, 1994; Theall & Franklin, 1990). They cannot, however, accurately rate the instructor's knowledge of the subject (Theall & Franklin, 2001).

8. As they age, students will appreciate instructors more:

Students' opinions about instructors change very little over time (Theall & Franklin, 2001). Several studies of students from one to ten years after graduation reveal that while students may change their opinions about a given subject, their opinions about the instructor remain the same (Centra, 1979; Frey, 1976; Marsh & Dunkin, 1992).

9. Students with higher GPAs rate higher:

The research yields no conclusive correlation between students' overall GPA and high ratings for individual instructors (Benton & Cashin, 2012).

10. The time of the semester affects ratings:

The research shows that ratings change little based on when in the semester they are collected (Feldman, 1979; Benton & Cashin, 2012).

Facts

1. Personality affects ratings:

Though the research shows that personality in isolation does not increase ratings, two factors of instructor personality positively influence ratings: positive self-esteem and energy or enthusiasm (Feldman, 1986; Benton & Cashin, 2012; Davidovitch & Soen, 2009). Students generally respond with higher ratings and increased learning when the instructor's authentic personality shows (sense of humor, consistency in behavior, and so on) and when they believe the instructor is being open with them (Abrami, Rosenfield, & Dedic, 2007; Bain, 2004).

2. Students who learn more rate higher:

There are consistently high correlations between students' ratings of the "amount learned" in the course and their overall ratings of the teacher and the course. The students who performed the best on final exams also gave the highest ratings (Theall & Franklin, 2001).

3. Students rate instructors of the same ethnicity higher:

The research on ethnicity as an isolated factor in evaluations is sparse and contradictory. Some speculate that students of the same ethnicity as the instructor may rate that instructor slightly higher (Centra, 1993). Some research also suggests that non-white professors are at a slight disadvantage, less than a point, compared to their white colleagues (McPherson & Jewell, 2007).

4. English-speaking students rate non-native English speakers lower:

Some research notes that students whose first language is English give slightly lower ratings to non-native English-speaking instructors. Furthermore, a correlation between gender and language has been noted: male non-native English-speaking instructors receive slightly lower ratings than female non-native English-speaking instructors (Hamermesh & Parker, 2005; Huston, 2005).

5. Class size affects ratings:

Most research supports the common belief that instructors with smaller classes receive higher ratings, though the effect is statistically slight, less than .09 (Benton & Cashin, 2012; Feldman, 1984; Hoyt & Lee, 2002). Though slight, the effect appears consistent: other research finds that smaller classes have positive effects on both student ratings and student learning (Centra, 2009).

6. Course requirement / level of prior interest affects ratings:

Instructors receive higher ratings in courses in which students have a prior interest, such as courses directly related to their major or courses taken as electives (Marsh & Dunkin, 1992; Aleamoni, 1981).

7. Purpose of the evaluations affects ratings:

Studies show that ratings are affected, positively or negatively, when directions state that the results will be used for personnel decisions as opposed to instructor improvement (Benton & Cashin, 2012).

8. Instructor's presence affects ratings:

Ratings tend to be higher if the instructor stays in the classroom while students fill out the ratings (Braskamp & Ory, 1994; Centra, 1993; Feldman, 1979; Marsh & Dunkin, 1992).

9. Anonymity affects ratings:

Signed ratings tend to be higher (Braskamp & Ory, 1994; Centra, 1993; Feldman, 1979; Marsh & Dunkin, 1992). Online evaluations raise concerns for students about anonymity, since they worry that digital evaluations will be easier to track (Benton & Cashin, 2012).

10. Academic field affects ratings:

Research shows differences in ratings by field: arts and humanities courses frequently receive higher ratings than social sciences and math courses (Feldman, 1978). Increasing evidence supports this disparity, but why it occurs remains unclear. Some speculate that certain subjects are more difficult to teach, while others wonder whether the disparity reflects a larger trend in students' capacities, i.e., that students more easily grasp and respond to arts and humanities than to social sciences and math (Feldman, 1978; Centra, 1993, 2009).

11. Course level (undergraduate vs. graduate) affects ratings:

Graduate students tend to rate instructors more favorably than undergraduate students do (Aleamoni & Hexner, 1980; Goldberg & Callahan, 1991). Some of the research indicates that "grade inflation" at the graduate level raises ratings (Greenwald & Gillmore, 1997; McPherson & Jewell, 2007).