Defensible Teacher Evaluation: Student Growth Through Classroom Assessment


Rick Stiggins joins Justin Baeder to discuss his book Defensible Teacher Evaluation: Student Growth Through Classroom Assessment.
Interview Notes, Resources, & Links

About Rick Stiggins

Rick Stiggins, PhD is founder of the Assessment Training Institute and the author of Assessment for Learning and several other books. Dr. Stiggins is one of the profession's leading authorities on assessment.

Full Transcript

[00:01] SPEAKER_01:

Welcome to Principal Center Radio, bringing you the best in professional practice.

[00:06] Announcer:

Here's your host, Director of the Principal Center and Champion of High Performance Instructional Leadership, Justin Baeder.

[00:15]

Welcome everyone to Principal Center Radio.

[00:17] SPEAKER_00:

I'm your host, Justin Baeder, and I'm honored to be joined once again by Dr. Rick Stiggins. Dr. Stiggins is the founder of the Assessment Training Institute and one of the profession's leading authorities on assessment. You might know his book, Assessment for Student Learning, among others. And we're here today to talk about his book, Defensible Teacher Evaluation: Student Growth Through Classroom Assessment.

[00:40] Announcer:

And now, our feature presentation.

[00:43] SPEAKER_00:

Dr. Stiggins, welcome back to Principal Center Radio. Justin, it's an honor to be invited. Well, let's talk about the origin of the book. We've had a lot of changes in our profession in terms of teacher evaluation over the past half decade or so. What prompted you to write this book on crafting defensible teacher evaluations?

[01:01] SPEAKER_02:

Recently, many states, and the U.S. Department of Education as well, have begun to advocate considering student achievement in the evaluation of teacher performance. And that's fine. I'm not opposed to that. But the policies they established permitted and encouraged reliance on change in annual standardized test scores.

[01:20]

They treat those scores as the evidence of teacher impact on student learning, and I have known for decades, as has the entire measurement community, that this is unacceptable practice. And we really, really need to find ways to get people at the highest levels of policy to understand this, that is, to develop the assessment literacy they need to set sound policies in this regard. Student achievement can be considered, but not in this way.

[01:42] SPEAKER_00:

So essentially we're taking data that is produced and collected for one purpose, and, you're saying, inappropriately using it for another purpose: evaluating teachers.

[01:51] SPEAKER_02:

Right. And there's a universally accepted standard in the measurement community that for an assessment to be used in a particular context, it has to be validated for that purpose. And these tests have not been validated for the purpose of evaluating teachers, and I can be very specific about what that means. In order for a test to be acceptable in this context, it has to be so tightly linked to the teacher's performance criteria that it's a powerful enough microscope to detect the impact of that teacher. It needs to be sufficiently sensitive,

[02:26]

instructionally sensitive to detect the impact of that teacher on his or her students. And that research hasn't even been done by the test publishers, let alone validated.

[02:36] SPEAKER_00:

Right. And how did this kind of get sold to the public? I know people in the psychometric community have written letters, statisticians have written letters describing these concerns, but it seems like the message we've gotten about the validity of using student test scores to evaluate teachers is this: we can measure the changes in student test scores from year to year, we can attribute those changes to the teacher, and we can control for other factors. The rationale goes, we can say that this teacher caused this level of student growth this year, and then in this other year we have a different set of data. The numbers are there. But tell us what doesn't work about that.

[03:19] SPEAKER_02:

The fact is that we have people at very high levels of education policy setting policies about these kinds of things, and this is just the tip of a big iceberg, who simply have not been given the opportunity to understand the basic principles of sound assessment practice. And so it's easy, in a context where we want accountability, where we want to evaluate teachers, to grasp at straws from a very naive and damaging perspective. And the numbers are there, for absolute certain. And the thing that people at that level don't do is delve into the background of that kind of thinking. Now, again, it's not inappropriate to consider student achievement. And I have proposed in the book a very specific set of strategies for doing that.

[04:06]

But with regard to the standardized test, first of all, what people need to understand is that they sample very broad domains of achievement, long lists of standards, sometimes spanning more than one grade level. So these are big domains. And remember now, the tests are timed. You get like maybe 35 or 40 minutes to ask enough test items to sample that domain. The problem then becomes apparent right there. The probability is great that a teacher's highest priority achievement standards won't even have been covered by the test.

[04:41]

The sample is very wide and very thin. And so it's unfair to hold a teacher accountable for scores on tests that don't reflect the things that are their teaching responsibility. And that's just the tip of the iceberg, Justin. These tests, because of the need for scoring efficiency, rely on multiple-choice items. Well, can you think of any teachers who have high-priority achievement responsibilities that don't translate into multiple-choice test items? Think the arts, think physical education, think performance kinds of things in science, etc.,

[05:14]

that simply can't be translated. These are closed-book tests in an information age. What teachers are doing is differentiating between knowledge that students need to know outright and knowledge they can retrieve if and when they need it from the infinitely large array of online knowledge. Well, these tests don't account for that. All they assess is what students can answer based only on the knowledge they bring with them to the test. That no longer reflects the learning priorities of our society.

[05:45] SPEAKER_00:

I had a previous guest who said, you know, if we're concerned that our students are going to just Google the answers to their homework and cheat in that way, maybe the best way to address that is not by checking up on them, but by giving better homework that can't be Googled.

[06:01] SPEAKER_02:

Right. The homework thing is another whole topic that we could talk about. Let me just add one other point here, and that is that the time span of a year brings into play a wide variety of factors that are beyond the control of the teacher. In fact, the most compelling research tells us that over that time span, teacher differences may account for something like 10% of the variance in test scores. That's evidence of my point that there can be a profound mismatch hidden in here. And the point is, much more of the variance is attributable to factors in the home, community, school, etc., that are beyond the teacher's control.

[06:44]

Once again, the issue of fairness comes into play here. It simply can't be defended. Too many factors beyond the control of teachers that contribute to these scores.

[06:53] SPEAKER_00:

Well, and I think we try to address that issue of fairness by saying, well, all teachers are evaluated on the same system. So if we have a set of numbers for one teacher and a set of numbers for another teacher, you know, sure, they're imperfect, but at least those teachers are both being treated in the same way. And what you're saying is it's not just that you can get a number and do the same thing to every teacher, but it's what that number fundamentally means. Help me with that a little bit.

[07:18] SPEAKER_02:

and how one attributes it. That is your point from earlier, how one attributes the cause of those scores. This entire standardized test system, if you think about it, is simply too imprecise. If the domain sampled spans more than one grade level, then it spans the responsibilities of more than one teacher. Well, what if one teacher does a great job, but the one before didn't? And, you know, it gets all tangled up in those confounding factors.

[07:45]

If we want to get at student achievement in teacher evaluation, we've got way better ways to do it than this. And then there's the whole value-added thing, which comes into play here, too.

[07:54] SPEAKER_00:

I wonder if I might mine your expertise a little bit more on value-added because I think people need to understand that you do have a PhD in measurement, in the psychometric aspects of this, and there really are only maybe two or three people who are actively writing with that expertise. And there are a whole bunch of people talking about value-added and how great it is and how we can use it to make all these decisions that are justified based on their supposed impact on student learning and being good for kids. And it almost becomes this kind of whose side are you on, kids or teachers kind of argument. But I really want to get into the kind of technical reasons why you're saying what you're saying, that there is science behind this, there is math behind this. It's not just, you know, Rick and Justin like teachers and want to take their side.

[08:41] SPEAKER_02:

Yeah, no. And it isn't even about teachers versus kids. It's about sound practice. First of all, with regard to the value-added models being applied, they rely on test scores of the sort we've been talking about. So all of those problems get entered into the equation before we even start talking about the statistical analysis problem. So we're starting with scores where only about 10% of the variance is accounted for by teacher differences.

[09:06]

So we've got that to start with. Then we start manipulating the scores. What these value-added models do is try to take into account the extraneous factors that could account for the scores, control for those, and see what the teacher's contribution is. Two problems. One is that none of them can account for the full array of things that drive the variance in student achievement. They simply are too narrow in their focus.

[09:32]

The second is that they vary in the things they account for. So a teacher's evaluation would be based on the model used rather than the reality of their impact on kids. And as a result, the fact that they're not accounting for enough of the factors, and that they're all accounting for different factors, causes the measurement community and the statistical community to dismiss these models as inappropriate in this context. It's almost a universal acknowledgement that they're inappropriate, unless you happen to be selling them, in which case you simply ignore the positions of the measurement community. No.

[10:10]

Once again, if we want to get at a teacher's impact on kids, it's not a matter, in this case, of statistical analysis, it's a matter of precision. I want to talk common sense about this. And this is not common sense. This is not common sense.

[10:26] SPEAKER_00:

Right. And we hear some of the more egregious, non-commonsensical examples, like where we have PE teachers being evaluated on reading scores, people being evaluated on subjects that they literally don't teach. But one of the more nuanced criticisms I heard in what you've said so far is that standardized tests are not a great match for what teachers actually teach.

[10:49] SPEAKER_02:

Here's the problem. There's a pretty high probability of a mismatch between the things that happen to get tested in the sample of the domain and any individual teacher's instructional responsibilities. And if there is a mismatch, then that teacher has no control over the scores that are being used to evaluate them. And that's just patently unfair. You can't do that legitimately and defend it. We've got better ways to do this.

[11:14] SPEAKER_00:

Well, and I think on the other side of the coin, when teachers know what they're being evaluated on, we end up with kind of a Campbell's Law problem. And I wonder if you could kind of unpack that for us. What do you see happening when we make a certain test score very high stakes for teachers?

[11:31] SPEAKER_02:

It has the effect, obviously, of narrowing the curriculum, but it's a sort of futile attempt because of the thinness of the sample. And we get into all the test prep stuff that has nothing to do with learning the content or the reasoning proficiencies of the disciplines we want kids to master. And a lot of time is wasted that could be productive instructional time. You know, what we should do is just step back for a minute and review the criteria for what proper teacher evaluation procedures would be. I detail those in the book.

[12:02]

One is that teachers deserve prior notice of when the evaluation is going to take place so they can prepare and be ready to demonstrate their capabilities. The second is that the criteria by which they will be evaluated need to be made explicit in advance of the evaluation, so they have time to perform at the highest level of the continuum associated with each criterion. And then the people who are going to do the evaluation need to be properly trained to apply those performance criteria in a dependable way. Are you getting antsy about this? Because these are things that are really important that we're not paying attention to.

[12:43]

And then the evidence of teacher performance needs to be gathered with a sufficient sample to lead to a confident inference about where they are on those continuums. These are all accepted standards. The teacher needs to be given the opportunity to address any factors beyond their control that may have influenced the level of their performance. And then the results need to be communicated to teachers in very clear and explicit ways so they can offer explanations as they need to. Well, if you're considering standardized test scores in this context, then the explicit achievement standards being assessed need to be made apparent in advance.

[13:20]

There needs to be an alignment established for each individual teacher of the things being tested with their instructional responsibilities in case there's a mismatch. There needs to be a systematic assessment plan with high quality assessments to gather the data. There needs to be a pre-test, post-test gathering of evidence to detect impact. We can do all these things, but not once a year with these inappropriate test scores. And then once again, teachers need to be given opportunity to explain factors beyond their control that may have influenced the scores. We know how to do this right.

[13:55]

It's not a mystery how to do it right. And that's why policies that demand that we do it badly are so, so problematic.

[14:02] SPEAKER_00:

Well, let's talk a little bit more about what we could be doing if we had the level of assessment literacy that we needed, if we designed systems that took into consideration what you've talked about. You have in the book some opportunities for us to really make smarter use of assessment in the teacher evaluation process, and you talk a little bit about student learning objectives. But what's your take on that situation?

[14:26] SPEAKER_02:

Right. Well, the foundation of a better system is the one you just mentioned: a sufficient foundation of assessment literacy, so that we can count on teachers and their principals to understand the difference between gathering good data and bad. And we want to concentrate, obviously, on gathering good evidence. So here's the simple system I'm recommending, and there's much more detail about it in the book if people want to pursue it. First of all, the supervisor and the teacher sit down and identify the teacher's highest-priority achievement standards, and there could be several of them, that are going to unfold at some point during the coming grading period or coming year.

[15:02]

And having agreed upon those standards, the teacher develops high-quality assessments associated with each of those achievement standards. And those assessments are reviewed by their supervisor in terms of their quality. Now you can see why the foundation of assessment literacy is critical. And then, as instruction unfolds, during the time when instruction is focusing on each of those standards, the test associated with that standard would be used in a pre-test, post-test administration subtending the instruction focused on that specific standard, so that the teacher's impact on that standard can be neatly detected. And then the teacher compiles those things over the year and puts together a portfolio of the evidence of their impact on student mastery of the agreed-upon high-priority standards, and they present that

[15:52]

to, I'm suggesting, their supervisor and a committee of qualified teachers, who review the portfolio and make judgments about their impact on student achievement on those highest-priority achievement standards. Now, if we have a foundation of assessment literacy in place in a school building, this is the kind of system that can help us meet all those criteria I just spelled out, and do it in a very powerful way. And once again, teachers, in that presentation, would be given the opportunity to describe any extraneous factors beyond their control that might have influenced their success. That simple local school-building, school-district system would be carried out at a fraction of the cost of anything else going on, and produce much better data, in a much more productive, professional environment that honors the professionalism of teachers and principals.

[16:44]

Makes sense to me.

[16:45] SPEAKER_00:

So I'm thinking about something I'm hearing more and more about in recent years, which is end-of-course exams, or common exams that are teacher-created but standardized across the department or across the district, that are curriculum-based, developed by the teachers in accordance with the curriculum they intend to teach. So they're intentionally measuring what they think students should master. How do those typically stack up? You know, let's say people understand at a fairly deep level how to design good classroom-based assessments, and they have clear criteria for what they want students to learn in their courses. How well do those typically stack up against your criteria?

[17:24] SPEAKER_02:

I'm not going to accept your beginning assumption that the people who do this are good at assessment. If they are, and we'll come back to that in a minute, then to the extent that the things being assessed align with the instructional responsibilities of the teacher being evaluated, those scores are fine. That's the kind of precise link we're looking for. If that link can be established, like, yes, this covers my instructional responsibilities, and the teacher and the supervisor agree on that, it's the kind of test that could be used in the system I'm talking about. But what we have to be clear about here is that the vast majority of teachers come to that process without having been given the opportunity to develop the assessment literacy they need, or at least there's the danger of that, unless we believe they can turn to their principal for help in this regard. Let's be clear about the fact that relevant, helpful assessment training remains nonexistent in leadership preparation programs.

[18:17]

So the big if is if the foundation of assessment literacy is in place, these tests can have a role to play, absolutely.

[18:24] SPEAKER_00:

Yeah, and I think a big factor there in terms of teacher buy-in is that teachers tend to mind less when they're being evaluated by tests that they themselves had a voice in creating. I think what's so frustrating to a lot of teachers is that the test that they're held accountable for is a black box to them. They never actually get to see it, much less be able to teach in alignment with it.

[18:46] SPEAKER_02:

And very often, it doesn't get to see them either. That is, the misalignment is profound. And that kind of gross measure just isn't what we need here. And then, of course, I want each teacher to have sufficient assessment literacy to be able to work with their supervisor to say, this is an achievement standard that's a high priority for me. Maybe not everybody else, but it is for me. And I'm going to devote instruction to it.

[19:07]

And I want this evidence of my impact on that standard to be considered in my teacher evaluation. And I think they ought to have the opportunity to do that. Actually, that's consistent with the system I described in my book The Perfect Assessment System, from ASCD, where teachers really are players in the assessment world.

[19:27] SPEAKER_00:

What do you think are some of the key competencies in terms of assessment literacy that we're lacking as a profession? And I know you've been beating that drum for a good long time that we need to develop that assessment literacy and still we're not there. What do we specifically need to understand or to get better at within that process of classroom-based assessment?

[19:48] SPEAKER_02:

First of all, I don't want to begin with the assumption that everybody's bad at it, because lots of teachers are really, really good at it. They were well trained, as you were, and so are truly competent at this, as are some principals. But what I ask is that people do a pretty systematic evaluation of their own assessment literacy, and if they find it wanting, step up and take responsibility.

[20:10]

I provide criteria for doing that in the book, in the Defensible Teacher Evaluation book. So do the self-analysis with respect to these questions. First, can you select a proper assessment method given the learning target? We have a variety of assessment methods at our disposal. They're not interchangeable. They align well with different kinds of achievement. Selecting the proper method comes first.

[20:32]

Secondly, can you frame an appropriate sample of student performance, with regard to the learning target or targets, in questions that will lead to a confident conclusion about their mastery? It turns out each assessment method carries with it certain rules of evidence for how to sample appropriately. If you know them all, you get good data. If not, trouble. And then third, one must be able to create good assessment exercises and scoring schemes. If it's going to be a performance assessment, it's got to have a good scoring rubric, not a bad one, and you've got to know the difference.

[21:05]

So there's the matter of the methodology itself. And then finally, every assessment context brings with it an array of potential sources of bias that can distort results. If I know what those potential sources are, then I can control them and minimize the distortion. If I don't, I'm in trouble. So managing and minimizing bias within assessment is a big deal. So: proper methods, proper sampling, good assessment development, and control of bias.

[21:34]

Those are the things that are the foundation. There's nothing technical there. I've not mentioned validity, reliability, or anything along those lines. I'm saying common sense here, common sense. And there are plenty of really good professional development programs available that teach these things, including ours. So that's what I mean by assessment literacy.

[21:54] SPEAKER_00:

Good deal. And you've got rubrics in the book for people to kind of self-assess and look at their own practices.

[21:59] SPEAKER_02:

Right, and suggestions on how to grow.

[22:02] SPEAKER_00:

Yes. So the book is Defensible Teacher Evaluation, and we referred briefly to our previous interview on your more recent book, The Perfect Assessment System. Dr. Stiggins, if people want to find out more about your work and locate you, what's the best way for them to do that?

[22:18] SPEAKER_02:

First of all, Defensible Teacher Evaluation is a Corwin publication, so go to Corwin to find it. But I also have a website that links to it, rickstiggins.com. Or, if anybody wishes to talk directly, contact me at rickstiggins at gmail.com.

[22:34] SPEAKER_00:

Well, thanks so much for joining me again on Principal Center Radio. It's been a pleasure.

[22:37] SPEAKER_01:

Justin, thank you for inviting me. And now, Justin Baeder on high-performance instructional leadership.

[22:44] SPEAKER_00:

So, high-performance instructional leaders, what did you take away from the conversation with Dr. Stiggins about defensible teacher evaluation and developing assessments that can serve as a defensible basis for teacher evaluations? I hope you heard some of his comments about assessment literacy, and about how we can build a foundation of assessment literacy that allows teachers to work with their colleagues to develop assessments that we can trust. We didn't really get into the idea of teaching to the test, but we kind of alluded to the idea of alignment, of teaching in such a way and measuring in such a way that we're getting the information we want about what matters to us in terms of student learning. I think teaching to the test has taken on a negative connotation in terms of, as we discussed, test prep, when we know the narrowness of that and the irrelevance of that to students' actual day-to-day learning

[23:41]

is resulting in that being a harmful practice. You know, we know we don't want to spend more time on test prep, getting away from our curriculum and really just focusing on getting scores up. But at the same time, we've got to understand why test prep is so popular and why test prep quote-unquote works. There's a phenomenon that I alluded to that we didn't really discuss or define called Campbell's Law, and you can look up Campbell's Law on Wikipedia. Donald Campbell said, "...the more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures, and the more apt it will be to distort and corrupt the social processes it is intended to monitor."

[24:24]

And teaching to the test, of course, is a classic example of Campbell's Law. And when we have even more ridiculous teacher evaluation systems, where we're evaluating PE teachers based on reading scores and things like that, we can only expect that people are going to change their behavior in order to protect themselves, in order to optimize what the test says about them, or what the scores say about them. And of course, that means in a lot of schools we spend too much time on test prep. And in a lot of districts, we have things happening with enrollment and with expulsions and with suspensions. We have all kinds of games that schools are being pressured to play to change who shows up on test day and who actually takes the test. And we know that the games

[25:13]

we play because of Campbell's Law are bad for kids. So my message to you, based on what I talked about with Dr. Stiggins, is to look at sound assessment practice. What does good assessment look like? And then build on that. Don't build on just any number we can get, any number that's convenient because we have a data source.

[25:35]

Build on a foundation of sound assessment. And as Dr. Stiggins said, we've got to build in that assessment literacy, that professional development it takes, so that we're able to do good assessments, so that we're able to get valid information that tells us what we need to know about what we truly care about, which is student learning. So again, the book is Defensible Teacher Evaluation by Dr. Rick Stiggins. You can check that out on our website.

[26:01]

We'll have a link to Dr. Stiggins' website and where you can get the book in the show notes. And this episode of Principal Center Radio is brought to you by our other podcast, High Performance Habits. If you enjoy getting professional development via your headphones, you can do that as a pro member of the Principal Center, and you can get some of my best and most direct advice for improving your productivity, for increasing your impact on student learning, and for building capacity for instructional leadership in your school in our members-only podcast, High Performance Habits. That is part of our pro membership, and you can find out more about pro membership at principalcenter.com slash join.

[26:44] Announcer:

Thanks for listening to Principal Center Radio. For more great episodes, subscribe on our website at principalcenter.com slash radio.
