Assessing the Nation’s Report Card: Challenges and Choices for NAEP
About the Author
Chester E. Finn, Jr. is a distinguished senior fellow and president emeritus at the Thomas B. Fordham Institute and a senior fellow at Stanford University’s Hoover Institution. He has previously served as Professor of Education and Public Policy at Vanderbilt, and was the United States Assistant Secretary of Education for Research and Improvement. He is the author of numerous books on testing, accountability, and education policy.
Full Transcript
[00:01] Announcer:
Welcome to Principal Center Radio, helping you build capacity for instructional leadership. Here's your host, Director of the Principal Center, Dr. Justin Baeder. Welcome, everyone, to Principal Center Radio.
[00:13] SPEAKER_01:
I'm your host, Justin Baeder, and I'm honored to welcome to the program today Chester E. Finn, Jr., who is a Distinguished Senior Fellow and President Emeritus at the Thomas B. Fordham Institute and a senior fellow at Stanford's Hoover Institution. Dr. Finn has previously served as Professor of Education and Public Policy at Vanderbilt and was the United States Assistant Secretary of Education for Research and Improvement.
[00:36]
And he's the author of numerous books on testing, accountability, and education policy, and certainly an honor to speak to such a giant in our field.
[00:47] Announcer:
And now, our feature presentation.
[00:49] SPEAKER_01:
Dr. Finn, welcome to Principal Center Radio. Thank you very much. It's a pleasure to be with you. Well, I'm excited to talk about your new book, Assessing the Nation's Report Card, because for decades now, the National Assessment of Educational Progress has been an assessment that we've given in schools, but not all schools, and it has shaped public education policy in numerous ways that are often invisible to frontline educators, as well as to parents and the public.
[01:20]
So I wonder if we could start just by talking a little bit about what NAEP is and how it works and why it's so important, just to set the context a bit.
[01:27] SPEAKER_00:
Sure. Happy to. I sometimes call this the most important test you've probably never heard of. It's a federal program that's been around for 50 years. It's our best barometer of whether young Americans in K-12 education are learning anything, and it's administered to a statistical sample of kids in grades 4, 8, and 12.
[01:51]
And because it's a sample, it does not produce results at the school building level, at the child level, or even in most cases at the district level. The results it produces are for the country as a whole and for every state in many subjects, though not all; about ten subjects are tested. Congress has required that reading and math be tested in grades four and eight every two years as part of what used to be No Child Left Behind and is now the Every Student Succeeds Act. And so every two years we get reports, again, for the country and for states, on fourth and eighth graders in reading and math. The other subjects and the 12th grade data come less frequently, and they're often only at the national level, and they're not always in all three grade levels.
[02:39]
This is because of a variety of constraints, including budget, even though the program as a whole costs almost $200 million a year. It's administered by the US Department of Education, but it has an independent governing board called the National Assessment Governing Board. So it is not under the thumb of politicians; it is under the thumb of a 26-member board that decides what to test and basically how to go about it. So because it doesn't deliver results at the level that most educators and certainly parents care most about, most people don't know about it.
[03:18]
But if you're a governor or a chief state school officer or a president of the United States, you care quite a lot whether things are getting better or worse. Additionally, it's our best gauge of learning gaps, achievement gaps between groups of students, whether by race or by disability status or gender, etc. And in addition to that, ever since No Child Left Behind, NAEP has functioned as a kind of a truth squad with respect to state standards and tests. Because if the state assessment is reporting that, let's say, 70% of fourth graders are proficient in math, but NAEP tests that same state's fourth graders in math and says that 30% of them are proficient, there is reason to wonder: why is the state saying 70%? Is that because the state's standards and expectations are less rigorous
[04:11]
than those of NAEP? So it functions as a kind of an auditor, really, for state assessments. And in that way, it does indirectly affect what happens in schools and in curriculum and in teacher preparation and so forth, and certainly in accountability systems under federal law for a great many public schools. And therefore, it has consequences that you don't know about, because there are sort of two degrees of separation from the report that your school is getting about how it's doing in these core subjects.
[04:43] SPEAKER_01:
It's interesting to see discussions of state-level assessments. And before we had the Smarter Balanced Consortium, the newer testing consortiums, there was always discussion of, are these state-level assessment results improving because we're just changing the cut score? That we're just lowering our standards and saying more students are meeting standard, good for us. And as you say, this is kind of a check on those state level accountability systems. When it comes to some of the decisions that are made, because education in the United States in particular is extremely local, extremely state driven, and only to a limited extent driven at the national level and by the federal government, What are some of the policy or accountability implications of performance on NAEP? So if a state is, say, cutting corners on their assessment, lowering standards, what are some of the actions that might be taken or some of the implications that might occur to address that?
[05:39] SPEAKER_00:
The implications are mostly sunshine and embarrassment for state leaders. The national assessment has no enforcement power. It doesn't cause a state, doesn't force a state, in no way requires a state to change its cut scores, to change its tests, to change its accountability system. States have quite a lot of flexibility with regard to where they set their standards and what kind of assessments they give and whether or not they have consequences for schools. What NAEP does is, in the background, it is telling the public and the country and the electorate and the elected officials and whoever might be running for office or monitoring the state's progress.
[06:25]
It might be a business group. It might be a parent group. It's a kind of an additional and very visible report on how the state is doing in national terms. So in effect, Ohio or Illinois or Missouri or Vermont or Oregon gets reported against a national standard. And that gives people in that state quite a lot of important information with which they can judge how the people leading their own state are doing. Maybe they want to throw them out and replace them with somebody else who might expect more from the schools, or conceivably less from the schools, though I hope that's not the case.
[07:02] SPEAKER_01:
You say that the first half of the book is the history of NAEP and giving some of the background and context, but you also spend quite a bit of time talking about the future. So you've been following this assessment for many years now and commenting on its use. What do you see either changing right now or possibly changing in the future? And what do you see unfolding in the coming years with NAEP?
[07:25] SPEAKER_00:
It's a surprisingly delicate mechanism, and when you change it, you have to be careful or you will lose what's known in the trade as the trend line. NAEP is mostly valuable in telling you whether things are getting better or worse, whether gaps are getting wider or narrower. You only know that if the continuity of the assessment is such that you can see whether the change is real. And if you change the test totally, you won't have a trend. Therefore, it needs to evolve rather than be radically overhauled, unless the governing board decides the time has come to start all over again in, say, science, because science has changed so much that we no longer want the trend from the past.
[08:06]
We want a fresh start. That occasionally has happened historically. But mostly people want the trends. So changes need to be made very carefully. An example: over the last 10 years, really, NAEP has moved from paper-and-pencil testing to digital testing using devices. It's only moved about 80% of the way there, because the federal contractors still schlep the tablets from school to school.
[08:30]
They don't use what's in the school, and they don't use the cloud. They sort of carry this hardware from participating school to participating school. It's actually an antiquated use of modern technology, if I can put it that way. So one of the things they're working on now is making it technologically more sophisticated without necessarily changing the test itself. However, you also occasionally need to change the test itself, because what is expected of kids changes over 50 years. And so what's known in this field as the framework, which basically sets forth a kind of a consensus view of what kids should be learning in reading or math or history or science, occasionally needs updating. There's a brand new framework just going into effect for reading, for example.
[09:15]
They say the trend line will be preserved. And the governing board of NAEP is just starting on a renovation of its science framework. So these changes occur, I hope gradually, not abruptly, in a kind of a cycle. Usually for these kinds of changes, it's about a 15-year cycle. This thing is very slow. This is a very large, complicated multi-year mechanism.
[09:41]
Another important point, however, is that the assessment could be doing some things it doesn't currently do, and whether it can afford to add them is important. For example, about 27 districts currently get district-level data through a kind of an add-on called the Trial Urban District Assessment. These are big cities, basically. But a whole lot more districts would like to get district data from NAEP. And today, NAEP can't afford to do that. So that would be one add-on.
[10:12]
Another one that, to me, is very important: at the 12th grade level, NAEP does not deliver state-level data, as it does in grades four and eight. I would think that the end of high school is when state officials might most want to know how their kids are doing, how their schools are doing. And yet states don't get data from 12th grade NAEP. Again, it's budget, and it appears also to be lack of demand. I'm hoping my book builds some demand among state leaders to say, we want the 12th grade state-level data, but they don't get it today.
[10:44]
So that would be another example of a change going forward.
[10:47] SPEAKER_01:
I can certainly appreciate the need for stability and maintaining that trend line so that we can see if changes are real changes or just changes in how we measure things. And certainly if you change a test too much from one year to the next, that question gets introduced. I'm glad you mentioned the need for the test itself to change as our expectations change. And I'm thinking about science standards in particular. I became a science teacher 20 years ago this year, and we had completely different standards, mostly state standards. There were no national standards to speak of.
[11:19]
And looking at the Next Generation Science Standards now, it is a night and day difference. I'm just blown away looking at some of the tasks that students are asked to perform. Thinking in terms of what NAEP is able to measure, there's a lot of criticism within the profession of multiple-choice standardized assessments. Teachers and principals, we don't tend to like multiple-choice tests too much. To what extent is NAEP a multiple-choice test? What other types of questions or items are included? And how's that changing?
[11:47] SPEAKER_00:
There's been an ever-increasing amount of free-response, open-response, and short-response answers. So there are some short essays, some fill-in-the-blanks, some two-sentence responses. There's a lot of applied, demonstrated knowledge and skill that isn't just multiple choice. There's a mixture.
[12:07]
Of course, it relates to so many different things, including, again, cost and also technology. One of the things they're trying to do is engage artificial intelligence more in the evaluation of things like open-response answers, because if human beings have to do it, you run into all kinds of issues: how long does it take, how much does it cost, and are the various readers' expectations the same? So it's a tricky business when you move away from things that machines can score into things that people have to score. But there are a lot of things in NAEP today that are not just multiple choice, to put it that way. And that's been increasing over the years. One more thing about your science point, which is valid. However, not every state is using the Next Generation Science Standards.
[12:54]
Many are. Most are, I believe, but several aren't. So as you revise the national assessment, how do you deal with the tension between the large number of states that want something like the NGSS to be the basis for NAEP and the handful of states that say, well, that's not what we teach here? That's an example of the kind of thing that has to be worked through carefully, and it carries with it the risk of a loss of consensus, which is a big problem, because NAEP needs to be credible, to be believed as a valid nationwide measure. And that's tricky when states are different from each other, which they certainly are in this country.
[13:36] SPEAKER_01:
Yeah, and it strikes me as especially tricky when some states have state standards that are different from other states or from national standards on purpose, almost as a point of pride for their political leaders to say, hey, we do things differently here. What are some of the things that are happening in the national context with that? Because you mentioned the pressure that can effectively be applied through NAEP if a state is lowering its standards and acting like it's improving. What are some of the other kind of tricky policy things that are going on there?
[14:03] SPEAKER_00:
The one that I guess worries me the most is the culture wars and their potential influence on these frameworks that decide what's going to be tested. Sticking with the science example, the last time the science framework was revised was about 20 years ago, at which point climate change was not a big issue for people. Therefore, the current framework doesn't much deal with climate change. So as you go into a new science assessment, how much of the new framework should deal with climate issues? Well, you're walking into a political challenge there, as well as an educational and substantive challenge.
[14:47]
How much should deal with climate change? And from what point of view, if I can put it that way? Not all state leaders, and certainly not all national leaders, agree on this topic. The reading framework almost caused the governing board to come unglued over how to deal with some of the equity challenges in the assessment of reading. That finally got patched together into a consensus again, which was, I think, very important for the credibility of this whole thing. But it was touch and go for a while. And after science, they've got to get around one of these days to updating the frameworks for history and civics.
[15:22]
Talk about culture wars. This could be a big fight rather than a new consensus. I hope they can find a way to keep the consensus flowing because, again, much of the value of NAEP depends on its acceptance as a valid barometer of what kids know and can do. Well, for it to be a valid barometer, there's got to be some agreement on what they ought to know and be able to do. And that underlies the credibility and acceptance, really, of this whole assessment.
[15:53] SPEAKER_01:
And it's interesting, the governing board that you've mentioned having 26 different members. It certainly sounds like a large committee, and having occasionally served on committees that large, it sounds like a difficult group to wrangle and to keep on track. I think one year our school leadership team rose into the 20s before we reformulated and said, this is too many people. We can't make decisions this way. How does that group come together?
[16:17]
Who decides who's on it? Because it certainly plays an important role. Like the test itself, it's often invisible to most Americans.
[16:24] SPEAKER_00:
Right. I was, incidentally, among my various involvements with NAEP over the years, I was the first chairman of this governing board during the first two years of its existence around 1990. And so I've got a kind of a special place in my heart for the 26 people that kind of got this off to, I believe, a very good start back then. By law, it's a kind of a carefully constructed Noah's Ark. It contains, for example, two governors or former governors of different parties, two state legislators of different parties, two state school superintendents, chief state school officers, two local superintendents. Several principals, I think three, if memory serves, elementary, middle, and high school principals, teachers, general public, testing experts, curriculum experts, business representative, a private school representative, and so on.
[17:14]
So the law specifies the categories. The U.S. Secretary of Education makes the appointments for three-year terms, often renewed for a second term. But the vetting process that goes on before the secretary even gets a list of candidates is very elaborate. And the board does that itself.
[17:35]
So the board, in effect, functions as its own nominating committee, presenting the Secretary of Education with several choices for each opening, after going through a big solicitation and vetting process within the governing board itself. I call it semi-self-perpetuating. So the Secretary of Education does not have a free hand to appoint her cousin to the governing board, because there's a limited pool of people who can be picked from, and the board itself basically decides who's in that pool.
[18:05] SPEAKER_01:
It occurs to me the three-year term is probably not accidental, given that other elected officials are, you know, presidents are elected on a four-year cycle. So you have a certain degree of political independence there, both in terms of nomination and term and so forth. And I wonder, in terms of independence, what you think some of the lessons that we could learn elsewhere in the profession are? Because NAEP has so many differences with so many of the other standardized assessments that we give. It's used in different ways, but it's also insulated from many of the pressures or flaws that we face with other assessments. I've wanted to ask somebody about this for years, and I think you're the perfect person.
[18:44]
Our listeners may not be familiar with Campbell's Law, from the sociologist Donald Campbell, about what happens to a quantitative measure that has a lot of accountability attached to it, that has very high stakes. Campbell's Law is the idea that the more high-stakes a particular measure is, the more that measure will be distorted. If we're accountable for a certain test score, then we might have cheating, we might have excessive test prep, we might have lowering of the standards within the test itself.
[19:11]
We have all these inappropriate pressures that come to bear on our state tests, on whatever frontline educators are held accountable for. How is NAEP different in that way? And what can NAEP teach us about how we should be using assessment?
[19:28] SPEAKER_00:
Well, the key difference, and maybe sadly, states aren't allowed to do this under federal law, is that NAEP is a sample test. And therefore, since nobody's getting a report for their classroom, their student, their child, or their school, there's no incentive to cheat. There's no incentive to even teach to the test, frankly, because nobody's going to get results back that have any effect on themselves. So sampling, and reporting at a unit that is high enough that it doesn't really impact people's lives, is good in terms of the integrity of the test. It's not good if you want to have an accountability system where you actually want consequences to be associated with the test.
[20:09]
And so I don't think you can have it both ways. You either have an accountability system with consequences, in which case you've got to know the level at which you want the consequences to kick in. Is it the kid getting into college? Is it the school getting two stars instead of five? If you want the accountability to kick in at the kid level or the school level, then you have to test in such a way that you get results at that level. NAEP doesn't do that.
[20:35]
And its impact, therefore, on things like curriculum, as I mentioned, is indirect. It sort of seeps in through the state's decisions about whether it wants to change its standards because they're too different from NAEP's. That would be an indirect influence on curriculum, let's say, in the state, or on the test and the cut scores in the state. I mean, states could also create independent arrangements for the creation of their own assessments, instead of just leaving it to the state testing director, which is what they generally do today, and a contractor that gets hired. But I don't think you can achieve the kind of low-stakes effect of NAEP, and the way it's largely spared from Campbell's Law, at the same time as you're using a test as a high-stakes accountability measure.
[21:23]
You've got to decide which way you want to go.
[21:25] SPEAKER_01:
I think that's well said, because we're always going to have different assessments for different purposes. You know, if we're giving a test like NWEA's MAP, which many schools use now to measure growth, teachers don't get to see the items, so they can't teach to the test. It doesn't match their curriculum because it is a standardized test. But yeah, it's a matter of matching to the purpose of the test.
[21:44] SPEAKER_00:
And if you want to give teachers feedback through what's usually called a formative test, you do something very different. It's usually low stakes, but it lets the teacher know which kids have learned which things. I think it would be great if your listeners looked up the national assessment and acquainted themselves with it; there's a ton of information on its website. They might even want to know how their state is doing on this metric, and have a look at how that might compare with what the state is telling them about how their kids are doing. It might be at least informative.
[22:14]
And in any case, that governing board needs principals and teachers on it. So when the time comes to apply or get nominated, think about that too.
[22:25] SPEAKER_01:
Indeed. So the book is Assessing the Nation's Report Card, Challenges and Choices for NAEP, published by Harvard Education Press. Dr. Finn, thank you so much for joining me on Principal Center Radio. It's been an honor. My pleasure.
[22:38] Announcer:
Thanks for listening to Principal Center Radio. For more great episodes, subscribe on our website at principalcenter.com slash radio.