Philadelphia IMP Research Summary
1. Student Attitudinal Surveys, 1994: Because IMP was a new program and
completely voluntary for both teachers and students, the
first thing we wanted to know was whether the 9th grade students
who chose to take IMP would want to continue with the program. If no students
wanted to continue, we could not run the program. To gauge student interest, we developed a short questionnaire on
content and pedagogy and gave it to students at the end of their first year, in
May 1994. We also wanted to know whether
there were any large differences between different types of schools. We had
started with three high achieving “special admission schools,” one vo-tech, and
two neighborhood comprehensive schools.
The result: students overwhelmingly preferred IMP to algebra classes
and indicated they would continue with the course. (See the Ninth
Grade Student Attitudes report in the
Appendix.)
2. Passing Rate and Attendance Data: Principals in Philadelphia’s twenty-two comprehensive
high schools face the problems of low student passing rates and a year-by-year
loss of students who drop out of school.
In most of these high schools, the graduating class is less than half of
what it was in ninth grade. As IMP
expanded into these comprehensive high schools, the first question was whether
IMP students had higher passing and attendance rates than did students using
pre-standards materials.
The results: By and large, IMP students had consistently higher
passing and attendance rates in all of the comprehensive schools, not only in
math but also in English, social studies and science, compared with their peers in the
same schools. (See Passing
Rate Comparisons 1995, 1996, 1997.)
Methodological note:
To guard against the claim that IMP teachers were liberal graders, thereby
inflating the IMP passing grades, we also looked at other subjects and found
higher passing rates in those subjects as well (English, science and social
studies). We attribute these results to the writing and problem solving
emphasis in IMP, which seemed to have a secondary effect on other subjects.
However, we did not have 8th grade test
scores to compare the two groups and control for cohort effects. Thus, another possible interpretation for
the higher IMP results was that we had “better” kids to start. This was a valid
concern, which we would systematically address in later studies. But for now,
we were encouraged by these early results because none of Philadelphia’s
honors students were allowed to take IMP. Also, IMP students within a school
were usually randomly assigned or they self-selected.
3. PSAT comparisons: IMP began in
three special admission schools: Central High School, the second oldest high
school in America and formerly an all boys school, Philadelphia High School for
Girls (still all girls), and Carver High School of Engineering & Science.
These schools draw students from the entire 8th grade student
population in Philadelphia and, as a consequence, attract the top scoring
students. For the principals in these schools, the main issues are SAT and
Advanced Placement scores rather than passing rates. As one principal remarked,
“I’ll consider IMP successful if it can raise SAT (Scholastic Aptitude Test)
scores 30 points by the junior year.”
The SAT was originally designed in the 1930s to be
an objective criterion on which to base admissions to college. As such, its
purpose was to measure a student's aptitude,
separate and apart from any previous instructional
influences. It was expressly designed not
to measure achievement per se. The math
and verbal sections of the SAT, therefore, function more like an intelligence
test, rather than as indicators of achievement in mathematics or literature and
composition. Certainly the design of the Interactive Mathematics Program was
geared to prepare students for deeper mathematical understanding and higher
achievement, not for higher SAT scores per se. Nonetheless, the misuse of the
SATs to compare the effectiveness of schools and curricular programs is a
political reality. We, therefore, had to collect comparative data.
In October 1994, we began to collect PSAT data from our
first Philadelphia IMP cohort of students in Central, Girls and Carver. These
students were sophomores who had completed one year of IMP. They had not been
the highest achieving students in 8th grade. Those students were not
given the option of enrolling in IMP.
Results:
IMP sophomore students outperformed traditionally taught students in all three
special admission schools in both the
math and verbal sections of the PSAT.
(See PSAT Score Comparisons 1994, 1995, 1996 in the Appendix.)
In October 1995, the original cohort of IMP students was again analyzed for their
11th grade PSAT scores. Strawberry Mansion, an inner city
comprehensive high school, was also included.
Results:
IMP juniors continued to outperform traditionally taught students in the
special admission schools and one comprehensive school in both the math and verbal sections of the PSAT.
Methodological Notes:
This analysis does not control for cohort effects, that is, 8th
grade achievement levels, nor for the effect on PSAT scores of taking more
math courses, particularly algebra in 8th grade. However, for
reasons previously outlined, these biases would tend to favor the traditional
students, many of whom, unlike the IMP students, had taken algebra in the 8th
grade and went on to take geometry in 9th grade and algebra II in 11th
grade. These students were essentially one year ahead of the IMP students.
4. Stanford Achievement Test – 9th
Edition (SAT-9)
In August 1994, David Hornbeck began his six-year tenure as
the Superintendent of the School District of Philadelphia. With backing from
the school board, he initiated a 10-point “Children Achieving” agenda. A key
part of his agenda was greater accountability for student performance. A
criterion-referenced version of the Stanford Achievement Test, 9th
edition (SAT-9), was developed to be the centerpiece of a new system of
accountability. Three subjects were tested: reading, math, and science. The first “baseline” SAT-9 test was
administered in April, 1996.
The results of the SAT-9 test for
each subject area were separately reported to the district on a 0-99 scale as
normal curve equivalents. The district then converted these scores into a
seven-level rubric. A “0” was untested and a “Level 6” was considered “Advanced.”
Then the School District of Philadelphia assigned a numerical weight to each
level: Advanced = 1.2, Proficient = 1.0, Basic = .8, Below Basic III
= .6, Below Basic II = .4, Below Basic I = .2 and Non Tested = 0. These weights were then multiplied by the
percentage of a school’s students who scored in each of these seven levels for
a particular subject. The total point
values for all levels were then added together. This result was the performance
index score for that subject. (Equivalently, the students’ scores reported on
the 0 through 6 scale are multiplied by 20 and then averaged to obtain the
performance index.) For example, if
100% of the students took the test and 100% scored at the advanced level, Level
6, the performance index would be 6 x 20 or 120. On the other hand, if 100% of the students did not take the test
the performance index would be 100% x 0 or 0.
In 1996, the actual math performance index score for all Philadelphia
public high schools was 27.2. See the Appendix for the Stanford
9 Test and Performance Index.
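The calculation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the text, not the district's actual software: the function names are ours, the mapping of rubric levels 5 through 1 to Proficient through Below Basic I is inferred from the weights, and the "hypothetical" school is invented for the example.

```python
# Weights the district assigned to each rubric level (taken from the text above).
LEVEL_WEIGHTS = {
    "Advanced": 1.2,
    "Proficient": 1.0,
    "Basic": 0.8,
    "Below Basic III": 0.6,
    "Below Basic II": 0.4,
    "Below Basic I": 0.2,
    "Non Tested": 0.0,
}


def performance_index(percent_by_level):
    """Sum of (weight x percentage of students) over the rubric levels."""
    total = sum(percent_by_level.values())
    if abs(total - 100.0) > 1e-6:
        raise ValueError("percentages must sum to 100")
    return sum(LEVEL_WEIGHTS[level] * pct for level, pct in percent_by_level.items())


def performance_index_from_levels(levels):
    """Equivalent form: average of (student's 0-6 level x 20)."""
    return sum(level * 20 for level in levels) / len(levels)


# The two boundary cases given in the text.
all_advanced = performance_index({"Advanced": 100.0})    # about 120
all_untested = performance_index({"Non Tested": 100.0})  # 0

# Hypothetical school, invented for illustration (not data from this report).
hypothetical = performance_index(
    {"Proficient": 10.0, "Basic": 20.0, "Below Basic II": 20.0, "Non Tested": 50.0}
)

print(round(all_advanced, 1), round(all_untested, 1), round(hypothetical, 1))
```

Because untested students carry a weight of zero, a school in which half the students skip the test cannot score above 60 even if every tested student is Advanced.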
The same calculation was also
performed for reading and science. Two additional indices were also developed:
one for student and staff attendance, and another for “student persistence.”
These five index scores were added together and averaged. The result was a
school’s “Performance Index.” The goal was to achieve a Performance Index of 95 in twelve years. A team of central
office personnel reviewed principals whose schools did not show progress
toward meeting this twelve-year goal. Within a few years, the SAT-9 became a
very high stakes test for building level administrators. However, initially,
many teachers and students did not invest much time in the administration of
the test. Overall, in 1996, half of the students did not take the test. The
exceptions were the special admission schools, such as Central, Girls and
Carver, whose students tend to take all tests more seriously. In the 1996 SAT-9
testing, these schools had a relatively low number of untested students.
Using the March 1996 SAT-9 test results for Central and
Girls High, we did a matched sample comparing IMP juniors to a similar group of
pre-standards, traditionally taught students from these same schools. We
matched by gender, whether they came from public or private elementary schools,
and their 8th grade national percentiles in the math and verbal
sections of the Comprehensive Test of Basic Skills (CTBS). Our sample sizes
were 83 IMP students at Central and 55 IMP students at Girls matched against 83
and 55 traditionally taught students respectively. We analyzed the raw scores
on each item of the math, reading and science sections for the SAT-9.
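To make the idea of a matched sample concrete, here is a minimal sketch of one way such matching could be done. It is not the procedure actually used in this study, which was not automated in this form as far as this summary states: the greedy nearest-score rule, the `max_gap` tolerance, the `Student` record, and all of the student data below are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:
    student_id: str
    gender: str
    elementary: str   # "public" or "private"
    ctbs_math: int    # 8th grade national percentile
    ctbs_verbal: int  # 8th grade national percentile


def match_students(imp, pool, max_gap=10):
    """Greedy one-to-one matching (an assumed rule, for illustration).

    Each IMP student is paired with the closest unused comparison student
    of the same gender and elementary school type, where "closest" means
    the smallest total gap in CTBS math and verbal percentiles.
    Ties are broken by student_id so the choice is not left to the analyst.
    """
    available = list(pool)
    pairs = []
    for s in imp:
        candidates = [
            c for c in available
            if c.gender == s.gender and c.elementary == s.elementary
        ]
        if not candidates:
            continue  # no eligible match; this IMP student is dropped

        def gap(c):
            return abs(c.ctbs_math - s.ctbs_math) + abs(c.ctbs_verbal - s.ctbs_verbal)

        best = min(candidates, key=lambda c: (gap(c), c.student_id))
        if gap(best) <= max_gap:
            pairs.append((s, best))
            available.remove(best)
    return pairs


# Hypothetical students, invented for this example.
imp_students = [
    Student("I1", "F", "public", 80, 75),
    Student("I2", "M", "private", 60, 65),
    Student("I3", "F", "private", 90, 90),
]
comparison_pool = [
    Student("T1", "F", "public", 78, 76),
    Student("T2", "F", "public", 82, 75),
    Student("T3", "M", "private", 61, 60),
    Student("T4", "M", "public", 60, 65),
]

matched_ids = [(a.student_id, b.student_id)
               for a, b in match_students(imp_students, comparison_pool)]
print(matched_ids)
```

In this toy data, I3 has no comparison student with the same gender and school type, so she is left out of the matched sample; this is the kind of data loss that matching entails.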
Results:
At Central High School, IMP students outperformed traditionally taught students
in 21 out of 28 reported SAT-9 multiple-choice categories. Of these, the gains in Probability
and Functions were statistically significant. Three
categories were ties. In 4 categories the traditionally
taught students scored higher than the IMP students; none of these differences was statistically
significant.
Results: At Girls High School, IMP students outperformed traditionally taught students in 12 out of the 17 math-related sub-scores, tied on two and scored slightly lower on three. IMP students did better on all the cumulative scores and open-ended assessments. Moreover, only 29% of the IMP students scored “below basic,” compared with 43.6% of the traditionally taught students.
(See Appendix for Central
and Girls' SAT 9 Matched Sample Analysis)
In the Spring 1997 administration of
the SAT-9 test, the percentage of untested students dropped from 49.9% in 1996
to 30.2% in 1997. Accordingly, the math
performance index score rose from 27.2
in 1996 to 36.8 in 1997. In short,
many principals could realize their targeted performance gains for the first
two-year reporting cycle merely by getting more students to be tested.
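The arithmetic behind this observation can be checked with a short sketch. Since untested students carry a weight of zero, the index equals the tested share multiplied by the average index among tested students. The code below assumes that the district-wide untested percentages quoted above apply to the math test specifically, which this summary does not state explicitly; the function name is ours.

```python
# Figures quoted in this report: year -> (math performance index, percent untested).
reported = {1996: (27.2, 49.9), 1997: (36.8, 30.2)}


def index_among_tested(index, pct_untested):
    """Average index of tested students, given that untested students count as 0."""
    tested_share = 1 - pct_untested / 100
    return index / tested_share


tested_1996 = index_among_tested(*reported[1996])
tested_1997 = index_among_tested(*reported[1997])

# What the 1997 index would have been if tested students had performed
# exactly as in 1996 and only the participation rate had changed.
participation_only_1997 = tested_1996 * (1 - reported[1997][1] / 100)

print(round(tested_1996, 1), round(tested_1997, 1), round(participation_only_1997, 1))
```

Under that assumption, the implied average among tested students is roughly 54 in 1996 and roughly 53 in 1997, and higher participation alone would predict a 1997 index of about 37.9, which is consistent with the point that the gain came from testing more students rather than from higher scores.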
Using the Spring 1997 data, we
performed another study comparing IMP students to traditionally taught students
in three special admissions schools and from several comprehensive high
schools. We obtained students’ 8th grade CTBS scores to control for
cohort effects. In the appendix (1997
SAT 9 Matched Sample Analysis), we present the
school-by-school results for all of the students who were tested. Then we did a
more carefully matched sample study across the magnet or special admission
schools comparing 96 IMP and 96 traditionally taught students, whose average
CTBS scores were nearly equal. We also compared 167 IMP and 167 traditionally
taught students in the four comprehensive high schools whose CTBS scores were
nearly equal. (In both cases, IMP students had slightly lower 8th
grade CTBS scores.) We reported the scores using the school district’s 0-6
level rubric.
Results: IMP students scored higher
than traditionally taught students whether they were in special admission high
schools or comprehensive high schools. This translates into a math performance
score gain of 5.4 points for the
special admission schools and a 5.2
performance point gain for the comprehensive IMP students as matched against a
comparable group of their peers.
In the Spring 1998 administration of the SAT-9 test,
principals continued to make a concerted effort to get more of their students
tested and to encourage them to take the test seriously. The percentage of
untested students dropped to 26.2%, and the performance index score for math
rose to 40.0, still 55 points under
the 95-point target. With the third administration of the SAT-9, two realities
emerged: 1) the use of this test as part of an accountability system was
unlikely to go away, and 2) there were ceiling limits on how much gain could be
achieved through improved testing conditions and incentives for students. As a
result, principals had increased incentives to examine changing their
curriculum and instructional methods in the three major subjects tested in
their schools. This gave rise to
practical questions about the extent to which IMP was appropriate for different
sub-groups of students: ESOL, Special
Education, Advanced Coursework, and Mentally Gifted.
By Spring 1998, there were 10 Philadelphia high schools
that had juniors who took the SAT-9 test. We analyzed SAT-9 results for each
school and then performed a series of aggregate analyses, which are presented
in the appendix as 1998
SAT 9 Charts 1A to 4C. In these 10 schools, we had a total IMP junior student test
population of 407 students and a non-IMP junior test population in those same
schools of 2200 students. We then compared these two groups across nine sub-scores
for the SAT-9 in reading, math and science.
We used the 0-6 rubric scale.
The results: IMP students outperformed non-IMP students in every sub-score, including reading and
science. The math Performance Index for IMP students was 11.8[1] points
higher than for non-IMP students. (See
Chart 1-A)
What if the IMP students were better students in 8th
grade? We then looked at only those students who had 8th grade CTBS
scores, which included 232 IMP and 1,018 non-IMP students. They were roughly
equal in their 8th grade CTBS scores: 778 versus 773. (See Chart
1-B.)
The results: IMP students outperformed non-IMP students in every
sub-score, including reading and science. The math Performance Index for IMP
students was 11.4 points higher than
for non-IMP students. (See Chart 1-C).
We then separated out all the students who had taken
advanced course work in math, all the special education students, the mentally
gifted students, and the students for whom English was not the primary language. We were left
with 355 IMP students and 1,672 non-IMP students.
The results: IMP students outperformed non-IMP students in every
sub-score, including reading and science. The math Performance Index for IMP
students was 16.8 points higher than
for non-IMP students. (See Chart 2-A.)
From this group, we further sorted
out all those students who had 8th grade CTBS scores. This reduced the number
to 194 IMP students versus 761 non-IMP students. The IMP students were somewhat
higher in 8th grade CTBS scores, 771 versus 758. (See Chart 2-B.) We then analyzed this sub-group’s SAT-9 scores.
The results: IMP students outperformed non-IMP students in every
sub-score, including reading and science. The math Performance Index for IMP
students was 15.6 points higher than
for non-IMP students. (See Chart 2-C.)
Next we took the original population
(Chart 1-A) and analyzed only those students from Central, Girls and Carver
high schools. We excluded the mentally gifted, ESOL, and those who had advanced
coursework. This left us with a total of 112 IMP students and 430 non-IMP
students across all three of these special admission schools.
From this special admission school
population we then took only those students who had 8th grade CTBS
scores. This reduced the sample to 56 IMP students and 178 non-IMP students who
scored 812 and 804 respectively on their CTBS tests. (See Chart 3-B.)
The results: IMP students outperformed non-IMP students in every
sub-score, including reading and science. The math Performance Index for IMP
students was 7.8 points higher than
for non-IMP students. (See Chart 3-A.)[3]
We did an analysis similar to the one
displayed in Charts 3-A, 3-B, and 3-C for the comprehensive school students.
This population, regardless of whether they were IMP or non-IMP students,
scored substantially below their special admission school peers. We aggregated
all the comprehensive students, but excluded the ESOL, special education and
advanced students. This left us with a sample of 243 IMP students and 1,212
non-IMP students.
The results: IMP students outperformed non-IMP students in every sub-score,
including reading and science. The math Performance Index for IMP students was 18.6 points higher than for non-IMP
students. (See Chart 4-A.)
We further narrowed the sample by
selecting only those students who had 8th grade CTBS scores. This gave
us a reduced sample of 138 IMP students vs. 568 non-IMP students. Their CTBS
scores were 755 and 742 respectively. (See Chart 4-B.)
The results: IMP students outperformed non-IMP students in every
sub-score, including reading and science. The math Performance Index for IMP
students was 17.4 points higher than
for non-IMP students. (See Chart 4-C.)
ANCOVA study of Girls High data: One methodological
problem with using matched samples or matched pairs to compare IMP to non-IMP
students is the potential criticism that,
when the researcher chooses between two equally good non-IMP “matches” for a particular IMP student, the
reader does not know whether the researcher biased the results by selecting the
weaker non-IMP match.
The advantage of doing an ANCOVA is that we can use all of
the data even if the IMP and non-IMP groups are not perfectly matched in either
their 8th grade test performance or the respective sample
sizes. In an ANCOVA, we used
essentially two regression lines. The first plotted each SAT-9 score (using
normal curve equivalents, which ranged from 0-99) as a function of each
student's 8th grade CTBS score.
This gave us a “line of best fit” for the non-IMP students. If there
were no predictive power in knowing a student’s CTBS score, the line would be
flat, i.e. horizontal. How closely the points cluster about the line determines
the strength of the linear relationship.
We also did a line of best fit for the IMP students at Girls High.
If the IMP students’ SAT-9 scores
were really better than the non-IMP students (after controlling for 8th
grade scores) then the IMP line of best fit would lie above the non-IMP
line. Moreover, by comparing the
“slopes” of each line one could tell if there were any trends between higher
and lower initially achieving students. For example, if the IMP line were above
the non-IMP line and if the distance between the two lines progressively
increased as the 8th grade scores increased, it would indicate that
the higher scoring students benefited the most from taking IMP.
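The two-line comparison described above can be sketched as follows. This is a simplified illustration, not the analysis in the appendix: it only fits an ordinary least-squares line to each group and compares them, whereas a full ANCOVA would also test whether the differences are statistically significant. All data points are invented, and `fit_line` and `predict` are our own helper names.

```python
def fit_line(points):
    """Ordinary least-squares line of best fit; returns (slope, intercept)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(line, x):
    """SAT-9 score the fitted line predicts for an 8th grade CTBS score x."""
    slope, intercept = line
    return slope * x + intercept


# Hypothetical (8th grade CTBS score, 11th grade SAT-9 NCE) pairs,
# invented for illustration only.
non_imp = [(740, 40), (760, 46), (780, 52), (800, 58), (820, 64)]
imp = [(740, 44), (760, 52), (780, 60), (800, 68), (820, 76)]

non_imp_line = fit_line(non_imp)
imp_line = fit_line(imp)

# Vertical distance between the two lines at a low and a high CTBS score.
gap_low = predict(imp_line, 740) - predict(non_imp_line, 740)
gap_high = predict(imp_line, 820) - predict(non_imp_line, 820)

print(round(gap_low, 1), round(gap_high, 1))
```

In this made-up data the IMP line lies above the non-IMP line everywhere and has the steeper slope, so the gap widens as the 8th grade score rises, which is the pattern the text describes as higher scoring students benefiting the most.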
Results: Girls High IMP 11th grade
students significantly outperformed their non-IMP counterparts on the SAT-9.
Furthermore, there was a trend such that the higher achieving IMP students
tended to perform even better. (See 1998
Girls' Ancova Study.)
Methodological Note: Because the test items of the CTBS and the SAT-9 are not aligned, our
preference would be to use students’ 8th grade Stanford-9 scores as
baseline data and then compare their 11th grade Stanford-9 scores.
The earliest we could have done this analysis was with the 1995-96 cohort of 8th
graders who would have taken the 8th grade version of the SAT-9 in
April 1996 (its first year in Philadelphia) and the 11th grade
version three years later in April 1999.
As of the date of this report, we are
conducting such an analysis, and we hope to present the results in PART II of
this research report.
6. Ivy League University Math Exit Test (an undergraduate degree requirement). For college bound
students, the questions we have heard, beyond college admissions issues, center
around how well students will be prepared for higher-education coursework. Will high school students who take programs
like IMP suffer once they get into college because of IMP? Will they take
disproportionately more remedial math courses? Will their choices for technical
majors be restricted? How will they fare in a traditional college math course?
What if they are not permitted to use graphics calculators? Will their GPAs suffer as a result?
While these questions are legitimate
concerns of parents and educators, they are asked against an existing backdrop
of statistics. We know there has already existed to varying degrees the need
for math remediation in four-year and two-year colleges. In part this is due to
more students aspiring to college and to higher requirements for math
achievement. Even before programs such as IMP, there was a chronic shortage
of American college students pursuing technical majors, as well as high college
dropout rates. Thus, a key research
problem is correct attribution. For
example, one Philadelphia math department head criticized IMP for supposedly
producing students who needed remedial math at a nationally known Philadelphia
university where he taught as an adjunct calculus instructor. But when he was
asked to distribute a survey to his 200 incoming freshmen, it turned out that
none of these students had been IMP students, nor had any participated in any
other NSF curriculum math program.
Undoubtedly there will be IMP high school students who
later need to take remedial college math. Some students are weak regardless of
their math program in high school or have chronically poor study habits. Some
IMP students will shy away from technical majors, will not fare well in
traditional college courses, will have low GPA scores and may even drop
out. In short, there will be students
who fare no better than the status quo.
The real question is whether IMP students taken as a group will
have a greater tendency to do better or worse in college than their
non-IMP peers, all things being equal.
To get a quantitative handle on the
issue of college preparedness, we invited Norman Webb, a nationally renowned
educational researcher from the University of Wisconsin, to conduct a
study. Dr. Webb selected ten questions
from a quantitative reasoning test given to undergraduate college students at
an Ivy League university. Among the topics covered by the questions were
statistical reasoning and interpretations of graphs. We then administered
these to 150 students from Central High School in June 1996 who had just
completed their junior year. Of these 150 students, 91 had been enrolled in IMP
for three years. (This was IMP’s first student cohort.) The other 59 students were enrolled in a traditional
Algebra II class. Each group’s 8th
grade CTBS scores were compared and found to be nearly equal.
Results: The IMP students
got a little more than 50% of the questions correct whereas the non-IMP
students got less than 25% of the questions correct. This difference in performance was statistically significant at
the p < .0001 level.
Dr. Webb and his colleague Maritza Dowling performed a more
in-depth and precise matched-group analysis with an accompanying report, which is
included in the appendix. A copy of the
test used is also included in the appendix. (See College
Quantitative Reasoning Study.)
Conclusions
The Interactive Mathematics Program
curriculum materials, however well crafted, will not by themselves yield
increased student achievement. It is not a teacher-proof curriculum, and so
this program is not about buying new textbooks. It is rather a profoundly
intense, multi-year professional development program exploring rich, varied and
meaningful mathematical content and subtle yet profound pedagogical techniques
to promote deeper student understanding. When implemented systemically, IMP can
significantly enhance student understanding of and competence in mathematics.
IMP has also been demonstrated to have collateral benefits in reading and
science.
[1] This is calculated by taking the difference in the “Math Composite scores” (3.09 vs. 2.5) and multiplying it by 20.
[2] The aggregate IMP Reading Open-Ended subscore is lower than the non-IMP subscore. However, in each of the three schools taken separately, IMP students had, in fact, higher open-ended reading subscores.
[3] Ibid.