Philadelphia IMP Research Summary

Early Philadelphia Data: 1994, 1995, and 1996.


1. Student Attitudinal Surveys, 1994:  Because IMP was a new program that was completely voluntary for both teachers and students, the first thing we wanted to know was whether the 9th grade students who chose to take IMP would want to continue with the program. If no students wanted to continue, we could not run the program.  To gauge student interest, we developed a short questionnaire on content and pedagogy and gave it to students at the end of their first year, in May 1994.  We also wanted to know whether there were any large differences between different types of schools. We had started with three high-achieving “special admission schools,” one vo-tech, and two neighborhood comprehensive schools.

The result: students overwhelmingly preferred IMP to algebra classes and indicated they would continue with the course.  (See the Ninth Grade Student Attitudes report in the Appendix.)


2. Passing Rate and Attendance Data: Principals in Philadelphia’s twenty-two comprehensive high schools face the problems of low student passing rates and a year-by-year loss of students who drop out of school.  In most of these high schools, the graduating class is less than half of what it was in ninth grade.  As IMP expanded into these comprehensive high schools, the first question was whether IMP students had higher passing and attendance rates than did students using pre-standards materials.

The results: By and large, IMP students had consistently higher passing and attendance rates than their peers in the same schools. This held in all of the comprehensive schools, not only in math but also in English, social studies and science.  (See Passing Rate Comparisons 1995, 1996, 1997.)

Methodological note: To guard against the claim that IMP teachers were lenient graders who inflated the IMP passing rates, we also looked at other subjects (English, science and social studies) and found higher passing rates in those subjects as well. We attribute these results to the writing and problem-solving emphasis in IMP, which seemed to have a secondary effect on other subjects.

However, we did not have 8th grade test scores with which to compare the two groups and control for cohort effects.  Thus, another possible interpretation of the higher IMP results was that we had “better” kids to start with. This was a valid concern, which we would systematically address in later studies. But for now, we were encouraged by these early results because none of Philadelphia’s honors students were allowed to take IMP. Also, IMP students within a school were usually randomly assigned or self-selected.


3. PSAT comparisons:  IMP began in three special admission schools: Central High School, the second-oldest high school in America and formerly an all-boys school; Philadelphia High School for Girls (still all girls); and Carver High School of Engineering & Science. These schools draw students from the entire 8th grade student population in Philadelphia and, as a consequence, attract the top-scoring students. For the principals in these schools, the main issues are SAT and Advanced Placement scores rather than passing rates. As one principal remarked, “I’ll consider IMP successful if it can raise SAT (Scholastic Aptitude Test) scores 30 points by the junior year.”

The SAT was originally designed in the 1930s to provide objective criteria on which to base college admissions. As such, its purpose was to measure a student’s aptitude separate and apart from any previous instructional influences. It was expressly designed not to measure achievement per se.  The math and verbal sections of the SAT, therefore, function more like an intelligence test than as indicators of achievement in mathematics or literature and composition. Certainly the Interactive Mathematics Program was designed to prepare students for deeper mathematical understanding and higher achievement, not for higher SAT scores per se. Nonetheless, the misuse of the SAT to compare the effectiveness of schools and curricular programs is a political reality. We therefore had to collect comparative data.

In October 1994, we began to collect PSAT data from our first Philadelphia IMP cohort of students at Central, Girls and Carver. These students were sophomores who had completed one year of IMP. They had not been the highest achieving students in 8th grade; those students were not given the option of enrolling in IMP.

Results: IMP sophomore students outperformed traditionally taught students in all three special admission schools in both the math and verbal sections of the PSAT.

(See PSAT Score Comparisons 1994, 1995, 1996 in the Appendix)


In October 1995, the original cohort of IMP students was again analyzed for their 11th grade PSAT scores. Strawberry Mansion, an inner city comprehensive high school, was also included.  

 Results: IMP juniors continued to outperform traditionally taught students in the special admission schools and one comprehensive school in both the math and verbal sections of the PSAT.


Methodological Notes: This analysis does not control for cohort effects, that is, 8th grade achievement levels, nor for the effect on PSAT scores of taking more math courses, particularly algebra in 8th grade. However, for reasons previously outlined, these biases would tend to favor the traditional students, many of whom, unlike the IMP students, had taken algebra in the 8th grade and gone on to take geometry in 9th grade and algebra II in 11th grade. These students were essentially one year ahead of the IMP students.


Later Philadelphia Data: 1996, 1997, 1998.

4. Stanford Achievement Test – 9th Edition  (SAT-9)

In August 1994, David Hornbeck began his six-year tenure as the Superintendent of the School District of Philadelphia. With backing from the school board, he initiated a 10-point “Children Achieving” agenda. A key part of his agenda was greater accountability for student performance. A criterion-referenced version of the Stanford Achievement Test, 9th Edition (SAT-9) was developed to be the centerpiece of a new system of accountability. Three subjects were tested: reading, math, and science.  The first “baseline” SAT-9 test was administered in April 1996.

The results of the SAT-9 test for each subject area were separately reported to the district on a 0-99 scale as normal curve equivalents. The district then converted these scores into a seven-level rubric. Level 0 meant untested, and Level 6 was considered “Advanced.” The School District of Philadelphia then assigned a numerical weight to each level: Advanced = 1.2, Proficient = 1.0, Basic = 0.8, Below Basic III = 0.6, Below Basic II = 0.4, Below Basic I = 0.2 and Non Tested = 0.  These weights were then multiplied by the percentage of a school’s students who scored in each of these seven levels for a particular subject, and the resulting point values for all levels were added together. This sum was the performance index score for that subject. (Equivalently, the students’ scores reported on the 0 through 6 scale are multiplied by 20 and then averaged to obtain the performance index.)  For example, if 100% of the students took the test and all of them scored at the advanced level, Level 6, the performance index would be 6 x 20, or 120.  On the other hand, if 100% of the students did not take the test, the performance index would be 100% x 0, or 0.  In 1996, the actual math performance index score for all Philadelphia public high schools was 27.2.  See the Appendix for the Stanford 9 Test and Performance Index.
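The weighting arithmetic described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the district's actual procedure: the function names are hypothetical, the ten-student school is made-up data, and the assignment of rubric levels 1 through 5 to Below Basic I through Proficient is inferred from the listed weights and the multiply-by-20 equivalence.

```python
import math

# Weight per rubric level, as listed in the text (level 0 = not tested).
# Level numbers 1-5 are inferred from the "multiply by 20" equivalence.
WEIGHTS = {6: 1.2, 5: 1.0, 4: 0.8, 3: 0.6, 2: 0.4, 1: 0.2, 0: 0.0}

def index_from_percentages(percent_by_level):
    """Weighted sum of the percentage of students at each level."""
    if not math.isclose(sum(percent_by_level.values()), 100.0):
        raise ValueError("percentages must sum to 100")
    return sum(WEIGHTS[level] * pct for level, pct in percent_by_level.items())

def index_from_levels(levels):
    """Equivalent form: the average student level multiplied by 20."""
    return 20 * sum(levels) / len(levels)

# The two extreme cases given in the text.
all_advanced = index_from_percentages({6: 100.0})  # 120, up to rounding
all_untested = index_from_percentages({0: 100.0})  # 0

# A made-up school of ten students (hypothetical, not from the report):
# half untested, the rest spread over levels 2 to 5.
levels = [0, 0, 0, 0, 0, 2, 3, 3, 4, 5]
percents = {lvl: 100.0 * levels.count(lvl) / len(levels) for lvl in set(levels)}
school_index = index_from_percentages(percents)
```

Both forms give the same index for the made-up school, and because untested students count as zero, a school with half its students untested cannot score above 60 even if every tested student is Advanced.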

The same calculation was also performed for reading and science. Two additional indices were also developed: one for student and staff attendance, and another for “student persistence.” These five index scores were added together and averaged. The result was a school’s “Performance Index.” The goal was to achieve a Performance Index of 95 in twelve years. A team of central office personnel reviewed principals whose schools did not show progress toward meeting this twelve-year goal. Within a few years, the SAT-9 became a very high-stakes test for building-level administrators. Initially, however, many teachers and students did not invest much time in the administration of the test. Overall, in 1996, half of the students did not take the test. The exceptions were the special admission schools, such as Central, Girls and Carver, whose students tend to take all tests more seriously. In the 1996 SAT-9 testing, these schools had a relatively low number of untested students.

Using the March 1996 SAT-9 test results for Central and Girls High, we did a matched-sample study comparing IMP juniors to a similar group of pre-standards, traditionally taught students from these same schools. We matched students by gender, by whether they came from public or private elementary schools, and by their 8th grade national percentiles in the math and verbal sections of the Comprehensive Test of Basic Skills (CTBS). Our sample sizes were 83 IMP students at Central and 55 IMP students at Girls, matched against 83 and 55 traditionally taught students respectively. We analyzed the raw scores on each item of the math, reading and science sections of the SAT-9.
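The report does not say how individual matches were chosen, so the following Python sketch shows only one possible procedure: exact matching on gender and elementary school type, then the closest unused comparison student by combined CTBS percentile distance, with ties broken by list order. The function name, record fields, and student records are all hypothetical.

```python
def match_students(imp, non_imp):
    """Pair each IMP student with the closest unused non-IMP student.

    Candidates must agree exactly on gender and elementary school type;
    closeness is the summed absolute difference in the 8th grade CTBS
    math and verbal percentiles. Ties go to the earlier list position.
    """
    used = set()
    pairs = []
    for student in imp:
        candidates = [
            (abs(student["math"] - c["math"])
             + abs(student["verbal"] - c["verbal"]), i)
            for i, c in enumerate(non_imp)
            if i not in used
            and c["gender"] == student["gender"]
            and c["elementary"] == student["elementary"]
        ]
        if not candidates:
            continue  # no eligible match; this student is left unpaired
        _, best = min(candidates)
        used.add(best)
        pairs.append((student["id"], non_imp[best]["id"]))
    return pairs

# Made-up records for illustration only.
imp = [
    {"id": "I1", "gender": "F", "elementary": "public", "math": 80, "verbal": 75},
    {"id": "I2", "gender": "F", "elementary": "private", "math": 90, "verbal": 85},
]
non_imp = [
    {"id": "N1", "gender": "F", "elementary": "public", "math": 60, "verbal": 70},
    {"id": "N2", "gender": "F", "elementary": "public", "math": 82, "verbal": 74},
    {"id": "N3", "gender": "F", "elementary": "private", "math": 88, "verbal": 86},
    {"id": "N4", "gender": "M", "elementary": "private", "math": 90, "verbal": 85},
]
pairs = match_students(imp, non_imp)
```

A fixed tie-breaking rule like this one removes any discretion in choosing between equally good matches. A greedy pass is simple but does not guarantee the best overall pairing.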

Results: At Central High School, IMP students outperformed traditionally taught students in 21 out of 28 reported SAT-9 multiple-choice categories. Of these, the gains in Probability and Functions were statistically significant. There were 3 categories that were ties. There were 4 categories where the traditionally taught students' scores were higher than the IMP students' scores. Of these, none was statistically significant.


Results: At Girls High School, IMP students outperformed traditionally taught students in 12 out of the 17 math-related sub-scores, tied on two and scored slightly lower on three. IMP students did better on all the cumulative scores and open-ended assessments.  Moreover, only 29% of the IMP students scored “below basic,” compared with 43.6% of the traditionally taught students.


(See Appendix for Central and Girls' SAT 9 Matched Sample Analysis)


In the Spring 1997 administration of the SAT-9 test, the percentage of untested students dropped to 30.2%, from 49.9% in 1996.  Accordingly, the math performance index score rose from 27.2 in 1996 to 36.8 in 1997. In short, many principals could realize their targeted performance gains for the first two-year reporting cycle merely by getting more students to be tested.
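A rough check of this point, using only the figures quoted above: because untested students contribute zero, the index equals the tested share times the average index among tested students. The Python sketch below assumes (this assumption is ours, not the report's) that newly tested students score, on average, like the students who were already being tested. The function name is hypothetical.

```python
def projected_index(old_index, old_untested_pct, new_untested_pct):
    """Project the index if only the untested share changes.

    Assumes the average index among tested students stays the same,
    which is an illustrative assumption rather than a reported fact.
    """
    tested_mean = old_index / (1 - old_untested_pct / 100)
    return tested_mean * (1 - new_untested_pct / 100)

# 1996: index 27.2 with 49.9% untested, so tested students averaged about 54.3.
tested_mean_1996 = 27.2 / (1 - 49.9 / 100)

# 1997: 30.2% untested. Projection under the assumption above: about 37.9.
projected_1997 = projected_index(27.2, 49.9, 30.2)
```

Under that assumption the projected 1997 math index is about 37.9, close to the reported 36.8, which is consistent with the rise being driven largely by the increase in the share of students tested.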

Using the Spring 1997 data, we performed another study comparing IMP students to traditionally taught students in three special admission schools and several comprehensive high schools. We obtained students’ 8th grade CTBS scores to control for cohort effects. In the appendix (1997 SAT 9 Matched Sample Analysis), we present the school-by-school results for all of the students who were tested. We then did a more carefully matched sample study across the magnet, or special admission, schools, comparing 96 IMP and 96 traditionally taught students whose average CTBS scores were nearly equal. We also compared 167 IMP and 167 traditionally taught students in the four comprehensive high schools whose CTBS scores were nearly equal. (In both cases, IMP students had slightly lower 8th grade CTBS scores.) We reported the scores using the school district’s 0-6 level rubric.

Results: IMP students scored higher than traditionally taught students whether they were in special admission high schools or comprehensive high schools. Measured against a comparable group of their peers, this translates into a math performance score gain of 5.4 points for IMP students in the special admission schools and 5.2 points for IMP students in the comprehensive schools.


In the Spring 1998 administration of the SAT-9 test, principals continued to make a concerted effort to get more of their students tested and to encourage them to take the test seriously. The percentage of untested students dropped to 26.2%, and the performance index score for math rose to 40.0, still 55 points under the 95-point target. With the third administration of the SAT-9, two realities emerged: 1) the use of this test as part of an accountability system was unlikely to go away, and 2) there were limits on how much gain could be achieved through improved testing conditions and incentives for students. As a result, principals had increased incentives to examine changing their curriculum and instructional methods in the three major subjects tested in their schools.  This gave rise to practical questions about the extent to which IMP was appropriate for different sub-groups of students:  ESOL, Special Education, Advanced Coursework, and Mentally Gifted.

By Spring 1998, there were 10 Philadelphia high schools that had IMP juniors who took the SAT-9 test. We analyzed SAT-9 results for each school and then performed a series of aggregate analyses, which are presented in the appendix as 1998 SAT 9 Charts 1A to 4C.  In these 10 schools, we had a total IMP junior test population of 407 students and a non-IMP junior test population in those same schools of 2,200 students. We then compared these two groups across nine sub-scores for the SAT-9 in reading, math and science.  We used the 0-6 rubric scale.

The results: IMP students outperformed non-IMP students in every sub-score, including reading and science. The math Performance Index for IMP students was 11.8[1] points higher than for non-IMP students. (See Chart 1-A)
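The 11.8-point figure can be reproduced from the Math Composite scores quoted in footnote [1] (3.09 for IMP, 2.5 for non-IMP on the 0-6 rubric). A minimal Python check, with illustrative variable names:

```python
# Math Composite scores on the 0-6 rubric, as quoted in footnote [1].
imp_composite = 3.09
non_imp_composite = 2.5

# One rubric level is worth 20 performance index points.
index_gap = round((imp_composite - non_imp_composite) * 20, 1)
```

Here `index_gap` evaluates to 11.8, so the gap corresponds to a little over half a rubric level.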


What if the IMP students were better students in 8th grade? We then looked at only those students who had 8th grade CTBS scores, which included 232 IMP and 1,018 non-IMP students. They were roughly equal in their 8th grade CTBS scores: 778 versus 773. (See Chart 1-B.)


The results: IMP students outperformed non-IMP students in every sub-score, including reading and science. The math Performance Index for IMP students was 11.4 points higher than for non-IMP students. (See Chart 1-C).


We then separated out all the students who had taken advanced course work in math, all the special education students, the mentally gifted students, and the students for whom English was not the primary language. We were left with 355 IMP students and 1,672 non-IMP students.

The results: IMP students outperformed non-IMP students in every sub-score, including reading and science. The math Performance Index for IMP students was 16.8 points higher than for non-IMP students. (See Chart 2-A.)


            From this group, we further sorted out all those students who had 8th grade CTBS scores. This reduced the number to 194 IMP students versus 761 non-IMP students. The IMP students were somewhat higher in 8th grade CTBS scores, 771 versus 758.  (See Chart 2-B.)  We then analyzed this sub-group’s SAT-9 scores.

The results: IMP students outperformed non-IMP students in every sub-score, including reading and science. The math Performance Index for IMP students was 15.6 points higher than for non-IMP students. (See Chart 2-C.)


            Next we took the original population (Chart 1-A) and analyzed only those students from Central, Girls and Carver high schools. We excluded the mentally gifted, ESOL, and those who had advanced coursework. This left us with a total of 112 IMP students and 430 non-IMP students across all three of these special admission schools.

The results: IMP students outperformed non-IMP students in every sub-score, including reading and science. The math Performance Index for IMP students was 7.8 points higher than for non-IMP students. (See Chart 3-A.)[2]


            From this special admission school population we then took only those students who had 8th grade CTBS scores. This reduced the sample to 56 IMP students and 178 non-IMP students who scored 812 and 804 respectively on their CTBS tests. (See Chart 3-B.)

The results: IMP students outperformed non-IMP students in every sub-score, including reading and science. The math Performance Index for IMP students was 7.8 points higher than for non-IMP students. (See Chart 3-A.)[3]


For the comprehensive school students, we did an analysis similar to the one displayed in Charts 3-A, 3-B, and 3-C. This population, regardless of whether they were IMP or non-IMP students, scored substantially below their special admission school peers. We aggregated all the comprehensive students, but excluded the ESOL, special education and advanced students. This left us with a sample of 243 IMP students and 1,212 non-IMP students.

The results: IMP students outperformed non-IMP students in every sub-score, including reading and science. The math Performance Index for IMP students was 18.6 points higher than for non-IMP students. (See Chart 4-A.)


            We further narrowed the sample by selecting only those students who had 8th grade CTBS scores. This gave us a reduced sample of 138 IMP students vs. 568 non-IMP students. Their CTBS scores were 755 and 742 respectively. (See Chart 4-B.)

The results: IMP students outperformed non-IMP students in every sub-score, including reading and science. The math Performance Index for IMP students was 17.4 points higher than for non-IMP students. (See Chart 4-C.)



5. ANCOVA study of Girls High data: One methodological problem with using matched samples or matched pairs to compare IMP to non-IMP students is the potential criticism that, when there are two equally good non-IMP “matches” for a particular IMP student, the reader does not know whether the researcher biased the results by selecting the weaker non-IMP match.

The advantage of doing an ANCOVA (analysis of covariance) is that we can use all of the data even if the IMP and non-IMP groups are not perfectly matched in either their 8th grade test performance or their respective sample sizes.  In our ANCOVA, we used essentially two regression lines. The first plotted each non-IMP student's SAT-9 score (using normal curve equivalents, which ranged from 0-99) as a function of that student's 8th grade CTBS score.  This gave us a “line of best fit” for the non-IMP students. If there were no predictive power in knowing a student’s CTBS score, the line would be flat, i.e., horizontal. How closely the points cluster about the line indicates the strength of the linear relationship.  We also fitted a line of best fit for the IMP students at Girls High.

If the IMP students’ SAT-9 scores were really better than the non-IMP students’ scores (after controlling for 8th grade scores), then the IMP line of best fit would lie above the non-IMP line.  Moreover, by comparing the slopes of the two lines, one could tell whether there were any trends between higher and lower initially achieving students. For example, if the IMP line were above the non-IMP line and the distance between the two lines progressively increased as the 8th grade scores increased, it would indicate that the higher scoring students benefited the most from taking IMP.
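The two-line comparison described above can be illustrated with a small Python sketch. It fits an ordinary least-squares line to each group and compares the gap between the lines at a low and a high CTBS score. The data are made up for illustration and are not the Girls High data, the function names are hypothetical, and the sketch omits the significance testing that a full ANCOVA would include.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict(line, x):
    """Predicted SAT-9 NCE score at 8th grade CTBS score x."""
    slope, intercept = line
    return slope * x + intercept

# Hypothetical (CTBS score, SAT-9 NCE) data for each group.
ctbs = [740, 760, 780, 800]
non_imp_nce = [41, 44, 49, 56]
imp_nce = [45, 49, 55, 63]

non_imp_line = fit_line(ctbs, non_imp_nce)  # slope 0.25
imp_line = fit_line(ctbs, imp_nce)          # slope 0.30

# Vertical distance between the two lines at the low and high ends.
gap_low = predict(imp_line, 740) - predict(non_imp_line, 740)    # about 4
gap_high = predict(imp_line, 800) - predict(non_imp_line, 800)   # about 7
```

In this made-up example the IMP line lies above the non-IMP line across the whole range and has the steeper slope, so the gap widens as the 8th grade score rises, which is the pattern the text describes as higher scoring students benefiting most.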

Results: Girls High’s IMP 11th grade students significantly outperformed their non-IMP counterparts on the SAT-9. Furthermore, there was a trend such that the higher achieving IMP students tended to do even better. (See 1998 Girls' ANCOVA Study.)


Methodological Note: Because the test items of the CTBS and the SAT-9 are not aligned, our preference would be to use students’ 8th grade Stanford-9 scores as baseline data and then compare their 11th grade Stanford-9 scores. The earliest we could have done this analysis was with the 1995-96 cohort of 8th graders, who would have taken the 8th grade version of the SAT-9 in April 1996 (its first year in Philadelphia) and the 11th grade version three years later in April 1999.  As of the date of this report, we are conducting such an analysis, and we hope to present the results in PART II of this research report.


6. Ivy League University Math Exit Test (an undergraduate degree requirement). For college-bound students, the questions we have heard, beyond college admissions issues, center on how well students will be prepared for higher-education coursework.  Will high school students who take programs like IMP suffer once they get into college because of IMP? Will they take disproportionately more remedial math courses? Will their choices for technical majors be restricted? How will they fare in a traditional college math course? What if they are not permitted to use graphics calculators?  Will their GPAs suffer as a result?

While these questions are legitimate concerns of parents and educators, they are asked against an existing backdrop of statistics. We know that a need for math remediation has long existed, to varying degrees, in four-year and two-year colleges. In part this is due to more students aspiring to college and to higher requirements for math achievement. Well before programs such as IMP, there was a chronic shortage of American college students pursuing technical majors, as well as high college dropout rates.  Thus, a key research problem is correct attribution.   For example, one Philadelphia math department head criticized IMP for supposedly producing students who needed remedial math at a nationally known Philadelphia university where he taught as an adjunct calculus instructor. But when he was asked to distribute a survey to his 200 incoming freshmen, it turned out that none of these students had been IMP students, nor had any participated in any other NSF curriculum math program.

Undoubtedly there will be IMP high school students who later need to take remedial college math. Some students are weak regardless of their math program in high school or have chronically poor study habits. Some IMP students will shy away from technical majors, will not fare well in traditional college courses, will have low GPAs and may even drop out.  In short, there will be students who fare no better than the status quo.  The real question is whether IMP students taken as a group will tend to do better or worse in college than their non-IMP peers, all other things being equal.

To get a quantitative handle on the issue of college preparedness, we invited Norman Webb, a nationally renowned educational researcher from the University of Wisconsin, to conduct a study.   Dr. Webb selected ten questions from a quantitative reasoning test given to undergraduate college students at an Ivy League university. Among the topics covered by the questions were statistical reasoning and interpretation of graphs. We then administered these questions in June 1996 to 150 students from Central High School who had just completed their junior year. Of these 150 students, 91 had been enrolled in IMP for three years. (This was IMP’s first student cohort.)  The other 59 students were enrolled in a traditional Algebra II class.  Each group’s 8th grade CTBS scores were compared and found to be nearly equal.

Results: The IMP students got a little more than 50% of the questions correct, whereas the non-IMP students got less than 25% of the questions correct.  This difference in performance was statistically significant at the p < .0001 level.


Dr. Webb and his colleague Maritza Dowling performed a more in-depth and precise matched-group analysis with an accompanying report, which is included in the appendix.  A copy of the test used is also included in the appendix. (See College Quantitative Reasoning Study.)



The Interactive Mathematics Program curriculum materials, however well crafted, will not by themselves yield increased student achievement. It is not a teacher-proof curriculum, and so this program is not about buying new textbooks. It is rather a profoundly intense, multi-year professional development program exploring rich, varied and meaningful mathematical content and subtle yet profound pedagogical techniques to promote deeper student understanding. When implemented systemically, IMP can significantly enhance student understanding of and competence in mathematics. IMP has also been demonstrated to have collateral benefits in reading and science.


[1] This is calculated by taking the difference in the “Math Composite scores” (3.09 vs. 2.5) and multiplying it by 20.

[2]  The aggregate IMP Reading Open-Ended subscore is lower than the non-IMP subscore. However, in each of the three schools taken separately, IMP students in fact had higher open-ended reading subscores.

