Why educational measurement is difficult

These possibly significant factors are one reason why the model presented in Chapter 4 includes adjunct usage as an explicit factor. A survey of the chief financial officers (CFOs) of colleges by The Chronicle of Higher Education revealed their view that the most effective cost-cutting or revenue-raising strategies are to raise teaching loads and to increase tuition.

These initiatives resulted in the release of performance data identifying faculty teaching loads versus the cost to keep faculty members employed. The basic idea was to examine the number of students taught by an individual faculty member relative to the cost borne by the university in faculty salaries, benefits, and overhead. In the current atmosphere of accountability for public funds, this kind of measure of faculty performance will be used synonymously with faculty productivity, even though William Powers, president of the University of Texas at Austin, cautioned against equating the two.

Faculty quality is often measured by grades and student evaluations. However, these outcomes can be heavily influenced by external factors, which makes it difficult for institutions to ascertain the contribution of faculty quality to student success. In a controlled study by Carrell and West, U.S. Air Force Academy students were randomly assigned to either a permanent or a part-time instructor. Student grades over a course sequence were analyzed to evaluate teacher quality.

The study found that students taught by part-time instructors (more specifically, less experienced instructors who did not possess terminal degrees) received better grades in the lower-level course in the sequence, Calculus I. The National Research Council report on measuring the quality of research-doctoral programs also outlined assessment methods for the quality of faculty involved in Ph.D. programs. Even though the NRC report addressed a complicated issue, it emphasized measuring faculty quality as it pertains to research-doctoral programs in four-year research universities.

The absence of guidelines for measuring the quality of instructional faculty in four-year universities and community colleges was attributed to the trend of relying on wages earned as a proxy for faculty quality.

We have made the point that higher education produces multiple outputs. Even for those concerned primarily with the instructional component, looking narrowly at the production of four-year degrees may be inadequate because degrees are far from homogeneous.

Even with full information, weights that would be applied to these characteristics would still entail subjective assessments. We emphasize, explicitly and unapologetically, that adjusting degrees or otherwise defined units of higher education output by a quantitative quality index is not feasible at the present time. However, it is possible to begin dealing with the problem through classification of institutions by type and mission and then, as described in Chapter 4, by seeking to assure that quality within each segment is being regularly assessed and at least roughly maintained.

When considering a productivity metric focusing on instruction, objectively measurable outputs such as credit hours earned and number of degrees granted represent the logical starting point; however, the quality problem arises almost immediately since these can be expected to differ across courses, programs, and institutions.
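
To see why the weights are the crux of the problem, here is a minimal sketch of a naive quality-adjusted output index built from hypothetical credit-hour and degree counts, with graduation rate standing in as a crude quality proxy. The institutions, counts, and both weighting schemes are invented for the illustration; nothing here is proposed as the committee's method.

```python
# Illustrative only: a naive quality-adjusted output index.
# Institutions, output counts, and weights are hypothetical; the point is that
# the ranking depends on subjective weights, not that these weights are right.
institutions = {
    # name: (credit hours, bachelor's degrees, graduation rate as a crude quality proxy)
    "Institution A": (300_000, 5_000, 0.55),
    "Institution B": (180_000, 4_200, 0.80),
    "Institution C": (240_000, 3_800, 0.70),
}

def output_index(credits, degrees, quality, credit_weight, degree_weight, quality_weight):
    """Weighted combination of raw outputs, scaled by a quality adjustment."""
    raw = credit_weight * credits + degree_weight * degrees
    return raw * (1 + quality_weight * (quality - 0.5))

for label, (cw, dw, qw) in [("scheme 1", (1.0, 30.0, 0.5)), ("scheme 2", (1.0, 60.0, 1.5))]:
    ranked = sorted(
        institutions,
        key=lambda name: output_index(*institutions[name], cw, dw, qw),
        reverse=True,
    )
    print(f"Ranking under {label}: {ranked}")
```

Both weighting schemes are defensible on their face, yet they order the institutions differently below the top spot, which is precisely the subjective-assessment problem described above.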

While universities often use credit hours as a measure of the importance and difficulty of each course, these quality adjustments are incomplete because they do not reflect the full value of the course to students and to society. The University of Toronto has a differential tuition price policy that makes the expected relative public-to-private benefit of the degree one of the criteria for determining the level of public subsidy relative to the tuition charged.

From an economic perspective, it makes sense to base tuition on the expected value of the major and on its costs. One practical problem is that lower-income students may be excluded from majors with high costs but high returns. In addition, the needs of states may call for training students in areas that are high cost but provide limited economic return to students.

Another problem is that the policy could lead to the production of the wrong kinds of degrees over different time frames (for example, swings in demand for nurses). On the other hand, if unlimited cross-subsidization is not endorsed or feasible, and if cross-subsidies have gone beyond what is seen as a reasonable level, then students may be required to bear a larger portion of costs. Market-oriented assessments of educational output, with attention to how salary effects vary by area of study and by institutional quality, have been explored in the economics literature.

Despite work in this area, many tough issues remain even if the goal is to estimate only the economic returns to education. How can wage data best be used for such calculations, and what is most important: first job, salary five years out, or discounted lifetime earnings?
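
To make the third option concrete, the sketch below computes a present discounted value of lifetime earnings for two hypothetical majors. The starting salaries, growth rates, discount rate, and 40-year horizon are assumptions chosen for the example, not estimates from the literature discussed here.

```python
# Illustrative only: present discounted value of a projected earnings stream.
# The earnings profiles, growth rates, and discount rate below are assumptions
# chosen for the example, not estimates from any study cited in this text.

def discounted_lifetime_earnings(starting_salary, annual_growth, discount_rate, years):
    """Sum projected earnings over a career, discounting each year to the present."""
    total = 0.0
    for t in range(years):
        earnings_t = starting_salary * (1 + annual_growth) ** t
        total += earnings_t / (1 + discount_rate) ** t
    return total

# Compare two hypothetical majors that differ in starting salary and salary growth.
major_a = discounted_lifetime_earnings(45_000, 0.03, 0.04, 40)
major_b = discounted_lifetime_earnings(60_000, 0.02, 0.04, 40)
print(f"Major A PDV: {major_a:,.0f}")
print(f"Major B PDV: {major_b:,.0f}")
```

The point of the exercise is that the answer depends on the discount rate, the horizon, and the assumed earnings paths; rankings based on first-job salary alone can differ from rankings based on lifetime earnings.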

Perhaps most importantly, student characteristics, demographic heterogeneity, accessibility and opportunities, and other factors affecting earnings must be controlled for in these kinds of economic studies. Full quality adjustment of the output measure remains a distant goal because much research is still needed to make headway on these issues.

The literature certainly offers evidence of the effects of these variables, but using precise coefficients in a productivity measure requires a higher level of confidence than can be gleaned from this research. Beyond measures of credits and degrees produced, and their associated wage effects, is the goal of measuring the value added of student learning.

That is, earning a baccalaureate degree without acquiring the knowledge, skills, and competencies required to function effectively in the labor market and in society is a hollow accomplishment. Indicators are thus needed of the quality of the degree, represented by, for example, the amount of learning that has taken place. Ignoring measures of learning outcomes or student engagement while, perhaps, emphasizing graduation rates may result in misleading conclusions about institutional performance and ill-informed policy prescriptions.

Is it acceptable for a school to have a high graduation rate but low engagement and outcomes scores? Or are individual and public interests both better served by institutions where students are academically challenged and demonstrate skills and competencies at a high level, even if fewer graduate?

Strong performances in the areas of engagement, achievement, and graduation are certainly not mutually exclusive, but each says something different about institutional performance and student development. At the same time, students bear a major responsibility for any gains derived from their postsecondary experience.

Motivation is also a nontrivial factor in accounting for post-college differences in income once institutional variables such as selectivity are controlled (Pascarella and Terenzini). The Collegiate Learning Assessment (CLA) is designed specifically to measure value added at the institutional level between the freshman and senior years.

Nonetheless, it is important to work through the logic of which kinds of measures are relevant to which kinds of questions. The above kinds of assessments show that even identical degrees may represent different quantities of education produced if, for example, one engineering graduate started having already completed Advanced Placement calculus and physics while another entered with a remedial math placement.

Modeling approaches have been developed to estimate time to degree and other potentially useful outcomes. (Useful discussions of the merits of assessment tests are provided in Carpenter and Bach and in Ewell.) These approaches take into account entering student ability, as represented by pre-college achievement scores (ACT, SAT) and prior academic performance, along with other student characteristics such as enrollment status (full- or part-time), transfer status, and financial need (Wellman). Popular proxies for institutional quality, such as rankings, are flawed for the purpose of estimating educational productivity.

The major limitation of most rankings, and especially that of U.S. News & World Report, is their heavy dependence on the characteristics of the students an institution enrolls. As an illustration of the limitations of most ranking systems, essentially one number, a measure of the academic ability of entering students, is enough to predict with considerable accuracy where an institution ranks in U.S. News.

This is not to say that selectivity is unrelated to college quality. Being in the company of highly able people has salutary direct effects on how students spend their time and what they talk about.

Hoxby has quantified the returns to education and shown that the setting of highly selective schools contributes to the undergraduate education of at least some subsets of students. More recently, Bowen, Chingos, and McPherson present evidence that institutional selectivity is strongly correlated with completion rates, controlling for differences in the quality and demographics of enrolled students as well as factors such as per-student educational expenditures.

The authors argue that students do best, in terms of completion rates, when they attend the most selective schools that will accept them, due in part to peer effects. At the same time, research shows that other factors are important to desired outcomes of college, including working collaboratively with peers. Longitudinal data from the National Study of Student Learning and cross-sectional results from the NSSE show that institutional selectivity is a weak indicator of student exposure to good practices in undergraduate education, such as whether faculty members clearly articulate course objectives, use relevant examples, identify key points, and provide class outlines (Kuh and Pascarella). These kinds of practices and experiences are arguably much more important to college quality than enrolled student ability alone.

This is consistent with the substantial body of evidence showing that the selectivity of the institution contributes minimally to learning and cognitive growth during college (Pascarella and Terenzini). Other measures of educational quality are worth considering, given the increasing diversity of college students and their multiple, winding pathways to a baccalaureate degree.

These could include goal attainment, course retention, transfer rates and success, success in subsequent course work, year-to-year persistence, degree or certificate completion, student and alumni satisfaction with the college experience, student personal and professional development, student involvement and citizenship, and postcollegiate outcomes, such as graduate school participation, employment, and a capacity for lifelong learning.

Measures of success in subsequent coursework are especially important for students who have been historically underrepresented in specific majors and for institutions that provide remedial education. Participation in high-impact activities (such as first-year seminars, learning communities, writing-intensive courses, common intellectual experiences, service learning, diversity experiences, student-faculty research, study abroad, internships and other field placements, and senior capstone experiences) might also be a useful indicator of quality, as these activities tend to be associated with high levels of student effort and deep learning (Kuh; Swaner and Brownell). The two most relevant points for thinking about how to introduce explicit quality adjustment into higher education output measures may be summarized as follows: quality differences are pervasive across courses, programs, and institutions, and no quantitative quality index suitable for adjusting output measures is feasible at present.

Adding to the complexity of productivity measurement is the fact that various policy and administrative actions require information aggregated at a number of different levels. Institution- and state-level measures are frequently needed for policy and are relevant to the development of administrative strategies. A major motivation for analyzing performance at these levels is that policy makers and the public want to know which institutions and which systems are performing better and how their processes can be replicated.

Prospective students and their parents also want to know which institutions are good values. As we have repeatedly pointed out, for many purposes, it is best to compare institutions of the same type. A course can be envisioned as the atomistic element of learning production, and the basic building block of productivity measurement at the micro level. For example, this may be expressed as the number of semester credits produced from a given number of faculty teaching hours.
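
As a minimal illustration of that building block, the sketch below computes credit hours produced per faculty teaching hour for a few hypothetical course sections. The courses, enrollments, credit values, and hour counts are all invented for the example and are not drawn from any dataset discussed here.

```python
# Hypothetical course sections: (course, students enrolled, credit hours per
# student, faculty contact plus preparation hours for the term). All numbers
# are illustrative assumptions, not data from this report.
sections = [
    ("Calculus I",      180, 3, 150),
    ("Intro Sociology", 120, 3, 120),
    ("Senior Seminar",   15, 3,  90),
]

for course, enrolled, credits_per_student, faculty_hours in sections:
    credit_hours_produced = enrolled * credits_per_student
    productivity = credit_hours_produced / faculty_hours
    print(f"{course:16s} {credit_hours_produced:4d} credit hours, "
          f"{productivity:5.2f} credit hours per faculty hour")
```

The quantity side of the ratio is trivial to compute, which is exactly why the quality problem looms so large: a 180-student lecture and a 15-student seminar can look very different on this metric even when both are well taught.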

However, increasingly, courses themselves can be broken down further to examine quantitative and qualitative aspects within the course or classroom unit (Twigg). Classroom technology is changing rapidly.

The introduction of scalable technologies is important, as are the effects of class size and technology. The technology of how education is delivered varies across and within categories (disciplines, institutions, etc.). Flagship state universities often have big classes, while private colleges often have smaller ones. The latter is almost certainly more expensive on a per-unit-of-output basis; less is known about the quality of the outcome.

Students who have options can make college choices based on tradeoffs between price and perceived quality across the range of institutions. Adding to the complexity is the faculty mix, including the use of graduate student instructors or adjunct faculty.

This may also affect the cost and quality of delivering credit hours. A growing body of research and an increasing number of programs assess efficiencies at the course level, seeking cost-quality tradeoffs that can be exploited. For example, the National Center for Academic Transformation (NCAT) develops programs for institutions to improve efficiency in the production of higher education through course redesign.

Course redesign is not just about putting courses online, but rather rethinking the way instruction is delivered in light of the possibilities that technology offers.

NCAT reports that, on average, costs were reduced by 37 percent in redesigned courses with a range of 9 to 77 percent. Meanwhile, learning outcomes improved in 72 percent of the redesigned courses, with the remaining 28 percent producing learning equivalent to traditional formats.

Appendix B to this volume provides a description of how NCAT measures comparative quality and cost of competing course design models. For some purposes, an academic department or program is a more appropriate unit of analysis. Despite its advantages, department-based analysis is inappropriate for determining sector-based productivity statistics.

One difficulty is that it is not easy to compare institutions based on their departmental structures.

Competencies in creativity, social-emotional learning, citizenship, and health should be assessed for the same reasons that reading, writing, and math are assessed — to provide relevant, specific information about student learning in these vital areas.

Assessment is a process of gathering information that reflects how well a student, classroom, school, or school system is doing against a set of purposes, learning criteria, or curricula (Ontario). Assessment of these competencies is complex, and we cannot rely on the tools and strategies typically used to assess other skills or knowledge.

It is possible to assess these competencies at a jurisdictional level; however, standardized assessments or surveys can only give information of limited quality about complex competencies.

The complexity of assessment

Across the world, educators, policy-makers, and experts agree that student success in both school and life includes more than literacy and numeracy skills and academic content knowledge. Many education systems are endeavouring to embed broader competencies — referred to as everything from 21st century skills to global competencies — into curricula, outcome expectations, and assessment strategies.

Some of these competencies are related to a process rather than a product, and some are more likely to be observed through social interactions. In recent years, educators from across Ontario have been field testing the use of a set of concrete, observable competencies in health, creativity, social-emotional learning, and citizenship.

These educators are exploring how they might teach these competencies in their classrooms, monitor student progress, and provide feedback to move students forward. Their assessment methods include checklists, observations, student journals, and collaborative work developing scales to track how frequently students apply specific competencies.

A number of themes are emerging from our work with educators on the ground and experts in the field. Educators find that using a sliding scale leads naturally into a discussion of how the student might move up the scale. Because the competencies are defined in specific, observable terms, they can be used as learning goals.

This allows the educators to use evidence gathered in a variety of ways to assess student progress toward those goals. The specificity of the competencies also means that teachers are able to give students feedback about how they demonstrate the competency within a task, experience, or process. Feedback that is directed at the task, rather than the person, has been shown to improve achievement and is an important part of effective social-emotional learning interventions.

The purposes of classroom-level assessment are different from the purposes of assessment at the jurisdictional (board, province) and international levels. Jurisdictionally, measurement and assessment are more often used to provide information to policy-makers and the public about how systems are doing. Large-scale assessments can include performance tasks (including tests or essays), third-party reports, and other methods.

They can be census-based (everyone is assessed) or sample-based (a portion of the population is assessed). While many systems are exploring ways to assess competencies in areas such as creativity, health, social-emotional learning, and citizenship, as well as ways to gather data about learning environments across the system, there are both risks and rewards to this type of reporting.

Because it is important to have information about these areas of learning, it is worth exploring ways to navigate the challenges of jurisdictional assessment.

For example, standardized assessments given to a sampled population of students, on a sampled selection of competencies, could provide information about system performance, while avoiding some of the negative consequences of large-scale measurement.
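
A minimal sketch of that sampling approach follows, assuming a simulated population of students with latent scores on four invented competencies: each sampled student is assessed on only a random subset of competencies, yet a system-level mean and margin of error can still be estimated for every competency. The population size, competency names, and score scale are all assumptions for the illustration.

```python
# Illustrative matrix-sampling estimate of system-level competency scores.
# The simulated "true" population and the score scale are assumptions made
# for this sketch, not features of any real assessment program.
import random
import statistics

random.seed(1)
competencies = ["creativity", "citizenship", "self-regulation", "collaboration"]

# Simulated population: each student has a latent score per competency.
population = [
    {c: random.gauss(2.5, 0.6) for c in competencies} for _ in range(50_000)
]

sample = random.sample(population, 2_000)          # assess a sample, not everyone
observed = {c: [] for c in competencies}
for student in sample:
    for c in random.sample(competencies, 2):       # each student sees 2 of 4 competencies
        observed[c].append(student[c])

for c in competencies:
    scores = observed[c]
    mean = statistics.mean(scores)
    se = statistics.stdev(scores) / len(scores) ** 0.5
    print(f"{c:16s} estimated mean {mean:.2f} (+/- {1.96 * se:.2f})")
```

Because no individual student completes every assessment, the burden on any one classroom stays low, which is one way to avoid some of the stakes and test-preparation pressures associated with census testing.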

What are the effects of measurement error on the accuracy of the estimates of teacher, school, or program effects? What is the contribution of measurement error to the volatility in estimates over time (e.g., from one year to the next)? Since there are questions about the assumption that test score scales are equal-interval, to what extent are inferences from value-added modeling sensitive to monotonic transformations (that is, transformations that preserve the original ordering of test scores)?
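
One way to make the last question concrete is a small sensitivity check: compute a simple value-added indicator, here the mean score gain per teacher, on the original scale and again after a monotonic transformation, and compare the resulting teacher orderings. The number of teachers, the score ranges, and the square-root transformation are all assumptions made for this sketch; it is not a model used by any workshop presenter.

```python
# Illustrative sensitivity check: does a monotonic rescaling of test scores
# change which teachers look most effective under a simple mean-gain indicator?
# All quantities (20 teachers, 25 students each, score ranges) are assumptions.
import random
random.seed(7)

def transform(score):
    # A monotonic (order-preserving) transformation of the score scale.
    return score ** 0.5

teachers = {}
for t in range(20):
    true_effect = random.gauss(0, 3)
    pupils = []
    for _ in range(25):
        pre = random.uniform(20, 80)
        post = pre + 5 + true_effect + random.gauss(0, 8)   # growth + teacher effect + noise
        pupils.append((pre, max(post, 1.0)))                # keep scores positive for the transform
    teachers[t] = pupils

def mean_gain(pupils, f=lambda x: x):
    return sum(f(post) - f(pre) for pre, post in pupils) / len(pupils)

raw_rank = sorted(teachers, key=lambda t: mean_gain(teachers[t]), reverse=True)
new_rank = sorted(teachers, key=lambda t: mean_gain(teachers[t], transform), reverse=True)

matches = sum(1 for a, b in zip(raw_rank[:5], new_rank[:5]) if a == b)
print("Top 5 on raw scale:        ", raw_rank[:5])
print("Top 5 on transformed scale:", new_rank[:5])
print(f"Top-5 positions that match exactly: {matches}")
```

If the two orderings diverge noticeably, inferences drawn from the indicator depend on the scale chosen, which is exactly the concern raised about the equal-interval assumption.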

Given the problems described above, how might value-added analyses be given a thorough evaluation prior to operational implementation? One way of evaluating a model is to generate simulated data that have the same characteristics as operational data and determine whether the model can accurately capture the relationships that were built into the simulated data.

If the model does not estimate parameters with sufficient accuracy from data that are generated to fit the model and match the characteristics of the test data, then there is little likelihood that the model will work well with actual test data. Note that doing well by this measure is a necessary but not a sufficient condition, since real data will not conform exactly to the assumptions built into the simulation. Different workshop participants tended to identify many of the same measurement issues associated with value-added models. However, the vertical scale issues and the equal-interval assumption are more specific to VAM applications.
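
A stripped-down version of that evaluation strategy is sketched below: data are generated from a toy gain-score model with known teacher effects, the effects are re-estimated from the simulated data, and the estimates are compared with the truth. The model, sample sizes, and noise levels are assumptions for illustration; operational value-added models are far more elaborate.

```python
# Illustrative parameter-recovery check for a toy value-added model:
# generate data with known teacher effects, re-estimate them, and compare.
# Model, sample sizes, and noise levels are assumptions for this sketch only.
import random
random.seed(42)

n_teachers, students_per_teacher = 50, 30
true_effects = {t: random.gauss(0, 4) for t in range(n_teachers)}

# Simulated gains: overall growth + teacher effect + student-level variation,
# plus measurement error on both the pre-test and the post-test.
records = []
for t, effect in true_effects.items():
    for _ in range(students_per_teacher):
        true_gain = 10 + effect + random.gauss(0, 6)
        observed_gain = true_gain + random.gauss(0, 5) - random.gauss(0, 5)
        records.append((t, observed_gain))

# "Estimation": each teacher effect is the mean observed gain, centered.
means = {t: 0.0 for t in range(n_teachers)}
for t, gain in records:
    means[t] += gain / students_per_teacher
grand_mean = sum(means.values()) / n_teachers
estimated = {t: m - grand_mean for t, m in means.items()}

# How well were the effects that were built into the data recovered?
xs = [true_effects[t] for t in range(n_teachers)]
ys = [estimated[t] for t in range(n_teachers)]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
corr = cov / (sum((x - mx) ** 2 for x in xs) ** 0.5 * sum((y - my) ** 2 for y in ys) ** 0.5)
print(f"Correlation between true and estimated teacher effects: {corr:.2f}")
```

A high correlation here shows only that the estimator recovers effects when its own assumptions hold; as the text notes, doing well on this check is necessary but not sufficient for trusting the model with real test data.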

According to Kolen, there are several critical questions: Are estimated teacher and school effects largely due to idiosyncrasies of statistical methods, measurement error, the particular test examined, and the scales used?

Or are the estimated teacher and school effects due at least in part to educationally relevant factors? He argued that these questions need to be answered clearly before a value-added model is used as the sole indicator to make important educational decisions.

Value-added methods refer to efforts to estimate the relative contributions of specific teachers, schools, or programs to student test performance. In recent years, these methods have attracted considerable attention because of their potential applicability for educational accountability, teacher pay-for-performance systems, school and teacher improvement, program evaluation, and research. Value-added methods involve complex statistical models applied to test data of varying quality.

Accordingly, there are many technical challenges to ascertaining the degree to which the output of these models provides the desired estimates. Despite a substantial amount of research over the last decade and a half, overcoming these challenges has proven to be very difficult, and many questions remain unanswered--at a time when there is strong interest in implementing value-added models in a variety of settings.

The National Research Council and the National Academy of Education held a workshop, summarized in this volume, to help identify areas of emerging consensus and areas of disagreement regarding appropriate uses of value-added methods, in an effort to provide research-based guidance to policy makers who are facing decisions about whether to proceed in this direction.
