video gaming

Refer to the Shute-Ventura reading.
What is the difference between measurement and assessment? How are both of these concepts applied to learning?
Briefly describe the ECD framework and its purpose.
The reading discusses some of the challenges in using games for the purposes of assessment. Choose one of the challenges mentioned and explain how it manifests in the game you have chosen to study this semester.

Refer to the D’Anastasio reading.
What is the premise of the article?
We have played a number of games in class which asked us to identify with marginalized characters. Based on your own experience, do you agree with the premise of the article or not? Are there ethical implications to attempting to convey empathy through games? What are some potential consequences of doing so (both positive and negative)?

Reflect on your experience in the course. (1 paragraph)
What has been your most valuable takeaway?
What worked well for you in the course? What would you change to improve your learning experience?
Respond to at least 2 peers.
Please use simple grammar and vocabulary.
https://motherboard.vice.com/en_us/article/mgbwpv/empathy-games-dont-exist
Stealth Assessment: Measuring and Supporting Learning in Video Games
Valerie Shute and Matthew Ventura
The John D. and Catherine T. MacArthur Foundation Reports on Digital Media and Learning

To succeed in today's interconnected and complex world, workers need to be able to think systemically, creatively, and critically. Equipping K–16 students with these twenty-first-century competencies requires new thinking not only about what should be taught in school but also about how to develop valid assessments to measure and support these competencies. In Stealth Assessment, Valerie Shute and Matthew Ventura investigate an approach that embeds performance-based assessments in digital games. They argue that using well-designed games as vehicles to assess and support learning will help combat students' growing disengagement from school; provide dynamic and ongoing measures of learning processes and outcomes; and offer students opportunities to apply such complex competencies as creativity, problem solving, persistence, and collaboration. Embedding assessments within games provides a way to monitor players' progress toward targeted competencies and to use that information to support learning. Shute and Ventura discuss problems with such traditional assessment methods as multiple-choice questions, review evidence relating to digital games and learning, and illustrate the stealth-assessment approach with a set of assessments they are developing and embedding in the digital game Newton's Playground. These stealth assessments are intended to measure levels of creativity, persistence, and conceptual understanding of Newtonian physics during game play. Finally, they consider future research directions related to stealth assessment in education.

Valerie Shute is Professor of Educational Psychology and Learning Systems at Florida State University. Matthew Ventura is a Research Scientist at Florida State University.
Acknowledgments

We would like to sincerely thank the Bill and Melinda Gates Foundation for its funding for this project, particularly Emily Dalton-Smith, Robert Torres, and Ed Dieterle. We would also like to express our appreciation to the other members of the research grant team—Yoon Jeon Kim, Don Franceschetti, Russell Almond, Matt Small, and Lubin Wang—for their awesome and abundant support on the project, and Lance King, who came up with the "agents of force and motion" idea. Finally, we acknowledge Diego Zapata-Rivera for ongoing substantive conversations with us on the topic of stealth assessment.
Education in the Twenty-First Century

You can discover more about a person in an hour of play than in a year of conversation.
—Plato

In the first half of the twentieth century, a person who acquired basic reading, writing, and math skills was considered to be sufficiently literate to enter the work force (Kliebard 1987). The goal back then was to prepare young people as service workers, because 90 percent of the students were not expected to seek or hold professional careers (see Shute 2007). With the emergence of the Internet, however, the world has become more interconnected, effectively smaller, and more complex than before (Friedman 2005). Developed countries now rely on their knowledge workers to deal with an array of complex problems, many with global ramifications (e.g., climate change or renewable energy sources). When confronted by such problems, tomorrow's workers need to be able to think systemically, creatively, and critically (see, e.g., Shute and Torres 2012; Walberg and Stariha 1992). These skills are a few of what many educators are calling twenty-first-century (or complex) competencies (see Partnership for the 21st Century 2012; Trilling and Fadel 2009).

Preparing K–16 students to succeed in the twenty-first century requires fresh thinking about what knowledge and skills (i.e., what we call competencies) should be taught in our nation's schools. In addition, there's a need to design and develop valid assessments to measure and support these competencies. Except in rare instances, our current education system neither teaches nor assesses these new competencies despite a growing body of research showing that competencies, such as persistence, creativity, self-efficacy, openness, and teamwork (to name a few), can substantially impact student academic achievement (Noftle and Robins 2007; O'Connor and Paunonen 2007; Poropat 2009; Sternberg 2006; Trapmann et al. 2007). Furthermore, the methods of assessment are often too simplified, abstract, and decontextualized to suit current education needs. Our current assessments in many cases fail to assess what students actually can do with the knowledge and skills learned in school (Shute 2009). What we need are new performance-based assessments that assess how students use knowledge and skills that are directly relevant for use in the real world.

One challenge with developing a performance-based measure is crafting appropriate situations or problems to elicit a competency of interest. A way to approach this problem is to use digital learning environments to simulate problems for performance-based assessment (Dede 2005; DiCerbo and Behrens 2012; Quellmalz et al. 2012). Digital learning environments can provide meaningful assessment environments by supplying students with scenarios that require the application of various competencies. This report introduces a variant of this assessment approach by investigating how performance-based assessments can be used in digital games. Specifically, we are interested in how assessment in games can be used to enhance learning (i.e., formative assessment).

For example, consider role-playing games (e.g., World of Warcraft). In these games, players must read lengthy and complex quest logs that tell them the goals. Without comprehending these quest instructions, the players would not be able to know how to proceed and succeed in the game. This seemingly simple task in role-playing games is, in fact, an authentic, situated assessment of reading comprehension.
Without these situated and meaningful assessments, we cannot determine what students can actually do with the skills and knowledge obtained. Thus new, embedded, authentic types of assessment methods are needed to properly assess valued competencies.

Why use well-designed games as vehicles to assess and support learning? There are several reasons. First, as our schools have remained virtually unchanged for many decades while our world is changing rapidly, we are seeing a growing number of disengaged students. This disengagement increases the chances of students dropping out of school. For instance, high dropout rates, especially among Hispanic, black, and Native American students, were described as "the silent epidemic" in a recent research report for the Bill and Melinda Gates Foundation (Bridgeland, DiIulio, and Morison 2006). According to this report, nearly one-third of all public high school students drop out, and the rate is higher for minority students. In the report, when 467 high school dropouts were asked why they left school, 47 percent of them simply responded, "The classes were not interesting." We need to find ways (e.g., well-designed digital games and other immersive environments) to get our kids engaged, support their learning, and allow them to contribute fruitfully to society.

A second reason for using games as assessments is a pressing need for dynamic, ongoing measures of learning processes and outcomes. An interest in alternative forms of assessment is driven by dissatisfaction with and the limitations of multiple-choice items. In the 1990s, an interest in alternative forms of assessment increased with the popularization of what became known as authentic assessment. A number of researchers found that multiple-choice and other fixed-response formats substantially narrowed school curricula by emphasizing basic content knowledge and skills within subjects, and not assessing higher-order thinking skills (see, e.g., Kellaghan and Madaus 1991; Shepard 1991). As George Madaus and Laura O'Dwyer (1999) argued, though, incorporating performance assessments into testing programs is hard because they are less efficient, more difficult and disruptive to administer, and more time consuming than multiple-choice testing programs. Consequently, multiple choice has remained the dominant format in most K–12 assessments in our country. New performance assessments are needed that are valid, reliable, and automated in terms of scoring.

A third reason for using games as assessment vehicles is that many of them typically require a player to apply various competencies (e.g., creativity, problem solving, persistence, and collaboration) to succeed in the game. The competencies required to succeed in many games also happen to be the same ones that companies are looking for in today's highly competitive economy (Gee, Hull, and Lankshear 1996). Moreover, games are a significant and ubiquitous part of young people's lives. The Pew Internet and American Life Project, for instance, surveyed 1,102 youths between the ages of twelve and seventeen. They reported that 97 percent of youths—both boys (99 percent) and girls (94 percent)—play some type of digital game (Lenhart et al. 2008). Additionally, Mizuko Ito and her colleagues (2010) found that playing digital games with friends and family is a large as well as normal part of the daily lives of youths.
They further observed that playing digital games is not solely for entertainment purposes. In fact, many youths participate in online discussion forums to share their knowledge and skills about a game with other players, or seek help on challenges when needed.

In addition to the arguments for using games as assessment devices, there is growing evidence of games supporting learning (see, e.g., Tobias and Fletcher 2011; Wilson et al. 2009). Yet we need to understand more precisely how as well as what kinds of knowledge and skills are being acquired. Understanding the relationships between games and learning is complicated by the fact that we don't want to disrupt players' engagement levels during gameplay. As a result, learning in games has historically been assessed indirectly and/or in a post hoc manner (Shute and Ke 2012; Tobias et al. 2011). What's needed instead is real-time assessment and support of learning based on the dynamic needs of players. We need to be able to experimentally ascertain the degree to which games can support learning, and how and why they achieve this objective.

This book presents the theoretical foundations of and research methodologies for designing, developing, and evaluating stealth assessments in digital games. Generally, stealth assessments are embedded deeply within games to unobtrusively, accurately, and dynamically measure how players are progressing relative to targeted competencies (Shute 2011; Shute, Ventura, et al. 2009). Embedding assessments within games provides a way to monitor a player's current level on valued competencies, and then use that information as the basis for support, such as adjusting the difficulty level of challenges or providing timely feedback. The term and technologies of stealth assessment are not intended to convey any type of deception but rather to reflect the invisible capture of gameplay data, and the subsequent formative use of the information to help learners (and ideally, help learners to help themselves).

There are four main sections in this report. First, we discuss problems with existing traditional assessments. We then review evidence relating to digital games and learning. Third, we define and then illustrate our stealth assessment approach with a set of assessments that we are currently developing and embedding in a digital game (Newton's Playground). The stealth assessments are intended to measure the levels of creativity, persistence, and conceptual understanding of Newtonian physics during gameplay. Finally, we discuss future research and issues related to stealth assessment in education.

Problems with Current Assessments

Our country's current approach to assessing students (K–16) has a lot of room for improvement at the classroom and high-stakes levels. This is especially true in terms of the lack of support that standardized, summative assessments provide for students learning new knowledge, skills, and dispositions that are important to succeed in today's complex world.
The current means of assessing students infrequently (e.g., at the end of a unit or school year for grading and promotion purposes) can cause various unintended consequences, such as increasing the dropout rate given the out-of-context and often irrelevant test-preparation teaching contexts that the current assessment system frequently promotes.

The goal of an ideal assessment policy/process should be to provide valid, reliable, and actionable information about students' learning and growth that allows stakeholders (e.g., students, teachers, administrators, and parents) to utilize the information in meaningful ways. Before describing particular problems associated with current assessment practices, we first offer a brief overview of assessment.

Assessment Writ Large

People often confound the concepts of measurement and assessment. Whenever you need to measure something accurately, you use an appropriate tool to determine how tall, short, hot, cold, fast, or slow something is. We measure to obtain information (data), which may or may not be useful, depending on the accuracy of the tools we use as well as our skill at using them. Measuring things like a person's height, a room's temperature, or a car's speed is technically not an assessment but rather the collection of information relative to an established standard (Shute 2009).

Educational Measurement

Educational measurement refers to the application of a measuring tool (or standard scale) to determine the degree to which important knowledge, skills, and other attributes have been or are being acquired. It involves the collection and analysis of learner data. According to the National Council on Measurement in Education's Web site, this includes the theory, techniques, and instrumentation available for the measurement of educationally relevant human, institutional, and social characteristics. A test is education's equivalent of a ruler, thermometer, or radar gun. But a test does not typically improve learning any more than a thermometer cures a fever; both are simply tools. Moreover, as Catherine Snow and Jacqueline Jones (2001) point out, tests alone cannot enhance educational outcomes. Rather, tests can guide improvement (given that they are valid and reliable) if they motivate adjustments to the educational system (i.e., provide the basis for bolstering curricula, ensure support for struggling learners, guide professional development opportunities, and distribute limited resources fairly).

Again, we measure things in order to get information, which may be quantitative or qualitative. How we choose to use the data is a different matter. For instance, back in the early 1900s, students' abilities and intelligence were extensively measured. Yet this wasn't done to help them learn better or otherwise progress. Instead, the main purpose of testing was to track students into appropriate paths, with the understanding that their aptitudes were inherently fixed. A dominant belief during that period was that intelligence was part of a person's genetic makeup, and thus testing was aimed specifically at efficiently assigning students into high, middle, or low educational tracks according to their supposedly innate mental abilities (Terman 1916). In general, there was a fundamental shift to practical education going on in the country during the early 1900s, countering "wasted time" in schools while abandoning the classics as useless and inefficient for the masses (Shute 2007).
Early educational researchers and administrators inserted the metaphor of the school as a "factory" into the national educational discourse (Kliebard 1987). The metaphor has persisted to this day.

Assessment

Assessment involves more than just measurement. In addition to systematically collecting and analyzing information (i.e., measurement), it also involves interpreting and acting on information about learners' understanding and/or performance relative to educational goals. Measurement can be viewed as a subset of assessment.

As mentioned earlier, assessment information can be used by a variety of stakeholders and for an array of purposes (e.g., to help improve learning outcomes, programs, and services as well as to establish accountability). There is also an assortment of procedures associated with the different purposes. For example, if your goal was to enhance an individual's learning, and you wanted to determine that individual's progress toward an educational goal, you could administer a quiz, view a portfolio of the student's work, ask the student (or peers) to evaluate progress, watch the person solve a complex task, review lab reports or journal entries, and so on.

In addition to having different purposes and procedures for obtaining information, assessments may be differentially referenced or interpreted, for instance, in relation to normative data or a criterion. Norm-referenced interpretation compares learner data to that of other individuals or a larger group, but can also involve comparisons to oneself (e.g., asking people how they are feeling and getting a "better than usual" response is a norm-referenced interpretation). The purpose of norm-referenced interpretation is to establish what is typical or reasonable. On the other hand, criterion-referenced interpretation involves establishing what a person can or cannot do, or typically does or does not do, specifically in relation to a criterion. If the purpose of the assessment is to support personal learning, then criterion-referenced interpretation is required (for more, see Nitko 1980).

This overview of assessment is intended to provide a foundation for the next section, where we examine specific problems surrounding current assessment practices.

Traditional Classroom Assessments Are Detached Events

Current approaches to assessment are usually divorced from learning. That is, the typical educational cycle is: teach; stop; administer test; go loop (with new content). But consider the following metaphor representing an important shift that occurred in the world of retail outlets (from small businesses to supermarkets to department stores), suggested by James Pellegrino, Naomi Chudhowsky, and Robert Glaser (2001, 284). No longer do these businesses have to close down once or twice a year to take inventory of their stock. Rather, with the advent of automated checkout and bar codes for all items, these businesses have access to a continuous stream of information that can be used to monitor inventory and the flow of items. Not only can a business continue without interruption; the information obtained is also far richer than before, enabling stores to monitor trends and aggregate the data into various kinds of summaries as well as to support real-time, just-in-time inventory management.
Similarly, with new assessment technologies, schools should no longer have to interrupt the normal instructional process at various times during the year to administer external tests to students. Assessment instead should be continual and invisible to students, supporting real-time, just-in-time instruction (for more, see Shute, Levy, et al. 2009).

Traditional Classroom Assessments Rarely Influence Learning

Many of today's classroom assessments don't support deep learning or the acquisition of complex competencies. Current classroom assessments (referred to as "assessments of learning") are typically designed to judge a student (or group of students) at a single point in time, without providing diagnostic support to students or diagnostic information to teachers. Alternatively, assessments (particularly "assessments for learning") can be used to: support the learning process for students and teachers; interpret information about understanding and/or performance regarding educational goals (local to the curriculum, and broader to the state or common core standards); provide formative compared to summative information (e.g., give useful feedback during the learning process rather than a single judgment at the end); and be responsive to what's known about how people learn, generally and developmentally.

To illustrate how a classroom assessment may be used to support learning, Valerie Shute, Eric Hansen, and Russell Almond (2008) conducted a study to evaluate the efficacy of an assessment-for-learning system named ACED (for "adaptive content with evidence-based diagnosis"). They used an evidence-centered design approach (Mislevy, Steinberg, and Almond 2003) to create an adaptive, diagnostic assessment system that also included instructional support in the form of elaborated feedback. The key issue examined was whether the inclusion of the feedback into the system impairs the quality of the assessment (relative to validity, reliability, and efficiency), and does in fact enhance student learning. Results from a controlled evaluation testing 268 high-school students showed that the quality of the assessment was unimpaired by the provision of feedback. Moreover, students using the ACED system showed significantly greater learning of the content (geometric sequences) compared with a control group (i.e., students using the system but without elaborated feedback—just correct/incorrect feedback). These findings suggest that assessments in other settings (e.g., state-mandated tests) can be augmented to support student learning with instructional feedback without jeopardizing the primary purpose of the assessment.

Traditional Assessment and Validity Issues

Assessments are typically evaluated under two broad categories: reliability and validity. Reliability is the most basic requirement for an assessment and is concerned with the degree to which a test can consistently measure some attribute over similar conditions. In assessment, reliability is seen, for example, when a person scores really high on an algebra test at one point in time and then scores similarly on a comparable test the next day. In order to achieve high reliability, assessment tasks are simplified to independent pieces of evidence that can be modeled by existing measurement models.

An interesting issue is how far this simplification process can go without negatively influencing the validity of the test.
That is, in order to remove any possible source of construct-irrelevant variance and dependencies, tasks can end up looking like decontextualized, discrete pieces of evidence. In the process of achieving high reliability, which is important for supporting high-stakes decision making, other aspects of the test may be sacrificed (e.g., engagement and some types of validity).

Another aspect that traditional, standardized assessments emphasize is dealing with operational constraints (e.g., the need for gathering and scoring sufficient pieces of evidence within a limited administration time and budget). In fact, many of the simplifications described above could be explained by this issue along with the current state of certain measurement models that do not easily handle complex interactions among tasks, the presence of feedback, and student learning during the test.

Validity, broadly, refers to the extent to which an assessment actually measures what it is intended to measure. Here are the specific validity issues related to traditional assessment.

Face Validity

Face validity states that an assessment should intuitively "appear" to measure what it is intended to measure. For example, reading some excerpted paragraphs on an uninteresting topic and answering multiple-choice questions about it may not be the best measure for reading comprehension (i.e., it lacks good face validity). As suggested earlier, students need to be assessed in meaningful environments rather than filling in bubbles on a prepared form in response to decontextualized questions. Digital games can provide such meaningful environments by supplying students with scenarios that require the application of various competencies, such as reading comprehension and problem-solving skill.

Predictive Validity

Predictive validity refers to an assessment predicting future behavior. Today's large-scale, standardized assessments are generally lacking in this area. For example, a recent report from the College Board found that the SAT only marginally predicted college success beyond high school GPA, at around r = 0.10 (Korbin et al. 2008). This means that the SAT scores contribute around 1 percent of the unique prediction of college success after controlling for GPA information (the variance explained is the square of the correlation: 0.10 squared is 0.01). Other research studies have shown greater incremental validity of noncognitive variables (e.g., psychosocial) over SAT and traditional academic indicators like GPA in predicting college success (see, e.g., Robbins et al. 2004).

Consequential Validity

Consequential validity refers to the effects of a particular assessment on societal and policy decisions. One negative side effect of the No Child Left Behind (NCLB 2002) initiative, with its heavy focus on accountability, has been teachers "teaching to the test." That is, when teachers instruct content that is relevant to answering items on a test but not particularly relevant for solving real-world problems, this reduces student engagement in school, and in turn, that can lead to increased dropout rates (Bridgeland, DiIulio, and Morison 2006). Moreover, the low predictive validity of current assessments can lead to students not getting into college due to low scores.
But the SAT and similar test scores are still being used as the main basis for college admission decisions, which can potentially lead to some students missing opportunities at fulfilling careers and lives, particularly disadvantaged youths.

To illustrate the contrast between traditional and new performance-based assessments, consider the attribute of conscientiousness. Conscientiousness can be broadly defined as the motivation to work hard despite challenging conditions—a disposition that has consistently been found to predict academic achievement from preschool to high school to the postsecondary level and adulthood (see, e.g., Noftle and Robins 2007; O'Connor and Paunonen 2007; Roberts et al. 2004). Conscientiousness measures, like most dispositional measures, are primarily self-report (e.g., "I work hard no matter how difficult the task"; "I accomplish my work on time")—a method of assessment that is riddled with problems. First, self-report measures are subject to "social desirability effects" that can lead to false reports about behavior, attitudes, and beliefs (see Paulhaus 1991). Second, test takers may interpret specific self-report items differently (e.g., what it means "to work hard"), leading to unreliability and lower validity (Lanyon and Goodstein 1997). Third, self-report items often require that individuals have explicit knowledge of their dispositions (see, e.g., Schmitt 1994), which is not always the case.

Good games, coupled with evidence-based assessment, show promise as a vehicle to dynamically measure conscientiousness and other important competencies more accurately than traditional approaches (see, e.g., Shute, Masduki, and Donmez 2010). These evidence-based assessments can record and score multiple behaviors as well as measurable artifacts in the game that pertain to particular competencies. For example, various actions that a player takes within a well-designed game can inform conscientiousness—how long a person spends on a difficult problem (where longer equals more persistent), the number of failures and retries before success, returning to a hard problem after skipping it, and so on. Each instance of these "conscientiousness indicators" would update the student model of this variable—and thus would be up to date and available to view at any time.
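To make the indicator idea concrete, here is a minimal sketch of how logged gameplay events might be turned into persistence indicators and rolled into a running student-model estimate. This is not the authors' implementation; the event fields, caps, and update rule below are illustrative assumptions only.

```python
from dataclasses import dataclass

@dataclass
class Attempt:
    """One logged attempt at a game problem (hypothetical telemetry format)."""
    problem_id: str
    seconds_spent: float
    solved: bool
    retries: int                 # failures before this attempt ended
    returned_after_skip: bool

def persistence_indicators(attempt: Attempt) -> list[float]:
    """Turn one attempt into indicator scores in [0, 1] (illustrative rules)."""
    scores = []
    # Longer time on a hard problem is read as more persistent (capped at 5 minutes).
    scores.append(min(attempt.seconds_spent / 300.0, 1.0))
    # Retrying after failure counts toward persistence (capped at 10 retries).
    scores.append(min(attempt.retries / 10.0, 1.0))
    # Coming back to a previously skipped problem is a strong indicator.
    scores.append(1.0 if attempt.returned_after_skip else 0.0)
    return scores

def update_estimate(current: float, scores: list[float], rate: float = 0.1) -> float:
    """Nudge the running student-model estimate toward each new indicator."""
    for s in scores:
        current += rate * (s - current)
    return current

# The estimate stays current after every attempt, so it can be inspected at any
# time, as the text describes.
estimate = 0.5  # neutral starting belief
for a in [Attempt("ramp-3", 240, False, 4, False),
          Attempt("ramp-3", 310, True, 7, True)]:
    estimate = update_estimate(estimate, persistence_indicators(a))
print(round(estimate, 2))
```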
Additionally, we posit that good games can provide a gameplay environment that can potentially improve conscientiousness, because many problems require players to persevere despite failure and frustration. That is, many good games can be quite difficult, and pushing one's limits is an excellent way to improve persistence, especially when accompanied by the great sense of satisfaction one gets on successful completion of a thorny problem (see, e.g., Eisenberg 1992; Eisenberg and Leonard 1980). Some students, however, may not feel engaged or comfortable with games, or cannot access them. Alternative approaches should be available for these students.

As can be seen, traditional tests may not fully satisfy various validity and learning requirements. In the next section we describe how digital games can be effectively used in education—as assessment vehicles and to support learning.

Digital Games, Assessment, and Learning

Digital games are popular. For instance, revenues for the digital game industry reached US $7.2 billion in 2007 (Fullerton 2008), and overall, 72 percent of the population in the United States plays digital games (Entertainment Software Association 2011). The amount of time spent playing games also continues to increase (Escobar-Chaves and Anderson 2008). Besides being a popular activity, playing digital games has been shown to be positively related to a variety of cognitive skills (on visual-spatial abilities, e.g., see Green and Bavelier 2007; on attention, e.g., see Shaw, Grayson, and Lewis 2005), openness to experience (Chory and Goodboy 2011; Ventura, Shute, and Kim 2012; Witt, Massman, and Jackson 2011), persistence (i.e., a facet of conscientiousness; Ventura, Shute, and Zhao, forthcoming), academic performance (e.g., Skoric, Teo, and Neo 2009; Ventura, Shute, and Kim 2012), and civic engagement (Ferguson and Garza 2011). Digital games can also motivate students to learn valuable academic content and skills, within and outside the game (e.g., Barab, Dodge, et al. 2010; Coller and Scott 2009; DeRouin-Jessen 2008). Finally, studies have shown that playing digital games can promote prosocial and civic behavior (e.g., Ferguson and Garza 2011).

As mentioned earlier, learning in games has historically been assessed indirectly and/or in a post hoc manner (see Shute and Ke 2012). What is required instead is real-time assessment and support of learning based on the dynamic needs of players. Research examining digital games and learning is usually conducted using pretest-game-posttest designs, where the pre- and posttests typically measure content knowledge. Such traditional assessments don't capture and analyze the dynamic, complex performances that inform twenty-first-century competencies. How can we both measure and enhance learning in real time? Performance-based assessments with automated scoring are needed. The main assumptions underlying this new approach are that: learning by doing (required in gameplay) improves learning processes and outcomes; different types of learning and learner attributes may be verified as well as measured during gameplay; strengths and weaknesses of the learner may be capitalized on and bolstered, respectively, to improve learning; and ongoing feedback can be used to further support student learning.

Evidence of Learning from Games

Below are three examples of learning from educational games. Preliminary evidence suggests that students can learn deeply from such games and acquire important twenty-first-century competencies.

Programming Skills in NIU-Torcs

The game NIU-Torcs (Coller and Scott 2009) requires players to create control algorithms to make virtual cars execute nimble maneuvers and stay balanced. At the beginning of the game, players receive their own cars, which sit motionless on a track. Each student must write a C++ program that controls the steering wheel, gearshift, accelerator, and brake pedals to get the car to move (and stop). The program also needs to include specific maneuverability parameters (e.g., gas pedal, transmission, and steering wheel). Running their C++ programs permits students to simulate the car's performance (e.g., distance from the center line of the track and wheel rotation rates), and thus students are able to see the results of their programming efforts by driving the car in a 3-D environment.

NIU-Torcs was evaluated using mechanical engineering students in several undergraduate classrooms. Findings showed that students in the classroom using NIU-Torcs as the instructional approach (n = 38) scored significantly higher than students in four control group classrooms (n = 48) on a concept map assessment.
The concept map assessment included questions spanning four progressively higher levels of understanding: the number of concepts recalled (i.e., low-level knowledge), the number of techniques per topic recalled, the depth of the hierarchy per major topic (i.e., defining features and their connections), and finally, connections among branches in the hierarchy (i.e., showing a deep level of understanding). Students in the NIU-Torcs classroom significantly improved in terms of the depth of hierarchy and connections among branches (i.e., deeper levels of knowledge) relative to the control group. Figure 1 shows a couple of screen shots from the NIU-Torcs game.

[Figure 1: Screen capture of NIU-Torcs]

Understanding Cancer Cells with Re-Mission

Re-Mission (Kato et al. 2008) is the name of a video game in which players control a nanobot (named Roxxi) in a 3-D environment representing the inside of the bodies of young patients with cancer. The gameplay was designed to address behavioral issues that were identified in the literature and were seen as critical for optimal patient participation in cancer treatment. The video gameplay includes destroying cancer cells and managing common treatment-related adverse effects, such as bacterial infections, nausea, and constipation. Neither Roxxi nor any of the virtual patients die in the game. That is, if players fail at any point in the game, then the nanobot powers down and players are given the opportunity to retry the mission. Players need to complete missions successfully before moving on to the next level.

A study was conducted to evaluate Re-Mission at thirty-four medical centers in the United States, Canada, and Australia. A total of 375 cancer patients, thirteen to twenty-nine years old, were randomly assigned to the intervention (n = 197) or control group (n = 178). The intervention group played Re-Mission while the control group played Indiana Jones and the Emperor's Tomb (i.e., both the gameplay and interface were similar to Re-Mission). After taking a pretest, all participants received a computer either with Indiana Jones and the Emperor's Tomb (control group) or the same control group game plus the Re-Mission game (intervention group). The participants were asked to play the game(s) for at least one hour per week during the three-month study, and outcome assessments were collected at one and three months after the pretest. Game use was recorded electronically. Outcome measures included adherence to taking prescribed medications, self-efficacy, cancer-related knowledge, control, stress, and quality of life. Adherence, self-efficacy, and cancer-related knowledge were all significantly greater in the intervention group compared to the control group. The intervention did not affect self-reported measures of stress, control, or quality of life. Figure 2 shows an opening screen of Re-Mission.

[Figure 2: Screen capture of Re-Mission game]

Taiga Park and Science Content Learning

Our last example illustrates how kids learn science content and inquiry skills within an online game called Quest Atlantis: Taiga Park. Taiga Park is an immersive digital game developed by Sasha Barab and his colleagues at Indiana University (Barab et al. 2007; Barab, Gresalfi, and Ingram-Goble 2010). Taiga Park is a beautiful national park where many groups coexist, such as the fly-fishing company, the Mulu farmers, the lumber company, and park visitors.
In this game, Ranger Bartle calls on the player to investigate why the fish are dying in the Taiga River. To solve this problem, players are engaged in scientific inquiry activities. They interview virtual characters to gather information, and collect water samples at several locations along the river to measure water quality. Based on the collected information, players make a hypothesis and suggest a solution to the park ranger.

To move successfully through the game, players need to understand how certain science concepts are related to each other (e.g., sediment in the water from the loggers' activities causes an increase to the water temperature, which decreases the amount of dissolved oxygen in the water, which causes the fish to die). Also, players need to think systemically about how different social, ecological, and economic interests are intertwined in this park. In a controlled experiment, Barab and his colleagues (2010) found that middle-school students learning with Taiga Park scored significantly higher on the posttest (i.e., assessing knowledge of core concepts such as erosion and eutrophication) compared to the classroom condition (p < 0.01). The Taiga Park group also scored significantly higher than the control condition on a delayed posttest, thus demonstrating retention of the content relating to water quality (p < 0.001) in a novel task (thus better retention and transfer). The same teacher taught both treatment and control conditions. For a screen capture from Taiga Park, see figure 3.

[Figure 3: Screen capture of Taiga Park]

As these examples show, digital games appear to support learning. But how can we more accurately measure learning, especially as it happens (rather than after the fact), and beyond content knowledge?

Assessment in Games

In a typical digital game, as players interact with the environment, the values of different game-specific variables change. For instance, getting injured in a battle reduces a player's health, and finding a treasure or another object increases a player's inventory of goods. In addition, solving major problems in games permits players to gain rank or "level up." One could argue that these are all "assessments" in games—of health, personal goods, and rank. But now consider monitoring educationally relevant variables at different levels of granularity in games. In addition to checking health status, players could check their current levels of systems-thinking skill, creativity, and teamwork, where each of these competencies is further broken down into constituent knowledge and skill elements (e.g., teamwork may be broken down into cooperating, negotiating, and influencing/leadership skills). If the estimated values of those competencies got too low, the player would likely feel compelled to take action to boost them.

One main challenge for educators who want to employ or design games to support learning is making valid inferences—about what the student knows, believes, and can do—at any point in time, at various levels, and without disrupting the flow of the game (and hence engagement and learning). One way to increase the quality and utility of an assessment is to use evidence-centered design (ECD), which informs the design of valid assessments and yields real-time estimates of students' competency levels across a range of knowledge and skills (Mislevy, Steinberg, and Almond 2003).

ECD is a conceptual framework that can be used to develop assessment models, which in turn support the design of valid assessments.
The goal is to help assessment designers coherently align the claims that they want to make about learners as well as the things that learners say or do in relation to the contexts and tasks of interest (e.g., Mislevy and Haertel 2006; Mislevy, Steinberg, and Almond 2003; for a simple overview, see ECD for Dummies by Shute, Kim, and Razzouk 2010). There are three main theoretical models in the ECD framework: competency, evidence, and task models.

Competency Model

What collection of knowledge, skills, and other attributes should be assessed? Although ECD can work with simple one-dimensional competency models, its strength comes from treating competency as multidimensional. Variables in the competency model describe the set of knowledge and skills on which inferences are based (see Almond and Mislevy 1999). The term student model is used to denote an instantiated version of the competency model—like a profile or report card, only at a more refined grain size. Values in the student model express the assessor's current belief about the level on each variable within the competency model, for a particular student.

Evidence Model

What behaviors or performances should reveal those competencies? An evidence model expresses how the student's interactions with and responses to a given problem constitute evidence about competency model variables. The evidence model attempts to answer two questions: (a) What behaviors or performances reveal targeted competencies; and (b) What's the statistical connection between those behaviors and the variable(s) in the competency model?

Task Model

What tasks or problems should elicit those behaviors that comprise the evidence? The variables in a task model describe features of situations that will be used to elicit performance. A task model provides a framework for characterizing or constructing situations with which a student will interact to supply evidence about targeted aspects of competencies. The main purpose of tasks or problems is to elicit evidence (observable) about competencies (unobservable). The evidence model serves as the glue between the two.
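As a rough, non-authoritative illustration of how the three models fit together, the sketch below represents a competency model as beliefs over variables, a task model as features of an eliciting situation, and an evidence model as weighted links from observables back to competency variables. The competency names echo the teamwork example above, but the specific observables, weights, and update rule are invented for the example.

```python
# Competency model: the unobservable variables we want to make claims about,
# held as a current belief (an instantiated "student model" for one learner).
competency_model = {
    "teamwork": 0.5,
    "cooperating": 0.5,
    "negotiating": 0.5,
}

# Task model: features of a situation designed to elicit relevant behavior.
task_model = {
    "id": "negotiation-quest-1",
    "features": {"requires_partner": True, "time_limit_s": 600},
    "elicits": ["cooperating", "negotiating"],
}

# Evidence model: which observable behaviors bear on which competency
# variables, and how strongly (weights here are purely illustrative).
evidence_model = {
    "shared_resources_with_partner": {"cooperating": 0.8},
    "reached_compromise": {"negotiating": 0.9, "teamwork": 0.4},
}

def score_observation(observable: str, value: float, beliefs: dict, rate: float = 0.2) -> None:
    """Nudge each linked competency belief toward the observed value."""
    for competency, weight in evidence_model.get(observable, {}).items():
        beliefs[competency] += rate * weight * (value - beliefs[competency])

# A player completes the task and produces two observables; the student
# model is updated without interrupting play.
score_observation("shared_resources_with_partner", 1.0, competency_model)
score_observation("reached_compromise", 1.0, competency_model)
print(competency_model)
```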
There are two main reasons why we believe that the ECD framework fits well with the assessment of learning in digital games. First, in digital games, people learn in action (Gee 2003; Salen and Zimmerman 2005). That is, learning involves continuous interactions between the learner and game, so learning is inherently situated in context. The interpretation of knowledge and skills as the products of learning therefore cannot be isolated from the context, and neither should assessment. The ECD framework helps us to link what we want to assess and what learners do in complex contexts. Consequently, an assessment can be clearly tied to learners' actions within digital games, and can operate without interrupting what learners are doing or thinking (Shute 2011).

The second reason that ECD is believed to work well with digital games is because the ECD framework is based on the assumption that assessment is, at its core, an evidentiary argument. Its strength resides in the development of performance-based assessments where what is being assessed is latent or not apparent (Rupp et al. 2010). In many cases, it is not clear what people learn in digital games. In ECD, however, assessment begins by figuring out just what we want to assess (i.e., the claims we want to make about learners), and clarifying the intended goals, processes, and outcomes of learning.

Accurate information about the student can be used to support learning. That is, it can serve as the basis for delivering timely and targeted feedback as well as presenting a new task or quest that is right at the cusp of the student's skill level, in line with flow theory (e.g., Csikszentmihalyi 1990) and Lev Vygotsky's (1978) zone of proximal development.
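A minimal sketch of that formative use, choosing the next task so its difficulty sits just above the current competency estimate, is shown below. The task list, difficulty scale, and "stretch" margin are assumptions for illustration, not a prescribed algorithm from the reading.

```python
def pick_next_task(skill_estimate: float, tasks: list[dict], stretch: float = 0.1) -> dict:
    """Pick the task whose difficulty is closest to slightly above the current
    estimate, so the challenge sits near the cusp of the player's ability."""
    target = min(skill_estimate + stretch, 1.0)
    return min(tasks, key=lambda t: abs(t["difficulty"] - target))

# Hypothetical task pool with difficulties on a 0-1 scale.
tasks = [
    {"id": "lever-intro", "difficulty": 0.2},
    {"id": "pendulum-swing", "difficulty": 0.55},
    {"id": "double-ramp", "difficulty": 0.8},
]

# With a current estimate of 0.45, the player gets the 0.55-difficulty task:
# hard enough to stretch, not so hard as to frustrate.
print(pick_next_task(0.45, tasks)["id"])
```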
As discussed so far, there are good reasons for using games as assessment vehicles to support learning. Yet Diego Zapata-Rivera and Malcolm Bauer (2011) discuss some of the challenges relating to the implementation of assessment in games, such as the following:

• Introduction of construct-irrelevant content and skills. When designing interactive gaming activities, it is easy to introduce content and interactions that impose requirements on knowledge, skill, or other attributes (KSA) that are not part of the construct (i.e., the KSAs that we are not trying to measure). That is, authenticity added by the context of a game may also impose demands on irrelevant KSAs (Messick 1994). Designers need to explore the implications for the type of information that will be gathered and used as evidence of students' performance on the KSAs that are part of the construct.

• Interaction issues. The nature of interaction in games may be at odds with how people are expected to perform on an assessment task. Making sense of issues such as exploring behavior, pacing, and trying to game the system is challenging, and has a direct link to the quality of evidence that is collected about student behavior. The environment can lend itself to interactions that may not be logical or expected. Capturing the types of behaviors that will be used as evidence and limiting other types of behaviors (e.g., repeatedly exploring visual or sound effects) without making the game dull or repetitive is a challenging activity.

• Demands on working memory. Related to both the issues of construct-irrelevant variance (i.e., when the test contains excess variance that is irrelevant to the interpreted construct; Messick 1989) and interaction with the game is the issue of demands that gamelike assessments place on students' working memory. By designing assessments with higher levels of interactivity and engagement, it's easy to increase cognitive processing demands in a way that can reduce the quality of the measurement of the assessment.

• Accessibility issues. Games that make use of rich, immersive graphic environments can impose great visual, motor, auditory, and other demands on the player to just be able to interact in the environment (e.g., sophisticated navigation controls). Moreover, creating environments that do not make use of some of these technological advances (e.g., a 3-D immersive environment) may negatively affect student engagement, especially for students who are used to interacting with these types of games. Parallel environments that do not impose the same visual, motor, and auditory demands without changing the construct need to be developed for particular groups of students (e.g., students with visual disabilities).

• Tutorials and familiarization. Although the majority of students have played some sort of video game in their lives, students will need support to understand how to navigate and interact with the graphic environment. Lack of familiarity with navigation controls may negatively influence student performance and student motivation (e.g., Lim, Nonis, and Hedberg 2006). The use of tutorials and demos can support this familiarization process. The tutorial can also be used as an engagement element (see, e.g., Armstrong and Georgas 2006).

• Type and amount of feedback. Feedback is a key component of instruction and learning. Research shows that interactive computer applications that provide immediate, task-level feedback to students can positively contribute to student learning (e.g., Hattie and Timperley 2007; Shute 2008; Shute, Hansen, and Almond 2008). Shute (2008) reviews research on formative feedback and identifies the characteristics of effective formative feedback (e.g., feedback should be nonevaluative, supportive, timely, specific, multidimensional, and credible). Immediate feedback that results from a direct manipulation of objects in the game can provide useful information to guide exploration or refine interaction strategies. The availability of ongoing feedback may influence motivation and the quality of the evidence produced by the system. Measurement models need to take into account the type of feedback that has been provided to students when interpreting the data gathered during their interaction with the assessment system.

• Handling dependencies among actions. Dependencies among actions/events can be complex to model and interpret. Assumptions of conditional independence required by some measurement models may not hold in complex interactive scenarios. Designing scenarios carefully can help reduce the complexity of measurement models. Using data-mining techniques to support evidence identification can also help with this issue.

In addition to these challenges, in order to make scalable assessments in games, we need to take into account operational constraints and support the need for assessment information by different educational stakeholders, including students, teachers, parents, and administrators. Stealth assessment addresses many of these challenges. The next section describes stealth assessment and offers a sample application in the area of Newtonian physics.