AI and Assessment: Where We Are Now
- Because technology can capture how students create assignments, teachers can assess the learner’s process as well as the learning outcome.
- While autograding remains controversial, it frees up time that professors can redirect to other activities that enhance student learning.
- Because technology can create a limitless number of tests covering the same material, professors can give each student a personalized exam, which reduces the chances of cheating.
How can business schools use artificial intelligence (AI) to improve the assessment process in a way that boosts student learning? That question was at the heart of a March webinar conducted by AACSB’s Online Learning Affinity Group. “Ensuring Online Assessment Rigor and Integrity in the Age of AI Advancements” was moderated by Bora Ozkan, associate professor of instruction at Temple University’s Fox School of Business and Management in Philadelphia.
Panelists included Jeff Rients, associate director of teaching and learning innovation at Temple University’s Center for the Advancement of Teaching; Tawnya Means, assistant dean for educational innovation and chief learning officer at Gies College of Business at the University of Illinois Urbana-Champaign; and Gonzalo Tudela of Examind AI, a student assessment software company based in Vancouver, British Columbia, Canada.
While AI offers impressive possibilities for student learning and assessment, the speakers noted that such applications are still in the nascent stage. “We’re in a Wild West feeling-out period,” said Rients. “Any conclusions we reach today are tentative.”
All groundbreaking technology follows a similar pattern, outlined in the Gartner Hype Cycle, where it skyrockets to inflated expectations, then falls to the “trough of disillusionment,” and—finally—achieves productive outcomes, said Tudela. “We’re currently at the peak of inflated expectations where everyone is expecting AI to completely change the world,” he said. “It might. It just might not happen tomorrow.”
Even so, the panelists all agreed that schools must begin exploring available AI edtech—not only because these technologies can enhance student learning, but because similar applications are in use in today’s workplaces. If schools don’t use AI in the classroom, warned Means, they open the door for students “to use the tools in ways that are detrimental to their learning. It’s important that they learn how to use these tools … in a way that maps to what they will need to do in the future.”
The Case for Collaboration
The current state of the market might best be described as experimental, as both schools and edtech companies try to determine what’s working. “The only successful path is to quadruple or 10X our number of experiments,” said Tudela—and to share each result regardless of the outcome.
“We should be talking about our failures,” Tudela said. “There’s just as much benefit for people who are working on discovery and innovation to know what doesn’t work as what does work.” If educators and instructional designers don’t share their failures, he said, “ten institutions might be running the same experiment and painfully learning the same lessons.”
It’s also critical for companies and universities to collaborate and communicate so that, together, they can “iterate through to a better solution,” said Means. While she doesn’t expect vendors to produce the perfect products right out of the box, she looks for tools that are “broadly generalizable” and that can be adapted to the school’s specific needs.
Rients agreed, saying, “One of the silver linings of this development is going to be a more dialogic relationship between educators and vendors—having the discussion of ‘This doesn’t quite work for our particular environment. What changes can you make?’”
Experiments and Accessibility
Schools are already experimenting with ways to integrate AI into the classroom. For instance, in one course at Temple, students use free ChatGPT accounts when they brainstorm and write rough drafts, said Rients. As part of their assignments, students must submit transcripts of their conversations with ChatGPT as well as reflections on how the process worked and whether the tool was helpful.
In another class at Temple, Ozkan requires his MBA students to use ChatGPT to summarize a case. Then they add line-by-line comments about where they agree and disagree with the chatbot, before making a final analysis. As an additional step, student peers review each other’s work.
Gies is experimenting with a chatbot tool trained on videos from the school’s MOOC courses. Students can ask the chatbot for more information about a topic, and it will present them with a relevant video segment as well as related clips. The chatbot enables them to prepare for an exam or “just better understand the concepts in the course,” said Means.
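A chatbot like the one Means describes can be approximated with standard text-retrieval tooling. The sketch below is illustrative only, assuming transcripts have already been split into timestamped segments; the segment data and function name are invented for the example, not the actual Gies system.

```python
# Illustrative sketch: match a student's question to the most relevant
# video segments using TF-IDF similarity over transcript text.
# The segment data below is hypothetical, not the Gies system.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

segments = [
    {"video": "Intro to NPV", "start": "02:10",
     "text": "Net present value discounts future cash flows to today."},
    {"video": "Capital Budgeting", "start": "11:45",
     "text": "Compare NPV and IRR when ranking mutually exclusive projects."},
]

def find_segments(question: str, segments: list[dict], top_k: int = 2):
    """Return the top_k transcript segments most similar to the question."""
    corpus = [s["text"] for s in segments]
    vectorizer = TfidfVectorizer().fit(corpus + [question])
    seg_vecs = vectorizer.transform(corpus)
    q_vec = vectorizer.transform([question])
    scores = cosine_similarity(q_vec, seg_vecs)[0]
    ranked = sorted(zip(scores, segments), key=lambda p: p[0], reverse=True)
    return [seg for _, seg in ranked[:top_k]]

print(find_segments("How do I rank projects by NPV?", segments))
```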
But for an AI application to be useful—and fair—in the classroom, it must be equally accessible to all students, so that no one is disadvantaged when required to use it, said Means. It’s also critical that the tool not create so much anxiety among students that it harms the learning experience.
If schools don’t provide access to AI tools, Tudela warned, they’ve effectively ensured inequality in the classroom. Students who can afford it will pay for GPT-4; students who can’t will either use the free version or will make do without any AI assistance at all. “Equal access is one of the fundamentals for AI in education,” he added.
Assessments and Improvements
While students are using chatbots to complete assignments, instructors are using AI tools to assess student work. What’s most exciting is that such tools make it possible for instructors to assess not just the final product, such as an article or a case analysis, but the process a student used to create it. This enables professors to make both ongoing formative assessments and end-of-process summative assessments, said Means.
For instance, a tool such as GPT-4 records the chat histories between students and the application, including when and how students imported or rewrote text. It tracks what prompts students used and how they uncovered and incorporated information—in other words, it provides proof of process. This allows teachers to see and explain to students where they might have gone wrong.
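In practice, “proof of process” amounts to a timestamped, append-only record of every action in a session. The minimal sketch below uses hypothetical field names and does not reflect how GPT-4 or any specific tool stores its histories.

```python
# Minimal sketch of a "proof of process" log for student-chatbot sessions.
# Field names are hypothetical; real tools capture far richer detail.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ProcessEvent:
    student_id: str
    action: str        # e.g., "prompt", "ai_response", "paste", "edit"
    content: str
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

class ProcessLog:
    """Append-only record an instructor can review alongside the final draft."""
    def __init__(self):
        self.events: list[ProcessEvent] = []

    def record(self, student_id: str, action: str, content: str) -> None:
        self.events.append(ProcessEvent(student_id, action, content))

    def history(self, student_id: str) -> list[ProcessEvent]:
        return [e for e in self.events if e.student_id == student_id]

log = ProcessLog()
log.record("s123", "prompt", "Summarize the case in five bullet points.")
log.record("s123", "edit", "Rewrote bullet 3 in my own words.")
```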
“Instead of teaching to the end result, you can teach to the process of critical thinking,” said Tudela. In the future, he predicted, instructors will be able to use a range of AI products that reveal a student’s entire creation process. “Assessment will be a combination of autograding, some proof of effort, and some keystroke dynamics to determine transcription versus original thought,” he said.
Yet autograding remains a controversial aspect of AI, with some critics complaining that it isn’t fair. For instance, Tudela conceded that most educators would balk at an autograding tool that promises an accuracy rate of 90 percent. But he would encourage them to compare that to the consistency rate of five different TAs doing the grading for a large class. “If I can provide something that’s 10 percent more consistent than five TAs, that shifts the perspective. That’s actually an improvement to what you’re currently doing,” he said.
Another concern is that, when professors employ autograding tools, schools might not be meeting the U.S. Department of Education’s requirement that students have “regular and substantive interaction” with professors and peers, or equivalent standards in other nations. However, Means argued that if AI frees up more of a professor’s time, the teacher ultimately can provide faster, better, and more personalized feedback to students.
For instance, if she trains an AI application to grade papers according to her expectations, it can provide feedback to students repeatedly as they submit, revise, and resubmit assignments—a process she doesn’t have enough time to manage on her own. “They can get immediate feedback at 12:01 a.m., when they need it and I’m not available,” Means said. If students are still struggling, she can provide personalized tutoring.
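Means’s workflow of training a tool on her grading expectations maps onto a common rubric-in-the-prompt pattern. Below is a minimal sketch using the OpenAI chat completions API; the rubric text and model choice are placeholders, not the tool Means actually uses.

```python
# Hedged sketch: automated formative feedback against an instructor's rubric.
# Assumes the openai Python package and an OPENAI_API_KEY in the environment;
# the rubric and model are placeholders, not Means's actual setup.
from openai import OpenAI

client = OpenAI()

RUBRIC = """Score each criterion 1-5 and explain briefly:
1. Thesis is specific and arguable.
2. Evidence supports each claim.
3. Analysis goes beyond summary."""

def give_feedback(draft: str) -> str:
    """Return rubric-based feedback a student can act on before resubmitting."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": f"You are a writing tutor. Grade with this rubric:\n{RUBRIC}"},
            {"role": "user", "content": draft},
        ],
    )
    return response.choices[0].message.content

# Students can submit, revise, and resubmit at any hour:
# print(give_feedback(open("draft.txt").read()))
```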
Rients concurred, saying that if he saved time by having AI grade papers, he could spend those hours recording a video that provided global feedback about the results he saw and what students could be doing better. He also pointed out that teachers can choose which assignments to autograde and which ones to grade by hand. Instructors can provide in-depth personal feedback on critical assignments and use autograding on exercises that don’t need the same level of attention.
There is also evidence that AI-driven assessments can encourage students to work harder to master the material. For instance, research shows that when a practice exam is identical on every attempt, students make only 1.2 to 1.6 attempts before taking the actual test.
However, Tudela described a University of Iowa experiment in which students had access to practice exams with unlimited variations—which meant the tests were different every time students took them. In this case, they averaged 4.3 attempts. Because questions were different each time, students had to approach them differently and were more likely to master the material.
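Unlimited variation does not require exotic technology: even simple parameterized question templates, seeded per student and per attempt, produce a different practice exam every time. A toy sketch, with an invented finance question for illustration:

```python
# Toy sketch: parameterized question templates yield a fresh practice
# exam on every attempt, so memorizing answers doesn't help.
import random

def npv_question(rng: random.Random) -> dict:
    """One hypothetical finance question; numbers change on every call."""
    cash_flow = rng.randrange(1000, 5000, 100)
    rate = rng.choice([0.05, 0.08, 0.10])
    years = rng.randint(2, 5)
    answer = sum(cash_flow / (1 + rate) ** t for t in range(1, years + 1))
    return {
        "prompt": (f"A project pays {cash_flow} per year for {years} years. "
                   f"At a {rate:.0%} discount rate, what is its present value?"),
        "answer": round(answer, 2),
    }

def build_exam(student_id: str, attempt: int, n_questions: int = 3) -> list[dict]:
    """Seed on (student, attempt) so every exam is unique but reproducible."""
    rng = random.Random(f"{student_id}-{attempt}")
    return [npv_question(rng) for _ in range(n_questions)]

for q in build_exam("s123", attempt=1):
    print(q["prompt"])
```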
But What About Cheating?
That exam experiment highlights one of the great, if counterintuitive, benefits of AI and online assessment: It can mitigate cheating.
When schools began using digital tests, the risk was that questions would be leaked online where students could memorize them before taking exams. But when AI can provide different, personalized tests to every learner, such security breaches become irrelevant.
“We’re well within the realm of possibility where an online test could be posted to every website everywhere and not matter, because these new tools allow us to create new tests based on the content of the course,” said Rients. “If the AI makes a new test for each learner, the fact that 14,000 versions exist in the wild [means] that they all just exist as study tools for students.”
Of course, AI allows students to cheat outside of test-taking, such as when they use chatbots to help them write papers and prepare assignments. Means responds by creating different types of assignments. For instance, instead of asking students simply to summarize information—which any chatbot can do—she has them analyze the summaries. “That moves them farther along on Bloom’s taxonomy,” she said. “It moves the needle in what we’re asking students to do.”
Too often, said Tudela, schools that have introduced digital tools to the classroom have focused too closely on “how do we stop, limit, prevent, or detect when students are doing bad things?” He suggested a better question: How can schools use AI in assessments—and allow students to use AI in coursework—while removing “the value that is gained from cheating activities?”
Predictions and Hopes
Given how quickly AI tech is developing, panelists found it difficult to forecast where it might go next. “I’ve seen some things behind the hood that are quite incredible,” said Tudela. Today, he said, different AI applications are designed for different tasks, but he speculates that one day companies could create artificial general intelligence—“one model to rule them all.”
Rients is similarly uncertain about what’s ahead. “Some days I feel we’ve passed the event horizon of the technological singularity, and any attempt to predict the future is woefully inadequate.” Even so, educators must continually pay attention to how organizations are using these tools so that schools can prepare students to take jobs at those companies. “Even if our students are only experimenting with these tools, we want to give them the edge,” he said.
Means hopes that improved AI learning tools will allow schools to solve “Bloom’s 2 sigma problem”: Benjamin Bloom’s finding that students who receive one-on-one tutoring outperform peers taught in conventional classrooms by two standard deviations.
“If we can do scalable personalization in assessment, we can provide learners with opportunities to have individualized tutoring experiences,” she said. Through such an approach, students become more confident and more excited about learning, and “we’re able to address what we’ve struggled with in all of education.”
Continue the discussion by posting your ideas in the Online Learning Affinity Group community on the AACSB Exchange and tagging Tawnya Means and Jeff Rients in your comments.