Xiaoming Zhai, University of Georgia – Can AI Think Like a Teacher?

Want to grade your students faster? You can use ChatGPT – but there may be downsides.

Xiaoming Zhai, associate professor of science education and artificial intelligence at the University of Georgia, discusses the potential risks.

Xiaoming Zhai, Associate Professor in Science Education, Artificial Intelligence, Computer Science (courtesy), and Statistics (courtesy), serves as Director of the AI4STEM Education Center and the National Center on Generative AI for Uplifting STEM+C Education (National GENIUS Center) at the University of Georgia. He was a Humboldt Fellow and Visiting Professor at the Leibniz Institute for Science and Mathematics Education. He is Founding Co-Chair of the National Association for Research in Science Teaching (NARST)’s Research Interest Group RAISE (Research in AI-involved Science Education) and Co-Chair of the NSF-funded Advancing AI in Science Education (AASE) committee. He recently co-edited the Oxford University Press book Uses of AI in STEM Education.

Can AI Think Like a Teacher?

 

AI is starting to transform how we grade students’ work, especially in science education. In this study, my team and I looked at how large language models—like ChatGPT—can help teachers score students’ written responses. What we found is both promising and cautionary.

On the one hand, AI can make grading much faster. It can assign scores quickly, which can really help teachers give feedback to students more efficiently. But here’s the catch: AI often takes shortcuts. Instead of truly reasoning through a student’s answer, it tends to latch onto keywords. That means it might give a correct score, but for the wrong reasons. This is a big concern—because if students figure that out, they might start writing answers that just “sound right” instead of showing real understanding.

We also explored ways to help AI grade more like a human. What really made a difference was giving AI detailed rubrics—those scoring guides teachers use when grading. When we did that, the AI’s scores got more accurate and more aligned with human teachers. But surprisingly, when we gave AI examples of student answers with scores, it actually did worse. It started focusing on keywords instead of using real reasoning.
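To make the idea concrete, a rubric-conditioned grading setup can be sketched as a function that folds the full rubric into the model's prompt, so the grader is asked to reason against scoring criteria rather than match keywords. This is a minimal illustration, not the study's actual implementation; the function name, rubric text, and sample item are all hypothetical:

```python
def build_scoring_prompt(rubric: dict, question: str, response: str) -> str:
    """Assemble a rubric-conditioned prompt for an LLM grader.

    Supplying every rubric level (not just example answers) nudges the
    model to justify a score against the criteria instead of latching
    onto surface keywords.
    """
    rubric_lines = "\n".join(
        f"Score {level}: {descriptor}" for level, descriptor in sorted(rubric.items())
    )
    return (
        "You are grading a student's written science response.\n"
        f"Question: {question}\n"
        f"Student response: {response}\n"
        "Rubric:\n"
        f"{rubric_lines}\n"
        "Explain which rubric level the response meets, then give the score."
    )

# Hypothetical rubric for a conservation-of-mass item
rubric = {
    0: "No relevant explanation.",
    1: "Mentions mass but gives no mechanism.",
    2: "Explains that atoms are rearranged, not created or destroyed.",
}
prompt = build_scoring_prompt(
    rubric,
    "Why is mass conserved in a chemical reaction?",
    "Because the atoms just rearrange into new molecules.",
)
```

The resulting `prompt` string would then be sent to whichever language model is doing the scoring; asking for an explanation before the score is one way to surface the model's reasoning for teacher review.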

What this tells us is that while AI can definitely speed up grading, we have to be careful. Teachers’ expertise is still essential to guide AI in the right direction. AI can help, but it’s not a replacement for human judgment. As I often say, AI is a tool, not a teacher. We need to design these systems thoughtfully, so they support learning instead of getting in the way.

Read More:

Unveiling Scoring Processes: Dissecting the Differences Between LLMs and Human Graders in Automatic Scoring

 

AI and Formative Assessment: The Train Has Left the Station

 

Uses of Artificial Intelligence in STEM Education. Oxford University Press

 

A Multimodal Interactive Framework for Science Assessment in the Era of Generative Artificial Intelligence (Wiley)
