AI for Education: A Living Catalog of Emerging Practice
What is the Living Catalog?
The Living Catalog of Generative AI for Education is an evolving resource designed to support both researchers and educators in exploring how AI is being used in teaching and learning.
Developed by the Center for Learning Sciences at EPFL, the catalog maps a range of applications of generative AI in education. The SAMR framework (Substitution, Augmentation, Modification, Redefinition; Puentedura, 2009; Belkina et al., 2025) provides a useful lens for interpreting this spectrum. In Substitution, AI performs traditional tasks without changing their fundamental nature. Augmentation improves the efficiency or quality of processes, whereas Modification introduces substantial changes to how the process is carried out. Finally, Redefinition enables entirely new educational experiences that were previously unattainable.
Our goal is to point users to relevant scientific literature and examples of real-world practice. By doing so, we aim to provide both a starting point and practical guidance for those seeking to implement generative AI in pedagogically sound ways. We do not evaluate the quality of the research itself.
🧭 Expert-defined practices We identify key trends in the educational use of generative AI, grounded in research, theory and practice.
📚 Evidence from the Field We use open-access scientific literature to bring you evidence-based insights from real-world AI applications in educational settings.
🤖 AI-Powered Distillation We use large language models to synthesize vast amounts of information, making complex research data more accessible and usable.
⚡ Automatic Updating Pipeline The entire process from querying academic databases to extracting structured data is designed for a high degree of automation, systematicity and traceability. This enables rapid updates in a fast-changing field.
🧠 Critical Use Encouraged We do not verify the quality of the referenced studies. Users are encouraged to read the original articles and apply their own critical judgment when interpreting findings.
🔎 Explore each emerging practice in its dedicated space, featuring:
Synthesis: Discover key applications and outcomes with links to relevant papers.
Table: Browse a comprehensive table of real-world applications to see how these practices are being implemented today.
Emerging AI for Education practices
Here’s a quick overview of the practices currently included in the catalog, along with a short description of each one.
Practice | Description |
---|---|
Assessment Design | Generative AI is used to support the design of educational assessments, for example to generate questions or personalize assessments. |
Feedback | Generative AI is used for providing formative feedback on student productions, such as essays, code, or answers to (exam) questions. |
Assessment Design
Generative AI is used to support the design of educational assessments, for example to generate questions or personalize assessments.
Applications
Generative AI is being widely adopted across disciplines to substitute, augment, modify and redefine various facets of assessment design in education. At the substitution level, it is primarily used for generating assessment items such as multiple-choice questions (MCQs), exercises, and practice materials. Augmentation involves enhancing assessments through alignment with learning objectives and integration of pedagogical frameworks. Modification changes how assessments are administered, through automatic question generation and adaptive or personalized assessments. Redefinition changes the nature of assessments more deeply and encompasses innovative approaches like interactive assessments and simulations.
Substitution: Creating Assessment Items & Exercises
Creating multiple-choice questions (MCQs) is the most prevalent application. Studies span multiple domains, with the medical domain particularly well represented. They document the use of generative AI to produce medical questions 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 and clinical cases 20, 21, 22; computer science questions 23, 24, 25, 26, 17, 27, 28 and exercises 29, 30, 31, 32; language questions 33, 34 and exercises 35; mathematics and physics questions 36, 37, 38, 39; business and law questions 40, 41 and exercises 42; and K-12 curriculum-based questions 43, 44, 45.
Augmentation: Enhanced Assessment Design
Generative AI can be used to augment traditional assessment design processes, supporting question refinement 46, 16, 12, and alignment with learning objectives 26, 46, 47, 48, 49, or integration of pedagogical design frameworks such as Bloom’s taxonomy and Mayer’s Multimedia Theory.
Bloom’s Taxonomy serves as a common design framework to align AI-generated questions with appropriate cognitive levels. It is used both to guide question generation 58, 59, 60, 25, 41, 26, 46, 37, 30, 43, 61, 62, 39 and to evaluate the cognitive demand of AI-generated questions 2, 40, 3, 8, 34, 63, 6. A critical finding is that AI tends to default to lower-order cognitive levels unless guided by carefully engineered prompting strategies 3.
Mayer’s Multimedia Theory offers another lens for augmentation by integrating multimodal elements into assessments 64. Even when not explicitly referenced, many studies implicitly apply Mayer’s principles by incorporating images, diagrams, and other visual representations to support assessments, which aligns with Mayer’s principles of dual coding 65, 66, 38.
Modification: Automatic & Adaptive Assessments
Generative AI also enables more profound modifications in assessment by automating question generation, allowing for on-demand delivery in contexts such as online learning and interactive quiz platforms 50, 51, 28. It also supports personalized assessment content 35, 30 and adaptive question selection 52.
Redefinition: Interactive Assessments & Simulations
Finally, GenAI is redefining assessment design through innovative approaches that go beyond traditional formats, by creating interactive assessments in the medical field, such as oral examination dialogues 53, 54 or patient simulations 55. Beyond the medical field, GenAI is also being used to create interactive assessments in other domains, such as intercultural communication 56, where it simulates communication tasks, and in cybersecurity, where it has been used to simulate crisis scenarios 57.
Measures & Outcomes
Several studies evaluated AI-generated items using measures such as difficulty indices, reliability coefficients, content validity, discrimination parameters, and alignment with learning objectives 1, 67, 2, 25, 68, 3, 4, 8, 17, 19, 41, 6, 7, 9, 69, 70, 34, 10, 39, 20, 61, 26, 46, 59.
Across studies, findings on the psychometric quality of AI-generated MCQs were mixed. Several reported that these items tended to be easier than those written by human experts, often reflecting lower cognitive demand and, in some cases, factual inaccuracies and validity concerns 40, 3, 30, 13, 70, 34, 1, 14. Nevertheless, discrimination indices show promise: several studies demonstrated that AI-generated items can effectively differentiate between high- and low-performing students 1, 3, 6, 10, 67.
Limitations & Recommendations
The central challenge is producing high-quality AI-generated assessment content. Output quality is highly dependent on phrasing and model version, necessitating clear, detailed, structured prompts 71, 41, 14. Expert validation procedures are critical to ensure content quality 4, 17, 1, 58, 3, 73. Advanced implementations employ retrieval-augmented generation to improve quality and domain-specific pertinence 72, 17, 58, 46, or multi-agent frameworks that simulate collaborative expert processes, incorporating the generation, checking (e.g., plausibility of distractors), and refinement of items 43, 74, 59.
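The generate-check-refine pattern described above can be sketched roughly as follows. Each "agent" is a pluggable callable (in a real system, an LLM call with its own system prompt); the stub functions and the duplicate-distractor check below are illustrative assumptions, not the implementation of any cited study.

```python
# Sketch of a generate-check-refine loop for assessment items. The three
# stages are pluggable callables standing in for LLM agents with distinct
# system prompts; the checking rule here is a toy example.

def generate_item(topic, generator, checker, refiner, max_rounds=3):
    """Generate an item, then iterate check -> refine until it passes."""
    item = generator(topic)
    for _ in range(max_rounds):
        problems = checker(item)  # e.g., implausible or duplicate distractors
        if not problems:
            return item
        item = refiner(item, problems)
    return item  # best effort after max_rounds

# Minimal stand-in agents for demonstration only.
generator = lambda t: {"stem": f"What is {t}?",
                       "options": ["A", "A", "B", "C"]}  # duplicate distractor
checker = lambda item: (["duplicate options"]
                        if len(set(item["options"])) < len(item["options"])
                        else [])
refiner = lambda item, problems: {
    **item, "options": list(dict.fromkeys(item["options"])) + ["D"]}

item = generate_item("recursion", generator, checker, refiner)
print(item["options"])  # ['A', 'B', 'C', 'D']
```

The loop structure, rather than any single agent, is the point: checking and refining are separated so each stage can be prompted (and validated) independently.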
Conclusion and Future Directions
Generative AI can transform assessment design by enabling rapid and automatic item generation, aligning assessments with learning objectives and pedagogical principles, introducing adaptive delivery and personalization, and supporting entirely new formats such as simulations and interactive tasks. Future work should focus on advancing the quality of AI-generated assessments, for example through refined prompting strategies, retrieval-augmented generation, multi-agent systems, and the consistent use of expert validation as an essential step in assessment development. Research should also expand beyond the current emphasis on multiple-choice questions to explore AI’s potential for creating innovative assessments that incorporate multimodality and interactivity, thereby fully realizing its educational value in assessment design.
Feedback
Generative AI is used for providing formative feedback on student productions, such as essays, code, or answers to (exam) questions.
Applications
Generative AI is being widely adopted across disciplines to substitute, augment, modify, and redefine various facets of feedback provision in education. At the substitution level, it primarily reduces teacher workload by supporting qualitative feedback delivery for assignments and exercises, while augmentation can enhance feedback quality through alignment with established pedagogical frameworks. Modification transforms feedback administration through automation, enabling immediate delivery and possible integration with learning management systems. Redefinition more profoundly reimagines feedback provision, encompassing innovative approaches such as feedback within roleplay scenarios and simulation-based learning environments.
Substitution: Generating Feedback
Given generative AI’s sophisticated linguistic capabilities, the most extensive application has been in providing feedback on written assignments, including for example essays, reports, and translation tasks, where AI systems offer linguistic corrections, structural improvements, and guidance on content quality and argumentation [2], [6], [8], [19], [32], [40], [60], [81], [84], [95], [97], [98], [111], [124], [128], [129], [132], [167], [170], [192], [194], [196], [206], [213], [218], [220], [224], [226], [133], [142], [23], [7], [33], [22], [160], [93], [5], [11], [45], [88].
Feedback on programming assignments represents a natural extension of these linguistic abilities, as programming languages themselves consist of syntax, semantics, and structural rules. Generative AI can identify syntactic or logic errors, and provide feedback on code efficiency, style, readability and documentation [12], [49], [50], [53], [54], [70], [61], [62], [63], [64], [76], [77], [83], [157], [160], [166], [180], [191].
Beyond purely linguistic feedback on natural or programming languages, AI can be employed to provide feedback that draws upon knowledge domains. This involves evaluating students’ understanding of subject matter, assessing the quality of their reasoning and problem-solving approaches, and their disciplinary performance [36], [91], [99], [117], [122], [28], [71], [46], [48], [3], [110], [195], [161], [56], [15], [101], [18], [82].
Augmentation: Enhanced Feedback Design
Generative AI can enhance educational feedback in multiple ways: alignment with specific learning objectives, adherence to established pedagogical frameworks, application of evidence-based feedback principles, incorporation of domain-specific knowledge of errors, implementation of thoughtful design principles, and feedback delivery through diverse modalities and styles [19], [168], [85], [9], [109], [230], [3], [8], [27], [172], [197], [65], [63], [82], [171], [73].
Modification: Automatic Integrated Feedback
The affordance of generative AI to generate adapted content on the fly enables modification of feedback delivery in several ways, including the automation of feedback provision and its integration with learning platforms, and the real-time or immediate delivery of feedback that is personalized or adapted to individual learner needs based on a dynamic representation of the student’s state. These applications are prevalent in the computer science domain [157], [166], [53], [69], [63], [228], [85], [74], [190], [66], [67], [79], and in the educational field more generally [173], [225], [15], [176], [3], [28], [156], [170], [82], [171], [222]. Real-time delivery of feedback is particularly relevant for language learning, where students can get feedback on speaking performance during live language practice [149], [108], [9], [137], [24], [230].
Redefinition: Interactive Feedback and Simulations
Finally, generative AI can redefine feedback integration by creating roleplay and simulation-based learning environments where feedback is either embedded within the interactive experience or delivered immediately afterward. These applications are most prevalent in the medical field [158], [25], [26], [43], [89], [94], [115], [118], [238], [29], [99], with some exceptions in physics [59], [202] and teacher education [13].
Measures and Outcomes
Feedback Quality
A first important outcome is the actual quality of AI-generated feedback itself, and how it compares to feedback created by human experts. Several studies have argued that AI-generated feedback can be effective as a support tool for teachers [22], [160], [5], [73], [220] and/or comparable to or better than human feedback [155], [11], [23], [53], [28], [3], [61], [84], [122]. However, other studies have reported mixed results, noting that AI-generated feedback can be ineffective or less effective in some areas [12], [98], [191], [192], [13], show low agreement with human feedback [41], [6], [86], [18], or be inconsistent and unreliable [156], [62], [76], [50].
Feedback Perception
Beyond the objective quality of AI-generated feedback, students’ subjective experience of this feedback represents a crucial factor in determining its ultimate educational impact. Research reveals that students’ pre-existing attitudes toward AI technology influence their perception of feedback quality beyond the actual content of the feedback itself [169], [168]. Interestingly, students often cannot reliably distinguish between AI-generated and human-generated feedback, frequently rating AI feedback quite favorably when they are unaware of its source [83], [46]. However, student preferences vary considerably when the source is disclosed. Some studies report favorable attitudes toward AI feedback or relatively even preferences between AI and human-generated responses [97], [93], [151]. Conversely, other research indicates that students prefer human feedback and express skepticism toward AI-generated responses [52], [4]. This perception deficit can be partially addressed through the design of natural, flexible interaction experiences [57].
Engagement
Several studies have investigated the impact of AI-generated feedback on student engagement, with many reporting positive effects across cognitive, affective, and behavioral dimensions [141], [194], [44], [222], [176], [15], [59]. This enhanced engagement with feedback has been associated with improved subsequent revision behaviors [66], [72], [167], [32]. However, research also reveals potential drawbacks. AI-generated feedback can overwhelm students cognitively and prove time-consuming to process effectively [7], [15], [40]. Additionally, excessive usage patterns have been linked to decreased performance outcomes [157]. More broadly, some studies suggest that student engagement with feedback remains generally low regardless of source, showing little improvement when AI-generated feedback is provided compared to human feedback [109], [81].
Educational Outcomes
Research examining the actual educational impact of AI-generated feedback has mostly painted an encouraging picture. In writing and language learning contexts, AI feedback has been associated with improved skills and performance [95], [196], [45], [114], [128], [193], [226], [154], [140], [142], [184], [108], [171]. Similar positive outcomes extend beyond language domains, with studies documenting enhanced learning performance across various educational disciplines [89], [201], [85], [15] as well as improved academic achievement [222], [172], [79], [97], sometimes even suggesting that AI feedback may help reduce disparities in learning outcomes [176]. These benefits are often mediated by increased student engagement [235], [123], [59]. However, these positive findings are not universal. Some studies have explicitly reported a lack of significant learning gains from AI-generated feedback [56], [157].
Beyond domain-specific learning outcomes, AI-generated feedback has been linked to various psychological and metacognitive benefits. These include increased motivation [154], [85], [94], reduced anxiety [42], [144], enhanced autonomy and perceived competence or self-efficacy [44], [165], [219], [137], and improvements in self-regulated learning [9], [172].
Limitations and Recommendations
Enhancing AI Feedback
A critical challenge is countering generic AI-generated feedback that lacks both domain-specificity and pedagogical effectiveness. To address this limitation, researchers have explored several promising approaches. Some studies implement retrieval-augmented generation (RAG) to ensure that feedback is contextually appropriate by drawing from relevant educational resources and domain-specific knowledge bases [48], [49], [230]. Other research has focused on fine-tuning language models using specialized datasets tailored to particular academic disciplines, or employing reinforcement learning techniques to guide models toward producing feedback that aligns with educator objectives [207], [230], [3]. These approaches aim to move beyond one-size-fits-all feedback toward more specialized, contextually appropriate responses.
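As an illustration of the retrieval-augmented idea, the sketch below ranks snippets from a small knowledge base by simple word overlap (a stand-in for the embedding-based retrieval a real RAG system would use) and folds the best matches into a feedback prompt. All names, data, and the ranking heuristic are hypothetical.

```python
# Minimal sketch of retrieval-augmented feedback: retrieve the most relevant
# course-material snippets, then include them in the feedback prompt so the
# model grounds its feedback in domain-specific content.
import re

def retrieve(query, knowledge_base, k=2):
    """Rank snippets by word overlap with the query (embedding stand-in)."""
    words = lambda s: set(re.findall(r"\w+", s.lower()))
    q = words(query)
    ranked = sorted(knowledge_base, key=lambda s: len(q & words(s)), reverse=True)
    return ranked[:k]

def build_prompt(submission, snippets):
    """Assemble a feedback prompt that cites the retrieved material."""
    context = "\n".join("- " + s for s in snippets)
    return (f"Using this course material:\n{context}\n\n"
            f"Give formative feedback on:\n{submission}")

kb = [
    "Loops repeat a block of code.",
    "Every function should have a docstring.",
    "Variables store values.",
]
best = retrieve("my function has no docstring", kb)
print(best[0])  # Every function should have a docstring.
```

In a production system the overlap heuristic would be replaced by vector search over a curated repository, and `build_prompt`'s output would be sent to the model.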
Multi-step AI systems can further enhance feedback quality by decomposing the complex feedback generation process into distinct, specialized stages. Rather than attempting to evaluate student work and generate pedagogical guidance simultaneously, these systems can separate submission assessment from instructional feedback delivery. For instance, one component might focus exclusively on evaluating content quality and another on writing quality, while a subsequent stage incorporates pedagogical principles to craft appropriate feedback. A third validation layer can then assess the generated feedback against established quality criteria to ensure accuracy and educational effectiveness. This modular approach can be implemented through various architectural strategies. Multi-component systems may combine LLM-based modules with traditional rule-based components, capitalizing on the strengths of each approach. Alternatively, multi-agent frameworks can deploy several instances of a language model, each guided by distinct system prompts that define specific roles in the feedback pipeline. Some implementations strategically match different LLMs to particular tasks based on their relative strengths [110], [60], [68], [163], [53], [99], [230].
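A minimal sketch of such a staged pipeline is shown below, with each stage as a pluggable callable; in practice each stage would be an LLM call guided by its own system prompt. The stand-in stages and the validation rule are illustrative assumptions only.

```python
# Sketch of a multi-step feedback pipeline: stage 1 assesses the submission,
# stage 2 crafts pedagogical feedback from that assessment, and stage 3
# validates the generated feedback against quality criteria.

def multi_step_feedback(submission, assess, craft, validate):
    """Run assess -> craft -> validate and report the validation result."""
    assessment = assess(submission)               # stage 1: evaluate the work
    feedback = craft(submission, assessment)      # stage 2: craft the feedback
    return {"feedback": feedback,
            "validated": validate(feedback)}      # stage 3: quality check

# Minimal stand-in stages for demonstration (each would be an LLM call).
assess = lambda code: {"issues": [] if '"""' in code else ["missing docstring"]}
craft = lambda code, a: ("Well documented!" if not a["issues"]
                         else "Consider addressing: " + ", ".join(a["issues"]))
validate = lambda fb: bool(fb and len(fb) < 500)  # non-empty, not overwhelming

result = multi_step_feedback("def f(x): return x", assess, craft, validate)
print(result["feedback"])  # Consider addressing: missing docstring
```

Separating the stages in this way is what allows mixed architectures: any stage can be swapped for a rule-based component, a different LLM, or a differently prompted instance of the same model.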
Designing Effective Human-AI Interaction
Studies consistently find that hybrid human-AI workflows, where teachers curate and moderate LLM output, are preferable to fully autonomous tutors. These approaches combine scalable automation with pedagogical expertise, offering complementary strengths where teachers can for example address higher-order concerns while AI handles surface-level feedback [173], [230], [167], [161].
Furthermore, effective design requires balancing educational goals with practical usability while navigating inherent tensions between learning effectiveness and user satisfaction. Key principles include promoting transparency about AI limitations, maintaining instructor control, and creating natural, flexible interaction experiences [63], [57]. Systems must also leverage generative AI’s capabilities to promote inclusion and ensure equitable access for diverse learner populations [225], [208].
However, preventing over-reliance on AI tools remains crucial to avoid diminishing critical thinking and creativity. Both educators and students require comprehensive training in AI literacy and feedback literacy to ensure responsible and effective use of these technologies. These efforts should operate within regulatory frameworks that prioritize responsible use and support ethically grounded pedagogical implementations rather than purely technical solutions [198], [203], [159], [223], [232], [214], [174], [153].
System Implementation Recommendations
Successful deployment requires attention to system scalability and performance optimization to handle educational workloads effectively [54], [20], [228], [171]. Furthermore, privacy compliance represents a critical concern, as educational data requires special protection measures [230]. To address privacy concerns while maintaining functionality, institutions should consider implementing open-source LLMs that allow for local deployment and greater data control [70], [77], [160].
Conclusion and Future Directions
Generative AI can transform educational feedback by enabling immediate feedback that is aligned with pedagogical principles, and supporting entirely new formats such as feedback within simulations and interactive feedback. Future work should focus on advancing feedback quality through refined approaches such as retrieval-augmented generation and multi-step systems, with consistent expert validation as an essential component. Research should also emphasize AI’s potential for creating innovative feedback delivery methods that incorporate simulation and real-time interaction.
About the Catalog Creation
This catalog is built through a systematic, multi-stage process including data retrieval and processing, combining AI-powered analysis and essential human oversight. Our goal is to create a dynamic resource that is regularly updated.
Open-Access Scientific Literature
Our process begins by casting a wide net to capture all potentially relevant research.
Exhaustive Search: For each emerging practice, we develop tailored search queries to retrieve references from three major academic databases: Web of Science, Scopus, and Semantic Scholar. This initial step typically yields thousands of references.
Full-Text Filtering: We then narrow this pool to focus exclusively on open-access articles. This ensures we can analyze the complete full-text content, which is essential for in-depth analysis. This filtering step typically reduces the list to about one-third of the original references, for which we are able to collect an open-access full text.
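The deduplication-and-filtering step might look roughly like the sketch below. The reference fields (`doi`, `is_open_access`) are illustrative placeholders, not the catalog's actual schema.

```python
# Sketch of the filtering step: deduplicate references retrieved from
# multiple databases by DOI, then keep only those with an open-access
# full text available for analysis. Field names are illustrative.

def filter_references(references):
    """Deduplicate by DOI (case-insensitive), keep open-access entries."""
    seen = set()
    kept = []
    for ref in references:
        doi = (ref.get("doi") or "").lower()
        if not doi or doi in seen:
            continue  # skip duplicates and entries without a DOI
        seen.add(doi)
        if ref.get("is_open_access"):
            kept.append(ref)
    return kept

refs = [
    {"doi": "10.1/a", "is_open_access": True},
    {"doi": "10.1/A", "is_open_access": True},   # duplicate across databases
    {"doi": "10.1/b", "is_open_access": False},  # no open-access full text
]
print(len(filter_references(refs)))  # 1
```

Because the same paper is often indexed by Web of Science, Scopus, and Semantic Scholar alike, deduplication has to happen before the open-access filter is applied.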
AI-Powered Distillation
Once we have a collection of full-text articles, we leverage Large Language Models (LLMs) to perform a detailed, multi-step analysis.
Relevance Screening: An LLM reviews each article abstract to determine its relevance. Only studies where the emerging practice is a primary focus of the research article are selected for further analysis.
Variable Extraction & Coding: An LLM then analyzes each selected study, extracting key variables of interest such as the description of the application, the pedagogical frameworks used, and the reported outcomes. This yields a comprehensive, filterable application table that can be explored in the catalog.
Synthesis creation: Finally, we create a narrative synthesis based on the structured data in the application table, by combining AI categorization with human insights. The synthesis identifies and categorizes the main types of applications and outcomes and links to DOIs of relevant papers.
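To make the relevance-screening step concrete, here is a minimal sketch. `ask_llm` is a pluggable callable standing in for a real model call (the catalog mentions the OpenAI API); the prompt wording and the keyword-based stand-in are illustrative assumptions.

```python
# Sketch of the relevance-screening step: an LLM reads each abstract and
# judges whether the emerging practice is a primary focus of the study.
# `ask_llm` is injected so the logic can be shown without network access.

SCREENING_PROMPT = (
    "Is the practice '{practice}' a primary focus of this study? "
    "Answer YES or NO.\n\nAbstract: {abstract}"
)

def screen_abstract(abstract, practice, ask_llm):
    """Return True if the model judges the practice a primary focus."""
    answer = ask_llm(SCREENING_PROMPT.format(practice=practice,
                                             abstract=abstract))
    return answer.strip().upper().startswith("YES")

# Keyword stand-in for a real model call, for demonstration only.
def fake_llm(prompt):
    abstract = prompt.split("Abstract:", 1)[1]
    return "YES" if "feedback" in abstract.lower() else "NO"

print(screen_abstract("We study AI-generated feedback on essays.",
                      "Feedback", fake_llm))  # True
```

In the real pipeline `ask_llm` would wrap an API call, and only abstracts screened in here proceed to the more expensive full-text variable extraction.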
A note on our tools: We utilize the OpenAI API for these tasks. As per the data processing agreement, none of the data submitted is used for training their models, ensuring the confidentiality of the research analyzed.
Human-in-the-loop
Our workflow is a collaborative system where human expertise is crucial for ensuring the quality and accuracy of the final output. The human-in-the-loop has several critical tasks:
- Schema Definition: Defining and refining the coding schemas that instruct the AI on what information to extract and how to categorize it.
- Verification: Comparing AI-coded data against human-coded samples to validate accuracy and consistency.
- Synthesis: Co-creation of research insights with AI.
⚠️ Important Note: We do not evaluate the scientific quality, methodological rigor, or validity of the studies referenced in this catalog. Our role is to point to relevant literature and synthesize available information—not to certify its credibility. Users must consult the original publications and apply their own critical judgment when interpreting the findings.
Automatic Updating Pipeline
The research pipeline, from querying academic databases to extracting structured data, is designed for a high degree of automation. This enables rapid updates in a continuously evolving field while ensuring systematicity through consistent procedures for collecting, filtering, and analyzing research papers. Each step is traceable via documented parameters (e.g., research queries, analysis prompts) and preserved historical data, and the workflow is reusable across different contexts.
Phase | Description |
---|---|
📥 Phase 1 – Retrieval | Automatically retrieves an up-to-date collection of references matching a research query from Web of Science, Scopus, and Semantic Scholar, then either creates a new dataset or updates an existing one. |
🗂 Phase 2 – Filtering & Full Text | Automatically filters for relevant papers and retrieves their full texts from publishers or open-access repositories. |
🤖 Phase 3 – LLM Analysis | Applies LLM-powered extraction to transform full texts into structured information. |
🛑 Decision checkpoints between these phases allow for crucial human oversight and quality control.
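The three phases and their human checkpoints can be sketched as a simple orchestration function. All phase functions and the `approve` callback below are placeholders for the actual tooling, shown only to illustrate how checkpoints gate progression between phases.

```python
# Sketch of the three-phase pipeline with decision checkpoints: a human
# (via `approve`) can stop the run after retrieval or after filtering.

def run_pipeline(query, retrieve, filter_fulltext, analyze, approve):
    """Retrieval -> filtering & full text -> LLM analysis, with checkpoints."""
    refs = retrieve(query)                      # Phase 1: retrieval
    if not approve("retrieval", refs):
        return None                             # checkpoint: human stops run
    texts = filter_fulltext(refs)               # Phase 2: filtering & full text
    if not approve("filtering", texts):
        return None
    return analyze(texts)                       # Phase 3: LLM analysis

# Placeholder phases for demonstration.
retrieve = lambda q: [{"doi": f"10.1/{i}"} for i in range(4)]
filter_fulltext = lambda refs: refs[:2]  # pretend 2 have open-access text
analyze = lambda texts: {"n_papers": len(texts)}
approve = lambda phase, data: len(data) > 0  # auto-approve non-empty results

print(run_pipeline("generative AI feedback", retrieve,
                   filter_fulltext, analyze, approve))  # {'n_papers': 2}
```

Keeping the checkpoint as an explicit callback is what makes the workflow both automatable (auto-approve for routine updates) and auditable (a human review hook at each phase boundary).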
Continuous Development
This catalog is a living project, not a static report. Our methodology is in a state of continuous development. We view the entire workflow as an iterative process for discovering the most effective ways to obtain high-quality results from LLMs.
We are constantly experimenting with and refining key aspects of the process, including:
- The formulation of our prompts to guide the AI.
- The design and granularity of the coding schemas.
- The methods for pre-processing and presenting full texts to the AI.
- The specific requirements and structure for the final synthesis.
This commitment to iterative improvement means that as we refine our techniques and as AI models evolve, the quality, depth, and reliability of this catalog will continue to grow.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
How to Cite This Catalog
If you use this catalog in your research or work, please use the citation information below.
APA: Uittenhove, K. (2025). AI for Education: A Living Catalog of Emerging Practice (Version v1.0.0). Zenodo. https://doi.org/10.5281/zenodo.15633017
MLA: Uittenhove, Kim. AI for Education: A Living Catalog of Emerging Practice. Version v1.0.0, Zenodo, 10 June 2025, doi:10.5281/zenodo.15633017.
Chicago: Uittenhove, Kim. 2025. “AI for Education: A Living Catalog of Emerging Practice.” Zenodo. https://doi.org/10.5281/zenodo.15633017.
Harvard: Uittenhove, K. (2025) 'AI for Education: A Living Catalog of Emerging Practice', Zenodo. doi: 10.5281/zenodo.15633017.
Vancouver: Uittenhove K. AI for Education: A Living Catalog of Emerging Practice [Internet]. Zenodo; 2025 Jun 10. Available from: https://doi.org/10.5281/zenodo.15633017
IEEE: K. Uittenhove, “AI for Education: A Living Catalog of Emerging Practice,” Zenodo, Jun. 10, 2025. doi: 10.5281/zenodo.15633017.
@misc{uittenhove_2025,
author = {Uittenhove, Kim},
title = {{AI for Education: A Living Catalog of Emerging Practice}},
month = {June},
year = {2025},
publisher = {Zenodo},
version = {v1.0.0},
doi = {10.5281/zenodo.15633017},
url = {https://doi.org/10.5281/zenodo.15633017}
}