Our Global Data Science Challenge is shaping the future of learning. In an era when AI is reshaping industries, Capgemini’s 7th Global Data Science Challenge (GDSC) tackled education.
By harnessing cutting-edge AI and advanced data analysis techniques, participants, from seasoned professionals to aspiring data scientists, are building tools to empower educators and policy makers worldwide to improve teaching and learning.
The rapidly evolving landscape of artificial intelligence presents a crucial question: how can we leverage its power to solve real-life challenges? Capgemini’s Global Data Science Challenge (GDSC) has been answering this question for years and, in 2024, it took on its most significant mission yet: revolutionizing education through smarter decision making.
The need for innovation in education is undeniable. Understanding which learners are making progress, which are not, and why is critically important for education leaders and policy makers to prioritize interventions and education policies effectively. According to UNESCO, a staggering 251 million children worldwide remain out of school. Among those who do attend, the average annual improvement in reading proficiency at the end of primary education is alarmingly slow: just 0.4 percentage points per year. This poses a serious challenge to global foundational learning and hampers efforts to achieve the learning goal set forth in the Sustainable Development Agenda.
The Grade-AI Generation: A collaborative effort
The GDSC 2024, aptly named “The Grade-AI Generation,” brought together a powerful consortium. Capgemini offered its data science expertise, UNESCO contributed its deep understanding of global educational challenges, and Amazon Web Services (AWS) provided access to cutting-edge AI technologies. This collaboration unlocks the hidden potential within vast learning assessment datasets, transforming raw data into actionable insights for decision making that could change the future of millions of children worldwide.
At the heart of this year’s challenge lies the PIRLS 2021 dataset, a comprehensive global survey encompassing over 30 million data points on 4th grade children’s reading achievement. This dataset is particularly valuable because it provides rich, standardized data that allows participants to identify patterns and trends across different regions and education systems. Participants were tasked with creating an AI-powered education policy expert capable of analyzing this data and providing data-driven advice to policymakers, education leaders, and teachers, as well as parents and students themselves. By analyzing factors such as student performance, demographics, instructional approaches, curriculum, and home environment, such an expert can offer insights that would take far more time and resources to obtain with traditional methods.
Building the future: Agentic AI systems
The challenge leveraged state-of-the-art AI technologies, particularly focusing on agentic systems built with advanced Large Language Models (LLMs) such as Claude, Llama, and Mistral. These systems represent a significant leap forward in AI capabilities, enabling more nuanced understanding and analysis of complex educational data.
“Generative AI is the most revolutionary technology of our time,” says Mike Miller, Senior Principal Product Lead at AWS, “enabling us to leverage these massive amounts of complicated data to capture for analysis, and present knowledge in more advanced ways. It’s a game-changer and it will help make education more effective around the world and enable our global community to commit to more sustainable development.”
The transformative potential of AI in education
The potential impact of this challenge extends far beyond the competition itself. As Gwang-Chol Chang, Chief, Section of Education Policy at UNESCO, explains, “Such innovative technology is exactly what this hackathon has accomplished. Not just only do we see the hope for lifting the reading level of young children around the world, we also see a great potential for a breakthrough in education policy and practice.”
The GDSC has a proven track record of producing innovations with real-world impact. In the 2023 edition, “The Biodiversity Buzz,” participants developed a new state-of-the-art model for insect classification. Even more impressively, the winning model from the 2020 challenge, “Saving Sperm Whale Lives,” is now being used on the world’s largest public whale-watching site, happywhale.com, demonstrating the tangible outcomes these challenges can produce.
Aligning with a global goal
This year’s challenge aligns perfectly with Capgemini’s belief that data and AI can be a force for good. It embodies the company’s mission to help clients “get the future you want” by applying cutting-edge technology to solve pressing global issues.
Beyond the competition: A catalyst for change
The GDSC 2024 is more than just a competition; it’s a global collaboration that brings together diverse talents to tackle one of the world’s most critical challenges. By bridging the gap between complex learning assessment data that is costly to collect and actionable insights, participants have the opportunity to make a lasting impact on global education.
A glimpse into the future
The winning team ‘insAIghtED’ consists of Michal Milkowski, Serhii Zelenyi, Jakub Malenczuk, and Jan Siemieniec, based in Warsaw, Poland. They developed an innovative solution aimed at enhancing actionable insights using advanced AI agents. Their model leverages the PIRLS 2021 dataset, which provides structured, sample-based data on reading abilities among 4th graders globally. However, recognizing the limitations of relying solely on this dataset, the team expanded their model to incorporate additional data sources such as GDP, life expectancy, population statistics, and even YouTube content. This multi-agent AI system is designed to provide nuanced insights for educators and policymakers, offering short answers, data visualizations, more elaborate explanations, and even a fun section to engage users.
The architecture of their solution involves four agent roles: a lead data analyst, a data engineer, a chart preparer, and a data scientist, each contributing to different aspects of the model’s functionality. The system is capable of querying databases, aggregating data, performing internet searches, and preparing detailed answers. By integrating various data sources and employing state-of-the-art AI technologies like LangChain and crewAI, the insAIghtED model delivers impactful, real-world, actionable insights that go beyond the numbers, helping to address complex educational challenges and trends.
Example:
Figure 1: An example of the winning model answering the prompt “Visualize the number of students who participated in the PIRLS 2021 study per country.”
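To make the idea of role-based agents more concrete, here is a minimal sketch in plain Python of how a lead analyst might delegate a per-country question to a data engineer and a chart preparer. It is not the insAIghtED implementation and does not use LangChain or crewAI. The class names, the `students` table, and the toy rows are all hypothetical, and the routing is a hard-coded keyword check where a real agentic system would rely on an LLM.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass

# Toy rows invented for illustration only; these are NOT PIRLS 2021 figures.
TOY_ROWS = [
    ("Country A", "s1"), ("Country A", "s2"), ("Country A", "s3"),
    ("Country B", "s4"), ("Country B", "s5"),
    ("Country C", "s6"),
]


def build_toy_database() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical `students` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (country TEXT, student_id TEXT)")
    conn.executemany("INSERT INTO students VALUES (?, ?)", TOY_ROWS)
    return conn


@dataclass
class DataEngineer:
    """Role that queries the database and aggregates the results."""
    conn: sqlite3.Connection

    def students_per_country(self) -> list[tuple[str, int]]:
        query = (
            "SELECT country, COUNT(*) AS n FROM students "
            "GROUP BY country ORDER BY n DESC, country"
        )
        return self.conn.execute(query).fetchall()


class ChartPreparer:
    """Role that turns aggregated rows into a simple text bar chart."""

    def bar_chart(self, rows: list[tuple[str, int]]) -> str:
        width = max(len(name) for name, _ in rows)
        return "\n".join(
            f"{name.ljust(width)} | {'#' * n} {n}" for name, n in rows
        )


@dataclass
class LeadAnalyst:
    """Role that receives the question and delegates to the other roles."""
    engineer: DataEngineer
    charts: ChartPreparer

    def answer(self, question: str) -> str:
        # Assumption: a real system would let an LLM choose which role to
        # call; this sketch uses a hard-coded keyword check instead.
        if "per country" in question.lower():
            rows = self.engineer.students_per_country()
            return self.charts.bar_chart(rows)
        return "Sorry, this sketch only handles per-country counts."


def run_demo() -> str:
    conn = build_toy_database()
    lead = LeadAnalyst(DataEngineer(conn), ChartPreparer())
    return lead.answer("Visualize the number of students per country")


if __name__ == "__main__":
    print(run_demo())
```

Keeping each role behind its own small interface means a role can later be replaced, for example by an LLM-backed agent, without changing how the lead analyst calls it.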
As we stand on the brink of an AI-powered educational revolution, the Grade-AI Generation challenge serves as a beacon of innovation and hope. It showcases how the combination of data science, AI, and human creativity and passion can pave the way for a future where quality education is accessible to all, regardless of geographical or socioeconomic barriers.
Start innovating now –
Dive into AI for good
Explore how AI can be applied to solve societal challenges in your local community or industry.
Embrace agentic AI systems
Start experimenting with multi-agent AI systems to tackle complex, multi-faceted problems in your field.
Collaborate globally
Seek out international partnerships and datasets to bring diverse perspectives to your AI projects.
Interesting read? Capgemini’s Innovation publication, Data-powered Innovation Review – Wave 9, features 15 captivating innovation articles with contributions from leading experts from Capgemini, with a special mention of our external contributors from The Open Group, AWS, and UNESCO. Explore the transformative potential of generative AI, data platforms, and sustainability-driven tech. Find all previous Waves here.