From words to watts: How prompting patterns shape AI’s environmental impact

Large language models (LLMs) have become widely used across domains such as search engines, code generation, and text creation.

However, a major concern associated with their adoption is the high environmental impact of inference, affecting both sustainability goals and operational efficiency. This study empirically investigates the relationship between prompt characteristics and energy consumption in LLMs through Capgemini’s internal chatbot – a digital assistant powered by GPT-4 transformer technology supporting over 350,000 users worldwide. We conducted experiments across question answering, code generation, and fact verification tasks, measuring energy consumption, carbon footprint, primary energy, and ADPe values using the EcoAI environmental impact measurement framework. Our results demonstrate that even when presented with identical tasks, Capgemini’s internal chatbot generates responses with varying characteristics and, consequently, exhibits distinct energy consumption patterns. We found that prompt length is less significant than the task type itself in determining energy usage. We also identified specific keywords associated with higher or lower energy usage, varying by task, which highlights the importance of prompt design in optimizing inference efficiency. These findings offer practical insights for enhancing the energy efficiency of large language models. By applying green prompting techniques, organizations can make more environmentally conscious choices in AI usage – reducing energy consumption and carbon emissions – while maintaining performance standards. This contributes to more sustainable AI operations as demand for LLM-based solutions continues to grow.

Introduction

This study explores the environmental implications of prompt design in large language models (LLMs), focusing on energy consumption patterns. Using Capgemini’s internal chatbot – an enterprise-scale digital assistant powered by GPT-4 – we analyze how different prompt characteristics influence sustainability metrics such as energy usage, carbon footprint, and resource depletion. The study was inspired by Green Prompting¹, research by Lancaster University examining how prompt design affects the energy consumption of large language models during inference. This research aims to inform future developments in environmentally responsible AI usage.

Capgemini’s internal chatbot² is a smart digital assistant created to support over 350,000 users across HR, IT, finance, and more. It uses advanced AI (via Azure OpenAI) to give accurate, personalized answers based on the user’s role, location, and needs. It supports multiple languages and protects privacy by not storing links between conversations and personal data.

By leveraging the EcoAI³ framework across varied task types such as Q&A, we uncover prompt-level variations in energy efficiency, offering insights into how green prompting strategies may contribute to more sustainable AI interactions. EcoAI³ is Capgemini’s internal AI carbon calculation tool and API, designed to help organizations manage sustainability and environmental goals. It collects and analyzes data to track carbon footprints, energy use, and other eco-metrics. By offering real-time insights, EcoAI helps businesses make greener decisions. It’s user-friendly, customizable, and supports better compliance with environmental regulations.

To further align with environmental goals, green prompting is introduced – a technique of using AI prompts in a way that reduces energy consumption and carbon footprint. It encourages efficient use of AI models by minimizing unnecessary processing while still achieving accurate results.

This white paper examines how Capgemini’s internal chatbot, EcoAI, and green prompting together inform a more sustainable AI approach. It explores energy and carbon impact measurement through EcoAI and investigates prompt-level efficiency gains. The study aims to highlight pathways for making AI interactions more efficient, ethical, and environmentally conscious – aligning with Capgemini’s digital responsibility goals.

The study titled Green Prompting¹ by researchers at Lancaster University investigates how the design of prompts influences the energy consumption of large language models (LLMs) during inference. By analyzing three open-source transformer-based models across tasks like question answering, sentiment analysis, and text generation, the researchers discovered that the semantic content and specific keywords in prompts have a greater impact on energy use than prompt length. For example, prompts containing words like “analyze” or “explain” tend to trigger more detailed responses, which in turn consume more energy. The study emphasizes that both the nature of the task and the length of the model’s response are key factors in determining energy usage. These insights suggest that by carefully crafting prompts, it’s possible to reduce the environmental footprint of AI systems. The research ultimately advocates for the development of energy-adaptive prompting strategies, which could lead to more sustainable and efficient use of LLMs in real-world applications.

Analysis design

This study aims to analyze inference energy usage patterns in Capgemini’s internal GPT‑4 chatbot, focusing on how different prompt types affect environmental impact, covering both operational energy consumption and the embodied carbon footprint linked to infrastructure usage time. The experiment evaluates Capgemini’s internal chatbot’s responses to various prompt categories, measuring performance in terms of energy consumption, carbon footprint, and additional environmental indicators such as abiotic depletion potential and primary energy use.

During evaluation, input prompts were drawn from a dedicated dataset and categorized into three main task types: question answering (Q&A), fact verification, and code generation. Each task was run with constant hyperparameters, including temperature, max tokens, inference time, and memory settings, and the OpenAI tokenizer⁴ was used to count the tokens in each input prompt and output response.
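
The token-counting step can be reproduced with tiktoken, OpenAI’s open-source tokenizer library. The short sketch below is illustrative: the prompt and response strings are placeholders, and it assumes the chatbot uses the standard GPT-4 encoding.

    # Illustrative token counting with tiktoken; the strings are placeholders.
    import tiktoken

    enc = tiktoken.encoding_for_model("gpt-4")

    def count_tokens(text: str) -> int:
        """Return the number of GPT-4 tokens in a piece of text."""
        return len(enc.encode(text))

    prompt = "Explain the enrollment process for the car lease program."
    response = "The enrollment process involves the following steps..."

    input_tokens = count_tokens(prompt)     # tokens in the input phase
    output_tokens = count_tokens(response)  # tokens in the output phase
    print(input_tokens, output_tokens)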

The Q&A category involved prompts related to internal topics such as Capgemini’s sustainability initiatives (e.g., asking about green technology investments or the environmental impact of global operations), the car lease program (e.g., explaining enrollment processes and parameters for cost-effectiveness), HR policies (e.g., understanding ethical guidelines and training modules for employees), and leave or attendance guidelines (e.g., listing leave types and asking how policies align with labor laws).

Fact verification included prompts aimed at validating company-specific information, such as organizational values (e.g., confirming Capgemini’s membership in the EV100 initiative), internal data points (e.g., verifying employee counts or sustainability projects delivered), and reporting statistics (e.g., checking renewable electricity usage or net-zero commitments).

The third category, code generation, involved querying Capgemini’s internal chatbot to generate or explain code snippets, particularly in languages like Python (e.g., writing programs to add numbers, create palindrome checkers, or build simple chatbots) – reflecting common developer support needs in enterprise environments.

To understand how different task types and linguistic structures influence inference efficiency, we also examined a curated set of commonly used task verbs or keywords in prompts. These included terms like justify, analyze, measure, create, explain, list, summarize, translate, write, and classify. These verbs were intentionally selected to cover a wide range of cognitive operations, from evaluative and analytical to generative and descriptive.

EcoAI is a modular, API-first tool that calculates the environmental footprint of digital workloads. It can ingest raw data about model behavior – such as token counts, memory usage, hardware type – and return sustainability indicators.
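
As a rough illustration of what a call to such an API-first tool might look like, the sketch below posts model-behavior data and reads back sustainability indicators. The endpoint URL, payload fields, and response keys are hypothetical placeholders for this example, not EcoAI’s actual interface.

    # Hypothetical sketch of an API-first footprint call; the endpoint,
    # payload fields, and response keys are assumptions for illustration,
    # not EcoAI's documented interface.
    import requests

    payload = {
        "model": "gpt-4",
        "input_tokens": 142,     # tokens in the prompt
        "output_tokens": 510,    # tokens in the response
        "hardware": "A100",      # assumed accelerator type
        "region": "westeurope",  # assumed hosting region
    }

    resp = requests.post("https://ecoai.example.internal/v1/footprint",
                         json=payload, timeout=30)
    resp.raise_for_status()
    indicators = resp.json()
    # Assumed response keys: energy_wh, carbon_gco2e, pe_kj, adpe_kg_sb_eq
    print(indicators)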

For each prompt and its corresponding response, we computed several statistical measures to analyze energy and performance variations: sample size, minimum, maximum, mean, and median values for energy consumption, carbon footprint, abiotic depletion potential, and primary energy. These metrics were captured separately for the input and output phases of each prompt.
This expanded the scope beyond the original Green Prompting study – which focused solely on response energy – by incorporating more comprehensive environmental metrics: the measurements for Capgemini’s internal chatbot now cover carbon emissions, abiotic depletion potential (ADPe), and primary energy (PE).
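
As a minimal sketch, assuming the per-prompt measurements are collected into a pandas DataFrame with one row per prompt and phase (the column names are illustrative, not the study’s actual schema), these summaries can be computed as follows:

    # Summary statistics per task type and phase; column names are
    # illustrative placeholders.
    import pandas as pd

    df = pd.read_csv("measurements.csv")  # task, phase, energy_wh, carbon_gco2e, pe_kj, adpe

    metrics = ["energy_wh", "carbon_gco2e", "pe_kj", "adpe"]
    summary = (
        df.groupby(["task", "phase"])[metrics]
          .agg(["count", "min", "max", "mean", "median"])
    )
    print(summary)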

Analysis results

This study evaluated the environmental impact of prompt design in Capgemini’s internal chatbot assistant, powered by GPT-4, using the EcoAI API. We analyzed three task categories – Q&A, fact verification, and code generation – across both input and output stages, measuring energy (Wh), carbon footprint (gCO₂e), primary energy (kJ), and abiotic depletion potential (ADPe, kg Sb eq).

1. Task type is the dominant factor in energy and carbon emissions

  • Q&A prompts had the highest environmental impact, with output carbon emissions averaging 8.17 gCO₂e, more than three times those of fact verification.
  • Code generation was moderately intensive, while fact verification was the most efficient across all metrics.
  • Input emissions were relatively low and consistent across all categories, confirming that output generation is the primary contributor to environmental cost.

2. Output token count predicts energy use; input token count does not

  • A strong linear correlation was observed between output token count and output energy across task types. The slopes were closely aligned at approximately 31.8 mWh/token for Q&A, 29 mWh/token for fact verification, and 27 mWh/token for code generation, demonstrating consistent energy consumption rates across use cases (see the fitting sketch after this list).
  • The analysis revealed no consistent correlation between input token count and energy usage; the scattered data points indicate high variability, suggesting that prompt length alone is not a reliable indicator of environmental impact. Average per-token energy varied significantly by prompt type: approximately 1.18 Wh/token for Q&A, around 3.69 Wh/token for fact verification, and roughly 1.60 Wh/token for code generation, demonstrating that energy consumption varies independently of input token length.
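
A slope of this kind can be estimated with an ordinary least-squares fit, as in the minimal sketch below; the arrays are placeholders chosen to mirror the roughly 31.8 mWh/token Q&A rate, not measured data from the study.

    # Least-squares estimate of energy per output token; the arrays are
    # placeholder values, not measurements from the study.
    import numpy as np

    output_tokens = np.array([120, 340, 560, 810, 1020])
    output_energy_wh = np.array([3.8, 10.9, 17.6, 25.7, 32.5])

    slope, intercept = np.polyfit(output_tokens, output_energy_wh, 1)
    print(f"{slope * 1000:.1f} mWh per output token")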

3. Keyword choice in Q&A prompts influences energy consumption

A detailed breakdown of Q&A prompts using specific action verbs revealed a clear energy hierarchy:

  • Prompts like “justify” and “analyze” are highly energy-intensive, likely due to the depth of reasoning required.
  • Prompts like “summarize” and “list” were among the least energy-intensive, typically producing shorter, more direct responses.