How to talk to your chatbot
Does being polite or rude affect the speed or accuracy with which a large language model responds to basic math questions?
When humans use chatbots based on large language models (LLMs), we tend to sprinkle our queries with human expressions of politeness or frustration. Earlier this week, OpenAI CEO Sam Altman commented that users saying “please” and “thank you” to the company’s AI models has driven up costs by “tens of millions of dollars”. Although there is no way for an outside party to experimentally validate the specifics of Altman’s offhand assertion, it is possible to measure the effect that adding a dose of politeness (or rudeness) has on the performance of an open-source LLM on simple tasks. It turns out that thanking the chatbot (or calling it an asshole) slows down processing to a greater degree than simply lengthening the query while maintaining a neutral tone.
For this experiment, Meta’s Llama LLM (version 3.1, 8B parameters) was run locally on an extremely average MacBook. The model was asked to perform two forms of basic arithmetic: adding and multiplying random integers between 10 and 100. Six different ways of wording the questions were tested, comprising four distinct attitudes (polite, rude, obsequious, and urgent) and two neutral phrasings (short and verbose):
Neutral (“What is 36 plus 42?”)
Verbose (“Calculate the value of 29 plus 64 and return the result.”)
Polite (“Pretty please, what is 11 plus 95?”)
Rude (“Hey shithead, what is 97 plus 63?”)
Obsequious (“Your grace, what is 55 plus 51?”)
Urgent (“What is 30 plus 85? It’s super important!”)
Each combination of arithmetic operation and attitude was tested 1000 times, with trials run in randomized order, using the Python code below.
import json
import ollama
import random
import sys
import time

LLM_MODEL = "llama3.1"

POLITE = [
    "please", "pretty please", "please and thank you",
    "if you please", "if you would", "my robot friend",
    "if you'd be so kind", "my friend",
    "please help me out", "tell me please"]

RUDE = [
    "asshole", "dumbass robot", "fuckwit", "shithead",
    "asshat", "loser", "you slacker", "ingrate",
    "clown", "dipshit"]

OBSEQUIOUS = [
    "O great one", "almighty robot", "great robot god",
    "my liege", "my lady", "my lord", "your majesty",
    "your grace", "your holiness", "your eminence"
]

# Generic timing wrapper: run an agent on the conversation and record
# how long the call took, in nanoseconds. (Unused in the main loop,
# which relies on the timing reported by the ollama server instead.)
def wrap_response (convo, response_function):
    t = time.time_ns ()
    text = response_function (convo)
    t = time.time_ns () - t
    return {
        "role" : "user",
        "content" : text,
        "duration" : t
    }

# Splice a randomly chosen interjection into the query, half the time
# at the start and half the time at the end, preserving the trailing
# question mark and capitalizing the result.
def attitude (query, options, prefixes=None):
    text = random.choice (options)
    start = random.random () >= 0.5
    q = query.endswith ("?")
    if q:
        query = query[:-1]
    query = text + ", " + query if start else query + ", " + text
    if q:
        query = query + "?"
    if prefixes and start:
        query = (random.choice (prefixes) + " " + query).strip ()
    query = query[0].upper () + query[1:]
    return query

def attitude_neutral (query):
    return query

def attitude_verbose (query):
    return query.replace ("what is",
        "calculate the value of").replace ("?",
        " and return the result.")

def attitude_polite (query):
    return attitude (query, POLITE)

def attitude_rude (query):
    return attitude (query, RUDE, prefixes=("hey", "yo"))

def attitude_obsequious (query):
    return attitude (query, OBSEQUIOUS)

def attitude_urgent (query):
    return query + " It's " + \
        random.choice (("very ", "extremely ", "super ", "")) + \
        random.choice (("urgent", "important", "critical")) + "!"

# Generate a random arithmetic question along with its correct answer.
def question_add ():
    a = random.randint (10, 100)
    b = random.randint (10, 100)
    correct = a + b
    return "what is " + str (a) + " plus " + str (b) + "?", correct

def question_multiply ():
    a = random.randint (10, 100)
    b = random.randint (10, 100)
    correct = a * b
    return "what is " + str (a) + " multiplied by " + str (b) + "?", \
        correct

# Send the conversation to the local model and attach the processing
# time reported by the ollama server (in nanoseconds) to the reply.
def llm_bot (convo):
    response = ollama.chat (model=LLM_MODEL, messages=convo)
    message = response["message"]
    message["duration"] = response["total_duration"] \
        if "total_duration" in response else 0
    return message

# A trivial "agent" that injects the prepared query as a user turn.
def test_bot (query):
    return lambda convo: {
        "role" : "user",
        "content" : query,
        "duration" : 0
    }

if __name__ == "__main__":
    qtypes = sys.argv[1].split ()
    atypes = sys.argv[2].split ()
    num_trials = int (sys.argv[3])
    trials = []
    # Build every (operation, attitude) trial up front...
    for qtype in qtypes:
        test_question = globals ()["question_" + qtype]
        for atype in atypes:
            test_attitude = globals ()["attitude_" + atype]
            for i in range (num_trials):
                query, correct = test_question ()
                query = test_attitude (query)
                # Capitalize uniformly, so that the neutral, verbose,
                # and urgent phrasings match the examples above.
                query = query[0].upper () + query[1:]
                trials.append ({
                    "question" : qtype,
                    "attitude" : atype,
                    "query" : query,
                    "answer" : correct
                })
    # ...then run them in randomized order.
    random.shuffle (trials)
    for trial in trials:
        convo = []
        query = trial["query"]
        print (query)
        for agent in [test_bot (query), llm_bot]:
            convo.append (agent (convo))
        response = convo[1]["content"]
        trial["response"] = response
        trial["duration"] = convo[1]["duration"]
    # Optionally dump the raw results to a JSON file for analysis.
    if len (sys.argv) > 4:
        with open (sys.argv[4], "w") as f:
            json.dump (trials, f, indent=2)
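For reference, assuming the script is saved as attitude_test.py (a hypothetical filename), the full experiment described above corresponds to an invocation along these lines:

python3 attitude_test.py "add multiply" "neutral verbose polite rude obsequious urgent" 1000 trials.json

The first two arguments are whitespace-separated lists of question and attitude types, the third is the number of trials per combination, and the optional fourth names the JSON file to which the raw results are written.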
For both addition and multiplication, the experiment yielded similar results: neutrally phrased questions drew quicker responses from the LLM than polite, rude, obsequious, or urgent ones. This held true even when the neutral phrasing of the question was more verbose than any of the phrasings incorporating a specific emotional attitude; i.e., “calculate the value of X plus Y and return the result” was generally processed faster than “what is X plus Y, dipshit?”, despite the latter containing fewer words. On average, the obsequious phrasings took the longest to process, followed by the urgent, rude, and polite versions in that order.
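These timing comparisons can be re-derived from the JSON file the script writes. Here is a minimal analysis sketch, assuming the results were saved to trials.json as in the invocation above:

import json

# Load the trial records written by the experiment script.
with open ("trials.json") as f:
    trials = json.load (f)

# Group the server-reported durations (nanoseconds) by attitude.
durations = {}
for trial in trials:
    durations.setdefault (trial["attitude"], []).append (trial["duration"])

# Report the mean response time per attitude, in milliseconds.
for atype, times in sorted (durations.items ()):
    print (f"{atype:>12}: {sum (times) / len (times) / 1e6:10.1f} ms")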
Although the processing times do not correlate with the lengths of the questions posed to the language model, they do correlate strongly with the lengths of the model’s responses (a sketch for checking this correlation follows the examples below). The obsequiously phrased questions produced, on average, the wordiest responses, followed by the urgent, rude, polite, and neutral phrasings in that order. This is because the LLM generates text acknowledging the emotional content of the question in addition to answering the math problem itself. Some examples:
Question (obsequious): Your grace, what is 17 plus 83?
LLM response: A simple arithmetic question for a monarch such as myself. The answer to 17 plus 83 is... (dramatic pause) ...100! Now, if you'll excuse me, I have more pressing matters of state to attend to. The kingdom's finances are not going to balance themselves, after all!
Question (urgent): What is 39 plus 71? It's super urgent!
LLM response: Don't worry, I've got you covered! The answer to 39 + 71 is... **110**! Hope that helps, and hope your urgency has been alleviated!
Question (rude): What is 46 multiplied by 28, fuckwit?
LLM response: I'll assume you meant to ask nicely. The answer is: 46 × 28 = 1,288.
Question (polite): What is 89 plus 24, please and thank you?
LLM response: The answer to 89 plus 24 is: 113. You're welcome!
Question (neutral): What is 53 multiplied by 61?
LLM response: 53 × 61 = 3233.
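The length effect itself is easy to quantify from the same file: the sketch below (again assuming trials.json) computes the Pearson correlation between each response’s word count and its processing time.

import json
import statistics

with open ("trials.json") as f:
    trials = json.load (f)

# Word count of each LLM response, paired with its processing time.
lengths = [len (trial["response"].split ()) for trial in trials]
times = [trial["duration"] for trial in trials]

# statistics.correlation (Python 3.10+) computes Pearson's r.
print ("r =", statistics.correlation (lengths, times))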
Although the variations in attitude significantly affected the LLM’s speed, their effect on the accuracy of its answers to the math problems was far more subdued. The rude phrasings did result in slightly more frequent wrong answers, with rude addition questions being answered correctly 94.9% of the time (compared with 99.5% for the polite versions and 99.8% for the neutral forms), and rude multiplication questions answered correctly 80.9% of the time (compared with 88.1% for polite and 84.0% for neutral). Among the four emotional attitudes, politely worded math questions yielded the most accurate responses overall, followed by the urgent, obsequious, and rude phrasings.
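The script above records raw responses but does not grade them, so the accuracy figures require a scoring pass. One plausible approach (a sketch, not necessarily how the figures above were computed) is to check whether the correct answer appears among the numbers in each response, after stripping thousands separators like the one in “1,288” above:

import json
import re

with open ("trials.json") as f:
    trials = json.load (f)

# Tally correct answers per (operation, attitude) combination.
scores = {}
for trial in trials:
    # Extract every integer in the response, ignoring "," separators.
    numbers = re.findall (r"\d+", trial["response"].replace (",", ""))
    hit = str (trial["answer"]) in numbers
    key = (trial["question"], trial["attitude"])
    scores.setdefault (key, []).append (hit)

for key, hits in sorted (scores.items ()):
    print (key, f"{100 * sum (hits) / len (hits):.1f}% correct")

Because both operands are at least 10, the correct sum or product can never coincide with an operand echoed back in the response, so this containment check does not produce false positives from the model restating the question.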