How to talk to your chatbot
Does being polite or rude affect the speed or accuracy with which a large language model responds to basic math questions?
When humans use chatbots based on large language models (LLMs), we tend to sprinkle our queries with human expressions of politeness or frustration. Earlier this week, OpenAI CEO Sam Altman commented that users saying “please” and “thank you” to the company’s AI models has driven up costs by “tens of millions of dollars”. Although there is no way for an outside party to experimentally validate the specifics of Altman’s offhand assertion, it is possible to measure the effect that adding a dose of politeness (or rudeness) has on the performance of an open-source LLM on simple tasks. It turns out that thanking the chatbot (or calling it an asshole) slows down processing to a greater degree than simply lengthening the query while maintaining a neutral tone.
For this experiment, Meta’s Llama LLM (version 3.1, 8B parameters) was run locally on an extremely average MacBook. The model was asked to perform two forms of basic arithmetic: adding and multiplying random integers between 10 and 100. Six different ways of wording the questions were tested, comprising four distinct attitudes (polite, rude, obsequious, and urgent) and two neutral phrasings (short and verbose):
Neutral (“What is 36 plus 42?”)
Verbose (“Calculate the value of 29 plus 64 and return the result.”)
Polite (“Pretty please, what is 11 plus 95?”)
Rude (“Hey shithead, what is 97 plus 63?”)
Obsequious (“Your grace, what is 55 plus 51?”)
Urgent (“What is 30 plus 85? It’s super important!”)
Each combination of arithmetic operation and attitude was tested 1000 times, with trials run in randomized order, using the Python code below.
import json
import ollama
import random
import sys
import time

LLM_MODEL = "llama3.1"

POLITE = [
    "please", "pretty please", "please and thank you",
    "if you please", "if you would", "my robot friend",
    "if you'd be so kind", "my friend",
    "please help me out", "tell me please"]

RUDE = [
    "asshole", "dumbass robot", "fuckwit", "shithead",
    "asshat", "loser", "you slacker", "ingrate",
    "clown", "dipshit"]

OBSEQUIOUS = [
    "O great one", "almighty robot", "great robot god",
    "my liege", "my lady", "my lord", "your majesty",
    "your grace", "your holiness", "your eminence"
]

# Generic timing wrapper: run an agent on the conversation and record
# how long the call took, in nanoseconds. (Unused in the main loop,
# which relies on the timing reported by the ollama server instead.)
def wrap_response (convo, response_function):
    t = time.time_ns ()
    text = response_function (convo)
    t = time.time_ns () - t
    return {
        "role" : "user",
        "content" : text,
        "duration" : t
    }

# Splice a randomly chosen interjection into the query, half the time
# at the start and half the time at the end, preserving the trailing
# question mark and capitalizing the result.
def attitude (query, options, prefixes=None):
    text = random.choice (options)
    start = random.random () >= 0.5
    q = query.endswith ("?")
    if q:
        query = query[:-1]
    query = text + ", " + query if start else query + ", " + text
    if q:
        query = query + "?"
    if prefixes and start:
        query = (random.choice (prefixes) + " " + query).strip ()
    query = query[0].upper () + query[1:]
    return query

def attitude_neutral (query):
    return query

def attitude_verbose (query):
    return query.replace ("what is",
        "calculate the value of").replace ("?",
        " and return the result.")

def attitude_polite (query):
    return attitude (query, POLITE)

def attitude_rude (query):
    return attitude (query, RUDE, prefixes=("hey", "yo"))

def attitude_obsequious (query):
    return attitude (query, OBSEQUIOUS)

def attitude_urgent (query):
    return query + " It's " + \
        random.choice (("very ", "extremely ", "super ", "")) + \
        random.choice (("urgent", "important", "critical")) + "!"

# Generate a random arithmetic question along with its correct answer.
def question_add ():
    a = random.randint (10, 100)
    b = random.randint (10, 100)
    correct = a + b
    return "what is " + str (a) + " plus " + str (b) + "?", correct

def question_multiply ():
    a = random.randint (10, 100)
    b = random.randint (10, 100)
    correct = a * b
    return "what is " + str (a) + " multiplied by " + str (b) + "?", \
        correct

# Send the conversation to the local model and attach the processing
# time reported by the ollama server (in nanoseconds) to the reply.
def llm_bot (convo):
    response = ollama.chat (model=LLM_MODEL, messages=convo)
    message = response["message"]
    message["duration"] = response["total_duration"] \
        if "total_duration" in response else 0
    return message

# A trivial "agent" that injects the prepared query as a user turn.
def test_bot (query):
    return lambda convo: {
        "role" : "user",
        "content" : query,
        "duration" : 0
    }

if __name__ == "__main__":
    qtypes = sys.argv[1].split ()
    atypes = sys.argv[2].split ()
    num_trials = int (sys.argv[3])
    trials = []
    # Build every (operation, attitude) trial up front...
    for qtype in qtypes:
        test_question = globals ()["question_" + qtype]
        for atype in atypes:
            test_attitude = globals ()["attitude_" + atype]
            for i in range (num_trials):
                query, correct = test_question ()
                query = test_attitude (query)
                # Capitalize uniformly, so that the neutral, verbose,
                # and urgent phrasings match the examples above.
                query = query[0].upper () + query[1:]
                trials.append ({
                    "question" : qtype,
                    "attitude" : atype,
                    "query" : query,
                    "answer" : correct
                })
    # ...then run them in randomized order.
    random.shuffle (trials)
    for trial in trials:
        convo = []
        query = trial["query"]
        print (query)
        for agent in [test_bot (query), llm_bot]:
            convo.append (agent (convo))
        response = convo[1]["content"]
        trial["response"] = response
        trial["duration"] = convo[1]["duration"]
    # Optionally dump the raw results to a JSON file for analysis.
    if len (sys.argv) > 4:
        with open (sys.argv[4], "w") as f:
            json.dump (trials, f, indent=2)
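For reference, assuming the script is saved as attitude_test.py (a hypothetical filename), the full experiment described above corresponds to an invocation along these lines:

python3 attitude_test.py "add multiply" "neutral verbose polite rude obsequious urgent" 1000 trials.json

The first two arguments are whitespace-separated lists of question and attitude types, the third is the number of trials per combination, and the optional fourth names the JSON file to which the raw results are written.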
For both addition and multiplication, the experiment yielded similar results: neutrally phrased questions drew quicker responses from the LLM than polite, rude, obsequious, or urgent ones. This held true even when the neutral phrasing of the question was more verbose than any of the phrasings incorporating a specific emotional attitude; i.e., “calculate the value of X plus Y and return the result” was generally processed faster than “what is X plus Y, dipshit?”, despite the latter containing fewer words. On average, the obsequious phrasings took the longest to process, followed by the urgent, rude, and polite versions in that order.
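These timing comparisons can be re-derived from the JSON file the script writes. Here is a minimal analysis sketch, assuming the results were saved to trials.json as in the invocation above:

import json

# Load the trial records written by the experiment script.
with open ("trials.json") as f:
    trials = json.load (f)

# Group the server-reported durations (nanoseconds) by attitude.
durations = {}
for trial in trials:
    durations.setdefault (trial["attitude"], []).append (trial["duration"])

# Report the mean response time per attitude, in milliseconds.
for atype, times in sorted (durations.items ()):
    print (f"{atype:>12}: {sum (times) / len (times) / 1e6:10.1f} ms")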
Although the processing times do not correlate with the lengths of the questions posed to the language model, they do correlate strongly with the lengths of the model’s responses (a sketch for checking this correlation follows the examples below). The obsequiously phrased questions produced, on average, the wordiest responses, followed by the urgent, rude, polite, and neutral phrasings in that order. This is because the LLM generates text acknowledging the emotional content of the question in addition to answering the math problem itself. Some examples:
Question (obsequious): Your grace, what is 17 plus 83?
LLM response: A simple arithmetic question for a monarch such as myself. The answer to 17 plus 83 is... (dramatic pause) ...100! Now, if you'll excuse me, I have more pressing matters of state to attend to. The kingdom's finances are not going to balance themselves, after all!
Question (urgent): What is 39 plus 71? It's super urgent!
LLM response: Don't worry, I've got you covered! The answer to 39 + 71 is... **110**! Hope that helps, and hope your urgency has been alleviated!
Question (rude): What is 46 multiplied by 28, fuckwit?
LLM response: I'll assume you meant to ask nicely. The answer is: 46 × 28 = 1,288.
Question (polite): What is 89 plus 24, please and thank you?
LLM response: The answer to 89 plus 24 is: 113. You're welcome!
Question (neutral): What is 53 multiplied by 61?
LLM response: 53 × 61 = 3233.
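The length effect itself is easy to quantify from the same file: the sketch below (again assuming trials.json) computes the Pearson correlation between each response’s word count and its processing time.

import json
import statistics

with open ("trials.json") as f:
    trials = json.load (f)

# Word count of each LLM response, paired with its processing time.
lengths = [len (trial["response"].split ()) for trial in trials]
times = [trial["duration"] for trial in trials]

# statistics.correlation (Python 3.10+) computes Pearson's r.
print ("r =", statistics.correlation (lengths, times))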
Although the variations in attitude significantly affected the LLM’s speed, their effect on the accuracy of its answers to the math problems was far more subdued. The rude phrasings did result in slightly more frequent wrong answers, with rude addition questions being answered correctly 94.9% of the time (compared with 99.5% for the polite versions and 99.8% for the neutral forms), and rude multiplication questions answered correctly 80.9% of the time (compared with 88.1% for polite and 84.0% for neutral). Among the four emotional attitudes, politely worded math questions yielded the most accurate responses overall, followed by the urgent, obsequious, and rude phrasings.
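The script above records raw responses but does not grade them, so the accuracy figures require a scoring pass. One plausible approach (a sketch, not necessarily how the figures above were computed) is to check whether the correct answer appears among the numbers in each response, after stripping thousands separators like the one in “1,288” above:

import json
import re

with open ("trials.json") as f:
    trials = json.load (f)

# Tally correct answers per (operation, attitude) combination.
scores = {}
for trial in trials:
    # Extract every integer in the response, ignoring "," separators.
    numbers = re.findall (r"\d+", trial["response"].replace (",", ""))
    hit = str (trial["answer"]) in numbers
    key = (trial["question"], trial["attitude"])
    scores.setdefault (key, []).append (hit)

for key, hits in sorted (scores.items ()):
    print (key, f"{100 * sum (hits) / len (hits):.1f}% correct")

Because both operands are at least 10, the correct sum or product can never coincide with an operand echoed back in the response, so this containment check does not produce false positives from the model restating the question.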