Beyond Multiple Choices

Capturing Nuanced Public Opinion with Large Language Models

Laurence-Olivier M. Foisy
Hubert Cadieux
Étienne Proulx
Yannick Dufresne

Université Laval

Can open-source language models be trusted to reliably clean and analyze open-ended survey questions?

Pros and cons of open-ended questions

Pros

More depth and nuance
Does not limit respondents’ answers
Does not cue respondents about possible causes
Allows for unexpected insights
Allows the detection of new trends

Cons

Respondents often skip open-ended questions
Troublesome for mobile users
Difficult to analyze
Time-consuming to code manually
Grammar and spelling errors

Classic Methods and their Limitations

Manual Coding

Time-consuming
Subjective
Costly

Dictionary-based methods

Limited to pre-defined categories
Time-consuming to create
Limited to the dictionary’s scope
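
To make the scope limitation concrete, here is a minimal sketch of dictionary-based coding in R. The three-category dictionary, its keywords, and the function name `code_with_dictionary()` are hypothetical and chosen only for illustration; they are not the dictionary evaluated in this study.

```r
# Hypothetical mini-dictionary: category -> keywords (illustrative only)
issue_dictionary <- list(
  "Health and Social Services" = c("health", "hospital", "doctor"),
  "Economy and Employment"     = c("economy", "jobs", "inflation", "taxes"),
  "Environment and Energy"     = c("climate", "environment", "pollution")
)

# Return the first category whose keywords appear in the answer, else NA
code_with_dictionary <- function(answer, dictionary) {
  text <- tolower(answer)
  for (category in names(dictionary)) {
    pattern <- paste(dictionary[[category]], collapse = "|")
    if (grepl(pattern, text)) {
      return(category)
    }
  }
  NA_character_
}

answers <- c("Climate change", "The cost of living", "More doctors")
coded <- vapply(answers, code_with_dictionary, character(1),
                dictionary = issue_dictionary, USE.NAMES = FALSE)
print(coded)
```

In this toy example, "The cost of living" stays uncoded because none of its words appear in the dictionary, which is the kind of gap that a keyword list cannot close without manual extension.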

Machine Learning

Requires a lot of data
Difficult to train
Requires a lot of time

Method

CES 2021 Survey

Question: “What is the most important issue to you personally in this federal election?”

CAPP’s 12 categories of issues

Human Coder as ground truth

Accuracy
F1 Score
A score above 0.8 is considered good for multi-class classification
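
As a rough illustration of the two metrics, the sketch below computes accuracy, per-category F1, and a weighted mean F1 against human-coded labels in base R. The labels are made up for the example and are not CES data. Weighting by the number of human-coded answers per category, and returning 0 when a category has no true positives, are assumptions made here rather than a description of how the reported scores were produced.

```r
# Toy labels (made up for illustration, not CES data)
human <- c("Health", "Health", "Economy", "Economy", "Economy", "Environment")
model <- c("Health", "Economy", "Economy", "Economy", "Health", "Environment")

# Accuracy: share of answers where the model agrees with the human coder
accuracy <- mean(human == model)

# F1 for one category: harmonic mean of precision and recall
f1_for_class <- function(category, truth, predicted) {
  tp <- sum(predicted == category & truth == category)
  fp <- sum(predicted == category & truth != category)
  fn <- sum(predicted != category & truth == category)
  if (tp == 0) return(0)  # convention chosen for this sketch
  precision <- tp / (tp + fp)
  recall <- tp / (tp + fn)
  2 * precision * recall / (precision + recall)
}

categories <- sort(unique(human))
f1 <- vapply(categories, f1_for_class, numeric(1),
             truth = human, predicted = model)

# Weighted mean: each category counts in proportion to its human-coded frequency
support <- as.numeric(table(factor(human, levels = categories)))
weighted_f1 <- sum(f1 * support) / sum(support)

print(accuracy)
print(f1)
print(weighted_f1)
```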

Models

Open-Source
- Llama3
- Phi3
- Mistral
Commercial
- GPT-4

Manual Coding

Ollama

Open-source
Large Language Models
Easy to use API

CLELLM

# Use devtools to install the clellm package from GitHub
devtools::install_github("clessn/clellm")

# Use the install_ollama() function to install Ollama (Linux only)
clellm::install_ollama()

# Use the ollama_install_model() function to install models
clellm::ollama_install_model("model_name")

# Use the ollama_prompt() function to prompt any installed model
response <- clellm::ollama_prompt("prompt", model = "model_name")

Prompt

[1] "In this survey question, respondents had to name their most important issue. Please read the answer and determine to which of the following 12 categories it belongs: 'Law and Crime', 'Culture and Nationalism', 'Public Lands and Agriculture', 'Governments and Governance', 'Immigration', 'Rights, Liberties, Minorities, and Discrimination', 'Health and Social Services', 'Economy and Employment', 'Education', 'Environment and Energy', 'International Affairs and Defense', 'Technology'. Use your judgement and only output a single issue category. The answer your need to categorize is: pension reform."
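
A prompt like this one can be assembled per answer, and the model's raw reply mapped back onto the 12 categories. The sketch below is one possible way to do that in R, not the pipeline used for the reported results. The helper names `build_prompt()`, `match_category()`, and `classify_answers()` are hypothetical, the prompt wording only approximates the one shown above, and the model call is passed in as a function so the example runs without Ollama (a stub stands in for the model here).

```r
categories <- c(
  "Law and Crime", "Culture and Nationalism", "Public Lands and Agriculture",
  "Governments and Governance", "Immigration",
  "Rights, Liberties, Minorities, and Discrimination",
  "Health and Social Services", "Economy and Employment", "Education",
  "Environment and Energy", "International Affairs and Defense", "Technology"
)

# Assemble the classification prompt for a single open-ended answer
build_prompt <- function(answer, categories) {
  paste0(
    "In this survey question, respondents had to name their most important issue. ",
    "Please read the answer and determine to which of the following ",
    length(categories), " categories it belongs: ",
    paste0("'", categories, "'", collapse = ", "),
    ". Use your judgement and only output a single issue category. ",
    "The answer you need to categorize is: ", answer, "."
  )
}

# Map raw model output onto exactly one known category, else NA
match_category <- function(output, categories) {
  found <- vapply(categories, function(category) {
    grepl(tolower(category), tolower(output), fixed = TRUE)
  }, logical(1))
  hits <- categories[found]
  if (length(hits) == 1) hits else NA_character_
}

# Classify a vector of answers with any function that takes a prompt string
classify_answers <- function(answers, prompt_fn, categories) {
  vapply(answers, function(answer) {
    raw <- prompt_fn(build_prompt(answer, categories))
    match_category(raw, categories)
  }, character(1), USE.NAMES = FALSE)
}

# Stub standing in for a real model call (always returns the same text)
fake_model <- function(prompt) "Category: Health and Social Services."
labels <- classify_answers(c("pension reform", "wait times"), fake_model, categories)
print(labels)
```

With clellm installed, `prompt_fn` could be something like `function(p) clellm::ollama_prompt(p, model = "model_name")`, assuming the function returns the model's reply as a character string. Replies that name zero or several categories come back as NA so they can be reviewed by hand.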

Accuracy

F1 Score results

Issue Category	Llama3	Phi3	Mistral	GPT-4	Dictionary
Culture and Nationalism	NA	NA	1	NA	NA
Economy and Employment	0.9	0.87	NA	0.94	0.21
Education	0.67	0.67	1	0.67	NA
Environment and Energy	0.88	0.8	0.8	0.84	0.08
Governments and Governance	0.41	0.47	0.56	0.65	0.03
Health and Social Services	0.94	0.83	0.91	0.96	0.34
Immigration	1	1	1	1	NA
Law and Crime	1	1	1	1	NA
Rights, Liberties, Minorities, and Discrimination	0.86	0.86	0.71	0.57	0.29
Weighted Mean	0.81	0.77	0.5	0.86	0.19

Conclusion

Lack context and nuance
Very promising

Limitations

Prompt limitation

Future work

More validation work needs to be done
Deploy a survey with only open-ended questions
- Test scale building with factor analysis
- Compare with other surveys