Beyond Multiple Choices

Capturing Nuanced Public Opinion with Large Language Models

Laurence-Olivier M. Foisy
Hubert Cadieux
Étienne Proulx
Yannick Dufresne

Université Laval

Can open-source language models be trusted to reliably clean and analyze responses to open-ended survey questions?

Pros and cons of open-ended questions

Pros

  • Provide more depth and nuance

  • Do not limit respondents' answers

  • Do not cue respondents about possible causes

  • Allow for unexpected insights

  • Allow the detection of new trends

Cons

  • Respondents often skip open-ended questions

  • Troublesome for mobile users

  • Difficult to analyze

  • Time-consuming to code manually

  • Grammar and spelling errors

Classic Methods and their Limitations

Manual Coding

  • Time-consuming
  • Subjective
  • Costly

Dictionary-based methods

  • Limited to pre-defined categories
  • Time-consuming to create
  • Limited to the dictionary’s scope

Machine Learning

  • Requires large amounts of labeled data
  • Difficult to train
  • Time-consuming to train and tune

Method

CES 2021 Survey

  • Question: “What is the most important issue to you personally in this federal election?”

CAPP’s 12 categories of issues

Human coders as ground truth

  • Accuracy
  • F1 Score
  • Above 0.8 is considered good for multi-class classification
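
As a rough sketch of how per-category F1 and its weighted mean can be computed against the human coding in R (the helper functions and vector names below are illustrative, not part of clellm):

# Sketch: per-category F1 of model labels against human-coded labels
# (`human` and `model` are character vectors of issue categories)
f1_by_category <- function(human, model) {
  categories <- sort(unique(human))
  sapply(categories, function(cat) {
    tp <- sum(model == cat & human == cat)
    fp <- sum(model == cat & human != cat)
    fn <- sum(model != cat & human == cat)
    precision <- tp / (tp + fp)
    recall    <- tp / (tp + fn)
    2 * precision * recall / (precision + recall)
  })
}

# Weighted mean F1, weighted by how often each category appears in the human coding
weighted_mean_f1 <- function(human, model) {
  f1 <- f1_by_category(human, model)
  w  <- as.numeric(table(human)[names(f1)])
  sum(f1 * w, na.rm = TRUE) / sum(w[!is.na(f1)])
}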

Models

  • Open-Source
    • Llama3
    • Phi3
    • Mistral
  • Commercial
    • GPT-4

Manual Coding

Ollama

  • Open-source
  • Runs large language models locally
  • Easy-to-use API
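
Under the hood, Ollama exposes a local REST API; a minimal sketch of calling it directly from R (assuming Ollama is running on its default port 11434, a llama3 model has been pulled, and the httr2 package is available):

library(httr2)

# Send a single prompt to the local Ollama server and read back the generated text
resp <- request("http://localhost:11434/api/generate") |>
  req_body_json(list(model = "llama3",
                     prompt = "Name the issue category of: pension reform",
                     stream = FALSE)) |>
  req_perform()

resp_body_json(resp)$response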

CLELLM

# Use devtools to install the clellm package from GitHub
devtools::install_github("clessn/clellm")

# Use the install_ollama() function to install Ollama (Linux only)
clellm::install_ollama()

# Use the ollama_install_model() function to install models
clellm::ollama_install_model("model_name")

# Use the ollama_prompt() function to prompt any model you want
response <- clellm::ollama_prompt("prompt", model = "model_name")
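
For the open-source models compared here, pulling them would look like this (assuming the standard Ollama model tags):

# Pull the three open-source models evaluated in this study
clellm::ollama_install_model("llama3")
clellm::ollama_install_model("phi3")
clellm::ollama_install_model("mistral")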

Prompt

[1] "In this survey question, respondents had to name their most important issue. Please read the answer and determine to which of the following 12 categories it belongs: 'Law and Crime', 'Culture and Nationalism', 'Public Lands and Agriculture', 'Governments and Governance', 'Immigration', 'Rights, Liberties, Minorities, and Discrimination', 'Health and Social Services', 'Economy and Employment', 'Education', 'Environment and Energy', 'International Affairs and Defense', 'Technology'. Use your judgement and only output a single issue category. The answer your need to categorize is: pension reform."

Accuracy

F1 Score results

Issue Category                                      Llama3   Phi3   Mistral   GPT-4   Dictionary
Culture and Nationalism                               NA      NA      1.00      NA       NA
Economy and Employment                                0.90    0.87     NA       0.94     0.21
Education                                             0.67    0.67    1.00      0.67     NA
Environment and Energy                                0.88    0.80    0.80      0.84     0.08
Governments and Governance                            0.41    0.47    0.56      0.65     0.03
Health and Social Services                            0.94    0.83    0.91      0.96     0.34
Immigration                                           1.00    1.00    1.00      1.00     NA
Law and Crime                                         1.00    1.00    1.00      1.00     NA
Rights, Liberties, Minorities, and Discrimination     0.86    0.86    0.71      0.57     0.29
Weighted Mean                                         0.81    0.77    0.50      0.86     0.19

Conclusion

  • Dictionary-based methods lack context and nuance
  • LLMs are very promising for coding open-ended answers

Limitations

  • Prompt design: results may depend on the exact prompt wording

Future work

  • More validation work needs to be done
  • Deploy a survey with only open-ended questions
    • Test scale building with factor analysis
    • Compare with other surveys