Reasoning vs. Recitation: The Limits of LLMs in New Worlds

LLMs struggle with tasks like base-9 arithmetic and unconventional chess. Read about counterfactual evaluation.