BIG-Bench Hard // Lexicon

or: Challenging BIG-Bench Tasks And Whether Chain-Of-Thought Can Solve Them

BIG-Bench Hard (BBH) is a selection of 23 BIG-Bench tasks. The authors choose tasks where the average human rater outperforms the best previously reported language-model results. They discard tasks with too few examples for few-shot evaluation and tasks split into many subtasks.
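The core selection rule can be sketched as a filter over per-task scores. This is a minimal illustration under stated assumptions, not the authors' pipeline. The `TaskRecord` fields, the `select_hard` helper, the `max_subtasks` threshold and all scores below are hypothetical, and the example-count criterion is left out.

```python
from dataclasses import dataclass


@dataclass
class TaskRecord:
    # Hypothetical per-task summary; field names are illustrative only.
    name: str
    avg_human_rater: float  # average human-rater score on the task
    best_model: float       # best previously reported model score
    num_subtasks: int


def select_hard(tasks, max_subtasks=1):
    """Keep tasks where the average human rater beats the best model
    and the task is not split into many subtasks (threshold is assumed)."""
    return [
        t.name
        for t in tasks
        if t.avg_human_rater > t.best_model and t.num_subtasks <= max_subtasks
    ]


# Made-up numbers, purely for illustration.
tasks = [
    TaskRecord("task_a", avg_human_rater=70.0, best_model=55.0, num_subtasks=1),
    TaskRecord("task_b", avg_human_rater=60.0, best_model=80.0, num_subtasks=1),
    TaskRecord("task_c", avg_human_rater=75.0, best_model=40.0, num_subtasks=5),
]
print(select_hard(tasks))  # ['task_a']
```

Here `task_b` is dropped because a model already beats humans on it, and `task_c` is dropped for having too many subtasks.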

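They use chain-of-thought (CoT) prompting, in which each answer begins with “Let’s think step by step.”, so that the model writes out its reasoning before giving a final answer. CoT substantially improves the performance of several LLMs on many of the tasks. With CoT, PaLM surpasses the average human rater on 10 of the 23 tasks, and Codex (code-davinci-002) does so on 17 of the 23.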
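Assembling such a prompt and reading the final answer off the completion might look like the minimal sketch below. The helper names, the `Q:`/`A:` layout and the “So the answer is …” answer format are assumptions for illustration. The call to the language model itself is omitted.

```python
COT_TRIGGER = "Let's think step by step."
ANSWER_MARKER = "So the answer is "  # assumed final-answer format


def build_cot_prompt(exemplars, question):
    """Build a few-shot CoT prompt.

    exemplars: list of (question, reasoning, answer) tuples.
    Each exemplar answer starts with the CoT trigger, shows its reasoning,
    and ends with the final answer. The new question ends with the bare
    trigger so the model continues with its own reasoning.
    """
    parts = [
        f"Q: {q}\nA: {COT_TRIGGER} {reasoning} {ANSWER_MARKER}{answer}."
        for q, reasoning, answer in exemplars
    ]
    parts.append(f"Q: {question}\nA: {COT_TRIGGER}")
    return "\n\n".join(parts)


def extract_answer(completion):
    """Return the text after the first answer marker, up to the end of
    that line, without a trailing period. Return None if there is no marker."""
    idx = completion.find(ANSWER_MARKER)
    if idx == -1:
        return None
    line = completion[idx + len(ANSWER_MARKER):].split("\n")[0]
    return line.strip().rstrip(".")


exemplars = [("Is 7 prime?", "7 has no divisors other than 1 and itself.", "Yes")]
print(build_cot_prompt(exemplars, "Is 9 prime?"))
print(extract_answer("9 = 3 * 3, so 3 divides it. So the answer is No."))  # No
```

The parser uses the first marker because a model may continue past its answer and start a new `Q:` block.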