A team of Apple researchers has questioned the formal reasoning capabilities of large language models (LLMs), particularly in mathematics. They found that LLMs exhibit noticeable variance when responding to different instantiations of the same question.