We gave Claude a frontier science task: analyse experimental catalyst data (Meta OCx24) and give us three insights about what makes a good CO₂ reduction catalyst.

On the surface, Claude looked pretty smart. As requested, we got three insights about catalyst composition, complete with plots to back them up and relevant references. But there was one big problem: none of the 'insights' were actually supported by the data.

This isn't Claude's fault! All LLMs do this, and humans do it too. All too often we look for the patterns we expect to see, overgeneralise, or fixate on outliers, and miss the real insight.

So we tried again, same dataset, same prompt, but this time we gave Claude access to Leap's Discovery Engine, our tool for systematically finding novel patterns in data. The result was quite different. We found many non-obvious relationships (e.g. negative synergies between certain metals), and Claude did a spectacular job of synthesising the results from Discovery Engine into useful guidelines for catalyst composition.

This is a toy example: one model, single-shot prompted, on a single dataset. But I think it points to an important lesson: if we want LLMs to help with science, we need to give them the right scaffolding. Language models need data-driven discovery just as much as humans do.
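For a flavour of what "systematically finding patterns" means here, a minimal sketch of a pairwise synergy scan is below. Leap's Discovery Engine is proprietary, so this is not its method; it's just a baseline OLS interaction test on a toy dataset. All names (the metal columns, the injected Cu–Zn effect, the target metric) are hypothetical stand-ins, not the OCx24 schema.

```python
# Sketch: screen metal pairs for interaction ("synergy") effects on a
# target metric, with multiple-comparison correction. Toy data only;
# column names and the injected effect are hypothetical.
from itertools import combinations

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Toy stand-in data: per-sample metal fractions plus a noisy target
# (imagine something like Faradaic efficiency toward CO).
metals = ["Cu", "Ag", "Zn", "Pd"]
X = pd.DataFrame(rng.dirichlet(np.ones(len(metals)), size=500), columns=metals)
# Inject a known negative Cu*Zn synergy so the scan has something to find.
y = 0.5 * X["Cu"] + 0.3 * X["Ag"] - 2.0 * X["Cu"] * X["Zn"] \
    + rng.normal(0, 0.05, 500)

results = []
for a, b in combinations(metals, 2):
    # Fit target ~ a + b + a*b; the interaction coefficient is the synergy.
    design = pd.DataFrame({a: X[a], b: X[b], f"{a}*{b}": X[a] * X[b]})
    fit = sm.OLS(y, sm.add_constant(design)).fit()
    results.append((f"{a}*{b}", fit.params[f"{a}*{b}"], fit.pvalues[f"{a}*{b}"]))

# Bonferroni-correct for the number of pairs tested, then report.
alpha = 0.05 / len(results)
for name, coef, p in sorted(results, key=lambda r: r[2]):
    flag = "SIGNIFICANT" if p < alpha else ""
    print(f"{name:>8s}  coef={coef:+.3f}  p={p:.2e}  {flag}")
```

The point of scaffolding like this is that the model synthesises statistics that have already survived a significance test, instead of eyeballing plots; the multiple-comparison correction is exactly the guard rail that stops it from reporting the same kind of spurious 'insights' we saw in the unscaffolded run.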