OpenAI Releases LifeSciBench Benchmark for Life Science AI
LifeSciBench shifts AI evaluation from narrow question-and-answer tests to real-world scientific workflows, setting a new standard for measuring practical utility in research.
Reporting from 1 source: GIGAZINE.
OpenAI announced LifeSciBench, a benchmark that measures how useful AI is to life science researchers. Developed with 173 scientists, it includes 750 tasks and 1,062 attachments and evaluates AI responses on criteria such as reasoning and level of detail. OpenAI also released a report on how GPT-5.4 assisted drug discovery research.
OpenAI released LifeSciBench on June 17, a benchmark designed to measure how useful AI is to life science researchers. Unlike traditional science tests, which focus on narrow domain knowledge or on questions with clear-cut answers, LifeSciBench sorts scientists' everyday tasks into seven categories, including handling scientific evidence, analysis, design and optimization, and scientific communication. OpenAI worked with 173 scientists in biotechnology and drug discovery to create the tasks, each framed as a scientist asking a knowledgeable collaborator for help.
The benchmark gives AI models 750 tasks accompanied by 1,062 attachments, such as figures, tables, and chemical structure files; 53% of the tasks require at least one attachment. Responses are scored on criteria including appropriate level of detail, correct reasoning, and correct format. The highest-scoring model was GPT-Rosalind, OpenAI's science-specialized model based on GPT-5.5, which outperformed GPT-5.5 in all seven categories. On the same day, OpenAI also published a report describing how GPT-5.4 assisted drug discovery research.
Synthesized by Yomimono from the 1 cited source below, including Japanese-language reporting where cited, then editorially reviewed before publishing.