{"id":51581,"date":"2025-03-24T16:31:01","date_gmt":"2025-03-24T11:01:01","guid":{"rendered":"http:\/\/officechai.com\/?p=51581"},"modified":"2025-03-24T17:19:20","modified_gmt":"2025-03-24T11:49:20","slug":"reasoning-models-are-making-ai-benchmarks-irrelevant-openais-noam-brown","status":"publish","type":"post","link":"https:\/\/officechai.com\/ai\/reasoning-models-are-making-ai-benchmarks-irrelevant-openais-noam-brown\/","title":{"rendered":"Reasoning Models Are Making AI Benchmarks Irrelevant: OpenAI&#8217;s Noam Brown"},"content":{"rendered":"\n<p>The top AI labs furiously compete among themselves to have the <a href=\"https:\/\/officechai.com\/ai\/chinese-company-deepseek-releases-r1-model-comparable-to-openais-o-1-with-90-cheaper-costs\/\">best possible results<\/a> on standard benchmarks, but they are leaving out an important factor in their calculations &#8212; the costs required to achieve their results. <\/p>\n\n\n\n<p>This crucial point was recently raised by Noam Brown, a research scientist at OpenAI, known for his groundbreaking work on AI systems for complex games like poker and Diplomacy. In a recent discussion, Brown argued that the traditional way of evaluating AI models\u2014simply looking at their performance on a benchmark\u2014is becoming obsolete, especially with the rise of increasingly powerful reasoning models. 
<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" fetchpriority=\"high\" decoding=\"async\" width=\"640\" height=\"360\" src=\"https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2025\/03\/NoamBrown_2024S-embed-1024x576.jpg?resize=640%2C360\" alt=\"\" class=\"wp-image-51582\" srcset=\"https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2025\/03\/NoamBrown_2024S-embed-scaled.jpg?resize=1024%2C576&amp;ssl=1 1024w, https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2025\/03\/NoamBrown_2024S-embed-scaled.jpg?resize=300%2C169&amp;ssl=1 300w, https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2025\/03\/NoamBrown_2024S-embed-scaled.jpg?resize=768%2C432&amp;ssl=1 768w, https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2025\/03\/NoamBrown_2024S-embed-scaled.jpg?resize=1536%2C864&amp;ssl=1 1536w, https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2025\/03\/NoamBrown_2024S-embed-scaled.jpg?resize=2048%2C1152&amp;ssl=1 2048w, https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2025\/03\/NoamBrown_2024S-embed-scaled.jpg?w=1280 1280w, https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2025\/03\/NoamBrown_2024S-embed-scaled.jpg?w=1920 1920w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/figure>\n\n\n\n<p>&#8220;The notion of model intelligence &#8212; performance on a benchmark as a single number doesn&#8217;t really even make sense anymore,&#8221; Brown <a href=\"https:\/\/x.com\/vitrupo\/status\/1903215162896814360\">explains<\/a>. He continues, arguing that a more nuanced approach is required: &#8220;You have to think of it as intelligence per dollar or per token or something like that.&#8221; This cost-centric perspective is crucial, he argues, because compute time directly impacts performance. 
&#8220;If a model can think for a very long time,&#8221; Brown observes, &#8220;it&#8217;s going to do better on all these benchmarks.&#8221; This leads to a fundamental shift in how we should visualize AI progress, moving away from single data points and toward a more comprehensive understanding: &#8220;So you really have to think of it as a curve of intelligence versus cost curve.&#8221; And this curve, he emphasizes, &#8220;can be very steep&#8230; can be very high if you want to spend a lot.&#8221; This, according to Brown, is the trajectory of the future of AI development.<\/p>\n\n\n\n<p>But Brown says that even with their higher costs, reasoning models are much cheaper than humans performing equivalent tasks. &#8220;And that&#8217;s kind of the future that we&#8217;re headed towards, I think, when people look at these reasoning models and they think, &#8216;Oh, this thing is so expensive!&#8217; Well, compared to what? You know, if you&#8217;re comparing it to GPT-4, then sure, it&#8217;s very expensive. But if you compare it to a human trying to do the same test, then it&#8217;s dirt cheap,&#8221; he says. This comparison, Brown asserts, is the one that truly matters, especially as AI capabilities continue to grow. &#8220;And that comparison to a human matters,&#8221; he continues, &#8220;is the intelligence grows. Once you have these models surpassing top humans in certain domains, you know, you think about how much the top human in the world would be paid to do a task. They command a big premium for that expertise.&#8221;<\/p>\n\n\n\n<p>&#8220;And when you have the models now having that expertise and they&#8217;re the fraction of the cost of a human, there&#8217;s a lot of value in that,&#8221; Brown says.<\/p>\n\n\n\n<p>In the recent past, there have been some impressive improvements in many AI benchmarks. 
In the ARC-AGI benchmark for instance, OpenAI&#8217;s o3 model had <a href=\"https:\/\/officechai.com\/startups\/openai-o3-agi\/\">performed <\/a>much better than previous approaches. But the model had also spent a lot of tokens &#8212; as much as <a href=\"https:\/\/www.reddit.com\/r\/singularity\/comments\/1hisp7o\/o3_high_compute_costs_is_insane_3000_for_a_single\/#:~:text=Balance%2D-,o3%20high%20compute%20costs%20is%20insane%3A%20%243000%2B%20for%20a%20single,USD%20to%20run%20the%20benchmark.\">$3,000 per task<\/a> &#8212; reasoning through its answers, making it much more expensive than previous AI approaches. Brown suggests that instead of focusing simply on benchmark results, it could be better to also include the costs that were required to generate those results. And with reasoning models now becoming commonplace, this holistic approach could give a better estimation of future AI progress.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The top AI labs furiously compete among themselves to have the best possible results on standard benchmarks, but they are leaving out an&#8230;<\/p>\n","protected":false},"author":1,"featured_media":51582,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1029],"tags":[],"class_list":["post-51581","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Reasoning Models Are Making AI Benchmarks Irrelevant: OpenAI&#039;s Noam Brown<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" 
\/>\n<link rel=\"canonical\" href=\"https:\/\/officechai.com\/ai\/reasoning-models-are-making-ai-benchmarks-irrelevant-openais-noam-brown\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Reasoning Models Are Making AI Benchmarks Irrelevant: OpenAI&#039;s Noam Brown\" \/>\n<meta property=\"og:description\" content=\"The top AI labs furiously compete among themselves to have the best possible results on standard benchmarks, but they are leaving out an...\" \/>\n<meta property=\"og:url\" content=\"https:\/\/officechai.com\/ai\/reasoning-models-are-making-ai-benchmarks-irrelevant-openais-noam-brown\/\" \/>\n<meta property=\"og:site_name\" content=\"OfficeChai\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/OfficeChai\/\" \/>\n<meta property=\"article:published_time\" content=\"2025-03-24T11:01:01+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-03-24T11:49:20+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2025\/03\/NoamBrown_2024S-embed-scaled.jpg?fit=2560%2C1440&ssl=1\" \/>\n\t<meta property=\"og:image:width\" content=\"2560\" \/>\n\t<meta property=\"og:image:height\" content=\"1440\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"OfficeChai Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@OfficeChai\" \/>\n<meta name=\"twitter:site\" content=\"@OfficeChai\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"OfficeChai Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/officechai.com\/ai\/reasoning-models-are-making-ai-benchmarks-irrelevant-openais-noam-brown\/\",\"url\":\"https:\/\/officechai.com\/ai\/reasoning-models-are-making-ai-benchmarks-irrelevant-openais-noam-brown\/\",\"name\":\"Reasoning Models Are Making AI Benchmarks Irrelevant: OpenAI's Noam Brown\",\"isPartOf\":{\"@id\":\"https:\/\/officechai.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/officechai.com\/ai\/reasoning-models-are-making-ai-benchmarks-irrelevant-openais-noam-brown\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/officechai.com\/ai\/reasoning-models-are-making-ai-benchmarks-irrelevant-openais-noam-brown\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2025\/03\/NoamBrown_2024S-embed-scaled.jpg?fit=2560%2C1440&ssl=1\",\"datePublished\":\"2025-03-24T11:01:01+00:00\",\"dateModified\":\"2025-03-24T11:49:20+00:00\",\"author\":{\"@id\":\"https:\/\/officechai.com\/#\/schema\/person\/5861f1134993293cc28905de7624d6b2\"},\"breadcrumb\":{\"@id\":\"https:\/\/officechai.com\/ai\/reasoning-models-are-making-ai-benchmarks-irrelevant-openais-noam-brown\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/officechai.com\/ai\/reasoning-models-are-making-ai-benchmarks-irrelevant-openais-noam-brown\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/officechai.com\/ai\/reasoning-models-are-making-ai-benchmarks-irrelevant-openais-noam-brown\/#primaryimage\",\"url\":\"https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2025\/03\/NoamBrown_2024S-embed-scaled.jpg?fit=2560%2C1440&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2025\/03\/NoamBrown_2024S-embed-
scaled.jpg?fit=2560%2C1440&ssl=1\",\"width\":2560,\"height\":1440},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/officechai.com\/ai\/reasoning-models-are-making-ai-benchmarks-irrelevant-openais-noam-brown\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/officechai.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Reasoning Models Are Making AI Benchmarks Irrelevant: OpenAI&#8217;s Noam Brown\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/officechai.com\/#website\",\"url\":\"https:\/\/officechai.com\/\",\"name\":\"OfficeChai\",\"description\":\"Startups, Businesses And Careers\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/officechai.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/officechai.com\/#\/schema\/person\/5861f1134993293cc28905de7624d6b2\",\"name\":\"OfficeChai Team\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/officechai.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/61d744733248dc647d505d0676bb425323413132ee5447e86aa8eecbbb7b27d5?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/61d744733248dc647d505d0676bb425323413132ee5447e86aa8eecbbb7b27d5?s=96&d=mm&r=g\",\"caption\":\"OfficeChai Team\"},\"description\":\"Dotting the i's, crossing the t's.\",\"url\":\"https:\/\/officechai.com\/author\/admin\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Reasoning Models Are Making AI Benchmarks Irrelevant: OpenAI's Noam Brown","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/officechai.com\/ai\/reasoning-models-are-making-ai-benchmarks-irrelevant-openais-noam-brown\/","og_locale":"en_US","og_type":"article","og_title":"Reasoning Models Are Making AI Benchmarks Irrelevant: OpenAI's Noam Brown","og_description":"The top AI labs furiously compete among themselves to have the best possible results on standard benchmarks, but they are leaving out an...","og_url":"https:\/\/officechai.com\/ai\/reasoning-models-are-making-ai-benchmarks-irrelevant-openais-noam-brown\/","og_site_name":"OfficeChai","article_publisher":"https:\/\/www.facebook.com\/OfficeChai\/","article_published_time":"2025-03-24T11:01:01+00:00","article_modified_time":"2025-03-24T11:49:20+00:00","og_image":[{"width":2560,"height":1440,"url":"https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2025\/03\/NoamBrown_2024S-embed-scaled.jpg?fit=2560%2C1440&ssl=1","type":"image\/jpeg"}],"author":"OfficeChai Team","twitter_card":"summary_large_image","twitter_creator":"@OfficeChai","twitter_site":"@OfficeChai","twitter_misc":{"Written by":"OfficeChai Team","Est. 
reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/officechai.com\/ai\/reasoning-models-are-making-ai-benchmarks-irrelevant-openais-noam-brown\/","url":"https:\/\/officechai.com\/ai\/reasoning-models-are-making-ai-benchmarks-irrelevant-openais-noam-brown\/","name":"Reasoning Models Are Making AI Benchmarks Irrelevant: OpenAI's Noam Brown","isPartOf":{"@id":"https:\/\/officechai.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/officechai.com\/ai\/reasoning-models-are-making-ai-benchmarks-irrelevant-openais-noam-brown\/#primaryimage"},"image":{"@id":"https:\/\/officechai.com\/ai\/reasoning-models-are-making-ai-benchmarks-irrelevant-openais-noam-brown\/#primaryimage"},"thumbnailUrl":"https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2025\/03\/NoamBrown_2024S-embed-scaled.jpg?fit=2560%2C1440&ssl=1","datePublished":"2025-03-24T11:01:01+00:00","dateModified":"2025-03-24T11:49:20+00:00","author":{"@id":"https:\/\/officechai.com\/#\/schema\/person\/5861f1134993293cc28905de7624d6b2"},"breadcrumb":{"@id":"https:\/\/officechai.com\/ai\/reasoning-models-are-making-ai-benchmarks-irrelevant-openais-noam-brown\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/officechai.com\/ai\/reasoning-models-are-making-ai-benchmarks-irrelevant-openais-noam-brown\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/officechai.com\/ai\/reasoning-models-are-making-ai-benchmarks-irrelevant-openais-noam-brown\/#primaryimage","url":"https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2025\/03\/NoamBrown_2024S-embed-scaled.jpg?fit=2560%2C1440&ssl=1","contentUrl":"https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2025\/03\/NoamBrown_2024S-embed-scaled.jpg?fit=2560%2C1440&ssl=1","width":2560,"height":1440},{"@type":"BreadcrumbList","@id":"https:\/\/officechai.com\/ai\/reasoning-models-are-making-ai-benchmarks-irrelevant-openais-noam-brown\/#breadcr
umb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/officechai.com\/"},{"@type":"ListItem","position":2,"name":"Reasoning Models Are Making AI Benchmarks Irrelevant: OpenAI&#8217;s Noam Brown"}]},{"@type":"WebSite","@id":"https:\/\/officechai.com\/#website","url":"https:\/\/officechai.com\/","name":"OfficeChai","description":"Startups, Businesses And Careers","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/officechai.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/officechai.com\/#\/schema\/person\/5861f1134993293cc28905de7624d6b2","name":"OfficeChai Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/officechai.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/61d744733248dc647d505d0676bb425323413132ee5447e86aa8eecbbb7b27d5?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/61d744733248dc647d505d0676bb425323413132ee5447e86aa8eecbbb7b27d5?s=96&d=mm&r=g","caption":"OfficeChai Team"},"description":"Dotting the i's, crossing the 
t's.","url":"https:\/\/officechai.com\/author\/admin\/"}]}},"jetpack_featured_media_url":"https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2025\/03\/NoamBrown_2024S-embed-scaled.jpg?fit=2560%2C1440&ssl=1","jetpack_shortlink":"https:\/\/wp.me\/p685C6-dpX","jetpack_likes_enabled":true,"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/officechai.com\/wp-json\/wp\/v2\/posts\/51581","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/officechai.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/officechai.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/officechai.com\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/officechai.com\/wp-json\/wp\/v2\/comments?post=51581"}],"version-history":[{"count":2,"href":"https:\/\/officechai.com\/wp-json\/wp\/v2\/posts\/51581\/revisions"}],"predecessor-version":[{"id":51587,"href":"https:\/\/officechai.com\/wp-json\/wp\/v2\/posts\/51581\/revisions\/51587"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/officechai.com\/wp-json\/wp\/v2\/media\/51582"}],"wp:attachment":[{"href":"https:\/\/officechai.com\/wp-json\/wp\/v2\/media?parent=51581"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/officechai.com\/wp-json\/wp\/v2\/categories?post=51581"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/officechai.com\/wp-json\/wp\/v2\/tags?post=51581"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}