34679 2 days ago

I'd like to offer a cautionary tale that involves my experience after seeing this post.

First, I tried enabling o3 via OpenRouter since I have credits with them already. I was met with the following:

"OpenAI requires bringing your own API key to use o3 over the API. Set up here: https://openrouter.ai/settings/integrations"

So I decided I would buy some API credits with my OpenAI account. I ponied up $20 and started Aider with my new API key set and o3 as the model. I get the following after sending a request:

"litellm.NotFoundError: OpenAIException - Your organization must be verified to use the model `o3`. Please go to: https://platform.openai.com/settings/organization/general and click on Verify Organization. If you just verified, it can take up to 15 minutes for access to propagate."

At that point, the frustration was beginning to creep in. I returned to OpenAI and clicked on "Verify Organization". It turns out, "Verify Organization" actually means "Verify Personal Identity With Third Party" because I was given the following:

"To verify this organization, you’ll need to complete an identity check using our partner Persona."

Sigh I click "Start ID Check" and it opens a new tab for their "partner" Persona. The initial fine print says:

"By filling the checkbox below, you consent to Persona, OpenAI’s vendor, collecting, using, and utilizing its service providers to process your biometric information to verify your identity, identify fraud, and conduct quality assurance for Persona’s platform in accordance with its Privacy Policy and OpenAI’s privacy policy. Your biometric information will be stored for no more than 1 year."

OK, so now, we've gone from "I guess I'll give OpenAI a few bucks for API access" to "I need to verify my organization" to "There's no way in hell I'm agreeing to provide biometric data to a 3rd party I've never heard of that's a 'partner' of the largest AI company and Worldcoin founder. How do I get my $20 back?"

  • leetrout 2 days ago

    I actually contacted the California AG to get a refund from another AI company after they failed to refund me.

    The AG office followed up and I got my refund. Worth my time to file because we should stop letting companies get away with this stuff where they show up with more requirements after paying.

    Separately they also do not need my phone number after having my name, address and credit card.

    Has anyone got info on why they are taking everyone’s phone number?

    • jazzyjackson 2 days ago

      (having no insider info:) Because it can be used as a primary key ID across aggregated marketing databases including your voting history / party affiliation, income levels, personality and risk profiles etc etc etc. If a company wants to, and your data hygiene hasn't been tip top, your phone number is a pointer to a ton of intimate if not confidential data. Twitter was fined $150 million for asking for phone numbers under pretense of "protecting your account" or whatever but they actually used it for ad targeting.

      >> Wednesday's 9th Circuit decision grew out of revelations that between 2013 and 2019, X mistakenly incorporated users' email addresses and phone numbers into an ad platform that allows companies to use their own marketing lists to target ads on the social platform.

      >> In 2022, the Federal Trade Commission fined X $150 million over the privacy gaffe.

      >> That same year, Washington resident Glen Morgan brought a class-action complaint against the company. He alleged that the ad-targeting glitch violated a Washington law prohibiting anyone from using “fraudulent, deceptive, or false means” to obtain telephone records of state residents.

      >> X urged Dimke to dismiss Morgan's complaint for several reasons. Among other arguments, the company argued merely obtaining a user's phone number from him or her doesn't violate the state pretexting law, which refers to telephone “records.”

      >> “If the legislature meant for 'telephone record' to include something as basic as the user’s own number, it surely would have said as much,” X argued in a written motion.

      https://www.mediapost.com/publications/article/405501/None

      • sgarland a day ago

        Tangential: please do not use a phone number as a PK. Aside from the nightmare of normalizing them, there is zero guarantee that someone will keep the same number.
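
        A minimal sketch of the alternative (illustrative only, using Python's built-in sqlite3; the table and column names are invented): give the table a surrogate key and keep the phone number as a mutable, indexed attribute.

            import sqlite3

            conn = sqlite3.connect(":memory:")
            conn.executescript("""
                CREATE TABLE users (
                    id         INTEGER PRIMARY KEY,  -- surrogate key, stable for the life of the row
                    name       TEXT NOT NULL,
                    phone_e164 TEXT                  -- one canonical format, indexed, but NOT the key
                );
                CREATE INDEX idx_users_phone ON users (phone_e164);
            """)

            conn.execute("INSERT INTO users (name, phone_e164) VALUES (?, ?)", ("Alice", "+14155550123"))
            # If the user changes numbers, only this column changes; every foreign key
            # that references users.id stays valid.
            conn.execute("UPDATE users SET phone_e164 = ? WHERE name = ?", ("+14155550199", "Alice"))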

        • godelski a day ago

          Also fun fact, people mistype and mistranscribe data. Some people even... put down fake phone numbers because they don't want to be tracked!

          I would think that in a world where we constantly get spam calls and texts, people would understand that a phone number is not a good primary key. I mean, we literally don't answer calls from unknown numbers because of this. How is it that we can only see the problem in one direction but not the other?

        • ponector a day ago

          Even better: phone numbers are redistributed after some time of inactivity.

      • azinman2 2 days ago

        OpenAI doesn’t (currently) sell ads. I really cannot see a world where they’re wanting to sell ads to their API users only? It’s not like you need a phone number to use ChatGPT.

        To me the obvious example is fraud/abuse protection.

        • jazzyjackson a day ago

          You're thinking ads are to advertise products. Ads are to modify behavior to make you more likely to buy products.

          ChatGPT has the capacity to modify behavior more subtly than any advertising ever devised. Aggregating knowledge on the person on the other end of the line is key in knowing how to nudge them toward the target behavior. (Note this target behavior may be how to vote in an election, or how to feel about various hot topics.)

          • ethbr1 a day ago

            > Aggregating knowledge on the person on the other end of the line is key in knowing how to nudge them toward the target behavior.

            It also, as Google learned, enables you to increase your revenue per placement. Advertisers will pay more for placement with their desired audience.

        • lcnPylGDnU4H9OF a day ago

          > It’s not like you need a phone number to use ChatGPT.

          I’m pretty sure you do. Claude too. The only chatbot company I’ve made an account with is Mistral specifically because a phone number was not a registration requirement.

          • ax0ar a day ago

            They also require it now.

        • godelski a day ago

            > To me the obvious example is fraud/abuse protection.
          
          Phones are notorious for spam...

          Seriously. How can the most prolific means of spam be used to prevent fraud and abuse? (Okay, maybe email is a little more prolific?) Like have you never received a spam call or text? Obviously fraudsters and abusers know how to exploit those systems... it can't be more obvious...

          • azinman2 19 hours ago

            It costs money to get a phone number. It’s about friction, not elimination.

            What would you do instead?

            • godelski 13 hours ago

              I'm saying it clearly isn't enough friction, and it's not worth the privacy cost. And let's be real: those numbers are then commonly sold to those same spammers, even if indirectly.

              You are also forgetting that it is easy to mask, obscure, and hijack numbers. So it doesn't necessarily cost money per number; many times they can use their own number.

              There isn't a universal solution, which is the main problem here. Sometimes numbers make sense, most of the time not.

              • azinman2 8 hours ago

                Ok so you don’t like phone numbers. I get it.

                But you’re OpenAI. You need to do _something_. What do you do?

                • jazzyjackson 5 hours ago

                  What are they trying to prevent again? Requiring a phone number is one kind of friction for free services like twitter, but this is a service where a user registers a credit card and authorizes charges, they have the legal name of the person paying for a service, what's the phone number for? It's not like OpenAI gives me their phone number so I can call when I'm having an issue.

        • KomoD a day ago

          > It’s not like you need a phone number to use ChatGPT.

          When I signed up I had to do exactly that.

        • prmoustache a day ago

          The fact they don't sell ads doesn't mean they are not in the business of selling users data to third parties.

          Also, Netflix wasn't initially selling ads either. After drastically increasing prices over the last few years, their ad-supported tier is probably now the #1 plan, because most people aren't willing to shell out 15 to 25 usd/€ every month to watch content that is already littered with ads.

          • fsmv a day ago

            If you sell ads you're actually incentivised not to sell data because then your competitors would be able to reach your users without paying you

            • ethbr1 a day ago

              You're incentivized not to sell targeting data, but you're very incentivized to collect and maintain as much of it as you can, and then offer access using it as a service.

              So, at the end of the day, company X has an overdetailed profile of you, rather than each advertiser. (And also, at least in the US, it can repackage and sell that data in various products if it chooses.)

        • hnaccount_rng 2 days ago

          They don’t need to. It’s totally sufficient that they can correlate your chat history with your identity. That makes other identifiers more valuable, if they can extract your interests

          • cmenge a day ago

            The typical use case of an API is not that you personally use it. I have hundreds of clients who all go through my API key, and in most cases they are themselves companies with n clients of their own.

          • brookst a day ago

            It’s a good conspiracy theory, but of course it’s scoped to only ChatGPT users who are also developers and using specifically the o3 model via API. So if it is a conspiracy, it’s a fairly non-ambitious one.

        • hshdhdhj4444 a day ago

          They may not sell ads.

          They may still buy data from ad companies and store credit cards, etc.

          Many of them link users based on phone number.

          • azinman2 a day ago

            But to do what with api users? Most api users won’t be individuals…

            • KomoD a day ago

              I bet there's way more individuals than companies that use the API

        • codedokode a day ago

          Obvious goal is to know the identity of users.

    • giancarlostoro a day ago

      Thank you for this comment… a relative of mine spent a ton of money on an AI product that never arrived: a license he cannot use. I told him to contact his state's AG just in case.

    • pembrook a day ago

      Source: have dealt with fraud at scale before.

      Phone number is the only way to reliably stop MOST abuse on a freemium product that doesn't require payment/identity verification upfront. You can easily block VOIP numbers and ensure the person connected to this number is paying for an actual phone plan, which cuts down dramatically on bogus accounts.

      Hence why even Facebook requires a unique, non-VOIP phone number to create an account these days.

      I'm sure this comment will get downvoted in favor of some other conspiratorial "because they're going to secretly sell my data!" tinfoil post (this is HN of course). But my explanation is the actual reason.

      I would love it if I could still just use email to sign up for free accounts everywhere, but it's too easily gamed at scale.
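
      For illustration only, a rough sketch of the kind of line-type check described above, assuming Twilio's Lookup v2 API and its line_type_intelligence field (my assumption, not something named in this thread):

          from twilio.rest import Client

          client = Client()  # reads TWILIO_ACCOUNT_SID / TWILIO_AUTH_TOKEN from the environment

          def looks_like_voip(phone_e164: str) -> bool:
              """Best-effort line-type check; treat the result as a risk signal, not proof."""
              result = client.lookups.v2.phone_numbers(phone_e164).fetch(
                  fields="line_type_intelligence"
              )
              line_type = (result.line_type_intelligence or {}).get("type")
              return line_type in {"fixedVoip", "nonFixedVoip"}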

      • LexGray a day ago

        On the flip side it makes a company seem sparklingly inept when they use VOIP as a method to filter valid users. I haven’t done business with companies like Netflix or Uber because I don’t feel like paying AT&T a cut for identity verification. There are plenty of other methods like digital licenses which are both more secure and with better privacy protections.

        • pembrook a day ago

          I wish we could all agree on a better way of auth -- but unfortunately this is all we have. Asking normal people to do anything outside of phone number or email (or 'login with [other account based on phone number or email]') for auth is basically impossible.

      • exceptione a day ago

        Maybe they should look into a non-freemium business model. But that won't happen because they want to have as much personal data as possible.

        - Parent talks about a paid product. If they want to burn tokens, they are going to pay for them.

        - Those phone requirements do not stop professional abusers, organized crime, or state-sponsored groups. Case in point: Twitter is overrun by bots, scammers and foreign info-ops swarms.

        - Phone requirements might hinder non-professional abusers at best, but we are sidestepping the issue of whether those corporations deserve so much trust that regular users should be compelled to sell themselves. Maybe the business model just sucks.

        • pembrook a day ago

          I don't like requiring phone numbers either, but saying OpenAI shouldn't use a freemium model for the hottest tech product of this century (AI) is a fundamental misunderstanding of how humans and the world work.

          Also, if they don't do freemium they're getting way more valuable information about you than just a phone number.

          • jimmydorry a day ago

            What part of this thread relates to freemium? Use of the API requires tokens that are paid for. General use of the AI via the web interface does not require a phone number.

            Only requiring the phone number for API users feels needlessly invasive and is not explained by a vague "countering fraud and abuse" for a paid product...

      • SheinH a day ago

        The discussion wasn't about freemium products though. Someone mentioned that they paid 20 bucks for OpenAI's API already and then they were asked for more verification.

      • juros 15 hours ago

        Personally I found that rejecting disposable/temporary emails and flagging requests behind VPNs filtered out 99% of abuse on my sites.

        No need to ask for a phone or card -- or worse, biometric data! -- which also removes friction.
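
        A toy sketch of the email side of that filter (illustrative only; the tiny hard-coded blocklist stands in for a maintained disposable-domain list):

            DISPOSABLE_DOMAINS = {"mailinator.com", "guerrillamail.com", "10minutemail.com"}

            def allow_signup(email: str) -> bool:
                """Reject signups whose email domain is on the disposable-domain blocklist."""
                domain = email.rsplit("@", 1)[-1].strip().lower()
                return domain not in DISPOSABLE_DOMAINS

            print(allow_signup("alice@example.com"))   # True
            print(allow_signup("bob@mailinator.com"))  # False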

      • AnthonyMouse a day ago

        > I'm sure this comment will get downvoted in favor of some other conspiratorial "because they're going to secretly sell my data!" tinfoil post (this is HN of course). But my explanation is the actual reason.

        Your explanation is inconsistent with the link in these comments showing Twitter getting fined for doing the opposite.

        > Hence why even Facebook requires a unique, non-VOIP phone number to create an account these days.

        Facebook is the company most known for disingenuous tracking schemes. They just got caught with their app running a service on localhost to provide tracking IDs to random shady third party websites.

        > You can easily block VOIP numbers and ensure the person connected to this number is paying for an actual phone plan, which cuts down dramatically on bogus accounts.

        There isn't any such thing as a "VOIP number", all phone numbers are phone numbers. There are only some profiteers claiming they can tell you that in exchange for money. Between MVNOs, small carriers, forwarding services, number portability, data inaccuracy and foreign users, those databases are practically random number generators with massive false positive rates.

        Meanwhile, major carriers are happy to hand out phone numbers in their ranges to spammers in bulk, to the point that this has become a profit center for the spammers and lets them expand their operations: they acquire large blocks of numbers that these services claim aren't "VOIP numbers", use them to spam the services they want to spam, and then resell cheap or ad-supported SMS service at a profit to other spammers or to privacy-conscious people who want to sign up for a service they haven't used that number with yet.

  • charliebwrites 2 days ago

    Doesn’t Sam Altman own a crypto currency company [1] that specifically collects biometric data to identify people?

    Seems familiar…

    [1] https://www.forbes.com/advisor/investing/cryptocurrency/what...

    • jjani 2 days ago

      GP did mention this :)

      > I've never heard of that's a 'partner' of the largest AI company and Worldcoin founder

    • 93po a day ago

      The core tech and premise doesn't collect biometric data, but biometric data is collected for training purposes with consent and compensation. There is endless misinformation (willful and ignorant) around Worldcoin, but it is not, at its core, a biometric collection company.

      • malfist a day ago

        Collecting biometrics for training purposes is still collecting biometrics.

  • ddtaylor 2 days ago

    I also am using OpenRouter because OpenAI isn't a great fit for me. I also stopped using OpenAI because they expire your API credits even if you don't use them. Yeah, it's only $10, but I'm not spending another dime with them.

    • numlocked a day ago

      Hi - I'm the COO of OpenRouter. In practice we don't expire the credits, but we have to reserve the right to, or else we have an uncapped liability literally forever. Can't operate that way :) Everyone who issues credits on a platform has to have some way of expiring them. It's not a profit center for us, or part of our P&L; it's just a protection we have to have.

      • mitthrowaway2 a day ago

        If you're worried about the unlimited liability, how about you refund the credits instead of expiring them?

        • kfrane a day ago

          From my experience with billing systems it is usually not possible to refund a transaction after 6 or 12 months.

          • gotimo a day ago

            It's even possible with some of them, but even then they're usually not "refunding" so much as "making a new transaction for the same amount the other way", which looks the same on the surface until reversals, voids or rejections happen and it all becomes a mess.

        • diggan a day ago

          Seems like a weird question to ask OpenRouter Inc, a for-profit company.

          Really a shame OpenAI left their non-profit (and open) roots, could have been something different but nope, the machine ate them whole.

          • azemetre a day ago

            Why is it a bad thing to ask a company to do right by their paid customers? This type of policy absolutely costs the company business in the future, because it shows customers that the company doesn't care about them.

            I never heard of OpenRouter prior to this thread, but will now never use them and advocate they never be used either.

          • mitthrowaway2 a day ago

            A fair refund policy is not in conflict with a company being for-profit. I (and it seems many others) would be much less inclined to buy credits from a company that will expire them if I don't use it, and more inclined to buy credits from a service that will refund them if I end up not using it. Once I've bought them I'm more likely to use them. And in addition to reducing that purchasing friction and gaining market share, they can get the time-value of the money between when I bought the credits and when they eventually refund them.

            Enlightened self-interest is when you realize that you win by being good to your customers, instead of treating customer service like a zero-sum game.

      • otterley a day ago

        Out of curiosity, what makes you different from a retailer or restaurant that has the same problem?

      • carstenhag a day ago

        Why only 365 days? Would be way fairer and still ok for you (if it's such a big issue) to expire them after 5 years.

    • bonki 2 days ago

      I wonder if they do this everywhere, in certain jurisdictions this is illegal.

    • johnnyyyy a day ago

      Then you shouldn't use OpenRouter. From the ToS, 4.2 Credit Expiration; Auto Recharge: "OpenRouter reserves the right to expire unused credits three hundred sixty-five (365) days after purchase."

    • cedws 2 days ago

      After how long do they expire?

  • Marsymars 2 days ago

    Oh I also recently got locked out of my linkedin account until I supply data to Persona.

    (So I’m remaining locked out of my linkedin account.)

  • csomar 2 days ago

    > How do I get my $20 back?

    Contact support and ask for a refund. Then a charge back.

  • baq 2 days ago

    Meanwhile the FSB and Mossad happily generate fake identities on demand.

    • romanovcode 2 days ago

      The whole point of identity verification is for the same Mossad to gather your complete profile and everything else they can from OpenAI.

      Since Mossad and the CIA are essentially one organization, they already do it, 100%.

  • AstroBen 2 days ago

    KYC requirement + OpenAI preserving all logs in the same week?

    • jjani 2 days ago

      OpenAI introduced this with the public availability of o3, so no.

      It's also the only LLM provider which has this.

      What OpenAI has that the others don't is SamA's insatiable thirst for everyone's biometric data.

    • infecto 2 days ago

      KYC has been around for a few months I believe. Whenever they released some of the additional thought logs you had to be verified.

  • shmoogy a day ago

    I was excited about trying o3 for my apps but I'm not doing this validation.. thanks for the heads up.

  • bratao 2 days ago

    You are even lucky to be able to verify. Mine has given me a "Session expired" error for months!! Support does not reply.

  • conradev 2 days ago

    I was more excited by the process, like, there exists a model out there so powerful it requires KYC

    which, after using it, fair! It found a zero day

    • __float 2 days ago

      I think they're probably more concerned about fake accounts and people finding ways to get free stuff.

      • gscott 2 days ago

        China is training their AI models using ChatGPT. They want to stop or slow that down.

        • olalonde a day ago

          Why? It seems counterproductive given OpenAI's mission statement: "We are building safe and beneficial AGI, but will also consider our mission fulfilled if our work aids others to achieve this outcome."

          • stavros a day ago

            Because OpenAI's actual mission statement is "money, and lots of it".

            • conradev a day ago

              Yeah. I think they need money to deliver AGI

      • conradev 2 days ago

        I actually think they’re worried about foreign actors using it for…

        - generating synthetic data to train their own models

        - hacking and exploitation research

        etc

      • abeindoria 2 days ago

        What free stuff? It requires a paid API.

        • DrammBA 2 days ago

          With no intention to tarnish your pure world view, paid services with low registration requirements are ideal for account laundering and subscription fraud with stolen credit cards

    • artrockalter 2 days ago

      [flagged]

      • jjani 2 days ago

        Bit strange to see such new accounts already priming everyone to think this is acceptable and normal, which it isn't: OA started doing this months ago and still not a single other provider does it, despite offering just as powerful models. It's not like Google and Anthropic have launched their own Worldcoin either.

        • SunlitCat 2 days ago

          Especially, what does it have to do with "these models are getting more powerful"?

          • therein 2 days ago

            Basically the earmark of a narrative OpenAI loves to push covertly, and often overtly. I wouldn't be shocked if these are bots.

  • 5Qn8mNbc2FNCiVV 2 days ago

    This feels eerily similar to a post I've read within the last month. Either I'm having deja vu, it's a coincidence that the exact same story is mentioned, or there's something else going on.

    • bgwalter 2 days ago

      What should be going on? A regular Google search for "openai persona verify organization" shows withpersona.com in the second search result.

      • 5Qn8mNbc2FNCiVV 2 days ago

        Yeah ok guess I misremembered it a bit but I was curious too and found the previous one I've thought of: https://news.ycombinator.com/item?id=43795406

        • Barbing 2 days ago

          Good eye! In this case, complaints are numerous. See a web search for:

          openai persona verification site:community[.]openai[.]com

          e.g. a thread with 36 posts beginning Apr 13:

          "OpenAI Non-Announcement: Requiring identity card verification for access to new API models and capabilities"

          But always good to be on look out for shenanigans :)

    • Retric 2 days ago

      This is OpenAI’s fairly dystopian process, so the exact same thing happens to lots of people.

    • xboxnolifes a day ago

      This is just the process for OpenAI. It's the same process I went through as well.

    • verisimi a day ago

      It's a concerted attempt to de-anonymise the internet. Corporate entities are jostling for position as id authorities.

  • fakedang 2 days ago

    As someone not in the US, I do a straight nope out whenever I see a Persona request. I advise everyone else to do the same. Afaik, it's used by LinkedIn and Doordash too.

  • cess11 2 days ago

    I suspect their data collection might not be legal in the EU.

    https://withpersona.com/legal/privacy-policy

    To me it looks like an extremely aggressive data pump.

    • wqaatwt 2 days ago

      There are stories about e.g. Hetzner requiring all sorts of data from people who want to open/verify accounts, so perhaps not. Might just be an anti-money-laundering thing, especially if the credit card company ends up refunding everything.

      • Hetzner_OL 10 hours ago

        Hi there, During our KYC process, we do sometimes ask customers to submit additional information, including IDs, so that we can verify their identity. However, we follow the General Data Protection Regulation in the EU, amongst other regulations. So we only keep that data for the account verification process. We also have a data protection officer and team who can answer questions potential customers have about data protection measures that we take. Their contact information is here: https://www.hetzner.com/legal/privacy-policy/ --Katie, Hetzner Online

      • 7bit 2 days ago

        What stories? Can you back up that claim with some sources please?

        • zuppy a day ago

          Infomaniak did request my personal id or passport for creating a company account. I'm not going to provide any extra source for you, this is what happened to me, you can either believe it or not.

        • TiredOfLife 2 days ago
          • 7bit a day ago

            That's not a source and it is not my responsibility to backup the claims you made. That is yours. If you don't have any sources, and admit to just saying things that are not probable, I can also live with that.

            • wut42 a day ago

              Hetzner is notorious for this, but not enough for publications to pick it up. So by your definition, YEARS of people talking about their experiences with this is nothing?

            • wqaatwt a day ago

              What’s the source on OpenAI doing the same? How is it any more legitimate in any way?

              Which kind of would make the entire “discussion” moot and pointless

  • predkambrij a day ago

    Interesting, it works for me through OpenRouter without a configured OpenAI integration, although I have an OpenAI account and did the verification with OpenAI before. A conspiracy theory would say that they are exchanging PII so OpenRouter knows who I am :)

  • coderatlarge 2 days ago

    this reminds me of how broadcom maintains the “free” tier of vmware.

    • teruakohatu 2 days ago

      Can you explain? Is it not actually free?

      • coderatlarge a day ago

        there are so many non-functional websites and signups required to get to the end of the rainbow that any sane person quits well before getting to any freely distributed software, if, in fact, there still is some.

  • righthand a day ago

    Thank you for reminding me why I’ll never touch these LLM services.

  • sneak 2 days ago

    Yeah, same. I am a paying API customer but I am not doing biometric KYC to talk to a bot.

  • gloosx 2 days ago

    >ID Check

    Just send them a random passport photo from the Internet, what's the deal? They are probably just vibe-verifying the photo with an "Is this a legit passport?" prompt anyway.

    • _joel 2 days ago

      I think modern face verification has moved on, it's been video in all my encounters.

      • gloosx 2 days ago

        Still no real human is involved, as they mention their verification is automated and probabilistic — which is especially funny to hear in the context of verification. I'm pretty sure even a kid could get around it, e.g. by showing on video a photo of a person holding their passport, which you can find online.

        • tethys 2 days ago

          No. You have to turn your head, and stuff. Also, even if this worked, they allow only one verification per person per 90 days.

          • gloosx a day ago

            >one verification per person per 90 days.

            this is absurd, how do they define "person"? On the internet I can be another person from another country in a minute, another minute I will be a different person from a different country.

        • _joel a day ago

          No, that's not how it works.

          • gloosx a day ago

            So you're saying it's not just feeding your video to the AI model and blindly trusting its output? Any evidence of how it works, then?

            • _joel 14 hours ago

              You can't just hold up a 2d object (passport) and wave it about to try and trick it. There are heuristics at work.

              • gloosx 11 hours ago

                For a regular camera all objects are 2D; it is not equipped to capture depth. What heuristics are you talking about? There is an ML model at work that tries to tell whether the object is legit, but it cannot have any real sense of what is in the image: it just relies on a statistically plausible outcome when fed pixels from your camera, which means you definitely can trick it.

    • sneak 2 days ago

      It requires video and an app. They are collecting facial biometrics.

      • gloosx 2 days ago

        App? So you cannot verify without a mobile phone?

        • sneak a day ago

          You can’t sign up in the first place without a mobile phone number. They require a phone number and block VoIP services.

          • gloosx a day ago

            Real mobile phone number to receive a code is 10¢. Maybe a bit more, but certainly not more than a dollar.

  • askl 2 days ago

    Crazy, I already gave up registering for chatgpt because they asked for my phone number. I'm not giving that to any random startup.

  • belter a day ago

    With all this, plus the saving of all chats, they can't operate in the EU. But they do...

  • exceptione a day ago

    Welcome to tech dystopia.

    Hello Human Resource, we have all your data, please upload your bio-metric identity, as well as your personal thoughts.

    Building the next phase of a corporate totalitarian state, thank you for your cooperation.

  • verisimi a day ago

    > OK, so now, we've gone from "I guess I'll give OpenAI a few bucks for API access" to "I need to verify my organization" to "There's no way in hell I'm agreeing to provide biometric data to a 3rd party I've never heard of that's a 'partner' of the largest AI company and Worldcoin founder. How do I get my $20 back?"

    This should be illegal. How many are going to do the same as you, but then think that the effort/time/hassle they would waste to try to get their money back would not be worth it? At which point you've effectively donated money to a corp that implements anti-consumer anti-patterns.

  • jiggawatts 2 days ago

    This is in part "abuse prevention"[1] and in part marketing. Making customers feel like they're signing up to access state secrets makes the models seem more "special". Sama is well known to use these SV marketing tricks, like invite-only access, waiting lists, etc to psychologically manipulate users into thinking they're begging for entry to an exclusive club instead of just swiping a credit card to access an API.

    Google tried this with Google Plus and Google Wave, failed spectacularly, and have ironically stopped with this idiotic "marketing by blocking potential users". I can access Gemini Pro 2.5 without providing a blood sample or signing parchment in triplicate.

    [1] Not really though, because a significant percentage of OpenAI's revenue is from spammers and bulk generation of SEO-optimised garbage. Those are valued customers!

    • miki123211 2 days ago

      Gemini doesn't give you reasoning via API though, at least as far as I'm aware.

      • jjani 2 days ago

        If by reasoning you mean showing CoT, Gemini and OA are the same in this regard - neither provides it, not through the UI nor through the API. The "summaries" both provide have zero value and should be treated as non-existent.

        Anthropic exposes reasoning, which has become a big reason to use them for reasoning tasks over the other two despite their pricing. Rather ironic when the other two have been pushing reasoning much harder.
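
        For reference, a sketch of what "exposing reasoning" looks like with the Anthropic Python SDK's extended-thinking mode (the model id, token budgets and prompt are assumptions, not something stated above):

            import anthropic

            client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

            response = client.messages.create(
                model="claude-sonnet-4-20250514",                    # assumed model id
                max_tokens=2048,
                thinking={"type": "enabled", "budget_tokens": 1024}, # enable extended thinking
                messages=[{"role": "user", "content": "Why is the sky blue?"}],
            )

            for block in response.content:
                if block.type == "thinking":
                    print("reasoning:", block.thinking)  # the exposed reasoning trace
                elif block.type == "text":
                    print("answer:", block.text)         # the final answer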

      • jiggawatts 2 days ago

        Works for me?

        Maybe you’re thinking of deep research mode which is web UI only for now.

    • paulcole 2 days ago

      HN Don’t Hate Marketing Challenge

      Difficulty: Impossible

  • 34679 2 days ago

    Follow-up. I pasted the above into ChatGPT because I genuinely wanted to know how to get my 20 bucks back. That request is in and they say I'll get a response in 2-3 days. I then asked it "What are the potential risks of agreeing to their terms?" Here's the response:

    Agreeing to Persona’s terms, especially for biometric identity verification, involves both privacy and long-term data security risks. Here’s a clear breakdown of the main risks you should be aware of:

    1. Biometric Data Collection

    Risk: Biometric identifiers (like facial recognition, voiceprints, etc.) are extremely sensitive and irreplaceable if compromised.

        What they collect: Persona may collect a selfie, video, and metadata, and extract biometric templates from those for facial comparison and liveness detection.

        If leaked or abused: Unlike passwords, you can't change your face. A future data breach or misuse could lead to permanent identity compromise.

    2. Data Storage & Retention

    Risk: Persona says biometric data is kept for up to one year, but:

        You’re relying on their internal policies, not a legal guarantee.

        There’s no technical detail on how securely it’s stored or whether it’s encrypted at rest.

        Worst-case scenario: Poorly secured biometric templates could be stolen, reused, or matched against other data sets by bad actors or governments.

    3. Third-Party Sharing and Surveillance Risks

    Risk: Your biometric and ID data may be shared with subprocessors (partners/vendors) that you haven’t explicitly vetted.

        Persona may transfer your data to cloud providers (like AWS, GCP), verification specialists, or fraud prevention services.

        Depending on jurisdiction, data could be subject to subpoenas, surveillance laws, or government backdoors (especially in the U.S.).

    4. Consent Ambiguity & Future Use

    Risk: The fine print often includes vague consent for "quality assurance", "model improvement", or "fraud detection".

        This opens the door to retraining algorithms on your biometric data—even if anonymized, that's still a use of your body as data.

        Their privacy policy may evolve, and new uses of your data could be added later unless you opt out (which may not always be possible).

    Should You Agree?

    Only if:

        You absolutely need the service that requires this verification.

        You’re aware of the privacy tradeoff and are okay with it.

        You trust that Persona and its partners won’t misuse your biometric data—even a year down the line.

    If you’re uneasy about this, you’re not alone. Many developers and privacy advocates refuse to verify with biometrics for non-critical services, and companies like OpenAI are increasingly facing criticism for requiring this.

    • stevage 2 days ago

      Huh, I never considered using ChatGPT to read terms of serivce for me. Nice idea.

  • _345 a day ago

    o3 is really powerful. I understand it tbh. They don't want scammers and abusers easily accessing it

sschueller 2 days ago

Has anyone noticed that OpenAI has become "lazy"? When I ask questions now it will not give me a complete file or fix. Instead it tells me what I should do and I need to ask a second or third time to just do the thing I asked.

I don't see this happening with for example deepseek.

Is it possible they are saving on resources by having it answer that way?

  • tedsanders 2 days ago

    Yeah, our models are sometimes too lazy. It’s not intentional, and future models will be less lazy.

    When I worked at Netflix I sometimes heard the same speculation about intentionally bad recommendations, which people theorized would lower streaming and increase profit margins. It made even less sense there as streaming costs are usually less than a penny. In reality, it’s just hard to make perfect products!

    (I work at OpenAI.)

    • ukblewis 2 days ago

      Please be careful about the alternative. I’ve seen o3 doing excessive tool calls and research for relatively simple problems.

      • jillesvangurp 2 days ago

        Yep, it defaults to doing a web search even when that doesn't make sense.

        Example, I asked it to write something. And then I asked it to give me that blob of text in markdown format. So everything it needed was already in the conversation. That took a whole minute of doing web searches and what not.

        I actually dislike using o3 for this reason. I keep the default to 4o. But sometimes I forget to switch back and it goes off boiling the oceans to answer a simple question. It's a bit too trigger happy with that. In general all this version and model soup is impossible to figure out for non technical users. And I noticed 4o is now sometimes starting to do the same. I guess, too many users never use the model drop down.

      • Hard_Space 2 days ago

        After the last few weeks, where o3 seems desperate to do tool searches or re-crunch a bad gen even though I only asked a question about it, I assumed that the policy is to burn through credits at the fastest possible rate. With this price change, I don't know what's happening now...

        • Nextgrid a day ago

          Are they actually profitable? A policy to burn through credits only makes sense if they're making a profit on each token - otherwise it would be counterproductive.

    • anshumankmr 2 days ago

      That was a problem in GPT 4 Turbo as well...

      • jazzyjackson 2 days ago

        IMO it's just that the models are very nondeterministic, and people get very different kinds of responses from them. I met a number of people who tried it when it first came out and it was just useless, so they stopped trying it; other people (including me) got gobsmackingly great responses and it felt like AGI was around the corner. But after enough coin flips your luck runs out and you get some lazy responses. Some people have more luck than others and wonder why everyone around them says it's trash.

        • anshumankmr 2 days ago

          GPT4-Turbo had some major "laziness" problems, like really major ones. I posted about this a year back: https://news.ycombinator.com/item?id=39985596#39987726

          I am not saying they haven't improved the laziness problem, but it does still happen anecdotally. I even got a similar sort of "lazy" response for something I am building with gemini-2.5-flash.

    • 0x1ceb00da 2 days ago

      I think it's good. The model will probably make some mistake at first. Not doing the whole thing and just telling the user the direction it's going in gives us a chance to correct its mistakes.

    • thimabi 2 days ago

      Can you share what are the main challenges OpenAI has been facing in terms of increasing access to top-tier and non-lazy models?

    • TZubiri 2 days ago

      but maybe you are saying that because you are a CIA plant that's trying to make the product bad because of complex reasons.

      takes tinfoil hat off

      Oh, nvm, that makes sense.

  • TillE 2 days ago

    Had a fun experience the other day asking "make a graph of [X] vs [Y]" (some chemistry calculations), and the response was blah blah blah explain explain "let me know if you want a graph of this!" Yeah ok thanks for offering.

  • csomar 2 days ago

    I don't think that's laziness but maybe agent tuning.

mythz 2 days ago

I've been turned off with OpenAI and have been actively avoiding using any of their models for a while, luckily this is easy to do given the quality of Sonnet 4 / Gemini Pro 2.5.

Although I've always wondered how OpenAI could get away with o3's astronomical pricing, what does o3 do better than any other model to justify their premium cost?

  • jstummbillig 2 days ago

    It's just a highly unoptimized space. There is very little market consolidation at this point, everyone is trying things out that lead to wildly different outcomes and processes and costs, even though in the end it's always just a bunch of utf-8 characters. o3 was probably just super expensive to run, and now, apparently, it's not anymore and can beat sonnet/opus 4 on pricing. It's fairly wild.

  • jsnider3 a day ago

    Very few customers pick the model based on cost; for many, ChatGPT is the only one they know of.

    • hu3 a day ago

      > Very few customers pick the model based on cost.

      What? 3 out of 4 companies I consulted for that started using AI for coding marked cost as an important criterion. The 4th one has virtually infinite funding, so they just don't care.

      • jsnider3 a day ago

        > 3 out of 4 companies I consulted for that started using AI for coding marked cost as an important criteria.

        And those aren't average customers.

lvl155 2 days ago

Google has been catching up. Funny how fast this space is evolving. Just a few months ago, it was all about DeepSeek.

  • bitpush 2 days ago

    Many would say Google's Gemini models are SOTA, although Claude seems to be doing well with coding tasks.

    • snarf21 2 days ago

      Gemini has been better than Claude for me on a coding project. Claude kept telling me it updated some code, but the update wasn't in the output. Like, I had to re-prompt just for the updated output 5 times in a row.

      • jacob019 2 days ago

        I break out Gemini 2.5 Pro when Claude gets stuck, it's just so slow and verbose. Claude follows instructions better and seems to better understand its role in agentic workflows. Gemini does something different with the context; it has a deeper understanding of the control flow and can uncover edge-case bugs that Claude misses. o3 seems better at high-level thinking and planning, questioning whether it should be done at all and whether the challenge actually matches the need. They're kind of like colleagues with unique strengths. o3 does well with a lot of things, I just haven't used it as much because of the cost. Will probably use it more now.

  • ookdatnog 2 days ago

    If the competition boils down to who has access to the largest amount of high quality data, it's hard to see how anyone but Google could win in the end: through Google Books they have scans of tens of millions of books, and published books are the highest quality texts there are.

    • itake a day ago

      I've been learning Vietnamese. Unfortunately, a lot of social media (Reddit, FB, etc.) uses a new generation of language. The younger generation uses so many abbreviations and acronyms that ChatGPT and Google Translate can't keep up.

      I think if your goal is to have properly written language in older writing styles, then you're correct.

      • ookdatnog a day ago

        I don't think it's simply a stylistic matter: it seems reasonable to assume that text in books tends to have higher information density, and contains longer and more complicated arguments (when compared to text obtained from social media posts, blogs, shorter articles, etc). If you want models that appear more intelligent, I think you need them to train on this kind of high-quality content.

        The fact that these tend to be written in an older writing style is to me incidental. You could rewrite all your college text books in contemporary social media slang and I would still consider them high-quality texts.

  • johan914 2 days ago

    I have been using Google’s models for the past couple of months, and was surprised to see how sycophantic ChatGPT is now. It’s not just at the start or end of responses; it’s interspersed within the markdown, with little substance. Asking it to change its style makes it overuse technical terms.

  • malshe a day ago

    I have observed that DeepSeek hallucinates a lot more than others for the same task. Anyone else experienced it?

  • resource_waste 2 days ago

    Deepseek was exciting because you could download their model. They are seemingly 3rd place and have been since Gemini 2.5.

    • Squarex 2 days ago

      I would put them on the fourth after Google, OpenAI and Anthropic. Still the best open weight llm.

behnamoh 2 days ago

how do we know it's not a quantized version of o3? what's stopping these firms from announcing the full model to perform well on the benchmarks and then gradually quantizing it (first at Q8 so no one notices, then Q6, then Q4, ...).

I have a suspicion that's how they were able to get gpt-4-turbo so fast. In practice, I found it inferior to the original GPT-4 but the company probably benchmaxxed the hell out of the turbo and 4o versions so even though they were worse models, users found them more pleasing.
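
As a toy illustration of what those labels mean (numpy only; this is not a claim about how any provider actually serves its models), "Q8" is roughly symmetric round-to-nearest int8 with a per-tensor scale, and each step down to Q6 or Q4 throws away more precision:

    import numpy as np

    def quantize_int8(w: np.ndarray):
        """Symmetric round-to-nearest int8 quantization (roughly what "Q8" refers to)."""
        scale = np.abs(w).max() / 127.0
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
        return q.astype(np.float32) * scale

    w = np.random.randn(256, 256).astype(np.float32)  # stand-in for a weight matrix
    q, scale = quantize_int8(w)
    print("max abs error:", float(np.abs(w - dequantize(q, scale)).max()))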

  • CSMastermind 2 days ago

    This is almost certainly what they're doing and rebranding the original o3 model as "o3-pro"

    • tedsanders 2 days ago

      Nope, not what we’re doing.

      o3 is still o3 (no nerfing) and o3-pro is new and better than o3.

      If we were lying about this, it would be really easy to catch us - just run evals.

      (I work at OpenAI.)

      • fastball 2 days ago

        Anecdotal, but about a week ago I noticed a sharp drop in o3 performance. For many tasks I will compare Gemini 2.5 Pro with o3, running the same prompt in both. Generally, for my personal use, o3 and G2.5P have been neck and neck over the last few months, with responses I have been very happy with.

        However starting from a week ago, the o3 responses became noticeably worse, with G2.5P staying about the same (in terms of what I've come to expect from the two models).

        This alongside the news that you guys have decreased the price of o3 by 80% does really make it feel like you've quantized the model or knee-capped thinking or something. If you say it is wholly unchanged I'll believe you, but not sure how else to explain the (admittedly subjective) performance drop I've experienced.

        • IanCal a day ago

          Are you sure you're using the same models? G2.5P updated almost exactly a week ago.

          • fastball 19 hours ago

            G2.5P might've updated, but that's not the model I noticed a difference. o3 seemed noticeably dumber in isolation, not just compared to G2.5P.

            But yes, perhaps the answer is that about a week ago I started asking subconsciously harder questions, and G2.5P handled them better because it had just been improved, while o3 had not so it seemed worse. Or perhaps G2.5P has always had more capacity than o3, and I wasn't asking hard enough questions to notice a difference before.

      • fny a day ago

        Unrelated: Can you all come up with a better naming scheme for your models? I feel like this is a huge UX miss.

        o4-mini-high o4-mini o3 o3-pro gpt-4o

        Oy.

      • energy123 2 days ago

        Is it o3 (low), o3 (medium) or o3 (high)? Different model names have crept into the various benchmarks over the last few months.

        • tedsanders 2 days ago

          o3 is a model, and reasoning effort (high/medium/low) is a parameter that goes into the model.

          o3 pro is a different thing - it’s not just o3 with maximum reasoning effort.
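
          A sketch of how that parameter shows up in the API (using the OpenAI Python SDK's reasoning_effort argument; the prompt is illustrative, and o3 access may require the organization verification discussed upthread):

              from openai import OpenAI

              client = OpenAI()  # reads OPENAI_API_KEY from the environment

              # Same model, three different reasoning-effort settings.
              for effort in ("low", "medium", "high"):
                  resp = client.chat.completions.create(
                      model="o3",
                      reasoning_effort=effort,
                      messages=[{"role": "user", "content": "How many primes are below 100?"}],
                  )
                  print(effort, "->", resp.choices[0].message.content)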

          • tauntz 2 days ago

            Why's it called o3 then if it's a different thing? There's already a rather extreme amount of confusion with the model names and it's not clear _at all_ which model would be "the best" in terms of response quality.

            Here's the current state with version numbers as far as I can piece it together (using my best guess at naming of each component of the version identifier. Might be totally wrong tho):

            1) prefix (optional): "gpt-", "chatgpt-"

            2) family (required): o1, o3, o4, 4o, 3.5, 4, 4.1, 4.5,

            3) quality? (optional): "nano", "mini", "pro", "turbo"

            4) type (optional): "audio", "search"

            5) lifecycle (optional): "preview", "latest"

            6) date (optional): 2025-04-14, 2024-05-13, 1106, 0613, 0125, etc (I assume the last ones are a date without a year for 2024?)

            7) size (optional): "16k"

            Some final combinations of these version number components are as small as 1 ("o3") or as large as 6 ("gpt-4o-mini-search-preview-2024-12-17").

            Given this mess, I can't blame people assuming that the "best" model is the one with the "biggest" number, which would rank the model families as: 4.5 (best) > 4.1 > 4 > 4o > o4 > 3.5 > o3 > o1 (worst).

            • tedsanders a day ago

              o3 pro is based on o3 and its style and outputs will be quite similar to o3.

              As an analogy, think of it like this:

              o3-low ~ Ford Mustang with the accelerator gently pressed

              o3-medium ~ Ford Mustang with the accelerator pressed

              o3-high ~ Ford Mustang with the accelerator heavily pressed

              o3 pro ~ Ford Mustang GT

              Even though a Mustang GT is a different car than a Mustang, you don’t give it a totally different name (eg Palomino). The similarity in name signals it has a lot of the same characteristics but a souped up engine. Same for o3 pro.

              Fun fact: before GPT-4, we had a unified naming scheme for models that went {modality}-{size}-{version}, which resulted in names like text-davinci-002. We considered launching GPT-4 as something like text-earhart-001, but since everyone was calling it GPT-4 anyway, we abandoned that system to use the name GPT-4 that everyone had already latched onto. Kind of funny how our original unified naming scheme made room for 999 versions, but we didn't make it past 3.

              Edit: When I say the Mustang GT is a different car than a Mustang - I mean it literally. If you bought a Mustang GT and someone delivered a Mustang with a different trim, you wouldn't say "great, this is just what I ordered, with the same features/behavior/value." That we call it a different trim is a linguistic choice to signal to consumers that it's very similar, and built on the same production line, but comes with a different engine or different features. Similar to o3 pro.

              • dwohnitmok a day ago

                Can you elaborate on what you mean that o3 pro is a GT? In particular I don't understand how to reconcile what you're saying that o3 pro is in some way fundamentally different from o3 (albeit based on o3) with this tweet:

                > As o3-pro uses the same underlying model as o3, full safety details can be found in the o3 system card.

                https://x.com/OpenAI/status/1932530423911096508

                • tedsanders a day ago

                  Yeah, I totally get the confusion here. Unfortunately I can't give the recipe behind our models, so there's going to be some irreducible blurriness here, but the following statements are all true:

                  - o3 pro is based on o3

                  - o3 pro uses the same underlying model as o3

                  - o3 pro is similar to o3, but is a distinct thing that's smarter and slower

                  - o3 pro is not o3 with longer reasoning

                  In my analogy, o3 pro vs o3 is more than just an input parameter (e.g., not just the accelerator input) but less than a full difference in model (e.g., Ford Mustang vs F150). It's in between, kind of like car trim with the same body but a stronger engine. Imperfect analogy, and I apologize if this doesn't feel like it adds any clarity. At the end of the day, it doesn't really matter how it works - what matters is if people find it worth using.

              • stonogo a day ago

                This analogy might work better if the Mustang GT weren't, in fact, the same car as the Mustang. It's just a trim level, not a different car.

            • energy123 a day ago

              My guess is this comes from an org structure where you have multiple "pods" working on different research. Who comes up with the next shippable model and when that happens is kind of random and the chaotic naming system comes from that. It's just my speculation and could be wildly wrong.

            • rat9988 a day ago

              That o3 and o3-pro aren't the same thing still makes sense, though.

          • fragmede 2 days ago

            Could someone there maybe possibly use, oh I dunno, ChatGPT and come up with some better product names?

      • meta_ai_x a day ago

        Just because you work at OpenAI doesn't mean you know everything about OpenAI, especially about something as strategic as nerfing models to save costs.

      • MattDaEskimo 2 days ago

        What's with the dropped benchmark performance compared to the original o3 release? It was disappointing to not see o4-mini on it as well

        • refulgentis 2 days ago

          What dropped benchmark performance?

          • MattDaEskimo a day ago

            o3 now scores noticeably worse on benchmarks than it did at its original announcement.

      • bn-l 2 days ago

        Not quantized?

        • tedsanders 2 days ago

          Not quantized. Weights are the same.

          If we did change the model, we'd release it as a new model with a new name in the API (e.g., o3-turbo-2025-06-10). It would be very annoying to API customers if we ever silently changed models, so we never do this [1].

          [1] `chatgpt-4o-latest` being an explicit exception
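
          In practice that policy shows up in how callers pin models (a sketch; the dated snapshot id is one published example and the prompt is illustrative):

              from openai import OpenAI

              client = OpenAI()

              # Dated snapshot: stays the same model until that snapshot is retired.
              pinned = client.chat.completions.create(
                  model="gpt-4o-2024-08-06",
                  messages=[{"role": "user", "content": "ping"}],
              )

              # Rolling alias: explicitly documented to change underneath you over time.
              rolling = client.chat.completions.create(
                  model="chatgpt-4o-latest",
                  messages=[{"role": "user", "content": "ping"}],
              )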

          • linsomniac 2 days ago

            >we'd release it as a new model with a new name

            Speaking of a new name. I'll donate the API credits to run a "choose a naming scheme for AI models that isn't confusing AF" for OpenAI.

          • thegeomaster 2 days ago

            Google could at least learn something from this attitude, given their recent 03-25 -> 05-06 model alias switcharoo with 0 notice :)

            • johnb231 2 days ago

              That is a preview / beta model with no expectation of stability. Google did nothing wrong there. No one should be using a preview model in production.

              • thegeomaster 2 days ago

                Hard disagree. Of course technically they didn't do anything explicitly against the public guidance (the checks and balances would never let them), but naming a model with a date very strongly implies immutability.

                It's the same logic of why UB in C/C++ isn't a license to do whatever the compiler wants. We're humans and we operate on implications, common-sense assumptions and trust.

                • johnb231 2 days ago

                  The model is labelled as Preview. There are no guarantees of stability or availability for Preview models. Not intended for production workloads.

                  https://cloud.google.com/products?hl=en#product-launch-stage...

                  "At Preview, products or features are ready for testing by customers. Preview offerings are often publicly announced, but are not necessarily feature-complete, and no SLAs or technical support commitments are provided for these. Unless stated otherwise by Google, Preview offerings are intended for use in test environments only. The average Preview stage lasts about six months."

                  • 0xbadcafebee 2 days ago

                    There's a very large gulf between "what makes sense to Google" and "what makes sense to Human Beings". I have so many rants about Google's poor treatment of "customers" that they feel like Oracle to me now. Like every time I use them, I'm really just falling prey to my own misguided idea that this time I won't get screwed over.

                    • johnb231 2 days ago

                      The users aren't random "human beings" in this case. They are professional software developers who are expected to understand the basics. Deploying that model into production shows a lack of basic competence. It is clearly marked "preview" and is for test only.

                      • 0xbadcafebee a day ago

                        That may be true, but it doesn't make the customer's claims not true. What Google did was counter-intuitive. That's a fact. Pointing at some fine print and saying "uhh actually, technically it's your stupid human brain that's the problem, not us! we technically are allowed to do anything we want, just look at the fine print!!" does not make things better. We are human beings; we are flawed. That much should be obvious to any human organization. If you don't know how to make things that don't piss off human beings, the problem isn't with the humans.

                        If the "preview release" you were using was v0.3, and suddenly it started being v0.6 without warning, that would be insane. The only point of providing a version number is to give people an indicator of consistency. The datestamp is a version number. If they didn't want us to expect consistency, they should not have given it a version number. That's the whole point of rolling release branches, they have no version. You don't have "v2.0" of a rolling release, you just have "latest". They fucked up by giving it a datestamp.

                        This is an extremely old and well-known problem with software interfaces. Either you version it or you don't. If you do version it, and change it, you change the version, and give people dependent on the old version some time to upgrade. Otherwise it breaks things, and that pisses people off. The alternative is not versioning it, which is a signal that there is no consistency to be expected. Any decent software developer should have known all this.

                        And while I'm at it: what's with the name flip-flopping? In 2014, GCP issued a PR release explaining it was no longer using "Preview", but "Alpha" and "Beta" (https://cloudplatform.googleblog.com/2014/10/new-release-pha...). But the link you showed earlier says "Alpha" and "Beta" are now deprecated. But no PR release? I guess that's our bad for not constantly reading the fine print and expecting it to revert back to something from 11 years ago.

                  • refulgentis 2 days ago

                    There hasn't been a non-preview Gemini since...November? The previews are the same as everyone else's release cadence, "preview" is just a magic wand that means the Launchcal (Google's internal signoff tool, i.e. "Wave will never happen again") needs fewer signoffs. Then it got to the point date-pinned models were getting swapped in, in the name of doing us a favor, and it's a...novel idea, we can both agree at the least.

                    I bet someone at Google would be a bit surprised to see someone jumping to legalese to act like this...novelty...is inherently due to the preview status, and based on anything more than a sense that there's no net harm done to us if it costs the same and is better.

                    I'm not sure they're wrong.

                    But it also leads to a sort of "nobody knows how anything works because we have 2^N configs and 5 bits" - for instance, 05-06 was also upgraded to 06-05. Except it wasn't, if you sent variable thinking to 05-06 after upgrade it'd fail. (and don't get me started on the 5 different thinking configurations for Gemini 2.5 flash thinking vs. gemini 05-06 vs. 06-05 and 0 thinking)

                    • johnb231 2 days ago

                      I honestly have no idea what you are trying to say.

                      It's a preview model - for testing only, not for production. Really not that complicated.

                      • refulgentis 2 days ago

                        So you don't have anything to contribute beyond, and aren't interested in anything beyond, citing of terms?

                        Why are you in the comments section of an engineering news site?

                        (note: beyond your, excuse me while I'm direct now, boorish know-nothing reply, the terms you are citing have nothing to do with the thing people are actually discussing around you, despite your best efforts. It doesn't say "we might swap in a new service, congrats!", nor does it have anything to say about that. Your legalese at most describes why they'd pull 05-06, not forward 05-06 to 06-05. This is a novel idea.)

                        • johnb231 2 days ago

                          This case was simply a matter of people not understanding the terms of service. There is nothing more to be said. It's that simple. The "engineers" should know that before deploying to prod. Basic competence.

                          And I mean I genuinely do not understand what you are trying to say. Couldn't parse it.

                          • lcnPylGDnU4H9OF a day ago

                            > And I mean I genuinely do not understand what you are trying to say. Couldn't parse it.

                            It’s always worth considering that this may be your problem. If you still don’t get it, the only valuable reply is one which asks a question. Also, including “it’s not that complicated” only serves to inflame.

                          • refulgentis 2 days ago

                            John, do you understand that the thing you're quoting says "We reserve the right to pull things", not "We reserve the right to swap in a new service"?

                            Do you understand that even if it did say that, that wasn't true either? It was some weird undocumentable half-beast?

                            I have exactly your attitude about their cavalier use of preview for all things Gemini, and even people's use of the preview models.

                            But I've also been on this site for 15 years and am a bit wow'd by your interlocution style here -- it's quite rare to see someone flip "the 3P provider swapped the service on us!" into "well they said they could turn it off, of course you should expect it to be swapped for the first time ever!" insert dull sneer about the quality of other engineers

                            • johnb231 2 days ago

                              How is this so hard to understand? It's a preview service for testing only, not intended for production.

                              I am done with this thread. We are going around in circles.

                              • refulgentis 2 days ago

                                Well, no. Well, sure. You're done, but we're not going in circles. It'd just do too much damage to you to have to answer the simple question "Where does the legalese say they can swap in a new service?", so you have to pretend this is circular and just all-so-confusing, de facto, we have to pretend it is confusing and/or obviously wrong to use any Gemini 2+ at all.

                                It's a cute argument, as I noted, I'm emotionally sympathetic to it even, it's my favorite "get off my lawn." However, I've also been on the Internet long enough to know you write back, at length, when people try anti-intellectualism and why-are-we-even-talking-about-this as interaction.

                                • johnb231 2 days ago

                                  https://cloud.google.com/terms/service-terms

                                  "b. Disclaimer. PRE-GA OFFERINGS ARE PROVIDED “AS IS” WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES OR REPRESENTATIONS OF ANY KIND. Pre-GA Offerings (i) may be changed, suspended or discontinued at any time without prior notice to Customer and (ii) are not covered by any SLA or Google indemnity. Except as otherwise expressly indicated in a written notice or Google documentation, (A) Pre-GA Offerings are not covered by TSS, and (B) the Data Location Section above will not apply to Pre-GA Offerings."

                                  • anoncareer0212 2 days ago

                                    [flagged]

                                    • johnb231 2 days ago

                                      [flagged]

                                      • anoncareer0212 2 days ago

                                        Been here for 15 years, and there's standards for interaction, especially for 19 day old accounts. I recommend other sites if you're expecting to be dismissive and rude without strong intellectual pushback.

                                        • Gigablah 2 days ago

                                          This is just “f—- off” dressed in formal language, so not much better.

          • ant6n 2 days ago

            It was definitely annoying when o1 disappeared overnight; my impression is that it was better at some tasks than o3.

      • csomar 2 days ago

        I think the parent-parent poster has explained why we can't trust you (and working at OpenAI doesn't help the way you think it does).

        I didn't read the ToS, like everyone else, but my guess is that degrading model performance at peak times will be one of the things that can slip through. We are not suggesting you are running a different model but that you are quantizing it so that you can support more people.

        This can't happen with Open weight models where you put the model, allocate the memory and run the thing. With OpenAI/Claude, we don't know the model running, how large it is, what it is running on, etc... None of that is provided and there is only one reason that I can think of: to be able to reduce resources unnoticed.

        • rfoo 2 days ago

          An (arbitrarily) quantized model is a totally different model, compared to the original.
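
          To make the point concrete, here's a toy numpy sketch (made-up sizes and numbers, nothing to do with how any lab actually quantizes) of why even naive int8 weight quantization shifts a layer's outputs:

              import numpy as np

              rng = np.random.default_rng(0)
              W = rng.normal(size=(4, 4)).astype(np.float32)   # "full precision" weights
              x = rng.normal(size=4).astype(np.float32)        # an activation vector

              # naive symmetric int8 quantization of the weights
              scale = np.abs(W).max() / 127.0
              W_q = np.round(W / scale).astype(np.int8)
              W_dq = W_q.astype(np.float32) * scale            # dequantized weights

              print(np.max(np.abs(W @ x - W_dq @ x)))          # small but nonzero drift per layer

          That per-layer drift compounds across many layers and many generated tokens, which is why it counts as a different model.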

        • Reubachi a day ago

          I'm not sure how, at this point in your online presence, you read someone stating their job as a "brag" rather than what it really is: providing transparency/disclosure before stating their thoughts.

          This is HN and not reddit.

          "I didn't read the ToS, like everyone else, but my guess..."

          Ah, there it is.

    • mliker 2 days ago

      Where are you getting this information? What basis do you have for making this claim? OpenAI, despite its public drama, is still a massive brand and if this were exposed, would tank the company's reputation. I think making baseless claims like this is dangerous for HN

      • beering 2 days ago

        I think Gell-Mann amnesia happens here too, where you can see how wrong HN comments are on a topic you know deeply, but then forget about that when reading the comments on another topic.

    • behnamoh 2 days ago

      > rebranding the original o3 model as "o3-pro"

      interesting take, I wouldn't be surprised if they did that.

    • anticensor 2 days ago

      -pro models appear to be a best-of-10 sampling of the original full size model

      • Szpadel 2 days ago

        how do you sample it behind the scenes? usually best-of-X means you generate X outputs and choose the best result.

        if you could do this automatically, it would be a game changer, as you could run the top 5 best models in parallel and select the best answer every time

        but it's not practical, because you are the bottleneck: you have to read all 5 solutions and compare them

        • anticensor 2 days ago

          > if you could do this automatically, it would be game changer as you could run top 5 best models in parallel and select best answer every time

          remember they have access to the RLHF reward model, against which they can evaluate all N outputs and have the most "rewarded" answer picked and sent
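
          In code, the idea being described is roughly this (a speculative sketch; `generate` and `reward_model` below are stand-ins, not anything OpenAI has documented):

              import random

              def generate(prompt: str) -> str:
                  # stand-in for sampling the base model at nonzero temperature
                  return f"{prompt} -> draft #{random.randint(0, 9999)}"

              def reward_model(prompt: str, answer: str) -> float:
                  # stand-in for the RLHF reward model, which scores (prompt, answer) pairs
                  return random.random()

              def best_of_n(prompt: str, n: int = 10) -> str:
                  candidates = [generate(prompt) for _ in range(n)]       # n independent samples
                  scores = [reward_model(prompt, c) for c in candidates]  # score each candidate
                  return candidates[scores.index(max(scores))]            # return the top-scoring answer

              print(best_of_n("explain KV caching"))

          The user only ever sees one answer, so the reranking is invisible apart from the extra latency and cost.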

        • joshstrange 2 days ago

          I think the idea is they use another/same model to judge all the results and only return the best one to the user.

          • anticensor 2 days ago

            I think the idea is they just feed each to the RLHF reward model used to train the model and return the most rewarded answer.

        • spott 2 days ago

          I believe it is a majority vote kinda thing, rather than a best single result.

  • lispisok 2 days ago

    I swear every time a new model is released it's great at first but then performance gets worse over time. I figured they were fine-tuning it to get rid of bad output which also nerfed the really good output. Now I'm wondering if they were quantizing it.

    • Tiberium 2 days ago

      I've heard lots of people say that, but no objective reproducible benchmarks confirm such a thing happening often. Could this simply be a case of novelty/excitement for a new model fading away as you learn more about its shortcomings?

      • Kranar 2 days ago

        I used to think the models got worse over time as well but then I checked my chat history and what I noticed isn't that ChatGPT gets worse, it's that my standards and expectations increase over time.

        When a new model comes out I test the waters a bit with some more ambitious queries and get impressed when it can handle them reasonably well. Over time I take it for granted and then just expect it to be able to handle ever more complex queries and get disappointed when I hit a new limit.

        • echelon 2 days ago

          Re-run your historical queries, or queries that are similarly shaped.

          • sakesun 2 days ago

            They could cache that :)

            • echelon a day ago

              That would make for a very interesting timing attack.

      • herval 2 days ago

        there's definitely measurements (eg https://hdsr.mitpress.mit.edu/pub/y95zitmz/release/2 ) but I imagine they're rare because those benchmarks are expensive, so nobody keeps running them all the time?

        Anecdotally, it's quite clear that some models are throttled during the day (eg Claude sometimes falls back to "concise mode" - with and without a warning on the app).

        You can tell if you're using Windsurf/Cursor too - there are times of the day where the models constantly fail to do tool calling, and other times they "just work" (for the same query).

        Finally, there are cases where it was confirmed by the company, like GPT-4o's sycophancy episode, which very clearly impacted its output (https://openai.com/index/sycophancy-in-gpt-4o/)

        • Deathmax 2 days ago

          Your linked article is specifically comparing two different versioned snapshots of a model and not comparing the same model across time.

          You've also made the mistake of conflating what's served via API platforms which are meant to be stable, and frontends which have no stability guarantees, and are very much iterated on in terms of the underlying model and system prompts. The GPT-4o sycophancy debacle was only on the specific model that's served via the ChatGPT frontend and never impacted the stable snapshots on the API.

          I have never seen any sort of compelling evidence that any of the large labs tinkers with their stable, versioned model releases that are served via their API platforms.

          • herval 2 days ago

            Please read it again. The article is clearly comparing gpt4 to gpt4, and gpt3.5 to gpt3.5, in march vs june 2023

            • Deathmax 2 days ago

              I did read it, and I even went to their eval repo.

              > At the time of writing, there are two major versions available for GPT-4 and GPT-3.5 through OpenAI’s API, one snapshotted in March 2023 and another in June 2023.

              openaichat/gpt-3.5-turbo-0301 vs openaichat/gpt-3.5-turbo-0613, openaichat/gpt-4-0314 vs openaichat/gpt-4-0613. Two _distinct_ versions of the model, and not the _same_ model over time like how people like to complain that a model gets "nerfed" over time.

        • drewnick 2 days ago

          I feel this too. I swear some of the coding Claude Code does on weekends is superior to the weekdays. It just has these eureka moments every now and then.

          • herval 2 days ago

            Claude has been particularly bad since they released 4.0. The push to remove 3.7 from Windsurf hasn’t helped either. Pretty evident they’re trying to force people to pay for Claude Code…

            Trusting these LLM providers today is as risky as trusting Facebook as a platform, when they were pushing their “opensocial” stuff

      • cainxinth 2 days ago

        I assumed it was because the first week revealed a ton of safety issues that they then "patched" by adjusting the system prompt, and thus using up more inference tokens on things other than the user's request.

      • bobxmax 2 days ago

        My suspicion is it's the personalization. Most people have things like 'memory' on, and as the models increasingly personalize towards you, that personalization is hurting quality rather than helping it.

        Which is why the base model wouldn't necessarily show differences when you benchmarked them.

      • 85392_school 2 days ago

        I think it's an illusion. People have been claiming it since the GPT-4 days, but nobody's ever posted any good evidence to the "model-changes" channel in Anthropic's Discord. It's probably just nostalgia.

      • tshaddox 2 days ago

        Yeah, it’s almost certainly hallucination (by the human user).

      • colordrops 2 days ago

        It's probably less often quantizing and more often adding more and more to their hidden system prompt to address various issues and "issues", and as we all know, adding more context sometimes has a negative effect.

    • JoshuaDavid 2 days ago

      I suspect what's happening is that lots of people have a collection of questions / private evals that they've been testing on every new model, and when a new model comes out it sometimes can answer a question that previous models couldn't. So that selects for questions where the new model is at the edge of its capabilities and probably got lucky. But when you come up with a new question, it's generally going to be on the level of the questions the new model is newly able to solve.

      Like I suspect if there was a "new" model which was best-of-256 sampling of gpt-3.5-turbo that too would seem like a really exciting model for the first little bit after it came out, because it could probably solve a lot of problems current top models struggle with (which people would notice immediately) while failing to do lots of things that are a breeze for top models (which would take people a little bit to notice).

    • nabla9 2 days ago

      It seems that at least Google is overselling its compute capacity.

      You pay a monthly fee, but Gemini is completely jammed for the 5-6 hours when North America is working.

      • baq 2 days ago

        Gemini is simply that good. I’m trying out Claude 4 every now and then and go back to Gemini to fix its mess…

        • energy123 2 days ago

          Gemini is the best model in the world. Gemini is the worst web app in the world. Somehow those two things are coexisting. The web devs in their UI team have really betrayed the hard work of their ML and hardware colleagues. I don't say this lightly - I say this after having paid attention to critical bugs, more than I can count on one hand, that persisted for over a year. They either don't care or are grossly incompetent.

          • nabla9 2 days ago

            Well said.

            Google is best at pure AI research, both in quality and volume. They have sucked at productization for years, and not just in AI but in other products as well. Real mystery.

            • energy123 2 days ago

              I don't understand why they can't just make it fast and go through the bug reports from a year ago and fix them. Is it that hard to build a box for users to type text into without it lagging for 5 seconds or throwing a bunch of errors?

              • baq 16 hours ago

                If it doesn’t make sense, it makes sense. Nobody will get their promo by ‘fixing bugs’.

        • fasterthanlime 2 days ago

          Funny, I have the exact opposite experience! I use Claude to fix Gemini’s mess.

          • symfoniq 2 days ago

            Maybe LLMs just make messes.

        • hgomersall 2 days ago

          I heard that, but I'm getting consistent garbage from Gemini.

          • dayjah 2 days ago

            For code? Use the context7 mcp.

      • edzitron 2 days ago

        When you say "jammed," how do you mean?

    • JamesBarney 2 days ago

      I'm pretty sure this is just a psychological phenomenon. When a new model is released, all the capabilities the new model has that the old model lacks are very salient. This makes it seem amazing. Then you get used to the model, push it to the frontier, and suddenly the most salient memories of the new model are its failures.

      There are tons of benchmarks that don't show any regressions. Even small and unpublished ones rarely show regressions.

    • mhitza 2 days ago

      That was my suspicion when I first deleted my account: it felt like the ChatGPT output had gotten worse, and I found it highly suspicious when I saw an errant davinci model keyword in the ChatGPT URL.

      Now I'm feeling similarly with their image generation (which is the only reason I created a paid account two months ago, and the output looks more generic by default).

      • beering 2 days ago

        Are you able to quantify how quickly your perception gets skewed by how long you use the models?

        • mhitza a day ago

          I can't quantify it for my past experience; that was more than a year ago, and I wasn't using ChatGPT daily at the time either.

          This time around it felt pretty stark. I used ChatGPT to create at most 20 different image compositions. And after a couple of good ones at first, it felt worse after. One thing I've noticed recently is that when working on vector art compositions, the results start more simplistic, and often enough look like clipart thrown together. This wasn't my experience first time around. Might be temperature tweaks, or changes in their prompt that lead to this effect. Might be some random seed data they use, who knows.

    • beering 2 days ago

      It’s easy to measure the models getting worse, so you should be suspicious that nobody who claims this has scientific evidence to back it up.

    • solfox 2 days ago

      I have seen this behavior as well.

    • codr7 2 days ago

      [flagged]

      • daseiner1 2 days ago

        It's still a very competitive marketplace

      • bboygravity 2 days ago

        But OpenAI breathes honesty. They're open source! They would never do such a thing. /s

  • tedsanders 2 days ago

    It's the same model, no quantization, no gimmicks.

    In the API, we never make silent changes to models, as that would be super annoying to API developers [1]. In ChatGPT, it's a little less clear when we update models because we don't want to bombard regular users with version numbers in the UI, but it's still not totally silent/opaque - we document all model updates in the ChatGPT release notes [2].

    [1] chatgpt-4o-latest is an exception; we explicitly update this model pointer without warning.

    [2] ChatGPT Release Notes document our updates to gpt-4o and other models: https://help.openai.com/en/articles/6825453-chatgpt-release-...

    (I work at OpenAI.)

  • ctoth 2 days ago

    From the announcement email:

    > Today, we dropped the price of OpenAI o3 by 80%, bringing the cost down to $2 / 1M input tokens and $8 / 1M output tokens.

    > We optimized our inference stack that serves o3—this is the same exact model, just cheaper.

  • hyperknot 2 days ago

    I got 700+ tokens/sec on o3 after the announcement; I suspect it's very much a quantized version.

    https://x.com/hyperknot/status/1932476190608036243

    • dist-epoch 2 days ago

      Or maybe they just brought online much faster much cheaper hardware.

      • az226 2 days ago

        Or they are using a speedy add-on decoder.

    • beering 2 days ago

      Do you also have numbers on intelligence before and after?

    • zackangelo 2 days ago

      Is that input tokens or output tokens/s?

  • ants_everywhere 2 days ago

    Is this what happened to Gemini 2.5 Pro? It used to be very good, but it's started struggling on basic tasks.

    The thing that gets me is it seems to be lying about fetching a web page. It will say things are there that were never on any version of the page and it sometimes takes multiple screenshots of the page to convince it that it's wrong.

    • SparkyMcUnicorn 2 days ago

      The Aider discord community has proposed and disproven the theory that 2.5 Pro became worse, several times, through many benchmark runs.

      It had a few bugs here or there when they pushed updates, but it didn't get worse.

      • ants_everywhere 2 days ago

        Gemini is objectively exhibiting new behavior with the same prompts and that behavior is unwelcome. It includes hallucinating information and refusing to believe it's wrong.

        My question is not whether this is true (it is) but why it's happening.

        I am willing to believe the aider community has found that Gemini has maintained approximately equivalent performance on fixed benchmarks. That's reasonable considering they probably use a/b testing on benchmarks to tell them whether training or architectural changes need to be reverted.

        But all versions of aider I've tested, including the most recent one, don't handle Gemini correctly so I'm skeptical that they're the state of the art with respect to bench-marking Gemini.

        • SparkyMcUnicorn 2 days ago

          Gemini 2.5 Pro is the highest ranking model on the aider benchmarks leaderboard.

          For benchmarks, either Gemini writes code that adheres to the required edit format, builds successfully, and passes unit tests, or it doesn't.

          I primarily use aider + 2.5 pro for planning/spec files, and occasionally have it do file edits directly. Works great, other than stopping it mid-execution once in a while.

    • code_biologist 2 days ago

      My use case is mostly creative writing.

      IMO 2.5 Pro 03-25 was insanely good. I suspect it was also very expensive to run. The 05-06 release was a huge regression in quality, most people saying it was a better coder and a worse writer. They tested a few different variants and some were less bad than others, but overall it was painful to lose access to such a good model. The just-released 06-05 version seems to be uniformly better than 05-06, with far fewer "wow this thing is dumb as a rock" failure modes, but it still is not as strong as the 03-25 release.

      Entirely anecdotally, 06-05 seems to exactly ride the line of "good enough to be the best, but no better than that" presumably to save costs versus the OG 03-25.

      In addition, Google is doing something notably different between what you get on AI Studio versus the Gemini site/app. Maybe a different system prompt. There have been a lot of anecdotal comparisons on /r/bard and I do think the AI Studio version is better.

  • esafak 2 days ago

    Are there any benchmarks that track historical performance?

    • behnamoh 2 days ago

      good question, and I don't know of any, although it's a no brainer that someone should make it.

      a proxy to that may be the anecdotal evidence of users who report back in a month that model X has gotten dumber (started with gpt-4 and keeps happening, esp. with Anthro and OpenAI models). I haven't heard such anecdotal stories about Gemini, R1, etc.

    • SparkyMcUnicorn 2 days ago

      Aider has one, but it hasn't been updated in months. People kept claiming models were getting worse, but the results proved that they weren't.

  • benterix 2 days ago

    > users found them more pleasing.

    Some users. For me the drop was so huge it became almost unusable for the things I had used it for.

    • behnamoh 2 days ago

      Same here. One of my apps outright stopped working because the gpt-4o outputs were noticeably worse than those of the gpt-4 I had built the app on.

  • risho 2 days ago

    Quantization is a massive efficiency gain for near negligible drop in quality. If the tradeoff is quantization for an 80 percent price drop I would take that any day of the week.

    • spiderice 2 days ago

      You may be right that the tradeoff is worth it, but it should be advertised as such. You shouldn't think you're paying for full o3, even if they're heavily discounting it.

    • code_biologist 2 days ago

      I would like the option to pay for the unquantized version. For creative or story writing (D&D campaign materials and such) quantization seems to end up in much weaker word selection and phrasing. There are small semantic missteps that break the illusion that the LLM understands what it's writing. I find it jarring and deeply immersion breaking. I'd prefer to prototype prompts on a cheaper quantized version, but I want to be able to spend 50 cents an API call to get golden output.

  • EnPissant 2 days ago

    The API lists o3 and o3-2025-04-16 as the same thing with the same price. The date based models are set in stone.

  • rfoo 2 days ago

    I don't work for OAI so obviously I can't say for them. But we don't do this.

    We don't make the hobbyist mistake of randomly YOLO-trying various "quantization" methods after all the training is done and calling it a day, at all. Quantization was done before it went live.

  • Bjorkbat 2 days ago

    Related, when o3 finally came out ARC-AGI updated their graph because it didn’t perform nearly as well as the version of o3 that “beat” the benchmark.

    https://arcprize.org/blog/analyzing-o3-with-arc-agi

    • beering 2 days ago

      The o3-preview test was with very expensive amounts of compute, right? I remember it was north of $10k so makes sense it did better

      • Bjorkbat a day ago

        Point remains though, they crushed the benchmark using a specialized model that you’ll probably never have access to, whether personally or through a company.

        They inflated expectations and then released to the public a model that underperforms

        • throwaway314155 a day ago

          They revealed the price points for running those evaluations. IIRC the "high" level of reasoning cost tens of thousands of dollars if not more. I don't think they really inflated expectations. In fact a lot of what we learned is that ARC-AGI probably isn't a very good AGI evaluation (it claims to not be one, but the name suggests otherwise).

  • az226 2 days ago

    Even classic GPT-4 from March 2023 was quantized to 4.5 bits.

  • smusamashah 2 days ago

    How about testing the same input with the same seed on different dates? If it's a different model it will return different output.

    • zomnoys 2 days ago

      Isn’t this not true since these models run with a non-zero temperature?

      • smusamashah 2 days ago

        You can set the temperature too.
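
        A sketch of what that test could look like with the OpenAI Python SDK (the model name and prompt are placeholders; note that `seed` is only best-effort, the o-series reasoning models may reject a non-default `temperature`, and a changed `system_fingerprint` already signals a backend change on its own):

            from openai import OpenAI

            client = OpenAI()  # expects OPENAI_API_KEY in the environment

            def snapshot(prompt: str):
                resp = client.chat.completions.create(
                    model="gpt-4.1",   # placeholder; use whichever dated snapshot you want to watch
                    messages=[{"role": "user", "content": prompt}],
                    temperature=0,
                    seed=1234,         # best-effort determinism, not a guarantee
                )
                return resp.choices[0].message.content, resp.system_fingerprint

            # Run this on different days and diff the results. Identical fingerprints with
            # diverging text is the interesting case; a changed fingerprint means the serving
            # stack itself changed, so different text proves nothing about the weights.
            print(snapshot("List the first 10 prime numbers."))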

  • resters 2 days ago

    It's probably optimized in some way, but if the optimizations degrade performance, let's hope it is reflected in various benchmarks. One alternative hypothesis is that it's the same model, but in the early days they make it think "harder" and run a meta-process to collect training data for reinforcement learning for use on future models.

  • luke-stanley 2 days ago

    I think the API has some special IDs to check for reproducibility of the environment.

  • jstummbillig 2 days ago

    You can just give it a go for very little money (in Windsurf it's 1x right now), and see what it does. There is no room for conspiracy here, because you can simply look at what it does. If you don't like it, so won't others, and then people will not use it. People are obviously very capable of (collectively) forming opinions on models, and then voting with their wallet.

  • segmondy 2 days ago

    you don't, so run your own model.

BeetleB 2 days ago

Why does OpenAI require me to verify my "organization" (which requires my state issued ID) to use o3?

lxgr 2 days ago

Is there also a corresponding increase in weekly messages for ChatGPT Plus users with o3?

In my experience, o4-mini and o4-mini-high are far behind o3 in utility, but since I’m rate-limited for the latter, I end up primarily using the former, which has kind of reinforced the perception that OpenAI’s thinking models are behind the competition altogether.

  • el_benhameen 2 days ago

    My usage has also reflected the pretty heavy rate limits on o3. I find o4-mini-high to be quite good, but I agree that I would much rather use o3. Hoping this means an increase in the limits.

  • sagarpatil 2 days ago

    Before: 50 messages per week. Now: 100 messages per week.

    • lxgr 2 days ago

      That’s already been the case for a few weeks though, right? And it’s up from 50, whereas a price reduction by 80% would correspond to 5x the quota, extrapolating linearly.

      • johnnyApplePRNG 2 days ago

        Agreed 100%. o3 is great, but the rate limit window and the quota itself both render it almost useless for more than one-off fixes.

        It's great with those, however!

mrcwinn 2 days ago

Only at HN can the reaction to an 80% price drop be a wall of criticism.

  • alternatex 2 days ago

    "80% price drop" is just a title. The wall of criticism is for the fine-print.

    • beering 21 hours ago

      What in the fine print are we criticising? Most of the negative comments make no reference to any fine print on their website.

    • xboxnolifes a day ago

      The wall of criticism is all wild speculation, not fine print.

coffeecoders 2 days ago

Despite the popular take that LLMs have no moat and are burning cash, I find OpenAI's situation really promising.

Just yesterday, they reported an annualized revenue run rate of 10B. Their last funding round in March valued them at 300B. Despite losing 5B last year, they are growing really fast - 30x revenue with over 500M active users.

It reminds me a lot of Uber in its earlier years—fast growth, heavy investment, but edging closer to profitability.

  • bitpush 2 days ago

    The problem is your costs also scale with revenue. Ideally you want to control costs as you scale (the first unit you build is expensive, but as you make more your costs come down).

    For OpenAI, the more people use the product, the more they spend on compute, unless they can supplement it with other ways of generating revenue.

    I unfortunately don't think OpenAI will be able to hit sustained profitability (see Netflix for another example)

    • simonw 2 days ago

      "... as you make more your costs come down"

      I'd say dropping the price of o3 by 80% due to "engineers optimizing inferencing" is a strong sign that they're doing exactly that.

      • asadotzler 2 days ago

        You trust their PR statements?

        • simonw 2 days ago

          Seems more likely to me than them deciding to take a sizable loss on inference by dropping prices by 80% for no reason.

          Optimizing serving isn't unlikely: all of the big AI vendors keep finding new efficiencies, it's been an ongoing trend over the past two years.

          • bitpush 2 days ago

            This is my sense as well. You dont drop 80% on a random Tuesday based on scale, you do it with an explicit goal to get market share at the expense of $$.

        • TZubiri 2 days ago

          It's not a PR statement, it's a change in price. Literally putting money where the mouth is.

          • theappsecguy 2 days ago

            Or they are trying to gobble up market share because Anthropic has been much better than OpenAI

            • petesergeant 2 days ago

              Providers are exceptionally easy to switch. There's no moat for enterprise-level usage. There's no "market share" to gobble up because I can change a line in my config, run the eval suite, and switch immediately to another provider.

              This is marginally less true for embedding models and things you've fine-tuned, but only marginally.

          • Davidzheng 2 days ago

            o3 probably used to have a HUGE profit margin on inference, so I'd say it's unclear how much optimization was done.

            • programjames 2 days ago

              I find it pretty plausible they got an 80% speedup just by making optimized kernels for everything. Even when GPUs say they're being 100% utilized, there are so many improvements to be made, like:

              - Carefully interleaving shared memory loading with computation, and the whole kernel with global memory loading.

              - Warp shuffling for softmax.

              - Avoiding memory access conflicts in matrix multiplication.

              I'm sure the guys at ClosedAI have many more optimizations they've implemented ;). They're probably eventually going to design their own chips or use photonic chips for lower energy costs, but there's still a lot of gains to be made in the software.

              • Davidzheng 2 days ago

                yes I agree that it is very plausible. But it's just unclear whether it is more of a business decision or a real downstream effect of engineering optimizations (which I assume are happening everyday at OA)

      • lossolo 2 days ago

        > "engineers optimizing inferencing"

        They finally implemented DeepSeek open source methods for fast inference?

    • Legend2440 2 days ago

      >(see Netflix for another example)

      Netflix has been profitable for over a decade though? They reported $8.7 billion in profit in 2024.

      • amazingamazing 2 days ago

        They increased prices and are not selling a pure commodity tho

    • ACCount36 2 days ago

      The bulk of AI costs are NOT in inference. They're in R&D and frontier training runs.

      The more inference customers OpenAI has, the easier it is for them to reach profitability.

    • tptacek 2 days ago

      All costs are not equal. There is a classic pattern of dogfights for winner-take-most product categories where the long term winner does the best job of acquiring customers at the expense of things like "engineering to reduce costs". I have no idea how the AI space is going to shake out, but if I had to pick between OpenAI's mindshare in the broadest possible cohort of users vs. best/most efficient model, I'd pick the customers.

      Obviously, lots of nerds on HN have preferences for Gemini and Claude, and having used all three I completely get why that is. But we should remember we're not representative of the whole addressable market. There were probably nerds on like ancient dial-up bulletin boards explaining why Betamax was going to win, too.

      • awongh 2 days ago

        We don't even know yet if the model is the product though, and if OpenAI is the company that will make the AI product/model, (chat that keeps expanding into other functionalities and capabilities) or will it be 10,000 companies using the OpenAI models. (well, it's probably both, but in what proportion of revenue)

        • tptacek 2 days ago

          Right, but it might not even matter if all the competitors are in the ballpark of the final product/market fit and OpenAI holds a commanding lead in customer acquisition.

          Again: I don't know. I've got no predictions. I'm just saying that the logic where OpenAI is outcompeted on models themselves and thus automatically lose does not hold automatically.

      • TZubiri 2 days ago

        Unlike Uber or whatsapp, there's no network effect. Don't think this is a winner takes all market, there was an article where we had this discussion earlier. Players who get a small market share are immediately profitable proportional to the market share (given a minimum size is exceeded.)

    • Magmalgebra 2 days ago

      Anyone concerned about cost should remember that those costs are dropping exponentially.

      Similarly, nearly all AI products but especially OpenAI are heavily _under_ monetized. OpenAI is an excellent personal shopper - the ad revenue that could be generated from that rivals Facebook or Google.

      • smelendez 2 days ago

        It wouldn't surprise me if they try, but ironically if GPT is a good personal shopper, it might make it harder to monetize with ads because people will trust the bot's organic responses more than the ads.

        You could override its suggestions with paid ones, or nerf the bot's shopping abilities so it doesn't overshadow the sponsors, but that will destroy trust in the product in a very competitive industry.

        You could put user-targeted ads on the site not necessarily related to the current query, like ads you would see on Facebook, but if the bot is really such a good personal shopper, people are literally at a ChatGPT prompt when they see the ads and will use it to comparison shop.

        • whiplash451 2 days ago

          Alternative: let users reduce their monthly bill by accepting a sponsored answer with a dedicated button in the UI

          (with many potential variants)

    • marsten 2 days ago

      You raise a good point that this isn't a low marginal cost business like software, telecom, or (most of) the web. Efficiency will be a big advantage for companies that can achieve it, in part because it will let them scale to new AI use cases.

      With the race to get new models out the door, I doubt any of these companies have done much to optimize cost so far. Google is a partial exception – they began developing the TPU ten years ago and the rest of their infrastructure has been optimized over the years to serve computationally expensive products (search, gmail, youtube, etc.).

    • aizk 2 days ago

      > sustained profitability (see Netflix for another example)

      What? Netflix is incredibly profitable.

      • bitpush 2 days ago

        Probably a bad example on my part, but also because of increasing the costs and offering a tier with ads. I was mostly talking about Netflix as it was originally conceived: "give access to unlimited content at a flat fee", which didn't scale all that well.

        • whiplash451 2 days ago

          Isn't this exactly what they offer today?

  • therealdrag0 2 days ago

    As an anecdote, they have first-mover advantage on me. I pay monthly, but mostly because it's good enough and I can't be bothered to try a bunch out and switch. But if the dust settles and prices drop I would be motivated to switch. How much that matters maybe depends on whether their revenue comes from app users or API plans. And first mover only works once. Now they may be coasting on name recognition, but otherwise new users may be load-balanced among all the options.

  • ToucanLoucan 2 days ago

    I mean sure, it's very promising if OpenAI's future is your only metric. It gets notably darker if you look at the broader picture of ChatGPT (and company)'s impact on our society.

    * We have people uploading tons of zero-effort slop pieces to all manner of online storefronts, and making people less likely to buy overall because they assume everything is AI now

    * We have an uncomfortable community of, to be blunt, actual cultists emerging around ChatGPT, doing all kinds of shit from annoying their friends and family all the way up to divorcing their spouses

    * Education is struggling in all kinds of ways due to students using (and abusing) the tech, with already strained administrations struggling to figure out how to navigate it

    Like yeah if your only metric is OpenAI's particular line going up, it's looking alright. And much like Uber, its success seems to be corrosive to the society in which it operates. Is this supposed to be good news?

    • arealaccount 2 days ago

      Dying for a reference on the cult stuff, a quick search didn’t provide anything interesting.

    • SlowTao 2 days ago

      Yes, but in a typical western business sense they are merely optimizing for user engagement and profits. What happens to society a decade from now because of all the slop being produced is not their concern. Facebook is just about connecting friends, right? It totally won't become a series of information moats and bubbles controlled by the algorithms...

      A great communicator on the risks of AI being too heavily integrated into society is Zak Stein. As someone who works in education, they see firsthand how people are becoming dependent on this stuff rather than pursuing any kind of self-improvement. People are just handing over all their thinking to the machine. It is very bizarre and I am seeing it in my personal experience a lot more over the last few months.

  • jillesvangurp 2 days ago

    The moat is increasingly becoming having access to billions needed to finance the infrastructure needed to serve billions. That's why Google is still in the game. They have that and they are very good at massive scale and have some cost advantages there.

    OpenAI is very good at this as well because of their brand name. For many people ChatGPT is all they know. That's the one that's in the news. That's the one everybody keeps talking about. They have many millions of paying users at this point.

    This is a non trivial moat. If you can only be successful by not serving most of the market for cost reasons, then you can't be successful. It's how Google has been able to guard its search empire for a quarter century. It's easy to match what they do algorithmically. But then growing from a niche search engine that has maybe a few tens of thousands of users (e.g. Kagi) to Google scale serving essentially most of this planet (minus some fire walled countries like Russia and China), is a bit of a journey.

    So Google rolling out search integration is a big deal. It means they are readying themselves for that scale and will have billions of users exposed to this soon.

    > Their last funding round in March valued them at 300B. Despite losing 5B last year, they are growing really fast

    Yes, they are valued based on world+dog needing agentic AIs and subscribing to the extent of tens or hundreds of dollars/month. That's going to outstrip the revenue of things like MS Office in its prime.

    5B loss is peanuts compared to that. If they weren't burning that, their ambition level would be too low.

    Uber now has a substantial portion of the market. They have about 3-4 billion revenue per month. A lot of cost obviously. But they managed 10B profit last year. And they are not done growing yet. They were overvalued at some point and then they crashed, but they are still there and it's a pretty healthy business at this point, and that reflects in their stock price. It's basically valued higher now than at the time of the Softbank investment pre-IPO. Of course a lot of stuff needed to be sorted out for that to happen.

  • seydor 2 days ago

    their moat is leaky because llm prices will be dropping forever and the only viable model will be a free model. Eventually everyone will catch up.

    Plus there is the thing that "thinking models" can't really solve complex tasks / aren't really as good as they are believed to be.

    • Zaheer 2 days ago

      I would wager most of their revenue is from the subscriptions - both consumer and business. That pricing is detached from the API pricing. The heavy emphasis on applications more recently is because they realize this as well.

  • rgavuliak 2 days ago

    I don't think the no-moat take makes sense. In a world where more and more content and interaction is done with and via LLMs, the data of your users chatting with your LLM is a super valuable dataset.

blueblisters 2 days ago

This is the best model out there, priced level or lesser than Claude and Gemini

They’re not letting the competition breathe

  • Davidzheng 2 days ago

    Gemini is close (if not better) so it just makes sense no? o3-pro might be ahead of pack tho

    • blueblisters 2 days ago

      o3 does better especially if you use the api (not ChatGPT)

  • dorianjp a day ago

    appreciate this, the faster we get to cheap commoditization, the better

JojoFatsani 2 days ago

O3 is really good. I haven’t had the same results with o4 unfortunately

  • vessenes a day ago

    Wait, you have access to o4? All I see is o4-mini, a distill of o4. I would not expect that to beat o3/o3-pro.

  • muzani 2 days ago

    It's one of the most unfortunate naming conventions

ucha a day ago

Can we know for sure whether the price drop is accompanied by a change in the model, such as quantization?

On Twitter, some people say that some models perform better at night when there is less demand, which allows them to serve a non-quantized model.

Since the models are only available through the API and there is no test to check which version of the model is being served, it's hard to know what we're buying...

ramesh31 2 days ago

Anthropic will need to follow suit with Opus soon. It is simply too expensive for anything by an order of magnitude.

  • sagarpatil 2 days ago

    Have they ever decreased the price in the past? I don’t remember.

seydor 2 days ago

when the race to the bottom reaches the bottom, the foundation model companies will be bought by ... energy companies. You 'll be paying for AI with your electricity bill

  • paxys 2 days ago

    It'll be the opposite. Large tech companies are already running their own power plants.

    • andyferris 2 days ago

      Yes - it’s common in traditional industries too. In my home town the aluminum refinery bought the power station to improve reliability (I should add - through upgrades not screwing over the rest of the power users).

ninetyninenine 2 days ago

You know, because LLMs can only be built by corporations... but because they're so easy to build, I see the price going down massively thanks to competition. Consumers benefit because all the companies are trying to outrun each other.

  • croes 2 days ago

    Easy doesn’t mean cheap.

    They need lots of energy and customers don’t pay much, if they pay at all

    • briian 2 days ago

      Exactly,

      The developers of AI models do have a moat, the cost of training the model in the first place.

      It's 90% of the low effort AI wrappers with little to no value add who have no moat.

  • codr7 2 days ago

    And then they all go out of business, since models cost a fortune to build, and their fan club is left staring at their computers trying to remember how to do anything without getting it served on a silver plate.

    • merth 2 days ago

      Investors keep pouring money in, so it's probably impossible to go out of business, at least for the big ones, until investors realise this is the wrong hill to die on.

      • codr7 2 days ago

        Which they will eventually; so the point stands, no matter how unpopular with the AI excusers out there.

    • wrsh07 2 days ago

      I expect they don't go out of business: at worst they don't start their next training run quite as aggressively and instead let their new very good model be profitable for a minute

      Many many companies are currently thrilled to pay the current model prices for no performance improvement for 2-3 years

      We still have so many features to build on top of current capabilities

ilaksh 2 days ago

Maybe because they also are releasing o3-pro.

OutOfHere 2 days ago

o3 is very much needed in VSCode GitHub CoPilot for Ask/Edit/Agent modes. It is sorely missing there.

  • janstice a day ago

    Sure is - and o3 is missing from the OpenAI models that Azure is serving, which I suspect isn’t a coincidence - if OpenAI has some secret sauce that lets them undercut resellers this might shake up agreements for a bit.

monster_truck a day ago

Curious that the number of usages for plus users remained the same. I don't think they're actually doing anything material to lower the cost by a meaningful amount. It's just margin they've always had, and they cut it because magistral is pretty incredible for being completely free

sagarpatil 2 days ago

Meanwhile Apple: Liquid Glass

visiondude 2 days ago

always seemed to me that efficient caching strategies could greatly reduce costs… wonder if they cooked up something new

  • xmprt 2 days ago

    How are LLMs cached? Every prompt would be different so it's not clear how that would work. Unless you're talking about caching the model weights...

    • hadlock 2 days ago

      I've asked it a question not in its dataset three different ways and I see the same three sentences in the response, word for word, which could imply it's caching the core answer. I hadn't seen this behavior before this last week.

      • beering 20 hours ago

        Isn’t the simpler explanation that if you ask the same question, there’s a chance you would get the same answer?

        In this case you didn’t even get the same answer, you only happened to have one sentence in the answer match.

    • HugoDias 2 days ago

      This document explains the process very well. It’s a good read: https://platform.openai.com/docs/guides/prompt-caching

      • xmprt 2 days ago

        That link explains how OpenAI uses it, but doesn't really walk through how it's any faster. I thought the whole point of transformers was that inference speed no longer depended on prompt length. So how does caching the prompt help reduce latency if the outputs aren't being cached.

        > Regardless of whether caching is used, the output generated will be identical. This is because only the prompt itself is cached, while the actual response is computed anew each time based on the cached prompt

        • singron a day ago

          > I thought the whole point of transformers was that inference speed no longer depended on prompt length

          That's not true at all and is exactly what prompt caching is for. For one, you can at least populate the attention KV Cache, which will scale with the prompt size. It's true that if your prompt is larger than the context size, then the prompt size no longer affects inference speed since it essentially discards the excess.

      • catlifeonmars 2 days ago

        > OpenAI routes API requests to servers that recently processed the same prompt,

        My mind immediately goes to rowhammer for some reason.

        At the very least this opens up the possibility of some targeted denial of service

        • xmprt 2 days ago

          Later they mention that they have some kind of rate limiting because if over ~15 requests are being processed per minute, the request will be sent to a different server. I guess you could deny cache usage but I'm not sure what isolation they have between different callers so maybe even that won't work.

          • catlifeonmars 2 days ago

            So the doc mentions you can influence the cache key by passing an optional user parameter. It’s unclear from the doc whether the user parameter is validated or if you can just provide an arbitrary string.

          • catlifeonmars 2 days ago

            15 requests/min is pretty low. Depending on how large the fleet is you might end up getting load balanced to the same one and if it’s round robin then it would be deterministic

    • amanda99 2 days ago

      You would use a KV cache to cache a significant chunk of the inference work.

      • xmprt 2 days ago

        Using KV in the caching context is a bit confusing because it usually means key-value in the storage sense of the word (like Redis), but for LLMs, it means the key and value tensors. So IIUC, the cache will store the results of the K and V matrix multiplications for a given prompt and the only computation that needs to be done is the Q and attention calculations.
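
        A toy single-head attention sketch (numpy, no relation to OpenAI's actual serving stack) of what gets reused: the prefix's K and V projections are computed once and cached, so a request sharing that prefix only pays for the new tokens' projections plus the attention itself:

            import numpy as np

            rng = np.random.default_rng(0)
            d = 8
            Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))  # toy projection weights

            def kv(tokens):
                # the expensive prefill work that prompt caching skips for a repeated prefix
                return tokens @ Wk, tokens @ Wv

            def attend(new_token, K, V):
                q = new_token @ Wq                       # only the new token needs a fresh Q
                scores = q @ K.T / np.sqrt(d)
                weights = np.exp(scores - scores.max())
                weights /= weights.sum()
                return weights @ V

            prefix = rng.normal(size=(100, d))           # shared prompt prefix (system prompt, codebase, ...)
            K_cache, V_cache = kv(prefix)                # computed once, reusable across requests

            new_token = rng.normal(size=d)               # a token from the new, non-shared suffix
            print(attend(new_token, K_cache, V_cache).shape)

        In a real stack the suffix tokens also get their K/V appended to the cache as decoding proceeds, but the saving is exactly the skipped prefill over the shared prefix.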

      • biophysboy 2 days ago

        Do you mean that they provide the same answer to verbatim-equivalent questions, and pull the answer out of storage instead of recalculating each time? I've always wondered if they did this.

        • Traubenfuchs 2 days ago

          I bet there is a set of repetitive one- or two-question user requests that makes up a sizeable share of all requests. The models are so expensive to run, 1% would be enough. Much less than 1%. To make it less obvious they probably have a big set of response variants. I don't see how they would not do this.

          They probably also have cheap code or cheap models that normalize requests to increase cache hit rate.
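
          Something as dumb as the sketch below would already fold a lot of near-duplicates together. Pure speculation that they do anything like it, but it would be cheap:

              # Speculative request normalization to raise cache hit rate.
              import hashlib
              import unicodedata

              def cache_key(prompt: str) -> str:
                  text = unicodedata.normalize("NFKC", prompt)   # fold unicode variants
                  text = " ".join(text.lower().split())          # lowercase, collapse whitespace
                  text = text.rstrip("?!. ")                     # drop trailing punctuation
                  return hashlib.sha256(text.encode()).hexdigest()

              assert cache_key("How to cook pasta?") == cache_key("  how to cook  pasta ")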

        • koakuma-chan 2 days ago

          The prompt may be the same but the seed is different every time.

          • biophysboy 2 days ago

            Could you not cache the top k outputs given a provided input token set? I thought the randomness was applied at the end by sampling the output distribution.
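
            That's my understanding too: the forward pass up to the final logits is deterministic for a given prompt, and the randomness comes from sampling those logits, so in principle you could cache the distribution and still vary the wording per seed. Toy version (real serving is messier: batching, nondeterministic kernels, multi-token outputs):

                import numpy as np

                vocab = ["boil", "salt", "drain", "stir", "taste"]
                cached_logits = np.array([2.0, 1.5, 0.3, 0.1, -1.0])   # pretend forward-pass output

                def sample(seed, temperature=1.0):
                    probs = np.exp(cached_logits / temperature)
                    probs /= probs.sum()
                    return np.random.default_rng(seed).choice(vocab, p=probs)

                print(sample(seed=1), sample(seed=2))   # same cached distribution, independent draws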

    • koakuma-chan 2 days ago

      A lot of the prompt is always the same: the instructions, the context, the codebase (if you are coding), etc.

    • tasuki 2 days ago

      > Every prompt would be different

      No? Eg "how to cook pasta" is probably asked a lot.

MallocVoidstar 2 days ago

Note that they have not actually dropped the price yet: https://x.com/OpenAIDevs/status/1932463601119637532

> We’ll post to @openaidevs once the new pricing is in full effect. In $10… 9… 8…

There is also speculation that they are only dropping the input price, not the output price (which includes the reasoning tokens).

  • sunaookami 2 days ago

    I think that was a joke. New pricing is already in place:

    Input: $2.00 / 1M tokens

    Cached input: $0.50 / 1M tokens

    Output: $8.00 / 1M tokens

    https://openai.com/api/pricing/

    Now cheaper than gpt-4o and same price as gpt-4.1 (!).
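
    Back-of-envelope for a single request at those rates (the token counts below are made up; only the per-million prices come from the pricing page):

        input_price, cached_price, output_price = 2.00, 0.50, 8.00   # $ per 1M tokens

        prompt_tok, cached_tok, output_tok = 20_000, 15_000, 3_000   # hypothetical request
        cost = ((prompt_tok - cached_tok) * input_price
                + cached_tok * cached_price
                + output_tok * output_price) / 1_000_000
        print(f"${cost:.4f} per request")   # about $0.0415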

    • runako 2 days ago

      > Now cheaper than gpt-4o and same price as gpt-4.1 (!).

      This is where the naming choices get confusing. "Should" o3 cost more or less than GPT-4.1? Which is more capable? Generation 3 of a technology intuitively feels less advanced than version 4.1 of a (similar) technology.

      • jacob019 2 days ago

        Do we know parameter counts? The reasoning models have typically been cheaper per token, but use more tokens. Latency is annoying. I'll keep using gpt-4.1 for day-to-day.

      • koakuma-chan 2 days ago

        o3 is a reasoning model, GPT-4.1 is not. They are orthogonal.

        • runako 2 days ago

          My quibble is with naming choices and differentiating. Even here they are confusing:

          - o4 is reasoning

          - 4o is not

          They simply do not do a good job of differentiating. Unless you work directly in the field, it is likely not obvious what is the difference between "our most powerful reasoning model" and "our flagship model for complex tasks."

          "Does my complex task need reasoning or not?" seems to be how one would choose. (What type of task is complex but does not require any reasoning?) This seems less than ideal!

          • koakuma-chan 2 days ago

            This is true, and I believe apps automatically route requests to appropriate models for normie users.

    • MallocVoidstar 2 days ago

      No, people had tested it after Altman's announcement and had confirmed that they were still being billed at the original price. And I checked the docs ~1h after and they still showed the original price.

      The speculation of only input pricing being lowered was because yesterday they gave out vouchers for 1M free input tokens while output tokens were still billed.

    • agsqwe 2 days ago

      Thinking models produce a lot of internal output tokens, making them more expensive than non-reasoning models for similar prompt and visible output lengths.

    • rvnx 2 days ago

      It is slower though

    • vitaflo 2 days ago

      Still 4x more expensive than Deepseek R1 tho.

teaearlgraycold 2 days ago

Personally I've found these bigger models (o3/Claude 4 Opus) to be disappointing for coding.

  • rvnx 2 days ago

    Opus is really great, but through Claude Code. If you used Cursor or RooCode, it would be normal to be disappointed.

    • bitpush 2 days ago

      This matches my experience, but I can't explain it. Do you know what's going on?

      • eunoia 2 days ago

        My understanding is context size. Companies like Cursor are trying to minimize the amount of context sent to the models to keep their own costs down. Claude Code seems to send a lot more context with every request and that seems to make the difference.

      • supermdguy 2 days ago

        Just guessing, but the new Opus was probably RL tuned to work better with Claude Code's tool calls

    • jedisct1 2 days ago

      I had the opposite experience. Not with Opus (too expensive), but with Sonnet. I got things done way more efficiently using Sonnet with Roo than with Claude Code.

      • rgbrenner 2 days ago

        Same. I ran a few tests ($100 worth of API calls) with Opus 4 and didn't see any difference compared to Sonnet 4 other than the price.

        Also, no idea why he thinks Roo is handicapped, when Claude Code nerfs the thinking output and requires typing "think"/"think hard"/"think harder"/"ultrathink" just to expand the max thinking tokens, which even on ultrathink only sets it at 32k, while the max in Roo is 51200 and it's just a setting.

        • rvnx 2 days ago

          I think I can share a trick that could help:

          From my experience (so not an ultimate truth), Claude is not great at doing the planning on its own: it dives immediately into coding.

          If you ask it to think step by step, it still doesn't really do it. Gemini 2.5 Pro, on the other hand, is good at that planning but terrible at actual coding.

          So you can use Gemini as the planner and Claude as the programmer, and you get something decent on RooCode.

          And this "think wisely" thing that you have to repeat 10x in the prompt is absolutely true.

          • rgbrenner a day ago

            I think you misread my comment. I wasn't asking for help. I get consistent good output from Sonnet 4 using RooCode, without needing Gemini for planning.

            Edit: I think I know where our miscommunication is happening...

            The "think"/"ultrathink" series of magic words are a claudecode specific feature used to control the max thinking tokens in the request. For example, in claude code, saying "ultrathink" sets the max thinking tokens to 32k.

            On other clients these keywords do nothing. In Roo, max thinking tokens is a setting. You can just set it to 32k, and then that's the same as saying "ultrathink" in every prompt in claudecode. But in Roo, I can also setup different settings profiles to use for each mode (with different max thinking token settings), configure the mode prompt, system prompt, etc. No magic keywords needed.. and you have full control over the request.

            Claude Code doesn't expose that level of control.
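
            Under the hood both tools presumably end up setting the same thing: the extended-thinking budget on the API request. In the raw Anthropic SDK that knob looks roughly like this (model id and budget are arbitrary here):

                import anthropic

                client = anthropic.Anthropic()
                resp = client.messages.create(
                    model="claude-sonnet-4-20250514",          # placeholder model id
                    max_tokens=60_000,                         # must exceed the thinking budget
                    thinking={"type": "enabled", "budget_tokens": 32_000},
                    messages=[{"role": "user", "content": "Refactor this module."}],
                )
                print(resp.content[-1].text)                   # final block is the visible answer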

  • apwell23 2 days ago

    I found them all disappointing in their own ways. At least the Deepseek models actually listen to what I say instead of ignoring me and doing their own thing like a toddler.

minimaxir 2 days ago

...how? I'd understand a 20-30% price drop from infra improvements for a model as-is, but 80%?

I wonder if "we quantized it lol" would classify as false advertising for modern LLMs.

  • tofof 2 days ago

    Presumably because the price was about 5x higher to begin with than any of the competitors at the same tier of performance? Perhaps it's better to get paid anything at all than to lose 100% of the customers.

  • drexlspivey 2 days ago

    Deepseek made a few major innovations that allowed them to achieve big compute-efficiency gains, and then published them. My guess is that OpenAI just implemented these themselves.

    • vitaflo 2 days ago

      Wouldn’t surprise me. And even with this price cut it’s still 4x more expensive than Deepseek R1 is.

alliao 2 days ago

It used to take decades of erosion to turn Google search into a hot mess; now that everything's happening at light speed, it takes only days for AI models to decay to the same point.

candiddevmike 2 days ago

It's going to be a race to the bottom, they have no moat.

  • rvnx 2 days ago

    Especially now that they are second in the race (behind Anthropic), and a lot of free-to-download and free-to-use models are starting to be viable competitors.

    Once new MacBooks and iPhones have enough memory onboard this is going to be a disaster for OpenAI and other providers.

    • mattnewton 2 days ago

      I'm not sure they're scared of Anthropic - they're doing great work, but afaict they're running into some scaling issues and really focused on winning over developers at the moment.

      If I was OpenAI (or Anthropic for that matter) I would remain scared of Google, who is now awake and able to dump Gemini 2.5 pro on the market at costs that I'm not sure people without their own hardware can compete with, and with the infrastructure to handle everyone switching to them tomorrow.

      • piuantiderp 2 days ago

        Google is going to lap them. They haven't even started flexing their hardware muscle.

      • itomato 2 days ago

        Codex Research Preview appeared in my account in the early AM.

    • koakuma-chan 2 days ago

      What do you mean? Google is number 1.

    • slashdev 2 days ago

      Third for coding, after Anthropic and Gemini, which was leading last I checked.

    • aerhardt 2 days ago

      OpenAI are second in the race to Anthropic in some benchmarks (maybe?), but OpenAI still dwarfs Anthropic in distribution and popularity.

      • ratedgene 2 days ago

        That's slowly changing. I know some relatively non-tech savvy young people using things like Claude for various reasons, so people are exploring options.

        • jstummbillig 2 days ago

          Very, very slowly.

          OpenAI vs Anthropic on Google Trends

          https://trends.google.com/trends/explore?date=today%203-m&q=...

          ChatGPT vs Claude on Google Trends

          https://trends.google.com/trends/explore?date=today%203-m&q=...

          • sndean 2 days ago

            I wonder how much of this is brand name? Like Kleenex. Non-tech people might not search for "LLM", "generative AI", etc.; ChatGPT may just be what people have heard of. I'm assuming OpenAI has a large advantage over Anthropic, and the name helps, but I bet the name exaggerates the difference here a bit. Not everyone who asks for a Kleenex buys Kleenex-brand tissues.

            • jstummbillig 2 days ago

              You are not going to find Claude when googling for ChatGPT

          • rvnx 2 days ago

            This is such a big difference, thank you for sharing it, I didn't expect the gap to be _that_ huge

    • jdprgm 2 days ago

      While Mac unified-RAM inference is great for prosumers and up, I really don't foresee Apple making 128GB+ options affordable enough to be attractive for inference for the general public. iPhone even less so, considering the latest is only at 8GB. Meanwhile the best model sizes will just keep growing.

    • paxys 2 days ago

      Third behind Anthropic/Google. People are too quick to discount mindshare though. For the vast majority of the world's population AI = LLM = ChatGPT, and that itself will keep OpenAI years ahead of the competition as long as they don't blunder away that audience.

  • m3kw9 2 days ago

    LLM inference is a race to the bottom, but the service layers on top aren't. People always pay much more for convenience; those layers are what OpenAI focuses on, and they're harder to replicate.

  • mrweasel 2 days ago

    My understanding was that OpenAI couldn't make money at their previous price point, and I don't think operation and training costs have gone down enough to make up for that shortfall. So how are they going to make money by lowering the price by 80%?

    I get that the point is to be the last man standing, poaching customers by lowering the price and perhaps attracting a few people who wouldn't have bought a subscription at the higher price. I just question how long investors can justify pouring money into OpenAI. OpenAI is also the poster child for modern AI, so if they fail the market will react badly.

    Mostly I don't understand Silicon Valley venture capital, but dumping prices, making wild purchases with investor money, and leading mostly on branding: why isn't this a sign that OpenAI is failing?

  • joshuajooste05 2 days ago

    There was an article on here a week or two ago on batch inference.

    Do you not think that batch inference gives at least a bit of a moat whereby unit costs fall with more prompts per unit of time, especially if models get more complicated and larger in the future?
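
    Roughly what I have in mind: each decode step has a more or less fixed cost (weights streamed through the GPU, kernel launches), so more concurrent prompts amortize it. Toy model, all numbers invented:

        def tokens_per_second(batch_size,
                              fixed_ms_per_step=20.0,      # weight streaming, kernel launches
                              per_seq_ms_per_step=0.5):    # marginal cost of one more prompt
            step_ms = fixed_ms_per_step + per_seq_ms_per_step * batch_size
            return batch_size * 1000.0 / step_ms           # one token per sequence per step

        for b in (1, 8, 64, 256):
            print(f"batch={b:>3}  ~{tokens_per_second(b):,.0f} tok/s")
        # batch=1: ~49 tok/s; batch=256: ~1,730 tok/s, so cost per token falls as load rises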

    • minimaxir 2 days ago

      Batch inference is not exclusive to OpenAI.

  • Szpadel 2 days ago

    For sure they are no longer the clear winner, but they try to stay just barely on top of the others.

    Right now the new Gemini surpassed their o3 (barely) in benchmarks for significantly less money, so they cut pricing to stay competitive.

    I bet they haven't released o4 not because it isn't competitive, but because they are playing the Nvidia game: release a new product that is just enough better to convince people to buy it. So IMO they are holding the full o4 model back so they have something to release after the competition ships something better than their top horse.

biophysboy 2 days ago

I don't know if this is OpenAI's intention, but the little message "you've reached your usage limit!" is actively disincentivizing me from subscribing. For my purposes, the free model is more than good enough; the difference before and after is negligible. I honestly wouldn't pay a dollar.

That said, I'm absolutely willing to hear people out on "value-adds" I am missing out on; I'm not a knee-jerk hater. (For context, I work with large, complex & private databases/platforms, so it's not really possible for me to do anything but ask for scripting suggestions.)

Also, I am 100% expecting a sad day when I'll be forced to subscribe, unless I want to read dick pill ads shoehorned into the answers (looking at you, YouTube). I do worry about getting dependent on this tool and watching it become enshittified.

  • Traubenfuchs 2 days ago

    > "you've reached your usage limit!"

    Just switch to a competitor's free offering. There are enough of them to cycle through that you're never hindered by limits. I wonder how much money I've cost those companies by now?

    How anyone believes there is any moat for anyone here is beyond me.

    • wrsh07 2 days ago

      I expect the answer is <$1, speaking as someone who shares a Discord server with a friend where we egregiously ping the models.

  • wrsh07 2 days ago

    o3 is so good it's worth paying for a minute (just for plus) just to see what it's like

    I've never used anything like it. I think new Claude is similarly capable

stevev 2 days ago

It was only a matter of time considering Deepseek R1’s recent release. OpenAI’s competitor is an open-source product that offers similar quality at a tenth of the cost. Now they’re just trying to prevent customers from leaving.

boyka 2 days ago

80%? So is this the same Trump-style "art of the deal" move of setting unreasonably high pricing in the first place, or are they desperate for customers?

unraveller 2 days ago

I have no moat and I must make these GPUs scream.

koakuma-chan 2 days ago

OpenAI dropped the price by so much that the server also went down.

polskibus 2 days ago

Is this a reaction to the Apple paper showing that reasoning models don't really reason?

  • dragandj a day ago

    If these kids could read, they would be very upset.

madebywelch 2 days ago

They could drop the price 100% and I still wouldn't use it, so long as they're retaining my data.