OpenAI shipped the biggest Codex update since its rebrand on Thursday afternoon — roughly an hour after Anthropic’s Opus 4.7 announcement. The timing wasn’t subtle. What it ships with isn’t either: Codex Desktop can now see your Mac screen, click your apps, type into your windows, and keep working while you go do something else.
Forty-eight hours in, people have already used it to sort their email, play music through the Music app, run bug reproductions in parallel, and — in one viral demo — play a round of Slay the Spire 2 with a single prompt. Also: it’s broken on Intel Macs, it’s blocked in the EU and UK, and the memory feature will slow your app to a crawl if you turn it on today.
Here’s what actually works, what doesn’t, and whether you should bother installing it this weekend.
What Is Codex Desktop, Exactly?
If you’ve been using ChatGPT but haven’t touched Codex, the fast version: Codex is OpenAI’s coding agent. Think of it as a more capable cousin of ChatGPT that lives in a Mac app, talks to your editor, runs its own tests, and can now — as of April 16 — operate any other app on your machine with its own cursor.
For the pros: this is version 26.415. It runs on macOS (Intel support just landed but has bugs), uses the gpt-image-1.5 model for image generation, ships with 90+ plugins covering everything from Atlassian Rovo to Remotion to Microsoft Suite, and includes a preview of persistent memory. There’s a new $100/month Codex Pro tier with 5x the limits of the $20 Plus plan.
The one sentence that matters for most readers: Codex can now use your computer. Not in a “we made a tool call look like a click” way. Actual cursor on your actual screen, moving through actual apps. In parallel with you. Without hijacking the keyboard.
That’s the pitch. The reality is more interesting.
Setting It Up (and the Permissions Rabbit Hole)
Installing is the easy part. Open Codex, click Settings, find Computer Use, hit Install. macOS will then drag you through its usual dance: Privacy & Security > Screen Recording > toggle Codex > quit and reopen the app. Then Accessibility. Then quit and reopen again.
A Mac dev on X posted what a lot of people were thinking: this permission flow sucks. Not Codex’s fault — macOS forces every app that wants to see and control your screen through the same multi-panel gauntlet. OpenAI did what it could (the onboarding screens are nicer than most apps bother with), but you’re still clicking through five panels to grant one capability.
Once you’re through, there’s a second layer. Codex asks before touching any app the first time. You can say “Always allow” for apps you trust. You can keep a tight whitelist — just Safari and Keynote, say — or open the floodgates to everything.
Two things it won’t let you do no matter how many boxes you tick: automate Codex itself (so it can’t bypass its own approvals by prompting itself), or approve a sudo prompt on your behalf. If the system pops up asking for your admin password, that’s still your job.
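The approval model described above boils down to a simple gate: a per-app allow-list plus two hard-coded denials that no setting can override. A minimal sketch in Python — all names and prompts here are hypothetical, not OpenAI’s actual implementation:

```python
# Illustrative sketch of a per-app approval gate like the one Codex
# describes: user-trusted apps skip the prompt, two targets are always
# denied. App IDs and prompt strings are made up for the example.

ALWAYS_DENIED = {"com.openai.codex", "sudo-prompt"}  # can't drive itself or approve sudo

class ApprovalGate:
    def __init__(self):
        self.always_allowed = set()   # apps the user granted "Always allow"

    def may_control(self, app_id: str, ask_user) -> bool:
        if app_id in ALWAYS_DENIED:
            return False              # no override, regardless of settings
        if app_id in self.always_allowed:
            return True               # trusted: skip the prompt entirely
        decision = ask_user(app_id)   # "once" / "always" / "deny"
        if decision == "always":
            self.always_allowed.add(app_id)
        return decision in ("once", "always")
```

Once an app is in the trusted set, later calls never reach `ask_user` — which is the whole point of a tight whitelist: you pay the prompt cost once per app, not once per action.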
One note for EU, UK, and Swiss readers: computer use is blocked in your region at launch. Someone has already posted a ProtonVPN workaround (step 4: “turn off VPN, enjoy”), but that’s a terms-of-service gamble you’d be making knowingly. Better to wait — OpenAI has said these regions are “coming soon.”
5 Things People Actually Did With It
Not demos OpenAI picked. Real tasks from the first 48 hours.
1. Sort your Mac Mail inbox by topic and priority
One of the earliest demos was a scientist asking Codex to open Mac Mail, read everything from the past day, and group it by topic and urgency. It opened the app, scrolled through threads, and came back with a clean summary — the kind of thing you’d normally pay an EA to do.
What makes this different from Claude’s Computer Use or Perplexity’s Personal Computer is that Codex stayed out of the way. Federico Viticci at MacStories called it “the best [computer use] feature I’ve ever tested” — partly because it reads the macOS accessibility tree (the same thing VoiceOver uses) instead of relying on screenshots and click approximations. It’s more precise. It also doesn’t need to bring Mail to the front, which means you can keep doing your actual work.
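The accessibility-tree approach is worth a moment. Elements in that tree carry roles and labels, so the agent targets “the Archive button” rather than a pixel region that happens to look like one. A toy illustration of the idea — real macOS access goes through the AXUIElement C API, not a structure like this:

```python
# Toy accessibility tree plus a role/label lookup, illustrating why
# AX-based targeting beats screenshot guessing. The tree shape and
# element names are invented for the example.

from dataclasses import dataclass, field

@dataclass
class AXNode:
    role: str                      # e.g. "AXButton", "AXWindow"
    title: str = ""
    children: list = field(default_factory=list)

def find(node, role, title):
    """Depth-first search for an element by role and title."""
    if node.role == role and node.title == title:
        return node
    for child in node.children:
        hit = find(child, role, title)
        if hit:
            return hit
    return None

mail = AXNode("AXWindow", "Inbox", [
    AXNode("AXToolbar", children=[AXNode("AXButton", "Reply"),
                                  AXNode("AXButton", "Archive")]),
    AXNode("AXTable", "Message List"),
])

# The agent resolves an element, not coordinates — so the window can
# stay in the background and the layout can change without breaking it.
archive = find(mail, "AXButton", "Archive")
```

That element-not-coordinates property is also why the window doesn’t need to be frontmost: a click on a resolved AX element doesn’t care where the window sits on screen.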
2. Multitask while it uses Messages
Another user asked Codex to screenshot their current chat window, open Messages, and send the screenshot to a friend. Nothing fancy — but the user was still writing an email on the same machine while the cursor moved through Messages on its own.
This is the “magical” part that keeps showing up in first reactions. Two cursors, two agents, one Mac. Parallel, not serial. Anthropic’s Claude Code shipped Agent Teams a day before this (April 15) — similar idea, different execution. Codex’s implementation feels closer to “background job” than “watch me work.”
3. Run multiple bug reproductions at once
A developer posted a workflow that resonated with the dev community: three agents, each reproducing a different bug, all running in parallel while he ate lunch. One used the Slack plugin to pull context from a thread. Another used GitHub. A third operated the actual app.
The 90+ plugins matter here. Each one is essentially a pre-wired connection to a service — Jira, Linear, Notion, GitHub, CircleCI, GitLab, Render, Remotion, Vercel, Microsoft Suite. You don’t set up OAuth for each; Codex handles it. This is OpenAI’s answer to the Claude Skills ecosystem that obra/superpowers and ui-ux-pro-max have been building toward. Both companies are now fighting over who owns the “agent plus ecosystem” stack.
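Structurally, the three-bugs-over-lunch workflow is an ordinary fan-out: independent tasks, each wired to a different source, running concurrently. A sketch of the shape — `run_agent` is a hypothetical stand-in for dispatching a Codex task, and the job names mirror the workflow above:

```python
# Fan-out sketch of the "three bug repros at once" pattern.
# run_agent is a placeholder; a real dispatch would go through the
# Codex app or CLI rather than a local function.

from concurrent.futures import ThreadPoolExecutor

def run_agent(task: str, source: str) -> str:
    # Placeholder for kicking off one agent and waiting on its result.
    return f"{task}: reproduced via {source}"

jobs = [
    ("bug-1041", "Slack thread (plugin)"),
    ("bug-1044", "GitHub issue (plugin)"),
    ("bug-1052", "computer use on the app"),
]

with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(lambda j: run_agent(*j), jobs))

for line in results:
    print(line)
```

The point of the sketch: nothing in the pattern is exotic. What’s new is that each worker can be an agent with its own cursor and its own pre-authenticated plugin, instead of a script you had to write.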
4. Mirror your iPhone and control apps on your phone
This one surprised me. With macOS’s iPhone Mirroring feature turned on, Codex’s computer use extends to your phone. One user had Codex tap through an iOS app via the mirrored window. It worked — slower and less accurate than on native Mac apps, but it worked.
That’s a whole category of automation nobody was talking about a week ago. If you need a thing done in an iOS-only app that has no API, you now have a path.
5. Build a Keynote deck from a rough outline while you’re on a call
Not a demo I saw publicly, but a use case that keeps coming up in private conversations: give Codex a bulleted outline, point it at Keynote, let it build a deck while you’re in a meeting. It opens the app, types your text, picks templates, drops in any images the gpt-image-1.5 model produces.
This is where Codex stops being a coding tool and starts being a general-purpose Mac assistant. Which, to be fair, is exactly what OpenAI says in the announcement: “Codex for (almost) everything.”
What It Can’t Do (and Where It Breaks)
Every company launch post skips this section. Here it is.
Intel Macs are broken. OpenAI added Intel support in this release, which is good news for anyone still on a 2019 MacBook Pro. But a Japanese user posted that the main Computer Use feature just doesn’t activate on Intel — “should be supported but there’s some kind of bug.” If you’re on Apple Silicon, you’re fine. If you’re not, wait for 26.416.
Memory will slow the app down. The preview memory feature (the thing that’s supposed to remember your preferences across threads) causes noticeable slowdowns. One user enabled it, watched Codex crawl, reverted it, and posted the commands. Memory is a “turn it on when they ship the production version” feature today.
Rate limits hit fast on the $20 Plus plan. Before computer use, people were already hitting limits on Codex Plus. Now that every task can involve multiple parallel agents and dozens of screen reads per minute, the complaint is louder. The new $100 Codex Pro tier offers 5x the limits — but if you don’t want to jump to that, you’re going to notice the ceiling.
Asana didn’t work. A user tried to set up an Asana project via computer use. It chugged. It failed. He switched to “make me a CSV for import,” got it in under a minute, and had the project live in five. The lesson: computer use is for things that can’t be done any other way. If there’s an API or an import path, use that.
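The CSV move generalizes: when a service has an import path, generating the file takes seconds where UI automation chugs. A minimal version of what that user did — the column names here are assumptions, since Asana’s importer lets you map headers at upload time:

```python
# Generating an import CSV instead of driving the UI. Column names and
# task data are illustrative; Asana's importer maps headers on upload,
# so exact names matter less than consistent structure.

import csv
import io

tasks = [
    {"Name": "Draft launch brief", "Assignee": "maya@example.com", "Due Date": "2026-04-24"},
    {"Name": "QA signoff", "Assignee": "ravi@example.com", "Due Date": "2026-04-27"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["Name", "Assignee", "Due Date"])
writer.writeheader()
writer.writerows(tasks)
print(buf.getvalue())
```

Same lesson in code form: the deterministic path (structured file, documented importer) beats the agentic path (watch a cursor fight a web app) whenever both exist.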
Coding tasks still break in the middle. A developer asked Codex to edit a React component, run tests, and submit. Two out of three runs got stuck on the test step. Smooth in a demo video, rougher in daily use. This isn’t unique to Codex — every agent does this — but computer use doesn’t fix it.
It occasionally hijacks the screen. OpenAI’s pitch is that computer use runs in the background. Most of the time it does. But one user reported Codex effectively taking over his Mac — to the point he was considering buying a dedicated machine just for Codex work. Your mileage will vary by app.
Japanese input is buggy. A user in Japan reported that computer use can’t type Japanese correctly. If you work in a non-Latin-script language, validate before committing.
Terminal and Codex itself are off-limits. You can’t use computer use to drive the terminal (security design — no bypassing sandboxes through a sideways click). And Codex can’t operate Codex. Which is actually fine; you shouldn’t want it to.
Codex vs Claude Code After This Update
Before this week, the comparison was roughly: Codex is cheaper per token and better at terminal benchmarks; Claude Code writes cleaner code in blind evaluations. That’s still mostly true.
What changed: Codex now has computer use, parallel agents, persistent memory, and 90+ plugins in a polished Mac app. Claude Code shipped Agent Teams a day earlier and has had computer use since March. The two feature lists are converging — which makes the decision more interesting, not less.
| Dimension | Codex Desktop (Apr 16) | Claude Code (current) |
|---|---|---|
| Computer use on Mac | Yes — background cursor, AX tree access | Yes — screenshot-based |
| Parallel agents | Yes — multiple cursors at once | Yes — via Agent Teams |
| Plugin/skill ecosystem | 90+ official plugins | Obra/superpowers, community skills |
| Persistent memory | Preview (buggy today) | Yes — more mature |
| macOS polish | Purpose-built Mac app | Cross-platform, runs in terminal |
| Image generation | Built-in (gpt-image-1.5) | No — use external tools |
| $20 tier value | More usage per dollar on Plus | Can run out in a single deep session |
| SWE-bench Pro | Similar to Claude Code | Similar to Codex |
| Terminal-Bench 2.0 | Noticeable lead | Slightly behind |
| Blind code quality eval | 25% preferred | 67% preferred |
| Availability | Mac first, no EU/UK yet | Everywhere |
The short version: if you’re on a Mac and your work involves apps that don’t have APIs, Codex is the better tool this week. If you’re shipping production code where review quality matters more than speed, Claude Code still writes code people prefer when they don’t know which tool made it.
You don’t have to pick one, by the way. OpenAI also shipped an official codex-plugin-cc repo that lets Claude Code users delegate to Codex as a subagent. The feature-parity war is quietly turning into cross-ecosystem plumbing.
What This Means for You
If you’re a solo dev on a Mac: You have a tool that can now test your frontend across real apps (Chrome, Safari, your Electron build) without you building a test suite for it. The 48-hour “install it and see” test is probably worth your Saturday morning.
If you’re a prosumer — a consultant, indie, or solo operator: Skip the coding angle entirely. The email, Messages, Keynote, and research workflows are where this earns back the $20. Grant Codex access to your inbox and Slack, hand it your outline, let it draft while you’re on a call.
If you’re on ChatGPT Plus and wondering whether to upgrade: Stay on Plus for two weeks. Use what’s there. If you hit the rate limit more than twice a day, consider Pro. If you don’t, you don’t need it yet.
If you’re on an Intel Mac, in the EU/UK, or building Japanese-language workflows: Wait. The launch caught OpenAI with three rough edges, and they’ll patch them. The feature won’t go away.
The practical takeaway: Install it, spend an afternoon giving it real tasks, find the two or three things it actually does better than you do, then build those into your week. Skip the hype, skip the “replaces your job” takes, and treat it like what it is: a junior assistant who can click around in apps for you. The things junior assistants do well are the things this does well.
Who Should Install This Weekend
Short answer: anyone on an Apple Silicon Mac with ChatGPT Plus or Pro who has at least one repeating multi-app workflow they’ve been meaning to automate.
Longer answer:
- Install now if: You’re on macOS 14+, M1 or later, and you already have a specific task in mind (inbox triage, frontend testing, research compilation, Keynote building from outlines).
- Install but don’t trust it yet if: You have the hardware but no specific workflow — you’ll be wowed by the demos, but you won’t get ROI until you have a real job for it.
- Wait a week if: You’re on Intel, in the EU/UK, or you rely on the memory feature.
- Skip for now if: You’re happy with Claude Code and your work is mostly production code. The Codex computer use advantage is real, but it doesn’t outweigh Claude Code’s quality lead for pure coding work today.
The Bottom Line
This is the biggest Codex update since the rebrand, and it probably pushed the agent-for-prosumers category forward more in 48 hours than the previous six months combined. It’s not a Claude Code killer — both tools now cover broadly the same feature list, and the real differences are in polish and judgment. But if you’re on a Mac and your work involves apps that don’t expose clean APIs, Codex Desktop just became the most capable thing you can install today.
The 90+ plugins are the sneaky part. Computer use will get the headlines for the next week. But the plugin ecosystem — the ability to run three agents across Slack, Jira, and your codebase in one workflow — is what makes this sticky six months from now.
Go install it. Spend an afternoon. Find one task. Automate that task. Come back in a month and decide if it’s earned its keep.
Sources
- OpenAI — Codex for (almost) everything
- OpenAI Developers — Computer Use in Codex
- OpenAI Developers — Codex changelog v26.415
- MacStories — Federico Viticci on Codex’s computer use
- VentureBeat — OpenAI drastically updates Codex Desktop app
- 9to5Mac — Codex Mac app adds three key features
- MacRumors — OpenAI Codex update on Mac
- TechCrunch — OpenAI takes aim at Anthropic with beefed-up Codex
- Help Net Security — Codex can now operate between apps
- GitHub — openai/codex-plugin-cc