Office AI: What’s Behind the New Race Between Anthropic and OpenAI?
OpenAI and Anthropic train office AI agents in RL simulators—and eye 'virtual co-workers'.
Top AI developers such as OpenAI and Anthropic have entered a race to build autonomous AI agents for office work, The Information reports. Nvidia CEO Jensen Huang predicts IT departments will transform into “HR for neural networks.” The push toward such agents is becoming a key industry trend.
ForkLog explores how large language models (LLMs) are adapting to handle corporate employees’ tasks.
The New School
Traditionally, AI models learn in two steps. First, during pre-training, the model absorbs vast amounts of data. Then it undergoes fine-tuning to specialize in particular tasks.
But office tasks demand more. A model must learn to interact with applications like a human. It needs to perform purposeful actions in digital environments and understand cause and effect.
This gap between “knowing” and “doing” gave birth to a new school for artificial intelligence. Previously, models learned from static data. Now they go on “internships” inside virtual copies of real office applications.
Experts call these reinforcement learning (RL) environments. They simulate popular services like Salesforce, LinkedIn, or Gmail. Inside these simulations, the model experiments, receives feedback, and sharpens its skills.
Training Grounds for AI Agents
RL environments resemble flight simulators, but for AI agents. Leading labs are already building them. Turing was one of the first startups to build more than 1,000 such simulators, ranging from clones of Airbnb and Zendesk to Microsoft Excel spreadsheets. The company also offers its tools to OpenAI. In June 2025, it raised $111 million in funding.
Other startups followed. Scale received $14 billion from Meta, while its competitor Surge is negotiating $1 billion in financing.
A “training day” for an AI agent unfolds roughly like this:
First, it receives a specific instruction in natural language. For example: analyze a Salesforce database, find clients with no contact for over six months, and send them a meeting invitation.
The model “wakes up” inside a virtual application interface and starts learning through trial and error.
For each task, the system builds a checklist of correct actions and verifies every step against it. If the agent performs everything correctly, the system reinforces its strategy. If not, error analysis helps adjust the steps for the next attempt.
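The “training day” above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any lab actually trains its agents: `MockCRM`, `checklist_reward`, the client names, and the candidate thresholds are all hypothetical, “six months” is approximated as 182 days, and a simple epsilon-greedy bandit stands in for a real RL algorithm. The agent's only decision is how many days of silence should trigger an invitation; the checklist scores each attempt, and the scores steer later attempts.

```python
import random
from dataclasses import dataclass, field
from datetime import date, timedelta

TODAY = date(2025, 9, 1)           # fixed date so every run is reproducible
STALE_AFTER = timedelta(days=182)  # assumption: "over six months" ~ 182 days
THRESHOLDS = [7, 90, 182, 365]     # hypothetical strategies the agent can try


@dataclass
class MockCRM:
    """In-memory stand-in for a CRM; nothing here touches a real service."""
    last_contact: dict[str, date]
    invites_sent: set[str] = field(default_factory=set)

    def send_invite(self, client: str) -> None:
        self.invites_sent.add(client)  # recorded only inside the sandbox


def make_env() -> MockCRM:
    """Build a fresh sandbox so mistakes from one attempt never carry over."""
    days_ago = {"acme": 400, "globex": 30, "initech": 200, "umbrella": 120}
    return MockCRM({c: TODAY - timedelta(days=d) for c, d in days_ago.items()})


def run_episode(env: MockCRM, threshold_days: int) -> None:
    """The agent's attempt: invite everyone silent for more than the threshold."""
    for client, last in env.last_contact.items():
        if (TODAY - last).days > threshold_days:
            env.send_invite(client)


def checklist_reward(env: MockCRM) -> float:
    """Fraction of checklist items passed, one item per client."""
    passed = 0
    for client, last in env.last_contact.items():
        should_invite = (TODAY - last) > STALE_AFTER
        if (client in env.invites_sent) == should_invite:
            passed += 1
    return passed / len(env.last_contact)


def train(episodes: int = 200, epsilon: float = 0.2, seed: int = 0) -> dict[int, float]:
    """Trial and error: try each strategy once, then mostly repeat the best."""
    rng = random.Random(seed)
    value = {t: 0.0 for t in THRESHOLDS}  # running mean reward per strategy
    count = {t: 0 for t in THRESHOLDS}
    for i in range(episodes):
        if i < len(THRESHOLDS):
            t = THRESHOLDS[i]                        # initial pass over all arms
        elif rng.random() < epsilon:
            t = rng.choice(THRESHOLDS)               # explore
        else:
            t = max(THRESHOLDS, key=value.__getitem__)  # exploit best so far
        env = make_env()
        run_episode(env, t)
        reward = checklist_reward(env)
        count[t] += 1
        value[t] += (reward - value[t]) / count[t]   # incremental mean update
    return value


if __name__ == "__main__":
    values = train()
    print(values, "best threshold:", max(values, key=values.get))
```

Calling `train()` returns the estimated checklist score for each threshold. Because every episode runs against a fresh `MockCRM`, even a badly wrong attempt only records invitations inside the sandbox and never reaches a real client.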
The main advantages are scalability and safety. An AI agent can repeat the same task for hours until the actions become automatic. Meanwhile, it won’t spam real clients, break a database, or delete critical information.
Expensive Teachers
To maximize effectiveness, companies hire specialists from diverse fields—biology, programming, medicine. These experts demonstrate proper tool usage to AI. The model memorizes not just steps but also the expert’s decision-making logic.
As progress accelerates, requirements tighten. Early stages only needed student-level knowledge. Now labs recruit professionals from organizations like NASA and from other government projects.
Demand drives prices up. According to Labelbox, which supplies specialists to OpenAI and other giants, about 20% of its contractors earn over $90 per hour, and nearly 10% make more than $120 per hour. Over the next eighteen months, rates for top experts are expected to rise to $150 to $250 per hour.
OpenAI plans to spend approximately $1 billion on experts and RL environments in 2025. By 2030, that figure is projected to reach $8 billion. Rumors suggest Anthropic may allocate up to $1 billion over the next year just to build and operate virtual applications.
The Bigger Goal
One might assume Anthropic and OpenAI invest billions in simulators and experts just to make their models slightly smarter. But the real goal is far larger. They aim to break through the ceiling of current AI capabilities and build a new business model.
First, LLM training methods based on predicting the next word in internet texts have reached their limit. RL environments offer a qualitatively different path. They enable AI not just to generate text but to act within complex, multi-step processes. This holds the key to genuine autonomy.
According to Surge CEO Edwin Chen, Anthropic and OpenAI’s methods “mirror how humans learn.” They place AI models in conditions as close to the real world as possible.
But the main draw of RL environments lies in monetization potential. For AI giants, selling API access to a chatbot is only the first step. The next, far more valuable business model involves renting out “virtual employees.” In this new hybrid reality, AI agents will primarily process data and handle administrative tasks.
On one hand, this shift sparks enthusiasm about unprecedented productivity growth and improved routine operations. On the other, it raises understandable anxiety about job displacement.
Will AI Replace Us All?
Startup Magazine believes the ideal hybrid work model builds on augmentation, not replacement. Experts cited customer support as an example.
From the employee’s perspective: A digital assistant trained on internal knowledge bases and capable of adopting the company’s brand voice handles up to 80% of routine tasks. It tracks orders, answers frequently asked questions, and generates standard reports. This frees people from monotonous work and reduces burnout risk. It also lets them focus on complex, emotionally charged cases requiring empathy and creative thinking.
From the customer’s perspective: They receive instant, accurate answers to simple questions at any hour. When they do reach a human, they get higher quality, deeper service because the specialist isn’t bogged down by tedious work.
Experts noted that successful implementation depends on company approach. Leaders must focus on employee morale, invest in retraining, and experiment.
Major companies betting on AI agents indirectly support this view. Their goal isn’t to replace humans but to create powerful partners that handle most daily tasks.
Moreover, studies so far suggest that rapid AI advancement has not significantly affected the job market.
The Entire Economy as a Training Ground
The vision of the future circulating in OpenAI’s corridors is even bolder. According to The Information, one senior executive at the company has said privately that he expects “the entire economy” to transform into one giant “RL machine.”
This means AI will eventually learn not from artificial simulations but from recordings of real workflows performed by professionals worldwide. It will watch how doctors diagnose patients in medical systems. It will observe logistics experts optimizing supply chains. It will see lawyers drafting contracts.