
Grok 4.3 Review: xAI's Always-On Reasoning Model Analysed

Benjamin
May 17, 2026 · 8 min read

xAI's Grok 4.3 lands with always-on reasoning, a 1-million-token context window, and some of the most aggressive per-token pricing in its class — but how does it hold up against OpenAI and Anthropic? We break down the benchmarks, real-world performance, and who it's actually built for.

Introduction

xAI quietly shipped Grok 4.3 in late April 2026 — no press conference, no model card, just a new entry in the model selector and Elon Musk confirming it on X. Despite the understated rollout, Grok 4.3 marks a meaningful architectural shift for xAI's flagship text model. The headline change is reasoning that is always active: unlike most models that let you toggle chain-of-thought on or off, Grok 4.3 thinks before every single reply by design.



What's New in Grok 4.3?

Always-On Reasoning

Previous Grok versions let users configure reasoning effort. In 4.3 that knob is gone — the model reasons on every request. xAI's bet is that the added latency is worth it for consistently higher accuracy and more reliable handling of complex, multi-step instructions.

1 Million Token Context Window

Grok 4.3 can hold roughly 750,000 words in a single conversation, enough to feed it an entire codebase or a book-length document without losing context. That places it among the largest context windows available in any production model today.
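The 750,000-word figure lines up with the common rule of thumb of about 0.75 English words per token. Here is a minimal sketch of that arithmetic. The 0.75 ratio is an assumption (it varies by tokenizer and language and is not a figure published by xAI), and the function names are illustrative only.

```python
# Rough token/word conversion for sizing documents against the context window.
# WORDS_PER_TOKEN is an assumed rule of thumb, not a property of Grok's tokenizer.
WORDS_PER_TOKEN = 0.75
CONTEXT_WINDOW_TOKENS = 1_000_000


def estimate_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many English words fit in a given token budget."""
    return round(tokens * words_per_token)


def fits_in_context(word_count: int, window: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """Check whether a document of word_count words probably fits in the window."""
    estimated_tokens = round(word_count / WORDS_PER_TOKEN)
    return estimated_tokens <= window


print(estimate_words(CONTEXT_WINDOW_TOKENS))  # 750000
```

For anything close to the limit, count tokens with the provider's own tokenizer rather than relying on this estimate.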

December 2025 Knowledge Cutoff

The model ships with a December 2025 training cutoff. Built-in web search (via Web Search and X Search tools) lets it access live information on demand, largely sidestepping the limitation for real-world use cases.

Improved Agentic Tooling

The release notes list a built-in Python code execution sandbox, video upload support, and an agent library. xAI has also teased "Grok Computer," an agentic desktop product analogous to Anthropic's Computer Use and OpenAI's Operator.


Aggressive API Pricing

The API is priced at $1.25 per million input tokens and $2.50 per million output tokens for requests up to 200,000 tokens (higher rates apply above that threshold). Artificial Analysis independently estimates that Grok 4.3 is roughly 20% cheaper to run than Grok 4.20, with a higher intelligence score to boot.
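To see what those rates mean per request, here is a minimal cost sketch. It covers only the standard tier, because the higher rates above 200,000 tokens are not stated here. It also assumes the threshold applies to input and output tokens combined, which is a guess on our part. The function and constant names are illustrative.

```python
from decimal import Decimal

# Standard-tier rates quoted in this review (USD per million tokens).
INPUT_USD_PER_MILLION = Decimal("1.25")
OUTPUT_USD_PER_MILLION = Decimal("2.50")

# Assumption: the 200,000-token threshold counts input + output together.
STANDARD_TIER_LIMIT = 200_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request billed at the standard tier."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens + output_tokens > STANDARD_TIER_LIMIT:
        # The long-context rates are not given in the review, so refuse to guess.
        raise ValueError("request exceeds the standard tier; long-context rates unknown")
    million = Decimal(1_000_000)
    input_cost = input_tokens * INPUT_USD_PER_MILLION / million
    output_cost = output_tokens * OUTPUT_USD_PER_MILLION / million
    return input_cost + output_cost


# Example: a 50,000-token prompt with a 2,000-token reply.
print(estimate_cost_usd(50_000, 2_000))
```

Because reasoning is always on, expect output token counts (and therefore the output side of the bill) to run higher than the visible reply length suggests.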



Benchmark Performance

According to Artificial Analysis, Grok 4.3 scores 53 on the Artificial Analysis Intelligence Index — placing it just above Claude Sonnet 4.6 and Muse Spark, and 4 points ahead of Grok 4.20. Its biggest single-benchmark leap is on GDPval-AA (real-world agentic tasks), where it jumped 321 Elo points over its predecessor, surpassing Gemini 3.1 Pro Preview and GPT-5.4 mini.

Note: Grok 4.3 still trails the leading frontier models from OpenAI and Anthropic on the overall Intelligence Index.


Where It Shines

Grok 4.3 is clearly optimised for long agentic sessions: multi-file code edits, long-document analysis, legal and financial research, and multi-turn instruction-following workflows. Specific strengths include:

  • Instruction following: 98% on τ²-Bench Telecom puts it among the top instruction-following models available

  • Agentic customer support tasks: strong performance in multi-step, tool-assisted workflows

  • Long-document analysis: the 1M context window is among the largest available for reading and reasoning over large inputs in a single pass

  • Live X data access: native integration with X Search is a genuine differentiator; no other frontier model offers real-time access to X posts, profiles, and threads

  • Cost efficiency: at $1.25/$2.50 per million tokens for an always-on reasoning model, it undercuts comparable models by an estimated 5–10x


Where It Falls Short

  • Hard mathematics — scoring just 11% on ProofBench, the model clearly struggles with rigorous mathematical proof tasks

  • General coding — evaluations from Vals AI note it "remains weak on general coding tasks," trailing leading models from OpenAI and Anthropic

  • Reasoning latency ("narcolepsy") — community reports describe a side-effect of always-on reasoning where the model overthinks and stalls on agentic actions, causing unnecessary delays

  • Hallucination trade-off — while Grok 4.3 gained 8 points on AA-Omniscience Accuracy, it lost 8 points on AA-Omniscience Non-Hallucination Rate, meaning Grok 4.20 still leads on factual reliability


Pricing & Access


Important: Legacy model slugs including grok-4, grok-4-fast, grok-4-1-fast, and grok-code-fast-1 were retired on 15 May 2026. Requests to these slugs now automatically redirect to Grok 4.3 at standard Grok 4.3 pricing.
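Because the redirect is silent, it is worth auditing your own configuration for retired slugs so the change in model and pricing does not come as a surprise. The sketch below does that check on the client side. The retired slugs are the ones listed above. The target identifier `grok-4.3` is an assumption, so check xAI's model list for the exact slug. The helper names are hypothetical.

```python
# Slugs reported as retired on 15 May 2026.
RETIRED_SLUGS = frozenset(
    {"grok-4", "grok-4-fast", "grok-4-1-fast", "grok-code-fast-1"}
)

# Assumed identifier for Grok 4.3; verify against xAI's published model list.
CURRENT_SLUG = "grok-4.3"


def resolve_model(slug: str) -> tuple[str, bool]:
    """Return (effective_slug, was_redirected) for a configured model slug."""
    if slug in RETIRED_SLUGS:
        return CURRENT_SLUG, True
    return slug, False


def find_redirected(configured_slugs: list[str]) -> list[str]:
    """List the configured slugs that would now be served by Grok 4.3."""
    return sorted(s for s in configured_slugs if s in RETIRED_SLUGS)


print(find_redirected(["grok-code-fast-1", "grok-4.3", "grok-4"]))
```

Running a check like this over deployment configs flags every call site that is now billed at Grok 4.3 rates.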


How It Compares to the Competition

Grok 4.3 sits comfortably on the Pareto frontier for cost vs. intelligence — meaning there is no cheaper model at this intelligence level, and no more intelligent model at this price point. However, it is not a state-of-the-art frontier model.
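For readers unfamiliar with the term, a model is on the Pareto frontier when no other model is both at least as cheap and at least as intelligent while being strictly better on one of the two. The sketch below shows that check. The models and numbers in it are invented placeholders for illustration, not real benchmark or pricing data.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    cost: float          # lower is better
    intelligence: float  # higher is better


def dominates(a: Model, b: Model) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    no_worse = a.cost <= b.cost and a.intelligence >= b.intelligence
    strictly_better = a.cost < b.cost or a.intelligence > b.intelligence
    return no_worse and strictly_better


def pareto_frontier(models: list[Model]) -> list[Model]:
    """Keep only the models that no other model dominates."""
    return [m for m in models if not any(dominates(o, m) for o in models)]


# Placeholder values, invented purely to exercise the function.
EXAMPLE_MODELS = [
    Model("model-a", cost=1.0, intelligence=40),
    Model("model-b", cost=2.0, intelligence=53),
    Model("model-c", cost=3.0, intelligence=50),  # dominated by model-b
    Model("model-d", cost=10.0, intelligence=60),
]

print([m.name for m in pareto_frontier(EXAMPLE_MODELS)])
# ['model-a', 'model-b', 'model-d']
```

Being on the frontier says nothing about absolute capability: the cheapest and weakest model in a set is always on it too.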

  • vs. GPT-5.5 (xhigh): Grok 4.3 trails by 276 Elo points on GDPval-AA, with an expected win rate of roughly 17% in head-to-head agentic tasks.

  • vs. Claude Sonnet 4.6: Grok 4.3 scores slightly higher on the Intelligence Index but loses ground on hallucination rate and coding reliability.

  • vs. Grok 4.20: Meaningfully better on agentic benchmarks and roughly 20% cheaper to run — a clear upgrade within the Grok family.
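The 17% win rate quoted for the GPT-5.5 comparison is what the standard Elo expected-score formula gives for a 276-point deficit. The sketch below reproduces that calculation. It assumes GDPval-AA ratings use the conventional 400-point logistic scale, which the figures above are consistent with but which is not confirmed here.

```python
def elo_expected_score(rating_diff: float) -> float:
    """Expected score for a player rated rating_diff points above the opponent.

    Uses the conventional Elo logistic curve with a 400-point scale.
    A negative rating_diff means the player is the lower-rated side.
    """
    return 1.0 / (1.0 + 10 ** (-rating_diff / 400))


# Grok 4.3 trails GPT-5.5 (xhigh) by 276 points on GDPval-AA.
print(f"{elo_expected_score(-276):.1%}")  # 17.0%
```

The same function can be applied to the 321-point gain over Grok 4.20 to express that jump as a head-to-head win rate.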


What's Coming Next

xAI is moving fast. According to Elon Musk, Grok 4.4 (1 trillion parameters — twice the size of 4.3) was in training at the time of the 4.3 launch, with Grok 4.5 at 1.5T parameters to follow. At the end of the roadmap sits Grok 5, targeting 10 trillion parameters across two variants — a 20x scale increase from the current public model.


Verdict

Grok 4.3 is a strong value-tier reasoning model — not a state-of-the-art frontier model, but arguably the best cost-per-intelligence option in its bracket right now. If your work involves long documents, agentic pipelines, legal or financial research, or anything that benefits from live X data, it is absolutely worth testing at this price point.

For hard mathematics and pure coding benchmarks, the leading OpenAI and Anthropic models still have the edge. But if you're running agents at scale and watching your token costs, Grok 4.3 is one of the most compelling daily-driver models available today.

Tags

Grok 4.3, xAI Grok review, Grok 4.3 benchmark, always-on reasoning LLM, Grok 4.3 vs GPT, Grok 4.3 vs Claude, Grok 4.3 API pricing, best AI model 2026, 1 million token context window, xAI language model