Sony AI published Project Ace in Nature this month, and the headline number is that its table tennis robot wins three out of five games against elite players with ten or more years of experience; it has also beaten several top-level professionals in matches dating back to December 2025. That is a meaningful step past Google DeepMind's 2024 table tennis robot, which played at roughly amateur level. Project lead Peter Dürr's team is making the strongest claim yet to "expert-level performance in any competitive physical sport", a bar that earlier systems cleared only in narrow sim-only environments or with robots trained for one or two specific scenarios.

What makes this paper interesting from a builder's perspective is not the deep reinforcement learning. The RL is conventional: a policy that takes ball state and predicts where to swing the paddle. What makes it work is the perception stack: nine cameras across three vision systems, ball tracking at 200 Hz with millimeter-level accuracy and around ten milliseconds of latency, and spin measurement at up to 700 Hz. Table tennis is fundamentally a perception problem before it is a control problem — a 40-millimeter ball moving at 30 meters per second leaves you about 50 milliseconds to read spin, predict trajectory, decide a shot, and swing. Get the perception below ten milliseconds and a competent control policy can do the rest. Get it at 30 milliseconds and you are always responding to where the ball was, not where it is.
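To make the latency arithmetic concrete: at 30 meters per second, the ball moves 0.3 meters during a 10-millisecond perception delay and 0.9 meters during a 30-millisecond one, so any usable controller has to act on a forward-predicted state rather than the raw measurement. The paper does not publish its predictor, so the sketch below is a generic ballistic extrapolation with quadratic drag, not Sony's method; all names and constants are illustrative, and spin (the Magnus force) is deliberately left out to keep it short.

```python
import numpy as np

# Illustrative constants, not taken from the paper.
GRAVITY = np.array([0.0, 0.0, -9.81])  # m/s^2
DRAG_COEFF = 0.12  # lumped quadratic drag for a 40 mm, 2.7 g ball; rough value

def predict_ball_state(pos, vel, latency_s, dt=0.001):
    """Forward-predict a measured ball state by the sensing latency.

    pos, vel  : np.ndarray shape (3,), state at measurement time (already stale).
    latency_s : measured end-to-end perception delay in seconds.
    Integrates simple ballistic dynamics with quadratic drag in small steps.
    """
    pos, vel = pos.copy(), vel.copy()
    t = 0.0
    while t < latency_s:
        accel = GRAVITY - DRAG_COEFF * np.linalg.norm(vel) * vel
        vel += accel * dt
        pos += vel * dt
        t += dt
    return pos, vel

# 30 ms of staleness at 30 m/s is ~0.9 m of positional error if uncorrected.
pos, vel = np.array([0.0, 0.0, 0.3]), np.array([30.0, 0.0, 0.0])
print(predict_ball_state(pos, vel, latency_s=0.030))
```

The same extrapolation step is why the measured latency matters more than the camera frame rate: the predictor can only be as good as the physics model over the horizon it has to bridge, and spin-dependent forces make that model degrade quickly past a few tens of milliseconds.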

The honest limitation in the paper, surfaced by an opposing pro player, is that the robot does not adapt the way humans do. The quote, "impossible to sense what kind of shots it dislikes," points at exactly the channel that human top-level play runs on: humans read the opponent's body language, paddle angle on the backswing, and weight transfer to predict shot direction before the ball is hit, and they steer the rally toward the opponent's weaknesses across many points. Project Ace plays each ball cleanly but does not learn the human across the match. Sony acknowledges this: "adapting to their opponent and finding weaknesses" is the stated open research area. That gap is what separates "beats good players sometimes" from "wins tournaments."
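At its most stripped-down, the missing loop looks like a bandit over shot placements: keep a running win rate per placement and favor the ones the opponent handles worst, with an exploration bonus so early noise does not lock in a bad read. This is a generic UCB1 sketch of the capability the paper names as open, not anything Sony describes; the class and placement names are hypothetical.

```python
import math

class ShotSelector:
    """UCB1 bandit over discrete shot placements.

    Reward is 1 if the shot eventually won the point, else 0. Over many
    points, selection drifts toward placements the opponent handles worst.
    """

    def __init__(self, placements):
        self.placements = placements  # e.g. ["wide-fh", "body", "wide-bh"]
        self.counts = {p: 0 for p in placements}
        self.wins = {p: 0 for p in placements}
        self.total = 0

    def choose(self):
        # Play each placement once before trusting any statistics.
        for p in self.placements:
            if self.counts[p] == 0:
                return p
        def ucb(p):
            mean = self.wins[p] / self.counts[p]
            bonus = math.sqrt(2 * math.log(self.total) / self.counts[p])
            return mean + bonus
        return max(self.placements, key=ucb)

    def update(self, placement, won_point):
        self.counts[placement] += 1
        self.wins[placement] += int(won_point)
        self.total += 1
```

A real version would condition on rally context rather than treating placements as independent arms, which is part of why this is still an open research area rather than an engineering task.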

For developers thinking about embodied AI, the Sony result is a useful corrective to the narrative that the bottleneck is always the model. In this case the model is fairly standard, the embodiment is highly specialized, and the breakthrough is sensor fusion and real-time perception. The same lesson applies to any robotics product that has to react to fast-moving physical state: you can throw arbitrary compute at the policy, but if your perception loop is slower than the dynamics you care about, the policy quality is irrelevant. The next round of physical AI demos worth taking seriously will be the ones that publish their sensing latency and accuracy alongside their policy benchmarks. Sony did. The papers that do not are usually hiding something.
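Holding your own stack to that standard is cheap: timestamp at camera exposure and at policy output, and report percentiles rather than a single average, since tail latency is what loses the fast balls. A minimal sketch, assuming your camera driver exposes a hardware capture timestamp on the same clock as the host; the function and parameter names are placeholders, not any particular library's API.

```python
import time
import numpy as np

def benchmark_perception(pipeline, frames):
    """Report end-to-end perception latency percentiles.

    pipeline : callable image -> ball state (your detector + tracker).
    frames   : iterable of (capture_timestamp_s, image) pairs, where the
               timestamp comes from the camera hardware and is assumed to
               share a clock base with time.monotonic().
    """
    latencies = []
    for captured_at, image in frames:
        pipeline(image)  # run detection + tracking on this frame
        latencies.append(time.monotonic() - captured_at)
    lat_ms = np.array(latencies) * 1e3
    for q in (50, 90, 99):
        print(f"p{q}: {np.percentile(lat_ms, q):.2f} ms")
```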