
Google Gemini 3 Unveiled: Demis Hassabis & Josh Woodward Talk AI Agents, Benchmarks and the Road to AGI

· 13,584 words · 28 min read
Author
LonelyTrek

Below is a cleaned-up transcript of the conversation between Hard Fork hosts Kevin Roose and Casey Newton and their guests, Demis Hassabis and Josh Woodward:

Kevin Roose Well, Casey, we have a special emergency podcast episode today about the launch of Gemini 3.

凯文:凯西,今天我们特别加开一期紧急播客,专门聊聊 Gemini 3 的发布。

Casey Newton Yes, Kevin, hotly awaited, much discussed among AI nerds here in Silicon Valley. We are finally about to get our hands on the genuine article.

没错,凯文,这款产品在硅谷的 AI 圈早已成为热议焦点,众人期盼已久。如今,我们终于要真正将它握在手中,亲身体验这件名副其实的“真品”。

Kevin Roose Yeah. So, normally we wouldn’t break our Friday publication schedule to publish a special episode just about a new model coming out from one of the big AI companies. They’re releasing models all the time. But there are a couple of reasons we thought it was worth doing this week to talk about this model, Gemini 3, in particular. The first is that we got some time with Demis Hassabis and Josh Woodward, two of the leading AI executives at Google. Demis, of course, is the CEO of Google DeepMind, which is their in-house AI lab. And Josh Woodward is the VP of the Gemini team and some other stuff there at Google. So we were excited to talk to them and ask them about this big new model release. But I think there are a couple of other reasons we were interested in doing this as well.

凯文·鲁斯: 没错。我们平时可不会因为某家大型 AI 公司发布新模型,就破例打乱周五的固定更新节奏,专门做一期特别节目。毕竟这些公司几乎随时都在推出新模型。但这一次,我们觉得有几个理由,足以让我们聚焦聊聊 Gemini 3。首先,我们有幸采访到了谷歌的两位重量级 AI 高管——Demis Hassabis 和 Josh Woodward。Demis 是 Google DeepMind 的首席执行官,也就是谷歌内部专门研发 AI 的实验室掌舵人;而 Josh Woodward 则是 Gemini 团队的副总裁,同时还负责谷歌的其他一些项目。能和他们面对面探讨这次重磅新模型的发布,我们自然充满期待。

当然,除了这场采访之外,还有几个原因,让我们觉得这期节目特别值得做。

Casey Newton Yeah. I mean, one big thing, Kevin, is just that maybe more than other model releases, this one seems to have the attention of Google’s competitors. We’re hearing a lot of whispers from folks who work at other AI labs that Gemini 3 seems to have figured some things out in a way that may be bad for their businesses. And I think around the AI industry, there’s this feeling that Google kind of struggled in AI for a couple of years there. They had the launch of Bard and the first versions of Gemini, which had some issues, and I think they were seen as catching up to the state of the art. And now I think the question is: is this them returning to the top of the AI leaderboard? Is this them taking their crown back?

没错,凯文。其实,这次与以往模型发布相比最大的不同之一,是Gemini 3似乎格外牵动了谷歌竞争对手的神经。我们最近从其他AI实验室的业内人士那里听到不少传言,大家都在议论,Gemini 3似乎在某些关键领域真正攻克了难题,而这些突破很可能会对他们的业务造成不小冲击。在整个AI行业中,似乎弥漫着一种气息——谷歌在过去几年里在AI领域的确经历了一段低潮期。无论是Bard的发布,还是Gemini的最初版本,都曾遭遇不少问题,外界普遍认为那只是他们在努力追赶最前沿技术的脚步。而如今,问题已然变成:谷歌是否正要重返AI领域的巅峰?他们是否准备将那顶“王冠”再度戴回自己头上?

Kevin Roose So we’ll get into all that with Demis and Josh. But let’s just talk, Casey, about what we know about Gemini 3. They held a briefing early this week and told us a little bit about the new model and what it can do. So what did we learn about Gemini 3?

嗯,稍后我们会和 Demis 以及 Josh 深入探讨这些话题。不过在此之前,Casey,我们先聊聊目前掌握的 Gemini 3 情况。本周早些时候,谷歌举行了一场简报会,向我们透露了这款新模型的一些信息和能力。那么,关于 Gemini 3,我们究竟有哪些新的发现呢?

Casey Newton Yeah. Well, in terms of what it can do, which is always the most interesting part to me, Google shared a few different things. One, in addition to saying all the things you would expect, like it’s better at coding and better at vibe coding, it’s also going to do some new things around generating interfaces for you when you ask it a question. Nowadays, when you ask most chatbots a question, they spit back an answer in text and maybe show you an image. According to the Google folks, Gemini 3 is just going to start building custom interfaces for you. They showed an example where somebody wanted to learn about the painter Vincent van Gogh, and Gemini 3 coded up an interactive tutorial with all sorts of images and interactive elements. They showed another example that involved building a mortgage calculator for buying a home over a million dollars, which is the lowest amount of money that anyone at Google can imagine spending on a home. So these are the kinds of things you can expect to find in Gemini 3, Kevin.

凯西·牛顿: 是啊。要说它能做些什么,这一直是我最感兴趣的部分。这次谷歌展示了不少新功能。首先,除了大家能想到的常规升级,比如更强的代码能力、更流畅的编程体验之外,Gemini 3 还加入了全新的玩法——它能根据你的提问直接生成定制化的互动界面。以往的大多数聊天机器人,回答你的问题时顶多回你一段文字,偶尔附上一张图片。而据谷歌介绍,Gemini 3 可以直接为你搭建专属界面。

他们演示了一个案例:有人想了解画家文森特·梵高,Gemini 3 就现场生成了一个互动教程,既有丰富的图片,也有可操作的互动元素。另一个例子是帮人制作买房按揭计算器——而且房价设定为超过一百万美元,据说这已经是谷歌员工所能想象的最低房价了(笑)。所以,Kevin,这些就是你在 Gemini 3 中可以期待的全新体验。

Kevin Roose Yeah. So I would say the theme of the briefing, and of the materials Google shared ahead of the Gemini 3 launch, was that this is just better than their last model, Gemini 2.5 Pro, in basically all respects. A few of the benchmarks caught my attention. One was a test called Humanity’s Last Exam, a very hard interdisciplinary exam made up of questions at roughly a graduate-student or PhD level. Their previous model, Gemini 2.5 Pro, got about 21.6% on that test, and Gemini 3 Pro gets 37.5%. That’s basically the story of all of these benchmarks. They gave more than a dozen examples of benchmarks where the new model beats the old one handily. And to a lot of people, I think that may not matter. Most people using Google’s AI products are probably not out there trying to solve novel problems in physics. But their basic pitch is that this is a state-of-the-art model: anything you could do with ChatGPT or Claude or even the older versions of Gemini, you can do better with Gemini 3 Pro. They also talked about testing what they’re calling the Gemini agent, which is going to be able to do one thing in particular that I’ve been waiting forever for somebody to do. It will look through your inbox, understand its contents, propose replies, organize related emails together, and really help you get your inbox under control in a way that I personally have never been able to. We basically only saw a few animated GIFs of it, but it will definitely be one of the first things I try when I get my hands on Gemini 3.

凯文·鲁斯: 是啊。整体来看,这次简报和谷歌在 Gemini 3 发布前披露的资料,都围绕着一个核心信息:Gemini 3 Pro 在几乎所有维度上都全面超越了上一代 Gemini 2.5 Pro。让我印象最深的是一项名为“Humanity’s Last Exam”(人类终极考验)的基准测试——这是一套极其艰深的跨学科试题,难度相当于研究生甚至博士级别。此前,Gemini 2.5 Pro 在这项测试中得分约为 21.6%,而 Gemini 3 Pro 则一举提升到 37.5%。这种“压倒性”优势几乎贯穿了所有测试结果——谷歌给出十多个不同基准的例证,新模型在每一项上都轻松胜出。

当然,对很多用户而言,这些分数的提升也许并不直接触及他们的日常使用场景。毕竟,大多数人使用谷歌的 AI 产品,并不是为了攻克前沿物理难题。但谷歌的核心卖点很简单:这是当前最先进的模型——无论你之前依赖的是 ChatGPT、Claude,还是旧版 Gemini,用 Gemini 3 Pro 都能做得更好。

他们还介绍了正在测试的一项新功能——所谓的 Gemini agent(Gemini 智能代理)。它具备一个我期待已久的能力:自动扫描你的邮箱,理解邮件内容,给出回复建议,并智能归类、整理相关邮件,真正帮你让收件箱井然有序——而这正是我多年来一直无法做到的事。虽然目前我们只看到几个简短的动画演示,但等我拿到 Gemini 3,第一个想尝试的,毫无疑问就是这个功能。

Casey Newton Yeah. And we should say they are not rolling this out to everyone right away. It’s going to be available this week for users in the Gemini app and also in AI Mode, which is the tab off to the side of the main Google search engine. It will also be available for developers in various products. But they’re not saying when it will come to things like the Gemini integrations in Google Docs or Gmail, very popular products used by billions of people a day. Still, I thought it was interesting that they have brought this model to Google Search, albeit in AI Mode rather than the main search bar. To me, that suggests they feel they can serve this model cheaply enough that billions of people could potentially use it without melting their servers and incurring billions of dollars in costs.

凯西·牛顿: 对,而且需要说明的是,这次谷歌并不是立刻面向所有人全面开放。Gemini 3 将在本周率先登陆 Gemini App(Gemini 应用)以及 Google 搜索的 AI 模式——也就是主搜索页面侧边的那个标签页。同时,开发者也可以在部分产品中开始体验这个新模型。不过,像 Google Docs(谷歌文档)或 Gmail 这样每天有数十亿人使用的热门服务,官方暂时还没有给出具体的集成时间表。

我觉得有意思的是,尽管 Gemini 3 目前只是以 AI 模式的形式出现,还没进入主搜索框,但它毕竟已经被引入了 Google 搜索。这在我看来意味着,谷歌有信心以足够低的成本来运行它,让数十亿用户都能使用,而不会让服务器被拖垮,也不会因此烧掉几十亿美元的开支。

Kevin Roose Yeah. So far they say that usage of AI Overviews keeps going up, and every quarter they continue to make more money. So it seems to be working out for them.

凯文·鲁斯: 是啊。到目前为止,他们说 AI Overview(AI 概览)的使用率仍在稳步攀升,而且每个季度的营收也在持续增长。看起来,这套打法对他们而言的确行之有效。

Casey Newton Not working out for the rest of the web, but it’s working out well for Google.

对整个互联网而言或许并不利,但对谷歌来说却是收效显著。

Kevin Roose Yeah. But I think that’s obviously Google’s big advantage over its competitors: they have products used by billions of people a day, and they can shove Gemini 3 into those products over time, get more and more usage, collect more data, and use that to improve their models. Which is why, when students ask us for advice, we always tell them: step one, build an illegal monopoly.

凯文·鲁斯: 没错。不过我觉得,这正是谷歌相比竞争对手的最大优势——它拥有每天触达数十亿用户的产品,可以循序渐进地把 Gemini 3 融入其中,不断扩大使用规模,积累更多数据,再用这些数据反哺模型优化。也正因如此,我们每次有学生来请教建议时,第一步总会半开玩笑地说:先去打造一个“非法垄断”。

Casey Newton Yes. And speaking of students, the other notable announcement Google is making this week is that they are giving all US college students a year of free access to a paid version of Gemini. I think that’s a smart move, but I feel a little gross about it. It’s essentially telling students, hey, why don’t you use this to do some of your homework? Maybe it can help you with your exams. We’ll give you the first hit for free. Yeah. You know, I was also struck during the briefing this morning that, I believe, three different people used the phrase “learn anything.” That seems to have become a very prominent plank of Google’s messaging: they are presenting Gemini as a learning tool, which may just be a euphemism for a do-your-homework tool. I don’t know.

凯西·牛顿: 没错。说到学生,谷歌本周还有一个颇值得关注的消息:他们将为所有美国大学生提供为期一年的 Gemini 付费版免费使用权。我觉得这招挺聪明,但心里多少有点不舒服。说白了,就是在暗示学生:“嘿,要不你用这个来写写作业?考试的时候也能帮你一把。我们先让你免费尝个鲜。”你懂的。我在今天早上的简报会上也留意到,有三位发言人都用了“学任何东西”这个说法。看起来,这已经成了谷歌宣传 Gemini 的核心卖点之一,即把它塑造为一款学习工具。嗯,不过在我看来,这其实就是“帮你写作业”的委婉版本吧?天晓得。

Kevin Roose Yes. Okay. So that is what we know about Gemini 3. We will be doing our own testing and reviewing of Gemini 3 once it is fully out on Tuesday. But for now, we wanted to give you the basics and also bring you our interview with Demis Hassabis and Josh Woodward of Google DeepMind. Before we get to that, we should obviously make our AI disclosures. I work for The New York Times Company, which is suing OpenAI and Microsoft over the training of large language models.

目前我们掌握的关于 Gemini 3 的信息大致如此。等到它在周二正式上线,我们会亲自进行测试并推出详尽的评测报告。不过在此之前,先为大家梳理核心要点,并带来我们对 Google DeepMind 的 Demis Hassabis 和 Josh Woodward 的专访。

在进入采访之前,必须先进行 AI 相关的利益披露:我供职于纽约时报公司,而纽约时报正在起诉 OpenAI 和微软,指控他们在训练大型语言模型时使用了我们的内容。

Casey Newton And my boyfriend works at Anthropic.

顺便提一句,我男朋友在 Anthropic 工作。

Kevin Roose Demis and Josh, welcome to Hard Fork.

德米斯、乔什,欢迎来到《硬分叉》。

Demis Hassabis / Josh Woodward Great to be here.

非常高兴能来到这里。

Kevin Roose So, two years ago, Sundar Pichai told us that Bard, rest in peace, was a souped-up Civic in a race with more powerful cars. What kind of car is Gemini 3?

凯文·鲁斯: 两年前,桑达尔·皮查伊曾形容 Bard(如今已成追忆)像是一辆经过改装的本田思域,要在赛道上和那些更为强劲的赛车同场竞逐。那么,Gemini 3 又能算是怎样的一款车呢?

Josh Woodward That’s a good one. Demis, do you want to take it?

乔什·伍德沃德
这个提问很妙。德米斯,你来接吗?

Demis Hassabis Well, I hope it’s a bit faster than a Honda Civic. You know, I don’t really think of it in terms of cars. Maybe it’s one of those cool drag racers.

德米斯·哈萨比斯
嗯,我倒希望它能比本田思域那种改装版还快些。说实话,我平时并不太习惯用汽车来作比喻,或许它更像那类酷炫的直线加速赛车。

Kevin Roose Yeah. So, people are really excited about this model. We’ve been hearing from folks who have been early-testing it, and obviously you’ve shown off a lot of benchmarks. Very impressive. What can Gemini do on a concrete level that previous AI models couldn’t?

凯文·鲁斯: 是啊,是啊,大家对这款新模型的热情高涨。我们最近也从一些已经在做早期测试的人那里听到了反馈。你们展示了不少基准测试数据,确实令人印象深刻。那么,Gemini 在具体功能上,究竟能实现哪些以往的 AI 模型无法做到的事情呢?

Josh Woodward Well, I’ll jump in with a couple of things that stand out. One, we’re starting to see this model really excel at reasoning and at thinking through many steps at the same time. Models in the past would sometimes lose their train of thought and lose track. This one’s way better at that. The other thing you’ll see tomorrow is all kinds of new generative interfaces. This is our best model yet at creating new types of interfaces, giving people a really custom design and answer to their questions. And the third thing I would say is that we’ve put a lot of investment into coding itself. You’ll see a lot of coding examples, and some new products coming out, like Google Antigravity, will also showcase that.

乔什·伍德沃德: 我来补充几件我觉得尤为亮眼的地方。首先,这一代模型在多步推理方面的表现令人惊喜,能够在同一时间并行思考多个步骤。过去的模型有时会出现“断片”,思路中途偏离或丢失,而这一版本在保持连贯性上要稳定得多。其次,明天大家将会看到各种全新的生成式交互界面。这是我们迄今为止最强的一款模型,能够创造出多样化的新型界面,为用户量身打造设计方案,并精准回应他们的需求。最后,我们在编程能力的提升上投入了大量资源。很多与代码相关的示例将随新产品一同亮相,比如 Google Antigravity,这些都将很好地展示出模型在技术上的飞跃。

Casey Newton There’s been some discussion that, for average users, the chat use case can feel solved: average users of products like Gemini can almost not think of a question whose answer would feel meaningfully different from what they got from the last model. To what extent does that feel true to you with Gemini 3? And to what extent do you think average folks are really going to notice a difference?

凯西·牛顿: 最近不少人讨论,对于普通用户而言,聊天型 AI 的使用场景似乎已经“尘埃落定”。很多人在使用 Gemini 这样的产品时,几乎想不出还能问出什么问题,能让它的回答与上一代模型相比有明显的新意。你觉得这种情况在 Gemini 3 上是否依然成立?你认为一般用户真的会察觉到升级带来的差别吗?

Josh Woodward Yeah, one of the things we’re seeing in some of the testing (and Demis, feel free to chime in too) is that this is a model that’s more concise. It’s more expressive. It starts to present information in a way that’s much easier to understand. And I think for most people that’s going to be a big, immediate effect. Then what starts to get interesting is how these models interact with other types of information. We talk a lot about how students are going to be able to learn with this model, or how it can connect, with your permission, to other data you might have in other Google products. These are the ways I think we’re starting to show it going beyond the standard back-and-forth of text Q&A.

乔什·伍德沃德: 是的,我们在一些测试中已经注意到(德米斯,你也可以补充),我觉得这款模型对我们来说最突出的特点,是更加简洁凝练,更具表现力,信息的呈现方式也更容易让人理解。我想,对大多数用户而言,这会带来一种非常直接且立刻能感受到的改变。更令人兴奋的是,它开始能够与其他类型的信息进行互动。我们常常探讨,比如学生如何借助它学习,甚至它如何在获得你授权的前提下,连接到你在其他 Google 产品中的数据。这些新能力,正展现着它的延展性:它已不再局限于传统的文本问答,而是在迈向更丰富、更灵活的信息交互。

Demis Hassabis Yeah, I’d add to that: its general reliability is incredible, and you’ll notice that when you use it. We also worked quite hard on what we call the persona internally, the style of it. I think it’s more succinct and more to the point. It’s helpful. I feel like it’s got a better style about it, and I find it more pleasant to brainstorm with and use. And then there are various things where there’s almost a step change. I feel like it’s crossed a threshold of usefulness on things like vibe coding. I’ve been getting back into my games programming, and I’m going to set myself some projects over Christmas, because I feel it has got to a point where it’s incredibly useful and capable on front end and things like that, which perhaps previous versions weren’t so good at.

德米斯·哈萨比斯:
是的,我还想补充一点,你会发现它在各种任务上的整体可靠性真的非常高——一旦用过,你就能明显感受到。我们在它的“人格”设计上投入了不少精力,内部称作“persona”,其实就是它的风格定位。我觉得它现在的表达更加简洁凝练、直截了当,也更有针对性,整体风格更成熟、更有品位。对我而言,无论是用它来进行头脑风暴还是日常使用,都比以前更加愉快和顺手。
此外,我觉得在某些方面它几乎实现了质的飞跃。比如编码体验——最近我又重新投入到游戏编程中,打算圣诞假期为自己设定几个项目,因为我觉得它现在在前端开发等领域已经强大到令人惊叹,实用性极高,而以前的版本在这些方面还不够出色。

Kevin Roose Demis, the last time we had you on the show, in May, you said you think we’re 5 to 10 years away from AGI and that a few significant breakthroughs might be needed between here and there. Has Gemini 3, and seeing how good it is, changed any of those timelines? Or does it incorporate any of the breakthroughs you thought would be necessary?

凯文·鲁斯: 德米斯,上一次你五月做客我们节目时,你曾说距离实现 AGI(通用人工智能)大约还有 5 到 10 年,而且在这段时间里可能还需要几次重大的技术突破。如今你已经见识了 Gemini 3 的水平,这让你的时间表有任何改变吗?它是否已经融入了当初你认为必须达成的那些关键突破?

Demis Hassabis No, I think it’s dead on track, if you see what I mean. We’re really happy with this progress. I think it’s an absolutely amazing model, and it’s right on track with what I was expecting and with the trajectory we’ve been on since the beginning of Gemini a couple of years ago, which I think has been the fastest progress of anybody in the industry. We’re going to continue on that trajectory, and we expect it to continue. But on top of that, I still think one or two more things will be required to really get the consistency across the board that you’d expect from a general intelligence, and there are still improvements to make on reasoning and on memory. There are also things like the world-model ideas that, as you know, we’re working on with SIMA and Genie. Those will build on top of Gemini but extend it in various ways, and I think some of those ideas will be required as well to fully solve physical intelligence and things like that. So both are true. I’m really happy with the progress of Gemini 3, and I think people are going to be pretty pleasantly surprised. But it’s on track with the progress we were expecting, and I think that means still 5 to 10 years, with perhaps one or two more breakthroughs required.

德米斯·哈萨比斯: 不,我觉得这一切完全在我们设定的轨道上,你懂我的意思吧。我们对目前的进展非常满意,我认为这是一个令人惊叹的模型,它不仅毫无偏差地符合我的预期,还延续了我们自 Gemini 项目启动以来的成长轨迹——这几年我们在行业内的前进速度是最快的。我相信我们会继续保持这种势头,并且这种发展趋势还会延续。

不过,要实现你对通用智能(AGI)应有的那种全面稳定、一致的表现,或许还需要一两个关键性突破。同时,推理能力与记忆力等方面依然有提升空间。至于世界模型(world model)这样的概念,你也清楚,我们也在通过 SIMA 和 Genie 探索这些方向。这些研究会在 Gemini 的基础上进一步延展、赋能。我认为,这些新思路对于真正攻克物理智能等更复杂的领域同样不可或缺。

所以,两种说法都成立——我对 Gemini 3 的进展非常满意,并且相信它会带来意想不到的惊喜。但整体而言,这完全在我们的预期之中。我判断,距离实现 AGI 仍需 5 到 10 年,中间可能还要经历一到两个重大的技术突破。

Casey Newton You mentioned Gemini 3’s style. There’s been a lot of discussion recently about AI companions and the relationships people are developing with them. How do you think about Gemini 3’s personality, and what kind of relationship do you want users to have with it?

你刚才提到过 Gemini 3 的风格。最近,关于 AI 伴侣的讨论十分热烈,人们正在与这些 AI 建立各种形式的关系。你们如何看待 Gemini 3 的“个性”?又希望用户与它之间形成怎样的互动与联系?

Josh Woodward I would say that in the app itself, Casey, the team sees it a lot as almost a tool, something you use to work through and cut through your day. Whether it’s helping with different types of questions or helping you create things, that’s really where we see it excelling and the direction we want it to go. If you zoom out and look at Gemini or some of our other projects, like NotebookLM or Flow, we’re really trying to think through how AI can be this superpower, this super tool in your toolbox, whether it’s for writing or researching or creating films or whatnot. That’s really where we’re focused. Over time, the team is really interested in tracking things like how many tasks we helped you complete in your day. That’s a new type of metric we get excited about, and it’s similar to the way the original Google Search worked: you would come to it, try to get an answer or be sent to a page, and move on from there.

乔什·伍德沃德: 我觉得在 Gemini 应用本身里,凯西,我们团队更看重的是它作为“一件趁手工具”的角色。我们时常把它视作一种能帮你理清事务、穿越繁杂、提升效率的日常伙伴。无论是为你解答各类问题,还是协助你创作作品,这正是 Gemini 能大放异彩的地方,也是我们希望它持续进化的方向。

如果把镜头拉远,再看 Gemini 以及我们的其他项目,比如 NotebookLM(智能笔记助手)或 Flow(AI 影片创作工具),我们始终在思考:AI 能否真正成为你工具箱里的“超级能力”,无论用在写作、研究,还是拍摄影片等创作场景,都能让你事半功倍。这正是我们聚焦的核心:让 AI 像得力的全能帮手,随时为你的灵感和生产力加码。

我们团队也希望,在未来能够追踪一些新的、有意义的指标,比如:在一天当中,我们帮你完成了多少任务。这样的衡量方式让我们很兴奋,也让人联想到谷歌搜索的初衷——用户来搜索,快速得到答案或抵达所需页面,然后顺畅地继续他们的工作。我们期望 Gemini 能在日常中扮演同样高效、实用、不可或缺的角色。

Casey Newton Well, that all sounds very good and responsible, but I’m wondering about all the viral engagement you’re leaving on the table by not making this thing an erotic companion. Big oversight.

嗯,这听起来的确很稳重、负责任,不过我还是忍不住想问——你们不把它做成那种“暧昧陪伴型”AI,是不是就等于把一大波爆款热度拱手送人了?这可算是个不小的疏忽啊。

Demis Hassabis / Josh Woodward No comment.

德米斯·哈萨比斯 / 乔什·伍德沃德
不予置评。

Kevin Roose Some of your competitors have been very nervous in the days and weeks leading up to Gemini 3. I think they’ve started hearing the same rumblings we have: that this model is quite good, and that the narrative may be shifting from Google playing catch-up in AI to Google being on top of the race, or at least in a leadership position. Do you feel like Google is ahead in the AI race right now?

凯文·鲁斯: 在 Gemini 3 即将发布的前几周,你们的一些竞争对手似乎显得格外紧张。我想他们也和我们一样,听到了不少风声——这代模型表现相当亮眼,甚至可能会改变此前的行业叙事:从“谷歌在 AI 领域奋力追赶”,变成“谷歌如今领跑,至少稳居前列”。你觉得,在这场 AI 竞赛中,谷歌现在已经占据了领先位置吗?

Demis Hassabis Look, as you guys know very well, it’s a ferocious competitive environment, probably the most competitive there’s ever been. So really the only important thing is your rate of progress from where you are, and that’s what we’re focusing on, and we’re very happy with it. I don’t really see it as us being back in the lead or something like that. We’ve always pioneered the research side of this. I think it’s about getting into our groove in making sure that research is reflected downstream in all of our products, and I think we’re really hitting our stride there. I would say you actually saw that at the last I/O. And we’re getting better and better at that, with GDM being the engine room of Google. Of course there’s the Gemini app and NotebookLM, these AI-first products, but there’s also powering up all these amazing existing Google products, whether that’s Maps, YouTube, Android, or of course Search, with AI-first features, and in some cases actually reimagining things from an AI-first perspective, often with Gemini under the hood. That’s going amazingly well, and I think we’re only midway through that evolution, but it’s very exciting to see how much value and excitement our users get when they see each of those new features, for example in Workspace and Gmail. There are almost endless possibilities there. So we’re really excited about that, as well as all of the AI-first products we’re imagining and prototyping.

德米斯·哈萨比斯: 你们都很清楚,当下这个领域的竞争之激烈,堪称前所未有。真正重要的,其实只有一件事:从你当前所处的位置出发的进步速度。这正是我们全力聚焦的方向,而我们对此也非常满意。我并不认为这是什么“我们重返领跑位置”之类的说法。在研究层面,我们一直是开拓者。如今更像是我们找到了属于自己的节奏,确保这些研究成果能够顺畅地传递到各类产品之中,我认为我们在这方面正日趋成熟。其实在上一届 Google I/O 大会上,你们或许已经看到了这一点。

我们的能力正不断提升。比如,Google DeepMind 已经成为谷歌的强劲引擎。当然,还有 Gemini 应用、Notebook LM 等一系列以 AI 为核心的新产品,但更重要的是,我们在为谷歌现有的众多优秀产品注入新的动力——无论是地图、YouTube、Android,还是谷歌搜索,都在融入“AI 优先”的功能。有些产品甚至从根基开始,以 AI 的视角进行重新构想,背后驱动的往往就是 Gemini。这些进展进行得非常顺利,我觉得我们目前只是走到这一演化过程的中段,但已经能明显看到,当用户体验这些新功能时,那份惊喜与价值正在涌现出来,比如 Workspace、Gmail 等等——未来的可能性几乎无穷无尽。

因此,无论是这些“AI 优先”的创新产品,还是我们正在构思与原型打造的新项目,都令我们倍感兴奋。

Casey Newton Last week we had a historian on the show who was using an unreleased Google model in AI Studio, and it had blown his mind with how it could transcribe very old documents and reason correctly about things like the measurements of sugar in the 1800s Canadian fur trade. Can you tell us, once and for all, was this man using Gemini 3?

凯西·牛顿: 我们上周请来了一位历史学家做客节目,他在 AI Studio 中使用了一款谷歌尚未发布的模型,结果被它的能力惊得目瞪口呆。这个模型不仅能精准转录那些尘封已久的古老文献,还能做出准确推理,比如解析十九世纪加拿大皮毛贸易中糖分的测量数据。你能不能干脆一次性给我们揭晓,这位历史学家用的到底是不是 Gemini 3?

Demis Hassabis Not sure about that one.

德米斯·哈萨比斯
这件事我还无法断言。

Josh Woodward Okay. I will say, though, that the model is quite amazing at making these connections. I don’t know whether the historian was using photos of old documents or diaries or whatnot.

乔什·伍德沃德
好吧。我得说,这个模型在建立各种联系方面的表现确实令人惊叹。我不太确定那位历史学家用的是旧文献的照片,还是日记之类的资料。

Casey Newton That’s what he was doing.

他正是在做这件事。

Josh Woodward It’s very good at this. Someone like me, who has pretty poor handwriting, can give it a page of notes, and it’ll take that and run with it, no problem, no sweat. So…

它在这方面确实游刃有余。像我这种字迹潦草的人,也能把整页手写笔记交给它处理,Gemini 3 都能轻松识别理解,并自然衔接后续操作,毫无压力。

Kevin Roose You mentioned on this call that you’re going to integrate this into Search through AI Mode, the side tab on the main Google search engine. Does that mean you’ve found a way to serve this model more efficiently and cheaply than previous models?

凯文·鲁斯: 你刚才提到,这次你们计划将 Gemini 3 集成到谷歌搜索的 AI 模式中,也就是主搜索页面侧栏的那个标签。那这是否意味着,你们已经找到了一种相比以往更高效、更低成本的方式来运行和部署这款模型?

Demis Hassabis I think we’re always on the cutting edge. Apart from the overall performance of our models, and getting better and better at that, I feel the thing we do really well is efficiency: the distillation techniques and many other techniques that we created and pioneered and are now putting to use. Obviously that’s necessary for us, because we have extreme use cases like AI Overviews, where we have to serve billions of users. And of course some of our cloud and enterprise customers really appreciate that cost efficiency too. So we’ve always tried to be on the Pareto frontier of cost to performance, wherever you want to be on that frontier. Whether you value performance most or cost most, there will be a model in the family for you. Of course, we’re only announcing Pro today, but we’re also working on the other models in the family for the 3.0 era, so you’ll see a lot more about that pretty soon.

德米斯·哈萨比斯: 我始终认为,我们一直行进在技术的最前沿。除了不断提升模型的整体性能之外,我们真正引以为傲的,是在模型高效性上的突破——包括我们自主开创并率先应用的蒸馏技术,以及许多其他由我们发明并付诸实践的创新方法。显然,这对我们来说至关重要,因为像 AI Overview(AI 概览)这样的功能,需要为数十亿用户提供服务,效率便成为不可或缺的核心。当然,我们的云端企业客户同样高度重视这种成本与效率的平衡。因此,我们始终努力站在“性能与成本”这一前沿,无论你更看重顶级性能,还是更在意成本控制,都能在我们的模型家族中找到契合需求的产品。今天我们只发布了 Pro 版本,但实际上,我们已在为 3.0 时代研发其他系列的模型,不久之后,大家就会看到更多的更新与成果。
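Demis’s phrase “Pareto frontier of cost to performance” has a precise meaning: a model is on the frontier if no other model is both at least as cheap and at least as capable, and strictly better on one of the two. The minimal Python sketch below computes such a frontier and then picks the best model within a budget. The model names, costs, and scores are made up for illustration and are not Gemini figures.

```python
from typing import NamedTuple, Optional


class Model(NamedTuple):
    name: str           # hypothetical model name
    cost: float         # hypothetical price per million tokens (lower is better)
    performance: float  # hypothetical benchmark score (higher is better)


def dominates(a: Model, b: Model) -> bool:
    """True if a is no worse than b on both axes and strictly better on at least one."""
    no_worse = a.cost <= b.cost and a.performance >= b.performance
    strictly_better = a.cost < b.cost or a.performance > b.performance
    return no_worse and strictly_better


def pareto_frontier(models: list[Model]) -> list[Model]:
    """Keep only models that no other model dominates, sorted from cheapest to priciest."""
    frontier = [m for m in models if not any(dominates(o, m) for o in models)]
    return sorted(frontier, key=lambda m: m.cost)


def pick(frontier: list[Model], budget: float) -> Optional[Model]:
    """Best-performing frontier model whose cost fits the budget, or None if nothing fits."""
    affordable = [m for m in frontier if m.cost <= budget]
    return max(affordable, key=lambda m: m.performance, default=None)


# Illustrative numbers only.
models = [
    Model("tiny", 0.1, 50.0),
    Model("small", 0.5, 65.0),
    Model("wasteful", 2.0, 60.0),  # dominated: "small" is cheaper and scores higher
    Model("medium", 1.5, 75.0),
    Model("large", 5.0, 85.0),
]

frontier = pareto_frontier(models)
print([m.name for m in frontier])
print(pick(frontier, budget=1.0))
```

Because the frontier is sorted by cost, reading it from left to right shows the trade-off directly: each step up buys more performance for more money. A cost-sensitive user stops early and a performance-first user goes to the end, which is the “wherever you want to be on that frontier” choice described above.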

Casey Newton Yeah. It seems like every time a new frontier model is released, we get to revisit the discussion about scaling laws and whether we’re beginning to see diminishing returns, and I can predict a few Twitter accounts that will probably have something to say about it over the next few days. So I thought I’d ask you before that discourse starts: how are you thinking about it in relation to Gemini 3?

凯西·牛顿: 几乎每次有新一代前沿模型问世,那个关于“规模定律”以及我们是否正在进入收益递减阶段的老话题,都会被重新翻出来。我甚至可以预见,接下来几天推特上一定会有几位熟悉的账号对此发表看法。所以我想在大家开始热议之前,先来问问你们——针对 Gemini 3,你们是怎么看这个问题的?

Demis Hassabis Yeah, we’re very happy with the progress Gemini 3 represents over 2.5. Referencing what we discussed earlier, the progress is basically what we were expecting and on track, and we’re really pleased with it. But that’s not to say there’s some kind of diminishing returns. When people hear “diminishing returns,” they think it’s either zero or exponential, right? But there’s also something in between. Returns can diminish, so progress isn’t going to double exponentially with every era, but it’s still well worth doing, with an extremely good return on the investment. I think we’re in that era. And, as I said, my suspicion (although we’ll see) is that one or two more research breakthroughs are still required to get all the way to AGI. In the meantime, you’re obviously going to need the most scaled-up versions possible of these foundation models, the multimodal foundation models we’re building today and still seeing great progress on.

德米斯·哈萨比斯: 是的,我们对 Gemini 3 相较于 2.5 的提升感到非常满意。正如我们刚才谈到的,这次的进展完全符合预期,一切稳步推进,让我们欣慰不已。当然,这并不意味着进步会出现所谓的“收益递减到毫无意义”。人们一听到“收益递减”,往往会将它理解为要么归零,要么指数级增长,但其实两者之间还有广阔的中间地带。换句话说,进步或许会递减,但并非每个阶段都能成倍飙升;然而,这依旧非常值得投入,因为回报依然丰厚而宝贵。我认为我们正处在这样的时期。

另外,正如我刚才所说,我个人的判断——当然还需进一步验证——是要真正迈向 AGI(通用人工智能),可能还需要一两次重大的研究性突破。不过在此之前,我们显然需要尽可能扩展这些基础模型的规模,尤其是我们正在构建的多模态基础模型,目前它们依然在不断取得令人振奋的进展。
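Demis gives no formula, but his “in between” case is easy to make concrete. Purely as an illustration, assume that loss falls as a power law in training compute $C$, a form often used in scaling-law discussions. The constants $a$ and $\alpha$ are hypothetical and are not figures from Google:

```latex
L(C) = a\,C^{-\alpha}, \qquad
\frac{L(2C)}{L(C)} = 2^{-\alpha}, \qquad
L(C) - L(2C) = a\,C^{-\alpha}\left(1 - 2^{-\alpha}\right) > 0
```

With $\alpha = 0.05$, each doubling of compute multiplies the loss by $2^{-0.05} \approx 0.966$, a cut of about 3.4%. The absolute gain shrinks as $C$ grows, so returns diminish, but the gain never reaches zero. It is also nowhere near doubling with each step, which is the middle ground between “zero” and “exponential.”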

Kevin Roose Which of the many benchmarks you showed off today do you think will matter most to the average user?

凯文·鲁斯: 你们今天展示了这么多基准测试,你认为哪一项对普通用户而言最为关键、最具影响力?

Josh Woodward Oh, that’s a good question. I think most people don’t look at the benchmarks as closely as we do, and benchmarks are always a proxy, right? You look at something like cracking 1500 Elo on LMArena. That’s great, but what really matters is user satisfaction with the products, too. What’s been encouraging to us is that the two are still moving in the same direction; they’re good proxies for each other. So we’ll put out all the benchmarks, and we’re very proud of them because they represent amazing progress, but you also have to be able to translate that into product experiences that matter. We try to do both with every one of these releases.

这是个好问题。我认为,大多数人并不会像我们一样密切关注各类基准测试,但归根结底,这些测试只是一个参考坐标。比如在 LM Arena(语言模型竞技场)上突破 1500 ELO(国际象棋等级分),固然令人振奋,但真正重要的,是用户在实际使用产品时的满意度。让我们欣慰的是,这两者的走势目前高度一致,彼此能够相互印证。因此,我们会公布所有的基准测试成绩,并为这些成果感到自豪——它们确实体现了非凡的进步。但最终,关键在于将这些技术化为触手可及、真正有价值的产品体验。所以,每一次版本发布,我们都会力求兼顾这两方面。
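Josh’s “1500 Elo on LMArena” figure is easiest to read as a head-to-head preference probability. The sketch below assumes the classic Elo expected-score formula applies. LMArena reports scores on an Elo-style scale, so this is an approximation, and the rival ratings are hypothetical. Under that assumption, a 100-point gap means the higher-rated model is preferred about 64% of the time.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Classic Elo expected score: the modeled probability that A is preferred over B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# A 1500-rated model against hypothetical rivals.
for rival in (1500, 1450, 1400, 1300):
    p = elo_expected_score(1500, rival)
    print(f"1500 vs {rival}: preferred {p:.1%} of the time")
```

The formula depends only on the rating difference. Crossing 1500 is therefore not meaningful in isolation; what counts is how far ahead a model sits of whichever model is next on the leaderboard.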

Casey Newton Are there any new dangerous capabilities or safety concerns that come with the increased power of the model?

随着模型性能的增强,这次是否带来了新的高风险功能或安全隐忧?

Demis Hassabis Well, we’ve taken quite a long time on this model, because it’s frontier and has some new capabilities, and it’s very capable, as you can see from the benchmarks. As Josh said, we make sure not to overindex internally on those benchmarks. They’re just a proxy for overall performance, which is why we care about them across the board, and ultimately about how our users experience the model. But we spend a lot of time on safety testing across all the different dimensions, with the safety institutes and the external testers we work with, as well as, of course, doing a ton of internal testing. So I would say this is our most thoroughly tested model so far.

德米斯·哈萨比斯: 我认为,我们在这个模型上的研发周期比以往更长,原因在于它处于技术前沿,并且具备多项全新的能力。从各类基准测试的结果来看,它的表现极为出色。不过,正如乔什刚刚提到的,我们在内部并不会过度依赖这些基准测试。毕竟,它们只是整体性能的一个参考指标,我们更关注的是模型在各个方面的全面表现,以及最终用户的真实使用体验。

在安全性方面,我们投入了大量精力进行全面测试,覆盖了各个维度。我们不仅与安全研究机构密切合作,还邀请了外部测试人员参与,同时自然也进行了大量内部测试。因此,我可以毫不夸张地说,这是迄今为止我们经过最为严谨、最为彻底测试的模型。

Casey Newton Do you want to mention any of the new capabilities that popped up, whether or not it was for a safety reason? Was there something where you thought, “Okay, yeah, we definitely need to make sure we’re sending this to a bunch of

你想谈谈那些新出现的功能吗?不管是不是出于安全考虑,有没有哪一项让你心里一瞬间闪过,“对,这个我们必须严加把关,在推送给大批用户之前一定要格外注意”?

Demis Hassabis Well, look, it’s about making sure… We’ve worked really hard on things like tool-call usage and function calling. Obviously, those are super important for coding capabilities, developers want them, and they’re very important for reasoning in general. But they also make the models more capable at riskier things, like cyber. So as we improve those dimensions for all the good use cases, we have to be doubly cautious, continually checking all of those measures to make sure the capabilities can’t be misused.

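Demis Hassabis: We have been working hard to refine features such as tool call usage and function calling. These are crucial to coding ability and are core features that developers urgently need; they are also key to improving reasoning overall. Of course, this also means the model becomes noticeably stronger in some high-risk areas, such as cybersecurity. So as we optimize these dimensions, we have to be especially careful: while continually validating all the beneficial use cases, we guard closely against any possible misuse.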

Kevin Roose Are we in an AI bubble?

Do you think we’re in an AI bubble right now?

Demis Hassabis I think it’s too binary a question, I would say. My view on this, and this is strictly my own opinion, is that there are some parts of the AI industry that are probably in a bubble. You know, if you look at seed investment rounds being multi-10-billion rounds with basically nothing behind them... I mean, there are talented teams, but it seems like that might be the first sign of some kind of bubble. On the other hand, there’s a lot of amazing work and value, at least from what we see from our perspective. Not only are there all the new product areas, so the Gemini app and NotebookLM, but thinking further forward, robotics and gaming. There are incredible uses, and not just of Gemini but of some of our other models, like Genie. You can imagine, with my old game-playing background, I’m itching to think about what could be done there. And there’s drug discovery, which we’re doing with Isomorphic, and Waymo. So there are all these new greenfield areas. They’re going to take a while to mature into massive, multi-hundred-billion-dollar businesses, but I think there’s actually potential for half a dozen to a dozen of them that Alphabet will be involved with, which I’m really excited about. But there are also immediate returns. We’ve of course got the engine room, the engine-room part of Google, where we’re pushing this into all of these incredible multi-billion-user products that people use every day. We have so many ideas; it’s just about execution. Like, how would you reorganize Workspace around it, or Android, or YouTube? There’s just so much potential there. And I think a lot of that will also bring in near-term revenue and direct returns while we’re also investing in the future. Not to speak of cloud revenue and TPUs and all of that, which I think is also going to be huge.

So I feel really good about where we are as Alphabet, whether or not there’s a bubble. I think our job is to be winning in both cases, right? If there’s no bubble and things carry on, then we’re going to take advantage of that opportunity. But if there is some sort of bubble and there’s a retrenchment, I think we’ll also be best placed to take advantage of that scenario.

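Demis Hassabis: Well, I think the question is framed a bit too simplistically. In my personal view (and this is only my own view), some parts of the AI industry may indeed be showing signs of a bubble. Take today’s seed rounds: a single round can run into the billions of dollars, while the project itself has produced almost nothing substantive. The teams are certainly excellent, but this may well be an early sign of a bubble.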

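That said, from our perspective the industry is still full of exciting innovation and value. Beyond new products such as the Gemini app and NotebookLM, and looking further ahead, fields like robotics and video games hold incredible potential. And it’s not just Gemini; we have other models too, such as Genie. Given my early background in game development, you can imagine how eager I am to explore what they could make possible in games.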

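There is drug discovery, which we are pursuing through Isomorphic, and there is Waymo. In short, these new greenfield areas may take years to grow into massive businesses worth hundreds of billions of dollars, but I think half a dozen to a dozen of them have huge potential, and Alphabet (Google’s parent company) will be involved in them, which I find really exciting.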

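At the same time, we already have tangible immediate returns. As Google’s “engine room,” we are pushing these technologies into the core products that serve billions of users every day. There is no shortage of ideas; the key is execution, for example rebuilding Workspace (Google’s productivity suite), Android, and YouTube around these technologies, where the potential is enormous. These innovations can bring substantial near-term revenue and direct returns while we keep investing in the future. Not to mention businesses like cloud and TPUs (Tensor Processing Units), which will also be huge.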

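So I feel very confident about where Alphabet stands today. Whether or not there is a bubble, our job is to come out ahead in either scenario. If the industry keeps booming, we will seize that opportunity; and if there is a bubble followed by a retrenchment, we will also be the best placed to handle it and benefit from it.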

Casey Newton All right, let’s imagine Thanksgiving is coming up, it’s the Bay Area, and one of our listeners changes the subject from politics, which is upsetting everyone, to AI, to give people something to be excited about. And someone says, “Hey, I heard Gemini 3 just came out. What can it actually do?” What’s the example you would have our listeners show their friends, whether on their phone or their laptop, to say “get a load of this” and save Thanksgiving?

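All right, suppose Thanksgiving is coming up and we’re in the Bay Area. One of our listeners deftly steers the conversation away from headache-inducing politics and over to AI, giving everyone something exciting to talk about. Someone says, “Hey, I heard Gemini 3 just launched. What can it actually do? Is there a good example we could demo right there on a phone or laptop to wow our friends and, while we’re at it, save this Thanksgiving gathering?”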

Josh Woodward Yeah, I don’t know if it’ll save Thanksgiving, but it could probably provide some laughs. Our imagery models in Gemini are still the best in the world. So what I would say is: grab your phone. It can be iPhone or Android, it doesn’t matter. Pull it out, take a selfie, put yourself in it and edit it. People are still doing that in huge amounts, and it’s great fun. Then you can show off any of the other capabilities of the new Gemini 3 alongside it. That’s what we’re seeing: people come for a lot of these interesting use cases and then start trying other parts of the app, too.

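Josh Woodward: Well, I can’t promise it will save Thanksgiving, but it will definitely bring plenty of laughs. Gemini’s image generation models are still world-class. So my suggestion is: grab your phone, iPhone or Android, take a selfie, put yourself into the picture, and edit it however you like. Lots of people are still using these features heavily, and it’s genuinely fun. Then you can go on to show off Gemini 3’s other new capabilities. We’ve found that many users come for exactly these fun use cases and then start exploring other features in the app.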

Kevin Roose You heard it here: Nano Banana will save Thanksgiving dinner. Gentlemen, thank you. It’s great to talk, and thanks for making the time.

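You heard it here: Nano Banana will save Thanksgiving dinner. Well, gentlemen, thank you sincerely. It was a real pleasure talking with you, and thank you for taking time out of your busy schedules to join the show.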

Demis Hassabis / Josh Woodward Thanks. Thanks for having us.

Demis Hassabis / Josh Woodward
Thanks for having us; we’re delighted to be here.

Kevin Roose Oh, good. Thank you.

Kevin Roose:
Great, thank you very much.