Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma. If you'd like more, subscribe to the “Lesswrong (30+ karma)” feed.
This is a brief summary of what we believe to be the most important takeaways from our new paper and from our findings shown in the o1 system card. We also specifically clarify what we think we did NOT show.
Paper: https://www.apolloresearch.ai/research/scheming-reasoning-evaluations
Twitter about paper: https://x.com/apolloaisafety/status/1864735819207995716
Twitter about o1 system card: https://x.com/apolloaisafety/status/1864737158226928124
What we think the most important findings are
Models are now capable enough to do in-context scheming reasoning
We say an AI system is “scheming” if it covertly pursues misaligned goals, hiding its true capabilities and objectives. We think that in order to scheme, models likely need to be goal-directed, situationally aware, and capable enough to reason about scheming as a strategy. In principle, models might acquire situational awareness and stable long-term goals during training, and then scheme in pursuit of those goals. We only study the simpler case of “in-context scheming,” where [...]
Outline:
(00:28) What we think the most important findings are
(00:33) Models are now capable enough to do in-context scheming reasoning
(03:53) The scheming actions can sometimes be quite subtle
(05:05) Models sometimes double down on their deception
(06:35) Models sometimes scheme without any goal nudging
(07:34) Sonnet-3.5 and opus-3 can sandbag in pursuit of helpfulness
(08:22) Sonnet-3.5 can sandbag without relying on CoT reasoning
(09:28) What we are not claiming:
(10:33) Takeaways
(10:36) Chain-of-thought monitoring
(11:42) Chain-of-thought access
(13:27) Inability safety cases now have to be more specific
The original text contained 7 images which were described by AI.
First published: December 5th, 2024
Source: https://www.lesswrong.com/posts/8gy7c8GAPkuu6wTiX/frontier-models-are-capable-of-in-context-scheming
Narrated by TYPE III AUDIO.
12/6/24 • 14:46
TLDR: LessWrong + Lighthaven need about $3M for the next 12 months. Donate here, or send me an email, DM or Signal message (+1 510 944 3235) if you want to support what we do. Donations are tax-deductible in the US. Reach out for other countries, we can likely figure something out. We have big plans for the next year, and due to a shifting funding landscape we need support from a broader community more than in any previous year.
I've been running LessWrong/Lightcone Infrastructure for the last 7 years. During that time we have grown into the primary infrastructure provider for the rationality and AI safety communities. "Infrastructure" is a big fuzzy word, but in our case, it concretely means:
We build and run LessWrong.com and the AI Alignment Forum.[1]
We built and run Lighthaven (lighthaven.space), a ~30,000 sq. ft. campus in downtown Berkeley where we [...]
Outline:
(03:52) LessWrong
(06:36) Does LessWrong influence important decisions?
(09:37) Does LessWrong make its readers/writers more sane?
(11:37) LessWrong and intellectual progress
(19:08) Lighthaven
(22:04) The economics of Lighthaven
(24:26) How does Lighthaven improve the world?
(28:41) The relationship between Lighthaven and LessWrong
(30:36) Lightcone and the funding ecosystem
(35:17) Our work on funding infrastructure
(37:57) If it's worth doing it's worth doing with made-up statistics
(38:44) The OP GCR capacity building team survey
(42:09) Lightcone/LessWrong cannot be funded by just running ads
(43:55) Comparing LessWrong to other websites and apps
(45:00) Lighthaven event surplus
(47:13) The future of (the) Lightcone
(48:02) Lightcone culture and principles
(50:04) Things I wish I had time and funding for
(59:31) What do you get from donating to Lightcone?
(01:02:03) Tying everything together
The original text contained 22 footnotes which were omitted from this narration. The original text contained 7 images which were described by AI.
First published: November 30th, 2024
Source: https://www.lesswrong.com/posts/5n2ZQcbc7r4R8mvqc/the-lightcone-is-nothing-without-its-people-lw-lighthaven-s-5
Narrated by TYPE III AUDIO.
11/30/24 • 63:15
Balsa Policy Institute chose as its first mission to lay groundwork for the potential repeal, or partial repeal, of section 27 of the Jones Act of 1920. I believe that this is an important cause both for its practical and symbolic impacts.
The Jones Act is the ultimate embodiment of our failures as a nation.
After 100 years, we do almost no trade between our ports via the oceans, and we build almost no oceangoing ships.
Everything the Jones Act supposedly set out to protect, it has destroyed.
Table of Contents
What is the Jones Act?
Why Work to Repeal the Jones Act?
Why Was the Jones Act Introduced?
What is the Effect of the Jones Act?
What Else Happens When We Ship More Goods Between Ports?
Emergency Case Study: Salt Shipment to NJ in [...]
Outline:
(00:38) What is the Jones Act?
(01:33) Why Work to Repeal the Jones Act?
(02:48) Why Was the Jones Act Introduced?
(03:19) What is the Effect of the Jones Act?
(06:52) What Else Happens When We Ship More Goods Between Ports?
(07:14) Emergency Case Study: Salt Shipment to NJ in the Winter of 2013-2014
(12:04) Why no Emergency Exceptions?
(15:02) What Are Some Specific Non-Emergency Impacts?
(18:57) What Are Some Specific Impacts on Regions?
(22:36) What About the Study Claiming Big Benefits?
(24:46) What About the Need to ‘Protect’ American Shipbuilding?
(28:31) The Opposing Arguments Are Disingenuous and Terrible
(34:07) What Alternatives to Repeal Do We Have?
(35:33) What Might Be a Decent Instinctive Counterfactual?
(41:50) What About Our Other Protectionist and Cabotage Laws?
(43:00) What About Potential Marine Highways, or Short Sea Shipping?
(43:48) What Happened to All Our Offshore Wind?
(47:06) What Estimates Are There of Overall Cost?
(49:52) What Are the Costs of Being American Flagged?
(50:28) What Are the Costs of Being American Made?
(51:49) What are the Consequences of Being American Crewed?
(53:11) What Would Happen in a Real War?
(56:07) Cruise Ship Sanity Partially Restored
(56:46) The Jones Act Enforcer
(58:08) Who Benefits?
(58:57) Others Make the Case
(01:00:55) An Argument That We Were Always Uncompetitive
(01:02:45) What About John Arnold's Case That the Jones Act Can’t Be Killed?
(01:09:34) What About the Foreign Dredge Act of 1906?
(01:10:24) Fun Stories
First published: November 27th, 2024
Source: https://www.lesswrong.com/posts/dnH2hauqRbu3GspA2/repeal-the-jones-act-of-1920
Narrated by TYPE III AUDIO.
11/29/24 • 73:53
This is the full text of a post from "The Obsolete Newsletter," a Substack that I write about the intersection of capitalism, geopolitics, and artificial intelligence. I’m a freelance journalist and the author of a forthcoming book called Obsolete: Power, Profit, and the Race for Machine Superintelligence. Consider subscribing to stay up to date with my work.
An influential congressional commission is calling for a militarized race to build superintelligent AI based on threadbare evidence
The US-China AI rivalry is entering a dangerous new phase. Earlier today, the US-China Economic and Security Review Commission (USCC) released its annual report, with the following as its top recommendation: Congress establish and fund a Manhattan Project-like program dedicated to racing to and acquiring an Artificial General Intelligence (AGI) capability. AGI is generally defined as systems that are as good as or better than human capabilities across all cognitive domains and [...]
Outline:
(00:28) An influential congressional commission is calling for a militarized race to build superintelligent AI based on threadbare evidence
(03:09) What China has said about AI
(06:14) Revealing technical errors
(08:29) Conclusion
The original text contained 1 image which was described by AI.
First published: November 20th, 2024
Source: https://www.lesswrong.com/posts/KPBPc7RayDPxqxdqY/china-hawks-are-manufacturing-an-ai-arms-race
Narrated by TYPE III AUDIO.
Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
11/29/24 • 10:11
In contract law, there's this thing called a “representation”. Example: as part of a contract to sell my house, I might “represent that” the house contains no asbestos. How is this different from me just, y’know, telling someone that the house contains no asbestos? Well, if it later turns out that the house does contain asbestos, I’ll be liable for any damages caused by the asbestos (like e.g. the cost of removing it).
In other words: a contractual representation is a factual claim along with insurance against that claim being false.
I claim[1] that people often interpret everyday factual claims and predictions in a way similar to contractual representations. Because “representation” is egregiously confusing jargon, I’m going to call this phenomenon “assurance”.
Prototypical example: I tell my friend that I plan to go to a party around 9 pm, and I’m willing to give them a ride. My friend [...]
The original text contained 1 footnote which was omitted from this narration.
First published: October 20th, 2024
Source: https://www.lesswrong.com/posts/p9rQJMRq4qtB9acds/information-vs-assurance
Narrated by TYPE III AUDIO.
11/27/24 • 04:31
Epistemic Status: 13 years working as a therapist for a wide variety of populations, 5 of them working with rationalists and EA clients. 7 years teaching and directing at over 20 rationality camps and workshops. This is an extremely short and colloquially written form of points that could be expanded on to fill a book, and there is plenty of nuance to practically everything here, but I am extremely confident of the core points in this frame, and have used it to help many people break out of or avoid manipulative practices.
TL;DR: Your wants and preferences are not invalidated by smarter or more “rational” people's preferences. What feels good or bad to someone is not a monocausal result of how smart or stupid they are. Alternative titles to this post are "Two people are enough to form a cult" and "Red flags if dating rationalists," but this [...]
Outline:
(02:53) 1) You are not too stupid to know what you want.
(07:34) 2) Feeling hurt is not a sign of irrationality.
(13:15) 3) Illegible preferences are not invalid.
(17:22) 4) Your preferences do not need to fully match your community's.
(21:43) Final Thoughts
The original text contained 3 footnotes which were omitted from this narration. The original text contained 2 images which were described by AI.
First published: November 26th, 2024
Source: https://www.lesswrong.com/posts/LifRBXdenQDiX4cu8/you-are-not-too-irrational-to-know-your-preferences-1
Narrated by TYPE III AUDIO.
11/27/24 • 23:36
[Warning: This post is probably only worth reading if you already have opinions on the Solomonoff induction being malign, or at least heard of the concept and want to understand it better.]
Introduction
I recently reread the classic argument from Paul Christiano about the Solomonoff prior being malign, and Mark Xu's write-up on it. I believe that the part of the argument about the Solomonoff induction is not particularly load-bearing, and can be replaced by a more general argument that I think is easier to understand. So I will present the general argument first, and only explain in the last section how the Solomonoff prior can come into the picture.
I don't claim that anything I write here is particularly new, I think you can piece together this picture from various scattered comments on the topic, but I think it's good to have it written up in one place. [...]
Outline:
(00:17) Introduction
(00:56) How an Oracle gets manipulated
(05:25) What went wrong?
(05:28) The AI had different probability estimates than the humans for anthropic reasons
(07:01) The AI was thinking in terms of probabilities and not expected values
(08:40) Probabilities are cursed in general, only expected values are real
(09:19) What about me?
(13:00) Should this change any of my actions?
(16:25) How does the Solomonoff prior come into the picture?
(20:10) Conclusion
The original text contained 14 footnotes which were omitted from this narration.
First published: November 17th, 2024
Source: https://www.lesswrong.com/posts/KSdqxrrEootGSpKKE/the-solomonoff-prior-is-malign-is-a-special-case-of-a
Narrated by TYPE III AUDIO.
11/25/24 • 21:02
Audio note: this article contains 33 uses of LaTeX notation, so the narration may be difficult to follow. There's a link to the original text in the episode description.
Many of you readers may instinctively know that this is wrong. If you flip a coin (50% chance) twice, you are not guaranteed to get heads. The odds of getting a heads are 75%. However you may be surprised to learn that there is some truth to this statement; modifying the statement just slightly will yield not just a true statement, but a useful one.
It's a spoiler, though. If you want to figure this out as you read this article yourself, you should skip this and then come back. Ok, ready? Here it is:
It's a _1/n_ chance and I did it _n_ times, so the odds should be... _63%_. Almost always.
The math:
Suppose you're [...]
Outline:
(01:04) The math:
(02:12) Hold on a sec, that formula looks familiar...
(02:58) So, if something is a _1/n_ chance, and I did it _n_ times, the odds should be... _63%_.
(03:12) What I'm NOT saying:
First published: November 18th, 2024
Source: https://www.lesswrong.com/posts/pNkjHuQGDetRZypmA/it-s-a-10-chance-which-i-did-10-times-so-it-should-be-100
Narrated by TYPE III AUDIO.
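The arithmetic behind the 75% and 63% figures in this description can be checked with a short sketch. It assumes the tries are independent and each succeeds with probability 1/n, so the chance of at least one success is 1 - (1 - 1/n)^n, which approaches 1 - 1/e as n grows. The function name `chance_of_at_least_one` is made up for illustration and does not come from the post.

```python
import math


def chance_of_at_least_one(n: int) -> float:
    """Chance of at least one success in n independent tries,
    each succeeding with probability 1/n (an assumed model)."""
    # Chance that every single try fails, subtracted from 1.
    return 1.0 - (1.0 - 1.0 / n) ** n


# Value the chance approaches as n grows: 1 - 1/e.
limit = 1.0 - 1.0 / math.e

for n in (2, 10, 100, 1000):
    print(f"n={n}: {chance_of_at_least_one(n):.4f}")
print(f"limit: {limit:.4f}")
```

With n = 2 this reproduces the two coin flips from the description (75%), and for larger n the value settles near 63%.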
11/20/24 • 04:58
As part of the court case between Elon Musk and Sam Altman, a substantial number of emails between Elon, Sam Altman, Ilya Sutskever, and Greg Brockman have been released as part of the court proceedings. I have found reading through these really valuable, and I haven't found an online source that compiles all of them in an easy to read format. So I made one.
I used AI assistance to generate this, which might have introduced errors. Check the original source to make sure it's accurate before you quote it: https://www.courtlistener.com/docket/69013420/musk-v-altman/ [1]
Sam Altman to Elon Musk - May 25, 2015
Been thinking a lot about whether it's possible to stop humanity from developing AI.
I think the answer is almost definitely not.
If it's going to happen anyway, it seems like it would be good for someone other than Google to do it first.
Any thoughts on [...]
Outline:
(00:36) Sam Altman to Elon Musk - May 25, 2015
(01:19) Elon Musk to Sam Altman - May 25, 2015
(01:28) Sam Altman to Elon Musk - Jun 24, 2015
(03:31) Elon Musk to Sam Altman - Jun 24, 2015
(03:39) Greg Brockman to Elon Musk, (cc: Sam Altman) - Nov 22, 2015
(06:06) Elon Musk to Sam Altman - Dec 8, 2015
(07:07) Sam Altman to Elon Musk - Dec 8, 2015
(07:59) Sam Altman to Elon Musk - Dec 11, 2015
(08:32) Elon Musk to Sam Altman - Dec 11, 2015
(08:50) Sam Altman to Elon Musk - Dec 11, 2015
(09:01) Elon Musk to Sam Altman - Dec 11, 2015
(09:08) Sam Altman to Elon Musk - Dec 11, 2015
(09:26) Elon Musk to: Ilya Sutskever, Pamela Vagata, Vicki Cheung, Diederik Kingma, Andrej Karpathy, John D. Schulman, Trevor Blackwell, Greg Brockman, (cc: Sam Altman) - Dec 11, 2015
(10:35) Greg Brockman to Elon Musk, (cc: Sam Altman) - Feb 21, 2016
(15:11) Elon Musk to Greg Brockman, (cc: Sam Altman) - Feb 22, 2016
(15:54) Greg Brockman to Elon Musk, (cc: Sam Altman) - Feb 22, 2016
(16:14) Greg Brockman to Elon Musk, (cc: Sam Teller) - Mar 21, 2016
(17:58) Elon Musk to Greg Brockman, (cc: Sam Teller) - Mar 21, 2016
(18:08) Sam Teller to Elon Musk - April 27, 2016
(19:28) Elon Musk to Sam Teller - Apr 27, 2016
(20:05) Sam Altman to Elon Musk, (cc: Sam Teller) - Sep 16, 2016
(25:31) Elon Musk to Sam Altman, (cc: Sam Teller) - Sep 16, 2016
(26:36) Sam Altman to Elon Musk, (cc: Sam Teller) - Sep 16, 2016
(27:01) Elon Musk to Sam Altman, (cc: Sam Teller) - Sep 16, 2016
(27:17) Sam Altman to Elon Musk, (cc: Sam Teller) - Sep 16, 2016
(27:29) Sam Teller to Elon Musk - Sep 20, 2016
(27:55) Elon Musk to Sam Teller - Sep 21, 2016
(28:11) Ilya Sutskever to Elon Musk, Greg Brockman - Jul 20, 2017
(29:41) Shivon Zilis to Elon Musk, (cc: Sam Teller) - Aug 28, 2017
(33:15) Elon Musk to Shivon Zilis, (cc: Sam Teller) - Aug 28, 2017
(33:30) Ilya Sutskever to Elon Musk, Sam Altman, (cc: Greg Brockman, Sam Teller, Shivon Zilis) - Sep 20, 2017
(39:05) Elon Musk to Ilya Sutskever, Sam Altman (cc: Greg Brockman, Sam Teller, Shivon Zilis) - Sep 20, 2017
(39:24) Sam Altman to Elon Musk, Ilya Sutskever (cc: Greg Brockman, Sam Teller, Shivon Zilis) - Sep 21, 2017
(39:40) Shivon Zilis to Elon Musk, (cc: Sam Teller) - Sep 22, 2017
(40:10) Elon Musk to Shivon Zilis (cc: Sam Teller) - Sep 22, 2017
(40:20) Shivon Zilis to Elon Musk, (cc: Sam Teller) - Sep 22, 2017
(41:54) Sam Altman to Elon Musk (cc: Greg Brockman, Ilya Sutskever, Sam Teller, Shivon Zilis) - Jan 21, 2018
(42:28) Elon Musk to Sam Altman (cc: Greg Brockman, Ilya Sutskever, Sam Teller, Shivon Zilis) - Jan 21, 2018
(42:42) Andrej Karpathy to Elon Musk, (cc: Shivon Zilis) - Jan 31, 2018
11/19/24 • 63:06
Epistemic status: Toy model. Oversimplified, but has been anecdotally useful to at least a couple people, and I like it as a metaphor.
Introduction
I’d like to share a toy model of willpower: your psyche's conscious verbal planner “earns” willpower (earns a certain amount of trust with the rest of your psyche) by choosing actions that nourish your fundamental, bottom-up processes in the long run. For example, your verbal planner might expend willpower dragging you to disappointing first dates, then regain that willpower, and more, upon finding a date that leads to good long-term romance. Wise verbal planners can acquire large willpower budgets by making plans that, on average, nourish your fundamental processes. Delusional or uncaring verbal planners, on the other hand, usually become “burned out”: their willpower budget goes broke-ish, leaving them little to no access to willpower.
I’ll spend the next section trying to stick this [...]
Outline:
(00:17) Introduction
(01:10) On processes that lose their relationship to the unknown
(02:58) Ayn Rand's model of “living money”
(06:44) An analogous model of “living willpower” and burnout.
The original text contained 2 footnotes which were omitted from this narration.
First published: November 16th, 2024
Source: https://www.lesswrong.com/posts/xtuk9wkuSP6H7CcE2/ayn-rand-s-model-of-living-money-and-an-upside-of-burnout
Narrated by TYPE III AUDIO.
11/18/24 • 09:02
[Image caption: Midjourney, “infinite library”]
I’ve had post-election thoughts percolating, and the sense that I wanted to synthesize something about this moment, but politics per se is not really my beat. This is about as close as I want to come to the topic, and it's a sidelong thing, but I think the time is right.
It's time to start thinking again about neutrality. Neutral institutions, neutral information sources. Things that both seem and are impartial, balanced, incorruptible, universal, legitimate, trustworthy, canonical, foundational.[1]
We don’t have them. Clearly.
We live in a pluralistic and divided world. Everybody's got different “reality-tunnels.” Attempts to impose one worldview on everyone fail.
To some extent this is healthy and inevitable; we are all different, we do disagree, and it's vain to hope that “everyone can get on the same page” like some kind of hive-mind. On the other hand, lots of things aren’t great [...]
Outline:
(02:14) Not “Normality”
(04:36) What is Neutrality Anyway?
(07:43) “Neutrality is Impossible” is Technically True But Misses The Point
(10:50) Systems of the World
(15:05) Let's Talk About Online
First published: November 13th, 2024
Source: https://www.lesswrong.com/posts/WxnuLJEtRzqvpbQ7g/neutrality
Narrated by TYPE III AUDIO.
11/17/24 • 24:08
Trump and the Republican party will wield broad governmental control during what will almost certainly be a critical period for AGI development. In this post, we want to briefly share various frames and ideas we’ve been thinking through and actively pitching to Republican lawmakers over the past months in preparation for this possibility.
Why are we sharing this here? Given that >98% of the EAs and alignment researchers we surveyed earlier this year identified as everything-other-than-conservative, we consider thinking through these questions to be another strategically worthwhile neglected direction. (Along these lines, we also want to proactively emphasize that politics is the mind-killer, and that, regardless of one's ideological convictions, those who earnestly care about alignment must take seriously the possibility that Trump will be the US president who presides over the emergence of AGI, and update accordingly in light of this possibility.)
Political orientation: combined sample of (non-alignment) [...]
Outline:
(01:20) AI-not-disempowering-humanity is conservative in the most fundamental sense
(03:36) We've been laying the groundwork for alignment policy in a Republican-controlled government
(08:06) Trump and some of his closest allies have signaled that they are genuinely concerned about AI risk
(09:11) Avoiding an AI-induced catastrophe is obviously not a partisan goal
(10:48) Winning the AI race with China requires leading on both capabilities and safety
(13:22) Concluding thought
The original text contained 4 footnotes which were omitted from this narration. The original text contained 1 image which was described by AI.
First published: November 15th, 2024
Source: https://www.lesswrong.com/posts/rfCEWuid7fXxz4Hpa/making-a-conservative-case-for-alignment
Narrated by TYPE III AUDIO.
11/16/24 • 14:20
As part of the court case between Elon Musk and Sam Altman, a substantial number of emails between Elon, Sam Altman, Ilya Sutskever, and Greg Brockman have been released as part of the court proceedings. I have found reading through these really valuable, and I haven't found an online source that compiles all of them in an easy to read format. So I made one.
I used AI assistance to generate this, which might have introduced errors. Check the original source to make sure it's accurate before you quote it: https://www.courtlistener.com/docket/69013420/musk-v-altman/ [1]
Sam Altman to Elon Musk - May 25, 2015
Been thinking a lot about whether it's possible to stop humanity from developing AI.
I think the answer is almost definitely not.
If it's going to happen anyway, it seems like it would be good for someone other than Google to do it first.
Any thoughts on [...]
Outline:
(00:37) Sam Altman to Elon Musk - May 25, 2015
(01:20) Elon Musk to Sam Altman - May 25, 2015
(01:29) Sam Altman to Elon Musk - Jun 24, 2015
(03:33) Elon Musk to Sam Altman - Jun 24, 2015
(03:41) Greg Brockman to Elon Musk, (cc: Sam Altman) - Nov 22, 2015
(06:07) Elon Musk to Sam Altman - Dec 8, 2015
(07:09) Sam Altman to Elon Musk - Dec 8, 2015
(08:01) Sam Altman to Elon Musk - Dec 11, 2015
(08:34) Elon Musk to Sam Altman - Dec 11, 2015
(08:52) Sam Altman to Elon Musk - Dec 11, 2015
(09:02) Elon Musk to Sam Altman - Dec 11, 2015
(09:10) Sam Altman to Elon Musk - Dec 11, 2015
(09:28) Elon Musk to: Ilya Sutskever, Pamela Vagata, Vicki Cheung, Diederik Kingma, Andrej Karpathy, John D. Schulman, Trevor Blackwell, Greg Brockman, (cc: Sam Altman) - Dec 11, 2015
(10:37) Greg Brockman to Elon Musk, (cc: Sam Altman) - Feb 21, 2016
(15:13) Elon Musk to Greg Brockman, (cc: Sam Altman) - Feb 22, 2016
(15:55) Greg Brockman to Elon Musk, (cc: Sam Altman) - Feb 22, 2016
(16:16) Greg Brockman to Elon Musk, (cc: Sam Teller) - Mar 21, 2016
(17:59) Elon Musk to Greg Brockman, (cc: Sam Teller) - Mar 21, 2016
(18:09) Sam Teller to Elon Musk - April 27, 2016
(19:30) Elon Musk to Sam Teller - Apr 27, 2016
(20:06) Sam Altman to Elon Musk, (cc: Sam Teller) - Sep 16, 2016
(25:32) Elon Musk to Sam Altman, (cc: Sam Teller) - Sep 16, 2016
(26:38) Sam Altman to Elon Musk, (cc: Sam Teller) - Sep 16, 2016
(27:03) Elon Musk to Sam Altman, (cc: Sam Teller) - Sep 16, 2016
(27:18) Sam Altman to Elon Musk, (cc: Sam Teller) - Sep 16, 2016
(27:31) Sam Teller to Elon Musk - Sep 20, 2016
(27:57) Elon Musk to Sam Teller - Sep 21, 2016
(28:13) Ilya Sutskever to Elon Musk, Greg Brockman - Jul 20, 2017
(29:42) Shivon Zilis to Elon Musk, (cc: Sam Teller) - Aug 28, 2017
(33:16) Elon Musk to Shivon Zilis, (cc: Sam Teller) - Aug 28, 2017
(33:32) Ilya Sutskever to Elon Musk, Sam Altman, (cc: Greg Brockman, Sam Teller, Shivon Zilis) - Sep 20, 2017
(39:07) Elon Musk to Ilya Sutskever (cc: Sam Altman; Greg Brockman; Sam Teller; Shivon Zilis) - Sep 20, 2017 (2:17PM)
(39:42) Elon Musk to Ilya Sutskever, Sam Altman (cc: Greg Brockman, Sam Teller, Shivon Zilis) - Sep 20, 2017 (3:08PM)
(40:03) Sam Altman to Elon Musk, Ilya Sutskever (cc: Greg Brockman, Sam Teller, Shivon Zilis) - Sep 21, 2017
(40:18) Shivon Zilis to Elon Musk, (cc: Sam Teller) - Sep 22, 2017
(40:49) Elon Musk to Shivon Zilis (cc: Sam Teller) - Sep 22, 2017
(40:59) Shivon Zilis to Elon Musk, (cc: Sam Teller) - Sep 22, 2017
(42:33) Sam Altman to Elon Musk (cc: Greg Brockman, Ilya Sutskever, Sam Teller, Shivon Zilis) - Jan 21, 2018
(43:07) Elon Musk to Sam Altman (cc: Greg Brockman, Ilya S
11/16/24 • 63:44
Thanks to Holden Karnofsky, David Duvenaud, and Kate Woolverton for useful discussions and feedback.
Following up on our recent “Sabotage Evaluations for Frontier Models” paper, I wanted to share more of my personal thoughts on why I think catastrophic sabotage is important and why I care about it as a threat model. Note that this isn’t in any way intended to be a reflection of Anthropic's views or for that matter anyone's views but my own; it's just a collection of some of my personal thoughts.
First, some high-level thoughts on what I want to talk about here: I want to focus on a level of future capabilities substantially beyond current models, but below superintelligence: specifically something approximately human-level and substantially transformative, but not yet superintelligent. While I don’t think that most of the proximate cause of AI existential risk comes from such models - I think most of the direct takeover [...]
Outline:
(02:31) Why is catastrophic sabotage a big deal?
(02:45) Scenario 1: Sabotage alignment research
(05:01) Necessary capabilities
(06:37) Scenario 2: Sabotage a critical actor
(09:12) Necessary capabilities
(10:51) How do you evaluate a model's capability to do catastrophic sabotage?
(21:46) What can you do to mitigate the risk of catastrophic sabotage?
(23:12) Internal usage restrictions
(25:33) Affirmative safety cases
First published: October 22nd, 2024
Source: https://www.lesswrong.com/posts/Loxiuqdj6u8muCe54/catastrophic-sabotage-as-a-major-threat-model-for-human
Narrated by TYPE III AUDIO.
11/15/24 • 27:19
Related: Book Review: On the Edge: The Gamblers
I have previously been heavily involved in sports betting. That world was very good to me. The times were good, as were the profits. It was a skill game, and a form of positive-sum entertainment, and I was happy to participate and help ensure the sophisticated customer got a high quality product. I knew it wasn’t the most socially valuable enterprise, but I certainly thought it was net positive.
When sports gambling was legalized in America, I was hopeful it too could prove a net positive force, far superior to the previous obnoxious wave of daily fantasy sports. It brings me no pleasure to conclude that this was not the case. The results are in. Legalized mobile gambling on sports, let alone casino games, has proven to be a huge mistake. The societal impacts are far worse than I expected. Table [...]
Outline:
(01:02) The Short Answer
(02:01) Paper One: Bankruptcies
(07:03) Paper Two: Reduced Household Savings
(08:37) Paper Three: Increased Domestic Violence
(10:04) The Product as Currently Offered is Terrible
(12:02) Things Sharp Players Do
(14:07) People Cannot Handle Gambling on Smartphones
(15:46) Yay and Also Beware Trivial Inconveniences (a future full post)
(17:03) How Does This Relate to Elite Hypocrisy?
(18:32) The Standard Libertarian Counterargument
(19:42) What About Other Prediction Markets?
(20:07) What Should Be Done
The original text contained 3 images which were described by AI.
First published: November 11th, 2024
Source: https://www.lesswrong.com/posts/tHiB8jLocbPLagYDZ/the-online-sports-gambling-experiment-has-failed
Narrated by TYPE III AUDIO.
11/12/24 • 22:11
This post comes a bit late with respect to the news cycle, but I argued in a recent interview that o1 is an unfortunate twist on LLM technologies, making them particularly unsafe compared to what we might otherwise have expected:
The basic argument is that the technology behind o1 doubles down on a reinforcement learning paradigm, which puts us closer to the world where we have to get the value specification exactly right in order to avert catastrophic outcomes.
RLHF is just barely RL. - Andrej Karpathy
Additionally, this technology takes us further from interpretability. If you ask GPT4 to produce a chain-of-thought (with prompts such as "reason step-by-step to arrive at an answer"), you know that in some sense, the natural-language reasoning you see in the output is how it arrived at the answer.[1] This is not true of systems like o1. The o1 training rewards [...]
The original text contained 1 footnote which was omitted from this narration.
First published: November 11th, 2024
Source: https://www.lesswrong.com/posts/BEFbC8sLkur7DGCYB/o1-is-a-bad-idea
Narrated by TYPE III AUDIO.
11/12/24 • 04:40
TL;DR: I'm presenting three recent papers which all share a similar finding, i.e. the safety training techniques for chat models don’t transfer well from chat models to the agents built from them. In other words, models won’t tell you how to do something harmful, but they are often willing to directly execute harmful actions. However, all papers find that different attack methods like jailbreaks, prompt-engineering, or refusal-vector ablation do transfer.
Here are the three papers:
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
Refusal-Trained LLMs Are Easily Jailbroken As Browser Agents
Applying Refusal-Vector Ablation to Llama 3.1 70B Agents
What are language model agents
Language model agents are a combination of a language model and a scaffolding software. Regular language models are typically limited to being chat bots, i.e. they receive messages and reply to them. However, scaffolding gives these models access to tools which they can [...]
Outline:
(00:55) What are language model agents
(01:36) Overview
(03:31) AgentHarm Benchmark
(05:27) Refusal-Trained LLMs Are Easily Jailbroken as Browser Agents
(06:47) Applying Refusal-Vector Ablation to Llama 3.1 70B Agents
(08:23) Discussion
First published: November 3rd, 2024
Source: https://www.lesswrong.com/posts/ZoFxTqWRBkyanonyb/current-safety-training-techniques-do-not-fully-transfer-to
Narrated by TYPE III AUDIO.
11/9/24 • 10:10
At least, if you happen to be near me in brain space.
What advice would you give your younger self?
That was the prompt for a class I taught at PAIR 2024. About a quarter of participants ranked it in their top 3 of courses at the camp and half of them had it listed as their favorite.
I hadn’t expected that.
I thought my life advice was pretty idiosyncratic. I never heard of anyone living their life like I have. I never encountered this method in all the self-help blogs or feel-better books I consumed back when I needed them.
But if some people found it helpful, then I should probably write it all down.
Why Listen to Me Though?
I think it's generally worth prioritizing the advice of people who have actually achieved the things you care about in life. I can’t tell you if that's me [...]
Outline:
(00:46) Why Listen to Me Though?
(04:22) Pick a direction instead of a goal
(12:00) Do what you love but always tie it back
(17:09) When all else fails, apply random search
The original text contained 3 images which were described by AI.
First published: September 28th, 2024
Source: https://www.lesswrong.com/posts/uwmFSaDMprsFkpWet/explore-more-a-bag-of-tricks-to-keep-your-life-on-the-rails
Narrated by TYPE III AUDIO.
11/4/24 • 21:00
I open my eyes and find myself lying on a bed in a hospital room. I blink. "Hello", says a middle-aged man with glasses, sitting on a chair by my bed. "You've been out for quite a long while." "Oh no ... is it Friday already? I had that report due -" "It's Thursday", the man says. "Oh great", I say. "I still have time." "Oh, you have all the time in the world", the man says, chuckling. "You were out for 21 years." I burst out laughing, but then falter as the man just keeps looking at me. "You mean to tell me" - I stop to let out another laugh - "that it's 2045?" "January 26th, 2045", the man says. "I'm surprised, honestly, that you still have things like humans and hospitals", I say. "There were so many looming catastrophes in 2024. AI misalignment, all sorts of [...] --- First published: November 4th, 2024 Source: https://www.lesswrong.com/posts/BarHSeciXJqzRuLzw/survival-without-dignity --- Narrated by TYPE III AUDIO.
11/4/24 • 29:37
Claim: memeticity in a scientific field is mostly determined, not by the most competent researchers in the field, but instead by roughly-median researchers. We’ll call this the “median researcher problem”. Prototypical example: imagine a scientific field in which the large majority of practitioners have a very poor understanding of statistics, p-hacking, etc. Then lots of work in that field will be highly memetic despite trash statistics, blatant p-hacking, etc. Sure, the most competent people in the field may recognize the problems, but the median researchers don’t, and in aggregate it's mostly the median researchers who spread the memes. (Defending that claim isn’t really the main focus of this post, but a couple pieces of legible evidence which are weakly in favor: People did in fact try to sound the alarm about poor statistical practices well before the replication crisis, and yet practices did not change, so clearly at least [...] --- First published: November 2nd, 2024 Source: https://www.lesswrong.com/posts/vZcXAc6txvJDanQ4F/the-median-researcher-problem-1 --- Narrated by TYPE III AUDIO.
11/4/24 • 02:58
This is a link post. We (Connor Leahy, Gabriel Alfour, Chris Scammell, Andrea Miotti, Adam Shimi) have just published The Compendium, which brings together in a single place the most important arguments that drive our models of the AGI race, and what we need to do to avoid catastrophe. We felt that something like this has been missing from the AI conversation. Most of these points have been shared before, but a “comprehensive worldview” doc has been missing. We’ve tried our best to fill this gap, and welcome feedback and debate about the arguments. The Compendium is a living document, and we’ll keep updating it as we learn more and change our minds. We would appreciate your feedback, whether or not you agree with us: If you do agree with us, please point out where you think the arguments can be made stronger, and contact us if there are [...] --- First published: October 31st, 2024 Source: https://www.lesswrong.com/posts/prm7jJMZzToZ4QxoK/the-compendium-a-full-argument-about-extinction-risk-from --- Narrated by TYPE III AUDIO.
11/1/24 • 04:18
There are two nuclear options for treating depression: ketamine and TMS. This post is about the latter. TMS stands for Transcranial Magnetic Stimulation. Basically, it fixes depression via magnets, which is about the second or third most magical thing that magnets can do. I don’t know a whole lot about the neuroscience - this post isn’t about the how or the why. It's from the perspective of a patient, and it's about the what. What is it like to get TMS? TMS: The Gatekeeping. For Reasons™, doctors like to gatekeep access to treatments, and TMS is no different. To be eligible, you generally have to have tried multiple antidepressants for several years and had them not work or stop working. Keep in mind that, while safe, most antidepressants involve altering your brain chemistry and do have side effects. Since TMS is non-invasive, doesn’t involve any drugs, and has basically [...] --- Outline: (00:35) TMS (00:38) The Gatekeeping (01:49) Motor Threshold Test (04:08) The Treatment (04:15) The Schedule (05:20) The Experience (07:03) The Sensation (08:21) Results (09:06) Conclusion. The original text contained 2 images which were described by AI. --- First published: October 31st, 2024 Source: https://www.lesswrong.com/posts/g3iKYS8wDapxS757x/what-tms-is-like --- Narrated by TYPE III AUDIO.
10/31/24 • 11:01
Epistemic status: model-building based on observation, with a few successful unusual predictions. Anecdotal evidence has so far been consistent with the model. This puts it at risk of seeming more compelling than the evidence justifies just yet. Caveat emptor. Imagine you're a very young child. Around, say, three years old. You've just done something that really upsets your mother. Maybe you were playing and knocked her glasses off the table and they broke. Of course you find her reaction uncomfortable. Maybe scary. You're too young to have detailed metacognitive thoughts, but if you could reflect on why you're scared, you wouldn't be confused: you're scared of how she'll react. She tells you to say you're sorry. You utter the magic words, hoping that will placate her. And she narrows her eyes in suspicion. "You sure don't look sorry. Say it and mean it." Now you have a serious problem. [...] --- Outline: (02:16) Newcomblike self-deception (06:10) Sketch of a real-world version (08:43) Possible examples in real life (12:17) Other solutions to the problem (12:38) Having power (14:45) Occlumency (16:48) Solution space is maybe vast (17:40) Ending the need for self-deception (18:21) Welcome self-deception (19:52) Look away when directed to (22:59) Hypothesize without checking (25:50) Does this solve self-deception? (27:21) Summary. The original text contained 7 footnotes which were omitted from this narration. --- First published: October 27th, 2024 Source: https://www.lesswrong.com/posts/5FAnfAStc7birapMx/the-hostile-telepaths-problem --- Narrated by TYPE III AUDIO.
10/28/24 • 28:38
This post includes a "flattened version" of an interactive diagram that cannot be displayed on this site. I recommend reading the original version of the post with the interactive diagram, which can be found here. Over the last few months, ARC has released a number of pieces of research. While some of these can be independently motivated, there is also a more unified research vision behind them. The purpose of this post is to try to convey some of that vision and how our individual pieces of research fit into it. Thanks to Ryan Greenblatt, Victor Lecomte, Eric Neyman, Jeff Wu and Mark Xu for helpful comments. A bird's eye view: To begin, we will take a "bird's eye" view of ARC's research.[1] As we "zoom in", more nodes will become visible and we will explain the new nodes. An interactive version of the [...] --- Outline: (00:43) A birds eye view (01:00) Zoom level 1 (02:18) Zoom level 2 (03:44) Zoom level 3 (04:56) Zoom level 4 (07:14) How ARCs research fits into this picture (07:43) Further subproblems (10:23) Conclusion. The original text contained 2 footnotes which were omitted from this narration. The original text contained 3 images which were described by AI. --- First published: October 23rd, 2024 Source: https://www.lesswrong.com/posts/ztokaf9harKTmRcn4/a-bird-s-eye-view-of-arc-s-research --- Narrated by TYPE III AUDIO.
10/27/24 • 11:05
1. 4.4% of the US federal budget went into the space race at its peak. This was surprising to me, until a friend pointed out that landing rockets on specific parts of the moon requires very similar technology to landing rockets in Soviet cities.[1] I wonder how much more enthusiastic the scientists working on Apollo were, with the convenient motivating story of “I’m working towards a great scientific endeavor” vs “I’m working to make sure we can kill millions if we want to”. 2. The field of alignment seems to be increasingly dominated by interpretability (and obedience[2]). This was surprising to me[3], until a friend pointed out that partially opening the black box of NNs is the kind of technology that would let scaling labs find new unhobblings by noticing ways in which the internals of their models are being inefficient and having better tools to evaluate capabilities advances.[4] I [...] --- Outline: (00:03) 1. (00:35) 2. (01:20) 3. The original text contained 6 footnotes which were omitted from this narration. --- First published: October 21st, 2024 Source: https://www.lesswrong.com/posts/h4wXMXneTPDEjJ7nv/a-rocket-interpretability-analogy --- Narrated by TYPE III AUDIO.
10/25/24 • 02:30
This summer, I participated in a human challenge trial at the University of Maryland. I spent the days just prior to my 30th birthday sick with shigellosis. What? Why? Dysentery is an acute disease in which pathogens attack the intestine. It is most often caused by the bacteria Shigella. It spreads via the fecal-oral route. It requires an astonishingly low number of pathogens to make a person sick – so it spreads quickly, especially in bad hygienic conditions or anywhere water can get tainted with feces. It kills about 70,000 people a year, 30,000 of whom are children under the age of 5. Almost all of these cases and deaths are among very poor people. The primary mechanism by which dysentery kills people is dehydration. The person loses fluids to diarrhea and for whatever reason (lack of knowledge, energy, water, etc) cannot regain them sufficiently. Shigella bacteria are increasingly [...] --- Outline: (00:15) What? Why? (01:18) The deal with human challenge trials (02:46) Dysentery: it's a modern disease (04:27) Getting ready (07:25) Two days until challenge (10:19) One day before challenge: the age of phage (11:08) Bacteriophage therapy: sending a cat after mice (14:14) Do they work? (16:17) Day 1 of challenge (17:09) The waiting game (18:20) Let's learn about Shigella pathogenesis (23:34) Let's really learn about Shigella pathogenesis (27:03) Out the other side (29:24) Aftermath. The original text contained 3 footnotes which were omitted from this narration. The original text contained 2 images which were described by AI. --- First published: October 22nd, 2024 Source: https://www.lesswrong.com/posts/inHiHHGs6YqtvyeKp/i-got-dysentery-so-you-don-t-have-to --- Narrated by TYPE III AUDIO.
10/24/24 • 31:39
This is a link post.
Part 1: Our Thinking
Near and Far
1. Abstract/Distant Future Bias
2. Abstractly Ideal, Concretely Selfish
3. We Add Near, Average Far
4. Why We Don't Know What We Want
5. We See the Sacred from Afar, to See It Together
6. The Future Seems Shiny
7. Doubting My Far Mind
Disagreement
8. Beware the Inside View
9. Are Meta Views Outside Views?
10. Disagreement Is Near-Far Bias
11. Others' Views Are Detail
12. Why Be Contrarian?
13. On Disagreement, Again
14. Rationality Requires Common Priors
15. Might Disagreement Fade Like Violence?
Biases
16. Reject Random Beliefs
17. Chase Your Reading
18. Against Free Thinkers
19. Eventual Futures
20. Seen vs. Unseen Biases
21. Law as No-Bias Theatre
22. Benefit of Doubt = Bias
Part 2: Our Motives
Signaling
23. Decision Theory Remains Neglected
24. What Function Music?
25. Politics isn't about Policy
26. Views [...]
--- Outline: (00:07) Part 1: Our Thinking (00:12) Near and Far (00:37) Disagreement (01:04) Biases (01:28) Part 2: Our Motives (01:33) Signaling (02:01) Norms (02:35) Fiction (02:58) The Dreamtime (03:19) Part 3: Our Institutions (03:25) Prediction Markets (03:48) Academia (04:06) Medicine (04:15) Paternalism (04:29) Law (05:21) Part 4: Our Past (05:26) Farmers and Foragers (05:55) History as Exponential Modes (06:09) The Great Filter (06:35) Part 5: Our Future (06:39) Aliens (07:01) UFOs (07:22) The Age of Em (07:44) Artificial Intelligence --- First published: October 20th, 2024 Source: https://www.lesswrong.com/posts/JxsJdBnL2gG5oa2Li/overcoming-bias-anthology --- Narrated by TYPE III AUDIO.
10/23/24 • 08:33
Of all the cognitive tools our ancestors left us, what's best? Society seems to think pretty highly of arithmetic. It's one of the first things we learn as children. So I think it's weird that only a tiny percentage of people seem to know how to actually use arithmetic. Or maybe even understand what arithmetic is for. Why? I think the problem is the idea that arithmetic is about “calculating”. No! Arithmetic is a world-modeling technology. Arguably, it's the best world-modeling technology: It's simple, it's intuitive, and it applies to everything. It allows you to trespass into scientific domains where you don’t belong. It even has an amazing error-catching mechanism built in. One hundred years ago, maybe it was important to learn long division. But the point of long division was to enable you to do world-modeling. Computers don’t make arithmetic obsolete. If anything, they do the opposite. Without [...] --- Outline: (01:17) Chimps (06:18) Big blocks (09:34) More big blocks. The original text contained 5 images which were described by AI. --- First published: October 17th, 2024 Source: https://www.lesswrong.com/posts/r2LojHBs3kriafZWi/arithmetic-is-an-underrated-world-modeling-technology --- Narrated by TYPE III AUDIO.
10/22/24 • 12:20
This post starts out pretty gloomy but ends up with some points that I feel pretty positive about. Day to day, I'm more focussed on the positive points, but awareness of the negative has been crucial to forming my priorities, so I'm going to start with those. It's mostly addressed to the EA community, but is hopefully somewhat of interest to LessWrong and the Alignment Forum as well. My main concerns: I think AGI is going to be developed soon, and quickly. Possibly (20%) that's next year, and most likely (80%) before the end of 2029. These are not things you need to believe for yourself in order to understand my view, so no worries if you're not personally convinced of this. (For what it's worth, I did arrive at this view through years of study and research in AI, combined with over a decade of private forecasting practice [...] --- Outline: (00:28) My main concerns (03:41) Extinction by industrial dehumanization (06:00) Successionism as a driver of industrial dehumanization (11:08) My theory of change: confronting successionism with human-specific industries (15:53) How I identified healthcare as the industry most relevant to caring for humans (20:00) But why not just do safety work with big AI labs or governments? (23:22) Conclusion. The original text contained 1 image which was described by AI. --- First published: October 12th, 2024 Source: https://www.lesswrong.com/posts/Kobbt3nQgv3yn29pr/my-theory-of-change-for-working-in-ai-healthtech --- Narrated by TYPE III AUDIO.
10/15/24 • 25:15
This post focuses on philosophical objections to Bayesianism as an epistemology. I first explain Bayesianism and some standard objections to it, then lay out my two main objections (inspired by ideas in philosophy of science). A follow-up post will speculate about how to formalize an alternative. Degrees of belief: The core idea of Bayesianism: we should ideally reason by assigning credences to propositions which represent our degrees of belief that those propositions are true. If that seems like a sufficient characterization to you, you can go ahead and skip to the next section, where I explain my objections to it. But for those who want a more precise description of Bayesianism, and some existing objections to it, I’ll more specifically characterize it in terms of five subclaims. Bayesianism says that we should ideally reason in terms of: Propositions which are either true or false (classical logic); Each of [...] --- Outline: (00:22) Degrees of belief (04:06) Degrees of truth (08:05) Model-based reasoning (13:43) The role of Bayesianism. The original text contained 1 image which was described by AI. --- First published: October 6th, 2024 Source: https://www.lesswrong.com/posts/TyusAoBMjYzGN3eZS/why-i-m-not-a-bayesian --- Narrated by TYPE III AUDIO.
10/15/24 • 17:47