Machine Learning Street Talk (MLST)

Technology

Latest episode

Available Episodes

5 of 214

Prof. Randall Balestriero - LLMs without pretraining and SSL
Randall Balestriero joins the show to discuss some counterintuitive findings in AI. He shares research showing that huge language models, even when started from scratch (randomly initialized) without massive pre-training, can learn specific tasks like sentiment analysis surprisingly well, train stably, and avoid severe overfitting, sometimes matching the performance of costly pre-trained models. This raises questions about when giant pre-training efforts are truly worth it.

He also talks about how self-supervised learning (where models learn from data structure itself) and traditional supervised learning (using labeled data) are fundamentally similar, allowing researchers to apply decades of supervised learning theory to improve newer self-supervised methods.

Finally, Randall touches on fairness in AI models used for Earth data (like climate prediction), revealing that these models can be biased, performing poorly in specific locations like islands or coastlines even if they seem accurate overall, which has important implications for policy decisions based on this data.

SPONSOR MESSAGES:
Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Go to https://tufalabs.ai/

TRANSCRIPT + SHOWNOTES:
https://www.dropbox.com/scl/fi/n7yev71nsjso71jyjz1fy/RANDALLNEURIPS.pdf?rlkey=0dn4injp1sc4ts8njwf3wfmxv&dl=0

TOC:
1. Model Training Efficiency and Scale
[00:00:00] 1.1 Training Stability of Large Models on Small Datasets
[00:04:09] 1.2 Pre-training vs Random Initialization Performance Comparison
[00:07:58] 1.3 Task-Specific Models vs General LLMs Efficiency
2. Learning Paradigms and Data Distribution
[00:10:35] 2.1 Fair Language Model Paradox and Token Frequency Issues
[00:12:02] 2.2 Pre-training vs Single-task Learning Spectrum
[00:16:04] 2.3 Theoretical Equivalence of Supervised and Self-supervised Learning
[00:19:40] 2.4 Self-Supervised Learning and Supervised Learning Relationships
[00:21:25] 2.5 SSL Objectives and Heavy-tailed Data Distribution Challenges
3. Geographic Representation in ML Systems
[00:25:20] 3.1 Geographic Bias in Earth Data Models and Neural Representations
[00:28:10] 3.2 Mathematical Limitations and Model Improvements
[00:30:24] 3.3 Data Quality and Geographic Bias in ML Datasets

REFS:
[00:01:40] Research on training large language models from scratch on small datasets, Randall Balestriero et al. https://openreview.net/forum?id=wYGBWOjq1Q
[00:10:35] The Fair Language Model Paradox (2024), Andrea Pinto, Tomer Galanti, Randall Balestriero https://arxiv.org/abs/2410.11985
[00:12:20] Muppet: Massive Multi-task Representations with Pre-Finetuning (2021), Armen Aghajanyan et al. https://arxiv.org/abs/2101.11038
[00:14:30] Dissociating language and thought in large language models (2023), Kyle Mahowald et al. https://arxiv.org/abs/2301.06627
[00:16:05] The Birth of Self-Supervised Learning: A Supervised Theory, Randall Balestriero et al. https://openreview.net/forum?id=NhYAjAAdQT
[00:21:25] VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning, Adrien Bardes, Jean Ponce, Yann LeCun https://arxiv.org/abs/2105.04906
[00:25:20] No Location Left Behind: Measuring and Improving the Fairness of Implicit Representations for Earth Data (2025), Daniel Cai, Randall Balestriero, et al. https://arxiv.org/abs/2502.06831
[00:33:45] Mark Ibrahim et al.'s work on geographic bias in computer vision datasets, Mark Ibrahim https://arxiv.org/pdf/2304.12210
--------
34:30
How Machines Learn to Ignore the Noise (Kevin Ellis + Zenna Tavares)
Prof. Kevin Ellis and Dr. Zenna Tavares talk about making AI smarter, like humans. They want AI to learn from just a little bit of information by actively trying things out, not just by looking at tons of data.

They discuss two main ways AI can "think": one way is like following specific rules or steps (like a computer program), and the other is more intuitive, like guessing based on patterns (like modern AI often does). They found combining both methods works well for solving complex puzzles like ARC.

A key idea is "compositionality": building big ideas from small ones, like LEGOs. This is powerful but can also be overwhelming. Another important idea is "abstraction": understanding things simply, without getting lost in details, and knowing there are different levels of understanding.

Ultimately, they believe the best AI will need to explore, experiment, and build models of the world, much like humans do when learning something new.

TRANSCRIPT:
https://www.dropbox.com/scl/fi/3ngggvhb3tnemw879er5y/BASIS.pdf?rlkey=lr2zbj3317mex1q5l0c2rsk0h&dl=0

Zenna Tavares: http://www.zenna.org/
Kevin Ellis: https://www.cs.cornell.edu/~ellisk/

TOC:
1. Compositionality and Learning Foundations
[00:00:00] 1.1 Compositional Search and Learning Challenges
[00:03:55] 1.2 Bayesian Learning and World Models
[00:12:05] 1.3 Programming Languages and Compositionality Trade-offs
[00:15:35] 1.4 Inductive vs Transductive Approaches in AI Systems
2. Neural-Symbolic Program Synthesis
[00:27:20] 2.1 Integration of LLMs with Traditional Programming and Meta-Programming
[00:30:43] 2.2 Wake-Sleep Learning and DreamCoder Architecture
[00:38:26] 2.3 Program Synthesis from Interactions and Hidden State Inference
[00:41:36] 2.4 Abstraction Mechanisms and Resource Rationality
[00:48:38] 2.5 Inductive Biases and Causal Abstraction in AI Systems
3. Abstract Reasoning Systems
[00:52:10] 3.1 Abstract Concepts and Grid-Based Transformations in ARC
[00:56:08] 3.2 Induction vs Transduction Approaches in Abstract Reasoning
[00:59:12] 3.3 ARC Limitations and Interactive Learning Extensions
[01:06:30] 3.4 Wake-Sleep Program Learning and Hybrid Approaches
[01:11:37] 3.5 Project MARA and Future Research Directions

REFS:
[00:00:25] DreamCoder, Kevin Ellis et al. https://arxiv.org/abs/2006.08381
[00:01:10] Mind Your Step, Ryan Liu et al. https://arxiv.org/abs/2410.21333
[00:06:05] Bayesian inference, Griffiths, T. L., Kemp, C., & Tenenbaum, J. B. https://psycnet.apa.org/record/2008-06911-003
[00:13:00] Induction and Transduction, Wen-Ding Li, Zenna Tavares, Yewen Pu, Kevin Ellis https://arxiv.org/abs/2411.02272
[00:23:15] Neurosymbolic AI, Garcez, Artur d'Avila et al. https://arxiv.org/abs/2012.05876
[00:33:50] Induction and Transduction (II), Wen-Ding Li, Kevin Ellis et al. https://arxiv.org/abs/2411.02272
[00:38:35] ARC, François Chollet https://arxiv.org/abs/1911.01547
[00:39:20] Causal Reactive Programs, Ria Das, Joshua B. Tenenbaum, Armando Solar-Lezama, Zenna Tavares http://www.zenna.org/publications/autumn2022.pdf
[00:42:50] MuZero, Julian Schrittwieser et al. http://arxiv.org/pdf/1911.08265
[00:43:20] VisualPredicator, Yichao Liang https://arxiv.org/abs/2410.23156
[00:48:55] Bayesian models of cognition, Joshua B. Tenenbaum https://mitpress.mit.edu/9780262049412/bayesian-models-of-cognition/
[00:49:30] The Bitter Lesson, Rich Sutton http://www.incompleteideas.net/IncIdeas/BitterLesson.html
[01:06:35] Program induction, Kevin Ellis, Wen-Ding Li https://arxiv.org/pdf/2411.02272
[01:06:50] DreamCoder (II), Kevin Ellis et al. https://arxiv.org/abs/2006.08381
[01:11:55] Project MARA, Zenna Tavares, Kevin Ellis https://www.basis.ai/blog/mara/
--------
1:16:55
Eiso Kant (CTO poolside) - Superhuman Coding Is Coming!
Eiso Kant, CTO of poolside AI, discusses the company's approach to building frontier AI foundation models, particularly focused on software development. Their unique strategy is reinforcement learning from code execution feedback, which is an important axis for scaling AI capabilities beyond just increasing model size or data volume. Kant predicts human-level AI in knowledge work could be achieved within 18-36 months, outlining poolside's vision to dramatically increase software development productivity and accessibility.

Eiso Kant:
https://x.com/eisokant
https://poolside.ai/

TRANSCRIPT:
https://www.dropbox.com/scl/fi/szepl6taqziyqie9wgmk9/poolside.pdf?rlkey=iqar7dcwshyrpeoz0xa76k422&dl=0

TOC:
1. Foundation Models and AI Strategy
[00:00:00] 1.1 Foundation Models and Timeline Predictions for AI Development
[00:02:55] 1.2 Poolside AI's Corporate History and Strategic Vision
[00:06:48] 1.3 Foundation Models vs Enterprise Customization Trade-offs
2. Reinforcement Learning and Model Economics
[00:15:42] 2.1 Reinforcement Learning and Code Execution Feedback Approaches
[00:22:06] 2.2 Model Economics and Experimental Optimization
3. Enterprise AI Implementation
[00:25:20] 3.1 Poolside's Enterprise Deployment Strategy and Infrastructure
[00:26:00] 3.2 Enterprise-First Business Model and Market Focus
[00:27:05] 3.3 Foundation Models and AGI Development Approach
[00:29:24] 3.4 DeepSeek Case Study and Infrastructure Requirements
4. LLM Architecture and Performance
[00:30:15] 4.1 Distributed Training and Hardware Architecture Optimization
[00:33:01] 4.2 Model Scaling Strategies and Chinchilla Optimality Trade-offs
[00:36:04] 4.3 Emergent Reasoning and Model Architecture Comparisons
[00:43:26] 4.4 Balancing Creativity and Determinism in AI Models
[00:50:01] 4.5 AI-Assisted Software Development Evolution
5. AI Systems Engineering and Scalability
[00:58:31] 5.1 Enterprise AI Productivity and Implementation Challenges
[00:58:40] 5.2 Low-Code Solutions and Enterprise Hiring Trends
[01:01:25] 5.3 Distributed Systems and Engineering Complexity
[01:01:50] 5.4 GenAI Architecture and Scalability Patterns
[01:01:55] 5.5 Scaling Limitations and Architectural Patterns in AI Code Generation
6. AI Safety and Future Capabilities
[01:06:23] 6.1 Semantic Understanding and Language Model Reasoning Approaches
[01:12:42] 6.2 Model Interpretability and Safety Considerations in AI Systems
[01:16:27] 6.3 AI vs Human Capabilities in Software Development
[01:33:45] 6.4 Enterprise Deployment and Security Architecture

CORE REFS (see shownotes for URLs/more refs):
[00:15:45] Research demonstrating how training on model-generated content leads to distribution collapse in AI models, Ilia Shumailov et al. (Key finding on synthetic data risk)
[00:20:05] Foundational paper introducing Word2Vec for computing word vector representations, Tomas Mikolov et al. (Seminal NLP technique)
[00:22:15] OpenAI O3 model's breakthrough performance on ARC Prize Challenge, OpenAI (Significant AI reasoning benchmark achievement)
[00:22:40] Seminal paper proposing a formal definition of intelligence as skill-acquisition efficiency, François Chollet (Influential AI definition/philosophy)
[00:30:30] Technical documentation of DeepSeek's V3 model architecture and capabilities, DeepSeek AI (Details on a major new model)
[00:34:30] Foundational paper establishing optimal scaling laws for LLM training, Jordan Hoffmann et al. (Key paper on LLM scaling)
[00:45:45] Seminal essay arguing that scaling computation consistently trumps human-engineered solutions in AI, Richard S. Sutton (Influential "Bitter Lesson" perspective)
<trunc - see PDF shownotes>
--------
1:36:28
The Compendium - Connor Leahy and Gabriel Alfour
Connor Leahy and Gabriel Alfour, AI researchers from Conjecture and authors of "The Compendium," join us for a critical discussion centered on Artificial Superintelligence (ASI) safety and governance. Drawing from their comprehensive analysis in "The Compendium," they articulate a stark warning about the existential risks inherent in uncontrolled AI development, framing it through the lens of "intelligence domination," where a sufficiently advanced AI could subordinate humanity, much like humans dominate less intelligent species.

TRANSCRIPT + REFS + NOTES:
https://www.dropbox.com/scl/fi/p86l75y4o2ii40df5t7no/Compendium.pdf?rlkey=tukczgf3flw133sr9rgss0pnj&dl=0
https://www.thecompendium.ai/
https://en.wikipedia.org/wiki/Connor_Leahy
https://www.conjecture.dev/about
https://substack.com/@gabecc

TOC:
1. AI Intelligence and Safety Fundamentals
[00:00:00] 1.1 Understanding Intelligence and AI Capabilities
[00:06:20] 1.2 Emergence of Intelligence and Regulatory Challenges
[00:10:18] 1.3 Human vs Animal Intelligence Debate
[00:18:00] 1.4 AI Regulation and Risk Assessment Approaches
[00:26:14] 1.5 Competing AI Development Ideologies
2. Economic and Social Impact
[00:29:10] 2.1 Labor Market Disruption and Post-Scarcity Scenarios
[00:32:40] 2.2 Institutional Frameworks and Tech Power Dynamics
[00:37:40] 2.3 Ethical Frameworks and AI Governance Debates
[00:40:52] 2.4 AI Alignment Evolution and Technical Challenges
3. Technical Governance Framework
[00:55:07] 3.1 Three Levels of AI Safety: Alignment, Corrigibility, and Boundedness
[00:55:30] 3.2 Challenges of AI System Corrigibility and Constitutional Models
[00:57:35] 3.3 Limitations of Current Boundedness Approaches
[00:59:11] 3.4 Abstract Governance Concepts and Policy Solutions
4. Democratic Implementation and Coordination
[00:59:20] 4.1 Governance Design and Measurement Challenges
[01:00:10] 4.2 Democratic Institutions and Experimental Governance
[01:14:10] 4.3 Political Engagement and AI Safety Advocacy
[01:25:30] 4.4 Practical AI Safety Measures and International Coordination

CORE REFS:
[00:01:45] The Compendium (2023), Leahy et al. https://pdf.thecompendium.ai/the_compendium.pdf
[00:06:50] Geoffrey Hinton Leaves Google, BBC News https://www.bbc.com/news/world-us-canada-65452940
[00:10:00] ARC-AGI, Chollet https://arcprize.org/arc-agi
[00:13:25] A Brief History of Intelligence, Bennett https://www.amazon.com/Brief-History-Intelligence-Humans-Breakthroughs/dp/0063286343
[00:25:35] Statement on AI Risk, Center for AI Safety https://www.safe.ai/work/statement-on-ai-risk
[00:26:15] Machines of Loving Grace, Amodei https://darioamodei.com/machines-of-loving-grace
[00:26:35] The Techno-Optimist Manifesto, Andreessen https://a16z.com/the-techno-optimist-manifesto/
[00:31:55] Techno-Feudalism, Varoufakis https://www.amazon.co.uk/Technofeudalism-Killed-Capitalism-Yanis-Varoufakis/dp/1847927270
[00:42:40] Introducing Superalignment, OpenAI https://openai.com/index/introducing-superalignment/
[00:47:20] Three Laws of Robotics, Asimov https://www.britannica.com/topic/Three-Laws-of-Robotics
[00:50:00] Symbolic AI (GOFAI), Haugeland https://en.wikipedia.org/wiki/Symbolic_artificial_intelligence
[00:52:30] Intent Alignment, Christiano https://www.alignmentforum.org/posts/HEZgGBZTpT4Bov7nH/mapping-the-conceptual-territory-in-ai-existential-safety
[00:55:10] Large Language Model Alignment: A Survey, Jiang et al. http://arxiv.org/pdf/2309.15025
[00:55:40] Constitutional Checks and Balances, Bok https://plato.stanford.edu/entries/montesquieu/
<trunc, see PDF>
--------
1:37:10
ARC Prize v2 Launch! (Francois Chollet and Mike Knoop)
We are joined by Francois Chollet and Mike Knoop to launch the new version of the ARC Prize! In version 2, the challenges have been calibrated with humans such that at least 2 humans could solve each task in a reasonable time, but also adversarially selected so that frontier reasoning models can't solve them. The best LLMs today get negligible performance on this challenge.
https://arcprize.org/

TRANSCRIPT:
https://www.dropbox.com/scl/fi/0v9o8xcpppdwnkntj59oi/ARCv2.pdf?rlkey=luqb6f141976vra6zdtptv5uj&dl=0

TOC:
1. ARC v2 Core Design & Objectives
[00:00:00] 1.1 ARC v2 Launch and Benchmark Architecture
[00:03:16] 1.2 Test-Time Optimization and AGI Assessment
[00:06:24] 1.3 Human-AI Capability Analysis
[00:13:02] 1.4 OpenAI o3 Initial Performance Results
2. ARC Technical Evolution
[00:17:20] 2.1 ARC-v1 to ARC-v2 Design Improvements
[00:21:12] 2.2 Human Validation Methodology
[00:26:05] 2.3 Task Design and Gaming Prevention
[00:29:11] 2.4 Intelligence Measurement Framework
3. O3 Performance & Future Challenges
[00:38:50] 3.1 O3 Comprehensive Performance Analysis
[00:43:40] 3.2 System Limitations and Failure Modes
[00:49:30] 3.3 Program Synthesis Applications
[00:53:00] 3.4 Future Development Roadmap

REFS:
[00:00:15] On the Measure of Intelligence, François Chollet https://arxiv.org/abs/1911.01547
[00:06:45] ARC Prize Foundation, François Chollet, Mike Knoop https://arcprize.org/
[00:12:50] OpenAI o3 model performance on ARC v1, ARC Prize Team https://arcprize.org/blog/oai-o3-pub-breakthrough
[00:18:30] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, Jason Wei et al. https://arxiv.org/abs/2201.11903
[00:21:45] ARC-v2 benchmark tasks, Mike Knoop https://arcprize.org/blog/introducing-arc-agi-public-leaderboard
[00:26:05] ARC Prize 2024: Technical Report, Francois Chollet et al. https://arxiv.org/html/2412.04604v2
[00:32:45] ARC Prize 2024 Technical Report, Francois Chollet, Mike Knoop, Gregory Kamradt https://arxiv.org/abs/2412.04604
[00:48:55] The Bitter Lesson, Rich Sutton http://www.incompleteideas.net/IncIdeas/BitterLesson.html
[00:53:30] Decoding strategies in neural text generation, Sina Zarrieß https://www.mdpi.com/2078-2489/12/9/355/pdf
--------
54:15

About Machine Learning Street Talk (MLST)

Welcome! We engage in fascinating discussions with pre-eminent figures in the AI field. Our flagship show covers current affairs in AI, cognitive science, neuroscience and philosophy of mind with in-depth analysis. Our approach is unrivalled in terms of scope and rigour – we believe in intellectual diversity in AI, and we touch on all of the main ideas in the field with the hype surgically removed. MLST is run by Tim Scarfe, Ph.D (https://www.linkedin.com/in/ecsquizor/) and features regular appearances from MIT Doctor of Philosophy Keith Duggar (https://www.linkedin.com/in/dr-keith-duggar/).
