Test-Time Adaptation: the key to reasoning with DL (Mohamed Osman) – Machine Learning Street Talk (MLST) – PC.ST

Mohamed Osman joins to discuss MindsAI's highest-scoring entry to the ARC challenge 2024 and the paradigm of test-time fine-tuning. The conversation explores how the team, now part of Tufa Labs in Zurich, achieved state-of-the-art results using a combination of pre-training techniques, a unique meta-learning strategy, and an ensemble voting mechanism. Mohamed emphasizes the importance of raw data input and the flexibility of the network.

SPONSOR MESSAGES:

***

Tufa AI Labs is a brand-new research lab in Zurich, started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. They are hiring a Chief Engineer and ML engineers, and they hold events in Zurich.

Go to https://tufalabs.ai/

***

TRANSCRIPT + REFS:

https://www.dropbox.com/scl/fi/jeavyqidsjzjgjgd7ns7h/MoFInal.pdf?rlkey=cjjmo7rgtenxrr3b46nk6yq2e&dl=0

Mohamed Osman (Tufa Labs)

https://x.com/MohamedOsmanML

Jack Cole (Tufa Labs)

https://x.com/MindsAI_Jack

"How and why deep learning for ARC" paper:

https://github.com/MohamedOsman1998/deep-learning-for-arc/blob/main/deep_learning_for_arc.pdf

TOC:

1. Abstract Reasoning Foundations

1.1 Test-Time Fine-Tuning and ARC Challenge Overview

1.2 Neural Networks vs Programmatic Approaches to Reasoning

1.3 Code-Based Learning and Meta-Model Architecture

1.4 Technical Implementation with Long T5 Model

2. ARC Solution Architectures

2.1 Test-Time Tuning and Voting Methods for ARC Solutions

2.2 Model Generalization and Function Generation Challenges

2.3 Input Representation and VLM Limitations

2.4 Architecture Innovation and Cross-Modal Integration

2.5 Future of ARC Challenge and Program Synthesis Approaches

3. Advanced Systems Integration

3.1 DreamCoder Evolution and LLM Integration

3.2 MindsAI Team Progress and Acquisition by Tufa Labs

3.3 ARC v2 Development and Performance Scaling

3.4 Intelligence Benchmarks and Transformer Limitations

3.5 Neural Architecture Optimization and Processing Distribution

REFS:

Original ARC challenge paper, François Chollet

https://arxiv.org/abs/1911.01547

DreamCoder, Kevin Ellis et al.

https://arxiv.org/abs/2006.08381

Deep Learning with Python, François Chollet

https://www.amazon.com/Deep-Learning-Python-Francois-Chollet/dp/1617294438

Influence of pretraining data for reasoning, Laura Ruis

https://arxiv.org/abs/2411.12580

Latent Program Networks, Clement Bonnet

https://arxiv.org/html/2411.08706v1

T5, Colin Raffel et al.

https://arxiv.org/abs/1910.10683

Combining Induction and Transduction for Abstract Reasoning, Wen-Ding Li, Kevin Ellis et al.

https://arxiv.org/abs/2411.02272

Six finger problem, Chen et al.

https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_SpatialVLM_Endowing_Vision-Language_Models_with_Spatial_Reasoning_Capabilities_CVPR_2024_paper.pdf

DeepSeek-R1-Distill-Llama, DeepSeek AI

https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B

ARC Prize 2024 Technical Report, François Chollet et al.

https://arxiv.org/html/2412.04604v2

LLM-Guided Compositional Program Synthesis, Wen-Ding Li and Kevin Ellis

https://arxiv.org/html/2503.15540

Abstraction and Reasoning Corpus, François Chollet

https://github.com/fchollet/ARC-AGI

O3 breakthrough on ARC-AGI, OpenAI

https://arcprize.org/

ConceptARC Benchmark, Arseny Moskvichev, Melanie Mitchell

https://arxiv.org/abs/2305.07141

Mixtape: Breaking the Softmax Bottleneck Efficiently, Zhilin Yang, Zihang Dai, Ruslan Salakhutdinov, William W. Cohen

http://papers.neurips.cc/paper/9723-mixtape-breaking-the-softmax-bottleneck-efficiently.pdf

Episode length: 01:03:36