Schedule
Subscribe to our calendar for the most recent schedule.
November 27th 3:00pm CET - Jacek Golebiowski
Title: AutoML without data: training small models with 10 examples
Abstract:
Organizations need AI that works on their messy data—logs, images, notes, reports—but can’t afford to send sensitive information to the cloud. Small Language Models (SLMs) run locally on your own hardware, keeping data private while delivering fast results at a fraction of the cost. Traditionally, building custom AI for production tasks requires expert teams and months of development. AutoML addresses part of this by automating model selection and training, but practitioners still face the bottleneck of manually labeling thousands of training examples. This is why ChatGPT succeeded: it only asks you to describe what you want, not provide labeled datasets. We believe custom models need to match this experience, so in this session we present our model training pipeline that extends AutoML to data preparation itself. You define the problem, and a larger AI “teacher” automatically generates and refines training examples to create a specialized “student” model tailored to tasks like request triage, API chat interfaces, and data transformations. We’ll dive deeper into the structure of the generated data and demonstrate how to effectively navigate data generation by controlling the latent variables that define each datapoint.
December 11th 3:00pm CET - Robert Tjarko Lange
Title: ShinkaEvolve: Towards Open-Ended And Sample-Efficient Program Evolution
Abstract:
We introduce ShinkaEvolve: a new open-source framework leveraging large language models (LLMs) to advance scientific discovery with state-of-the-art performance and unprecedented efficiency. Recent advances in scaling the inference-time compute of LLMs have enabled significant progress in generalized scientific discovery. These approaches rely on evolutionary agentic harnesses that leverage LLMs as mutation operators to generate candidate solutions. However, current code evolution methods suffer from critical limitations: they are sample inefficient, requiring thousands of samples to identify effective solutions, and remain closed-source, hindering broad adoption and extension. ShinkaEvolve addresses these limitations, introducing three key innovations: a parent sampling technique balancing exploration and exploitation, code novelty rejection-sampling for efficient search space exploration, and a bandit-based LLM ensemble selection strategy. We evaluate ShinkaEvolve across diverse tasks, demonstrating consistent improvements in sample efficiency and solution quality. ShinkaEvolve discovers a new state-of-the-art circle packing solution using only 150 samples, designs high-performing agentic harnesses for AIME mathematical reasoning tasks, identifies improvements to ALE-Bench competitive programming solutions, and discovers novel mixture-of-experts load-balancing loss functions that illuminate the space of optimization strategies. Our results demonstrate that ShinkaEvolve achieves broad applicability with exceptional sample efficiency. Finally, ShinkaEvolve recently helped human programmers (team Unagi) win the 2025 ICFP Competitive Programming Contest by automatically optimizing SAT solver encodings.
January 15th 3:00pm CET - Andrei Paleyes
Title: TBA
Abstract: TBA
January 22nd 3:00pm CET - Pieter Gijsbers
Title: TBA
Abstract: TBA