A quote from Jack Clark

The most surprising part of DeepSeek-R1 is that it only takes ~800k samples of 'good' RL reasoning to convert other models into RL-reasoners. Now that DeepSeek-R1 is available people will be able to refine samples out of it to convert any other model into an RL reasoner.

— Jack Clark

Posted 28th January 2025 at 6:46 am

Simon Willison’s Weblog
