In this tutorial, we explore advanced computer vision techniques using TorchVision’s v2 transforms, modern augmentation strategies, and powerful training enhancements. We walk through the process of building an augmentation pipeline, applying MixUp and CutMix, designing a modern CNN with attention, and implementing a robust training loop. By running everything seamlessly in Google Colab, we position…
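As a rough illustration of the MixUp idea mentioned above, here is a minimal plain-Python sketch. It is not TorchVision's v2 API and not the tutorial's own code. The function name `mixup`, the toy four-value "images", and the `alpha=0.2` default are all assumptions made for this example.

```python
import random


def mixup(x_a, y_a, x_b, y_b, alpha=0.2, rng=random):
    """Blend two samples and their one-hot labels (hypothetical helper).

    The mixing weight lam is drawn from Beta(alpha, alpha), and the same
    weight is applied to both the inputs and the labels.
    """
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


# Toy data: two flattened "images" with one-hot labels for two classes.
rng = random.Random(0)  # seeded so the run is reproducible
image_a, label_a = [0.0, 0.2, 0.4, 0.6], [1.0, 0.0]
image_b, label_b = [1.0, 0.8, 0.6, 0.4], [0.0, 1.0]

mixed_x, mixed_y, lam = mixup(image_a, label_a, image_b, label_b, rng=rng)
print(f"lam={lam:.3f}", mixed_x, mixed_y)
```

Because the labels are mixed with the same weight as the inputs, the resulting soft label still sums to 1, so it can be used with a loss that accepts class probabilities.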
Acknowledgements
This work was developed by the Gemini Robotics team: Abbas Abdolmaleki, Saminda Abeyruwan, Joshua Ainslie, Jean-Baptiste Alayrac, Montserrat Gonzalez Arenas, Ashwin Balakrishna, Nathan Batchelor, Alex Bewley, Jeff Bingham, Michael Bloesch, Konstantinos Bousmalis, Philemon Brakel, Anthony Brohan, Thomas Buschmann, Arunkumar Byravan, Serkan Cabi, Ken Caluwaerts, Federico Casarini, Christine Chan, Oscar Chang, London Chappellet-Volpini, Jose Enrique…
What Do We Mean by “Physical AI”?
Artificial intelligence in robotics is not just a matter of clever algorithms. Robots operate in the physical world, and their intelligence emerges from the co-design of body and brain. Physical AI describes this integration, where materials, actuation, sensing, and computation shape how learning policies function. The term was…
# Introduction
Finding real-world datasets can be challenging because they are often private (protected), incomplete (missing features), or expensive (behind a paywall). Synthetic datasets can solve these problems by letting you generate the data based on your project needs.
Synthetic data is artificially generated information that mimics real-life…
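To make the idea of generating data to fit a project's needs concrete, the sketch below builds a small synthetic table using only Python's standard library. Everything in it is an assumption for illustration: the function name `make_synthetic_customers`, the column names, the value ranges, and the 20% churn rate are invented and do not come from the article.

```python
import random


def make_synthetic_customers(n_rows, seed=42):
    """Generate a list of made-up customer records (hypothetical schema)."""
    rng = random.Random(seed)  # a fixed seed makes the dataset reproducible
    rows = []
    for customer_id in range(n_rows):
        age = rng.randint(18, 80)
        # Income is loosely tied to age plus Gaussian noise (an arbitrary choice).
        income = round(rng.gauss(20000 + 800 * age, 5000), 2)
        churned = rng.random() < 0.2  # roughly 20% of customers churn
        rows.append(
            {"customer_id": customer_id, "age": age, "income": income, "churned": churned}
        )
    return rows


rows = make_synthetic_customers(5)
for row in rows:
    print(row)
```

Fixing the seed means the same rows come back on every run, which helps when the synthetic data is used in tests or shared examples.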
Computer vision moved fast in 2025: new multimodal backbones, larger open datasets, and tighter model–systems integration. Practitioners need sources that publish rigorously, link code and benchmarks, and track deployment patterns—not marketing posts. This list prioritizes primary research hubs, lab blogs, and production-oriented engineering outlets with consistent update cadence. Use it to monitor SOTA shifts, grab…
We’re expanding our risk domains and refining our risk assessment process. AI breakthroughs are transforming our everyday lives, from advancing mathematics, biology and astronomy to realizing the potential of personalized education. As we build increasingly powerful AI models, we’re committed to responsibly developing our technologies and taking an evidence-based approach to staying ahead of emerging…
In this tutorial, we walk step by step through using Hugging Face’s LeRobot library to train and evaluate a behavior-cloning policy on the PushT dataset. We begin by setting up the environment in Google Colab, installing the required dependencies, and loading the dataset through LeRobot’s unified API. We then design a compact visuomotor policy that…
Efficient and accountable financial management is nonnegotiable in today’s K-12 landscape. Outdated, traditional software packages can’t keep pace with the complex demands of modern schools. Schools must invest in a reliable, integrated finance system that unifies day-to-day operations and promotes efficiency and transparency. Discover six top-rated SaaS financial management tools for K-12 schools.
Fund Management &…
# Introduction
Ready for a practical walkthrough with little to no code involved, depending on the approach you choose? This tutorial shows how to tie together two formidable tools — OpenAI's GPT models and the Airtable cloud-based database — to prototype a simple, toy-sized retrieval-augmented generation (RAG) system.…
How do you create 3D datasets to train AI for robotics without expensive traditional approaches? A team of researchers from NVIDIA released “ViPE: Video Pose Engine for 3D Geometric Perception”, bringing a key improvement for Spatial AI. It addresses the central, agonizing bottleneck that has constrained the field of 3D computer vision for years.
ViPE…