Skip to content Skip to sidebar Skip to footer

Salesforce AI Introduces FOFPred: A Language-Driven Future Optical Flow Prediction Framework that Enables Improved Robot Control and Video Generation

Salesforce AI research team present FOFPred, a language driven future optical flow prediction framework that connects large vision language models with diffusion transformers for dense motion forecasting in control and video generation settings. FOFPred takes one or more images and a natural language instruction such as ‘moving the bottle from right to left’ and predicts…

Read More

Black Forest Labs Releases FLUX.2 [klein]: Compact Flow Models for Interactive Visual Intelligence

Black Forest Labs releases FLUX.2 [klein], a compact image model family that targets interactive visual intelligence on consumer hardware. FLUX.2 [klein] extends the FLUX.2 line with sub second generation and editing, a unified architecture for text to image and image to image, and deployment options that range from local GPUs to cloud APIs, while keeping…

Read More

AlphaFold: Five Years of Impact

Increasing speed of discovery Cyril Zipfel, professor of Molecular & Cellular Plant Physiology at the University of Zurich and Sainsbury Lab, saw research timelines shrink drastically. They used AlphaFold alongside comparative genomics to better understand how plants perceive changes in their environment, paving the way for more resilient crops. AlphaFold has been cited in more…

Read More

New video generation model updates

Today, Veo is getting more expressive, with improvements that help you create more fun, creative, high-quality videos based on ingredient images, built directly for the mobile format. We’re excited to bring new creative possibilities for everyone from casual storytellers to professional filmmakers. We’re releasing: Improvements to Veo 3.1 Ingredients to Video, our capability that lets…

Read More

Evaluating OCR-to-Markdown Systems Is Fundamentally Broken (and Why That’s Hard to Fix)

Evaluating OCR systems that convert PDFs or document images into Markdown is far more complex than it appears. Unlike plain text OCR, OCR-to-Markdown requires models to recover content, layout, reading order, and representation choices simultaneously. Today’s benchmarks attempt to score this with a mix of string matching, heuristic alignment, and format-specific rules—but in practice, these…

Read More