Meta has released a new AI model that can label and track any object in a video as it moves. Segment Anything Model 2 (SAM 2) expands the capabilities of its predecessor, SAM, which was limited to still images, and opens up new opportunities for video editing and analysis.
SAM 2's real-time segmentation is a major technical advance with far-reaching potential. It demonstrates how AI can process moving images and distinguish between on-screen elements even as they move in and out of frame.
Segmentation is the term for how software determines which pixels in an image belong to which objects. An AI model that can do this makes complex image processing and editing much easier, and that was the breakthrough of Meta's original SAM. SAM has been used to segment sonar images of coral reefs, analyze satellite imagery for disaster relief efforts, and even detect skin cancer in cell phone photos.
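For a sense of what that looks like in practice, here is a minimal sketch using Meta's open-source segment_anything package for the original SAM. The checkpoint file, image path, and click coordinates are illustrative assumptions, not details from the article: a single "click" prompt asks the model which pixels belong to the object under that point.

    import numpy as np
    import cv2
    from segment_anything import sam_model_registry, SamPredictor

    # Load a pretrained SAM checkpoint (variant and file name are assumptions;
    # substitute whatever checkpoint you have downloaded).
    sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
    predictor = SamPredictor(sam)

    # Hand the predictor an RGB image (a hypothetical reef photo here).
    image = cv2.cvtColor(cv2.imread("reef_photo.jpg"), cv2.COLOR_BGR2RGB)
    predictor.set_image(image)

    # One positive click on the object of interest; SAM returns candidate masks.
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[450, 300]]),  # illustrative (x, y) pixel of the click
        point_labels=np.array([1]),           # 1 = foreground point
        multimask_output=True,
    )

    # Each mask is a boolean array the size of the image: True where the pixel
    # belongs to the selected object, False elsewhere.
    best_mask = masks[np.argmax(scores)]

That per-pixel mask is the "segment" in Segment Anything: once software knows exactly which pixels make up an object, editing or analyzing that object becomes far simpler.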
SAM 2 extends segmentation to video, which is no small feat and wouldn't have been possible until very recently. As part of SAM 2's debut, Meta shared a database of 50,000 videos created to train the model, on top of the 100,000 other videos Meta said it used. Beyond the training data, real-time video segmentation requires a significant amount of computing power, so while SAM 2 is open and free right now, it likely won't stay that way forever.
Segment success
With SAM 2, video editors could isolate and manipulate objects within a scene far more easily than current editing software allows, and without adjusting every frame by hand. Meta anticipates that SAM 2 will also transform interactive video: users could select and manipulate objects in live footage or virtual spaces thanks to the AI model.
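As a rough illustration of that workflow, here is a sketch based on Meta's open-source sam2 repository. The config name, checkpoint path, frame directory, and click coordinates are assumptions for the example: the user clicks an object once in the first frame, and the model propagates that object's mask through the rest of the video.

    import numpy as np
    import torch
    from sam2.build_sam import build_sam2_video_predictor

    # Config and checkpoint names follow Meta's sam2 release; exact paths are assumptions.
    predictor = build_sam2_video_predictor("sam2_hiera_l.yaml", "./checkpoints/sam2_hiera_large.pt")

    with torch.inference_mode():
        # The video is supplied as a directory of extracted JPEG frames.
        state = predictor.init_state(video_path="./video_frames")

        # One click on the object in the first frame tells SAM 2 what to track.
        predictor.add_new_points(
            inference_state=state,
            frame_idx=0,
            obj_id=1,
            points=np.array([[210, 350]], dtype=np.float32),  # illustrative (x, y) click
            labels=np.array([1], dtype=np.int32),              # 1 = foreground
        )

        # SAM 2 then propagates the mask across the video, yielding a per-pixel
        # mask for the tracked object on every frame, even as it moves.
        for frame_idx, object_ids, mask_logits in predictor.propagate_in_video(state):
            masks = (mask_logits > 0.0).cpu().numpy()
            # masks[i] covers object_ids[i] in this frame; an editor could now
            # recolor, blur, or cut out that object frame by frame.

The key difference from the image-only SAM is that propagation step: a single prompt carries the object identity forward through time instead of requiring a new selection on every frame.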
Meta believes that SAM 2 could also play a crucial role in the development and training of machine vision systems, particularly in autonomous vehicles. Accurate and efficient object tracking is essential for these systems to safely interpret and navigate their environments. SAM 2’s capabilities could speed up the process of annotating visual data, providing high-quality training data for these AI systems.
Much of the excitement about AI video revolves around generating videos from text prompts. Models like OpenAI's Sora, Runway's generators, and Google's Veo are getting a lot of attention for a reason. Still, the kind of editing capabilities offered by SAM 2 could play an even bigger role in bringing AI to video creation.
And while Meta may have a head start now, other AI video developers are interested in producing their own versions. For example, Google's recent research has led to video summarization and object recognition features that it's testing on YouTube. Adobe's Firefly AI tools also focus on photo and video editing, with features such as content-aware fill and automatic reframing.