A person wearing a Superman outfit stands tall, arms crossed over the chest. The person's chin is lifted, eyes locked on the horizon with a determined gaze. A confident smirk appears as the cape flutters slightly in the breeze.
We present Concat-ID, a unified framework for identity-preserving video generation. Concat-ID employs Variational Autoencoders to extract image features, which are concatenated with video latents along the sequence dimension, leveraging solely 3D self-attention mechanisms without the need for additional modules. A novel cross-video pairing strategy and a multi-stage training regimen are introduced to balance identity consistency and facial editability while enhancing video naturalness. Extensive experiments demonstrate Concat-ID's superiority over existing methods in both single and multi-identity generation, as well as its seamless scalability to multi-subject scenarios, including virtual try-on and background-controllable generation. Concat-ID establishes a new benchmark for identity-preserving video synthesis, providing a versatile and scalable solution for a wide range of applications.
We use a VAE to extract image latents from reference images and concatenate them to the end of the video latents along the sequence dimension. Concat-ID relies solely on 3D self-attention, which is commonly present in state-of-the-art video generation models, without introducing additional modules or parameters.
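The concatenation step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the single-head attention, the tensor shapes, the random stand-ins for VAE latents, and the function name `concat_id_attention` are all assumptions made for illustration. Keeping only the video positions of the attention output is likewise an assumption.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, w_q, w_k, w_v):
    # Single-head self-attention over a (seq_len, dim) token matrix.
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def concat_id_attention(video_latents, image_latents, w_q, w_k, w_v):
    # Hypothetical helper: video_latents is (T, H, W, C), image_latents is (N, H, W, C).
    t, h, w, c = video_latents.shape
    video_tokens = video_latents.reshape(t * h * w, c)
    image_tokens = image_latents.reshape(-1, c)
    # Reference-image tokens are appended after the video tokens
    # along the sequence dimension.
    tokens = np.concatenate([video_tokens, image_tokens], axis=0)
    # One attention pass over the joint sequence: every video token can
    # attend to every reference token, with no extra module involved.
    out = self_attention(tokens, w_q, w_k, w_v)
    # Assumption: only the video positions are kept as the output.
    return out[: t * h * w].reshape(t, h, w, c)

rng = np.random.default_rng(0)
T, H, W, C = 4, 2, 2, 8
video = rng.standard_normal((T, H, W, C))      # stand-in for VAE video latents
reference = rng.standard_normal((1, H, W, C))  # stand-in for a VAE image latent
w_q, w_k, w_v = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))

out = concat_id_attention(video, reference, w_q, w_k, w_v)
print(out.shape)  # (4, 2, 2, 8)
```

Because the reference tokens only lengthen the sequence, the same attention weights serve both video and image tokens, which is why no additional parameters are needed in this sketch.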
To progressively balance identity consistency and facial editability, we construct three types of image-video pairs: pre-training pairs for learning conditional mapping, cross-video pairs for improving face editability, and trade-off pairs for finely balancing identity consistency and face editability.
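As a rough illustration of the cross-video pairs, the sketch below assumes that such a pair matches a target video with a reference face taken from a different video of the same identity. The text above does not spell out the construction, so this reading, the `(identity_id, video_id, face)` record layout, and the function name `build_cross_video_pairs` are all hypothetical.

```python
import random
from collections import defaultdict

def build_cross_video_pairs(clips, seed=0):
    # clips: list of (identity_id, video_id, face) records (hypothetical layout).
    rng = random.Random(seed)
    by_identity = defaultdict(list)
    for identity_id, video_id, face in clips:
        by_identity[identity_id].append((video_id, face))

    pairs = []
    for items in by_identity.values():
        for target_video, _ in items:
            # Assumption: the reference face must come from another video
            # of the same identity, never from the target video itself.
            candidates = [face for vid, face in items if vid != target_video]
            if candidates:
                pairs.append((rng.choice(candidates), target_video))
    return pairs

clips = [("a", "v1", "f1"), ("a", "v2", "f2"), ("b", "v3", "f3")]
pairs = build_cross_video_pairs(clips)
print(pairs)  # [('f2', 'v1'), ('f1', 'v2')]
```

Identity "b" has only one video, so it yields no cross-video pair under this assumption.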
Owing to the simplicity and efficiency of the model architecture, data construction, and training strategy, Concat-ID can seamlessly scale to multi-identity and multi-subject scenarios.
The woman, bundled up in a vibrant scarf and cozy beanie, exudes warmth with a cheerful smile as she playfully tosses a handful of snow into the crisp winter air. The soft glow of the setting sun casts long shadows across the snowy landscape, creating an enchanting atmosphere that highlights her joyful demeanor.
A person sits alone on a park bench, head bowed slightly, eyes glistening as a single tear slides down the cheek. The lips quiver, pressed into a thin line, while the shoulders tremble with the weight of an unspoken sorrow. The cool breeze carries away the muffled sigh.
A street artist in a worn-out denim jacket and a colorful bandana stands before a weathered wall, holding a can of spray paint. The head tilts slightly as the person examines the work—a vibrant bird taking shape. A focused expression crosses the face, lips pressed together in concentration.
A person in a sleeveless workout top stands in a gym, wiping sweat from the forehead. The head is slightly tilted down, eyes locked onto the weights ahead. The expression is focused, determination clear in the furrowed brow.
A lively man with twinkling eyes and a warm smile stands amidst the vibrant chaos of a bustling market, waving cheerfully at the camera. The golden hue of the setting sun casts gentle shadows around, highlighting the colorful stalls brimming with fresh produce and fragrant spices. As he gestures animatedly, vendors shout their offers, and the air is filled with the rich aroma of street food, creating an atmosphere of energetic charm and lively camaraderie.
A man, dressed in casual attire, sits by a sunlit window sketching in a notebook, pausing occasionally to look up with a playful grin. As sunlight filters through the sheer curtains, casting soft shadows across the room, the man twirls a pencil absentmindedly before adding quick strokes to the page. The scene is alive with a sense of relaxed creativity, as the warm afternoon glow bathes the space in a gentle, inviting atmosphere.
Two people sit on a park bench, each holding a cup of hot coffee. One leans back, head tilted up, eyes closed, enjoying the warmth. The other turns slightly, smiling as they speak, gesturing with one hand while the steam from their cup rises into the crisp morning air.
Two scientists stand in a lab, examining a set of test tubes. One tilts the head, carefully analyzing the liquid inside, while the other writes notes on a clipboard, brow furrowed in concentration. The hum of equipment surrounds them.
Two musicians sit on a rooftop at sunset, each holding an instrument. One strums a guitar, head bobbing slightly with the rhythm, while the other plays a violin, eyes closed in concentration. The golden light casts long shadows as their music blends into the evening air.
Two coworkers stand in an office hallway, deep in discussion. One crosses arms, head tilted slightly, listening intently. The other gestures with one hand, explaining something with enthusiasm. Their conversation echoes softly through the quiet workspace.
Three people are sitting around a wooden table, bathed in soft, warm light. The person in the middle wears a light-colored sweater, speaking with enthusiasm, eyes bright with excitement. The person on the left, dressed in a dark jacket, leans forward slightly, elbows resting on the table, a thoughtful smile on their lips. The person on the right wears a blue T-shirt, head tilted slightly, expression calm with a smile. On the table, there are a few open books, a cup of tea, and a fountain pen.
A person wearing a shirt takes a casual walk through a tree-lined avenue. The scent of freshly cut grass fills the air, and the sun casts long shadows on the sidewalk. The head tilts slightly downward, a content expression settling on the face.
A person wearing a shirt stands on a street, smiling brightly. Eyes sparkle with happiness, and the head tilts slightly to the side.
A person wears a casual shirt, reading a book. The person's head is slightly tilted, eyes focused on the pages, lips curved into a faint smile. Sunlight streams through the tree, casting a warm glow on the face, creating a peaceful atmosphere.
A person wears a casual shirt, reading a book. The person's head is slightly tilted, eyes focused on the pages, lips curved into a faint smile. Sunlight streams through the window, casting a warm glow on the face, creating a peaceful atmosphere.
Two friends sit at a train station, waiting for their ride. One stretches, arms raised above the head, letting out a relaxed sigh, while the other leans forward, elbows on knees, checking the time with a focused gaze. The distant sound of an approaching train fills the air.