PHOTOSWAP: Personalized Subject Swapping in Images

1University of California, Santa Cruz, 2Adobe, 3University of California, Santa Barbara

In NeurIPS 2023

Figure 1. Photoswap can effortlessly replace the subject in a source image, which could be either synthetic (first two rows) or real (bottom row), with a personalized subject specified in reference images, while preserving the original subject pose and the composition of the source image.

Abstract

In an era where images and visual content dominate our digital landscape, the ability to manipulate and personalize these images has become a necessity. Envision seamlessly substituting a tabby cat lounging on a sunlit window sill in a photograph with your own playful puppy, all while preserving the original charm and composition of the image. We present Photoswap, a novel approach that enables this immersive image editing experience through personalized subject swapping in existing images. Photoswap first learns the visual concept of the subject from reference images and then swaps it into the target image using pre-trained diffusion models in a training-free manner. We establish that a well-conceptualized visual subject can be seamlessly transferred to any image with appropriate self-attention and cross-attention manipulation, maintaining the pose of the swapped subject and the overall coherence of the image. Comprehensive experiments underscore the efficacy and controllability of Photoswap in personalized subject swapping. Furthermore, Photoswap significantly outperforms baseline methods in human ratings across subject swapping, background preservation, and overall quality, revealing its vast application potential, from entertainment to professional editing.

Controllable Subject Swapping via Training-free Attention Swapping

  • Given several images of a new concept, the diffusion model first learns the concept and converts it into a token.
  • The attention outputs and attention maps from the source image generation process are stored as the control source.
  • These stored intermediate variables are then transferred to the target image generation process, as sketched in the code below.

Figure 2. Photoswap pipeline.
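To make the pipeline concrete, here is a minimal PyTorch sketch of the attention-swapping idea, not the official implementation: the SwapAttention module, its record/replay modes, and the toy tensors are illustrative assumptions. In practice the caching and injection happen inside the attention layers of a pre-trained diffusion model, after the new concept has been learned as a token.

import torch


class SwapAttention(torch.nn.Module):
    """Toy self-attention block that records its intermediates during the
    source-image pass and replays them during the target-image pass."""

    def __init__(self, dim: int):
        super().__init__()
        self.to_q = torch.nn.Linear(dim, dim)
        self.to_k = torch.nn.Linear(dim, dim)
        self.to_v = torch.nn.Linear(dim, dim)
        self.mode = "record"   # "record" for the source pass, "replay" for the target pass
        self.cache = {}        # denoising step -> stored tensors

    def forward(self, x, step, swap_output=False, swap_map=False):
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)
        attn_map = torch.softmax(q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5, dim=-1)

        if self.mode == "record":
            # Source generation: store the attention map and output as the control source.
            out = attn_map @ v
            self.cache[step] = {"map": attn_map.detach(), "out": out.detach()}
            return out

        # Target generation: optionally inject the cached source intermediates.
        if swap_map and step in self.cache:
            attn_map = self.cache[step]["map"]
        out = attn_map @ v
        if swap_output and step in self.cache:
            out = self.cache[step]["out"]
        return out


# Toy usage: record on "source" features, then replay on "target" features,
# swapping the self-attention output only during the first `swap_steps` steps.
attn = SwapAttention(dim=64)
x_source, x_target = torch.randn(1, 16, 64), torch.randn(1, 16, 64)

for t in range(50):                      # source pass: fill the cache
    attn(x_source, step=t)

attn.mode = "replay"
swap_steps = 25
for t in range(50):                      # target pass: inject source attention early on
    attn(x_target, step=t, swap_output=(t < swap_steps))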

Attention Swap Analysis

  • For the same number of swapping steps, swapping the self-attention output provides superior control over the layout, including the subject's gestures and the background details.
  • Excessive swapping can harm the subject's identity, as the new concept introduced through the text prompt may be overshadowed by the swapped attention output or attention map.
  • Replacing the attention map for too many steps can produce a noticeably noisy image, possibly due to a compatibility issue between the source attention map and the target value (V) vectors; see the schedule sketch below.

Figure 3. Results at different swapping steps.
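The trade-offs above can be summarized as a per-intermediate step budget. The following sketch is illustrative only: the SwapSchedule dataclass and its thresholds are assumptions, not the paper's settings. The idea is simply that each intermediate (self-attention output, self-attention map, cross-attention map) gets its own cutoff, so that early steps follow the source while later steps let the new concept emerge.

from dataclasses import dataclass


@dataclass
class SwapSchedule:
    """Hypothetical cutoffs for how many early denoising steps each
    intermediate is replaced with the source one."""
    total_steps: int = 50
    self_output_steps: int = 25   # self-attention output: strongest layout control
    self_map_steps: int = 10      # self-attention map: swap sparingly (noise risk)
    cross_map_steps: int = 20     # cross-attention map: keeps text-to-region alignment

    def flags(self, step: int) -> dict:
        """Which source intermediates to inject at this denoising step."""
        return {
            "swap_self_output": step < self.self_output_steps,
            "swap_self_map": step < self.self_map_steps,
            "swap_cross_map": step < self.cross_map_steps,
        }


schedule = SwapSchedule()
print(schedule.flags(step=5))    # early step: source attention is injected
print(schedule.flags(step=30))   # late step: the new subject is free to emerge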

I. Multi-subject swap

Photoswap can disentangle and replace multiple subjects at once.

II. Occluded subject swap

Photoswap can identify the target object while leaving non-subject pixels unaffected.

More Results

From everyday objects to cartoon characters, the diversity of subject swapping tasks showcases the versatility and robustness of our framework across different contexts.

Figure 4. More results across various domains.

Qualitative Comparison

We use P2P+DreamBooth as a baseline for Photoswap. The baseline struggles to accurately preserve both the background and the reference subject, whereas Photoswap remains robust across a wide range of cases.

Figure 5. Comparison with P2P+DreamBooth.

Similarity Control

By adjusting the swapping parameters, we can control how closely the generated image follows the source image, as illustrated in the sketch below.

Figure 6. Control over similarity.
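As a rough illustration, the similarity knob can be thought of as the fraction of denoising steps during which source attention is injected. The helper below and its values are hypothetical, chosen only to show the direction of the trade-off: more swapped steps reproduce the source composition more faithfully, fewer give the new subject more freedom.

def swap_steps(total_steps: int, similarity: float) -> int:
    """Map a similarity level in [0, 1] to the number of early denoising steps
    during which source attention is injected (a hypothetical knob)."""
    return int(total_steps * similarity)


for similarity in (0.2, 0.5, 0.8):
    n = swap_steps(total_steps=50, similarity=similarity)
    print(f"similarity={similarity}: swap source attention for the first {n} steps")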

BibTeX

@misc{gu2023photoswap,
      title={Photoswap: Personalized Subject Swapping in Images}, 
      author={Jing Gu and Yilin Wang and Nanxuan Zhao and Tsu-Jui Fu and Wei Xiong and Qing Liu and Zhifei Zhang and He Zhang and Jianming Zhang and HyunJoon Jung and Xin Eric Wang},
      year={2023},
      eprint={2305.18286},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}