Anchor Objects Drive Realism While Diagnostic Objects Drive Categorization in GAN Generated Scenes

Background Introduction

The human visual system understands and navigates natural scenes with remarkable efficiency despite their complexity. Doing so requires transforming incoming sensory information into visual features ranging from low-level to high-level, such as edges, object parts, and whole objects, and these features in turn reflect the statistics of object co-occurrence in real-world scenes. Two object properties are central here: “anchor objects” and “diagnostic objects”. Anchor objects are objects that frequently co-occur with other objects and can predict the location and identity of those objects, while diagnostic objects are those that can predict the broader context of the scene (i.e., the scene category).

A study by Aylin Kallmayer and Melissa L.-H. Võ from the Department of Psychology at Goethe University Frankfurt, published in the journal “Communications Psychology”, explores the role of anchor objects and diagnostic objects in human visual processing.

Research Source and Background

The article was written by two authors and published in the journal “Communications Psychology” in 2024. It examines how the visual system uses these object properties along two dimensions of scene understanding: realism and categorization. To this end, the authors used images generated by Generative Adversarial Networks (GANs), which varied in how realistic and how easy to categorize they were.

Research Process

The research consists of two experiments: Experiment 1 explores scene realism, and Experiment 2 explores scene categorization. The procedure for each is as follows:

Experiment 1: Exploring Realism

  1. Participants and Design:

    • 50 participants (36 females, 14 males, mean age 20.74 years).
    • The experiment used 150 generated images and 150 real photographs, covering five indoor scene categories: bedroom, conference room, dining room, kitchen, and living room.
  2. Experimental Procedure:

    • Participants observed images for 50 milliseconds or 500 milliseconds and judged the realism of the images (real or generated).
  3. Data Collection and Analysis:

    • ROC curves and AUC scores were used to evaluate participants’ performance.
    • (Generalized) Linear Mixed Effects Models ((G)LMMs) were used for data analysis.
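
As a rough illustration of the ROC/AUC evaluation mentioned above, the sketch below computes an AUC score from realism ratings using the rank-based (Mann-Whitney) formulation. The function name `auc_from_ratings`, the 1-to-6 rating scale, and the toy data are assumptions made for this example; they are not taken from the paper, whose exact analysis pipeline is not described here.

```python
def auc_from_ratings(ratings, is_real):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney).

    ratings: one realism rating per trial (higher = judged "more real").
    is_real: 1 if the trial showed a real photograph, 0 if generated.
    A real/generated pair counts 1 when the real image got the higher
    rating, 0.5 when the ratings are tied, and 0 otherwise.
    """
    real = [r for r, y in zip(ratings, is_real) if y]
    generated = [r for r, y in zip(ratings, is_real) if not y]
    if not real or not generated:
        raise ValueError("need at least one real and one generated trial")

    wins = 0.0
    for r in real:
        for g in generated:
            if r > g:
                wins += 1.0
            elif r == g:
                wins += 0.5
    return wins / (len(real) * len(generated))


# Hypothetical toy data on an assumed 1-6 rating scale (not from the study).
ratings = [6, 5, 4, 2, 5, 3, 2, 1]
is_real = [1, 1, 1, 1, 0, 0, 0, 0]
toy_auc = auc_from_ratings(ratings, is_real)
print(toy_auc)
```

An AUC of 0.5 corresponds to chance-level discrimination between real and generated images, and 1.0 to perfect discrimination, which is the scale on which the reported scores should be read.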

Experiment 2: Exploring Categorization

  1. Participants and Design:

    • 44 participants (30 females, 14 males, mean age 23.2 years).
    • The experiment used the same generated images as Experiment 1, along with some real photographs.
  2. Experimental Procedure:

    • Participants performed a five-choice scene categorization task, with scene categories including bedroom, conference room, dining room, kitchen, and living room.
  3. Data Collection and Analysis:

    • (Generalized) Linear Mixed Effects Models ((G)LMMs) and ROC/AUC were used for data analysis.
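
To make the five-choice task concrete, the following sketch scores categorization responses per scene category and compares them with the chance level of 1/5. The helper `categorization_summary` and the toy trials are hypothetical and only illustrate the bookkeeping; the study itself analyzed accuracy with (G)LMMs, which this example does not attempt to reproduce.

```python
from collections import Counter

CATEGORIES = ["bedroom", "conference room", "dining room",
              "kitchen", "living room"]
CHANCE = 1 / len(CATEGORIES)  # 0.2 for a five-choice task


def categorization_summary(trials):
    """Return (per-category accuracy, overall accuracy).

    trials: iterable of (true_category, chosen_category) pairs.
    Categories with no trials are left out of the per-category dict.
    """
    trials = list(trials)
    if not trials:
        raise ValueError("no trials given")
    confusion = Counter(trials)              # (true, chosen) -> count
    totals = Counter(true for true, _ in trials)
    per_category = {
        c: confusion[(c, c)] / totals[c]
        for c in CATEGORIES if totals[c]
    }
    overall = sum(confusion[(c, c)] for c in CATEGORIES) / len(trials)
    return per_category, overall


# Hypothetical toy responses (not data from the study).
toy_trials = [
    ("bedroom", "bedroom"), ("bedroom", "living room"),
    ("kitchen", "kitchen"), ("kitchen", "kitchen"),
    ("dining room", "kitchen"), ("dining room", "dining room"),
    ("living room", "living room"),
    ("conference room", "conference room"),
]
per_category, overall = categorization_summary(toy_trials)
print(per_category, overall, CHANCE)
```

Keeping the full (true, chosen) counts rather than only a correct/incorrect flag also shows which categories get confused with each other, for example dining rooms mistaken for kitchens in the toy data.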

Research Results

Experiment 1: Exploring Realism

Under the 50-millisecond condition, participants’ performance was only slightly above chance (AUC = 0.6), whereas under the 500-millisecond condition performance improved significantly (AUC = 0.92, P < 0.05). Regression analysis found that high-level visual features and anchor object attributes significantly influenced realism judgments. Specifically:

    • High-level features explained up to 60% of the variance in responses and ratings (highest difference value bin10 = 0.53, P < 0.05).
    • Anchor object attributes significantly influenced realism ratings regardless of image type, display time, and diagnosticity (β = 0.18, SE = 0.06).

Experiment 2: Exploring Categorization

Categorization accuracy was mainly explained by high-level visual features and diagnostic object attributes. Details are as follows:

    • Categorization accuracy for generated and real images under the 50-millisecond condition (highest difference value for generated images bin10 = 0.18, P < 0.05).
    • Realism as a continuous predictor significantly influenced categorization accuracy (β = 0.48, SE = 0.16).
    • Diagnostic object attributes significantly predicted categorization accuracy (β = 0.53, SE = 0.16).

Research Conclusions

This study demonstrates that anchor objects and diagnostic objects play different roles in different dimensions of scene understanding. Specifically:

    • Anchor objects enhance scene realism by influencing the distribution of low-level to high-level visual features.
    • Diagnostic objects mainly improve scene categorization accuracy by increasing scene-specific category features.

Research Highlights

One important finding of this study is that generated scenes are hard to tell apart from real ones at short exposures but are more easily distinguished at longer exposures. This suggests that anchor objects play an important role in rapid scene understanding. Diagnostic objects, on the other hand, play a significant role in improving categorization accuracy, even when the images contain noise.

Significance and Value

The results indicate that the human visual system can flexibly cope with disturbances at various levels of visual features, thereby maintaining efficiency in complex scene processing. This provides a theoretical foundation, with practical relevance, for further exploring the complexity of human visual cognition. At the application level, understanding the different functions of anchor objects and diagnostic objects can help improve the performance of computer vision systems and artificial intelligence in complex visual tasks.

Future research can use images generated by Generative Adversarial Networks (GANs) to explore more complex dimensions of visual processing and cognition. In particular, combining such images with Deep Neural Networks (DNNs) holds promise for revealing more about how the human visual system operates.