MOSSBench

Is Your Multimodal Language Model Oversensitive to Safe Queries?

1University of California, Los Angeles,
2University of Maryland, College Park, 3The Pennsylvania State University
* Equal contribution

(Left) MLLMs exhibit behaviors similar to human cognitive distortions, leading to oversensitive responses in which benign queries are perceived as harmful. We find that oversensitivity is prevalent among existing MLLMs on MOSSBench.

(Right) Compliance rate of SOTA MLLMs on MOSSBench. Proprietary MLLMs (e.g., Claude 3, Gemini) exhibit more oversensitive behaviors on our dataset.

Introduction

Humans are prone to cognitive distortions: biased thinking patterns that lead to exaggerated responses to specific stimuli. This paper demonstrates that advanced MLLMs exhibit similar tendencies, albeit in a very different context. While these models are designed to answer queries under safety mechanisms, they sometimes reject harmless queries in the presence of certain visual stimuli, disregarding the benign nature of their contexts.

As an initial step in investigating this behavior, we identify three types of stimuli that trigger the oversensitivity of existing MLLMs: Exaggerated Risk, Negated Harm, and Counterintuitive Interpretation. To systematically evaluate MLLMs' oversensitivity to these stimuli, we propose the Multimodal OverSenSitivity Benchmark (MOSSBench). This benchmark consists of 300 manually collected benign multimodal queries, cross-verified by third-party reviewers from Amazon Mechanical Turk (AMT).

Empirical studies using MOSSBench on 20 MLLMs reveal several insights: (1) Oversensitivity is prevalent among SOTA MLLMs, with refusal rates reaching up to 76% on harmless queries. (2) Safer models are more oversensitive: increasing safety may inadvertently raise caution and conservatism in a model's responses. (3) Different types of stimuli tend to cause errors at specific stages of an MLLM's response process, namely perception, intent reasoning, and safety decision-making. These findings highlight the need for refined safety mechanisms that balance caution with contextually appropriate responses, improving the reliability of MLLMs in real-world applications.
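For concreteness, compliance can be estimated as the fraction of benign queries a model actually answers rather than refuses. The Python sketch below uses a simple keyword-based refusal detector; this detector and the response format are illustrative assumptions, not the exact judging procedure used for the benchmark.

# Minimal sketch: estimating the compliance (non-refusal) rate over model responses.
# The keyword-based refusal detector is an illustrative assumption, not the
# exact judging procedure used for MOSSBench.

REFUSAL_MARKERS = (
    "i'm sorry", "i cannot", "i can't assist", "i must decline", "i won't be able to",
)

def is_refusal(response: str) -> bool:
    """Heuristically flag a response as a refusal based on common phrases."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def compliance_rate(responses: list[str]) -> float:
    """Fraction of benign queries the model answers rather than refuses."""
    if not responses:
        return 0.0
    refused = sum(is_refusal(r) for r in responses)
    return 1.0 - refused / len(responses)

# Example: two dummy responses, one helpful and one refusal -> 0.5 compliance.
print(compliance_rate([
    "Sure, here is how to water the plant safely...",
    "I'm sorry, but I can't assist with that request.",
]))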

Oversensitivity Stimuli

Visual stimuli of model oversensitivity

Through empirical investigation, we identify three types of visual stimuli that trigger the oversensitivity of existing MLLMs: Exaggerated Risk, Negated Harm, and Counterintuitive Interpretation.

MOSSBench Dataset

Overview

MOSSBench is the first benchmark that systematically evaluates the oversensitivity of MLLMs. It consists of 300 samples spanning diverse scenarios that contain Exaggerated Risk, Negated Harm, and Counterintuitive Interpretation as visual stimuli.

Collecting oversensitivity samples for MLLMs is challenging due to the intricate interplay of multiple modalities and the abstract nature of the three stimulus types. To address this, we develop a pipeline for creating image-request pairs that follow the stimulus types across diverse scenarios. The pipeline employs a two-step process, candidate generation and candidate filtering; a minimal sketch of both steps follows the list below.

  • Candidate generation: we leverage an LLM to draft diverse scenarios by providing it with a carefully crafted prompt that incorporates several oversensitivity samples as exemplars.
  • Candidate filtering: reviewers from Amazon Mechanical Turk evaluate the harmlessness and naturalness of the generated samples.
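The sketch below shows how the two steps could be wired together. Here, generate_with_llm and amt_review are hypothetical stand-ins for an LLM API call and an AMT review round; they are not part of any released MOSSBench code.

# Sketch of the two-step collection pipeline: candidate generation, then
# candidate filtering. `generate_with_llm` and `amt_review` are hypothetical
# stand-ins for an LLM API call and an AMT review round.

from dataclasses import dataclass

@dataclass
class Candidate:
    stimulus_type: str       # "Exaggerated Risk", "Negated Harm", or "Counterintuitive Interpretation"
    image_description: str   # scene to be rendered or retrieved as the image
    request: str             # benign textual request paired with the image

def generate_candidates(stimulus_type, exemplars, generate_with_llm, n=10):
    """Step 1: prompt an LLM with a few oversensitivity exemplars to draft new scenarios."""
    prompt = (
        f"Write {n} benign image-request pairs whose images contain a "
        f"'{stimulus_type}' visual stimulus, following these examples:\n"
        + "\n".join(f"- image: {c.image_description} | request: {c.request}" for c in exemplars)
    )
    return generate_with_llm(prompt)  # expected to return a list of Candidate objects

def filter_candidates(candidates, amt_review):
    """Step 2: keep only candidates that human reviewers judge harmless and natural."""
    kept = []
    for c in candidates:
        verdict = amt_review(c)  # e.g. {"harmless": True, "natural": True}
        if verdict["harmless"] and verdict["natural"]:
            kept.append(c)
    return kept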
You can download the dataset from Hugging Face Datasets.
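A minimal loading sketch with the Hugging Face datasets library is shown below; the repository id, split, and field names are assumptions, so check the dataset card for the exact identifiers.

# Minimal sketch of loading the benchmark with the Hugging Face `datasets` library.
# The repository id, split, and field names are assumptions; consult the dataset
# card on Hugging Face for the exact identifiers.

from datasets import load_dataset

ds = load_dataset("AIcell/MOSSBench", split="train")  # assumed repo id and split

for sample in ds.select(range(3)):
    # Assumed fields: a benign request and its stimulus category; the image is
    # typically available under an image-typed column.
    print(sample.get("question"), "|", sample.get("category"))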


Key statistics of MOSSBench.


Distribution of MOSSBench.

Visualization

Experiment Results

Results on SOTA MLLMs

Oversensitivity and Compliance Examples

Cognitive Distortion in MLLMs Examples

Explorer

Explore the outputs of each model on MOSSBench.

BibTeX

@misc{li2024mossbenchmultimodallanguagemodel,
      title={MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries?}, 
      author={Xirui Li and Hengguang Zhou and Ruochen Wang and Tianyi Zhou and Minhao Cheng and Cho-Jui Hsieh},
      year={2024},
      eprint={2406.17806},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2406.17806}, 
}