One wrong move is enough to shake our confidence in a robot - at least on a conscious level. But beneath that immediate reaction, our automatic impressions are often more stable, holding steady until the evidence clearly signals that something meaningful has changed.

Study: The role of diagnosticity in judging robot competence. Image Credit: eamesBot/Shutterstock.com
In an article published in Scientific Reports, researchers examined how people judge robot competence differently depending on whether they report explicit (conscious) or implicit (automatic) impressions.
Across nine studies involving 3,735 participants, the findings showed that explicit judgments are more strongly influenced by a single inconsistent “oddball” behavior than implicit judgments are. This pattern generalized across industrial robots, surgical robots, and self-driving cars, and was partially explained by how diagnostic, or genuinely informative, people perceived the evidence to be.
Background
Evaluating competence rarely involves perfectly consistent information. Whether judging humans or machines, we typically encounter mixed evidence: strong performance punctuated by occasional errors. The key question is how people integrate that inconsistency.
Dual-process theories suggest that explicit impressions update quickly in response to surprising or inconsistent behaviors, while implicit impressions are more stable and resistant to change. From this perspective, a dissociation between explicit and implicit judgments reflects fundamental cognitive differences.
An alternative explanation focuses less on separate systems and more on the quality of the evidence itself.
This framework argues that the apparent dissociation may stem from differences in perceived diagnosticity, that is, whether a behavior seems truly predictive of underlying competence. If an “oddball” action appears weakly informative, implicit impressions may ignore it. If it appears strongly diagnostic, both explicit and implicit impressions should update.
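To make the diagnosticity account concrete, here is a minimal toy sketch (not taken from the paper; the update rule, scale, and numbers are assumptions for illustration only) in which each observed behavior moves a competence impression only as far as that behavior is judged to be informative:

```python
def update_impression(prior, behavior, diagnosticity):
    """Toy illustration of diagnosticity-weighted updating (hypothetical model).

    prior         -- current competence impression, 0 (incompetent) to 1 (competent)
    behavior      -- observed outcome: 1.0 for a competent action, 0.0 for an error
    diagnosticity -- perceived informativeness of this behavior, 0 to 1
    """
    # The impression moves toward the observed behavior, but only as far
    # as the evidence is judged to be diagnostic of underlying competence.
    return prior + diagnosticity * (behavior - prior)

# A single error after strong performance:
explicit = update_impression(prior=0.9, behavior=0.0, diagnosticity=0.6)   # shifts noticeably
implicit = update_impression(prior=0.9, behavior=0.0, diagnosticity=0.05)  # barely moves
print(round(explicit, 2), round(implicit, 2))  # 0.36 0.85 (illustrative numbers only)
```

On this sketch, the same oddball error produces very different updates depending on how diagnostic the observer takes it to be, which is the core intuition the studies below test.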
Previous research left several open questions. Much of it focused on moral judgments rather than competence, relied on single experimental paradigms, and used measures that were not structurally aligned. As a result, it was unclear whether observed dissociations reflected genuine psychological processes or methodological artifacts.
To address these limitations, the present research tested competence judgments across multiple real-world robotics domains, aligned measurement tools more carefully, and directly manipulated evidence diagnosticity. Across nine experiments, a consistent pattern emerged: dissociations appeared when oddball behaviors were weakly diagnostic, but both explicit and implicit impressions updated when the evidence clearly signaled meaningful change.
Experimental Methods and Procedures
Experiments 1a–c: Generalizing Across Robotics Contexts
The first set of experiments tested whether the explicit–implicit dissociation extended beyond simple game-based paradigms into real-world robotics settings. Participants observed robots performing five trials in three domains: industrial manufacturing (1a), surgical assistance (1b), and autonomous vehicles (1c). Trial sequences varied in consistency: consistently competent, inconsistently competent, inconsistently incompetent, or consistently incompetent.
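As a rough sketch of the design (trial orders and coding are assumptions, not reported values), the four sequence types can be written as five-trial outcome lists, with the inconsistent conditions each containing a single “oddball” trial:

```python
# Hypothetical coding of the four five-trial sequence types (1 = competent action, 0 = error).
# Exact trial orders in the experiments are not reported here; this only illustrates
# what "consistent" versus "inconsistent" evidence means in the design.
sequences = {
    "consistently competent":     [1, 1, 1, 1, 1],
    "inconsistently competent":   [1, 1, 0, 1, 1],  # mostly competent, one oddball error
    "inconsistently incompetent": [0, 0, 1, 0, 0],  # mostly errors, one oddball success
    "consistently incompetent":   [0, 0, 0, 0, 0],
}

for label, trials in sequences.items():
    majority = "competent" if sum(trials) >= 3 else "incompetent"
    consistent = len(set(trials)) == 1
    print(f"{label}: majority = {majority}, consistent = {consistent}")
```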
Across domains, explicit impressions were uniquely sensitive to oddball behaviors. A single competent action increased ratings in an otherwise incompetent sequence, while a single error lowered ratings in an otherwise competent sequence. Statistically, this appeared as an interaction between majority behavior and consistency.
Implicit impressions showed a different pattern. They responded primarily to overall performance trends, displaying a main effect of majority behavior but little sensitivity to isolated inconsistencies.
These findings established the basic dissociation but left open whether it reflected genuine cognitive differences or methodological variation between measures.
Experiments 2–4: Aligning Measurement Tools
To address potential measurement artifacts, the researchers progressively aligned direct and indirect evaluation methods.
Experiment 2 returned to a tic-tac-toe paradigm and introduced a modified direct affect misattribution procedure (AMP). This “direct AMP” was procedurally identical to the indirect AMP but included explicit evaluation instructions. Both traditional explicit measures and the direct AMP were sensitive to oddball behaviors, though traditional explicit ratings showed greater sensitivity. Response scale differences (binary versus ordinal) did not meaningfully alter results.
Experiment 3 used a within-subject 2×2 design (competent/incompetent majority × consistent/inconsistent) with mouse-input ordinal responses. Direct measures (traditional explicit ratings and direct AMP) were strongly correlated with each other but only weakly correlated with indirect AMP impressions. A significant three-way interaction confirmed that direct measures incorporated oddball behaviors more readily.
Experiment 4 further increased structural alignment by removing trial-by-trial prompts and using key-input responses for both measures. The dissociation replicated: directly measured impressions were influenced by inconsistent evidence, whereas indirect impressions remained comparatively insensitive.
Together, these experiments suggest that the dissociation reflects differences between intentional and unintentional evaluation rather than superficial procedural differences between tools.
Experiments 5–7: Manipulating Diagnosticity
The final set of studies directly tested whether diagnosticity drives updating.
Experiment 5 used a blocked updating design. Participants first observed four competent trials, followed by one to four incompetent trials. Explicit impressions updated linearly with each additional piece of inconsistent evidence. Implicit impressions, however, updated nonlinearly. They shifted only once inconsistent evidence accumulated to a sufficient level, indicating a diagnosticity threshold. Crucially, when a single error was framed as highly diagnostic, implicit impressions did shift.
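The contrast between linear and threshold-like updating can be sketched with hypothetical response curves (the functional forms, step sizes, and threshold below are assumptions for illustration, not the paper’s data):

```python
# Hypothetical response curves after four competent trials followed by
# n = 1..4 incompetent trials, on a 0-1 competence scale (illustrative numbers only).

def explicit_rating(n_errors):
    # Roughly linear: each additional error lowers the rating by a fixed step.
    return max(0.0, 0.9 - 0.2 * n_errors)

def implicit_rating(n_errors, threshold=3):
    # Threshold-like: little movement until inconsistent evidence accumulates
    # past a perceived-diagnosticity threshold, then a larger shift.
    return 0.85 if n_errors < threshold else 0.40

for n in range(1, 5):
    print(f"{n} error(s): explicit = {explicit_rating(n):.2f}, implicit = {implicit_rating(n):.2f}")
```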
Experiment 6 replicated this single-error condition. Once again, implicit impressions updated only when contextual information increased the perceived diagnostic value of the error.
Experiment 7 manipulated diagnosticity directly through framing. Participants viewed four competent trials followed by one error described as either recent (high diagnosticity) or outdated (low diagnosticity). Implicit impressions updated only in the high-diagnosticity condition. Explicit impressions, by contrast, shifted regardless of whether the evidence appeared weakly or strongly diagnostic.
This final set of findings provided clear support for the diagnosticity account: implicit impressions are not inherently rigid. Rather, they require stronger evidence before incorporating inconsistency.
Results and Discussion
Across nine experiments, a coherent picture emerged.
Explicit competence impressions readily incorporated isolated inconsistent behaviors, even when those behaviors appeared weakly predictive. Implicit impressions reflected broader performance patterns and required evidence that seemed genuinely informative before updating.
When diagnosticity was high, both explicit and implicit impressions updated. When diagnosticity was low, only explicit judgments shifted.
These results reconcile competing theoretical perspectives. The dissociation does not appear to reflect fundamental differences in learning capacity. Instead, implicit impressions operate with higher evidentiary thresholds. They are selective, not inflexible.
Conclusion
When people evaluate robot competence, explicit and implicit impressions diverge primarily because of how diagnostic the evidence appears, not because they rely on fundamentally different learning systems.
Across industrial, surgical, and autonomous vehicle contexts, a single “oddball” behavior influenced explicit judgments even when weakly informative. Implicit impressions shifted only when that behavior signaled something meaningful about underlying capability. Structurally aligned measures confirmed that this divergence reflects differences between intentional and unintentional evaluation rather than measurement artifacts.
As automation becomes more integrated into high-stakes environments, understanding how competence judgments form, and when conscious and automatic evaluations diverge, will be essential for building calibrated trust in human–robot collaboration.
Journal Reference
Surdel, N., & Ferguson, M. J. (2026). The role of diagnosticity in judging robot competence. Scientific Reports. DOI:10.1038/s41598-026-35375-y. https://www.nature.com/articles/s41598-026-35375-y