How do sources of information from different modalities interact to control where we look? To address this question under conditions approximating real-world viewing, we presented natural images and spatially localized sounds in visual (V), audiovisual (AV) and auditory (A) conditions and measured subjects' eye movements. Our results demonstrate that eye movements in the AV conditions are spatially biased towards the part of the image corresponding to the sound source. Interestingly, the strength of this spatial bias depends on the probability that a given image region is fixated in the V condition (its saliency). This indicates that fixation behaviour in the AV conditions results from an integration process. Regression analysis shows that this integration is best accounted for by a linear combination of the unimodal saliencies.
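The linear-integration model can be illustrated with a short sketch. It treats each condition's saliency as a flattened map with one value per image region and fits S_AV ≈ w_V·S_V + w_A·S_A + c by ordinary least squares. The function name `fit_linear_integration`, the intercept term and the example maps are hypothetical and chosen for illustration only; they are not the authors' regression procedure or data.

```python
def fit_linear_integration(s_v, s_a, s_av):
    """Ordinary least-squares fit of s_av ~ w_v * s_v + w_a * s_a + c.

    Each argument is a flattened saliency map (equal-length lists, one value
    per image region). Returns (w_v, w_a, c). Illustrative sketch only.
    """
    n = len(s_av)
    if not (len(s_v) == len(s_a) == n) or n < 3:
        raise ValueError("maps must have equal length >= 3")

    def mean(xs):
        return sum(xs) / len(xs)

    mv, ma, mav = mean(s_v), mean(s_a), mean(s_av)
    # Centre each variable so the intercept drops out of the normal equations.
    v = [x - mv for x in s_v]
    a = [x - ma for x in s_a]
    y = [x - mav for x in s_av]
    svv = sum(x * x for x in v)
    saa = sum(x * x for x in a)
    sva = sum(p * q for p, q in zip(v, a))
    svy = sum(p * q for p, q in zip(v, y))
    say = sum(p * q for p, q in zip(a, y))
    # Solve the 2x2 normal equations by Cramer's rule.
    det = svv * saa - sva * sva
    if abs(det) < 1e-12:
        raise ValueError("unimodal maps are collinear; weights not identifiable")
    w_v = (svy * saa - say * sva) / det
    w_a = (say * svv - svy * sva) / det
    c = mav - w_v * mv - w_a * ma
    return w_v, w_a, c


# Synthetic example (made-up values): build an AV map from known weights
# and check that the fit recovers them.
s_v = [0.1, 0.4, 0.2, 0.3, 0.0]
s_a = [0.3, 0.1, 0.0, 0.2, 0.4]
s_av = [0.6 * v + 0.4 * a for v, a in zip(s_v, s_a)]
w_v, w_a, c = fit_linear_integration(s_v, s_a, s_av)
print(round(w_v, 6), round(w_a, 6), round(c, 6))
```

With real data, the fitted weights would indicate how strongly each unimodal saliency map contributes to fixation behaviour in the AV condition. A non-linear alternative (e.g. a multiplicative term) could be compared against this model by goodness of fit.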