Workshop on
Natural Environments Tasks and Intelligence

Peter Bex
Bridging the gulf between classical psychophysics and natural behavior
Schepens Eye Research Institute

Although natural behavior involves moving in dynamic, three-dimensional environments, the visual system has classically been studied with static monochrome images viewed from a fixed position. I will present the results of studies in which controlled methods were developed to study vision with freely-viewed images of real moving environments with naturally occurring distributions of motion, depth, blur and binocular disparity. While some of the findings support classical data, many of the results challenge their relevance to real-world behavior.

David Brainard
Color constancy in natural tasks
David H. Brainard (collaborators: Ana Radonjic and Nicolas Cottaris)

Color constancy is often studied using adjustment procedures. In real life, however, we rarely adjust object colors. Rather, we use color to identify and choose objects. We have begun to study and quantify color constancy using tasks where subjects select colors across changes of illuminant, and where we embed the selection into naturalistic tasks. One example of this broad direction is adapted from the blocks-copying task of Ballard et al. (2005). At the beginning of each trial the subject saw three rendered scenes—the target, the source and the test—presented on a computer display. The target scene contained four colored blocks of different simulated reflectance. Their arrangement varied randomly on each trial. The source scene contained eight blocks: one pair of potential matches for each target block. The degree of similarity of each potential match to the target varied across trials. The test scene contained four identical dark gray blocks. The subjects’ task was to replace the gray blocks with blocks chosen from the source, so as to recreate the arrangement in the target scene as closely as possible. In the illuminant-constant condition, all three scenes were rendered under the same illumination (D65). In the illuminant-changed condition, the simulated illuminations of the source and test were changed to 12000 K. Based on the subjects’ choices, we inferred their perceptual matches for each target block in each condition via a variant of the maximum likelihood difference scaling method. Two main findings were consistent across our four subjects: (1) When the illumination was constant, the distance between the target block and its choice-based match was small (2.2 - 3.9 ΔE), supporting the validity of our method. (2) When the illumination changed, the choice-based matches indicated good constancy (constancy indices 0.7 - 0.8). Our results show that color constancy is high when probed using a natural task.
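
The ΔE values and constancy indices quoted above can be illustrated with a small sketch. This is not the authors' analysis code: the ΔE below is the simple CIE76 Euclidean distance in CIELAB, and the index definition is one common convention (1 = the match lands on the same simulated reflectance, 0 = no adjustment for the illuminant change).

```python
import numpy as np

def delta_e(lab1, lab2):
    """CIE76 color difference: Euclidean distance between CIELAB coordinates."""
    return float(np.linalg.norm(np.asarray(lab1, float) - np.asarray(lab2, float)))

def constancy_index(choice_match, reflectance_match, tristimulus_match):
    """1 = perfect constancy (match lands on the block with the same simulated
    reflectance); 0 = no constancy (match lands on the block that merely
    reproduces the original tristimulus values under the old illuminant)."""
    shift = delta_e(tristimulus_match, reflectance_match)
    return 1.0 - delta_e(choice_match, reflectance_match) / shift

print(delta_e((50, 0, 0), (50, 3, 4)))  # → 5.0
```

All coordinates passed to these functions in a real analysis would come from the rendered scenes; the triple above is invented purely to show the distance computation.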

Miguel Eckstein
Rapidly looking at faces: A sensory optimization theory
Miguel P. Eckstein, Matthew F. Peterson, Charles Or
University of California, Santa Barbara

When viewing a human face, people first look towards the eyes. A prominent idea holds that these fixation patterns arise solely due to social norms. Here, I propose that this behavior can be explained as an adaptive brain strategy to learn eye movement plans that optimize performance in evolutionarily important perceptual tasks. First, I show that humans move their eyes to points of fixation that maximize perceptual performance when determining the identity, gender, and emotional state of a face. These initial optimal points of fixation, which vary moderately across tasks, are correctly predicted by a foveated Bayesian ideal observer (FIO) that integrates information optimally across the face but is constrained by the decrease in resolution and sensitivity from the fovea towards the visual periphery. A model that disregards the foveated nature of the visual system and makes eye movements either to the regions/features with the highest discriminative information or to the center of the face fails to predict the human fixations. Second, the preferred points of initial fixation do not vary across cultural groups (East Asians vs. Caucasians) and are optimal as predicted by a FIO. Third, there are individual differences in the preferred points of fixation, with a majority of humans looking just below the eyes and a minority looking closer to the tip of the nose. These systematic differences in initial points of fixation persist over time and also correspond to individual variations in the points of fixation that maximize perceptual performance. Together, these results illustrate how the brain optimizes initial eye movements to rapidly extract information from faces based on the statistical distribution of the discriminatory information of faces, general properties of the human visual system and individual specific neural characteristics.
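
The logic of a foveated ideal observer can be caricatured in one dimension: information at each facial feature is discounted by an eccentricity-dependent sensitivity falloff, and the model fixates wherever the pooled information is maximal. The feature positions, information values, and falloff constant below are all invented for illustration and are not the FIO's fitted parameters.

```python
import numpy as np

# Invented 1-D "face": feature locations and task-relevant information content
features = np.array([0.0, 1.0, 2.5])       # eyes, nose tip, mouth (arbitrary units)
information = np.array([3.0, 1.5, 1.0])    # discriminative information per feature

def sensitivity(eccentricity, e2=1.0):
    """Assumed hyperbolic falloff of sensitivity with distance from fixation."""
    return 1.0 / (1.0 + eccentricity / e2)

def expected_info(fixation):
    """Information pooled across features, discounted by the foveated falloff."""
    return float(np.sum(information * sensitivity(np.abs(features - fixation))))

candidates = np.linspace(-1.0, 3.5, 451)
best = candidates[np.argmax([expected_info(f) for f in candidates])]
```

With this strongly peaked information profile the optimum sits on the most informative feature; flatter profiles pull the optimal fixation toward intermediate points between features, which is the flavor of the "just below the eyes" result.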

Brett Fajen
The visual control of walking over complex terrain
Rensselaer Polytechnic Institute

When humans walk over flat, obstacle-free terrain, they achieve remarkable energetic efficiency by exploiting the passive mechanical forces inherent to bipedal locomotion. However, little is known about how these principles govern the control of walking over complex terrain containing obstacles and irregularly spaced safe footholds. Inspired by the dynamic walking model of human locomotion, we tested the hypothesis that when humans walk over complex terrain, they select footholds that allow them to exploit their inverted-pendulum-like structure to the benefit of efficiency and stability. We developed a novel experimental paradigm in which subjects walked over a field of randomly distributed virtual obstacles or irregularly spaced target footholds that were projected onto the floor by an LCD projector while their movements were recorded using a full-body motion capture system. Walking behavior was compared across different visibility conditions in which the virtual objects did not appear until they fell within a visibility window centered on the moving subject or were initially visible but disappeared before the subject reached them. The findings suggest that visual information about the terrain at the end of a given step must be available during a critical period that takes place in the previous step. By using information during this critical period, walkers can choose footholds that allow them to walk over complex terrain much like they navigate flat, obstacle-free terrain -- that is, by exploiting their inverted-pendulum-like structure. (This research was conducted in collaboration with Jonathan Matthis.)

Ila Fiete
Neural codes and dynamics for representation and memory
The University of Texas at Austin

Neurons and synapses -- the fundamental components of representation and communication in the brain -- are forgetful and noisy. How does the brain overcome these features to perform accurate computation and generate reliable short-term memory over timescales exceeding the biophysical timescales by factors of a hundred to a thousand? I'll discuss classical theoretical ideas on these questions that, because of the complications of probing neural circuits, have remained stuck at the level of analogy. I will describe a cortical circuit for spatial cognition and memory -- the grid cell system -- and show how recent modeling and analysis of the system substantiate the key theoretical principles. Simultaneously, the same cortical system clearly points to coding strategies in the brain that go far beyond the current theoretical understanding of neural codes. I will describe how the grid cell code for location defines a new class of neural population codes that makes possible exponentially strong noise reduction with neuron number, compared to the polynomial performance of most known neural codes.
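
A toy version of the exponential-capacity argument: a grid-like code represents a position only by its phase within each module's period, and with coprime periods the code is unique over the product of the periods, so capacity grows exponentially with the number of modules rather than linearly with scale. The integer periods below are invented (real grid periods are real-valued), and the brute-force decoder is only a stand-in.

```python
# Toy modular (grid-cell-like) code over invented integer periods.
PERIODS = [7, 11, 13]

def encode(x):
    """Represent position x only by its phase within each module."""
    return tuple(x % p for p in PERIODS)

def decode(phases):
    """Brute-force decoding over the unambiguous range, which is the product
    of the periods (7 * 11 * 13 = 1001 positions from just 31 phase states
    per-module). The brain, of course, does not decode by exhaustive search."""
    capacity = 1
    for p in PERIODS:
        capacity *= p
    for x in range(capacity):
        if encode(x) == tuple(phases):
            return x
    return None
```

Adding a fourth module of period 17 would multiply the unambiguous range by 17 while adding only 17 phase states, which is the sense in which the code's capacity is exponential in the number of modules.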

Roland Fleming
Shape from Orientation Flows
University of Giessen

Estimating the 3D shape of objects in our surroundings is one of the most important functions of vision. However, the information provided by the retina is fundamentally ambiguous because many different combinations of 3D shape, illumination and surface reflectance are consistent with any given image. Despite this, the visual system is extremely adept at estimating 3D shape across a wide range of viewing conditions—from a photo of an abstract sculpture to a line drawing of an imaginary animal—something that no extant machine vision system can do. How the brain achieves this is poorly understood, and remains one of the most significant outstanding challenges in visual neuroscience. Here I will argue that a number of seemingly different 3D shape cues could share some common underlying computational principles. The key insight is that when patterns such as shading or texture are projected from a 3D object into the 2D retinal image, the patterns are systematically distorted in a way that has dramatic and easily-measurable effects on the local image statistics. The distortions create clearly organized patterns of local image orientation ('orientation flows') that are systematically related to properties of the 3D shape. These orientation flows can be reliably detected by populations of simple filters tuned to different image orientations, similar to the response properties of cells in V1. I will outline some of the computational benefits of using orientation fields to estimate 3D shape and show through illusions and experimental measurements how they can predict both successes and failures of human 3D shape perception across a wide range of natural and unnatural viewing conditions. Together these findings suggest that orientation fields could serve as a powerful, 'common currency' for the first stages of 3D shape estimation.
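
A minimal stand-in for an orientation-flow measurement: the structure tensor recovers the dominant local image orientation, playing the role of the bank of V1-like oriented filters described above (this is an illustrative substitute, not the method used in the talk; the grating stimuli are invented).

```python
import numpy as np

def orientation_flow(img):
    """Dominant gradient orientation (radians in (-pi/2, pi/2]) from the
    structure tensor, pooled over the whole image; applying the same
    computation to local patches would yield an orientation field."""
    gy, gx = np.gradient(img.astype(float))
    jxx, jyy, jxy = (gx * gx).mean(), (gy * gy).mean(), (gx * gy).mean()
    return 0.5 * np.arctan2(2 * jxy, jxx - jyy)

y, x = np.mgrid[0:64, 0:64]
diag = np.sin(0.3 * (x + y))        # oblique grating: gradients at 45 degrees
theta = orientation_flow(diag)      # close to pi/4
```

Applied patch-wise to a shaded or textured image of a curved surface, the same computation yields the systematically curved flow patterns that the talk argues carry 3D shape information.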

Joshua Gold
Drift diffusion in a dynamic world
Joshua Gold, University of Pennsylvania

A model commonly used to study perceptual decisions is the drift diffusion model (DDM). Based on a quantitative description of the path of a particle undergoing Brownian motion in the presence of a constant force, the DDM can account for choices and response times (RTs) for a variety of tasks. However, the DDM’s assumption of a “constant force,” equivalent to a stable source of sensory evidence, does not always hold in a dynamic world. We have developed a novel theoretical framework that extends sequential-sampling models like the DDM to include the possibility that the world can change, even while evidence is being gathered for a decision. The model is derived from a Bayesian inference process and thus can make optimal (i.e., accuracy-maximizing) decisions in dynamic environments. I will describe the model, present psychophysical data supporting its basic predictions, and present preliminary data suggesting possible neural mechanisms.
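
The contrast can be sketched in simulation: a standard accumulator that assumes a static world versus a Bayesian update that discounts old evidence according to an assumed hazard rate. All parameters below are invented for illustration, and this is a caricature of the framework rather than the authors' model code.

```python
import numpy as np

def simulate(hazard=0.05, n=2000, snr=0.5, seed=0):
    """Binary state s in {-1, +1} flips with probability `hazard` each step;
    observations are x ~ N(s * snr, 1). Compare a perfect accumulator (a DDM
    that assumes a static world) with a hazard-aware Bayesian update.
    Returns (accuracy_ddm, accuracy_bayes) over the n steps."""
    rng = np.random.default_rng(seed)
    s, L_ddm, L_bayes = 1, 0.0, 0.0
    correct_ddm = correct_bayes = 0
    for _ in range(n):
        if rng.random() < hazard:
            s = -s                          # the world changes mid-stream
        x = rng.normal(s * snr, 1.0)
        llr = 2.0 * snr * x                 # log-likelihood ratio of one sample
        L_ddm += llr                        # static-world accumulation
        # hazard-aware prior: belief is discounted toward 0 before each sample
        L_bayes = llr + np.log(((1 - hazard) * np.exp(L_bayes) + hazard) /
                               ((1 - hazard) + hazard * np.exp(L_bayes)))
        correct_ddm += int(np.sign(L_ddm) == s)
        correct_bayes += int(np.sign(L_bayes) == s)
    return correct_ddm / n, correct_bayes / n
```

The static-world accumulator builds up so much evidence during stable periods that it lags badly after each change; the hazard-aware update saturates and therefore recovers quickly, which is the accuracy-maximizing behavior the abstract describes.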

Kari Hoffman
Interactions of visual search and neural activity in the temporal lobe
York University

Neurons in temporal-lobe neocortex and hippocampus respond to complex stimuli such as objects, though their responses vary considerably when naturalistic changes are made to the 'bare-bones' passive-viewing task design. In addition to increased image and goal complexity, natural scene viewing includes - or even necessitates - sequences of visual fixations, raising the question of whether and how saccadic eye movements modulate activity in these regions. I will describe saccadic modulation of neural activity in the superior temporal sulcus, as well as changes in oscillatory activity in human and macaque hippocampus during visual search.

Dan Kersten
Human cortical responses to natural images: Some puzzles and a few ideas
University of Minnesota

There is a considerable body of research supporting the following picture of visual object and scene perception. Visual decisions are based on a hierarchical organization of cortical areas through which image information is successively transformed from a large number of local feature representations of a small number of types (e.g. edges at many locations) to increasingly lower-dimensional representations of many types (e.g. dog, car, ...). Functional utility requires integrating many local features to reduce ambiguity, while at the same time selecting task-relevant information. Integration is achieved by within-area linking combined with feedforward, across-area, grouping of features. Further, both task-dependent information selection and accurate ambiguity reduction require feedback from higher-level areas to lower-level areas. I will describe several human neuroimaging experiments that are consistent with this picture, but which also produce a number of puzzles. In particular: Why is regional linking of information sometimes so slow? Why does spatial context at times enhance, suppress, or have no effect on fMRI object and scene responses in early visual cortex? Why does the broad spatial pattern of brain activity during object recognition in background clutter depend on how the objects were learned?

Konrad Koerding
What is going on during natural scene search
Northwestern University

I will talk about various efforts to computationally understand the relation between natural scene data and the tuning properties of neurons in higher-level cortices. I will argue that the problem is incredibly complicated and explain why current approaches may be limited in leading to an actual understanding. I will supplement these data with other data sources to highlight the exciting complexity of neural representations.

Adam Kohn
Spatial and temporal contextual effects in vision
Albert Einstein College of Medicine

Spatial and temporal context strongly influence visual perception and the neural activity that supports it: a stimulus is perceived and processed differently when it is surrounded or preceded by other stimuli. These contextual effects have traditionally been studied separately, using simple laboratory stimuli. I will present recent progress in understanding contextual effects in primary visual cortex (V1), which overcomes these previous limitations. First, we have recently shown that surround suppression in V1—a canonical form of spatial contextual modulation—varies substantially in magnitude for different natural scenes. Some scenes recruit strong surround suppression, while others provide none. This variability eludes traditional models, but it can be explained with a statistically-optimal gating of surround signals by an inference of image homogeneity. Second, we have found that temporal contextual effects are strongly influenced by spatial context: how a neuron adapts depends on how strongly surround suppression is recruited by a particular adapter. Because both excitatory and suppressive influences can adapt, neurons show a wide range of adaptation behaviors, opening new views on the function of this experience-based plasticity.
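
The gating idea can be caricatured in one line of arithmetic: divisive surround suppression whose weight is scaled by an inferred probability that center and surround belong to one homogeneous image region. This is a sketch of the idea only, with invented parameters, not the fitted model from the talk.

```python
def v1_response(center_drive, surround_drive, homogeneity, w=1.0, sigma=0.5):
    """Divisive surround normalization gated by `homogeneity`, an inferred
    probability (in [0, 1]) that center and surround form one region.
    Homogeneous scenes recruit full suppression; heterogeneous scenes do not.
    All parameter values are invented for illustration."""
    return center_drive / (sigma + w * homogeneity * surround_drive)
```

With equal center and surround drive, the response roughly triples as the inferred homogeneity goes from 1 to 0, mimicking the scene-to-scene variability in suppression described above.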

Bijan Pesaran
A role for coherent neural activity in coordinating looking and reaching
New York University

In this talk, I will present the dynamics of neuronal activity in two areas of the posterior parietal cortex that support looking and reaching, area LIP and the parietal reach region, PRR. We identify neurons in area LIP and PRR that fire coherently with LFP activity recorded locally in the same area as well as long-range in the other area. We then study the relationship of the firing of the coherent neurons to on-going behavior. The coherently-active neurons appear to play a privileged role in decisions about where to look and reach as well as in how we coordinate look-reach movements. I will discuss how the results are broadly consistent with a role for spike-field coherence in inter-areal communication.
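
Spike-field coherence of the kind described here can be sketched with plain trial-averaged periodograms (actual analyses typically use multitaper spectral estimators; the simulated data below are invented for illustration).

```python
import numpy as np

def coherence(spk, lfp):
    """Spike-field coherence from trial-averaged cross- and auto-spectra.
    spk, lfp: (trials, time) arrays; returns coherence per frequency bin."""
    S = np.fft.rfft(spk - spk.mean(axis=1, keepdims=True), axis=1)
    F = np.fft.rfft(lfp - lfp.mean(axis=1, keepdims=True), axis=1)
    cross = (S * np.conj(F)).mean(axis=0)
    power = (np.abs(S) ** 2).mean(axis=0) * (np.abs(F) ** 2).mean(axis=0)
    return np.abs(cross) / np.sqrt(power + 1e-12)

# Demo: spikes locked to the peaks of an 8 Hz field oscillation (random phase
# per trial) are coherent at 8 Hz; unrelated random spikes are not.
rng = np.random.default_rng(0)
t = np.arange(100) / 100.0                      # 1 s at 100 Hz
phase = rng.uniform(0, 2 * np.pi, (20, 1))      # 20 trials
lfp = np.sin(2 * np.pi * 8 * t + phase)
locked = (lfp > 0.9).astype(float)              # fire near oscillation peaks
rand_spk = (rng.random((20, 100)) < 0.1).astype(float)
```

With a 1 s window, frequency bin 8 of the rfft output corresponds to 8 Hz, so comparing `coherence(locked, lfp)[8]` against `coherence(rand_spk, lfp)[8]` shows the locking.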

Jonathan Pillow
Statistical encoding and decoding of decision-related signals from spike trains in parietal cortex
The University of Texas at Austin

A central problem in systems neuroscience is to decipher the neural mechanisms underlying sensory-motor decision-making. The lateral intraparietal area of parietal cortex (LIP) forms a primary component of neural decision-making circuitry in primates, but its exact functional role is still fiercely debated. In this talk, I will describe recent work on a statistical model of the information carried by LIP neurons in a decision-making task. We take a "data first" approach to the problem by building a descriptive model of the rich statistical structure of spike trains in LIP. First, I will describe an explicit encoding model of LIP responses, which characterizes the representation of various sensory, motor, and reward-related variables in terms of spiking activity. These dependencies are highly variable across neurons, and depend on spike history in a manner inconsistent with a Poisson rate code. Second, I will show how the model can be used to perform Bayesian decoding of decisions from LIP spike responses on single trials. I will discuss implications for the decoding of decisions by downstream brain areas and for understanding the decision-related computations performed by groups of neurons in LIP.
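
A minimal point-process GLM with a spike-history term illustrates how history dependence departs from a Poisson rate code. The stimulus, weights, and bin size below are all invented; this is a generic sketch of the model class, not the fitted LIP model.

```python
import numpy as np

def glm_spikes(stim, k=1.0, h=-4.0, b=2.0, dt=0.01, seed=0):
    """Point-process GLM sketch: the log-rate is a linear function of the
    stimulus plus a one-bin spike-history term. A negative history weight h
    produces refractory-like, non-Poisson spiking."""
    rng = np.random.default_rng(seed)
    spikes = np.zeros(len(stim), int)
    for t in range(len(stim)):
        drive = b + k * stim[t] + (h if t > 0 and spikes[t - 1] else 0.0)
        spikes[t] = rng.random() < np.exp(drive) * dt  # Bernoulli approximation
    return spikes
```

In a real encoding analysis the stimulus term would be replaced by filtered sensory, motor, and reward covariates, and the history term by a multi-bin filter; Bayesian decoding then inverts the fitted model to read decisions off single-trial spike trains.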

Constantin Rothkopf
From natural stimuli to natural tasks and back
Technical University Darmstadt

There is both considerable theoretical and empirical work demonstrating that understanding behavior in naturalistic sequential tasks will require that perceptual, decision, and motor systems be considered together. Here I’ll present several studies that utilize a combination of normative computational modeling, classic behavioral methods, and virtual reality in an attempt to describe naturalistic behavior and uncover the relationships that exist between these systems. First, it will be shown that internal costs and benefits can be inferred from human walking in the framework of inverse optimal control, leading to a quantitative description of natural navigation behavior in terms of underlying goals and showing that human subjects may not always carry out the tasks they are instructed to perform. Second, using a virtual environment that allows manipulating the statistical relationship between observation uncertainties and behavior, it will be shown that the constant bearing strategy in interception tasks is not a fixed heuristic but is highly adaptive, producing profound deviations from the strategy that allow human subjects to increase their interception probability. Finally, it will be shown that optimal coding ideas cannot explain several properties of V1 neurons on the basis of natural image statistics alone, but require taking into account the statistics of the natural environment, the statistics imposed by the visual system, and the statistics of active visual exploration.
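
The constant-bearing strategy itself can be sketched as a pursuer that steers to null changes in the target's bearing (a proportional-navigation-style controller; all positions, speeds, and gains below are invented, and this is not the authors' task code).

```python
import numpy as np

def constant_bearing_pursuit(p0, t0, v_target, speed, gain=5.0, dt=0.05, steps=400):
    """Pursuer turns in proportion to the change in the target's bearing,
    i.e. it steers to hold the bearing constant; returns the closest approach
    distance, which is near zero when the strategy yields interception."""
    p = np.array(p0, float)
    t = np.array(t0, float)
    heading = np.arctan2(t[1] - p[1], t[0] - p[0])
    prev_bearing = heading
    closest = float(np.linalg.norm(t - p))
    for _ in range(steps):
        t = t + np.asarray(v_target, float) * dt
        bearing = np.arctan2(t[1] - p[1], t[0] - p[0])
        d_bearing = (bearing - prev_bearing + np.pi) % (2 * np.pi) - np.pi
        heading += gain * d_bearing          # turn to hold the bearing constant
        prev_bearing = bearing
        p = p + speed * dt * np.array([np.cos(heading), np.sin(heading)])
        closest = min(closest, float(np.linalg.norm(t - p)))
    return closest
```

The experiments described above probe exactly the deviations from this fixed rule that subjects produce when observation uncertainty makes strict bearing-nulling suboptimal.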

Jeff Schall
Computational and neural mechanisms guiding gaze during visual search
Vanderbilt University

This presentation will survey performance, neural and computational findings demonstrating that gaze is guided during visual search through the operation of distinct stages of visual selection and saccade preparation. These stages can be selectively manipulated through target-distractor similarity, stimulus-response mapping rules, and unexpected perturbation of the visual array. Such manipulations indicate that the stages are instantiated in different neural populations with distinct connectivity and functional properties. Race and accumulator models provide a comprehensive account of the saccade preparation stage and of the conversion of salience evidence into saccade commands.
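
A minimal race model of the kind referenced here: two rectified stochastic accumulators race to a common threshold, with target-distractor similarity mapping onto the difference in their drift rates. All parameter values are invented for illustration.

```python
import numpy as np

def race_trial(drifts=(0.3, 0.15), threshold=10.0, noise=1.0, seed=0):
    """Accumulators for the target (index 0) and a distractor (index 1) race
    to threshold; the first to finish determines the chosen item and the
    response time in time steps. Rectification keeps activity non-negative."""
    rng = np.random.default_rng(seed)
    x = np.zeros(2)
    rt = 0
    while x.max() < threshold:
        x = np.maximum(x + np.asarray(drifts) + rng.normal(0.0, noise, 2), 0.0)
        rt += 1
    return int(np.argmax(x)), rt
```

Shrinking the gap between the two drift rates (higher target-distractor similarity) lowers accuracy and lengthens RTs, which is the qualitative pattern such models are used to fit.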