Structured representations in the visual cortex

Pietro Berkes, Richard Turner, and Maneesh Sahani

Draft paper: Berkes, P., Turner, R. and Sahani, M. (2007). Complex and simple cells form a structured representation of identities and attributes. (.pdf)
Poster presented at Cosyne 2007: .pdf


Many computational models have offered functional accounts of the organization of the sensory cortex. However, most have lacked the structure needed to extract the high-order causes of the sensory input. Here we present a generative model of visual input based on the duality between the identity of image features and their attributes. The presence of a feature is encoded by a binary identity variable, while its appearance is modeled by a multidimensional manifold, parametrized by a set of attribute variables. When applied to natural image sequences, the model finds attribute manifolds spanned by localized Gabor wavelets with similar positions, orientations, and frequencies, but different phases. Thus, the inferred activity of attribute variables after learning resembles that of simple cells in the primary visual cortex. Identity variables indicate the presence of a feature irrespective of its position on the underlying manifold, making them phase-insensitive, like complex cells. The dimensionality of the learnt manifolds and the relationships between the wavelets correspond closely to anatomical and functional observations regarding simple and complex cells. Thus, this generative model makes explicit an interpretation of complex and simple cells as elements in the segmentation of a visual scene into independent features, with a parametrization of their episodic appearance. It also suggests a possible role for them in a hierarchical system that extracts progressively higher-level entities, starting from simpler, low-level features.
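The phase-insensitivity described above can be illustrated with a small numerical sketch. It is not the learned model: it assumes a hand-built one-dimensional manifold spanned by two Gabor wavelets that share position and frequency but differ in phase by 90 degrees, and it uses the norm of the projection onto that pair as a stand-in for a phase-insensitive, identity-like quantity (in the model itself the identity variable is binary and inferred). The names gabor_pair and responses, and all parameter values, are hypothetical.

```python
import numpy as np

def gabor_pair(n=64, freq=0.1, sigma=8.0):
    """Return two unit-norm 1-D Gabor wavelets that differ only in phase."""
    x = np.arange(n) - n / 2
    envelope = np.exp(-x**2 / (2 * sigma**2))
    even = envelope * np.cos(2 * np.pi * freq * x)
    odd = envelope * np.sin(2 * np.pi * freq * x)
    return even / np.linalg.norm(even), odd / np.linalg.norm(odd)

def responses(phase, n=64, freq=0.1, sigma=8.0):
    """Project a Gabor stimulus with the given phase onto the wavelet pair.

    Returns the two phase-sensitive coefficients (simple-cell-like) and
    their joint norm (a phase-insensitive, complex-cell-like quantity).
    """
    even, odd = gabor_pair(n, freq, sigma)
    x = np.arange(n) - n / 2
    envelope = np.exp(-x**2 / (2 * sigma**2))
    stimulus = envelope * np.cos(2 * np.pi * freq * x + phase)
    a = even @ stimulus  # changes sign and size with phase
    b = odd @ stimulus   # changes sign and size with phase
    return a, b, np.hypot(a, b)

for phase in np.linspace(0, np.pi, 5):
    a, b, energy = responses(phase)
    print(f"phase={phase:.2f}  even={a:+.3f}  odd={b:+.3f}  norm={energy:.3f}")
```

Under these assumptions, the two coefficients vary with the stimulus phase while their joint norm stays essentially constant, which is the sense in which a variable that ignores position on the manifold is phase-insensitive.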

Sketch of the generative model
Figure 1: Illustration of the basic structure of the model. Each object or feature is represented by a binary identity variable c_{t,i} that indicates its presence or absence. Its appearance is modeled by a manifold formed by the set of its episodic poses; the manifold is defined by a mapping \Phi_i and parametrized by variables s_{t,ij}, which are interpreted as attributes of the object or feature. The episodic poses are multiplied by the state of the identity variables, so that absent objects make no contribution, and are then combined through a function f to generate the observations y_{t}.
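The structure in Figure 1 can be sketched as a sampling procedure. The following is a minimal illustration under assumptions that the text above does not state: each mapping \Phi_i is taken to be linear (a set of basis vectors), the attributes s_{t,ij} are drawn from a standard Gaussian, the identities c_{t,i} are independent Bernoulli draws, and the combination function f is a sum with additive Gaussian noise. The function name sample_observations and the parameters p_on and noise_std are hypothetical.

```python
import numpy as np

def sample_observations(Phi, p_on=0.2, noise_std=0.05, T=100, seed=0):
    """Draw T observations from an assumed linear-Gaussian instance of the model.

    Phi has shape (n_features, n_attributes, n_pixels); Phi[i, j] is the j-th
    basis vector spanning the manifold of feature i (linearity is an assumption).
    """
    rng = np.random.default_rng(seed)
    n_features, n_attributes, n_pixels = Phi.shape
    # Binary identity variables c_{t,i}: is feature i present at time t?
    c = rng.random((T, n_features)) < p_on
    # Attribute variables s_{t,ij}: where feature i sits on its manifold.
    s = rng.standard_normal((T, n_features, n_attributes))
    # Episodic pose of each feature, Phi_i applied to s_{t,i}.
    poses = np.einsum("tij,ijd->tid", s, Phi)
    # Multiply by the identity variables so absent features contribute nothing.
    gated = c[:, :, None] * poses
    # Combination function f: assumed here to be a sum plus Gaussian noise.
    y = gated.sum(axis=1) + noise_std * rng.standard_normal((T, n_pixels))
    return c, s, y

# Example: 3 features, each with a 2-dimensional manifold, 16-pixel observations.
Phi = np.random.default_rng(1).standard_normal((3, 2, 16))
c, s, y = sample_observations(Phi, T=50)
print(c.shape, s.shape, y.shape)
```

Learning would run in the opposite direction, inferring c and s and fitting the mappings from observed image sequences; this sketch covers only the generative direction shown in the figure.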