Current Events



  • 8/13/2012: Neurithmic Systems awarded ONR Research Contract (Code 341: Computational Neuroscience Program) for applying TEMECOR, a super-efficient spatiotemporal probabilistic inference engine based on sparse distributed representations (SDR), to video event recognition/understanding.

Research Statement

  1. Developing scalable on-line-learning probabilistic reasoning models based on sparse distributed representations (SDR).
  2. SDR provides massive algorithmic speedup for both learning and best-match retrieval of spatial or spatiotemporal patterns. In fact, my algorithm, TEMECOR, developed in the early '90s, has constant, "O(1)" (read as "order one"), time complexity for both storage (learning) and best-match retrieval (recognition, inference). This was demonstrated in my 1996 Thesis and described in this 2006 talk at the Redwood Neuroscience Institute. To date, no other published information processing method achieves this level of performance!
  3. Virtually all graphical probabilistic models to date, e.g., dynamic Bayes nets, HMMs, use localist representations. Influential cortically-inspired recognition models such as Neocognitron and HMAX also use localist representations. This figure shows what a localist HMAX-like model would look like if generalized to use SDR in all representational fields at all hierarchical levels. Also see the NICE Workshop and CNS 2013 links at left.
  4. In evaluating a model's scalability to the massive problems increasingly referred to in the media as "Big Data," O(1) time complexity trumps all other considerations, including any speedups due to machine parallelism. For example, O(1)-time search means that no matter how large a database grows, the time it takes to find the closest-matching item remains constant.
  5. Apart from the technological significance of an algorithm with O(1)-time storage and best-match retrieval, in particular for spatiotemporal patterns (sequences), I believe TEMECOR captures the essential computation carried out in neocortex, as described in my 2010 Frontiers in Neuroscience paper.
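TEMECOR itself is not reproduced here, but the core reason fixed-size SDR codes permit O(1) storage and best-match retrieval can be illustrated with a toy sketch. All names and parameters below are my own illustrative choices (random code assignment, binary Hebbian weights), not the model's actual mechanisms: the point is only that both operations touch a fixed-size weight structure, so their cost is independent of how many patterns have been stored.

```python
import numpy as np

rng = np.random.default_rng(0)

N_IN = 64   # binary input units
Q = 10      # winner-take-all groups (minicolumn analog)
K = 8       # cells per group; a code = one active cell per group

# Binary Hebbian weights from every input unit to every coding cell.
W = np.zeros((Q, K, N_IN))

def store(x):
    """Assign a random code (one cell per group) to binary input x and
    do a one-shot Hebbian update. Cost depends only on Q, K, and N_IN,
    not on how many patterns have already been stored."""
    code = rng.integers(K, size=Q)
    for q in range(Q):
        W[q, code[q]] = np.maximum(W[q, code[q]], x)
    return code

def retrieve(x):
    """Best-match retrieval: in each group, reactivate the cell with the
    highest input summation. Same fixed cost as storage."""
    return (W @ x).argmax(axis=1)
```

Presenting a noisy version of a stored pattern still reactivates its code, since the stored pattern's cells receive the highest summations; and because both operations always sweep the same Q x K x N_IN weight array, runtime stays constant as the number of stored patterns grows (until crosstalk saturates the weights).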

Spatiotemporal Visual Processing and Video Event Recognition

hierarchical 3D trace

Hierarchical video event recognition model: The input level (L0) can be viewed as a primitive retina [or perhaps a thalamic (LGN) representation]. With each input frame, a subset of the L1 macrocolumns (V1 hypercolumns) is activated (because their composing cells receive some threshold level of bottom-up (U) activation from L0): these macrocolumns are the larger, faintly outlined red squares. Similarly, U-signals from L1 to L2 and from L2 to L3 activate subsets of macrocolumns at those levels (though in this instance, L3 consists of one big macrocolumn). All macrocolumns consist of minicolumns (faint pink squares). In general, the minicolumns are winner-take-all (i.e., one cell active per frame), so that an active macrocolumn has one active cell (tiny black square) in each of its minicolumns (i.e., a sparse distributed code). However, when a macrocolumn first becomes active after a quiescent period, multiple (for now, two) cells become active in each of that macrocolumn's minicolumns. This is part of a novel (patent-pending) technique for chunking temporal sequences (in other words, for compression), called "Overcoding and Paring". Essentially Hebbian learning (with a form of synaptic tagging) occurs between co-active codes in neighboring levels on each frame, and, within any given macrocolumn, from the code(s) previously active in that and neighboring macrocolumns onto the subsequently active code(s) in that local region of macrocolumns. Essential to this new means of chunking, and of learning/recognizing sequences in general, is the fact that cell (and therefore code) activation duration (Phi) increases with level, causing higher-level codes to associate with sequences of lower-level codes. If you look closely, you can see that codes at higher levels update less frequently than codes at lower levels.
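The per-macrocolumn code selection described above can be sketched compactly. This is a rough illustration under my own assumptions (simple summation of U, H, and D inputs; top-k selection per minicolumn): normally one winner per minicolumn, but multiple winners when the macrocolumn reactivates after quiescence. Only the overcoding half of "Overcoding and Paring" is hinted at; the paring step is not sketched.

```python
import numpy as np

def choose_code(u, h, d, quiescent, n_over=2):
    """Select the active code for one macrocolumn.
    u, h, d: (n_minicols, n_cells) bottom-up, horizontal, and top-down
    input summations. Converging influences are simply summed here.
    Normally one winner per minicolumn; after a quiescent period, the
    top `n_over` cells per minicolumn activate (overcoding).
    Returns a boolean (n_minicols, n_cells) activation mask."""
    total = u + h + d
    k = n_over if quiescent else 1
    top = np.argsort(total, axis=1)[:, -k:]   # top-k cells per minicolumn
    mask = np.zeros_like(total, dtype=bool)
    np.put_along_axis(mask, top, True, axis=1)
    return mask
```

With `quiescent=False` the mask has exactly one active cell per minicolumn (a sparse distributed code); with `quiescent=True` it has `n_over` cells per minicolumn, matching the "for now, two" overcoded state in the caption.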

This animation was created from a live simulation on a very short snippet (10 frames) from the Hollywood2 database. I hope to automate the process of creating these 3D multi-level animations in the near future so that it is easy to produce animations running for hundreds or thousands of frames. This model instance contained a total of about 36 million bottom-up (U), top-down (D), and horizontal (H) synapses (weights) across all levels. More realistic instances will have ~10 levels and ~5-10 billion synapses; models with up to 7 levels and 1.6 billion synapses are currently being tested. When 3D animation production is automated, dynamic viewing of activation and learning of the full synaptic outsets/insets of each cell will be enabled (only hinted at now by the yellow and green U projective fields, which suggest how converging signals combine in influencing which codes get chosen). Among other things, this will allow viewing/analysis of the spatiotemporal receptive/projective fields that emerge in the model. I would also like to recast the model in terms of hexagonal grids.

The event depicted in this video snippet is a man walking away from a group of people. That is a fairly high-level semantic event class. It has several component objects/entities, each of which engages in several component events (spatiotemporal patterns), and it fundamentally takes a nontrivial amount of time (really, more than depicted in this short example) for such an event to unambiguously transpire. It is the (sparse distributed) codes at higher levels, which by fiat have longer activation durations (persistences) than lower-level codes, that will come to represent such higher-level semantic event classes.
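The level-dependent persistence can be made concrete with a toy schedule. The doubling rule below is my own assumption for illustration; the text only says that activation duration (Phi) increases with level. The effect is that one higher-level code stays co-active with, and can therefore associate with, a whole subsequence of lower-level codes.

```python
def update_frames(level, n_frames, phi_base=1):
    """Frames on which a code at the given level is re-selected,
    assuming persistence Phi doubles with each level (an illustrative
    choice). A level-2 code thus spans four level-0 code updates."""
    phi = phi_base * 2 ** level
    return [t for t in range(n_frames) if t % phi == 0]
```

Over 8 frames, level 0 updates on every frame, level 1 on frames 0, 2, 4, 6, and level 2 only on frames 0 and 4, mirroring the slower code turnover visible at higher levels of the animation.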

This example shows the output of a live simulation. But note that it shows only the set of codes that become active (in various macrocolumns on various frames) in response to a few frames of one instance of the semantically high-level event, "walking". Thus, this particular hierarchical (multi-scale), spatiotemporal pattern of codes (and the synaptic increases that occur between the neurons comprising the codes) does not constitute a code for the class "walking", nor for "man walking", nor for "man walking away from a group", etc. That is, what you see here is an episodic memory trace of a particular event, not a semantic memory trace (i.e., representation of a category). To show that this spatiotemporal pattern of activation represented the spatiotemporal category, "walking", we would have to show that the same pattern of activation, or more likely, the same pattern over one or more of the highest levels, becomes active while processing many different instances of walking, or of walking away from a group, etc. The model is currently entering testing in which it will be exposed to large numbers of video snippets while undergoing unsupervised learning.

The hope is that we will be able to demonstrate, initially, that lower-level actions, such as moving/swinging one's arm or leg (as in walking), or simply lifting one's foot, etc. can be learned with an appreciable amount of generality (invariance). As robust learning of such lower level concepts is demonstrated, we intend to move to progressively higher-level concepts, scaffolding on the previously learned concepts/features, just as (presumably) children learn to recognize objects and events.
