Data Reduction

[Figure: Pedar sensor map (left); sensor correlation matrix, single user, 5 trials (right)]

Intro

In Hermes, as in most embedded sensing applications, data drives the investigation. Pedar, an existing system used by kinesiologists and physical therapists, is extremely expensive and captures a lot of data (100+ sensors per insole, as seen in the picture above). To find out whether all 100+ sensors are relevant, or whether some can be discarded, we turn to data reduction methods.

There are several key characteristics that make this data interesting:

  • The data is spatial
  • It is also temporal, since each sensor can be viewed as a time series of measurements
  • There are many measurements (100+ sensors per insole)

Insight

To better understand the data, it helps to first look for structure within it. The unsupervised learning and clustering methods discussed in the literature (I highly recommend “Pattern Classification” by Duda et al.) can be used for exploratory data analysis. A few are:

  • Principal Component Analysis (PCA)
  • Multi-dimensional scaling (MDS)
  • Clustering (of all sorts: k-means, hierarchical, tree-based)

Our approach is to apply tree-based clustering, with nodes representing sensors and edges representing “similarity” between sensor readings. We choose correlation as our measure of “similarity”.

We have made several design decisions about the data that we have yet to justify; we will do so as we explore the data further.

Exploration

First we treat each sensor as a time series of measurements and store the series as the rows of a matrix M:

[s1(t), s1(t-1), s1(t-2), ...]
[s2(t), s2(t-1), s2(t-2), ...]

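We calculate the correlation between each pair of sensors using the corr command in MATLAB. Since corr correlates the columns of its input and M stores one sensor per row, we pass the transpose:

corr(M')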

This produces the correlation matrix shown on the right side of the figure at the top of the page. Note that the regions of high correlation (lighter pixels) generally lie close to the diagonal, since those sensors are spatially close to one another.
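As a minimal sketch of this step, the snippet below builds a small synthetic M and computes the sensor-by-sensor correlation matrix. The data, the variable names (heel, toe) and the noise level are made up for illustration and are not the Pedar recordings. It uses corrcoef from base MATLAB, which, like corr, treats columns as variables.

```matlab
% Synthetic stand-in for the Pedar recordings (hypothetical data).
% Rows are sensors, columns are time samples, matching the layout of M.
rng(0);                                  % fixed seed so the sketch is repeatable
nSamples = 500;
t = linspace(0, 10, nSamples);
heel = sin(2*pi*t);                      % two made-up underlying pressure patterns
toe  = cos(2*pi*t);
M = [heel + 0.1*randn(1, nSamples);      % sensors 1-2 follow the "heel" pattern
     heel + 0.1*randn(1, nSamples);
     toe  + 0.1*randn(1, nSamples);      % sensors 3-4 follow the "toe" pattern
     toe  + 0.1*randn(1, nSamples)];

% corrcoef treats COLUMNS as variables, so transpose to get one column
% per sensor. C(s1, s2) is the correlation between sensors s1 and s2.
C = corrcoef(M');

disp(size(C));                           % one row and one column per sensor
disp(C);
```

Sensors that share a pattern should show correlations near 1 with each other and near 0 with the other pair, which is the block structure one looks for in the correlation image.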

Next we build a minimum spanning tree over the correlation matrix, with sensors as nodes and correlations as edge weights. For this to work we must first invert the similarity metric, i.e. sensors that are highly correlated must receive lower weights. Assuming the inverted correlations are stored in a matrix L, we can apply the following MATLAB function:

graphminspantree(L)
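The inversion itself is not spelled out here, so the sketch below assumes one simple choice, weight = 1 - correlation, and applies it to a made-up 4-sensor correlation matrix. graphminspantree (Bioinformatics Toolbox) expects a sparse matrix and reads the undirected graph from its lower triangle, hence the tril and sparse calls. The names C and D are illustrative.

```matlab
% Hypothetical 4-sensor correlation matrix (made-up values for illustration).
C = [1.0 0.9 0.2 0.1;
     0.9 1.0 0.3 0.2;
     0.2 0.3 1.0 0.8;
     0.1 0.2 0.8 1.0];

% One possible inversion (an assumption, not necessarily the one used here):
% weight = 1 - correlation, so highly correlated sensors get low weights.
D = 1 - C;

% graphminspantree takes a sparse matrix and uses the nonzero entries of
% the lower triangle as undirected edge weights.
L = sparse(tril(D, -1));

% MST is a sparse matrix holding the weights of the edges kept in the tree.
MST = graphminspantree(L);

[r, c, w] = find(MST);                   % list the tree edges and their weights
disp([r c w]);
```

A sparse matrix cannot represent a zero-weight edge, so under this particular inversion a sensor pair with correlation exactly 1 would drop out of the graph; adding a small offset to D is one way around that.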

Given the spanning tree, we isolate clusters of high correlation by removing links whose weights are above a threshold. Because the weights are inverted correlations, these are the links between weakly correlated sensors. Given our minimum spanning tree MST, we can apply this filter like so:

 MST(MST>thresh)=0;

One way to choose the threshold is to plot the number of resulting clusters against the threshold. A knee in the graph generally indicates a good value for the threshold, as seen here:

[Figure: spanning tree thresholding]
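The threshold sweep can be sketched as follows, using a hand-built spanning tree with made-up weights in place of real data. It relies on the fact that a forest with n nodes and e edges has n - e connected components, so counting clusters only requires counting the edges that survive the filter.

```matlab
% Hypothetical spanning tree over 6 sensors, stored as a sparse matrix of
% inverted-correlation weights (made-up values; 5 edges for 6 nodes).
rows = [2 3 4 5 6];
cols = [1 2 3 4 5];
wts  = [0.05 0.10 0.60 0.08 0.12];
MST  = sparse(rows, cols, wts, 6, 6);

nNodes    = size(MST, 1);
threshes  = 0:0.01:1;                    % candidate thresholds to try
nClusters = zeros(size(threshes));

for k = 1:numel(threshes)
    T = MST;                             % work on a copy of the tree
    T(T > threshes(k)) = 0;              % drop edges heavier than the threshold
    % A forest with n nodes and e edges has n - e connected components.
    nClusters(k) = nNodes - nnz(T);
end

plot(threshes, nClusters);
xlabel('threshold');
ylabel('number of clusters');
```

In this toy tree the cluster count falls quickly and then stays flat over a wide range of thresholds; the start of that plateau is the knee one would look for.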

Once the threshold has been applied, we can generate group assignments. We can see the effectiveness of the clustering by graphing the grouped time series:

[Figure: Single User, 1 trial time series]
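One way to turn the thresholded tree into group assignments is to label its connected components. The sketch below assumes graphconncomp from the Bioinformatics Toolbox (the same toolbox that provides graphminspantree) and a hand-built thresholded tree with made-up weights; whether the original analysis used this function is not stated.

```matlab
% Hypothetical thresholded spanning tree over 6 sensors (made-up weights).
% The edge between sensors 3 and 4 has already been removed by the filter.
T = sparse([2 3 5 6], [1 2 4 5], [0.05 0.10 0.08 0.12], 6, 6);

% Label connected components; 'Directed', false treats the
% lower-triangular matrix as an undirected graph.
[nGroups, groupId] = graphconncomp(T, 'Directed', false);

% groupId(s) is the cluster label assigned to sensor s.
for g = 1:nGroups
    fprintf('cluster %d: sensors %s\n', g, mat2str(find(groupId == g)));
end
```

With M laid out one sensor per row, plot(M(groupId == g, :)') would then draw the time series of all sensors in cluster g on one axis.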

Some unwanted clusters come from areas of the insole that receive very little applied pressure. These regions or points can be identified by the small variance of their measurement values, so a second threshold can be applied to filter them out. The full processing cycle is demonstrated by the following sequence of figures:

[Figure: processing sequence]

  • (A) creating the minimum spanning tree using correlation as the similarity measure, hence the weight associated with each link
  • (B) removing weak links (weakly correlated sensor pairs, which carry high weights after inversion) from the graph, producing disconnected subgraphs that represent potential clusters
  • (C) using variance to determine whether each potential cluster contributes significantly to the overall pressure signal
  • (D) final cluster set
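The variance filter of step (C) can be sketched per sensor as below. The data and the cutoff varThresh are assumptions made for illustration; a real cutoff would need to be tuned on the recorded pressures.

```matlab
% Hypothetical data: 3 sensors x 200 samples (rows are sensors, as in M).
rng(1);                                  % fixed seed so the sketch is repeatable
t = linspace(0, 4*pi, 200);
M = [50 + 40*sin(t);                     % loaded sensor, large pressure swings
     45 + 35*sin(t + 0.1);               % loaded sensor
     1  + 0.05*randn(1, 200)];           % barely loaded sensor, almost flat

varThresh = 1;                           % assumed cutoff, not from the study

v = var(M, 0, 2);                        % variance of each row (sensor) over time
active = v > varThresh;                  % logical mask of sensors worth keeping

disp(find(active)');                     % indices of sensors that pass the filter
```

A cluster could then be discarded when none of its member sensors pass the mask, which is one possible reading of "contributes significantly" in step (C).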

Code

Code and sample data demonstrating the analysis. Coming soon!

 
matlab/smart_shoe.txt · Last modified: 2009/10/12 01:51 by shaun
 