In Hermes, as in most embedded sensing applications, data drives the investigation. Pedar, a pre-existing system used by kinesiologists and physical therapists, is extremely expensive and captures a large amount of data (100+ sensors per insole, as seen in the picture above). To understand whether all of these sensors are relevant, or whether some can be discarded, we turn to data reduction methods.
There are several key characteristics that make this data interesting:
To better understand the data, it can be helpful to first look for structure within it. Unsupervised learning and clustering methods discussed in the literature (I highly recommend “Pattern Classification” by Duda et al.) can be used for exploratory data analysis. A few are:
Our approach is to apply tree-based clustering, with nodes representing sensors and edges representing “similarity” between sensor readings. We choose correlation as our measure of “similarity”.
We have made several data design-decisions which we have yet to justify; we will do so as we explore the data some more.
First, we treat each sensor as a time-series of measurements and store the series in a matrix M:
[s1(t), s1(t-1), s1(t-2), ...]
[s2(t), s2(t-1), s2(t-2), ...]
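We calculate the correlation between each pair of sensors using the corr command in MATLAB. Note that corr correlates the columns of its input, so each sensor's time-series must occupy one column; if M stores one sensor per row, pass the transpose M' instead: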
corr(M)
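As a minimal sketch of this step, the snippet below builds a small synthetic M with one sensor per column and computes its correlation matrix. The sensor count, sample count, noise level, and the "heel"/"toe" names are made-up assumptions for illustration, not Pedar data. It uses corrcoef, the base-MATLAB counterpart of corr for Pearson correlation, so it should run without the Statistics Toolbox.

```matlab
rng(0);                                   % reproducible synthetic data
numSamples = 500;
heel = randn(numSamples, 1);              % hypothetical shared "heel" signal
toe  = randn(numSamples, 1);              % hypothetical shared "toe" signal

% Columns 1-3 follow the heel signal, columns 4-6 the toe signal,
% each with a little independent sensor noise added.
M = [repmat(heel, 1, 3), repmat(toe, 1, 3)] + 0.1 * randn(numSamples, 6);

R = corrcoef(M);                          % R(i,j) = correlation of sensors i and j
imagesc(R); colorbar; axis square;        % heat-map view of the matrix
```

In this toy data the two groups of three sensors should show up as two bright blocks along the diagonal.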
This produces the correlation matrix depicted on the right side of the figure at the top of the page. Note that the regions of high correlation (lighter pixels) are generally close to the diagonal, since sensors with neighboring indices are also spatially close to each other.
Next, we create a minimum spanning tree over the correlation matrix, with sensors as nodes and correlations as edge weights. To make this work we must invert the similarity metric, i.e. sensors that are highly correlated must get lower weights, because a minimum spanning tree keeps the lightest edges. Assuming the inverted correlations are stored in L, in MATLAB we can apply the following function:
graphminspantree(L)
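The text does not say how the correlations were inverted. One possible choice, assumed here purely for illustration, is D = 1 - R, which maps a correlation of 1 to a weight of 0. The sketch below also swaps in the base-MATLAB graph and minspantree functions (available since R2015b) in place of graphminspantree from the Bioinformatics Toolbox; the synthetic data is again hypothetical.

```matlab
rng(0);
numSamples = 500;
base = randn(numSamples, 2);              % two hypothetical pressure regions
M = [repmat(base(:,1), 1, 3), repmat(base(:,2), 1, 3)] ...
    + 0.1 * randn(numSamples, 6);         % 6 sensors, one per column

R = corrcoef(M);                          % similarity: high value = similar sensors
D = 1 - R;                                % assumed inversion: high correlation -> small weight
G = graph(D, 'upper');                    % undirected graph; zero entries (the diagonal) add no edges
T = minspantree(G);                       % minimum spanning tree as a graph object
disp(T.Edges);                            % one row per tree edge: EndNodes and Weight
```

One caveat of this choice: two sensors with a correlation of exactly 1 would get a weight of 0, which graph treats as "no edge", so real data may need a small offset.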
Given the spanning tree, we isolate clusters of high correlation by removing links whose (inverted) weights are above a threshold. Given our minimum spanning tree MST, we can apply this filter like so:
MST(MST>thresh)=0;
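If the tree is held as a MATLAB graph object rather than a sparse matrix, the same filter can be sketched with rmedge. The five-sensor chain, its weights, and the cutoff below are hypothetical values chosen only to show the mechanics.

```matlab
% Hypothetical spanning tree over 5 sensors: chain 1-2-3-4-5
% with one weak (high-weight) link between sensors 3 and 4.
s = [1 2 3 4];
t = [2 3 4 5];
w = [0.05 0.10 0.90 0.08];
T = graph(s, t, w);

thresh = 0.5;                              % assumed cutoff
weak = find(T.Edges.Weight > thresh);      % indices of edges above the threshold
Tpruned = rmedge(T, weak);                 % nodes are kept, only the weak edges are dropped
disp(Tpruned.Edges);
```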
One way of determining the threshold to use is by graphing the threshold versus the number of created clusters. Generally a knee in the graph will indicate a good value for the threshold as seen here:
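A rough sketch of such a threshold sweep is given below, under the assumptions that weights are 1 - correlation and that the data is synthetic (three hypothetical regions, eight sensors). It relies on the fact that removing e edges from a tree leaves e + 1 connected components, so the cluster count can be read directly off the tree's edge weights.

```matlab
rng(0);
numSamples = 500;
base = randn(numSamples, 3);                       % three hypothetical regions
M = base(:, [1 1 1 2 2 2 3 3]) + 0.2 * randn(numSamples, 8);

D = 1 - corrcoef(M);                               % assumed inversion of the correlation
T = minspantree(graph(D, 'upper'));
w = T.Edges.Weight;                                % weights of the 7 tree edges

threshVals = linspace(0, 1.05 * max(w), 50);
numClusters = zeros(size(threshVals));
for k = 1:numel(threshVals)
    % Each removed edge splits one component in two.
    numClusters(k) = 1 + sum(w > threshVals(k));
end

plot(threshVals, numClusters, '-o');
xlabel('threshold'); ylabel('number of clusters');
```

For this toy data the curve should drop quickly to three clusters and stay there over a wide range of thresholds, which is the kind of plateau after a knee one would look for.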
Once the threshold has been applied, we can generate group assignments: each connected component of the pruned tree is one cluster. We can see the effectiveness of the clustering by graphing the grouped time-series:
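One way to obtain the group assignments, sketched here with a hypothetical five-sensor tree, is to label the connected components of the thresholded matrix. The sketch uses conncomp from base MATLAB; the weights and threshold are illustrative assumptions.

```matlab
% Hypothetical spanning tree stored as a lower-triangular sparse weight matrix:
% chain 1-2-3-4-5 with one high-weight link between sensors 3 and 4.
MST = sparse([2 3 4 5], [1 2 3 4], [0.05 0.10 0.90 0.08], 5, 5);

thresh = 0.5;                     % assumed cutoff
MST(MST > thresh) = 0;            % same filter as in the text

G = graph(MST, 'lower');          % undirected graph from the remaining links
groups = conncomp(G);             % groups(i) = cluster label of sensor i
disp(groups);

% Sensors sharing a label can then be plotted together, e.g. for data matrix M
% with one sensor per column: plot(M(:, groups == 1))
```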
Some unwanted clusters come from areas that receive very little applied pressure. These regions/points can be identified by the small variance of their measurement values, so a second threshold can be applied to filter them out. The full processing cycle is demonstrated by the following sequence of figures:
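A minimal sketch of the variance filter follows, assuming one sensor per column. The data and the cutoff varThresh are hypothetical; in practice the cutoff would have to be chosen from the real measurements.

```matlab
rng(0);
numSamples = 500;
% Four sensors with real activity, two that barely register any pressure.
M = [randn(numSamples, 4), 0.01 * randn(numSamples, 2)];

sensorVar = var(M);                 % variance of each column (sensor)
varThresh = 0.05;                   % hypothetical cutoff
active = sensorVar > varThresh;     % logical mask of sensors worth clustering

Mactive = M(:, active);             % keep only the active sensors
fprintf('Kept %d of %d sensors\n', nnz(active), numel(active));
```

Applying this mask before computing correlations keeps near-constant sensors from forming spurious clusters of their own.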
Code and sample data demonstrating the analysis. Coming soon!