adehad edited this page Jun 17, 2019 · 6 revisions

Config Settings

This page documents the variables used in the options struct, ops. Each variable here is assigned using ops.<variableName> (see StandardConfig_MOVEME.m).

Overview of how KiloSort uses the settings

Data is processed in batches of NT samples. Each batch is pre-processed by filtering, median subtraction (common average referencing), and whitening across channels, which removes correlated noise (e.g. from far-away neurons). The whitened data is then scaled down by dividing by scaleproc.
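The median subtraction and whitening steps can be illustrated on toy data. This is only a minimal sketch: the filtering and scaleproc steps are left out, the data is synthetic, and ZCA whitening from the channel covariance is an assumption made for the example rather than a statement of what KiloSort's code does.

```matlab
rng(1);                                   % reproducible toy data
NT = 2000; Nchan = 4;                     % hypothetical batch length and channel count
shared = randn(NT, 1);                    % noise common to every channel
data = randn(NT, Nchan) + shared * ones(1, Nchan);

% Median subtraction across channels (common average referencing).
data = bsxfun(@minus, data, median(data, 2));

% ZCA whitening from the channel covariance (assumed for this sketch).
C = (data' * data) / NT;
[E, D] = svd(C);
Wrot = E * diag(1 ./ sqrt(diag(D) + 1e-6)) * E';
white = data * Wrot;

Cw = (white' * white) / NT;               % close to the identity matrix
```

After whitening, the channel covariance Cw is approximately the identity, i.e. the channels no longer share correlated noise.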

KiloSort thresholds the scaled-down, pre-processed data with spkTh to identify initial spikes, extracting a window of nt0 samples around the minimum of each spike. For each threshold crossing, ±loc_range(1) samples and ±loc_range(2) channels are searched to find the minimum. These spikes are clustered in the 7-dimensional PC space, wPCA, to identify potential templates. Nfilt templates are found initially and are then run through your data in batches (think convolution). For each potential match, the degree of similarity between the match and the template is compared with a threshold, Th. The match is compared to the mean of the template's waveforms: for lower lam values the match is allowed to be scaled more to fit the template, whereas large lam values force waveforms to stay closer to the mean of the template's waveforms. A certain amount of noise/uncertainty is allowed; larger values of momentum(1) allow more noise/variability in the waveforms for a given template.
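The threshold-and-local-minimum step can be sketched on a single synthetic channel. Everything here is hypothetical (the trace, the threshold value, and the locRange variable standing in for loc_range(1)); it illustrates the idea, not KiloSort's own detection code, which also searches across channels.

```matlab
% Toy single-channel trace (hypothetical data).
trace = zeros(1, 200);
trace(50)  = -8;     % trough of a first spike
trace(51)  = -6.5;   % shoulder of the same spike, also below threshold
trace(120) = -7;     % trough of a second spike

spkTh    = -6;       % illustrative threshold (spikes are negative-going here)
locRange = 3;        % stand-in for loc_range(1), in samples

crossings = find(trace < spkTh);          % every sample below threshold
isPeak = false(size(crossings));
for k = 1:numel(crossings)
    i  = crossings(k);
    lo = max(1, i - locRange);
    hi = min(numel(trace), i + locRange);
    % Keep the crossing only if it is the minimum within +-locRange samples.
    isPeak(k) = trace(i) == min(trace(lo:hi));
end
spikeTimes = crossings(isPeak);
```

The shoulder at sample 51 crosses the threshold but is discarded because sample 50 is lower within the search range.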

After a set number of batches (400), templates are re-evaluated. If the distance between two clusters is less than mergeT, the clusters, and hence their templates, are averaged together. If the score of the split between clusters is greater than splitT, the cluster is marked for splitting. Splitting is performed after merging and contains a hidden test on the number of spikes to allow overwriting small clusters [?].

Parallel Matching Pursuit occurs during the final pass through the data. This approach looks for the best-matching template and subtracts it from the waveform. The residual waveform is then compared with the other templates in the same way, until the amount of explained variance falls below a threshold.
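The subtract-and-repeat idea can be shown with a plain, serial greedy matching pursuit. Note that this is not the parallel GPU version KiloSort runs; the templates, the waveform and the minGain stopping threshold are all hypothetical and chosen so the result is easy to follow.

```matlab
% Two hypothetical unit-norm templates (columns), 5 samples each.
T = [1 0; 0 1; 0 0; 0 0; 0 0];
w = 3 * T(:,1) + 2 * T(:,2);    % waveform built from both templates

minGain  = 0.5;                 % illustrative threshold on explained variance
residual = w;
used     = [];
for it = 1:10                   % cap on iterations, for safety
    proj = T' * residual;               % match residual against each template
    [gain, best] = max(proj .^ 2);      % variance explained by the best match
    if gain < minGain
        break;                          % nothing left worth explaining
    end
    residual = residual - proj(best) * T(:, best);   % subtract the match
    used(end+1) = best;
end
```

Template 1 is matched first because it explains more variance, then template 2 is matched against the residual, after which the loop stops.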

Table of Contents

  1. Nfilt
  2. nNeighPC
  3. nNeigh
  4. whitening
  5. nSkipCov
  6. whiteningRange
  7. chanMap
  8. criterionNoiseChannels
  9. Nrank
  10. nfullpasses
  11. maxFR
  12. ntbuff
  13. scaleproc
  14. NT
  15. Th
  16. lam
  17. nannealpasses
  18. momentum
  19. shuffle_clusters
  20. mergeT
  21. splitT
  22. nt0
  23. nt0min
  24. initialize
  25. spkTh
  26. loc_range
  27. long_range
  28. maskMaxChannels
  29. crit
  30. nFiltMax
  31. dd
  32. wPCA
  33. fracse
  34. epu
  35. ForceMaxRAMforDat

spkTh - Threshold for Identifying Spikes to Make Templates

If initialize='fromData', KiloSort uses this threshold to identify a set of sample waveforms, which are then projected onto the PCs and clustered using k-means. Each cluster is used to generate a 'template'; together these form the set of initialisation templates that KiloSort subsequently uses.

Reference: #122

Th - Threshold for Comparing Spike to Template

KiloSort projects a candidate spike waveform onto each template to assess how much of the variance of that spike can be explained by the template. This threshold sets how much of the variance needs to be explained for the waveform to be considered part of the template. In other words, the threshold controls how much variance is allowed around the template: a small value allows a large amount of variance, letting the template's cluster accumulate more waveforms that differ from the template.

Th has 3 elements. The first 2 elements are used to create a linspace() between the first anneal and the final anneal (nannealpasses*NBatch), e.g. 1 and 5 for 10 anneals: linspace(1,5,10). This makes the threshold increasingly hard to cross with each anneal pass. The final element is used during the final template matching pass, i.e. the pass that goes through each batch sequentially and performs parallel matching.
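As a rough sketch of how the three elements are used, the snippet below builds the annealing schedule described above. The values of Th, nannealpasses and NBatch are illustrative only, and the variable names ThAnneal and ThFinal are made up for this example.

```matlab
Th            = [1 5 5];   % illustrative: [start of anneal, end of anneal, final pass]
nannealpasses = 2;         % hypothetical number of annealing passes
NBatch        = 5;         % hypothetical number of batches

nAnneal  = nannealpasses * NBatch;           % 10 annealing steps in total
ThAnneal = linspace(Th(1), Th(2), nAnneal);  % same as linspace(1,5,10)
ThFinal  = Th(3);                            % used in the final matching pass
```

ThAnneal rises monotonically from Th(1) to Th(2), so each successive step is harder to cross than the one before.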

Relevant References: #122, #146 (isolated_peaks implementation notes)

lam - Penalty for Amplitudes Different to the Template

A large value of lam means that there is a large penalty if the template needs to be scaled to match the candidate waveform. The penalty is applied to the similarity value between the waveform and the template, so a large penalty reduces the similarity. The threshold for similarity is set by Th.
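The effect of the penalty can be illustrated with a toy penalised least-squares fit. This is not KiloSort's actual cost function: it is an assumed, simplified model in which the amplitude a minimises ||w - a*t||^2 + lam*(a - mu)^2, with a hypothetical template t, candidate waveform w and expected amplitude mu.

```matlab
t  = [0; -1; -3; -1; 0];   % hypothetical template
w  = 2 * t;                % candidate: same shape, twice the amplitude
mu = 1;                    % expected amplitude of the template

% Closed-form minimiser of ||w - a*t||^2 + lam*(a - mu)^2.
fitAmp = @(lam) (t' * w + lam * mu) / (t' * t + lam);

aLow  = fitAmp(0.1);       % small lam: amplitude free to scale towards 2
aHigh = fitAmp(100);       % large lam: amplitude held close to mu = 1

% Unexplained energy left after subtracting the scaled template.
errLow  = sum((w - aLow  * t) .^ 2);
errHigh = sum((w - aHigh * t) .^ 2);
```

With the large lam the fitted amplitude stays near mu, the scaled template fits the candidate worse, and more energy is left unexplained, which corresponds to a lower similarity.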

Nfilt - Starting Number of Clusters

Nfilt sets the initial number of clusters to find. This means the output (before any auto-merging) will usually have this many clusters, but if shuffle_clusters = 1 you may find that the output deviates from this value.
Typically you want this variable to be 2-4 times the number of recording sites (i.e. channels, Nchan) you have. However, the lower the input impedance of your recording sites, the lower you can set this value. A low input impedance means that you will still receive large-amplitude signals from relatively far away from the recording site. Hence, if all your recording sites were low impedance, you might find that they essentially record the same signal, and KiloSort would not be able to cluster signals based on a waveform signature that spans multiple channels.
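A minimal sketch of picking a value from the channel count is shown below. Nchan and factor are hypothetical, and rounding up to a multiple of 32 is an assumption (based on the comment accompanying Nfilt in the standard config); check your own config file before relying on it.

```matlab
Nchan  = 24;    % hypothetical number of recording sites
factor = 3;     % somewhere in the suggested 2-4x range

% Round up to the next multiple of 32 (assumed requirement, see note above).
ops.Nfilt = 32 * ceil(factor * Nchan / 32);
```

Here 3 x 24 = 72 is rounded up to 96, which is 4 times Nchan and so still within the suggested range.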

nt0 - Waveform Window (Samples)

Sets the number of samples to use for templates, and hence for the extracted waveforms. The peak of the template / extracted waveform is located at sample nt0min. nt0 should always be an odd number. It also cannot exceed 80 (see #30), as there is a hardcoded maximum in the GPU code.
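The two constraints above can be checked before running a sort. The helper isValidNt0 is hypothetical (it is not part of KiloSort), and the limit of 80 is taken from the text above.

```matlab
% Hypothetical helper: true if nt0 is a positive odd number no larger than 80.
isValidNt0 = @(n) n > 0 && mod(n, 2) == 1 && n <= 80;

ops.nt0 = 61;   % illustrative value
assert(isValidNt0(ops.nt0), 'ops.nt0 must be odd and must not exceed 80');
```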

Excellent examples here: #177, #171

nt0min - Peak Location used for PCs -1

Tells the algorithm where the peak is located within your PCs. The value is the peak index minus 1 because MATLAB uses 1-indexing: e.g. for a waveform centred at sample 21, the value is 20 (21 = 1+20, rather than 0+21).
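As an illustration, the peak of a waveform can be located and converted to this convention as follows. The waveform pc1 is synthetic (a stand-in for the first column of your PC matrix), and taking the peak of the absolute value is an assumption made here because the sign of a principal component is arbitrary.

```matlab
nt0 = 61;                              % illustrative window length
t   = (1:nt0)';
pc1 = -exp(-((t - 21) .^ 2) / 8);      % synthetic waveform with its trough at sample 21

[~, peakIdx] = max(abs(pc1));          % 1-indexed location of the peak
ops.nt0min   = peakIdx - 1;            % convert to the "-1" convention
```

For this synthetic waveform peakIdx is 21, giving nt0min = 20.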

Key References: #177, #169

wPCA - The Principal Component Matrix

wPCA should contain the first 7 principal components from some sample data.

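% waveFormArray is laid out as:
%   rows    x    columns
%   spikes  x    samples of the spike waveform
% Note: pca() requires the Statistics and Machine Learning Toolbox.
Wi = pca(waveFormArray);   % each column of Wi is one principal component
imagesc(Wi)                % visualise the output
wPCA = Wi(:,1:7);          % KiloSort only uses the first 7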

To compute the xth PC value for a waveform, take the dot product of the xth column of wPCA with a spike waveform, e.g. a row of waveFormArray. For multi-channel data, use only the waveforms from the channels with the largest amplitude.
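The projection described above can be sketched as follows. To keep the example free of toolbox dependencies, wPCA is replaced here by a random orthonormal stand-in and waveFormArray by random data, so the numbers are meaningless; only the shapes and the multiplication are the point.

```matlab
rng(0);                                  % reproducible toy data
nSpikes = 100; nSamples = 61;            % hypothetical sizes
waveFormArray = randn(nSpikes, nSamples);

[Q, ~] = qr(randn(nSamples), 0);         % random orthonormal basis
wPCA   = Q(:, 1:7);                      % stand-in for the real PC matrix

x = 3;                                   % which PC to evaluate
pcValue   = waveFormArray(1, :) * wPCA(:, x);   % xth PC value of the first spike
allScores = waveFormArray * wPCA;               % all 7 PC values for every spike
```

allScores has one row per spike and one column per PC, and allScores(1, x) equals pcValue.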

Key References: #169, #32(multi-channel)
