Simulations

Maria Kesa edited this page Jul 19, 2019 · 26 revisions

Algorithm and code for simulating the data

We simulated two random matrices $U$ and $V$ to test whether different matrix factorization techniques could recover them from the matrix product $UV$ (`U@V` in the code). Each weight of $U$ was non-zero with a probability of 25%, and the non-zero weights were drawn from a Gaussian distribution with mean 2 and standard deviation 1; the code then takes the absolute value, so all weights are non-negative. All the elements of $V$ were drawn from a Gaussian distribution with mean 0 and standard deviation 1. We define $X_\text{synthetic} = UV$, where $U$ is neurons by components and $V$ is components by time points, matching the code below. We z-scored the matrix prior to fitting the model. We ran ensemble pursuit on this dataset with 25 components and compared $U_\text{approx}$ to the original $U$.

The code for simulating the data is given below:

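import numpy as np

def simulate_data(self, nr_components, nr_timepoints, nr_neurons):
    # Binary mask: each weight of U is non-zero with probability 0.25.
    zeros_for_U = np.random.choice([0, 1], nr_neurons*nr_components, p=[0.75, 0.25]).reshape((nr_neurons, nr_components))
    # Candidate weights: Gaussian with mean 2 and standard deviation 1.
    U = np.random.normal(loc=2, scale=1, size=(nr_neurons, nr_components))
    # Apply the mask and take the absolute value so the weights are non-negative.
    U = np.abs(U*zeros_for_U)
    # Dense time courses: standard normal, components x timepoints.
    V = np.random.normal(loc=0, scale=1, size=(nr_components, nr_timepoints))
    X = U@V
    # z-score the data (self.zscore is a method of the same class, not shown here).
    X = self.zscore(X)
    # Keep the ground truth for comparison with the fitted factors.
    self.U_orig = U
    self.V_orig = V
    return X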
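The `zscore` method called in the code above is not shown on this page. Here is a minimal sketch of what such a helper might look like, assuming each neuron (row of `X`) is standardized across time points. The choice of axis and the guard against zero-variance rows are assumptions, not taken from the original code. The guard matters because a neuron whose row of $U$ is entirely zero yields a constant row in `X`.

```python
import numpy as np

def zscore(X, axis=1):
    """Standardize X to zero mean and unit variance along `axis` (assumed: time)."""
    mean = X.mean(axis=axis, keepdims=True)
    std = X.std(axis=axis, keepdims=True)
    # Assumption: constant rows are left at zero instead of dividing by zero.
    std = np.where(std == 0, 1.0, std)
    return (X - mean) / std

# Small demonstration on random data of the same shape as the simulation.
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=3.0, scale=2.0, size=(50, 50))
X_z = zscore(X_demo)
```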

Initial simulations (50 neurons,50 timepoints,25 components)

Here we investigate the effectiveness of the Ensemble Pursuit algorithm to recover ensembles from simulated data where the ground truth is known by design.

We simulated synthetic data to see how well ensemble pursuit recovered the original sparse weights $U$. The synthetic data for this first experiment consisted of 25 sparse components in $U$, each with 50 weights (one per neuron), and 25 dense components in $V$, each with 50 time points.

Here are the three top recovered $u$ vectors from $U$. The first component is a giant component that puts weights on all the neurons.

This can be explained by the large correlations between neurons in the correlation matrix.

For these simulation parameters we plotted the correlations between components within the original data, the correlations within the recovered data, and the cross-correlations between the two. The original $V$'s, each matched to its closest approximation among the fitted $V$'s, are plotted in the figure below. We found that the correlation between these matrices was XXX, suggesting that the weights were sufficiently recovered.
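The matching of original to fitted time courses can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the figures: the function name `match_components` is hypothetical, "closest" is taken to mean the largest absolute Pearson correlation, and the matching is greedy, so two original components may be assigned the same fitted component.

```python
import numpy as np

def match_components(V_orig, V_fit):
    """For each original time course, find the most correlated fitted one.

    V_orig: (n_orig, n_timepoints), V_fit: (n_fit, n_timepoints).
    Returns the index of the best match and its absolute correlation.
    """
    n_orig = V_orig.shape[0]
    # np.corrcoef stacks the rows of both inputs; the off-diagonal block
    # holds the original-vs-fitted cross-correlations.
    cross = np.corrcoef(V_orig, V_fit)[:n_orig, n_orig:]
    # The sign of a factor is arbitrary in many factorizations.
    abs_cross = np.abs(cross)
    best = abs_cross.argmax(axis=1)
    return best, abs_cross[np.arange(n_orig), best]

# Demonstration: the "fitted" factors are a shuffled, noisy copy of the originals.
rng = np.random.default_rng(0)
V_orig = rng.normal(size=(5, 200))
perm = rng.permutation(5)
V_fit = V_orig[perm] + 0.1 * rng.normal(size=(5, 200))
best, corr = match_components(V_orig, V_fit)
```

In the demonstration, `best` undoes the shuffle, so `V_fit[best]` lines up row by row with `V_orig`.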

The variance explained by the model averaged over neurons was 0.89.
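The page does not spell out how the variance explained was computed. One common definition, given here as an assumption, is one minus the ratio of residual variance to total variance for each neuron, averaged over neurons. `X_hat` stands for the model reconstruction (for example, the product of the fitted $U$ and $V$).

```python
import numpy as np

def variance_explained(X, X_hat):
    """Fraction of variance explained per neuron (row), averaged over neurons."""
    residual_var = np.var(X - X_hat, axis=1)
    total_var = np.var(X, axis=1)
    return np.mean(1.0 - residual_var / total_var)

# Sanity checks on random data: a perfect reconstruction and an all-zero model.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 50))
perfect = variance_explained(X, X)
zero_model = variance_explained(X, np.zeros_like(X))
```

A perfect reconstruction scores 1 and a model that predicts all zeros scores 0 under this definition.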

What if we reduce the number of components in EnsemblePursuit?

When we reduce the number of components used to generate the data to 5, the reconstruction is near perfect.

Other models: PCA

We simulated $U$ and $V$ identical to those in the initial EnsemblePursuit simulations at the beginning of the page and recovered them via PCA. We fit a model with 25 components to a matrix of 50 neurons by 50 time points.

The top 3 U components are shown in the following figure.

The correlations between components within the original data, the fitted components and the correlations between fitted components and the original components are shown below.

The variance explained by the model averaged over neurons was 1.0.
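The page does not say which PCA implementation was used. The following is a minimal sketch using a truncated SVD in NumPy, assuming each neuron's time course is centered and that the left singular vectors scaled by the singular values play the role of the fitted $U$. The name `pca_factorize` is hypothetical, and the demonstration only centers the rows rather than fully z-scoring them.

```python
import numpy as np

def pca_factorize(X, n_components):
    """Truncated SVD of row-centered X, returning (U_fit, V_fit) with X ~ U_fit @ V_fit."""
    X_c = X - X.mean(axis=1, keepdims=True)  # assumption: center each neuron over time
    W, S, Vt = np.linalg.svd(X_c, full_matrices=False)
    U_fit = W[:, :n_components] * S[:n_components]  # neurons x components
    V_fit = Vt[:n_components]                       # components x timepoints
    return U_fit, V_fit

# Simulated data with the same shapes and distributions as described on this page.
rng = np.random.default_rng(0)
mask = rng.random((50, 25)) < 0.25
U = np.abs(rng.normal(2, 1, size=(50, 25)) * mask)
V = rng.normal(0, 1, size=(25, 50))
X = U @ V
X_c = X - X.mean(axis=1, keepdims=True)
U_fit, V_fit = pca_factorize(X, 25)
rel_error = np.linalg.norm(X_c - U_fit @ V_fit) / np.linalg.norm(X_c)
```

Because $X$ is a product of a 50 by 25 and a 25 by 50 matrix, its rank is at most 25, and centering or rescaling the rows does not increase it. Under these assumptions 25 components reconstruct the data up to numerical error, which is consistent with a variance explained of 1.0.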

SparsePCA

Here are the SparsePCA fits to simulated data with the same setup as the experiments at the top of the page.

The top 3 u vectors:

The correlations between the original factors, the fitted factors and between original factors and fitted factors:

The original time courses of $V$ (blue) and fitted time courses (orange):

The variance explained by the model averaged over neurons was 0.87.

ICA

Here are the ICA fits to simulated data with the same setup as the experiments at the top of the page.

The top 3 u vectors:

The correlations between the original factors, the fitted factors and between original factors and fitted factors:

The original time courses of $V$ (blue) and fitted time courses (orange):

The variance explained by the model averaged over neurons was 1.0.

NMF

Here are the NMF fits to simulated data with the same setup as the experiments at the top of the page. We subtracted the minimum of the data from the data matrix to ensure that the entries are non-negative.
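The shift to non-negative values and the fit itself might look like the sketch below. It assumes scikit-learn's `NMF` with neurons treated as samples. The page does not state which implementation or settings were used, so the `init`, `max_iter` and `random_state` values are illustrative choices, and random data stands in for the z-scored matrix.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 50))          # stand-in for the z-scored data matrix
X_nonneg = X - X.min()                 # shift so that the smallest entry is 0

# Illustrative settings, not taken from the original experiments.
model = NMF(n_components=25, init="nndsvda", max_iter=500, random_state=0)
U_fit = model.fit_transform(X_nonneg)  # neurons x components, non-negative
V_fit = model.components_              # components x timepoints, non-negative
```

Note that subtracting a single global minimum adds a constant offset to every entry, so the fitted factors describe the shifted matrix rather than the z-scored one.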

The top 3 u vectors:

The correlations between the original factors, the fitted factors and between original factors and fitted factors:

The original time courses of $V$ (blue) and fitted time courses (orange):

The variance explained by the model averaged over neurons was 0.96.

Latent Dirichlet Allocation

Here are the LDA fits to simulated data with the same setup as the experiments at the top of the page. We subtracted the minimum of the data from the data matrix to ensure that the entries are non-negative.

The top 3 u vectors:

The correlations between the original factors, the fitted factors and between original factors and fitted factors:

The original time courses of $V$ (blue) and fitted time courses (orange):