Appendix A — Fixed point study in the network of neurons
We recall the equation for the low-rank network of neurons:
\[ \dot h_i(t) = -h_i(t) + \frac 1N \sum_{\mu=1}^p \sum_{j=1}^N F_{\mu i} G_{\mu j} \phi(h_j(t)). \]
In general, the low-rank vectors are obtained by sampling a distribution \(\boldsymbol{z} = (z_1, \cdots, z_p) \sim \rho\): every neuron \(i\) draws an independent sample, which is passed through functions \(f_\mu(\boldsymbol{z})\) and \(g_\mu(\boldsymbol{z})\).
In the main text, the distribution is a multivariate standard normal (zero mean, unit variance, independent components), so that it factorises as \(\rho(z_1, \cdots, z_p) = \prod_{\mu=1}^p \mathcal{N}(z_\mu)\). Furthermore, the functions are component-wise: \(f_\mu(z_1, \cdots, z_p) = z_\mu\) and \(g_\mu(z_1, \cdots, z_p) = \tilde \phi(z_\mu) = \frac{\phi(z_\mu) - \langle{\phi(z_\mu)}\rangle}{\mathrm{Var}[\phi(z_\mu)]}\) (note the normalisation by the variance, not the standard deviation).
\[ F_{\mu i} = f_\mu(z_{1 i}, \cdots, z_{p i}), \, G_{\mu i} = g_\mu(z_{1 i}, \cdots, z_{p i}) \]
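We illustrate the computations in this appendix with short numerical sketches. Here is a minimal construction of \(F\), \(G\) and the connectivity \(J\) under the assumptions above (standard normal \(\rho\), \(f_\mu(z) = z_\mu\), \(g_\mu = \tilde\phi\), logistic \(\phi\)); the helper names (`phi`, `phi_tilde`, `build_low_rank`) are ours, and \(\langle\phi\rangle\), \(\mathrm{Var}[\phi]\) are estimated by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(h):
    """Logistic activation function."""
    return 1.0 / (1.0 + np.exp(-h))

# Centre phi and normalise by its variance under N(0, 1), estimated by
# Monte Carlo (the appendix divides by Var, not by the standard deviation).
_z = rng.standard_normal(1_000_000)
PHI_MEAN, PHI_VAR = phi(_z).mean(), phi(_z).var()

def phi_tilde(z):
    return (phi(z) - PHI_MEAN) / PHI_VAR

def build_low_rank(N, p, rng):
    """Sample z_{mu i} i.i.d. N(0, 1) and return F, G of shape (p, N)."""
    z = rng.standard_normal((p, N))   # z[mu, i] = z_{mu i}
    F = z                             # F_{mu i} = f_mu(z_i) = z_{mu i}
    G = phi_tilde(z)                  # G_{mu i} = g_mu(z_i) = phi_tilde(z_{mu i})
    return F, G

F, G = build_low_rank(N=2_000, p=3, rng=rng)
J = F.T @ G / F.shape[1]              # J_ij = (1/N) sum_mu F_{mu i} G_{mu j}
```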
A.1 Derivation of fixed points
To find the fixed points, one must in general solve \(\dot h_i(t) = 0\) for all \(i = 1, \cdots, N\). Because of the non-linearity \(\phi(h_j(t))\), this is difficult; instead, we guess candidate fixed points \(h^\star_i\) and verify them by substitution, checking that \(\dot h_i(t) = 0\).
A.1.1 Zero fixed point
A trivial fixed point is \(h^\star_i = 0\) for all \(i\), which we now verify in the limit \(N \to \infty\).
\[ \begin{aligned} \dot h_i(t) &= \frac 1N \sum_{\mu=1}^p \sum_{j=1}^N F_{\mu i} G_{\mu j} \phi(0) \\ &= \frac{\phi(0)}{N} \sum_{\mu=1}^p \sum_{j=1}^N F_{\mu i} G_{\mu j} \\ &= \frac{\phi(0)}{N} \sum_{\mu=1}^p \sum_{j \neq i}^N F_{\mu i} G_{\mu j} + \underbrace{\frac{\phi(0)}{N} \sum_{\mu=1}^p F_{\mu i} G_{\mu i}}_{\mathcal{O}(\frac 1N)} \\ &= \phi(0) \sum_{\mu=1}^p \frac 1N \sum_{j \neq i}^N F_{\mu i} G_{\mu j} + \mathcal{O}(\tfrac 1N) \\ &= \phi(0) \sum_{\mu=1}^p F_{\mu i} \frac{N-1}{N} \frac{1}{N-1} \sum_{j \neq i}^N G_{\mu j} + \mathcal{O}(\tfrac 1N) \\ &\stackrel{N \gg 1}{\approx} \phi(0) \sum_{\mu=1}^p F_{\mu i} \langle{g_\mu(\boldsymbol{z})}\rangle \end{aligned} \]
To write the last line, we make a slight abuse of notation: by the law of large numbers, the empirical average \(\frac{1}{N-1} \sum_{j \neq i} G_{\mu j}\) converges to the population expectation \(\langle{g_\mu(\boldsymbol{z})}\rangle\) as \(N \to \infty\), while \(F_{\mu i}\) keeps its sampled value.
In the case of the network studied in this work, we find that
\[ \langle{g_\mu(\boldsymbol{z})}\rangle = \langle{\tilde \phi(z_\mu)}\rangle = \langle{\frac{\phi(z_\mu) - \langle{\phi(z_\mu)}\rangle}{\mathrm{Var}[\phi(z_\mu)]}}\rangle = 0. \]
Thus, \(h_i^\star=0\) is a fixed point in the limit \(N \to \infty\).
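This can be checked numerically; a quick sketch reusing `phi` and `build_low_rank` from the snippet above (we exploit the low-rank structure to avoid forming \(J\) at large \(N\)):

```python
# The RMS drift at h = 0 shrinks as N grows, roughly like N^{-1/2}.
for N in (500, 5_000, 50_000):
    F, G = build_low_rank(N=N, p=3, rng=rng)
    # Low-rank structure: J @ v = F.T @ (G @ v) / N, without forming J.
    h_dot = F.T @ (G @ phi(np.zeros(N))) / N   # \dot h at h* = 0
    print(N, np.sqrt((h_dot ** 2).mean()))     # decreasing in N
```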
A.1.2 Pattern fixed point
Another family of fixed points is \(h^\star_i = F_{\nu i}\), one for each \(\nu = 1, \cdots, p\). We refer to these as the “patterns”, since in the context of learning the network “remembers” (converges to) these vectors. We verify the fixed point as before.
\[ \begin{aligned} \dot h_i(t) &= -F_{\nu i} + \frac 1N \sum_{\mu=1}^p \sum_{j=1}^N F_{\mu i} G_{\mu j} \phi(F_{\nu j}) \\ &= -F_{\nu i} + \frac 1N \sum_{\mu=1}^p \sum_{j \neq i}^N F_{\mu i} G_{\mu j} \phi(F_{\nu j}) + \mathcal{O}(\tfrac 1N) \\ &= -F_{\nu i} + \frac 1N \sum_{j \neq i}^N F_{\nu i} G_{\nu j} \phi(F_{\nu j}) + \frac 1N \sum_{\mu \neq \nu}^p \sum_{j \neq i}^N F_{\mu i} G_{\mu j} \phi(F_{\nu j}) + \mathcal{O}(\tfrac 1N) \\ &\stackrel{N \gg 1}{\approx} -F_{\nu i} + F_{\nu i} \langle{g_\nu(\boldsymbol{z_j}) \phi(z_{\nu j})}\rangle + \sum_{\mu \neq \nu}^p F_{\mu i} \langle{g_\mu(\boldsymbol{z_j}) \phi(z_{\nu j})}\rangle \end{aligned} \]
The term \(\langle{g_\mu(\boldsymbol{z_j}) \phi(z_{\nu j})}\rangle\) vanishes in the case of a factorizable distribution and when the functions \(g_\mu\) are component-wise.
\[ \begin{aligned} \langle{g_\mu(\boldsymbol{z_j}) \phi(z_{\nu j})}\rangle &= \int_{\mathbb R^p} g_\mu(z_1, \cdots, z_p) \phi(z_\nu) \rho(\mathrm d\boldsymbol{z}) \\ &= \int_{\mathbb R^p} g_\mu(z_\mu) \phi(z_\nu) \prod_{\gamma=1}^p \rho_\gamma(\mathrm dz_\gamma) \\ &= \int_{\mathbb R} g_\mu(z_\mu) \rho_\mu(\mathrm dz_\mu) \int_{\mathbb R} \phi(z_\nu) \rho_\nu(\mathrm dz_\nu) \\ &= \langle{g_\mu(z_\mu)}\rangle \langle{\phi(z_\nu)}\rangle \end{aligned} \]
As established in the previous section, in the case of the network studied in this work, \(\langle{g_\mu(z_\mu)}\rangle = 0\).
The term \(\langle{g_\nu(\boldsymbol{z_j}) \phi(z_{\nu j})}\rangle\) is more difficult, but we can prove that in our case it equals one.
\[ \begin{aligned} \langle{g_\nu(\boldsymbol{z_j}) \phi(z_{\nu j})}\rangle &= \langle{\frac{\phi(z_{\nu j}) - \langle{\phi(z_{\nu j})}\rangle}{\mathrm{Var}[\phi(z_{\nu j})]} \phi(z_{\nu j})}\rangle \\ &= \langle{\frac{\phi(z)^2 - \langle{\phi(z)}\rangle \phi(z) }{\mathrm{Var}[\phi(z)]}}\rangle \\ &= \frac{\langle{\phi(z)^2}\rangle - \langle{\langle{\phi(z)}\rangle \phi(z)}\rangle }{\mathrm{Var}[\phi(z)]} \\ &= \frac{\langle{\phi(z)^2}\rangle - \langle{\phi(z)}\rangle^2}{\mathrm{Var}[\phi(z)]} \\ &= \frac{\mathrm{Var}[\phi(z)]}{\mathrm{Var}[\phi(z)]} \\ &= 1 \end{aligned} \]
Therefore, the set of \(h_i^\star = F_{\nu i}\) for all \(\nu = 1, \cdots, p\) are fixed points in the limit of large \(N\), under the assumptions that the distribution is factorizable, the functions \(g_\mu\) are component-wise, and \(\langle{g_\nu(z) \phi(z)}\rangle=1\), which is the case in the setup of this work.
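The analogous numerical check at the pattern fixed points, a sketch assuming `phi` and `build_low_rank` from the earlier snippet:

```python
# RMS of \dot h at h*_i = F_{nu i}, for each pattern nu.
N, p = 50_000, 3
F, G = build_low_rank(N=N, p=p, rng=rng)
for nu in range(p):
    h_star = F[nu]                                 # candidate fixed point
    h_dot = -h_star + F.T @ (G @ phi(h_star)) / N  # low-rank J @ phi(h*)
    print(nu, np.sqrt((h_dot ** 2).mean()))        # -> 0 as N -> infinity
```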
A.2 Fixed points of the neural field
The derivation of the fixed points for the neural field equation follows the same arguments as the fixed points in the network of neurons.
As \(N \to \infty\), the fixed points \(h_i^\star\) become functions \(h^\star(z_1, \cdots, z_p)\) on the \(\mathbb R^p\) space:
- \(h_i^\star = 0 \to h^\star(z_1, \cdots, z_p) = 0\)
- \(h_i^\star = F_{\nu i} = z_{\nu i} \to h^\star(z_1, \cdots, z_p) = z_\nu\) for all \(\nu = 1, \cdots, p\)
A.3 Stability of the fixed points
We now study the stability of the fixed points. Writing the connectivity as \(J_{ij} = \frac 1N \sum_{\mu=1}^p F_{\mu i} G_{\mu j}\), we substitute \(h_i(t) = h_i^\star + \delta_i(t)\) into the evolution equation, where \(\delta_i(t)\) is a small perturbation.
\[ \begin{aligned} \dot h_i(t) &= \dot \delta_i(t) \\ &= -(h_i^\star + \delta_i(t)) + \sum_{j=1}^N J_{ij} \phi(h_j^\star + \delta_j(t)) \\ &= -(h_i^\star + \delta_i(t)) + \sum_{j=1}^N J_{ij} (\phi(h_j^\star) + \partial \phi(h_j^\star) \delta_j(t) + \mathcal{O}(\delta_j(t)^2)) \\ &= \underbrace{-h_i^\star + \sum_{j=1}^N J_{ij} \phi(h_j^\star)}_\text{$=0$ by definition of the fixed point} - \delta_i(t) + \sum_{j=1}^N J_{ij} \partial \phi(h_j^\star) \delta_j(t) + \mathcal{O}(\lVert{\boldsymbol{\delta}(t)}\rVert^2) \\ &= - \delta_i(t) + \sum_{j=1}^N J_{ij} \partial \phi(h_j^\star) \delta_j(t) + \mathcal{O}(\lVert{\boldsymbol{\delta}(t)}\rVert^2) \\ &= \sum_{j=1}^N (\underbrace{-\mathrm{Id}_{ij} + \underbrace{J_{ij} \partial \phi(h_j^\star)}_{J_{ij}^\prime}}_{K_{ij}}) \delta_j(t) + \mathcal{O}(\lVert{\boldsymbol{\delta}(t)}\rVert^2) \end{aligned} \]
In matrix notation \(\boldsymbol{\delta}(t) = (\delta_1(t), \cdots, \delta_N(t))\), this becomes
\[ \begin{aligned} &\dot{\boldsymbol{\delta}}(t) = K \boldsymbol{\delta}(t) \\ \implies &\boldsymbol{\delta}(t) = \mathrm{exp}(Kt) \boldsymbol{\delta}(0) \end{aligned} \]
The fixed point is stable if \(\boldsymbol{\delta}(t) \to 0\) as \(t \to \infty\), in other words, if all the eigenvalues of \(K\) have negative real part.
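A sketch of this linearised dynamics at a pattern fixed point: we build \(K\) explicitly and propagate a small perturbation with the matrix exponential. We take \(p = 1\) for simplicity; scipy and the helpers `phi`, `build_low_rank`, `rng` from the first snippet are assumed.

```python
from scipy.linalg import expm

def dphi(h):
    """Derivative of the logistic function."""
    ph = phi(h)
    return ph * (1.0 - ph)

N, p, nu = 500, 1, 0
F, G = build_low_rank(N=N, p=p, rng=rng)
J = F.T @ G / N
h_star = F[nu]
K = -np.eye(N) + J * dphi(h_star)[None, :]  # K_ij = -Id_ij + J_ij dphi(h*_j)

delta0 = 1e-3 * rng.standard_normal(N)
for t in (0.0, 1.0, 5.0, 20.0):
    # ||delta(t)|| shrinks when all eigenvalues of K have negative real part,
    # which is the case at the pattern fixed points for the logistic phi
    # (shown below).
    print(t, np.linalg.norm(expm(K * t) @ delta0))
```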
Before we study the spectrum of \(K\), we note the following result: if \(\lambda\) is an eigenvalue of \(A\), then \(\lambda-1\) is an eigenvalue of \(A-\mathrm{Id}\). This follows directly from the eigenvalue equation: if \(A \boldsymbol{v_i} = \lambda_i \boldsymbol{v_i}\), then \((A-\mathrm{Id})\boldsymbol{v_i}=\lambda_i \boldsymbol{v_i} - \boldsymbol{v_i} = (\lambda_i - 1)\boldsymbol{v_i}\). Therefore, to study the spectrum of \(K\), it suffices to study the eigenvalues of \(J^\prime\).
Since \(J^\prime\) is low-rank, its image is spanned by the vectors \(\{\boldsymbol{F_\nu} = (F_{\nu 1}, \cdots, F_{\nu N}) | \nu=1,\cdots,p\}\), which motivates the Ansatz of eigenvectors \(\boldsymbol{v_\nu} = \boldsymbol{F_\nu}\) in the \(N \to \infty\) limit.
\[ \begin{aligned} (J^\prime \boldsymbol{F_\nu})_i &= \frac 1N \sum_{\mu=1}^p \sum_{j=1}^N F_{\mu i} \underbrace{G_{\mu j} \partial \phi(h_j^\star)}_{G^\prime_{\mu j}} F_{\nu j} \\ &= \frac 1N \sum_{\mu \neq \nu}^p \sum_{j=1}^N F_{\mu i} G^\prime_{\mu j} F_{\nu j} + \frac 1N \sum_{j=1}^N F_{\nu i} G^\prime_{\nu j} F_{\nu j} \\ &\stackrel{N \gg 1}{\approx} \sum_{\mu \neq \nu}^p F_{\mu i} \langle{g^\prime_\mu(\boldsymbol{z}) f_\nu(\boldsymbol{z})}\rangle + F_{\nu i} \langle{g^\prime_\nu(\boldsymbol{z}) f_\nu(\boldsymbol{z})}\rangle \end{aligned} \]
We study each term separately. Additionally (and similarly as before), we assume that the functions \(g_\mu\) and the functions \(f_\mu\) are component-wise, and that the distribution \(\rho\) factorises. When \(\mu \neq \nu\), we compute:
\[ \begin{aligned} \langle{g^\prime_\mu(\boldsymbol{z}) f_\nu(\boldsymbol{z})}\rangle &= \int_{\mathbb R^p} g^\prime_\mu(\boldsymbol{z}) f_\nu(\boldsymbol{z}) \rho(\mathrm d\boldsymbol{z}) \\ &= \int_{\mathbb R} \int_{\mathbb R} g^\prime_\mu(z_\mu) f_\nu(z_\nu) \rho_\mu(\mathrm dz_\mu) \rho_\nu(\mathrm dz_\nu) \\ &= \int_{\mathbb R} g^\prime_\mu(z_\mu) \rho_\mu(\mathrm dz_\mu) \int_{\mathbb R} f_\nu(z_\nu) \rho_\nu(\mathrm dz_\nu) \\ &= \langle{g^\prime_\mu(z_\mu)}\rangle \langle{f_\nu(z_\nu)}\rangle. \end{aligned} \]
In our case, \(\langle{f_\nu(z_\nu)}\rangle = \langle{z_\nu}\rangle = \int_{\mathbb R} z_\nu \mathcal{N}(\mathrm dz_\nu) = 0\) and so the term vanishes. Therefore, we are left with
\[ \begin{aligned} &(J^\prime \boldsymbol{F_\nu})_i \stackrel{N \gg 1}{\approx} F_{\nu i} \langle{g^\prime_\nu(z_\nu) f_\nu(z_\nu)}\rangle \\ \implies &J^\prime \boldsymbol{F_\nu} = \langle{g^\prime_\nu(z_\nu) f_\nu(z_\nu)}\rangle \boldsymbol{F_\nu} \\ \implies &K \boldsymbol{F_\nu} = (\underbrace{\langle{g^\prime_\nu(z_\nu) f_\nu(z_\nu)}\rangle-1}_\text{eigenvalue $\lambda_\nu$}) \boldsymbol{F_\nu} \\ \end{aligned} \]
The term \(\langle{g^\prime_\nu(z_\nu) f_\nu(z_\nu)}\rangle\) does not vanish in general, because both factors depend on the same component \(z_\nu\), so the expectation does not factorise. The integral is too complicated to solve analytically, but we provide a numerical estimate in our case (\(\rho(z_1, \cdots, z_p)=\prod_{\mu=1}^p\mathcal{N}(z_\mu)\), \(f_\mu(z_\mu)=z_\mu\), \(g_\mu(z_\mu)=\tilde\phi(z_\mu)\)) for the fixed points \(h_i^\star=0\) and \(h_i^\star=F_{\nu i}\).
\[ \begin{aligned} \phi(h)&=\frac{1}{1+\mathrm{e}^{-h}} \\ &\implies \begin{cases} h_i^\star=0 &\implies \lambda_\nu = \langle{\tilde\phi(z) \partial \phi(0) z}\rangle - 1 \approx 0.19061 \pm 0.00014 \\ h_i^\star=F_{\nu i} &\implies \lambda_\nu = \langle{\tilde\phi(z) \partial \phi(z) z}\rangle - 1 \approx -0.28090 \pm 0.00006 \end{cases} \end{aligned} \]
When \(\phi\) is the logistic function, the pattern fixed points are stable, whereas the zero fixed point is unstable.
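These estimates are easy to reproduce by Monte Carlo; a sketch assuming `phi_tilde`, `dphi` and `rng` from the earlier snippets:

```python
# Monte Carlo estimates of lambda_nu for the logistic phi at both fixed points.
z = rng.standard_normal(5_000_000)
lam_zero = (phi_tilde(z) * dphi(0.0) * z).mean() - 1.0   # at h* = 0
lam_patt = (phi_tilde(z) * dphi(z) * z).mean() - 1.0     # at h* = F_nu
print(lam_zero)   # ~ +0.19: zero fixed point unstable
print(lam_patt)   # ~ -0.28: pattern fixed points stable
```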
We now consider a Taylor expansion of the activation function \(\phi(h)\) up to order 3, and report results obtained from symbolic computation and known moments of the Gaussian distribution.
\[ \begin{aligned} \phi(h)&= c_0 + c_1 h,\, (c_0, c_1) \in \mathbb R^2 \\ &\implies \lambda_\nu = \langle{\frac{z}{c_{1}} c_{1} z}\rangle - 1 = 0 \text{ for any fixed point}\\ \phi(h)&= c_0 + c_1 h + c_2 h^2,\, (c_0, c_1, c_2) \in \mathbb R^3 \\ &\implies \begin{cases} h_i^\star=0 &\implies \begin{aligned}[t] \lambda_\nu &= \langle{\frac{c_{1} z + c_{2} z^{2} - c_{2}}{c_{1}^{2} + 2 c_{2}^{2}} c_{1} z}\rangle - 1 \\ &= - \frac{2 c_{2}^{2}}{c_{1}^{2} + 2 c_{2}^{2}} \end{aligned} \\ h_i^\star=F_{\nu i} &\implies \begin{aligned}[t] \lambda_\nu &= \langle{\frac{c_{1} z + c_{2} z^{2} - c_{2}}{c_{1}^{2} + 2 c_{2}^{2}} (c_{1} + 2 c_{2} z) z}\rangle - 1 \\ &= \frac{2 c_{2}^{2}}{c_{1}^{2} + 2 c_{2}^{2}} \end{aligned} \end{cases} \\ \phi(h)&= c_0 + c_1 h + c_2 h^2 + c_3 h^3,\, (c_0, c_1, c_2, c_3) \in \mathbb R^4 \\ &\implies \begin{cases} h_i^\star=0 &\implies \begin{aligned}[t] \lambda_\nu &= \langle{\frac{c_{1} z + c_{2} z^{2} - c_{2} + c_{3} z^{3}}{c_{1}^{2} + 6 c_{1} c_{3} + 2 c_{2}^{2} + 15 c_{3}^{2}} c_{1} z}\rangle - 1 \\ &= \frac{- 3 c_{1} c_{3} - 2 c_{2}^{2} - 15 c_{3}^{2}}{c_{1}^{2} + 6 c_{1} c_{3} + 2 c_{2}^{2} + 15 c_{3}^{2}} \end{aligned} \\ h_i^\star=F_{\nu i} &\implies \begin{aligned}[t] \lambda_\nu &= \langle{\frac{c_{1} z + c_{2} z^{2} - c_{2} + c_{3} z^{3}}{c_{1}^{2} + 6 c_{1} c_{3} + 2 c_{2}^{2} + 15 c_{3}^{2}} (c_{1} + 2 c_{2} z + 3 c_{3} z^{2}) z}\rangle - 1 \\ &= \frac{2 \cdot \left(3 c_{1} c_{3} + c_{2}^{2} + 15 c_{3}^{2}\right)}{c_{1}^{2} + 6 c_{1} c_{3} + 2 c_{2}^{2} + 15 c_{3}^{2}} \end{aligned} \end{cases} \end{aligned} \]
What a mess! We summarize the results from the Taylor expansion:
- When \(\phi\) is linear, all the eigenvalues are zero, so the first-order stability analysis is inconclusive. Simulations show that when \(\phi\) is linear, convergence to the fixed points is very slow: at finite \(N\) the spectrum is still negative, but the eigenvalues \(\lambda_\nu\) are only weakly negative and tend to zero as \(N \to \infty\).
- When \(\phi\) is quadratic, the zero fixed point is stable, and the pattern fixed point is unstable, regardless of the sign of the coefficients.
- When \(\phi\) is cubic, the sign of the eigenvalues depends primarily on \(c_3\) (a negative \(c_1\) would break the assumption that \(\phi\) is monotonically increasing). For the pattern fixed points to be stable, we need \(c_3 < 0\) and \(c_1\) large enough that \(3 c_1 c_3 + 15 c_3^2 < 0\), i.e. \(c_1 > 5 \lvert c_3 \rvert\). The intuition is that a steeper slope \(c_1\) (a faster growth of the firing rate when \(h\) becomes positive) makes the bump of \(\partial \phi\) “more concentrated”, which helps lower the eigenvalues associated with the pattern fixed points. The second-order term \(c_2\) only adds instability to the pattern fixed points (and stability to the zero fixed point), so it is best set to \(c_2=0\) if we want the network to be stable at the pattern fixed points.
The study of the Taylor expansion therefore instructs us that stability is improved when \(\phi\) grows rapidly around \(h=0\), and that even powers (components of \(\phi\) symmetric about the y-axis) deteriorate the stability of the pattern fixed points. This helps us understand why the logistic function \(\phi(h)=\frac{1}{1+\mathrm{e}^{-h}}=\tfrac 12 + \tfrac h4 - \tfrac{h^3}{48} + \mathcal{O}(h^5)\) has good convergence properties: \(c_2\) is zero, \(c_3\) is negative, and \(c_1\) is large enough (indeed \(c_1 = \tfrac 14 > 5 \lvert c_3 \rvert = \tfrac{5}{48}\)).
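The symbolic results above can be reproduced, e.g. with sympy; a sketch for the quadratic case (the cubic case follows by extending `poly`; symbol names are ours):

```python
import sympy as sp

z = sp.symbols('z', real=True)
c0, c1, c2 = sp.symbols('c0 c1 c2', real=True)
rho = sp.exp(-z**2 / 2) / sp.sqrt(2 * sp.pi)   # standard normal density
E = lambda expr: sp.integrate(expr * rho, (z, -sp.oo, sp.oo))

poly = c0 + c1 * z + c2 * z**2                 # phi(h) = c0 + c1 h + c2 h^2
mean = E(poly)
var = sp.expand(E(poly**2) - mean**2)          # = c1**2 + 2*c2**2
phit = (poly - mean) / var                     # tilde phi
dpoly = sp.diff(poly, z)                       # phi'

lam_zero = sp.simplify(E(phit * dpoly.subs(z, 0) * z) - 1)
lam_patt = sp.simplify(E(phit * dpoly * z) - 1)
print(lam_zero)   # expect -2*c2**2/(c1**2 + 2*c2**2)
print(lam_patt)   # expect  2*c2**2/(c1**2 + 2*c2**2)
```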
We finish the study of eigenvalues by noting that in the \(N \to \infty\) limit, the remaining \(N-p\) eigenvalues of \(J^\prime\) are zero. The corresponding eigenvectors span the orthogonal complement of the vectors \(\boldsymbol{G^\prime_\nu} = (G^\prime_{\nu 1}, \cdots, G^\prime_{\nu N})\), formally \(\boldsymbol{v} \in \mathrm{span}(\boldsymbol{G^\prime_1}, \cdots,\boldsymbol{G^\prime_p})^\perp\).
In conclusion, the spectrum of \(K\) is composed of \(p\) eigenvalues \(\lambda_\nu = \langle{g_\nu(z_\nu) f_\nu(z_\nu) \partial \phi(h^\star)}\rangle-1\), and \(N-p\) eigenvalues \(\lambda_\perp = -1\). For the network considered in this work, the interpretation is that the components of the field aligned with the patterns \(z_\nu\) converge to the fixed point, while the orthogonal components decay as \(t \to \infty\). The fixed point \(h_i^\star=0\) is therefore unstable, and the pattern fixed points \(h_i^\star = z_{\nu i}\) are stable. We finally note that since the normal distribution is centred and symmetric, and \(\tilde \phi\) is odd for the logistic \(\phi\) (\(\tilde \phi(-z) = -\tilde \phi(z)\)), the sign-flipped patterns \(h_i^\star = -z_{\nu i}\) are also stable fixed points.
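As a final numerical check of this spectral picture, consider \(h^\star = 0\): at finite \(N\), \(J^\prime\) has rank \(p\), so \(K\) has exactly \(N-p\) eigenvalues equal to \(-1\), plus \(p\) outliers near \(\lambda_\nu \approx 0.19\). A sketch assuming the helpers from the earlier snippets:

```python
# Spectrum of K at h* = 0: p outliers near +0.19, bulk exactly at -1.
N, p = 2_000, 3
F, G = build_low_rank(N=N, p=p, rng=rng)
J = F.T @ G / N
K = -np.eye(N) + J * dphi(np.zeros(N))[None, :]   # dphi(0) = 1/4

eigvals = np.real(np.linalg.eigvals(K))
print(np.sort(eigvals)[-p:])   # the p outliers, each ~ 0.19
print(np.median(eigvals))      # the bulk, at -1
```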