<?xml version="1.0" encoding="utf-8"?>
<search>
<entry>
<title>NMF</title>
<link href="/2022/11/07/nmf/"/>
<url>/2022/11/07/nmf/</url>
<content type="html"><![CDATA[<p>The interpretation of $W$ is that each column is a basis element.</p><p>The interpretation of $H$ is that each column gives the coordinates of a data point in that basis.</p><script type="math/tex; mode=display">V = WH\\ \text{Frobenius norm: } ||V-WH||^2_F = \sum_{i,j}(V -WH)^2_{ij} \\ W\ge0,H\ge0</script><p>$W$ is called the features matrix and $H$ the coefficients matrix.</p><h3 id="Clustering-property"><a href="#Clustering-property" class="headerlink" title="Clustering property"></a>Clustering property</h3><p>NMF automatically clusters the columns of the input data $V$. If we impose an orthogonality constraint on the $H$ matrix, i.e. $HH^T=I$, then the above minimization is mathematically equivalent to the minimization of K-means clustering. Furthermore, the computed $H$ gives the cluster membership: if $H_{kj}>H_{ij}$ for all $i \not= k$, the input data point $v_j$ belongs to the $k$-th cluster. </p><h3 id="Convex-non-negative-matrix-factorization"><a href="#Convex-non-negative-matrix-factorization" class="headerlink" title="Convex non-negative matrix factorization"></a>Convex non-negative matrix factorization</h3><p>In standard NMF, the matrix factor $W \in R_{+}^{m \times k}$ is unconstrained, i.e., $W$ can be anything in that space. Convex NMF restricts the columns of $W$ to convex combinations of the input data vectors. This greatly improves the quality of the data representation of $W$. 
Furthermore, the resulting matrix factor H becomes more sparse and orthogonal.</p><h3 id="Cost-functions-and-regularizations"><a href="#Cost-functions-and-regularizations" class="headerlink" title="Cost functions and regularizations"></a>Cost functions and regularizations</h3><p>Minimize the function $F(W, H)=||V - WH||^2_F$.</p><h3 id="Online-NMF"><a href="#Online-NMF" class="headerlink" title="Online NMF"></a>Online NMF</h3><p>A motivating example is collaborative filtering in recommendation systems: there may be many users and many items to recommend, and it would be inefficient to recalculate everything when a single user or item is added to the system.</p><h2 id="Algorithms"><a href="#Algorithms" class="headerlink" title="Algorithms"></a>Algorithms</h2><ol><li>Initialize: W and H non-negative</li><li>Update until W and H are stable</li></ol><script type="math/tex; mode=display">H_{[i,j]}^{n+1} \gets H^n_{[i,j]} \frac{((W^n)^TV)_{[i,j]}}{((W^n)^TW^nH^n)_{[i,j]}} \\W^{n+1}_{[i,j]} \gets W^n_{[i,j]} \frac{(V(H^{n+1})^T)_{[i,j]}}{(W^nH^{n+1}(H^{n+1})^T)_{[i,j]}}</script><p>The two multiplicative factors for W and H are matrices of ones when $V = WH$.</p><p><strong>Alternative approach:</strong></p><ol><li>H is fixed and W is found by a non-negative least squares solver; then W is fixed and H is found analogously.</li></ol><h3 id="Sequential-NMF"><a href="#Sequential-NMF" class="headerlink" title="Sequential NMF"></a>Sequential NMF</h3><p>The contributions of the NMF components are ranked empirically when they are constructed one by one (sequentially), i.e., the (n+1)-th component is learned with the first n components already constructed.</p><h3 id="Exact-NMF"><a href="#Exact-NMF" class="headerlink" title="Exact NMF"></a>Exact NMF</h3><p>Exact solutions for the variants of NMF can be expected (in polynomial time) when additional constraints hold for matrix V. 
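As a minimal sketch of the multiplicative updates above (my own NumPy illustration, not from the post; the function name, random initialization, and fixed iteration count are assumptions):

```python
import numpy as np

def nmf_multiplicative(V, k, n_iter=200, eps=1e-10):
    """Factor non-negative V (m x n) into W (m x k) and H (k x n)
    via multiplicative updates; eps guards against division by zero."""
    m, n = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # update H with W fixed
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # update W with H fixed
    return W, H
```

Because every factor in the update is non-negative, W and H stay non-negative automatically; in practice one would stop when the Frobenius error plateaus rather than after a fixed number of iterations.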
A polynomial-time algorithm for solving nonnegative rank factorization when V contains a monomial submatrix of rank equal to its rank was given by Campbell and Poole in 1981.</p><h2 id="Application-in-Bioinformatics"><a href="#Application-in-Bioinformatics" class="headerlink" title="Application in Bioinformatics"></a>Application in Bioinformatics</h2><p>NMF has been successfully applied in bioinformatics for clustering gene expression and DNA methylation data and finding the genes most representative of the clusters. In the analysis of cancer mutations it has been used to identify common patterns of mutations that occur in many cancers and that probably have distinct causes. NMF techniques can identify sources of variation such as cell types, disease subtypes, population stratification, tissue composition, and tumor clonality.</p><h2 id="Implement-in-cellchat-selectK"><a href="#Implement-in-cellchat-selectK" class="headerlink" title="Implement in cellchat selectK"></a>Implement in cellchat selectK</h2><h3 id="Cophenetic"><a href="#Cophenetic" class="headerlink" title="Cophenetic"></a>Cophenetic</h3><p>In the clustering of biological information such as data from microarray experiments, the <strong>cophenetic similarity</strong> or <strong>cophenetic distance</strong> of two objects is a measure of how similar those two objects must be in order to be grouped into the same cluster. The cophenetic distance between two objects is the height of the dendrogram where the two branches that include the two objects merge into a single branch. Outside the context of a dendrogram, it is the distance between the largest two clusters that contain the two objects individually when they are merged into a single cluster containing both.</p><p>Cophenetic correlation coefficient:</p><ul><li>$x(i,j)=|X_i-X_j|$, the Euclidean distance between the ith and jth observations</li><li>$t(i,j)$, the dendrogrammatic distance between the model points $T_i$ and $T_j$. 
This distance is the height of the node at which these two points are first joined together.</li></ul><script type="math/tex; mode=display">c= \frac{\sum_{i<j}[x(i,j)-\bar{x}][t(i,j)-\bar{t}]}{\sqrt{\sum_{i<j}[x(i,j)-\bar{x}]^2\sum_{i<j}[t(i,j)-\bar{t}]^2}}</script>]]></content>
<categories>
<category> algorithms </category>
</categories>
<tags>
<tag> algorithms </tag>
</tags>
</entry>
<entry>
<title>scBERT</title>
<link href="/2022/09/30/scbert/"/>
<url>/2022/09/30/scbert/</url>
<content type="html"><![CDATA[<h1 id="Background"><a href="#Background" class="headerlink" title="Background"></a>Background</h1><p>BERT is used in natural language processing (NLP) to model correlations across context and to translate from one language to another. Here, researchers at Tencent developed a method that applies BERT to single-cell RNA-seq datasets in order to annotate the cell types of new single-cell data. The BERT model is known to fall short at recognizing the logical structure of text, and improving its ability to perform causal inference and other advanced learning tasks is a major objective for researchers working on transformers in general.<br>Using one single, uniform deep learning framework to extract all the specific features from these different omics datasets has become a more and more realistic goal for deep learning researchers. Even though the design of scBERT is not perfect and carries risks in its neural network input embedding, more and more works in this field show its promising ability to ultimately accomplish the task.</p><h1 id="Network-Structure"><a href="#Network-Structure" class="headerlink" title="Network Structure"></a>Network Structure</h1><p><img src="/2022/09/30/scbert/image-20221002153638892.png" alt="Self-supervised pre-training and unlabeled scRNA-seq data embedding"></p><p><img src="/2022/09/30/scbert/image-20221002153904173.png" alt="Illustrating the embeddings of scBERT"></p><h1 id="Processing"><a href="#Processing" class="headerlink" title="Processing"></a>Processing</h1><p>The pipeline consists of two stages: self-supervised pre-training and supervised fine-tuning. 
</p><h3 id="self-supervised-pre-training"><a href="#self-supervised-pre-training" class="headerlink" title="self-supervised pre-training"></a>self-supervised pre-training</h3><p>In the self-supervised pre-training stage, the model is trained on the collected unlabeled single-cell RNA-seq datasets to extract features of genes and their expression profiles. The process can be described as follows: first, randomly mask part of the gene expression profile, then compute the expression embedding plus the gene embedding. The gene embedding is adapted from Gene2vec, so that co-expressed genes receive similar embedding vectors. The sum of the expression embedding and the gene embedding is then fed into Performer layers, which apply self-attention to extract features; this is the Performer encoding operation.</p><h3 id="supervised-fine-tuning"><a href="#supervised-fine-tuning" class="headerlink" title="supervised fine-tuning"></a>supervised fine-tuning</h3><p>This stage uses labeled scRNA-seq datasets to train the classifier that annotates the cell types. In detail, the data is encoded through the same steps as in the pre-training stage: the expression profile and gene vectors are embedded and then processed by the Performer encoder trained in the pre-training stage. The decoder reconstructs the expression profile, and fully connected layers classify the cell type. </p><h2 id="Embedding-in-practice"><a href="#Embedding-in-practice" class="headerlink" title="Embedding in practice"></a>Embedding in practice</h2><p>The gene embedding E_G1 (the gene identity from gene2vec falling into the first bin) and the expression embedding E_B2 (the gene expression falling into the second bin, transformed to the same dimension as E_G1) are summed and fed into scBERT to generate representations for genes.</p><p>The binned expression profile of a single cell can be obtained by binning its scRNA-seq expression profile. 
</p><pre class="line-numbers language-python" data-language="python"><code class="language-python">import numpy as np

def binning_expression(data, bins=200):
    data = np.log2(data + 1)
    data = np.digitize(data, np.linspace(0, 15, bins))
    return data</code></pre><p>Each single-cell RNA-seq dataset is then processed by the Performer encoder, a transformer variant adapted to single-cell RNA-seq data.</p><h1 id="Performer"><a href="#Performer" class="headerlink" title="Performer"></a>Performer</h1><h3 id="Dot-Product-attention"><a href="#Dot-Product-attention" class="headerlink" title="Dot-Product attention"></a>Dot-Product attention</h3><script type="math/tex; mode=display">\begin{equation}Attention(\boldsymbol{Q},\boldsymbol{K},\boldsymbol{V}) = softmax\left(\frac{\boldsymbol{Q}\boldsymbol{K}^{\top}}{\sqrt{d_k}}\right)\boldsymbol{V}\end{equation}</script><p>scBERT is based on the Performer 
architecture proposed by Google in 2020, an advance on the transformer attention mechanism. For raw dot-product attention over a sequence of length $L$ with hidden size $d$, the cost of computing attention is $O(L^2 d)$, much higher than the $O(L d^2)$ of a convolution or position-wise layer. Many transformer variants have therefore been proposed to lower the complexity to $O(L\log L)$ or even $O(L)$. <strong>Performer</strong> is one of them, and it comes with much stronger mathematical guarantees. </p><h2 id="Linear-Attention"><a href="#Linear-Attention" class="headerlink" title="Linear Attention"></a>Linear Attention</h2><p><a href="https://spaces.ac.cn/archives/7546">Linear attention</a></p><p>Most of the time, $ Q \in \R^{n \times d_k}, K \in \R^{m \times d_k}, V \in \R^{m \times d_v} $ with $ n > d $ or $ n \gg d $. The softmax is the step that limits the speed of attention, so attention models that remove the softmax are called linear attention and have $ O(n) $ complexity. </p><p>The query is the raw data, the key is the type of feature of the data, and the value is the value of that feature type in this setting. </p><p>The definition of scaled dot-product attention:</p><script type="math/tex; mode=display">Attention(\boldsymbol{Q},\boldsymbol{K},\boldsymbol{V})_i = \frac{\sum\limits_{j=1}^n e^{\boldsymbol{q}_i^{\top}\boldsymbol{k}_j}\boldsymbol{v}_j}{\sum\limits_{j=1}^n e^{\boldsymbol{q}_i^{\top}\boldsymbol{k}_j}}</script><p>A more general definition:</p><script type="math/tex; mode=display">Attention(\boldsymbol{Q},\boldsymbol{K},\boldsymbol{V})_i = \frac{\sum\limits_{j=1}^n \text{sim}(\boldsymbol{q}_i, \boldsymbol{k}_j)\boldsymbol{v}_j}{\sum\limits_{j=1}^n \text{sim}(\boldsymbol{q}_i, \boldsymbol{k}_j)}</script><p>$ \text{sim}(\cdot,\cdot) \ge 0$ is a more general function used to compute the similarity between the query and the key. 
This form is often called a non-local neural network.</p><h3 id="Kernel-function"><a href="#Kernel-function" class="headerlink" title="Kernel function"></a>Kernel function</h3><script type="math/tex; mode=display">\text{sim}(q_i, k_j) = \phi(q_i)^{\top} \varphi(k_j)</script><p><em>Transformers are RNNs: Fast autoregressive Transformers with Linear Attention</em> uses $ \phi(x) = \varphi(x) = \text{elu}(x) + 1 $.</p><h3 id="Fast-attention-Via-positive-orthogonal-Random-Features"><a href="#Fast-attention-Via-positive-orthogonal-Random-Features" class="headerlink" title="Fast attention Via positive orthogonal Random Features"></a>Fast attention Via positive orthogonal Random Features</h3><p>Kernel function: a Gaussian kernel is used to model the softmax kernel</p><p>Random feature map: generates the Gaussian kernel</p><p>Positive random features (PRF): ensure the softmax-kernel estimate is positive and an unbiased approximation</p><p>Orthogonal random features: reduce the number of features needed for PRF while preserving positivity</p><p><img src="/2022/09/30/scbert/image-20221010090319861.png" alt="Kernel function"></p><h3 id="Random-Features"><a href="#Random-Features" class="headerlink" title="Random Features"></a>Random Features</h3><p>First, consider the $\text{sim}(q,k)$ function; we can rewrite it as:</p><script type="math/tex; mode=display">\begin{equation}\text{sim}(\boldsymbol{q}, \boldsymbol{k}) = \frac{\beta(\boldsymbol{q})\gamma(\boldsymbol{k})\text{sim}(\boldsymbol{q}, \boldsymbol{k})}{\beta(\boldsymbol{q})\gamma(\boldsymbol{k})}\end{equation}</script><p>Here $ \beta(\boldsymbol{x})=\gamma(\boldsymbol{x})=e^{-\lambda\Vert x\Vert^2} $ ensures the Fourier transform is well defined and its result is meaningful. From the linear-attention viewpoint, the task we need to solve is to find a <strong>non-negative</strong> kernel function that approximates the softmax kernel. 
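To make the reordering concrete, here is a minimal NumPy sketch (my own, not the scBERT code) of linear attention with the non-negative feature map $\phi(x)=\text{elu}(x)+1$; computing $\phi(K)^{\top}V$ first brings the cost down from $O(n^2 d)$ to $O(n d^2)$:

```python
import numpy as np

def elu(x):
    # exponential linear unit, so elu(x) + 1 > 0 everywhere
    return np.where(x > 0, x, np.exp(x) - 1)

def linear_attention(Q, K, V):
    """Kernelized attention with sim(q, k) = phi(q) . phi(k), phi = elu + 1.
    phi(K).T @ V is computed once, avoiding the n x m attention matrix."""
    phi_q = elu(Q) + 1                 # (n, d) non-negative features
    phi_k = elu(K) + 1                 # (m, d)
    kv = phi_k.T @ V                   # (d, d_v) key/value summary
    z = phi_q @ phi_k.sum(axis=0)      # (n,) normalizer sum_j sim(q_i, k_j)
    return (phi_q @ kv) / z[:, None]
```

Because $\phi$ is strictly positive, the attention weights are positive and sum to one, so each output row is a convex combination of the rows of V.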
For the numerator $ \beta(\boldsymbol{q})\gamma(\boldsymbol{k})\text{sim}(\boldsymbol{q}, \boldsymbol{k}) $ we can apply a Fourier transform: </p><script type="math/tex; mode=display">\begin{equation}\mathcal{F}(\boldsymbol{\omega}_q, \boldsymbol{\omega}_k)=\frac{1}{(2\pi)^{d/2}}\int \beta(\boldsymbol{q})\gamma(\boldsymbol{k})\text{sim}(\boldsymbol{q}, \boldsymbol{k})e^{-i\boldsymbol{\omega}_q\cdot \boldsymbol{q}-i\boldsymbol{\omega}_k\cdot \boldsymbol{k}}d\boldsymbol{q}d\boldsymbol{k}\end{equation}</script><p>and invert it to recover $\text{sim}(\boldsymbol{q}, \boldsymbol{k})$: </p><script type="math/tex; mode=display">\begin{equation}\text{sim}(\boldsymbol{q}, \boldsymbol{k})=\frac{1}{(2\pi)^{d/2}}\int \mathcal{F}(\boldsymbol{\omega}_q, \boldsymbol{\omega}_k)\frac{e^{i\boldsymbol{\omega}_q\cdot \boldsymbol{q}}}{\beta(\boldsymbol{q})} \frac{e^{i\boldsymbol{\omega}_k\cdot \boldsymbol{k}}}{\gamma(\boldsymbol{k})}d\boldsymbol{\omega}_q d\boldsymbol{\omega}_k\end{equation}</script><p>Alternatively, noting that we just need a kernel, we can write:</p><script type="math/tex; mode=display">\begin{equation}e^{\boldsymbol{q}\cdot \boldsymbol{k}} = e^{\Vert \boldsymbol{q}\Vert^2 / 2 + \Vert \boldsymbol{k}\Vert^2 / 2 - \Vert\boldsymbol{q}-\boldsymbol{k}\Vert^2 / 2}\end{equation}</script><p>Notice that $\exp(-\Vert\boldsymbol{q}-\boldsymbol{k}\Vert^2/2)$ is the Gaussian kernel, and for the Gaussian kernel there are many methods to ensure a non-negative output. 
Here we apply the Fourier transform to the Gaussian kernel:</p><script type="math/tex; mode=display">\begin{equation}e^{\boldsymbol{q}\cdot \boldsymbol{k}}=\frac{e^{\Vert \boldsymbol{q}\Vert^2 / 2 + \Vert \boldsymbol{k}\Vert^2 / 2}}{(2\pi)^{d/2}}\int e^{-\Vert\boldsymbol{\omega}\Vert^2 / 2 + i \boldsymbol{\omega}\cdot (\boldsymbol{q} - \boldsymbol{k})} d\boldsymbol{\omega}\end{equation}</script><p>Letting $ q \to iq, k \to -ik $ removes the imaginary part of this equation; the next step is to evaluate the integral, which must be done through sampling because it is not easy to compute directly.</p><script type="math/tex; mode=display">\begin{equation}e^{\boldsymbol{q}\cdot \boldsymbol{k}}=\frac{e^{-\Vert \boldsymbol{q}\Vert^2 / 2 - \Vert \boldsymbol{k}\Vert^2 / 2}}{(2\pi)^{d/2}}\int e^{-\Vert\boldsymbol{\omega}\Vert^2 / 2 + \boldsymbol{\omega}\cdot (\boldsymbol{q} + \boldsymbol{k})} d\boldsymbol{\omega}\end{equation}</script><p>The right-hand side means that if we repeatedly sample $\boldsymbol{\omega}$ from a $d$-dimensional standard normal distribution, the expectation of $ e^{\boldsymbol{\omega}\cdot \boldsymbol{q}-\Vert \boldsymbol{q}\Vert^2 / 2} \times e^{\boldsymbol{\omega}\cdot \boldsymbol{k}-\Vert \boldsymbol{k}\Vert^2 / 2}$ equals $e^{\boldsymbol{q}\cdot\boldsymbol{k}}$. However, since we cannot sample $\boldsymbol{\omega}$ infinitely many times, the resulting estimate is only an approximation. 
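This sampling estimator is easy to check numerically. The following sketch is my own (the vector scale 0.3 and the sample count $m=20000$ are arbitrary choices); it compares the random-feature estimate with the exact value of $e^{\boldsymbol{q}\cdot\boldsymbol{k}}$:

```python
import numpy as np

def softmax_kernel_features(x, omega):
    """Positive random features: phi(x)_i = exp(omega_i . x - |x|^2 / 2) / sqrt(m),
    so that phi(q) . phi(k) is a Monte Carlo estimate of exp(q . k)."""
    m = omega.shape[0]
    return np.exp(omega @ x - x @ x / 2) / np.sqrt(m)

rng = np.random.default_rng(0)
d, m = 8, 20000
q = 0.3 * rng.normal(size=d)
k = 0.3 * rng.normal(size=d)
omega = rng.normal(size=(m, d))  # omega ~ N(0, I_d), sampled m times

exact = np.exp(q @ k)
approx = softmax_kernel_features(q, omega) @ softmax_kernel_features(k, omega)
```

Every feature here is a positive exponential, which is the PRF property: the estimate can never go negative, unlike trigonometric random features for generic Gaussian kernels.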
But in practice, the m larger than 1000 seems have a good performance.</p><script type="math/tex; mode=display">\begin{equation}\begin{aligned} e^{\boldsymbol{q}\cdot \boldsymbol{k}}&=\mathbb{E}_{\boldsymbol{\omega}\sim \mathcal{N}(\boldsymbol{\omega};0,\boldsymbol{1}_d)}\left[e^{\boldsymbol{\omega}\cdot \boldsymbol{q}-\Vert \boldsymbol{q}\Vert^2 / 2} \times e^{\boldsymbol{\omega}\cdot \boldsymbol{k}-\Vert \boldsymbol{k}\Vert^2 / 2}\right]\\[6pt] &\approx\underbrace{\frac{1}{\sqrt{m}}\begin{pmatrix}e^{\boldsymbol{\omega}_1\cdot \boldsymbol{q}-\Vert \boldsymbol{q}\Vert^2 / 2} \\ e^{\boldsymbol{\omega}_2\cdot \boldsymbol{q}-\Vert \boldsymbol{q}\Vert^2 / 2}\\ \vdots\\ e^{\boldsymbol{\omega}_m\cdot \boldsymbol{q}-\Vert \boldsymbol{q}\Vert^2 / 2} \end{pmatrix}}_{\tilde{\boldsymbol{q}}} \cdot \underbrace{\frac{1}{\sqrt{m}}\begin{pmatrix}e^{\boldsymbol{\omega}_1\cdot \boldsymbol{k}-\Vert \boldsymbol{k}\Vert^2 / 2} \\ e^{\boldsymbol{\omega}_2\cdot \boldsymbol{k}-\Vert \boldsymbol{k}\Vert^2 / 2}\\ \vdots\\ e^{\boldsymbol{\omega}_m\cdot \boldsymbol{k}-\Vert \boldsymbol{k}\Vert^2 / 2} \end{pmatrix}}_{\tilde{\boldsymbol{k}}} \end{aligned}\label{eq:core}\end{equation}</script><h3 id="Prefix-sums"><a href="#Prefix-sums" class="headerlink" title="Prefix sums"></a>Prefix sums</h3><p>unidirectional attention: storage the computational total sum of attention rather than lower triangle matrix.</p><p><img src="/2022/09/30/scbert/image-20221010090500379.png" alt="prefix-sum mechnism"></p><h3 id="Fast-attention-code"><a href="#Fast-attention-code" class="headerlink" title="Fast attention code"></a>Fast attention code</h3><pre class="line-numbers language-python" data-language="python"><code class="language-python"><span class="token comment"># linear attention classes with softmax kernel</span><span class="token comment"># non-causal linear attention</span><span class="token keyword">def</span> <span class="token function">linear_attention</span><span class="token 
punctuation">(</span>q<span class="token punctuation">,</span> k<span class="token punctuation">,</span> v<span class="token punctuation">)</span><span class="token punctuation">:</span> <span class="token comment"># do softmax for k firstly in the second last dimension</span> k_cumsum <span class="token operator">=</span> k<span class="token punctuation">.</span><span class="token builtin">sum</span><span class="token punctuation">(</span>dim<span class="token operator">=</span><span class="token operator">-</span><span class="token number">2</span><span class="token punctuation">)</span> <span class="token comment"># do scale for dot product of q, k</span> D_inv <span class="token operator">=</span> <span class="token number">1.</span><span class="token operator">/</span>torch<span class="token punctuation">.</span>einsum<span class="token punctuation">(</span><span class="token string">'...nd,...d->...n'</span><span class="token punctuation">,</span> q<span class="token punctuation">,</span> k_cumsum<span class="token punctuation">.</span>type_as<span class="token punctuation">(</span>q<span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token comment"># do k dot product v</span> context <span class="token operator">=</span> torch<span class="token punctuation">.</span>einsum<span class="token punctuation">(</span><span class="token string">'...nd,...ne->...de'</span><span class="token punctuation">,</span> k<span class="token punctuation">,</span> v<span class="token punctuation">)</span> <span class="token comment"># do the attention</span> out <span class="token operator">=</span> torch<span class="token punctuation">.</span>einsum<span class="token punctuation">(</span><span class="token string">'...de,...nd,...n->...ne'</span><span class="token punctuation">,</span> context<span class="token punctuation">,</span> q<span class="token punctuation">,</span> D_inv<span class="token punctuation">)</span> <span 
class="token keyword">return</span> out<span class="token comment"># efficient causal linear attention, created by EPFL</span><span class="token keyword">def</span> <span class="token function">causal_linear_attention</span><span class="token punctuation">(</span>q<span class="token punctuation">,</span> k<span class="token punctuation">,</span> v<span class="token punctuation">,</span> eps<span class="token operator">=</span><span class="token number">1e-6</span><span class="token punctuation">)</span><span class="token punctuation">:</span> <span class="token keyword">from</span> fast_transformers<span class="token punctuation">.</span>causal_product <span class="token keyword">import</span> CausalDotProduct autocast_enabled <span class="token operator">=</span> torch<span class="token punctuation">.</span>is_autocast_enabled<span class="token punctuation">(</span><span class="token punctuation">)</span> is_half <span class="token operator">=</span> <span class="token builtin">isinstance</span><span class="token punctuation">(</span>q<span class="token punctuation">,</span> torch<span class="token punctuation">.</span>cuda<span class="token punctuation">.</span>HalfTensor<span class="token punctuation">)</span> <span class="token keyword">assert</span> <span class="token keyword">not</span> is_half <span class="token keyword">or</span> APEX_AVAILABLE<span class="token punctuation">,</span> <span class="token string">'half tensors can only be used if nvidia apex is available'</span> cuda_context <span class="token operator">=</span> null_context <span class="token keyword">if</span> <span class="token keyword">not</span> autocast_enabled <span class="token keyword">else</span> partial<span class="token punctuation">(</span>autocast<span class="token punctuation">,</span> enabled <span class="token operator">=</span> <span class="token boolean">False</span><span class="token punctuation">)</span> causal_dot_product_fn <span class="token operator">=</span> amp<span
class="token punctuation">.</span>float_function<span class="token punctuation">(</span>CausalDotProduct<span class="token punctuation">.</span><span class="token builtin">apply</span><span class="token punctuation">)</span> <span class="token keyword">if</span> is_half <span class="token keyword">else</span> CausalDotProduct<span class="token punctuation">.</span><span class="token builtin">apply</span> k_cumsum <span class="token operator">=</span> k<span class="token punctuation">.</span>cumsum<span class="token punctuation">(</span>dim<span class="token operator">=</span><span class="token operator">-</span><span class="token number">2</span><span class="token punctuation">)</span> <span class="token operator">+</span> eps D_inv <span class="token operator">=</span> <span class="token number">1.</span> <span class="token operator">/</span> torch<span class="token punctuation">.</span>einsum<span class="token punctuation">(</span><span class="token string">'...nd,...nd->...n'</span><span class="token punctuation">,</span> q<span class="token punctuation">,</span> k_cumsum<span class="token punctuation">.</span>type_as<span class="token punctuation">(</span>q<span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token keyword">with</span> cuda_context<span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">:</span> <span class="token keyword">if</span> autocast_enabled<span class="token punctuation">:</span> q<span class="token punctuation">,</span> k<span class="token punctuation">,</span> v <span class="token operator">=</span> <span class="token builtin">map</span><span class="token punctuation">(</span><span class="token keyword">lambda</span> t<span class="token punctuation">:</span> t<span class="token punctuation">.</span><span class="token builtin">float</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token 
punctuation">,</span> <span class="token punctuation">(</span>q<span class="token punctuation">,</span> k<span class="token punctuation">,</span> v<span class="token punctuation">)</span><span class="token punctuation">)</span> out <span class="token operator">=</span> causal_dot_product_fn<span class="token punctuation">(</span>q<span class="token punctuation">,</span> k<span class="token punctuation">,</span> v<span class="token punctuation">)</span> out <span class="token operator">=</span> torch<span class="token punctuation">.</span>einsum<span class="token punctuation">(</span><span class="token string">'...nd,...n->...nd'</span><span class="token punctuation">,</span> out<span class="token punctuation">,</span> D_inv<span class="token punctuation">)</span> <span class="token keyword">return</span> out<span class="token keyword">class</span> <span class="token class-name">FastAttention</span><span class="token punctuation">(</span>nn<span class="token punctuation">.</span>Module<span class="token punctuation">)</span><span class="token punctuation">:</span> <span class="token keyword">def</span> <span class="token function">__init__</span><span class="token punctuation">(</span>self<span class="token punctuation">,</span> dim_heads<span class="token punctuation">,</span> nb_features <span class="token operator">=</span> <span class="token boolean">None</span><span class="token punctuation">,</span> ortho_scaling <span class="token operator">=</span> <span class="token number">0</span><span class="token punctuation">,</span> causal <span class="token operator">=</span> <span class="token boolean">False</span><span class="token punctuation">,</span> generalized_attention <span class="token operator">=</span> <span class="token boolean">False</span><span class="token punctuation">,</span> kernel_fn <span class="token operator">=</span> nn<span class="token punctuation">.</span>ReLU<span class="token punctuation">(</span><span class="token 
punctuation">)</span><span class="token punctuation">,</span> no_projection <span class="token operator">=</span> <span class="token boolean">False</span><span class="token punctuation">)</span><span class="token punctuation">:</span> <span class="token builtin">super</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">.</span>__init__<span class="token punctuation">(</span><span class="token punctuation">)</span> nb_features <span class="token operator">=</span> default<span class="token punctuation">(</span>nb_features<span class="token punctuation">,</span> <span class="token builtin">int</span><span class="token punctuation">(</span>dim_heads <span class="token operator">*</span> math<span class="token punctuation">.</span>log<span class="token punctuation">(</span>dim_heads<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">)</span> self<span class="token punctuation">.</span>dim_heads <span class="token operator">=</span> dim_heads self<span class="token punctuation">.</span>nb_features <span class="token operator">=</span> nb_features self<span class="token punctuation">.</span>ortho_scaling <span class="token operator">=</span> ortho_scaling <span class="token comment"># gaussian_orthogonal_random_matrix is predefined in parameters, and use partial to reuse to definition and also fill the parameters</span> self<span class="token punctuation">.</span>create_projection <span class="token operator">=</span> partial<span class="token punctuation">(</span>gaussian_orthogonal_random_matrix<span class="token punctuation">,</span> nb_rows <span class="token operator">=</span> self<span class="token punctuation">.</span>nb_features<span class="token punctuation">,</span> nb_columns <span class="token operator">=</span> dim_heads<span class="token punctuation">,</span> scaling <span class="token operator">=</span> ortho_scaling<span 
class="token punctuation">)</span> projection_matrix <span class="token operator">=</span> self<span class="token punctuation">.</span>create_projection<span class="token punctuation">(</span><span class="token punctuation">)</span> self<span class="token punctuation">.</span>register_buffer<span class="token punctuation">(</span><span class="token string">'projection_matrix'</span><span class="token punctuation">,</span> projection_matrix<span class="token punctuation">)</span> self<span class="token punctuation">.</span>kernel_fn <span class="token operator">=</span> kernel_fn <span class="token comment"># if this is turned on, no projection will be used</span> <span class="token comment"># queries and keys will be softmax-ed as in the original efficient attention paper</span> self<span class="token punctuation">.</span>no_projection <span class="token operator">=</span> no_projection self<span class="token punctuation">.</span>causal <span class="token operator">=</span> causal <span class="token keyword">if</span> causal<span class="token punctuation">:</span> <span class="token keyword">try</span><span class="token punctuation">:</span> <span class="token keyword">import</span> fast_transformers<span class="token punctuation">.</span>causal_product<span class="token punctuation">.</span>causal_product_cuda self<span class="token punctuation">.</span>causal_linear_fn <span class="token operator">=</span> partial<span class="token punctuation">(</span>causal_linear_attention<span class="token punctuation">)</span> <span class="token keyword">except</span> ImportError<span class="token punctuation">:</span> <span class="token keyword">print</span><span class="token punctuation">(</span><span class="token string">'unable to import cuda code for auto-regressive Performer. 
will default to the memory inefficient non-cuda version'</span><span class="token punctuation">)</span> self<span class="token punctuation">.</span>causal_linear_fn <span class="token operator">=</span> causal_linear_attention_noncuda <span class="token decorator annotation punctuation">@torch<span class="token punctuation">.</span>no_grad</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">def</span> <span class="token function">redraw_projection_matrix</span><span class="token punctuation">(</span>self<span class="token punctuation">,</span> device<span class="token punctuation">)</span><span class="token punctuation">:</span> projections <span class="token operator">=</span> self<span class="token punctuation">.</span>create_projection<span class="token punctuation">(</span>device <span class="token operator">=</span> device<span class="token punctuation">)</span> self<span class="token punctuation">.</span>projection_matrix<span class="token punctuation">.</span>copy_<span class="token punctuation">(</span>projections<span class="token punctuation">)</span> <span class="token keyword">del</span> projections <span class="token keyword">def</span> <span class="token function">forward</span><span class="token punctuation">(</span>self<span class="token punctuation">,</span> q<span class="token punctuation">,</span> k<span class="token punctuation">,</span> v<span class="token punctuation">,</span> output_attentions <span class="token operator">=</span> <span class="token boolean">False</span><span class="token punctuation">)</span><span class="token punctuation">:</span> device <span class="token operator">=</span> q<span class="token punctuation">.</span>device <span class="token keyword">if</span> self<span class="token punctuation">.</span>no_projection<span class="token punctuation">:</span> q <span class="token operator">=</span> q<span class="token punctuation">.</span>softmax<span class="token 
punctuation">(</span>dim <span class="token operator">=</span> <span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">)</span> k <span class="token operator">=</span> torch<span class="token punctuation">.</span>exp<span class="token punctuation">(</span>k<span class="token punctuation">)</span> <span class="token keyword">if</span> self<span class="token punctuation">.</span>causal <span class="token keyword">else</span> k<span class="token punctuation">.</span>softmax<span class="token punctuation">(</span>dim <span class="token operator">=</span> <span class="token operator">-</span><span class="token number">2</span><span class="token punctuation">)</span> <span class="token keyword">elif</span> self<span class="token punctuation">.</span>generalized_attention<span class="token punctuation">:</span> create_kernel <span class="token operator">=</span> partial<span class="token punctuation">(</span>generalized_kernel<span class="token punctuation">,</span> kernel_fn <span class="token operator">=</span> self<span class="token punctuation">.</span>kernel_fn<span class="token punctuation">,</span> projection_matrix <span class="token operator">=</span> self<span class="token punctuation">.</span>projection_matrix<span class="token punctuation">,</span> device <span class="token operator">=</span> device<span class="token punctuation">)</span> <span class="token keyword">else</span><span class="token punctuation">:</span> create_kernel <span class="token operator">=</span> partial<span class="token punctuation">(</span>softmax_kernel<span class="token punctuation">,</span> projection_matrix <span class="token operator">=</span> self<span class="token punctuation">.</span>projection_matrix<span class="token punctuation">,</span> device <span class="token operator">=</span> device<span class="token punctuation">)</span> q <span class="token operator">=</span> create_kernel<span class="token punctuation">(</span>q<span 
class="token punctuation">,</span> is_query<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">)</span> k <span class="token operator">=</span> create_kernel<span class="token punctuation">(</span>k<span class="token punctuation">,</span> is_query<span class="token operator">=</span><span class="token boolean">False</span><span class="token punctuation">)</span> attn_fn <span class="token operator">=</span> linear_attention <span class="token keyword">if</span> <span class="token keyword">not</span> self<span class="token punctuation">.</span>causal <span class="token keyword">else</span> self<span class="token punctuation">.</span>causal_linear_fn out <span class="token operator">=</span> attn_fn<span class="token punctuation">(</span>q<span class="token punctuation">,</span> k<span class="token punctuation">,</span> v<span class="token punctuation">)</span> <span class="token comment"># </span> <span class="token keyword">if</span> output_attentions<span class="token punctuation">:</span> v_diag <span class="token operator">=</span> torch<span class="token punctuation">.</span>eye<span class="token punctuation">(</span>v<span class="token punctuation">.</span>shape<span class="token punctuation">[</span><span class="token operator">-</span><span class="token number">2</span><span class="token punctuation">]</span><span class="token punctuation">)</span><span class="token punctuation">.</span>to<span class="token punctuation">(</span>device<span class="token punctuation">)</span> v_diag <span class="token operator">=</span> v_diag<span class="token punctuation">.</span>unsqueeze<span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">)</span><span class="token punctuation">.</span>unsqueeze<span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">)</span><span class="token punctuation">.</span>repeat<span 
class="token punctuation">(</span>v<span class="token punctuation">.</span>shape<span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">,</span>v<span class="token punctuation">.</span>shape<span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">)</span> attn_weights <span class="token operator">=</span> torch<span class="token punctuation">.</span>zeros<span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> q<span class="token punctuation">.</span>shape<span class="token punctuation">[</span><span class="token number">2</span><span class="token punctuation">]</span><span class="token punctuation">,</span> q<span class="token punctuation">.</span>shape<span class="token punctuation">[</span><span class="token number">2</span><span class="token punctuation">]</span><span class="token punctuation">)</span><span class="token punctuation">.</span>to<span class="token punctuation">(</span>device<span class="token punctuation">)</span><span class="token punctuation">.</span>to<span class="token punctuation">(</span>torch<span class="token punctuation">.</span>float16<span class="token punctuation">)</span> <span class="token keyword">for</span> head_dim <span class="token keyword">in</span> <span class="token builtin">range</span><span class="token punctuation">(</span>q<span class="token punctuation">.</span>shape<span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">)</span><span class="token punctuation">:</span> 
attn_weights <span class="token operator">+=</span> torch<span class="token punctuation">.</span><span class="token builtin">abs</span><span class="token punctuation">(</span>attn_fn<span class="token punctuation">(</span>q<span class="token punctuation">[</span><span class="token punctuation">:</span><span class="token punctuation">,</span>head_dim<span class="token punctuation">]</span><span class="token punctuation">.</span>to<span class="token punctuation">(</span>torch<span class="token punctuation">.</span>float16<span class="token punctuation">)</span><span class="token punctuation">,</span> k<span class="token punctuation">[</span><span class="token punctuation">:</span><span class="token punctuation">,</span>head_dim<span class="token punctuation">]</span><span class="token punctuation">.</span>to<span class="token punctuation">(</span>torch<span class="token punctuation">.</span>float16<span class="token punctuation">)</span><span class="token punctuation">,</span> v_diag<span class="token punctuation">[</span><span class="token punctuation">:</span><span class="token punctuation">,</span>head_dim<span class="token punctuation">]</span><span class="token punctuation">.</span>to<span class="token punctuation">(</span>torch<span class="token punctuation">.</span>float16<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">)</span> attn_weights <span class="token operator">/=</span> q<span class="token punctuation">.</span>shape<span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">]</span> <span class="token keyword">return</span> out<span class="token punctuation">,</span> attn_weights <span class="token keyword">else</span><span class="token punctuation">:</span> <span class="token keyword">return</span> out<span aria-hidden="true" 
class="line-numbers-rows"><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span></span></code></pre><h1 id="Result"><a href="#Result" class="headerlink" title="Result"></a>Result</h1><p>The results of scBERT are quite good: most of the cell clusters are annotated correctly, and the UMAP plot shows high consistency with the ground truth.</p><p><img src="/2022/09/30/scbert/image-20221016165325204.png" alt="Result of auto-annotation"></p><p>Thanks to the blog sciencespace.cn; most of the mathematical derivations of linear attention and the ideas here come from that blog.</p>]]></content>
<categories>
<category> deep learning </category>
</categories>
<tags>
<tag> attention </tag>
<tag> cell type annotation </tag>
<tag> algorithm </tag>
</tags>
</entry>
<entry>
<title>Pytorch Distributed Data Parallel (DDP)</title>
<link href="/2022/09/19/ddp/"/>
<url>/2022/09/19/ddp/</url>
<content type="html"><![CDATA[<h1 id="Single-node-multi-GPU"><a href="#Single-node-multi-GPU" class="headerlink" title="Single node multi-GPU"></a>Single node multi-GPU</h1><pre class="line-numbers language-python" data-language="python"><code class="language-python"><span class="token keyword">import</span> torch<span class="token keyword">import</span> torch<span class="token punctuation">.</span>nn <span class="token keyword">as</span> nn<span class="token keyword">import</span> torch<span class="token punctuation">.</span>nn<span class="token punctuation">.</span>functional <span class="token keyword">as</span> F<span class="token keyword">import</span> torch<span class="token punctuation">.</span>multiprocessing <span class="token keyword">as</span> mp<span class="token keyword">from</span> torch<span class="token punctuation">.</span>utils<span class="token punctuation">.</span>data<span class="token punctuation">.</span>distributed <span class="token keyword">import</span> DistributedSampler<span class="token keyword">from</span> torch<span class="token punctuation">.</span>nn<span class="token punctuation">.</span>parallel <span class="token keyword">import</span> DistributedDataParallel <span class="token keyword">as</span> DDP<span class="token keyword">from</span> torch<span class="token punctuation">.</span>distributed <span class="token keyword">import</span> init_process_group<span class="token punctuation">,</span> destroy_process_group<span class="token keyword">import</span> os<span class="token keyword">def</span> <span class="token function">ddp_setup</span><span class="token punctuation">(</span>rank<span class="token punctuation">,</span> world_size<span class="token punctuation">)</span><span class="token punctuation">:</span> <span class="token triple-quoted-string string">""" Args: rank: Unique identifier of each process world_size: total number of processes """</span> os<span class="token punctuation">.</span>environ<span class="token 
[</span><span class=">
punctuation">[</span><span class="token string">"MASTER_ADDR"</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token string">"localhost"</span> os<span class="token punctuation">.</span>environ<span class="token punctuation">[</span><span class="token string">"MASTER_PORT"</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token string">"9999"</span> init_process_group<span class="token punctuation">(</span>backend<span class="token operator">=</span><span class="token string">"nccl"</span><span class="token punctuation">,</span> rank<span class="token operator">=</span>rank<span class="token punctuation">,</span> world_size<span class="token operator">=</span>world_size<span class="token punctuation">)</span><span aria-hidden="true" class="line-numbers-rows"><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span></span></code></pre><p>Constructing the DDP model</p><pre class="line-numbers language-python" data-language="python"><code class="language-python">self<span class="token punctuation">.</span>model <span class="token operator">=</span> DDP<span class="token punctuation">(</span>model<span class="token punctuation">,</span> device_ids<span class="token operator">=</span><span class="token punctuation">[</span>gpu_id<span class="token punctuation">]</span><span class="token punctuation">)</span><span aria-hidden="true" class="line-numbers-rows"><span></span></span></code></pre><p>Distributing input data</p><pre class="line-numbers language-python" data-language="python"><code class="language-python">train_data <span class="token operator">=</span> torch<span class="token punctuation">.</span>utils<span class="token punctuation">.</span>data<span class="token 
punctuation">.</span>DataLoader<span class="token punctuation">(</span>dataset <span class="token operator">=</span> train_dataset<span class="token punctuation">,</span> batch_size <span class="token operator">=</span> <span class="token number">32</span><span class="token punctuation">,</span> shuffle<span class="token operator">=</span><span class="token boolean">False</span><span class="token punctuation">,</span> sampler<span class="token operator">=</span>DistributedSampler<span class="token punctuation">(</span>train_dataset<span class="token punctuation">)</span><span class="token punctuation">,</span><span class="token punctuation">)</span><span aria-hidden="true" class="line-numbers-rows"><span></span><span></span><span></span><span></span><span></span><span></span></span></code></pre><ul><li>Calling the <code>set_epoch()</code> method on the <code>DistributedSampler</code> at the beginning of each epoch is necessary to make shuffling work properly across multiple epochs. Otherwise, the same ordering will be used in each epoch.</li></ul><pre class="line-numbers language-python" data-language="python"><code class="language-python"><span class="token keyword">def</span> <span class="token function">_run_epoch</span><span class="token punctuation">(</span>self<span class="token punctuation">,</span> epoch<span class="token punctuation">)</span><span class="token punctuation">:</span> batch_size <span class="token operator">=</span> <span class="token builtin">len</span><span class="token punctuation">(</span><span class="token builtin">next</span><span class="token punctuation">(</span><span class="token builtin">iter</span><span class="token punctuation">(</span>self<span class="token punctuation">.</span>train_data<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">)</span> self<span 
class="token punctuation">.</span>train_data<span class="token punctuation">.</span>sampler<span class="token punctuation">.</span>set_epoch<span class="token punctuation">(</span>epoch<span class="token punctuation">)</span> <span class="token keyword">for</span> source<span class="token punctuation">,</span> targets <span class="token keyword">in</span> self<span class="token punctuation">.</span>train_data<span class="token punctuation">:</span> self<span class="token punctuation">.</span>_run_batch<span class="token punctuation">(</span>source<span class="token punctuation">,</span> targets<span class="token punctuation">)</span><span aria-hidden="true" class="line-numbers-rows"><span></span><span></span><span></span><span></span><span></span></span></code></pre><p>Saving model checkpoints</p><pre class="line-numbers language-python" data-language="python"><code class="language-python">ckp <span class="token operator">=</span> self<span class="token punctuation">.</span>model<span class="token punctuation">.</span>module<span class="token punctuation">.</span>state_dict<span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token keyword">if</span> self<span class="token punctuation">.</span>gpu_id <span class="token operator">==</span> <span class="token number">0</span> <span class="token keyword">and</span> epoch <span class="token operator">%</span> self<span class="token punctuation">.</span>save_every <span class="token operator">==</span> <span class="token number">0</span><span class="token punctuation">:</span> self<span class="token punctuation">.</span>_save_checkpoint<span class="token punctuation">(</span>epoch<span class="token punctuation">)</span><span aria-hidden="true" class="line-numbers-rows"><span></span><span></span><span></span></span></code></pre><h2 id="Torchrun"><a href="#Torchrun" class="headerlink" title="Torchrun"></a>Torchrun</h2><p>A single process failure could disrupt the whole distributed 
training. <code>torchrun</code> provides fault-tolerance and elastic training</p><pre class="line-numbers language-python" data-language="python"><code class="language-python"><span class="token keyword">def</span> <span class="token function">ddp_setup</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">:</span> init_process_group<span class="token punctuation">(</span>backend<span class="token operator">=</span><span class="token string">"nccl"</span><span class="token punctuation">)</span> <span class="token keyword">class</span> <span class="token class-name">Trainer</span><span class="token punctuation">:</span> <span class="token keyword">def</span> <span class="token function">__init__</span><span class="token punctuation">(</span> self<span class="token punctuation">,</span> model<span class="token punctuation">:</span> torch<span class="token punctuation">.</span>nn<span class="token punctuation">.</span>Module<span class="token punctuation">,</span> train_data<span class="token punctuation">:</span> DataLoader<span class="token punctuation">,</span> optimizer<span class="token punctuation">:</span> torch<span class="token punctuation">.</span>optim<span class="token punctuation">.</span>Optimizer<span class="token punctuation">,</span> save_every<span class="token punctuation">:</span> <span class="token builtin">int</span><span class="token punctuation">,</span> snapshot_path<span class="token punctuation">:</span> <span class="token builtin">str</span><span class="token punctuation">,</span> <span class="token punctuation">)</span> <span class="token operator">-</span><span class="token operator">></span> <span class="token boolean">None</span><span class="token punctuation">:</span> self<span class="token punctuation">.</span>local_rank <span class="token operator">=</span> <span class="token builtin">int</span><span class="token punctuation">(</span>os<span class="token 
.</span>environ<span">
punctuation">.</span>environ<span class="token punctuation">[</span><span class="token string">"LOCAL_RANK"</span><span class="token punctuation">]</span><span class="token punctuation">)</span> self<span class="token punctuation">.</span>global_rank <span class="token operator">=</span> <span class="token builtin">int</span><span class="token punctuation">(</span>os<span class="token punctuation">.</span>environ<span class="token punctuation">[</span><span class="token string">"RANK"</span><span class="token punctuation">]</span><span class="token punctuation">)</span> self<span class="token punctuation">.</span>model <span class="token operator">=</span> model<span class="token punctuation">.</span>to<span class="token punctuation">(</span>self<span class="token punctuation">.</span>local_rank<span class="token punctuation">)</span> self<span class="token punctuation">.</span>train_data <span class="token operator">=</span> train_data self<span class="token punctuation">.</span>optimizer <span class="token operator">=</span> optimizer self<span class="token punctuation">.</span>save_every <span class="token operator">=</span> save_every self<span class="token punctuation">.</span>epochs_run <span class="token operator">=</span> <span class="token number">0</span> <span class="token keyword">if</span> os<span class="token punctuation">.</span>path<span class="token punctuation">.</span>exists<span class="token punctuation">(</span>snapshot_path<span class="token punctuation">)</span><span class="token punctuation">:</span> <span class="token keyword">print</span><span class="token punctuation">(</span><span class="token string">"Loading snapshot"</span><span class="token punctuation">)</span> self<span class="token punctuation">.</span>_load_snapshot<span class="token punctuation">(</span>snapshot_path<span class="token punctuation">)</span> self<span class="token punctuation">.</span>model <span class="token operator">=</span> DDP<span class="token 
(</span>self<span">
punctuation">(</span>self<span class="token punctuation">.</span>model<span class="token punctuation">,</span> device_ids<span class="token operator">=</span><span class="token punctuation">[</span>self<span class="token punctuation">.</span>local_rank<span class="token punctuation">]</span><span class="token punctuation">)</span> <span class="token keyword">def</span> <span class="token function">_save_snapshot</span><span class="token punctuation">(</span>self<span class="token punctuation">,</span> epoch<span class="token punctuation">)</span><span class="token punctuation">:</span> snapshot <span class="token operator">=</span> <span class="token punctuation">{</span><span class="token punctuation">}</span> snapshot<span class="token punctuation">[</span><span class="token string">"MODEL_STATE"</span><span class="token punctuation">]</span> <span class="token operator">=</span> self<span class="token punctuation">.</span>model<span class="token punctuation">.</span>module<span class="token punctuation">.</span>state_dict<span class="token punctuation">(</span><span class="token punctuation">)</span> snapshot<span class="token punctuation">[</span><span class="token string">"EPOCHS_RUN"</span><span class="token punctuation">]</span> <span class="token operator">=</span> epoch torch<span class="token punctuation">.</span>save<span class="token punctuation">(</span>snapshot<span class="token punctuation">,</span> <span class="token string">"snapshot.pt"</span><span class="token punctuation">)</span> <span class="token keyword">print</span><span class="token punctuation">(</span><span class="token string-interpolation"><span class="token string">f"Epoch </span><span class="token interpolation"><span class="token punctuation">{</span>epoch<span class="token punctuation">}</span></span><span class="token string"> | Training snapshot saved at snapshot.pt"</span></span><span class="token punctuation">)</span> <span class="token keyword">def</span> <span class="token 
function">_load_snapshot</span><span class="token punctuation">(</span>self<span class="token punctuation">,</span>snapshot_path<span class="token punctuation">)</span><span class="token punctuation">:</span> snapshot <span class="token operator">=</span> torch<span class="token punctuation">.</span>load<span class="token punctuation">(</span>snapshot_path<span class="token punctuation">)</span> self<span class="token punctuation">.</span>model<span class="token punctuation">.</span>load_state_dict<span class="token punctuation">(</span>snapshot<span class="token punctuation">[</span><span class="token string">"MODEL_STATE"</span><span class="token punctuation">]</span><span class="token punctuation">)</span> self<span class="token punctuation">.</span>epochs_run <span class="token operator">=</span> snapshot<span class="token punctuation">[</span><span class="token string">"EPOCHS_RUN"</span><span class="token punctuation">]</span> <span class="token keyword">print</span><span class="token punctuation">(</span><span class="token string-interpolation"><span class="token string">f"Resuming training from snapshot at Epoch </span><span class="token interpolation"><span class="token punctuation">{</span>self<span class="token punctuation">.</span>epochs_run<span class="token punctuation">}</span></span><span class="token string">"</span></span><span class="token punctuation">)</span> <span class="token keyword">def</span> <span class="token function">train</span><span class="token punctuation">(</span>self<span class="token punctuation">,</span> max_epochs<span class="token punctuation">:</span> <span class="token builtin">int</span><span class="token punctuation">)</span><span class="token punctuation">:</span> <span class="token keyword">for</span> epoch <span class="token keyword">in</span> <span class="token builtin">range</span><span class="token punctuation">(</span>max_epochs<span class="token punctuation">)</span><span class="token punctuation">:</span> 
self<span class="token punctuation">.</span>_run_epoch<span class="token punctuation">(</span>epoch<span class="token punctuation">)</span> <span class="token keyword">def</span> <span class="token function">_run_epoch</span><span class="token punctuation">(</span>self<span class="token punctuation">,</span> epoch<span class="token punctuation">)</span><span class="token punctuation">:</span> batch_size <span class="token operator">=</span> <span class="token builtin">len</span><span class="token punctuation">(</span><span class="token builtin">next</span><span class="token punctuation">(</span><span class="token builtin">iter</span><span class="token punctuation">(</span>self<span class="token punctuation">.</span>train_data<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">)</span> <span class="token keyword">print</span><span class="token punctuation">(</span><span class="token string-interpolation"><span class="token string">f"[GPU</span><span class="token interpolation"><span class="token punctuation">{</span>self<span class="token punctuation">.</span>global_rank<span class="token punctuation">}</span></span><span class="token string">] Epoch </span><span class="token interpolation"><span class="token punctuation">{</span>epoch<span class="token punctuation">}</span></span><span class="token string"> | Batchsize: </span><span class="token interpolation"><span class="token punctuation">{</span>batch_size<span class="token punctuation">}</span></span><span class="token string"> | Steps: </span><span class="token interpolation"><span class="token punctuation">{</span><span class="token builtin">len</span><span class="token punctuation">(</span>self<span class="token punctuation">.</span>train_data<span class="token punctuation">)</span><span class="token punctuation">}</span></span><span 
class="token string">"</span></span><span class="token punctuation">)</span> self<span class="token punctuation">.</span>train_data<span class="token punctuation">.</span>sampler<span class="token punctuation">.</span>set_epoch<span class="token punctuation">(</span>epoch<span class="token punctuation">)</span> <span class="token keyword">for</span> source<span class="token punctuation">,</span> targets <span class="token keyword">in</span> self<span class="token punctuation">.</span>train_data<span class="token punctuation">:</span> source <span class="token operator">=</span> source<span class="token punctuation">.</span>to<span class="token punctuation">(</span>self<span class="token punctuation">.</span>local_rank<span class="token punctuation">)</span> targets <span class="token operator">=</span> targets<span class="token punctuation">.</span>to<span class="token punctuation">(</span>self<span class="token punctuation">.</span>local_rank<span class="token punctuation">)</span> self<span class="token punctuation">.</span>_run_batch<span class="token punctuation">(</span>source<span class="token punctuation">,</span> targets<span class="token punctuation">)</span><span class="token keyword">def</span> <span class="token function">main</span><span class="token punctuation">(</span>save_every<span class="token punctuation">:</span> <span class="token builtin">int</span><span class="token punctuation">,</span> total_epochs<span class="token punctuation">:</span> <span class="token builtin">int</span><span class="token punctuation">,</span> snapshot_path<span class="token punctuation">:</span> <span class="token builtin">str</span> <span class="token operator">=</span> <span class="token string">"snapshot.pt"</span><span class="token punctuation">)</span> <span class="token punctuation">:</span> ddp_setup<span class="token punctuation">(</span><span class="token punctuation">)</span> dataset<span class="token punctuation">,</span> model<span class="token punctuation">,</span> optimizer <span class="token operator">=</span> load_train_objs<span class="token punctuation">(</span><span class="token punctuation">)</span> train_data 
<span class="token operator">=</span> prepare_dataloader<span class="token punctuation">(</span>dataset<span class="token punctuation">,</span> batch_size<span class="token operator">=</span><span class="token number">32</span><span class="token punctuation">)</span> trainer <span class="token operator">=</span> Trainer<span class="token punctuation">(</span>model<span class="token punctuation">,</span> train_data<span class="token punctuation">,</span> optimizer<span class="token punctuation">,</span> save_every<span class="token punctuation">,</span> snapshot_path<span class="token punctuation">)</span> trainer<span class="token punctuation">.</span>train<span class="token punctuation">(</span>total_epochs<span class="token punctuation">)</span> destroy_process_group<span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">if</span> __name__ <span class="token operator">==</span> <span class="token string">"__main__"</span><span class="token punctuation">:</span> <span class="token keyword">import</span> sys total_epochs <span class="token operator">=</span> <span class="token builtin">int</span><span class="token punctuation">(</span>sys<span class="token punctuation">.</span>argv<span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">)</span> save_every <span class="token operator">=</span> <span class="token builtin">int</span><span class="token punctuation">(</span>sys<span class="token punctuation">.</span>argv<span class="token punctuation">[</span><span class="token number">2</span><span class="token punctuation">]</span><span class="token punctuation">)</span> main<span class="token punctuation">(</span>save_every<span class="token punctuation">,</span> total_epochs<span class="token punctuation">)</span><span aria-hidden="true" 
class="line-numbers-rows"><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span></span></code></pre><pre class="line-numbers language-bash" data-language="bash"><code class="language-bash">torchrun <span class="token parameter variable">--standalone</span> <span class="token parameter variable">--nproc_per_node</span><span class="token operator">=</span><span class="token number">4</span> multigpu_torchrun.py <span class="token number">50</span> <span class="token number">10</span><span aria-hidden="true" class="line-numbers-rows"><span></span></span></code></pre>]]></content>
<categories>
<category> algorithms </category>
</categories>
<tags>
<tag> spatial transcriptomics </tag>
<tag> cell segment </tag>
</tags>
</entry>
<entry>
<title>Cell Segmentation</title>
<link href="/2022/09/16/cell-segmentation/"/>
<url>/2022/09/16/cell-segmentation/</url>
<content type="html"><![CDATA[<pre class="line-numbers language-python" data-language="python"><code class="language-python"><span class="token keyword">def</span> <span class="token function">score_and_mask_pixels</span><span class="token punctuation">(</span> adata<span class="token punctuation">:</span> AnnData<span class="token punctuation">,</span> layer<span class="token punctuation">:</span> <span class="token builtin">str</span><span class="token punctuation">,</span> k<span class="token punctuation">:</span> <span class="token builtin">int</span><span class="token punctuation">,</span> method<span class="token punctuation">:</span> Literal<span class="token punctuation">[</span><span class="token string">"gauss"</span><span class="token punctuation">,</span> <span class="token string">"moran"</span><span class="token punctuation">,</span> <span class="token string">"EM"</span><span class="token punctuation">,</span> <span class="token string">"EM+gauss"</span><span class="token punctuation">,</span> <span class="token string">"EM+BP"</span><span class="token punctuation">,</span> <span class="token string">"VI+gauss"</span><span class="token punctuation">,</span> <span class="token string">"VI+BP"</span><span class="token punctuation">]</span><span class="token punctuation">,</span> moran_kwargs<span class="token punctuation">:</span> Optional<span class="token punctuation">[</span><span class="token builtin">dict</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token boolean">None</span><span class="token punctuation">,</span> em_kwargs<span class="token punctuation">:</span> Optional<span class="token punctuation">[</span><span class="token builtin">dict</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token boolean">None</span><span class="token punctuation">,</span> vi_kwargs<span class="token punctuation">:</span> Optional<span class="token 
punctuation">[</span><span class="token builtin">dict</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token boolean">None</span><span class="token punctuation">,</span> bp_kwargs<span class="token punctuation">:</span> Optional<span class="token punctuation">[</span><span class="token builtin">dict</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token boolean">None</span><span class="token punctuation">,</span> threshold<span class="token punctuation">:</span> Optional<span class="token punctuation">[</span><span class="token builtin">float</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token boolean">None</span><span class="token punctuation">,</span> use_knee<span class="token punctuation">:</span> Optional<span class="token punctuation">[</span><span class="token builtin">bool</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token boolean">False</span><span class="token punctuation">,</span> mk<span class="token punctuation">:</span> Optional<span class="token punctuation">[</span><span class="token builtin">int</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token boolean">None</span><span class="token punctuation">,</span> bins_layer<span class="token punctuation">:</span> Optional<span class="token punctuation">[</span>Union<span class="token punctuation">[</span>Literal<span class="token punctuation">[</span><span class="token boolean">False</span><span class="token punctuation">]</span><span class="token punctuation">,</span> <span class="token builtin">str</span><span class="token punctuation">]</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token boolean">None</span><span class="token punctuation">,</span> certain_layer<span class="token punctuation">:</span> 
Optional<span class="token punctuation">[</span><span class="token builtin">str</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token boolean">None</span><span class="token punctuation">,</span> scores_layer<span class="token punctuation">:</span> Optional<span class="token punctuation">[</span><span class="token builtin">str</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token boolean">None</span><span class="token punctuation">,</span> mask_layer<span class="token punctuation">:</span> Optional<span class="token punctuation">[</span><span class="token builtin">str</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token boolean">None</span><span class="token punctuation">,</span><span class="token punctuation">)</span><span class="token punctuation">:</span><span aria-hidden="true" class="line-numbers-rows"><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span></span></code></pre><h1 id="Segmentation-approaches"><a href="#Segmentation-approaches" class="headerlink" title="Segmentation approaches"></a>Segmentation approaches</h1><p>All approaches follow these general steps.</p><ol><li>Apply a 2D convolution (which may be Gaussian or summation) to the UMI count image. 
The size of the convolution is controlled with the <code>k</code> parameter.</li><li>Obtain per-pixel scores, usually in the range <code>[0, 1]</code>, indicating how likely each pixel is to be occupied by a cell.</li><li>Apply a threshold to these scores, which is either computed using <a href="https://en.wikipedia.org/wiki/Otsu's_method">Otsu’s method</a> or manually provided with the <code>threshold</code> parameter.</li><li>Apply <a href="https://docs.opencv.org/4.x/d9/d61/tutorial_py_morphological_ops.html">morphological</a> opening and closing with size <code>mk</code> to fill in holes and remove noise. By default, this value is set to <code>k+2</code> when using the <a href="https://spateo-release.readthedocs.io/en/latest/technicals/cell_segmentation.html#negative-binomial-mixture-model-methods-including-em-or-vi">Negative binomial mixture model</a>, otherwise to <code>k-2</code>.</li></ol><p>Each of the supported methods differs in how the per-pixel scores (step 2) are calculated.</p><ol><li>Gaussian Blur</li><li>Moran’s I</li><li>NB mixture model</li><li>Belief Propagation</li></ol><h2 id="The-document-description-about-the-score-pixels"><a href="#The-document-description-about-the-score-pixels" class="headerlink" title="The document description about the score_pixels"></a>The document description about the <code>score_pixels</code></h2><pre class="line-numbers language-python" data-language="python"><code class="language-python"><span class="token comment"># All methods other than gauss require EM</span> <span class="token keyword">if</span> method <span class="token operator">==</span> <span class="token string">"gauss"</span><span class="token punctuation">:</span> <span class="token comment"># For just "gauss" method, we should rescale to [0, 1] because all the</span> <span class="token comment"># other methods eventually produce an array of [0, 1] values.</span> res <span class="token operator">=</span> utils<span class="token 
punctuation">.</span>scale_to_01<span class="token punctuation">(</span>res<span class="token punctuation">)</span> <span class="token keyword">elif</span> method <span class="token operator">==</span> <span class="token string">"moran"</span><span class="token punctuation">:</span> res <span class="token operator">=</span> moran<span class="token punctuation">.</span>run_moran<span class="token punctuation">(</span>res<span class="token punctuation">,</span> mask<span class="token operator">=</span><span class="token boolean">None</span> <span class="token keyword">if</span> bins <span class="token keyword">is</span> <span class="token boolean">None</span> <span class="token keyword">else</span> bins <span class="token operator">></span> <span class="token number">0</span><span class="token punctuation">,</span> <span class="token operator">**</span>moran_kwargs<span class="token punctuation">)</span> <span class="token comment"># Rescale</span> res <span class="token operator">/=</span> res<span class="token punctuation">.</span><span class="token builtin">max</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">else</span><span class="token punctuation">:</span> <span class="token comment"># Obtain initial parameter estimates with Otsu thresholding.</span> <span class="token comment"># These may be overridden by providing the appropriate kwargs.</span> nb_kwargs <span class="token operator">=</span> <span class="token builtin">dict</span><span class="token punctuation">(</span>params<span class="token operator">=</span>_initial_nb_params<span class="token punctuation">(</span>res<span class="token punctuation">,</span> bins<span class="token operator">=</span>bins<span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token keyword">if</span> <span class="token string">"em"</span> <span class="token keyword">in</span> method<span class="token punctuation">:</span> 
nb_kwargs<span class="token punctuation">.</span>update<span class="token punctuation">(</span>em_kwargs<span class="token punctuation">)</span> lm<span class="token punctuation">.</span>main_debug<span class="token punctuation">(</span><span class="token string-interpolation"><span class="token string">f"Running EM with kwargs </span><span class="token interpolation"><span class="token punctuation">{</span>nb_kwargs<span class="token punctuation">}</span></span><span class="token string">."</span></span><span class="token punctuation">)</span> em_results <span class="token operator">=</span> em<span class="token punctuation">.</span>run_em<span class="token punctuation">(</span>res<span class="token punctuation">,</span> bins<span class="token operator">=</span>bins<span class="token punctuation">,</span> <span class="token operator">**</span>nb_kwargs<span class="token punctuation">)</span> conditional_func <span class="token operator">=</span> partial<span class="token punctuation">(</span>em<span class="token punctuation">.</span>conditionals<span class="token punctuation">,</span> em_results<span class="token operator">=</span>em_results<span class="token punctuation">,</span> bins<span class="token operator">=</span>bins<span class="token punctuation">)</span> <span class="token keyword">else</span><span class="token punctuation">:</span> nb_kwargs<span class="token punctuation">.</span>update<span class="token punctuation">(</span>vi_kwargs<span class="token punctuation">)</span> lm<span class="token punctuation">.</span>main_debug<span class="token punctuation">(</span><span class="token string-interpolation"><span class="token string">f"Running VI with kwargs </span><span class="token interpolation"><span class="token punctuation">{</span>nb_kwargs<span class="token punctuation">}</span></span><span class="token string">."</span></span><span class="token punctuation">)</span> vi_results <span class="token operator">=</span> vi<span class="token 
punctuation">.</span>run_vi<span class="token punctuation">(</span>res<span class="token punctuation">,</span> bins<span class="token operator">=</span>bins<span class="token punctuation">,</span> <span class="token operator">**</span>nb_kwargs<span class="token punctuation">)</span> conditional_func <span class="token operator">=</span> partial<span class="token punctuation">(</span>vi<span class="token punctuation">.</span>conditionals<span class="token punctuation">,</span> vi_results<span class="token operator">=</span>vi_results<span class="token punctuation">,</span> bins<span class="token operator">=</span>bins<span class="token punctuation">)</span> <span class="token keyword">if</span> <span class="token string">"bp"</span> <span class="token keyword">in</span> method<span class="token punctuation">:</span> lm<span class="token punctuation">.</span>main_debug<span class="token punctuation">(</span><span class="token string">"Computing conditionals."</span><span class="token punctuation">)</span> background_cond<span class="token punctuation">,</span> cell_cond <span class="token operator">=</span> conditional_func<span class="token punctuation">(</span>res<span class="token punctuation">)</span> <span class="token keyword">if</span> certain_mask <span class="token keyword">is</span> <span class="token keyword">not</span> <span class="token boolean">None</span><span class="token punctuation">:</span> background_cond<span class="token punctuation">[</span>certain_mask<span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token number">1e-2</span> cell_cond<span class="token punctuation">[</span>certain_mask<span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token number">1</span> <span class="token operator">-</span> <span class="token punctuation">(</span><span class="token number">1e-2</span><span class="token punctuation">)</span> lm<span class="token 
punctuation">.</span>main_debug<span class="token punctuation">(</span><span class="token string-interpolation"><span class="token string">f"Running BP with kwargs </span><span class="token interpolation"><span class="token punctuation">{</span>bp_kwargs<span class="token punctuation">}</span></span><span class="token string">."</span></span><span class="token punctuation">)</span> res <span class="token operator">=</span> bp<span class="token punctuation">.</span>run_bp<span class="token punctuation">(</span>background_cond<span class="token punctuation">,</span> cell_cond<span class="token punctuation">,</span> <span class="token operator">**</span>bp_kwargs<span class="token punctuation">)</span> <span class="token keyword">else</span><span class="token punctuation">:</span> lm<span class="token punctuation">.</span>main_debug<span class="token punctuation">(</span><span class="token string">"Computing confidences."</span><span class="token punctuation">)</span> res <span class="token operator">=</span> em<span class="token punctuation">.</span>confidence<span class="token punctuation">(</span>res<span class="token punctuation">,</span> em_results<span class="token operator">=</span>em_results<span class="token punctuation">,</span> bins<span class="token operator">=</span>bins<span class="token punctuation">)</span> <span class="token keyword">if</span> certain_mask <span class="token keyword">is</span> <span class="token keyword">not</span> <span class="token boolean">None</span><span class="token punctuation">:</span> res <span class="token operator">=</span> np<span class="token punctuation">.</span>clip<span class="token punctuation">(</span>res <span class="token operator">+</span> certain_mask<span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">)</span> <span class="token keyword">if</span> <span class="token 
string">"gauss"</span> <span class="token keyword">in</span> method<span class="token punctuation">:</span> lm<span class="token punctuation">.</span>main_debug<span class="token punctuation">(</span><span class="token string">"Computing Gaussian blur."</span><span class="token punctuation">)</span> res <span class="token operator">=</span> utils<span class="token punctuation">.</span>conv2d<span class="token punctuation">(</span>res<span class="token punctuation">,</span> k<span class="token punctuation">,</span> mode<span class="token operator">=</span><span class="token string">"gauss"</span><span class="token punctuation">,</span> bins<span class="token operator">=</span>bins<span class="token punctuation">)</span><span aria-hidden="true" class="line-numbers-rows"><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span></span></code></pre><pre class="line-numbers language-python" data-language="python"><code class="language-python"><span class="token triple-quoted-string string">"""Score each pixel by how likely it is a cell. Values returned are in [0, 1]. Args: X: UMI counts per pixel as either a sparse or dense array. k: Kernel size for convolution. method: Method to use. Valid methods are: gauss: Gaussian blur moran: Moran's I based method EM: EM algorithm to estimate cell and background expression parameters. EM+gauss: Negative binomial EM algorithm followed by Gaussian blur. EM+BP: EM algorithm followed by belief propagation to estimate the marginal probabilities of cell and background. 
VI+gauss: Negative binomial VI algorithm followed by Gaussian blur. Note that VI also supports the zero-inflated negative binomial (ZINB) by providing `zero_inflated=True`. VI+BP: VI algorithm followed by belief propagation. Note that VI also supports the zero-inflated negative binomial (ZINB) by providing `zero_inflated=True`. moran_kwargs: Keyword arguments to the :func:`moran.run_moran` function. em_kwargs: Keyword arguments to the :func:`em.run_em` function. bp_kwargs: Keyword arguments to the :func:`bp.run_bp` function. certain_mask: A boolean Numpy array indicating which pixels are certain to be occupied, a-priori. For example, if nuclei staining is available, this would be the nuclei segmentation mask. bins: Pixel bins to segment separately. Only takes effect when the EM algorithm is run. Returns: [0, 1] score of each pixel being a cell."""</span><span aria-hidden="true" class="line-numbers-rows"><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span></span></code></pre><h2 id="score-and-mask-pixels-code"><a href="#score-and-mask-pixels-code" class="headerlink" title="score_and_mask_pixels() code"></a>score_and_mask_pixels() code</h2><pre class="line-numbers language-python" data-language="python"><code class="language-python">X <span class="token operator">=</span> SKM<span class="token punctuation">.</span>select_layer_data<span class="token punctuation">(</span>adata<span class="token punctuation">,</span> layer<span class="token punctuation">,</span> make_dense<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">)</span> certain_mask <span class="token 
operator">=</span> <span class="token boolean">None</span> <span class="token keyword">if</span> certain_layer<span class="token punctuation">:</span> certain_mask <span class="token operator">=</span> SKM<span class="token punctuation">.</span>select_layer_data<span class="token punctuation">(</span>adata<span class="token punctuation">,</span> certain_layer<span class="token punctuation">)</span><span class="token punctuation">.</span>astype<span class="token punctuation">(</span><span class="token builtin">bool</span><span class="token punctuation">)</span> bins <span class="token operator">=</span> <span class="token boolean">None</span> <span class="token keyword">if</span> bins_layer <span class="token keyword">is</span> <span class="token keyword">not</span> <span class="token boolean">False</span><span class="token punctuation">:</span> bins_layer <span class="token operator">=</span> bins_layer <span class="token keyword">or</span> SKM<span class="token punctuation">.</span>gen_new_layer_key<span class="token punctuation">(</span>layer<span class="token punctuation">,</span> SKM<span class="token punctuation">.</span>BINS_SUFFIX<span class="token punctuation">)</span> <span class="token keyword">if</span> bins_layer <span class="token keyword">in</span> adata<span class="token punctuation">.</span>layers<span class="token punctuation">:</span> bins <span class="token operator">=</span> SKM<span class="token punctuation">.</span>select_layer_data<span class="token punctuation">(</span>adata<span class="token punctuation">,</span> bins_layer<span class="token punctuation">)</span> method <span class="token operator">=</span> method<span class="token punctuation">.</span>lower<span class="token punctuation">(</span><span class="token punctuation">)</span> lm<span class="token punctuation">.</span>main_info<span class="token punctuation">(</span><span class="token string-interpolation"><span class="token string">f"Scoring pixels with </span><span class="token 
interpolation"><span class="token punctuation">{</span>method<span class="token punctuation">}</span></span><span class="token string"> method."</span></span><span class="token punctuation">)</span> scores <span class="token operator">=</span> _score_pixels<span class="token punctuation">(</span>X<span class="token punctuation">,</span> k<span class="token punctuation">,</span> method<span class="token punctuation">,</span> moran_kwargs<span class="token punctuation">,</span> em_kwargs<span class="token punctuation">,</span> vi_kwargs<span class="token punctuation">,</span> bp_kwargs<span class="token punctuation">,</span> certain_mask<span class="token punctuation">,</span> bins<span class="token punctuation">)</span> scores_layer <span class="token operator">=</span> scores_layer <span class="token keyword">or</span> SKM<span class="token punctuation">.</span>gen_new_layer_key<span class="token punctuation">(</span>layer<span class="token punctuation">,</span> SKM<span class="token punctuation">.</span>SCORES_SUFFIX<span class="token punctuation">)</span> SKM<span class="token punctuation">.</span>set_layer_data<span class="token punctuation">(</span>adata<span class="token punctuation">,</span> scores_layer<span class="token punctuation">,</span> scores<span class="token punctuation">)</span><span aria-hidden="true" class="line-numbers-rows"><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span></span></code></pre><p>The package uses SKM to manage its data loading and saving. 
</p><p><code>SKM.select_layer_data(adata, layer, make_dense=True)</code> selects the given layer of the AnnData object as a dense matrix rather than a sparse one.</p><p><code>SKM.set_layer_data(adata, scores_layer, scores)</code> saves the data matrix <code>scores</code> as a layer of the AnnData object named <code>scores_layer</code>.</p><h1 id="Practice-problems"><a href="#Practice-problems" class="headerlink" title="Practice problems"></a>Practice problems</h1><h2 id="Read-the-file"><a href="#Read-the-file" class="headerlink" title="Read the file"></a>Read the file</h2><p>There are many file-reading functions in <code>st.io</code>, but all of them are wrappers around <code>st.io.read_bgi_agg</code>. </p><pre class="line-numbers language-python" data-language="python"><code class="language-python">adata <span class="token operator">=</span> st<span class="token punctuation">.</span>io<span class="token punctuation">.</span>read_bgi_agg<span class="token punctuation">(</span><span class="token string">"D2_bgi_new.tsv"</span><span class="token punctuation">,</span> <span class="token string">"CN13_D2_HE.tiff"</span><span class="token punctuation">,</span>prealigned<span class="token operator">=</span><span class="token boolean">False</span><span class="token punctuation">)</span><span class="token comment"># prealigned=False tells spateo to align the tsv with the image</span><span aria-hidden="true" class="line-numbers-rows"><span></span><span></span></span></code></pre><h3 id="Obtain-cellxgene-adata"><a href="#Obtain-cellxgene-adata" class="headerlink" title="Obtain cellxgene adata"></a>Obtain cellxgene adata</h3><pre class="line-numbers language-python" data-language="python"><code class="language-python">cell_adata <span class="token operator">=</span> st<span class="token punctuation">.</span>io<span class="token punctuation">.</span>read_bgi<span class="token punctuation">(</span> <span class="token comment"># 'SS200000135TL_D1_all_bin1.txt.gz',</span> <span class="token 
string">"D2_bgi_new.tsv"</span><span class="token punctuation">,</span> segmentation_adata<span class="token operator">=</span>adata<span class="token punctuation">,</span> labels_layer<span class="token operator">=</span><span class="token string">'watershed_labels'</span><span class="token punctuation">,</span> add_props<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">,</span><span class="token punctuation">)</span><span aria-hidden="true" class="line-numbers-rows"><span></span><span></span><span></span><span></span><span></span><span></span><span></span></span></code></pre><h2 id="Drawing"><a href="#Drawing" class="headerlink" title="Drawing"></a>Drawing</h2><pre class="line-numbers language-python" data-language="python"><code class="language-python"><span class="token comment"># you can use matplotlib.pyplot to draw the image of interest</span><span class="token comment"># and here is the default setting for spateo</span><span class="token operator">%</span>config InlineBackend<span class="token punctuation">.</span>print_figure_kwargs <span class="token operator">=</span> <span class="token punctuation">{</span><span class="token string">'facecolor'</span> <span class="token punctuation">:</span> <span class="token string">"w"</span><span class="token punctuation">}</span><span class="token operator">%</span>config InlineBackend<span class="token punctuation">.</span>figure_format <span class="token operator">=</span> <span class="token string">'retina'</span><span aria-hidden="true" class="line-numbers-rows"><span></span><span></span><span></span><span></span></span></code></pre><h3 id="Drawing-different-layers-in-a-picture"><a href="#Drawing-different-layers-in-a-picture" class="headerlink" title="Drawing different layers in a picture"></a>Drawing different layers in a picture</h3><p>You can use <code>axes</code> in pyplot to draw figures</p><p>Note, for the <code>adata</code> directly read 
from </p><pre class="line-numbers language-python" data-language="python"><code class="language-python">fig<span class="token punctuation">,</span> axes <span class="token operator">=</span> plt<span class="token punctuation">.</span>subplots<span class="token punctuation">(</span>figsize<span class="token operator">=</span><span class="token punctuation">(</span><span class="token number">8</span><span class="token punctuation">,</span> <span class="token number">8</span><span class="token punctuation">)</span><span class="token punctuation">,</span> tight_layout<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">)</span><span class="token comment"># The [x:x+x_delta, y:y+y_delta] slice can be used to observe the ROI</span>st<span class="token punctuation">.</span>pl<span class="token punctuation">.</span>imshow<span class="token punctuation">(</span>adata<span class="token punctuation">[</span><span class="token number">400</span><span class="token punctuation">:</span><span class="token number">400</span><span class="token operator">+</span><span class="token number">520</span><span class="token punctuation">,</span><span class="token number">1100</span><span class="token punctuation">:</span><span class="token number">1100</span><span class="token operator">+</span><span class="token number">667</span><span class="token punctuation">]</span><span class="token punctuation">,</span> <span class="token string">'stain'</span><span class="token punctuation">,</span> ax<span class="token operator">=</span>axes<span class="token punctuation">,</span> use_scale<span class="token operator">=</span><span class="token boolean">False</span><span class="token punctuation">,</span> save_show_or_return <span class="token operator">=</span> <span class="token string">'return'</span><span class="token punctuation">,</span> cmap <span class="token operator">=</span> clr<span class="token 
punctuation">.</span>LinearSegmentedColormap<span class="token punctuation">.</span>from_list<span class="token punctuation">(</span><span class="token string">'custom blue'</span><span class="token punctuation">,</span><span class="token punctuation">[</span><span class="token string">'#000000FF'</span><span class="token punctuation">,</span> <span class="token string">'#FFFFFFFF'</span><span class="token punctuation">]</span><span class="token punctuation">,</span> N<span class="token operator">=</span><span class="token number">256</span><span class="token punctuation">)</span><span class="token punctuation">)</span>st<span class="token punctuation">.</span>pl<span class="token punctuation">.</span>imshow<span class="token punctuation">(</span>adata<span class="token punctuation">[</span><span class="token number">400</span><span class="token punctuation">:</span><span class="token number">400</span><span class="token operator">+</span><span class="token number">520</span><span class="token punctuation">,</span><span class="token number">1100</span><span class="token punctuation">:</span><span class="token number">1100</span><span class="token operator">+</span><span class="token number">667</span><span class="token punctuation">]</span><span class="token punctuation">,</span> <span class="token string">'cell_labels_boundary'</span><span class="token punctuation">,</span> ax<span class="token operator">=</span>axes<span class="token punctuation">,</span> alpha<span class="token operator">=</span><span class="token number">1</span><span class="token punctuation">,</span> cmap<span class="token operator">=</span>clr<span class="token punctuation">.</span>LinearSegmentedColormap<span class="token punctuation">.</span>from_list<span class="token punctuation">(</span><span class="token string">'custom blue'</span><span class="token punctuation">,</span> <span class="token punctuation">[</span><span class="token string">'#FFFFFF00'</span><span class="token 
punctuation">,</span><span class="token string">'#FF0000FF'</span><span class="token punctuation">]</span><span class="token punctuation">,</span> N<span class="token operator">=</span><span class="token number">256</span><span class="token punctuation">)</span><span class="token punctuation">,</span> use_scale<span class="token operator">=</span><span class="token boolean">False</span><span class="token punctuation">,</span> save_show_or_return<span class="token operator">=</span><span class="token string">'return'</span><span class="token punctuation">)</span>plt<span class="token punctuation">.</span>show<span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token comment"># fig.savefig("{path}/{name_of_file}.pdf/.png", dpi=300)</span><span aria-hidden="true" class="line-numbers-rows"><span></span><span></span><span></span><span></span><span></span><span></span></span></code></pre><p>Import the figure or cell mask from other methods</p><pre class="line-numbers language-python" data-language="python"><code class="language-python">img <span class="token operator">=</span> plt<span class="token punctuation">.</span>imread<span class="token punctuation">(</span><span class="token string">"{file}"</span><span class="token punctuation">)</span><span class="token comment"># img = np.flip(img, axis=0)</span>y_int <span class="token operator">=</span> np<span class="token punctuation">.</span>array<span class="token punctuation">(</span><span class="token builtin">list</span><span class="token punctuation">(</span>adata<span class="token punctuation">.</span>var_names<span class="token punctuation">)</span><span class="token punctuation">,</span>dtype<span class="token operator">=</span>np<span class="token punctuation">.</span>int16<span class="token punctuation">)</span>x_int <span class="token operator">=</span> np<span class="token punctuation">.</span>array<span class="token punctuation">(</span><span class="token 
builtin">list</span><span class="token punctuation">(</span>adata<span class="token punctuation">.</span>obs_names<span class="token punctuation">)</span><span class="token punctuation">,</span>dtype<span class="token operator">=</span>np<span class="token punctuation">.</span>int16<span class="token punctuation">)</span>y_max<span class="token punctuation">,</span> y_min <span class="token operator">=</span> y_int<span class="token punctuation">.</span><span class="token builtin">max</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">,</span> y_int<span class="token punctuation">.</span><span class="token builtin">min</span><span class="token punctuation">(</span><span class="token punctuation">)</span>x_max<span class="token punctuation">,</span> x_min <span class="token operator">=</span> x_int<span class="token punctuation">.</span><span class="token builtin">max</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">,</span> x_int<span class="token punctuation">.</span><span class="token builtin">min</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token keyword">print</span><span class="token punctuation">(</span>y_max<span class="token punctuation">,</span> y_min<span class="token punctuation">,</span> x_max<span class="token punctuation">,</span> x_min<span class="token punctuation">)</span><span class="token comment"># set the weight of border</span>ax<span class="token punctuation">.</span>spines<span class="token punctuation">[</span><span class="token string">'top'</span><span class="token punctuation">]</span><span class="token punctuation">.</span>set_linewidth<span class="token punctuation">(</span><span class="token number">2.5</span><span class="token punctuation">)</span>ax<span class="token punctuation">.</span>spines<span class="token punctuation">[</span><span 
class="token string">'bottom'</span><span class="token punctuation">]</span><span class="token punctuation">.</span>set_linewidth<span class="token punctuation">(</span><span class="token number">2.5</span><span class="token punctuation">)</span>ax<span class="token punctuation">.</span>spines<span class="token punctuation">[</span><span class="token string">'left'</span><span class="token punctuation">]</span><span class="token punctuation">.</span>set_linewidth<span class="token punctuation">(</span><span class="token number">2.5</span><span class="token punctuation">)</span>ax<span class="token punctuation">.</span>spines<span class="token punctuation">[</span><span class="token string">'right'</span><span class="token punctuation">]</span><span class="token punctuation">.</span>set_linewidth<span class="token punctuation">(</span><span class="token number">2.5</span><span class="token punctuation">)</span><span aria-hidden="true" class="line-numbers-rows"><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span></span></code></pre><h1 id="Statistic-Test"><a href="#Statistic-Test" class="headerlink" title="Statistic Test"></a>Statistic Test</h1><pre class="line-numbers language-python" data-language="python"><code class="language-python"><span class="token comment"># ranksums</span><span class="token keyword">from</span> scipy<span class="token punctuation">.</span>stats <span class="token keyword">import</span> ranksumsranksums<span class="token punctuation">(</span>df<span class="token punctuation">[</span>df<span class="token punctuation">[</span><span class="token string">"cell_or_bin"</span><span class="token punctuation">]</span> <span class="token operator">==</span> <span class="token string">"Cell"</span><span class="token punctuation">]</span><span class="token punctuation">[</span><span class="token string">"density"</span><span class="token 
punctuation">]</span><span class="token punctuation">,</span> df<span class="token punctuation">[</span>df<span class="token punctuation">[</span><span class="token string">"cell_or_bin"</span><span class="token punctuation">]</span> <span class="token operator">==</span> <span class="token string">"Cytoplasma"</span><span class="token punctuation">]</span><span class="token punctuation">[</span><span class="token string">"density"</span><span class="token punctuation">]</span><span class="token punctuation">)</span>ranksums<span class="token punctuation">(</span>df<span class="token punctuation">[</span>df<span class="token punctuation">[</span><span class="token string">"cell_or_bin"</span><span class="token punctuation">]</span> <span class="token operator">==</span> <span class="token string">"Cell"</span><span class="token punctuation">]</span><span class="token punctuation">[</span><span class="token string">"n_counts"</span><span class="token punctuation">]</span><span class="token punctuation">,</span> df<span class="token punctuation">[</span>df<span class="token punctuation">[</span><span class="token string">"cell_or_bin"</span><span class="token punctuation">]</span> <span class="token operator">==</span> <span class="token string">"Cell"</span><span class="token punctuation">]</span><span class="token punctuation">[</span><span class="token string">"n_counts"</span><span class="token punctuation">]</span><span class="token punctuation">)</span><span class="token comment"># T-test</span><span class="token keyword">from</span> scipy<span class="token punctuation">.</span>stats <span class="token keyword">import</span> ttest_indttest_ind<span class="token punctuation">(</span>df<span class="token punctuation">[</span>df<span class="token punctuation">[</span><span class="token string">"cell_or_bin"</span><span class="token punctuation">]</span> <span class="token operator">==</span> <span class="token string">"Cell"</span><span class="token 
punctuation">]</span><span class="token punctuation">[</span><span class="token string">"n_counts"</span><span class="token punctuation">]</span><span class="token punctuation">,</span> df<span class="token punctuation">[</span>df<span class="token punctuation">[</span><span class="token string">"cell_or_bin"</span><span class="token punctuation">]</span> <span class="token operator">==</span> <span class="token string">"Cytoplasma"</span><span class="token punctuation">]</span><span class="token punctuation">[</span><span class="token string">"n_counts"</span><span class="token punctuation">]</span><span class="token punctuation">)</span><span aria-hidden="true" class="line-numbers-rows"><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span></span></code></pre>]]></content>
<categories>
<category> spateo </category>
</categories>
<tags>
<tag> spatial transcriptomics </tag>
</tags>
</entry>
<entry>
<title>VScode in linux</title>
<link href="/2022/09/16/vscode-in-linux/"/>
<url>/2022/09/16/vscode-in-linux/</url>
<content type="html"><![CDATA[<h1 id="Using-VScode-in-linux-though-Remote-ssh"><a href="#Using-VScode-in-linux-though-Remote-ssh" class="headerlink" title="Using VScode in linux through Remote-ssh"></a>Using VScode in linux through Remote-ssh</h1><p>Using VS Code through the Remote-SSH extension is a popular workflow: you run the desktop editor locally and connect to a server through the extension, which still provides many extensions for coding and debugging. However, the vscode-server component that runs on the remote machine is often misunderstood, so here is a brief introduction to using it.</p><h2 id="Connect-to-server"><a href="#Connect-to-server" class="headerlink" title="Connect to server"></a>Connect to server</h2><p>Just follow the instructions in the manual; usually you can connect to the server in a few seconds. The first time you log in through Remote-SSH, however, a <code>.vscode-server</code> directory is automatically installed in your <code>$HOME</code>, which can take a few minutes. </p><p>If you have trouble connecting, a likely cause is a disk quota on your <code>$HOME</code>. A full home directory can also disrupt your normal use of the server, so if your <code>$HOME</code> disk is limited, first create a soft link to a location with enough storage: <code>ln -s /your/disk/free/path /home/yourname/.vscode-server</code>. This helps even if you currently have enough space to download <code>.vscode-server</code>, because the directory grows over time: VS Code stores all of its caches there, and downloaded extensions end up there as well. </p><p>Alternatively, open the settings of your desktop VS Code, find <code>Remote-ssh:InstallPath</code>, and set it, per hostname, to the path where the server should be installed. 
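The relocation step can be sketched as a small shell snippet. This is a demonstration in a throwaway directory standing in for your home: `DEMO_HOME` and `BIG_DISK` are placeholder paths, not real conventions; substitute your actual `$HOME` and a disk with enough free space.

```shell
# Demo: relocate .vscode-server to a roomy disk via a symlink.
# DEMO_HOME stands in for $HOME; BIG_DISK for a large storage path.
DEMO_HOME=$(mktemp -d)
BIG_DISK=$(mktemp -d)/vscode-server
mkdir -p "$BIG_DISK"
# If a real server directory already exists (and is not already a link), move it first.
if [ -d "$DEMO_HOME/.vscode-server" ] && [ ! -L "$DEMO_HOME/.vscode-server" ]; then
  mv "$DEMO_HOME/.vscode-server" "$BIG_DISK"
fi
# Point ~/.vscode-server at the big disk.
ln -s "$BIG_DISK" "$DEMO_HOME/.vscode-server"
ls -ld "$DEMO_HOME/.vscode-server"
```

After this, the next Remote-SSH login installs (and keeps growing) the server files on the big disk instead of your quota-limited home.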
</p><h2 id="One-more-thing-about-linux-HOME"><a href="#One-more-thing-about-linux-HOME" class="headerlink" title="One more thing about linux $HOME"></a>One more thing about linux <code>$HOME</code></h2><p>Your home directory can accumulate many directories prefixed with <code>.</code>, and you should keep them under control; otherwise they will take over all of the disk behind <code>/home/</code>. For example, the <code>.cache</code> directory usually stores downloaded archives and other temporary contents, and it may disappear when the server reboots. Still, keep an eye on it and clear it when necessary.</p><h2 id="Alternative-way-to-install-vscode-extensions"><a href="#Alternative-way-to-install-vscode-extensions" class="headerlink" title="Alternative way to install vscode extensions"></a>Alternative way to install vscode extensions</h2><h2 id="Install-from-a-VSIX"><a href="#Install-from-a-VSIX" class="headerlink" title="Install from a VSIX#"></a>Install from a VSIX<a href="https://code.visualstudio.com/docs/editor/extension-marketplace#_install-from-a-vsix">#</a></h2><blockquote><p>You can manually install a VS Code extension packaged in a <code>.vsix</code> file. 
Using the <strong>Install from VSIX</strong> command in the Extensions view command dropdown, or the <strong>Extensions: Install from VSIX</strong> command in the <strong>Command Palette</strong>, point to the <code>.vsix</code> file.</p><p>You can also install using the VS Code <code>--install-extension</code> command-line switch providing the path to the <code>.vsix</code> file.</p><pre class="line-numbers language-none"><code class="language-none">code --install-extension myextension.vsix<span aria-hidden="true" class="line-numbers-rows"><span></span></span></code></pre><p>You may provide the <code>--install-extension</code> multiple times on the command line to install multiple extensions at once.</p><p>If you’d like to learn more about packaging and publishing extensions, see our <a href="https://code.visualstudio.com/api/working-with-extensions/publishing-extension">Publishing Extensions</a> article in the Extension API.</p></blockquote><h2 id="Before-End"><a href="#Before-End" class="headerlink" title="Before End"></a>Before End</h2><p>As a reminder, the Remote-SSH extension in VS Code uses your computer’s OpenSSH client to connect to the remote server, so make sure that software is working properly on your machine. On Windows you can use PowerShell to check that the service is running, and install OpenSSH through PowerShell if it is missing. </p><p>If you use a different shell such as zsh, the connection from VS Code may not start properly. In that case, delete the <code>exec zsh</code> line from your <code>~/.bashrc</code> file. </p>]]></content>
<categories>
<category> linux </category>
</categories>
<tags>
<tag> tips </tag>
</tags>
</entry>
<entry>
<title>segment_densities</title>
<link href="/2022/09/16/segment-densities/"/>
<url>/2022/09/16/segment-densities/</url>
<content type="html"><![CDATA[<h1 id="Background"><a href="#Background" class="headerlink" title="Background"></a>Background</h1><p>Aligning RNA transcripts with their nuclei is an important step in the analysis of high-resolution spatial transcriptomics. With precise locations for both the RNA transcripts and the cell nuclei, we can segment single-cell profiles and thus carry out spatial transcriptomics analysis at single-cell resolution. The first step is to align the two images (stain and RNA) appropriately, and this step has already been implemented in spateo-release. </p><p><a href="https://spateo-release.readthedocs.io/en/latest/technicals/cell_segmentation.html#segmentation-approaches">Cell segmentation - Spateo documentation (spateo-release.readthedocs.io)</a></p><p><a href="https://spateo-release.readthedocs.io/en/latest/tutorials/notebooks/stain_segmentation.html">Stain segmentation - Spateo documentation (spateo-release.readthedocs.io)</a></p><p>After that, the next step is to segment cells based on the alignment of the two stain images. Here the first difficulty is separating the relatively low- and high-density regions on the slide: as the documentation notes, a single global density threshold is not precise enough, because UMIs are not distributed homogeneously over space. 
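Density estimation on such slides typically starts by aggregating per-pixel UMI counts into coarser bins. A minimal plain-Python sketch of that binning idea follows; `bin_counts` is a hypothetical helper for illustration only, not spateo's actual `bin_matrix`.

```python
# Minimal sketch: sum a 2-D count matrix into (binsize x binsize) blocks.
# Hypothetical helper for illustration; spateo's own bin_matrix differs.
def bin_counts(matrix, binsize):
    rows, cols = len(matrix), len(matrix[0])
    out = []
    for r in range(0, rows, binsize):
        row = []
        for c in range(0, cols, binsize):
            # Sum every pixel inside the current block (clipped at the edges).
            block = [matrix[i][j]
                     for i in range(r, min(r + binsize, rows))
                     for j in range(c, min(c + binsize, cols))]
            row.append(sum(block))
        out.append(row)
    return out

umi = [[1, 0, 2, 1],
       [0, 3, 1, 0],
       [2, 1, 0, 0],
       [1, 1, 1, 4]]
print(bin_counts(umi, 2))  # → [[4, 4], [5, 5]]
```

Coarser bins smooth out pixel-level sparsity, which is why the downstream density segmentation operates on binned rather than raw counts.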
</p><h1 id="Function-Review"><a href="#Function-Review" class="headerlink" title="Function Review"></a>Function Review</h1><h2 id="Code"><a href="#Code" class="headerlink" title="Code"></a>Code</h2><pre class="line-numbers language-python" data-language="python"><code class="language-python"><span class="token comment"># lm for logger_manager</span><span class="token keyword">def</span> <span class="token function">segment_densities</span><span class="token punctuation">(</span> adata<span class="token punctuation">:</span> AnnData<span class="token punctuation">,</span> <span class="token comment"># input anndata</span> layer<span class="token punctuation">:</span> <span class="token builtin">str</span><span class="token punctuation">,</span> <span class="token comment"># Layers that contains UMI counts to implement this function</span> binsize<span class="token punctuation">:</span> <span class="token builtin">int</span><span class="token punctuation">,</span> <span class="token comment"># choose bin size to merge pixels</span> k<span class="token punctuation">:</span> <span class="token builtin">int</span><span class="token punctuation">,</span> <span class="token comment"># kernel size for Gaussian blur</span> dk<span class="token punctuation">:</span> <span class="token builtin">int</span><span class="token punctuation">,</span> <span class="token comment"># kernel size for final dilation </span> distance_threshold<span class="token punctuation">:</span> Optional<span class="token punctuation">[</span><span class="token builtin">float</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token boolean">None</span><span class="token punctuation">,</span> <span class="token comment"># cluster threshold</span> background<span class="token punctuation">:</span> Optional<span class="token punctuation">[</span>Union<span class="token punctuation">[</span>Tuple<span class="token punctuation">[</span><span class="token 
builtin">int</span><span class="token punctuation">,</span> <span class="token builtin">int</span><span class="token punctuation">]</span><span class="token punctuation">,</span> Literal<span class="token punctuation">[</span><span class="token boolean">False</span><span class="token punctuation">]</span><span class="token punctuation">]</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token boolean">None</span><span class="token punctuation">,</span> <span class="token comment"># in default, the outer most pixels have been identified as background, set to false to turn off background detection.</span> out_layer<span class="token punctuation">:</span> Optional<span class="token punctuation">[</span><span class="token builtin">str</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token boolean">None</span><span class="token punctuation">,</span> <span class="token comment"># the output layer name</span><span class="token punctuation">)</span><span class="token punctuation">:</span> X <span class="token operator">=</span> SKM<span class="token punctuation">.</span>select_layer_data<span class="token punctuation">(</span>adata<span class="token punctuation">,</span> layer<span class="token punctuation">,</span> make_dense<span class="token operator">=</span>binsize <span class="token operator">==</span> <span class="token number">1</span><span class="token punctuation">)</span> <span class="token keyword">if</span> binsize <span class="token operator">></span> <span class="token number">1</span><span class="token punctuation">:</span> lm<span class="token punctuation">.</span>main_debug<span class="token punctuation">(</span><span class="token string-interpolation"><span class="token string">f"Binning matrix with binsize=</span><span class="token interpolation"><span class="token punctuation">{</span>binsize<span class="token punctuation">}</span></span><span class="token 
string">."</span></span><span class="token punctuation">)</span> X <span class="token operator">=</span> bin_matrix<span class="token punctuation">(</span>X<span class="token punctuation">,</span> binsize<span class="token punctuation">)</span> <span class="token keyword">if</span> issparse<span class="token punctuation">(</span>X<span class="token punctuation">)</span><span class="token punctuation">:</span> lm<span class="token punctuation">.</span>main_debug<span class="token punctuation">(</span><span class="token string">"Converting to dense matrix."</span><span class="token punctuation">)</span> X <span class="token operator">=</span> X<span class="token punctuation">.</span>A <span class="token comment"># why need the step</span> lm<span class="token punctuation">.</span>main_info<span class="token punctuation">(</span><span class="token string">"Finding density bins."</span><span class="token punctuation">)</span> bins <span class="token operator">=</span> _segment_densities<span class="token punctuation">(</span>X<span class="token punctuation">,</span> k<span class="token punctuation">,</span> dk<span class="token punctuation">,</span> distance_threshold<span class="token punctuation">)</span><span class="token comment"># key step for density segments</span> <span class="token keyword">if</span> background <span class="token keyword">is</span> <span class="token keyword">not</span> <span class="token boolean">False</span><span class="token punctuation">:</span> lm<span class="token punctuation">.</span>main_info<span class="token punctuation">(</span><span class="token string">"Setting background pixels."</span><span class="token punctuation">)</span> <span class="token keyword">if</span> background <span class="token keyword">is</span> <span class="token keyword">not</span> <span class="token boolean">None</span><span class="token punctuation">:</span> x<span class="token punctuation">,</span> y <span class="token operator">=</span> background 
background_label <span class="token operator">=</span> bins<span class="token punctuation">[</span>x<span class="token punctuation">,</span> y<span class="token punctuation">]</span> <span class="token keyword">else</span><span class="token punctuation">:</span> counts <span class="token operator">=</span> Counter<span class="token punctuation">(</span>bins<span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">)</span> <span class="token operator">+</span> Counter<span class="token punctuation">(</span>bins<span class="token punctuation">[</span><span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">)</span> <span class="token operator">+</span> Counter<span class="token punctuation">(</span>bins<span class="token punctuation">[</span><span class="token punctuation">:</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">)</span> <span class="token operator">+</span> Counter<span class="token punctuation">(</span>bins<span class="token punctuation">[</span><span class="token punctuation">:</span><span class="token punctuation">,</span> <span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">)</span> background_label <span class="token operator">=</span> counts<span class="token punctuation">.</span>most_common<span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span> bins<span class="token punctuation">[</span>bins <span 
class="token operator">==</span> background_label<span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token number">0</span> bins<span class="token punctuation">[</span>bins <span class="token operator">></span> background_label<span class="token punctuation">]</span> <span class="token operator">-=</span> <span class="token number">1</span> <span class="token keyword">if</span> binsize <span class="token operator">></span> <span class="token number">1</span><span class="token punctuation">:</span> <span class="token comment"># Expand back</span> bins <span class="token operator">=</span> cv2<span class="token punctuation">.</span>resize<span class="token punctuation">(</span>bins<span class="token punctuation">,</span> adata<span class="token punctuation">.</span>shape<span class="token punctuation">[</span><span class="token punctuation">:</span><span class="token punctuation">:</span><span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">,</span> interpolation<span class="token operator">=</span>cv2<span class="token punctuation">.</span>INTER_NEAREST<span class="token punctuation">)</span> out_layer <span class="token operator">=</span> out_layer <span class="token keyword">or</span> SKM<span class="token punctuation">.</span>gen_new_layer_key<span class="token punctuation">(</span>layer<span class="token punctuation">,</span> SKM<span class="token punctuation">.</span>BINS_SUFFIX<span class="token punctuation">)</span> SKM<span class="token punctuation">.</span>set_layer_data<span class="token punctuation">(</span>adata<span class="token punctuation">,</span> out_layer<span class="token punctuation">,</span> bins<span class="token punctuation">)</span><span aria-hidden="true" 
class="line-numbers-rows"><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span></span></code></pre><h2 id="Function-document"><a href="#Function-document" class="headerlink" title="Function document"></a>Function document</h2><p>The tissue is segmented into UMI density bins according to the following procedure.</p><ol><li>The UMI matrix is binned according to <code>binsize</code> (recommended >= 20). </li><li>The binned UMI matrix (from the previous step) is Gaussian blurred with kernel size <code>k</code>. Note that <code>k</code> is in terms of bins, not pixels.</li><li>The elements of the blurred, binned UMI matrix are hierarchically clustered with Ward linkage, distance threshold <code>distance_threshold</code>, and spatial constraints (immediate neighbors). This yields pixel density bins (a.k.a. labels) with the same shape as the binned matrix.</li><li>Each density bin is dilated with kernel size <code>dk</code>, starting from the bin with the smallest mean UMI (a.k.a. least dense) and going to the bin with the largest mean UMI (a.k.a. most dense). This is done in an effort to mitigate RNA diffusion and “choppy” borders in subsequent steps.</li><li>If <code>background</code> is not provided, the density bin that is most common in the perimeter of the matrix is selected to be background, and thus its label is changed to take a value of 0. A pixel can be manually selected to be background by providing a <code>(x, y)</code> tuple instead. 
This feature can be turned off by providing <code>False</code>.</li><li>The density bin matrix is resized to be the same size as the original UMI matrix.</li></ol>]]></content>
<tags>
<tag> spatial transcriptomics </tag>
</tags>
</entry>
<entry>
<title>Using zsh as your shell</title>
<link href="/2022/09/16/using-zsh-as-your-shell/"/>
<url>/2022/09/16/using-zsh-as-your-shell/</url>
<content type="html"><![CDATA[<pre class="line-numbers language-bash" data-language="bash"><code class="language-bash"><span class="token comment"># Install zsh</span><span class="token comment"># Download the zsh source code</span><span class="token comment"># Get the latest release from http://www.zsh.org/pub/zsh.tar.gz, extract it, and enter the source directory.</span><span class="token comment"># Configure the build and install options</span><span class="token comment"># Here we mainly set the install prefix so zsh is installed under the user's home directory</span>./configure <span class="token parameter variable">--prefix</span><span class="token operator">=</span><span class="token environment constant">$HOME</span>/<span class="token comment"># Build and install</span><span class="token function">make</span> <span class="token operator">&&</span> <span class="token function">make</span> <span class="token function">install</span><span class="token comment"># By default zsh is installed into $HOME/bin.</span><span class="token comment"># Make zsh the default shell</span><span class="token comment"># Add the following lines to .bash_profile or .bashrc in your home directory:</span><span class="token builtin class-name">export</span> <span class="token assign-left variable"><span class="token environment constant">PATH</span></span><span class="token operator">=</span><span class="token environment constant">$PATH</span><span class="token builtin class-name">:</span><span class="token environment constant">$HOME</span>/bin <span class="token comment"># add to PATH</span><span class="token builtin class-name">export</span> <span class="token assign-left variable"><span class="token environment constant">SHELL</span></span><span class="token operator">=</span><span class="token variable"><span class="token variable">`</span><span class="token function">which</span> <span class="token function">zsh</span><span class="token variable">`</span></span> <span class="token comment"># set $SHELL to zsh</span><span class="token builtin class-name">exec</span> <span class="token variable"><span class="token variable">`</span><span class="token function">which</span> <span class="token function">zsh</span><span class="token variable">`</span></span> <span class="token parameter variable">-l</span> <span class="token comment"># start a login zsh</span><span class="token builtin class-name">source</span> ~/.bashrc<span aria-hidden="true" class="line-numbers-rows"><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span></span></code></pre><h2 id="oh-my-zsh"><a href="#oh-my-zsh" class="headerlink" title="oh-my-zsh"></a>oh-my-zsh</h2><p>Install oh-my-zsh, the most popular plugin framework; it provides many plugins that make the shell much easier to use.<br></p><pre class="line-numbers language-none"><code class="language-none">sh -c "$(wget https://raw.github.com/robbyrussell/oh-my-zsh/master/tools/install.sh -O -)"<span aria-hidden="true" class="line-numbers-rows"><span></span></span></code></pre><p>Plugins can be installed into <code>~/.oh-my-zsh/custom/plugins</code> from their GitHub repositories, and you can pick your favorites. Themes are installed into the directory <code>~/.oh-my-zsh/themes</code>; activate one by setting <code>ZSH_THEME="theme/name"</code> in your .zshrc.</p><h2 id="Auto-suggestions"><a href="#Auto-suggestions" class="headerlink" title="Auto-suggestions"></a>Auto-suggestions</h2><p>zsh-autosuggestions is a useful plugin that suggests commands from your history as you type. You can install it with <code>git clone https://github.com/zsh-users/zsh-autosuggestions ~/.zsh/zsh-autosuggestions</code>, add <code>zsh-autosuggestions</code> to the <code>plugins=(...)</code> list in your .zshrc, and reload with <code>source ~/.zshrc</code>.</p><pre class="line-numbers language-bash" data-language="bash"><code class="language-bash"><span class="token function">git</span> clone https://github.com/zsh-users/zsh-autosuggestions <span class="token variable">${ZSH_CUSTOM<span class="token operator">:-</span>~<span class="token operator">/</span>.oh-my-zsh<span class="token operator">/</span>custom}</span>/plugins/zsh-autosuggestions<span aria-hidden="true" class="line-numbers-rows"><span></span></span></code></pre>]]></content>
<categories>
<category> linux </category>
</categories>
<tags>
<tag> tips </tag>
</tags>
</entry>
<entry>
<title>Tree based VS Deep learning</title>
<link href="/2022/09/14/tree-based-vs-deep-learning/"/>
<url>/2022/09/14/tree-based-vs-deep-learning/</url>
<content type="html"><![CDATA[<h2 id="Why-tree-based-models-beat-deep-learning-based-on-tabular-data"><a href="#Why-tree-based-models-beat-deep-learning-based-on-tabular-data" class="headerlink" title="Why tree-based models beat deep learning on tabular data"></a>Why tree-based models beat deep learning on tabular data</h2><p><a href="https://medium.com/geekculture/why-tree-based-models-beat-deep-learning-on-tabular-data-fcad692b1456">Why Tree-Based Models Beat Deep Learning on Tabular Data | by Devansh- Machine Learning Made Simple | Geek Culture | Aug, 2022 | Medium</a></p><p>The author argues that random forests are very good in situations with missing data, whereas the paper he evaluated handles missing data by simply removing it column by column. He also notes that he prefers to avoid heavy preprocessing before data analysis. </p><h3 id="Why-do-tree-based-methods-beat-deep-learning"><a href="#Why-do-tree-based-methods-beat-deep-learning" class="headerlink" title="Why do tree-based methods beat deep learning?"></a>Why do tree-based methods beat deep learning?</h3><ol><li>NNs are biased toward overly smooth solutions. Because neural nets are trained by gradient descent, their decision boundaries tend to be smoothed out, while a random forest can learn irregular boundaries that make more precise decisions. </li><li>Uninformative features hurt MLP-like NNs more. Decision trees use entropy and information gain to decide which split paths to follow, so uninformative features are naturally ignored.</li><li>NNs are invariant to rotation, but actual data is not. Under random rotations of the features, NNs maintain their original performance, while all other learners lose quite a bit of performance.</li></ol>]]></content>
</entry>
<entry>
<title>DBSCAN algorithm</title>
<link href="/2022/09/14/dbscan-algorithm/"/>
<url>/2022/09/14/dbscan-algorithm/</url>
<content type="html"><![CDATA[<h1 id="Spatial-Clustering"><a href="#Spatial-Clustering" class="headerlink" title="Spatial Clustering"></a>Spatial Clustering</h1><p><a href="https://02522-cua.github.io/lecturenotes/spatial-clustering.html">Chapter 9 Spatial clustering | 02.522: Urban Data & Methods II: Computational Urban Analysis (02522-cua.github.io)</a></p><p>Spatial clustering refers to clustering methods that group data based on spatial information, including density, actual location, relative paths, etc. </p><h2 id="DBSCAN"><a href="#DBSCAN" class="headerlink" title="DBSCAN"></a>DBSCAN</h2><p>Density-based spatial clustering of applications with noise (DBSCAN) is a spatial clustering algorithm based on the density of data points. The following link gives a visual sense of how the algorithm proceeds. I recommend trying the smiley-face example to see its advantages and the density-bars example to see its drawbacks. </p><p><a href="https://www.naftaliharris.com/blog/visualizing-dbscan-clustering/">Visualizing DBSCAN Clustering (naftaliharris.com)</a></p><p>The algorithm has two important parameters: epsilon and minPoints. If you have watched the visualization, you will know that epsilon is the radius of the search circle and minPoints is the minimum number of points required to form one cluster. </p><p>The algorithm works like this: 1. Randomly select a point, search for its neighbors within the radius, and propagate the process to the neighbors' neighbors until no further points fall within the circle. 2. Select a point that has not yet been clustered and repeat the first step, until all points have been visited. 
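The two steps above can be tried with scikit-learn's `DBSCAN` (assuming scikit-learn and NumPy are installed; the toy points below are made up for illustration):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two dense groups of points plus one far-away outlier.
pts = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],
    [20.0, 20.0],
])

# eps is the radius of the search circle; min_samples is the minimum
# number of points (including the point itself) to form a dense region.
db = DBSCAN(eps=0.5, min_samples=3).fit(pts)
print(db.labels_)  # the two groups get labels 0 and 1; the outlier gets -1
```

Points that belong to no dense region are labelled -1, which is how DBSCAN reports noise.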
</p><h2 id="Evaluation-clustering-performance"><a href="#Evaluation-clustering-performance" class="headerlink" title="Evaluation clustering performance"></a>Evaluation clustering performance</h2><p><strong>Silhouette Coefficient</strong></p><blockquote><p>The silhouette value is a measure of how similar an object is to its own cluster (cohesion) compared to other clusters (separation). The silhouette ranges from −1 to +1, where a high value indicates that the object is well matched to its own cluster and poorly matched to neighboring clusters. If most objects have a high value, then the clustering configuration is appropriate. If many points have a low or negative value, then the clustering configuration may have too many or too few clusters. — Wikipedia</p></blockquote><p><img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/71ae733cc90f36f4a6352d347dc35e4bb4b577eb" alt="mean distance of a(i) for other points in cluster"></p><p><img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/ec0c9fa41baa11de15a47da36e01d8334c0a291d" alt="least mean distance of point i for each point in other cluster"></p><p><img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/3d80ab22fb291b347b2d9dc3cc7cd614f6b15479" alt="Silhouette value definition"></p><p><img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/ab5579a6c7150579af8a0d432b6630ba529376f0" alt="Also written as"></p><p>The formulas above define a(i), b(i), and the silhouette value s(i). </p><p>As a(i) is a measure of how dissimilar i is to its own cluster, a small value means it is well matched. Furthermore, a large b(i) implies that i is badly matched to its neighbouring cluster. Thus an s(i) close to 1 means that the data is appropriately clustered. If s(i) is close to -1, then by the same logic we see that i would be more appropriate if it was clustered in its neighbouring cluster. 
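The definitions of a(i), b(i) and s(i) can be checked numerically; here is a small sketch that computes s(i) for one point by hand and compares it with scikit-learn's `silhouette_samples` (assuming scikit-learn is installed; the toy data is made up):

```python
import numpy as np
from sklearn.metrics import silhouette_samples

# Two small clusters on a line: {0, 1} and {9, 10}.
X = np.array([[0.0], [1.0], [9.0], [10.0]])
labels = np.array([0, 0, 1, 1])

# a(0): mean distance from point 0 to the other members of its own cluster.
a0 = 1.0
# b(0): mean distance from point 0 to the members of the nearest other cluster.
b0 = (9.0 + 10.0) / 2
# s(0) = (b - a) / max(a, b)
s0 = (b0 - a0) / max(a0, b0)

print(s0, silhouette_samples(X, labels)[0])  # both are about 0.895
```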
An s(i) near zero means that the datum is on the border of two natural clusters.</p><h2 id="Sklearn-metrics"><a href="#Sklearn-metrics" class="headerlink" title="Sklearn.metrics"></a>Sklearn.metrics</h2><p>The <a href="https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics"><code>sklearn.metrics</code></a> module includes score functions, performance metrics, pairwise metrics, and distance computations. The usage documentation is here:</p><p><a href="https://scikit-learn.org/stable/modules/model_evaluation.html#model-evaluation">3.3. Metrics and scoring: quantifying the quality of predictions — scikit-learn 1.1.2 documentation</a></p><p><a href="https://scikit-learn.org/stable/modules/metrics.html#metrics">6.8. Pairwise metrics, Affinities and Kernels — scikit-learn 1.1.2 documentation</a></p>]]></content>
<categories>
<category> algorithm </category>
</categories>
<tags>
<tag> spatial transciptomics </tag>
</tags>
</entry>
<entry>
<title>Using conda to help install R packages</title>
<link href="/2022/09/13/using-conda-to-help-install-r-packages/"/>
<url>/2022/09/13/using-conda-to-help-install-r-packages/</url>
<content type="html"><![CDATA[<h1 id="Conda"><a href="#Conda" class="headerlink" title="Conda"></a>Conda</h1><p>Conda is a tool that helps you manage your programming environments. However, if you are not familiar with it, you can easily make a mess of your environments. Here are some tips and advice to help you use conda.</p><h2 id="conda-usage"><a href="#conda-usage" class="headerlink" title="conda usage"></a>conda usage</h2><p><span class="github-emoji"><span>🙌</span><img src="https://github.githubassets.com/images/icons/emoji/unicode/1f64c.png?v8" aria-hidden="true" onerror="this.parent.classList.add('github-emoji-fallback')"></span></p><p>For most beginners, creating a new, named environment first is recommended. Keep in mind to use <code>conda activate [env-name]</code> to activate it.</p><pre class="line-numbers language-{bash}" data-language="{bash}"><code class="language-{bash}">conda create -n my-env python=x.x
conda activate [env-name]
conda install -c r r-base # install the R packages that are not easily installed through conda<span aria-hidden="true" class="line-numbers-rows"><span></span><span></span><span></span></span></code></pre><h3 id="Using-jupyter-kernel"><a href="#Using-jupyter-kernel" class="headerlink" title="Using jupyter kernel"></a>Using jupyter kernel</h3><p>After creating a new conda environment, you can register Jupyter kernels that use the packages in this environment: install ipykernel (for Python) and IRkernel (for R) through<br></p><pre class="line-numbers language-bash" data-language="bash"><code class="language-bash">python <span class="token parameter variable">-m</span> ipykernel <span class="token function">install</span> <span class="token parameter variable">--user</span> <span class="token parameter variable">--name</span> <span class="token variable">${name<span class="token operator">/</span>of<span class="token operator">/</span>your<span class="token 
operator">/</span>kernel}</span>
<span class="token comment"># install the python kernel in conda env</span>
<span class="token comment"># use --display-name to set the name of the kernel</span>
<span class="token comment"># i.e.</span>
python <span class="token parameter variable">-m</span> ipykernel <span class="token function">install</span> <span class="token parameter variable">--user</span> <span class="token parameter variable">--name</span> pytorch --display-name "pytorch"<span aria-hidden="true" class="line-numbers-rows"><span></span><span></span><span></span><span></span><span></span></span></code></pre><br>and in R<br><pre class="line-numbers language-R" data-language="R"><code class="language-R">install.packages("IRkernel")
IRkernel::install_spec()<span aria-hidden="true" class="line-numbers-rows"><span></span><span></span></span></code></pre><p>Note that <code>conda install</code> installs packages into whichever environment is currently active.</p><h2 id="conda-install-r-packages"><a href="#conda-install-r-packages" class="headerlink" title="conda install r-packages"></a>conda install r-packages</h2><p>If a package fails to install from CRAN through <code>install.packages()</code> in R, and the warning log says some system library is missing, you can fix that easily if you have <code>sudo</code> rights; however, installing the missing dependency is not easy if you are not root. So I encourage you to use a package manager such as conda to achieve that.</p><p>When you encounter trouble during installation, you can first google <code>conda install $packages_name</code>. Usually, the conda r channel has the package in its repository, and you can easily install it through <code>conda install</code>. If there are more problems, you may have to look into building it from source, but some development dependencies can also be installed through conda. 
So conda can save your life if you are not a root user.</p><p>When you encounter other problems, try installing the missing support packages in the same way.</p><h2 id="conda-channels"><a href="#conda-channels" class="headerlink" title="conda channels"></a>conda channels</h2><pre class="line-numbers language-bash" data-language="bash"><code class="language-bash"><span class="token comment"># unable to execute 'x86_64-conda_cos6-linux-gnu-gcc': No such file or directory </span>
conda <span class="token function">install</span> gxx_linux-64
conda config <span class="token parameter variable">--add</span> channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config <span class="token parameter variable">--add</span> channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
<span class="token comment"># The two lines above are mirrors of the official Anaconda repositories</span>
conda config <span class="token parameter variable">--add</span> channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/
<span class="token comment"># The line above is a mirror of the third-party Conda Forge channel</span>
conda config <span class="token parameter variable">--set</span> show_channel_urls <span class="token function">yes</span><span aria-hidden="true" class="line-numbers-rows"><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span></span></code></pre><h1 id="Storage-control"><a href="#Storage-control" class="headerlink" title="Storage control"></a>Storage control</h1><p>After using conda for a long time, you may have installed too many packages in the default directory. In case you run out of disk space, you can move your <code>.conda</code> directory to somewhere with enough space. 
</p><pre class="line-numbers language-bash" data-language="bash"><code class="language-bash"><span class="token comment"># Get the size of the current directory</span>
<span class="token function">du</span> <span class="token parameter variable">-sh</span>
<span class="token comment"># Check the conda directory</span>
/DATA/User/<span class="token punctuation">{</span>user<span class="token punctuation">}</span>/.conda
<span class="token comment"># move the directory to another directory and symlink it back</span>
<span class="token function">mv</span> /DATA/User/<span class="token punctuation">{</span>user<span class="token punctuation">}</span>/.conda /where/has/enough/dist
<span class="token function">ln</span> <span class="token parameter variable">-s</span> /where/has/enough/dist /DATA/User/<span class="token punctuation">{</span>user<span class="token punctuation">}</span>/.conda<span aria-hidden="true" class="line-numbers-rows"><span></span><span></span><span></span><span></span><span></span><span></span><span></span></span></code></pre><p>Then you will not have to change any conda settings.</p><h1 id="Some-other-problems-you-can-solve-using-conda"><a href="#Some-other-problems-you-can-solve-using-conda" class="headerlink" title="Some other problems you can solve using conda"></a>Some other problems you can solve using conda</h1><h2 id="install-spateo"><a href="#install-spateo" class="headerlink" title="install spateo"></a>install spateo</h2><p><code>fbgbp</code> is a CPython library for belief propagation, and it can be very hard to install due to some g++ problems. 
You may need to upgrade your g++ version and then add a soft link to the new g++ in the lib64 directory on your machine or server.</p><pre class="line-numbers language-bash" data-language="bash"><code class="language-bash">conda <span class="token function">install</span> <span class="token parameter variable">-c</span> conda-forge gcc libgcc
conda <span class="token function">install</span> <span class="token parameter variable">-c</span> conda-forge gxx_linux-64
<span class="token builtin class-name">cd</span> /<span class="token punctuation">..</span>./anaconda3/envs/<span class="token punctuation">..</span>./bin
<span class="token function">ln</span> <span class="token parameter variable">-s</span> /<span class="token punctuation">..</span>./anaconda3/envs/<span class="token punctuation">..</span>./bin/x86_64-conda_cos6-linux-gnu-g++ g++<span aria-hidden="true" class="line-numbers-rows"><span></span><span></span><span></span><span></span><span></span><span></span></span></code></pre>]]></content>
<categories>
<category> conda </category>
</categories>
<tags>
<tag> tips </tag>
<tag> conda </tag>
</tags>
</entry>
<entry>
<title>Test image asset</title>
<link href="/2022/09/13/test-image-asset/"/>
<url>/2022/09/13/test-image-asset/</url>
<content type="html"><![CDATA[<h3 id="Hexo-figure-problem"><a href="#Hexo-figure-problem" class="headerlink" title="Hexo figure problem"></a>Hexo figure problem</h3><p>I am trying to use Typora, github.io and Hexo to build my blog. However, I encountered an image import problem. </p><p>The problem is caused by the different image loading rules of Hexo and Typora. After searching for solutions on Google, I found the following approach resolves it. </p><p>The first step is to change the <code>_config.yml</code> file</p><pre class="line-numbers language-yaml" data-language="yaml"><code class="language-yaml"><span class="token key atrule">post_asset_folder</span><span class="token punctuation">:</span> <span class="token boolean important">true</span>
<span class="token key atrule">marked</span><span class="token punctuation">:</span>
  <span class="token key atrule">prependRoot</span><span class="token punctuation">:</span> <span class="token boolean important">true</span>
  <span class="token key atrule">postAsset</span><span class="token punctuation">:</span> <span class="token boolean important">true</span><span aria-hidden="true" class="line-numbers-rows"><span></span><span></span><span></span><span></span></span></code></pre><p>The second step is to install a plugin for image loading</p><p><code>npm install hexo-renderer-marked --save</code></p><p>Now, a newly created post in Hexo generates a <code>$filename</code> folder automatically, and an uploaded image will be resolved automatically to <code>your\storage\post\$filename_fold\${*.jpg}</code>, so the filepath you normally see is <code>filepath\*.jpg</code>. However, this filepath still cannot be seen in Typora.</p><p>Then, set the Typora 
<code>preference</code> of <code>image</code> to <code>Copy image to custom folder</code>, which is <code>./$(unknown)</code> , <code>Apply above rules to local images</code>, <code>Use relative path if possible</code> and <code>add ./ to relative path</code></p><p><img src="/2022/09/13/test-image-asset/image-20220913172831483.png" alt="screen shot" style="zoom:50%;"></p><p>The last step is to add code in <code>node_modules\hexo-renderer-marked\lib\renderer.js</code>. Find the image renderer and add this code: </p><pre class="line-numbers language-javascript" data-language="javascript"><code class="language-javascript"><span class="token comment">// Prepend root to image path</span>
<span class="token function">image</span><span class="token punctuation">(</span><span class="token parameter">href<span class="token punctuation">,</span> title<span class="token punctuation">,</span> text</span><span class="token punctuation">)</span> <span class="token punctuation">{</span>
  <span class="token keyword">const</span> <span class="token punctuation">{</span> hexo<span class="token punctuation">,</span> options <span class="token punctuation">}</span> <span class="token operator">=</span> <span class="token keyword">this</span><span class="token punctuation">;</span>
  <span class="token keyword">if</span> <span class="token punctuation">(</span>href<span class="token punctuation">.</span><span class="token function">indexOf</span><span class="token punctuation">(</span><span class="token string">'/'</span><span class="token punctuation">)</span><span class="token operator">></span><span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">{</span>
    href <span class="token operator">=</span> href<span class="token punctuation">.</span><span class="token function">split</span><span class="token punctuation">(</span><span class="token string">'/'</span><span class="token 
punctuation">)</span><span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">;</span>
  <span class="token punctuation">}</span>
  <span class="token operator">...</span><span aria-hidden="true" class="line-numbers-rows"><span></span><span></span><span></span><span></span><span></span><span></span><span></span></span></code></pre><p>Now the image is displayed correctly!<br><span class="github-emoji"><span>😙</span><img src="https://github.githubassets.com/images/icons/emoji/unicode/1f619.png?v8" aria-hidden="true" onerror="this.parent.classList.add('github-emoji-fallback')"></span></p><h1 id="Hexo-amp-matery-math-equation-problem"><a href="#Hexo-amp-matery-math-equation-problem" class="headerlink" title="Hexo & matery math equation problem"></a>Hexo & matery math equation problem</h1><p>You might think that once the figure rendering problem is solved, math equations should not be an issue. But I found that the rendering of math equations was not compatible with my theme and configuration. 
So I believed I should try Hexo plugins to solve it.</p><p>The first step is to install the math plugin with <code>npm install hexo-math --save</code>, then configure your <code>_config.yml</code> by adding the following code</p><pre class="line-numbers language-yaml" data-language="yaml"><code class="language-yaml"><span class="token key atrule">math</span><span class="token punctuation">:</span>
  <span class="token key atrule">engine</span><span class="token punctuation">:</span> <span class="token string">'mathjax'</span>
  <span class="token key atrule">mathjax</span><span class="token punctuation">:</span>
    <span class="token key atrule">src</span><span class="token punctuation">:</span> custom_mathjax_source
    <span class="token key atrule">config</span><span class="token punctuation">:</span>
      <span class="token comment"># MathJax config</span><span aria-hidden="true" class="line-numbers-rows"><span></span><span></span><span></span><span></span><span></span><span></span></span></code></pre><p>And then open the file <code>your/theme/path/_config.yml</code> and set </p><pre class="line-numbers language-yaml" data-language="yaml"><code class="language-yaml"><span class="token key atrule">mathjax</span><span class="token punctuation">:</span>
  <span class="token key atrule">enable</span><span class="token punctuation">:</span> <span class="token boolean important">true</span>
  <span class="token key atrule">per_page</span><span class="token punctuation">:</span> <span class="token boolean important">false</span>
  <span class="token key atrule">cdn</span><span class="token punctuation">:</span> //cdn.mathjax.org/mathjax/latest/MathJax.js<span class="token punctuation">?</span>config=TeX<span class="token punctuation">-</span>AMS<span class="token punctuation">-</span>MML_HTMLorMML<span aria-hidden="true" class="line-numbers-rows"><span></span><span></span><span></span><span></span></span></code></pre><p>Change the default rendering engine of Hexo, because the 
<code>hexo-renderer-marked</code> would render <code>_</code> between <code>$$ $$</code> as <code>&lt;i&gt;</code> in <code>HTML</code>. So: </p><pre class="line-numbers language-bash" data-language="bash"><code class="language-bash"><span class="token function">npm</span> uninstall hexo-renderer-marked <span class="token parameter variable">--save</span>
<span class="token function">npm</span> <span class="token function">install</span> hexo-renderer-kramed <span class="token parameter variable">--save</span><span aria-hidden="true" class="line-numbers-rows"><span></span><span></span></span></code></pre><p>After that, follow the instructions in the README.md of hexo-renderer-kramed and add the following code in <code>_config.yml</code>. </p><pre class="line-numbers language-yaml" data-language="yaml"><code class="language-yaml"><span class="token key atrule">kramed</span><span class="token punctuation">:</span>
  <span class="token key atrule">gfm</span><span class="token punctuation">:</span> <span class="token boolean important">true</span>
  <span class="token key atrule">pedantic</span><span class="token punctuation">:</span> <span class="token boolean important">false</span>
  <span class="token key atrule">sanitize</span><span class="token punctuation">:</span> <span class="token boolean important">false</span>
  <span class="token key atrule">tables</span><span class="token punctuation">:</span> <span class="token boolean important">true</span>
  <span class="token key atrule">breaks</span><span class="token punctuation">:</span> <span class="token boolean important">true</span>
  <span class="token key atrule">smartLists</span><span class="token punctuation">:</span> <span class="token boolean important">true</span>
  <span class="token key atrule">smartypants</span><span class="token punctuation">:</span> <span class="token boolean important">true</span><span aria-hidden="true" 
class="line-numbers-rows"><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span></span></code></pre><p>Now, in your blog, the <code>\$\$ something inside \$\$</code> will render normally as your math equation.</p>]]></content>
<categories>
<category> Hexo </category>
</categories>
<tags>
<tag> blog </tag>
</tags>
</entry>
<entry>
<title>Louvain and leiden algorithm</title>
<link href="/2022/09/13/louvain-and-leiden-algorithm/"/>
<url>/2022/09/13/louvain-and-leiden-algorithm/</url>
<content type="html"><![CDATA[<h1 id="Clustering-algorithm"><a href="#Clustering-algorithm" class="headerlink" title="Clustering algorithm"></a>Clustering algorithm</h1><p>For clustering in scanpy, there are four default settings. </p><ol><li>Original Louvain</li><li>Louvain with multilevel refinement</li><li>SLM</li><li>Leiden algorithm</li></ol><h2 id="Louvain-and-leiden"><a href="#Louvain-and-leiden" class="headerlink" title="Louvain and leiden"></a>Louvain and leiden</h2><p><a href="https://www.nature.com/articles/s41598-019-41695-z">From Louvain to Leiden: guaranteeing well-connected communities - Scientific Reports</a></p><p><a href="https://timoast.github.io/blog/community-detection/">Community detection - Tim Stuart</a></p><p><a href="https://cran.r-project.org/web/packages/leiden/vignettes/run_leiden.html">Clustering with the Leiden Algorithm in R (r-project.org)</a></p><p>Community detection is often used to understand the structure of large and complex networks.</p><p><img src="/2022/09/13/louvain-and-leiden-algorithm/pasted-0.png" alt="Modularity"></p><p>The Constant Potts Model (CPM) overcomes some limitations of modularity:</p><p><img src="/2022/09/13/louvain-and-leiden-algorithm/pasted-1.png" alt="CPM"></p><blockquote><p>The new algorithm integrates several earlier improvements, incorporating a combination of smart local move<a href="https://www.nature.com/articles/s41598-019-41695-z#ref-CR15">15</a>, fast local move<a href="https://www.nature.com/articles/s41598-019-41695-z#ref-CR16">16</a>,<a href="https://www.nature.com/articles/s41598-019-41695-z#ref-CR17">17</a>and random neighbour move<a href="https://www.nature.com/articles/s41598-019-41695-z#ref-CR18">18</a>.</p></blockquote><h3 id="Modularity-python-code-practice"><a href="#Modularity-python-code-practice" class="headerlink" title="Modularity python code practice"></a>Modularity python code practice</h3><p><img 
src="/2022/09/13/louvain-and-leiden-algorithm/pasted-2.png" alt="Equation of modularity"></p><pre class="line-numbers language-python" data-language="python"><code class="language-python"><span class="token keyword">import</span> numpy <span class="token keyword">as</span> npdata <span class="token operator">=</span> np<span class="token punctuation">.</span>matrix<span class="token punctuation">(</span><span class="token punctuation">[</span><span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">0</span><span class="token punctuation">,</span><span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">,</span><span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">,</span><span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">0</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">]</span><span class="token punctuation">)</span>label <span class="token operator">=</span> np<span class="token punctuation">.</span>matrix<span class="token punctuation">(</span><span class="token punctuation">[</span><span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">,</span><span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">0</span><span class="token punctuation">]</span><span 
class="token punctuation">,</span><span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">]</span><span class="token punctuation">)</span><span class="token comment"># calculate the connectivity of the matrix</span>m <span class="token operator">=</span> np<span class="token punctuation">.</span><span class="token builtin">sum</span><span class="token punctuation">(</span>np<span class="token punctuation">.</span><span class="token builtin">sum</span><span class="token punctuation">(</span>data<span class="token punctuation">,</span> axis<span class="token operator">=</span><span class="token number">0</span><span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token operator">/</span><span class="token number">2</span><span class="token keyword">print</span><span class="token punctuation">(</span><span class="token string-interpolation"><span class="token string">f"The network has totally %d connections"</span></span> <span class="token operator">%</span> m<span class="token punctuation">)</span><span class="token comment">## The network has totally 2 connections ##</span><span class="token comment"># calculate the degree of each node</span>k <span class="token operator">=</span> np<span class="token punctuation">.</span><span class="token builtin">sum</span><span class="token punctuation">(</span>data<span class="token punctuation">,</span> axis<span class="token operator">=</span><span class="token number">1</span><span class="token punctuation">)</span><span class="token keyword">print</span><span class="token punctuation">(</span><span class="token string-interpolation"><span class="token string">f"The degree of each node is </span><span class="token interpolation"><span class="token punctuation">{</span>k<span class="token punctuation">}</span></span><span 
class="token string">"</span></span><span class="token punctuation">)</span><span class="token comment">## The degree of each node is [[1]</span><span class="token comment">## [2]</span><span class="token comment">## [2]]</span><span class="token comment"># calculate the modularity matrix</span>b <span class="token operator">=</span> data <span class="token operator">-</span> np<span class="token punctuation">.</span>multiply<span class="token punctuation">(</span>np<span class="token punctuation">.</span>tile<span class="token punctuation">(</span>k<span class="token punctuation">,</span> <span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">3</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">,</span> np<span class="token punctuation">.</span>tile<span class="token punctuation">(</span>k<span class="token punctuation">.</span>T<span class="token punctuation">,</span> <span class="token punctuation">(</span><span class="token number">3</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token operator">/</span> <span class="token punctuation">(</span><span class="token number">2</span><span class="token operator">*</span>m<span class="token punctuation">)</span><span class="token keyword">print</span><span class="token punctuation">(</span><span class="token string-interpolation"><span class="token string">f"The modularity matrix is </span><span class="token interpolation"><span class="token punctuation">{</span>b<span class="token punctuation">}</span></span><span class="token string">"</span></span><span class="token punctuation">)</span><span class="token comment">## The modularity matrix is [[ 0.8 -0.4 -0.4]</span><span class="token comment">## 
[-0.4 0.2 0.2]</span><span class="token comment">## [ 0.6 -0.8 0.2]</span><span class="token comment"># calculate the modularity</span>q <span class="token operator">=</span> <span class="token number">1</span><span class="token operator">/</span><span class="token punctuation">(</span><span class="token number">2</span><span class="token operator">*</span>m<span class="token punctuation">)</span> <span class="token operator">*</span> np<span class="token punctuation">.</span>trace<span class="token punctuation">(</span>label<span class="token punctuation">.</span>T<span class="token operator">*</span>b<span class="token operator">*</span>label<span class="token punctuation">)</span><span class="token keyword">print</span><span class="token punctuation">(</span><span class="token string-interpolation"><span class="token string">f"The modularity is </span><span class="token interpolation"><span class="token punctuation">{</span>q<span class="token punctuation">}</span></span><span class="token string">"</span></span><span class="token punctuation">)</span><span class="token comment">## The modularity is 0.27999999999999997</span><span aria-hidden="true" class="line-numbers-rows"><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span></span></code></pre><p>The value of modularity lies between -1/2 and 1: positive values indicate that communities have more internal edges than expected at random, while assigning all nodes to a single community gives a modularity of 0. 
Values between roughly 0.3 and 0.7 usually indicate a good community structure.</p><h3 id="Louvain-Algorithm-process"><a href="#Louvain-Algorithm-process" class="headerlink" title="Louvain Algorithm process"></a>Louvain Algorithm process</h3><p><img src="/2022/09/13/louvain-and-leiden-algorithm/pasted-3.png" alt="Equation of the modularity gain"></p><p>Two stages:</p><ol><li>Allocate each node to its own community and calculate the current modularity. For each node i, tentatively remove it from its own community, place it in the community of a neighboring node j, and calculate the resulting modularity. The difference between the two values is the modularity gain of the move. After looping over all neighboring communities, move node i to the community with the highest modularity gain. The figure shows the equation of the modularity gain.</li><li>Aggregate the network produced in the first stage to reconstruct a new, smaller network.</li></ol><blockquote><p>In this stage we essentially collapse communities down into a single representative node, creating a new simplified graph. To do this we just sum all the edge weights between nodes of the corresponding communities to get a single weighted edge between them, and collapse each community down to a single new node. Once aggregation is complete we restart the local moving phase, and continue to iterate until everything converges down to one node. 
This aspect of the Louvain algorithm can be used to give information about the hierarchical relationships between communities by tracking at which stage the nodes in the communities were aggregated.</p></blockquote><h3 id="Limitations-and-Improvements-on-Louvain"><a href="#Limitations-and-Improvements-on-Louvain" class="headerlink" title="Limitations and Improvements on Louvain"></a>Limitations and Improvements on Louvain</h3><p>Modularity suffers from a well-known problem called the resolution limit: there is a minimal community size below which communities cannot be resolved by optimizing modularity. Any community smaller than this minimal size cannot be identified, no matter how the optimization proceeds. </p><p>The Constant Potts Model (CPM) is an alternative objective function for community detection. CPM tries to maximize the number of internal edges in a community while keeping the community size small, with a constant resolution parameter balancing the two goals. CPM splits a group into two communities when the link density between them is lower than the constant, so the constant acts as a resolution parameter: a higher constant results in more, smaller communities. </p><p>Smart Local Move (SLM) starts from the observation that the original Louvain algorithm has difficulty splitting communities once they have been merged, even when splitting would increase the total modularity. SLM therefore adds a step after local movement: each community is treated as a sub-network and local movement is re-applied within it, and any sub-networks found in this step can be treated as separate communities in the aggregation step. </p><p>Random moving means choosing a random neighboring node at each move instead of iterating over all of them. The rationale is that, most of the time, a neighbor's community already yields a modularity gain; randomness also makes the algorithm more explorative, which helps it detect better community structures. 
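The greedy local-moving stage described above can be sketched in plain Python with NumPy. This is a minimal, unoptimized illustration with my own function names, not a full Louvain implementation: it recomputes the whole modularity for every candidate move instead of using the incremental gain formula, and it omits the aggregation stage.

```python
import numpy as np

def modularity(adj, labels):
    """Newman modularity Q = (1/2m) * sum_ij (A_ij - k_i*k_j/(2m)) * delta(c_i, c_j)."""
    k = adj.sum(axis=1)
    m = k.sum() / 2.0
    modmat = adj - np.outer(k, k) / (2.0 * m)   # modularity matrix B
    same = labels[:, None] == labels[None, :]   # delta(c_i, c_j)
    return modmat[same].sum() / (2.0 * m)

def local_moving(adj, max_sweeps=20):
    """Stage 1 of Louvain: start from singleton communities and greedily
    move each node to the neighboring community with the largest gain."""
    n = adj.shape[0]
    labels = np.arange(n)
    for _ in range(max_sweeps):
        moved = False
        for i in range(n):
            old = labels[i]
            best_q, best_c = modularity(adj, labels), old
            for c in np.unique(labels[adj[i] > 0]):   # neighboring communities
                labels[i] = c
                q = modularity(adj, labels)
                if q > best_q:
                    best_q, best_c = q, c
            labels[i] = best_c
            moved = moved or best_c != old
        if not moved:   # converged: no node wants to move
            break
    return labels

# two triangles joined by a single edge: local moving finds the two triangles
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = np.zeros((6, 6))
for u, v in edges:
    adj[u, v] = adj[v, u] = 1
print(local_moving(adj))
```

On this toy graph the sweep converges after a few passes, grouping nodes 0-2 and 3-5 into two communities.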
</p><p>Louvain pruning keeps track of a list of nodes that have the potential to change community, which greatly reduces the running time of the first stage.</p><p><img src="/2022/09/13/louvain-and-leiden-algorithm/pasted-5.png" alt="Disconnected collapse"></p><h3 id="Leiden-Algorithm-process"><a href="#Leiden-Algorithm-process" class="headerlink" title="Leiden Algorithm process"></a>Leiden Algorithm process</h3><blockquote><p>The Leiden algorithm consists of three phases: (1) local moving of nodes, (2) refinement of the partition and (3) aggregation of the network based on the refined partition, using the non-refined partition to create an initial partition for the aggregate network.</p></blockquote><p>The refinement step allows badly connected communities to be split before creating the aggregate network. This is very similar to what the smart local moving algorithm does. As far as I can tell, Leiden seems to essentially be smart local moving with the additional improvements of random moving and Louvain pruning added.</p><p><img src="/2022/09/13/louvain-and-leiden-algorithm/pasted-4.png" alt="Leiden algorithm process"></p>]]></content>
<categories>
<category> algorithm </category>
</categories>
<tags>
<tag> clustering </tag>
</tags>
</entry>
<entry>
<title>Basic usage of git</title>
<link href="/2022/09/13/basic-usage-of-git/"/>
<url>/2022/09/13/basic-usage-of-git/</url>
<content type="html"><![CDATA[<h1 id="Version-Control-System"><a href="#Version-Control-System" class="headerlink" title="Version Control System"></a>Version Control System</h1><h2 id="Some-interesting-history"><a href="#Some-interesting-history" class="headerlink" title="Some interesting history"></a>Some interesting history</h2><p>Linus had long used the <code>diff</code> and <code>patch</code> commands on Linux to do version control for software, and the basic principle behind <code>diff</code> and <code>patch</code> is simple. When <code>a.txt</code> is upgraded to <code>b.txt</code>, running <code>diff a.txt b.txt</code> generates a <code>diff.txt</code> that records which lines are present in <code>a.txt</code> and which are missing. After that, you can delete <code>a.txt</code> and keep <code>b.txt</code> and <code>diff.txt</code>. Conversely, you can regenerate <code>a.txt</code> from just <code>b.txt</code> and <code>diff.txt</code> using <code>patch</code>. These were the basic steps Linus used to keep documents under version control. However, the <code>diff</code> and <code>patch</code> commands cannot handle binary files, so around 2000 the Linux community urgently needed a modern version control system. Linus adopted a commercial version control system (BitKeeper), whose vendor authorized free use for Linux community members. But the relationship eventually broke down. In April 2005, Linus set about developing the Git version control system, finished it just two weeks later, and its performance met expectations.</p><h2 id="Basic-principle-when-using-git"><a href="#Basic-principle-when-using-git" class="headerlink" title="Basic principle when using git"></a>Basic principle when using git</h2><p>One commit for one thing: make a separate commit whenever you finish a new function or fix a bug. 
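The diff/patch round-trip described in the history section can be sketched in a few shell commands (the file names and contents here are just for illustration):

```shell
# make two versions of a file
printf 'hello\nworld\n' > a.txt
printf 'hello\nthere\nworld\n' > b.txt

# record the difference between the two versions
diff a.txt b.txt > diff.txt || true   # diff exits 1 when the files differ

# now a.txt can be dropped: keep only b.txt and diff.txt
rm a.txt

# ...and regenerated later from b.txt and diff.txt with patch
cp b.txt a.txt
patch -R a.txt diff.txt   # -R applies the recorded change in reverse
cat a.txt
```

After the reverse patch, `a.txt` again holds the original two-line content, which is exactly the reconstruction the text describes.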
</p><p>But you can still use <code>git add</code> to add each change to the stage; when you run <code>git commit</code>, everything staged is committed to the repository.</p><p><code>git</code> has a better <code>diff</code> than the Linux <code>diff</code> and <code>patch</code> commands, and it supports binary files. <code>git diff --cached</code> lets you view the changes staged for commit. </p><h1 id="Git的操作"><a href="#Git的操作" class="headerlink" title="Git的操作"></a>Git operations</h1><p>Forked from Ruan Yifeng’s blog</p><h2 id="新建代码库"><a href="#新建代码库" class="headerlink" title="新建代码库"></a>Create a repository</h2><pre class="line-numbers language-none"><code class="language-none"># Initialize a Git repository in the current directory$ git init# Create a new directory and initialize it as a Git repository$ git init [project-name]# Download a project and its entire history$ git clone [url]<span aria-hidden="true" class="line-numbers-rows"><span></span><span></span><span></span><span></span></span></code></pre><h2 id="配置"><a href="#配置" class="headerlink" title="配置"></a>Configuration</h2><p>Git’s configuration file is <code>.gitconfig</code>; it can live in the user’s home directory or in the project directory.</p><pre class="line-numbers language-none"><code class="language-none"># Show the current Git configuration$ git config --list# Edit the Git configuration file$ git config -e [--global]# Set the user info used when committing$ git config [--global] user.name "[name]"$ git config [--global] user.email "[email address]"<span aria-hidden="true" class="line-numbers-rows"><span></span><span></span><span></span><span></span><span></span></span></code></pre><h2 id="增加-删除文件"><a href="#增加-删除文件" class="headerlink" title="增加/删除文件"></a>Add / delete files</h2><pre class="line-numbers language-none"><code class="language-none"># Add the specified files to the staging area$ git add [file1] [file2] ...# Add the specified directory, including subdirectories, to the staging area$ git add [dir]# Add all files in the current directory to the staging area$ git add .# Delete files from the working tree and stage the deletion$ git rm [file1] [file2] ...# Stop tracking the specified file but keep it in the working tree$ git rm --cached [file]# Rename a file and stage the rename$ git mv [file-original] [file-renamed]<span aria-hidden="true" 
class="line-numbers-rows"><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span></span></code></pre><h2 id="代码提交"><a href="#代码提交" class="headerlink" title="代码提交"></a>Commit</h2><pre class="line-numbers language-none"><code class="language-none"># Commit the staging area to the repository$ git commit -m [message]# Commit the specified staged files to the repository$ git commit [file1] [file2] ... -m [message]# Commit all changes in the working tree since the last commit directly to the repository$ git commit -a# Show all diff information when committing$ git commit -v# Replace the previous commit with a new one# If there are no new changes, this rewrites the previous commit message$ git commit --amend -m [message]# Redo the previous commit, including new changes to the specified files$ git commit --amend <file1> <file2> ...<span aria-hidden="true" class="line-numbers-rows"><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span></span></code></pre><h2 id="分支"><a href="#分支" class="headerlink" title="分支"></a>Branches</h2><pre class="line-numbers language-none"><code class="language-none"># List all local branches$ git branch# List all remote branches$ git branch -r# List all local and remote branches$ git branch -a# Create a new branch but stay on the current one$ git branch [branch-name]# Create a new branch and switch to it$ git checkout -b [branch]# Create a new branch pointing at the specified commit$ git branch [branch] [commit]# Create a new branch tracking the specified remote branch$ git branch --track [branch] [remote-branch]# Switch to the specified branch and update the working tree$ git checkout [branch-name]# Set up tracking between an existing branch and the specified remote branch$ git branch --set-upstream [branch] [remote-branch]# Merge the specified branch into the current branch$ git merge [branch]# Pick a commit and merge it into the current branch$ git cherry-pick [commit]# Delete a branch$ git branch -d [branch-name]# Delete a remote branch$ git push origin --delete <branch-name>$ git branch -dr <remote/branch><span aria-hidden="true" 
class="line-numbers-rows"><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span></span></code></pre><h2 id="标签"><a href="#标签" class="headerlink" title="标签"></a>Tags</h2><pre class="line-numbers language-none"><code class="language-none"># List all tags$ git tag# Create a tag at the current commit$ git tag [tag]# Create a tag at the specified commit$ git tag [tag] [commit]# Show tag information$ git show [tag]# Push the specified tag$ git push [remote] [tag]# Push all tags$ git push [remote] --tags# Create a new branch pointing at a tag$ git checkout -b [branch] [tag]<span aria-hidden="true" class="line-numbers-rows"><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span></span></code></pre><h2 id="查看信息"><a href="#查看信息" class="headerlink" title="查看信息"></a>Inspect</h2><pre class="line-numbers language-none"><code class="language-none"># Show changed files$ git status# Show the commit history of the current branch$ git log# Show the commit history and the files changed in each commit$ git log --stat# Show the history of a file, including renames$ git log --follow [file]$ git whatchanged [file]# Show every diff related to the specified file$ git log -p [file]# Show who changed the specified file and when$ git blame [file]# Show the difference between the staging area and the working tree$ git diff# Show the difference between the staging area and the last commit$ git diff --cached [<file>]# Show the difference between the working tree and the latest commit on the current branch$ git diff HEAD# Show the difference between two commits$ git diff [first-branch]...[second-branch]# Show the metadata and content changes of a commit$ git show [commit]# Show the files changed in a commit$ git show --name-only [commit]# Show the content of a file at a given commit$ git show [commit]:[filename]# Show the recent commits of the current branch$ git reflog<span aria-hidden="true" 
class="line-numbers-rows"><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span></span></code></pre><h2 id="远程同步"><a href="#远程同步" class="headerlink" title="远程同步"></a>Remote sync</h2><pre class="line-numbers language-none"><code class="language-none"># Fetch all changes from the remote repository$ git fetch [remote]# List all remote repositories$ git remote -v# Show information about a remote repository$ git remote show [remote]# Add a new remote repository and name it$ git remote add [shortname] [url]# Retrieve remote changes and merge them into the local branch$ git pull [remote] [branch]# Push the specified local branch to the remote repository$ git push [remote] [branch]# Force-push the current branch to the remote repository, even if there are conflicts$ git push [remote] --force# Push all branches to the remote repository$ git push [remote] --all<span aria-hidden="true" class="line-numbers-rows"><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span></span></code></pre><h2 id="撤销"><a href="#撤销" class="headerlink" title="撤销"></a>Undo</h2><pre class="line-numbers language-none"><code class="language-none"># Restore the specified staged file to the working tree$ git checkout [file]# Restore the specified file from a commit to the working tree$ git checkout [commit] [file]# Restore all files from the last commit to the working tree$ git checkout .# Reset the specified staged file to match the last commit, leaving the working tree unchanged$ git reset [file]# Reset the staging area and the working tree to match the last commit$ git reset --hard# Reset the current branch pointer to the specified commit and reset the staging area, leaving the working tree unchanged$ git reset [commit]# Reset the current branch HEAD to the specified commit, and reset the staging area and working tree to match it$ git reset --hard [commit]# Reset HEAD to the specified commit, keeping the staging area and working tree unchanged$ git reset --keep [commit]# Create a new commit that undoes the specified commit# All changes in the latter are reversed by the former and applied to the current branch$ git revert [commit]<span aria-hidden="true" 
class="line-numbers-rows"><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span></span></code></pre><h2 id="其他"><a href="#其他" class="headerlink" title="其他"></a>Miscellaneous</h2><pre class="line-numbers language-none"><code class="language-none"># Generate an archive ready for release# git archive<span aria-hidden="true" class="line-numbers-rows"><span></span><span></span></span></code></pre>]]></content>
<categories>
<category> Git </category>
</categories>
<tags>
<tag> tips </tag>
</tags>
</entry>
<entry>
<title>tmux command</title>
<link href="/2022/09/12/tmux-command/"/>
<url>/2022/09/12/tmux-command/</url>
<content type="html"><![CDATA[<h2 id="tmux"><a href="#tmux" class="headerlink" title="tmux"></a>tmux</h2><p><span class="github-emoji"><span>😆</span><img src="https://github.githubassets.com/images/icons/emoji/unicode/1f606.png?v8" aria-hidden="true" onerror="this.parent.classList.add('github-emoji-fallback')"></span></p><p>ctrl+b ? Show the shortcut help<br>ctrl+b Space Switch to the next built-in layout; with multiple panes this rearranges them, for example stacking all panes vertically<br>ctrl+b ! Break the current pane out into a new window<br>ctrl+b " Split the pane horizontally<br>ctrl+b % Split the pane vertically<br>ctrl+b q Show the pane numbers<br>ctrl+b o Jump to the next pane; used to switch between panes<br>ctrl+b Up/Down Go to the previous or next pane<br>ctrl+b C-arrow Resize the current pane<br>ctrl+b & Close the current window after confirmation<br>ctrl+b [ Enter copy mode, where you can scroll back through the pane’s history<br>ctrl+b c Create a new window<br>ctrl+b n Select the next window<br>ctrl+b l Go to the last used window<br>ctrl+b p Select the previous window<br>ctrl+b w Show and choose windows from a menu<br>ctrl+b s Show and choose sessions from a menu; frequently used to pick which tmux session to enter<br>ctrl+b t Show a clock; press Enter to return to the shell<br>ctrl+b d Detach from the current session; you temporarily return to the shell, and typing tmux attach re-enters the previous session<br>ctrl+b PageUp/PageDown Scroll through the pane to view its contents</p><h2 id="数值计算"><a href="#数值计算" class="headerlink" title="数值计算"></a>Numerical calculation</h2><pre class="line-numbers language-{bash}" data-language="{bash}"><code class="language-{bash}">a=1b=2echo $((a+b))<span aria-hidden="true" class="line-numbers-rows"><span></span><span></span><span></span></span></code></pre>]]></content>
</entry>
<entry>
<title>Hexo admin Usage</title>
<link href="/2022/09/11/hexo-admin-usage/"/>
<url>/2022/09/11/hexo-admin-usage/</url>
<content type="html"><![CDATA[<h1 id="Install-hexo-admin"><a href="#Install-hexo-admin" class="headerlink" title="Install hexo-admin"></a>Install hexo-admin</h1><pre class="line-numbers language-{shell}" data-language="{shell}"><code class="language-{shell}"># cd to/your/hexo/path/npm install --save hexo-adminhexo server -dopen http://localhost:4000/admin/<span aria-hidden="true" class="line-numbers-rows"><span></span><span></span><span></span><span></span></span></code></pre><p>Now open the page link in a browser,<br>click <code>Settings</code> —> <code>Setup authentification here.</code><br>and set your login name, password, and secret (do not make them too simple).<br>Then follow the instructions to paste the generated admin credentials into <code>_config.yml</code>.<br>Now you can log in to hexo-admin and publish your posts.</p>]]></content>
<categories>
<category> Hexo </category>
</categories>
<tags>
<tag> tips </tag>
<tag> blog </tag>
</tags>
</entry>
<entry>
<title>Hello World</title>
<link href="/2022/09/11/hello-world/"/>
<url>/2022/09/11/hello-world/</url>
<content type="html"><![CDATA[<p>Welcome to <a href="https://hexo.io/">Hexo</a>! This is your very first post. Check <a href="https://hexo.io/docs/">documentation</a> for more info. If you get any problems when using Hexo, you can find the answer in <a href="https://hexo.io/docs/troubleshooting.html">troubleshooting</a> or you can ask me on <a href="https://github.com/hexojs/hexo/issues">GitHub</a>.</p><h2 id="Quick-Start"><a href="#Quick-Start" class="headerlink" title="Quick Start"></a>Quick Start</h2><h3 id="Create-a-new-post"><a href="#Create-a-new-post" class="headerlink" title="Create a new post"></a>Create a new post</h3><pre class="line-numbers language-bash" data-language="bash"><code class="language-bash">$ hexo new <span class="token string">"My New Post"</span><span aria-hidden="true" class="line-numbers-rows"><span></span></span></code></pre><p>More info: <a href="https://hexo.io/docs/writing.html">Writing</a></p><h3 id="Run-server"><a href="#Run-server" class="headerlink" title="Run server"></a>Run server</h3><pre class="line-numbers language-bash" data-language="bash"><code class="language-bash">$ hexo server<span aria-hidden="true" class="line-numbers-rows"><span></span></span></code></pre><p>More info: <a href="https://hexo.io/docs/server.html">Server</a></p><h3 id="Generate-static-files"><a href="#Generate-static-files" class="headerlink" title="Generate static files"></a>Generate static files</h3><pre class="line-numbers language-bash" data-language="bash"><code class="language-bash">$ hexo generate<span aria-hidden="true" class="line-numbers-rows"><span></span></span></code></pre><p>More info: <a href="https://hexo.io/docs/generating.html">Generating</a></p><h3 id="Deploy-to-remote-sites"><a href="#Deploy-to-remote-sites" class="headerlink" title="Deploy to remote sites"></a>Deploy to remote sites</h3><pre class="line-numbers language-bash" data-language="bash"><code class="language-bash">$ hexo deploy<span aria-hidden="true" 
class="line-numbers-rows"><span></span></span></code></pre><p>More info: <a href="https://hexo.io/docs/one-command-deployment.html">Deployment</a></p><h1 id="More-and-more"><a href="#More-and-more" class="headerlink" title="More and more"></a>More and more</h1><p>What more can you do with Hexo? I have tried some fantastic Hexo themes. </p>]]></content>
</entry>
</search>