-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathmatrix-data.qmd
More file actions
126 lines (86 loc) · 3.93 KB
/
matrix-data.qmd
File metadata and controls
126 lines (86 loc) · 3.93 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
---
title: "Matrix-based data"
format:
html:
code-fold: false
jupyter: julia-1.10
execute:
cache: true
keep-ipynb: true
---
## Introduction
Internally, all [BioFindr][1] functions work with matrices or array-based data, and the [DataFrame](https://dataframes.juliadata.org/stable/) based `findr` methods used in the [coexpression analysis](coexpression.qmd), [association analysis](association.qmd), and [causal inference](causal-inference.qmd) tutorials are wrapper functions provided for convenience. If you prefer matrix-based data over DataFrames, you can directly use matrix-based `findr` methods without having to create DataFrames first.
## Set up the environment
```{julia}
using DrWatson
quickactivate(@__DIR__)
using DataFrames
using Arrow
using BioFindr
```
## Load data
Let's pretend our GEUVADIS data is in a matrix-based format:
```{julia}
Xt = Matrix(DataFrame(Arrow.Table(datadir("exp_pro","findr-data-geuvadis", "dt.arrow"))));
Xm = Matrix(DataFrame(Arrow.Table(datadir("exp_pro","findr-data-geuvadis", "dm.arrow"))));
Gm = Matrix(DataFrame(Arrow.Table(datadir("exp_pro","findr-data-geuvadis", "dgm.arrow"))));
```
We also need the microRNA eQTL mapping (see the [causal inference tutorial](causal-inference.qmd)), in this case in the form of an array where each row corresponds to a cis-eQTL/eGene pair represented by of a column index of `Gm` (i.e. a SNP) and a column index of `Xm` (i.e. a microRNA). [Recall](causal-inference.qmd) that due to the preprocessing of the [findr-geuvadis][2] data. the column indices are identical, but this will not be the case in general:
```{julia}
mirpairs = zeros(Int32,size(Gm,2),2);
for k=1:size(mirpairs,1)
mirpairs[k,:] = [k k]
end
```
Note that data must be stored in matrices where **columns correspond to variables** (genes, SNPs, etc.) and **rows correspond to observations** (samples).
## Run BioFindr
Below, we only show the relevant `findr` commands. Check the corresponding tutorials and [BioFindr documentation][6] for more details.
### Coexpression analysis
#### All-vs-all
Coexpression analysis on a single matrix returns a square matrix with dimensions equal to the number of variables (columns) in the input matrix:
```{julia}
P = findr(Xm)
```
In the output, columns correspond to A-genes (causal factors) and rows to B-genes (targets), that is:
$$
P_{i,j} = P(X_j \to X_i)
$$
Note that the diagonal is arbitrarily set to one, BioFindr cannot make any inferences about the presence or absence of self-regulation!
#### Bipartite
Analyse coexpression *from* a subset of variables *to* the whole set:
```{julia}
P = findr(Xm; cols=[1,3,7,50])
```
Analyse coexpression *from* the variables in `Xm` *to* the variables in `Xt`:
```{julia}
P = findr(Xt,Xm)
```
### Association analysis
Testing associations between eQTL genotypes in `Gm`and microRNA expression levels in `Xm`:
```{julia}
P = findr(Xm,Gm)
```
In the output, columns correspond to eQTLs and rows to genes, that is,
$$
P_{i,j} = P(E_j \to X_i)
$$
### Causal inference
#### Subset-to-all
When you run causal inference with `findr` using matrix-based inputs, the default is to return posterior probabilities for [each test](https://tmichoel.github.io/BioFindr.jl/dev/realLLR/) separately:
```{julia}
P = findr(Xm,Gm,mirpairs);
```
Note the dimensions of `P`:
```{julia}
size(P)
```
The third dimension indexes the A-genes (causes), the second dimension the tests (test 2-5, see link above), and the first the B-genes (targets). If you are interested only in a specific combination, use the optional `combination` argument as explained in the [causal inference tutorial](causal-inference.qmd):
```{julia}
P = findr(Xm,Gm,mirpairs; combination="IV");
```
[1]: https://github.com/tmichoel/BioFindr.jl
[2]: https://github.com/lingfeiwang/findr-data-geuvadis
[3]: https://doi.org/10.1038/nature12531
[4]: https://dataframes.juliadata.org/stable/
[5]: https://doi.org/10.1371/journal.pcbi.1005703
[6]: https://tmichoel.github.io/BioFindr.jl/dev/