Title: | Multiscale Fisher's Independence Test for Multivariate Dependence |
---|---|
Description: | Test for independence of two random vectors, learn and report the dependency structure. For more information, see Gorsky, Shai and Li Ma, Multiscale Fisher's Independence Test for Multivariate Dependence, Biometrika, accepted, January 2022. |
Authors: | S. Gorsky, L. Ma |
Maintainer: | S. Gorsky |
License: | CC0 |
Version: | 1.1.1 |
Built: | 2024-11-08 04:43:54 UTC |
Source: | https://github.com/cran/MultiFit |
MultiFIT: perform a multiscale test of independence between two random vectors. See the package vignettes for further examples.
MultiFIT(xy, x = NULL, y = NULL, p_star = NULL, R_max = NULL, R_star = 1, rank.transform = TRUE, ranking.approximation = FALSE, M = 10, apply.stopping.rule = FALSE, alpha = 0.05, test.method = "Fisher", correct = TRUE, min.tbl.tot = 25L, min.row.tot = 10L, min.col.tot = 10L, p.adjust.methods = c("H", "Hcorrected"), compute.all.holm = TRUE, return.all.pvs = TRUE, verbose = FALSE)
| Argument | Description |
|---|---|
| `xy` | A list whose first element corresponds to the matrix `x` below and whose second element corresponds to the matrix `y` below. If `xy` is not specified, `x` and `y` must be supplied. |
| `x` | A matrix; number of columns = dimension of the random vector, number of rows = number of observations. |
| `y` | A matrix; number of columns = dimension of the random vector, number of rows = number of observations. |
| `p_star` | Numeric, cuboids associated with tests whose |
| `R_max` | A positive integer (or `Inf`), the maximal number of resolutions to scan (the algorithm stops at a lower resolution if no table in it meets the criteria specified by `min.tbl.tot`, `min.row.tot` and `min.col.tot`). |
| `R_star` | A positive integer, if set to an integer between 0 and |
| `rank.transform` | Logical, if |
| `ranking.approximation` | Logical, if |
| `M` | A positive integer (or `Inf`), the number of top-ranking tests to continue to split at each resolution. FWER control is not guaranteed for this method. |
| `apply.stopping.rule` | Logical. If `TRUE`, an adjusted |
| `alpha` | Numeric. Threshold below which resolution-specific |
| `test.method` | String: `"Fisher"` for Fisher's exact test (slowest), `"chi.sq"` for the chi-squared test, `"LR"` for the likelihood-ratio test, or `"norm.approx"` to approximate the hypergeometric distribution with a normal distribution (fastest). |
| `correct` | Logical, if |
| `min.tbl.tot` | Non-negative integer, the minimal number of observations per table below which a contingency table will not be tested. |
| `min.row.tot` | Non-negative integer, the minimal number of observations in the row totals of a 2x2 contingency table; tables whose row totals fall below it are not tested. |
| `min.col.tot` | Non-negative integer, the minimal number of observations in the column totals of a 2x2 contingency table; tables whose column totals fall below it are not tested. |
| `p.adjust.methods` | String, choose between `"H"` for Holm, `"Hcorrected"` for Holm with the correction as specified in `correct`, or `c("H", "Hcorrected")` (the default) for both. |
| `compute.all.holm` | Logical, if |
| `return.all.pvs` | Logical, if `TRUE`, a data frame with all p-values is returned. |
| `verbose` | Logical. |
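The table-level arguments (`min.tbl.tot`, `min.row.tot`, `min.col.tot`, `test.method`) can be made concrete with a minimal base-R sketch of one step: the observations inside a cuboid are discretized into a 2x2 contingency table by halving one margin of `x` and one margin of `y`, the table is checked against the minimum totals, and it is then tested. The sketch assumes margins on a scaled-rank (0, 1] scale, splits at the cuboid midpoint, and that every row and column total must reach its minimum. `rank_scale`, `cuboid_table`, `table_is_testable` and `test_table` are hypothetical helpers, not functions exported by MultiFit.

```r
# Hypothetical helper (assumption): scaled marginal ranks in (0, 1].
rank_scale <- function(m) {
  apply(m, 2, function(v) rank(v, ties.method = "first") / length(v))
}

# Hypothetical helper: 2x2 table for the observations inside a cuboid.
# cuboid_x, cuboid_y: matrices of (lower, upper] bounds, one row per margin.
# i, j: the margin of x and the margin of y that are halved at their midpoints.
cuboid_table <- function(ux, uy, cuboid_x, cuboid_y, i, j) {
  in_box <- function(u, cub) {
    apply(u, 1, function(r) all(r > cub[, 1] & r <= cub[, 2]))
  }
  keep <- in_box(ux, cuboid_x) & in_box(uy, cuboid_y)
  mid_x <- mean(cuboid_x[i, ])
  mid_y <- mean(cuboid_y[j, ])
  table(factor(ux[keep, i] > mid_x, levels = c(FALSE, TRUE)),
        factor(uy[keep, j] > mid_y, levels = c(FALSE, TRUE)))
}

# Gating in the spirit of min.tbl.tot / min.row.tot / min.col.tot
# (assumption: a table is tested only if every total reaches its minimum).
table_is_testable <- function(tbl, min.tbl.tot = 25L,
                              min.row.tot = 10L, min.col.tot = 10L) {
  sum(tbl) >= min.tbl.tot &&
    all(rowSums(tbl) >= min.row.tot) &&
    all(colSums(tbl) >= min.col.tot)
}

# Two of the test.method options, using base R tests.
test_table <- function(tbl, method = c("Fisher", "chi.sq")) {
  method <- match.arg(method)
  if (method == "Fisher") {
    fisher.test(tbl)$p.value
  } else {
    suppressWarnings(chisq.test(tbl, correct = TRUE)$p.value)  # Yates correction
  }
}

# Usage on data generated as in the example below.
set.seed(1)
n <- 300
x <- cbind(rnorm(n), runif(n))
y <- cbind(rnorm(n), sin(5 * pi * x[, 2]) + rnorm(n) / 5)
ux <- rank_scale(x)
uy <- rank_scale(y)
whole <- cbind(c(0, 0), c(1, 1))  # resolution 0: all of (0, 1]^2 for x and for y
tbl <- cuboid_table(ux, uy, whole, whole, i = 2, j = 2)
tbl
if (table_is_testable(tbl)) test_table(tbl, "Fisher")
```

At resolution 0 the cuboid covers every observation, so the split at the midpoint of the rank scale is a median split and all row and column totals equal n / 2.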
- `p.values.holistic`: a named numeric vector containing the holistic p-values for the global null hypothesis (i.e., x independent of y).
- `p.values.resolution.specific`: a named numeric vector containing the resolution-specific p-values for the global null hypothesis (i.e., x independent of y).
- `res.by.res.pvs`: a data frame that contains the raw and Bonferroni-adjusted resolution-specific p-values.
- `all.pvs`: a data frame that contains all p-values and adjusted p-values that are computed. Returned if `return.all.pvs` is `TRUE`.
- `all`: a nested list. Each entry is named and contains data about a resolution that was tested. Each resolution is itself a list with the elements:
  - `cuboids`: a summary of all tested cuboids in the resolution.
  - `tables`: a summary of all 2x2 contingency tables in the resolution.
  - `pv`: a numeric vector containing the p-values from the tests of independence on the 2x2 contingency tables in `tables` that meet the criteria defined by `min.tbl.tot`, `min.row.tot` and `min.col.tot`. The length of `pv` is equal to the number of rows of `tables`.
  - `pv.correct`: similar to `pv`; corrected p-values, computed and returned when `correct` is `TRUE`.
  - `rank.tests`: a logical vector indicating whether or not each test was ranked among the top `M` tests in the resolution. The length of `rank.tests` is equal to the number of rows of `tables`.
  - `parent.cuboids`: an integer vector indicating which cuboids in the resolution are associated with the ranked tests and will be halved further in the next higher resolution.
  - `parent.tests`: a logical vector whose length equals the number of rows of `tables`, indicating whether or not each test was chosen as a parent test (the same test may have multiple children).
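The p-value summaries above can be illustrated in base R on toy values. This sketch assumes that a holistic Holm p-value is the smallest Holm-adjusted p-value over all tests, that a resolution-specific p-value is the smallest raw p-value in a resolution multiplied by the number of tests in it (Bonferroni, capped at 1), and that `rank.tests` flags the `M` smallest p-values in each resolution. These are illustrative assumptions, not a description of MultiFIT's internals.

```r
# Toy p-values for tests in resolutions 0, 1 and 2 (illustrative values only).
pv <- data.frame(
  resolution = c(0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  p = c(0.20, 0.03, 0.50, 0.004, 0.70, 1e-4, 0.30, 0.60, 0.02, 0.90, 0.45)
)

# Assumed holistic (Holm) global p-value: the smallest Holm-adjusted p-value.
holm_global <- min(p.adjust(pv$p, method = "holm"))

# Assumed resolution-specific p-value: Bonferroni over the tests in each resolution.
res_specific <- tapply(pv$p, pv$resolution,
                       function(p) min(1, length(p) * min(p)))

# Flag the M smallest p-values within each resolution, in the spirit of rank.tests.
M <- 2
pv$rank.tests <- as.logical(ave(pv$p, pv$resolution,
                                FUN = function(p) rank(p, ties.method = "first") <= M))

holm_global   # 11 tests in total, smallest p = 1e-4, so this is 11 * 1e-4
res_specific
pv
```

Because Holm's adjusted p-values are non-decreasing in sorted order, the smallest one always equals the number of tests times the smallest raw p-value (capped at 1), so a global Holm p-value of this form coincides with an overall Bonferroni p-value.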
library(MultiFit)
set.seed(1)
n = 300
Dx = Dy = 2
x = matrix(0, nrow = n, ncol = Dx)
y = matrix(0, nrow = n, ncol = Dy)
x[,1] = rnorm(n)
x[,2] = runif(n)
y[,1] = rnorm(n)
# y[, 2] depends on x[, 2]; all other margins are independent
y[,2] = sin(5 * pi * x[ , 2]) + 1 / 5 * rnorm(n)
fit = MultiFIT(x = x, y = y, verbose = TRUE)
w = MultiSummary(x = x, y = y, fit = fit, alpha = 0.0001)
MultiSummary: provide a post-hoc summary of significant tests. See the package vignettes for further examples.
MultiSummary(xy, x = NULL, y = NULL, fit, alpha = 0.05, only.rk = NULL, use.pval = NULL, plot.tests = TRUE, pch = NULL, rd = 2, plot.margin = FALSE)
| Argument | Description |
|---|---|
| `xy` | A list whose first element corresponds to the matrix `x` below and whose second element corresponds to the matrix `y` below. If `xy` is not specified, `x` and `y` must be supplied. |
| `x` | A matrix; number of columns = dimension of the random vector, number of rows = number of observations. |
| `y` | A matrix; number of columns = dimension of the random vector, number of rows = number of observations. |
| `fit` | An object generated by `MultiFIT`. |
| `alpha` | Numeric, only tests with adjusted |
| `only.rk` | Positive integer vector. Show only tests that are ranked according to |
| `use.pval` | String, choose between |
| `plot.tests` | Logical, plot the marginal scatter plots associated with the significant tests that are presented. |
| `pch` | Point style for plots. If left as |
| `rd` | Numeric, number of figures to round to when presenting ranges of variables. |
| `plot.margin` | Logical, plot the marginal scatter plot of the margins associated with each significant test, without highlighting which points are conditioned on and fall in the discretized 2x2 contingency table. |
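A minimal sketch of the kind of filtering that `alpha` controls, on a toy data frame: tests are kept when their adjusted p-value is strictly below `alpha` (an assumption about the boundary) and are then ranked by p-value. The column names `test` and `p.adj` are hypothetical and need not match the columns of `significant.tests`.

```r
# Toy test summary with adjusted p-values (illustrative values only).
tests <- data.frame(test = c("t1", "t2", "t3", "t4", "t5"),
                    p.adj = c(0.2, 1e-6, 0.04, 3e-5, 9e-4),
                    stringsAsFactors = FALSE)
alpha <- 1e-4

# Keep tests with an adjusted p-value strictly below alpha, then rank them.
significant <- tests[tests$p.adj < alpha, ]
significant <- significant[order(significant$p.adj), ]
significant$rank <- seq_len(nrow(significant))
significant
```

With a very small `alpha`, as in the example call `MultiSummary(..., alpha = 0.0001)`, only the strongest local dependencies survive the filter.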
A list with the elements:

- `significant.tests`: a data frame that summarizes the main features of the significant tests and their overall ranking by p-value.
- `original.scale.cuboids`: a list with one element per significant test (the same as the number of rows of `significant.tests`). Each element corresponds to a test and is a list whose elements are the marginal ranges of the associated cuboid.
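`original.scale.cuboids` reports cuboid ranges on the scale of the original data. One way to obtain such a range, sketched here under assumptions, is to take the observations whose scaled ranks (rank / n) fall in a cuboid's interval (lower, upper] for one margin and report their minimum and maximum, rounded with `rd`. The helper `original_range` is hypothetical, and treating `rd` as a number of decimal places is an assumption.

```r
# Hypothetical helper: range, on the original scale, of the observations whose
# scaled ranks (rank / n) fall in the interval (lower, upper].
original_range <- function(v, lower, upper, rd = 2) {
  u <- rank(v, ties.method = "first") / length(v)
  inside <- v[u > lower & u <= upper]
  round(range(inside), digits = rd)  # rd treated as decimal places (assumption)
}

v <- c(3.14159, -1.5, 0.25, 2.71828, 10.2, -0.75, 5.5, 1.4142)
original_range(v, 0, 0.5)   # lower half of the ranks
original_range(v, 0.5, 1)   # upper half of the ranks
```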
library(MultiFit)
set.seed(1)
n = 300
Dx = Dy = 2
x = matrix(0, nrow = n, ncol = Dx)
y = matrix(0, nrow = n, ncol = Dy)
x[,1] = rnorm(n)
x[,2] = runif(n)
y[,1] = rnorm(n)
# y[, 2] depends on x[, 2]; all other margins are independent
y[,2] = sin(5 * pi * x[ , 2]) + 1 / 5 * rnorm(n)
fit = MultiFIT(x = x, y = y, verbose = TRUE)
w = MultiSummary(x = x, y = y, fit = fit, alpha = 0.0001)
MultiTree: plot a post-hoc tree of all tests, or of all significant tests, on 2x2 discretized contingency tables. See the package vignettes for examples.
MultiTree(xy, x = NULL, y = NULL, fit, show.all = FALSE, max.node.size = 5, min.node.size = 2.5, use.pval = NULL, images.path = NULL, node.name = "node", filename = NULL, filetype = "pdf")
| Argument | Description |
|---|---|
| `xy` | A list (optional) whose first element corresponds to the matrix `x` below and whose second element corresponds to the matrix `y` below. If |
| `x` | A matrix (optional); number of columns = dimension of the random vector, number of rows = number of observations. If |
| `y` | A matrix (optional); number of columns = dimension of the random vector, number of rows = number of observations. If |
| `fit` | An object generated by `MultiFIT`. |
| `show.all` | Logical. If |
| `max.node.size` | Numeric. Maximal node size. All nodes are scaled between `min.node.size` and `max.node.size`. |
| `min.node.size` | Numeric. Minimal node size. All nodes are scaled between `min.node.size` and `max.node.size`. |
| `use.pval` | String, choose between |
| `images.path` | String, path to save |
| `node.name` | String, prefix for file names for nodes |
| `filename` | String, file name for tree output. If left |
| `filetype` | String, default is `"pdf"`. |
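`min.node.size` and `max.node.size` bound the node sizes. The sketch below shows a linear rescaling of -log10(p) into that interval, so that smaller p-values give larger nodes. It is an assumption about how sizes could be scaled, not MultiTree's documented rule, and `scale_node_sizes` is a hypothetical helper whose defaults mirror the defaults in the usage line.

```r
# Hypothetical helper: map -log10(p) linearly onto [min.node.size, max.node.size].
scale_node_sizes <- function(p, min.node.size = 2.5, max.node.size = 5) {
  s <- -log10(p)
  span <- max(s) - min(s)
  if (span == 0) {
    # All p-values equal: give every node the maximal size (an arbitrary choice).
    return(rep(max.node.size, length(p)))
  }
  min.node.size + (s - min(s)) / span * (max.node.size - min.node.size)
}

sizes <- scale_node_sizes(c(1e-6, 1e-3, 0.5))
sizes
```

A linear map on the -log10 scale keeps nodes for very small p-values from dwarfing the rest; the smallest p-value always gets `max.node.size` and the largest gets `min.node.size`.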
The main output of MultiTree is a file (PDF by default, see `filetype`) with the directed acyclic graph showing tests as nodes.

In addition, the function returns a list with the elements:

- `qgraph.object`: the graphical object generated by the `qgraph` function. See the qgraph package documentation for further details.
- `qgraph.call`: the call to the tree-generating function. Its arguments are `adj`, the adjacency matrix; `nodes.size`, a numeric vector with the scaled sizes of the nodes; `images`, the file names of the node images (may be `NULL`); and `filename` and `filetype`, both as passed to MultiTree and forwarded to `qgraph`.
- `pvs.attributes`: the attributes summarizing the data and the tests performed, as stored in `fit`.
- `n.nodes`: the number of nodes.
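The `adj` argument is an adjacency matrix of the test graph. A minimal base-R sketch of building one from parent-child pairs, with entry [i, j] equal to 1 when test i is a parent of test j. The edge list is toy input; giving node 4 two parents illustrates a structure that a directed acyclic graph allows but a tree does not. This is not necessarily how MultiTree builds `adj`.

```r
# Toy parent -> child edges between numbered tests (illustrative only).
edges <- data.frame(parent = c(1, 1, 2, 3, 3),
                    child  = c(2, 3, 4, 4, 5))
n.nodes <- max(unlist(edges))

# Entry [i, j] is 1 if test i is a parent of test j, 0 otherwise.
adj <- matrix(0L, nrow = n.nodes, ncol = n.nodes)
adj[cbind(edges$parent, edges$child)] <- 1L
adj
```

Numbering nodes so that every parent has a smaller index than its children makes `adj` strictly upper triangular, which is a quick way to confirm the graph has no cycles.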