This function fits a high-dimensional model using hexagonal bins. Options customize the modeling process: the choice of bin centroids or bin means, the removal of low-density hexagons, and the averaging of the high-dimensional data.
Usage
fit_highd_model(
  training_data,
  emb_df,
  bin1 = 4,
  r2,
  q = 0.1,
  is_bin_centroid = TRUE,
  is_rm_lwd_hex = FALSE,
  benchmark_to_rm_lwd_hex = NULL,
  col_start_highd = "x"
)
Arguments
- training_data
A tibble that contains the training high-dimensional data.
- emb_df
A tibble that contains the embedding along with a unique identifier.
- bin1
Number of bins along the x axis (default is 4).
- r2
The ratio of the range of the second embedding component to the range of the first, computed from the original embedding.
- q
The buffer amount as a proportion of the data range (default is 0.1).
- is_bin_centroid
Logical, indicating whether to use bin centroids rather than bin means (default is TRUE).
- is_rm_lwd_hex
Logical, indicating whether to remove low-density hexagons (default is FALSE).
- benchmark_to_rm_lwd_hex
The benchmark value used to remove low-density hexagons (default is NULL).
- col_start_highd
The text prefix of the column names in the high-dimensional data (default is "x").
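As a rough illustration of the r2 and q arguments, the sketch below computes the range ratio for a small synthetic 2D embedding and widens the x range by a buffer. The data frame emb and its columns emb1 and emb2 are made up for the example, and applying the buffer symmetrically on both sides is only one plausible reading of "proportion of data range", not a description of the package internals.

```r
# Hypothetical 2D embedding; the column names are illustrative only.
emb <- data.frame(
  emb1 = c(0, 1, 2, 4),
  emb2 = c(0, 3, 6, 2)
)

# r2: range of the second component divided by the range of the first.
x_span <- diff(range(emb$emb1))
y_span <- diff(range(emb$emb2))
r2 <- y_span / x_span

# q: buffer as a proportion of the data range
# (assumption: applied equally below the minimum and above the maximum).
q <- 0.1
x_limits <- range(emb$emb1) + c(-1, 1) * q * x_span
```

Here the embedding spans 4 units along emb1 and 6 along emb2, so r2 is 1.5, and a buffer of 0.1 widens the x limits by 0.4 on each side.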
Value
A list containing the data frame with high-dimensional coordinates for 2D bin centroids (df_bin) and the data frame containing information about hexagonal bin centroids (df_bin_centroids) in 2D.
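One way to picture df_bin is as per-hexagon averages of the high-dimensional columns. The sketch below uses base R on made-up data; the hb_id assignments, the values, and the object names highd and df_bin_sketch are hypothetical, and the code illustrates the idea rather than reproducing how fit_highd_model() computes its result.

```r
# Hypothetical training data with a hexagon ID already assigned to each point.
highd <- data.frame(
  hb_id = c(5L, 5L, 6L, 6L, 6L),
  x1 = c(1, 3, 0, 2, 4),
  x2 = c(10, 20, 5, 5, 5)
)

# Keep the columns whose names start with the prefix (cf. col_start_highd).
prefix <- "x"
highd_cols <- names(highd)[startsWith(names(highd), prefix)]

# Average each high-dimensional column within each hexagon.
df_bin_sketch <- aggregate(
  highd[highd_cols],
  by = list(hb_id = highd$hb_id),
  FUN = mean
)
```

The result has one row per hexagon ID and one averaged column per high-dimensional variable, which is the same shape as the df_bin element described above.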
Examples
r2 <- diff(range(s_curve_noise_umap$UMAP2)) / diff(range(s_curve_noise_umap$UMAP1))
fit_highd_model(
  training_data = s_curve_noise_training,
  emb_df = s_curve_noise_umap_scaled,
  bin1 = 4,
  r2 = r2,
  col_start_highd = "x"
)
#> $df_bin
#> # A tibble: 11 × 8
#>    hb_id     x1     x2       x3        x4         x5      x6         x7
#>    <int>  <dbl>  <dbl>    <dbl>     <dbl>      <dbl>   <dbl>      <dbl>
#>  1     5 -0.458   1.13    -1.78   0.00444 -0.0000532 -0.0265   -0.00143
#>  2     6  0.491   1.51    -1.86    0.0114    -0.0106 -0.0291  -0.000367
#>  3     9 -0.340 0.0682    -1.90 -0.000169    0.00304 -0.0130   -0.00288
#>  4    10  0.798  0.537    -1.44   0.00424   -0.00131  0.0395    0.00177
#>  5    14  0.734   1.25   -0.358  -0.00214    0.00479  0.0230 -0.0000563
#>  6    15 0.0705   1.65 -0.00255   0.00553   -0.00839 -0.0248    0.00926
#>  7    19 -0.498   1.03    0.269   0.00358    0.00210 0.00889   0.000509
#>  8    23 -0.747  0.805     1.58   0.00287   -0.00408  0.0159   0.000514
#>  9    27 0.0106   1.54     1.90   0.00762    0.00272 -0.0161    0.00209
#> 10    28  0.552  0.465     1.69  -0.00901    0.00561 -0.0155   -0.00369
#> 11    31  0.901   1.55     1.38  0.000970    0.00252 0.00784   -0.00202
#>
#> $df_bin_centroids
#> # A tibble: 11 × 6
#>    hexID    c_x   c_y bin_counts std_counts drop_empty
#>    <int>  <dbl> <dbl>      <int>      <dbl> <lgl>
#>  1     5 0.0915 0.130         14          1 FALSE
#>  2     6  0.474 0.130          3      0.214 FALSE
#>  3     9   -0.1 0.461          3      0.214 FALSE
#>  4    10  0.283 0.461          6      0.429 FALSE
#>  5    14  0.474 0.793          9      0.643 FALSE
#>  6    15  0.857 0.793          2      0.143 FALSE
#>  7    19  0.666  1.12         11      0.786 FALSE
#>  8    23  0.857  1.46          4      0.286 FALSE
#>  9    27  0.666  1.79          9      0.643 FALSE
#> 10    28   1.05  1.79          9      0.643 FALSE
#> 11    31  0.857  2.12          5      0.357 FALSE
#>
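In the df_bin_centroids output above, std_counts matches bin_counts divided by the largest bin count (for example, 3 / 14 is about 0.214). The sketch below reproduces that column from the printed counts and then filters on it. Treating benchmark_to_rm_lwd_hex as a threshold on std_counts, and the value 0.2 itself, are assumptions made purely for illustration; the object names centroids, benchmark, and kept are hypothetical.

```r
# Bin IDs and counts copied from the df_bin_centroids output above.
centroids <- data.frame(
  hexID = c(5L, 6L, 9L, 10L, 14L, 15L, 19L, 23L, 27L, 28L, 31L),
  bin_counts = c(14L, 3L, 3L, 6L, 9L, 2L, 11L, 4L, 9L, 9L, 5L)
)

# Standardise the counts by the maximum count.
centroids$std_counts <- centroids$bin_counts / max(centroids$bin_counts)

# Assumed rule: keep hexagons whose standardised count exceeds the benchmark.
benchmark <- 0.2  # hypothetical value, not a package default
kept <- centroids[centroids$std_counts > benchmark, ]
```

Under this assumed rule only hexagon 15, with 2 of a maximum 14 points, would fall below the benchmark and be dropped.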