Fits a high-dimensional model by binning a 2D embedding into hexagons. Options control the choice between bin centroids and bin means, the removal of low-density hexagons, and the averaging of the high-dimensional data.

Usage

fit_highd_model(
  training_data,
  emb_df,
  bin1 = 4,
  r2,
  q = 0.1,
  is_bin_centroid = TRUE,
  is_rm_lwd_hex = FALSE,
  benchmark_to_rm_lwd_hex = NULL,
  col_start_highd = "x"
)

Arguments

training_data

A tibble that contains the training high-dimensional data.

emb_df

A tibble that contains the embedding and a unique identifier.

bin1

Number of bins along the x axis (default is 4).

r2

The ratio of the range of the second embedding component to the range of the first, computed on the original embedding.

q

The buffer amount as a proportion of the data range (default is 0.1).

is_bin_centroid

Logical, indicating whether to use bin centroids rather than bin means (default is TRUE).

is_rm_lwd_hex

Logical, indicating whether to remove low-density hexagons (default is FALSE).

benchmark_to_rm_lwd_hex

The benchmark value used to remove low-density hexagons (default is NULL).

col_start_highd

The text prefix for columns in the high-dimensional data (default is "x").
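
The r2 argument can be computed directly from the embedding before calling the function. The following is a minimal sketch in base R on a made-up data frame; the column names UMAP1 and UMAP2 follow the Examples section, and the values are illustrative only.

```r
# Toy 2D embedding; values are invented for illustration.
emb <- data.frame(
  UMAP1 = c(-2.0, 0.5, 1.0, 3.0),
  UMAP2 = c(0.0, 1.5, 4.0, 6.0)
)

# r2: range of the second component divided by the range of the first.
r2 <- diff(range(emb$UMAP2)) / diff(range(emb$UMAP1))
r2
```

Here the first component spans 5 units and the second spans 6, so r2 is 1.2.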

Value

A list with two elements: df_bin, a data frame with the high-dimensional coordinates for the 2D bin centroids, and df_bin_centroids, a data frame with information about the hexagonal bin centroids in 2D.
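
The two returned data frames can be combined to see each 2D centroid next to its high-dimensional coordinates. The sketch below assumes that hb_id in df_bin and hexID in df_bin_centroids identify the same bins, as the matching values in the Examples output suggest. The list named model is a hand-built stand-in with three rows and a subset of columns copied from that output, not an actual call to fit_highd_model.

```r
# Stand-in for the returned list (three rows copied from the Examples output).
model <- list(
  df_bin = data.frame(
    hb_id = c(5L, 6L, 9L),
    x1 = c(-0.458, 0.491, -0.340),
    x2 = c(1.13, 1.51, 0.0682)
  ),
  df_bin_centroids = data.frame(
    hexID = c(5L, 6L, 9L),
    c_x = c(0.0915, 0.474, -0.1),
    c_y = c(0.130, 0.130, 0.461),
    std_counts = c(1, 0.214, 0.214)
  )
)

# Match each 2D centroid to its high-dimensional coordinates by bin ID
# (assumption: hexID and hb_id refer to the same hexagon).
joined <- merge(
  model$df_bin_centroids, model$df_bin,
  by.x = "hexID", by.y = "hb_id"
)
joined
```

The merged data frame keeps the key under the name hexID, followed by the centroid columns and then the high-dimensional columns.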

Examples

r2 <- diff(range(s_curve_noise_umap$UMAP2)) / diff(range(s_curve_noise_umap$UMAP1))
fit_highd_model(
  training_data = s_curve_noise_training,
  emb_df = s_curve_noise_umap_scaled,
  bin1 = 4,
  r2 = r2,
  col_start_highd = "x"
)
#> $df_bin
#> # A tibble: 11 × 8
#>    hb_id      x1     x2       x3        x4         x5       x6         x7
#>    <int>   <dbl>  <dbl>    <dbl>     <dbl>      <dbl>    <dbl>      <dbl>
#>  1     5 -0.458  1.13   -1.78     0.00444  -0.0000532 -0.0265  -0.00143  
#>  2     6  0.491  1.51   -1.86     0.0114   -0.0106    -0.0291  -0.000367 
#>  3     9 -0.340  0.0682 -1.90    -0.000169  0.00304   -0.0130  -0.00288  
#>  4    10  0.798  0.537  -1.44     0.00424  -0.00131    0.0395   0.00177  
#>  5    14  0.734  1.25   -0.358   -0.00214   0.00479    0.0230  -0.0000563
#>  6    15  0.0705 1.65   -0.00255  0.00553  -0.00839   -0.0248   0.00926  
#>  7    19 -0.498  1.03    0.269    0.00358   0.00210    0.00889  0.000509 
#>  8    23 -0.747  0.805   1.58     0.00287  -0.00408    0.0159   0.000514 
#>  9    27  0.0106 1.54    1.90     0.00762   0.00272   -0.0161   0.00209  
#> 10    28  0.552  0.465   1.69    -0.00901   0.00561   -0.0155  -0.00369  
#> 11    31  0.901  1.55    1.38     0.000970  0.00252    0.00784 -0.00202  
#> 
#> $df_bin_centroids
#> # A tibble: 11 × 5
#>    hexID     c_x   c_y std_counts drop_empty
#>    <int>   <dbl> <dbl>      <dbl> <lgl>     
#>  1     5  0.0915 0.130      1     FALSE     
#>  2     6  0.474  0.130      0.214 FALSE     
#>  3     9 -0.1    0.461      0.214 FALSE     
#>  4    10  0.283  0.461      0.429 FALSE     
#>  5    14  0.474  0.793      0.643 FALSE     
#>  6    15  0.857  0.793      0.143 FALSE     
#>  7    19  0.666  1.12       0.786 FALSE     
#>  8    23  0.857  1.46       0.286 FALSE     
#>  9    27  0.666  1.79       0.643 FALSE     
#> 10    28  1.05   1.79       0.643 FALSE     
#> 11    31  0.857  2.12       0.357 FALSE     
#>
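
The std_counts column in df_bin_centroids gives one way to think about the is_rm_lwd_hex and benchmark_to_rm_lwd_hex arguments. The sketch below is only one plausible reading, not the function's confirmed rule: it assumes a hexagon counts as low-density when its std_counts falls below the benchmark. The benchmark of 0.25 is hypothetical, and the data frame is rebuilt by hand from the output above.

```r
# hexID and std_counts copied from the df_bin_centroids output above.
centroids <- data.frame(
  hexID = c(5L, 6L, 9L, 10L, 14L, 15L, 19L, 23L, 27L, 28L, 31L),
  std_counts = c(1, 0.214, 0.214, 0.429, 0.643, 0.143,
                 0.786, 0.286, 0.643, 0.643, 0.357)
)

# Hypothetical benchmark; the comparison rule is an assumption.
benchmark <- 0.25

# Keep hexagons at or above the benchmark, and record the ones dropped.
kept <- centroids[centroids$std_counts >= benchmark, ]
dropped_ids <- centroids$hexID[centroids$std_counts < benchmark]

nrow(kept)
dropped_ids
```

Under this assumed rule, 8 of the 11 hexagons are kept and hexagons 6, 9, and 15 are dropped.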