Test whether there is a significant difference in meaning between two sets of texts (i.e., between their word embeddings).

textSimilarityTest(
  x,
  y,
  Npermutations = 10000,
  method = "paired",
  alternative = c("two_sided", "less", "greater"),
  output.permutations = TRUE,
  N_cluster_nodes = 1,
  seed = 1001
)

Arguments

x

Set of word embeddings from textEmbed.

y

Set of word embeddings from textEmbed.

Npermutations

Number of permutations (default 10000).

method

Compute a "paired" or an "unpaired" test.

alternative

Use a two-sided or a one-sided test (select one of: "two_sided", "less", "greater").

output.permutations

If TRUE, returns permuted values in output.

N_cluster_nodes

Number of cluster nodes to use for parallel computation (more nodes make the computation faster; see the parallel package).

seed

Seed for the random number generator (default 1001); set a different value to obtain different random permutations.

Value

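A list with the p-value (p.value), the observed cosine similarity (cosine_estimate), descriptions of the embeddings and of the test, and, if output.permutations = TRUE, the permuted values (random.estimates.4.null).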
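The general idea of a paired permutation test on cosine similarity can be sketched in base R as follows. This is an illustration only and not the package's implementation: the helper names cosine_rows and paired_perm_sketch are hypothetical, the null distribution is built by shuffling the rows of y (an assumption about how the pairing is broken), and only a one-sided "greater" p-value is computed.

```r
# Illustrative sketch only; NOT the internal code of textSimilarityTest().
# cosine_rows() and paired_perm_sketch() are hypothetical helper names.

# Row-wise cosine similarity between two numeric matrices of equal shape.
cosine_rows <- function(a, b) {
  rowSums(a * b) / (sqrt(rowSums(a^2)) * sqrt(rowSums(b^2)))
}

paired_perm_sketch <- function(x, y, n_perm = 1000, seed = 1001) {
  stopifnot(identical(dim(x), dim(y)))
  set.seed(seed)
  n <- nrow(x)

  # Observed statistic: mean cosine similarity of the original pairs.
  observed <- mean(cosine_rows(x, y))

  # Null distribution (assumption): shuffle the rows of y to break the pairing.
  null <- replicate(n_perm, {
    mean(cosine_rows(x, y[sample.int(n), , drop = FALSE]))
  })

  # One-sided ("greater") p-value with the usual +1 correction.
  p <- (sum(null >= observed) + 1) / (n_perm + 1)

  list(cosine_estimate = observed, null = null, p.value = p)
}

# Toy data: y is a noisy copy of x, so the paired rows are very similar.
set.seed(1)
x_toy <- matrix(rnorm(20 * 5), nrow = 20)
y_toy <- x_toy + matrix(rnorm(20 * 5, sd = 0.1), nrow = 20)
res <- paired_perm_sketch(x_toy, y_toy, n_perm = 200)
```

In this sketch, res$null plays the role of the permuted values and res$cosine_estimate the role of the observed similarity; a small res$p.value indicates that the original pairs are more similar than randomly re-paired rows.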

Examples

x <- wordembeddings4$harmonywords
y <- wordembeddings4$satisfactionwords
textSimilarityTest(x, y,
  method = "paired",
  Npermutations = 100,
  N_cluster_nodes = 1,
  alternative = "two_sided"
)
#> $random.estimates.4.null
#>  [1] 0.4983119 0.5576852 0.5302024 0.5523948 0.5192837 0.5069734 0.5426046
#>  [8] 0.5364956 0.5186255 0.5659262 0.5444500 0.5176275 0.5928729 0.5448589
#> [15] 0.5665787 0.6010373 0.5209912 0.5354355 0.5387865 0.5636173 0.5109059
#> [22] 0.5056216 0.5477581 0.5350841 0.5805851 0.5226741 0.5113583 0.5649385
#> [29] 0.5452156 0.5408462 0.5535737 0.5185920 0.5333117 0.5490415 0.5398604
#> [36] 0.5469505 0.5531049 0.5666718 0.5231832 0.5411863 0.5483832 0.5433615
#> [43] 0.5440919 0.5554494 0.5259646 0.5156110 0.6094675 0.5571161 0.5698052
#> [50] 0.5995209 0.5773272 0.5335416 0.5134631 0.5458662 0.5353203 0.5241089
#> [57] 0.4621846 0.5288317 0.5209880 0.5505339 0.5361929 0.5515761 0.5260796
#> [64] 0.5307438 0.5245073 0.5504687 0.5401110 0.5514474 0.5652146 0.5123560
#> [71] 0.5927479 0.5063603 0.5396703 0.5426171 0.5244020 0.5392873 0.5670804
#> [78] 0.5725232 0.4902566 0.5310121 0.5305491 0.5199153 0.5569171 0.5913420
#> [85] 0.5827711 0.5493309 0.5374053 0.5789101 0.5505858 0.5584467 0.4953798
#> [92] 0.4909416 0.5565927 0.5279804 0.5654466 0.5408371 0.4599621 0.5716093
#> [99] 0.5198857 0.5441615
#>
#> $embedding_x
#> [1] "x : Information about the embeddings. textEmbedLayersOutput: model: bert-base-uncased layers: 11 12 . textEmbedLayerAggregation: layers = 11 12 aggregate_layers = concatenate aggregate_tokens = mean tokens_select = tokens_deselect = "
#>
#> $embedding_y
#> [1] "y : Information about the embeddings. textEmbedLayersOutput: model: bert-base-uncased layers: 11 12 . textEmbedLayerAggregation: layers = 11 12 aggregate_layers = concatenate aggregate_tokens = mean tokens_select = tokens_deselect = "
#>
#> $test_description
#> [1] "permutations = 100 method = paired alternative = two_sided"
#>
#> $time_date
#> [1] "Duration to run the test: 0.277342 secs; Date created: 2021-05-14 09:11:52"
#>
#> $cosine_estimate
#> [1] 0.6069308
#>
#> $p.value
#> [1] 0.02
#>