
Semantic similarity scores between single words' word embeddings and an aggregated word embedding
Source: R/4_1_textPlotCentrality.R, textCentrality.Rd
textCentrality() computes a semantic similarity score between each single word's word embedding and the aggregated word embedding of all words.
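As a rough illustration of the idea, the base-R sketch below aggregates a toy matrix of text embeddings into a single central point (column-wise mean, matching the default aggregation = "mean") and then scores each word type by its cosine similarity to that point. The toy matrices, the word labels and the helper `cosine_to_centre()` are hypothetical stand-ins for the output of textEmbed; this is not the package's internal implementation.

```r
# Hypothetical toy data: rows are texts (in the role of word_embeddings),
# columns are embedding dimensions.
text_emb <- matrix(c(1, 0, 1,
                     0, 1, 1,
                     1, 1, 0),
                   nrow = 3, byrow = TRUE)

# Hypothetical decontextualized embeddings for three word types
# (in the role of word_types_embeddings); one row per word.
word_type_emb <- matrix(c(1, 1, 1,
                          1, 0, 0,
                          0, 0, 1),
                        nrow = 3, byrow = TRUE,
                        dimnames = list(c("calm", "peace", "noise"), NULL))

# Cosine similarity between each row of `m` and the vector `centre`.
cosine_to_centre <- function(m, centre) {
  as.vector(m %*% centre) / (sqrt(rowSums(m^2)) * sqrt(sum(centre^2)))
}

# Central point: mean of all text embeddings, dimension by dimension.
centre <- colMeans(text_emb)

similarity <- cosine_to_centre(word_type_emb, centre)
names(similarity) <- rownames(word_type_emb)
print(round(similarity, 3))
```

Here "calm" points in the same direction as the central point and gets a similarity of 1, while "peace" and "noise" each get about 0.577.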
Usage
textCentrality(
  words,
  word_embeddings,
  word_types_embeddings = word_types_embeddings_df,
  method = "cosine",
  aggregation = "mean",
  min_freq_words_test = 0
)
Arguments
- words
(character) Word or text variable to be plotted.
- word_embeddings
Word embeddings from textEmbed for the words to be plotted (i.e., the aggregated word embeddings for the "words" variable).
- word_types_embeddings
Word embeddings from textEmbed for individual words (i.e., the decontextualized word embeddings).
- method
(character) Type of similarity measure to compute. Default is "cosine"; see also "spearman" and "pearson", as well as the measures from textDistance() (computed here as 1 - textDistance), including "euclidean", "maximum", "manhattan", "canberra", "binary" and "minkowski".
- aggregation
(character) Method to aggregate the word embeddings (default = "mean"; see also "min", "max" or "[CLS]").
- min_freq_words_test
(numeric) Only include words that occur at least the specified number of times when computing the semantic similarity scores (default = 0).
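The method and aggregation options can be pictured with the two hypothetical helpers below. `aggregate_embeddings()` applies mean, min or max column-wise, and `similarity_score()` turns distance-based measures into similarities as 1 minus the distance, mirroring the "1 - textDistance" description above. Base R's cor() and dist() (whose method names match the distance measures listed) stand in for the package's own functions; the "[CLS]" option and the text package's actual internals are not reproduced.

```r
# Hypothetical helper: aggregate the rows of an embedding matrix column-wise.
aggregate_embeddings <- function(emb, aggregation = c("mean", "min", "max")) {
  aggregation <- match.arg(aggregation)
  fun <- switch(aggregation, mean = mean, min = min, max = max)
  apply(emb, 2, fun)
}

# Hypothetical helper: similarity between vectors x and y.
similarity_score <- function(x, y, method = "cosine") {
  if (method == "cosine") {
    sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
  } else if (method %in% c("pearson", "spearman")) {
    cor(x, y, method = method)
  } else {
    # Distance-based measures from stats::dist(): similarity = 1 - distance.
    1 - as.numeric(dist(rbind(x, y), method = method))
  }
}

emb <- matrix(c(1, 2, 3,
                3, 2, 1), nrow = 2, byrow = TRUE)
centre_max <- aggregate_embeddings(emb, "max")
sim_euclidean <- similarity_score(c(1, 2, 3), centre_max, "euclidean")
print(centre_max)
print(sim_euclidean)
```

Note that 1 minus a distance is not bounded below by 0 or -1, so scores from distance-based methods are on a different scale than cosine or correlation scores.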
Value
A data frame with variables (e.g., semantic similarity and frequencies) for the individual words, which is used as input for plotting with the textCentralityPlot function.
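The frequency part of this data frame, together with the min_freq_words_test filter, can be sketched as follows. The sketch assumes simple lower-cased whitespace tokenization (the package's own tokenizer may differ), and the column names `words`, `n` and `similarity` as well as the similarity values are hypothetical placeholders.

```r
texts <- c("calm peace calm", "peace and calm", "noise")

# Split each text on whitespace and count word frequencies.
tokens <- unlist(strsplit(tolower(texts), "\\s+"))
freq <- as.data.frame(table(tokens), stringsAsFactors = FALSE)
names(freq) <- c("words", "n")

# Keep only words that occur at least min_freq_words_test times.
min_freq_words_test <- 2
freq <- freq[freq$n >= min_freq_words_test, ]

# Hypothetical similarity scores, e.g. from a cosine comparison with the central point.
scores <- c(calm = 0.91, peace = 0.85, and = 0.40, noise = 0.12)
freq$similarity <- unname(scores[freq$words])
print(freq)
```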
See also
See textCentralityPlot and textProjection.
Examples
# Computes the semantic similarity between the individual word embeddings (Iwe)
# in the "harmonywords" column of the pre-installed dataset: Language_based_assessment_data_8,
# and the aggregated word embedding (Awe).
# The Awe can be interpreted as the latent meaning of the text.
if (FALSE) { # \dontrun{
df_for_plotting <- textCentrality(
  words = Language_based_assessment_data_8["harmonywords"],
  word_embeddings = word_embeddings_4$texts$harmonywords,
  word_types_embeddings = word_embeddings_4$word_types
)
# df_for_plotting contains variables (e.g., semantic similarity, frequencies) for
# the individual words that are used for plotting by the textCentralityPlot function.
} # }
