R/2_4_textPredict.R
textPredict.Rd
textPredict() uses a trained model, for example one created by textTrain() or stored on GitHub, to predict new scores or classes from embeddings or text.
textPredict(
  model_info = NULL,
  word_embeddings = NULL,
  texts = NULL,
  x_append = NULL,
  type = NULL,
  dim_names = TRUE,
  save_model = TRUE,
  threshold = NULL,
  show_texts = FALSE,
  device = "cpu",
  participant_id = NULL,
  save_embeddings = TRUE,
  save_dir = "wd",
  save_name = "textPredict",
  story_id = NULL,
  dataset_to_merge_predictions = NULL,
  previous_sentence = FALSE,
  ...
)
model_info
(character or R object) Three options. 1: An R model object (e.g., saved output from textTrain()). 2: A link to a GitHub model (e.g., "https://github.com/CarlViggo/pretrained_swls_model/raw/main/trained_github_model_logistic.RDS"). 3: A path to a model stored locally (e.g., "path/to/your/model").
word_embeddings
(tibble) Embeddings from e.g., textEmbed(). If you are using a pretrained model, texts and embeddings cannot be submitted simultaneously (default = NULL).
texts
(character) Text to predict. If this argument is specified, word_embeddings (pre-made embeddings) cannot also be supplied (default = NULL).
x_append
(tibble) Variables to be appended after the word embeddings (x).
type
(character) Defines what output to return after logistic regression prediction: probabilities, classifications, or both (default = "class"; use "prob" for probabilities and "class_prob" for both).
dim_names
(boolean) Account for specific dimension names from textEmbed() (rather than generic names such as Dim1, Dim2, etc.). If FALSE, the model must have been trained on word embeddings created with dim_names = FALSE, so that the dimensions are only called Dim1, Dim2, etc.
save_model
(boolean) Save the model in your working directory (default = TRUE). If the model already exists in your working directory, it is loaded automatically from there.
threshold
(numeric) Classification threshold to use with a logistic model (default = 0.5).
show_texts
(boolean) Show texts together with predictions (default = FALSE).
device
(character) Name of the device to use: 'cpu', 'gpu', 'gpu:k', or 'mps'/'mps:k' for macOS, where k is a specific device number, such as 'mps:1'.
participant_id
(list) Vector of participant IDs. Specify this to get person-level scores (i.e., sentence probabilities summed to the person level and corrected for word count) (default = NULL).
save_embeddings
(boolean) If TRUE, embeddings are saved with a unique identifier and are opened automatically the next time textPredict is run with the same text (default = TRUE).
save_dir
(character) Directory in which to save embeddings (default = "wd", i.e., the working directory).
save_name
(character) Name of the saved embeddings (will be combined with a unique identifier) (default = ""). Note: if no save_name is provided and model_info is a character, save_name is set to model_info.
story_id
(vector) Vector of story IDs. Specify this to get story-level scores (i.e., summed sentence probabilities corrected for word count). When both story_id and participant_id are given, the function returns a list with both story-level and person-level predictions corrected for word count (default = NULL).
dataset_to_merge_predictions
(R object, tibble) Dataset into which the predictions should be merged (default = NULL).
previous_sentence
(boolean) If TRUE, word embeddings are averaged over the current and previous sentence per story ID. This requires both participant_id and story_id to be specified (default = FALSE).
...
Settings from stats::predict can be passed.
Predictions from word-embedding or text input.
See textTrain, textTrainLists and textTrainRandomForest.
if (FALSE) {
# Text data from Language_based_assessment_data_8
text_to_predict <- "I am not in harmony in my life as much as I would like to be."
# Example 1: (predict using pre-made embeddings and an R model-object)
prediction1 <- textPredict(
  trained_model,
  word_embeddings_4$texts$satisfactiontexts
)
# Example 2: (predict using a pretrained github model)
prediction2 <- textPredict(
  texts = text_to_predict,
  model_info = "https://github.com/CarlViggo/pretrained-models/raw/main/trained_hils_model.RDS"
)
# Example 3: (predict using a pretrained logistic github model and return
# probabilities and classifications)
prediction3 <- textPredict(
  texts = text_to_predict,
  # paste0() keeps the URL free of line breaks while staying within line width
  model_info = paste0(
    "https://github.com/CarlViggo/pretrained-models/raw/main/",
    "trained_github_model_logistic.RDS"
  ),
  type = "class_prob",
  threshold = 0.7
)
##### Automatic implicit motive coding section ######
# Create example dataset
implicit_motive_data <- dplyr::mutate(
  .data = Language_based_assessment_data_8,
  participant_id = dplyr::row_number()
)
# Code implicit motives.
implicit_motives <- textPredict(
  texts = implicit_motive_data$satisfactiontexts,
  model_info = "power",
  participant_id = implicit_motive_data$participant_id,
  dataset_to_merge_predictions = implicit_motive_data
)
# Examine results
implicit_motives$sentence_predictions
implicit_motives$person_predictions
}
if (FALSE) {
# Examine the correlation between the predicted values from Example 1 and
# the Satisfaction with Life Scale score (included in the text package).
psych::corr.test(
  prediction1$word_embeddings__ypred,
  Language_based_assessment_data_8$swlstotal
)
}
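The type and threshold arguments described above determine how a logistic model's predicted probabilities are turned into class labels. The following is a minimal base-R sketch of that idea, not the package's internal code: classify_probs() is a hypothetical helper, the label names are invented for illustration, and it assumes the threshold is compared against the probability of the positive class using >=.

```r
# Hypothetical helper (not part of the text package): convert predicted
# probabilities for the positive class into the three output styles
# described for the `type` argument.
classify_probs <- function(prob, type = "class", threshold = 0.5,
                           labels = c("negative", "positive")) {
  stopifnot(is.numeric(prob), all(prob >= 0 & prob <= 1))
  type <- match.arg(type, c("class", "prob", "class_prob"))

  # Assumption: probabilities at or above the threshold count as positive.
  predicted_class <- factor(
    ifelse(prob >= threshold, labels[2], labels[1]),
    levels = labels
  )

  switch(type,
    class = data.frame(class = predicted_class),
    prob = data.frame(prob = prob),
    class_prob = data.frame(class = predicted_class, prob = prob)
  )
}

example_probs <- c(0.20, 0.55, 0.90)
default_result <- classify_probs(example_probs, type = "class_prob")
strict_result <- classify_probs(example_probs, type = "class_prob", threshold = 0.7)
```

In this sketch, raising the threshold from 0.5 to 0.7 moves the middle case (probability 0.55) from the positive to the negative class, while the returned probabilities stay the same.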