Predict the label and probability of a text using a pretrained classifier language model (experimental).

textClassify(
  x,
  model = "distilbert-base-uncased-finetuned-sst-2-english",
  device = "cpu",
  tokenizer_parallelism = FALSE,
  logging_level = "error",
  return_incorrect_results = FALSE,
  return_all_scores = FALSE,
  function_to_apply = "none",
  set_seed = 202208
)

Arguments

x

(string) A character variable or a tibble/dataframe with at least one character variable.

model

(string) Specification of a pre-trained classifier language model. For a full list of options, see the pretrained classifier models at HuggingFace. Examples include "cardiffnlp/twitter-roberta-base-sentiment" and "distilbert-base-uncased-finetuned-sst-2-english".

device

(string) Device to use: 'cpu', 'gpu', or 'gpu:k' where k is a specific device number.

tokenizer_parallelism

(boolean) If TRUE, turns on tokenizer parallelism.

logging_level

(string) Set the logging level. Options (ordered from less logging to more logging): "critical", "error", "warning", "info", "debug".

return_incorrect_results

(boolean) Stop returning some incorrectly formatted/structured results. This setting does NOT evaluate the actual results (whether or not they make sense, exist, etc.). It only ensures that the returned results are formatted correctly (e.g., whether the question-answering dictionary contains the key "answer", or whether the sentiments from textClassify contain the labels "positive" and "negative").

return_all_scores

(boolean) Whether to return all prediction scores or only the score of the predicted class.

function_to_apply

(string) The function to apply to the model outputs to retrieve the scores. There are four different values: "default": if the model has a single label, the sigmoid function is applied to the output; if the model has several labels, the softmax function is applied to the output. "sigmoid": applies the sigmoid function to the output. "softmax": applies the softmax function to the output. "none": does not apply any function to the output.

set_seed

(Integer) Set seed.
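
The four function_to_apply values can be illustrated with a minimal base R sketch. It does not call textClassify or any model. The logits vector and the helper names (sigmoid, softmax, apply_default) are hypothetical and only mirror the behaviour described for "default", "sigmoid", "softmax" and "none".

```r
# Hypothetical raw model outputs (logits) for one text and two labels.
logits <- c(NEGATIVE = -1.2, POSITIVE = 2.3)

# "sigmoid": map each logit independently into the interval (0, 1).
sigmoid <- function(z) 1 / (1 + exp(-z))

# "softmax": normalise the logits so that the scores sum to 1.
softmax <- function(z) {
  e <- exp(z - max(z))  # subtract the maximum for numerical stability
  e / sum(e)
}

# "default" as described in the argument text:
# sigmoid for a single label, softmax for several labels.
apply_default <- function(z) if (length(z) == 1) sigmoid(z) else softmax(z)

scores_none    <- logits                 # "none": outputs left untouched
scores_sigmoid <- sigmoid(logits)        # "sigmoid"
scores_softmax <- softmax(logits)        # "softmax"
scores_default <- apply_default(logits)  # "default": softmax here, two labels
```

With "none" (the default in the usage above), the returned scores would be raw model outputs rather than probabilities, so they need not lie in (0, 1) or sum to 1.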

Value

A tibble with predicted labels and scores for each text variable. The comment of the object shows the model name and computation time.
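
The comment mentioned here is an ordinary base R attribute, which print() does not show, so it has to be read with comment(). The sketch below uses a stand-in data frame instead of real textClassify output. The column names, values and comment string are made up for illustration and may not match what the function actually returns.

```r
# Stand-in for the returned tibble; a plain data frame is used so the
# sketch runs without the text package. Columns and values are hypothetical.
classifications <- data.frame(
  label = c("POSITIVE", "NEGATIVE"),
  score = c(0.98, 0.87)
)

# Attach a comment the way base R stores it; the string is illustrative only.
comment(classifications) <- "model: <model name>; computation time: <time>"

# Printing the object does not display the comment; retrieve it explicitly.
meta <- comment(classifications)
meta
```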

Examples

# \donttest{
# classifications <- textClassify(x = Language_based_assessment_data_8[1:2, 1:2])
# classifications
# comment(classifications)
# }