## Installation

textrpp_install() textrpp_install_virtualenv()

Install the Python packages required by text in a conda or virtualenv environment.

textrpp_uninstall()

Uninstall the textrpp conda environment.

textrpp_initialize()

Initialize the Python packages required by text.

## Transform text to word embeddings

textEmbed()

Extract layers and aggregate them to word embeddings, for all character variables in a given dataframe.

textDimName()

Change the names of the dimensions in the word embeddings.

textEmbedRawLayers()

Extract layers of hidden states (word embeddings) for all character variables in a given dataframe.

textEmbedLayerAggregation()

Select and aggregate layers of hidden states to form word embeddings.
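
The select-then-aggregate idea can be sketched in base R. This is only an illustration, not the package's implementation: the `hidden` array, its token x layer x dimension layout, the helper `aggregate_layers()`, and the choice of the mean as the aggregation function are all assumptions made for the example.

```r
# Hypothetical hidden states for one text: 3 tokens x 4 layers x 2 dimensions.
set.seed(1)
hidden <- array(
  rnorm(3 * 4 * 2),
  dim = c(3, 4, 2),
  dimnames = list(paste0("token", 1:3), paste0("layer", 1:4), paste0("Dim", 1:2))
)

# Illustrative helper (not part of the text package).
aggregate_layers <- function(hidden, layers, fun = mean) {
  # Keep only the requested layers.
  selected <- hidden[, layers, , drop = FALSE]
  # Aggregate across layers for each token and dimension.
  per_token <- apply(selected, c(1, 3), fun)
  # Aggregate across tokens to get one embedding for the whole text.
  apply(per_token, 2, fun)
}

# One embedding built from the last two layers.
embedding <- aggregate_layers(hidden, layers = 3:4)
embedding
```

The result is a named numeric vector with one value per embedding dimension.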

textEmbedStatic()

Apply word embeddings from a given decontextualized static space (such as from Latent Semantic Analysis) to all character variables.

textClassify()

Predict label and probability of a text using a pretrained classifier language model. (experimental)

textGeneration()

Predicts the words that will follow a specified text prompt. (experimental)

textNER()

Named Entity Recognition. (experimental)

textSum()

Summarize texts. (experimental)

textQA()

Question Answering. (experimental)

textTranslate()

Translation. (experimental)

textZeroShot()

Zero-shot classification. (experimental)

## Train word embeddings

textTrain()

Train word embeddings to a numeric (ridge regression) or categorical (random forest) variable.

textTrainLists()

Individually train word embeddings from several text variables to several numeric or categorical variables. It is possible to have word embeddings from one text variable and several numeric/categorical variables; or vice versa, word embeddings from several text variables to one numeric/categorical variable. It is not possible to mix numeric and categorical variables.

textTrainRegression()

Train word embeddings to a numeric variable.
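
As a rough picture of what training embeddings to a numeric variable with ridge regression involves, here is a minimal closed-form sketch in base R. It is not the package's training pipeline: `ridge_fit()`, `ridge_predict()`, the fixed penalty `lambda = 1`, and the simulated `embeddings` and `rating` data are all hypothetical.

```r
# Illustrative ridge regression on standardized predictors (not the text package's code).
ridge_fit <- function(X, y, lambda) {
  X <- scale(X)
  y_mean <- mean(y)
  # Closed-form solution: (X'X + lambda * I)^-1 X'(y - mean(y)).
  beta <- solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y - y_mean))
  list(
    beta = drop(beta),
    intercept = y_mean,
    center = attr(X, "scaled:center"),
    scale = attr(X, "scaled:scale")
  )
}

ridge_predict <- function(fit, X_new) {
  # Apply the training-set centering and scaling to new data.
  X_new <- scale(X_new, center = fit$center, scale = fit$scale)
  drop(X_new %*% fit$beta) + fit$intercept
}

# Simulated data: 40 participants, 5 embedding dimensions, one numeric outcome.
set.seed(7)
embeddings <- matrix(rnorm(40 * 5), nrow = 40)
rating <- embeddings[, 1] * 2 + rnorm(40, sd = 0.1)

fit <- ridge_fit(embeddings, rating, lambda = 1)
predicted <- ridge_predict(fit, embeddings)
cor(predicted, rating)
```

In practice the penalty would be chosen by cross-validation and performance evaluated on held-out data rather than on the training set as here.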

textTrainRandomForest()

Train word embeddings to a categorical variable using random forest.

## Predict from word embeddings

textPredict()

Predict scores or classification from, e.g., textTrain.

textPredictTest()

Significance testing of correlations. If only y1 is provided, a t-test is computed between the absolute errors from yhat1-y1 and yhat2-y1.
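
The absolute-error comparison can be illustrated with base R's `t.test()`. The simulated `y1`, `yhat1`, and `yhat2` vectors are made up for the example, and treating the test as paired (because both error vectors refer to the same observations) is an assumption rather than something the description above states.

```r
# Simulated observed values and predictions from two hypothetical models.
set.seed(42)
y1 <- rnorm(50)
yhat1 <- y1 + rnorm(50, sd = 0.5)
yhat2 <- y1 + rnorm(50, sd = 1.0)

# Absolute prediction error of each model, per observation.
abs_error1 <- abs(yhat1 - y1)
abs_error2 <- abs(yhat2 - y1)

# Paired t-test on the absolute errors (pairing is an assumption of this sketch).
result <- t.test(abs_error1, abs_error2, paired = TRUE)
result$p.value
```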

textPredictAll()

Predict from several models, selecting the correct input.

## Semantic similarities and distances

textSimilarity()

Compute the semantic similarity between two text variables.

textDistance()

Compute the semantic distance between two text variables.

textSimilarityMatrix()

Compute semantic similarity scores between all combinations in a word embedding.

textDistanceMatrix()

Compute semantic distance scores between all combinations in a word embedding.

textSimilarityNorm()

Compute the semantic similarity between a text variable and a word norm (i.e., a text represented by one word embedding that represents a construct).

textDistanceNorm()

Compute the semantic distance between a text variable and a word norm (i.e., a text represented by one word embedding that represents a construct/concept).
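
To make the similarity, distance, and word-norm ideas in this section concrete, here is a small base R sketch. It does not call the package: the helper `cosine_similarity()`, the toy matrices `x` and `y`, the vector `norm_embedding`, and the use of cosine and Euclidean measures are assumptions chosen for illustration.

```r
# Cosine similarity between two numeric vectors of equal length.
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Hypothetical embeddings for two text variables:
# one row per participant, one column per dimension.
x <- rbind(c(1, 0, 1), c(0, 2, 0))
y <- rbind(c(2, 0, 2), c(3, 0, 0))

# Row-wise similarity between the two text variables.
similarity <- vapply(
  seq_len(nrow(x)),
  function(i) cosine_similarity(x[i, ], y[i, ]),
  numeric(1)
)

# Row-wise Euclidean distance as one possible distance measure.
distance <- sqrt(rowSums((x - y)^2))

# Similarity between each row of x and a single "norm" embedding.
norm_embedding <- c(1, 1, 1)
similarity_to_norm <- apply(x, 1, cosine_similarity, b = norm_embedding)
```

Similarity is highest for embeddings pointing in the same direction, whereas distance grows as embeddings move apart, so the two families of functions rank pairs in opposite orders.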

## Plot words in the word embedding space

textProjection()

Compute Supervised Dimension Projection and related variables for plotting words.

textPlot()

Plot words from textProjection() or textWordPrediction().

textProjectionPlot()

Plot words according to Supervised Dimension Projection.

textWordPrediction()

Compute predictions based on single words for plotting words. The word embeddings of single words are trained to predict the mean value associated with that word. P-values do NOT work yet.

textCentrality()

Compute semantic similarity score between single words' word embeddings and the aggregated word embedding of all words.
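
The centrality score described above can be sketched in base R as follows. The toy `word_embeddings` matrix, the use of `colMeans()` as the aggregation, and cosine similarity as the similarity measure are assumptions for illustration only.

```r
# Hypothetical 2-dimensional embeddings for three words.
word_embeddings <- rbind(
  happy  = c(1.0, 0.0),
  joyful = c(0.9, 0.1),
  tired  = c(0.0, 1.0)
)

cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Aggregate all word embeddings into one embedding (mean per dimension).
aggregated <- colMeans(word_embeddings)

# Similarity of each single word to the aggregated embedding.
centrality <- apply(word_embeddings, 1, cosine_similarity, b = aggregated)
sort(centrality, decreasing = TRUE)
```

Words with higher scores lie closer to the overall meaning of the word set.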

textCentralityPlot()

Plot words according to semantic similarity to the aggregated word embedding.

textPCA()

Compute 2 PCA dimensions of the word embeddings for individual words.
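
Reducing word embeddings to two principal components can be illustrated with base R's `prcomp()`. The random `word_embeddings` matrix and the word labels are made up, and this sketch does not reproduce any preprocessing the package may apply.

```r
# Hypothetical embeddings: 6 words x 8 dimensions.
set.seed(3)
word_embeddings <- matrix(
  rnorm(6 * 8),
  nrow = 6,
  dimnames = list(
    c("happy", "calm", "tired", "sad", "angry", "content"),
    paste0("Dim", 1:8)
  )
)

# Principal component analysis on centered embeddings.
pca <- prcomp(word_embeddings, center = TRUE, scale. = FALSE)

# Coordinates of each word on the first two components, ready for a 2-D plot.
coords <- pca$x[, 1:2]
coords
```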

textPCAPlot()

Plot words according to 2-D plot from 2 PCA components.

textModels()

textModelLayers()

Get the number of layers in a given model.

textModelsRemove()

Delete a specified model and model associated files.

## Miscellaneous

textDescriptives()

Compute descriptive statistics of character variables.

textTokenize()

Tokenize according to different Hugging Face transformers.

## Example Data

Language_based_assessment_data_8

Text and numeric data for 10 participants.

word_embeddings_4

Word embeddings for 4 text variables for 40 participants.

raw_embeddings_1

Word embeddings from the textEmbedRawLayers() function.

Language_based_assessment_data_3_100

Example text and numeric data.

DP_projections_HILS_SWLS_100

Data for plotting a Dot Product Projection Plot.

centrality_data_harmony

Example data for plotting a Semantic Centrality Plot.

PC_projections_satisfactionwords_40

Example data for plotting a Principal Component Projection Plot.