Package index • text

Installation

textrpp_install() textrpp_install_virtualenv(): Install the Python packages required by text in a conda or virtualenv environment

textrpp_uninstall(): Uninstall the textrpp conda environment

textrpp_initialize(): Initialize the Python packages required by text

Transform text to word embeddings

textEmbed(): Extracts layers and aggregates them into word embeddings for all character variables in a given data frame.

textDimName(): Change dimension names

textEmbedRawLayers(): Extract layers of hidden states

textEmbedLayerAggregation(): Aggregate layers

textEmbedReduce(): Pre-trained dimension reduction (experimental)

textEmbedStatic(): Apply static word embeddings

Fine-tuning models

textFineTuneTask(): Task Adapted Pre-Training (EXPERIMENTAL - under development)

textFineTuneDomain(): Domain Adapted Pre-Training (EXPERIMENTAL - under development)

Text language analysis tasks

textGeneration(): Text generation

textNER(): Named Entity Recognition. (experimental)

textSum(): Summarize texts. (experimental)

textQA(): Question Answering. (experimental)

textTranslate(): Translation. (experimental)

textZeroShot(): Zero Shot Classification (Experimental)

The text-train functions

textTrain(): Trains word embeddings

textTrainLists(): Train lists of word embeddings

textTrainRegression(): Train word embeddings to a numeric variable.

textTrainRandomForest(): Trains word embeddings using random forest

textTrainN(): Cross-validated accuracies across sample sizes

textTrainNPlot(): Plot cross-validated accuracies across sample sizes

textTrainExamples() textPredictExamples(): Show language examples (Experimental)

The text-predict functions

textPredict() textAssess() textClassify(): textPredict, textAssess and textClassify

textTrainExamples() textPredictExamples(): Show language examples (Experimental)

textPredictTest(): Significance testing of correlations. If only y1 is provided, a t-test is computed between the absolute errors from yhat1-y1 and yhat2-y1.

textPredictAll(): Predict from several models, selecting the correct input

textLBAM(): The LBAM library

Semantic similarities and distances functions

textSimilarity(): Semantic Similarity

textDistance(): Semantic distance

textSimilarityMatrix(): Semantic similarity across multiple word embeddings

textDistanceMatrix(): Semantic distance across multiple word embeddings

textSimilarityNorm(): Semantic similarity between a text variable and a word norm

textDistanceNorm(): Semantic distance between a text variable and a word norm

Visualise words in the word embedding space

textProjection(): Supervised Dimension Projection

textPlot(): Plot words

textProjectionPlot(): Plot Supervised Dimension Projection

textCentrality(): Semantic similarity scores between single words' embeddings and an aggregated word embedding

textCentralityPlot(): Plots words from textCentrality()

textPCA(): textPCA()

textPCAPlot(): textPCAPlot

BERTopics

textTopics(): BERTopics

textTopicsTest(): Wrapper for topicsTest function from the topics package

textTopicsWordcloud(): Plot word clouds

textTopicsReduce(): textTopicsReduce (EXPERIMENTAL)

textTopicsTree(): textTopicsTree (EXPERIMENTAL) to get the hierarchical topic tree

View or delete downloaded HuggingFace models

textModels(): Check downloaded, available models.

textModelLayers(): Number of layers

textModelsRemove(): Delete a specified model

Miscellaneous

textDescriptives(): Compute descriptive statistics of character variables.

textClean(): Cleans text from standard personal information

textTokenize(): Tokenize text variables

textTokenizeAndCount(): Tokenize and count

textDomainCompare(): Compare two language domains

textFindNonASCII(): Detect non-ASCII characters

textCleanNonASCII(): Clean non-ASCII characters

Example Data

Language_based_assessment_data_8: Text and numeric data for 10 participants.

word_embeddings_4: Word embeddings for 4 text variables for 40 participants

raw_embeddings_1: Word embeddings from textEmbedRawLayers function

Language_based_assessment_data_3_100: Example text and numeric data.

DP_projections_HILS_SWLS_100: Data for plotting a Dot Product Projection Plot.

centrality_data_harmony: Example data for plotting a Semantic Centrality Plot.

PC_projections_satisfactionwords_40: Example data for plotting a Principal Component Projection Plot.