recognizer
Class PersonalityRecognizer

java.lang.Object
  extended by recognizer.PersonalityRecognizer

public class PersonalityRecognizer
extends java.lang.Object

 
  
  Given a text, the program computes the features described in 
  Mairesse et al. (2007) and runs Weka models on those features to produce 
  personality scores for all Big Five dimensions.
  
  The MRC Psycholinguistic database and the LIWC tool need to be installed, 
  and the file PersonalityRecognizer.conf in the main directory needs to 
  be modified accordingly. The PersonalityRecognizer script should be used 
  for launching the program.
 
  Usage: PersonalityRecognizer [-d] [-m model_number] [-o] [-c] [-t model_type] [-a arff_output_file] -i file|directory
                -c,--counts      Also outputs feature counts; cannot be combined with -d
                -d,--directory   Corpus analysis mode. Input must be a directory with 
                                 multiple text files, features are standardized over 
                                 the corpus and the recognizer outputs a personality 
                                 estimate for each text file.
                -i,--input       Input file or directory (required)
                -m,--model       Model to use for computing scores (default 4). Options:
                                                1 = Linear Regression
                                                2 = M5' Model Tree
                                                3 = M5' Regression Tree
                                                4 = Support Vector Machine with Linear Kernel (SMOreg)
                -o,--outputmod   Also outputs models
                -t,--type        Selects the type of model to use (default 1). The appropriate
                                 model depends on the language sample (written or 
                                 spoken), and whether observed personality (as perceived 
                                 by external judges) or self-assessed personality (the 
                                 writer/speaker's perception) needs to be estimated from the 
                                 text. Options:
                                                1 = Observed personality from spoken language
                                                2 = Self-assessed personality from written language
                -a,--arff        In corpus analysis mode, writes the features of each text to 
                                 a Weka .arff dataset file, together with the predicted scores.
                                 New models can be trained by adding features and replacing the 
                                 scores with human estimates. Each line corresponds to a text in 
                                 the corpus, identified by the filename feature.
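
  The options above can also be assembled programmatically and handed to 
  main(String[]). The following is a minimal sketch, not part of the recognizer 
  package: the class RecognizerArgs and its build method are hypothetical 
  helper names, and rejecting -a outside corpus analysis mode is an assumption 
  based on the option description. Only the flags themselves come from the 
  usage text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the recognizer package): assembles an
// argument array that follows the usage text above.
final class RecognizerArgs {

    /**
     * @param input      file or directory passed to -i (required)
     * @param corpusMode true adds -d (corpus analysis mode)
     * @param model      value for -m, 1 to 4
     * @param type       value for -t, 1 or 2
     * @param counts     true adds -c; documented as incompatible with -d
     * @param arffFile   path for -a, or null for none
     */
    static String[] build(String input, boolean corpusMode, int model, int type,
                          boolean counts, String arffFile) {
        if (input == null || input.isEmpty()) {
            throw new IllegalArgumentException("-i is required");
        }
        if (model < 1 || model > 4) {
            throw new IllegalArgumentException("-m must be between 1 and 4");
        }
        if (type < 1 || type > 2) {
            throw new IllegalArgumentException("-t must be 1 or 2");
        }
        if (counts && corpusMode) {
            throw new IllegalArgumentException("-c cannot be combined with -d");
        }
        // Assumption: -a is only meaningful in corpus analysis mode.
        if (arffFile != null && !corpusMode) {
            throw new IllegalArgumentException("-a requires -d");
        }

        List<String> args = new ArrayList<>();
        if (corpusMode) {
            args.add("-d");
        }
        args.add("-m");
        args.add(Integer.toString(model));
        if (counts) {
            args.add("-c");
        }
        args.add("-t");
        args.add(Integer.toString(type));
        if (arffFile != null) {
            args.add("-a");
            args.add(arffFile);
        }
        args.add("-i");
        args.add(input);
        return args.toArray(new String[0]);
    }
}
```

  The resulting array could then be passed to PersonalityRecognizer.main, 
  provided the recognizer classes and their dependencies are on the classpath.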
   
  
  See the included readme file and the website 
  http://www.mairesse.co.uk/personality/recognizer.html
  for more information. 
  
  Questions can be emailed to the author (webpage: http://www.mairesse.co.uk).
  
  Reference paper:
  
  Francois Mairesse, Marilyn Walker, Matthias Mehl and Roger Moore. 
  Using Linguistic Cues for the Automatic Recognition of Personality in 
  Conversation and Text. Journal of Artificial Intelligence Research (JAIR), 
  30, pages 457-500, 2007. 
  
  Available on the web in PDF format at 
  http://www.mairesse.co.uk/papers/personality-jair07.pdf
  
 

Version:
1.03
Author:
Francois Mairesse, http://www.mairesse.co.uk

Field Summary
static java.io.File DEFAULT_CONFIG_FILE
          Configuration file (default is PersonalityRecognizer.conf in root application directory).
static java.lang.String[] DIMENSIONS
          Personality dimension names.
static java.lang.String FS
          File separator.
static java.lang.String LS
          Line separator.
 
Constructor Summary
PersonalityRecognizer()
          Initializes parameters based on the default configuration file (PersonalityRecognizer.conf; see DEFAULT_CONFIG_FILE).
PersonalityRecognizer(java.io.File propFile)
          Initializes parameters based on configuration file, and loads the LIWC dictionary and the MRC database in memory.
 
Method Summary
 java.util.Map<java.io.File,java.lang.Double[]> computeScoresOverCorpus(java.io.File dir, weka.classifiers.Classifier[] models, java.io.File outputArffFile)
          Runs the models of each personality trait for each file in the directory.
 java.util.Map<java.lang.String,java.lang.Double> getFeatureCounts(java.lang.String text, boolean relativeOnly)
          Computes the features from the input text (70 LIWC features and 14 from the MRC database).
 int getModelIndex()
          Gets the current default model index.
 int getModelIndex(java.lang.String modelDir)
          Gets the model index in the MODEL_NAMES array from a string representation.
 weka.classifiers.Classifier[] loadWekaModels(boolean selfModel, boolean stdModels)
          Loads saved Weka models in memory for all personality dimensions, using the default model type.
 weka.classifiers.Classifier[] loadWekaModels(int modelIndex, boolean selfModel, boolean stdModels)
          Loads saved Weka models in memory for all personality dimensions.
static void main(java.lang.String[] args)
          Main method that initializes the parameters from the configuration file, counts the features from the input text(s), runs the specified Weka models on this feature set for each of the Big Five personality traits, and prints the personality score estimates to standard output.
 void printOutput(weka.classifiers.Classifier[] models, double[] scores, int modelIndex, boolean printModels, boolean self, java.io.PrintStream out)
          Prints personality scores to standard output, and model details if required.
 void printOutput(weka.classifiers.Classifier[] models, java.util.Map<java.io.File,java.lang.Double[]> scores, int modelIndex, boolean printModels, boolean self, java.io.PrintStream out)
          Prints personality scores of multiple files to standard output, and model details if required.
 double[] runWekaModels(weka.classifiers.Classifier[] models, java.util.Map<java.lang.String,java.lang.Double> counts)
          Runs each Weka model on a new instance created from the input feature counts, and outputs the resulting personality score.
 void setModel(int modelIndex)
          Sets the default Weka model to load when calling loadWekaModels().
 void setModel(java.lang.String modelDir)
          Sets the default Weka model to load when calling loadWekaModels().
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DIMENSIONS

public static final java.lang.String[] DIMENSIONS
Personality dimension names.


LS

public static final java.lang.String LS
Line separator.


FS

public static final java.lang.String FS
File separator.


DEFAULT_CONFIG_FILE

public static final java.io.File DEFAULT_CONFIG_FILE
Configuration file (default is PersonalityRecognizer.conf in root application directory).

Constructor Detail

PersonalityRecognizer

public PersonalityRecognizer(java.io.File propFile)
Initializes parameters based on configuration file, and loads the LIWC dictionary and the MRC database in memory.

Parameters:
propFile - configuration file in ASCII format (VARIABLE = "VALUE" on each line).
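
The VARIABLE = "VALUE" layout can be read with a few lines of Java. The sketch below is an illustration only: it assumes one assignment per line with the value in double quotes, and it silently skips any line that does not match. How the actual loader treats comments, blank lines, or unquoted values is not documented here. The class ConfSketch and any variable names used with it are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical reader for lines of the form: VARIABLE = "VALUE"
final class ConfSketch {

    // Name, optional whitespace around '=', then a double-quoted value.
    private static final Pattern ASSIGNMENT =
            Pattern.compile("^\\s*([A-Za-z_][A-Za-z0-9_]*)\\s*=\\s*\"([^\"]*)\"\\s*$");

    /** Returns the assignments found, in file order; non-matching lines are ignored. */
    static Map<String, String> parse(Iterable<String> lines) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = ASSIGNMENT.matcher(line);
            if (m.matches()) {
                values.put(m.group(1), m.group(2));
            }
        }
        return values;
    }
}
```

The lines could come from java.nio.file.Files.readAllLines(propFile.toPath(), StandardCharsets.US_ASCII), given the ASCII format stated above.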

PersonalityRecognizer

public PersonalityRecognizer()
Initializes parameters based on the default configuration file (PersonalityRecognizer.conf; see DEFAULT_CONFIG_FILE).

Method Detail

main

public static void main(java.lang.String[] args)
Main method that initializes the parameters from the configuration file, counts the features from the input text(s), runs the specified Weka models on this feature set for each of the Big Five personality traits, and prints the personality score estimates to standard output.

Parameters:
args - set of options and input file(s).

setModel

public void setModel(int modelIndex)
Sets the default Weka model to load when calling loadWekaModels().

Parameters:
modelIndex - the index of the element in the MODEL_DIRS array corresponding to the directory of the model to load.

setModel

public void setModel(java.lang.String modelDir)
Sets the default Weka model to load when calling loadWekaModels().

Parameters:
modelDir - the model subdirectory in the MODEL_DIRS array corresponding to the model to load.

getModelIndex

public int getModelIndex()
Gets the current default model index.

Returns:
index of the default model in the MODEL_NAMES array

getModelIndex

public int getModelIndex(java.lang.String modelDir)
Gets the model index in the MODEL_NAMES array from a string representation.

Parameters:
modelDir - the model subdirectory in the MODEL_DIRS array corresponding to the model to load.
Returns:
the index of the model in the MODEL_NAMES array.

loadWekaModels

public weka.classifiers.Classifier[] loadWekaModels(boolean selfModel,
                                                    boolean stdModels)
Loads saved Weka models in memory for all personality dimensions, using the default model type.

Parameters:
selfModel - if set to true, loads the self-report models.
stdModels - if set to true, loads the standardized models.
Returns:
an array of Weka models (Classifier objects) loaded from each model filename specified in the DIM_MODEL_FILES array.

loadWekaModels

public weka.classifiers.Classifier[] loadWekaModels(int modelIndex,
                                                    boolean selfModel,
                                                    boolean stdModels)
Loads saved Weka models in memory for all personality dimensions.

Parameters:
modelIndex - the index of the element in the MODEL_DIRS array corresponding to the directory of the model to load.
selfModel - if set to true, loads the self-report models.
stdModels - if set to true, loads the standardized models.
Returns:
an array of Weka models (Classifier objects) loaded from each model filename specified in the DIM_MODEL_FILES array.

runWekaModels

public double[] runWekaModels(weka.classifiers.Classifier[] models,
                              java.util.Map<java.lang.String,java.lang.Double> counts)
Runs each Weka model on a new instance created from the input feature counts, and outputs the resulting personality score.

Parameters:
models - array of Weka models (Classifier objects).
counts - mapping of feature counts (Double objects); it must provide a value for every attribute string of the input models.
Returns:
an array containing a personality score for each model.
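
Because counts must provide a value for every attribute the models expect, it can help to validate the map before calling runWekaModels. The sketch below uses plain collections only. How the required attribute names are obtained from the Weka models is left out, and the class FeatureCheck is a hypothetical helper, not part of this API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

// Hypothetical pre-check: which required attribute names have no value?
final class FeatureCheck {

    /** Returns the required names that are absent from counts or mapped to null, in input order. */
    static List<String> missing(Collection<String> required, Map<String, Double> counts) {
        List<String> absent = new ArrayList<>();
        for (String name : required) {
            if (counts.get(name) == null) {
                absent.add(name);
            }
        }
        return absent;
    }
}
```

An empty result means every required attribute has a value. Otherwise the returned names identify the features still to be supplied.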

getFeatureCounts

public java.util.Map<java.lang.String,java.lang.Double> getFeatureCounts(java.lang.String text,
                                                                         boolean relativeOnly)
                                                                  throws java.lang.Exception
Computes the features from the input text (70 LIWC features and 14 from the MRC database).

Parameters:
text - input text.
relativeOnly - if true, absolute count features (WC) are not returned; must be set to false if standardized features are used (corpus analysis mode).
Returns:
mapping from feature name strings to feature counts (Double objects).
Throws:
java.lang.Exception

printOutput

public void printOutput(weka.classifiers.Classifier[] models,
                        double[] scores,
                        int modelIndex,
                        boolean printModels,
                        boolean self,
                        java.io.PrintStream out)
Prints personality scores to standard output, and model details if required.

Parameters:
models - array of Weka models.
scores - array of personality scores to print.
modelIndex - index of the model used in the MODEL_NAMES array.
printModels - if true, prints out a textual representation of the models.
out - output stream.

printOutput

public void printOutput(weka.classifiers.Classifier[] models,
                        java.util.Map<java.io.File,java.lang.Double[]> scores,
                        int modelIndex,
                        boolean printModels,
                        boolean self,
                        java.io.PrintStream out)
Prints personality scores of multiple files to standard output, and model details if required.

Parameters:
models - array of Weka models.
scores - map associating each file to an array of personality scores to print.
modelIndex - index of the model used in the MODEL_NAMES array.
printModels - if true, prints out a textual representation of the models.
out - output stream.

computeScoresOverCorpus

public java.util.Map<java.io.File,java.lang.Double[]> computeScoresOverCorpus(java.io.File dir,
                                                                              weka.classifiers.Classifier[] models,
                                                                              java.io.File outputArffFile)
Runs the models of each personality trait for each file in the directory. Feature values are standardized.

Parameters:
dir - input directory containing multiple text files.
models - models of each Big Five personality trait.
outputArffFile - Weka .arff file to which the feature values and scores are written (null for none).
Returns:
map associating each file with an array of personality scores for each trait.
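
The description above says feature values are standardized over the corpus but does not state the formula. A common choice is the z-score: subtract the corpus mean of a feature and divide by its corpus standard deviation. The sketch below assumes that choice, uses the population standard deviation, and maps a zero-variance feature to 0.0. All three are assumptions rather than documented behaviour, and CorpusStandardizer is a hypothetical class name.

```java
// Hypothetical z-score standardization of one feature across a corpus.
final class CorpusStandardizer {

    /** values[i] is the feature's value for text i; returns z-scores in the same order. */
    static double[] zScores(double[] values) {
        int n = values.length;
        double[] out = new double[n];
        if (n == 0) {
            return out;
        }

        // Corpus mean of the feature.
        double mean = 0.0;
        for (double v : values) {
            mean += v;
        }
        mean /= n;

        // Population standard deviation (divides by n, an assumption).
        double sumSquares = 0.0;
        for (double v : values) {
            sumSquares += (v - mean) * (v - mean);
        }
        double sd = Math.sqrt(sumSquares / n);

        for (int i = 0; i < n; i++) {
            // A constant feature carries no information; map it to 0.0.
            out[i] = (sd == 0.0) ? 0.0 : (values[i] - mean) / sd;
        }
        return out;
    }
}
```

Under this reading, each score depends on the other texts in the directory, which is consistent with corpus analysis mode requiring multiple text files.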