Personality Recognizer

Personality Recognizer v1.03

Note: unfortunately I currently do not have the time to support or maintain this software.

The Personality Recognizer is a Java command-line application that reads a set of text files and computes estimates of personality scores along the Big Five dimensions (Norman, 1963):

Extraversion
Emotional stability
Agreeableness
Conscientiousness
Openness to experience

The program is based on models analyzed in (Mairesse et al., 2007) that were shown to predict personality scores significantly better than a constant baseline. The program uses a command line interface, and outputs scores on a scale from 1 to 7, e.g. where 7 is strongly extravert. An online web demo is available.

Requirements

First check that the required components are correctly installed:

Java Runtime Environment (JRE) 1.5 or later
Weka 3.4.3
MRC Psycholinguistic Database
Linguistic Inquiry and Word Count 2001 dictionary file (email author for details)

History

Major updates in version 1.03 (24/06/2008):

Added the -a option to write the existing feature values to a Weka arff file. This facilitates training models on new data, as well as the addition of new features. See details.

Major updates in version 1.02 (06/06/2007):

Added a corpus analysis mode (option -d) in which the recognizer estimates the personality of individual text files in a collection of texts by standardizing the feature values over the corpus and running standardized models that can be applied across domains.
Removed 18 LIWC features that didn't generalize well across domain, e.g. School, Job, etc.
Added models of self-assessed personality from written language, while previous models only estimated observed personality from spoken language (option -t).
Added SVM models as a default (support vector machine with linear kernel, i.e. SMOreg)

Download source and binary files

The Personality Recognizer is a Java application that can run on any platform. Instructions for running it are in the installation section.

By downloading the Personality Recognizer, you agree to use this program for non-commercial purpose only, i.e. solely for education or research. Please contact François Mairesse if you want to use it commercially. If you find it useful for your research, please cite (Mairesse et al., 2007) appropriately.

Download the v1.03 Java sources, binaries, and documentation: recognizer-1.0.3.tar.gz (24/06/2008)
Warning: if you're using Winzip, you should save the file on the hard drive before opening it.

Installation and usage instructions

Unpack the archive by keeping the directory structure (e.g. using tar xvzf recognizer-1.0.2.tar.gz or Winzip).
Edit the PersonalityRecognizer.properties file in the root directory appropriately, by specifying the paths to
- the PersonalityRecognizer installation directory
- the mrc2.dct file from the MRC Psycholinguistic Database
- the LIWC.CAT dictionary file from the Linguistic Inquiry and Word Count 2001 tool
You can specificy either Unix or Windows paths, the file PersonalityRecognizer.properties.win is an example configuration file for Windows.
If running under Unix:
- Make the PersonalityRecognizer and compile files executable
- Modify the environment variables in the PersonalityRecognizer shell script in the root directory, including the paths to the Java installation directory and to the Weka jar file.
- The program can then be launched using the PersonalityRecognizer command.
If running under Windows:
- Modify the environment variables in the PersonalityRecognizer.bat batch file in the root directory, including the paths to the Java installation directory and to the Weka jar file.
- The program can then be launched using the PersonalityRecognizer.bat command.

The program takes the following options:

Usage: PersonalityRecognizer [-d] [-m model_number] [-o] [-c] [-t model_type] [-a output_arff_file] -i file|directory
 		-c,--counts		Also outputs feature counts, -d must be disabled

 		-d,--directory		Corpus analysis mode. Input must be a directory with 
                         		multiple text files, features are standardized over 
                         		the corpus and the recognizer outputs a personality 
                         		estimate for each text file.

  		-i,--input       	Input file or directory (required)

 		-m,--model       	Model to use for computing scores (default 4). Options:
 	              				1 = Linear Regression
   	            				2 = M5' Model Tree
              					3 = M5' Regression Tree
              					4 = Support Vector Machine with Linear Kernel (SMOreg)

  		-o,--outputmod   	Also outputs models

  		-t,--type		Selects the type of model to use (default 1). The appropriate
                        		model depends on the language sample (written or 
  					spoken), and whether observed personality (as perceived 
  					by external judges) or self-assessed personality (the 
  					writer/speaker's perception) needs to be estimated from the 
  					text. Options:
  						1 = Observed personality from spoken language
                                		2 = Self-assessed personality from written language
                -a,--arff               In corpus analysis mode, outputs the features of each text into 
                                        a Weka .arff dataset file, together with the predicted scores.
                                        New models can be trained by adding features and replacing the scores
                                        by human estimates. Each line corresponds to a text in the corpus 
                                        indicated by the filename feature.

Given a text file or a directory, the program will output personality scores for the Big Five dimensions at the standard output. Feature counts and a textual representation of the models can be shown using the -c and -o options, respectively. In corpus analysis mode (option -d), the recognizer estimates the scores of each text in the specified directory, and uses standardized models to improve accuracy in the target application domain. By default, the recognizer estimates observed personality using models trained on spoken language data, but the option '-t 2' switches to models of self-assessed personality from written language, i.e. estimating the writer's own perception of him/herself.

For example, the following Unix command computes personality scores (self-report) for each text in the examples directory, using standardized SVM models trained on written language:

PersonalityRecognizer -i examples -d -t 2 -m 4

The output of this command can be found in the file output.txt.

How to add new features to the models using my own corpus?
While the models need to be retrained to include new features, this can be done using the -a and -d options to write the LIWC and MRC feature values of your corpus to a Weka arff file, (as well as the personality score estimates). New features can be included by adding new attributes in the arff file, and making sure that the feature values are standardized over the full corpus. The model can then be trained on your corpus by replacing the personality scores (the last five attributes) by human judgements of each text sample. The Weka Explorer or Experimenter can then be used for comparing various models.
Creation of the example.arff dataset file based on the example corpus:
PersonalityRecognizer -i examples -d -a examples.arff -t 1 -m 1

Directory structure

The program files are organized as follows:

.:                               Application root directory
./PersonalityRecognizer		 Unix program launcher script (to be modified)
./PersonalityRecognizer.bat	 Windows program launcher batch file (to be modified)
./PersonalityRecognizer.properties	 	Configuration file (to be modified)
./PersonalityRecognizer.properties.win 		Example configuration file for Windows
./compile			 Unix shell script for recompiling sources (to be modified)
./compile.bat			 Windows batch file for recompiling sources (to be modified)
./readme.html			 This readme file
./output.txt			 Sample output file (see above)

./bin:
./bin/recognizer:                Java binary files
./src/recognizer/PersonalityRecognizer.class
./src/recognizer/LIWCDictionary.class
./src/recognizer/Utils.class

./doc:				 Javadoc documentation

./examples			 Example corpus (see above)

./lib:				
./lib/attributes-info.arff	 Template Weka ARFF file with all attributes
./lib/commons-cli-1.0.jar	 Apache Jakarta Commons command line interface library
./lib/jmrc.jar			 jMRC - MRC Psycholinguistic Database Java Interface library

./lib/models:			 Model files
./lib/models/obs:		 	 Models of observed personality trained on spoken language
./lib/models/obs/LinearRegression:	 Linear Regression model files (one for each 5 personality traits)
./lib/models/obs/M5P:		 M5' Model Tree model files, e.g.:
./lib/models/obs/M5P/extra.model	 Extraversion model
./lib/models/obs/M5P/ems.model	 Emotional stability model
./lib/models/obs/M5P/agree.model	 Agreeableness model
./lib/models/obs/M5P/consc.model	 Conscientiousness model
./lib/models/obs/M5P/open.model	 Openness to experience model
./lib/models/obs/M5P-R:		 M5' Regression Tree model files
./lib/models/obs/SVM:		 SVM model files
./lib/models/self:		 Models of self-assessed personality trained on written language

./src:				
./src/recognizer:		 		Java source files
./src/recognizer/PersonalityRecognizer.java	main file
./src/recognizer/LIWCDictionary.java 		LIWC dictionary interface
./src/recognizer/Utils.java			static methods library

The program architecture allows the user to easily include external Weka models by adding new directories under lib/models, and modify the model names array in the source code. A model file is required for each personality trait, with attributes matching those of the lib/attributes-info.arff file.

Please contact François Mairesse if you have any question, and feel free to modify the source code as long as you give appropriate credit to the author and cite the paper at the bottom of the page. You can use the compile script to recompile the source files. The online Javadoc documentation contains detailed information about the structure of the program.

Reference

François Mairesse, Marilyn Walker, Matthias Mehl and Roger Moore. Using Linguistic Cues for the Automatic Recognition of Personality in Conversation and Text [PDF] [PS] [BibTeX]. Journal of Artificial Intelligence Research (JAIR), 30, pages 457-500, 2007.

[ Back to homepage ]

Francois Mairesse, 2007 -