Personality Recognizer v1.03
Note: unfortunately I currently do not have the time to support or maintain this software.
The Personality Recognizer is a Java command-line application that reads a set of text files and computes estimates of personality scores along the Big Five dimensions (Norman, 1963):
- Extraversion
- Emotional stability
- Agreeableness
- Conscientiousness
- Openness to experience
The program is based on models analyzed in (Mairesse et al., 2007) that were shown to predict personality scores significantly better than a constant baseline. It outputs scores on a scale from 1 to 7 for each dimension, where, for example, a score of 7 for extraversion means strongly extravert. An online web demo is available.
Requirements
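First check that the required components are correctly installed. The installation instructions below rely on:
- a Java installation
- the Weka jar file
- the mrc2.dct file from the MRC Psycholinguistic Database
- the LIWC.CAT dictionary file from the Linguistic Inquiry and Word Count 2001 tool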
History
Major updates in version 1.03 (24/06/2008):
- Added the -a option to write the existing feature values to a Weka arff file. This facilitates training models on new data, as well as the addition
of new features. See the usage instructions for details.
Major updates in version 1.02 (06/06/2007):
- Added a corpus analysis mode (option -d) in which the recognizer estimates the personality of
individual text files in a collection of texts by standardizing the feature values over the corpus
and running standardized models that can be applied across domains.
- Removed 18 LIWC features that didn't generalize well across domains, e.g. School, Job, etc.
- Added models of self-assessed personality from written language (option -t). Previous models only estimated
observed personality from spoken language.
- Made SVM models the default (support vector machine with linear kernel, i.e. SMOreg).
Download source and binary files
The Personality Recognizer is a Java application that can run on any platform. Instructions for
running it are in the installation section.
By downloading the Personality Recognizer, you agree to use this program for non-commercial purposes
only, i.e. solely for education or research. Please contact François Mairesse if you want to use it commercially. If
you find it useful for your research, please cite (Mairesse et al., 2007) appropriately.
- Download the v1.03 Java sources, binaries, and documentation:
recognizer-1.0.3.tar.gz (24/06/2008)
Warning: if you're using Winzip, you should save the file on the hard drive before opening it.
Installation and usage instructions
- Unpack the archive, preserving the directory structure (e.g. using tar xvzf recognizer-1.0.3.tar.gz or Winzip).
- Edit the PersonalityRecognizer.properties file in the root directory, specifying the paths to:
  - the PersonalityRecognizer installation directory
  - the mrc2.dct file from the MRC Psycholinguistic Database
  - the LIWC.CAT dictionary file from the Linguistic Inquiry and Word Count 2001 tool
You can specify either Unix or Windows paths. The file PersonalityRecognizer.properties.win is an
example configuration file for Windows.
- If running under Unix:
  - Make the PersonalityRecognizer and compile files executable.
  - Modify the environment variables in the PersonalityRecognizer shell script in the root directory, including the paths to the Java installation directory and to the Weka jar file.
  - The program can then be launched using the PersonalityRecognizer command.
- If running under Windows:
  - Modify the environment variables in the PersonalityRecognizer.bat batch file in the root directory, including the paths to the Java installation directory and to the Weka jar file.
  - The program can then be launched using the PersonalityRecognizer.bat command.
- The program takes the following options:
Usage: PersonalityRecognizer [-d] [-m model_number] [-o] [-c] [-t model_type] [-a output_arff_file] -i file|directory
-c,--counts Also outputs feature counts, -d must be disabled
-d,--directory Corpus analysis mode. Input must be a directory with
multiple text files, features are standardized over
the corpus and the recognizer outputs a personality
estimate for each text file.
-i,--input Input file or directory (required)
-m,--model Model to use for computing scores (default 4). Options:
1 = Linear Regression
2 = M5' Model Tree
3 = M5' Regression Tree
4 = Support Vector Machine with Linear Kernel (SMOreg)
-o,--outputmod Also outputs models
-t,--type Selects the type of model to use (default 1). The appropriate
model depends on the language sample (written or
spoken), and whether observed personality (as perceived
by external judges) or self-assessed personality (the
writer/speaker's perception) needs to be estimated from the
text. Options:
1 = Observed personality from spoken language
2 = Self-assessed personality from written language
-a,--arff In corpus analysis mode, outputs the features of each text into
a Weka .arff dataset file, together with the predicted scores.
New models can be trained by adding features and replacing the scores
by human estimates. Each line corresponds to a text in the corpus
indicated by the filename feature.
Given a text file or a directory, the program writes personality scores for the Big Five
dimensions to standard output. Feature counts and a textual representation of the models can be
shown using the -c and -o options, respectively. In corpus analysis mode (option -d), the recognizer
estimates the scores of each text in the specified directory, and uses standardized models
to improve accuracy in the target application domain. By default, the recognizer estimates observed
personality using models trained on spoken language data. The option '-t 2' switches to models
of self-assessed personality from written language, i.e. it estimates the writer's own
self-perception.
For example, the following Unix command computes personality scores (self-report) for each text in
the examples directory, using standardized SVM models trained on written language:
PersonalityRecognizer -i examples -d -t 2 -m 4
The output of this command can be found in the file output.txt.
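The option constraints described above can be checked before the launcher is called. The following is only a minimal sketch: the RecognizerArgs class and its build method are hypothetical and not part of the distribution, and the rule that -a requires -d is an assumption drawn from the usage text, which describes -a only for corpus analysis mode.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of the Personality Recognizer distribution):
 * builds the argument list for the launcher and rejects option combinations
 * that the usage text rules out.
 */
public class RecognizerArgs {

    public static List<String> build(String input, boolean corpusMode, boolean counts,
                                     int model, int type, String arffFile) {
        if (input == null || input.isEmpty()) {
            throw new IllegalArgumentException("-i (input file or directory) is required");
        }
        if (model < 1 || model > 4) {
            throw new IllegalArgumentException("-m must be between 1 and 4");
        }
        if (type < 1 || type > 2) {
            throw new IllegalArgumentException("-t must be 1 or 2");
        }
        if (counts && corpusMode) {
            throw new IllegalArgumentException("-c requires -d to be disabled");
        }
        // Assumption: -a is only meaningful in corpus analysis mode.
        if (arffFile != null && !corpusMode) {
            throw new IllegalArgumentException("-a assumes corpus analysis mode (-d)");
        }

        List<String> args = new ArrayList<>();
        args.add("PersonalityRecognizer");
        args.add("-i");
        args.add(input);
        if (corpusMode) args.add("-d");
        if (counts) args.add("-c");
        args.add("-t");
        args.add(String.valueOf(type));
        args.add("-m");
        args.add(String.valueOf(model));
        if (arffFile != null) {
            args.add("-a");
            args.add(arffFile);
        }
        return args;
    }

    public static void main(String[] unused) {
        // Reproduces the example command from the usage instructions.
        System.out.println(String.join(" ", build("examples", true, false, 4, 2, null)));
        // prints: PersonalityRecognizer -i examples -d -t 2 -m 4
    }
}
```

The resulting list could be passed to java.lang.ProcessBuilder to run the launcher script from another Java program.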
- How can I add new features to the models using my own corpus?
The models need to be retrained to include new features.
To do this, use the -a and -d options to write the LIWC and MRC feature values of your corpus
(together with the personality score estimates) to a Weka arff file. New features can be included
by adding new attributes to the arff file, making sure that their values are standardized over the full corpus.
The models can then be trained on your corpus
by replacing the personality scores (the last five attributes) with human judgements of each text sample.
The Weka Explorer or Experimenter can then be used to compare various models.
The following command creates the examples.arff dataset file from the example corpus:
PersonalityRecognizer -i examples -d -a examples.arff -t 1 -m 1
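New attributes added to the arff file need to be standardized over the full corpus. The sketch below shows one way to do this in plain Java. It assumes that "standardized" means z-scores (zero mean, unit standard deviation per feature) and uses the sample standard deviation; the recognizer's own procedure may differ. The FeatureStandardizer class is hypothetical and not part of the distribution.

```java
import java.util.Arrays;

/**
 * Hypothetical helper: standardizes each feature column over a corpus.
 * Rows are texts, columns are features.
 */
public class FeatureStandardizer {

    public static double[][] standardize(double[][] values) {
        int rows = values.length;
        if (rows == 0) {
            return new double[0][];
        }
        int cols = values[0].length;
        double[][] z = new double[rows][cols];

        for (int c = 0; c < cols; c++) {
            // Mean of the feature over all texts in the corpus.
            double sum = 0.0;
            for (double[] row : values) {
                sum += row[c];
            }
            double mean = sum / rows;

            // Sample standard deviation (n - 1 in the denominator); an assumption.
            double squares = 0.0;
            for (double[] row : values) {
                squares += (row[c] - mean) * (row[c] - mean);
            }
            double std = rows > 1 ? Math.sqrt(squares / (rows - 1)) : 0.0;

            // Constant features carry no information, so they are mapped to 0.
            for (int r = 0; r < rows; r++) {
                z[r][c] = std == 0.0 ? 0.0 : (values[r][c] - mean) / std;
            }
        }
        return z;
    }

    public static void main(String[] args) {
        double[][] corpus = {{1.0, 5.0}, {2.0, 5.0}, {3.0, 5.0}};
        System.out.println(Arrays.deepToString(standardize(corpus)));
        // prints: [[-1.0, 0.0], [0.0, 0.0], [1.0, 0.0]]
    }
}
```

The standardized values can then be written into the new attribute columns of the arff file before training in Weka.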
Directory structure
The program files are organized as follows:
.: Application root directory
./PersonalityRecognizer Unix program launcher script (to be modified)
./PersonalityRecognizer.bat Windows program launcher batch file (to be modified)
./PersonalityRecognizer.properties Configuration file (to be modified)
./PersonalityRecognizer.properties.win Example configuration file for Windows
./compile Unix shell script for recompiling sources (to be modified)
./compile.bat Windows batch file for recompiling sources (to be modified)
./readme.html This readme file
./output.txt Sample output file (see above)
./bin:
./bin/recognizer: Java binary files
./bin/recognizer/PersonalityRecognizer.class
./bin/recognizer/LIWCDictionary.class
./bin/recognizer/Utils.class
./doc: Javadoc documentation
./examples Example corpus (see above)
./lib:
./lib/attributes-info.arff Template Weka ARFF file with all attributes
./lib/commons-cli-1.0.jar Apache Jakarta Commons command line interface library
./lib/jmrc.jar jMRC - MRC Psycholinguistic Database Java Interface library
./lib/models: Model files
./lib/models/obs: Models of observed personality trained on spoken language
./lib/models/obs/LinearRegression: Linear Regression model files (one for each of the 5 personality traits)
./lib/models/obs/M5P: M5' Model Tree model files, e.g.:
./lib/models/obs/M5P/extra.model Extraversion model
./lib/models/obs/M5P/ems.model Emotional stability model
./lib/models/obs/M5P/agree.model Agreeableness model
./lib/models/obs/M5P/consc.model Conscientiousness model
./lib/models/obs/M5P/open.model Openness to experience model
./lib/models/obs/M5P-R: M5' Regression Tree model files
./lib/models/obs/SVM: SVM model files
./lib/models/self: Models of self-assessed personality trained on written language
./src:
./src/recognizer: Java source files
./src/recognizer/PersonalityRecognizer.java main file
./src/recognizer/LIWCDictionary.java LIWC dictionary interface
./src/recognizer/Utils.java static methods library
The program architecture allows the user to easily include external Weka models by adding new directories under lib/models and modifying the model names array in the source code. A model file is required for each personality trait, with attributes matching those of the lib/attributes-info.arff file.
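Before editing the model names array, it can help to confirm that a new model directory contains one file per trait. The sketch below is a hypothetical check, not part of the distribution. It assumes that every model directory uses the same five file names listed above for lib/models/obs/M5P, and the directory name MyModel is only an example.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: reports which per-trait model files are missing
 * from a model directory.
 */
public class ModelDirectoryCheck {

    // File names taken from the lib/models/obs/M5P listing; assumed to be
    // the same in every model directory.
    static final String[] TRAIT_FILES = {
        "extra.model", "ems.model", "agree.model", "consc.model", "open.model"
    };

    public static List<String> missingModels(File modelDir) {
        List<String> missing = new ArrayList<>();
        for (String name : TRAIT_FILES) {
            if (!new File(modelDir, name).isFile()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // "lib/models/obs/MyModel" is an example path for a user-added model.
        File dir = new File(args.length > 0 ? args[0] : "lib/models/obs/MyModel");
        List<String> missing = missingModels(dir);
        if (missing.isEmpty()) {
            System.out.println("All five trait models found in " + dir);
        } else {
            System.out.println("Missing model files in " + dir + ": " + missing);
        }
    }
}
```

This only checks that the files exist. Whether each model's attributes match lib/attributes-info.arff still has to be verified in Weka.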
Please contact François Mairesse if you have any questions,
and feel free to modify the source code as long as you give appropriate credit to the author and
cite the paper at the bottom of the page. You
can use the compile script to recompile the source files. The online Javadoc documentation contains detailed information about the structure of the program.
Reference
François Mairesse, 2007 -