recognizer
Class LIWCDictionary

java.lang.Object
  extended by recognizer.LIWCDictionary

public class LIWCDictionary
extends java.lang.Object

Interface to the LIWC dictionary, implementing patterns for each LIWC category based on the LIWC.CAT file.

Version:
1.01
Author:
Francois Mairesse, http://www.mairesse.co.uk

Constructor Summary
LIWCDictionary(java.io.File catFile)
          Loads the dictionary from a tab-delimited LIWC dictionary text file (with variable names in the first row).
 
Method Summary
 java.util.Map<java.lang.String,java.lang.Double> getCounts(java.lang.String text, boolean absoluteCounts)
          Returns a map associating each LIWC category with the number of its occurrences in the input text.
static java.lang.String[] splitSentences(java.lang.String text)
          Splits a text into sentences separated by a dot, exclamation point or question mark.
static java.lang.String[] tokenize(java.lang.String text)
          Splits a text into words separated by non-word characters.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

LIWCDictionary

public LIWCDictionary(java.io.File catFile)
Loads the dictionary from a tab-delimited LIWC dictionary text file (with variable names in the first row). Each word category is converted into a regular expression that is a disjunction of all its members.

Parameters:
catFile - dictionary file; it should point to the LIWC.CAT file of the Linguistic Inquiry and Word Count software (Pennebaker & Francis, 2001).
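
The conversion of a category into a disjunction can be illustrated with a minimal sketch. It is not the class's actual implementation: the class name CategoryPatternSketch, the helper names, the word-boundary anchoring, the case-insensitive matching and the treatment of a trailing '*' as a stem wildcard are all assumptions made for illustration.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class CategoryPatternSketch {

    // Builds one case-insensitive pattern that matches any member of a category.
    // Members are joined with '|' so the pattern is a disjunction of all of them.
    static Pattern toPattern(List<String> members) {
        String alternatives = members.stream()
                .map(CategoryPatternSketch::memberToRegex)
                .collect(Collectors.joining("|"));
        return Pattern.compile("\\b(?:" + alternatives + ")\\b", Pattern.CASE_INSENSITIVE);
    }

    // Assumption: a trailing '*' marks a stem that may be followed by any
    // word characters; every other member is matched literally.
    private static String memberToRegex(String member) {
        if (member.endsWith("*")) {
            String stem = member.substring(0, member.length() - 1);
            return Pattern.quote(stem) + "\\w*";
        }
        return Pattern.quote(member);
    }

    // Counts non-overlapping matches of the pattern in the text.
    static int countMatches(Pattern pattern, String text) {
        Matcher matcher = pattern.matcher(text);
        int n = 0;
        while (matcher.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Hypothetical category members, not taken from LIWC.CAT.
        Pattern category = toPattern(List.of("happy", "joy*", "love"));
        System.out.println(countMatches(category, "Joyful people love happy endings."));
    }
}
```

With these hypothetical members the example prints 3, because "Joyful", "love" and "happy" each match one alternative.
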
Method Detail

getCounts

public java.util.Map<java.lang.String,java.lang.Double> getCounts(java.lang.String text,
                                                                  boolean absoluteCounts)
Returns a map associating each LIWC category with the number of its occurrences in the input text. The counts are computed by matching the loaded patterns against the text. Punctuation counts are not produced.

Parameters:
text - input text.
absoluteCounts - if true, also includes counts that aren't relative to the total word count (e.g. the actual word count).
Returns:
a map associating each LIWC category with the percentage of words in the text belonging to it.
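
The difference between relative and absolute counts can be sketched as follows. This is an illustration under stated assumptions, not the method's real code: the class RelativeCountsSketch, the "WC" key for the word count, the PRONOUN category and its pattern, and the definition of a word as a run of word characters are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RelativeCountsSketch {

    private static final Pattern WORD = Pattern.compile("\\w+");

    // Returns, for each category, the percentage of words matching its pattern.
    // When absoluteCounts is true, the raw word count is added as well.
    static Map<String, Double> getCounts(Map<String, Pattern> categories,
                                         String text, boolean absoluteCounts) {
        int totalWords = count(WORD, text);
        Map<String, Double> counts = new LinkedHashMap<>();
        if (absoluteCounts) {
            counts.put("WC", (double) totalWords); // hypothetical key name
        }
        for (Map.Entry<String, Pattern> entry : categories.entrySet()) {
            int hits = count(entry.getValue(), text);
            // Avoid dividing by zero on empty input.
            double percentage = totalWords == 0 ? 0.0 : 100.0 * hits / totalWords;
            counts.put(entry.getKey(), percentage);
        }
        return counts;
    }

    // Counts non-overlapping matches of the pattern in the text.
    private static int count(Pattern pattern, String text) {
        Matcher matcher = pattern.matcher(text);
        int n = 0;
        while (matcher.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Map<String, Pattern> categories = new LinkedHashMap<>();
        // Hypothetical category, not taken from LIWC.CAT.
        categories.put("PRONOUN",
                Pattern.compile("\\b(?:i|you|we)\\b", Pattern.CASE_INSENSITIVE));
        System.out.println(getCounts(categories, "I think we should go", true));
    }
}
```

For the five-word sample text, two words match the hypothetical category, so the sketch reports 40.0 for PRONOUN and, because absoluteCounts is true, 5.0 for the word count.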

tokenize

public static java.lang.String[] tokenize(java.lang.String text)
Splits a text into words separated by non-word characters.

Parameters:
text - text to tokenize.
Returns:
an array of words.
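
Splitting on non-word characters can be approximated with the standard regular expression class \W. The sketch below is an assumption about how such a tokenizer might behave, not the method's actual code; the class name TokenizeSketch and the removal of empty tokens are illustrative choices.

```java
import java.util.Arrays;

final class TokenizeSketch {

    // Splits on runs of non-word characters and drops empty tokens, which
    // String.split can produce when the text starts with a separator.
    static String[] tokenize(String text) {
        return Arrays.stream(text.split("\\W+"))
                .filter(token -> !token.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("Hello, world! It's 2 p.m.")));
    }
}
```

The example prints [Hello, world, It, s, 2, p, m]. In this sketch apostrophes and dots count as separators, so contractions and abbreviations are broken into several tokens.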

splitSentences

public static java.lang.String[] splitSentences(java.lang.String text)
Splits a text into sentences separated by a dot, exclamation point or question mark.

Parameters:
text - text to split into sentences.
Returns:
an array of sentences.
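
Sentence splitting on the three terminators named above can be sketched in a few lines. The class name SplitSentencesSketch, the trimming of surrounding whitespace and the removal of empty fragments are assumptions for illustration; the real method may treat these details differently.

```java
import java.util.Arrays;

final class SplitSentencesSketch {

    // Splits on runs of '.', '!' or '?', trims each fragment and drops
    // fragments that are empty after trimming.
    static String[] splitSentences(String text) {
        return Arrays.stream(text.split("[.!?]+"))
                .map(String::trim)
                .filter(sentence -> !sentence.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitSentences("It works. Does it? Yes!")));
    }
}
```

The example prints [It works, Does it, Yes]. Because every dot is a separator in this sketch, abbreviations such as "p.m." would also be split.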