Image: © flickr/srqpix CC BY 2.0
GENDER DIFFERENCES IN DISCIPLINARY WRITING
Brian N. Larson WRAB III, 20 February 2014
Université Paris-Ouest Nanterre La Défense
www.Rhetoricked.com @Rhetoricked
Housekeeping
• www.Rhetoricked.com (these slides + some additional)
• Communicate with me: – @Rhetoricked – [email protected]
• Research supported by: – Graduate Research Partnership Program fellowship (U of M CLA) – James I. Brown fellowship
Synonyms (for now)
• I’ll use these words as synonyms for this talk (for reasons explained in another talk) – {sex, gender, Fr. sexe} – {woman, female, feminine} – {man, male, masculine}
Do men and women communicate differently?
• Much work inspired by Robin Lakoff (1975)
• Scholarly and popular works by Deborah Tannen (e.g. 1990) and others
• Much of this research in oral/face-to-face communication
Writing: Process and product
• In writing studies, we can (roughly) divide process and product – Do men and women produce writing using different processes? – Is the writing they produce distinguishable based on author gender?
Previous studies: Process research
• Focus on interpersonal communications in mixed-gender contexts – Lay, 1989; Rehling, 1996; Raign & Sims, 1993; Tong & Klecun, 2004; Wolfe & Alexander, 2005; Brown & Burnett, 2006; Wolfe & Powell, 2006, 2009
Previous studies: Product research
• In technical and professional communication – Sterkel, 1988 (20 stylistic characteristics) – Smeltzer & Werbel, 1986 (16 stylistic and evaluative measures) – Tebeaux, 1990 (quality of responses) – Allen, 1994 (markers of authoritativeness)
• Manual methods, small samples
Enter computational methods
• Natural language processing (NLP)
• Allows processing of large quantities of text data
• Study that attracted my attention – Argamon et al., 2003 – Koppel, Argamon & Shimoni, 2002 – “02/03 Argamon Study”
02/03 Argamon study
• Used 500 published texts from BNC
• Mean 34,000 words (‘tokens’) per text
• Categorized texts by author gender accurately – 82.6% of the time on non-fiction texts – 79.5% of the time on fiction texts
Gender in computer-mediated communication (CMC)
• CMC popular for NLP studies – Data are readily available – Data are voluminous
• Examples – Herring & Paolillo, 2006 (blog posts) – Yan & Yan, 2006 (blog posts) – Argamon et al., 2007 (blog posts) – Rao et al., 2010 (Twitter) – Burger et al., 2011 (Twitter)
Rationale: Why is the question important?
• Lend support to one or more theories of gender – ‘Two cultures’ (Maltz & Borker, 1982) – ‘Standpoint’ (Barker & Zifcak, 1999) – ‘Performative’ (Butler 1993, 1999, 2004) – Others
• Concern that “women’s writing” may be less persuasive (Armstrong & McAdams 2009)
• Sorting out methodological problems, particularly use of gender as a variable
Study design goals
• Overarching: Show the utility of NLP/corpus methods in disciplinary communication research. This matters in light of, e.g., Pakhomov et al. 2008.
• Examine a corpus of texts – All of the same genre – Where we can be confident of single authorship – Where author gender is self-identified
• Analyze them using the same variables (“features”) as the 02/03 Argamon study
• LATER: analyze them using other features
Data collection
• Major writing project at end of first year of law school* – Students address a hypothetical problem (writing in the same ‘genre’, broadly defined) – Students not allowed to collaborate – Plagiarism difficult (but still possible)
• Students self-identified gender**
• 193 texts (mean word tokens = 3,764)
*Law school comes after a 4-year baccalaureate in the U.S. **This study IRB-approved (UMN Study #1202E10685)
Text genre: Memorandum regarding motion to dismiss
• Written to a hypothetical court
• Supporting or opposing a motion before the court
• High-level organization is formulaic
Memorandum Sections
• Caption** • Introduction/summary* • Facts • Legal standard of review* • Argument • Conclusion • Signature block**
* Not always present. **I did not analyze (content is highly formulaic)
Manual Annotation Using GATE
• General Architecture for Text Engineering (Cunningham et al. 2012; 2013)
• Annotation is nondestructive bracketing that allows exclusion of material from analysis
• Annotated and excluded from study – Long quotations – Legal citations – Headings
• Annotated to permit segmentation of samples: – Large sections of text
• About two hours of work for each text in sample
Coding and inter-rater reliability
• Two coders did this work
• Coding guide developed with other legal texts (not the study texts)
• Performed test of inter-rater reliability on 10 (5%) papers
• F-scores satisfactory (for those interested) – Strict = .83 (target > .80) – Lenient = .98 (target > .95) – Average = .91 (target > .90)
Pre-processing
• Exported from GATE in XML
• Used Python and NLTK (Bird et al. 2009) – Stripped sections I am not analyzing – Created a text corpus
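In code, the stripping step might look like the following stdlib-only sketch. The offset-based span representation and the `strip_spans` helper are my illustration, not GATE's or NLTK's actual API:

```python
# Minimal sketch (an assumption, not GATE's export schema): given a
# document's text and the character offsets of annotated spans to exclude
# (long quotations, legal citations, headings), delete those spans.
def strip_spans(text, excluded):
    # Delete from the highest offset down so earlier offsets stay valid.
    for start, end in sorted(excluded, reverse=True):
        text = text[:start] + text[end:]
    return text

cleaned = strip_spans("Facts. See Smith v. Jones. More facts.", [(7, 27)])
# → "Facts. More facts."
```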
Feature (“variable”) selection
• For now, those of the 02/03 Argamon study
• Relative frequencies of – 405 “function words” (I used 429) – 76 BNC parts of speech (I used 45 from the Penn Treebank tagset) – 500 most common part-of-speech trigrams – 100 most common POS bigrams – I can explain variations from Argamon if you have questions
‘Part-of-speech’ tags? ‘Bigrams & trigrams’?
• First, ‘tokenize’ each sentence (automated): – ‘My aunt’s pen is on the table.’ (purple shading represents ‘function’ words) – ‘La plume de ma tante est sur la table.’
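Tokenization of the English example can be sketched with a stdlib-only approximation (the study used NLTK's tokenizer; this regex version is mine, for illustration only):

```python
import re

# Toy tokenizer: split off punctuation and the possessive clitic "'s"
# as separate tokens, as standard English tokenizers do.
def tokenize(sentence):
    return re.findall(r"'s|[A-Za-z]+|[^\w\s]", sentence)

tokens = tokenize("My aunt's pen is on the table.")
# → ['My', 'aunt', "'s", 'pen', 'is', 'on', 'the', 'table', '.']
```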
POS tags
• Then tag the parts of speech (automated)
• I can now calculate relative frequency of function words and POS tags (automated)
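The tagging step might be sketched like this. The lookup table is a toy stand-in for a real automated tagger, though the Penn Treebank tags shown are genuine:

```python
# Toy dictionary-based tagger using Penn Treebank tags (the study used an
# automated tagger with 45 Treebank tags; this lookup table is illustrative).
TAGS = {'My': 'PRP$', 'aunt': 'NN', "'s": 'POS', 'pen': 'NN',
        'is': 'VBZ', 'on': 'IN', 'the': 'DT', 'table': 'NN', '.': '.'}

def tag(tokens):
    # Unknown words default to common noun (NN).
    return [(tok, TAGS.get(tok, 'NN')) for tok in tokens]

tagged = tag(['My', 'aunt', "'s", 'pen', 'is', 'on', 'the', 'table', '.'])
# e.g. ('is', 'VBZ'), ('the', 'DT')
```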
POS bigrams and trigrams
• A bigram or trigram is a 2- or 3-token ‘window’ on the sentence. – Automated calculation
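The sliding window can be sketched as:

```python
# Slide an n-token 'window' across a sequence of POS tags.
def ngrams(seq, n):
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

tags = ['PRP$', 'NN', 'POS', 'NN', 'VBZ', 'IN', 'DT', 'NN', '.']
bigrams = ngrams(tags, 2)   # 8 bigrams for a 9-tag sentence
trigrams = ngrams(tags, 3)  # 7 trigrams
```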
Each student’s text is represented as a ‘vector’
• A series of numerical values expressing each feature (variable), i.e., the relative frequency of: – Function words / total tokens – POS tags / total tokens – Bigrams / total bigrams* – Trigrams / total trigrams* – Automated calculation
*Multiplied by a factor.
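One slice of such a vector can be sketched as follows. The three-word vocabulary is a tiny stand-in for the study's actual feature set, and the n-gram scaling factors are omitted:

```python
from collections import Counter

# Relative frequency of each vocabulary item (function word, POS tag,
# or n-gram) among the items observed in one text.
def relative_freqs(items, vocab):
    counts = Counter(items)
    total = len(items)
    return [counts[v] / total for v in vocab]

tokens = ['my', 'aunt', "'s", 'pen', 'is', 'on', 'the', 'table', '.']
vec = relative_freqs(tokens, ['the', 'on', 'all'])
# 'the' and 'on' each occur once in 9 tokens; 'all' does not occur
```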
Example 1
• Tokens of the function word-type “all” in paper 1007 account for less than 7/100 of 1% of all tokens in that paper.
Example 2
• Bigrams made up of a plural common noun (NNS) followed by a coordinating conjunction (CC) accounted for 1/10 of 1% of bigrams in paper 1009.
Example 3
• Trigrams consisting of a determiner (DT), a past participle (VBN), and a common noun (NN) accounted for nearly 3.7% of the trigrams in paper 1014.
Machine learning algorithm (MLA)
• All based on WEKA implementation (Hall et al. 2009)
• Algorithm that trains on part of the data, learning which features are most useful for categorizing texts
• Then it’s tested on another part of the data, to see how accurate it is
• Repeat x times, using different “slices” of the data (x-fold cross validation)
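The train/test splitting behind x-fold cross validation can be sketched as follows (the study used WEKA's implementation; this stdlib version is mine, for illustration):

```python
# Partition n_items into k folds; each fold serves once as the test set
# while the remaining items form the training set.
def k_folds(n_items, k):
    idx = list(range(n_items))
    size = n_items // k
    for i in range(k):
        test = idx[i * size:(i + 1) * size] if i < k - 1 else idx[i * size:]
        held = set(test)
        train = [j for j in idx if j not in held]
        yield train, test
```

In each fold the MLA trains on `train` and is evaluated on `test`; the k per-fold accuracies are then averaged.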
Evaluation baselines
• Baseline 1—Default category: If all texts are assigned Gender 1, observed agreement would be 104/193 or 53.9%
• Baseline 2—My “gut” target of 70% (given Argamon study correctly categorized 82.6% (non-fiction))
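Baseline 1 is simple arithmetic: assign every text to the majority gender and count the agreement.

```python
# Default-category baseline: 104 of the 193 texts belong to Gender 1.
majority, total = 104, 193
baseline = majority / total
print(round(baseline * 100, 1))  # 53.9
```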
Preliminary results
• These MLAs were unable to classify texts better than the default-category baseline – Bayesian Logistic Regression (53.9%) – Naïve Bayes (49.2%/51.3%) – Voted Perceptron (53.9%)
Preliminary results
• Two performed better than the default-category baseline, but only one statistically significantly – Logistic regression: 60.1%, χ²=7.66 (df=1), p<0.01 – Simple Logistic: 57%, χ²=3.54 (df=1), p>0.05
• But compare to target of 70% (considering Argamon at 82.6% (non-fiction))
(Preliminary) Conclusion
• My preliminary conclusion: With these texts, on these features, these MLAs cannot be said to classify texts successfully based on author gender.
• Limitations/questions – Conclusion not generalizable (even to all law students) – Would other features (lexical, syntactic, discourse-level) distinguish texts by author gender? – Would humans attempting to classify these texts be able to? Based on what characteristics? – Would differences be evident before/after law school?
Possible implications
• If there is a gender-correlated language difference coming into law school, the conventions of legal writing disguise it
• Students adapt their language to the genres in which they communicate
• This supports standpoint and performative theories of gender, but not some psychological theories
• Use of gender as a variable in many of these studies is undertheorized
Implications for NLP in disciplinary writing research
• Thinking of Pakhomov et al. 2008
• PRO: Tools are open-source
• PRO: Techniques are easily learned
• CON: Manual annotation is time-consuming (but may not always be necessary)
• CON: Methods not sufficiently theorized (in many cases)
THANK YOU!
• www.Rhetoricked.com (these slides + some additional)
• Communicate with me: – @Rhetoricked – [email protected]
• Research supported by: – Graduate Research Partnership Program fellowship (U of M CLA) – James I. Brown fellowship
Works cited Allen, J. (1994). Women and authority in business/technical communication scholarship: An analysis of writing... Technical Communication Quarterly, 3(3), 271. Argamon, S., Koppel, M., Fine, J., & Shimoni, A. R. (2003). Gender, genre, and writing style in formal written texts. Text, 23(3), 321–346. Argamon, S., Koppel, M., Pennebaker, J. W., & Schler, J. (2007). Mining the Blogosphere: Age, gender and the varieties of self-expression. First Monday, 12(9). Retrieved from http://firstmonday.org/issues/issue12_9/argamon/index.html Armstrong, C. L., & McAdams, M. J. (2009). Blogs of information: How gender cues and individual motivations influence perceptions of credibility. Journal of Computer-Mediated Communication, 14(3), 435–456. Barker, R. T., & Zifcak, L. (1999). Communication and gender in workplace 2000: creating a contextually-based integrated paradigm. Journal of Technical Writing & Communication, 29(4), 335. Bird, S., Klein, E., & Loper, E. (2009). Natural Language Processing with Python (1st ed.). O’Reilly Media. Brown, S. M., & Burnett, R. E. (2006). Women hardly talk. Really! Communication practices of women in undergraduate engineering classes (pp. T3F1–T3F9). Presented at the 9th International Conference on Engineering Education, San Juan, Puerto Rico: International Network for Engineering Education & Research. Retrieved from http://ineer.org/Events/ICEE2006/papers/3219.pdf Burger, J., Henderson, J., Kim, G., & Zarrella, G. (2011). Discriminating gender on Twitter. Bedford, MA: MITRE Corporation. Retrieved from http://www.mitre.org/work/tech_papers/2011/11_0170/
Butler, J. (1993). Bodies that matter: On the discursive limits of “sex.” New York: Routledge. Butler, J. (1999). Gender trouble. New York: Routledge. Butler, J. (2004). Undoing gender. New York: Routledge. Cunningham, H., Maynard, D., Bontcheva, K., Tablan, V., Aswani, N., Roberts, I., … Peters, W. (2012, December 28). Developing Language Processing Components with GATE Version 7 (a User Guide). GATE: General Architecture for Text Engineering. Retrieved January 1, 2013, from http://gate.ac.uk/sale/tao/split.html Cunningham, H., Tablan, V., Roberts, A., & Bontcheva, K. (2013). Getting more out of biomedical documents with GATE’s full lifecycle open source text analytics. PLoS Computational Biology, 9(2), e1002854. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H. (2009). The WEKA data mining software: An update. SIGKDD Explorations, 11(1), 10–18. Herring, S. C., & Paolillo, J. C. (2006). Gender and genre variation in weblogs. Journal of Sociolinguistics, 10(4), 439–459. Koppel, M., Argamon, S., & Shimoni, A. R. (2002). Automatically categorizing written texts by author gender. Literary and Linguistic Computing, 17(4), 401–412. Lakoff, R. T. (1975/2004). Language and Woman’s Place: Text and Commentaries. (M. Bucholtz, Ed.) (Revised and expanded ed.). New York: Oxford University Press. Lay, M. M. (1989). Interpersonal conflict in collaborative writing: What we can learn from gender studies. Journal of Business and Technical Communication, 3(2), 5–28.
Works cited Maltz, D. N., & Borker, R. (1982). A cultural approach to male-female miscommunication. In J. J. Gumperz (Ed.), Language and social identity (pp. 196–216). Cambridge, U.K.: Cambridge University Press. Pakhomov, S. V., Hanson, P. L., Bjornsen, S. S., & Smith, S. A. (2008). Automatic classification of foot examination findings using clinical notes and machine learning. Journal of the American Medical Informatics Association, 15, 198–202. Raign, K. R., & Sims, B. R. (1993). Gender, persuasion techniques, and collaboration. Technical Communication Quarterly, 2(1), 89–104. Rao, D., Yarowsky, D., Shreevats, A., & Gupta, M. (2010). Classifying latent user attributes in Twitter. In Proceedings of the 2nd International Workshop on Search and Mining User-Generated Contents (pp. 37–44). Toronto, ON, Canada: ACM. Rehling, L. (1996). Writing together: Gender’s effect on collaboration. Journal of Technical Writing and Communication, 26(2), 163–176. Smeltzer, L. R., & Werbel, J. D. (1986). Gender differences in managerial communication: Fact or folk-linguistics? Journal of Business Communication, 23(2), 41–50. Sterkel, K. S. (1988). The relationship between gender and writing style in business communications. Journal of Business Communication, 25(4), 17–38. Tannen, D. (1990/2001). You Just Don’t Understand: Women and Men in Conversation. William Morrow Paperbacks. Tebeaux, E. (1990). Toward an understanding of gender differences in written business communications: A suggested perspective for future research. Journal of Business and Technical Communication, 4(1), 25–43.
Tong, A., & Klecun, E. (2004). Toward accommodating gender differences in multimedia communication. Professional Communication, IEEE Transactions on, 47(2), 118–129. Wolfe, J., & Alexander, K. P. (2005). The computer expert in mixed-gendered collaborative writing groups. Journal of Business and Technical Communication, 19(2), 135–170. Wolfe, J., & Powell, B. (2006). Gender and expressions of dissatisfaction: A study of complaining in mixed-gendered student work groups. Women & Language, 29(2), 13–20. Wolfe, J., & Powell, E. (2009). Biases in interpersonal communication: How engineering students perceive gender typical speech acts in teamwork. Journal of Engineering Education, 98(1), 5–16. Yan, X., & Yan, L. (2006). Gender classification of weblog authors. In AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs (pp. 228–230).
Inter-rater reliability: Strict and lenient
• Coders have to select a span and code it
• What about leading/trailing spaces or punctuation?
• “Strict” means spans are identical; “lenient” means they have the same code but overlap less than 100%
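These definitions can be formalized in a short sketch (my formalization, not the scorer the study used; spans here are hypothetical (start, end, code) triples):

```python
# Classify a pair of annotations as a strict match, a lenient match,
# or no match at all.
def match_type(a, b):
    if a[2] != b[2]:
        return None                      # different codes never match
    if (a[0], a[1]) == (b[0], b[1]):
        return 'strict'                  # identical spans, same code
    if a[0] < b[1] and b[0] < a[1]:
        return 'lenient'                 # same code, partial overlap
    return None

# A trailing character captured by one coder but not the other:
print(match_type((10, 50, 'Heading'), (10, 49, 'Heading')))  # lenient
```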
Inter-rater reliability
• My target F-scores – Strict > .80 – Lenient > .95 – Average > .90
• Actual F-scores – Strict = .83 (target > .80) – Lenient = .98 (target > .95) – Average = .91 (target > .90)
• Manual review showed most lenient matches would have been strict matches but for a missed terminal space or punctuation mark (not affecting this analysis)