A method traditionally applied to product reviews and marketing, namely sentiment analysis or opinion mining, has recently been adopted for the computational analysis of literary texts (Jockers). In principle, this methodology consists of assigning a positive or negative valence, derived from a “bag of words”, to sentences or words in order to study the progression of sentiment throughout the text. Position in the text stands in for the passage of time and, in novels, for the narrative plot.
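As a minimal sketch of the idea just described, and not the Syuzhet implementation, the following Python snippet scores each sentence by summing the valences of the words it contains and returns the scores in narrative order. The toy lexicon, the naive sentence splitter, and all function names are assumptions made for illustration; an actual study would load a full dictionary such as NRC.

```python
import re

# Hypothetical toy lexicon (assumption): word -> valence.
# A real analysis would load a full sentiment dictionary instead.
TOY_LEXICON = {"love": 1, "happy": 1, "joy": 1, "sad": -1, "death": -1, "fear": -1}


def split_sentences(text):
    """Naive sentence split on '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentence_valence(sentence, lexicon):
    """Bag-of-words score: sum the valence of every known word, ignoring order."""
    tokens = re.findall(r"[^\W\d_]+", sentence.lower())  # runs of Unicode letters
    return sum(lexicon.get(token, 0) for token in tokens)


def sentiment_trajectory(text, lexicon):
    """One score per sentence, in narrative order."""
    return [sentence_valence(s, lexicon) for s in split_sentences(text)]


sample = "She was happy and full of joy. Then came fear. Death made everyone sad."
print(sentiment_trajectory(sample, TOY_LEXICON))  # prints [2, -1, -2]
```

Because the sentence splitter relies on punctuation and the scorer on exact word matches, both are sensitive to the language of the text and of the dictionary.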
As with most digital analysis methodologies and experiments run in recent years, these sentiment analysis dictionaries, workflows, and the corpora used to test results have been developed and conducted in English. On a few occasions, the research even includes works translated into English (Underwood 2019). In most cases, the use of these tools in other languages requires adaptation (Fradejas Rueda).
In this talk, I will show the results of a three-dimensional mid-distance reading of literary texts in Spanish using the Syuzhet package in R. First, I present the analysis of the original text with the available version of the NRC sentiment dictionary. Next, I run the original English dictionary on the same work in its published translation as well as on an unreviewed machine translation. As a point of contrast, I run the same test on a text in English and on its human and machine translations into Spanish. Preliminary tests conducted on La gaviota (1849) by C. Böhl de Faber, Pepita Jiménez (1874) by J. Valera, The Swan of Vilamorta (1885) by E. Pardo Bazán, Frankenstein (1832) by M. Shelley and David Copperfield (1850) by C. Dickens show that results change at the micro level but that these changes do not affect the overall, macro-level narrative plot. Marianela (1878) by B. Pérez Galdós, The Froth (1890) by A. Palacio Valdés, One Hundred Years of Solitude (1967) by G. García Márquez and The Handmaid’s Tale (1985) by M. Atwood, however, show distinct results at both the micro and the distant level in both languages, raising questions such as: Is it sufficient to generate raw translations of English datasets in order to conduct the same tests in Spanish, or should we generate our own datasets and methods? What effect do punctuation norms have on this type of text analysis? How do informal expressions that call for clearly different vocabulary to express the same emotion affect the results of this method? As a consequence, one can ask: how sound is it to use translations when testing methods in English?
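One way to make the contrast between micro-level and macro-level results concrete is to compare two sentiment trajectories at a fine and at a coarse resolution. The sketch below is a hedged illustration in Python: it averages sentence scores into bins and computes a Pearson correlation at each resolution. The two score lists are invented toy values, not results from any of the works named above, the helper names are hypothetical, and the binning is a simple stand-in rather than the smoothing that Syuzhet itself applies.

```python
from math import sqrt


def bin_means(values, n_bins):
    """Average sentence scores into n_bins consecutive slices of narrative time."""
    n = len(values)
    means = []
    for i in range(n_bins):
        start = i * n // n_bins
        end = max((i + 1) * n // n_bins, start + 1)  # never an empty slice
        chunk = values[start:end]
        means.append(sum(chunk) / len(chunk))
    return means


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


# Invented per-sentence scores (assumption), standing in for an original
# text and one of its translations.
original = [1, -1, 2, 0, 1, 1, -2, -1, -3, -1, -2, 0]
translated = [0, 1, 1, 1, 0, 2, -1, -2, -1, -3, 0, -2]

# Micro level: many small bins. Macro level: a few large bins.
micro = pearson(bin_means(original, 12), bin_means(translated, 12))
macro = pearson(bin_means(original, 3), bin_means(translated, 3))
print(f"micro-level agreement: {micro:.2f}")
print(f"macro-level agreement: {macro:.2f}")
```

With these toy values the two trajectories agree far more closely at the coarse resolution than at the fine one, which mirrors the first pattern reported above; a low value at both resolutions would correspond to the second.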
The ultimate goal of this presentation is thus twofold. On the one hand, I show the possibilities of sentiment analysis for literary works in Spanish. More importantly, however, I show the need to break the tools before trusting them: I investigate the implications of relying on translation for text analysis by studying how results differ when a translated version of the sentiment dictionary is applied to original works and when the original dictionary is applied to works translated from other languages.