I was told by a co-term student in computational science that no program exists for discerning tone in an article. Also lacking is any summarization method. If an effective program is invented for determining tone and summarizing a text, what are some possible benefits to the humanities? I would think they would be useful in dividing texts by partisanship and providing scholars a quick way to determine whether an unexamined text is useful to their research. Also, is there coordination between the computer scientists who are inventing these forms of text analysis and humanities scholars? Or are only a select few who possess expertise in both fields furthering this research?
“Tone” is hard to define – there have been quite a few efforts to determine “perspective,” which is a little less subjective. Lin et al.’s paper “Which Side Are You On?” is a good read in that regard. I think the challenge of natural language processing makes it a perennially good candidate for computer science research, but the implications for humanities scholars are evident.
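To give a rough sense of what that kind of perspective classification involves in practice, here is a minimal sketch (using scikit-learn with invented toy data, not the actual method or corpus from Lin et al.): given documents already labeled by side, a simple bag-of-words classifier learns which vocabulary tends to mark each perspective.

    # Minimal sketch of classifying documents by perspective/side.
    # The toy texts and labels below are invented for illustration; a real study
    # would need a properly labeled corpus.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    texts = [
        "the tax cuts will unleash growth and reward hard work",
        "the tax cuts are a giveaway to the wealthy",
        "lower taxes let families keep more of what they earn",
        "slashing taxes starves schools and hospitals of funding",
    ]
    labels = ["pro", "con", "pro", "con"]  # invented perspective labels

    model = make_pipeline(TfidfVectorizer(), MultinomialNB())
    model.fit(texts, labels)
    print(model.predict(["cutting taxes rewards work"]))

With four toy documents the prediction means little; the point is only the shape of the task – labeled sides, a word-based representation, and a classifier.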
I think it’s important to consider that, especially for literary studies, metadata is given its significance by our understanding of how such words tend to be used in books that we actually have read. An overabundance of prepositions means something about style, but to understand what that style is doing we do have to actually read a book written in it. Metadata could be really useful in identifying stylistic similarities across unexpected books, revealing unexpected connections–but it’s something that will be difficult to entirely detach from traditional reading.
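To make the preposition example a bit more concrete, here is a rough sketch (using NLTK, purely as an illustration of the kind of counting involved) of measuring how preposition-heavy a text’s style is:

    # Sketch: share of word tokens tagged as prepositions, as a crude style signal.
    # Requires NLTK's tokenizer and tagger models to be installed (e.g. via nltk.download).
    import nltk

    def preposition_rate(text):
        tokens = [t for t in nltk.word_tokenize(text) if t.isalpha()]
        tagged = nltk.pos_tag(tokens)  # Penn Treebank tags; "IN" marks prepositions
        preps = sum(1 for _, tag in tagged if tag == "IN")
        return preps / len(tokens) if tokens else 0.0

    print(preposition_rate("Out of the frying pan and into the fire, in the dead of night."))

A number like this only becomes meaningful when compared across many books, which is exactly the point above: the statistic tells you that something is distinctive, but not what that style is doing.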
This is a broadening of Elspeth’s comment, but one concern–or theme–that I have been wondering about since the start of the workshop is the philosophical and practical relationship between the human operator(s) and the digitization tools. In our discussions of OCR and text analysis, this issue has loomed especially large: it seems that ultimately much of the most fundamental work, e.g. setting the parameters for what counts as a particular “style” of writing when assaying a large body of fiction, still has to be done manually. I’d love to read or hear more about the viewpoints no doubt already abundantly put forth by scholars (both in humanities fields and in computing) on the man-machine interface, the limitations and potential therein, and what the future might bring in changes to that relationship.
What are some of the questions involved in the revelation that many books are most similar to their contemporaries? When the example of Moby Dick was presented, he said it opened a “Pandora’s box of questions,” and I did not quite follow.
I think Professor Nunberg’s piece on Google Books and metadata raised a number of important points of concern that merit more attention. What are the effects on scholarly work based in a monopolized digital library? How can we ensure accountability under such a system? Even if Google is able to resolve many of the problems with its collection’s metadata, does such a monopoly inherently threaten those pursuing objective scholarship? It seems to me that while a digital library provides incredible opportunities for studying a much larger corpus of work than ever before possible, it simultaneously presents greater risks, both in terms of its management and the methods used in studying such a vast amount of data.
Matthew Jockers’ talk on this subject was totally fascinating. The big paradox I’m still struggling to get my head around is the fact that “distant reading,” in a way, involves us reading more closely than ever — because it is founded, ultimately, on searches for specific words. In this way digital humanities seem to encourage a turn back to the philological, and away from more contextual, formalist, and narratological approaches. More than anything, I’m reminded of classical philology, where the entire history of a word across the corpus of, say, Latin poetry, was assumed to underlie a single use of it by Virgil. I don’t have any strong feelings about this, but I am curious to what extent scholars might feel their research agendas and questions being constructed to “fit” the tools available. That is, does the availability of digital humanities tools make them adjust their questions so that they have more to do with finding very specific language than they otherwise would?
I’m curious about the actual technological infrastructure that allows the kind of textual analysis shown in the workshop to go on (I suppose my question relates to the series as a whole, but last week’s session got me to thinking about it). I am intrigued by the analysis that Dr. Jockers displayed, but to what extent are scholars in control of the technology they use to do their research? It seemed that Dr. Jockers was very proficient in writing code, and therefore had a lot of intellectual and creative control over what the computer could do for him. But given that these tools are meant for humanists–many of whom do not have rigorous training in coding–how much of the software creation process is and will remain subject to the control of a handful of scholars and emerging corporations that build programs for such analysis? If such a consolidation of intellectual power occurs, won’t there be a significant constriction of the kinds of studies that scholars can conduct?
One of the “big data” tools talked about in lecture that instinctively sort of raised my hackles was topic modeling. While I can see how useful it is in broadly determining topics that a text or number of texts is interested in, I also really don’t like the idea of documents as “bags of words,” where word order and syntax are ignored. Isn’t how words are put together in a text one of its most important features? The fact that a topic cloud ignores this aspect isn’t such a big deal when it’s being used for an individual text, because you’ll probably go back and look at the topics in context. But, when topic modeling with a large “bag” of texts, I feel like important subtleties in usage and concept linkage could be missed.
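For what it’s worth, the “bag of words” step really is that literal. A minimal sketch (scikit-learn, with toy documents invented here) shows that word order is discarded at the counting stage, before the topic model ever sees the texts:

    # Sketch of bag-of-words topic modeling (LDA): word order is thrown away
    # at the vectorization step, before the model is fit.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    docs = [
        "the whale breached beside the ship and the crew gave chase",
        "the harpoon struck and the whale sounded beneath the ship",
        "the estate was entailed and the sisters worried over marriage",
        "an eligible gentleman let the neighbouring estate for the season",
    ]

    vectorizer = CountVectorizer(stop_words="english")
    X = vectorizer.fit_transform(docs)  # rows = documents, columns = word counts only

    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    lda.fit(X)

    terms = vectorizer.get_feature_names_out()
    for i, topic in enumerate(lda.components_):
        top = [terms[j] for j in topic.argsort()[-5:][::-1]]
        print(f"topic {i}: {', '.join(top)}")

With only four toy documents the topics themselves will not be meaningful; the point is that nothing about syntax or word order survives past the counting step.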
The textual analysis session is a large part of the reason I signed up for the Digital Humanities course to begin with, because I felt I had heard a lot about it, but usually only from people who were opposed to the method for various reasons. I think I’m interested in how these methods that Matt talked about during his talk may be combined with more traditional methods of literary analysis. I think that in the midst of this proliferation of methods, it’s important that we not forget who we are and what we’re actually here to do. In that vein, one of the more compelling arguments that I’ve heard *against* such analysis is that many very good humanists would make very bad scientists, and we must be wary of trying to pass ourselves off as a field as more “scientific” than we actually are, a trend which has, in the past, not led anywhere good. As humanists, we are interested in what it means to be human, and that is where we stand to make contributions. I think it’s important to remember that when doing these sorts of projects, and to ask ourselves how they contribute to such an understanding of ourselves.
I do think distant reading is a valid and powerful tool, not only for detecting structural information in huge amounts of data, but also as a way of opening new lines of inquiry. It is a useful technique that we, as humanists, have yet to master. It is very different from what we are trained to do, like critically analyzing and contextualizing particular texts. The problem is: how can we systematically improve our disciplinary training to incorporate training in distant reading? Or is such training even necessary? Moreover, should this be a skill that all humanists command? Or should it be the job of just a handful of curious people?
I am very interested in Stacy’s comments above, but I think just the opposite: for a long time the world of the humanities has not been too open to scientific “colonization”; it is rather too snobbish about science, wrongheadedly adhering to what the “humanities” are thought to be. As far as my field is concerned, I have heard people talking about the new methods, but hardly any of them are really employing them. My point is that we are too often too attached to our humanistic self-image to stay open-minded about the changes in the current world. This contributes to the crisis in literary studies as a field, while we only complain that it is the government that always wants to dismiss us by cutting budgets.
I do find it hard to see how textual analysis can be applied equally well to non-alphabetic languages and literatures. I work on Chinese literature, so I have been thinking about the possibility of doing the same thing with, say, late imperial novels in China. I would love to know more if anyone knows of similar projects going on.
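One technical note that may be relevant here: before any of these bag-of-words techniques can be applied to Chinese, the text has to be segmented into words, since there are no spaces to split on. A rough sketch using the jieba segmenter (named only as one commonly used option; its default dictionary targets modern Chinese, so late imperial prose would likely need a period-specific dictionary, which is assumed away here):

    # Sketch: segmenting Chinese text into words before counting them.
    import jieba
    from collections import Counter

    text = "话说天下大势，分久必合，合久必分。"  # opening line of Romance of the Three Kingdoms
    tokens = [t for t in jieba.lcut(text) if t not in "，。"]
    print(tokens)
    print(Counter(tokens).most_common(3))

Once segmentation is in place, the same frequency-based and topic-modeling approaches become available, though how well they travel to that material is an open question.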
To respond to Nile’s comment: though the co-term student may be right that there exists no program to discern tone in an article, there are many other factors that can help discern a piece of writing’s tone. As Elspeth mentions, we can look at how words are used, such as her example of an overabundance of prepositions, or even analyze the lexical categories and the types of words used within them. Saying a “bad day” versus a “somber day”: while both carry negative connotations, each implies a different tone. I agree with Erik that this “distant reading” is in fact very, very close reading. It reminds me of the SAT, where you need to “analyze” each adjective or verb to decide the tone of the passage, and why the author chose that word.
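A toy sketch of that “bad day” vs. “somber day” intuition, using a tiny hand-made connotation lexicon (the lexicon below is invented purely for illustration; real work would draw on a published sentiment or affect lexicon):

    # Toy sketch: two words can share a polarity ("negative") yet carry different tones.
    # The lexicon is invented for illustration only.
    TONE_LEXICON = {
        "bad":    {"polarity": "negative", "tone": "dismissive"},
        "somber": {"polarity": "negative", "tone": "mournful"},
        "awful":  {"polarity": "negative", "tone": "dismissive"},
        "grave":  {"polarity": "negative", "tone": "mournful"},
    }

    def describe(phrase):
        return [(w, TONE_LEXICON[w]) for w in phrase.lower().split() if w in TONE_LEXICON]

    print(describe("a bad day"))     # same polarity as below...
    print(describe("a somber day"))  # ...but a different tone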
Taking up Stacy’s question about the ‘humanistic’ dimension of the digital humanities, I was wondering whether the projects of digital text analysis discussed in this session don’t have an inherent tendency to develop into sociological studies rather than generating ‘genuine’ literary studies.
Most of the projects we looked at in the workshop – authorship attribution, genre classification, the writing of eastern vs. western Irish-Americans, Mapping the Republic of Letters – seem to produce results that speak to the sociological aspects of literature over time or across space. That is great, but sometimes I think these aspects might be a bit further removed from the actual text and from the essential questions.
I might be wrong, but it seems to me that they can hardly help answer the question of why a certain book or a certain style became formative, or why certain aspects of literature are still interesting and exciting for us today even though these books and thoughts might be hundreds of years old. Can these digital tools help us explain what ‘beauty’ is or might be? Or why these books should be read?