Add your comments and discuss the material in this chapter here!
One aspect of digitization that I notice was not covered here was copyright and intellectual property. This isn’t so much an issue when it comes to things like Jane Eyre, and Project Gutenberg is comprised mostly (if not entirely) of works that have already passed into the public domain, but I know that this issue has been hotly contested with regard to the Google Books Project. It seems to me that digitization is about easing access to information, making it easier and cheaper to find; this is a valuable project, but it does raise issues about ownership. Is digitization changing our conceptions of intellectual property, and if so, how?
Thanks for the great question, Stacy. You’re right on at least two counts: we didn’t really discuss copyright in any substantial way, and yes, of course copyright is a very big deal for digitization — and not always in the most obvious ways!
I tend to get long-winded on this really fascinating and important topic, so let me first answer briefly that, at least in many circles (for example, the Creative Commons community), digitization is indeed changing conceptions of intellectual property — and the digital library world is embracing that whole-heartedly as best it can.
On this same positive side, I’d like to reiterate my belief in the boldly correct original defense of Google Books’ digitization program: that digitizing a book in order to create a rich, full-text index of it is simply a logical next step after creating a rich metadata record for it — and nobody would ever claim that sort of “copying” (of author, title, subjects, tables of contents, abstracts, etc.) was in violation of anything! I do believe that this is a challenge to the old way of looking at intellectual property — but I believe it’s an important and right-thinking challenge.
But a huge number of vested interests disagree, as we’ve seen in the more recent, sad history of the Google Books litigation (first the lawsuit itself, then its attempts at settlement, then the latest rejection of settlement — each step, in my opinion, worse than the previous). Ironically, digitization has caused some old conceptions of copyright and intellectual property to become even more entrenched than they were before.
For example, although you bring up “things like Jane Eyre” as something that doesn’t have copyright issues, in fact — shockingly! — when digitization comes into play, “things like this” are indeed affected, either because our digital collections rely so much on modern editorial interventions (textual scholarship, editing, and even the markup itself, any of which may have a potential claim to copyright protection); or because the digitization of a public-domain work has raised new awareness of that work and brought opportunistic copyright claimants out of the woodwork. Here are two brief examples from my library experience:
1. We’ve hosted a full-text collection of Shakespeare (Oxford ed.) at the Stanford Libraries since about the beginning of time (i.e., for about 20 years — note the pre-historic search interface): https://dlib.stanford.edu:6521/text/shake.html
This work (like the vast majority of SULAIR digital text collections) was licensed from Oxford Univ. Press long before most publishers (or even university libraries) were thinking much about this sort of thing — but about 5 years ago, when word got out that we had this great digital Shakespeare available, Oxford (having misplaced this oddball license agreement) demanded that we take it down immediately! We managed to sort things out, showed them a copy of the license agreement, promised to pay another licensing fee and to restrict access to Stanford users, etc. — but they were very clear that this Oxford edition was *their* intellectual property. And generally — although I’m a strong advocate of fair use and the public domain — I agree with them: they put a lot of thought and work into this digital edition, into correcting it, enhancing it with excellent metadata, and so on (and we saw today how important all that is).
2. About 10 years ago, as part of an early library digitization project, I put up a copy of this lovely 1909 book by Antonio Scarfoglio, “Round the World in a Motor-Car,” about the Great Race (of 1965 slapstick cinema fame):
In the United States, a 1909 publication is clearly, obviously, unambiguously within public-domain territory (which is generally quite unambiguous until 1923). But lo and behold, a few years ago, a very aggressive Italian chap started threatening all sorts of legal action against us for our digitization efforts, claiming that he had purchased exclusive(!) rights to this original work in Italian(!) from the author’s heirs(!), and now hoped to make some money from it; that our 1909 translation was unauthorized(!), and therefore this book should be considered unpublished(!), and thus not subject to public domain status(!), etc., etc. Unbelievable! After lots of (very expensive) legal consultation, Stanford concluded that we are well within our rights to republish this digital work in the U.S., and probably most everywhere else, but — shockingly! — if this very aggressive publisher took us to court (which he certainly threatened to do, repeatedly), there was a slight chance he’d win, especially if that court were in Italy. So for the moment, to be on the safe side, this book is available to the Stanford community alone. I’m proud that our Libraries took this case on, rather than simply capitulating to the absurd demands of a greedy complainant and removing the book altogether — but I’m sad that copyright law, especially internationally, is unclear enough to make it seem too risky for even the very slightly cautious to do the right thing. I don’t think any of this would have happened without digitization.
* Note that, even as I write, some pesky metadata error — surely not Stanford’s; probably Google’s! — is causing the wrong book cover to appear on this catalog record. Click the “online” link to see the real cover, and indeed the entire book.
Thanks for the reply – I only just saw it now. I’m glad that the digital library world is embracing new ways of thinking about copyright. The way we thought about copyright in the 20th century is not the way we thought about it in the 18th or the 19th centuries (and not the way that much of the rest of the world thinks about it either!), so there’s no reason we have to think in that particular way about it moving forward into the 21st. I hope that as we all become more comfortable with sharing data and research through the sorts of applications we talked about today, we’ll also become more comfortable with new conceptualizations of copyright.
Digital libraries and archives have mainly been discussed from the standpoint of what happens to the text. I am curious, though, about how and whether the act of access is also changing. When we look at a text digitally, we leave a fingerprint or a trail of breadcrumbs in a way that we don’t when we’re browsing the shelves in a library — just as we can see how frequently certain words appear, a search engine can see how frequently items are being searched for, what groups of words you particularly have searched for, and what paths are most often followed from text to text. On the one hand, the potential boon to scholarly productivity is great — imagine not having to compile a bibliography by hand, because it happens automatically as you take research notes.
On the other hand, I wonder what the implications are for readerly (or scholarly) privacy. At least in America, traditional libraries (most of them) consider patron privacy and access records sacrosanct. Services like Google Books and ECCO, presumably, do not. Are they collecting footprint information about the sets of books individuals look at, the searches they do and the paths they follow from text to text to improve their services, and, if so, is patron privacy, or integrity of one person’s “original” digital research, a concern they have rules in place to address? I don’t see this so much as a present concern, as something that might become a concern in the future as use of digital libraries becomes more widespread and it becomes more and more routine for scholarship to be done this way.
Both today’s presentation and the Cohen piece, I believe, did a terrific job of approaching the digitization of the humanities and Google’s role in that pursuit. While noting the limitations of digitization (to the computer’s inability to recognize spelling variations I would add its inability to decipher tone or summarize an article), it’s important to appreciate all the good that Google and digital archivists at universities around the globe are doing to facilitate the research of humanities scholars for generations. In my limited research experience, digitized works and the accompanying search engines have proved a blessing, and I cannot imagine historians decades from now not looking back at the analog system with shock at its inefficiencies and busy work. This cause should be celebrated and appreciated, not ridiculed by purist historians and perfectionist scholars focused narrowly on the admittedly prevalent problems yet to be solved.
It’s clear that digitization is a process improving only gradually, and I really appreciated how the presentation on the 8th highlighted some of the definite drawbacks or weaknesses of digitization. I was, however, a little surprised at the lack of attention given to the competition for resources, space, and attention between digitized materials and physical ones. Here at Stanford, a campaign was staged by a sub-committee of the Academic Senate to resist the administration’s suggestions of tearing down Meyer Library and relocating the contents of the East Asian Library to Livermore. This raises not only the as-yet unresolved issue of poor OCR efficacy with non-Roman alphabet texts, but also the perhaps more immediately pragmatic concern of how we should search for information relevant to our research. As I’m about to embark on dissertation fieldwork, I would be very interested to hear any advice on how to minimize the “sublime/fublime” problem when performing searches in digitized archives…that is, if there is any systematic, effective antidote at all?
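One partial antidote to the “sublime/fublime” problem is to search for every spelling that OCR might have produced. The Python sketch below is only an illustration, built on an assumption: that the errors come from the long s (ſ) of older typefaces being read as “f”, and that the long s was not used at the end of a word. The function names long_s_variants and long_s_pattern are hypothetical, not part of any archive’s search tool.

```python
import re
from itertools import product


def long_s_variants(term):
    """Return every spelling of `term` with non-final 's' optionally read as 'f'."""
    last = len(term) - 1
    options = [
        ("s", "f") if ch == "s" and i != last else (ch,)
        for i, ch in enumerate(term)
    ]
    return ["".join(combo) for combo in product(*options)]


def long_s_pattern(term):
    """Compile a regex that accepts s, f, or the long s for each non-final 's'."""
    last = len(term) - 1
    parts = [
        "[sfſ]" if ch == "s" and i != last else re.escape(ch)
        for i, ch in enumerate(term)
    ]
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)


# Variants to paste into a search box that does not support patterns.
print(long_s_variants("sublime"))  # ['sublime', 'fublime']

# Pattern for text you have downloaded and can search locally.
sample = "The fublime and the sublime; a Sublime view"
print(long_s_pattern("sublime").findall(sample))
```

The variant list suits archives that accept only plain keywords, while the pattern suits OCR text you can search yourself. Both will over-match: a search for “same” will also turn up the real word “fame”, so the results still need a human eye.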
I really dug Dr. Worthey’s defense of digitization, and ditto the comment by Nile above. It’s easy to imagine a time in the near future when digital archives, whether in the form of Google Books, university projects or even a Digital Public Library of America (written about in this week’s NYRB), are accepted as de facto resources, like the encyclopedias of yore. And like Nile, I’m keen to see more librarians like Glen, professors and students (like ourselves?) devoted to finding better ways to approach digitization, improving instruction in how to use digital archives and making clearer, calmer arguments on behalf of such endeavors. Digital archives need not replace hard copies, but rather augment their ease of access. Holding a book in hand, with all the tactile metadata imparted from it (in my field—music—e.g. drypoint clef lines, stave perforations, smudge v. notehead verification, etc.), will remain indispensable, I think, but the digital archives will make everything else that much faster, cheaper, smarter.
Beyond the obvious use of metadata and having documents electronically available, programs such as Google’s Ngram provide another possibility – tracking linguistic history. Through programs such as Ngram, research in different fields of interest, such as linguistics and the use of common words throughout the ages, becomes very feasible. This may aid researchers in understanding why the word “woman” has become more popular over the ages while the use of the word “lady” has become less popular. In a few years, technology may even be developed to look at speech or music and at the use of various words and their trends over the years.
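To make the idea of tracking a word over time concrete, here is a minimal Python sketch of the kind of calculation involved: for each year, the share of all words in that year’s texts that match a given word. It is an illustration under stated assumptions, not Google’s actual method. The function name yearly_relative_frequency and the list of (year, text) pairs it expects are hypothetical, and the tokenizer is deliberately crude.

```python
import re
from collections import Counter


def yearly_relative_frequency(documents, word):
    """Map each year to (occurrences of `word`) / (all words) for that year.

    `documents` is an iterable of (year, text) pairs (a hypothetical input format).
    """
    target = word.lower()
    totals = Counter()  # all tokens seen per year
    hits = Counter()    # tokens equal to the target word per year
    for year, text in documents:
        tokens = re.findall(r"[a-z']+", text.lower())
        totals[year] += len(tokens)
        hits[year] += tokens.count(target)
    # Skip years with no tokens to avoid dividing by zero.
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}
```

Dividing by each year’s total matters because far more text survives from recent years, so raw counts alone would make almost every word look as if it were becoming more popular.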
I think digitization also allows new ways of stumbling across texts that change how we read them and what we get out of them. When selecting physical books for an essay on, say, music and modernism, I look at the library’s catalogue and find a few that look likely based on their descriptions as entire volumes. I then have to actually read the books to find what I’m looking for. The index helps shorten this process, but it’s still likely that I will read parts of the book that are not directly related to my project (but sometimes enhance it anyway, or make me rethink certain aspects of it). With Google Books, I can search a term and find exactly where it appears, and read only that section. Though in some cases it would be possible for me to read more, scrolling through text on my laptop is not as pleasurable an experience as flipping through a book; and in some cases, limited access makes it impossible to thoroughly understand the context of a statement in a Google book.
Going off some of the remarks of other commenters, I was surprised that there wasn’t (or maybe I just don’t remember it) more of a discussion of the possible intellectual and, dare I say, ethical consequences of digital libraries. I think that most of us involved in the workshop will agree that digitization should be a supplement, rather than a replacement, to physical books, manuscripts, etc. But I’m not convinced that everyone outside of the humanist portion of the academy shares such sentiments. I think there’s a real danger in the prospect that the administrations of universities and perhaps even the public at large could come to see books as anachronisms not too long from now. Given the faultiness of the current OCR technology, and the slow rate of its improvement, I think that we as practitioners of the humanities could face a major obstacle.
It seems likely that financial resources will increasingly be poured into digitization, rather than improving existing archives or archival techniques. While digitization is no doubt a tremendous tool, it seems of at least equal importance that archival materials be preserved. I do not know to what degree these aspects would have to compete for funding, but it seems likely to me that many university administrations would be more inclined to see their resources going towards digitization. From my experience working in an archive in the state university system, I was struck by the extent to which this was already happening. I, and everyone I was working with, were working on the digitization of the collection, while many of the materials were stored in sub-optimal conditions which I feared would shorten their shelf life. Most troubling was the fact that no one else at the archive seemed particularly troubled by this.
I really appreciated Cohen’s thoughts about Google and information access in the piece that was linked to in the extra reading section. Unlike a significant number of for-profit information/scholarly database companies (ProQuest et al.), Google provides access to a lot of its materials for free. However, Google Books has kept a close hold on OCR text and the kind of source information needed for academic analysis. I definitely share Cohen’s anxiety about a private company being in charge of the digital versions of such a large number of books. How will both scholars and non-scholars get access to this material in the future? How will Google profit? To use material, will you have to be part of the academic elite, or will you simply have to pay? Who will be left out of the digital text revolution?
Generally, I am a huge supporter of all digitization efforts, especially the adventure on which Google has embarked. Google Books is the single most ambitious and useful thing that has been done not only for humanities scholars, but also for any learner. I am really puzzled that such a great effort would lead to such controversies. Copyright should not stand in the way of the progress of human knowledge. If copyright and intellectual property become a hindrance to progress, what is the point of having them?
I am interested in the way digitization has changed our general reading method as well as our research approach. To make full use of the search and locate functions in digitized materials, we are more inclined to decide in advance what we want to find and see, and less used to following what the text gives us as we read. I give two thumbs up to digitization projects, but I think we should remain aware of that impact.
As for the copyright issue: quoting from digitized material in scanned format does not normally pose a problem. Copyright is something that is typically sacred de jure but terrible de facto. If digitization changes our conception of copyright, just let it happen; I would be more than happy to see it happen.
As Kyle asked above – “Who will be left out of the digital text revolution?” – I wonder when or if digitization will become feasible for individuals on a small scale, not just for vast libraries, their parent institutions, and big companies. What about, for example, weekly newspapers? Their historical value likely appeals to small audiences – locals and researchers – but the depth of their information about a particular place, changing over time, could be incredible. At two print newspapers to which I have ties, we are puzzling over how to digitize, store and allow access to our archives – plus how to pay for it all. I wonder how quickly we’ll get there, if ever.
Since I am working on TV interviews and performances, I wanted to ask whether there are digital collections for these kinds of genres as well, and whether there are software tools out there to do a combined analysis of textual and ‘performative’ / visual elements in this kind of data. I am particularly interested in the relation between text and its oral / corporal expression, and I think that modern technology opens up great opportunities for this kind of research.