Digital Humanities as a Universal Language

One of the nice things about the meta-discipline of the digital humanities is that it’s also an international movement. As Carol pointed out in her post on the global digital divide, this is not an easy feat to accomplish. The asymmetrical shape of economic development over the past several centuries has influenced the technological backbone that connects different parts of the world. As a result, digital humanities work tends to mirror the starkly divided, core-periphery dynamic of contemporary globalization. Marginalized regions remain bit players, while the wealthiest countries retain their gravitational pull as the center of life for the academic elite. At the same time, though, I have noticed a sharp increase in the global consciousness and outreach among the digerati. Recent and forthcoming conferences in Australia, England, Canada, Germany, and Switzerland offer proof that the digital humanities need not be conducted along narrow nationalist lines. And the historic HASTAC conference held this year in Lima, Peru, proves that this work does not have to be exclusively Anglo or Eurocentric, either.


The translatability of the digital humanities, its broad and easy appeal across conventional boundaries of ethnicity, class, culture, and nation, is one of its most amazing features. At its most basic level, it functions as a kind of universal language, like HTML or mathematics or heavy metal music. Little kids can do it. Your grandmother can do it. Able-bodied people can do it. Disabled people can do it. Privileged people can do it. Oppressed and marginalized people can do it. Even birds and bees do it. Although there is still a long way to go, I think the digital humanities hold the potential to accomplish that magnificent thing to which the traditional humanities have always aspired, but rarely achieved – a truly comprehensive and inclusive representation of humanity.

As one small contribution to this project, I will offer a completely shameless plug for a one-day conference at Paris Diderot University (Paris 7) in October. Focused on recent digital history projects, the event will bring together practitioners and researchers from the United States and France (and maybe elsewhere) to initiate a dialog. I will present on some of my experiences connecting research and teaching, with a special focus on my experimental digital history course and RunawayCT.org. Constance Schulz, Professor Emerita at the University of South Carolina, will present on her NEH-funded scholarly editing project about two remarkable early American women. Additional details, including a map and schedule, are available here. I think it will be a wonderful opportunity to grow digital history work internationally, and if you happen to be in the area, I hope you will attend.


What is Digital History?

[Image: chalkboard of student responses to "What is Digital History?"]

One thing professional scholars everywhere love to do is to categorize, define, and explain, to erect borders and boundaries and partitions. There is a good reason the word “discipline” is at the heart of the academic industrial complex. It is discipline in both the good, self-control, zen sense and the bad, Michel Foucault sense of the word. There has been much debate over the past several years about whether Digital Humanities, and its subset Digital History, constitutes its own discipline, or whether it is fundamentally trans-disciplinary at its core. And what academic discipline worthy of the name is not essentially trans-disciplinary, anyway? Try as we might to impose categories on living reality, to sort into neat boxes of genus, species, and phylum, reality is not that static. It is constantly evolving, always in motion, always transitioning from one thing to the next.

There are a great many extremely interesting documents and manifestos floating around the web attempting to draw boundaries around the digital humanities, to tie it down, to rein it in and discipline it (in the Foucauldian sense). Jason Heppler’s approach to this problem, which presents a different definition each time the page is refreshed, continually remixing them into infinite combinations, is one of the best I have seen. Digital History, like the Digital Humanities, is a broad camp, capable of accommodating everything from the whimsical Serendip-o-matic to the brutal historiographical battles erupting on the back end of prominent Wikipedia pages. The Promise of Digital History, a Journal of American History roundtable discussion from way back in 2008, is a fair introduction to this particular genre of the digital. A breakdown of the document using Voyant reveals, among other things, a strong emphasis on open access. Together, these two words appear a total of 97 times. Ironically, and perhaps appropriately, the exchange itself is a daunting and hopelessly difficult-to-digest wall of text.
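A count like that is easy enough to reproduce outside Voyant. Here is a rough command-line equivalent, assuming you have saved a plain-text copy of the roundtable under a made-up filename:

# case-insensitive, whole-word matches of "open" or "access", one per line, then counted
grep -oiEw 'open|access' promise_of_digital_history.txt | wc -l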

At the heart of this definitional battle is a fundamental status anxiety. Is Digital History just regular old history plus expensive computers? Is it, as Adam Kirsch argues about digital literary studies, just “fancy reiterations of conventional wisdom”? Or does it represent something new and qualitatively different? When I posed this question to my students this year, it produced some fascinating results.

Being one of those definition-obsessed academics, I always ask my students to unpack what may seem like everyday or familiar terms. What is freedom? What is slavery? What is civil war? What is Africa? What is America? So when tasked with teaching Digital History to a group of undergraduates, I naturally asked them to define what exactly that means. Actually, I first asked them to define History proper, and then we tried to figure out what makes it so different when done digitally. Of course, we were not alone in this endeavor. It is deeply interesting to observe different classes in different parts of the country generate different responses to similar prompts. Our answers, some of which you can see if you click on the chalkboard above, ranged from Cervantes and Foucault to the practical and the public. I suspect that if I had asked my students at the end of the class, after they submitted their final project, they would have added that Digital History is also really hard work. It requires discipline.


Rapid Development for the History Web

This year I was privileged to design and teach an experimental (and somewhat improvisational) course spanning multiple disciplines. It is one of a small number of Digital History courses offered at the undergraduate level in the United States and, to the best of my knowledge, the only course of its kind to require students to conceive, design, and execute an original historical website in a matter of weeks. The course begins with a short overview of the history of computing; the major part deals with current debates and problems confronting historians in the Digital Age. Students read theoretical literature on topics such as the gender divide, big data, and the democratization of knowledge, as well as digital history projects spanning the range of human experience, from ancient Greece to modern Harlem. Guest speakers discussed the complexities of database design and the legal terrain of fair use, open access, and privacy. The complete syllabus is available here.

Unusually for a humanities class, the students engaged in a series of labs to build and test digital literacy skills. This culminated in a final project asking them to select, organize, and interpret a body of original source material. I solicited ideas and general areas of interest for the project and posted a list to the class blog that grew over the course of the semester. Students expressed interest in newspaper databases, amateur history and genealogy, text mining and topic modeling, local community initiatives, and communications, culture, and new media. I thought it was important to find a project that would speak to every student’s interest while not playing favorites with the subject matter. We considered a plan to scan and present an archive of old student and university publications. I thought it was a good idea. On the other hand, it would have involved a lot of time-consuming rote digitization, access to restricted library collections, and sharing of limited scanning facilities.

Ultimately, the students decided to build an interactive database of runaway advertisements printed in colonial and early national Connecticut. This seemed to satisfy every major area of interest on our list and, when I polled the class, there was broad consensus that it would be an interesting experiment. The project grew out of an earlier assignment, which asked students to review websites pertaining to the history of slavery and abolition. It also allowed me to draw on my academic background researching and teaching about runaways. We settled on Connecticut because it is a relatively small state with a small population, as well as home to the nation’s oldest continuously published newspaper. At the same time, it was an important colonial outpost and deeply involved in the slave trade and other forms of unfree labor on a variety of fronts.

Drawing on the site reviews submitted earlier in the term, we brainstormed some ideas for what features would and would not work on our site. The students were huge fans of Historypin, universally acclaimed for both content and interface. So we quickly agreed that the site should have a strong geospatial component. We also agreed that the site should have a focus on accessibility for use in classrooms and by researchers as well as the general public. Reading about History Harvest, OutHistory.org, and other crowdsourced community heritage projects instilled a desire to reach out to and collaborate with local educators. Settling on a feasible research methodology was an ongoing process. Although the project initially focused on runaway slaves, I gently encouraged a broader context. Thus the final site presents ads for runaway children, servants, slaves, soldiers, wives, and prisoners and ties these previously disparate stories into a larger framework. Finally, a student who had some experience with web design helped us to map a work plan for the project based on the Web Style Guide by Patrick Lynch and Sarah Horton.

Since there were students from at least half a dozen different majors, with vastly different interests and skill sets, we needed a way to level the playing field, and specialized work groups seemed like a good way to do this. We sketched out the groups together in class and came up with four: Content, CMS, Outreach, and Accessibility. The Content Team researched the historiography on the topic and wrote most of the prose content, including the transcriptions of the advertisements. They used Readex’s America’s Historical Newspapers database to mine for content and collated the resulting data using shared Google Docs. The CMS Team, composed mostly of computer science majors, focused on building the framework and visual feel for the site. Theoretically they could have chosen any content management system, although I pushed for Omeka and Neatline as probably the best platforms for what we needed to do. The Outreach Team created a twitter feed and a video documentary and solicited input about the site from a wide range of scholars and other professionals. The Accessibility Officer did extensive research and testing to make sure the site was fully compliant with open web standards and licenses.

The group structure had benefits and drawbacks. I tried to keep the system as flexible as possible. I insisted that major decisions be made by consensus and that group members post periodic updates to the class blog so that we could track our progress. Some students really liked it and floated around between different groups, helping out as necessary. I also received criticism on my evaluations from students who felt boxed in and complained that there was too much chaos and not enough communication between the groups. So I will probably rethink this approach in the future. One evaluator suggested that I ditch the collaborative project altogether and ask each student to create their own separate site, but that seems even more chaotic. In my experience, there are always students who want less group work and students who want more, and it is an ongoing struggle to find the right balance for a given class.

The assignment to design and publish an original historical site in a short amount of time, with no budget, almost no outside support, and only a general sense of what needed to be done, is essentially a smaller, limited form of crowdsourcing. More accurately, it is a form of rapid development, in which the transition between design and production is extremely fast and highly mutable. Rapid development has been a mainstay of the technology industry for a while now. In my class, I cited the example of One Week | One Tool, in which a small group of really smart people get together and produce an original digital humanities tool. If they could do that over the course of a single week, I asked, what could an entire class of really smart people accomplish in a month?

The result, RunawayCT.org, is not anything fancy, but it is an interesting proof of concept. Because of the hit-or-miss nature of OCR on very old, poorly microfilmed newspapers, we could not get a scientific sample of advertisements. Figuring out how to properly select, categorize, format, and transcribe the data was no mean feat – although these are exactly the kinds of problems that scholarly history projects must confront on a daily basis. The Outreach Team communicated with the Readex Corporation throughout the project, and their representatives were impressively responsive and supportive of our use of their newspaper database. When the students asked Readex for access to their internal API so that we could automate our collection of advertisements, they politely declined. Eventually, I realized that there were literally thousands of ads, only a fraction of which are easily identified with search terms. So our selection of ads was impressionistic, with some emphasis on chronological breadth and on ads that were especially compelling to us.

Despite the students’ high level of interest in, even fascination with, the content of the ads, transcribing them could be tedious work. I attempted to apply OCR to the ad images using ABBYY FineReader and even digitized some newspaper microfilm reels to create high resolution copies, but the combination of eighteenth-century script and ancient, blurry microfilm rendered OCR essentially useless. Ads printed upside down, faded ink, and text disappearing into the gutters between pages were only a few of the problems with automatic recognition. At some point toward the end, I realized that my Mac has a pretty badass speech-to-text utility built into the OS. So I turned it on, selected the UK English vocabulary for the colonial period ads, and plugged in an old Rock Band mic (which doubles as an external USB microphone). Reading these ads, which are almost universally offensive, aloud in my room was a surreal experience. It was like reading out portions of Mein Kampf or Crania Americana, and it added a new materiality and gravity to the text. I briefly considered adding an audio component to the site, but after thinking about it for a while, in the cold light of day, I decided that it would be too creepy. One of my students pointed out that a popular educational site on runaway slaves is accompanied by the sounds of dogs barking and panicked splashing through rivers. And issues like these prompted discussion about what forms of public presentation would be appropriate for our project.

I purposely absented myself from the site design because I wanted the students to direct the project and gain the experience for themselves. On the other hand, if I had inserted myself more aggressively, things might have moved along at a faster pace. Ideas such as building a comprehensive data set, or sophisticated topic modeling, or inviting the public to participate in transcribing and commenting upon the documents, had to be tabled for want of time. Although we collected some historical maps of Connecticut and used them to a limited extent, we did not have the opportunity to georeference and import them into Neatline. This was one of my highest hopes for the project, and I may still attempt to do it at some point in the future. I did return to the site recently to add a rudimentary timeline to our exhibit. Geocoding took only minutes using an API and some high school geometry, so I assumed the timeline would be just as quick. Boy, was I wrong. To accomplish what I needed, I had to learn some MySQL tricks and hack the underlying database. I also had to make significant alterations to our site theme to get everything to display correctly.
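For readers curious what “an API and some high school geometry” can look like in practice, here is a minimal sketch, not the code we actually used. It geocodes a place name with OpenStreetMap’s Nominatim service (one possibility among many) and projects the result into the Web Mercator coordinates most web base maps expect; it assumes curl, jq, and awk are available, and the place name is only an example.

place="Hartford, Connecticut"                        # example place name
curl -s -A "runawayct-demo" \
  "https://nominatim.openstreetmap.org/search?format=json&limit=1&q=${place// /+}" \
  | jq -r '.[0] | "\(.lat) \(.lon)"' \
  | awk -v place="$place" '{
      pi = 3.141592653589793; r = 20037508.34        # half the Earth circumference in meters
      lat = $1; lon = $2
      x = lon * r / 180                              # longitude scales linearly
      a = (90 + lat) * pi / 360
      y = log(sin(a) / cos(a)) / (pi / 180) * r / 180   # the high school trig part
      printf "%s -> x = %.2f, y = %.2f (EPSG:3857)\n", place, x, y
    }'

Whether the class used this particular service or projection is an assumption; the point is only that the arithmetic is short once an API hands back a latitude and longitude.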

One of the biggest challenges we faced as a class was securing a viable workspace for the project. Technology Services wanted us to use their institutional Omeka site, with little or no ability to customize anything, and balked at the notion of giving students shell access to their own server space. Instead, they directed us to Amazon Web Services, which was a fine compromise, but caused delays getting our system in place and will create preservation issues in the future. As it is now, the site will expire in less than a year, and when I asked, there was little interest in continuing to pay for the domain. I was told saving the site would be contingent on whether or not it is used in other classes and whether it “receives decent traffic.” (Believe it or not, that’s a direct quote.) One wonders how much traffic most student projects receive and what relationship that should bear to their institutional support.

Although not a finely polished gem, RunawayCT.org demonstrates something of the potential of rapid development for digital history projects. As of right now, the site includes almost 600 unique ads covering over half a century of local history. At the very least, it has established a framework for future experimentation with runaway ads and other related content. Several of the students told me they were thrilled to submit a final project that would endure and be useful to the broader world, rather than a hastily-written term paper that will sit in a filing cabinet, read only by a censorious professor. Given all that we accomplished in such a short time span, I can only guess what could be done with a higher level of support, such as that provided by the NEH or similar institutions. My imagination is running away with the possibilities.

Cross-posted at HASTAC


History Leaks

I am involved in a new project called History Leaks. The purpose of the site is to publish historically significant public domain documents and commentaries that are not available elsewhere on the open web. The basic idea is that historians and others often digitize vast amounts of information that remains locked away in their personal files. Sharing just a small portion of this information helps to increase access and draw attention to otherwise unknown or underappreciated material. It also supports the critically important work of archives and repositories at a time when these institutions face arbitrary cutbacks and other challenges to their democratic mission.

I hope that you will take a moment to explore the site and that you will check back often as it takes shape, grows, and develops. Spread the word to friends and colleagues. Contributions are warmly welcomed and encouraged. Any feedback, suggestions, or advice would also be of value. A more detailed statement of purpose is available here.


Combine JPEGs and PDFs with Automator

Like most digital historians, I have a personal computer packed to the gills with thousands upon thousands of documents in myriad formats and containers: JPEG, PDF, PNG, GIF, TIFF, DOC, DOCX, TXT, RTF, EPUB, MOBI, AVI, MP3, MP4, XLSX, CSV, HTML, XML, PHP, DMG, TAR, BIN, ZIP, OGG. Well, you get the idea. The folder for my dissertation alone contains almost 100,000 discrete files. As I mentioned last year, managing and preserving all of this data can be somewhat unwieldy. One solution to this dilemma is to do our work collaboratively on the open web. My esteemed colleague and fellow digital historian Caleb McDaniel is running a neat experiment in which he and his student assistants publish all of their research notes, primary documents, drafts, presentations, and other material online in a wiki.

Although I think there is a great deal of potential in projects like these, most of us remain hopelessly mired in virtual reams of data files spread across multiple directories and devices. A common issue is a folder with 200 JPEGs from some archival box or a folder with 1,000 PDFs from a microfilm scanner. One of my regular scholarly chores is to experiment with different ways to sort, tag, manipulate, and combine these files. This time around, I would like to focus on a potential solution for the last of these tasks. So if, like most people, you have been itching for a way to compile your entire communist Christmas card collection into a single handy document, today is your lucky day. Now you can finally finish that article on why no one ever invited Stalin over to their house during the holidays.

Combining small numbers of image files or PDFs into larger, multipage PDFs is a relatively simple point-and-click operation using Preview (for Macs) or Adobe Acrobat. But larger, more complex operations can become annoying and repetitive pretty quickly. Since I began my IT career on Linux and since my Mac runs on a similar Unix core, I tend to fall back on shell scripting for exceptionally complicated operations. The venerable, if somewhat bloated, PDFtk suite is a popular choice for the programming historian, but there are plenty of other options as well; among them, I’ve found the pdfsplit and pdfcat tools to be especially valuable. At the same time, I’ve been trying to use the Mac OS X Automator more often, and I’ve found that it offers what is arguably an easier, more user-friendly interface, especially for folks who may be a bit more hesitant about shell scripting.
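For comparison, and separate from the Automator workflow below, the same basic job from the shell with PDFtk looks something like this (the filenames are hypothetical):

# concatenate every PDF in the current folder, in name order
pdftk *.pdf cat output combined.pdf

# or pull specific page ranges from specific files
pdftk A=reel1.pdf B=reel2.pdf cat A1-20 B output excerpt.pdf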

What follows is an Automator workflow that takes an input folder of JPEGs (or PDFs) and outputs a single combined PDF with the same name as the containing folder. It can be saved as a service, so you can simply right-click any folder and run the operation within the Mac Finder. I’ve used this workflow to combine thousands of research documents into searchable digests.

Step 1: Open Automator, create a new workflow and select the “Service” template. At the top right, set it to receive selected folders in the Finder.

Step 2: Insert the “Set Value of Variable” action from the library of actions on the left. Call the variable “Input.” Below this, add a “Run Applescript” action and paste in the following commands:

on run {input}
-- "input" is the folder you right-clicked, passed in by the service
tell application "Finder"
-- get the parent (containing) folder of that selection
set FilePath to (container of (first item of input)) as alias
end tell
-- hand the parent folder path back to the workflow
return FilePath
end run

Add another “Set Value of Variable” action below this and call it “Path.” This will establish the absolute path to the containing folder of your target folder for use later in the script. If this is all getting too confusing, just hang in there. It will probably make more sense by the end.

Step 3: Add a “Get Value of Variable” action and set it to “Input.” Click on “Options” on the bottom of the action and select “Ignore this action’s input.” This part is crucial, as you are starting a new stage of the process.

Step 4: Add the “Run Shell Script” action. Set the shell to Bash and pass input “as arguments.” Then paste the following code:

# print only the last path component of the first argument, i.e. the name of the selected folder
echo "${1##*/}"

I admit that I am cheating a little bit here. This Bash command will retrieve the title of the target folder so that your output file is named properly. There is probably an easier way to do this using AppleScript, but to be honest I’m just not that well versed in AppleScript. Add another “Set Value of Variable” action below the shell script and call it “FolderName” or whatever else you want to call the variable – it really doesn’t matter.
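If you are wondering what that parameter expansion actually does, you can test it in Terminal; the path below is just a made-up example:

path="/Users/me/Research/Box_12_Correspondence"   # hypothetical folder
echo "${path##*/}"    # strips everything through the last "/" and prints Box_12_Correspondence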

Step 5: Add another “Get Value of Variable” action and set it to “Input.” Click on “Options” on the bottom of the action and select “Ignore this action’s input.” Once again, this step is crucial, as you are starting a new stage of the process.

Step 6: Add the action to “Get Folder Contents,” followed by the action to “Sort Finder Items.” Set the latter to sort by name in ascending order. This will ensure that the pages of your output PDF are in the correct order, the same order in which they appeared in the source folder.

Step 7: Add the “New PDF from Images” action. This is where the actual conversion of the JPEGs will take place. Save the output to the “Path” variable. If you don’t see this option on the list, go to the top menu and click on View –> Variables. You should now see a list of variables at the bottom of the screen. At this point, you can simply drag and drop the “Path” variable into the output box. Set the output file name to something arbitrary like “combined.” If you want to combine individual PDF files instead of images, skip this step and scroll down to the end of this list for alternative instructions.

Step 8: Add the “Rename Finder Items” action and select “Replace Text.” Set it to find “combined” in the basename and replace it with the “FolderName” variable. Once again, you can drag and drop the appropriate variable from the list at the bottom of the screen. Save the workflow as something obvious like “Combine Images into PDF,” and you’re all set. When you right-click on a folder of JPEGs (or other images) in the Finder, you should be able to select your service. Try it out on some test folders with a small number of images to make sure all is working properly. The workflow should deposit your properly-named output PDF in the same directory as the source folder.
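Incidentally, if you ever want the same result without Automator, ImageMagick can handle the image-to-PDF step in a single line, assuming it is installed (via Homebrew, for example); the folder path here is a made-up example:

folder="$HOME/Research/Box_12_Correspondence"      # hypothetical folder of JPEGs
convert "$folder"/*.jpg "$(dirname "$folder")/$(basename "$folder").pdf"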

To combine PDFs rather than image files, follow steps 1-6 above. After retrieving and sorting the folder contents, add the “Combine PDF Pages” action and set it to combine documents by appending pages. Next add an action to “Rename Finder Items” and select “Name Single Item” from the pull-down menu. Set it to name the “Basename only” and drag and drop the “FolderName” variable into the text box. Lastly, add the “Move Finder Items” action and set the location to the “Path” variable. Save the service with a name like “Combine PDFs” and you’re done.
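The shell equivalent of this PDF service is similarly short with PDFtk; as before, this is only a sketch, and the folder path is hypothetical:

folder="$HOME/Research/Reel_47"                    # hypothetical folder of PDFs
name="$(basename "$folder")"
pdftk "$folder"/*.pdf cat output "$(dirname "$folder")/$name.pdf"

Like the Automator version, this relies on alphabetical filename order to put the pages in sequence.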

This Automator procedure can be modified relatively easily to handle individually selected files rather than entire folders. A folder action worked best for me, though, so that’s what I did. Needless to say, the containing folder has to be labeled appropriately for this to work. I find that I’m much better at properly naming my research folders than I am at naming all of the individual files within them. So, again, this process worked best for me. A lot can go wrong with this workflow. Automator can be fickle, and scripting protocols are always being updated and revised, so I disavow any liability for your personal filesystem. I also welcome any comments or suggestions to improve or modify this process.
