Visualizing Networks in Contemporary Poetry
I've thought for a while now that some fascinating information might be gleaned from visualizing networks in contemporary poetry, so I've begun compiling a database of poets and their connections. I've started by focusing on my own library of contemporary poetry, which is extensive, to say the least. (Do I have a problem? Do I buy too many books? Well, if you've got to have an addiction...)
As anyone who works with data can tell you, the labor lies in compiling and cleaning the data; doing stuff with it is really the payoff. I'm working on this on my own, in my dark little office, while listening to old Coltrane recordings. Here's what I've done so far:
I went through my library and photographed the covers and acknowledgement pages of every book that is less than 10 years old. I did the same with a few friends' libraries and with the library here at Hood, which has a small but surprisingly well-curated selection of contemporary poetry (thank Hood's professor of poetry, Elizabeth Knapp, and its poetry-loving research librarian, Aimee Gee, for that). Then I transcribed the info I captured in the photos - poet, book, and people mentioned in the acknowledgements - into a spreadsheet. After a ridiculous number of hours, I managed to compile 1417 unique names, most of whom are poets, in a nodes sheet, along with an edges sheet that shows their relationships. (This is clearly a biased dataset. The next step in this process involves building an online tool that allows for collecting data from poets and small presses. For now, however, this dataset is large enough to work with.)
After that was done, I realized I might be able to generate richer results if I also had the location of the poets. For the next few weeks, I googled every individual and read their bios in an effort to find out where they are based. Eventually, I managed to find the city in which each individual lives or works for around, I'd say, 80% of them.
So now, I have a couple of spreadsheets that look like this:
I decided, for now, to make the edges table a set of directed relationships because it seems relevant which poet mentioned which: an edge runs from the poet who wrote the acknowledgements to each person they mention. I wanted to see how frequently a poet was being mentioned by other poets - I was interested to see, for instance, if there are poets who are particularly significant within a given community, or if there are poets who bridge communities.
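To make the directed-edge idea concrete, here is a minimal sketch in Python of counting how often each poet is mentioned (in-degree) and how many people each poet mentions (out-degree). The names and rows are hypothetical placeholders, not entries from my spreadsheets, and the (source, target) layout is an assumption about how an edges sheet like this might be read in.

```python
from collections import Counter

# Hypothetical rows from an edges sheet. Each (source, target) pair means
# "source mentioned target in their acknowledgements".
edges = [
    ("Poet A", "Poet B"),
    ("Poet C", "Poet B"),
    ("Poet B", "Poet D"),
    ("Poet A", "Poet D"),
    ("Poet D", "Poet B"),
]

# In-degree: how many times each person is mentioned by someone else.
in_degree = Counter(target for _, target in edges)

# Out-degree: how many people each poet mentions.
out_degree = Counter(source for source, _ in edges)

# List people from most-mentioned to least-mentioned.
for name, count in in_degree.most_common():
    print(f"{name}: mentioned {count} time(s), mentions {out_degree[name]} other(s)")
```

Because the edges are directed, "Poet A" mentioning "Poet B" adds to B's in-degree but not to A's. An undirected table would lose that distinction.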
Some initial visualizations:
Right now, I'm working with Gephi and with Palladio. Below are three images of the data represented as network graphs.
In the above instance, I applied the Force Atlas layout, which tries to keep the edges (connections) relatively equal in length and tries to keep them from overlapping too much. Force-based algorithms generally pull linked nodes together and push unlinked ones apart. Then I applied the Yifan Hu layout, which helps to cluster nodes and treats clusters like individual nodes. I sized the nodes according to in-degree weight, and I applied a modularity calculation, which is represented by color. I'm (perhaps quite obviously) not a data scientist or a statistician, so I don't fully understand the mathematics at play here, but I do think there is value in experimentation, exploration, and play. The close-up of the graph shown in image 3, for instance, reveals a clustering of African American poets who, despite geographical differences, seem to have formed a significant community of support. This section of the graph makes me consider the influence of Cave Canem on the shape of American poetry in the last several decades - although there are most certainly countless other reasons communities like this are formed.
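As a rough illustration of the pull-together, push-apart idea, here is a toy force-based layout in Python. This is not Gephi's Force Atlas or Yifan Hu implementation. It is a simplified spring model of my own choosing in which every pair of nodes repels and linked pairs also attract. The function name, the parameters (k, step, max_move), and the three placeholder poets are all assumptions made for the sketch.

```python
import math

def force_layout(positions, edges, iterations=200, k=1.0, step=0.05, max_move=0.1):
    """Toy force-based layout: every pair repels, linked pairs also attract."""
    pos = {name: tuple(xy) for name, xy in positions.items()}
    names = list(pos)
    for _ in range(iterations):
        force = {name: [0.0, 0.0] for name in names}
        # Repulsion between every pair of nodes, weaker as they move apart.
        for i, u in enumerate(names):
            for v in names[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = k * k / dist
                force[u][0] += dx / dist * push
                force[u][1] += dy / dist * push
                force[v][0] -= dx / dist * push
                force[v][1] -= dy / dist * push
        # Attraction along each edge, stronger as the nodes move apart.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = dist * dist / k
            force[u][0] -= dx / dist * pull
            force[u][1] -= dy / dist * pull
            force[v][0] += dx / dist * pull
            force[v][1] += dy / dist * pull
        # Move each node a small, capped distance along its net force.
        for name in names:
            fx, fy = force[name]
            move = math.hypot(fx, fy) * step
            scale = max_move / move if move > max_move else 1.0
            pos[name] = (pos[name][0] + fx * step * scale,
                         pos[name][1] + fy * step * scale)
    return pos

def distance(pos, u, v):
    """Straight-line distance between two placed nodes."""
    return math.hypot(pos[u][0] - pos[v][0], pos[u][1] - pos[v][1])

# Hypothetical example: A and B are linked, C is linked to no one.
start = {"Poet A": (0.0, 0.0), "Poet B": (2.0, 0.0), "Poet C": (1.0, 0.5)}
layout = force_layout(start, edges=[("Poet A", "Poet B")])

print("A-B:", round(distance(layout, "Poet A", "Poet B"), 2))
print("A-C:", round(distance(layout, "Poet A", "Poet C"), 2))
```

In this sketch the linked pair settles close together while the unlinked node drifts away, which is the behavior that makes clusters visible in a larger graph.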
I also did some geo-spatial layouts of this data, in Gephi and in Palladio.
I feel, at least at this moment, I can understand more about the makeup of contemporary poetry communities from these maps. The first map (4), made in Palladio, indicates concentrations of poets - the larger the circle, the greater the number of poets in that location. Not surprisingly, poets tend to live on the coasts, in major metropolitan areas. New York and its environs have the greatest number of poets. San Francisco, Chicago, and LA are big poetry centers. Denver is huge here. I've always suspected that the Denver/Boulder area is one of the most significant literary areas in the country. I'm also aware, however, of the fact that this is a representation of my library, and until recently, I lived in Denver. The data is most certainly skewed - I buy and read my friends' books, after all. I need to more fully realize the dataset before I can make any legitimate claims about the importance of Colorado to American poetry.
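The counting behind a map like this can be sketched in a few lines of Python. The rows below are hypothetical placeholders, not my data, and the square-root sizing is only one common way to scale circles so that their area tracks the count. I don't know exactly how Palladio sizes its circles, so treat circle_radius as an assumption.

```python
from collections import Counter
import math

# Hypothetical rows from a nodes sheet: (name, city). None means no city was found.
nodes = [
    ("Poet A", "Denver, CO"),
    ("Poet B", "New York, NY"),
    ("Poet C", "Denver, CO"),
    ("Poet D", None),
    ("Poet E", "New York, NY"),
    ("Poet F", "New York, NY"),
]

# Keep only the people whose location is known, then count them per city.
located = [city for _, city in nodes if city]
poets_per_city = Counter(located)

# Share of people with a known location.
coverage = len(located) / len(nodes)

def circle_radius(count, base=4.0):
    """Radius that makes circle area proportional to the count (an assumed scaling)."""
    return base * math.sqrt(count)

for city, count in poets_per_city.most_common():
    print(f"{city}: {count} poet(s), radius {circle_radius(count):.1f}")
print(f"located: {coverage:.0%}")
```

A count like this inherits whatever skew is in the underlying library, so a large circle says as much about whose books were on the shelf as about where poets live.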
The second map (5) was made in Gephi using the GeoLayout plugin, and because it shows relationships, I think it has a lot to say. I'm fascinated by the concentration of edges between Colorado and New York/the Northeast. I would have thought, based upon the history of American poetry in the 20th century, that the edge concentration between San Francisco and New York would be much more pronounced.
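One way to put numbers on an impression like this is to tally edges by the pair of cities they connect. The sketch below does that in Python with hypothetical names, cities, and edges. It ignores edge direction, skips anyone without a known city, and skips edges inside a single city, all of which are choices made for the example rather than anything Gephi does for me.

```python
from collections import Counter

# Hypothetical lookup from person to city (None where no city was found).
city_of = {
    "Poet A": "Denver, CO",
    "Poet B": "New York, NY",
    "Poet C": "Denver, CO",
    "Poet D": "San Francisco, CA",
    "Poet E": "New York, NY",
    "Poet F": None,
}

# Hypothetical directed edges: (source, target) means source mentioned target.
edges = [
    ("Poet A", "Poet B"),
    ("Poet C", "Poet E"),
    ("Poet B", "Poet A"),
    ("Poet D", "Poet B"),
    ("Poet A", "Poet C"),  # same city, skipped below
    ("Poet F", "Poet B"),  # unknown city, skipped below
]

edges_between = Counter()
for source, target in edges:
    a, b = city_of.get(source), city_of.get(target)
    if a is None or b is None or a == b:
        continue
    # Sort the pair so Denver -> New York and New York -> Denver share one tally.
    edges_between[tuple(sorted((a, b)))] += 1

for (a, b), count in edges_between.most_common():
    print(f"{a} <-> {b}: {count} edge(s)")
```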
These are a few cursory observations based on these visualizations - there is nothing definitive here at all. But these initial visualizations are promising. I intend to keep working on this project, and I will continue to report here. If you have any thoughts or suggestions, I'd love to hear them.