battling technology: data scraping and Kumu

I’ve always liked to think of myself as a reasonably technologically capable person. I know the very basics of HTML, and I have managed to poke at some pre-written page code when I didn’t quite like it and come out with a result that pleased me.

Data scraping, unfortunately, did not give that same experience.

When I sat down to do this project, I first plugged multiple years around the 1830s into the Dissenting Books search and came back with 900-1000 results. I decided that going back a couple of decades would surely yield a far smaller data set. I was correct: the year 1810 had only three pages of results. So, with a conscience that was halfway giddy (only three pages!) and halfway guilty (only three pages…), I began data scraping.

The scraping process itself was not difficult. I was able to scrape the data from the table and put it into Google Sheets without a problem. In fact, I enjoyed writing all the small codes that pulled different columns from the table into the scraped data. I understood (mostly) what was going on, and faced no troubles on that front.

From that point on, it was downhill.

The further I got into this process of cleaning and copying the data, the more my thoughts went something like this: Google Sheets is now my worst enemy. If I ever have to use Google Sheets again for anything other than making a cross-stitching pattern, I’m going to fling my computer across the room. It was a mess of code not working. Even now, four days after I was defeated by the dragon, I’m not sure whether it was human error on my part, whether some of the data went wrong, or whether Sheets just decided it didn’t like codes that day. Thankfully, my data set was small enough that when things started to go wrong, I could simply go in and fix them by hand.

Things that went wrong:
– Find and replace didn’t get rid of all the colons, slashes, and other assorted punctuation at the end of book titles, no matter what variations of the code I tried.
– The formula for finding the weight refused to actually count the books, putting a “1” in for every single book, even when I could see five occurrences of the same book on my screen.
– Copying and pasting values was a nightmare I could not seem to escape, and I never managed to fix it.
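For anyone who hits the same two snags, here is a minimal Python sketch of what the cleanup and weight formulas were trying to do outside of Sheets. It makes a few assumptions. The scraped rows have already been reduced to (borrower, title) pairs. The stray trailing characters are whitespace, colons, slashes, semicolons, commas, and periods. The function names are hypothetical. One possible explanation for the all-“1” weights is that titles differing only in trailing punctuation get counted as different books, which is why cleaning has to happen before counting. That explanation is a guess, not a diagnosis. Inside Sheets, `REGEXREPLACE` with the same pattern followed by `COUNTIFS` is the rough equivalent.

```python
import re
from collections import Counter

# Assumed set of stray trailing characters: whitespace, colon, slash,
# semicolon, comma, and period, in any combination at the end of a title.
TRAILING_PUNCT = re.compile(r"[\s:/;,.]+$")


def clean_title(title: str) -> str:
    """Trim surrounding whitespace, then drop any run of trailing punctuation."""
    return TRAILING_PUNCT.sub("", title.strip())


def loan_weights(loans):
    """Count how often each (borrower, cleaned title) pair occurs.

    `loans` is an iterable of (borrower, raw_title) tuples. Cleaning first
    means "Sermons :" and "Sermons /" are counted as the same book.
    """
    return Counter((borrower.strip(), clean_title(title)) for borrower, title in loans)
```

With the made-up rows `("John Smith", "Sermons :")`, `("John Smith", "Sermons /")`, and `("Jane Doe", "Hymns.")`, the pair `("John Smith", "Sermons")` gets a weight of 2 instead of two separate 1s. Stripping periods will also remove a final period that belongs to the title, such as an abbreviation, so check the cleaned titles by eye.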

By the time I made it to the Connections sheet and got stuck trying to wrangle VLOOKUP, four hours had passed, curses had poured from my mouth, and tears had been shed. It was at this point that I raised my figurative white flag and emailed Dr. Pauley what I’m sure was a rather disconcerting message, including a sentence along the lines of “I’d rather chop off a hand than continue to do this.” (In retrospect, that line was very dramatic. I will, however, continue to stand by it.)
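What VLOOKUP does on a Connections sheet is essentially a keyed lookup, which can be sketched in a few lines of Python. The sketch assumes that each connection row is a (From, To) pair and that a separate weight table lists (From, To, Weight) rows. A dictionary keyed on the pair stands in for VLOOKUP’s lookup range. `From` and `To` are the column names Kumu’s spreadsheet import uses for connections. The `Weight` column and both function names are hypothetical.

```python
import csv
import io


def add_weights(connections, weight_table):
    """Attach a weight to each (From, To) connection by key lookup.

    Works like a VLOOKUP whose key is the (From, To) pair: the dict below is
    the lookup range. Pairs missing from the table get 0 rather than #N/A.
    """
    lookup = {(frm, to): weight for frm, to, weight in weight_table}
    return [(frm, to, lookup.get((frm, to), 0)) for frm, to in connections]


def connections_csv(rows):
    """Render rows as CSV with From/To headers plus an assumed Weight column."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["From", "To", "Weight"])
    writer.writerows(rows)
    return buf.getvalue()
```

Returning 0 for a missing pair is a design choice. In Sheets the same miss would show up as `#N/A`, which is often the first sign that a name or title is spelled slightly differently on the two sheets.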

After Dr. Pauley swooped in and saved my Google Sheet from being abandoned completely (thank you so much, BP), I continued to avoid the data for the weekend, because I am a coward and did not want anything to do with it. However, today I finally logged back on and was pleased to have everything go smoothly with Kumu.

If you look at the map, you’ll notice something I found incredibly ironic: not a single person at Manchester Academy in 1810 checked out the same book as any other person. There are literally no connections; everyone is their own tiny map. Thanks to the weight, you can tell that some people checked out the same book multiple times. However, that is as interesting as it gets. After all my struggle to find their connections and study them, the people of 1810 said, “Checking out the same books? Nah.”
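The “no connections” result can be double-checked directly. A book links two people only if at least two distinct borrowers took it out. When one person borrows the same book repeatedly, the weight goes up but no new connection appears. This minimal sketch assumes loans as (borrower, title) pairs with already-cleaned titles. The function name is hypothetical.

```python
from collections import defaultdict


def shared_books(loans):
    """Map each title borrowed by two or more distinct people to those people.

    `loans` is an iterable of (borrower, title) pairs with cleaned titles.
    An empty result means no book links any two borrowers.
    """
    borrowers_by_title = defaultdict(set)
    for borrower, title in loans:
        borrowers_by_title[title].add(borrower)
    return {
        title: sorted(borrowers)
        for title, borrowers in borrowers_by_title.items()
        if len(borrowers) > 1
    }
```

An empty dictionary corresponds to the pattern in the 1810 map, where every borrower sits in an isolated cluster. Using a set per title is what keeps repeat loans by one person from counting as a connection.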

C’est la vie, I suppose. The fact that it turned out not to give me the kind of data I wanted to see will probably add to my desire to never, ever, ever in my life use data scraping and cleaning as a tool again.

Assignment #5 — A Book’s Owners

You know how when you renovate a room, you have to scrape all of the wallpaper off the walls, and it is easy yet super tedious, but you know it’s going to be worth it in the end? That’s kind of how this assignment went for me. This week we were asked to once again use technology I had never dealt with before, and considering last week’s results, I was cautious, to say the least. Surprisingly enough, I did not have as many troubles this time around. While data scraping is definitely not how I would like to spend an afternoon, much like peeling off wallpaper, it was more tedious than anything.
Once I finally got all of the data, painstakingly putting it through Google Sheet after Google Sheet, I was ready to take on Kumu. Once again, I was surprised by how user-friendly it was. (Maybe Zotero and I are just not meant to be.) Although there are some discrepancies in my map, such as the random gray circles that refused to be color coded, overall it turned out pretty well.
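Here is one plausible cause of uncolored circles, offered as an assumption rather than a diagnosis. If a connection’s From or To value doesn’t exactly match any element’s Label, Kumu may treat it as a separate element, and a stray space or leftover colon is enough to cause a mismatch. That element would carry no Type, so a color rule keyed on Type would leave it gray. The minimal sketch below lists such mismatches. The function name and sample labels are hypothetical, and this sketch compares labels exactly, including case and whitespace.

```python
def unmatched_endpoints(element_labels, connections):
    """Return From/To values that match no element label exactly.

    Matching here is exact, so "Sermons " (trailing space) and "Sermons"
    count as different labels.
    """
    known = set(element_labels)
    missing = set()
    for frm, to in connections:
        for endpoint in (frm, to):
            if endpoint not in known:
                missing.add(endpoint)
    return sorted(missing)
```

Anything this returns is worth checking by hand in the Elements and Connections sheets.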

I used the Bristol Baptist Academy’s records of 1860:

Social Networking With Kumu

This assignment brought a whole new meaning to the process of “trial and error.”

While I struggled immensely with the intricate process of data scraping, the final product gave me a special appreciation for Kumu. This data mapping site makes the relationships between different elements easy to see. In this case, we see the relationship between students and the books they borrowed during 1845 at Manchester Academy. I gathered this data from Dissenting Academies Online. The data started out messy and complex, so it needed to be scraped and cleaned. Although the last assignment went extremely well for me, this exercise was a different story. In my opinion, this map was worth the struggle. Being able to see all of the information that taunted me in such a cool visual is strangely fulfilling. Below is the end result of my attempt at data scraping.

Social Networking Map

In our 5th assignment we were asked to explore the world of social networking maps. Above, for better or for worse, is my recreation of Dr. Pauley’s selected data on the books and borrowers of Manchester Academy from January to December of 1845. Enjoy!

The Ties That Bind (Part II)

Apparently Safari and Kumu are having domestic problems, and I suspect it is because Chrome and Kumu are in the middle of an affair. All of that is a roundabout way of saying that my loading problem with Kumu magically fixed itself as soon as I used Chrome. Technical difficulties aside, I really enjoyed using Kumu. I think that, like Timeline JS, it is a fun way of visually showing the data we are working with. It also builds on the connection work we did with the spreadsheets, except that the Kumu map is colorful. In all seriousness, however, I think that being able to interact with large amounts of data in this manner is extremely helpful for building an understanding of the data. The ability to work with it in some physical manner, rather than trying to hold it all in one’s head, is lovely and helps foster better analysis. Below is my recreation of Professor Pauley’s Manchester data. I really want to find some way to incorporate this into our local project.

Assignment 5: Social Network Mapping with Kumu

“Let’s work with web scraping and social networking,” they said. “It doesn’t matter if you’re technologically inept,” they said. But I had to go and be a special snowflake by picking my own set of data—basically making a perfectly reasonable project take ten hours more than it should have.

This week, we worked on unearthing library records for dissenting academies via the Dissenting Academies Online Project. I chose to work within the calendar year 1830, focusing my search on Homerton Academy. With tools such as Kumu, Scraper, and xPath Finder (the latter two being Chrome extensions), I scraped data from the DAO database and used it to map connections between books and borrowers during that year, as seen below.

Despite the time I spent on this project (I constantly had to seek help fixing my own errors, because I’m a Creative Writing major and computers are hard), the end product was surprisingly cool. I didn’t even have to ask for help on Part III, which kind of made me question reality for a few minutes, but it’s all cool now.

So there you have it – a comprehensive, slightly jiggly map of connections between the borrowers and the books of Homerton Academy in 1830.
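Scraper and xPath Finder work by pointing at table cells with XPath expressions. The same idea can be shown with Python’s standard-library ElementTree, which supports a limited subset of XPath, including 1-based position predicates such as `td[2]`. In this minimal sketch, the sample markup, column order, and function name are all hypothetical. ElementTree also only parses well-formed markup, which live web pages often are not.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for one page of results. The real page layout and
# column order are assumptions, not taken from the DAO site.
SAMPLE = """
<table>
  <tr><th>Borrower</th><th>Title</th><th>Date</th></tr>
  <tr><td>John Smith</td><td>Sermons :</td><td>1830-02-01</td></tr>
  <tr><td>Jane Doe</td><td>Hymns.</td><td>1830-03-15</td></tr>
</table>
"""


def scrape_rows(markup, columns=(1, 2, 3)):
    """Collect the text of td[1], td[2], ... from every data row of a table."""
    root = ET.fromstring(markup.strip())
    rows = []
    for tr in root.iterfind(".//tr"):
        # td[n] is an XPath position predicate (1-based), the same kind of
        # path an XPath tool reports for a table cell.
        row = [tr.findtext(f"td[{n}]") for n in columns]
        if row[0] is None:  # the header row uses <th>, so td[1] is absent
            continue
        rows.append(row)
    return rows
```

Changing `columns` is the code equivalent of adding or removing a column in the scraper. For example, `scrape_rows(SAMPLE, columns=(2,))` keeps only the titles.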

Coding for Connections: Dissenting Academies Online and Kumu Course Exercise

Just when I thought we were out of the worst of the complex digital thicket I discussed last week, this week’s assignment resolved to convince me otherwise. I’ll be the first to admit that this was an arduous task; I found myself almost affectionately missing last week’s mapping activity (almost). However, as was the case last week, there is a real sense of accomplishment when I (finally) present the end product of my hard work!

This week’s assignment was all about using the innovative data-display platform Kumu. Kumu is a great way to present hefty or complex data and show the relationships within it. For my map, I chose to explore the loan records of Manchester Academy in its 1845 academic year. I got this data from Dissenting Academies Online, a vast store of digital information relating to the Dissenting Academies of the United Kingdom. There was a lot of data to maneuver in this exercise, but my final presentation below shows just how connected everything is!

Dissenting Academy Map

We were victims of our own success. Because we were able to generate the printing/publishing map of a text using the British Short Title Catalogue with only moderate pain and a few tears, we were all tasked with generating a map via scraping information from the Dissenting Academies Online website.

Upon beginning this task, I took a deep breath and whispered to myself, “This won’t hurt that much.” Boy, was I wrong. When deciding on a query to put to the Dissenting Academies Online website, I settled on the library records spanning 1843-1844. There were over 1,000 results from this search. To keep myself from going mad, I used the first 17 pages of results. I know this is only a minor dent in ALL of the data. However, from what I used, one can still see a few interesting trends, such as the same person repeatedly checking out the same book. As mentioned in class, this possibly happened because many of the academies would not allow their books to physically leave the library.

The art of scraping a website, while not my strong suit, was not terrible. I was pleasantly surprised. However, even though it was all outlined for us in elaborate walkthroughs, manipulating the data in Google Sheets proved to be an expletive-inducing experience. There was a “son of a b*@%#” here, and a “f!&*” there; nothing too out of the ordinary.

Once I “tamed” the data (or thought I had), it was time to wrestle with Kumu. In one corner, there was me: an already battle-worn SLOB warrior (who was still having flashbacks about last week’s “bloodshed”). In the other corner was Kumu: a new, fresh-faced opponent whose methods and tactics in the ring were still unknown to me. Unfortunately, to be very anticlimactic, Kumu was fairly simple to use. The only frustrating part was figuring out what was wrong with three sections of my Google Sheets data. While I never exactly figured out the problem, I was still able to generate a map. Is the map beautiful? No. Does the map make much sense? Not really. However, what I did get out of this is potentially a new skill, one I can apply not only later in this class with our final project but also in other areas of my academic life.
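When a few sections of a sheet misbehave and the cause isn’t obvious, a quick validation pass can at least point to the rows involved. Blank or whitespace-only cells in the key columns are a common culprit. This minimal sketch assumes the sheet has been exported as CSV with `From` and `To` headers. The function names are hypothetical, and this check catches only blank cells, not every kind of bad data.

```python
import csv
import io


def load_rows(text):
    """Parse CSV text (e.g. a sheet exported as CSV) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def problem_rows(rows, required=("From", "To")):
    """List (sheet row number, missing fields) for rows with blank required cells.

    Row numbers start at 2 so they line up with a spreadsheet whose first row
    holds the headers. csv.DictReader skips completely empty lines, so the
    numbering assumes the export contains none.
    """
    problems = []
    for number, row in enumerate(rows, start=2):
        missing = [field for field in required if not (row.get(field) or "").strip()]
        if missing:
            problems.append((number, missing))
    return problems
```

The reported row numbers can be used to jump straight to the suspect rows in Google Sheets.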

Homerton Borrowing Map

After playing around with some of the information from Dissenting Academies, I decided that I would work with the borrowing records from Homerton in 1830. Scraping and cleaning up the data took longer than I anticipated it would, but following the tutorials proved to be relatively easy and I managed to complete that part of the assignment with minimal levels of frustration. The frustration actually only kicked in when I tried to embed my Kumu project into this post. I’m still not sure what was wrong or how I eventually fixed it, but the map is here and it’s working and I’m not going to think too much about how that happened.

Even after Kumu decided to start a fight with me, I really like the idea of being able to visualize connections among books and people. I think that tools like Kumu (when it decides to cooperate) make it much easier to see how specific books actually create these links across people and space. I could see myself using this tool again at some point, maybe without so much data scraping, but I think that Kumu and I need some time apart before we can work together amicably again.

Borrowing Books during the Early 19th Century

Kumu is hands down my favorite tool we’ve used. This mapping process constructs a mesmerizing visualization of different topics – in this instance, books and who borrowed them in a given time period. I chose a slightly different time from the tutorial to see if I could accomplish this assignment on my own. I chose August 1, 1825, to July 31, 1826, in order to cover a school year. I stuck with Manchester Academy because the Dissenting Academies Online website gave me a lot of listings to work with.
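Choosing August 1 to July 31 amounts to filtering loans to an inclusive date range. The search form does this on the website. The minimal sketch below shows the same filter applied to rows that are already scraped. It assumes each loan is a (borrower, title, loan_date) tuple with the date already parsed into a `datetime.date`. The function and constant names are hypothetical.

```python
from datetime import date

# One academic year; both bounds are inclusive.
YEAR_START = date(1825, 8, 1)
YEAR_END = date(1826, 7, 31)


def in_school_year(loans, start=YEAR_START, end=YEAR_END):
    """Keep (borrower, title, loan_date) tuples whose date lies in [start, end]."""
    return [loan for loan in loans if start <= loan[2] <= end]
```

Passing different `start` and `end` values covers any other school year without changing the function.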

Though this process of searching the Dissenting Academies Online and scraping the data from that website into Google Sheets tested my patience, I was overwhelmed with joy when I completed the mapping process. By doing this, we are able to look back at a time period and easily view who borrowed a book, at what date, and how many times, if applicable. It’s basically a fun way of searching for a certain person and quickly being able to learn their book loan history. Whether they borrowed books for school education or personal curiosity, we can learn a lot about their habits.
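Looking up one borrower’s history (what they borrowed, when, and how many times) can be sketched the same way outside Kumu. This minimal sketch assumes loans as (borrower, title, loan_date) tuples with dates stored as ISO `YYYY-MM-DD` strings. That date format is an assumption, not the website’s actual format. The function name is hypothetical, and names must match exactly.

```python
from collections import Counter
from datetime import date


def loan_history(loans, person):
    """Return one borrower's loans in date order and a per-title loan count.

    `loans` holds (borrower, title, loan_date) tuples, where loan_date is an
    ISO "YYYY-MM-DD" string (assumed format).
    """
    history = sorted(
        (date.fromisoformat(loan_date), title)
        for borrower, title, loan_date in loans
        if borrower == person
    )
    counts = Counter(title for _, title in history)
    return history, counts
```

The count for a title is the same number Kumu shows as a connection weight, so the two views should agree.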