The Aggregate Weekly Newsletter🔬June 25, 2019

Hello! I am Lars E. Schonander, a writer for MediaFile and a blogger on international affairs, tech, and general wonkery. Happy Tuesday! Here is my weekly newsletter, with a data analysis and links to things I found particularly interesting this week. Any questions? Send me a message or just respond to this email!


The Weekly Data:

I recently discovered that The Pudding, in its backlog section, was looking for stories that analyze headlines. Having read a NICAR 2019 tutorial on quanteda and having played around with tidytext, I thought taking a small stab at this type of project would be a fun way to do more text analysis.

To grab the data, it is necessary either to scrape a website or to use a native API. In this case, both BuzzFeed and The Guardian offer APIs of varying quality. As a note, the library purrr makes it very easy to call a function over a range of, let’s say, 1:100, and then combine all the returned data frames into a single data frame containing many articles and their headlines. After this, it’s convenient to dump the data into a SQLite file so one doesn’t need to call the APIs again.

For example, this is what it looks like to grab quite a few BuzzFeed stories:

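library(dplyr)
library(purrr)
library(tidyr)

# Fetch one page of the BuzzFeed News feed and keep the columns of interest.
get_buzzfeed_stories <- function(paging){
  url <- paste0("https://www.buzzfeed.com/api/v2/feeds/news?p=", paging)
  data <- jsonlite::fromJSON(url)
  data$buzzes %>%
    select(
      title,
      category,
      published_date,
      bylines
    )
}

# Call pages 1 to 100, drop empty results, and stack them into one data frame.
# A story can have several bylines, so unnest() yields one row per author.
1:100 %>%
  map(get_buzzfeed_stories) %>%
  compact() %>%
  reduce(bind_rows) %>%
  unnest(bylines) %>%
  select(title, category, published_date, display_name)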
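
Dumping the combined data frame into SQLite, as mentioned above, might look roughly like this. It is only a minimal sketch that assumes the DBI and RSQLite packages; the stand-in `stories` data frame, the table name `buzzfeed_stories`, and the temporary file path are all placeholders, not the ones used for this project.

```r
library(DBI)

# Stand-in for the combined data frame of headlines (made-up rows).
stories <- data.frame(
  title = c("Example headline one", "Example headline two"),
  category = c("Tech", "Science"),
  published_date = c("2019-06-01", "2019-06-02"),
  stringsAsFactors = FALSE
)

# Hypothetical file path; a real project would use a permanent location.
db_path <- tempfile(fileext = ".sqlite")
con <- dbConnect(RSQLite::SQLite(), db_path)

# Write once, so later sessions can read from disk instead of the API.
dbWriteTable(con, "buzzfeed_stories", stories, overwrite = TRUE)
cached <- dbReadTable(con, "buzzfeed_stories")

dbDisconnect(con)
```

Reading `cached` back from the file is what saves the repeated API calls.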

Before getting to the charts of the Flesch–Kincaid readability scores, which I added to the data set, there is one amusing pattern worth noting in one of the datasets.

When looking at The Guardian, one can clearly see the publishing schedule play out, along with the massive spike at the start of June, which I assume was caused by Theresa May resigning as leader of the Conservative Party on the 7th of June.
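
A pattern like this comes out of simply counting headlines per day. The following is a small base R sketch with made-up dates; the `published` vector is hypothetical and only stands in for the publication dates returned by the API.

```r
# Made-up publication dates standing in for the real data.
published <- as.Date(c("2019-06-03", "2019-06-03", "2019-06-07",
                       "2019-06-07", "2019-06-07", "2019-06-08"))

# Articles per calendar day; a spike shows up as an unusually large count.
per_day <- table(published)

# Articles per weekday, to expose the weekly publishing rhythm.
per_weekday <- table(weekdays(published))

# The single busiest day in the sample.
busiest_day <- names(per_day)[which.max(per_day)]
```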

To start, looking at the more recent Guardian articles, I decided to run an ANOVA test (Analysis of Variance) to compare the scores of articles related to Trump versus articles not related to Trump, out of curiosity (only in the US News section, however). Amusingly enough, the test showed that the difference is statistically significant: the mean Flesch–Kincaid readability score differs between the regular articles and the Trump-related articles. Why this is, I do not know, but it is modestly amusing.
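
For anyone curious what such a test looks like in R, here is a minimal sketch using base R's `aov()`. The scores are simulated, not the real Guardian data, and the column names `score` and `about_trump` are invented for illustration.

```r
set.seed(42)

# Simulated readability scores, NOT the real Guardian data.
headlines <- data.frame(
  score = c(rnorm(50, mean = 8, sd = 2), rnorm(50, mean = 10, sd = 2)),
  about_trump = factor(rep(c("no", "yes"), each = 50))
)

# One-way ANOVA: does the mean score differ between the two groups?
fit <- aov(score ~ about_trump, data = headlines)
anova_table <- summary(fit)[[1]]

# p-value for the about_trump term (first row of the ANOVA table).
p_value <- anova_table[["Pr(>F)"]][1]
```

With only two groups, a one-way ANOVA is equivalent to a two-sample t-test that assumes equal variances, so either would answer the same question here.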

When looking at BuzzFeed, I was curious how much the top five categories varied from each other. The answer is, not by much! The tech and science sections have lower scores, but that’s not a bad thing all things considered, mainly because the headline writers are trying to make a complicated concept SEO-friendly and understandable to a non-technical, non-scientific audience.
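
To make the scoring concrete, here is a rough base R sketch of a Flesch–Kincaid grade level averaged by category. It is not the code behind the charts: the syllable counter is a crude vowel-run heuristic, each headline is treated as a single sentence, the sample rows are made up, and the choice of the grade-level variant (rather than reading ease) is an assumption.

```r
# Rough syllable estimate: count runs of vowels, at least one per word.
count_syllables <- function(word) {
  runs <- gregexpr("[aeiouy]+", tolower(word))[[1]]
  max(1, sum(runs > 0))
}

# Flesch-Kincaid grade level, treating the headline as one sentence:
# 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
fk_grade <- function(text) {
  words <- strsplit(text, "[^A-Za-z']+")[[1]]
  words <- words[nzchar(words)]
  syllables <- sum(vapply(words, count_syllables, numeric(1)))
  0.39 * length(words) + 11.8 * (syllables / length(words)) - 15.59
}

# Made-up headlines and categories for illustration only.
headlines <- data.frame(
  title = c("The cat sat on the mat",
            "Scientists discover a new particle",
            "This phone has a better camera"),
  category = c("Animals", "Science", "Tech"),
  stringsAsFactors = FALSE
)

# Score every headline, then average the scores within each category.
headlines$fk <- vapply(headlines$title, fk_grade, numeric(1),
                       USE.NAMES = FALSE)
by_category <- aggregate(fk ~ category, data = headlines, FUN = mean)
```

A package such as quanteda handles syllables far more carefully than this heuristic, so real scores will differ somewhat.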

The next step with this project is a more interactive form of analysis. For BuzzFeed, I queried their API back to 2014, and now have 48,095 BuzzFeed article titles along with other information in a SQLite file. The same will need to be done for The Guardian, along with other outlets such as the New York Times or Bloomberg.


Now, some links…


Geoff Manaugh (BLDGBLOG): Nakatomi Space

While watching Die Hard the other night—easily one of the best architectural films of the past 25 years—I kept thinking about an essay called “Lethal Theory” by Eyal Weizman—itself one of the best and most consequential architectural texts of the past decade (download the complete PDF).

In it, Weizman—an Israeli architect and prominent critic of that nation’s territorial policy—documents many of the emerging spatial techniques used by the Israeli Defense Forces in their high-tech, legally dubious 2002 invasion of Nablus. During that battle, Weizman writes, “soldiers moved within the city across hundred-meter-long ‘overground-tunnels’ carved through a dense and contiguous urban fabric.” Their movements were thus almost entirely camouflaged, with troop movements hidden from above by virtue of always remaining inside buildings. “Although several thousand soldiers and several hundred Palestinian guerrilla fighters were maneuvering simultaneously in the city,” Weizman adds, “they were so ‘saturated’ within its fabric that very few would have been visible from an aerial perspective at any given moment.”

Eyal Weizman: Lethal Theory

The maneuver conducted by units of the Israeli Defense Forces (IDF) in Nablus in April 2002 was described by its commander, Brigadier General Aviv Kokhavi, as inverse geometry, the reorganization of the urban syntax by means of a series of microtactical actions. During the battle, soldiers moved within the city across hundred-meter-long “overground-tunnels” carved through a dense and contiguous urban fabric. Although several thousand soldiers and several hundred Palestinian guerrilla fighters were maneuvering simultaneously in the city, they were so “saturated” within its fabric that very few would have been visible from an aerial perspective at any given moment.

Gwern: On Seeing Through and Unseeing

To draw some parallels here, I think unexpected Turing-complete systems and weird machines have something in common with heist movies or cons or stage magic: they all share a specific paradigm we might call the security mindset or hacker mindset.

What they/OP/security/speedrunning/hacking/social-engineering all have in common is that they show that the much-ballyhooed ‘hacker mindset’ is, fundamentally, a sort of reductionism run amok, ‘seeing through’ abstractions to a manipulable reality.

Michael Tomasky (New York Review of Books): The Rules of the Game

The fight for the Democratic presidential nomination will begin to assume a more concrete shape on the nights of June 26 and 27, when twenty candidates take the stage in Miami for the first round of debates. They qualified to participate by meeting one of two criteria before June 12: securing a minimum number of donors (at least 65,000) or registering more than one percent support in three major polls. The qualifying criteria are the same for the second round of debates in Detroit on July 30 and 31. For the third round, to be held on September 12 and 13 (in a location not yet chosen), the criteria are essentially doubled, which could winnow the field considerably. The Democratic National Committee has sanctioned twelve debates, to be held until next April.

Tom Stevenson (London Review of Books): How to Run a Caliphate

The horrors of IS rule are well known: the killings of Shia; the choice offered to the Christians of Mosul (conversion, ruinous taxation or expulsion); the slaughter of polytheists; the revival of slavery; the massacre of Yazidis on Mount Sinjar. Less well known are the thousands of mundane regulations instituted by the caliphal bureaucracy. The claim to be a state, not just another band of zealous militiamen, was central to what IS stood for. In support of its statehood it operated marriage offices, a telecommunications agency, a department of minerals and a central birth registry. Its department of alms and social solidarity redistributed wealth to the poor. Its department of health brought in sanitation regulations that stipulated more frequent bin collections than in New York. 


What I’m Reading

I am starting to work my way through The Algorithm Design Manual again. When I start applying for jobs I’ll probably apply to some more technical roles, so a refresher on some Computer Science knowledge is always helpful.


What I’m Working On

I wrote up some notes on the Pulitzer Beyond Religion conference which I attended at the start of June at the National Press Club.

I have also been doing quite a bit of JavaScript work for my internship. Svelte.js has proven to be a useful library for creating interactive apps, especially those involving d3. I like Svelte quite a bit because, like the R library shiny, it is based on reactive programming, which seems particularly well suited for dashboard applications.


Thanks!

Thanks for taking the time to read this. I will be back next Monday. In the meantime, you can follow me on Twitter or reach out via email.