Hello! I am Lars E. Schonander, a writer for MediaFile and a blogger on international affairs, tech, and general wonkery. Happy Monday! Here is my weekly newsletter with a weekly analysis with interesting data, along with links related to things I found particularly interesting that week. Any Questions? Send me a message or just respond to this email!
The Weekly Data:
FiveThirtyEight recently did an article on college fight song lyrics, where FiveThirtyEight analyzed several college fight songs based on some features within the songs.
As FiveThirtyEight does, they publish a CSV file of their data on GitHub. Interestingly enough, while they use information from Spotify, not all of the Spotify data associated with songs were in the dataset. To get the data I wanted to analyze, I used the Spotify API in R by using map
on a list of Spotify IDs, and then grabbed the musical data Spotify has with each song.
Also, the CSV file lacked the actual lyrics to the song. To do this, I used Rvest
to scrape the article website. Because the school names were the same, it was easy to join the lyrics and the rest of the data by the school.
To begin, I used the TidyText package to do a sentiment analysis of the lyrics to see what were the most positive and most negative songs in the data.
Related to sentiment, I also wanted to see if there was a relation between sentiment and Beats Per Minute (BPM). I discovered running a regression that FiveThirtyEight kept the tempo variable from Spotify, but renamed it BPM and just rounded it up. I discovered that there is no linear relationship, but as seen in the graphic below, and by some unsupervised learning (K-means), there are two noticeable clusters.
The level of sentiment remains consistent, but there are two categories of songs. Those with low BPM around 60-90, and those with a high BPM past 130.
If there were more songs included in the dataset, this would be a good way to practice a classification problem, by using a Random Forest or Naive Bayes, but the dataset, at around 65 rows, is not that large quite honestly.
To cap it off, I been looking at Julia Silge’s course on machine learning. One thing she notes is the power of exploratory data analysis to find interesting relations in your dataset even before you do any machine learning.
AsThe
The relation overtime is constant, meaning that later fight songs don’t have a different amount of words or syllables in them and that there are few deviations when it comes to how many words or syllables a fight song has
As a side note, this dataset would be a fun one to practice my D3.js skills, so perhaps in a few weeks, a more interactive version of this dataset will be made available.
Now, some links…
David Auerbach (Music & Literature): László Krasznahorkai's Baron Wenckheim's Homecoming
László Krasznahorkai’s massive, intimidating, and disturbing novel arrives with a specific statement of intent from its author:
With this novel I can prove that I really wrote just one book in my life. This is the book—Satantango, Melancholy, War and War, and Baron. This is my one book.
Krasznahorkai has written other novels besides these four. They do not, at first glance, proclaim themselves siblings. Yet having now read Baron Wenckheim’s Homecoming twice, once before re-reading the other three and once after, I can say that it does stand atop its predecessors as a unifying lintel of Krasznahorkai’s work—and that knowledge of its predecessors greatly enriches the finale.
Anna Weiner (New Yorker): Four Years in Startups
Depending on whom you ask, 2012 represented the apex, the inflection point, or the beginning of the end for Silicon Valley’s startup scene—what cynics called a bubble, optimists called the future, and my future co-workers, high on the fumes of world-historical potential, breathlessly called the ecosystem. Everything was going digital. Everything was up in the cloud. A technology conglomerate that first made its reputation as a Web-page search engine, but quickly became the world’s…
David Wallace-Wells (New York): Vaclav Smil: We Must Leave Growth Behind
Vaclav Smil cuts an unusual figure in the climate world — an iconoclastic Czech-Canadian scientist, he is often called the person who understands energy transitions better than anyone else in the world. (Bill Gates is a particular fan.) But his view of energy transitions is, famously, dour — that it will take, at least, many more decades to produce a transition to renewable energy than most analysts and advocates predict and that a total transition may prove tremendously difficult.
In his new book, Growth — a dense, 500-page treatise that covers everything from “microorganisms to megacities,” whose afterword we’re excerpting here — Smil makes perhaps an even-more-off-putting proposition: that in order to “ensure the habitability of the biosphere,” we must at the very least move away from prioritizing growth and perhaps abandon it entirely.
What I’m Reading
Nothing new, still reading the László Krasznahorkai book on China which I managed to find a few weeks ago.
What I’m Working On
Went to hackNY a few days ago, my team worked on a machine learning project to highlight refugee outcomes after they enter the United States. I worked on the backend, initially using R for exploratory data analysis, but using Scikit-Learn and Flask to deploy the algorithm I liked best out of the few I played around within R.
It won’t be up for a bit, but I am working on an article on my experience at the hackathon, along with with with a meditation on the poor state of technical documentation for many PaaS (Platform as a Service) as SAAS (Software as a Service) services.
Thanks!
Thanks for taking the time to read this, I will be back next Monday. In the meantime, you can follow me on Twitter or reach out via email.