R – import from Excel with French characters and export to csv

It seems like dealing with character encoding is a fairly common problem in R (see this article, pretty old by now, for example). For some reason I only experienced it today. When trying to import data containing French text from Excel into RStudio with read.xlsx(), I realized that:

  1. RStudio cannot display French characters properly when one tries to view the objects’ content,
  2. R doesn’t export the objects with the proper encoding when using write.csv().

I use R mostly to work on raw data before I visualize them with D3.js. So point 1 doesn’t bother me too much. Point 2 on the other hand is a real problem.

After trying (to no avail) to add the fileEncoding option to both read.xlsx() and write.csv(), and to set R’s locale to French explicitly using Sys.setlocale(), I finally discovered a fix: use only ASCII characters in the first row of your Excel table (I’m not 100% sure it has to be ASCII only, but at least avoid characters like é, è, à, ç, etc.).

For instance, this table in Excel

Catégorie Salaire Espèce
A 100 Colibri
B 200 Sirène

will be output as this CSV table:
"CatÃ.gorie","Salaire","EspÃ.ce"
"A",100,"Colibri"
"B",200,"Sirène"

While this Excel table

Categorie Salaire Espece
A 100 Colibri
B 200 Sirène

will give you this CSV table:
"Categorie","Salaire","Espece"
"A",100,"Colibri"
"B",200,"Sirène"

I have no clue whether this is a bug or a (rather weird) feature.
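A minimal sketch of the mechanism that is probably at work, followed by the same workaround applied inside R rather than in Excel. It assumes the header was stored as UTF-8 and then reinterpreted as Latin-1 somewhere along the way, which is only a guess. It also does not reproduce the exact "Ã." seen above, where the second character may have been turned into a dot when R sanitized the column names. The data frame is a hand-built stand-in for what read.xlsx() would return.

```r
# "é" is stored in UTF-8 as the two bytes C3 A9.
header <- "Cat\u00e9gorie"
bytes <- charToRaw(header)

# Reinterpreting those same bytes as Latin-1 turns each byte into its own
# character, so "é" becomes "Ã" followed by "©".
misread <- rawToChar(bytes)
Encoding(misread) <- "latin1"
garbled <- enc2utf8(misread)   # "CatÃ©gorie"

# Hypothetical stand-in for the data frame returned by read.xlsx().
df <- data.frame(c("A", "B"), c(100, 200), c("Colibri", "Sir\u00e8ne"),
                 stringsAsFactors = FALSE)
names(df) <- c("Cat\u00e9gorie", "Salaire", "Esp\u00e8ce")

# Workaround applied in R: give the columns ASCII-only names before export.
names(df) <- c("Categorie", "Salaire", "Espece")

out <- tempfile(fileext = ".csv")
write.csv(df, out, row.names = FALSE)
header_line <- readLines(out, n = 1)
```

Renaming the columns in R has not been tested against the original Excel file, so treat it as an alternative to try, not as a confirmed fix.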

Real-life data-visualization by Cambio

I ran into this real-life visualization this week by Cambio, a car-sharing service that is quite popular in Belgium.

Real-life data-visualization by Cambio

The message on the banner reads something like “1 Cambio means 12 fewer cars in the city”. Granted, it’s heavier to load than even a million nodes in a d3.js force layout, and the design isn’t very responsive either (I’m thinking of narrow streets in a Greek village, for instance). But still, it’s pretty effective!


Chronophotography revisited

I just stumbled upon these photos by (Catalan?) photographer Xavi Bou, thanks to an article in The Guardian. He revisited what his predecessors Etienne-Jules Marey and Eadweard Muybridge first did in the late 19th century when they invented chronophotography, but his goal was to “move away from [their] scientific approach” by portraying the scene in a non-invasive way. I think the result is beautiful.

Xavi Bou, Ornithographies

Visualizing Bob Dylan data!

Despite my age (I was born about 22 years after Dylan published his first LP), I like Bob Dylan’s music a lot. While browsing BobDylan.com to find lyrics, I noticed that there was some data about the songs: how many times they had been played live, and when they were played for the first and the last time.

Data + Dylan = exciting project!

So I thought I’d try to visualize it and see if I could make something interesting.

The story and the visualizations are here. Do contact me if you find any errors in the data. I collected most of it from BobDylan.com, but I also used dates from Wikipedia.

Visualizing Bob Dylan's data

On the origin of slopegraphs

Slopegraphs are a very good way of showing both how values change between two points in time and how the rank of each item changes along with them. They can also be used to compare two different dimensions, such as the number of cases and the number of deaths for different diseases (3rd chart), for instance.
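To make the idea concrete, here is a minimal sketch in base R. The item names and numbers are made up for illustration, and draw_slopegraph() is a hypothetical helper, not a function from any package.

```r
# Hypothetical values for three items at two points in time (made-up numbers).
items  <- c("A", "B", "C")
before <- c(10, 25, 40)
after  <- c(30, 20, 45)

# Rank within each column (1 = largest value): a slopegraph shows both the
# change in value and the change in rank.
rank_before <- rank(-before)   # 3 2 1
rank_after  <- rank(-after)    # 2 3 1

# Hypothetical helper: one line per item, labelled at both ends.
draw_slopegraph <- function(labels, left, right) {
  plot(c(1, 2), range(c(left, right)), type = "n",
       xlim = c(0.5, 2.5), xaxt = "n", xlab = "", ylab = "Value", bty = "n")
  axis(1, at = c(1, 2), labels = c("Before", "After"), tick = FALSE)
  segments(1, left, 2, right)                       # the slopes
  text(1, left, labels = paste(labels, left), pos = 2)
  text(2, right, labels = paste(labels, right), pos = 4)
  invisible(NULL)
}
```

Calling draw_slopegraph(items, before, after) draws the chart: the lines for A and B cross, which is exactly the change in rank that the two rank vectors report.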

Alberto Cairo introduced me to this type of chart in The Functional Art (p. 129), and below is the classic example from Edward Tufte’s The Visual Display of Quantitative Information.

A slopechart from Edward Tufte’s The Visual Display of Quantitative Information (Graphics Press), on page 158 of the 2013 edition.

I’ve read in many places that Tufte was the inventor of the slopegraph, and I always believed it, although Tufte himself doesn’t claim that in his book.

That was until I visited the Museum of Press and Graphic Communication in Lyon, France, and ran into this:

A slopegraph published in L’Oeuvre on 3 February 1944.
Header of the newspaper, where the date is visible.
Full page.

It seems like somebody made a slopegraph during WWII to show how the exports of different French industries had changed between 1913 and 1935. The subtitle of the chart most likely aims at convincing the readers that the French economy would be better off after the war. The newspaper L’Oeuvre was apparently collaborationist during the war. The “terroristes” in Haute-Savoie mentioned below the header were certainly members of the French Resistance who had recently been massacred by the Nazis.

So that would make the slopegraph 39 years older than the one published by Tufte in 1983. I bet there are even older examples out there…

A (not so good) example of a map

Visualization is a very powerful tool to share knowledge with people, when used properly. Last week there was some noise in the Belgian and French news about a map of the Muslim population living in Belgium. The map was apparently first published in print in SudPresse, in a version to which I did not have access. A different, interactive map was also made by De Standaard in Flanders (the Dutch-speaking part of Belgium). There were many reactions to the publication of these maps, and I heard about it via this article on lemonde.fr.

First of all, there seem to be many flaws in the research that led to the publication of this news, which has apparently been criticized by other sociologists in Belgium. This is the most important problem here: if the data is of dubious origin or of poor quality, your visualization will necessarily be wrong and misleading.

The interactive map that was made based on this research also has a few important problems in my opinion:

  • The color scale uses a linear color gradient (from grey to dark green) to display non-linear intervals: from 0 to 1%, 1 to 2%, 2 to 5%, 5 to 10%, 10 to 20%. This results in a map where the majority of the country is colored, even though the Muslim population seems to be concentrated in a few areas (Brussels, Antwerp, Ghent, for instance).
  • Apart from the colored areas, the map is empty. There are no cities displayed on the map. The result is once again an impression that Muslims are everywhere in Belgium, with some blurry areas where their proportion is higher. I’m assuming that there must be quite a bit more to the data than this!
  • The source of the data isn’t very clear. The Pew Research Center and the German Federal Office for Migration and Refugees are mentioned, but it is not clear what data was actually used. Was it related to this report, for instance? I have no clue, and I am unable to check the features in the data myself.
Screen capture of the map on De Standaard’s website.
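The color-scale problem can be sketched in a few lines of R. The breaks are the ones quoted in the list above. The two endpoint colors and the sample shares are assumptions chosen for illustration, since the actual palette and data behind the map are not known.

```r
# Class boundaries quoted in the list above, in percent of the population.
breaks <- c(0, 1, 2, 5, 10, 20)
widths <- diff(breaks)   # 1 1 3 5 10: the classes are far from equally wide

# Five evenly spaced colors from grey to dark green (assumed endpoints).
palette_fun  <- colorRampPalette(c("grey90", "darkgreen"))
class_colors <- palette_fun(length(breaks) - 1)

# Hypothetical shares (%) for a few municipalities, for illustration only.
share <- c(0.4, 1.5, 3, 7, 18)
class_index <- cut(share, breaks = breaks, labels = FALSE,
                   include.lowest = TRUE)

result <- data.frame(share, class_index, color = class_colors[class_index])
```

With these assumptions, a share of 1.5% already sits one full color step above 0.4%, the same visual jump as between 7% and 18%. That is how low values end up looking substantial on the map.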

The resulting map is, in my opinion, ambiguous enough to be used by anybody for any purpose, including xenophobic ones, even though that doesn’t seem to have been the goal of the author. I believe that any author of visualizations bears a responsibility towards her readers and towards the people who produced the data. This is especially true of such sensitive topics as Islam and immigration in Europe nowadays.

If you want to read more about the dangers of visualizing data without properly understanding or checking it, I recommend “Unknown unknowns”, a short article by Alberto Cairo about “data ignorance”.

In case you didn’t know, constructive criticism of visualizations is a very popular sport in our domain. Take a look at Junk Charts or at this post by Alberto Cairo if you need to be convinced!

Rock art: first steps of graphic communication

Using graphics to communicate is not exactly a recent idea of our species. It actually dates back (at least) to the paintings our ancestors made in caves, as mentioned by Nigel Holmes in his Infographia poster. That means roughly between 20,000 and 40,000 years ago, for caves like Chauvet, Cussac or Lascaux in France.

While most of us have seen the paintings of animals and humans, few of us know that a great many (non-figurative) signs were also drawn in these caves. This is the topic of Genevieve von Petzinger’s research. In a TED talk last year, she explained that after studying these signs in several caves across Europe, she came to the conclusion that many of them recur in different places and at different times. While she thinks that there are too few signs to make them a language, she does think that they had a meaning, and that these signs and their accompanying paintings are the first known forms of graphic communication. That was way before d3.js, but though they lacked animation, they certainly stood the test of time.

There’s an interactive chart explaining where the signs were found here.

Geometric signs - Genevieve von Petzinger

I think her book “The First Signs: Unlocking the Mysteries of the World’s Oldest Symbols” is also about to be published.

Oh, and while we’re into caves, Nature just published an article about a new breakthrough in this field: it seems like Neanderthals were already making sculptures or building things in caves as early as 176,500 years ago…

Interchange Choreography by Nicolas Rougeux

One thing I enjoy about data-visualization is that it can range from pure science to art. And my favorite pieces are generally those whose authors are well aware of both ends of the spectrum: they should be accurate and tell a truthful story, while remaining visually appealing.

There has been quite some noise lately about a recent work by Nicolas Rougeux entitled “Interchange Choreography”. Granted, it leans more toward the art end of the data-visualization spectrum, but I like the result a lot.

He’s done lots of other interesting things too, and seems to regularly come up with clever ideas. I particularly like his “Weather portraits: US cities” and “Colors of World Flags”.

Nicolas Rougeux, “Interchange Choreography”