This tutorial documents my experiments with R and is mainly intended to inspire others' exploration. The tasks it walks through are chiefly useful as an exercise that leads a novice R user toward more complex and more useful analytical work.
We are going to try to download data about website traffic from Google Analytics (sessions grouped by countries) and use R with a couple of great packages to plot this data onto a world map.
Overall, we will go through the following steps:
- Connecting to Google Analytics
- Executing a query and downloading data
- Adding country codes
- Plotting on a map
Connecting to Google Analytics
First, we need to connect to our Google Analytics account to be able to query and download the data stored in our properties there.
We will do this with the RGoogleAnalytics package. If it is not installed yet, install it with install.packages("RGoogleAnalytics") and then load it with library(RGoogleAnalytics).
Next, we need to create an authentication object using the Auth function provided by RGoogleAnalytics. The Auth function asks for two values: the Client ID and the Client Secret. To get those, we head over to the Google Developer Console and create a new project with any name we like.
Once the project is created, we need to enable the Analytics API. We do that by going to APIs under APIs & auth, selecting Analytics API in the Advertising APIs group, and enabling it.
Next, head over to the Credentials screen, also found under APIs & auth. We click “Add Credentials” and select “OAuth 2.0 client ID” as the type. We are then prompted to configure the consent screen; the only thing we really need to fill in is the name. Select Other as the application type and voilà: the application is registered and we obtain the two values we can create the authentication object with.
We use the following code, with the placeholder Client ID replaced by your own and client.secret holding your Client Secret as a character string:
token <- Auth("xxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.apps.googleusercontent.com", client.secret)
When you execute the Auth function, you will be redirected to a web page where you need to approve the consent screen you configured in one of the previous steps. Since this step is required every time you connect to your Google Analytics account, I suggest saving the token to a file with save(), for example save(token, file = "./token_file") (the file name is up to you).
Later, you can conveniently restore the token object into the environment using the load() function.
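The save-and-restore round trip can be sketched with base R alone. The list below is only a stand-in for the real token (which needs the browser consent step), and the temporary file path is an assumption made for illustration; in practice you would pick a stable path inside your project.

```r
# Stand-in for the object returned by Auth(); the real token needs the
# interactive consent step, so a plain list is used here for illustration.
token <- list(client.id = "xxxxxxxxxxxx", note = "placeholder token")

# Hypothetical location; a temporary file keeps the sketch free of side effects.
token_file <- tempfile(fileext = ".RData")

# save() stores the object together with its name.
save(token, file = token_file)

# Simulate a fresh session by removing the object, then restore it.
original <- token
rm(token)
restored_names <- load(token_file)  # load() returns the names it restored

print(restored_names)
print(identical(token, original))
```

Because load() recreates the object under the name it was saved with, the rest of the script can keep referring to token without any changes.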
If you are interested, here is a look at a potentially better solution to saving objects in R. The loaded token can be verified using the ValidateToken() function.
To query the statistics, we need one more parameter: the ID of the profile (Profile ID/View ID). It can be found either manually in the Admin panel in your browser or with the GetProfiles() function, which takes the token as its argument.
Note the ID or store it in your working environment; it will be used in the next step.
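The profile ID ends up in the table.id query parameter, prefixed with ga:. Here is a minimal sketch of that step, assuming the profile listing is a data frame with id and name columns. The data frame is made up for illustration, and the column names actually returned by the package may differ.

```r
# Made-up profile listing; the real one comes from your Google Analytics account.
profiles <- data.frame(
  id   = c("12345678", "87654321"),
  name = c("Main website", "Blog"),
  stringsAsFactors = FALSE
)

# Pick the profile by its (hypothetical) name and build the table ID.
profile.id <- profiles$id[profiles$name == "Main website"]
table.id   <- paste0("ga:", profile.id)

print(table.id)
```

Keeping the ID as a character string avoids it being printed in scientific notation or losing leading zeros.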
With these steps completed, we are connected to Google Analytics and can move on to actually querying the website traffic data.
Executing a query and downloading data
Once we are connected, we can download the data. Google Analytics uses its own data structure, which is familiar to advanced users of the service. If you have worked only with the web interface of Google Analytics, this may seem daunting at first, but these are really the same variables written in a fancier way. To get a feel for the parameters to play with, head over to the Query Explorer. Another good resource is the list of common queries provided by Google.
1. Building a list of parameters to pass into the query
First, we use the Init() function to build a list of parameters to be used in the following steps. In this tutorial, I will be working with several months' worth of data, grouping sessions by country:
query.list <- Init(start.date = "2014-01-01", end.date = "2015-08-01", dimensions = "ga:country", metrics = "ga:sessions", sort = "ga:sessions", max.results = 1000, table.id = "ga:12345678")
2. Building the query
Once we have this list prepared, we feed the list as an argument to a function which will build the actual query for us.
ga.query <- QueryBuilder(query.list)
3. Downloading results obtained by executing the query
In the last step, everything comes together. We can use the token object to authenticate with our Google Analytics account and execute the prepared query. The function GetReportData returns a data frame we can save in R:
ga.df <- GetReportData(ga.query, token)
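Before moving on, it can help to check the shape of the result. The sketch below uses an invented stand-in for ga.df, assuming the report has a country and a sessions column as in the query above. Since sort = "ga:sessions" requests ascending order, it also reorders the rows so the busiest countries come first.

```r
# Stand-in for the data frame returned by GetReportData(); values are invented.
ga.df <- data.frame(
  country  = c("(not set)", "Germany", "United States"),
  sessions = c(15, 120, 340),
  stringsAsFactors = FALSE
)

# Confirm the expected columns are present before going further.
stopifnot(all(c("country", "sessions") %in% names(ga.df)))

# Reorder so the countries with the most sessions come first.
ga.sorted <- ga.df[order(ga.df$sessions, decreasing = TRUE), ]
rownames(ga.sorted) <- NULL

print(head(ga.sorted, 10))
print(sum(ga.sorted$sessions))
```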
Adding country codes
We now have a data frame with sessions and countries, which is great, but we need to add one more column with the countries identified in a format the mapping package (rworldmap) will understand. Essentially, we need to replace each country name with its code in the ISO3 format. Fortunately, this is an easy step thanks to an R package called countrycode:
library(countrycode)
# Extracting names of the countries from the report:
long_names <- ga.df$country
# Finding the codes corresponding to the long names and saving a vector of those values:
new_codes <- countrycode(long_names, "country.name", "iso3c")
# Writing the codes into the original data frame:
ga.df$iso3 <- new_codes
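Google Analytics reports can contain labels that are not countries, such as "(not set)". countrycode() returns NA for names it cannot match and issues a warning. The sketch below uses invented data to show one way of inspecting such rows and setting them aside before mapping. Whether to drop them or recode them by hand is a judgment call.

```r
library(countrycode)

# Invented report data for illustration.
ga.df <- data.frame(
  country  = c("Germany", "United States", "(not set)"),
  sessions = c(120, 340, 15),
  stringsAsFactors = FALSE
)

# countrycode() warns about names it cannot match and returns NA for them;
# the warning is suppressed here because the NAs are inspected explicitly.
ga.df$iso3 <- suppressWarnings(
  countrycode(ga.df$country, "country.name", "iso3c")
)

# Rows that could not be matched, worth a look before they are dropped.
unmatched <- ga.df[is.na(ga.df$iso3), ]
print(unmatched$country)

# Keep only the rows that can be placed on the map.
ga.mapped <- ga.df[!is.na(ga.df$iso3), ]
print(ga.mapped)
```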
Plotting on a map
Once we have a data frame with a column of country codes and a column with the values to display, we are ready to proceed to the actual mapping. The package we will use to accomplish this is called rworldmap.
1. Joining data to a map
The next step is to create an object which joins the data frame to the map object. The joining uses the newly added column in the data frame.
library(rworldmap)
mapjoin <- joinCountryData2Map(ga.df,
                               joinCode = "ISO3",
                               nameJoinColumn = "iso3",
                               nameCountryColumn = "country",
                               suggestForFailedCodes = FALSE,
                               mapResolution = "coarse",
                               projection = NA,  # deprecated argument
                               verbose = FALSE)
2. Displaying the map
Once this object joining the data frame with the map is ready, we can finally see the map. The function that draws it is mapCountryData: we feed it the object created in the previous step, specify the column with the data to be plotted, and control the appearance of the rendered figure using the remaining arguments. This is my implementation:
mapCountryData(mapjoin, nameColumnToPlot = "sessions", mapTitle = "Sessions per country", catMethod = "fixedWidth", missingCountryCol = gray(.8), colourPalette = "heat")
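Traffic per country is often heavily skewed, in which case fixed-width categories can put nearly every country into the lowest bin and leave the map almost uniformly coloured. The base R sketch below uses invented session counts to compare equal-width bins on the raw scale with equal-width bins on the log10 scale. It only illustrates the binning effect and does not reproduce rworldmap's internal categorisation.

```r
# Invented, heavily skewed session counts for illustration.
sessions <- c(3, 8, 15, 40, 120, 340, 900, 25000)

# Seven equal-width bins on the raw scale.
raw.bins <- cut(sessions, breaks = 7)
print(table(raw.bins))

# The same number of equal-width bins on the log10 scale.
log.bins <- cut(log10(sessions), breaks = 7)
print(table(log.bins))
```

If your own map looks nearly uniform, plotting a log-transformed column or trying another catMethod value from the rworldmap documentation are two options worth exploring.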
Which, in my case, produces the following figure:
While it is true that the same figure, and a better one, can be accessed through the Google Analytics web interface, in the case of this tutorial the journey is more important than the destination. We went through the basics of connecting to Google Analytics through its API, and we can go on to embrace bigger analytical challenges. I hope you enjoyed it. I welcome corrections and am accepting letters of gratitude by email. Happy analyzing.