How to get daily Google Trends data for any period with R

Recently, I needed some seven years of Google Trends daily data. It turned out that by default it’s not possible to get it neither through the web interface nor via API. So I wrote a tiny script that pulls daily Google Trends data for any period using gtrendsR package

What’s the problem with Google Trends?

Google Trends returns data in daily granularity only if the timeframe is less than 9 months. If the timeframe is between 9 months and 5 years, you’ll get weekly data, and if it’s longer than 5 years – you’ll get monthly data.

A trivial solution like querying the data month by month and then tieing it together won’t work in this case, because Google Trends assess interest in relative values within the given time period. It means that for a given keyword and month, Google Trend will estimate interest identically – with a local minimum of 0 and a local maximum of 100 – event in one month it had twice as many searches than in the other.

Querying Google Trend daily data properly

To get proper daily estimates, I do the following:

  1. Query daily estimates for each month in the specified timeframe;
  2. Queries monthly data for the whole timeframe;
  3. Multiply daily estimates for each month from step 1 by its weight from step 2.

Here is the R code:

library(gtrendsR)
library(tidyverse)
library(lubridate)

get_daily_gtrend <- function(keyword = 'Taylor Swift', geo = 'UA', from = '2013-01-01', to = '2019-08-15') {
  if (ymd(to) >= floor_date(Sys.Date(), 'month')) {
    to <- floor_date(ymd(to), 'month') - days(1)
    
    if (to < from) {
      stop("Specifying \'to\' date in the current month is not allowed")
    }
  }

  mult_m <- gtrends(keyword = keyword, geo = geo, time = paste(from, to))$interest_over_time %>%
    group_by(month = floor_date(date, 'month')) %>%
    summarise(hits = sum(hits)) %>%
    mutate(ym = format(month, '%Y-%m'),
           mult = hits / max(hits)) %>%
    select(month, ym, mult) %>%
    as_tibble()
  
  pm <- tibble(s = seq(ymd(from), ymd(to), by = 'month'), 
               e = seq(ymd(from), ymd(to), by = 'month') + months(1) - days(1))
  
  raw_trends_m <- tibble()
  
  for (i in seq(1, nrow(pm), 1)) {
    curr <- gtrends(keyword, geo = geo, time = paste(pm$s[i], pm$e[i]))
    print(paste('for', pm$s[i], pm$e[i], 'retrieved', count(curr$interest_over_time), 'days of data'))
    raw_trends_m<- rbind(raw_trends_m,
                         curr$interest_over_time)
  }
  
  trend_m <- raw_trends_m %>%
    select(date, hits) %>%
    mutate(ym = format(date, '%Y-%m')) %>%
    as_tibble()
  
  trend_res <- trend_m %>%
    left_join(mult_m, by = 'ym') %>%
    mutate(est_hits = hits * mult) %>%
    select(date, est_hits) %>%
    as_tibble() %>%
    mutate(date = as.Date(date))
  
  return(trend_res)
}

get_daily_gtrend(keyword = 'Taylor Swift', geo = 'UA', from = '2013-01-01', to = '2019-08-15')

get_daily_gtrend function should return a tibble with daily trend. Now you can plot it nicely or use in some analysis

 196   2019   R