Recursive functions in R

Besides product management and growth, I sometimes write short technical posts (particularly about R), where I share solutions to non-trivial tasks I’ve encountered or just useful pieces of code.

In this post:

  1. How to write a recursive function in R, and
  2. How to apply this function group-wise to a data frame.

The context

Our orders table is designed so that a single order may be split into several entities, distinguished by numerical suffixes and character prefixes. This is because (1) different products have different manufacturing and shipping times, and (2) customers may add or change items shortly after placing an order.

It turned out, though, that not all salespeople used it as intended: when a customer came back months later, instead of creating a new order, they just added an incremental suffix to the existing one.

From an analytical standpoint, this made no sense: a single order could be spread over months (or even years). So before doing any analysis, I first had to group orders by order id and then, within each group, combine orders placed within 7 days of each other.

While the first part of the task was trivial with a regex, the second one required iterating through each group of orders to match them by order date. That’s where a recursive function comes in handy.
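For illustration, here is how the shared order id can be pulled out with the same lookbehind pattern used later in this post, written in base R rather than stringr. The order numbers below are made up and only assume the format described above: a character prefix, a hyphen, the base order id, and an optional numerical suffix.

```r
# Hypothetical order numbers: prefix, base order id, optional suffix.
order_number <- c("A-1001", "A-1001-1", "B-1001-2", "A-1002")

# Take the first run of digits that follows a hyphen; perl = TRUE
# enables the (?<=-) lookbehind.
order_number_gen <- regmatches(
  order_number,
  regexpr("(?<=-)\\d+", order_number, perl = TRUE)
)
print(order_number_gen)  # "1001" "1001" "1001" "1002"
```

All three variants of order 1001 map to the same id, regardless of prefix or suffix.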

Writing a recursive function in R

Suppose the input is a group of orders sharing the same order id. We need to combine orders placed within 7 days of each other or, more precisely, within 7 days of the earliest order in each batch. Here is the algorithm:

  1. Find the order with the minimal order date;
  2. Find all orders whose order date falls within 7 days of that minimal date;
  3. Mark them as a single order;
  4. Repeat steps 1–3 with the remaining orders in the group.
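As a minimal sketch of steps 1–4, assuming nothing more than a plain vector of dates, the recursion can be written in base R. The function name batch_dates() and its window argument are hypothetical. It returns a batch index per date rather than an order number.

```r
# Assign each date to a batch: the earliest remaining date opens a batch,
# every date within `window` days of it joins that batch, and the
# function recurses on whatever is left.
batch_dates <- function(dates, window = 7) {
  if (length(dates) == 0) return(integer(0))       # nothing left to group
  start <- min(dates)                              # step 1
  in_batch <- dates <= start + window              # step 2 (Date + 7 = 7 days later)
  batch <- integer(length(dates))
  batch[in_batch] <- 1L                            # step 3: this is batch 1
  rest <- batch_dates(dates[!in_batch], window)    # step 4: recurse
  batch[!in_batch] <- rest + 1L                    # later batches shift by one
  batch
}

dates <- as.Date(c("2019-01-01", "2019-01-05", "2019-03-10",
                   "2019-01-08", "2019-03-20"))
batch_dates(dates)  # returns 1 1 2 1 3
```

Note that 2019-03-20 starts its own batch: it is within 10 days of 2019-03-10 but not within 7, because each batch is anchored at its earliest order.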

And here is the R function that does it using recursion:

library(dplyr)
library(lubridate)

# x is a DF containing orders with the same order id
group_orders <- function(x) {
  curr_min_date <- min(x$date)
  # the earliest order's number becomes the number of the whole batch
  curr_order_num <- filter(x, date == curr_min_date)$order_number[1]
  # DFs with orders before and after the current min_order_date + 7 days
  before_min_date <- mutate(filter(x, date <= curr_min_date + days(7)), g_order_number = curr_order_num)
  after_min_date <- filter(x, date > curr_min_date + days(7))
  # base case: no orders left after this batch
  if (nrow(after_min_date) == 0)
    return(before_min_date)
  # recursive call on the remaining orders
  rbind(before_min_date, group_orders(after_min_date))
}

The function above takes a data frame of orders with the same order id, groups them by order date, and returns the initial data frame plus a g_order_number column, which holds the order number after grouping (the number of the earliest order in each batch). Now we need to apply it to the initial data frame group-wise.
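Each recursive call peels off exactly one 7-day batch, so the recursion depth equals the number of batches in a group. That depth is small for typical order histories, but R limits how deeply calls may nest (see options("expressions")), so for very long histories a loop may be safer. Below is a hypothetical iterative equivalent in base R, assuming order_number is a character column and date is a Date column with no missing values. Unlike the recursive version, it keeps the input row order rather than returning rows batch by batch.

```r
# Hypothetical loop-based equivalent of group_orders(), in base R.
# Assumes character order_number and Date date columns with no NAs.
group_orders_iter <- function(x, window = 7) {
  x$g_order_number <- NA_character_
  # each pass groups one batch, like one recursive call of group_orders()
  while (anyNA(x$g_order_number)) {
    todo <- is.na(x$g_order_number)
    start <- min(x$date[todo])                   # earliest ungrouped order
    first <- which(todo & x$date == start)[1]
    batch <- todo & x$date <= start + window     # within 7 days of it
    x$g_order_number[batch] <- as.character(x$order_number[first])
  }
  x
}

# Made-up orders sharing the base id 1001
orders <- data.frame(
  order_number = c("A-1001", "A-1001-1", "A-1001-2"),
  date = as.Date(c("2019-01-01", "2019-01-04", "2019-05-02")),
  stringsAsFactors = FALSE
)
result <- group_orders_iter(orders)
print(result$g_order_number)  # "A-1001" "A-1001" "A-1001-2"
```

Every pass marks at least the earliest ungrouped order, so the loop always terminates.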

Applying a function by groups

Now the only thing left is to group orders by order id (which I extract into the order_number_gen column using a regex) and apply the function above to each group.

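library(dplyr)
library(stringr)

orders_grouped <- orders %>%
  filter(op_sum > 10) %>%
  mutate(order_number_gen = str_extract(order_number, '(?<=\\-)\\d+')) %>%
  group_by(order_number_gen) %>%
  # group_modify() applies group_orders() to each group and binds the results
  group_modify(~ group_orders(.x)) %>%
  ungroup() %>%
  group_by(customer_id, g_order_number) %>%
  summarise(order_date = min(date),
            order_amount = sum(op_sum))

I also filtered out orders with an amount of 10 or less (those orders represent free gifts to our customers).

To apply the function group-wise, I originally used group_map(), which had just been added to dplyr (in version 0.8.0) and made the process pretty straightforward. Since dplyr 0.8.1, however, group_map() returns a list, and the behavior needed here, where each group's data frame result is combined back into one data frame, is provided by group_modify(), which is what the code above calls.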
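If dplyr is not available, the same split-apply-combine step can be sketched in base R with split(), lapply() and do.call(rbind, ...). This is an alternative to the dplyr pipeline above, not the approach used there: group_orders_base() is a hypothetical base-R version of group_orders() with the same contract, and the sample orders table (with the customer_id, order_number, date and op_sum columns used above) is made up.

```r
# Hypothetical base-R counterpart of group_orders(): takes the orders of
# one order id and returns them with a g_order_number column added.
group_orders_base <- function(x, window = 7) {
  start <- min(x$date)                          # earliest remaining order
  in_batch <- x$date <= start + window          # orders within 7 days of it
  batch <- x[in_batch, , drop = FALSE]
  batch$g_order_number <- x$order_number[x$date == start][1]
  rest <- x[!in_batch, , drop = FALSE]
  if (nrow(rest) == 0) return(batch)            # base case
  rbind(batch, group_orders_base(rest, window)) # recurse on the rest
}

# Made-up sample data; the row with op_sum = 5 stands in for a free gift.
orders <- data.frame(
  customer_id  = c(1, 1, 1, 1, 2),
  order_number = c("A-1001", "A-1001-1", "A-1001-2", "A-1001-3", "B-2002"),
  date         = as.Date(c("2019-01-01", "2019-01-04", "2019-05-02",
                           "2019-01-02", "2019-02-10")),
  op_sum       = c(100, 50, 80, 5, 40),
  stringsAsFactors = FALSE
)

# filter(op_sum > 10)
orders <- orders[orders$op_sum > 10, ]
# extract the base order id (base R instead of stringr)
orders$order_number_gen <- regmatches(
  orders$order_number,
  regexpr("(?<=-)\\d+", orders$order_number, perl = TRUE)
)
# per-group apply: split by order id, group each part, bind the results
grouped <- do.call(rbind, lapply(split(orders, orders$order_number_gen),
                                 group_orders_base))
# group_by(customer_id, g_order_number) + summarise(sum(op_sum))
orders_summary <- aggregate(op_sum ~ customer_id + g_order_number,
                            data = grouped, FUN = sum)
print(orders_summary)
```

Here order 1001 ends up as two orders (A-1001 with a total of 150 and A-1001-2 with 80), because the May order falls far outside the first 7-day window.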

 241   2019   R