Skip to main content

Accessing Your Magento Store Data in R via REST API

Motivation

R

R is probably the most popular language for data science. It is a great tool widely used to manipulate, analyze and visualize data. Similar to Magento, it is open source and has a great community. In this article, I will write how quickly get data from Magento store to R for further analysis.

Magento REST API

There are multiple ways to pull Magento 2 store data from Magento BI to direct database connection and use of SQL. So, why use REST API? The benefits of that approach are direct connection; hence, fewer moving parts, complexity, and costs and more logical and easier to use syntax than mind-blowing SQL queries to notoriously complex Magento DB. It may not be an ideal route if you need to digest huge volumes of data, though, because API has its limits on how much data it can send in a reasonable time.

 

Preparation

Assume that you already have your R running with your favorite IDE like R studio on your machine. From the Magento side, you just need to create an admin user, grant this user access to the data you want to analyze (typically orders and products), and you are ready to go.

Write down the next details for use in R:

Download an R file with the simple wrapper functions I have prepared and uploaded to Github.

 

Getting Data

OK, it’s time to get your hands dirty and start writing some code. Open your R IDE and paste in the next code.

Authorization

First, we need to get access to the Magento 2 store.

# loading required libraries

library(jsonlite)

library(httr)

library(tidyverse)

# Magento admin user credentials in format below

urlbase <- "http://domain.com/"

username <- "username"

password <- "password"

# loading my function to work with Magento from R

source('Rmagento2API-functions.R')

# get auth token and store in a variable, we'll need it in each request later

auth <- getm2authtoken (urlbase, username, password)

Now we are ready to request data.

Universal Search

Universal search utilizes REST API search functionality. A function getm2objects that I wrote takes the list of search parameters, sends it to Magento via API and gets search results in response.

A very simple search of invoices after a certain date looks like this:

endpoint <- "rest/V1/invoices"

# query, list object, invoices after Jan 1st 2020

myquery <- list("searchCriteria[filter_groups][0][filters][0][field]"="created_at",

                "searchCriteria[filter_groups][0][filters][0][value]"="2020-01-01 00:00:00",

                "searchCriteria[filter_groups][0][filters][0][condition_type]"="gt")

invoices <- getm2objects(urlbase = urlbase, endpoint = endpoint, query = myquery, auth=auth)

The result is an invoices list; we may print the first object (invoice) using the next command:

print (invoices$items[1])

The result will include all order products, prices, taxes, the store where the order was placed, etc.

You may combine several search conditions and construct more complex queries like “all invoices over $200” where the order includes certain SKU for the period.

We can get not only invoices, but also products.

For example, the code below returns all products that have “bag” and “tote” in names

# Logical OR search. With % wildcard 

endpoint <- "rest/V1/products"

# using 0 in first and 1 in second search criteria group

myquery <- list("searchCriteria[filter_groups][0][filters][0][field]"="name",

                "searchCriteria[filter_groups][0][filters][0][value]"="%Bag%",

                "searchCriteria[filter_groups][0][filters][0][condition_type]"="like",

                "searchCriteria[filter_groups][0][filters][1][field]"="name",

                "searchCriteria[filter_groups][0][filters][1][value]"="%Tote%",

                "searchCriteria[filter_groups][0][filters][1][condition_type]"="like")

products <- getm2objects(urlbase = urlbase, endpoint = endpoint, query = myquery, auth=auth)

The screenshot below shows the resulting list that contains product data.

Filtered Search

As you can see, the data we get is quite extensive; it can be dozens of KB per record. It is not a big problem for a few products, but with thousands of results, you may exhaust both your bandwidth and local RAM. This is where filtered search comes handy. It can limit the set of data you get in response and only include the attributes you need.

The function getm2productdata allows you to get filtered data about the products. In the example below, we only get a specific product SKU, type, price and name.

# NB - no spaces in fields!

product <- getm2productdata (url=urlbase, sku="24-MB01", fields="sku,price,name,type_id", auth=auth)

 In response, we get a much simpler list than in the previous example.

Complex Case: Not Ordering Customers

Let’s look into a more complex case: extract names and emails of all customers who placed no orders in a given time period. That data may be needed to target them with a specific promotion.

start_date <- "2020-05-01 00:00:00"

end_date <- "2020-05-31 23:59:59"

# step 1 - get all orders in the required period of time

endpoint <- "rest/V1/orders"

myquery <- list("searchCriteria[filter_groups][0][filters][0][field]"="created_at",

                "searchCriteria[filter_groups][0][filters][0][value]"=start_date,

                "searchCriteria[filter_groups][0][filters][0][condition_type]"="gt",

                "searchCriteria[filter_groups][1][filters][0][field]"="created_at",

                "searchCriteria[filter_groups][1][filters][0][value]"=end_date,

                "searchCriteria[filter_groups][1][filters][0][condition_type]"="lt")

orders <- getm2objects(urlbase = urlbase, endpoint = endpoint, query = myquery, auth=auth)

# step 2 

# get emails of the customers who made that orders

# note - I've tried to use internal customer IDs instead, but for some reasons it didn't work in search call used later

# initialize empty factor and add there ids of customers who placed orders

customers_ordered <- character()

for (i in 1:length(orders[["items"]])){

        customers_ordered[i] <- orders[["items"]][[i]][["customer_email"]]

}

# inspect values

customers_ordered

# get rid of duplicates and combine in one string

customers_ordered <- unique(customers_ordered)

# step 3. Get customers who are not in this list

# form to query

customers_ordered_req <- paste(customers_ordered, collapse = ',')

# get all who not in the list of customers ordered

# using customer search endpoint

endpoint <- "rest/V1/customers/search/"

myquery <- list("searchCriteria[filter_groups][0][filters][0][field]"="email",

                "searchCriteria[filter_groups][0][filters][0][value]"=customers_ordered_req,

                "searchCriteria[filter_groups][0][filters][0][condition_type]"="nin")

cust_no <- getm2objects(urlbase = urlbase, endpoint = endpoint, query = myquery, auth=auth)

# Step 4. Save results for further use

# extract emails, first and last names to dataframe and cave to csv

emails_list <- data.frame()[1:length(cust_no[["items"]]),]

for (i in 1:length(cust_no[["items"]])) {

        emails_list$email[i] <- cust_no[["items"]][[i]][["email"]]

        emails_list$fname[i] <-  cust_no[["items"]][[i]][["firstname"]]

        emails_list$lname[i] <- cust_no[["items"]][[i]][["lastname"]]

}

# write as CSV file to use in your email software

row.names(emails_list) <-NULL

write.csv(emails_list, file="email-list.csv", row.names = FALSE)

As a result, we have a CSV file with customer names and emails, which we may use in our marketing campaign.

Wrap Up and Additional Resources

As you have seen, it is possible and not too complex to get your Magento 2 store data to R via API. I have just scratched the surface of what is possible. The conditions used may be more complex and the data from Magento 2 can be combined with other data sources like Google Analytics or Twitter API for a more elaborate analysis. You may find more examples of use in my personal blog. The full code used above and wrapper with functions are available from Github repository.