Accessing Your Magento Store Data in R via REST API
Motivation
R
R is probably the most popular language for data science. It is a great tool widely used to manipulate, analyze and visualize data. Similar to Magento, it is open source and has a great community. In this article, I will write how quickly get data from Magento store to R for further analysis.
Magento REST API
There are multiple ways to pull Magento 2 store data from Magento BI to direct database connection and use of SQL. So, why use REST API? The benefits of that approach are direct connection; hence, fewer moving parts, complexity, and costs and more logical and easier to use syntax than mind-blowing SQL queries to notoriously complex Magento DB. It may not be an ideal route if you need to digest huge volumes of data, though, because API has its limits on how much data it can send in a reasonable time.
Preparation
Assume that you already have your R running with your favorite IDE like R studio on your machine. From the Magento side, you just need to create an admin user, grant this user access to the data you want to analyze (typically orders and products), and you are ready to go.
Write down the next details for use in R:
- Your fully qualified store URL, e.g. https://www.domain.com/ (trailing slash is important)
- Username
- Password
Download an R file with the simple wrapper functions I have prepared and uploaded to Github.
Getting Data
OK, it’s time to get your hands dirty and start writing some code. Open your R IDE and paste in the next code.
Authorization
First, we need to get access to the Magento 2 store.
# loading required libraries
library(jsonlite)
library(httr)
library(tidyverse)
# Magento admin user credentials in format below
urlbase <- "http://domain.com/"
username <- "username"
password <- "password"
# loading my function to work with Magento from R
source('Rmagento2API-functions.R')
# get auth token and store in a variable, we'll need it in each request later
auth <- getm2authtoken (urlbase, username, password)
Now we are ready to request data.
Universal Search
Universal search utilizes REST API search functionality. A function getm2objects that I wrote takes the list of search parameters, sends it to Magento via API and gets search results in response.
A very simple search of invoices after a certain date looks like this:
endpoint <- "rest/V1/invoices"
# query, list object, invoices after Jan 1st 2020
myquery <- list("searchCriteria[filter_groups][0][filters][0][field]"="created_at",
"searchCriteria[filter_groups][0][filters][0][value]"="2020-01-01 00:00:00",
"searchCriteria[filter_groups][0][filters][0][condition_type]"="gt")
invoices <- getm2objects(urlbase = urlbase, endpoint = endpoint, query = myquery, auth=auth)
The result is an invoices list; we may print the first object (invoice) using the next command:
print (invoices$items[1])
The result will include all order products, prices, taxes, the store where the order was placed, etc.
You may combine several search conditions and construct more complex queries like “all invoices over $200” where the order includes certain SKU for the period.
We can get not only invoices, but also products.
For example, the code below returns all products that have “bag” and “tote” in names
# Logical OR search. With % wildcard
endpoint <- "rest/V1/products"
# using 0 in first and 1 in second search criteria group
myquery <- list("searchCriteria[filter_groups][0][filters][0][field]"="name",
"searchCriteria[filter_groups][0][filters][0][value]"="%Bag%",
"searchCriteria[filter_groups][0][filters][0][condition_type]"="like",
"searchCriteria[filter_groups][0][filters][1][field]"="name",
"searchCriteria[filter_groups][0][filters][1][value]"="%Tote%",
"searchCriteria[filter_groups][0][filters][1][condition_type]"="like")
products <- getm2objects(urlbase = urlbase, endpoint = endpoint, query = myquery, auth=auth)
The screenshot below shows the resulting list that contains product data.
Filtered Search
As you can see, the data we get is quite extensive; it can be dozens of KB per record. It is not a big problem for a few products, but with thousands of results, you may exhaust both your bandwidth and local RAM. This is where filtered search comes handy. It can limit the set of data you get in response and only include the attributes you need.
The function getm2productdata allows you to get filtered data about the products. In the example below, we only get a specific product SKU, type, price and name.
# NB - no spaces in fields!
product <- getm2productdata (url=urlbase, sku="24-MB01", fields="sku,price,name,type_id", auth=auth)
In response, we get a much simpler list than in the previous example.
Complex Case: Not Ordering Customers
Let’s look into a more complex case: extract names and emails of all customers who placed no orders in a given time period. That data may be needed to target them with a specific promotion.
start_date <- "2020-05-01 00:00:00"
end_date <- "2020-05-31 23:59:59"
# step 1 - get all orders in the required period of time
endpoint <- "rest/V1/orders"
myquery <- list("searchCriteria[filter_groups][0][filters][0][field]"="created_at",
"searchCriteria[filter_groups][0][filters][0][value]"=start_date,
"searchCriteria[filter_groups][0][filters][0][condition_type]"="gt",
"searchCriteria[filter_groups][1][filters][0][field]"="created_at",
"searchCriteria[filter_groups][1][filters][0][value]"=end_date,
"searchCriteria[filter_groups][1][filters][0][condition_type]"="lt")
orders <- getm2objects(urlbase = urlbase, endpoint = endpoint, query = myquery, auth=auth)
# step 2
# get emails of the customers who made that orders
# note - I've tried to use internal customer IDs instead, but for some reasons it didn't work in search call used later
# initialize empty factor and add there ids of customers who placed orders
customers_ordered <- character()
for (i in 1:length(orders[["items"]])){
customers_ordered[i] <- orders[["items"]][[i]][["customer_email"]]
}
# inspect values
customers_ordered
# get rid of duplicates and combine in one string
customers_ordered <- unique(customers_ordered)
# step 3. Get customers who are not in this list
# form to query
customers_ordered_req <- paste(customers_ordered, collapse = ',')
# get all who not in the list of customers ordered
# using customer search endpoint
endpoint <- "rest/V1/customers/search/"
myquery <- list("searchCriteria[filter_groups][0][filters][0][field]"="email",
"searchCriteria[filter_groups][0][filters][0][value]"=customers_ordered_req,
"searchCriteria[filter_groups][0][filters][0][condition_type]"="nin")
cust_no <- getm2objects(urlbase = urlbase, endpoint = endpoint, query = myquery, auth=auth)
# Step 4. Save results for further use
# extract emails, first and last names to dataframe and cave to csv
emails_list <- data.frame()[1:length(cust_no[["items"]]),]
for (i in 1:length(cust_no[["items"]])) {
emails_list$email[i] <- cust_no[["items"]][[i]][["email"]]
emails_list$fname[i] <- cust_no[["items"]][[i]][["firstname"]]
emails_list$lname[i] <- cust_no[["items"]][[i]][["lastname"]]
}
# write as CSV file to use in your email software
row.names(emails_list) <-NULL
write.csv(emails_list, file="email-list.csv", row.names = FALSE)
As a result, we have a CSV file with customer names and emails, which we may use in our marketing campaign.
Wrap Up and Additional Resources
As you have seen, it is possible and not too complex to get your Magento 2 store data to R via API. I have just scratched the surface of what is possible. The conditions used may be more complex and the data from Magento 2 can be combined with other data sources like Google Analytics or Twitter API for a more elaborate analysis. You may find more examples of use in my personal blog. The full code used above and wrapper with functions are available from Github repository.