Diving into the e-com business through data.

The Olist company

M. Hummelt
4 min read · Nov 29, 2020

E-commerce has grown steadily over the past years, and the 2020 coronavirus crisis is boosting it even further. Logistics in e-commerce mainly concerns fulfillment: online marketplaces and retailers have to find the best possible way to fill orders and deliver products.

But what can we learn about the e-com business without deeper domain knowledge? We will take a look at an extensive dataset that the Brazilian e-com business Olist has made available via Kaggle.

Data gives you the power to get insights into a variety of topics, finding patterns and correlations you would not have identified at first glance. In this post we are going to answer the following 3 questions:

  • What does the order profile of Olist look like?
  • Does Olist deliver on time (or even earlier than the expected delivery date)?
  • What correlation can we find between item price and payment method?
Olist Dataset

The analysis of the .csv datasets is done in a Jupyter Notebook, and all steps can be reviewed on GitHub.

This blog post describes the first steps of diving into the extensive Olist dataset: understanding the datasets and their variables, and giving an introduction to the Olist e-com business. From there, we can dive deeper into more complex analyses.

Question #1: What does the order profile of Olist look like?

By far the largest share of orders are Single Item Orders (SIO), which is quite typical for the e-com business. Multi Item Orders are rare, with a maximum of 21 items per order. The average item price is about 74.99 Brazilian reais, and the most expensive item costs 6735 reais.
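The order profile above can be reproduced with a simple group-by. The following is a minimal sketch rather than the notebook's actual code: it uses a small made-up table instead of the real .csv file, and the column names `order_id`, `order_item_id` and `price` are assumptions about how the order-items table is laid out.

```python
import pandas as pd

# Toy stand-in for the order-items table (hypothetical values).
# Assumed columns: order_id, order_item_id (position of the item
# within its order), price.
order_items = pd.DataFrame({
    "order_id": ["a", "b", "b", "c", "d", "d", "d"],
    "order_item_id": [1, 1, 2, 1, 1, 2, 3],
    "price": [50.0, 20.0, 30.0, 100.0, 10.0, 10.0, 10.0],
})

# Number of items in each order.
items_per_order = order_items.groupby("order_id").size()

# Share of Single Item Orders (SIO) among all orders.
sio_share = (items_per_order == 1).mean()

# Largest order and basic price statistics.
max_items = items_per_order.max()
mean_price = order_items["price"].mean()
max_price = order_items["price"].max()

print(f"SIO share: {sio_share:.0%}, max items per order: {max_items}")
print(f"mean item price: {mean_price:.2f}, max item price: {max_price:.2f}")
```

With the real data, `order_items` would presumably be loaded with `pd.read_csv` from the order-items file instead of being built by hand.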

Question #2: Does Olist deliver on time (or even earlier than the expected delivery date)?

days earlier than the expected delivery time

Analyzing the purchase date, estimated delivery date and actual delivery date fields shows that Olist delivers the majority of orders before their estimated delivery date. In general, the mean delivery duration is 12.49 days.
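As an illustration of how the delivery duration and the "days early" figure can be derived from the three date fields, here is a minimal sketch on made-up rows. The column names (`order_purchase_timestamp`, `order_delivered_customer_date`, `order_estimated_delivery_date`) are assumptions, and the values are invented for the example.

```python
import pandas as pd

# Toy stand-in for the orders table (hypothetical values and
# assumed column names).
orders = pd.DataFrame({
    "order_id": ["a", "b", "c"],
    "order_purchase_timestamp": [
        "2018-01-01 10:00:00", "2018-01-05 09:00:00", "2018-02-01 12:00:00"],
    "order_delivered_customer_date": [
        "2018-01-11 10:00:00", "2018-01-25 09:00:00", "2018-02-09 12:00:00"],
    "order_estimated_delivery_date": [
        "2018-01-20 00:00:00", "2018-01-20 00:00:00", "2018-02-20 00:00:00"],
})

# Parse the text columns into proper timestamps.
date_cols = [
    "order_purchase_timestamp",
    "order_delivered_customer_date",
    "order_estimated_delivery_date",
]
for col in date_cols:
    orders[col] = pd.to_datetime(orders[col])

seconds_per_day = 24 * 60 * 60

# Days between purchase and delivery to the customer.
orders["delivery_days"] = (
    orders["order_delivered_customer_date"] - orders["order_purchase_timestamp"]
).dt.total_seconds() / seconds_per_day

# Positive values mean the order arrived before the estimated date.
orders["days_early"] = (
    orders["order_estimated_delivery_date"] - orders["order_delivered_customer_date"]
).dt.total_seconds() / seconds_per_day

mean_delivery_days = orders["delivery_days"].mean()
on_time_share = (orders["days_early"] >= 0).mean()

print(f"mean delivery duration: {mean_delivery_days:.2f} days")
print(f"share delivered on or before the estimate: {on_time_share:.0%}")
```

In a real dataset, orders that were never delivered would have a missing delivery date and would need to be dropped or handled before computing the share.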

delivery duration

Question #3: What correlations can we find between item price and payment method?

Spoiler up front: not many, and few statistically significant ones. The final table contains a lot of data per order ID.

A correlation matrix is a quick and easy-to-read table of correlation coefficients between variables. A value close to 1 indicates a strong positive correlation, a value close to -1 a strong negative correlation, and a value close to 0 little or no linear correlation.

Some correlations in the heatmap are intuitive and logical: product dimensions are linked to the number of items (“order_item_id”), and therefore to price and freight value. The order price obviously correlates strongly with the overall payment value (including shipping, fees etc.). The number of payment installments seems to be somehow related to payment by voucher.
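To correlate a numeric column such as price with a categorical one such as the payment method, the categories first have to be turned into numbers. One common way, sketched below on invented data, is one-hot encoding followed by a Pearson correlation matrix. The table `merged` and its column names (`price`, `freight_value`, `payment_installments`, `payment_type`) are assumptions for illustration, not necessarily what the notebook uses.

```python
import pandas as pd

# Toy stand-in for the merged per-order table (hypothetical values
# and assumed column names).
merged = pd.DataFrame({
    "price": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
    "freight_value": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
    "payment_installments": [1, 1, 2, 3, 1, 6],
    "payment_type": [
        "boleto", "credit_card", "credit_card",
        "credit_card", "voucher", "credit_card"],
})

# One-hot encode the payment method: one 0/1 column per category.
dummies = pd.get_dummies(merged["payment_type"], prefix="pay", dtype=float)

# Combine the numeric columns with the encoded payment method.
features = pd.concat([merged.drop(columns="payment_type"), dummies], axis=1)

# Pearson correlation coefficients between all pairs of columns.
corr = features.corr()

print(corr.round(2))
```

The resulting `corr` DataFrame can then be handed to a plotting function, for example `seaborn.heatmap(corr)`, to draw a heatmap. In this toy data the freight value is an exact multiple of the price, so that pair correlates perfectly; real data will be far noisier.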

Conclusion:

  1. Olist has a typical order profile for an e-com business with >90% SIOs.
  2. The mean delivery time is 12.49 days after the purchase.
  3. Some correlations are quite intuitive, others less so. Payment via credit card accounts for >73% of the orders, and >50% of the orders are paid at once, without any payment installments.
payment type

Outlook:

  • The Olist dataset is quite extensive and leaves us with a lot of topics we haven’t investigated yet, as this was a first introduction to Olist and their e-com business. Further analyses and results will follow…
