Data analysis is not the first thing that comes to mind when you think of ChatGPT, a canonical example of LLM (Large Language Model) but it’s quite decent for that purpose.
During the past year, I turned to ChatGPT several times for help with analyzing data. In this post, I’ll give three examples of using ChatGPT as a “Data Analysis Assistant”.
Let’s get on with it.
1. Retail Format Price Comparison
As I wrote in Zepto Becomes The First Quick Commerce Unicorn, I use quick commerce platforms quite regularly. I’ve often found that prices on Swiggy Instamart, my go-to quick commerce app, are lower than their MRPs (Maximum Retail Price).
For the uninitiated, MRP is the the maximum price at which a product can be sold by retailers to consumers. According to Indian law, MRP must be marked on all packaged goods in FMCG, CPG and other industries. Contrary to how the term sounds at first blush, MRP is not price control. It’s set by the manufacturer and does not involve the government. The law only states that there must be an MRP, not how high or low it should be. Click here for more details.
Someone pointed out to me on X fka Twitter that quick commerce apps have higher MRP than local khirana stores (Indian version of mom-and-pop stores).
While it sounds a bit counterintuitive, it’s a fact that manufacturers can have different MRPs for the same product across different retail formats, so I didn’t dismiss their take outright. Instead, I decided to do a quick-and-dirty comparison of MRP and net prices of a few common items across quick commerce (QC) and khirana store (KS) formats. For good measure, I also added ecommerce (EC) into the mix.
For each SKU, I tabulated the MRP and Net Price in each retail format, as shown in the following exhibit.
I then used a formula to compute the maximum MRP and Net Price. The formula for both variables is identical (except for different cell addresses). Accordingly, I will elaborate only the Net Price formula in the rest of this post.
The basic formula is simple: =MAX(F11:H11).
This outputs the max net price. By using the MATCH and INDEX functions, I got the retail format with the highest Net Price.
So far so good.
What happens if there’s a tie for Net Price between two or all three retail formats?
By default, in the event of a tie, the Excel MAX function outputs the first of the cells that has the maximum value. I also learned that a Python program would also do the same thing in the event of a tie in an array of numbers.
This would mean that KS would get nailed as the most expensive retail format just because it was entered on the spreadsheet before QC and EC. I thought that was dumb.
I wanted a way to break the tie in a way that makes business sense e.g. If there’s a tie for max net price involving KS, report KS as the costliest retail format.
In my three decades of using Excel, I’ve never come across such a requirement. I wasn’t sure how to write a “tie-breaker formula”. I turned to ChatGPT.
It gave me the following formula:
=IF(AND(C11=D11,D11=E11),”SAME”, IF(AND(C11=D11,E11>C11),$E$9, IF(AND(C11=D11,E11<C11),$C$9, IF(AND(C11=E11,E11>D11),$E$9, IF(AND(C11=E11,E11<D11),$D$9, IF(AND(D11=E11,C11>E11),$C$9, IF(AND(D11=E11,C11<E11),$E$9,IF(C11<>D11<>E11,INDEX($C$9:$E$9,MATCH(MAX(C11:E11),C11:E11,0))))))))))
As you can see, it’s quite a mouthful – er, keyboardful. I don’t recall ever writing such a long formula in Excel.
I entered this formula for the first SKU and repeated it for the remaining four SKUs. This is what I got:
As you can see:
- MRP is the same for 3 out of 5 SKUs. QC has higher MRP for one SKU and EC, for another SKU.
- KS is the most expensive retail format for 4 out of 5 SKUs and EC, for one.
- QC is not the most expensive retail format for any of the 5 SKUs.
This confirmed my long-standing belief that Quick Commerce has the lowest prices of all retail formats.
2. Stock Portfolio Prediction Tracker
Every Monday, the ET WEALTH supplement of the Economic Times carries a prediction of stocks that will go up in the next 12 months. A recent prediction is given in the exhibit on the RHS.
I’ve been rating the performance of these weekly predictions and posting my findings on X fka Twitter and LinkedIn for several years. You can see a sample post below:
Prediction Check: BIOCON Stock Price @ 9 Oct 2024.
Analyst Prediction: ?370 (UP from ?265 one year ago)
Actual Price: ?348.— Ketharaman Swaminathan (@s_ketharaman) November 11, 2024
A couple of people who follow me on Twitter and / or LinkedIn reached out to me and asked me my overall take on the reliability of ET Wealth’s predictions. I tried to mine my prediction check posts in a bid to assess their reliability statistically. However, I found out quickly that I did not have a statistically significant sample size.
I reached out to an investment banker buddy. He told me that inadequate sample size is a perennial problem with these types of prediction checks and suggested a different methodology: Create a portfolio of the stocks recommended by ET Wealth and measure its performance.
I took his advice and started buying one tracking share of the top recommendation of ET Wealth every week since then. I then created an Excel model with the purchase price and the forecasted price after one year. To calculate the returns as on a particular date, I needed to enter the CMP of each stock in the portfolio as on that date. Over time, as the portfolio grew, it became cumbersome to look up the CMP for dozens of stocks.
Around 30 years ago, in my old company, I’d set up an ODBC link between an Excel speadsheet and the company’s mainframe system for fetching the Duty Paid Landed Cost (DPLC) of various items. (Click here and here for more details).
I was wondering if Excel had a similar feature to link a cell to a webpage. I learned that there is. It’s called “web scraping” and it uses HTTP access as the mechanism to go to a website and fetch data from a specific webpage on the website.
I’d forgotten the steps I’d used to set up the link and turned to ChatGPT. I got excellent instructions using which I was able to set up this link from my spreadsheet to the Google Finance page of that scrip. Unfortunately, it returned spurious characters along with the ticker price, and I had to manually clean it up every time. I then switched the link to Yahoo! Finance. That worked better although the cell in which the ticker price is displayed changes from one week to another and I need to change the formula manually. This is a well-known problem with scraping and has nothing to do with ChatGPT.
I’m managing with this for now but, ideally, I’d like to get an API access to End of Day stock prices, so that I can link my spreadsheet to the ticker and hopefully eliminate manual intervention. If any of you knows a suitable web service, please share in the comments below. Thanks in advance.
3. Pizza Coke Sales Correlation
For a customer engagement, I needed to evaluate the correlation, if any, between pizza and coke sales. The sales figures were tabulated in two columns in Excel. I graphed the two variables. On visual inspection, there appeared to be a strong correlation between pizza and coke sales. I wanted to quantify the relationship between the two variables and turned to ChatGPT for help:
In Excel, I have one line representing pizza sales over a week and I have another line representing coke sales over a week. Visually, they seem to be correlated. How do I find the equation between the two curves (or underlying functions)?
Within seconds, I got the formulas for SLOPE and INTERCEPT. I plugged them into my spreadsheet and, voila, I arrived at the following equation:
Coke Sales = 0.507*Pizza Sales – 2.440.
Mission accomplished!
On a side note, I think the realtime nature of ChatGPT’s answers is highly underrated. I’m the only one who seems to be talking about it.
In the above examples, I did the data analysis and used the help of ChatGPT in that pursuit.
Since the launch of 4o model in May 2024, ChatGPT has become powerful enough to do data analysis by itself. I’ll cover my experience with that in a follow-on post. Watch this space!