Data analysis is not the first thing that comes to mind when you think of a language model – large or small – but ChatGPT happens to be quite good at it.
During the past year, I turned to ChatGPT several times for help with analyzing data. In this post, I’ll give three examples of using ChatGPT as a “Data Analysis Assistant”.
Let’s get on with it.
1. Retail Format Price Comparison
As I wrote in Zepto Becomes The First Quick Commerce Unicorn, I use quick commerce platforms quite regularly. I’ve often found that prices on Swiggy Instamart, my go-to quick commerce app, are lower than their MRPs (Maximum Retail Price).
For the uninitiated, MRP is the the maximum price at which a product can be sold by retailers to consumers. According to Indian law, MRP must be marked on all packaged goods in FMCG, CPG and other industries. Contrary to how the term sounds at first blush, MRP is not price control. It’s set by the manufacturer, not by the government. The law only states that there must be an MRP, not what the figure should be. Click here for more details.
Someone pointed out to me on X fka Twitter that quick commerce apps have higher MRP than local khirana stores (Indian version of mom-and-pop stores).
While it sounds a bit counterintuitive, I’ve heard that manufacturers can print different MRPs for the same product across different retail formats, so I didn’t dismiss the tweeple’s take outright. I decided to do a quick-and-dirty comparison of MRP and net prices of a few common items across quick commerce (QC) and khirana store (KS) formats. For good measure, I also added ecommerce (EC) into the mix.
For each SKU, I tabulated the MRP and Net Price in the three retail formats, as shown in the following exhibit.
I then used a formula to compute the maximum MRP and Net Price. The formula for both variables is identical (except for having different cell addresses). Accordingly, I will elaborate only the Net Price formula in the rest of this post.
The basic formula is simple: =MAX(F11:H11).
This outputs the max net price. By using the MATCH and INDEX functions, I got the retail format with the highest Net Price: INDEX($F$9:$H$9,MATCH(MAX(F11:H11),F11:H11,0))
So far so good.
But what happens if two or more retail formats have the same Net Price? A tie is more likely in MRP but, as we’ll see shortly, it’s not uncommon in Net Price.
I found out that the Excel MAX function handles tie-breakers by printing the first of the cells that has the maximum value. (Same as Python).
This would mean that KS would get nailed as the most expensive retail format just because it was entered on the spreadsheet before QC and EC. I thought that was arbitrary.
I wanted a tie-breaker logic that made business sense e.g. If there’s a tie for max net price involving KS, report KS as the costliest retail format.
In my three decades of using Excel, I’ve never come across such a condition. I didn’t know how to write a “tie-breaker formula”. I turned to ChatGPT.
It gave me the following formula:
=IF(AND(C11=D11,D11=E11),”SAME”, IF(AND(C11=D11,E11>C11),$E$9, IF(AND(C11=D11,E11<C11),$C$9, IF(AND(C11=E11,E11>D11),$E$9, IF(AND(C11=E11,E11<D11),$D$9, IF(AND(D11=E11,C11>E11),$C$9, IF(AND(D11=E11,C11<E11),$E$9,IF(C11<>D11<>E11,INDEX($C$9:$E$9,MATCH(MAX(C11:E11),C11:E11,0))))))))))
As you can see, it’s quite a mouthful – er, keyboardful. I don’t recall ever writing such a long formula in Excel.
I entered this formula for the first SKU and repeated it for the remaining four SKUs. This is what I got:
As you can see:
- MRP is the same for 3 out of 5 SKUs. QC has higher MRP for one SKU and EC, for another SKU.
- KS is the most expensive retail format for 4 out of 5 SKUs and EC, for one.
- QC is not the most expensive retail format for any of the 5 SKUs.
This confirmed my long-standing belief that Quick Commerce has the lowest prices of all retail formats.
2. Stock Portfolio Prediction Tracker
Every Monday, the ET WEALTH supplement of the Economic Times carries a prediction of stocks that will go up in the next 12 months. A recent prediction is given in the exhibit on the RHS.
I’ve been rating the performance of these weekly predictions and posting my findings on X fka Twitter and LinkedIn for several years. You can see a sample post below:
Prediction Check: BIOCON Stock Price @ 9 Oct 2024.
Analyst Prediction: ?370 (UP from ?265 one year ago)
Actual Price: ?348.— Ketharaman Swaminathan (@s_ketharaman) November 11, 2024
A couple of people who follow me on Twitter and / or LinkedIn reached out to me and inquired about my take on the overall reliability of ET Wealth’s predictions. I tried to data mine my prediction check posts in a bid to find that out statistically. However, I found out quickly that I did not have a statistically significant sample size.
I reached out to an investment banker buddy. He told me that inadequate sample size is a perennial problem with these types of prediction trackers and suggested a different methodology: Create a portfolio of the stocks recommended by ET Wealth and track its performance.
I took his advice and started buying one share of ET Wealth’s leading recommendation every week since then. I then created an Excel model with the purchase price and the forecasted price after one year. To calculate the returns as on a particular date, I needed to enter the CMP of each stock in the portfolio as on that date. Over time, as the portfolio grew, it became cumbersome to look up so many CMPs and enter them manually in the spreadsheet.
I recalled that, in one old company in the mid-90s, I’d set up an ODBC link between a cell in Excel and the company’s mainframe system to fetch the Duty Paid Landed Cost (DPLC) of various components in a PC. (Click here and here for more details).
I was wondering if Excel had a similar feature to link a cell to a webpage. I learned that there is. It’s called “web scraping” and it uses HTTP access as the mechanism to go to a website and fetch data from a specific webpage on the website.
I’d forgotten the steps I’d used to set up the link 30 years ago and turned to ChatGPT for instructions. I got an excellent step-by-step guide using which I was able to set up this link from my spreadsheet to the Google Finance page of each scrip in the portfolio. It worked!
Unfortunately, it returned spurious characters along with the ticker price, and I had to manually clean it up every time. I then pointed the link to Yahoo! Finance. That worked better but the location of the ticker price kept changing from one week to another and I needed to change the formula everytime I wished to the compute the value of the portfolio. This is a well-known problem with scraping technology in general and has nothing to do with ChatGPT.
I’m managing with this for now but, ideally, I’d like to get an API access to stock prices, so that I can link my spreadsheet to the ticker and eliminate manual intervention. If any of you knows a suitable web service, please share in the comments below. Thanks in advance.
3. Pizza Coke Sales Correlation
For a customer engagement, I needed to evaluate the correlation, if any, between pizza and coke sales. The sales figures were tabulated in two columns in Excel. I graphed the two variables. On visual inspection, there appeared to be a strong correlation between pizza and coke sales. I wanted to quantify the relationship between the two variables and turned to ChatGPT for help:
In Excel, I have one line representing pizza sales over a week and I have another line representing coke sales over a week. Visually, they seem to be correlated. How do I find the equation between the two curves (or underlying functions)?
Within seconds, I got the formulas for SLOPE and INTERCEPT. I plugged them into my spreadsheet and, voila, I arrived at the following equation:
Coke Sales = 0.507*Pizza Sales – 2.440.
Mission accomplished!
On a side note, I think the realtime nature of ChatGPT’s answers is highly underrated. I’m the only one who seems to be talking about it.
In the above examples, ChatGPT helped me to conduct the data analyses. So I was the pilot and ChatGPT acted as the copilot – Data Analysis Assistant – of sorts.
Since the launch of 4o model in May 2024, ChatGPT can do data analysis by itself. I’ll cover my experience with ChatGPT as pilot and me as passenger in a follow-on post. Watch this space!