{"id":8088,"date":"2026-03-11T10:30:17","date_gmt":"2026-03-11T05:00:17","guid":{"rendered":"https:\/\/sketharaman.com\/blog\/?p=8088"},"modified":"2026-03-05T23:49:48","modified_gmt":"2026-03-05T18:19:48","slug":"actionable-insights-from-mybar-chatgpt-data-analysis","status":"publish","type":"post","link":"https:\/\/sketharaman.com\/blog\/2026\/03\/11\/actionable-insights-from-mybar-chatgpt-data-analysis\/","title":{"rendered":"Actionable Insights From myBAR ChatGPT Data Analysis"},"content":{"rendered":"<p>In <a href=\"https:\/\/gtm360.com\/blog\/?p=13961&amp;preview=true\" target=\"_blank\" rel=\"noopener\"><strong>myBAR Data Analysis By ChatGPT<\/strong><\/a>, we saw various insights from ChatGPT based on crunching the data in <strong>myBAR dataset<\/strong> (For those who haven&#8217;t read that post, myBAR dataset contains ratings and reviews given by me to 600-odd books I&#8217;ve read in the last 15-odd years).<\/p>\n<p>This analysis happened on data in a spreadsheet. Some vendors call CSV \/ XLS <em>unstructured data<\/em> whereas others call it <em>semi-structured data<\/em>. Either way, this is a canonical example of the use of ChatGPT to conduct data analysis on <em>non-structured<\/em> data. (Click <a href=\"https:\/\/gtm360.com\/blog\/2025\/07\/22\/about-upi-ban-by-small-merchants-part-1\/#:~:text=I%20decided%20to%20consult%20ChatGPT.%20I%20uploaded%20the%20following%20sketch%20and%20asked%20ChatGPT\" target=\"_blank\" rel=\"noopener\"><strong>here<\/strong><\/a> to see an example where ChatGPT did data analysis on handwritten data, which everybody would agree is unstructured data.)<\/p>\n<p>I was surprised that ChatGPT missed an obvious insight into my reading behavior, namely, that I&#8217;ve an overwhelming preference for fiction over non-fiction books. 
But, apart from that solitary error of omission, its insights were very comprehensive, spanning two categories:<\/p>\n<ol>\n<li><em><strong>Basic insights<\/strong><\/em>: Based on data in myBAR dataset.<\/li>\n<li><strong><em>Advanced insights<\/em><\/strong>: Based on additional data fetched by ChatGPT that was not in myBAR dataset.<\/li>\n<\/ol>\n<p>In <a href=\"https:\/\/gtm360.com\/blog\/2024\/04\/24\/ten-revolutionary-things-about-ai-part-1\/#:~:text=5.-,Makes%20GIGO%20Obsolete,-OpenAI%20\/%20GPT%20is\" target=\"_blank\" rel=\"noopener\">ChatGPT Makes GIGO Obsolete<\/a>, I&#8217;d posited that ChatGPT sources missing \/ erroneous data from third-party data brokers. The advanced insights confirm that.<\/p>\n<p>Some of these insights are FYI whereas others are actionable.<\/p>\n<p>In this blog post, I&#8217;ll cover the actionable insights.<\/p>\n<p><a href=\"https:\/\/gtm360.com\/blog\/wp-content\/uploads\/2026\/01\/mybar-actionable-insight-fi.jpg\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-14130 size-full\" src=\"https:\/\/gtm360.com\/blog\/wp-content\/uploads\/2026\/01\/mybar-actionable-insight-fi.jpg\" alt=\"\" width=\"630\" height=\"292\" \/><\/a><\/p>\n<p>Let&#8217;s get on with it.<\/p>\n<p><strong>1. Housekeeping<\/strong><\/p>\n<p>While verifying the <em>overall summary<\/em> (insight #3), I discovered inconsistencies in the rating descriptors across different rows, e.g. some 3* ratings were labeled <em>&#8220;Good: Definitely worth reading&#8221;<\/em> whereas others said <em>&#8220;Good: Worth reading&#8221;<\/em>. Ergo the following:<\/p>\n<blockquote><p><em>Actionable Insight<\/em>: Standardize the description of ratings.<\/p><\/blockquote>\n<p>I did this manually by replacing the wrong descriptions in the source Google Sheet with the right ones. 
In hindsight, I realize that I could&#8217;ve written a prompt to have ChatGPT do the heavy lifting and saved myself a half hour or so of manual work.<\/p>\n<p><strong>2. Data Cleansing<\/strong><\/p>\n<p><a href=\"https:\/\/gtm360.com\/blog\/wp-content\/uploads\/2025\/12\/mybar-chatgpt-genre-duplicates.jpg\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" class=\"alignright wp-image-14062\" src=\"https:\/\/gtm360.com\/blog\/wp-content\/uploads\/2025\/12\/mybar-chatgpt-genre-duplicates.jpg\" alt=\"\" width=\"250\" height=\"88\" \/><\/a>In the <em>overall summary<\/em> (insight #3), ChatGPT reported a mismatch between the counts of <em>total books<\/em> (583) and <em>unique books<\/em> (571).<\/p>\n<p>This suggested that there were 12 duplicate entries (583-571). Ergo the following:<\/p>\n<blockquote><p><em>Actionable Insight<\/em>: Remove duplicates.<\/p><\/blockquote>\n<p>To help me in this pursuit, I asked ChatGPT to list the duplicates and point out the rows in which they&#8217;d occurred. I reviewed its response and found that:<\/p>\n<ol>\n<li>All of its duplicate calls were right.<\/li>\n<li>Some row calls were also right e.g. <em>&#8220;463 &amp; 464, FROZEN PAST, Richard Hale&#8221;<\/em>.<\/li>\n<li>Some row calls were offset by a few rows e.g. 
ChatGPT called <em>&#8220;EXECUTIVE POWER, Vince Flynn&#8221;<\/em> on Rows 33 and 38 whereas the duplication was actually in Rows 35 and 40.<\/li>\n<\/ol>\n<p>I de-duped these books in the original Google Sheet manually.<\/p>\n<p>I then realized that some of the duplicates were not really duplicates, e.g.:<\/p>\n<ul>\n<li><a href=\"https:\/\/gtm360.com\/blog\/wp-content\/uploads\/2025\/12\/GOODREADS-DIFFERENT-BOOKS-SAME-NAME-THE-INNOCENT.jpg\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" class=\"alignright wp-image-14067\" src=\"https:\/\/gtm360.com\/blog\/wp-content\/uploads\/2025\/12\/GOODREADS-DIFFERENT-BOOKS-SAME-NAME-THE-INNOCENT.jpg\" alt=\"\" width=\"136\" height=\"300\" \/><\/a>The same book&#8217;s rating and review were duplicated on two different rows.<\/li>\n<li>I happened to repeat a few books after a gap of a few years and reviewed them again, forgetting that I&#8217;d already reviewed them before, e.g. <em>PERFIDIA, James Ellroy<\/em>; <em>SHEBA, Jack Higgins<\/em>.<\/li>\n<li>The same book was double-counted because its title was referenced in the review of another book, e.g. <em>THE ASSOCIATE<\/em> was mentioned in the review of <em>THE EXCHANGE<\/em>, another book by John Grisham. I&#8217;m guessing this can be fixed easily by adding a remark in the prompt telling ChatGPT to treat only the entries in the &#8220;<em>Name of Book<\/em>&#8221; column as book titles.<\/li>\n<li>Two different books have the same name! <em>&#8220;Row 286, THE INNOCENT by Harlan Coben; Row 469, THE INNOCENT by Sean Black&#8221;<\/em>. I&#8217;ve noticed many duplicate books in <em>Goodreads<\/em> and <em>BookBub<\/em>. According to Goodreads search results, there are not two or three but <em>7701<\/em> books called THE INNOCENT! (Does this indicate that book titles are not copyrightable?)<\/li>\n<\/ul>\n<p>I left these entries intact.<\/p>\n<p><strong>3. 
Self Improvement<\/strong><\/p>\n<p>As a follow-on to <em>overall summary<\/em> (insight #3), I asked ChatGPT to plot the distribution of my ratings. It gave me the following chart.<\/p>\n<p><a href=\"https:\/\/gtm360.com\/blog\/wp-content\/uploads\/2025\/12\/mybar-rating-distribution-with-label.jpg\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-14084 size-full\" src=\"https:\/\/gtm360.com\/blog\/wp-content\/uploads\/2025\/12\/mybar-rating-distribution-with-label.jpg\" alt=\"\" width=\"500\" height=\"398\" \/><\/a><\/p>\n<p>At first blush, this looked like a normal distribution aka bell curve. But ChatGPT disagreed.<br \/>\n<!--BLOCKQUOTE, NO LHS LINE, GRAY--><\/p>\n<blockquote class=\"skr-bq-noline\">\n<div style=\"background-color: #fafafa; padding: 15px 30px;\">\n<p data-start=\"351\" data-end=\"400\">No &#8211; a normal (bell-curve) distribution would be:<\/p>\n<ul data-start=\"401\" data-end=\"485\">\n<li data-start=\"401\" data-end=\"443\">\n<p data-start=\"403\" data-end=\"443\">roughly symmetric around the mean, and<\/p>\n<\/li>\n<li data-start=\"444\" data-end=\"485\">\n<p data-start=\"446\" data-end=\"485\">have similar frequencies on both sides.<\/p>\n<\/li>\n<\/ul>\n<p data-start=\"487\" data-end=\"508\">Your distribution is:<\/p>\n<ul data-start=\"509\" data-end=\"589\">\n<li data-start=\"509\" data-end=\"541\">\n<p data-start=\"511\" data-end=\"541\"><strong data-start=\"511\" data-end=\"541\">strongly peaked at 3 stars<\/strong><\/p>\n<\/li>\n<li data-start=\"542\" data-end=\"589\">\n<p data-start=\"544\" data-end=\"589\"><strong data-start=\"544\" data-end=\"554\">skewed<\/strong>, with sharp drops toward both ends<\/p>\n<\/li>\n<\/ul>\n<p data-start=\"591\" data-end=\"615\">So it is <strong data-start=\"600\" data-end=\"614\">not normal<\/strong>.<\/p>\n<\/div>\n<\/blockquote>\n<p>I then asked ChatGPT if my ratings follow a power law curve.<\/p>\n<p>It first plotted the following log-log 
chart:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-14085\" src=\"https:\/\/gtm360.com\/blog\/wp-content\/uploads\/2025\/12\/mybar-rating-distribution-log-scale.jpg\" alt=\"\" width=\"500\" height=\"403\" \/><\/p>\n<p>And then said no.<br \/>\n<!--BLOCKQUOTE, NO LHS LINE, GRAY--><\/p>\n<blockquote class=\"skr-bq-noline\">\n<div style=\"background-color: #fafafa; padding: 15px 30px;\">\n<p>Also no. A power-law distribution would:<\/p>\n<ul>\n<li>show a straight-ish line on the log\u2013log plot<\/li>\n<\/ul>\n<p>But your log\u2013log chart bends sharply \u2014 the middle category (3?) dominates instead of a smooth decay. That means it does not follow a power-law either.<\/p>\n<\/div>\n<\/blockquote>\n<p>It felt &#8220;this looks like a <strong>truncated \/ central-bias distribution<\/strong> whereby readers mostly rate books as \u201cokay\u201d or \u201caverage\u201d and give far fewer &#8220;very bad&#8221; or &#8220;very good&#8221; extreme ratings&#8221;.\u00a0Apparently, this pattern is common in subjective rating systems where people avoid extremes.<\/p>\n<p>But it was news to me. I don&#8217;t recall coming across such a distribution in the <em>Quant Marketing<\/em> course during my MBA or subsequently in my working career. Ergo the following:<\/p>\n<blockquote><p><em>Actionable Insight<\/em>: Learn about truncated \/ central-bias distribution.<\/p><\/blockquote>\n<p>ChatGPT then offered to\u00a0<strong>run a goodness-of-fit test<\/strong> (e.g., normality test), or compare this with <strong>Amazon \/ Goodreads typical rating curves<\/strong> to interpret behavior further. Neither Amazon nor Goodreads existed when I got my MBA degree, so I&#8217;m sure I hadn&#8217;t studied these tests and curves. 
Anybody who plans on acting on this insight would need to get a good handle on them, which brings us to the next<\/p>\n<blockquote><p><em>Actionable Insight<\/em>: Learn goodness-of-fit test and comparison with Amazon \/ Goodreads rating curves.<\/p><\/blockquote>\n<p>Coincidentally, as I was writing this post, I was tipped off to a 10-point rating scale used by a leading auto manufacturer in which 1 to 8 is poor, 9 is average and 10 is excellent.<\/p>\n<blockquote class=\"twitter-tweet\">\n<p dir=\"ltr\" lang=\"en\">Which &#8220;genius&#8221; in Indian car industry invented the absurd scale for post-service survey: What is your rating on a scale of 1 to 10, where 1 to 8 is poor, 9 is average, and 10 is excellent?!?<\/p>\n<p>Would love views of mktg experts such as <a href=\"https:\/\/twitter.com\/GTM360?ref_src=twsrc%5Etfw\">@GTM360<\/a> and <a href=\"https:\/\/twitter.com\/beastoftraal?ref_src=twsrc%5Etfw\">@beastoftraal<\/a><\/p>\n<p>\u2014 R_Ganesh (@r_ganesh) <a href=\"https:\/\/twitter.com\/r_ganesh\/status\/2004508035675508805?ref_src=twsrc%5Etfw\">December 26, 2025<\/a><\/p><\/blockquote>\n<p><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n<p>At first blush, this scale does seem absurd. 
However, I now wonder if it&#8217;s the result of some modern market research technique meant to eliminate the aforementioned truncated \/ central-bias distribution problem.<\/p>\n<p>This also reminds me that I felt quite weird when I first heard that the highest <a href=\"https:\/\/gtm360.com\/blog\/2016\/10\/28\/why-are-reviews-so-powerful-in-social-media-customer-service\/#:~:text=I%20belong%20to%20the%20second%20category%20and%20I%E2%80%99ve%20been%20giving%203%2D4*%20rating%20to%20most%20of%20my%20Uber%20rides.%20I%20recently%20heard%20on%20the%20grapevine%20that%20Uber%20drivers%20want%20raters%20to%20belong%20to%20the%20first%20category!\" target=\"_blank\" rel=\"noopener\">5* rating on Uber<\/a> did not signify superlative experience but merely that &#8220;nothing went wrong&#8221; during the ride.<\/p>\n<p><strong>4. Entertainment<\/strong><\/p>\n<p>I got a few surprises in the list of <em>top authors<\/em> (insight #4): Phillip Margolin, Val McDermid, Jed Rubenfeld, Tom Harper.<\/p>\n<p>As I said before, my Top 3 authors are Joseph Heller, John Irving, and James A. Michener. If you ask me who my Next 7 favorite authors are, none of these four names would figure on that list.<\/p>\n<p>At first blush, I thought ChatGPT was hallucinating. But, then I realized that I&#8217;ve indeed read a book or two by these four authors and given them 4* ratings.<!----------START----------><!--SIDEBAR RIGHT, GRAY--><\/p>\n<blockquote class=\"skr-bq-noline-right\">\n<div style=\"background-color: #fafafa; padding: 15px 30px;\">When Joseph Heller was asked why he hadn&#8217;t written another book like Catch-22, he quipped, &#8220;Neither has anybody else&#8221;.<\/div>\n<\/blockquote>\n<p>Nonetheless I wasn&#8217;t able to get my head around these four names and I pushed back by arguing that ChatGPT shouldn&#8217;t be selecting top authors based on the ratings of merely two books each. 
Besides, I also felt that readers pick favorite authors on the basis of the <em>best book<\/em> written by the said author, not the <em>average book<\/em>.<\/p>\n<p>While my skepticism might still be justified, it also struck me that I&#8217;m perhaps behaving like the typical human who overrides data with their biases, which defeats the basic purpose of data analysis. Since I&#8217;ve taken the trouble to analyze the data, I thought I must follow through with the results. Ergo the following:<\/p>\n<blockquote><p><em>Actionable Insight:<\/em> Check out a few more books by Phillip Margolin, Val McDermid, Jed Rubenfeld, and Tom Harper.<\/p><\/blockquote>\n<p><strong>5. Marketing<\/strong><\/p>\n<p>Now that ChatGPT has announced that it would enter advertising, it can translate its insights to the following:<\/p>\n<blockquote><p><em>Actionable Insight<\/em>: Send extremely compelling targeted offers for books.<\/p><\/blockquote>\n<p>I already receive targeted offers from <em>BookBub<\/em> based on favorite authors and genres. But this site lacks end-to-end data. BookBub knows that I clicked on a book in its email newsletter to visit the online store &#8211; e.g. Amazon and Nook &#8211; where it&#8217;s available at a discount. However, it doesn&#8217;t know whether I actually purchased the book there or not, let alone whether I read it.<\/p>\n<p>Even Amazon only knows <em>books bought<\/em>, not <em>books read<\/em>.<\/p>\n<p>On the other hand, ChatGPT has a treasure trove of data about my reading preferences. 
As a result, it can make targeted offers by favorite authors, genres, topics, locales, and other attributes that the current crop of book-selling platforms cannot.<\/p>\n<hr style=\"width: 70%;\" \/>\n<p>There are at least two ways of evaluating the accuracy of ChatGPT&#8217;s data analysis:<\/p>\n<ol>\n<li><em>Academic<\/em>: Crunch the same data in parallel using purpose-built analytics tools like <em>Excel Pivot Table<\/em>, <em>SAS<\/em>, <em>Databricks<\/em>, etc., and compare their results with those provided by ChatGPT, and go from there. This reminds me of selling data entry machines in my early days of tech. To validate the data entered into them, customers would have two people enter the same data on two different DEMs and spot errors by comparing the checksum of the two floppies! This hack worked on the principle that, while two people are equally likely to make mistakes while entering data, they&#8217;re highly unlikely to make the <em>same mistake<\/em>. This is a <em>first principles approach<\/em>.<\/li>\n<li><em>Business<\/em>: Run a new campaign using ChatGPT&#8217;s insights and compare its performance with that of the existing campaign lacking those insights. KPIs for comparison include <em>Conversion Rate<\/em> and <em>ROAS<\/em> (<em>Return on Advertising Spend<\/em>). If the metrics of the ChatGPT-driven campaign trend higher, certify that ChatGPT&#8217;s data analysis is accurate enough. In parallel, work with the AI vendor to continually improve the accuracy of the AI model. 
This is akin to the <em>portfolio approach<\/em> I described in <strong><a href=\"https:\/\/gtm360.com\/blog\/2024\/04\/03\/evolving-success-metrics-for-ai\/\" target=\"_blank\" rel=\"bookmark noopener\">Evolving\u00a0<em class=\"algolia-search-highlight\">Success<\/em>\u00a0<em class=\"algolia-search-highlight\">Metrics<\/em>\u00a0For AI<\/a><\/strong>.<\/li>\n<\/ol>\n<blockquote class=\"twitter-tweet\">\n<p dir=\"ltr\" lang=\"en\">&#8220;To find out whether AI improves efficiency, we could run the same process twice, once the legacy way without AI, and the other with AI, and compare the results&#8221; ~ <a href=\"https:\/\/t.co\/hapEyrOME8\">https:\/\/t.co\/hapEyrOME8<\/a>.<br \/>\nI can&#8217;t recall the last enterprise technology that lent itself to A\/B testing like this.<\/p>\n<p>\u2014 SKR (@s_ketharaman) <a href=\"https:\/\/twitter.com\/s_ketharaman\/status\/2010353481757159441?ref_src=twsrc%5Etfw\">January 11, 2026<\/a><\/p><\/blockquote>\n<p><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n<hr style=\"width: 70%;\" \/>\n<p>Between this and the previous post, we did a deep dive on the use of ChatGPT to analyze unstructured data, get basic and advanced insights, and suss out actionable insights. We followed the &#8220;man in the loop&#8221; approach where a human &#8211; me! &#8211; followed through with the actions triggered by ChatGPT&#8217;s insights. In parallel, we noted where ChatGPT could be used to automate the execution of some of its actionable insights. 
While I didn&#8217;t use Agentic AI in this instance, based on my high-level exposure to the latest automation technology, I&#8217;m reasonably sure that AI Agents will be able to take automation to the next level.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In myBAR Data Analysis By ChatGPT, we saw various insights from ChatGPT based on crunching the data in myBAR dataset (For those who haven&#8217;t read that post, myBAR dataset contains&hellip;<\/p>\n","protected":false},"author":1,"featured_media":8088,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6,1],"tags":[],"class_list":["post-8088","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-b2-product-v-services","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/sketharaman.com\/blog\/wp-json\/wp\/v2\/posts\/8088","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sketharaman.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sketharaman.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sketharaman.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/sketharaman.com\/blog\/wp-json\/wp\/v2\/comments?post=8088"}],"version-history":[{"count":1,"href":"https:\/\/sketharaman.com\/blog\/wp-json\/wp\/v2\/posts\/8088\/revisions"}],"predecessor-version":[{"id":8089,"href":"https:\/\/sketharaman.com\/blog\/wp-json\/wp\/v2\/posts\/8088\/revisions\/8089"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/sketharaman.com\/blog\/wp-json\/wp\/v2\/posts\/8088"}],"wp:attachment":[{"href":"https:\/\/sketharaman.com\/blog\/wp-json\/wp\/v2\/media?parent=8088"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sketharaman.com\/blog\/wp-json\/wp\/v2\/categories?post=8088"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sketharaman.com\/blog\/w
p-json\/wp\/v2\/tags?post=8088"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}