Tables and structured data

Matt Cowan

Sep 24, 2023 • 3 min read

Photo by Mika Baumeister / Unsplash

Let's say I gave you a spreadsheet with the first ten rows of Spotify's Top 100 songs (for some day in the past).

I could ask you a bunch of questions about this that you could answer quickly and accurately just by looking it over.

Which artist has the most songs?
What is the longest song?
What is the longest song with a tempo above 100?

If you are like me, that last one was doable, but definitely required a bit of mental effort / working memory usage.

If I gave you the full list of 100 songs, or perhaps even more, you are unlikely to continue to answer questions using the same strategy (looking over the table).

Instead, I would start reaching for additional tools. Since this data is a spreadsheet, my first instinct would be to use spreadsheet software, perhaps Excel or Google Sheets, to help me answer the questions.

Conveniently, with Microsoft's deep push into AI, I can drop my data into Excel, select it, and ask my question directly.

Without this feature, after pasting my table, I need to make sure that the columns are the right data types, and then filter and sort on the columns. Things get even more harrowing if there are mixed data types in the columns.
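For readers who prefer code to spreadsheet menus, the same manual steps (coerce the column types, filter, then sort) can be sketched in a few lines of Python. This is only an illustration, not how Excel or Connie AI works internally. It uses just the four rows shown in the prompt later in this post, since the rest of the table is truncated there, and the names `SAMPLE_CSV` and `longest_song_above_tempo` are made up for this example.

```python
import csv
import io

# Only the four rows visible in this post; the full Top 100 is not reproduced here.
SAMPLE_CSV = """Song name,Artists,Danceability,Energy,Loudness,Tempo,Duration (ms)
God's Plan,Drake,0.754,0.449,-9.211,77.169,198973
SAD!,XXXTENTACION,0.74,0.613,-4.88,75.023,166606
rockstar (feat. 21 Savage),Post Malone,0.587,0.535,-6.09,159.847,218147
Psycho (feat. Ty Dolla $ign),Post Malone,0.739,0.559,-8.011,140.124,221440
"""


def longest_song_above_tempo(csv_text, min_tempo):
    """Return (duration_ms, song, artists) for the longest song whose tempo
    exceeds min_tempo, or None if no row qualifies."""
    candidates = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            # Step 1: make sure the columns are the right data types.
            tempo = float(row["Tempo"])
            duration_ms = int(row["Duration (ms)"])
        except (TypeError, ValueError):
            # Mixed or missing values in a numeric column: skip the row.
            continue
        # Step 2: filter on tempo.
        if tempo > min_tempo:
            candidates.append((duration_ms, row["Song name"], row["Artists"]))
    # Step 3: "sort" by duration and take the top entry.
    return max(candidates, default=None)


result = longest_song_above_tempo(SAMPLE_CSV, 100)
print(result)
```

On these four rows alone, the answer is "Psycho (feat. Ty Dolla $ign)" at 221440 ms. The answer for the full list may differ.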

How does GPT-4 do with this question? Let's try:

Source data: Spotify Top 100

Song name,Artists,Danceability,Energy,Loudness,Tempo,Duration (ms)
God's Plan,Drake,0.754,0.449,-9.211,77.169,198973
SAD!,XXXTENTACION,0.74,0.613,-4.88,75.023,166606
rockstar (feat. 21 Savage),Post Malone,0.587,0.535,-6.09,159.847,218147
Psycho (feat. Ty Dolla $ign),Post Malone,0.739,0.559,-8.011,140.124,221440
[TRUNCATED]

---

Using only the provided source data, please answer the following question:

What is the longest song with a tempo above 100?

This is not great for a couple of reasons.

GPT-4 gets it wrong, though perhaps with enough "prompt engineering" we could get past that.
This gets expensive fast, and depending on the size of your data, you may run into context-window limits even if the cost hasn't already scared off your wallet.

Here at ALU we are currently focused on bringing powerful AI-based search and question answering to Confluence. We had struggled to reliably answer questions where the answer required understanding a table. We were essentially having the AI operate like a human who only had a printed copy of the table.

With the latest update to Connie AI as part of Atlassian's CodeGeist, we are excited to bring question answering over the tables in your Confluence spaces that is as good as Excel's AI, or better.

For example, here is the same question we asked Excel and GPT-4 earlier. Note that, unlike those two, Connie considers all of the pages and data in my Confluence space to answer this; it doesn't have the luxury of being pointed directly at the table. More on this to come soon 😉

That question was easy. What about something a bit more challenging?

Connie even beats Excel on this one:

With the latest update to Connie AI, your teams can stop exfiltrating your data from Confluence to analyze it and answer questions. Install Connie AI to make more of your investment in Confluence.