Spreadsheets are graphs too!
Felienne Hermans (@felienne)
In this slide deck I explain how I
used Neo4J to store information on
spreadsheets
Ehm...spreadsheets?
They are so tably?
Are you sure they are fit for a graph
database?
Spreadsheets are mislabeled
People often think of spreadsheets
as data, but...
Spreadsheets are code
I have made it my life’s work to
spread the happy word
“Spreadsheets are code!”
If you don’t immediately believe
me, I have three reasons*
* If you do believe me, skip the next 10 slides ;)
1) Used for similar problems
This tool (for stock price
computation) could have been
built in any language. C,
JavaScript, COBOL, or Excel.
The problems Excel is used for are
often (not always) similar to
problems solved in different
languages.
2) Formulas are Turing complete
I go to great lengths to make my
point. To such great lengths that I
built a Turing machine in Excel,
using formulas only.
Here you see it in action. Every row
is a consecutive step of the tape.
This makes it, in addition to a proof
that formulas are Turing complete,
also a nice visualization of a Turing
machine.
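The row-per-step layout can be sketched outside Excel as well. The Python below is only an illustration: the bit-flipping machine and the names `RULES` and `run` are made up for this example and are not the machine from the talk. Each entry in the returned list plays the role of one spreadsheet row.

```python
# Hypothetical example machine: flips every bit on the tape, then halts
# when it reaches the blank. Not the machine shown in the talk.
# Transition table: (state, symbol) -> (symbol to write, head move, next state)
RULES = {
    ("flip", "0"): ("1", 1, "flip"),
    ("flip", "1"): ("0", 1, "flip"),
    ("flip", "_"): ("_", 0, "halt"),
}


def run(tape, state="flip", max_steps=100):
    """Return one row per step: (state, head position, tape contents)."""
    cells = list(tape) + ["_"]  # "_" marks the blank after the input
    head = 0
    rows = [(state, head, "".join(cells))]
    for _ in range(max_steps):
        if state == "halt":
            break
        write, move, state = RULES[(state, cells[head])]
        cells[head] = write
        head += move
        rows.append((state, head, "".join(cells)))
    return rows


for row in run("101"):
    print(row)
```

Printing the rows underneath each other gives the same kind of picture as the spreadsheet: the tape changes by one cell per row.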
3) They suffer from the same problems
In summary: the activities, complexity,
and problems are all the same
So if spreadsheets are code, can we
apply software engineering methods?
Spreadsheets are graphs too: Using Neo4J as backend to store spreadsheet information
In my dissertation, I defined smells
for spreadsheet formulas
Turns out, Fowler’s code smells are easily
transferable to spreadsheets
Pop quiz: what smell is this?
It is the ‘feature envy’ smell
See how easily this applies to
spreadsheets
To analyze smells, we save spreadsheet
info to a database
This is the data model that I am
storing to the database.
The basics are pretty simple.
But cells can refer to each other,
either directly [=A7+A9] or through
a range [=SUM(A1:A5)].
In the case of a range, the range
itself points to the cells it
contains: =SUM(A1:A5) points to the
range A1..A5, which points to the
cells A1 through A5.
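As a rough picture of that model, here is a small Python sketch. The class `Node`, the helpers `cell` and `cell_range`, and the cells B1 and B2 are assumptions made for illustration; the talk does not show this code.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A cell (e.g. 'A7') or a range (e.g. 'A1:A5') in the model."""
    name: str
    kind: str  # "cell" or "range"
    refers_to: list = field(default_factory=list)  # outgoing references


def cell(name):
    return Node(name, "cell")


def cell_range(name, members):
    # A range points to the cells it contains.
    return Node(name, "range", list(members))


a = {n: cell(n) for n in ["A1", "A2", "A3", "A4", "A5", "A7", "A9"]}
a1_a5 = cell_range("A1:A5", [a[f"A{i}"] for i in range(1, 6)])

b1 = cell("B1")  # hypothetical cell holding =A7+A9
b1.refers_to = [a["A7"], a["A9"]]  # direct references

b2 = cell("B2")  # hypothetical cell holding =SUM(A1:A5)
b2.refers_to = [a1_a5]  # reference through a range

print([n.name for n in b1.refers_to])
print([n.name for n in b2.refers_to[0].refers_to])
```

The point to notice is that B2 has only one outgoing reference, to the range, and the range carries the five references to its member cells.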
You know the saying that if all you
have is a hammer, everything is a
nail to you.
This is what happened to me. I did
not think about what type of
database to use.
I just started banging with the
good ol’ SQL hammer I had been
using forever.
Number of worksheets in a spreadsheet
Which started out just fine!
Number of cells in a spreadsheet
Still pretty okay
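The queries on these two slides are not part of the transcript. A minimal sketch of what such counts could look like, using Python's built-in sqlite3 module and a schema whose table and column names are purely assumed:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Hypothetical schema; the table and column names are assumptions.
    CREATE TABLE spreadsheet (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE worksheet   (id INTEGER PRIMARY KEY, spreadsheet_id INTEGER, name TEXT);
    CREATE TABLE cell        (id INTEGER PRIMARY KEY, worksheet_id INTEGER, address TEXT);

    INSERT INTO spreadsheet VALUES (1, 'budget.xlsx');
    INSERT INTO worksheet   VALUES (1, 1, 'Income'), (2, 1, 'Costs');
    INSERT INTO cell        VALUES (1, 1, 'A1'), (2, 1, 'A2'), (3, 2, 'A1');
""")


def count_worksheets(spreadsheet_id):
    """Number of worksheets in a spreadsheet: a single-table count."""
    row = db.execute(
        "SELECT COUNT(*) FROM worksheet WHERE spreadsheet_id = ?",
        (spreadsheet_id,),
    ).fetchone()
    return row[0]


def count_cells(spreadsheet_id):
    """Number of cells in a spreadsheet: one join is already needed."""
    row = db.execute(
        """
        SELECT COUNT(*)
        FROM cell c
        JOIN worksheet w ON c.worksheet_id = w.id
        WHERE w.spreadsheet_id = ?
        """,
        (spreadsheet_id,),
    ).fetchone()
    return row[0]


print(count_worksheets(1), count_cells(1))
```

Both queries stay short because they only walk the containment hierarchy downwards.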
Number of connected cells for a cell
But, in order to calculate the
‘feature envy’ smell, we need the
total number of connected cells.
So both direct and through a
range.
Let’s start with direct.
Now look at the range part.
Things start to get iffy when we
combine these two query parts.
Not only is the query quite big,
this also happens.
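The actual query is not in the transcript. To give an idea of why the combination grows, here is a sketch with sqlite3 in Python; the four tables and their columns are assumptions, as is the example cell B1.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Hypothetical schema: names are assumptions, not the talk's tables.
    CREATE TABLE cell          (id INTEGER PRIMARY KEY, address TEXT);
    CREATE TABLE cell_to_cell  (from_cell INTEGER, to_cell INTEGER);   -- =A7+A9
    CREATE TABLE cell_to_range (from_cell INTEGER, range_id INTEGER);  -- =SUM(A1:A5)
    CREATE TABLE range_member  (range_id INTEGER, cell_id INTEGER);    -- A1..A5

    INSERT INTO cell VALUES (1,'A1'),(2,'A2'),(3,'A3'),(4,'A4'),(5,'A5'),
                            (7,'A7'),(9,'A9'),(10,'B1');
    -- B1 is assumed to hold =A7+A9+SUM(A1:A5)
    INSERT INTO cell_to_cell  VALUES (10, 7), (10, 9);
    INSERT INTO cell_to_range VALUES (10, 100);
    INSERT INTO range_member  VALUES (100,1),(100,2),(100,3),(100,4),(100,5);
""")

# Direct part and range part, glued together with UNION before counting.
CONNECTED = """
    SELECT COUNT(*) FROM (
        SELECT to_cell AS cell_id
        FROM cell_to_cell
        WHERE from_cell = :c
        UNION
        SELECT m.cell_id
        FROM cell_to_range r
        JOIN range_member m ON m.range_id = r.range_id
        WHERE r.from_cell = :c
    )
"""


def connected_cells(cell_id):
    return db.execute(CONNECTED, {"c": cell_id}).fetchone()[0]


print(connected_cells(10))
```

Every extra way of connecting cells adds another branch to the UNION, which is where the relational version starts to feel heavy.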
If your tools reach their limits, this
has to tell you something.
So I started thinking.
Maybe this is not a nail…
Maybe I need a different tool
It was at this time that I attended a
talk about Neo4J.
And the strange thing is, I had
seen a few talks about Neo before.
But this time it ‘clicked’, because I
was suffering from the problem
that Neo could solve.
So I ended up with this
model. Still spreadsheets,
worksheets, cells and links.
But the ‘prec’ relation can
now refer to either cells or
ranges.
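What a uniform 'prec' relation buys you can be sketched in a few lines of Python. This is an in-memory stand-in, not Neo4J: the dictionary `PREC`, the set `RANGES`, and the example cell B1 are all assumptions for illustration.

```python
# Hypothetical graph: every node is a cell or a range, and all references
# are the same kind of 'prec' edge.
PREC = {
    "B1": ["A7", "A9", "A1:A5"],  # assumed formula: =A7+A9+SUM(A1:A5)
    "A1:A5": ["A1", "A2", "A3", "A4", "A5"],
}
RANGES = {"A1:A5"}


def connected_cells(node):
    """Cells reachable from `node`, stepping through ranges transparently."""
    found = set()
    for target in PREC.get(node, []):
        if target in RANGES:
            found |= connected_cells(target)  # a range expands to its members
        else:
            found.add(target)
    return found


print(len(connected_cells("B1")))
```

There is no separate code path for direct and range references any more; following edges covers both.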
Turning this into this.
I wouldn’t say this is the power of
Neo at work. It is the power of the
right tool for the job.
There are scenarios, for sure,
where the situation is the other
way around.
But for my goal, Neo was a great
fit.
Also, to be honest with you, I did
not immediately write such super
succinct Cypher queries.
My first attempt was something like
this.
This is basically a one-to-one
translation from SQL to Neo. Still the
two different ways of connecting. It
took me a while to understand the
power of traversal queries.
Here’s another example:
Number of cells in a spreadsheet
First Cypher attempt: still very SQLy.
Second (okay, probably more like
fifth) attempt: no more WHERE,
directly matching a graph pattern.
The power of Cypher :)
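The two Cypher versions are not in the transcript, so the following is a guess at their general shape, wrapped in Python. The labels (`Spreadsheet`, `Worksheet`, `Cell`), the `CONTAINS` relationship, the `name` property, and the helper `count_cells` are all assumptions. The helper expects a session object with a `run(query, **params)` method, which is how the official neo4j Python driver's sessions work.

```python
# Label, relationship, and property names are assumptions for illustration.

# SQL-style attempt: list the nodes separately, then join them in WHERE.
SQL_STYLE = """
MATCH (s:Spreadsheet), (w:Worksheet), (c:Cell)
WHERE s.name = $name
  AND (s)-[:CONTAINS]->(w)
  AND (w)-[:CONTAINS]->(c)
RETURN count(c) AS cells
"""

# Pattern-style attempt: the path itself is the query, no WHERE needed.
PATTERN_STYLE = """
MATCH (:Spreadsheet {name: $name})-[:CONTAINS]->(:Worksheet)-[:CONTAINS]->(c:Cell)
RETURN count(c) AS cells
"""


def count_cells(session, name, query=PATTERN_STYLE):
    """Run one of the queries on a session exposing run(query, **params)."""
    return session.run(query, name=name).single()["cells"]
```

Both queries should return the same count; the second simply describes the shape of the path instead of filtering a cross product.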
That’s all folks.
Spreadsheets are code
Don’t just hit things with the one
hammer you know
Neo is cool for graph-like structures
It makes queries easier
But it takes some getting used to for
SQL-minded brains
Liked this talk? Visit my site for more
