SICP-Graph

This project began with a question: "Can machines understand the text within SICP?"

The result is a graph visualization of the interconnections between the sections of SICP.

Method

Similarity is computed using a hybrid approach combining semantic and lexical methods:

- Semantic similarity: OpenAI's text-embedding-3-small embeddings with cosine similarity
- Lexical similarity: BM25 (Okapi BM25) for keyword-based matching

The final similarity score combines both: 0.7 * semantic + 0.3 * lexical. This captures both conceptual relationships (via embeddings) and shared terminology (via BM25).
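
The weighting above can be sketched roughly as follows. This is a minimal, standard-library illustration rather than the code in connectome.py: the function names, the BM25 parameters (k1=1.5, b=0.75), the symmetrization of BM25 scores, and their scaling into [0, 1] are all assumptions, and the toy vectors stand in for real text-embedding-3-small embeddings.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Okapi BM25 score of each document against one tokenized query."""
    n = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n
    counts = [Counter(d) for d in docs_tokens]
    doc_freq = Counter(t for c in counts for t in c)
    scores = []
    for doc, c in zip(docs_tokens, counts):
        score = 0.0
        for t in set(query_tokens):
            tf = c[t]
            if tf == 0:
                continue
            idf = math.log(1 + (n - doc_freq[t] + 0.5) / (doc_freq[t] + 0.5))
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avg_len))
        scores.append(score)
    return scores


def hybrid_matrix(embeddings, docs_tokens, w_semantic=0.7, w_lexical=0.3):
    """Pairwise hybrid similarity: w_semantic * cosine + w_lexical * scaled BM25."""
    n = len(docs_tokens)
    raw = [bm25_scores(docs_tokens[i], docs_tokens) for i in range(n)]
    # Assumption: BM25 is asymmetric and unbounded, so average both
    # directions and divide by the largest value to land in [0, 1].
    lexical = [[(raw[i][j] + raw[j][i]) / 2 for j in range(n)] for i in range(n)]
    peak = max(max(row) for row in lexical) or 1.0
    return [
        [
            w_semantic * cosine(embeddings[i], embeddings[j])
            + w_lexical * lexical[i][j] / peak
            for j in range(n)
        ]
        for i in range(n)
    ]


# Toy data: hypothetical 2-d "embeddings" and whitespace-tokenized titles.
embeddings = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
docs_tokens = [
    "procedures and the processes they generate".split(),
    "procedures as general methods".split(),
    "streams and delayed evaluation".split(),
]
scores = hybrid_matrix(embeddings, docs_tokens)
```

Because raw BM25 scores have no fixed upper bound, some normalization is needed before mixing them with cosine similarity; dividing by the maximum is only one possible choice.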

A k-NN filter then keeps only the top-k most similar neighbors per chapter, producing a sparse similarity graph that is visualized with a D3.js force-directed layout.
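
As a rough sketch of the top-k step, the snippet below keeps each node's k strongest neighbors and emits a node-link structure of the kind commonly fed to a D3.js force layout. The function name `knn_graph`, the `nodes`/`links` schema, and the sample labels and scores are hypothetical; the JSON files under docs may be organized differently.

```python
import json


def knn_graph(labels, similarity, k=2):
    """Keep each node's k most similar neighbors; return a node-link dict."""
    links = {}
    for i, row in enumerate(similarity):
        # Rank every other node by similarity to node i, highest first.
        neighbors = sorted(
            (j for j in range(len(row)) if j != i),
            key=lambda j: row[j],
            reverse=True,
        )[:k]
        for j in neighbors:
            # Store each undirected edge once, keyed by the ordered pair.
            links[(min(i, j), max(i, j))] = row[j]
    return {
        "nodes": [{"id": label} for label in labels],
        "links": [
            {"source": labels[a], "target": labels[b], "value": weight}
            for (a, b), weight in sorted(links.items())
        ],
    }


# Hypothetical section labels and a symmetric similarity matrix.
labels = ["1.1", "1.2", "1.3", "2.1"]
similarity = [
    [1.0, 0.9, 0.4, 0.1],
    [0.9, 1.0, 0.5, 0.2],
    [0.4, 0.5, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
]
graph = knn_graph(labels, similarity, k=1)
graph_json = json.dumps(graph, indent=2)
```

An edge survives if either endpoint ranks the other in its top k, so a node can end up with more than k links in the final graph.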

Directory structure

- docs: contains all the data and the static pages served on gh-pages
- texts: contains the cleaned HTML and Markdown versions of each text
- connectome.py: computes semantic similarity using OpenAI embeddings
- connectome_tfidf.py: legacy TF-IDF approach (preserved for comparison)

The graph visualization has since been extended to other texts:

Structure and Interpretation of Computer Programs
Structure and Interpretation of Classical Mechanics
- Lagrangian mechanics is more closely related to rigid bodies than Hamiltonian mechanics is.
The Principles of Quantum Mechanics: chapters, sections
The Society of Mind, sourced from http://www.aurellem.org/minsky/
- The sections are highly correlated with one another.
