property graph neo4j
Details
File: third_party/LlamaIndex/propertygraphs/property_graph_neo4j.ipynb
Type: Jupyter Notebook
Use Cases: Property graph
Integrations: Neo4j
Content
Notebook content (JSON format):
{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# PropertyGraph using Neo4j\n", "\n", "In this notebook we demonstrate how to build a `PropertyGraphIndex` backed by Neo4j.\n", "\n", "Neo4j is a production-grade graph database that excels at storing property graphs, running vector searches, filtering, and more.\n", "\n", "The simplest way to begin is with a cloud-hosted instance on Neo4j Aura. In this notebook, however, we run the database locally using Docker." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%pip install llama-index-core\n", "%pip install llama-index-graph-stores-neo4j\n", "%pip install llama-index-llms-mistralai\n", "%pip install llama-index-embeddings-mistralai" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Docker Setup\n", "\n", "The first time the container starts, log in to the Neo4j Browser at http://localhost:7474 with the default credentials below; you will be asked to set a new password. The code in this notebook assumes the new password is `llamaindex`.\n", "\n", "1. username: neo4j\n", "\n", "2. password: neo4j" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!docker run -d \\\n", " -p 7474:7474 -p 7687:7687 \\\n", " -v $PWD/data:/data -v $PWD/plugins:/plugins \\\n", " --name neo4j-apoc \\\n", " -e NEO4J_apoc_export_file_enabled=true \\\n", " -e NEO4J_apoc_import_file_enabled=true \\\n", " -e NEO4J_apoc_import_file_use__neo4j__config=true \\\n", " -e NEO4JLABS_PLUGINS=\\[\\\"apoc\\\"\\] \\\n", " neo4j:latest" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import nest_asyncio\n", "\n", "nest_asyncio.apply()\n", "\n", "from IPython.display import Markdown, display" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import os\n", "os.environ['MISTRAL_API_KEY'] = 'YOUR MISTRAL API KEY'" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from 
llama_index.embeddings.mistralai import MistralAIEmbedding\n", "from llama_index.llms.mistralai import MistralAI\n", "\n", "llm = MistralAI(model='mistral-large-latest')\n", "embed_model = MistralAIEmbedding()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Download Data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!mkdir -p 'data/paul_graham/'\n", "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load Data" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "from llama_index.core import SimpleDirectoryReader\n", "\n", "documents = SimpleDirectoryReader(\"./data/paul_graham/\").load_data()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Index Construction" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.FeatureDeprecationWarning} {category: DEPRECATION} {title: This feature is deprecated and will be removed in future versions.} {description: The procedure has a deprecated field. 
('config' used by 'apoc.meta.graphSample' is deprecated.)} {position: line: 1, column: 1, offset: 0} for query: \"CALL apoc.meta.graphSample() YIELD nodes, relationships RETURN nodes, [rel in relationships | {name:apoc.any.property(rel, 'type'), count: apoc.any.property(rel, 'count')}] AS relationships\"\n" ] } ], "source": [ "from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore\n", "\n", "# Note: used to be `Neo4jPGStore`\n", "graph_store = Neo4jPropertyGraphStore(\n", " username=\"neo4j\",\n", " password=\"llamaindex\",\n", " url=\"bolt://localhost:7687\",\n", ")" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/ravithejad/Desktop/llamaindex/lib/python3.9/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", " from .autonotebook import tqdm as notebook_tqdm\n", "Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 27.19it/s]\n", "Extracting paths from text: 100%|██████████| 22/22 [00:42<00:00, 1.92s/it]\n", "Generating embeddings: 100%|██████████| 3/3 [00:01<00:00, 2.60it/s]\n", "Generating embeddings: 100%|██████████| 40/40 [00:13<00:00, 2.86it/s]\n", "Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.FeatureDeprecationWarning} {category: DEPRECATION} {title: This feature is deprecated and will be removed in future versions.} {description: The procedure has a deprecated field. 
('config' used by 'apoc.meta.graphSample' is deprecated.)} {position: line: 1, column: 1, offset: 0} for query: \"CALL apoc.meta.graphSample() YIELD nodes, relationships RETURN nodes, [rel in relationships | {name:apoc.any.property(rel, 'type'), count: apoc.any.property(rel, 'count')}] AS relationships\"\n" ] } ], "source": [ "from llama_index.core import PropertyGraphIndex\n", "from llama_index.core.indices.property_graph import SimpleLLMPathExtractor\n", "\n", "index = PropertyGraphIndex.from_documents(\n", " documents,\n", " embed_model=embed_model,\n", " kg_extractors=[\n", " SimpleLLMPathExtractor(\n", " llm=llm\n", " )\n", " ],\n", " property_graph_store=graph_store,\n", " show_progress=True,\n", ")" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "from llama_index.core import Settings\n", "Settings.llm = llm\n", "Settings.embed_model = embed_model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Retrievers" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "from llama_index.core.indices.property_graph import (\n", " LLMSynonymRetriever,\n", " VectorContextRetriever,\n", ")\n", "\n", "\n", "llm_synonym = LLMSynonymRetriever(\n", " index.property_graph_store,\n", " llm=llm,\n", " include_text=False,\n", ")\n", "vector_context = VectorContextRetriever(\n", " index.property_graph_store,\n", " embed_model=embed_model,\n", " include_text=False,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Querying" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Retrieving" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Yahoo -> Bought -> Viaweb in 1998\n", "Hacker news -> Source of stress for -> Author\n", "Author -> Wrote -> Yc's 
internal software in arc\n", "Author -> Advised by -> Robert morris to not make yc the last cool thing\n", "Author -> Decided to hand yc over to -> Sam altman\n", "Author -> Worked on -> Writing essays and yc\n", "Viaweb -> Software -> Works via the web\n", "Robert morris -> Showed -> World wide web\n" ] } ], "source": [ "retriever = index.as_retriever(\n", " sub_retrievers=[\n", " llm_synonym,\n", " vector_context,\n", " ],\n", ")\n", "\n", "nodes = retriever.retrieve(\"What did author do at Viaweb?\")\n", "\n", "for node in nodes:\n", " print(node.text)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### QueryEngine" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "The author, along with Robert Morris, started Viaweb, a company that aimed to build online stores. The author's role involved writing software to generate websites for galleries initially, and later, developing a new site generator for online stores using Lisp. The author also had the innovative idea of running the software on the server and letting users control it by clicking on links, eliminating the need for any client software or command line interaction on the server. This led to the creation of a web app, which at the time was a novel concept." 
], "text/plain": [ "<IPython.core.display.Markdown object>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "query_engine = index.as_query_engine(include_text=True)\n", "\n", "response = query_engine.query(\"What did author do at Viaweb?\")\n", "\n", "display(Markdown(f\"{response.response}\"))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "llamaindex", "language": "python", "name": "llamaindex" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.6" } }, "nbformat": 4, "nbformat_minor": 2 }
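The setup cells hard-code the Neo4j password (`llamaindex`) and a placeholder Mistral API key. A minimal sketch of reading the Neo4j connection settings from environment variables instead; the helper `neo4j_settings_from_env` and the variable names `NEO4J_USERNAME`, `NEO4J_PASSWORD`, and `NEO4J_URL` are hypothetical choices for this example, and the defaults simply mirror the values the notebook passes to `Neo4jPropertyGraphStore`.

```python
import os


def neo4j_settings_from_env() -> dict:
    """Build keyword arguments for Neo4jPropertyGraphStore from env vars.

    NEO4J_USERNAME, NEO4J_PASSWORD and NEO4J_URL are assumed variable names;
    the defaults mirror the values used in this notebook.
    """
    password = os.environ.get("NEO4J_PASSWORD")
    if not password:
        # Fail early rather than attempting a connection without a password.
        raise RuntimeError("Set NEO4J_PASSWORD before connecting to Neo4j")
    return {
        "username": os.environ.get("NEO4J_USERNAME", "neo4j"),
        "password": password,
        "url": os.environ.get("NEO4J_URL", "bolt://localhost:7687"),
    }


# Usage (needs the running container from the Docker step):
# from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore
# graph_store = Neo4jPropertyGraphStore(**neo4j_settings_from_env())
```

The same pattern applies to `MISTRAL_API_KEY`, which the notebook already passes through `os.environ`; setting it in the shell keeps the key out of the saved notebook.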
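`LLMSynonymRetriever` asks the LLM to expand the query into synonyms or related keywords and uses them to look up matching entities in the graph. The sketch below models only the lookup half in plain Python: the keyword list is a hypothetical stand-in for the LLM output, and the case-insensitive exact-name match followed by a one-hop expansion is a simplification, not LlamaIndex's actual implementation. The triplets are copied from the retrieval output in this notebook.

```python
from typing import List, Tuple

Triplet = Tuple[str, str, str]  # (subject, relation, object)


def synonym_context(keywords: List[str], triplets: List[Triplet]) -> List[Triplet]:
    """Return every triplet whose subject or object matches a keyword."""
    # Normalize case so the keyword "viaweb" matches the entity "Viaweb".
    wanted = {k.strip().lower() for k in keywords}
    # Keep triplets that touch a matched entity (a one-hop neighborhood).
    return [t for t in triplets if t[0].lower() in wanted or t[2].lower() in wanted]


# Entities and relations copied from the retrieval output in this notebook.
triplets = [
    ("Yahoo", "Bought", "Viaweb in 1998"),
    ("Viaweb", "Software", "Works via the web"),
    ("Robert morris", "Showed", "World wide web"),
    ("Author", "Worked on", "Writing essays and yc"),
]
# Hypothetical keywords standing in for what an LLM might generate.
hits = synonym_context(["viaweb", "Robert Morris"], triplets)
for hit in hits:
    print(hit)
```

Note that the entity `Viaweb in 1998` is not matched by the keyword `viaweb`: name-based lookup is only as good as the entity names the extractor produced, which is one reason the notebook pairs it with a vector-based retriever.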
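`VectorContextRetriever` embeds the query, finds the graph nodes whose embeddings are closest, and returns the paths around them. A toy sketch of that idea in plain Python, using cosine similarity over made-up two-dimensional vectors; in the notebook the real embeddings come from `embed_model` and are stored in Neo4j. The function `vector_context` and its `top_k` parameter are illustrative names, not the LlamaIndex API.

```python
import math
from typing import Dict, List, Sequence, Tuple

Triplet = Tuple[str, str, str]  # (subject, relation, object)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vector_context(
    query_vec: Sequence[float],
    node_vecs: Dict[str, Sequence[float]],
    triplets: List[Triplet],
    top_k: int = 2,
) -> List[Triplet]:
    # 1. Rank graph entities by similarity to the query embedding.
    ranked = sorted(
        node_vecs, key=lambda n: cosine(query_vec, node_vecs[n]), reverse=True
    )
    seeds = set(ranked[:top_k])
    # 2. Expand one hop: keep every triplet that touches a seed entity.
    return [t for t in triplets if t[0] in seeds or t[2] in seeds]


# Made-up 2-d "embeddings", purely for illustration.
node_vecs = {"Author": [1.0, 0.0], "Viaweb": [0.0, 1.0], "Sam altman": [-1.0, 0.0]}
triplets = [
    ("Author", "Worked on", "Writing essays and yc"),
    ("Viaweb", "Software", "Works via the web"),
    ("Author", "Decided to hand yc over to", "Sam altman"),
]
print(vector_context([0.1, 1.0], node_vecs, triplets, top_k=1))
# [('Viaweb', 'Software', 'Works via the web')]
```

Raising `top_k` to 2 also seeds `Author`, which pulls in both `Author` triplets; widening the seed set trades precision for recall in the same way a larger similarity cutoff would.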
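With `include_text=False`, each node returned in the "Retrieving" step prints as a single `subject -> relation -> object` string. If you want to post-process those results, a small parser is enough. This sketch assumes the ` -> ` separator seen in the printed output and that it never appears inside the subject or relation; `parse_triplet` and `group_by_subject` are hypothetical helpers, not part of LlamaIndex.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def parse_triplet(text: str) -> Tuple[str, str, str]:
    """Split 'subject -> relation -> object' into its three parts."""
    # maxsplit=2 leaves any later ' -> ' inside the object untouched.
    parts = text.split(" -> ", 2)
    if len(parts) != 3:
        raise ValueError(f"not a triplet: {text!r}")
    subject, relation, obj = (p.strip() for p in parts)
    return subject, relation, obj


def group_by_subject(texts: Iterable[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Collect (relation, object) pairs under each subject, in input order."""
    grouped: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for text in texts:
        subject, relation, obj = parse_triplet(text)
        grouped[subject].append((relation, obj))
    return dict(grouped)


# A few lines copied from the retrieval output in this notebook.
retrieved = [
    "Yahoo -> Bought -> Viaweb in 1998",
    "Author -> Decided to hand yc over to -> Sam altman",
    "Author -> Worked on -> Writing essays and yc",
    "Viaweb -> Software -> Works via the web",
]
for subject, edges in group_by_subject(retrieved).items():
    print(subject, edges)

# In the notebook itself: group_by_subject(node.text for node in nodes)
```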