Build Status

This package implements fast/scalable node embedding algorithms. This can be used to embed the nodes in graph objects. We support NetworkX graph types natively.

You can also efficiently embed arbitrary scipy CSR Sparse Matrices, though not all algorithms here are optimized for this usecase.

alt tag

Installing

pip install nodevectors

Supported Algorithms

Quick Example:

    import networkx as nx
    from nodevectors import Node2Vec

    # Test Graph
    G = nx.generators.classic.wheel_graph(100)

    # Fit embedding model to graph
    g2v = Node2Vec()
    # way faster than other node2vec implementations
    # Graph edge weights are handled automatically
    g2v.fit(G)

    # query embeddings for node 42
    g2v.predict(42)

    # Save and load whole node2vec model
    # Uses a smart pickling method to avoid serialization errors
    g2v.save('node2vec.pckl')
    g2v = Node2vec.load('node2vec.pckl')

    # Save model to gensim.KeyedVector format
    g2v.save_vectors("wheel_model.bin")

    # load in gensim
    from gensim.models import KeyedVectors
    model = KeyedVectors.load_word2vec_format("wheel_model.bin")
    model[str(43)] # need to make nodeID a str for gensim

Warning: Saving in Gensim format is only supported for the Node2Vec model at this point. Other models build a Dict or embeddings.

Embedding a large graph

NetworkX doesn't support large graphs (>500,000 nodes) because it uses lots of memory for each node. We recommend using CSRGraphs (which is installed with this package) to load the graph in memory:

import csrgraph as cg
import nodevectors

G = cg.read_edgelist("path_to_file.csv")
ggvec_model = nodevectors.GGVec() 
embeddings = ggvec_model.fit_transform(G)

The ProNE and GGVec algorithms are the fastest. GGVec uses the least RAM to embed larger graphs. Additionally here are some recommendations:

Embedding a VERY LARGE graph

(Upcoming).

GGVec can be used to learn embeddings directly from an edgelist file (or stream) when the order parameter is constrained to be 1. This means you can embed arbitrarily large graphs without ever loading them entirely into RAM.

Related Projects

Why is it so fast?

We leverage CSRGraphs for most algorithms. This uses CSR graph representations and a lot of Numba JIT'ed procedures.