N-gram graphs and their applications in NLP

Click here to see a short presentation (ppt) on the potential use of n-gram graphs for text classification.

JInsect

The JINSECT toolkit is a Java-based toolkit and library that supports and demonstrates the use of n-gram graphs within Natural Language Processing applications, ranging from summarization and summary evaluation to text classification and indexing. This repository contains parts of the collaborative work by Ayush Pareek and Dr. George Giannakopoulos on the upcoming second version of the tool. It also contains a rudimentary Python version of the toolkit, with functional n-gram graph operations and the AutoSummENG, MeMoG, and NPowER algorithms for multilingual summary evaluation.

Code Snippets


import gr.demokritos.iit.jinsect.documentModel.representations.DocumentNGramGraph;

...

// The string we want to represent
String sTmp = "Hello graph!";

// Default document n-gram graph: min and max n-gram size are both 3,
// and the dist parameter is also 3
DocumentNGramGraph dngGraph = new DocumentNGramGraph();

// Create the graph
dngGraph.setDataString(sTmp);
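
To make the idea behind the snippet above concrete, here is a minimal, self-contained sketch of how a character n-gram graph could be built by hand. It is not JInsect's implementation: the class `NGramGraphSketch`, the method `build`, the `"->"` edge-key format, and the choice to link each n-gram only to the n-grams that follow it are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of JInsect.
final class NGramGraphSketch {

    /**
     * Maps each edge "from->to" to the number of times the n-gram `to`
     * starts within `dist` positions after the n-gram `from`.
     */
    static Map<String, Double> build(String text, int n, int dist) {
        Map<String, Double> edges = new LinkedHashMap<>();
        int count = text.length() - n + 1; // number of n-grams (may be <= 0)
        for (int i = 0; i < count; i++) {
            String from = text.substring(i, i + n);
            // Link to every n-gram that starts inside the window after position i
            for (int j = i + 1; j <= i + dist && j < count; j++) {
                String to = text.substring(j, j + n);
                edges.merge(from + "->" + to, 1.0, Double::sum);
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        // Same parameters as the default graph described above (assumed: n = 3, dist = 3)
        build("Hello graph!", 3, 3)
                .forEach((edge, weight) -> System.out.println(edge + " " + weight));
    }
}
```

With these assumptions, "Hello graph!" contains ten distinct 3-grams and the sketch produces 24 edges, each with weight 1.0. The real library may use a different windowing rule or edge direction.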

...

import gr.demokritos.iit.jinsect.documentModel.representations.DocumentNGramGraph;
import java.io.IOException;

...

    // The filename of the file the contents of which will form the graph
    String sFilename = "ayush_file.txt";
    DocumentNGramGraph dngGraph = new DocumentNGramGraph(); 
    // Load the data string from the file, also dealing with exceptions
    try {
        dngGraph.loadDataStringFromFile(sFilename);
    } catch (IOException ex) {
        ex.printStackTrace();
    }

import gr.demokritos.iit.jinsect.documentModel.representations.DocumentNGramGraph;
import gr.demokritos.iit.jinsect.utils;

...

    // create the n-gram graph
    String sData = "Hello there, graph world!";
    DocumentNGramGraph dngGraph = new DocumentNGramGraph();
    dngGraph.setDataString(sData);

    /* The following command gets the first n-gram graph level (with the
    minimum n-gram size) and renders it, using the utils package, 
    as a DOT string */
    System.out.println(utils.graphToDot(dngGraph.getGraphLevel(0), true));
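
For readers unfamiliar with DOT, the following standalone sketch shows roughly what rendering a weighted graph as a DOT string involves. It does not reproduce the output format of `utils.graphToDot`: the class `DotSketch`, the nested-map graph representation, and the `directed` flag are this sketch's own assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of JInsect.
final class DotSketch {

    // Wrap text in double quotes, escaping embedded double quotes for DOT.
    static String quote(String s) {
        return "\"" + s.replace("\"", "\\\"") + "\"";
    }

    /** Renders an adjacency map (from -> to -> weight) as DOT text. */
    static String toDot(Map<String, Map<String, Double>> adjacency, boolean directed) {
        String arrow = directed ? " -> " : " -- ";
        StringBuilder sb = new StringBuilder(directed ? "digraph {\n" : "graph {\n");
        for (Map.Entry<String, Map<String, Double>> from : adjacency.entrySet()) {
            for (Map.Entry<String, Double> to : from.getValue().entrySet()) {
                sb.append("  ").append(quote(from.getKey())).append(arrow)
                  .append(quote(to.getKey()))
                  .append(" [label=\"").append(to.getValue()).append("\"];\n");
            }
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> graph = new LinkedHashMap<>();
        graph.computeIfAbsent("Hel", k -> new LinkedHashMap<>()).put("ell", 1.0);
        graph.computeIfAbsent("Hel", k -> new LinkedHashMap<>()).put("llo", 1.0);
        System.out.print(toDot(graph, true));
    }
}
```

The resulting text can be passed to Graphviz (for example `dot -Tpng`) to draw the graph.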

...

import gr.demokritos.iit.jinsect.documentModel.comparators.NGramCachedGraphComparator;
import gr.demokritos.iit.jinsect.documentModel.representations.DocumentNGramGraph;
import gr.demokritos.iit.jinsect.structs.GraphSimilarity;

...

    String sTmp = "Hello graph!";
    DocumentNGramGraph dngGraph = new DocumentNGramGraph(); 
    dngGraph.setDataString(sTmp);
    String sTmp2 = "Hello other graph!";
    DocumentNGramGraph dngGraph2 = new DocumentNGramGraph(); 
    dngGraph2.setDataString(sTmp2);

    // Create a comparator object
    NGramCachedGraphComparator ngc = new NGramCachedGraphComparator();
    // Extract similarity
    GraphSimilarity gs = ngc.getSimilarityBetween(dngGraph, dngGraph2);
    // Output similarity (all three components: containment, value and size)
    System.out.println(gs.toString());
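
The three components named in the comment above (containment, value and size) can be illustrated with a small standalone sketch. It follows the definitions usually given in the n-gram graph literature and may differ in detail from what `NGramCachedGraphComparator` computes. The class `GraphSimilaritySketch`, the edge-map representation, and the order of the returned array are assumptions.

```java
import java.util.Map;

// Hypothetical helper, not part of JInsect.
final class GraphSimilaritySketch {

    /**
     * Compares two graphs given as edge -> weight maps (weights assumed positive).
     * Returns {containment, value, size}:
     *   containment = shared edges / size of the smaller graph
     *   value       = sum over shared edges of min(w1, w2) / max(w1, w2),
     *                 divided by the size of the larger graph
     *   size        = size of the smaller graph / size of the larger graph
     */
    static double[] compare(Map<String, Double> g1, Map<String, Double> g2) {
        int small = Math.min(g1.size(), g2.size());
        int large = Math.max(g1.size(), g2.size());
        if (small == 0) {
            return new double[] {0.0, 0.0, 0.0}; // avoid division by zero
        }
        int shared = 0;
        double valueSum = 0.0;
        for (Map.Entry<String, Double> e : g1.entrySet()) {
            Double w2 = g2.get(e.getKey());
            if (w2 != null) {
                double w1 = e.getValue();
                shared++;
                valueSum += Math.min(w1, w2) / Math.max(w1, w2);
            }
        }
        return new double[] {
            shared / (double) small,
            valueSum / large,
            small / (double) large
        };
    }
}
```

For example, comparing `{a: 1, b: 2}` with `{a: 1, b: 4, c: 1}` gives containment 1.0 (both edges of the smaller graph are shared), value (1 + 0.5) / 3 = 0.5, and size 2/3.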

import gr.demokritos.iit.jinsect.documentModel.representations.DocumentNGramGraph;

...

    // create the two graphs
    String sTmpA = "Hello graph A!";
    String sTmpB = "Hello graph B!";
    DocumentNGramGraph dngGraphA = new DocumentNGramGraph();
    DocumentNGramGraph dngGraphB = new DocumentNGramGraph();
    dngGraphA.setDataString(sTmpA);
    dngGraphB.setDataString(sTmpB);

    // perform merging with weight factor 0.5 (averaging)
    // result is on dngGraphA
    dngGraphA.mergeGraph(dngGraphB, 0.5);
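
What "merging with a weight factor" might mean can be sketched as follows. The sketch assumes that the weight of an edge present in both graphs moves from its old value toward the other graph's value by the given factor (so 0.5 averages them, as the comment above states), and that an edge present only in the other graph is copied with its weight. That second rule, the class `GraphMergeSketch`, and the edge-map representation are assumptions, not JInsect's documented behaviour.

```java
import java.util.Map;

// Hypothetical helper, not part of JInsect.
final class GraphMergeSketch {

    /**
     * Merges `other` into `target` in place.
     * Shared edges: newWeight = old + (otherWeight - old) * factor.
     * Edges only in `other`: copied unchanged (an assumption of this sketch).
     */
    static void mergeInto(Map<String, Double> target, Map<String, Double> other, double factor) {
        for (Map.Entry<String, Double> e : other.entrySet()) {
            Double old = target.get(e.getKey());
            if (old == null) {
                target.put(e.getKey(), e.getValue());
            } else {
                target.put(e.getKey(), old + (e.getValue() - old) * factor);
            }
        }
    }
}
```

With factor 0.0 shared edges keep their old weight, and with 1.0 they take the other graph's weight. Merging `{a: 4, c: 3}` into `{a: 2, b: 1}` with factor 0.5 leaves `{a: 3, b: 1, c: 3}` in the target map.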

import gr.demokritos.iit.jinsect.documentModel.representations.DocumentNGramGraph;
import gr.demokritos.iit.jinsect.storage.INSECTFileDB;

...

        // string to be represented
        String sTmp = "Hello there, I am an example string!";
        DocumentNGramGraph dngGraph = new DocumentNGramGraph();
        INSECTFileDB<DocumentNGramGraph> db = new INSECTFileDB<DocumentNGramGraph>();

        // if the file already exists
        if (db.existsObject("test", "graph")) { 
            dngGraph = db.loadObject("test", "graph");
        }
        else {
            // Create the graph
            dngGraph.setDataString(sTmp);

            // save object to file
            db.saveObject(dngGraph, "test", "graph");
        }
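
The load-or-create pattern used above can also be sketched with plain Java serialization, for readers who want to see it without `INSECTFileDB`. The class `GraphCacheSketch`, the method `loadOrCreate`, and the use of a `HashMap` of edge weights as the stored object are assumptions made for illustration. How `INSECTFileDB` names or encodes its files is not shown here.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.function.Supplier;

// Hypothetical helper, not part of JInsect.
final class GraphCacheSketch {

    /** Returns the graph stored in `file`, or builds it, saves it, and returns it. */
    @SuppressWarnings("unchecked")
    static HashMap<String, Double> loadOrCreate(Path file,
                                                Supplier<HashMap<String, Double>> builder) {
        try {
            if (Files.exists(file)) {
                // Reuse the stored graph instead of rebuilding it
                try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
                    return (HashMap<String, Double>) in.readObject();
                }
            }
            HashMap<String, Double> graph = builder.get();
            try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
                out.writeObject(graph);
            }
            return graph;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The builder runs only when the file is missing, so an expensive graph construction happens once and later runs read the stored copy.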

This version of the library implements the following features:

This library version: