StringBench

This project benchmarks the performance of different Java string matching implementations.

Latest Benchmarks

Benchmark quality

Subject of this Benchmark

This benchmark is designed for the following objectives:

Comparing different algorithms

It is useful to know which algorithm to use in which scenario.

Comparing different implementations

A flawed implementation is easier to identify when it is compared with other implementations of the same algorithm, because its performance deviates strongly from theirs.

Minor deviations between the performance of different implementations of the same algorithm should be ignored.

The comparison also tests how the algorithms behave when applied to different text sources. At this time the benchmark contains

Some implementations deviate strongly in performance when compared in this way.
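Before comparing timings, it can help to check that an implementation returns the same matches as a trusted baseline on every text source. The following is only a minimal sketch of that idea, not the harness used by this project: the `Searcher` interface, the class name and the deliberately flawed `NON_OVERLAPPING` candidate are hypothetical names introduced for illustration, and `String.indexOf` is assumed to be the reference.

```java
import java.util.ArrayList;
import java.util.List;

class CrossCheck {

    // Hypothetical abstraction over one implementation under test.
    interface Searcher {
        List<Integer> findAll(String text, String pattern);
    }

    // Reference: all match positions (overlapping included) via String.indexOf.
    // Assumes a non-empty pattern.
    static final Searcher REFERENCE = (text, pattern) -> {
        List<Integer> result = new ArrayList<>();
        for (int pos = text.indexOf(pattern); pos >= 0; pos = text.indexOf(pattern, pos + 1)) {
            result.add(pos);
        }
        return result;
    };

    // Deliberately flawed candidate: skips past each match and so misses overlaps.
    static final Searcher NON_OVERLAPPING = (text, pattern) -> {
        List<Integer> result = new ArrayList<>();
        for (int pos = text.indexOf(pattern); pos >= 0; pos = text.indexOf(pattern, pos + pattern.length())) {
            result.add(pos);
        }
        return result;
    };

    // Returns true if the candidate agrees with the reference on every text source.
    static boolean agrees(Searcher candidate, List<String> texts, String pattern) {
        if (pattern.isEmpty()) {
            throw new IllegalArgumentException("pattern must not be empty");
        }
        for (String text : texts) {
            if (!candidate.findAll(text, pattern).equals(REFERENCE.findAll(text, pattern))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> texts = List.of("abababa", "no match here");
        System.out.println("reference agrees:       " + agrees(REFERENCE, texts, "aba"));
        System.out.println("non-overlapping agrees: " + agrees(NON_OVERLAPPING, texts, "aba"));
    }
}
```

A candidate that fails such a check is better excluded from the timing comparison, since its numbers would not describe the same work.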

Recommendation

Use this benchmark only to select an algorithm, and then select the implementation that best suits your requirements.

An Overview of libraries

Java API

Java provides two ways to search for strings:

Both algorithms are very stable and pass all benchmarks. The naive algorithm will not perform well on large texts or patterns. The Boyer-Moore implementation is actively challenged by the other implementations below.
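As an illustration, here is a minimal sketch of searching with the Java API in two ways. It assumes that the two ways meant above are `String.indexOf` and a literal `java.util.regex.Pattern`; this section does not name them, so treat that mapping as an assumption. The class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JavaApiSearch {

    // All match positions (overlapping included) using String.indexOf.
    static List<Integer> findWithIndexOf(String text, String pattern) {
        requireNonEmpty(pattern);
        List<Integer> positions = new ArrayList<>();
        int pos = text.indexOf(pattern);
        while (pos >= 0) {
            positions.add(pos);
            pos = text.indexOf(pattern, pos + 1);
        }
        return positions;
    }

    // All match positions (overlapping included) using a literal regex pattern.
    static List<Integer> findWithPattern(String text, String pattern) {
        requireNonEmpty(pattern);
        List<Integer> positions = new ArrayList<>();
        // Pattern.LITERAL treats the pattern as plain characters, not regex syntax.
        Matcher matcher = Pattern.compile(pattern, Pattern.LITERAL).matcher(text);
        int from = 0;
        // find(int) resets the matcher and searches from the given index.
        while (from <= text.length() && matcher.find(from)) {
            positions.add(matcher.start());
            from = matcher.start() + 1;
        }
        return positions;
    }

    // Empty patterns match everywhere and are excluded from this sketch.
    private static void requireNonEmpty(String pattern) {
        if (pattern.isEmpty()) {
            throw new IllegalArgumentException("pattern must not be empty");
        }
    }

    public static void main(String[] args) {
        System.out.println(findWithIndexOf("abababa", "aba"));
        System.out.println(findWithPattern("abababa", "aba"));
    }
}
```

Both methods report overlapping matches, so their results can be compared directly.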

StringSearchAlgorithms (SC)

My own library StringSearchAlgorithms supports single and multiple pattern search along with some experimental features (e.g. regex search), providing the algorithms:

I am continuously improving the design and try to keep the test coverage near 100%. It is actively maintained and passes all tests of the benchmark.

ByteSeek (BS)

byteseek is a library for efficiently matching patterns of bytes and searching for those patterns, providing the algorithms:

byteseek provides a well-designed API, is actively maintained and passes all tests of the benchmark for single patterns. The tests for multiple patterns do not pass yet, but the maintainer is working on it.

StringSearch (SS)

StringSearch is a popular string searching library for single pattern search (and also wildcard search) that claims to offer high-performance string search. It does not pass the benchmark tests and is therefore excluded from the benchmark. Searching for simple patterns takes minutes where String.indexOf takes a few seconds. The maintainer refused to clarify this issue.

AhoCorasick (AC)

AhoCorasick is a popular one-algorithm library that implements the Aho-Corasick algorithm. It does not pass the benchmark tests and is therefore excluded from the benchmark. A statement from the maintainer has been pending for a long time.

AhoCorasickDoubleArrayTrie (ACDA)

AhoCorasickDoubleArrayTrie is a very fast implementation of the Aho-Corasick algorithm. It passed the benchmark without modification. Its memory consumption does not exceed the memory given by the incubation tests, yet we cannot say how it compares to the other implementations.

Participating

If you want another framework to participate in this benchmark, it must meet the following conditions:

Interpretation of the results of 2015-11-22

Interpretation of the results of 2015-12-06

Interpretation of the results of 2015-12-13

Interpretation of the results of 2016-08-13

Interpretation of the results of 2016-10-31

Switching Visualization Software

We changed the visualization of the best-performing benchmarks. From now on we will only maintain the most current benchmarks. Older benchmarks remain available as CSV but are not visualized.

Interpretation of the results of 2017-04-04

Interpretation of the results of 2019-01-05