MDS (acronym of Multiple Dimension Spread)


What does this project do?

MDS (Multiple Dimension Spread) is a schema-less columnar storage format. It provides a flexible, JSON-like data representation together with efficient reads similar to other columnar storage formats.

Why is this project useful?

In the big data era, data has become too large to simply compress and store as-is. To improve compression ratios and read performance, several columnar data formats (for example, Apache ORC and Apache Parquet) were proposed. They achieve high compression ratios because similar data is stored together in each column, and good read performance because data is grouped by column and only the columns that are used need to be read.
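To illustrate why grouping similar values in a column helps compression, here is a minimal Python sketch using run-length encoding, one common columnar encoding. It is an illustration only and does not claim to show how MDS, ORC, or Parquet encode data internally; the sample rows and the `run_length_encode` helper are hypothetical.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive equal values into (value, count) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


# Hypothetical sample rows, stored row by row.
rows = [
    {"name": "apple", "class": "fruits"},
    {"name": "orange", "class": "fruits"},
    {"name": "grape", "class": "fruits"},
    {"name": "carrot", "class": "vegetables"},
]

# Reading one column puts similar values next to each other,
# which is what makes a simple encoding like this effective.
class_column = [row["class"] for row in rows]
encoded = run_length_encode(class_column)
print(encoded)  # [('fruits', 3), ('vegetables', 1)]
```

In a row-oriented layout the repeated "fruits" values are separated by other fields, so this kind of encoding has far less to work with.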

However, these formats require the data structure of a row (or record) to be defined before the data is saved. This means deciding how the data will be used at storage time, which is often difficult because it is not yet clear what kind of data will be needed.

This project provides a new columnar format that does not require a schema at storage time, with compression and read performance comparable to (and in some cases better than) other formats.
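The idea of storing columns without declaring a schema first can be sketched in a few lines of Python. This is a conceptual illustration under our own assumptions, not the MDS implementation or its file layout: the `spread_to_columns` function, the dotted column names, and the sample records are all hypothetical.

```python
import json
from collections import defaultdict


def spread_to_columns(records):
    """Pivot records into {column_path: [(row_index, value), ...]}.

    Nested objects are flattened into dotted paths. No schema is declared
    up front; columns are discovered as the records are read.
    """
    columns = defaultdict(list)

    def walk(prefix, value, row):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(f"{prefix}.{key}" if prefix else key, child, row)
        else:
            columns[prefix].append((row, value))

    for row, record in enumerate(records):
        walk("", record, row)
    return dict(columns)


# Hypothetical JSON lines whose fields differ from record to record.
lines = [
    '{"name": "apple", "price": 110, "summary": {"total_price": 550}}',
    '{"name": "orange", "price": 80, "class": "fruits"}',
]
columns = spread_to_columns([json.loads(line) for line in lines])
print(columns["name"])   # [(0, 'apple'), (1, 'orange')]
print(columns["class"])  # [(1, 'fruits')]
```

Each value keeps its row index, so a column that appears in only some records (like "class" here) can still be matched back to its rows.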

Use cases

Data Analysis

Analyzing big data requires storing data compactly and reading it efficiently. As a columnar format, MDS is well suited to this need.

Data Lake

A data lake is a data pool that does not require the data structure of a row (a schema) to be defined when the data is stored. Instead, the schema is defined when the stored data is analyzed. See DataLake.
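As a rough illustration of defining the schema only at analysis time, the following Python sketch applies two different schemas to the same stored records. It is a conceptual example, not MDS or Hive code; `read_with_schema`, the converter-based schema, and the sample records are assumptions made for illustration.

```python
def read_with_schema(records, schema):
    """Project schema-less records onto a schema chosen at read time.

    `schema` maps a field name to a converter function. Fields that a
    record does not contain are returned as None.
    """
    rows = []
    for record in records:
        row = {}
        for field, convert in schema.items():
            raw = record.get(field)
            row[field] = convert(raw) if raw is not None else None
        rows.append(row)
    return rows


# Hypothetical records stored without any schema.
stored = [
    {"name": "apple", "price": "110", "number": 5},
    {"name": "orange", "price": 80, "origin": "JP"},
]

# Two analyses can read the same data with different schemas.
price_rows = read_with_schema(stored, {"name": str, "price": int})
number_rows = read_with_schema(stored, {"name": str, "number": int})
print(price_rows)   # [{'name': 'apple', 'price': 110}, {'name': 'orange', 'price': 80}]
print(number_rows)  # [{'name': 'apple', 'number': 5}, {'name': 'orange', 'number': None}]
```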

How do I get started?

First, get the MDS-related repositories by following the section "How to get the source".

The MDS format can be used without a Hadoop environment. However, it is intended for big data, so to use it efficiently you need a Hadoop environment for storage and Hive for reading.

We plan to provide a Docker environment with Hadoop and Hive for testing, but for now you need to set up Hadoop and Hive yourself first.

Setup environment


The CLI is a command-line interface tool for using MDS. The following tools are provided. They need some jars, so build the jar files before using them.

$ mvn package

How to use


To prepare, get the MDS jars and store them in the proper directories.

$ bin/ # get MDS jars from Maven repository (bin/ -h for help)

Then put the MDS-related jars into HDFS.

$ cp -r jars/mds /tmp/mds_lib
$ hdfs dfs -put /tmp/mds_lib /mds_lib

Create MDS formatted file

Convert JSON data to the MDS format.

$ bin/ create -i src/example/src/main/resources/sample_json.txt -f json -o /tmp/sample.mds
$ bin/ cat -i /tmp/sample.mds -o '-' # show whole data
$ bin/ cat -i /tmp/sample.mds -o '-' -p '[ ["name"] ]' # show part of data
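The `-p` argument above is a JSON array of key paths. As a hedged sketch of what such a projection could mean, the Python below keeps only the values reachable through each path. This reflects our reading of the option, not the tool's documented behavior; the `project` function and the sample record (modeled on the query output shown later in this document) are hypothetical.

```python
import json


def project(record, paths):
    """Keep only the values reachable through the given key paths."""
    result = {}
    for path in paths:
        if not path:
            continue  # an empty path selects nothing
        value = record
        for key in path:
            if not isinstance(value, dict) or key not in value:
                break  # path is absent in this record; skip it
            value = value[key]
        else:
            # Rebuild the nesting that leads to the selected value.
            target = result
            for key in path[:-1]:
                target = target.setdefault(key, {})
            target[path[-1]] = value
    return result


record = {
    "summary": {"total_price": 550, "total_weight": 412},
    "number": 5,
    "price": 110,
    "name": "apple",
    "class": "fruits",
}

name_only = project(record, json.loads('[ ["name"] ]'))
nested = project(record, [["summary", "total_price"], ["price"]])
print(name_only)  # {'name': 'apple'}
print(nested)     # {'summary': {'total_price': 550}, 'price': 110}
```

In a columnar format, a projection like this means the unselected columns do not have to be read at all.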

Copy MDS file to HDFS environment

Copy the MDS file created above to HDFS.

$ hdfs dfs -mkdir -p /tmp/ss
$ hdfs dfs -put /tmp/sample.mds /tmp/ss/sample.mds

Read MDS file using Hive

Start Hive with the jar files needed to use the MDS format.

$ hive -i jars/mds/add_jar.hql
> create database test;
> use test;
> create external table sample_json (
    summary struct<total_price: bigint, total_weight: bigint>,
    number bigint,
    price bigint,
    name string,
    class string
  )
  location '/tmp/ss';
> select * from sample_json;
{"total_price":550,"total_weight":412}  5 110 apple fruits
{"total_price":800,"total_weight":600}  10  80  orange  fruits

See the Hive document for further details on usage.

Where can I get more help, if I need it?

Support and discussion of MDS will take place on the mailing list. Please refer to the section "How to contribute" below.

Until the mailing list opens, please contact us via GitHub.

How to contribute

We welcome anyone who would like to join this project.


See the MDS document.


This project is released under the Apache License. Please use it in accordance with this license.

Mailing list

User support and discussion of MDS development take place on the following mailing list. Please send a blank e-mail to the following address.

The archive is useful for reviewing what has been discussed in this project.

For developers

Please accept the Contributor License Agreement when participating as a developer.

When you ask on the above mailing list, we will invite you to JIRA, which we use for bug tracking.

System requirements

The following environments are required.

How to get the source

The MDS library builds its jar files from the following modules.


The MDS sources are available there.


Install gpg and create a gpg key for the Maven plugin before using git clone.

$ gpg --gen-key
$ gpg --list-keys

Add the following gpg setting to maven-local-repository-home/conf/settings.xml. Usually, maven-local-repository-home is $HOME/.m2.



The MDS sources can also be obtained from the Maven repository.




Compile sources

Compile each source by following the instructions below.


$ cd /local/mds/home
$ git clone
$ cd multiple-dimension-spread
$ mvn clean install


$ cd /local/mds/home
$ git clone
$ cd dataplatform-schema-lib
$ mvn clean install


$ cd /local/mds/home
$ git clone
$ cd dataplatform-config
$ mvn clean install

Next Reading


Change Logs