Spark DateTime Library

A library for exposing dateTime functions from the joda time library as SQL functions. Also provide a dsl for dateTime catylst expressions; this utilizes the [scala wrapper library](https://github.com/nscala-time/nscala-time nscala).

Build Status

Requirements

This library requires Spark 1.4+

Linking

You can link against this library in your program at the following coordiates:

groupId: org.sparklinedata
artifactId: spark-datetime_2.10
version: 0.0.1

Using with Spark shell

This package can be added to Spark using the --jars command line option. For example, to include it when starting the spark shell:

$ bin/spark-shell --packages org.sparklinedata:spark-datetime_2.10:0.0.1

Features

Function naming convention

SQL API

Assume you have a table input with a string column called dt

select dt, dateTime(dt), dayOfWeek(dateTime(dt)), dayOfWeekName(dateTime(dt)), dayOfWeekName(dateTimeWithTZ(dt)) 
from input

Date Expressions using the DSL

A basic example

import com.github.nscala_time.time.Imports._
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.sparklinedata.spark.dateTime.dsl.expressions._
import org.sparklinedata.spark.dateTime.Functions

// register all functions 
Functions.register(sqlContext)

val dT = dateTime('dt)
val dOW = dateTime('dt) dayOfWeek
val dOWNm = dateTime('dt) dayOfWeekName
val dOWNm2 = dateTimeWithTZ('dt) dayOfWeekName
val dTFixed = dateTime("2015-05-22T08:52:41.903-07:00")

val t = sql(date"select dt, $dT, $dOW, $dOWNm, $dOWNm2, $dTFixed," +
      " dateTime(\"2015-05-22T08:52:41.903-07:00\") from input")

An example about periods

import com.github.nscala_time.time.Imports._
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.sparklinedata.spark.dateTime.dsl.expressions._

val dT = dateTime('dt)
val dT1 = dateTime('dt) + 3.months
val dT2 = dateTime('dt) - 3.months
val dT3 = dateTime('dt) + 12.week
val cE = dateTime('dt) + 3.months > (dateTime('dt) + 12.week)

val t = sql(date"select dt, $dT, $dT1, $dT2, $dT3, $cE from input")

Weekend filter example

val filter : Expression = ((dateTime('dt) dayOfWeekName) === "Saturday") ||
      ((dateTime('dt) dayOfWeekName) === "Sunday")

val t = sql(date"select dt from input where $filter")

Group By example

val dayOfWeek: Expression = dateTime('dt) dayOfWeekName

val t = sql(date"select $dayOfWeek, count(*) from input group by $dayOfWeek")

Interval example

val i1 = END_DATE - 15.day to END_DATE - 10.day

val isBefore = i1 isBeforeE dateTime('dt)
val isAfter = i1 isAfterE dateTime('dt)
val i2 = dateTime('dt) to (dateTime('dt) + 5.days)
val overlapsE = i1 overlapsE i2
val abutsE = i1 abutsE i2

val t = sql(date"select dt, $isBefore, $isAfter, $overlapsE, $abutsE from input")

Time Bucketing

Use this feature to bucket dates into given Periods. For e.g. 8.hours, 30.mins, 2.days etc.

The following example buckets rows into 3 day periods. The bucket function on a DateExpression takes an origin date and a Period specification. The Period is an iso8061 specification for period.

val start = dateTime("2015-06-23T17:27:43.769-07:00")
val dT = dateTime('dt)
val timeBucket = dateTime('dt) bucket(start, 3.days)

val t = sql(date"select dt, $dT, $timeBucket from input")

Or the direct sql for the above query is:

select dt, dateTime(`dt`), 
      timeBucket(dateTime(`dt`),dateTime("2015-05-24T17:27:43.769-07:00"),period("P3D")) 
from input

Building From Source

This library is built with SBT, which is automatically downloaded by the included shell script. To build a JAR file simply run build/sbt package from the project root. The build configuration includes support for both Scala 2.10 and 2.11.