fletcher

A library that provides a generic set of Pandas ExtensionDtype/ExtensionArray implementations backed by Apache Arrow. They support a wider range of types than Pandas natively supports and bring a different set of constraints and behaviours that are beneficial in many situations.

Usage

To use fletcher in Pandas DataFrames, wrap your data in a FletcherChunkedArray or FletcherContinuousArray object. The data can be a pyarrow.Array, a pyarrow.ChunkedArray, or any object that can be passed to pyarrow.array(…).

import fletcher as fr
import pandas as pd

df = pd.DataFrame({
    'str_chunked': fr.FletcherChunkedArray(['a', 'b', 'c']),
    'str_continuous': fr.FletcherContinuousArray(['a', 'b', 'c']),
})

df.info()

# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 3 entries, 0 to 2
# Data columns (total 2 columns):
#  #   Column          Non-Null Count  Dtype                      
# ---  ------          --------------  -----                      
#  0   str_chunked     3 non-null      fletcher_chunked[string]   
#  1   str_continuous  3 non-null      fletcher_continuous[string]
# dtypes: fletcher_chunked[string](1), fletcher_continuous[string](1)
# memory usage: 166.0 bytes

Development

While you can use fletcher in pip-based environments, we strongly recommend using a conda-based development setup with packages from conda-forge.

# Create the conda environment with all necessary dependencies
conda create -y -q -n fletcher python=3.6 \
    pre-commit \
    asv \
    numba \
    pandas \
    pip \
    pyarrow \
    pytest \
    pytest-cov \
    six \
    -c conda-forge

# Activate the newly created environment
source activate fletcher

# Install fletcher into the current environment
pip install -e .

# Run the unit tests (you should do this several times during development)
pytest

# Install pre-commit hooks
# These then run automatically on every commit and ensure that files are
# black-formatted, free of flake8 issues, and type-checked by mypy.
pre-commit install

Code formatting is done using black. This keeps the code base consistently styled, and the formatting can be applied automatically by running black on the current directory (black .). Note that we have pinned the version of black to ensure that the formatting is reproducible.

Benchmarks

In benchmarks/ we provide a set of benchmarks that compare the performance of fletcher against pandas and ensure that fletcher itself stays performant. The benchmarks are written using airspeed velocity (asv). While developing benchmarks, you can run each of them just once using asv dev (use -b <pattern> to run only a selection). To get real benchmark values, use asv run --python=same, which runs the benchmarks multiple times and yields meaningful average runtimes.
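
For orientation, an airspeed velocity benchmark is a plain Python class: asv calls setup() before timing and times every method whose name starts with time_. The sketch below is hypothetical and is not taken from benchmarks/; the class name, method name, and sample data are illustrative assumptions, and it measures only a plain pandas Series as a baseline.

```python
import pandas as pd


class TimeStringLength:
    """Hypothetical asv benchmark; names and data are illustrative only."""

    def setup(self):
        # Build the input here so that construction is not part of the timing.
        self.series = pd.Series(["a", "bb", "ccc"] * 1000)

    def time_str_len(self):
        # asv measures how long this method body takes to run.
        self.series.str.len()
```

A benchmark like this could be selected by name with the -b option mentioned above, for example -b TimeStringLength.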