Python chainer.datasets() Examples

The following are 4 code examples of chainer.datasets().

Example #1

Source File: cifar.py From chainer with MIT License

def get_cifar10(withlabel=True, ndim=3, scale=1., dtype=None):
    """Gets the CIFAR-10 dataset.

    `CIFAR-10 <https://www.cs.toronto.edu/~kriz/cifar.html>`_ is a set of small
    natural images. Each example is an RGB color image of size 32x32,
    classified into 10 groups. In the original images, each component of pixels
    is represented by a one-byte unsigned integer. This function scales the
    components to floating point values in the interval ``[0, scale]``.

    This function returns the training set and the test set of the official
    CIFAR-10 dataset. If ``withlabel`` is ``True``, each dataset consists of
    tuples of images and labels, otherwise it only consists of images.

    Args:
        withlabel (bool): If ``True``, it returns datasets with labels. In this
            case, each example is a tuple of an image and a label. Otherwise,
            the datasets only contain images.
        ndim (int): Number of dimensions of each image. The shape of each image
            is determined depending on ndim as follows:

            - ``ndim == 1``: the shape is ``(3072,)``
            - ``ndim == 3``: the shape is ``(3, 32, 32)``

        scale (float): Pixel value scale. If it is 1 (default), pixels are
            scaled to the interval ``[0, 1]``.
        dtype: Data type of resulting image arrays. ``chainer.config.dtype`` is
            used by default (see :ref:`configuration`).

    Returns:
        A tuple of two datasets. If ``withlabel`` is ``True``, both datasets
        are :class:`~chainer.datasets.TupleDataset` instances. Otherwise, both
        datasets are arrays of images.

    """
    return _get_cifar('cifar-10', withlabel, ndim, scale, dtype)
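The docstring above describes two preprocessing steps: scaling one-byte pixel components to the interval [0, scale] and reshaping each image according to ndim. In practice you would simply call train, test = chainer.datasets.get_cifar10(). The sketch below only mimics the documented behavior with NumPy so it runs offline. The name preprocess_cifar_like is hypothetical, and the uint8 input layout of shape (N, 3072) (the layout of the CIFAR "python version" batches, with red, green and blue planes stored one after another) is an assumption. Labeled examples come back as a plain list of (image, label) tuples rather than a TupleDataset.

```python
import numpy as np


def preprocess_cifar_like(images, labels, withlabel=True, ndim=3,
                          scale=1.0, dtype=np.float32):
    """Hypothetical helper mimicking the documented CIFAR preprocessing.

    Assumes ``images`` is a uint8 array of shape (N, 3072) whose rows hold
    the red, green and blue planes of a 32x32 image one after another.
    """
    if ndim == 1:
        images = images.reshape(-1, 3072)
    elif ndim == 3:
        images = images.reshape(-1, 3, 32, 32)  # (N, channels, height, width)
    else:
        raise ValueError('ndim must be 1 or 3')
    # Scale each one-byte component from [0, 255] to [0, scale].
    images = images.astype(dtype) * scale / 255.0
    if not withlabel:
        return images
    labels = np.asarray(labels, dtype=np.int32)
    # Pair every image with its label, one tuple per example.
    return list(zip(images, labels))


# Usage: one all-white and one all-black fake image, labeled 3 and 7.
raw = np.zeros((2, 3072), dtype=np.uint8)
raw[0] = 255
train = preprocess_cifar_like(raw, [3, 7], scale=2.0)
image, label = train[0]
print(image.shape, float(image.max()), int(label))  # (3, 32, 32) 2.0 3
```

Multiplying by scale before dividing by 255 keeps the maximum value exactly equal to scale in float32 for the white image.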

Example #2

Source File: cifar.py From chainer with MIT License

def get_cifar100(withlabel=True, ndim=3, scale=1., dtype=None):
    """Gets the CIFAR-100 dataset.

    `CIFAR-100 <https://www.cs.toronto.edu/~kriz/cifar.html>`_ is a set of
    small natural images. Each example is an RGB color image of size 32x32,
    classified into 100 groups. In the original images, each component of
    pixels is represented by a one-byte unsigned integer. This function scales
    the components to floating point values in the interval ``[0, scale]``.

    This function returns the training set and the test set of the official
    CIFAR-100 dataset. If ``withlabel`` is ``True``, each dataset consists of
    tuples of images and labels, otherwise it only consists of images.

    Args:
        withlabel (bool): If ``True``, it returns datasets with labels. In this
            case, each example is a tuple of an image and a label. Otherwise,
            the datasets only contain images.
        ndim (int): Number of dimensions of each image. The shape of each image
            is determined depending on ndim as follows:

            - ``ndim == 1``: the shape is ``(3072,)``
            - ``ndim == 3``: the shape is ``(3, 32, 32)``

        scale (float): Pixel value scale. If it is 1 (default), pixels are
            scaled to the interval ``[0, 1]``.
        dtype: Data type of resulting image arrays. ``chainer.config.dtype`` is
            used by default (see :ref:`configuration`).

    Returns:
        A tuple of two datasets. If ``withlabel`` is ``True``, both datasets
        are :class:`~chainer.datasets.TupleDataset` instances. Otherwise, both
        datasets are arrays of images.

    """
    return _get_cifar('cifar-100', withlabel, ndim, scale, dtype)
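Both CIFAR loaders return TupleDataset instances when withlabel is True, which means that indexing a dataset yields an (image, label) tuple. The class below is a hypothetical, simplified stand-in written only to illustrate that pairing; it is not chainer.datasets.TupleDataset, whose real implementation may differ (for example in how it handles slices or validates its inputs).

```python
import numpy as np


class MiniTupleDataset:
    """Hypothetical stand-in for a tuple dataset.

    Zips equal-length arrays so that ``ds[i]`` returns
    ``(arrays[0][i], arrays[1][i], ...)``.
    """

    def __init__(self, *arrays):
        if not arrays:
            raise ValueError('at least one array is required')
        length = len(arrays[0])
        for array in arrays:
            if len(array) != length:
                raise ValueError('all arrays must have the same length')
        self._arrays = arrays
        self._length = length

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        # A slice returns a list of example tuples; an integer returns one.
        if isinstance(index, slice):
            return [self[i] for i in range(*index.indices(self._length))]
        return tuple(array[index] for array in self._arrays)


# Usage: four fake CIFAR-100 images; labels fall in the range 0..99.
images = np.zeros((4, 3, 32, 32), dtype=np.float32)
labels = np.array([0, 42, 99, 7], dtype=np.int32)
dataset = MiniTupleDataset(images, labels)
image, label = dataset[1]
print(len(dataset), image.shape, int(label))  # 4 (3, 32, 32) 42
```

With withlabel=False there is nothing to pair, which is why the loaders then return the image arrays directly.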

Example #3

Source File: scatter.py From chainer with MIT License

def scatter_index(n_total_samples, comm, root=0, *, force_equal_length=True):
    '''Scatters only indices to avoid a heavy dataset broadcast.

    This is the core functionality of ``scatter_dataset``, which is
    almost equivalent to the following code snippet::

        (b, e) = scatter_index(len(dataset), comm)
        order = None
        if shuffle:
            order = numpy.random.RandomState(seed).permutation(
                len(dataset))
            order = comm.bcast_obj(order)
        dataset = SubDataset(dataset, b, e, order)

    .. note::
        Make sure the ``force_equal_length`` flag is *not* off for
        multinode evaluators or multinode updaters, which assume that
        the iterator has the same length among processes to work
        correctly.

    Args:
        n_total_samples (int): total number of samples to scatter
        comm: ChainerMN communicator object
        root (int): root rank to coordinate the operation
        force_equal_length (bool):
            Force the scattered fragments of the index to have equal
            length. If ``True``, the number of scattered indices is
            guaranteed to be equal among processes, and scattered
            datasets may have duplication among processes. Otherwise,
            the number of scattered indices may not be equal among
            processes, but scattered indices are guaranteed to have
            no duplication among processes; this mode is intended for
            strict evaluation of a test dataset without duplicated
            examples.

    Returns:
        A tuple of two integers that stand for the beginning and ending
        offsets of the assigned subpart of samples. The ending offset
        is exclusive.

    '''
    if comm.rank == root:
        for (i, b, e) in _scatter_index(n_total_samples, comm.size,
                                        force_equal_length):
            if i == root:
                mine = (b, e)
            else:
                comm.send_obj((b, e), dest=i)
        return mine
    else:
        return comm.recv_obj(source=root)
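The docstring says scatter_index hands each process only a (begin, end) pair, and that force_equal_length trades duplicated samples for equal lengths. ChainerMN's private _scatter_index helper is not shown on this page, so the following is one plausible splitting scheme written from that description, running in a single process without any communicator. The names split_offsets and local_indices are hypothetical. The scheme satisfies both documented guarantees: with force_equal_length=True every range has length ceil(n / size) and neighbouring ranges may overlap; with False the ranges tile [0, n) exactly with no overlap.

```python
import numpy as np


def split_offsets(n_total_samples, size, force_equal_length=True):
    """Yield (rank, begin, end) for each of ``size`` processes.

    Hypothetical illustration, not ChainerMN's ``_scatter_index``.
    ``end`` is exclusive and never exceeds ``n_total_samples``.
    """
    chunk = -(-n_total_samples // size)  # ceil(n_total_samples / size)
    for rank in range(size):
        begin = n_total_samples * rank // size
        if force_equal_length:
            # Every rank gets `chunk` indices; neighbours may overlap.
            end = begin + chunk
        else:
            # Ranges tile [0, n_total_samples) exactly, with no overlap.
            end = n_total_samples * (rank + 1) // size
        yield rank, begin, end


def local_indices(n_total_samples, size, rank, force_equal_length=True,
                  seed=None):
    """Sample indices one rank would read (hypothetical helper)."""
    _, begin, end = list(split_offsets(
        n_total_samples, size, force_equal_length))[rank]
    order = np.arange(n_total_samples)
    if seed is not None:
        # All ranks must agree on this permutation; the docstring snippet
        # achieves that by broadcasting it with comm.bcast_obj.
        order = np.random.RandomState(seed).permutation(n_total_samples)
    return order[begin:end].tolist()


# Usage: 10 samples over 4 processes.
print(list(split_offsets(10, 4)))
# [(0, 0, 3), (1, 2, 5), (2, 5, 8), (3, 7, 10)]
print(list(split_offsets(10, 4, force_equal_length=False)))
# [(0, 0, 2), (1, 2, 5), (2, 5, 7), (3, 7, 10)]
```

In the equal-length output, indices 2 and 7 are each read by two ranks; in the strict output every index is read exactly once, but ranks receive 2 or 3 samples.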

Example #4

Source File: svhn.py From chainer with MIT License

def get_svhn(withlabel=True, scale=1., dtype=None, label_dtype=numpy.int32,
             add_extra=False):
    """Gets the SVHN dataset.

    `The Street View House Numbers (SVHN) dataset
    <http://ufldl.stanford.edu/housenumbers/>`_
    is a dataset similar to MNIST but composed of cropped images of house
    numbers.
    The functionality of this function is identical to the counterpart for the
    MNIST dataset (:func:`~chainer.datasets.get_mnist`),
    with the exception that there is no ``ndim`` argument.

    .. note::
       `SciPy <https://www.scipy.org/>`_ is required to use this feature.

    Args:
        withlabel (bool): If ``True``, it returns datasets with labels. In this
            case, each example is a tuple of an image and a label. Otherwise,
            the datasets only contain images.
        scale (float): Pixel value scale. If it is 1 (default), pixels are
            scaled to the interval ``[0, 1]``.
        dtype: Data type of resulting image arrays. ``chainer.config.dtype`` is
            used by default (see :ref:`configuration`).
        label_dtype: Data type of the labels.
        add_extra (bool): If ``True``, the extra training set is also
            returned.

    Returns:
        If ``add_extra`` is ``False``, a tuple of two datasets (train and
        test). Otherwise, a tuple of three datasets (train, test, and extra).
        If ``withlabel`` is ``True``, all datasets are
        :class:`~chainer.datasets.TupleDataset` instances. Otherwise, all
        datasets are arrays of images.

    """
    if not _scipy_available:
        raise RuntimeError('SciPy is not available: %s' % _error)

    train_raw = _retrieve_svhn_training()
    dtype = chainer.get_dtype(dtype)

    train = _preprocess_svhn(train_raw, withlabel, scale, dtype,
                             label_dtype)
    test_raw = _retrieve_svhn_test()
    test = _preprocess_svhn(test_raw, withlabel, scale, dtype,
                            label_dtype)
    if add_extra:
        extra_raw = _retrieve_svhn_extra()
        extra = _preprocess_svhn(extra_raw, withlabel, scale, dtype,
                                 label_dtype)
        return train, test, extra
    else:
        return train, test
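get_svhn delegates the actual conversion to a private _preprocess_svhn helper that this page does not show. As a sketch of what such a step has to do, the function below assumes the SVHN "Format 2" .mat files as loaded by scipy.io.loadmat, where images are stored with shape (32, 32, 3, N) and labels as an (N, 1) array in which the digit 0 carries the label 10 (as documented on the SVHN website). The name preprocess_svhn_like is hypothetical, and whether Chainer remaps label 10 in the same way is not confirmed here.

```python
import numpy as np


def preprocess_svhn_like(x, y, withlabel=True, scale=1.0, dtype=np.float32,
                         label_dtype=np.int32):
    """Hypothetical helper, not Chainer's private ``_preprocess_svhn``.

    Assumes ``x`` has shape (32, 32, 3, N) and ``y`` has shape (N, 1)
    with labels 1..10, where 10 stands for the digit 0.
    """
    # Reorder (height, width, channel, N) -> (N, channel, height, width)
    # and scale one-byte components from [0, 255] to [0, scale].
    images = x.transpose(3, 2, 0, 1).astype(dtype) * scale / 255.0
    labels = y.reshape(-1).astype(label_dtype)
    labels[labels == 10] = 0  # map the digit 0 back to label 0
    if withlabel:
        return list(zip(images, labels))
    return images


# Usage: three random fake images whose labels use 10 for the digit 0.
rng = np.random.RandomState(0)
x = rng.randint(0, 256, size=(32, 32, 3, 3)).astype(np.uint8)
y = np.array([[1], [10], [9]], dtype=np.uint8)
train = preprocess_svhn_like(x, y)
print(len(train), train[1][0].shape, int(train[1][1]))  # 3 (3, 32, 32) 0
```

With add_extra=True, get_svhn applies the same preprocessing a third time to the extra split and returns it as the third element of the tuple.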