Java Code Examples for org.apache.arrow.vector.types.Types#getMinorTypeForArrowType()

The following examples show how to use org.apache.arrow.vector.types.Types#getMinorTypeForArrowType().
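Every example below follows the same pattern: call `getMinorTypeForArrowType()` on a field's `ArrowType`, then `switch` on the resulting `Types.MinorType`. Conceptually, the call collapses a parameterized type (an integer with a bit width, a decimal with precision and scale) into one flat enum constant. The stdlib-only sketch below models that idea. `MinorTypeModel`, `Kind`, `SimpleArrowType` and `minorTypeFor` are hypothetical stand-ins, not Arrow classes, and only a few cases are modeled (integers by bit width, floats, decimals, UTF-8 strings).

```java
public class MinorTypeSketch {
    // Hypothetical stand-in for a few Types.MinorType values; not the Arrow enum.
    enum MinorTypeModel { TINYINT, SMALLINT, INT, BIGINT, FLOAT4, FLOAT8, DECIMAL, VARCHAR }

    // Hypothetical kinds of parameterized type; not Arrow's ArrowType hierarchy.
    enum Kind { INT, FLOAT, DECIMAL, UTF8 }

    static final class SimpleArrowType {
        final Kind kind;
        final int p1; // bit width for INT/FLOAT, precision for DECIMAL
        final int p2; // scale for DECIMAL, unused otherwise

        SimpleArrowType(Kind kind, int p1, int p2) {
            this.kind = kind;
            this.p1 = p1;
            this.p2 = p2;
        }
    }

    // Collapse a parameterized type into a coarse minor type that is easy to switch on.
    static MinorTypeModel minorTypeFor(SimpleArrowType t) {
        switch (t.kind) {
            case INT:
                // The bit width selects the minor type (signedness is not modeled here).
                switch (t.p1) {
                    case 8: return MinorTypeModel.TINYINT;
                    case 16: return MinorTypeModel.SMALLINT;
                    case 32: return MinorTypeModel.INT;
                    case 64: return MinorTypeModel.BIGINT;
                    default: throw new IllegalArgumentException("Unsupported int width " + t.p1);
                }
            case FLOAT:
                if (t.p1 == 32) {
                    return MinorTypeModel.FLOAT4;
                }
                if (t.p1 == 64) {
                    return MinorTypeModel.FLOAT8;
                }
                throw new IllegalArgumentException("Unsupported float width " + t.p1);
            case DECIMAL:
                // Precision and scale are dropped: every decimal maps to the same minor type.
                return MinorTypeModel.DECIMAL;
            case UTF8:
                return MinorTypeModel.VARCHAR;
            default:
                throw new IllegalArgumentException("Unsupported kind " + t.kind);
        }
    }

    public static void main(String[] args) {
        SimpleArrowType wide = new SimpleArrowType(Kind.DECIMAL, 38, 10);
        SimpleArrowType narrow = new SimpleArrowType(Kind.DECIMAL, 10, 2);
        System.out.println(minorTypeFor(wide) == minorTypeFor(narrow));
    }
}
```

Because decimal parameters are discarded, two decimals with different precision are equal at the minor-type level. The comment in Example 11 (`EquatableValueSet.equals`) relies on exactly this behavior.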
Example 1
Source File: SchemaUtils.java    From aws-athena-query-federation with Apache License 2.0
/**
 * Used to merge two LIST Fields into a single Field. If called with two identical LISTs, the output is essentially
 * the same as either of the inputs.
 *
 * @param fieldName The name of the merged Field.
 * @param curParentField The current field to use as the base for the merge.
 * @param newParentField The new field to merge into the base.
 * @return The merged field.
 */
private static Field mergeListField(String fieldName, Field curParentField, Field newParentField)
{
    //Apache Arrow lists have a special child that holds the concrete type of the list.
    Field curInnerField = curParentField.getChildren().get(0);
    Field newInnerField = newParentField.getChildren().get(0);
    Types.MinorType curInnerType = Types.getMinorTypeForArrowType(curInnerField.getType());
    Types.MinorType newInnerType = Types.getMinorTypeForArrowType(newInnerField.getType());
    if (newInnerType == Types.MinorType.STRUCT && curInnerType == Types.MinorType.STRUCT) {
        //Lists of structs: merge the element structs so the list carries the union of their fields.
        return FieldBuilder.newBuilder(fieldName, Types.MinorType.LIST.getType())
                .addField(mergeStructField("", curInnerField, newInnerField)).build();
    }
    else if (curInnerType != newInnerType) {
        //TODO: currently we resolve fields with mixed types by defaulting to VARCHAR. This is _not_ ideal
        logger.warn("mergeListField: Encountered a mixed-type list field[{}] {} vs {}, defaulting to String.",
                fieldName, curInnerType, newInnerType);
        return FieldBuilder.newBuilder(fieldName, Types.MinorType.LIST.getType()).addStringField("").build();
    }

    return curParentField;
}
 
Example 2
Source File: HbaseRecordHandler.java    From aws-athena-query-federation with Apache License 2.0
/**
 * Adds the specified Apache Arrow field to the Scan to satisfy the requested projection.
 *
 * @param scan The scan object that will be used to read data from HBase.
 * @param field The field to be added to the scan.
 */
private void addToProjection(Scan scan, Field field)
{
    //ignore the special 'row' column since we get that by default.
    if (HbaseSchemaUtils.ROW_COLUMN_NAME.equalsIgnoreCase(field.getName())) {
        return;
    }

    Types.MinorType columnType = Types.getMinorTypeForArrowType(field.getType());
    switch (columnType) {
        case STRUCT:
            for (Field child : field.getChildren()) {
                scan.addColumn(field.getName().getBytes(UTF_8), child.getName().getBytes(UTF_8));
            }
            return;
        default:
            String[] nameParts = HbaseSchemaUtils.extractColumnParts(field.getName());
            if (nameParts.length != 2) {
                throw new RuntimeException("Column name " + field.getName() + " does not meet family:column hbase convention.");
            }
            scan.addColumn(nameParts[0].getBytes(UTF_8), nameParts[1].getBytes(UTF_8));
    }
}
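The `default` branch assumes the HBase `family:column` naming convention for non-struct columns. Here is a minimal sketch of that split. It assumes `:` is the separator (as the error message implies) and that only the first `:` is significant. `splitFamilyQualifier` is a hypothetical helper, not the project's `HbaseSchemaUtils.extractColumnParts`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class HbaseColumnNameSketch {
    // Hypothetical helper: split "family:qualifier" into its two parts.
    // Splits on the first ':' only, so a qualifier may itself contain ':'.
    static String[] splitFamilyQualifier(String columnName) {
        String[] parts = columnName.split(":", 2);
        if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
            throw new IllegalArgumentException(
                    "Column name " + columnName + " does not meet family:column hbase convention.");
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] parts = splitFamilyQualifier("info:age");
        // These byte arrays are what a Scan.addColumn(family, qualifier) call would receive.
        byte[] family = parts[0].getBytes(StandardCharsets.UTF_8);
        byte[] qualifier = parts[1].getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(parts) + " " + family.length + " " + qualifier.length);
    }
}
```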
 
Example 3
Source File: DDBRecordMetadata.java    From aws-athena-query-federation with Apache License 2.0
/**
 * Determines whether the schema contains any type whose values may need to be coerced.
 * @param schema Schema to extract out the info from
 * @return boolean indicating existence of coercible type in schema
 */
private boolean isContainsCoercibleType(Schema schema)
{
    if (schema != null && schema.getFields() != null) {
        for (Field field : schema.getFields()) {
            Types.MinorType fieldType = Types.getMinorTypeForArrowType(field.getType());
            if (isDateTimeFieldType(fieldType) || !fieldType.equals(Types.MinorType.DECIMAL)) {
                return true;
            }
        }
    }
    return false;
}
 
Example 4
Source File: ValueConverter.java    From aws-athena-query-federation with Apache License 2.0
/**
 * Allows for coercing types in the event that schema has evolved or there were other data issues.
 * @param field The Apache Arrow field that the value belongs to.
 * @param origVal The original value from Redis (before any conversion or coercion).
 * @return The coerced value.
 */
public static Object convert(Field field, String origVal)
{
    if (origVal == null) {
        return origVal;
    }

    ArrowType arrowType = field.getType();
    Types.MinorType minorType = Types.getMinorTypeForArrowType(arrowType);

    switch (minorType) {
        case VARCHAR:
            return origVal;
        case INT:
        case SMALLINT:
        case TINYINT:
            return Integer.valueOf(origVal);
        case BIGINT:
            return Long.valueOf(origVal);
        case FLOAT8:
            return Double.valueOf(origVal);
        case FLOAT4:
            return Float.valueOf(origVal);
        case BIT:
            return Boolean.valueOf(origVal);
        case VARBINARY:
            try {
                return origVal.getBytes("UTF-8");
            }
            catch (UnsupportedEncodingException ex) {
                throw new RuntimeException(ex);
            }
        default:
            throw new RuntimeException("Unsupported type conversion " + minorType + " field: " + field.getName());
    }
}
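The coercion above can be exercised without Redis or Arrow. In this sketch the hypothetical `Minor` enum replaces `Types.MinorType`. The narrow integer types still produce `Integer`, as in the example. `StandardCharsets.UTF_8` is used instead of the `"UTF-8"` charset name, so there is no checked `UnsupportedEncodingException` to handle.

```java
import java.nio.charset.StandardCharsets;

public class StringCoercionSketch {
    // Hypothetical subset of minor types; not Arrow's Types.MinorType.
    enum Minor { VARCHAR, INT, SMALLINT, TINYINT, BIGINT, FLOAT8, FLOAT4, BIT, VARBINARY, DECIMAL }

    // Convert a raw string into the Java object expected for the given type.
    static Object convert(Minor type, String origVal) {
        if (origVal == null) {
            return null;
        }
        switch (type) {
            case VARCHAR:
                return origVal;
            case INT:
            case SMALLINT:
            case TINYINT:
                // Mirrors the example above: narrow integer types are also parsed as Integer.
                return Integer.valueOf(origVal);
            case BIGINT:
                return Long.valueOf(origVal);
            case FLOAT8:
                return Double.valueOf(origVal);
            case FLOAT4:
                return Float.valueOf(origVal);
            case BIT:
                return Boolean.valueOf(origVal);
            case VARBINARY:
                // StandardCharsets avoids the checked UnsupportedEncodingException.
                return origVal.getBytes(StandardCharsets.UTF_8);
            default:
                throw new IllegalArgumentException("Unsupported type conversion " + type);
        }
    }

    public static void main(String[] args) {
        System.out.println(convert(Minor.BIGINT, "42").getClass().getSimpleName());
    }
}
```

`Boolean.valueOf(String)` returns `false` for anything other than a case-insensitive `"true"`. Malformed boolean values are therefore silently coerced to `false` rather than rejected, whereas the numeric branches throw `NumberFormatException`.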
 
Example 5
Source File: ElasticsearchFieldResolver.java    From aws-athena-query-federation with Apache License 2.0
/**
 * Allows for coercion of a list of values where the returned types do not match the schema.
 * Multiple fields in Elasticsearch can be returned as a string, numeric (Integer, Long, Double), or null.
 * @param field is the field that we are coercing the value into.
 * @param fieldValue is the list of values to coerce.
 * @return the coerced list of values.
 * @throws RuntimeException if the fieldType is not a LIST or the fieldValue is instanceof Map (STRUCT).
 */
protected Object coerceListField(Field field, Object fieldValue)
        throws RuntimeException
{
    Types.MinorType fieldType = Types.getMinorTypeForArrowType(field.getType());

    switch (fieldType) {
        case LIST:
            Field childField = field.getChildren().get(0);
            if (fieldValue instanceof List) {
                // Both fieldType and fieldValue are lists => Return as a new list of values, applying coercion
                // where necessary in order to match the type of the field being mapped into.
                List<Object> coercedValues = new ArrayList<>();
                ((List) fieldValue).forEach(value ->
                        coercedValues.add(coerceField(childField, value)));
                return coercedValues;
            }
            else if (!(fieldValue instanceof Map)) {
                // This is an abnormal case where the fieldType was defined as a list in the schema,
                // however, the fieldValue returns as a single value => Return as a list of a single value
                // applying coercion where necessary in order to match the type of the field being mapped into.
                return Collections.singletonList(coerceField(childField, fieldValue));
            }
            break;
        default:
            break;
    }

    throw new RuntimeException("Invalid field value encountered in Document for field: " + field +
            ", value: " + fieldValue);
}
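`coerceListField` handles three cases: a real list, a lone scalar that should become a one-element list, and a nested map that is rejected. All three can be sketched with plain collections. `coerceList` and its `elementCoercer` parameter are hypothetical. The coercer plays the role of `coerceField(childField, value)`; here it widens Elasticsearch-style numerics (Integer, Long, Double) to `Long`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class ListCoercionSketch {
    // Coerce a raw value into a list, applying elementCoercer to each element.
    static List<Object> coerceList(Object fieldValue, Function<Object, Object> elementCoercer) {
        if (fieldValue instanceof List) {
            // A real list: coerce each element so it matches the list's child type.
            List<Object> coerced = new ArrayList<>();
            for (Object value : (List<?>) fieldValue) {
                coerced.add(elementCoercer.apply(value));
            }
            return coerced;
        }
        if (!(fieldValue instanceof Map)) {
            // A lone scalar where a list was expected: wrap it in a one-element list.
            return Collections.singletonList(elementCoercer.apply(fieldValue));
        }
        // A nested document (Map) cannot be turned into a list of scalars.
        throw new IllegalArgumentException("Invalid list value: " + fieldValue);
    }

    public static void main(String[] args) {
        Function<Object, Object> toLong = v -> ((Number) v).longValue();
        System.out.println(coerceList(Arrays.asList(1, 2L, 3.0), toLong));
        System.out.println(coerceList(7, toLong));
    }
}
```

As in the example, `null` is not a `Map`, so it is handed to the coercer and wrapped. The coercer must decide how to treat it.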
 
Example 6
Source File: ElasticsearchTypeUtils.java    From aws-athena-query-federation with Apache License 2.0
/**
 * Create the appropriate field extractor used for extracting field values from a Document based on the field type.
 * @param field is used to determine which extractor to generate based on the field type.
 * @return a field extractor.
 */
protected Extractor makeExtractor(Field field)
{
    Types.MinorType fieldType = Types.getMinorTypeForArrowType(field.getType());

    switch (fieldType) {
        case VARCHAR:
            return makeVarCharExtractor(field);
        case BIGINT:
            return makeBigIntExtractor(field);
        case INT:
            return makeIntExtractor(field);
        case SMALLINT:
            return makeSmallIntExtractor(field);
        case TINYINT:
            return makeTinyIntExtractor(field);
        case FLOAT8:
            return makeFloat8Extractor(field);
        case FLOAT4:
            return makeFloat4Extractor(field);
        case DATEMILLI:
            return makeDateMilliExtractor(field);
        case BIT:
            return makeBitExtractor(field);
        default:
            return null;
    }
}
 
Example 7
Source File: DocDBFieldResolver.java    From aws-athena-query-federation with Apache License 2.0
@Override
public Object getFieldValue(Field field, Object value)
{
    Types.MinorType minorType = Types.getMinorTypeForArrowType(field.getType());
    if (minorType == Types.MinorType.LIST) {
        return TypeUtils.coerce(field, ((Document) value).get(field.getName()));
    }
    else if (value instanceof Document) {
        Object rawVal = ((Document) value).get(field.getName());
        return TypeUtils.coerce(field, rawVal);
    }
    throw new RuntimeException("Expected LIST or Document type but found " + minorType);
}
 
Example 8
Source File: SchemaUtils.java    From aws-athena-query-federation with Apache License 2.0
/**
 * Used to merge two STRUCT Fields into a single Field. If called with two identical STRUCTs, the output is essentially
 * the same as either of the inputs.
 *
 * @param fieldName The name of the merged Field.
 * @param curParentField The current field to use as the base for the merge.
 * @param newParentField The new field to merge into the base.
 * @return The merged field.
 */
private static Field mergeStructField(String fieldName, Field curParentField, Field newParentField)
{
    FieldBuilder union = FieldBuilder.newBuilder(fieldName, Types.MinorType.STRUCT.getType());
    for (Field nextCur : curParentField.getChildren()) {
        union.addField(nextCur);
    }

    for (Field nextNew : newParentField.getChildren()) {
        Field curField = union.getChild(nextNew.getName());
        if (curField == null) {
            union.addField(nextNew);
            continue;
        }

        Types.MinorType newType = Types.getMinorTypeForArrowType(nextNew.getType());
        Types.MinorType curType = Types.getMinorTypeForArrowType(curField.getType());

        if (curType != newType) {
            //TODO: currently we resolve fields with mixed types by defaulting to VARCHAR. This is _not_ ideal
            //for various reasons but also because it will cause predicate oddities if used in a filter.
            logger.warn("mergeStructField: Encountered a mixed-type field[{}] {} vs {}, defaulting to String.",
                    nextNew.getName(), newType, curType);

            union.addStringField(nextNew.getName());
        }
        else if (curType == Types.MinorType.LIST) {
            union.addField(mergeListField(nextNew.getName(), curField, nextNew));
        }
        else if (curType == Types.MinorType.STRUCT) {
            union.addField(mergeStructField(nextNew.getName(), curField, nextNew));
        }
    }

    return union.build();
}
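The merge rule in `mergeStructField` has three parts. Children are unioned by name. A child that appears with two different types falls back to VARCHAR. Nested structs are merged recursively. This can be modeled on a small tree. `FieldModel`, `with` and `mergeStruct` are hypothetical; types are plain strings standing in for `Types.MinorType`, and LIST children are not modeled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StructMergeSketch {
    // Hypothetical field model: a type name plus named children (non-empty only for STRUCT).
    static final class FieldModel {
        final String type;
        final Map<String, FieldModel> children = new LinkedHashMap<>();

        FieldModel(String type) {
            this.type = type;
        }

        FieldModel with(String name, FieldModel child) {
            children.put(name, child);
            return this;
        }
    }

    // Union the children of two STRUCT fields by name without mutating either input.
    static FieldModel mergeStruct(FieldModel cur, FieldModel next) {
        FieldModel union = new FieldModel("STRUCT");
        union.children.putAll(cur.children);
        for (Map.Entry<String, FieldModel> e : next.children.entrySet()) {
            FieldModel existing = union.children.get(e.getKey());
            FieldModel incoming = e.getValue();
            if (existing == null) {
                // Child only present in the new field: add it as-is.
                union.children.put(e.getKey(), incoming);
            }
            else if (!existing.type.equals(incoming.type)) {
                // Mixed types for the same child: fall back to VARCHAR.
                union.children.put(e.getKey(), new FieldModel("VARCHAR"));
            }
            else if (existing.type.equals("STRUCT")) {
                // Both are structs: merge their children recursively.
                union.children.put(e.getKey(), mergeStruct(existing, incoming));
            }
            // Same scalar type: keep the existing child.
        }
        return union;
    }

    public static void main(String[] args) {
        FieldModel a = new FieldModel("STRUCT").with("id", new FieldModel("INT")).with("tag", new FieldModel("INT"));
        FieldModel b = new FieldModel("STRUCT").with("tag", new FieldModel("VARCHAR")).with("score", new FieldModel("FLOAT8"));
        FieldModel merged = mergeStruct(a, b);
        System.out.println(merged.children.keySet() + " tag=" + merged.children.get("tag").type);
    }
}
```

`LinkedHashMap` keeps children in the order in which they were first seen. Replacing a mixed-type child does not move it.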
 
Example 9
Source File: FieldResolver.java    From aws-athena-query-federation with Apache License 2.0
public Object getFieldValue(Field field, Object value)
{
    Types.MinorType minorType = Types.getMinorTypeForArrowType(field.getType());
    if (value instanceof Map) {
        return ((Map<String, Object>) value).get(field.getName());
    }
    else if (minorType == Types.MinorType.LIST) {
        return ((List) value).iterator();
    }
    throw new RuntimeException("Expected LIST type but found " + minorType);
}
 
Example 10
Source File: DDBRecordMetadata.java    From aws-athena-query-federation with Apache License 2.0
private Set<String> getNonComparableColumns(Schema schema)
{
    Set<String> nonComparableColumns = new HashSet<>();
    if (schema != null && schema.getFields() != null) {
        for (Field field : schema.getFields()) {
            Types.MinorType fieldType = Types.getMinorTypeForArrowType(field.getType());
            if (DefaultGlueType.getNonComparableSet().contains(fieldType.name())) {
                nonComparableColumns.add(field.getName());
            }
        }
    }
    return nonComparableColumns;
}
 
Example 11
Source File: EquatableValueSet.java    From aws-athena-query-federation with Apache License 2.0
@Override
public boolean equals(Object obj)
{
    if (this == obj) {
        return true;
    }
    if (obj == null || getClass() != obj.getClass()) {
        return false;
    }
    final EquatableValueSet other = (EquatableValueSet) obj;

    if (this.getType() != null && other.getType() != null && Types.getMinorTypeForArrowType(this.getType()) == Types.getMinorTypeForArrowType(other.getType())) {
        //some arrow types require checking the minor type only, like Decimal.
        //We ignore any params though we may want to reconsider that in the future
    }
    else if (this.getType() != other.getType()) {
        return false;
    }

    if (this.whiteList != other.whiteList) {
        return false;
    }

    if (this.nullAllowed != other.nullAllowed) {
        return false;
    }

    if (this.valueBlock == null && other.valueBlock != null) {
        return false;
    }

    if (this.valueBlock != null && !this.valueBlock.equalsAsSet(other.valueBlock)) {
        return false;
    }

    return true;
}
 
Example 12
Source File: UserDefinedFunctionHandler.java    From aws-athena-query-federation with Apache License 2.0
private Class[] extractJavaTypes(Schema schema)
{
    Class[] types = new Class[schema.getFields().size()];

    List<Field> fields = schema.getFields();
    for (int i = 0; i < fields.size(); ++i) {
        Types.MinorType minorType = Types.getMinorTypeForArrowType(fields.get(i).getType());
        types[i] = BlockUtils.getJavaType(minorType);
    }

    return types;
}
 
Example 13
Source File: ListArrowValueProjector.java    From aws-athena-query-federation with Apache License 2.0
public ListArrowValueProjector(FieldReader listReader)
{
    this.listReader = requireNonNull(listReader, "listReader is null");

    List<Field> children = listReader.getField().getChildren();
    if (children.size() != 1) {
        throw new RuntimeException("Unexpected number of children for ListProjector field "
                + listReader.getField().getName());
    }
    Types.MinorType minorType = Types.getMinorTypeForArrowType(children.get(0).getType());
    projection = createValueProjection(minorType);
}
 
Example 14
Source File: HbaseSchemaUtils.java    From aws-athena-query-federation with Apache License 2.0
/**
 * Helper that can coerce the given HBase value to the requested Apache Arrow type.
 *
 * @param isNative If True, the HBase value is stored using native bytes. If False, the value is serialized as a String.
 * @param type The Apache Arrow Type that the value should be coerced to before returning.
 * @param value The HBase value to coerce.
 * @return The coerced value which is now allowed with the provided Apache Arrow type.
 */
public static Object coerceType(boolean isNative, ArrowType type, byte[] value)
{
    if (value == null) {
        return null;
    }

    Types.MinorType minorType = Types.getMinorTypeForArrowType(type);
    switch (minorType) {
        case VARCHAR:
            return Bytes.toString(value);
        case INT:
            return isNative ? ByteBuffer.wrap(value).getInt() : Integer.parseInt(Bytes.toString(value));
        case BIGINT:
            return isNative ? ByteBuffer.wrap(value).getLong() : Long.parseLong(Bytes.toString(value));
        case FLOAT4:
            return isNative ? ByteBuffer.wrap(value).getFloat() : Float.parseFloat(Bytes.toString(value));
        case FLOAT8:
            return isNative ? ByteBuffer.wrap(value).getDouble() : Double.parseDouble(Bytes.toString(value));
        case BIT:
            if (isNative) {
                return (value[0] != 0);
            }
            else {
                return Boolean.parseBoolean(Bytes.toString(value));
            }
        case VARBINARY:
            return value;
        default:
            throw new IllegalArgumentException(type + " with minorType[" + minorType + "] is not supported.");
    }
}
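The `isNative` flag matters because the two encodings are not interchangeable. Native values are decoded with `ByteBuffer`, whose default byte order is big-endian; this is the same order in which HBase's `Bytes.toBytes(int)` writes. String-serialized values are instead parsed from their UTF-8 text. Below is a stdlib-only sketch of the INT case, using a hypothetical `decodeInt` helper.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class HbaseBytesSketch {
    // Decode an int either from 4 big-endian native bytes or from its decimal string form.
    static int decodeInt(boolean isNative, byte[] value) {
        return isNative
                ? ByteBuffer.wrap(value).getInt()
                : Integer.parseInt(new String(value, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] nativeBytes = ByteBuffer.allocate(4).putInt(42).array(); // {0, 0, 0, 42}
        byte[] stringBytes = "42".getBytes(StandardCharsets.UTF_8);    // {'4', '2'}
        System.out.println(decodeInt(true, nativeBytes) + " " + decodeInt(false, stringBytes));
    }
}
```

Decoding the two-byte string form as if it were native would throw `BufferUnderflowException`, because `getInt()` needs four bytes.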
 
Example 15
Source File: SchemaUtils.java    From aws-athena-query-federation with Apache License 2.0
/**
 * This method will produce an Apache Arrow Schema for the given TableName and DocumentDB connection
 * by scanning up to the requested number of rows and using basic schema inference to determine
 * data types.
 *
 * @param client The DocumentDB connection to use for the scan operation.
 * @param table The DocumentDB TableName for which to produce an Apache Arrow Schema.
 * @param numObjToSample The number of records to scan as part of producing the Schema.
 * @return An Apache Arrow Schema representing the schema of the DocumentDB collection.
 * @note The resulting schema is a union of the schema of every row that is scanned. When a field has different
 * types across documents, the code currently falls back to VARCHAR (String) for that field rather than failing the
 * query. Because this coercion can cause surprising results when the field is used in a filter, it is recommended
 * that you use AWS Glue to define a schema for tables which may have such conflicts.
 */
public static Schema inferSchema(MongoClient client, TableName table, int numObjToSample)
{
    MongoDatabase db = client.getDatabase(table.getSchemaName());
    int docCount = 0;
    int fieldCount = 0;
    try (MongoCursor<Document> docs = db.getCollection(table.getTableName()).find().batchSize(numObjToSample)
            .maxScan(numObjToSample).limit(numObjToSample).iterator()) {
        if (!docs.hasNext()) {
            return SchemaBuilder.newBuilder().build();
        }
        SchemaBuilder schemaBuilder = SchemaBuilder.newBuilder();

        while (docs.hasNext()) {
            docCount++;
            Document doc = docs.next();
            for (String key : doc.keySet()) {
                fieldCount++;
                Field newField = getArrowField(key, doc.get(key));
                Types.MinorType newType = Types.getMinorTypeForArrowType(newField.getType());
                Field curField = schemaBuilder.getField(key);
                Types.MinorType curType = (curField != null) ? Types.getMinorTypeForArrowType(curField.getType()) : null;

                if (curField == null) {
                    schemaBuilder.addField(newField);
                }
                else if (newType != curType) {
                    //TODO: currently we resolve fields with mixed types by defaulting to VARCHAR. This is _not_ ideal
                    logger.warn("inferSchema: Encountered a mixed-type field[{}] {} vs {}, defaulting to String.",
                            key, curType, newType);
                    schemaBuilder.addStringField(key);
                }
                else if (curType == Types.MinorType.LIST) {
                    schemaBuilder.addField(mergeListField(key, curField, newField));
                }
                else if (curType == Types.MinorType.STRUCT) {
                    schemaBuilder.addField(mergeStructField(key, curField, newField));
                }
            }
        }

        Schema schema = schemaBuilder.build();
        if (schema.getFields().isEmpty()) {
            throw new RuntimeException("No columns found after scanning " + fieldCount + " values across " +
                    docCount + " documents. Please ensure the collection is not empty and contains at least 1 supported column type.");
        }
        return schema;
    }
    finally {
        logger.info("inferSchema: Evaluated {} field values across {} documents.", fieldCount, docCount);
    }
}
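The inference loop above can be reduced to its core: sample up to `numObjToSample` documents, record the first type seen for each key, and widen a key to VARCHAR when a later document disagrees. This sketch uses plain maps in place of BSON `Document`s. `typeOf` is a hypothetical and deliberately coarse value-to-type mapping (nulls and unrecognized values become VARCHAR), and nested LIST/STRUCT merging is not modeled.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SchemaInferenceSketch {
    // Hypothetical mapping from a sampled Java value to a coarse type name.
    static String typeOf(Object value) {
        if (value instanceof Integer) return "INT";
        if (value instanceof Long) return "BIGINT";
        if (value instanceof Double) return "FLOAT8";
        if (value instanceof Boolean) return "BIT";
        if (value instanceof List) return "LIST";
        if (value instanceof Map) return "STRUCT";
        return "VARCHAR";
    }

    // Union the keys of the sampled documents; a key seen with two different types is widened to VARCHAR.
    static Map<String, String> inferSchema(List<Map<String, Object>> docs, int numObjToSample) {
        Map<String, String> schema = new LinkedHashMap<>();
        int sampled = 0;
        for (Map<String, Object> doc : docs) {
            if (sampled++ >= numObjToSample) {
                break; // stop after the requested sample size
            }
            for (Map.Entry<String, Object> e : doc.entrySet()) {
                String newType = typeOf(e.getValue());
                String curType = schema.get(e.getKey());
                if (curType == null) {
                    schema.put(e.getKey(), newType);
                }
                else if (!curType.equals(newType)) {
                    schema.put(e.getKey(), "VARCHAR");
                }
            }
        }
        return schema;
    }

    public static void main(String[] args) {
        Map<String, Object> d1 = new LinkedHashMap<>();
        d1.put("id", 1);
        Map<String, Object> d2 = new LinkedHashMap<>();
        d2.put("id", "x");
        System.out.println(inferSchema(Arrays.asList(d1, d2), 10));
    }
}
```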
 
Example 16
Source File: BlockUtils.java    From aws-athena-query-federation with Apache License 2.0
/**
 * Used to write a Struct value.
 *
 * @param allocator The BlockAllocator which can be used to generate Apache Arrow Buffers for types
 * which require conversion to an Arrow Buffer before they can be written using the FieldWriter.
 * @param writer The FieldWriter for the Struct field we'd like to write into.
 * @param field The Schema details of the Struct Field we are writing into.
 * @param pos The position (row) in the Apache Arrow batch we are writing to.
 * @param value The value we'd like to write as a struct.
 * @param resolver The field resolver that can be used to extract individual Struct fields from the value.
 */
@VisibleForTesting
protected static void writeStruct(BufferAllocator allocator,
        StructWriter writer,
        Field field,
        int pos,
        Object value,
        FieldResolver resolver)
{
    //We expect null writes to have been handled earlier so this is a no-op.
    if (value == null) {
        return;
    }

    //Indicate the beginning of the struct value, this is how Apache Arrow handles the variable length of Struct types.
    writer.start();
    for (Field nextChild : field.getChildren()) {
        //For each child field that comprises the struct, attempt to extract and write the corresponding value
        //using the FieldResolver.
        Object childValue = resolver.getFieldValue(nextChild, value);
        switch (Types.getMinorTypeForArrowType(nextChild.getType())) {
            case LIST:
                writeList(allocator,
                        (FieldWriter) writer.list(nextChild.getName()),
                        nextChild,
                        pos,
                        ((List) childValue),
                        resolver);
                break;
            case STRUCT:
                writeStruct(allocator,
                        writer.struct(nextChild.getName()),
                        nextChild,
                        pos,
                        childValue,
                        resolver);
                break;
            default:
                writeStructValue(writer, nextChild, allocator, childValue);
                break;
        }
    }
    writer.end();
}
 
Example 17
Source File: BlockUtils.java    From aws-athena-query-federation with Apache License 2.0
/**
 * Used to write a List value.
 *
 * @param allocator The BlockAllocator which can be used to generate Apache Arrow Buffers for types
 * which require conversion to an Arrow Buffer before they can be written using the FieldWriter.
 * @param writer The FieldWriter for the List field we'd like to write into.
 * @param field The Schema details of the List Field we are writing into.
 * @param pos The position (row) in the Apache Arrow batch we are writing to.
 * @param value An iterator to the collection of values we want to write into the row.
 * @param resolver The field resolver that can be used to extract individual values from the value iterator.
 */
@VisibleForTesting
protected static void writeList(BufferAllocator allocator,
        FieldWriter writer,
        Field field,
        int pos,
        Iterable value,
        FieldResolver resolver)
{
    if (value == null) {
        return;
    }

    //Apache Arrow List types have a single 'special' child field which gives us the concrete type of the values
    //stored in the list.
    Field child = null;
    if (field.getChildren() != null && !field.getChildren().isEmpty()) {
        child = field.getChildren().get(0);
    }

    //Mark the beginning of the list, this is essentially how Apache Arrow handles the variable length nature
    //of lists.
    writer.startList();

    Iterator itr = value.iterator();
    while (itr.hasNext()) {
        //For each item in the iterator, attempt to write it to the list.
        Object val = itr.next();
        if (val != null) {
            switch (Types.getMinorTypeForArrowType(child.getType())) {
                case LIST:
                    writeList(allocator, (FieldWriter) writer.list(), child, pos, ((List) val), resolver);
                    break;
                case STRUCT:
                    writeStruct(allocator, writer.struct(), child, pos, val, resolver);
                    break;
                default:
                    writeListValue(writer, child.getType(), allocator, val);
                    break;
            }
        }
    }
    writer.endList();
}
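`startList()` and `endList()` delimit each row's elements because of how an Arrow list column is laid out. All elements of all rows live in one child vector, alongside an offsets buffer with one more entry than there are rows; row `i` covers child positions `[offsets[i], offsets[i + 1])`. The sketch below builds those two buffers for non-null rows with plain Java collections. `FlatList` and `flatten` are hypothetical, and the validity bitmap used for null rows is not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ListLayoutSketch {
    // Hypothetical holder for the two buffers of a list column.
    static final class FlatList {
        final int[] offsets;        // length = rows + 1
        final List<Integer> values; // all elements of all rows, back to back

        FlatList(int[] offsets, List<Integer> values) {
            this.offsets = offsets;
            this.values = values;
        }
    }

    static FlatList flatten(List<List<Integer>> rows) {
        int[] offsets = new int[rows.size() + 1];
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            offsets[i] = values.size();     // start of row i (conceptually, startList())
            values.addAll(rows.get(i));
            offsets[i + 1] = values.size(); // end of row i (conceptually, endList())
        }
        return new FlatList(offsets, values);
    }

    public static void main(String[] args) {
        FlatList f = flatten(Arrays.asList(Arrays.asList(1, 2), new ArrayList<Integer>(), Arrays.asList(3)));
        System.out.println(Arrays.toString(f.offsets) + " " + f.values);
    }
}
```

An empty row simply repeats the previous offset, so variable-length rows cost no extra space beyond one offset each.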
 
Example 18
Source File: DocumentGenerator.java    From aws-athena-query-federation with Apache License 2.0
/**
 * This should be replaced with something that actually reads useful data.
 */
public static Document makeRandomRow(List<Field> fields, int seed)
{
    Document result = new Document();

    for (Field next : fields) {
        boolean negative = seed % 2 == 1;
        Types.MinorType minorType = Types.getMinorTypeForArrowType(next.getType());
        switch (minorType) {
            case INT:
                int iVal = seed * (negative ? -1 : 1);
                result.put(next.getName(), iVal);
                break;
            case TINYINT:
            case SMALLINT:
                int stVal = (seed % 4) * (negative ? -1 : 1);
                result.put(next.getName(), stVal);
                break;
            case UINT1:
            case UINT2:
            case UINT4:
            case UINT8:
                int uiVal = seed % 4;
                result.put(next.getName(), uiVal);
                break;
            case FLOAT4:
                float fVal = seed * 1.1f * (negative ? -1 : 1);
                result.put(next.getName(), fVal);
                break;
            case FLOAT8:
            case DECIMAL:
                double d8Val = seed * 1.1D * (negative ? -1 : 1);
                result.put(next.getName(), d8Val);
                break;
            case BIT:
                boolean bVal = seed % 2 == 0;
                result.put(next.getName(), bVal);
                break;
            case BIGINT:
                long lVal = seed * 1L * (negative ? -1 : 1);
                result.put(next.getName(), lVal);
                break;
            case VARCHAR:
                String vVal = "VarChar" + seed;
                result.put(next.getName(), vVal);
                break;
            case VARBINARY:
                byte[] binaryVal = ("VarChar" + seed).getBytes();
                result.put(next.getName(), binaryVal);
                break;
            case STRUCT:
                result.put(next.getName(), makeRandomRow(next.getChildren(), seed));
                break;
            case LIST:
                //TODO: pretty dirty way of generating lists should refactor this to support better generation
                Types.MinorType listType = Types.getMinorTypeForArrowType(next.getChildren().get(0).getType());
                switch (listType) {
                    case VARCHAR:
                        List<String> listVarChar = new ArrayList<>();
                        listVarChar.add("VarChar" + seed);
                        listVarChar.add("VarChar" + (seed + 1));
                        result.put(next.getName(), listVarChar);
                        break;
                    case INT:
                        List<Integer> listIVal = new ArrayList<>();
                        listIVal.add(seed * (negative ? -1 : 1));
                        listIVal.add(seed * (negative ? -1 : 1) + 1);
                        result.put(next.getName(), listIVal);
                        break;
                    default:
                        throw new RuntimeException(listType + " is not supported in list");
                }
                break;
            default:
                throw new RuntimeException(minorType + " is not supported");
        }
    }

    return result;
}
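The LIST branch builds its second VARCHAR element with string concatenation. Java evaluates `+` left to right. Once the left operand is a `String`, every following operand is appended as text, so parentheses decide whether `seed + 1` is added numerically first. The class and method names below are illustrative.

```java
public class ConcatSketch {
    // "VarChar" + seed happens first, producing a String; 1 is then appended as text.
    static String ungrouped(int seed) {
        return "VarChar" + seed + 1;
    }

    // The parentheses force integer addition before concatenation.
    static String grouped(int seed) {
        return "VarChar" + (seed + 1);
    }

    public static void main(String[] args) {
        System.out.println(ungrouped(5) + " " + grouped(5)); // prints "VarChar51 VarChar6"
    }
}
```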
 
Example 19
Source File: GeneratedRowWriter.java    From aws-athena-query-federation with Apache License 2.0
private FieldWriter makeFieldWriter(FieldVector vector)
{
    Field field = vector.getField();
    String fieldName = field.getName();
    Types.MinorType fieldType = Types.getMinorTypeForArrowType(field.getType());
    Extractor extractor = extractors.get(fieldName);
    ConstraintProjector constraint = constraints.get(fieldName);
    FieldWriterFactory factory = fieldWriterFactories.get(fieldName);

    if (factory != null) {
        return factory.create(vector, extractor, constraint);
    }

    if (extractor == null) {
        throw new IllegalStateException("Missing extractor for field[" + fieldName + "]");
    }

    switch (fieldType) {
        case INT:
            return new IntFieldWriter((IntExtractor) extractor, (IntVector) vector, constraint);
        case BIGINT:
            return new BigIntFieldWriter((BigIntExtractor) extractor, (BigIntVector) vector, constraint);
        case DATEMILLI:
            return new DateMilliFieldWriter((DateMilliExtractor) extractor, (DateMilliVector) vector, constraint);
        case DATEDAY:
            return new DateDayFieldWriter((DateDayExtractor) extractor, (DateDayVector) vector, constraint);
        case TINYINT:
            return new TinyIntFieldWriter((TinyIntExtractor) extractor, (TinyIntVector) vector, constraint);
        case SMALLINT:
            return new SmallIntFieldWriter((SmallIntExtractor) extractor, (SmallIntVector) vector, constraint);
        case FLOAT4:
            return new Float4FieldWriter((Float4Extractor) extractor, (Float4Vector) vector, constraint);
        case FLOAT8:
            return new Float8FieldWriter((Float8Extractor) extractor, (Float8Vector) vector, constraint);
        case DECIMAL:
            return new DecimalFieldWriter((DecimalExtractor) extractor, (DecimalVector) vector, constraint);
        case BIT:
            return new BitFieldWriter((BitExtractor) extractor, (BitVector) vector, constraint);
        case VARCHAR:
            return new VarCharFieldWriter((VarCharExtractor) extractor, (VarCharVector) vector, constraint);
        case VARBINARY:
            return new VarBinaryFieldWriter((VarBinaryExtractor) extractor, (VarBinaryVector) vector, constraint);
        default:
            throw new RuntimeException(fieldType + " is not supported");
    }
}
 
Example 20
Source File: ElasticsearchFieldResolver.java    From aws-athena-query-federation with Apache License 2.0
/**
 * Return the field value from a complex structure or list.
 * @param field is the field that we would like to extract from the provided value.
 * @param originalValue is the original value object.
 * @return the field's value as a List for a LIST field type, a Map for a STRUCT field type, or the actual
 * value if neither of the above.
 * @throws IllegalArgumentException if originalValue is not an instance of Map.
 * @throws RuntimeException if the fieldName does not exist in originalValue, if the fieldType is a STRUCT and
 * the fieldValue is not an instance of Map, or if the fieldType is neither a LIST nor a STRUCT but the fieldValue
 * is an instance of Map (STRUCT).
 */
@Override
public Object getFieldValue(Field field, Object originalValue)
        throws RuntimeException
{
    Types.MinorType fieldType = Types.getMinorTypeForArrowType(field.getType());
    String fieldName = field.getName();
    Object fieldValue;

    if (originalValue instanceof Map) {
        if (((Map) originalValue).containsKey(fieldName)) {
            fieldValue = ((Map) originalValue).get(fieldName);
        }
        else {
            throw new RuntimeException("Field not found in Document: " + fieldName);
        }
    }
    else {
        throw new IllegalArgumentException("Invalid argument type. Expecting a Map, but got: " +
                originalValue.getClass().getTypeName());
    }

    switch (fieldType) {
        case LIST:
            return coerceListField(field, fieldValue);
        case STRUCT:
            if (fieldValue instanceof Map) {
                // Both fieldType and fieldValue are nested structures => return as map.
                return fieldValue;
            }
            break;
        default:
            if (!(fieldValue instanceof Map)) {
                return coerceField(field, fieldValue);
            }
            break;
    }

    throw new RuntimeException("Invalid field value encountered in Document for field: " + field +
            ", value: " + fieldValue);
}