- This connector is still being actively developed. We don't recommend using it in production.
Take the `dremio-flight-connector-{VERSION}-shaded.jar` file in the target folder and put it in the `\dremio\jars` folder in Dremio.

- `-Ddremio.flight.enabled=true` MUST be set to enable Flight.
- `-Ddremio.flight.parallel.enabled=true` MUST be set on all executors to enable parallel Flight.

The parallel Flight stream is now working in Dremio; however, it requires a patched dremio-oss to work correctly. It allows executors to stream results directly to the Python/Spark connector in parallel (see: Spark Connector). This results in an approximately linear performance increase over serial Flight (with properly configured parallelization).
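One common way to pass these flags to the Dremio JVM, assuming a standard tarball install with the usual `conf/dremio-env` file (path and variable name may differ on your deployment), is:

```shell
# conf/dremio-env (assumed location): pass the Flight flags to the Dremio JVM.
# dremio.flight.parallel.enabled must also be set on every executor.
DREMIO_JAVA_SERVER_EXTRA_OPTS="-Ddremio.flight.enabled=true -Ddremio.flight.parallel.enabled=true"
```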
- `-Ddremio.flight.use-ssl=true` to enable SSL using the certificates configured in `dremio.conf`. This plugin currently uses the same certificates as the webserver.
- `-Ddremio.flight.port=47470` to change the port. Defaults to 47470.
- `-Ddremio.flight.host=localhost` to change the host/listening interface. Particularly useful if you are accessing remotely or generating SSL certs.

The Flight endpoint is exposed on port 47470.
The most recent release of pyarrow (0.15.0) has the Flight client built in. To access Dremio via Flight, first install pyarrow (`conda install pyarrow -c conda-forge` or `pip install pyarrow`), then use the dremio-client to access Flight. Alternatively use:
```python
from pyarrow import flight
import pyarrow as pa


class HttpDremioClientAuthHandler(flight.ClientAuthHandler):
    """Performs Dremio's basic-auth handshake and stores the returned token."""

    def __init__(self, username, password):
        super(HttpDremioClientAuthHandler, self).__init__()
        self.basic_auth = flight.BasicAuth(username, password)
        self.token = None

    def authenticate(self, outgoing, incoming):
        # Send the serialized credentials and read back the auth token.
        auth = self.basic_auth.serialize()
        outgoing.write(auth)
        self.token = incoming.read()

    def get_token(self):
        return self.token


username = '<USERNAME>'
password = '<PASSWORD>'
sql = '''<SQL_QUERY>'''

client = flight.FlightClient('grpc+tcp://<DREMIO_COORDINATOR_HOST>:47470')
client.authenticate(HttpDremioClientAuthHandler(username, password))

# Plan the query, then stream the result set batch by batch.
info = client.get_flight_info(flight.FlightDescriptor.for_command(sql))
reader = client.do_get(info.endpoints[0].ticket)
batches = []
while True:
    try:
        batch, metadata = reader.read_chunk()
        batches.append(batch)
    except StopIteration:
        break
data = pa.Table.from_batches(batches)
df = data.to_pandas()
```
The `Command` protobuf object has been removed to reduce dependencies.