Ibis Project Blog

Python productivity framework for the Apache Hadoop ecosystem. Development updates, use cases, and internals.

Ibis 0.5: SQLite, Python 3, and more

The next Ibis release is out, with some major new functionality:

  • SQLite client and support for most SQLite built-in functions
  • Python 3 compatibility (single codebase)
  • SQLAlchemy-based expression translation toolchain that makes internal code reuse across SQL engines easier and paves the way for PostgreSQL, Redshift, Vertica, and other analytic SQL engine support in the near future
  • Asynchronous query execution API (expr.execute(async=True)) for Impala supporting query status and cancellation. This is very helpful in building multithreaded applications.
  • Support for using Impala user-defined aggregate (UDA) functions
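
To make the asynchronous item above more concrete, here is a minimal sketch of the status-and-cancellation pattern that such an API enables in a multithreaded application. It does not use Ibis or Impala: the AsyncQuery class, its method names (is_finished, cancel, get_result), and the simulated "work steps" are all assumptions made up for illustration, built only on the standard library.

```python
import threading
import time


class AsyncQuery:
    """Hypothetical handle for a query running in a background thread.

    The "query" here is simulated by sleeping in small steps; a real
    client would poll or cancel a server-side operation instead.
    """

    def __init__(self, work_steps, step_seconds=0.01):
        self._cancel = threading.Event()
        self._done = threading.Event()
        self._result = None
        self._thread = threading.Thread(
            target=self._run, args=(work_steps, step_seconds), daemon=True
        )
        self._thread.start()

    def _run(self, work_steps, step_seconds):
        completed = 0
        for _ in range(work_steps):
            if self._cancel.is_set():
                break  # stop early; no result is recorded
            time.sleep(step_seconds)
            completed += 1
        else:
            # Loop ran to completion without being cancelled.
            self._result = completed
        self._done.set()

    def is_finished(self):
        """Return True once the background work has stopped."""
        return self._done.is_set()

    def cancel(self):
        """Request cancellation and wait for the worker to stop."""
        self._cancel.set()
        self._thread.join()

    def get_result(self):
        """Block until the work stops, then return its result."""
        self._thread.join()
        if self._result is None:
            raise RuntimeError("query was cancelled")
        return self._result


# A short query runs to completion; a long one is cancelled midway.
fast = AsyncQuery(work_steps=3, step_seconds=0.001)
fast_result = fast.get_result()

slow = AsyncQuery(work_steps=1000, step_seconds=0.01)
slow.cancel()
```

The caller's thread stays free between starting the query and asking for its result, which is the point of the pattern: a UI or scheduler can check is_finished() periodically and call cancel() when the user gives up.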

There's a lot more, of course. Check out the detailed release notes, and read on for more about the upcoming roadmap.

Check out this follow-up post for a quick start in using Ibis on SQLite with the newly posted Crunchbase dataset.

Install Ibis from PyPI with:

pip install ibis-framework

Thanks to all who contributed patches:

$ git log v0.4.0..v0.5.0 --pretty=format:%aN | sort | uniq -c | sort -rn
     55 Wes McKinney
      9 Uri Laserson
      1 Kristopher Overholt

Big news: expanding SQL engine support

One of the major goals of Ibis is to enable analytics work to be migrated from SQL code to Python code. Since much of the data warehoused in analytic SQL systems (like Impala on HDFS or Redshift on AWS) isn't going anywhere soon, this requires building a feature-complete SQL translation toolchain. Ibis compiles Python expressions to SQL behind the scenes and sends that SQL to your data engine of choice.

We are taking SQL feature coverage very seriously. That means if you find a SQL SELECT query that cannot be expressed with Ibis, we will treat it as a bug.

Between Ibis 0.4 and 0.5, I undertook significant refactoring to separate Impala-specific functionality from the more generic SQL compilation toolchain. As part of this, I added a SQLAlchemy compiler-translator that converts Ibis expressions into SQLAlchemy expressions. To see the refactoring through to completion, I built a SQLite client for Ibis on top of the new translator.
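
As a rough illustration of what "compiling Python to SQL" means, here is a toy sketch using only the standard library. It is not Ibis's implementation: the Column, Comparison, and Filter classes and the compile_to_sql function are hypothetical names, the table and rows are made up, and the sketch emits a SQL string directly rather than going through SQLAlchemy expressions as Ibis does.

```python
import sqlite3


class Column:
    """A reference to a named column; comparisons build expression nodes."""

    def __init__(self, name):
        self.name = name

    def __gt__(self, value):
        # Nothing is evaluated here; we only record the comparison.
        return Comparison(self, ">", value)


class Comparison:
    """An unevaluated predicate such as: funding > 2.0"""

    def __init__(self, column, op, value):
        self.column, self.op, self.value = column, op, value


class Filter:
    """An unevaluated 'select rows of a table matching a predicate' node."""

    def __init__(self, table_name, predicate):
        self.table_name, self.predicate = table_name, predicate


def compile_to_sql(expr):
    """Translate a Filter node into a parameterized SELECT statement."""
    pred = expr.predicate
    sql = 'SELECT * FROM "{}" WHERE "{}" {} ?'.format(
        expr.table_name, pred.column.name, pred.op
    )
    return sql, (pred.value,)


# Made-up data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (name TEXT, funding REAL)")
conn.executemany(
    "INSERT INTO companies VALUES (?, ?)", [("a", 1.0), ("b", 5.0)]
)

# Build the expression in Python, compile it, then let SQLite run it.
expr = Filter("companies", Column("funding") > 2.0)
sql, params = compile_to_sql(expr)
rows = conn.execute(sql, params).fetchall()
```

The key design point is that the Python expression is only a description of the query. Because compilation is a separate step, the same expression tree can in principle be handed to a different compiler for a different engine.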

Supporting more SQL engines is a lot of work, because each system has its own set of built-in functions, and these have to be wrapped and connected to the SQL-independent Ibis expression DSL.
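
One way to picture this wrapping work is a per-engine lookup table from engine-independent operation names to each engine's built-in function names. The sketch below is an assumption-laden toy, not Ibis code: FUNCTION_REGISTRY, translate_call, the operation names, and the "hypothetical_engine" entry with its function names are all invented for illustration. Only the SQLite function names (length, upper) are real.

```python
import sqlite3

# Engine-independent operation name -> engine-specific SQL function name.
FUNCTION_REGISTRY = {
    "sqlite": {
        "string_length": "length",
        "uppercase": "upper",
    },
    # Invented engine, shown only to illustrate differing names.
    "hypothetical_engine": {
        "string_length": "char_length",
        "uppercase": "ucase",
    },
}


def translate_call(dialect, operation, argument_sql):
    """Render one function call in the given dialect's SQL."""
    try:
        func = FUNCTION_REGISTRY[dialect][operation]
    except KeyError:
        raise NotImplementedError(
            "{!r} is not wrapped for dialect {!r}".format(operation, dialect)
        )
    return "{}({})".format(func, argument_sql)


# The same logical operation renders differently per engine.
sqlite_sql = translate_call("sqlite", "string_length", "'ibis'")
other_sql = translate_call("hypothetical_engine", "string_length", "'ibis'")

# Only the SQLite rendering can be executed here.
conn = sqlite3.connect(":memory:")
length = conn.execute("SELECT " + sqlite_sql).fetchone()[0]
```

An operation missing from an engine's table fails loudly with NotImplementedError instead of producing invalid SQL, which makes gaps in coverage easy to find and report.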

Having this flexible and reusable translation toolchain available also makes it easier to smooth over behavior differences and API inconsistencies between SQL engines.

I would like to add more SQL engines; those designed for analytics (like Redshift, Vertica, and Presto) are likely to receive more attention in the short term. If you would like to get involved, please get in touch.

Upcoming Ibis roadmap

Focus areas for the project in the coming months will be:

  • Expanding SQL engine support (Redshift, Presto, Vertica, and Spark SQL are high priorities)
  • Support for Impala complex (nested) types
  • Tools for more complex ETL workflows on Impala

See the GitHub issue tracker for the granular feature roadmap.