Feature management has become a major challenge for ML engineers at Airbnb, who spend much of their time on infrastructure work, such as building data pipelines by hand, instead of on their models. To address this, Airbnb created Chronon, an API that streamlines feature data management and keeps feature values consistent between training and production environments.
Chronon can ingest data from a variety of sources, including event streams and data warehouse tables, and it handles both real-time event data and historical snapshots.
Chronon’s SQL-like transformations and time-based aggregations let ML practitioners express complex computations concisely. Because these building blocks compose, simple definitions can be combined into more complex features.
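To make "SQL-like transformations that compose" concrete, here is a minimal sketch in plain Python. It is not Chronon’s API: the helpers `where`, `select`, and `sum_by`, and the `purchases` rows, are hypothetical and only mirror the shape of a filter, a projection, and a group-by aggregation chained together.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

def where(rows: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Keep only rows that satisfy the predicate, like a SQL WHERE clause."""
    return [r for r in rows if predicate(r)]

def select(rows: List[Row], exprs: Dict[str, Callable[[Row], Any]]) -> List[Row]:
    """Project each row onto named derived expressions, like a SQL SELECT list."""
    return [{name: fn(r) for name, fn in exprs.items()} for r in rows]

def sum_by(rows: List[Row], key: str, value: str) -> Dict[Any, Any]:
    """Group rows by `key` and sum `value`, like SELECT key, SUM(value) ... GROUP BY key."""
    totals: Dict[Any, Any] = defaultdict(int)
    for r in rows:
        totals[r[key]] += r[value]
    return dict(totals)

# Hypothetical purchase events; prices are integer cents to avoid float rounding.
purchases = [
    {"user_id": "u1", "price_cents": 1250, "status": "complete"},
    {"user_id": "u1", "price_cents": 300, "status": "refunded"},
    {"user_id": "u2", "price_cents": 999, "status": "complete"},
    {"user_id": "u1", "price_cents": 50, "status": "complete"},
]

# Each step takes rows and returns rows (or an aggregate), so steps chain freely.
completed = where(purchases, lambda r: r["status"] == "complete")
projected = select(completed, {
    "user_id": lambda r: r["user_id"],
    "spend_cents": lambda r: r["price_cents"],
})
totals = sum_by(projected, "user_id", "spend_cents")
```

Because each helper consumes and produces plain rows, a new feature can reuse an existing filter or projection instead of re-implementing it.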
Chronon supports both online and offline data generation: it can serve feature data through low-latency endpoints and write training data to Hive tables.
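Serving the same feature both online and offline only helps if the two paths agree. The sketch below illustrates that idea under simple assumptions: one aggregation (a per-key sum) is computed once as a batch job over the full history and once incrementally as events stream in, and the results match. The names `offline_batch_sum` and `OnlineSumStore` are hypothetical and do not describe how Chronon is implemented internally.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical event shape: (entity_key, amount). Not Chronon's data model.
Event = Tuple[str, float]

def offline_batch_sum(events: List[Event]) -> Dict[str, float]:
    """Batch path: recompute the feature from the full history, like a job writing a training table."""
    totals: Dict[str, float] = defaultdict(float)
    for key, amount in events:
        totals[key] += amount
    return dict(totals)

class OnlineSumStore:
    """Streaming path: keep a running aggregate per key and update it on every event."""

    def __init__(self) -> None:
        self._totals: Dict[str, float] = defaultdict(float)

    def update(self, event: Event) -> None:
        key, amount = event
        self._totals[key] += amount

    def get(self, key: str) -> float:
        # Low-latency lookup; keys that have never been seen default to 0.0.
        return self._totals.get(key, 0.0)

events = [("u1", 5.0), ("u2", 2.5), ("u1", 1.5)]
store = OnlineSumStore()
for e in events:
    store.update(e)  # streaming updates arrive one at a time
batch = offline_batch_sum(events)
```

Defining the aggregation once and deriving both paths from it is what keeps training data and production lookups consistent in this sketch.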
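Users choose how often derived data is updated, from near real-time to daily refreshes, by selecting one of two accuracy models: “Temporal,” where feature values reflect all data up to the moment of the request, and “Snapshot,” where values are refreshed at a fixed interval, such as once a day.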
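The difference between the two accuracy models is easiest to see with numbers. In this minimal sketch, "Temporal" counts every event strictly before the query time, while "Snapshot" returns the value as of the most recent daily boundary. The daily cadence, the strict `<` cutoff, and the function names `temporal_sum` and `snapshot_sum` are illustrative assumptions, not Chronon’s exact semantics.

```python
from typing import List, Tuple

HOUR = 60 * 60
DAY = 24 * HOUR  # assumed refresh interval for the Snapshot model

def temporal_sum(events: List[Tuple[int, int]], query_ts: int) -> int:
    """Temporal (sketch): include every event strictly before the query timestamp."""
    return sum(value for ts, value in events if ts < query_ts)

def snapshot_sum(events: List[Tuple[int, int]], query_ts: int) -> int:
    """Snapshot (sketch): value as of the latest daily boundary at or before query_ts."""
    boundary = (query_ts // DAY) * DAY
    return temporal_sum(events, boundary)

# (timestamp_seconds, value) pairs: events at hour 1, hour 10, and hour 30.
events = [(1 * HOUR, 5), (10 * HOUR, 7), (30 * HOUR, 11)]
```

At hour 35, `temporal_sum` sees all three events (23), while `snapshot_sum` only sees what existed at the hour-24 boundary (12). Snapshot is cheaper to maintain; Temporal is fresher.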
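Data sources are central to the Chronon ecosystem. Chronon supports three kinds: event sources (timestamped records of things that happened), entity sources (snapshots of tables whose rows change over time), and cumulative event sources (tables in which each new partition also contains all earlier records).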
Chronon operates in two contexts: online, where computations serve requests with low latency, and offline, where batch jobs compute over warehouse datasets. Chronon definitions fall into three categories: GroupBy for aggregations, Join for combining features from multiple GroupBys into one dataset, and StagingQuery for arbitrary Spark SQL computations.
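When a Join produces training data, each row should only see feature values that were available at that row’s timestamp; otherwise future information leaks into training. The sketch below shows that point-in-time idea for a simple "count of prior events" feature. It is a generic technique written for illustration, not Chronon’s Join implementation, and `build_index` and `point_in_time_join` are hypothetical names.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Any, Dict, List, Tuple

def build_index(events: List[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group right-side event timestamps by key and sort them for binary search."""
    index: Dict[str, List[int]] = defaultdict(list)
    for key, ts in events:
        index[key].append(ts)
    for ts_list in index.values():
        ts_list.sort()
    return index

def point_in_time_join(
    left: List[Tuple[str, int]], events: List[Tuple[str, int]]
) -> List[Dict[str, Any]]:
    """For each (key, ts) training row, count that key's events strictly before ts."""
    index = build_index(events)
    out: List[Dict[str, Any]] = []
    for key, ts in left:
        # bisect_left returns how many timestamps are < ts, so events at or
        # after the training row's time never leak into its feature value.
        count = bisect_left(index.get(key, []), ts)
        out.append({"key": key, "ts": ts, "event_count": count})
    return out

events = [("a", 10), ("a", 30), ("a", 20), ("b", 50)]
left = [("a", 25), ("a", 30), ("b", 5)]
rows = point_in_time_join(left, events)
```

Note that the row for `a` at time 30 gets a count of 2, not 3: the event at exactly time 30 is treated as not yet visible under this sketch’s strict cutoff.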
Chronon’s GroupBy aggregations extend traditional SQL GROUP BY in several ways: windows restrict an aggregation to a recent time range (for example, the last 7 days), bucketing adds a secondary grouping column so that one aggregation yields a value per bucket, and auto-unpack applies aggregations to the individual elements of array columns. These time-based extensions make it easier to build expressive features for ML models.
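The three GroupBy extensions can be combined in one small function. This is a minimal sketch under stated assumptions: all events belong to a single entity key, the window is the half-open interval `[query_ts - window, query_ts)`, and bucketing returns a dict from bucket value to aggregate. The function `windowed_sum` and its parameters are hypothetical and are not Chronon’s GroupBy API.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Union

Event = Dict[str, Any]

def windowed_sum(
    events: List[Event],
    query_ts: int,
    window: int,
    value_col: str,
    bucket_col: Optional[str] = None,
) -> Union[int, Dict[Any, int]]:
    """Sum `value_col` over a time window, optionally bucketed by `bucket_col`."""
    start = query_ts - window
    buckets: Dict[Any, int] = defaultdict(int)
    for e in events:
        # Window: keep only events in [query_ts - window, query_ts).
        if not (start <= e["ts"] < query_ts):
            continue
        raw = e[value_col]
        # Auto-unpack (sketch): an array value contributes each of its elements.
        values = raw if isinstance(raw, list) else [raw]
        # Bucketing: sub-group by an extra column; None means "no bucketing".
        bucket = e[bucket_col] if bucket_col is not None else None
        buckets[bucket] += sum(values)
    if bucket_col is None:
        return buckets.get(None, 0)
    return dict(buckets)

# Hypothetical events for one user; the last one falls after the query time.
events = [
    {"ts": 100, "amounts": [3, 4], "category": "food"},
    {"ts": 150, "amounts": 10, "category": "travel"},
    {"ts": 190, "amounts": [1, 1, 1], "category": "food"},
    {"ts": 250, "amounts": [100], "category": "food"},
]
```

For example, `windowed_sum(events, 200, 60, "amounts", "category")` keeps only the events at times 150 and 190 and yields one total per category.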
Chronon has changed how Airbnb’s ML practitioners approach feature engineering. By removing the need to implement pipelines by hand, it lets ML engineers focus on building models that capture user behavior and meet product needs.
In conclusion, Chronon is a key tool in Airbnb’s machine learning stack. It improves productivity and scalability in feature engineering, helping ML practitioners deliver better models and improve the Airbnb experience for users.