DuckLake 1.0: SQL-catalog table format + data inlining, MIT-licensed

DuckDB Labs shipped DuckLake 1.0 last month and MotherDuck wired up production support. It is a lakehouse table format that diverges from Iceberg and Delta Lake in one architecturally consequential way: instead of keeping metadata as JSON manifest files in object storage, all metadata and the catalog live in a SQL database. The team's framing of the why is direct: file-based metadata in lake formats leads to complex coordination, slow metadata operations, and lots of small files. MotherDuck's launch numbers claim 10x faster queries and 10x more transactions per second on the catalog hot path. The raw data stays in Parquet in object storage; only the catalog has moved to Postgres-class durability. The license is MIT. Clients ship for Apache DataFusion, Spark, Trino, and Pandas: multi-engine from day one, not DuckDB-locked.

The mechanism that matters for builders is data inlining. Iceberg's small-file problem is the chronic compaction debt that every production lakehouse carries: stream a few rows, they land in a new Parquet file, repeat ten thousand times, and now your reads scan tens of thousands of tiny files until a compaction job catches up. DuckLake's inlining threshold defaults to ten rows: anything below that goes directly into the SQL catalog instead of spawning a new Parquet object. Sorted tables and bucket partitioning for high-cardinality columns are in 1.0, geometry types were upgraded, and deletion vectors are explicitly Iceberg-compatible: a DuckLake-emitted vector reads cleanly from existing Iceberg consumers. The interop is intentional; this is a format trying to win on operational characteristics, not on lock-in. The architectural trade is that you now have a Postgres (or compatible) availability dependency on the metadata path. For most builders that is a feature: Postgres has been tuned for low-latency, high-concurrency reads for forty years. For builders running multi-region lakehouses without an obvious central catalog DB, it is a new question to design around.
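To make the inlining trade concrete, here is a minimal toy model in Python. It is not DuckLake's implementation: the `ToyLake` class, the `simulate` helper, and the strict "fewer than the limit" comparison are assumptions made for illustration, built only from the ten-row default described above. It counts how many data files a reader would have to scan after many tiny writes, with and without an inlining threshold.

```python
from dataclasses import dataclass, field

# Assumption: the article only says the default threshold is ten rows and that
# writes below it are inlined; the strict "<" comparison is a guess.
INLINE_ROW_LIMIT = 10


@dataclass
class ToyLake:
    """Toy model of a table format; not DuckLake's actual implementation."""
    inline_row_limit: int = 0                           # 0 disables inlining
    parquet_files: list = field(default_factory=list)   # one entry per data file
    inlined_rows: list = field(default_factory=list)    # rows held in the catalog

    def write_batch(self, rows):
        # Small batches stay in the catalog; larger ones become a new file.
        if len(rows) < self.inline_row_limit:
            self.inlined_rows.extend(rows)
        else:
            self.parquet_files.append(list(rows))

    def files_to_scan(self):
        return len(self.parquet_files)


def simulate(inline_row_limit, batches=10_000, rows_per_batch=3):
    """Write `batches` tiny batches and return the resulting toy lake."""
    lake = ToyLake(inline_row_limit=inline_row_limit)
    for b in range(batches):
        lake.write_batch([(b, i) for i in range(rows_per_batch)])
    return lake


file_per_write = simulate(inline_row_limit=0)
with_inlining = simulate(inline_row_limit=INLINE_ROW_LIMIT)

print(file_per_write.files_to_scan())    # 10000
print(with_inlining.files_to_scan())     # 0
print(len(with_inlining.inlined_rows))   # 30000
```

In this sketch, ten thousand three-row writes produce ten thousand files without inlining and none with it; the same 30,000 rows sit in the catalog instead. A real system would presumably flush inlined rows to Parquet at some point, which the toy model does not attempt to show.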

Ecosystem reading: the lakehouse-format war between Iceberg and Delta Lake is ten years old at this point and has not produced an operational answer. Iceberg won the metadata-spec battle, but Delta still ships better tooling. DuckLake comes in on a third axis: bypassing the file-based catalog entirely. It will not replace Iceberg in production deployments that already have working compaction pipelines and Iceberg-native queries, but for builders standing up a new analytics layer in 2026, the SQL-catalog approach will look operationally much better. MotherDuck's pitch, "setup in 2 SQL commands", is an honest description: if you have Postgres, you have a DuckLake catalog. Multi-engine support means you are not betting on DuckDB; you can run heavy analytics on Trino while light operational queries hit the same data directly from DuckDB. The big open question is the migration path from Iceberg, which the launch announcement leaves vague; that is the workstream likely to surface over the next 6 months.
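As a rough sketch of what a two-statement setup against Postgres could look like, the Python helper below only builds the SQL strings and does not connect to anything. The article does not list the two commands, so the statements here are an assumption based on the `ATTACH 'ducklake:postgres:...'` form with a `DATA_PATH` option; check the DuckLake documentation before relying on it. The function name `ducklake_setup_sql`, the alias `lake`, the database name, and the bucket path are all hypothetical.

```python
def ducklake_setup_sql(pg_conninfo, data_path, alias="lake"):
    """Return two statements: install the extension, then attach the catalog.

    Assumed syntax, not confirmed by the article. Quoting is naive and for
    illustration only; do not pass untrusted input.
    """
    conn = pg_conninfo.replace("'", "''")
    path = data_path.replace("'", "''")
    return [
        "INSTALL ducklake;",
        f"ATTACH 'ducklake:postgres:{conn}' AS {alias} (DATA_PATH '{path}');",
    ]


# Hypothetical connection details: catalog in Postgres, Parquet in a bucket.
statements = ducklake_setup_sql(
    "dbname=ducklake_catalog host=localhost",
    "s3://my-bucket/lake/",
)
for stmt in statements:
    print(stmt)
```

The point of the sketch is the shape of the configuration: one connection string for the catalog and one path for the Parquet data. The statements could then be run through a DuckDB session; depending on the environment, additional extensions or credentials may also need to be set up first.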

Practical move: if you are building a new analytics layer or ML feature store this quarter, DuckLake 1.0 is worth a one-week eval against Iceberg. The setup-simplicity gap alone is real: replace your Glue/Hive Metastore configuration with a Postgres connection string. If you are running production Iceberg with active compaction tuning, the migration cost is non-zero and there is no documented path; revisit when DuckLake ships explicit Iceberg readers/writers in both directions. For ML pipelines that write small batches frequently (training-data ingestion, inference logs, eval traces), data inlining alone solves a class of operational pain that compaction-based approaches keep paying for. The 1.0 stable-spec promise plus the backward-compatibility commitment makes it a safe bet for new builds; for existing builds, the question is whether your current friction is worth the migration cost.
