Releases: pola-rs/polars

Rust Polars 0.48.1

21 May 11:05
5e1b4b7

🚀 Performance improvements

  • Switch eligible casts to non-strict in optimizer (#22850)

🐞 Bug fixes

  • Fix RuntimeError when serializing the same DataFrame from multiple threads (#22844)

📦 Build system

  • Fix building polars-lazy with certain features (#22846)
  • Add missing features (#22839)

🛠️ Other improvements

  • Update Rust Polars versions (#22854)

Thank you to all our contributors for making this release possible!
@JakubValtar, @bschoenmaeckers, @nameexhaustion and @stijnherfst

Python Polars 1.30.0

21 May 13:33
ee0903b

🚀 Performance improvements

  • Switch eligible casts to non-strict in optimizer (#22850)
  • Allow predicate passing set_sorted (#22797)
  • Increase default cross-file parallelism limit for new-streaming multiscan (#22700)
  • Add elementwise execution mode for list.eval (#22715)
  • Support optimised init from non-dict Mapping objects in from_records and frame/series constructors (#22638)
  • Add streaming cross-join node (#22581)
  • Switch off maintain_order in group-by followed by sort (#22492)

✨ Enhancements

  • Load AWS endpoint_url using boto3 (#22851)
  • Implemented list.filter (#22749)
  • Support binaryoffset in search sorted (#22786)
  • Add nulls_equal flag to list/arr.contains (#22773)
  • Implement LazyFrame.match_to_schema (#22726)
  • Improved time-string parsing and inference (generally, and via the SQL interface) (#22606)
  • Allow for .over to be called without partition_by (#22712)
  • Support AnyValue translation from PyMapping values (#22722)
  • Support optimised init from non-dict Mapping objects in from_records and frame/series constructors (#22638)
  • Support inference of Int128 dtype from databases that support it (#22682)
  • Add options to write Parquet field metadata (#22652)
  • Add cast_options parameter to control type casting in scan_parquet (#22617)
  • Allow casting List<UInt8> to Binary (#22611)
  • Allow setting of regex size limit using POLARS_REGEX_SIZE_LIMIT (#22651)
  • Support use of literal values as "other" when evaluating Series.zip_with (#22632)
  • Allow reading and writing custom file-level Parquet metadata (#21806)
  • Support PEP702 @deprecated decorator behaviour (#22594)
  • Support grouping by pl.Array (#22575)
  • Preserve exception type and traceback for errors raised from Python (#22561)
  • Use fixed-width font in streaming phys plan graph (#22540)

🐞 Bug fixes

  • Fix RuntimeError when serializing the same DataFrame from multiple threads (#22844)
  • Fix map_elements predicate pushdown (#22833)
  • Fix reverse list type (#22832)
  • Don't require numpy for search_sorted (#22817)
  • Add type equality checking for relevant methods (#22802)
  • Invalid output for fill_null after when.then on structs (#22798)
  • Don't panic for cross join with misaligned chunking (#22799)
  • Panic on quantile over nulls in rolling window (#22792)
  • Respect BinaryOffset metadata (#22785)
  • Correct the output order of PartitionByKey and PartitionParted (#22778)
  • Fallback to non-strict casting for deprecated casts (#22760)
  • Clippy on new stable version (#22771)
  • Handle sliced out remainder for bitmaps (#22759)
  • Don't merge Enum categories on append (#22765)
  • Fix unnest() not working on empty struct columns (#22391)
  • Fix the default value type in Schema init (#22589)
  • Correct name in unnest error message (#22740)
  • Provide "schema" to DataFrame, even if empty JSON (#22739)
  • Properly account for nulls in the is_not_nan check made in drop_nans (#22707)
  • Incorrect result from SQL count(*) with partition by (#22728)
  • Fix deadlock joining scanned tables with low thread count (#22672)
  • Don't allow deserializing incompatible DSL (#22644)
  • Incorrect null dtype from binary ops in empty group_by (#22721)
  • Don't mark str.replace_many with Mapping as deprecated (#22697)
  • Gzip has maximum compression of 9, not 10 (#22685)
  • Fix predicate pushdown of fallible expressions (#22669)
  • Fix index out of bounds panic when scanning hugging face (#22661)
  • Panic on group_by with literal and empty rows (#22621)
  • Return input instead of panicking if empty subset in drop_nulls() and drop_nans() (#22469)
  • Bump argminmax to 0.6.3 (#22649)
  • DSL version deserialization endianness (#22642)
  • Allow Expr.round() to be called on integer dtypes (#22622)
  • Fix panic when filtering based on row index column in parquet (#22616)
  • Fix WASM and Pyodide compilation (#22613)
  • Resolve get() SchemaMismatch panic (#22350)
  • Panic in group_by_dynamic on single-row df with group_by (#22597)
  • Add new_streaming feature to polars crate (#22601)
  • Consistently use Unix epoch as origin for dt.truncate (except weekly buckets which start on Mondays) (#22592)
  • Fix interpolate on dtype Decimal (#22541)
  • CSV count rows skipped last line if file did not end with newline (#22577)
  • Make nested strict casting actually strict (#22497)
  • Make replace and replace_strict mapping use list literals (#22566)
  • Allow pivot on Time column (#22550)
  • Fix error when providing CSV schema with extra columns (#22544)
  • Panic on bitwise op between Series and Expr (#22527)
  • Multi-selector regex expansion (#22542)
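
The dt.truncate change (#22592) can be pictured with plain datetime arithmetic. The sketch below is not Polars code: it assumes fixed-length buckets counted from the Unix epoch, and it shifts the origin to Monday 1969-12-29 for weekly buckets because 1970-01-01 was a Thursday. The names truncate, EPOCH and MONDAY_ORIGIN are hypothetical and exist only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch; 1970-01-01 was a Thursday.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Assumed origin for weekly buckets: the Monday just before the epoch.
MONDAY_ORIGIN = datetime(1969, 12, 29, tzinfo=timezone.utc)


def truncate(ts: datetime, every: timedelta, weekly: bool = False) -> datetime:
    """Floor `ts` to the start of its bucket of length `every`."""
    origin = MONDAY_ORIGIN if weekly else EPOCH
    # timedelta // timedelta is floor division and yields an int.
    buckets = (ts - origin) // every
    return origin + buckets * every


ts = datetime(2025, 5, 21, 13, 33, tzinfo=timezone.utc)  # a Wednesday
print(truncate(ts, timedelta(hours=1)))
print(truncate(ts, timedelta(weeks=1), weekly=True))
# Counting weeks from the raw epoch would land on a Thursday instead.
print(truncate(ts, timedelta(weeks=1)).strftime("%A"))
```

Using a single origin keeps bucket boundaries independent of the data; the weekly exception only moves that origin so that boundaries fall on Mondays.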

📖 Documentation

  • Add pre-release policy (#22808)
  • Fix broken link to service account page in Polars Cloud docs (#22762)
  • Add match_to_schema to API reference (#22777)
  • Provide additional explanation and examples for the value_counts "normalize" parameter (#22756)
  • Rework documentation for drop/fill for nulls/nans (#22657)
  • Add documentation to new RoundMode parameter in round (#22555)
  • Add missing repeat_by to API reference, fixup list.get (#22698)
  • Fix non-rendering bullet points in scan_iceberg (#22694)
  • Improve insert_column docstring (description and examples) (#22551)
  • Improve join documentation (#22556)

📦 Build system

  • Fix building polars-lazy with certain features (#22846)
  • Add missing features (#22839)
  • Patch pyo3 to disable recompilation (#22796)

🛠️ Other improvements

  • Update Rust Polars versions (#22854)
  • Add basic smoke test for free-threaded python (#22481)
  • Update Polars Rust versions (#22834)
  • Fix nix build (#22809)
  • Fix flake.nix to work on macos (#22803)
  • Unused variables on release build (#22800)
  • Update cloud docs (#22624)
  • Fix unstable list.eval performance test (#22729)
  • Add proptest implementations for all Array types (#22711)
  • Dispatch .write_* to .lazy().sink_*(engine='in-memory') (#22582)
  • Move all optimization flags to QueryOptFlags (#22680)
  • Add test for str.replace_many (#22615)
  • Stabilize sink_* (#22643)
  • Add proptest for row-encode (#22626)
  • Update rust version in nix flake (#22627)
  • Add a nix flake with a devShell and package (#22246)
  • Use a wrapper struct to store time zone (#22523)
  • Add proptest testing for parquet decoding kernels (#22608)
  • Include equiprobable as valid quantile method (#22571)
  • Remove confusing error context calling .collect(_eager=True) (#22602)
  • Fix test_truncate_path test case (#22598)
  • Unify function flags into 1 bitset (#22573)
  • Display the operation behind in-memory-map (#22552)

Thank you to all our contributors for making this release possible!
@IvanIsCoding, @JakubValtar, @Julian-J-S, @LucioFranco, @MarcoGorelli, @WH-2099, @alexander-beedie, @borchero, @bschoenmaeckers, @cmdlineluser, @coastalwhite, @etiennebacher, @florian-klein, @itamarst, @kdn36, @mcrumiller, @nameexhaustion, @nikaltipar, @orlp, @pavelzw, @r-brink, @ritchie46, @stijnherfst, @teotwaki, @timkpaine and @wence-

Rust Polars 0.48.0

20 May 11:07
bfa5e96

💥 Breaking changes

  • Use a wrapper struct to store time zone (#22523)

🚀 Performance improvements

  • Allow predicate passing set_sorted (#22797)
  • Increase default cross-file parallelism limit for new-streaming multiscan (#22700)
  • Add elementwise execution mode for list.eval (#22715)
  • Support optimised init from non-dict Mapping objects in from_records and frame/series constructors (#22638)
  • Add streaming cross-join node (#22581)
  • Switch off maintain_order in group-by followed by sort (#22492)

✨ Enhancements

  • Format named functions (#22831)
  • Implemented list.filter (#22749)
  • Support binaryoffset in search sorted (#22786)
  • Add nulls_equal flag to list/arr.contains (#22773)
  • Allow named opaque functions for serde (#22734)
  • Implement LazyFrame.match_to_schema (#22726)
  • Improved time-string parsing and inference (generally, and via the SQL interface) (#22606)
  • Allow for .over to be called without partition_by (#22712)
  • Support AnyValue translation from PyMapping values (#22722)
  • Support optimised init from non-dict Mapping objects in from_records and frame/series constructors (#22638)
  • Add options to write Parquet field metadata (#22652)
  • Allow casting List<UInt8> to Binary (#22611)
  • Allow setting of regex size limit using POLARS_REGEX_SIZE_LIMIT (#22651)

🐞 Bug fixes

  • Fix reverse list type (#22832)
  • Add type equality checking for relevant methods (#22802)
  • Invalid output for fill_null after when.then on structs (#22798)
  • Don't panic for cross join with misaligned chunking (#22799)
  • Panic on quantile over nulls in rolling window (#22792)
  • Respect BinaryOffset metadata (#22785)
  • Correct the output order of PartitionByKey and PartitionParted (#22778)
  • Fallback to non-strict casting for deprecated casts (#22760)
  • Clippy on new stable version (#22771)
  • Handle sliced out remainder for bitmaps (#22759)
  • Don't merge Enum categories on append (#22765)
  • Fix unnest() not working on empty struct columns (#22391)
  • Correct name in unnest error message (#22740)
  • Properly account for nulls in the is_not_nan check made in drop_nans (#22707)
  • Incorrect result from SQL count(*) with partition by (#22728)
  • Fix deadlock joining scanned tables with low thread count (#22672)
  • Don't allow deserializing incompatible DSL (#22644)
  • Incorrect null dtype from binary ops in empty group_by (#22721)
  • Don't mark str.replace_many with Mapping as deprecated (#22697)
  • Gzip has maximum compression of 9, not 10 (#22685)
  • Fix predicate pushdown of fallible expressions (#22669)
  • Fix index out of bounds panic when scanning hugging face (#22661)
  • Fix polars crate not compiling when lazy feature enabled (#22655)
  • Panic on group_by with literal and empty rows (#22621)
  • Return input instead of panicking if empty subset in drop_nulls() and drop_nans() (#22469)
  • Bump argminmax to 0.6.3 (#22649)
  • DSL version deserialization endianness (#22642)
  • Fix nested dtype row encoding (#22557)
  • Allow Expr.round() to be called on integer dtypes (#22622)
  • Fix panic when filtering based on row index column in parquet (#22616)
  • Fix WASM and Pyodide compilation (#22613)
  • Resolve get() SchemaMismatch panic (#22350)
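
As background for the gzip fix (#22685): zlib-based gzip encoders accept compression levels up to 9. The sketch below uses Python's standard gzip module, not Polars, to show that level 9 is accepted and level 10 is rejected. The helper try_gzip is hypothetical, and the exact exception type depends on the Python version, so both likely types are caught.

```python
import gzip
import zlib

payload = b"polars " * 1000


def try_gzip(level: int) -> bool:
    """Return True if the stdlib accepts this gzip compression level."""
    try:
        gzip.compress(payload, compresslevel=level)
        return True
    except (ValueError, zlib.error):
        # Out-of-range levels are rejected by the underlying zlib call.
        return False


print(try_gzip(9))
print(try_gzip(10))
```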

📖 Documentation

  • Add pre-release policy (#22808)
  • Fix broken link to service account page in Polars Cloud docs (#22762)
  • Rework documentation for drop/fill for nulls/nans (#22657)

📦 Build system

  • Patch pyo3 to disable recompilation (#22796)

🛠️ Other improvements

  • Update Polars Rust versions (#22834)
  • Cleanup polars-python lifetimes (#22548)
  • Fix nix build (#22809)
  • Fix flake.nix to work on macos (#22803)
  • Remove unused dependencies in polars-arrow (#22806)
  • Unused variables on release build (#22800)
  • Update cloud docs (#22624)
  • Add proptest implementations for all Array types (#22711)
  • Dispatch .write_* to .lazy().sink_*(engine='in-memory') (#22582)
  • Move all optimization flags to QueryOptFlags (#22680)
  • Add test for str.replace_many (#22615)
  • Stabilize sink_* (#22643)
  • Add proptest for row-encode (#22626)
  • Emphasize PolarsDataType::get_dtype is static-only (#22648)
  • Use named fields for Logical (#22647)
  • Update rust version in nix flake (#22627)
  • Add a nix flake with a devShell and package (#22246)
  • Use a wrapper struct to store time zone (#22523)
  • Add proptest testing for parquet decoding kernels (#22608)

Thank you to all our contributors for making this release possible!
@IvanIsCoding, @JakubValtar, @Julian-J-S, @LucioFranco, @MarcoGorelli, @WH-2099, @alexander-beedie, @borchero, @bschoenmaeckers, @cmdlineluser, @coastalwhite, @etiennebacher, @florian-klein, @itamarst, @kdn36, @nameexhaustion, @nikaltipar, @orlp, @pavelzw, @r-brink, @ritchie46, @stijnherfst, @teotwaki, @timkpaine and @wence-

Python Polars 1.30.0-beta.1

16 May 19:06
103f194
Pre-release

🚀 Performance improvements

  • Increase default cross-file parallelism limit for new-streaming multiscan (#22700)
  • Add elementwise execution mode for list.eval (#22715)
  • Support optimised init from non-dict Mapping objects in from_records and frame/series constructors (#22638)
  • Add streaming cross-join node (#22581)
  • Switch off maintain_order in group-by followed by sort (#22492)

✨ Enhancements

  • Support binaryoffset in search sorted (#22786)
  • Add nulls_equal flag to list/arr.contains (#22773)
  • Implement LazyFrame.match_to_schema (#22726)
  • Improved time-string parsing and inference (generally, and via the SQL interface) (#22606)
  • Allow for .over to be called without partition_by (#22712)
  • Support AnyValue translation from PyMapping values (#22722)
  • Support optimised init from non-dict Mapping objects in from_records and frame/series constructors (#22638)
  • Support inference of Int128 dtype from databases that support it (#22682)
  • Add options to write Parquet field metadata (#22652)
  • Add cast_options parameter to control type casting in scan_parquet (#22617)
  • Allow casting List<UInt8> to Binary (#22611)
  • Allow setting of regex size limit using POLARS_REGEX_SIZE_LIMIT (#22651)
  • Support use of literal values as "other" when evaluating Series.zip_with (#22632)
  • Allow reading and writing custom file-level Parquet metadata (#21806)
  • Support PEP702 @deprecated decorator behaviour (#22594)
  • Support grouping by pl.Array (#22575)
  • Preserve exception type and traceback for errors raised from Python (#22561)
  • Use fixed-width font in streaming phys plan graph (#22540)

🐞 Bug fixes

  • Respect BinaryOffset metadata (#22785)
  • Correct the output order of PartitionByKey and PartitionParted (#22778)
  • Fallback to non-strict casting for deprecated casts (#22760)
  • Clippy on new stable version (#22771)
  • Handle sliced out remainder for bitmaps (#22759)
  • Don't merge Enum categories on append (#22765)
  • Fix unnest() not working on empty struct columns (#22391)
  • Fix the default value type in Schema init (#22589)
  • Correct name in unnest error message (#22740)
  • Provide "schema" to DataFrame, even if empty JSON (#22739)
  • Properly account for nulls in the is_not_nan check made in drop_nans (#22707)
  • Incorrect result from SQL count(*) with partition by (#22728)
  • Fix deadlock joining scanned tables with low thread count (#22672)
  • Don't allow deserializing incompatible DSL (#22644)
  • Incorrect null dtype from binary ops in empty group_by (#22721)
  • Don't mark str.replace_many with Mapping as deprecated (#22697)
  • Gzip has maximum compression of 9, not 10 (#22685)
  • Fix predicate pushdown of fallible expressions (#22669)
  • Fix index out of bounds panic when scanning hugging face (#22661)
  • Panic on group_by with literal and empty rows (#22621)
  • Return input instead of panicking if empty subset in drop_nulls() and drop_nans() (#22469)
  • Bump argminmax to 0.6.3 (#22649)
  • DSL version deserialization endianness (#22642)
  • Allow Expr.round() to be called on integer dtypes (#22622)
  • Fix panic when filtering based on row index column in parquet (#22616)
  • Fix WASM and Pyodide compilation (#22613)
  • Resolve get() SchemaMismatch panic (#22350)
  • Panic in group_by_dynamic on single-row df with group_by (#22597)
  • Add new_streaming feature to polars crate (#22601)
  • Consistently use Unix epoch as origin for dt.truncate (except weekly buckets which start on Mondays) (#22592)
  • Fix interpolate on dtype Decimal (#22541)
  • CSV count rows skipped last line if file did not end with newline (#22577)
  • Make nested strict casting actually strict (#22497)
  • Make replace and replace_strict mapping use list literals (#22566)
  • Allow pivot on Time column (#22550)
  • Fix error when providing CSV schema with extra columns (#22544)
  • Panic on bitwise op between Series and Expr (#22527)
  • Multi-selector regex expansion (#22542)
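
The CSV row-count fix (#22577) concerns a classic edge case: counting newline bytes misses the last row when the file has no trailing newline. The sketch below is a simplified illustration in plain Python, not the Polars implementation. It assumes one row per line, no header handling, and no quoted fields containing newlines; count_rows is a hypothetical name.

```python
def count_rows(data: bytes) -> int:
    """Count newline-delimited rows, including a final unterminated row."""
    if not data:
        return 0
    rows = data.count(b"\n")
    # A final line without a trailing newline is still a row.
    if not data.endswith(b"\n"):
        rows += 1
    return rows


# Both inputs hold the same three lines; only the final newline differs.
print(count_rows(b"a\n1\n2\n"))
print(count_rows(b"a\n1\n2"))
```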

📖 Documentation

  • Fix broken link to service account page in Polars Cloud docs (#22762)
  • Add match_to_schema to API reference (#22777)
  • Provide additional explanation and examples for the value_counts "normalize" parameter (#22756)
  • Rework documentation for drop/fill for nulls/nans (#22657)
  • Add documentation to new RoundMode parameter in round (#22555)
  • Add missing repeat_by to API reference, fixup list.get (#22698)
  • Fix non-rendering bullet points in scan_iceberg (#22694)
  • Improve insert_column docstring (description and examples) (#22551)
  • Improve join documentation (#22556)

🛠️ Other improvements

  • Update cloud docs (#22624)
  • Fix unstable list.eval performance test (#22729)
  • Add proptest implementations for all Array types (#22711)
  • Dispatch .write_* to .lazy().sink_*(engine='in-memory') (#22582)
  • Move all optimization flags to QueryOptFlags (#22680)
  • Add test for str.replace_many (#22615)
  • Stabilize sink_* (#22643)
  • Add proptest for row-encode (#22626)
  • Update rust version in nix flake (#22627)
  • Add a nix flake with a devShell and package (#22246)
  • Use a wrapper struct to store time zone (#22523)
  • Add proptest testing for parquet decoding kernels (#22608)
  • Include equiprobable as valid quantile method (#22571)
  • Remove confusing error context calling .collect(_eager=True) (#22602)
  • Fix test_truncate_path test case (#22598)
  • Unify function flags into 1 bitset (#22573)
  • Display the operation behind in-memory-map (#22552)

Thank you to all our contributors for making this release possible!
@JakubValtar, @Julian-J-S, @MarcoGorelli, @WH-2099, @alexander-beedie, @borchero, @cmdlineluser, @coastalwhite, @etiennebacher, @florian-klein, @itamarst, @kdn36, @mcrumiller, @nameexhaustion, @nikaltipar, @orlp, @pavelzw, @r-brink, @ritchie46, @stijnherfst, @teotwaki, @timkpaine and @wence-

Rust Polars 0.47.1

05 May 13:13
ba3be4e

🏆 Highlights

  • Enable common subplan elimination across plans in collect_all (#21747)
  • Add lazy sinks (#21733)
  • Add PartitionByKey for new streaming sinks (#21689)
  • Enable new streaming memory sinks by default (#21589)

💥 Breaking changes

  • Make bottom interval closed in hist (#22090)

🚀 Performance improvements

  • Avoid alloc_zeroed in decompression (#22460)
  • Lower Expr.(n_)unique to group_by on streaming engine (#22420)
  • Chunk huge munmap calls (#22414)
  • Add single-key variants of streaming group_by (#22409)
  • Improve accumulate_dataframes_vertical performance (#22399)
  • Optimize rolling_quantile with varying window sizes (#22353)
  • Dedicated rolling_skew kernel (#22333)
  • Call large munmap's in background thread (#22329)
  • New streaming group_by implementation (#22285)
  • Patch jemalloc to not purge huge allocs eagerly if we have background threads (#22318)
  • Turn on parallel=prefiltered by default for new streaming (#22190)
  • Add CSE to streaming groupby (#22196)
  • Speed-up new streaming predicate filtering (#22179)
  • Speedup new-streaming file row count (#22169)
  • Fix quadratic behavior when casting Enums (#22008)
  • Lower is_in to bitmap-output semi-join in new streaming engine (#21948)
  • Fast path for empty inner join (#21965)
  • Add native semi/anti join in new streaming engine (#21937)
  • Cache regex compilation globally (#21929)
  • Use views for binary hash tables and add single-key binary variant (#21872)
  • Avoid rechunking in gather (#21876)
  • Switch ahash for foldhash (#21852)
  • Put THP behind feature flag (#21853)
  • Enable THP by default (#21829)
  • Improve join performance for expanding joins (#21821)
  • Use binary_search instead of contains in business-day functions (#21775)
  • Implement linear-time rolling_min/max (#21770)
  • Improve InputIndependentSelect by delegating to InMemorySourceNode (#21767)
  • Enable common subplan elimination across plans in collect_all (#21747)
  • Allow elementwise functions in recursive lowering (#21653)
  • Add primitive single-key hashtable to new-streaming join (#21712)
  • Remove unnecessary black_boxes in Kahan summation (#21679)
  • Box large enum variants (#21657)
  • Improve join performance for new-streaming engine (#21620)
  • Pre-fill caches (#21646)
  • Optimize only a single cache input (#21644)
  • Collect parquet statistics in one contiguous buffer (#21632)
  • Update Cargo.lock (mainly for zstd 1.5.7) (#21612)
  • Don't maintain order when maintain_order=False in new streaming sinks (#21586)
  • Pre-sort groups in group-by-dynamic (#21569)
  • Provide a fallback skip batch predicate for constant batches (#21477)
  • Parallelize the passing in new streaming multiscan (#21430)
  • Toggle projection pushdown for eager rolling (#21405)
  • Fix pathologic rolling + group-by performance and memory explosion (#21403)
  • Add sampling to new-streaming equi join to decide between build/probe side (#21197)
  • Reduce sharing in stringview arrays in new-streaming equijoin (#21129)
  • Implement native Expr.count() on new-streaming (#21126)
  • Speed up list operations that use amortized_iter() (#20964)
  • Use Cow as output for rechunk and add rechunk_mut (#21116)
  • Reduce arrow slice mmap overhead (#21113)
  • Reduce conversion cost in chunked string gather (#21112)
  • Enable prefiltered by default for new streaming (#21109)
  • Enable parquet column expressions for streaming (#21101)
  • Deduplicate buffers again in stringview concat kernel (#21098)
  • Add dedicated concatenate kernels (#21080)
  • Rechunk only once during join probe gather (#21072)
  • Speed up from_pandas when converting frame with multi-index columns (#21063)
  • Change default memory prefetch to MADV_WILLNEED (#21056)
  • Remove cast to boolean after comparison in optimizer (#21022)
  • Split last rowgroup among all threads in new-streaming parquet reader (#21027)
  • Recombine into larger morsels in new-streaming join (#21008)
  • Improve list.min and list.max performance for logical types (#20972)
  • Ensure count query select minimal columns (#20923)
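
For readers curious what "linear-time rolling_min/max" (#21770) can mean in practice, one well-known approach is a monotonic deque, which touches each element a constant number of times on average. The release notes do not say which algorithm Polars uses, so the sketch below is only an illustration of that general technique in plain Python; rolling_min is a hypothetical helper.

```python
from collections import deque


def rolling_min(values, window):
    """Minimum of every full window of size `window`, in O(n) total time."""
    if window <= 0:
        raise ValueError("window must be positive")
    out = []
    candidates = deque()  # indices whose values increase from front to back
    for i, v in enumerate(values):
        # Drop candidates that can never be the minimum again.
        while candidates and values[candidates[-1]] >= v:
            candidates.pop()
        candidates.append(i)
        # Drop the front candidate once it slides out of the window.
        if candidates[0] <= i - window:
            candidates.popleft()
        if i >= window - 1:
            out.append(values[candidates[0]])
    return out


print(rolling_min([4, 2, 5, 1, 3], 2))  # [2, 2, 1, 1]
```

Each index is pushed and popped at most once, which is where the linear bound comes from; a rolling maximum follows by flipping the comparison.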

✨ Enhancements

  • Support grouping by pl.Array (#22575)
  • Preserve exception type and traceback for errors raised from Python (#22561)
  • Use fixed-width font in streaming phys plan graph (#22540)
  • Highlight nodes in streaming phys plan graph (#22535)
  • Support BinaryOffset serde (#22528)
  • Show physical stage graph (#22491)
  • Add structure for dispatching iceberg to native scans (#22405)
  • Add SQL support for checking array values with IN and NOT IN expressions (#22487)
  • Add more IRBuilder utils (#22482)
  • Support DataFrame and Series init from torch Tensor objects (#22177)
  • Add RoundMode for Decimal and Float (#22248)
  • Inform users that IO error path file name can be expanded with POLARS_VERBOSE=1 (#22427)
  • Make streaming dispatch public (#22347)
  • Add rolling_kurtosis (#22335)
  • Support Cast in IO plugin predicates (#22317)
  • Add .sort(nulls_last=True) to booleans, categoricals and enums (#22300)
  • Add rolling min/max for temporals (#22271)
  • Support literal:list agg (#22249)
  • Support implode + agg (#22230)
  • Dispatch scans to new-streaming by default (#22153)
  • Improved expression autocomplete for IPython, Jupyter, and Marimo (#22221)
  • Expose FunctionIR::FastCount in the python visitor (#22195)
  • Add SPLIT_PART string function to the SQL interface (#22158)
  • Allow scalar expr in Expr.diff (#22142)
  • Support additional unsigned int aliases in the SQL interface (#22127)
  • Add STRING_TO_ARRAY function to the SQL interface (#22129)
  • Add dt.is_business_day (#21776)
  • Add support for Int128 parsing/recognition to the SQL interface (#22104)
  • Allow sinking to abstract python io and fs classes (#21987)
  • Add add_alp_optimize_exprs to IRBuilder (#22061)
  • Add cat.slice (#21971)
  • Support growing schema if line length increases during CSV schema inference (#21979)
  • Replace thread unsafe GilOnceCell with Mutex (#21927)
  • Support modified dsl in file cache (#21907)
  • Add support for io-plugins in new-streaming (#21870)
  • Add PartitionParted (#21788)
  • Add DoubleEndedIterator for CatIter (#21816)
  • Minor improvements to EXPLAIN plan output (#21822)
  • Add polars_testing folder with relevant files and add_series_equal!() functionality (#21722)
  • Allow using repeat_by with (nested) lists and structs (#21206)
  • Add support for rolling_(sum/min/max) for booleans through casting (#21748)
  • Support multi-column sort for all nested types and nested search-sorted (#21743)
  • Add lazy sinks (#21733)
  • Add PartitionByKey for new streaming sinks (#21689)
  • Fix replace flags (#21731)
  • Add mkdir flag to sinks (#21717)
  • Enable joins on list/array dtypes (#21687)
  • Add a config option to specify the default engine to attempt to use during lazyframe calls (#20717)
  • Support all elementwise functions in IO plugin predicates (#21705)
  • Stabilize Enum datatype (#21686)
  • Support Polars int128 in from arrow (#21688)
  • Use FFI to read dataframe instead of transmute (#21673)
  • Enable new streaming memory sinks by default (#21589)
  • Cloud support for new-streaming scans and sinks (#21621)
  • Add len method to arr (#21618)
  • Closeable files on unix (#21588)
  • Add new PartitionMaxSize sink (#21573)
  • Implement unpack_dtypes() functionality with unit tests (#21574)
  • Support engine callback for LazyFrame.profile (#21534)
  • Dispatch new-streaming CSV negative slice to separate node (#21579)
  • Add NDJSON source to new streaming engine (#21562)
  • Add lossy decoding to read_csv for non-utf8 encodings (#21433)
  • Add 'nulls_equal' parameter to is_in (#21426)
  • Improve numeric stability rolling_{std, var, cov, corr} (#21528)
  • IR Serde cross-filter (#21488)
  • Support writing Time type in json (#21454)
  • Activate all optimizations in sinks (#21462)
  • Add AssertionError variant to PolarsError in polars-error (#21460)
  • Pass filter to inner readers in multiscan new streaming (#21436)
  • Implement i128 -> str cast (#21411)
  • Version DSL (#21383)
  • Make user facing binary formats mostly self describing (#21380)
  • Filter hive files using predicates in new streaming (#21372)
  • Add negative slicing to new streaming multiscan (#21219)
  • Pub-licize Expr DSL Function enums (#20421)
  • Implement sorted flags for struct series (#21290)
  • Support reading arrow Map type from Delta (#21330)
  • Add a dedicated remove method for DataFrame and LazyFrame (#21259)
  • Expose include_file_paths to python visitor (#21279)
  • Implement merge_sorted for struct (#21205)
  • Add positive slice for new streaming MultiScan (#21191)
  • Don't take in rewriting visitor (#21212)
  • Add SQL support for the DELETE statement (#21190)
  • Add row index to new streaming multiscan (#21169)
  • Improve DataFrame fmt in explain (#21158)
  • Add projection pushdown to new streaming multiscan (#21139)
  • Implement join on struct dtype (#21093)
  • Use unique temporary directory path per user and restrict permissions (#21125)
  • Enable new streaming multiscan for CSV (#21124)
  • Environment POLARS_MAX_CONCURRENT_SCANS in multiscan for new streaming (#21127)
  • Multi/Hive scans in new streaming engine (#21011)
  • Add linear_spaces (#20941)
  • Implement merge_sorted for binary (#21045)
  • Hold string cache in new streaming engine and fix row-encoding (#21039)
  • Support max/min method for Time dtype (#19815)
  • Implement a streaming merge sorted node (#20960)
  • Automatically use temporary credentials API for scanning Unity catalog tables (#21020)
  • Add negative slice support to new-streaming engine (#21001)
  • Allow for more RG skipping by rewriting expr in planner (#20828)
  • Rename catalog schema to namespace (#20993)
  • Add functionality to create and delete catalogs, tables and schemas to Unity catalog client (#20956)
  • Improved support for KeyboardInterrupts (#20961...

Python Polars 1.29.0

30 Apr 20:57
a0e3e38

🚀 Performance improvements

  • Avoid alloc_zeroed in decompression (#22460)

✨ Enhancements

  • Highlight nodes in streaming phys plan graph (#22535)
  • Show physical stage graph (#22491)
  • Add structure for dispatching iceberg to native scans (#22405)
  • Add SQL support for checking array values with IN and NOT IN expressions (#22487)
  • Support DataFrame and Series init from torch Tensor objects (#22177)
  • Add RoundMode for Decimal and Float (#22248)
  • Inform users that IO error path file name can be expanded with POLARS_VERBOSE=1 (#22427)

🐞 Bug fixes

  • Streaming outer join coalesce bug (#22530)
  • Remove redundant print statement in assert_frame_schema_equal() (#22529)
  • Bug in .unique() followed by .slice() (#22471)
  • Fix error reading parquet with datetimes written by pandas (#22524)
  • Fix schema_overrides not taking effect in NDJSON (#22521)
  • Fold flags and verify scalar correctness in apply (#22519)
  • Invalid values were triggering panics instead of returning null in dt.to_date / dt.to_datetime (#22500)
  • Ensure numpy isinstance check is lazy (avoid forcing the dependency) (#22486)
  • Incorrectly dropped sort after unique for some queries (#22489)
  • Fix incorrect ternary agg state with mixed columns and scalars (#22496)
  • Make replace and replace_strict properly elementwise (#22465)
  • Fix index out of bounds panic on parquet prefiltering (#22458)
  • Integer underflow when checking parquet UTF-8 (#22472)
  • Add implementation for array.get with idx overflow (#22449)
  • Deprecate str. collection functions with flat strings and mark as elementwise (#22461)
  • Deprecate flat list.gather and mark as elementwise (#22456)
  • Inform users that IO error path file name can be expanded with POLARS_VERBOSE=1 (#22427)

📖 Documentation

  • Fix typo in structs page (#22504)

🛠️ Other improvements

  • Don't store name/dtype in grouper (#22525)
  • Add structure for dispatching iceberg to native scans (#22405)
  • Remove unused reduction code (#22462)
  • Pin to explicit macOS version in code coverage (#22432)

Thank you to all our contributors for making this release possible!
@AH-Merii, @JakubValtar, @Julian-J-S, @Kevin-Patyk, @Liyixin95, @MarcoGorelli, @Matt711, @alexander-beedie, @brianmakesthings, @coastalwhite, @nameexhaustion, @orlp and @ritchie46

Python Polars 1.28.1

27 Apr 15:33
506319e

🐞 Bug fixes

  • Reading of reencoded categorical in Parquet (#22436)
  • Last thread in parquet predicate filter oob (#22429)

📖 Documentation

  • Fix a few typos in the new "multiplexing" page (#22434)
  • Add multiplexing page (#22426)

📦 Build system

  • Update pyo3 and numpy crates to version 0.24 (#22015)

🛠️ Other improvements

  • Add test for implode + over (#22437)
  • Fix CI by removing use_legacy_dataset (#22438)
  • Only use pytorch index-url for pytorch package (#22355)

Thank you to all our contributors for making this release possible!
@bschoenmaeckers, @coastalwhite, @etiennebacher, @mcrumiller and @ritchie46

Python Polars 1.28.0

26 Apr 09:02
8d30e79

🚀 Performance improvements

  • Lower Expr.(n_)unique to group_by on streaming engine (#22420)
  • Chunk huge munmap calls (#22414)
  • Add single-key variants of streaming group_by (#22409)
  • Improve accumulate_dataframes_vertical performance (#22399)
  • Optimize rolling_quantile with varying window sizes (#22353)
  • Dedicated rolling_skew kernel (#22333)
  • Call large munmap's in background thread (#22329)
  • New streaming group_by implementation (#22285)
  • Patch jemalloc to not purge huge allocs eagerly if we have background threads (#22318)
  • Turn on parallel=prefiltered by default for new streaming (#22190)

✨ Enhancements

  • When reporting unexpected types in errors, module-qualify the typename (#22390)
  • Add Series backward_fill / forward_fill (#22360)
  • Add GPU support to sink_* APIs (#20940)
  • Changed mapping type from dict to Mapping (#19400) (#19436)
  • Make streaming dispatch public (#22347)
  • Add rolling_kurtosis (#22335)
  • Support Cast in IO plugin predicates (#22317)
  • Add .sort(nulls_last=True) to booleans, categoricals and enums (#22300)
  • Add rolling min/max for temporals (#22271)
  • Support literal:list agg (#22249)
  • Support running Polars SQL queries against any objects implementing the PyCapsule interface (#22235)
  • Support implode + agg (#22230)
  • Dispatch scans to new-streaming by default (#22153)

🐞 Bug fixes

  • Ensure write_excel correctly preserves null values in nested dtype data on export (#22379)
  • Panic when visualizing streaming physical plan with joins (#22404)
  • Fix incorrect filter after LazyFrame.rename().select() (#22380)
  • Fix select(len()) performance regression (#22363)
  • Handle pytz named timezone in lit (#21785)
  • Don't leak state during prefill CSE cache (#22341)
  • Maintain float32 type in partitioned group-by (#22340)
  • Resolve streaming panic on multiple merge_sorted (#22205)
  • Fix ndjson nested types (#22325)
  • Fix nested datetypes in ndjson (#22321)
  • Check matching lengths for pl.corr (#22305)
  • Move type coercion for pl.duration to planner (#22304)
  • Check dtype to avoid panic with mixed types in min/max_horizontal (#21857)
  • Coalesce correct column for new streaming full join (#22301)
  • Don't collect NaN from Parquet Statistics (#22294)
  • Set revmap for empty AnyValue to Series (#22293)
  • Add an __all__ entry to internal type definition module (#22254)
  • Datetime parser was incorrectly parsing 8-digit fractional seconds when format specified to expect 9 (#22180)
  • More robust str → date conversion when reading from spreadsheet (#22276)
  • Deprecate using is_in with 2 equal types and mark as elementwise (#22178)
  • Duplicate key column name in streaming group_by due to CSE (#22280)
  • Raise ColumnNotFoundError for missing columns in join_where (#22268)
  • Parquet filters for logical types and operations (#22253)
  • Ensure floating-point accuracy in hist (#22245)
  • Check matching key datatypes for new streaming joins (#22247)
  • Incorrect length BinaryArray/ListBuilder (#22227)

📖 Documentation

  • Update docs for schema arg in scan_csv to match read_csv (#22357)
  • Update pl.when documentation (#22345)
  • Add missing is_business_day to documentation reference (#22338)
  • Improve interpolation documentation to clarify behavior of null values (#22274)

🛠️ Other improvements

  • Install pytorch for 3.13 on Windows (#22356)
  • Make interpolate fix more robust (#22421)
  • Fix interpolate test (#22417)
  • Reduce hot table size in debug mode (#22400)
  • Replace intrinsic with non-intrinsic (#22401)
  • Make streaming dispatch public (#22347)
  • Update rustc to 'nightly-2025-04-19' (#22342)
  • Update mozilla-actions/sccache-action (#22319)
  • Purge old parquet and scan code (#22226)
  • Add an __all__ entry to internal type definition module (#22254)
  • Add online skew/kurtosis algorithm for future use in rolling kernels (#22261)
  • Add Polars Cloud 0.0.7 release notes (#22223)
  • Change format name from list to implode (#22240)
  • Make other parallel parquet modes filter afterwards (#22228)
  • Close async reader issues (#22224)
  • Add BinaryArrayBuilder (#22225)

Thank you to all our contributors for making this release possible!
@DavideCanton, @JakubValtar, @Jesse-Bakker, @MarcoGorelli, @NeejWeej, @Shoeboxam, @adamreeve, @alexander-beedie, @axellpadilla, @cmdlineluser, @coastalwhite, @d-reynol, @dongchao-1, @florian-klein, @kdn36, @math-hiyoko, @mcrumiller, @mroeschke, @nameexhaustion, @orlp, @ritchie46, @stijnherfst and @yiteng-guo

Python Polars 1.27.1

11 Apr 10:26
319a9a8

✨ Enhancements

  • Improved expression autocomplete for IPython, Jupyter, and Marimo (#22221)

🐞 Bug fixes

  • Incorrect condition on empty inner join fast path (#22208)
  • Fallback predicate filter for min=max with is_in (#22213)
  • Don't panic for LruCachedFunc for size=0 (#22215)
  • Writing masked out list values to json (#22210)
  • Deadlock in streaming distributor (#22207)

Thank you to all our contributors for making this release possible!
@Matt711, @alexander-beedie, @coastalwhite, @dependabot[bot], @orlp and @ritchie46

Python Polars 1.27.0

09 Apr 17:27
075fe61

💥 Breaking changes

  • Make bottom interval closed in hist (#22090)
  • Change Partition API to base_path and file_path (#21888)
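
As a rough illustration of what a closed bottom interval means for binning, here is a minimal pure-Python sketch. It assumes bins are right-closed, (a, b], with only the lowest bin also including its lower edge, [a, b]. The helper `bin_index` is hypothetical and is not part of the Polars API; see #22090 for the actual behaviour.

```python
from bisect import bisect_left


def bin_index(value, edges):
    """Return the index of the bin containing value, or None if out of range.

    Assumption: bins are right-closed, (edges[i], edges[i + 1]], except the
    lowest bin, which is closed on both sides: [edges[0], edges[1]].
    """
    if value < edges[0] or value > edges[-1]:
        return None
    if value == edges[0]:
        # The closed bottom interval is what lets the minimum edge be counted.
        return 0
    # For right-closed bins, a value equal to an edge belongs to the bin below.
    return bisect_left(edges, value) - 1


edges = [0, 1, 2, 3]
counts = [0] * (len(edges) - 1)
for v in [0, 0.5, 1, 2.5, 3]:
    idx = bin_index(v, edges)
    if idx is not None:
        counts[idx] += 1

print(counts)
```

With an open bottom interval, the value 0 would fall outside every bin; under the assumption above it is counted in the first bin.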

🚀 Performance improvements

  • Add CSE to streaming groupby (#22196)
  • Speed up new-streaming predicate filtering (#22179)
  • Speed up new-streaming file row count (#22169)
  • Fix quadratic behavior when casting Enums (#22008)
  • Lower is_in to bitmap-output semi-join in new streaming engine (#21948)
  • Fast path for empty inner join (#21965)
  • Add native semi/anti join in new streaming engine (#21937)
  • Cache regex compilation globally (#21929)

✨ Enhancements

  • Add SPLIT_PART string function to the SQL interface (#22158)
  • Allow scalar expr in Expr.diff (#22142)
  • Support additional unsigned int aliases in the SQL interface (#22127)
  • Add STRING_TO_ARRAY function to the SQL interface (#22129)
  • Add dt.is_business_day (#21776)
  • Add an eager parameter to pl.cov (#22098)
  • Add support for Int128 parsing/recognition to the SQL interface (#22104)
  • Add an eager parameter to pl.coalesce (#22092)
  • Add an eager parameter to pl.corr (#22097)
  • Allow sinking to abstract python io and fs classes (#21987)
  • Add add_alp_optimize_exprs to IRBuilder (#22061)
  • Add cat.slice (#21971)
  • Support growing schema if line length increases during csv schema inference (#21979)
  • Replace thread unsafe GilOnceCell with Mutex (#21927)
  • Support modified dsl in file cache (#21907)

🐞 Bug fixes

  • Implode in agg (#22197)
  • Reduce GIL hold time for IO plugins in new-streaming (#22186)
  • Enhance predicate validation and cast safety in join_where (#22112)
  • Handle Parquet with compressed empty DataPage v2 (#22172)
  • Schema error during lowering (#22175)
  • Rewrite unroll of overlapping groups to mitigate out of range index panic (#22072)
  • Incorrect rounding for very large/small numbers (#22173)
  • Allow set input to list.set_* operations (#22163)
  • Deadlock in join due to rayon nested task-stealing (#22159)
  • Mark Expr.repeat_by as elementwise (#22068)
  • Fix csv serializer panic by supporting ScalarColumn in as_single_chunk (#22146)
  • Raise an error if a number doesn't have associated unit in duration strings (#22035)
  • Add i128 as supertype to boolean (#22138)
  • Fix panic when constructing DF from pyarrow due to duplicate field names (#22114)
  • Add broadcasts and error messages for many elementwise operations (#22130)
  • Throw error for n=0 on list.gather_every (#22122)
  • Throw error for unsupported rolling operations (#22121)
  • Error on unequal length str.to_integer arguments (#22100)
  • Make bottom interval closed in hist (#22090)
  • Relative path resolution for plugin libraries (#21911)
  • Avoid panic with strptime for out-of-bounds dates (#21208)
  • Join revmaps for categoricals in merge_sorted (#21976)
  • Fix glob expansion matching extra files (#21991)
  • Ensure SQL dot-notation for nested column fields resolves correctly (#22109)
  • Parquet filter performance regression from multiscan dispatch (#22116)
  • Panic for unequal length ewm_mean_by args (#22093)
  • Add scalarity checks to pl.repeat (#22088)
  • Type check n parameter of pl.repeat (#22071)
  • Mark bitwise_{count,leading,trailing}_{ones,zeros} as elementwise (#22044)
  • Mark pl.*_ranges functions correctly as elementwise (#22059)
  • Correctly type check pl.arctan2 (#22060)
  • Mark pl.business_day_count as elementwise (#22055)
  • Check input python type for str.extract_groups (#22032)
  • Check types for fill_char in str.pad_{start,end} (#22036)
  • Mark str.to_decimal properly as non-elementwise (#22040)
  • Documented return type for bin.encode and bin.decode (#22022)
  • Revert #22017 and improve block(_in_place)_on doc comment (#22031)
  • Remove outdated depth warning (#22030)
  • Expression pl.concat was incorrectly marked as elementwise (#22019)
  • Use block_in_place_on to start streaming (#22017)
  • Panic on empty aggregation in streaming (#22016)
  • Error instead of panicking for invalid durations in dt.offset_by() and dt.round() (#21982)
  • Raise error instead of silently appending NULL in NDJSON parsing (#21953)
  • Ensure AV is static before pushing to row buffer (#21967)
  • Deadlock in new-streaming multiplexer (#21963)
  • Release GIL in collect_with_callback (#21941)
  • Panic in new RegexCache (#21935)
  • Type hint of cs.exclude() is SelectorType instead of Expr (#21892)
  • Add correct deprecation warning for .str.concat (#21666)
  • Use absolute paths by default for plugins (#21904)

📖 Documentation

  • Add user guide section on working with Sheets in Colab (#22161)
  • Update distributed engine docs (#22128)
  • Add Polars Cloud release notes (#22021)
  • Remove trailing space in settings POLARS_CLOUD_CLIENT_ID (#21995)
  • Fix typo (#21954)
  • Fix 'pickleable' typo in docs (#21938)
  • Change ctx to compute=ctx for all remote query examples (#21930)

🛠️ Other improvements

  • Remove old MultiScanExec for in-memory (#22184)
  • Separate FunctionOptions from DSL calls (#22133)
  • Undeprecate backward_fill and forward_fill (#22156)
  • Handle conversion of Duration specially in pyir (#22101)
  • Deprecate duplicate backward_fill and forward_fill interface (#22083)
  • Solve clippy lints for 1.86 (#22102)
  • Remove rust exclusive MaxBound and MinBound fill strategies (#22063)
  • Change Partition API to base_path and file_path (#21888)
  • Fix pydantic model_fields deprecation (#21958)

Thank you to all our contributors for making this release possible!
@DeflateAwning, @EnricoMi, @Jacob640, @JakubValtar, @MarcoGorelli, @MaxJackson, @alexander-beedie, @amotzop, @anath2, @bschoenmaeckers, @cnpryer, @coastalwhite, @dependabot[bot], @eitsupi, @etiennebacher, @hemanth94, @kdn36, @mcrumiller, @nameexhaustion, @orlp, @r-brink, @rgertenbach, @ritchie46, @sebasv, @silannisik, @stijnherfst, @wence- and @zachlefevre