Releases: pola-rs/polars
Rust Polars 0.48.1
🚀 Performance improvements
- Switch eligible casts to non-strict in optimizer (#22850)
🐞 Bug fixes
- Fix RuntimeError when serializing the same DataFrame from multiple threads (#22844)
📦 Build system
🛠️ Other improvements
- Update Rust Polars versions (#22854)
Thank you to all our contributors for making this release possible!
@JakubValtar, @bschoenmaeckers, @nameexhaustion and @stijnherfst
Python Polars 1.30.0
🚀 Performance improvements
- Switch eligible casts to non-strict in optimizer (#22850)
- Allow predicate passing set_sorted (#22797)
- Increase default cross-file parallelism limit for new-streaming multiscan (#22700)
- Add elementwise execution mode for list.eval (#22715)
- Support optimised init from non-dict Mapping objects in from_records and frame/series constructors (#22638)
- Add streaming cross-join node (#22581)
- Switch off maintain_order in group-by followed by sort (#22492)
✨ Enhancements
- Load AWS endpoint_url using boto3 (#22851)
- Implemented list.filter (#22749)
- Support binaryoffset in search sorted (#22786)
- Add nulls_equal flag to list/arr.contains (#22773)
- Implement LazyFrame.match_to_schema (#22726)
- Improved time-string parsing and inference (generally, and via the SQL interface) (#22606)
- Allow for .over to be called without partition_by (#22712)
- Support AnyValue translation from PyMapping values (#22722)
- Support optimised init from non-dict Mapping objects in from_records and frame/series constructors (#22638)
- Support inference of Int128 dtype from databases that support it (#22682)
- Add options to write Parquet field metadata (#22652)
- Add cast_options parameter to control type casting in scan_parquet (#22617)
- Allow casting List<UInt8> to Binary (#22611)
- Allow setting of regex size limit using POLARS_REGEX_SIZE_LIMIT (#22651)
- Support use of literal values as "other" when evaluating Series.zip_with (#22632)
- Allow reading and writing custom file-level parquet metadata (#21806)
- Support PEP 702 @deprecated decorator behaviour (#22594)
- Support grouping by pl.Array (#22575)
- Preserve exception type and traceback for errors raised from Python (#22561)
- Use fixed-width font in streaming phys plan graph (#22540)
🐞 Bug fixes
- Fix RuntimeError when serializing the same DataFrame from multiple threads (#22844)
- Fix map_elements predicate pushdown (#22833)
- Fix reverse list type (#22832)
- Don't require numpy for search_sorted (#22817)
- Add type equality checking for relevant methods (#22802)
- Invalid output for fill_null after when.then on structs (#22798)
- Don't panic for cross join with misaligned chunking (#22799)
- Panic on quantile over nulls in rolling window (#22792)
- Respect BinaryOffset metadata (#22785)
- Correct the output order of PartitionByKey and PartitionParted (#22778)
- Fallback to non-strict casting for deprecated casts (#22760)
- Clippy on new stable version (#22771)
- Handle sliced out remainder for bitmaps (#22759)
- Don't merge Enum categories on append (#22765)
- Fix unnest() not working on empty struct columns (#22391)
- Fix the default value type in Schema init (#22589)
- Correct name in unnest error message (#22740)
- Provide "schema" to DataFrame, even if empty JSON (#22739)
- Properly account for nulls in the is_not_nan check made in drop_nans (#22707)
- Incorrect result from SQL count(*) with partition by (#22728)
- Fix deadlock joining scanned tables with low thread count (#22672)
- Don't allow deserializing incompatible DSL (#22644)
- Incorrect null dtype from binary ops in empty group_by (#22721)
- Don't mark str.replace_many with Mapping as deprecated (#22697)
- Gzip has maximum compression of 9, not 10 (#22685)
- Fix predicate pushdown of fallible expressions (#22669)
- Fix index out of bounds panic when scanning Hugging Face (#22661)
- Panic on group_by with literal and empty rows (#22621)
- Return input instead of panicking if empty subset in drop_nulls() and drop_nans() (#22469)
- Bump argminmax to 0.6.3 (#22649)
- DSL version deserialization endianness (#22642)
- Allow Expr.round() to be called on integer dtypes (#22622)
- Fix panic when filtering based on row index column in parquet (#22616)
- WASM and PyOdide compile (#22613)
- Resolve get() SchemaMismatch panic (#22350)
- Panic in group_by_dynamic on single-row df with group_by (#22597)
- Add new_streaming feature to polars crate (#22601)
- Consistently use Unix epoch as origin for dt.truncate (except weekly buckets which start on Mondays) (#22592)
- Fix interpolate on dtype Decimal (#22541)
- CSV count rows skipped last line if file did not end with newline (#22577)
- Make nested strict casting actually strict (#22497)
- Make replace and replace_strict mapping use list literals (#22566)
- Allow pivot on Time column (#22550)
- Fix error when providing CSV schema with extra columns (#22544)
- Panic on bitwise op between Series and Expr (#22527)
- Multi-selector regex expansion (#22542)
📖 Documentation
- Add pre-release policy (#22808)
- Fix broken link to service account page in Polars Cloud docs (#22762)
- Add match_to_schema to API reference (#22777)
- Provide additional explanation and examples for the value_counts "normalize" parameter (#22756)
- Rework documentation for drop/fill for nulls/nans (#22657)
- Add documentation to new RoundMode parameter in round (#22555)
- Add missing repeat_by to API reference, fixup list.get (#22698)
- Fix non-rendering bullet points in scan_iceberg (#22694)
- Improve insert_column docstring (description and examples) (#22551)
- Improve join documentation (#22556)
📦 Build system
- Fix building polars-lazy with certain features (#22846)
- Add missing features (#22839)
- Patch pyo3 to disable recompilation (#22796)
🛠️ Other improvements
- Update Rust Polars versions (#22854)
- Add basic smoke test for free-threaded python (#22481)
- Update Polars Rust versions (#22834)
- Fix nix build (#22809)
- Fix flake.nix to work on macOS (#22803)
- Unused variables on release build (#22800)
- Update cloud docs (#22624)
- Fix unstable list.eval performance test (#22729)
- Add proptest implementations for all Array types (#22711)
- Dispatch .write_* to .lazy().sink_*(engine='in-memory') (#22582)
- Move all optimization flags to QueryOptFlags (#22680)
- Add test for str.replace_many (#22615)
- Stabilize sink_* (#22643)
- Add proptest for row-encode (#22626)
- Update rust version in nix flake (#22627)
- Add a nix flake with a devShell and package (#22246)
- Use a wrapper struct to store time zone (#22523)
- Add proptest testing for parquet decoding kernels (#22608)
- Include equiprobable as valid quantile method (#22571)
- Remove confusing error context calling .collect(_eager=True) (#22602)
- Fix test_truncate_path test case (#22598)
- Unify function flags into 1 bitset (#22573)
- Display the operation behind in-memory-map (#22552)
Thank you to all our contributors for making this release possible!
@IvanIsCoding, @JakubValtar, @Julian-J-S, @LucioFranco, @MarcoGorelli, @WH-2099, @alexander-beedie, @borchero, @bschoenmaeckers, @cmdlineluser, @coastalwhite, @etiennebacher, @florian-klein, @itamarst, @kdn36, @mcrumiller, @nameexhaustion, @nikaltipar, @orlp, @pavelzw, @r-brink, @ritchie46, @stijnherfst, @teotwaki, @timkpaine and @wence-
Rust Polars 0.48.0
💥 Breaking changes
- Use a wrapper struct to store time zone (#22523)
🚀 Performance improvements
- Allow predicate passing set_sorted (#22797)
- Increase default cross-file parallelism limit for new-streaming multiscan (#22700)
- Add elementwise execution mode for list.eval (#22715)
- Support optimised init from non-dict Mapping objects in from_records and frame/series constructors (#22638)
- Add streaming cross-join node (#22581)
- Switch off maintain_order in group-by followed by sort (#22492)
✨ Enhancements
- Format named functions (#22831)
- Implemented list.filter (#22749)
- Support binaryoffset in search sorted (#22786)
- Add nulls_equal flag to list/arr.contains (#22773)
- Allow named opaque functions for serde (#22734)
- Implement LazyFrame.match_to_schema (#22726)
- Improved time-string parsing and inference (generally, and via the SQL interface) (#22606)
- Allow for .over to be called without partition_by (#22712)
- Support AnyValue translation from PyMapping values (#22722)
- Support optimised init from non-dict Mapping objects in from_records and frame/series constructors (#22638)
- Add options to write Parquet field metadata (#22652)
- Allow casting List<UInt8> to Binary (#22611)
- Allow setting of regex size limit using POLARS_REGEX_SIZE_LIMIT (#22651)
🐞 Bug fixes
- Fix reverse list type (#22832)
- Add type equality checking for relevant methods (#22802)
- Invalid output for fill_null after when.then on structs (#22798)
- Don't panic for cross join with misaligned chunking (#22799)
- Panic on quantile over nulls in rolling window (#22792)
- Respect BinaryOffset metadata (#22785)
- Correct the output order of PartitionByKey and PartitionParted (#22778)
- Fallback to non-strict casting for deprecated casts (#22760)
- Clippy on new stable version (#22771)
- Handle sliced out remainder for bitmaps (#22759)
- Don't merge Enum categories on append (#22765)
- Fix unnest() not working on empty struct columns (#22391)
- Correct name in unnest error message (#22740)
- Properly account for nulls in the is_not_nan check made in drop_nans (#22707)
- Incorrect result from SQL count(*) with partition by (#22728)
- Fix deadlock joining scanned tables with low thread count (#22672)
- Don't allow deserializing incompatible DSL (#22644)
- Incorrect null dtype from binary ops in empty group_by (#22721)
- Don't mark str.replace_many with Mapping as deprecated (#22697)
- Gzip has maximum compression of 9, not 10 (#22685)
- Fix predicate pushdown of fallible expressions (#22669)
- Fix index out of bounds panic when scanning Hugging Face (#22661)
- Fix polars crate not compiling when lazy feature enabled (#22655)
- Panic on group_by with literal and empty rows (#22621)
- Return input instead of panicking if empty subset in drop_nulls() and drop_nans() (#22469)
- Bump argminmax to 0.6.3 (#22649)
- DSL version deserialization endianness (#22642)
- Fix nested dtype row encoding (#22557)
- Allow Expr.round() to be called on integer dtypes (#22622)
- Fix panic when filtering based on row index column in parquet (#22616)
- WASM and PyOdide compile (#22613)
- Resolve get() SchemaMismatch panic (#22350)
📖 Documentation
- Add pre-release policy (#22808)
- Fix broken link to service account page in Polars Cloud docs (#22762)
- Rework documentation for drop/fill for nulls/nans (#22657)
📦 Build system
- Patch pyo3 to disable recompilation (#22796)
🛠️ Other improvements
- Update Polars Rust versions (#22834)
- Cleanup polars-python lifetimes (#22548)
- Fix nix build (#22809)
- Fix flake.nix to work on macOS (#22803)
- Remove unused dependencies in polars-arrow (#22806)
- Unused variables on release build (#22800)
- Update cloud docs (#22624)
- Add proptest implementations for all Array types (#22711)
- Dispatch .write_* to .lazy().sink_*(engine='in-memory') (#22582)
- Move all optimization flags to QueryOptFlags (#22680)
- Add test for str.replace_many (#22615)
- Stabilize sink_* (#22643)
- Add proptest for row-encode (#22626)
- Emphasize PolarsDataType::get_dtype is static-only (#22648)
- Use named fields for Logical (#22647)
- Update rust version in nix flake (#22627)
- Add a nix flake with a devShell and package (#22246)
- Use a wrapper struct to store time zone (#22523)
- Add proptest testing for parquet decoding kernels (#22608)
Thank you to all our contributors for making this release possible!
@IvanIsCoding, @JakubValtar, @Julian-J-S, @LucioFranco, @MarcoGorelli, @WH-2099, @alexander-beedie, @borchero, @bschoenmaeckers, @cmdlineluser, @coastalwhite, @etiennebacher, @florian-klein, @itamarst, @kdn36, @nameexhaustion, @nikaltipar, @orlp, @pavelzw, @r-brink, @ritchie46, @stijnherfst, @teotwaki, @timkpaine and @wence-
Python Polars 1.30.0-beta.1
🚀 Performance improvements
- Increase default cross-file parallelism limit for new-streaming multiscan (#22700)
- Add elementwise execution mode for list.eval (#22715)
- Support optimised init from non-dict Mapping objects in from_records and frame/series constructors (#22638)
- Add streaming cross-join node (#22581)
- Switch off maintain_order in group-by followed by sort (#22492)
✨ Enhancements
- Support binaryoffset in search sorted (#22786)
- Add nulls_equal flag to list/arr.contains (#22773)
- Implement LazyFrame.match_to_schema (#22726)
- Improved time-string parsing and inference (generally, and via the SQL interface) (#22606)
- Allow for .over to be called without partition_by (#22712)
- Support AnyValue translation from PyMapping values (#22722)
- Support optimised init from non-dict Mapping objects in from_records and frame/series constructors (#22638)
- Support inference of Int128 dtype from databases that support it (#22682)
- Add options to write Parquet field metadata (#22652)
- Add cast_options parameter to control type casting in scan_parquet (#22617)
- Allow casting List<UInt8> to Binary (#22611)
- Allow setting of regex size limit using POLARS_REGEX_SIZE_LIMIT (#22651)
- Support use of literal values as "other" when evaluating Series.zip_with (#22632)
- Allow reading and writing custom file-level parquet metadata (#21806)
- Support PEP 702 @deprecated decorator behaviour (#22594)
- Support grouping by pl.Array (#22575)
- Preserve exception type and traceback for errors raised from Python (#22561)
- Use fixed-width font in streaming phys plan graph (#22540)
🐞 Bug fixes
- Respect BinaryOffset metadata (#22785)
- Correct the output order of PartitionByKey and PartitionParted (#22778)
- Fallback to non-strict casting for deprecated casts (#22760)
- Clippy on new stable version (#22771)
- Handle sliced out remainder for bitmaps (#22759)
- Don't merge Enum categories on append (#22765)
- Fix unnest() not working on empty struct columns (#22391)
- Fix the default value type in Schema init (#22589)
- Correct name in unnest error message (#22740)
- Provide "schema" to DataFrame, even if empty JSON (#22739)
- Properly account for nulls in the is_not_nan check made in drop_nans (#22707)
- Incorrect result from SQL count(*) with partition by (#22728)
- Fix deadlock joining scanned tables with low thread count (#22672)
- Don't allow deserializing incompatible DSL (#22644)
- Incorrect null dtype from binary ops in empty group_by (#22721)
- Don't mark str.replace_many with Mapping as deprecated (#22697)
- Gzip has maximum compression of 9, not 10 (#22685)
- Fix predicate pushdown of fallible expressions (#22669)
- Fix index out of bounds panic when scanning Hugging Face (#22661)
- Panic on group_by with literal and empty rows (#22621)
- Return input instead of panicking if empty subset in drop_nulls() and drop_nans() (#22469)
- Bump argminmax to 0.6.3 (#22649)
- DSL version deserialization endianness (#22642)
- Allow Expr.round() to be called on integer dtypes (#22622)
- Fix panic when filtering based on row index column in parquet (#22616)
- WASM and PyOdide compile (#22613)
- Resolve get() SchemaMismatch panic (#22350)
- Panic in group_by_dynamic on single-row df with group_by (#22597)
- Add new_streaming feature to polars crate (#22601)
- Consistently use Unix epoch as origin for dt.truncate (except weekly buckets which start on Mondays) (#22592)
- Fix interpolate on dtype Decimal (#22541)
- CSV count rows skipped last line if file did not end with newline (#22577)
- Make nested strict casting actually strict (#22497)
- Make replace and replace_strict mapping use list literals (#22566)
- Allow pivot on Time column (#22550)
- Fix error when providing CSV schema with extra columns (#22544)
- Panic on bitwise op between Series and Expr (#22527)
- Multi-selector regex expansion (#22542)
📖 Documentation
- Fix broken link to service account page in Polars Cloud docs (#22762)
- Add match_to_schema to API reference (#22777)
- Provide additional explanation and examples for the value_counts "normalize" parameter (#22756)
- Rework documentation for drop/fill for nulls/nans (#22657)
- Add documentation to new RoundMode parameter in round (#22555)
- Add missing repeat_by to API reference, fixup list.get (#22698)
- Fix non-rendering bullet points in scan_iceberg (#22694)
- Improve insert_column docstring (description and examples) (#22551)
- Improve join documentation (#22556)
🛠️ Other improvements
- Update cloud docs (#22624)
- Fix unstable list.eval performance test (#22729)
- Add proptest implementations for all Array types (#22711)
- Dispatch .write_* to .lazy().sink_*(engine='in-memory') (#22582)
- Move all optimization flags to QueryOptFlags (#22680)
- Add test for str.replace_many (#22615)
- Stabilize sink_* (#22643)
- Add proptest for row-encode (#22626)
- Update rust version in nix flake (#22627)
- Add a nix flake with a devShell and package (#22246)
- Use a wrapper struct to store time zone (#22523)
- Add proptest testing for parquet decoding kernels (#22608)
- Include equiprobable as valid quantile method (#22571)
- Remove confusing error context calling .collect(_eager=True) (#22602)
- Fix test_truncate_path test case (#22598)
- Unify function flags into 1 bitset (#22573)
- Display the operation behind in-memory-map (#22552)
Thank you to all our contributors for making this release possible!
@JakubValtar, @Julian-J-S, @MarcoGorelli, @WH-2099, @alexander-beedie, @borchero, @cmdlineluser, @coastalwhite, @etiennebacher, @florian-klein, @itamarst, @kdn36, @mcrumiller, @nameexhaustion, @nikaltipar, @orlp, @pavelzw, @r-brink, @ritchie46, @stijnherfst, @teotwaki, @timkpaine and @wence-
Rust Polars 0.47.1
🏆 Highlights
- Enable common subplan elimination across plans in collect_all (#21747)
- Add lazy sinks (#21733)
- Add PartitionByKey for new streaming sinks (#21689)
- Enable new streaming memory sinks by default (#21589)
💥 Breaking changes
- Make bottom interval closed in hist (#22090)
🚀 Performance improvements
- Avoid alloc_zeroed in decompression (#22460)
- Lower Expr.(n_)unique to group_by on streaming engine (#22420)
- Chunk huge munmap calls (#22414)
- Add single-key variants of streaming group_by (#22409)
- Improve accumulate_dataframes_vertical performance (#22399)
- Optimize rolling_quantile with varying window sizes (#22353)
- Dedicated rolling_skew kernel (#22333)
- Call large munmaps in background thread (#22329)
- New streaming group_by implementation (#22285)
- Patch jemalloc to not purge huge allocs eagerly if we have background threads (#22318)
- Turn on parallel=prefiltered by default for new streaming (#22190)
- Add CSE to streaming groupby (#22196)
- Speed-up new streaming predicate filtering (#22179)
- Speedup new-streaming file row count (#22169)
- Fix quadratic behavior when casting Enums (#22008)
- Lower is_in to bitmap-output semi-join in new streaming engine (#21948)
- Fast path for empty inner join (#21965)
- Add native semi/anti join in new streaming engine (#21937)
- Cache regex compilation globally (#21929)
- Use views for binary hash tables and add single-key binary variant (#21872)
- Avoid rechunking in gather (#21876)
- Switch ahash for foldhash (#21852)
- Put THP behind feature flag (#21853)
- Enable THP by default (#21829)
- Improve join performance for expanding joins (#21821)
- Use binary_search instead of contains in business-day functions (#21775)
- Implement linear-time rolling_min/max (#21770)
- Improve InputIndependentSelect by delegating to InMemorySourceNode (#21767)
- Enable common subplan elimination across plans in collect_all (#21747)
- Allow elementwise functions in recursive lowering (#21653)
- Add primitive single-key hashtable to new-streaming join (#21712)
- Remove unnecessary black_boxes in Kahan summation (#21679)
- Box large enum variants (#21657)
- Improve join performance for new-streaming engine (#21620)
- Pre-fill caches (#21646)
- Optimize only a single cache input (#21644)
- Collect parquet statistics in one contiguous buffer (#21632)
- Update Cargo.lock (mainly for zstd 1.5.7) (#21612)
- Don't maintain order when maintain_order=False in new streaming sinks (#21586)
- Pre-sort groups in group-by-dynamic (#21569)
- Provide a fallback skip batch predicate for constant batches (#21477)
- Parallelize the passing in new streaming multiscan (#21430)
- Toggle projection pushdown for eager rolling (#21405)
- Fix pathological rolling + group-by performance and memory explosion (#21403)
- Add sampling to new-streaming equi join to decide between build/probe side (#21197)
- Reduce sharing in stringview arrays in new-streaming equijoin (#21129)
- Implement native Expr.count() on new-streaming (#21126)
- Speed up list operations that use amortized_iter() (#20964)
- Use Cow as output for rechunk and add rechunk_mut (#21116)
- Reduce arrow slice mmap overhead (#21113)
- Reduce conversion cost in chunked string gather (#21112)
- Enable prefiltered by default for new streaming (#21109)
- Enable parquet column expressions for streaming (#21101)
- Deduplicate buffers again in stringview concat kernel (#21098)
- Add dedicated concatenate kernels (#21080)
- Rechunk only once during join probe gather (#21072)
- Speed up from_pandas when converting frame with multi-index columns (#21063)
- Change default memory prefetch to MADV_WILLNEED (#21056)
- Remove cast to boolean after comparison in optimizer (#21022)
- Split last rowgroup among all threads in new-streaming parquet reader (#21027)
- Recombine into larger morsels in new-streaming join (#21008)
- Improve list.min and list.max performance for logical types (#20972)
- Ensure count query selects minimal columns (#20923)
✨ Enhancements
- Support grouping by pl.Array (#22575)
- Preserve exception type and traceback for errors raised from Python (#22561)
- Use fixed-width font in streaming phys plan graph (#22540)
- Highlight nodes in streaming phys plan graph (#22535)
- Support BinaryOffset serde (#22528)
- Show physical stage graph (#22491)
- Add structure for dispatching iceberg to native scans (#22405)
- Add SQL support for checking array values with IN and NOT IN expressions (#22487)
- Add more IRBuilder utils (#22482)
- Support DataFrame and Series init from torch Tensor objects (#22177)
- Add RoundMode for Decimal and Float (#22248)
- Inform users that IO error path file name can be expanded with POLARS_VERBOSE=1 (#22427)
- Make streaming dispatch public (#22347)
- Add rolling_kurtosis (#22335)
- Support Cast in IO plugin predicates (#22317)
- Add .sort(nulls_last=True) to booleans, categoricals and enums (#22300)
- Add rolling min/max for temporals (#22271)
- Support literal:list agg (#22249)
- Support implode + agg (#22230)
- Dispatch scans to new-streaming by default (#22153)
- Improved expression autocomplete for IPython, Jupyter, and Marimo (#22221)
- Expose FunctionIR::FastCount in the python visitor (#22195)
- Add SPLIT_PART string function to the SQL interface (#22158)
- Allow scalar expr in Expr.diff (#22142)
- Support additional unsigned int aliases in the SQL interface (#22127)
- Add STRING_TO_ARRAY function to the SQL interface (#22129)
- Add dt.is_business_day (#21776)
- Add support for Int128 parsing/recognition to the SQL interface (#22104)
- Allow sinking to abstract python io and fs classes (#21987)
- Add add_alp_optimize_exprs to IRBuilder (#22061)
- Add cat.slice (#21971)
- Support growing schema if line length increases during csv schema inference (#21979)
- Replace thread-unsafe GilOnceCell with Mutex (#21927)
- Support modified dsl in file cache (#21907)
- Add support for io-plugins in new-streaming (#21870)
- Add PartitionParted (#21788)
- Add DoubleEndedIterator for CatIter (#21816)
- Minor improvements to EXPLAIN plan output (#21822)
- Add polars_testing folder with relevant files and add_series_equal!() functionality (#21722)
- Allow using repeat_by with (nested) lists and structs (#21206)
- Add support for rolling_(sum/min/max) for booleans through casting (#21748)
- Support multi-column sort for all nested types and nested search-sorted (#21743)
- Add lazy sinks (#21733)
- Add PartitionByKey for new streaming sinks (#21689)
- Fix replace flags (#21731)
- Add mkdir flag to sinks (#21717)
- Enable joins on list/array dtypes (#21687)
- Add a config option to specify the default engine to attempt to use during lazyframe calls (#20717)
- Support all elementwise functions in IO plugin predicates (#21705)
- Stabilize Enum datatype (#21686)
- Support Polars int128 in from arrow (#21688)
- Use FFI to read dataframe instead of transmute (#21673)
- Enable new streaming memory sinks by default (#21589)
- Cloud support for new-streaming scans and sinks (#21621)
- Add len method to arr (#21618)
- Closeable files on unix (#21588)
- Add new PartitionMaxSize sink (#21573)
- Implement unpack_dtypes() functionality with unit tests (#21574)
- Support engine callback for LazyFrame.profile (#21534)
- Dispatch new-streaming CSV negative slice to separate node (#21579)
- Add NDJSON source to new streaming engine (#21562)
- Add lossy decoding to read_csv for non-utf8 encodings (#21433)
- Add 'nulls_equal' parameter to is_in (#21426)
- Improve numeric stability of rolling_{std, var, cov, corr} (#21528)
- IR Serde cross-filter (#21488)
- Support writing Time type in json (#21454)
- Activate all optimizations in sinks (#21462)
- Add AssertionError variant to PolarsError in polars-error (#21460)
- Pass filter to inner readers in multiscan new streaming (#21436)
- Implement i128 -> str cast (#21411)
- Version DSL (#21383)
- Make user facing binary formats mostly self describing (#21380)
- Filter hive files using predicates in new streaming (#21372)
- Add negative slicing to new streaming multiscan (#21219)
- Publicize Expr DSL Function enums (#20421)
- Implement sorted flags for struct series (#21290)
- Support reading arrow Map type from Delta (#21330)
- Add a dedicated remove method for DataFrame and LazyFrame (#21259)
- Expose include_file_paths to python visitor (#21279)
- Implement merge_sorted for struct (#21205)
- Add positive slice for new streaming MultiScan (#21191)
- Don't take in rewriting visitor (#21212)
- Add SQL support for the DELETE statement (#21190)
- Add row index to new streaming multiscan (#21169)
- Improve DataFrame fmt in explain (#21158)
- Add projection pushdown to new streaming multiscan (#21139)
- Implement join on struct dtype (#21093)
- Use unique temporary directory path per user and restrict permissions (#21125)
- Enable new streaming multiscan for CSV (#21124)
- Environment POLARS_MAX_CONCURRENT_SCANS in multiscan for new streaming (#21127)
- Multi/Hive scans in new streaming engine (#21011)
- Add linear_spaces (#20941)
- Implement merge_sorted for binary (#21045)
- Hold string cache in new streaming engine and fix row-encoding (#21039)
- Support max/min method for Time dtype (#19815)
- Implement a streaming merge sorted node (#20960)
- Automatically use temporary credentials API for scanning Unity catalog tables (#21020)
- Add negative slice support to new-streaming engine (#21001)
- Allow for more RG skipping by rewriting expr in planner (#20828)
- Rename catalog schema to namespace (#20993)
- Add functionality to create and delete catalogs, tables and schemas to Unity catalog client (#20956)
- Improved support for KeyboardInterrupts (#20961...
Python Polars 1.29.0
🚀 Performance improvements
- Avoid alloc_zeroed in decompression (#22460)
✨ Enhancements
- Highlight nodes in streaming phys plan graph (#22535)
- Show physical stage graph (#22491)
- Add structure for dispatching iceberg to native scans (#22405)
- Add SQL support for checking array values with IN and NOT IN expressions (#22487)
- Support DataFrame and Series init from torch Tensor objects (#22177)
- Add RoundMode for Decimal and Float (#22248)
- Inform users that IO error path file name can be expanded with POLARS_VERBOSE=1 (#22427)
🐞 Bug fixes
- Streaming outer join coalesce bug (#22530)
- Remove redundant print statement in assert_frame_schema_equal() (#22529)
- Bug in .unique() followed by .slice() (#22471)
- Fix error reading parquet with datetimes written by pandas (#22524)
- Fix schema_overrides not taking effect in NDJSON (#22521)
- Fold flags and verify scalar correctness in apply (#22519)
- Invalid values were triggering panics instead of returning null in dt.to_date/dt.to_datetime (#22500)
- Ensure numpy isinstance check is lazy (avoid forcing the dependency) (#22486)
- Incorrectly dropped sort after unique for some queries (#22489)
- Fix incorrect ternary agg state with mixed columns and scalars (#22496)
- Make replace and replace_strict properly elementwise (#22465)
- Fix index out of bounds panic on parquet prefiltering (#22458)
- Integer underflow when checking parquet UTF-8 (#22472)
- Add implementation for array.get with idx overflow (#22449)
- Deprecate str. collection functions with flat strings and mark as elementwise (#22461)
- Deprecate flat list.gather and mark as elementwise (#22456)
- Inform users that IO error path file name can be expanded with POLARS_VERBOSE=1 (#22427)
📖 Documentation
- Fix typo in structs page (#22504)
🛠️ Other improvements
- Don't store name/dtype in grouper (#22525)
- Add structure for dispatching iceberg to native scans (#22405)
- Remove unused reduction code (#22462)
- Pin to explicit macOS version in code coverage (#22432)
Thank you to all our contributors for making this release possible!
@AH-Merii, @JakubValtar, @Julian-J-S, @Kevin-Patyk, @Liyixin95, @MarcoGorelli, @Matt711, @alexander-beedie, @brianmakesthings, @coastalwhite, @nameexhaustion, @orlp and @ritchie46
Python Polars 1.28.1
🐞 Bug fixes
- Reading of reencoded categorical in Parquet (#22436)
- Last thread in parquet predicate filter oob (#22429)
📖 Documentation
📦 Build system
- Update pyo3 and numpy crates to version 0.24 (#22015)
🛠️ Other improvements
- Add test for implode + over (#22437)
- Fix CI by removing use_legacy_dataset (#22438)
- Only use pytorch index-url for pytorch package (#22355)
Thank you to all our contributors for making this release possible!
@bschoenmaeckers, @coastalwhite, @etiennebacher, @mcrumiller and @ritchie46
Python Polars 1.28.0
🚀 Performance improvements
- Lower Expr.(n_)unique to group_by on streaming engine (#22420)
- Chunk huge munmap calls (#22414)
- Add single-key variants of streaming group_by (#22409)
- Improve accumulate_dataframes_vertical performance (#22399)
- Optimize rolling_quantile with varying window sizes (#22353)
- Dedicated rolling_skew kernel (#22333)
- Call large munmaps in background thread (#22329)
- New streaming group_by implementation (#22285)
- Patch jemalloc to not purge huge allocs eagerly if we have background threads (#22318)
- Turn on parallel=prefiltered by default for new streaming (#22190)
✨ Enhancements
- When reporting unexpected types in errors, module-qualify the typename (#22390)
- Add Series backward_fill/forward_fill (#22360)
- Add GPU support to sink_* APIs (#20940)
- Changed mapping type from dict to Mapping (#19400) (#19436)
- Make streaming dispatch public (#22347)
- Add rolling_kurtosis (#22335)
- Support Cast in IO plugin predicates (#22317)
- Add .sort(nulls_last=True) to booleans, categoricals and enums (#22300)
- Add rolling min/max for temporals (#22271)
- Support literal:list agg (#22249)
- Support running Polars SQL queries against any objects implementing the PyCapsule interface (#22235)
- Support implode + agg (#22230)
- Dispatch scans to new-streaming by default (#22153)
🐞 Bug fixes
- Ensure write_excel correctly preserves null values in nested dtype data on export (#22379)
- Panic when visualizing streaming physical plan with joins (#22404)
- Fix incorrect filter after LazyFrame.rename().select() (#22380)
- Fix select(len()) performance regression (#22363)
- Handle pytz named timezone in lit (#21785)
- Don't leak state during prefill CSE cache (#22341)
- Maintain float32 type in partitioned group-by (#22340)
- Resolve streaming panic on multiple merge_sorted (#22205)
- Fix ndjson nested types (#22325)
- Fix nested datetypes in ndjson (#22321)
- Check matching lengths for pl.corr (#22305)
- Move type coercion for pl.duration to planner (#22304)
- Check dtype to avoid panic with mixed types in min/max_horizontal (#21857)
- Coalesce correct column for new streaming full join (#22301)
- Don't collect NaN from Parquet Statistics (#22294)
- Set revmap for empty AnyValue to Series (#22293)
- Add an __all__ entry to internal type definition module (#22254)
- Datetime parser was incorrectly parsing 8-digit fractional seconds when format specified to expect 9 (#22180)
- More robust str → date conversion when reading from spreadsheet (#22276)
- Deprecate using is_in with 2 equal types and mark as elementwise (#22178)
- Duplicate key column name in streaming group_by due to CSE (#22280)
- Raise ColumnNotFoundError for missing columns in join_where (#22268)
- Parquet filters for logical types and operations (#22253)
- Ensure floating-point accuracy in hist (#22245)
- Check matching key datatypes for new streaming joins (#22247)
- Incorrect length BinaryArray/ListBuilder (#22227)
📖 Documentation
- Update docs for schema arg in scan_csv to match read_csv (#22357)
- Update pl.when documentation (#22345)
- Add missing is_business_day to documentation reference (#22338)
- Improve interpolation documentation to clarify behavior of null values (#22274)
🛠️ Other improvements
- Install pytorch for 3.13 on Windows (#22356)
- Make interpolate fix more robust (#22421)
- Fix interpolate test (#22417)
- Reduce hot table size in debug mode (#22400)
- Replace intrinsic with non-intrinsic (#22401)
- Make streaming dispatch public (#22347)
- Update rustc to 'nightly-2025-04-19' (#22342)
- Update mozilla-actions/sccache-action (#22319)
- Purge old parquet and scan code (#22226)
- Add an __all__ entry to internal type definition module (#22254)
- Add online skew/kurtosis algorithm for future use in rolling kernels (#22261)
- Add Polars Cloud 0.0.7 release notes (#22223)
- Change format name from list to implode (#22240)
- Make other parallel parquet modes filter afterwards (#22228)
- Close async reader issues (#22224)
- Add BinaryArrayBuilder (#22225)
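The online skew/kurtosis entry (#22261) refers to a one-pass moment algorithm intended for rolling kernels. The PR's exact formulation is not reproduced here; as an illustration only, this sketch uses the standard one-pass central-moment updates (Welford's method extended to third and fourth moments, often attributed to Terriberry) and returns population (biased) skewness and excess kurtosis. The OnlineMoments class is hypothetical, not a Polars type, and it only supports adding values; a true rolling kernel would also need the matching removal step.

```python
import math


class OnlineMoments:
    """Hypothetical accumulator: running mean plus central-moment sums M2, M3, M4."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the mean
        self.m3 = 0.0  # sum of cubed deviations
        self.m4 = 0.0  # sum of fourth-power deviations

    def push(self, x: float) -> None:
        n1 = self.n
        self.n += 1
        n = self.n
        delta = x - self.mean
        delta_n = delta / n
        delta_n2 = delta_n * delta_n
        term1 = delta * delta_n * n1
        self.mean += delta_n
        # M4 uses the old M3 and M2, and M3 uses the old M2, so update in this order.
        self.m4 += (term1 * delta_n2 * (n * n - 3 * n + 3)
                    + 6 * delta_n2 * self.m2 - 4 * delta_n * self.m3)
        self.m3 += term1 * delta_n * (n - 2) - 3 * delta_n * self.m2
        self.m2 += term1

    def skew(self) -> float:
        # Population (biased) skewness: (M3/n) / (M2/n)**1.5
        return math.sqrt(self.n) * self.m3 / self.m2 ** 1.5

    def kurtosis(self) -> float:
        # Population (biased) excess kurtosis: (M4/n) / (M2/n)**2 - 3
        return self.n * self.m4 / (self.m2 * self.m2) - 3.0
```

Each push is O(1), which is what makes this family of updates attractive for streaming and windowed computation; note that a constant input gives M2 = 0, so skew() and kurtosis() would divide by zero.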
Thank you to all our contributors for making this release possible!
@DavideCanton, @JakubValtar, @Jesse-Bakker, @MarcoGorelli, @NeejWeej, @Shoeboxam, @adamreeve, @alexander-beedie, @axellpadilla, @cmdlineluser, @coastalwhite, @d-reynol, @dongchao-1, @florian-klein, @kdn36, @math-hiyoko, @mcrumiller, @mroeschke, @nameexhaustion, @orlp, @ritchie46, @stijnherfst and @yiteng-guo
Python Polars 1.27.1
✨ Enhancements
- Improved expression autocomplete for IPython, Jupyter, and Marimo (#22221)
🐞 Bug fixes
- Incorrect condition on empty inner join fast path (#22208)
- Fallback predicate filter for min=max with is_in (#22213)
- Don't panic for LruCachedFunc for size=0 (#22215)
- Writing masked out list values to json (#22210)
- Deadlock in streaming distributor (#22207)
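The LruCachedFunc fix (#22215) concerns a cache created with size=0. The release note does not say how Polars resolved it; the Python sketch below only shows one defensive pattern for a zero-capacity LRU cache: skip storage entirely and evaluate the wrapped function on every call. LruCachedFn is a hypothetical class, not the Rust type in Polars.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class LruCachedFn:
    """Hypothetical LRU-memoized wrapper around a one-argument function."""

    def __init__(self, func: Callable[[Hashable], object], size: int) -> None:
        if size < 0:
            raise ValueError("size must be non-negative")
        self.func = func
        self.size = size
        self.cache = OrderedDict()
        self.calls = 0  # how many times func was actually evaluated

    def __call__(self, key: Hashable) -> object:
        if self.size == 0:
            # Zero capacity: nothing can be stored, so evaluate directly
            # instead of inserting and then evicting from an empty cache.
            self.calls += 1
            return self.func(key)
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key]
        self.calls += 1
        value = self.func(key)
        self.cache[key] = value
        if len(self.cache) > self.size:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return value
```

Python's own functools.lru_cache(maxsize=0) takes the same approach and performs no caching at all.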
Thank you to all our contributors for making this release possible!
@Matt711, @alexander-beedie, @coastalwhite, @dependabot[bot], @orlp, @ritchie46 and dependabot[bot]
Python Polars 1.27.0
💥 Breaking changes
- Make bottom interval closed in hist (#22090)
- Change Partition API to base_path and file_path (#21888)
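The hist change (#22090) affects where a value equal to the lowest bin edge is counted. A plain-Python sketch of the counting rule, assuming bins are otherwise right-closed, (lo, hi], and that only the first bin becomes closed on both ends, [lo, hi], so the minimum is no longer dropped. hist_counts is a hypothetical helper, not a Polars function.

```python
from bisect import bisect_left


def hist_counts(values, edges):
    """Count values into bins (edges[i], edges[i+1]], except that the
    first bin is closed on both ends: [edges[0], edges[1]]."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # outside the binned range
        if v == edges[0]:
            counts[0] += 1  # closed bottom interval keeps the minimum
            continue
        # bisect_left returns the index of the first edge >= v,
        # so v belongs to the bin (edges[i - 1], edges[i]].
        i = bisect_left(edges, v)
        counts[i - 1] += 1
    return counts
```

With edges [0, 1, 2, 3], the value 0 lands in the first bin instead of being excluded, while 1 still belongs to the first bin and 2 to the second.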
🚀 Performance improvements
- Add CSE to streaming groupby (#22196)
- Speed-up new streaming predicate filtering (#22179)
- Speedup new-streaming file row count (#22169)
- Fix quadratic behavior when casting Enums (#22008)
- Lower is_in to bitmap-output semi-join in new streaming engine (#21948)
- Fast path for empty inner join (#21965)
- Add native semi/anti join in new streaming engine (#21937)
- Cache regex compilation globally (#21929)
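The entry lowering is_in to a bitmap-output semi-join (#21948) can be pictured as follows: hash the right-hand values once (the build side), then probe each left-hand value and emit one boolean per row instead of materializing the matching rows. This is a plain-Python sketch of the idea, not the streaming engine's code; is_in_mask is a hypothetical name, and null handling is omitted.

```python
def is_in_mask(probe, build):
    """Semi-join with a bitmap output: True for each probe value present in build."""
    build_keys = set(build)  # build phase: hash the right-hand side once
    return [v in build_keys for v in probe]  # probe phase: one bit per row


# The mask can then drive a filter without re-scanning the build side.
rows = [1, 2, 3, 2]
mask = is_in_mask(rows, [2, 5])
kept = [r for r, keep in zip(rows, mask) if keep]
```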
✨ Enhancements
- Add SPLIT_PART string function to the SQL interface (#22158)
- Allow scalar expr in Expr.diff (#22142)
- Support additional unsigned int aliases in the SQL interface (#22127)
- Add STRING_TO_ARRAY function to the SQL interface (#22129)
- Add dt.is_business_day (#21776)
- Add an eager parameter to pl.cov (#22098)
- Add support for Int128 parsing/recognition to the SQL interface (#22104)
- Add an eager parameter to pl.coalesce (#22092)
- Add an eager parameter to pl.corr (#22097)
- Allow sinking to abstract python io and fs classes (#21987)
- Add add_alp_optimize_exprs to IRBuilder (#22061)
- Add cat.slice (#21971)
- Support growing schema if line length increases during csv schema inference (#21979)
- Replace thread unsafe GilOnceCell with Mutex (#21927)
- Support modified dsl in file cache (#21907)
🐞 Bug fixes
- Implode in agg (#22197)
- Reduce GIL hold time for IO plugins in new-streaming (#22186)
- Enhance predicate validation and cast safety in join_where (#22112)
- Handle Parquet with compressed empty DataPage v2 (#22172)
- Schema error during lowering (#22175)
- Rewrite unroll of overlapping groups to mitigate out of range index panic (#22072)
- Incorrect rounding for very large/small numbers (#22173)
- Allow set input to list.set_* operations (#22163)
- Deadlock in join due to rayon nested task-stealing (#22159)
- Mark Expr.repeat_by as elementwise (#22068)
- Fix csv serializer panic by supporting ScalarColumn in as_single_chunk (#22146)
- Raise an error if a number doesn't have associated unit in duration strings (#22035)
- Add i128 as supertype to boolean (#22138)
- Fix panic when constructing DF from pyarrow due to duplicate field names (#22114)
- Add broadcasts and error messages for many elementwise operations (#22130)
- Throw error for n=0 on list.gather_every (#22122)
- Throw error for unsupported rolling operations (#22121)
- Error on unequal length str.to_integer arguments (#22100)
- Make bottom interval closed in hist (#22090)
- Relative path resolution for plugin libraries (#21911)
- Avoiding panic with strptime for out-of-bounds dates (#21208)
- Join revmaps for categoricals in merge_sorted (#21976)
- Fix glob expansion matching extra files (#21991)
- Ensure SQL dot-notation for nested column fields resolves correctly (#22109)
- Parquet filter performance regression from multiscan dispatch (#22116)
- Panic for unequal length ewm_mean_by args (#22093)
- Add scalarity checks to pl.repeat (#22088)
- Type check n parameter of pl.repeat (#22071)
- Mark bitwise_{count,leading,trailing}_{ones,zeros} as elementwise (#22044)
- Mark pl.*_ranges functions correctly as element-wise (#22059)
- Correctly type check pl.arctan2 (#22060)
- Mark pl.business_day_count as elementwise (#22055)
- Check input python type for str.extract_groups (#22032)
- Check types for fill_char in str.pad_{start,end} (#22036)
- Mark str.to_decimal properly as non-elementwise (#22040)
- Documented return type for bin.encode and bin.decode (#22022)
- Revert #22017 and improve block(_in_place)_on doc comment (#22031)
- Remove outdated depth warning (#22030)
- Expression pl.concat was incorrectly marked as elementwise (#22019)
- Use block_in_place_on to start streaming (#22017)
- Panic on empty aggregation in streaming (#22016)
- Error instead of panic for invalid durations in dt.offset_by() and dt.round() (#21982)
- Raise error instead of silently appending NULL in NDJSON parsing (#21953)
- Ensure AV is static before pushing to row buffer (#21967)
- Deadlock in new-streaming multiplexer (#21963)
- Release GIL in collect_with_callback (#21941)
- Panic in new RegexCache (#21935)
- Type hint of cs.exclude() is SelectorType instead of Expr (#21892)
- Add correct deprecation warning for .str.concat (#21666)
- Use absolute paths by default for plugins (#21904)
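The duration-string fix (#22035) makes a trailing number without a unit, such as "1d2", an error rather than being accepted. A minimal tokenizer sketch of that rule, assuming the unit suffixes listed in Polars' duration-string documentation (ns, us, ms, s, m, h, d, w, mo, q, y, i); parse_duration is hypothetical, not the Polars parser, and it does not handle a leading sign.

```python
import re

# Multi-letter units come first so "ns", "us", "ms" and "mo" win over "s" and "m".
_TOKEN = re.compile(r"(\d+)(ns|us|ms|mo|[smhdwqyi])")


def parse_duration(text):
    """Split a duration string such as "1d12h" into (count, unit) pairs.

    Raises ValueError if any number lacks a unit (e.g. "1d2").
    """
    parts = []
    pos = 0
    while pos < len(text):
        m = _TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"invalid duration {text!r} at position {pos}")
        parts.append((int(m.group(1)), m.group(2)))
        pos = m.end()
    if not parts:
        raise ValueError("empty duration string")
    return parts
```

Matching token by token from a moving position, rather than searching anywhere in the string, is what guarantees that leftover characters such as a bare "2" are reported instead of silently ignored.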
📖 Documentation
- Add user guide section on working with Sheets in Colab (#22161)
- Update distributed engine docs (#22128)
- Add Polars Cloud release notes (#22021)
- Remove trailing space in settings POLARS_CLOUD_CLIENT_ID (#21995)
- Fix typo (#21954)
- Fix 'pickleable' typo in docs (#21938)
- Change ctx to compute=ctx for all remote query examples (#21930)
🛠️ Other improvements
- Remove old MultiScanExec for in-memory (#22184)
- Separate FunctionOptions from DSL calls (#22133)
- Undeprecate backward_fill and forward_fill (#22156)
- Handle conversion of Duration specially in pyir (#22101)
- Deprecate duplicate backward_fill and forward_fill interface (#22083)
- Solve clippy lints for 1.86 (#22102)
- Remove rust exclusive MaxBound and MinBound fill strategies (#22063)
- Change Partition API to base_path and file_path (#21888)
- Fix pydantic model_fields deprecation (#21958)
Thank you to all our contributors for making this release possible!
@DeflateAwning, @EnricoMi, @Jacob640, @JakubValtar, @MarcoGorelli, @MaxJackson, @alexander-beedie, @amotzop, @anath2, @bschoenmaeckers, @cnpryer, @coastalwhite, @dependabot[bot], @eitsupi, @etiennebacher, @hemanth94, @kdn36, @mcrumiller, @nameexhaustion, @orlp, @r-brink, @rgertenbach, @ritchie46, @sebasv, @silannisik, @stijnherfst, @wence-, @zachlefevre and dependabot[bot]