Which of these methods support windowing and watermarking? ✅ 2025-08-14
What are the options for custom aggregations?
What is the role of reduce functions? Where do they fit in the general big picture? Compare their behavior across scenarios, batch vs. structured streaming, and in relation to working with DataFrames and Datasets.
RelationalGroupedDataset is for declarative aggregations (like in SQL), while KeyValueGroupedDataset is a type-safe handle for applying custom, programmatic logic to groups of whole objects. I see some conflict between this and the existence of custom aggregators for both of them.
A KeyValueGroupedDataset is created by calling .groupByKey() on a typed Dataset<T>. It represents a collection of groups where each group contains all the original objects (T) that share a common key (K). This is less optimized than a RelationalGroupedDataset, potentially leading to lower performance and higher memory usage if the groups are large.
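The contrast above can be sketched in plain Python (no Spark dependency; the data and names are purely illustrative): declarative, column-at-a-time aggregation can run incrementally without materializing groups, while handing whole objects to custom group logic requires collecting each group.

```python
from collections import defaultdict

# Toy records standing in for a typed Dataset of order objects (illustrative).
orders = [
    {"user": "a", "amount": 10},
    {"user": "a", "amount": 5},
    {"user": "b", "amount": 7},
]

# RelationalGroupedDataset style: declarative, column-at-a-time.
# The engine only needs the key and the aggregated column, so it can
# fold values incrementally without ever holding a full group in memory.
sums = defaultdict(int)
for row in orders:
    sums[row["user"]] += row["amount"]  # like groupBy("user").agg(sum("amount"))

# KeyValueGroupedDataset style: each group is the full list of original
# objects, handed to arbitrary code (like mapGroups). The whole group may
# need to be materialized, which is why large groups can be costly.
groups = defaultdict(list)
for row in orders:
    groups[row["user"]].append(row)
totals = {k: sum(r["amount"] for r in grp) for k, grp in groups.items()}

print(dict(sums))   # {'a': 15, 'b': 7}
print(totals)       # {'a': 15, 'b': 7}
```

Both paths compute the same answer here; the difference is what the engine must keep in memory per group, not the result.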
This contradicts the custom-aggregation methods, doesn't it? Next question: the agg methods on KeyValueGroupedDataset.
On a KeyValueGroupedDataset, you don't use .agg() in the untyped sense; instead you use mapGroups, flatMapGroups, reduceGroups, and the stateful mapGroupsWithState / flatMapGroupsWithState.
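The semantics of three of these five methods can be imitated in plain Python (a conceptual sketch, not Spark; the words and key function are made up). mapGroups yields exactly one output per group, flatMapGroups yields zero or more, and reduceGroups folds each group's values pairwise into one value of the same type. The *WithState variants are like mapGroups/flatMapGroups but additionally thread a user-managed state value across streaming micro-batches, which a one-shot sketch cannot show.

```python
from functools import reduce
from itertools import groupby

# Toy typed records; the key function plays the role of groupByKey(w => w[0]).
words = ["spark", "state", "stream", "batch", "broadcast"]
key = lambda w: w[0]
grouped = {k: list(g) for k, g in groupby(sorted(words, key=key), key=key)}
# grouped == {'b': ['batch', 'broadcast'], 's': ['spark', 'state', 'stream']}

# mapGroups: (key, iterator of values) -> exactly ONE output per group.
map_groups = {k: len(vs) for k, vs in grouped.items()}

# flatMapGroups: (key, iterator of values) -> ZERO OR MORE outputs per group.
flat_map_groups = [w for _, vs in grouped.items() for w in vs if len(w) > 5]

# reduceGroups: fold the group's values pairwise into one value of the
# same type, yielding one (key, reduced value) pair per group.
reduce_groups = {
    k: reduce(lambda a, b: a if len(a) >= len(b) else b, vs)
    for k, vs in grouped.items()
}

print(map_groups)       # {'b': 2, 's': 3}
print(flat_map_groups)  # ['broadcast', 'stream']
print(reduce_groups)    # {'b': 'broadcast', 's': 'stream'}
```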
This list is excellent. Ask to repeat the first question for this list of 5 methods on KeyValueGroupedDataset.
Dataset.reduce() vs. KeyValueGroupedDataset.reduceGroups()
Does Spark need application.jar to be available to the workers as well, or does making it accessible to the master suffice? If it is required to be accessible to the workers, why is serializability also required?
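The reduce vs. reduceGroups distinction can be sketched in plain Python (conceptual only, no Spark; data is illustrative): Dataset.reduce folds the entire dataset down to a single value with one binary function, while reduceGroups applies the same kind of function independently inside each key's group, yielding one (key, value) pair per group. In real Spark both functions ship to executors as closures, which is where the serializability requirement comes from.

```python
from functools import reduce
from collections import defaultdict

nums = [1, 2, 3, 4, 5, 6]

# Dataset.reduce: one binary function folds the WHOLE dataset to one value.
total = reduce(lambda a, b: a + b, nums)  # 21

# KeyValueGroupedDataset.reduceGroups: the same shape of function, but
# applied within each key's group, producing one value per key.
groups = defaultdict(list)
for n in nums:
    groups[n % 2].append(n)  # key = parity, like groupByKey(n => n % 2)
per_group = {k: reduce(lambda a, b: a + b, vs) for k, vs in groups.items()}

print(total)      # 21
print(per_group)  # {1: 9, 0: 12}  (odds 1+3+5, evens 2+4+6)
```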
org.apache.spark.sql.execution.streaming.state.StateStoreProvider
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider
Is it possible to have checkpointing with batch execution? How does it work?
Delta Lake and Iceberg: what are they?
Compare OLAP and OLTP.
CAP theorem, again.