Often while performing statistical aggregations, we run into scenarios where, for each row, we need to aggregate over the next n seconds following that row. The Spark Window API provides a nice `rangeBetween` facility that makes this straightforward. For example:

```scala
// Sample data with timestamp
val customers = sc.parallelize(List(
  ("Alice", "2016-05-01 00:00:00", 10, 4),
  ("Alice", "2016-05-01 00:00:01", 20, 2),
  ("Alice", "2016-05-01 00:00:02", 30, 4),
  ("Alice", "2016-05-01 00:00:02", 40, 6),
  ("Alice", "2016-05-01 00:00:03", 50, 1),
  ("Alice", "2016-05-01 00:00:03", 60, 4),
  ("Alice", "2016-05-01 00:00:04", 70, 2),
  ("Alice", "2016-05-01 00:00:05", 80, 4),
  ("Bob",   "2016-05-01 00:00:03", 25, 6),
  ("Bob",   "2016-05-01 00:00:04", 29, 7),
  ("Bob",   "2016-05-...
```
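To make the technique concrete, here is a minimal, self-contained sketch of a time-based `rangeBetween` window. It assumes Spark 2.x or later running locally; the session, column, and variable names (`secs`, `sum_next_2s`, etc.) are illustrative, not from the original post. The key idea is to convert the timestamp to epoch seconds with `unix_timestamp`, order the window by that numeric column, and declare a range frame from the current row to 2 seconds ahead.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

object RangeBetweenSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("range-between-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Small sample in the same shape as the post's data.
    val customers = Seq(
      ("Alice", "2016-05-01 00:00:00", 10, 4),
      ("Alice", "2016-05-01 00:00:01", 20, 2),
      ("Alice", "2016-05-01 00:00:02", 30, 4),
      ("Bob",   "2016-05-01 00:00:03", 25, 6)
    ).toDF("name", "ts", "amount", "qty")

    // rangeBetween needs a numeric ordering column, so convert the
    // string timestamp to epoch seconds first.
    val withSecs = customers.withColumn("secs", unix_timestamp($"ts"))

    // Frame: current row's second through the next 2 seconds, per customer.
    val w = Window
      .partitionBy($"name")
      .orderBy($"secs")
      .rangeBetween(Window.currentRow, 2)

    withSecs
      .withColumn("sum_next_2s", sum($"amount").over(w))
      .show(false)

    spark.stop()
  }
}
```

Note that because the frame is a *range* (over the ordering column's values) rather than a row count, all rows whose timestamp falls within the 2-second horizon are included, even when several rows share the same second.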