!CJoUbovqKaCGrFkbrY:matrix.org

Spark with Scala

397 Members
A place to discuss and ask questions about using Scala for Spark programming.



27 Oct 2023
@_discord_818401984230981642:t2bot.iocheapsolutionarchitect I do not think Spark supports such a use case. Usually you would trigger the application via a cron job or some kind of cluster-manager task. If your driver does not run directly on the Spark cluster, you can, however, wait in your driver application and then call start() on your stream. 09:45:58
@_discord_818401984230981642:t2bot.iocheapsolutionarchitect As far as I understand, StreamingContext is the old API and Structured Streaming is its successor. And I would not call the alignment on batch-interval boundaries a feature; it breaks down quickly, e.g. how do you trigger a batch every hour starting at half past the hour? However, my experience is confined to my specific cluster architecture: I run Spark in standalone mode on a k8s cluster, so every Spark driver app is a running Pod. This allows me, for example, to sleep-wait within the entry-point script or within the driver app, and so on. 20:21:09
@_discord_397996873354444810:t2bot.ioUFO#0678 nice! I've been wanting to try a small k8s spark instance 20:45:20
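A minimal Scala sketch of the sleep-wait idea described above, assuming a Structured Streaming driver that should align its start to half past the hour (the object name, rate source, and console sink are illustrative, not from the chat):

import java.time.{Duration, ZonedDateTime, ZoneOffset}
import org.apache.spark.sql.SparkSession

object DelayedStreamStart {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("delayed-stream").getOrCreate()

    // Block the driver until the next half-past-the-hour boundary.
    val now = ZonedDateTime.now(ZoneOffset.UTC)
    val atHalfPast = now.withMinute(30).withSecond(0).withNano(0)
    val next = if (atHalfPast.isAfter(now)) atHalfPast else atHalfPast.plusHours(1)
    Thread.sleep(Duration.between(now, next).toMillis)

    // Only now start the streaming query; source and sink are placeholders.
    val query = spark.readStream
      .format("rate")     // built-in test source emitting rows per second
      .load()
      .writeStream
      .format("console")
      .start()

    query.awaitTermination()
  }
}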
12 Nov 2023
@_discord_818401984230981642:t2bot.iocheapsolutionarchitect Take a look at the class comment here https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/KeyValueGroupedDataset.html. Your K is the first element of the tuple, and your V is the whole tuple, which is what produces the given return type. If you call .collect, you will probably see your expected result. If you want a Dataset of (String, Array[Int]), you could probably use mapGroups and turn the second position of the tuple into an Array. 06:13:26
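A hedged sketch of that mapGroups suggestion, with made-up example data (the real schema was not shown in the chat):

import org.apache.spark.sql.SparkSession

object GroupToArray {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("group-to-array").getOrCreate()
    import spark.implicits._

    // Illustrative data; K is the first tuple element, V is the whole tuple.
    val ds = Seq(("a", 1), ("a", 2), ("b", 3)).toDS()
    val grouped = ds.groupByKey(_._1) // KeyValueGroupedDataset[String, (String, Int)]

    // Collapse each group's second positions into an Array,
    // yielding a Dataset[(String, Array[Int])].
    val result = grouped.mapGroups { (key, rows) =>
      (key, rows.map(_._2).toArray)
    }

    result.show()
  }
}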
4 Dec 2023
@_discord_632729825647525922:t2bot.ioekrich#7695 Repost from Jobs. https://discord.com/channels/632150470000902164/632628675287973908/1181311320298098808
Excited to see the Release Notes for Spark here. https://spark.apache.org/releases/spark-release-3-5-0.html Excerpt:

Removals, Behavior Changes and Deprecations
Upcoming Removal

The following features will be removed in the next Spark major release:

- Support for Java 8 and Java 11; the minimal supported Java version will be Java 17
- Support for Scala 2.12; the minimal supported Scala version will be 2.13
20:56:01
@_discord_632729825647525922:t2bot.ioekrich#7695 Typically, Spark supports two Scala versions, current and next; here the default will become 2.13 and the next will be 3. 20:56:49
5 Dec 2023
@_discord_818401984230981642:t2bot.iocheapsolutionarchitect Try spark-sql-api instead of spark-sql. 08:12:30
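For sbt users, a minimal sketch of that dependency swap (the version is illustrative; the spark-sql-api artifact exists as of Spark 3.5.0):

// build.sbt: the slimmer API-only artifact instead of the full spark-sql module
libraryDependencies += "org.apache.spark" %% "spark-sql-api" % "3.5.0"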
7 Dec 2023
@_discord_818401984230981642:t2bot.iocheapsolutionarchitect Sorry for the late answer; you will get the same behavior in Scala 2.13.x. The case class is probably defined within a method. I do not know enough about the internals of Scala, but it looks like the type tag cannot be derived for a method-local case class, so no Encoder can be found. However, if you pull the case class up into the outer class, it works. So do this instead of the commented-out line:
import org.apache.spark.sql.SparkSession

class Spark1() {
  // Defining the case class at class level (not inside the method)
  // lets Spark derive the implicit Encoder for it.
  case class A(b: String, c: String)

  private def execute(): Unit = {
    val spark = SparkSession.builder().getOrCreate()
    import spark.implicits._

    // case class A(b: String, c: String)  // <- method-local definition fails
    val as = Seq(
      A("b", "c")
    ).toDS()

    as.show()
  }
}
05:03:43
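As a hedged aside, a still safer placement is top level (or a companion object), which avoids capturing the outer class instance entirely; the names here are illustrative:

import org.apache.spark.sql.SparkSession

// Top-level case class: Encoder derivation needs no outer instance at all.
case class A(b: String, c: String)

class Spark2() {
  private def execute(): Unit = {
    val spark = SparkSession.builder().getOrCreate()
    import spark.implicits._

    val as = Seq(A("b", "c")).toDS()
    as.show()
  }
}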


