!CJoUbovqKaCGrFkbrY:matrix.org

Spark with Scala

399 Members
A place to discuss and ask questions about using Scala for Spark programming.3 Servers

Load older messages


SenderMessageTime
26 Aug 2022
@_discord_715647557921013852:t2bot.iocdb
  .readStream
  .format("EventProvider")
  .load
  .join(members, lower('to) === lower(members("address")), "left_outer")
  .where('to === 'sender)
  .withWatermark("timestamp", "10 minutes")
  .groupBy(window($"timestamp", "10 minutes", "1 minutes").getField("end").as("end"), 'timestamp, 'to, 'contract)
  .count
  .groupByKey(row => row.getAs[Timestamp]("end"))
  .flatMapGroups { case (end, iterator) =>
      ...
  }
  .writeStream
  .format("kafka")
  .option("topic", "events-transfer-popular-collections")
  .option("checkpointLocation", "/checkpoints/events-transfer-popular-collections")
  .outputMode("append")
  .start
13:41:59
@_discord_715647557921013852:t2bot.iocdb From the docs it seems that "append" output mode should result in one result per window, but I'm seeing many records with duplicate end fields in Kafka. The function passed to flatMapGroups does some fiddly aggregations (omitted) and emits a single result. 13:42:13
@_discord_715647557921013852:t2bot.iocdb This is my first time working with sliding window so I expect I'm doing something simple wrong. 13:42:33
27 Aug 2022
@_discord_387435475142705152:t2bot.iozebos changed their display name from zebos to zebos#7602.10:46:10
@_discord_387435475142705152:t2bot.iozebos changed their display name from zebos#7602 to zebos.10:46:19
@_discord_409493581863190528:t2bot.ioO( N logN) If there is dataframe:
| status |
success
success
failure
success
success
20:05:38
@_discord_409493581863190528:t2bot.ioO( N logN) How to quickly find whether there is a “failure” in this DF? 20:05:38
@_discord_409493581863190528:t2bot.ioO( N logN) * How to quickly find whether there is one “failure” in this DF? 20:07:10
28 Aug 2022
@_discord_409493581863190528:t2bot.ioO( N logN) sveen Thanks! Let me take a look 18:13:24
@_discord_409493581863190528:t2bot.ioO( N logN) Using toDF() on List or Seq collection. is ToDF() an action? 19:22:58
29 Aug 2022
@_discord_510415959480336393:t2bot.ioPotatoe[F[_]] changed their display name from Potatoe[F[_]] to Potatoe[F[_]]#8651.11:30:30
@_discord_510415959480336393:t2bot.ioPotatoe[F[_]] changed their display name from Potatoe[F[_]]#8651 to Potatoe[F[_]].11:30:42
30 Aug 2022
@_discord_367633594614546442:t2bot.ioBrad_UMATR changed their display name from Brad_UMATR to Brad_UMATR#8384.09:01:39
@_discord_367633594614546442:t2bot.ioBrad_UMATR changed their display name from Brad_UMATR#8384 to Brad_UMATR.09:01:45
@charmer:matrix.orgcharmer changed their profile picture.10:14:21
@_discord_632729825647525922:t2bot.ioekrich#7695 Normally, the dependencies are "provided" as they get installed on the server. 15:51:50
5 Sep 2022
@_discord_689908456832237592:t2bot.ioStewart hey, would someone please be able to confirm something. Can a single partition be on more than one node? not duplicated, but actually half of one partition is on one node and the other half is on another,.. that's not possible, right? it's an integer number of partitions per executor and an integer number of executors per node,.. right? 18:21:25
6 Sep 2022
@_discord_689908456832237592:t2bot.ioStewart Thanks for confirming. It is much appreciated 12:32:25
7 Sep 2022
@_discord_818401984230981642:t2bot.iocheapsolutionarchitect#6849 Hm, why not just using the select and then applying like df.withColumnRenamed(.....) to rename the columns and df.withColumn("mycolumnname",col("mycolumn").cast("integer")) change the b to an Int. You can use Spark-SQL but you also can switch to the Spark-DSL. 07:22:47
@_discord_818401984230981642:t2bot.iocheapsolutionarchitect#6849Redacted or Malformed Event07:58:24
@_discord_818401984230981642:t2bot.iocheapsolutionarchitect#6849 Hm, technically you could access the columns or the schema of a dataframe, but the df is not available until the call of .sql(...). You could specify the schema manually, which is not what you want, I guess. Or maybe start a dummy query which allows you to get the schema from a DF and then design your query string based on that information. 07:59:51
8 Sep 2022
@_discord_818401984230981642:t2bot.iocheapsolutionarchitect#6849 Yeah, probably, I'm not so deep into spark that I have a strong opinion about that ;). 11:15:33
10 Sep 2022
@_discord_957313992915824701:t2bot.ioNio#0287 changed their display name from mugeng to Nio#0287.18:50:32
@_discord_957313992915824701:t2bot.ioNio#0287 set a profile picture.18:50:37
12 Sep 2022
@_discord_469980680944877568:t2bot.ioEmily changed their display name from Emily to Emily#6789.03:04:47
@_discord_469980680944877568:t2bot.ioEmily changed their display name from Emily#6789 to Emily.03:04:57
@_discord_425199610680573954:t2bot.ioWatynecc#8589 changed their profile picture.11:50:06
14 Sep 2022
@_discord_97104885337575424:t2bot.ioskip#2048 changed their display name from skip to skip#2048.07:11:20
19 Sep 2022
@_discord_984595059863326740:t2bot.ioantonkal changed their display name from antonkal to antonkal#9938.17:14:44
@_discord_984595059863326740:t2bot.ioantonkal changed their display name from antonkal#9938 to antonkal.17:14:51

There are no newer messages yet.


Back to Room List