!CJoUbovqKaCGrFkbrY:matrix.org

Spark with Scala

400 Members
A place to discuss and ask questions about using Scala for Spark programming.
3 Servers



Sender / Message / Time
26 May 2022
@_discord_547027845034278912:t2bot.io paddenpaadje: I'm trying to do something like this. It should join and the table should be full, but it's empty. 20:57:26
@_discord_547027845034278912:t2bot.io paddenpaadje: I have no idea what I'm doing wrong lol 20:57:41
@_discord_547027845034278912:t2bot.io paddenpaadje: [image: unknown.png] 20:58:50
@_discord_243951995709292544:t2bot.io Majestic#1066: And addressDS is a Dataset[RawAddress], I guess? 21:02:46
@_discord_547027845034278912:t2bot.io paddenpaadje: yes 21:02:52
@_discord_547027845034278912:t2bot.io paddenpaadje: [image: unknown.png] 21:03:28
@_discord_547027845034278912:t2bot.io paddenpaadje: also 21:03:28
@_discord_243951995709292544:t2bot.io Majestic#1066: Have you tried explicitly declaring the join condition? Like ds1.col("idLeft") === ds2.col("idRight") 21:03:34
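For reference, a minimal sketch of an explicit join condition in the Dataset API, assuming spark-sql is on the classpath. The case classes `RawPerson` and the fields of `RawAddress`, the column names `addressId`/`id`, and the sample rows are all hypothetical stand-ins, since the real schemas only appear in the screenshots. Column equality in Scala is `===`, not `=`:

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

object ExplicitJoin {
  // Hypothetical schemas: only RawAddress is named in the chat; its fields are assumed.
  case class RawPerson(name: String, addressId: Option[Long])
  case class RawAddress(id: Long, street: String)

  def joinPeople(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val personDS: Dataset[RawPerson] =
      Seq(RawPerson("Ann", Some(10L)), RawPerson("Bob", None)).toDS()
    val addressDS: Dataset[RawAddress] = Seq(RawAddress(10L, "Main St")).toDS()

    // `===` builds a Column equality expression; a single `=` does not compile
    // as a join condition, and `==` would only yield a plain Boolean.
    personDS.join(addressDS, personDS.col("addressId") === addressDS.col("id"), "inner")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("explicit-join").getOrCreate()
    joinPeople(spark).show() // only Ann's row: Bob's addressId is null, so it never matches
    spark.stop()
  }
}
```

Note that an `Option[Long]` key joins against a plain `Long` key without any conversion here; `None` simply becomes null and drops out of an inner join.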
@_discord_547027845034278912:t2bot.io paddenpaadje: yes 21:03:39
@_discord_547027845034278912:t2bot.io paddenpaadje: doesn't work 21:03:42
@_discord_243951995709292544:t2bot.io Majestic#1066: Are your IDs actually matching? 21:06:16
@_discord_243951995709292544:t2bot.io Majestic#1066: (By that I mean: yes, it should join, so maybe the problem is in the data itself?) 21:07:05
@_discord_547027845034278912:t2bot.io paddenpaadje: I've checked some records, and there should be a lot of matching data. 21:07:20
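One way to check "there should be a lot of matching data" inside Spark rather than by eye: a `left_semi` join keeps the left rows that have a match, and a `left_anti` join keeps the ones that don't. A sketch under the same assumptions (spark-sql on the classpath; `RawPerson`, the field names, and the sample IDs are hypothetical):

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object KeyOverlap {
  // Hypothetical schemas mirroring the chat's Long vs Option[Long] keys.
  case class RawPerson(name: String, addressId: Option[Long])
  case class RawAddress(id: Long, street: String)

  /** Returns (left rows with a match, left rows without a match). */
  def overlap(personDS: Dataset[RawPerson], addressDS: Dataset[RawAddress]): (Long, Long) = {
    val cond = personDS.col("addressId") === addressDS.col("id")
    val matched = personDS.join(addressDS, cond, "left_semi").count()
    // Rows whose addressId is null also land here, since null never equals anything.
    val unmatched = personDS.join(addressDS, cond, "left_anti").count()
    (matched, unmatched)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("key-overlap").getOrCreate()
    import spark.implicits._
    val people = Seq(RawPerson("Ann", Some(10L)), RawPerson("Cid", Some(99L))).toDS()
    val addresses = Seq(RawAddress(10L, "Main St")).toDS()
    println(overlap(people, addresses))
    // Show the non-matching rows themselves to see what their keys look like.
    people.join(addresses, people.col("addressId") === addresses.col("id"), "left_anti").show()
    spark.stop()
  }
}
```

If the anti-join count is close to the total row count, the join itself is fine and the keys genuinely differ as Spark sees them.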
@_discord_547027845034278912:t2bot.io paddenpaadje: [image: unknown.png] 21:07:29
@_discord_547027845034278912:t2bot.io paddenpaadje: what I'm trying to achieve 21:07:29
@_discord_243951995709292544:t2bot.io Majestic#1066: Could you check by collecting and comparing in plain Scala? 21:13:32
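Once the distinct keys are collected to the driver, the comparison is ordinary Scala set arithmetic. A sketch of that last step; the collection calls in the comment use hypothetical field names and only make sense when the key sets fit in driver memory:

```scala
object CollectedKeyCheck {
  /** Compares key sets already collected to the driver, e.g. (hypothetical fields):
    *   personDS.flatMap(_.addressId.toList).distinct().collect()
    *   addressDS.map(_.id).distinct().collect()
    * Returns (shared keys, left-only keys, right-only keys).
    */
  def compare(leftKeys: Seq[Long], rightKeys: Seq[Long]): (Set[Long], Set[Long], Set[Long]) = {
    val l = leftKeys.toSet
    val r = rightKeys.toSet
    (l intersect r, l diff r, r diff l)
  }

  def main(args: Array[String]): Unit = {
    val (shared, leftOnly, rightOnly) = compare(Seq(1L, 2L, 3L), Seq(2L, 3L, 4L))
    println(s"shared=$shared leftOnly=$leftOnly rightOnly=$rightOnly")
  }
}
```

An empty `shared` set would mean the join result is correct and the values differ once parsed, even if they look alike in the CSV files.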
@_discord_547027845034278912:t2bot.io paddenpaadje: I'll try going back to the case class and changing the types, since one is Long and the other is Option[Long]. 21:13:56
@_discord_547027845034278912:t2bot.io paddenpaadje: I've checked manually in the CSV files. 21:14:24
@_discord_243951995709292544:t2bot.io Majestic#1066: Maybe I'm missing something, but I thought this shouldn't matter much. 21:16:10
@_discord_547027845034278912:t2bot.io paddenpaadje: I thought that too; when I print the schema, it says both are Long. 21:16:43
@_discord_547027845034278912:t2bot.io paddenpaadje: let me try it 21:16:45
@_discord_547027845034278912:t2bot.io paddenpaadje: I mean, I can't think of anything else that could be causing the issue. 21:17:07
@_discord_243951995709292544:t2bot.io Majestic#1066: Yes, in the DataFrame it will just appear as long (nullable = true). 21:18:08
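To see what the encoders derive for the two field types without starting a session, `Encoders.product` exposes the schema of a case class. A sketch, assuming spark-sql on the classpath and hypothetical case classes with one `Long` key and one `Option[Long]` key as described in the chat:

```scala
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.types.StructType

object NullabilityCheck {
  // Hypothetical case classes: one key is Long, the other Option[Long].
  case class RawAddress(id: Long, street: String)
  case class RawPerson(name: String, addressId: Option[Long])

  def keySchemas(): (StructType, StructType) =
    (Encoders.product[RawAddress].schema, Encoders.product[RawPerson].schema)

  def main(args: Array[String]): Unit = {
    val (addr, person) = keySchemas()
    // Both keys are LongType; only the nullable flag differs.
    for (f <- Seq(addr("id"), person("addressId")))
      println(s"${f.name}: ${f.dataType.typeName}, nullable=${f.nullable}")
  }
}
```

Since the underlying type is the same, the `Option` only changes nullability, which matches the observation that switching the case class types did not fix the empty join.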
@_discord_243951995709292544:t2bot.io Majestic#1066: Unless the cast isn't going well, but that would be surprising. 21:18:42
@_discord_547027845034278912:t2bot.io paddenpaadje: lol, that's not it 21:26:54
@_discord_547027845034278912:t2bot.io paddenpaadje: tried it 21:27:05
@_discord_243951995709292544:t2bot.io Majestic#1066: You can go full "manual join" by doing two groupByKey calls and a cogroup (and drop into a debugger there). Then you will see what matches and maybe understand what's going wrong. 21:34:23
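The "manual join" suggestion can be sketched with `groupByKey` on each side and `KeyValueGroupedDataset.cogroup`, which calls the function once per key present on either side. Assumptions: spark-sql on the classpath, hypothetical `RawPerson`/`RawAddress` fields, and a made-up sentinel `-1L` for a missing `addressId` (assuming real IDs are never negative):

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object CogroupDebug {
  // Hypothetical schemas; -1L is a made-up sentinel for a missing addressId.
  case class RawPerson(name: String, addressId: Option[Long])
  case class RawAddress(id: Long, street: String)

  /** One row per key: (key, rows on the left, rows on the right). */
  def keyReport(spark: SparkSession,
                personDS: Dataset[RawPerson],
                addressDS: Dataset[RawAddress]): Dataset[(Long, Int, Int)] = {
    import spark.implicits._
    val byAddress = personDS.groupByKey(_.addressId.getOrElse(-1L))
    val byId = addressDS.groupByKey(_.id)
    byAddress.cogroup(byId) { (key, people, addresses) =>
      // Runs once per key; in local mode a breakpoint here shows both sides of that key.
      Iterator((key, people.size, addresses.size))
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("cogroup-debug").getOrCreate()
    import spark.implicits._
    val people = Seq(RawPerson("Ann", Some(10L)), RawPerson("Cid", Some(99L))).toDS()
    val addresses = Seq(RawAddress(10L, "Main St"), RawAddress(20L, "Elm St")).toDS()
    // Keys with a zero on one side are the ones the inner join drops.
    keyReport(spark, people, addresses).show()
    spark.stop()
  }
}
```

Unlike an inner join, this keeps unmatched keys from both sides, so the report shows directly which IDs exist only in one dataset.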
@_discord_547027845034278912:t2bot.io paddenpaadje: thanks for the idea, will try it 21:58:39



Room Version: 9