!wypKDDiZJdzZRWebIG:matrix.org

J

330 Members
5 Servers

26 Sep 2022
@_discord_687763954050793501:t2bot.ioRaul (Miller) (Still need to be installed, quite often...) 21:08:27
@_discord_687763954050793501:t2bot.ioRaul (Miller) (But don't be too shy about installing J somewhere.) 21:08:40
27 Sep 2022
@_discord_828684654454112286:t2bot.iobobTerryo You can run J code on the J playground from the browser without J installation. https://jsoftware.github.io/j-playground/bin/html2/ 00:26:30
@_discord_722191022557364315:t2bot.iojpf The arrow/parquet files are memory mapped, and I can even read lines individually in this particular format. The problem isn't so much the ability to work with this data at all, but rather the ability to display and utilize this data in a first-class manner similar to other non-string arrays, without creating a large memory footprint. 01:46:52
@_discord_722191022557364315:t2bot.iojpf This probably involves a helper function that passes through the existing non-string data but concatenates and adds LFs to the read strings. This is then 'non-first-class' as the display differs from the array form (and this seems like a poor replacement for a modern database), but at least it will display. Ultimately, I suppose the actual strings must be boxed if they are to remain ragged. 01:50:14
@_discord_687763954050793501:t2bot.ioRaul (Miller) (sorry, took a nap... I haven't retained a copy of that code. However, like I said, it's easy to model ssndx and ssdir in J. Hang on and I'll write them.) 07:03:49
@_discord_687763954050793501:t2bot.ioRaul (Miller)
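NB. ssndx: one (start, length) row per segment of y; the last item of y is the delimiter
ssndx=: [: (,.~ [: +/\ 0,}:) #;.2
NB. ssdir: x is a table of such rows, y is the string; result is the selected characters
ssdir=: (({~;) <@(+i.)/"1)~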
07:07:56
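For comparison, here is a minimal C sketch of the index-building step, based on one reading of the J definition above: one (start, length) row per segment, with the last byte of the buffer taken as the delimiter (as J's `;.2` does) and the delimiter counted in each length. The names `seg_t` and `ss_index` are hypothetical and are not part of J or of the code discussed in this conversation.

```c
#include <stdlib.h>

/* One index row: where a segment starts and how many bytes it spans. */
typedef struct { size_t start; size_t len; } seg_t;

/* Build one (start, len) row per delimiter-terminated segment of s.
   The delimiter is the last byte of s; each length includes it.
   Returns a malloc'd array (caller frees) and stores the row count in
   *nrows. Returns NULL for empty input or on allocation failure. */
seg_t *ss_index(const char *s, size_t n, size_t *nrows)
{
    *nrows = 0;
    if (n == 0)
        return NULL;

    char delim = s[n - 1];
    size_t count = 0;
    for (size_t i = 0; i < n; i++)      /* first pass: count segments */
        if (s[i] == delim)
            count++;

    seg_t *rows = malloc(count * sizeof *rows);
    if (rows == NULL)
        return NULL;

    size_t start = 0, r = 0;
    for (size_t i = 0; i < n; i++) {    /* second pass: fill the rows */
        if (s[i] == delim) {
            rows[r].start = start;
            rows[r].len = i - start + 1;
            r++;
            start = i + 1;
        }
    }
    *nrows = count;
    return rows;
}
```

Two passes keep the allocation exact: the first counts delimiters, the second records where each segment begins and ends.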
@_discord_687763954050793501:t2bot.ioRaul (Miller) Now, implemented that way, you don't get any efficiency gains -- ssdir, for example, allocates a box for every segmented string and an integer for every character. 07:09:10
@_discord_687763954050793501:t2bot.ioRaul (Miller) That said, Henry Rich has been doing a lot with virtual arrays, which means that at some point some of those intermediate results won't actually exist all at once. I haven't tested to see whether we are there, yet. 07:09:56
@_discord_687763954050793501:t2bot.ioRaul (Miller) But, also, it's straightforward to implement in C. Pseudocode:
sum the second column from ssdir to get the length of the result, and allocate storage for it. Then walk through the rows in sequence, copying (second column) characters starting at position (first column). It might even make sense to build this using i-beam (!:) so that the result can be a native J array with array headers and garbage collection support.
07:14:10
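The pseudocode above could be sketched in plain C roughly as follows. This is only an illustration under stated assumptions: the index is an array of (start, length) rows, the result is an ordinary malloc'd buffer rather than a native J array, and the names `seg_t` and `ss_gather` are hypothetical. It does not use the i-beam (`!:`) interface mentioned in the message.

```c
#include <stdlib.h>
#include <string.h>

/* One index row: first column = start position, second column = length. */
typedef struct { size_t start; size_t len; } seg_t;

/* Copy the segments named by rows, in row order, into one new buffer.
   Returns a NUL-terminated malloc'd string (caller frees) and stores
   its length in *out_len when out_len is non-NULL. Returns NULL if a
   row falls outside src or if allocation fails. */
char *ss_gather(const char *src, size_t src_len,
                const seg_t *rows, size_t nrows, size_t *out_len)
{
    /* Sum the second column to get the length of the result. */
    size_t total = 0;
    for (size_t i = 0; i < nrows; i++) {
        if (rows[i].start > src_len || rows[i].len > src_len - rows[i].start)
            return NULL;                 /* row outside the source */
        total += rows[i].len;
    }

    char *out = malloc(total + 1);
    if (out == NULL)
        return NULL;

    /* Walk the rows in sequence, copying len bytes starting at start. */
    size_t pos = 0;
    for (size_t i = 0; i < nrows; i++) {
        memcpy(out + pos, src + rows[i].start, rows[i].len);
        pos += rows[i].len;
    }
    out[total] = '\0';
    if (out_len != NULL)
        *out_len = total;
    return out;
}
```

With the source `"ab\ncde\nf\n"` and rows `{3, 4}` then `{0, 3}`, the result is the seven bytes `"cde\nab\n"`. Unlike the J model, nothing is allocated per segment or per character: there is one output buffer and one `memcpy` per row.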
@_discord_687763954050793501:t2bot.ioRaul (Miller) A block is just a sub-array, where you break up your work arbitrarily. It's a general technique for working with very large datasets. 07:15:48
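As a small illustration of working block by block, the C sketch below counts delimiter bytes in a large buffer by visiting it in fixed-size blocks and combining the per-block results. The task (counting delimiters), the block size, and the names `count_in_block` and `count_blockwise` are all assumptions chosen for the example; the conversation does not prescribe any of them.

```c
#include <stddef.h>

/* Count occurrences of delim within a single block. */
static size_t count_in_block(const char *block, size_t n, char delim)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (block[i] == delim)
            count++;
    return count;
}

/* Visit data in blocks of at most block_size bytes and add up the
   per-block counts. A block_size of 0 is treated as one single block. */
size_t count_blockwise(const char *data, size_t len,
                       size_t block_size, char delim)
{
    if (block_size == 0)
        block_size = len;

    size_t total = 0;
    size_t off = 0;
    while (off < len) {
        size_t remaining = len - off;
        size_t n = remaining < block_size ? remaining : block_size;
        total += count_in_block(data + off, n, delim);
        off += n;                        /* advance to the next block */
    }
    return total;
}
```

Because each block is handled independently and the results are merely added, the same shape works when the blocks are read from a memory-mapped file one at a time, or handed to separate workers.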
@_discord_687763954050793501:t2bot.ioRaul (Miller) Ah, catching up, I see that you had already talked about that aspect. 07:16:56
@_discord_870115701279584326:t2bot.ioDiscoDoug Just curious. When you use K, which version is it? 07:31:16
@_discord_687763954050793501:t2bot.ioRaul (Miller) I don't, usually -- so I pick up whatever seems handy when I mess with it. 07:40:08
@_discord_687763954050793501:t2bot.ioRaul (Miller) I should add -- conceptually, when working with really large data sets, you might want to arrange so that distinct J instances are processing different parts of the data set -- they might not even be on the same machine. Something like a tensorflow mechanism, I imagine (though I've never messed with tensorflow -- when I was working with multi-terabyte datasets, I did it "the hard way", spinning up dozens of ec2 machines... nowadays, there are a lot more tools for that kind of thing). 07:43:07
@_discord_870115701279584326:t2bot.ioDiscoDoug Ah so that was completely hypothetical. 07:57:09
@_discord_870115701279584326:t2bot.ioDiscoDoug More like “one could..” 08:02:15
@_discord_870115701279584326:t2bot.ioDiscoDoug FWIW, I don’t have a good picture of what you’re hoping to do here but at a high level it feels like the choice is to convert the data wholesale or reference the data and convert when accessing it. Which serves your purpose better probably is dictated by your usage patterns. 08:16:16
@_discord_870115701279584326:t2bot.ioDiscoDoug I’m wrestling with similar questions for an XML parser where I can detect the document structure before parsing individual nodes. My problem is I don’t have any immediate needs to dictate my choices. 08:20:43
@_discord_321366262292938752:t2bot.ioGrahnite That’s strange. I am a noob, so playing around. In the J playground, “httpget” is used, but that fails in my real environment. Had to load the addon “gethttp.ijs” and use the command “gethttp” for it to work (refer to Examples -> CSV in the online playground). I guess version differences? 11:57:34
@_discord_687763954050793501:t2bot.ioRaul (Miller) I think it's an inadvertent change, yes. It might be worth opening a compatibility issue on the playground. 12:04:20
@_discord_722191022557364315:t2bot.iojpf should work:
load'pacman'
'install' jpkg 'http/get'
load'http/get'
gethttp '''www.google.com?q=jsoftware'''
17:18:42
@_discord_722191022557364315:t2bot.iojpf gethttp locally calls HTTPCMD_wgethttp_ (a global set at startup, curl on my machine). If the command line args require quoting, you might need to add quotes to the url string yourself, for URLs that need them on the command line (e.g. if there's a dash in the URL).
17:20:36
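To illustrate the quoting point in general terms, here is a minimal C sketch that wraps a string in single quotes so a POSIX shell passes it to a command as one literal argument. It assumes a POSIX shell is what ends up running the command; `shell_quote` is a hypothetical helper and is not part of J or of the gethttp addon.

```c
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd copy of s wrapped in single quotes for a POSIX
   shell (caller frees). Each embedded single quote becomes '\'' so the
   quoted string stays one literal argument. Returns NULL on failure. */
char *shell_quote(const char *s)
{
    size_t n = 2;                        /* opening and closing quote */
    for (const char *p = s; *p != '\0'; p++)
        n += (*p == '\'') ? 4 : 1;

    char *out = malloc(n + 1);
    if (out == NULL)
        return NULL;

    char *q = out;
    *q++ = '\'';
    for (const char *p = s; *p != '\0'; p++) {
        if (*p == '\'') {
            memcpy(q, "'\\''", 4);       /* close, escaped quote, reopen */
            q += 4;
        } else {
            *q++ = *p;
        }
    }
    *q++ = '\'';
    *q = '\0';
    return out;
}
```

For example, `www.google.com?q=jsoftware` becomes `'www.google.com?q=jsoftware'`, which keeps the shell from treating `?` as a glob character.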
