7 Mar 2017 |
@gitter_neverfox:matrix.org | what would you say the relative impact is on performance of graph optimization vs just having the ability to calculate on a fast backend? | 18:39:01 |
@gitter_neverfox:matrix.org | right, which is the MXNet model | 18:39:18 |
@gitter_botev:matrix.org | so if the backend is very fast you can potentially get away with not too much impact | 18:39:41 |
@gitter_botev:matrix.org | however memory optimization is not possible on the fly | 18:39:51 |
@gitter_botev:matrix.org | since you don't know whether a value will still be needed later, e.g. for gradients | 18:40:13 |
@gitter_botev:matrix.org | when the graph is completed you can look back and say - ah, this is no longer needed after step X, so I can recycle its memory | 18:40:34 |
@gitter_botev:matrix.org | this is why, for instance, Theano and TensorFlow use roughly half the memory compared to PyTorch | 18:40:51 |
@gitter_neverfox:matrix.org | gotcha | 18:41:01 |
@gitter_botev:matrix.org | MXNet uses even less, as it has even more aggressive memory optimization | 18:41:03 |
@gitter_neverfox:matrix.org | and is that an easy thing to do? | 18:41:07 |
@gitter_botev:matrix.org | actually that is relatively easy yes | 18:41:18 |
@gitter_botev:matrix.org | if you have a given operator schedule | 18:41:30 |
@gitter_botev:matrix.org | basically you know the last operation each tensor is part of | 18:41:39 |
@gitter_botev:matrix.org | so you know that after that it can be dropped, or even reused for an in-place operation | 18:41:49 |
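[editor's note] The last-use bookkeeping described above can be sketched in a few lines of Rust. This is purely illustrative (the `Op` type and names are hypothetical, not GIR's actual implementation): given a linear operator schedule, record the last step at which each tensor is read or written, so its buffer can be freed or reused in place afterwards.

```rust
use std::collections::HashMap;

/// One step in the schedule: the tensor it produces and the tensors it reads.
/// (Hypothetical type for illustration; not from GIR.)
struct Op {
    output: &'static str,
    inputs: Vec<&'static str>,
}

/// Map each tensor to the index of the last op that reads or writes it.
/// After that step, the tensor's buffer can be recycled.
fn last_use(schedule: &[Op]) -> HashMap<&'static str, usize> {
    let mut last = HashMap::new();
    for (i, op) in schedule.iter().enumerate() {
        last.insert(op.output, i);
        for &t in &op.inputs {
            last.insert(t, i);
        }
    }
    last
}

fn main() {
    // Tiny forward pass: h = f(x, w1); y = g(h, w2); loss = l(y).
    let schedule = vec![
        Op { output: "h", inputs: vec!["x", "w1"] },
        Op { output: "y", inputs: vec!["h", "w2"] },
        Op { output: "loss", inputs: vec!["y"] },
    ];
    for (tensor, step) in last_use(&schedule) {
        println!("{} is dead after step {}", tensor, step);
    }
}
```

Note this is only possible once the whole graph is known: on-the-fly execution cannot see that `x` is never used again after step 0.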
16 Mar 2017 |
@gitter_jonysy:matrix.org | Parenchyma 0.0.3! 🎉✌️ https://github.com/lychee-eng/parenchyma | 22:34:06 |
@gitter_botev:matrix.org | congrats :clap: | 22:42:07 |
17 Mar 2017 |
@gitter_jonysy:matrix.org | Thanks! | 01:52:47 |
@gitter_jonysy:matrix.org | Here’s another graph-based ML library https://github.com/millardjn/alumina | 12:57:49 |
@gitter_botev:matrix.org | that one seems to be mainly targeting Rust | 14:19:18 |
@gitter_jonysy:matrix.org | @botev There’s no reason the sigmoid function found in Leaf couldn’t be converted to GIR and then compiled to a CUDA/OpenCL kernel, correct? | 16:23:33 |
@gitter_jonysy:matrix.org | While keeping Leaf’s API | 16:26:15 |
@gitter_jonysy:matrix.org | @botev I checked out your arrayfire crate for GIR. You aren’t doing any source/kernel generation; you’re simply using arrayfire Arrays instead of compiling the source to a kernel and then loading it in ArrayFire. Is there a reason for that? | 16:51:51 |
@gitter_botev:matrix.org | you can not do the kernel generation in ArrayFire | 16:52:14 |
@gitter_botev:matrix.org | the reason is that ArrayFire makes it easy to get things going, as it implements all of this and works on anything | 16:52:33 |
@gitter_botev:matrix.org | I'm currently working on the opencl bit | 16:52:40 |
@gitter_botev:matrix.org | where kernel generation will happen | 16:52:50 |
@gitter_botev:matrix.org | Arrayfire is a nice abstraction to use, and to show how the graph works, without needing to do kernel generation | 16:53:21 |
@gitter_jonysy:matrix.org | I understand. You’re basically creating a heavily optimized transpiler, which is a huge undertaking | 16:55:41 |
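[editor's note] A minimal sketch of what "kernel generation" could mean here, in Rust: emitting OpenCL C source for an elementwise sigmoid node as a string, which a backend could then compile and load at runtime. The function name and structure are hypothetical, not GIR's actual codegen.

```rust
/// Generate OpenCL C source for an elementwise sigmoid kernel.
/// (Illustrative only; a real graph compiler would walk the graph,
/// fuse elementwise ops, and handle dtypes and shapes.)
fn sigmoid_kernel_source(name: &str) -> String {
    let mut src = String::new();
    src.push_str(&format!(
        "__kernel void {}(__global const float* x, __global float* y) {{\n",
        name
    ));
    src.push_str("    size_t i = get_global_id(0);\n");
    src.push_str("    y[i] = 1.0f / (1.0f + exp(-x[i]));\n");
    src.push_str("}\n");
    src
}

fn main() {
    // The generated text would be handed to an OpenCL runtime
    // (e.g. clCreateProgramWithSource) for JIT compilation.
    println!("{}", sigmoid_kernel_source("sigmoid_f32"));
}
```

The "transpiler" framing fits: the graph is the source language and OpenCL C (or CUDA/PTX) is the target, with optimizations like memory reuse and kernel fusion applied in between.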