WebGPU

The future of graphics and compute on the Web! Related rooms: https://matrix.to/#/#webgraphics:matrix.org

25 Jan 2023
@stronglynormal:matrix.orgMike Solomon One follow-up here: why would barriers be necessary on SIMD architectures? My understanding is that if you have an if/else statement and at least one thread runs on each branch, then SIMD will take all threads in a workgroup down both branches and just ignore the irrelevant branch. Wouldn't everything then be synchronized anyway, at which point wouldn't the barriers be unneeded? (I'm positive my understanding of SIMD is wrong here, but it'd be useful to know how it is wrong!) 11:51:26
@jrprice:matrix.orgJames Price The scope of a workgroupBarrier is all invocations in the workgroup. It is likely that the SIMD width of the hardware you are running on would only include a subset of those invocations (depends on the GPU and the workgroup size), and therefore a barrier is necessary.
In the future (post V1), we will have subgroups which will likely align to the SIMD width of the hardware, and a subgroup barrier to synchronize between them would be inexpensive (or free).
11:59:26
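(To make that scope concrete, here is a minimal WGSL sketch; the binding, names, and the workgroup size of 64 are illustrative, not from the conversation. The barrier synchronizes all 64 invocations in the workgroup, which typically spans several hardware SIMD groups.)

```
// Minimal sketch (illustrative names; workgroup size of 64 is arbitrary).
var<workgroup> tile: array<f32, 64>;

@group(0) @binding(0) var<storage, read_write> data: array<f32>;

@compute @workgroup_size(64)
fn main(@builtin(local_invocation_index) lid: u32,
        @builtin(global_invocation_id) gid: vec3<u32>) {
  tile[lid] = data[gid.x];         // each invocation writes one element
  workgroupBarrier();              // all writes to 'tile' are now visible to the whole workgroup
  data[gid.x] = tile[63u - lid];   // safe to read elements written by other invocations
}
```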
@stronglynormal:matrix.orgMike Solomon
In reply to @jrprice:matrix.org
The scope of a workgroupBarrier is all invocations in the workgroup. It is likely that the SIMD width of the hardware you are running on would only include a subset of those invocations (depends on the GPU and the workgroup size), and therefore a barrier is necessary.
In the future (post V1), we will have subgroups which will likely align to the SIMD width of the hardware, and a subgroup barrier to synchronize between them would be inexpensive (or free).
Ok, this makes a lot of sense. I can imagine that 256 (the max) well exceeds the actual SIMD width, at which point the WebGPU API needs to manage the barriers.
How expensive is synchronization currently? Is it something to be avoided unless necessary, or is it cheap enough to use without thinking about it?
12:12:02
@sparkypotato:matrix.orgsparkypotato joined the room.13:40:25
@dneto0:matrix.orgdneto0
In reply to @ghadeer.abousaleh:matrix.org
Portability-related question: this section of the specs is probably the only one that refers to endianness: https://gpuweb.github.io/gpuweb/wgsl/#internal-value-layout ... It suggests little-endianness. Wouldn't this make JavaScript's typed arrays bad for portability of WebGPU apps? I imagine that typed arrays are more efficient than the alternative (data views, which allow you to explicitly specify endianness)
Correct, that's the only one that deals with endianness. And shmookey found the right minutes (to my recollection). Basically we don't see the demand for a big-endian GPU programming model. And adding one significantly increases testing complexity.
15:25:47
@dneto0:matrix.orgdneto0
In reply to @stronglynormal:matrix.org

I'm struggling to understand the relationship between atomics, workgroup barriers, and storage barriers. Several questions:

  • When doing an atomic operation (ie atomicAdd), does it need to be placed before or after a barrier? For example, if I have 100k shader invocations and call var bar = atomicAdd(&foo, 1); in each one, is it guaranteed that bar will be a unique u32 in each shader that increments by 1 across the set of invocations (ie if the shader runs 200000 times and foo starts at 0 then it will be 200000 by the end and each shader will have a unique bar?), and are barriers necessary to achieve this?
  • Where are atomics for workgroups declared? I see how to declare them in storage, ie a storage array<atomic<u32>>, but I'm not sure how/where to declare them for workgroup memory.
  • How do storageBarrier and workgroupBarrier work across workgroups? My assumption is that workgroupBarrier only applies to a single workgroup, but does storageBarrier span all workgroups in a dispatch? ie if I do x.dispatchWorkgroups(1024,1024,1024), will all 1073741824 workgroups be run until the storage barrier and then resumed after the last one hits it?
  • Are workgroup barriers faster than storage barriers? Are n atomic operations on a workgroup variable faster than n atomic operations across different threads on a storage variable?
  • My limited use of atomics has led me to radically different memory profiles - sometimes they hardly matter, and sometimes their use takes me from 60fps to 2fps. Is there an "atomics best practices" set of guidelines that can help plan out where they're most effective (ie how often a single atomic can be interacted with, how many atomics per shader, uniformity and atomics, etc).

Thanks in advance for any advice y'all can give!

Re: storageBarrier vs. workgroupBarrier. I filed an issue so the answer lives in a more permanent place. See my reply at https://github.com/gpuweb/gpuweb/issues/3774#issuecomment-1403887129. Basically, they can't be used to coordinate access across workgroups.
16:31:41
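(A hedged WGSL sketch of the points above; all names, bindings, and sizes are invented for illustration. A workgroup-scope atomic is declared as a module-scope var<workgroup>, atomicAdd returns the old value so each invocation that executes it gets a distinct result without any barrier, and neither barrier coordinates separate workgroups.)

```
// Illustrative only (names, bindings, and sizes are not from the thread).
var<workgroup> local_count: atomic<u32>;   // workgroup-scope atomic

@group(0) @binding(0) var<storage, read_write> counters: array<atomic<u32>>;
@group(0) @binding(1) var<storage, read_write> tickets:  array<u32>;

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
  // atomicAdd returns the value held before the addition, so every
  // invocation that executes this line gets a distinct ticket; no
  // barrier is needed for that guarantee.
  tickets[gid.x] = atomicAdd(&counters[0], 1u);

  // The workgroup-scope atomic is used the same way; its result is
  // discarded here with a phony assignment.
  _ = atomicAdd(&local_count, 1u);

  // workgroupBarrier()/storageBarrier() only order memory for invocations
  // within one workgroup; neither synchronizes different workgroups.
  workgroupBarrier();
}
```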
@dneto0:matrix.orgdneto0
In reply to @stronglynormal:matrix.org

I'm struggling to understand the relationship between atomics, workgroup barriers, and storage barriers. Several questions: (same message as quoted in full above)

Re: "Are workgroup barriers faster that storage barriers? Are n atomic operations on a workgroup variable faster than n atomic operations across different threads on a storage variable?" I think it depends a lot on your application (sorry). Structurally workgroup address space is only visible from that workgroup, but storage buffer memory is ultimately addressible from any invocation in the device. Depending on how the GPU is architected it's possible for access to storage buffers is slower. Implicitly there's the question of: is there enough work that can be done within a workgroup so that it's worth temporarily moving the data from storage buffer to workgroup, and then copying results back out. It's very app dependent.
17:35:59
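(A sketch of the staging pattern described above, assuming a simple element-wise example; names and sizes are illustrative: copy a tile from the storage buffer into workgroup memory, do enough work there to pay for the copies, then write the results back.)

```
// Illustrative staging pattern: stage a tile from a storage buffer into
// workgroup memory, work on it locally, then copy results back out.
const TILE: u32 = 64u;

var<workgroup> tile: array<f32, TILE>;

@group(0) @binding(0) var<storage, read>       src: array<f32>;
@group(0) @binding(1) var<storage, read_write> dst: array<f32>;

@compute @workgroup_size(64)
fn main(@builtin(local_invocation_index) lid: u32,
        @builtin(workgroup_id) wid: vec3<u32>) {
  let base = wid.x * TILE;
  tile[lid] = src[base + lid];   // stage into workgroup memory
  workgroupBarrier();            // make the whole tile visible to the workgroup

  // Do enough work against 'tile' here to pay for the copies in and out;
  // this toy step just mixes neighbouring elements.
  let v = tile[lid] + tile[(lid + 1u) % TILE];

  dst[base + lid] = v;           // copy the result back to storage
}
```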
26 Jan 2023
@kohakukun:matrix.orgkohakukunHey folks, I know that isInf and isNan were available at some point but were later deprecated. My question is how one could implement them. Is there a constant that one can check against to implement this functionality?17:32:27
@ben-clayton:matrix.orgBen ClaytonUnfortunately there is no reliable and portable way - which is the reason they were removed from core.17:38:48
@cwfitzgerald:matrix.orgcwfitzgeraldYour best bet is checking before the NaN is created.18:53:51
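(A hedged WGSL sketch of that suggestion; safe_normalize and its threshold are invented for illustration. Testing x != x after the fact is not reliable, since WGSL implementations may assume NaN never occurs, so the inputs that could produce a NaN are validated instead.)

```
// Illustrative: avoid producing a NaN instead of detecting one afterwards.
fn safe_normalize(v: vec3<f32>) -> vec3<f32> {
  let len = length(v);
  // Guard the division that would otherwise yield NaN/Inf for tiny vectors.
  if (len < 1e-6) {
    return vec3<f32>(0.0, 0.0, 1.0);   // arbitrary fallback direction
  }
  return v / len;
}
```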
@unevenprankster:matrix.org@unevenprankster:matrix.org left the room.23:04:46
27 Jan 2023
@mehmetoguzderin:matrix.orgmehmetoguzderin changed their profile picture.07:34:54
@kohakukun:matrix.orgkohakukun
In reply to @ben-clayton:matrix.org
Unfortunately there is no reliable and portable way - which is the reason they were removed from core.
Thanks
09:45:45
@kohakukun:matrix.orgkohakukunAnother question. Today I started my Chrome Canary and it seems it was automatically updated, and I see that the deprecated support for SPIR-V has been removed. Is there a chance this can be enabled as a developer flag? I'm working on a project that would take a long time to move completely to WGSL. I also tried WebGPU on Mac, but it doesn't seem to be supported yet. Any suggestions?09:47:56
@ben-clayton:matrix.orgBen ClaytonDawn and Naga both provide offline tooling to convert from SPIR-V to WGSL.09:49:12
@ben-clayton:matrix.orgBen ClaytonI believe some have successfully built both as WASM, which enables SPIR-V consumption in the browser.09:49:59
@kohakukun:matrix.orgkohakukunHmm, what do you mean? Actually my project is a C++ -> WebAssembly initiative09:50:49
@kohakukun:matrix.orgkohakukunIt's on my todo list to try Naga as a runtime porting layer09:51:15
@kohakukun:matrix.orgkohakukun but even going through SPIR-V in WebAssembly returns Unsupported sType (SType::ShaderModuleSPIRVDescriptor). Expected (SType::ShaderModuleWGSLDescriptor) 09:52:47
@ben-clayton:matrix.orgBen ClaytonTint is the compiler for Dawn. It can be fetched and built from https://dawn.googlesource.com/tint, and it has an executable that can consume SPIR-V and emit WGSL.09:53:46
@kohakukun:matrix.orgkohakukunThanks for the help. Let me try that out. I'm already using Dawn as a library for the C++ app, so it should be easy to try :) 09:54:39
@ben-clayton:matrix.orgBen Clayton(That Tint repo is a cut-down version of Dawn - it's updated with Dawn's changes, daily)09:55:28
@shmookey:matrix.orgshmookey I have a line in a shader that looks like let light = lighting.data[i]; where lighting is a uniform buffer. Tint expands that in HLSL with a call to a helper function that copies the entire buffer - it takes lighting by value, which ultimately compiles into hundreds of consecutive MOVs. By contrast, the similar materials buffer is read-only storage, and the helper function generated for it seems to take the buffer by reference. Is this expected behavior / how should I be writing this sort of thing? If it's expected, are these semantics in the spec? 16:49:56
@ben-clayton:matrix.orgBen Clayton As this is specific to Dawn / Tint - the Dawn chat room might be a better place to ask this. That said - to help me reproduce - what does your lighting uniform buffer look like? 16:56:40
@shmookey:matrix.orgshmookey

Ah whoops, wrong channel - the buffer looks something like this:

struct Light {
  count: u32,
  data:  array<Light, 32>,
}

Light has a bunch of things; it adds up to 120 bytes excluding padding. Not sure if you need that, as this behaviour seems to happen no matter what's in there.

17:02:27
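(For context, a hedged WGSL sketch of the two binding styles being compared; the structs and bindings are invented rather than taken from shmookey's shader, and the outer struct is given a distinct name here. The same kind of array access goes through var<uniform> in one case and var<storage, read> in the other, which is where Tint's HLSL output is reported to differ.)

```
// Illustrative bindings, not the original shader.
struct Light {
  position : vec3<f32>,
  color    : vec3<f32>,
}

struct LightingData {
  count : u32,
  data  : array<Light, 32>,
}

@group(0) @binding(0) var<uniform>       lighting  : LightingData;
@group(0) @binding(1) var<storage, read> materials : array<vec4<f32>>;

@fragment
fn fs_main() -> @location(0) vec4<f32> {
  // Indexing an array in a uniform buffer; Tint's HLSL backend may lower
  // this differently from the read-only storage access below.
  let light = lighting.data[0];
  let mat   = materials[0];
  return vec4<f32>(light.color, 1.0) * mat;
}
```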
@ben-clayton:matrix.orgBen ClaytonI have to go AFK for a little bit. If possible, please can you file a minimal repro at crbug.com/tint? I'm curious to see these large-data copies.17:03:00
