!NYhrWUmbYvYAjnzCQy:matrix.org

bsky-core-team-take-2

1456 Members
6 Servers



Sender | Message | Time
2 Jan 2022
@_discord_927307986056151051:t2bot.io (wjbright) | joined the room. | 21:17:37
3 Jan 2022
@_discord_878325096547254272:t2bot.io (argaremon) | joined the room. | 12:43:19
@_discord_890984699705561199:t2bot.io (Perry) | joined the room. | 14:20:47
@x1zpsj0:matrix.org (x1zpsj0) | joined the room. | 16:22:29
@_discord_769683414621356072:t2bot.io (Darwin aka pepe) | Hello! Is there by any chance a data template for what the content of the BlueSky protocol will look like? 🙂 | 20:26:18
@_discord_769683414621356072:t2bot.io (Darwin aka pepe) | I think probably application/json, but is there anything like a post-type, a body, etc.? | 20:27:17
4 Jan 2022
@_discord_424006809934888972:t2bot.io (BOYMETAVERSE) | joined the room. | 06:23:41
@_discord_890837427097321502:t2bot.io (saraverse) | joined the room. | 06:56:57
@_discord_890837427097321502:t2bot.io (saraverse) | Has anyone tried the ERC 725 Identity Standard? | 06:57:22
@_discord_140583017520300033:t2bot.io (Flowers) | joined the room. | 10:58:24
@thib:ergaster.org (Thib) | joined the room. | 12:53:27
@_discord_419671754072653826:t2bot.io (ResonateSky) | joined the room. | 17:20:19
@_discord_835289074768019496:t2bot.io (AaronDGoldman#8819) | There are no schemas yet, but it will likely be split into layers: for example, allowing application/json or application/cbor as the document format, and then a schema on top that is defined in terms of strings, lists, and dicts. For example, a post could be any dict with an author and a body {"author", "body"}, but you may get more power by having those fields comply with a schema that gives them more meaning, like the author being a DID or the body being Content-Typed:
{
  "author": "did:some_method/some/path",
  "body": {
    "Content-Type text/plain; charset=UTF-8": "Hello World"
  },
  "signature": "whatever the did needs (keys to include + sig)"
}

We should be able to evolve the schema and the representation independently. The signature should not depend on whether the data is in JSON, CBOR, Parquet, or a proprietary index/datastore.
https://github.com/blueskyCommunity/aozora/blob/main/anatomy-of-a-social-network.md#the-document-web

[personal opinion]
Explicit post-types are an anti-pattern: a post is likely to match many different schemas, and tools only care about the ones they care about. The fact that a post is a subclass with an extra field could be totally irrelevant. If your thing is of type mammal and a tool is scanning for all animals, it should not fail to match just because the post-type string is "mammal" rather than "animal".
19:34:53
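To make the [personal opinion] concrete, here is a minimal Python sketch of structural schema matching. Instead of comparing a declared post-type string, a tool checks for the fields it needs. The schema names and helpers (`is_post`, `is_did_post`, `matching_schemas`) are hypothetical illustrations, not a Bluesky API.

```python
from typing import Any, Callable, Dict, List

# Hypothetical schemas: each one is a predicate over decoded data (dicts, lists, strings).
Schema = Callable[[Any], bool]


def is_post(value: Any) -> bool:
    """Any dict with an author and a body counts as a post."""
    return isinstance(value, dict) and "author" in value and "body" in value


def is_did_post(value: Any) -> bool:
    """A stricter schema layered on top: the author must look like a DID."""
    return (
        is_post(value)
        and isinstance(value["author"], str)
        and value["author"].startswith("did:")
    )


def matching_schemas(value: Any, schemas: Dict[str, Schema]) -> List[str]:
    """Names of every schema the value satisfies; no post-type string is consulted."""
    return [name for name, check in schemas.items() if check(value)]


SCHEMAS: Dict[str, Schema] = {"post": is_post, "did_post": is_did_post}

doc = {
    "author": "did:some_method/some/path",
    "body": "Hello World",
    "extra_field": "irrelevant to tools that only look for posts",
}
print(matching_schemas(doc, SCHEMAS))  # ['post', 'did_post']
```

The extra field does not stop `doc` from matching the plain `post` schema, just as a mammal should still turn up in a scan for animals.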
@_discord_712939614985388042:t2bot.io (ianopolous#1330) | joined the room. | 19:40:35
@_discord_712939614985388042:t2bot.io (ianopolous#1330) |
> The signature should not depend on whether the data is in json, cbor
19:40:35
@_discord_712939614985388042:t2bot.io (ianopolous#1330) | How do you plan to achieve this? | 19:40:55
@_discord_835289074768019496:t2bot.io (AaronDGoldman#8819) | My personal plan would be a hash based on the list or dict structure. For a list, hash each element, then hash each pair of hashes, then the pairs of those, building a Merkle tree.
For dicts, hash the keys and build a radix tree or prefix trie over them; then hash the key-value pairs and walk up the tree calculating hashes. Strings/bytes get split into blocks, and we hash the list[blocks].

Now you have a hash that depends on the list order and the dict keys but not on the encoding, and where a small chunk can be validated without sending the whole object across the wire. I could have a big structure like dict[twitter_user_id][tweet_id] = tweet and prove that a tweet is in it, and by a specific user, with (log2(||user_ids||) + log2(||tweet_ids for user||)) * hash_size bytes plus the tweet. Even large byte arrays or strings could have a small proof of a slice in the middle of the bytes/string, by sending the Merkle path.
20:37:22
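The structure-based hash described above can be sketched in Python. Assumptions not in the chat: SHA-256 as the hash, 1 KiB blocks for strings/bytes, short type tags mixed into each hash, and dict entries ordered by key hash as a simple stand-in for the radix tree/prefix trie. `structural_hash` and the helper names are hypothetical.

```python
import hashlib
import json
from typing import Any, List

BLOCK_SIZE = 1024  # assumption: strings/bytes are split into 1 KiB blocks


def h(*parts: bytes) -> bytes:
    """SHA-256 over the concatenation of the given byte strings."""
    return hashlib.sha256(b"".join(parts)).digest()


def merkle_root(hashes: List[bytes]) -> bytes:
    """Hash each pair of hashes, then pairs of those, until one root remains.
    An odd hash at the end of a level is carried up unchanged."""
    if not hashes:
        return h(b"empty")
    level = hashes
    while len(level) > 1:
        nxt = [h(b"node", level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


def hash_blocks(tag: bytes, data: bytes) -> bytes:
    """Split a byte array into blocks and hash the list[blocks]."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    return h(tag, merkle_root([h(b"leaf", b) for b in blocks]))


def structural_hash(value: Any) -> bytes:
    """Hash decoded data by its list/dict/string structure, independent of the wire encoding."""
    if isinstance(value, str):
        return hash_blocks(b"str", value.encode("utf-8"))  # string = UTF-8 byte array
    if isinstance(value, (bytes, bytearray)):
        return hash_blocks(b"bytes", bytes(value))
    if isinstance(value, list):
        return h(b"list", merkle_root([structural_hash(v) for v in value]))
    if isinstance(value, dict):
        # Simplification: entries ordered by key hash instead of a radix tree/prefix trie.
        pairs = sorted((structural_hash(k), structural_hash(v)) for k, v in value.items())
        return h(b"dict", merkle_root([h(b"pair", k, v) for k, v in pairs]))
    raise TypeError(f"unsupported type: {type(value).__name__}")


a = json.loads('{"author": "did:example:alice", "body": "Hello World"}')
b = json.loads('{ "body" : "Hello World",\n  "author" : "did:example:alice" }')
print(structural_hash(a) == structural_hash(b))  # True: key order and whitespace don't matter
```

Because the hash is computed over decoded values, the same post parsed from JSON or from any other encoding yields the same digest. A signature over that digest would not depend on the wire format.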
@_discord_835289074768019496:t2bot.io (AaronDGoldman#8819) | I should make slides for this; I don't think I am explaining it well for anyone who does not already know what a Merkle tree is. 🤔 | 20:38:24
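For readers who have not met Merkle trees, here is a small Python sketch of the proof idea from the earlier message. To show that one leaf is in a tree, you send only the sibling hash at each level, about log2(n) hashes, and the receiver recomputes the root. The names (`build_levels`, `prove`, `verify`) are hypothetical. For brevity, leaf and interior hashes are not domain-separated here; designs such as RFC 6962 (Certificate Transparency) prefix them differently.

```python
import hashlib
from typing import List, Tuple

Proof = List[Tuple[bool, bytes]]  # (sibling_is_on_the_left, sibling_hash) per level


def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()


def build_levels(leaves: List[bytes]) -> List[List[bytes]]:
    """All levels of the tree, leaves first; an odd last node is carried up as-is."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        level = levels[-1]
        nxt = [h(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        levels.append(nxt)
    return levels


def prove(leaves: List[bytes], index: int) -> Proof:
    """Collect the sibling hash at each level on the path from leaf `index` to the root."""
    proof: Proof = []
    for level in build_levels(leaves)[:-1]:
        sibling = index ^ 1
        if sibling < len(level):  # no sibling means the node was carried up
            proof.append((sibling < index, level[sibling]))
        index //= 2
    return proof


def verify(leaf: bytes, proof: Proof, root: bytes) -> bool:
    """Recompute the root from one leaf plus its Merkle path."""
    acc = leaf
    for sibling_is_left, sibling in proof:
        acc = h(sibling, acc) if sibling_is_left else h(acc, sibling)
    return acc == root


tweets = [h(f"tweet {i}".encode()) for i in range(1000)]
root = build_levels(tweets)[-1][0]
path = prove(tweets, 742)
print(len(path), verify(tweets[742], path, root))  # 10 True
```

Proving one tweet out of 1000 takes 10 sibling hashes (320 bytes with SHA-256) instead of the whole list.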
@_discord_835289074768019496:t2bot.io (AaronDGoldman#8819) | Note: this pattern will have lots of branch mispredictions and be much slower than hashing the bytes of an already canonicalised JSON or CBOR document. | 20:41:53
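For comparison, the faster baseline this note refers to is one sequential hash over canonical bytes. The sketch below assumes one possible JSON canonicalisation: sorted keys, compact separators, and UTF-8. It is not RFC 8785 (JCS), which also pins down number formatting and orders keys by UTF-16 code units. The function names are hypothetical.

```python
import hashlib
import json
from typing import Any


def canonical_json_bytes(value: Any) -> bytes:
    """One possible canonicalisation: sorted keys, no insignificant whitespace, UTF-8."""
    return json.dumps(
        value, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def document_hash(value: Any) -> str:
    """Hash the whole serialised document in one sequential pass."""
    return hashlib.sha256(canonical_json_bytes(value)).hexdigest()


doc = json.loads('{ "b": 1, "a": [1, 2] }')
print(canonical_json_bytes(doc))  # b'{"a":[1,2],"b":1}'
```

This is a single tight loop over contiguous bytes, which is why it is fast. The trade-off is that any change, however small, means re-hashing the entire document.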
@_discord_835289074768019496:t2bot.io (AaronDGoldman#8819) | Most elements, keys, and values will be smaller than the block size and just be hashed directly, but if some fool wanted to use a megabyte- or petabyte-sized string as a key, I would want the tree-structured hashing so we can go at it in parallel. | 20:47:54
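The parallel case can be sketched by hashing fixed-size blocks of one huge value concurrently and then combining the block hashes in a tree. This works because CPython's hashlib releases the GIL while hashing large buffers, so threads can overlap. The 1 MiB block size, the worker count, and the names `merkle_root` and `parallel_tree_hash` are assumptions for illustration.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import List

BLOCK_SIZE = 1 << 20  # assumption: 1 MiB blocks


def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()


def merkle_root(hashes: List[bytes]) -> bytes:
    """Pairwise reduction; an odd hash at the end of a level is carried up."""
    level = hashes
    while len(level) > 1:
        nxt = [h(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


def parallel_tree_hash(data: bytes, workers: int = 4) -> bytes:
    """Hash fixed-size blocks concurrently, then combine the block hashes in a tree."""
    view = memoryview(data)  # slicing a memoryview avoids copying each block
    blocks = [view[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)] or [view]
    # hashlib drops the GIL for large buffers, so these threads really run in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        leaves = list(pool.map(lambda b: hashlib.sha256(b).digest(), blocks))
    return merkle_root(leaves)
```

The result is identical for any number of workers, because the tree shape depends only on the block boundaries.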
@_discord_712939614985388042:t2bot.io (ianopolous#1330) | This sounds a bit like defining a new canonical format, crossed with IPLD. What canonical byte encoding for strings will you use? Doubles? | 20:53:55
@_discord_835289074768019496:t2bot.io (AaronDGoldman#8819) | I would use string as an alias for a UTF-8 byteArray.
If there is a way to define a typed byteArray, doubles are a tagged byte[8],
and then you need a way to express an unrecognised tag.
If I know what a u8, i8, u16, i16, u32, i32, u64, i64, f32, or f64 is, then I can treat them specially; if I don't, they can be expanded to {"tag/f64": b'\x1f\x85\xebQ\xb8\x1e\t@'}, probably in b85 for JSON, where you can't have bytes. This lets you have custom types that are also self-describing when the receiver does not know them.
22:10:26
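The tagged-value idea can be sketched in Python. The example bytes in the message are exactly the little-endian IEEE 754 encoding of 3.14, which is what `struct.pack("<d", 3.14)` produces. The little-endian byte order is inferred from that example, and the tag registry, the `tag/u128` tag, and the helper names are hypothetical.

```python
import base64
import json
import struct
from typing import Any, Callable, Dict

# Hypothetical tag registry: receivers decode the tags they know about.
DECODERS: Dict[str, Callable[[bytes], Any]] = {
    "tag/f64": lambda raw: struct.unpack("<d", raw)[0],  # assumption: little-endian IEEE 754
}


def tag_f64(x: float) -> Dict[str, bytes]:
    """Expand a double into a self-describing {"tag/f64": byte[8]} value."""
    return {"tag/f64": struct.pack("<d", x)}


def to_json_safe(tagged: Dict[str, bytes]) -> Dict[str, str]:
    """JSON has no bytes type, so carry the payload as Base85 text."""
    return {tag: base64.b85encode(raw).decode("ascii") for tag, raw in tagged.items()}


def decode(value: Any) -> Any:
    """Decode a known tag; leave an unrecognised tag in its self-describing form."""
    if isinstance(value, dict) and len(value) == 1:
        ((tag, raw),) = value.items()
        if tag in DECODERS and isinstance(raw, bytes):
            return DECODERS[tag](raw)
    return value


print(tag_f64(3.14))  # {'tag/f64': b'\x1f\x85\xebQ\xb8\x1e\t@'}
print(json.dumps(to_json_safe(tag_f64(3.14))))
```

A receiver without a decoder for some tag (say `{"tag/u128": ...}`) still gets a well-formed value it can store, hash, and forward unchanged.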
@_discord_835289074768019496:t2bot.io (AaronDGoldman#8819) | Yes, it is a canonical format, but hopefully one where updating part of the object only re-hashes the path to the root. I am trying to get the benefits of structural sharing, to make working with immutable objects logarithmic in cost rather than linear, as it is with JSON/CBOR. I want it to be affordable to hash objects whether they are laid out as documents, row store, column store, triple store, or whatever. | 22:16:03
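The "only re-hash the path to the root" property can be shown with a small array-backed Merkle tree. Replacing one leaf recomputes only its ancestors, which is log2(n) hashes instead of re-hashing everything. Assumptions for this sketch: SHA-256, and the leaf count padded up to a power of two with all-zero hashes. `MerkleList` is a hypothetical name.

```python
import hashlib
from typing import List

ZERO = bytes(32)  # assumption: padding leaf for unused slots


def h(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()


class MerkleList:
    """Hypothetical binary Merkle tree stored as an array heap (node i has children 2i, 2i+1).
    The leaf count is padded up to a power of two with ZERO hashes."""

    def __init__(self, leaves: List[bytes]):
        self.size = 1
        while self.size < len(leaves):
            self.size *= 2
        self.nodes = [ZERO] * (2 * self.size)
        self.nodes[self.size:self.size + len(leaves)] = leaves
        for i in range(self.size - 1, 0, -1):
            self.nodes[i] = h(self.nodes[2 * i], self.nodes[2 * i + 1])
        self.rehashes = 0  # counts hashes done by update(), for illustration

    def root(self) -> bytes:
        return self.nodes[1]

    def update(self, index: int, leaf: bytes) -> None:
        """Replace one leaf and re-hash only its ancestors: log2(size) hashes."""
        i = self.size + index
        self.nodes[i] = leaf
        i //= 2
        while i >= 1:
            self.nodes[i] = h(self.nodes[2 * i], self.nodes[2 * i + 1])
            self.rehashes += 1
            i //= 2


leaves = [hashlib.sha256(str(i).encode()).digest() for i in range(1024)]
tree = MerkleList(leaves)
tree.update(7, hashlib.sha256(b"edited").digest())
print(tree.rehashes)  # 10
```

Editing one of 1024 leaves costs 10 hashes, and the resulting root matches a full rebuild. Every untouched subtree hash is shared between the old and new versions.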
@_discord_835289074768019496:t2bot.io (AaronDGoldman#8819) | Array of Structs, Struct of Arrays, or Array of Structs of Arrays: we need the objects to have a canonical hash so we can store and search them efficiently. For small objects, just serialize them and hash the bytes, but for very large objects, like the sum of all Facebook public posts, that won't work; it needs to work more like a git hash than a document hash. | 22:21:07
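One way to read the small/large split is sketched below. Small collections are serialised and hashed directly. Large ones are cut into chunks that are hashed separately, with a hash over the list of chunk hashes on top, loosely like a git tree over blobs. The 1 MiB threshold, 1000-record chunks, canonical-JSON encoding, and the function names are all assumptions.

```python
import hashlib
import json
from typing import Any, List, Tuple

SMALL_LIMIT = 1 << 20  # assumption: serialised objects up to 1 MiB are hashed directly
CHUNK_ITEMS = 1000     # assumption: records per chunk for large collections


def blob_hash(value: Any) -> str:
    """Hash one value's canonical JSON bytes (sorted keys, compact separators)."""
    data = json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def collection_hash(records: List[Any], small_limit: int = SMALL_LIMIT,
                    chunk_items: int = CHUNK_ITEMS) -> Tuple[str, List[str]]:
    """Small collections: hash the serialised bytes. Large ones: hash fixed-size chunks
    separately, then hash the list of chunk hashes."""
    encoded = json.dumps(records, sort_keys=True, separators=(",", ":")).encode("utf-8")
    if len(encoded) <= small_limit:
        return hashlib.sha256(encoded).hexdigest(), []
    chunks = [records[i:i + chunk_items] for i in range(0, len(records), chunk_items)]
    chunk_hashes = [blob_hash(chunk) for chunk in chunks]
    return blob_hash(chunk_hashes), chunk_hashes


posts = [{"id": i, "text": f"post {i}"} for i in range(2500)]
root1, chunks1 = collection_hash(posts, small_limit=0)
root2, chunks2 = collection_hash(posts + [{"id": 2500, "text": "post 2500"}], small_limit=0)
print(chunks1[:2] == chunks2[:2], chunks1[2] == chunks2[2])  # True False
```

Appending a post changes only the last chunk and the top hash, so a content-addressed store keeps sharing the unchanged chunks. A real design would also need one agreed rule for when the chunked form applies, so every party computes the same hash.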
@_discord_835289074768019496:t2bot.io (AaronDGoldman#8819) | Also, yes, I do think this is like IPLD: https://ipld.io/specs/codecs/dag-cbor/spec/
and I hope they move more in this direction.
22:24:54
@_discord_712939614985388042:t2bot.io (ianopolous#1330) | It means everyone will need at least two encoders, JSON/CBOR and this new one, and there are no pre-existing libraries for this to build on, which might slow adoption. There's also a size blowup, because most strings are smaller than the hash size, and so are all primitive types like ints, doubles, etc. Is that a problem? For doubles specifically, I was worried about the canonical conversion from a double to 8 bytes, which is super error-prone. | 22:39:56
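Two of the double-to-bytes pitfalls behind this worry are easy to demonstrate. First, values that compare equal can have different bit patterns (0.0 and -0.0). Second, many different bit patterns decode to NaN. The canonical form below is a hypothetical choice: it uses big-endian IEEE 754, collapses every NaN to 0x7ff8000000000000, and keeps -0.0 distinct. Other designs reject NaN or fold -0.0 into 0.0 instead.

```python
import hashlib
import math
import struct

CANONICAL_NAN = bytes.fromhex("7ff8000000000000")  # assumption: one quiet NaN for all NaNs


def naive_f64(x: float) -> bytes:
    return struct.pack(">d", x)  # assumed byte order: big-endian IEEE 754


def canonical_f64(x: float) -> bytes:
    """Hypothetical canonical form: every NaN maps to one bit pattern; -0.0 stays distinct."""
    if math.isnan(x):
        return CANONICAL_NAN
    return naive_f64(x)


# Equal values, different bytes: 0.0 == -0.0 but their encodings differ.
print(0.0 == -0.0, naive_f64(0.0) == naive_f64(-0.0))  # True False

# Many bit patterns are NaN; canonical_f64 maps them all to one encoding.
other_nan = struct.unpack(">d", bytes.fromhex("7ff8000000000001"))[0]
print(canonical_f64(other_nan) == canonical_f64(float("nan")))  # True

# Size blowup: an 8-byte double is smaller than the 32-byte SHA-256 that would reference it.
print(len(canonical_f64(1.5)), hashlib.sha256().digest_size)  # 8 32
```

The last line illustrates the size point: replacing small primitives with hashes costs more than inlining them, which is why designs usually hash small values together in a parent node rather than giving each one its own hash.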
@_discord_712939614985388042:t2bot.io (ianopolous#1330) | I think you can do all this with IPLD as it is today. | 22:40:06
@_discord_712939614985388042:t2bot.io (ianopolous#1330) | If you have a huge map or whatever, then it just gets sharded in a standard way, like a CHAMP or similar. And your max "small object" size is 1 MiB. | 22:41:38
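The sharding idea can be sketched as a simplified hash-prefix trie. A map that encodes to more than the size limit is split into children by the next few bits of each key's hash, recursively, until every node fits. This is a plain HAMT-style split, not CHAMP's compressed bitmap node layout and not IPLD's HashMap specification. The 16-way fan-out, SHA-256 key hashing, JSON node encoding, and the helper names are assumptions.

```python
import hashlib
import json
from typing import Any, Dict

MAX_NODE_BYTES = 1 << 20  # the 1 MiB "small object" limit from the chat
BITS_PER_LEVEL = 4        # assumption: 16-way fan-out per level
MASK = (1 << BITS_PER_LEVEL) - 1


def encode(obj: Any) -> bytes:
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def bucket(key: str, depth: int) -> int:
    """The next BITS_PER_LEVEL bits of the key's SHA-256, most significant first."""
    digest = int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest(), "big")
    return (digest >> (256 - BITS_PER_LEVEL * (depth + 1))) & MASK


def shard(entries: Dict[str, Any], depth: int = 0, limit: int = MAX_NODE_BYTES) -> Dict[str, Any]:
    """Keep a node as a leaf if it encodes within `limit` (or holds one entry);
    otherwise split it into children by hash bits, recursively."""
    if len(entries) <= 1 or len(encode(entries)) <= limit:
        return {"entries": entries}
    buckets: Dict[int, Dict[str, Any]] = {}
    for key, value in entries.items():
        buckets.setdefault(bucket(key, depth), {})[key] = value
    return {"children": {i: shard(b, depth + 1, limit) for i, b in buckets.items()}}


def lookup(node: Dict[str, Any], key: str) -> Any:
    """Follow the key's hash bits down to the leaf that must hold it."""
    depth = 0
    while "children" in node:
        child = node["children"].get(bucket(key, depth))
        if child is None:
            raise KeyError(key)
        node, depth = child, depth + 1
    return node["entries"][key]
```

Because a key's position depends only on its hash, every party builds the same shape for the same map. Each leaf could then be stored and hashed as an ordinary small object.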



Room Version: 6