!wZBXZwtTJAgBQYuVGZ:matrix.org

#tantivy:matrix.org

34 Members
1 Server



Timestamp Message
20 May 2020
19:04:09@gitter_geometrically:matrix.org left the room.
21 May 2020
03:59:46@gitter_jkathir:matrix.orgprog20901 (Gitter) How do I create a website full of text files, documents, or text categories with quick search using tantivy? For example: https://table.branham.org/#/main. I want a website built on tantivy where I can import huge text files and quickly search them for information. Please advise. They could be log files or any other kind of text file.
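One common first step for this kind of site is to split each large file into smaller documents before indexing, so a hit points at a region of a file rather than the whole file. A minimal sketch, assuming fixed-size chunks of N lines and a hypothetical document shape with `path`, `line`, and `body` fields; handing the resulting dicts to a tantivy index writer is not shown.

```python
from pathlib import Path
from typing import Iterable, Iterator


def chunk_lines(lines: Iterable[str], source: str, lines_per_doc: int = 50) -> Iterator[dict]:
    """Group lines into documents of at most `lines_per_doc` lines.

    The dict layout (path / line / body) is a hypothetical schema for illustration.
    """
    buf: list[str] = []
    start = 1
    for line_no, line in enumerate(lines, start=1):
        if not buf:
            start = line_no  # 1-based line where this chunk begins
        buf.append(line.rstrip("\n"))
        if len(buf) == lines_per_doc:
            yield {"path": source, "line": start, "body": "\n".join(buf)}
            buf = []
    if buf:  # flush the final, shorter chunk
        yield {"path": source, "line": start, "body": "\n".join(buf)}


def iter_corpus(root: Path, pattern: str = "*.txt", lines_per_doc: int = 50) -> Iterator[dict]:
    """Walk a directory tree and chunk every matching text file."""
    for path in sorted(root.rglob(pattern)):
        with path.open(encoding="utf-8", errors="replace") as f:
            yield from chunk_lines(f, str(path), lines_per_doc)
```

Streaming the chunks this way keeps memory flat even for huge log files, and it pairs naturally with indexing in bulk and committing rarely.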
06:53:33@gitter_madmaxio:matrix.orgmadmaxio (Gitter) Prog strikes back ^^
09:05:54@gitter_blabno:matrix.orgBernard Labno (Gitter) joined the room.
09:06:03@gitter_blabno:matrix.orgBernard Labno (Gitter) Hello, how fast can you get Tantivy to index documents? I'm getting like 1 document per second, so I'm obviously doing something wrong.
13:39:07@gitter_blabno:matrix.orgBernard Labno (Gitter) Right, committing after each add_document kills the performance. Indexing 100 docs with commit after each add takes 41569 millis while committing only once at the end takes it down to 1206 millis.
14:24:52@gitter_fulmicoton:matrix.orgPaul Masurel (Gitter) @blabno correct. You will get much better performance if you index in bulk
14:25:53@gitter_fulmicoton:matrix.orgPaul Masurel (Gitter)
1 GB per minute on any half-decent 4-core CPU
14:28:08@gitter_fulmicoton:matrix.orgPaul Masurel (Gitter) I've been working on making it a little bit better, but indexing in bulk will still be highly recommended.
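Bernard's numbers put the cost at roughly 400 ms per additional commit ((41569 - 1206) / 99 ≈ 408 ms), and Paul's figure of 1 GB per minute works out to about 17 MB/s. The fix is to add many documents and commit rarely. A minimal sketch of that batching pattern, assuming a writer object with `add_document` and `commit` methods; the `IndexWriter` protocol, `index_in_bulk`, and the `CountingWriter` stand-in are hypothetical, and no tantivy calls are made.

```python
from typing import Iterable, Protocol


class IndexWriter(Protocol):
    """Minimal interface assumed here: add a document, then commit."""

    def add_document(self, doc: dict) -> None: ...

    def commit(self) -> None: ...


def index_in_bulk(writer: IndexWriter, docs: Iterable[dict], commit_every: int = 100_000) -> int:
    """Add every document, committing once per `commit_every` docs and once at the end.

    Returns the number of commits issued.
    """
    commits = 0
    pending = 0
    for doc in docs:
        writer.add_document(doc)
        pending += 1
        if pending == commit_every:
            writer.commit()
            commits += 1
            pending = 0
    if pending:  # commit whatever is left over
        writer.commit()
        commits += 1
    return commits


class CountingWriter:
    """Hypothetical stand-in that only counts calls; a real index writer would go here."""

    def __init__(self) -> None:
        self.docs = 0
        self.commits = 0

    def add_document(self, doc: dict) -> None:
        self.docs += 1

    def commit(self) -> None:
        self.commits += 1
```

With 100 documents and a large `commit_every`, this issues a single commit, which is the "commit only once at the end" case from the measurement above.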
22 May 2020
06:54:08@gitter_ppodolsky:matrix.orgPasha Podolsky (Gitter) joined the room.
06:54:09@gitter_ppodolsky:matrix.orgPasha Podolsky (Gitter) Hi, guys! Is this the right place to discuss a development contribution I would like to make? :)
23 May 2020
00:56:45@gitter_fulmicoton:matrix.orgPaul Masurel (Gitter) @ppodolsky yes if you want to casually discuss it before creating a ticket this is the right place
05:40:06@gitter_ppodolsky:matrix.orgPasha Podolsky (Gitter) @fulmicoton I have a search over about 100M items, with a search box open to the wild users of the Internet. So I'm very interested in a lenient mode for the query parser and in limiting query size. I'm ready to implement both.
For the first feature, I saw tantivy-search/tantivy#382 and would like a status update: what needs to be done to finish this PR?
For the second one, I've made ppodolsky/tantivy@0f97929. The idea is to let users limit the number of leaves in the AST, keeping the leftmost ones when the number of leaves exceeds the selected limit. This approach makes it possible to set up more complex limiting policies, with different weights for different boolean operations. What do you think?
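The leaf-trimming idea can be sketched on a toy query AST. This is not the code from ppodolsky/tantivy@0f97929; the `Leaf`/`Bool` types, the left-to-right depth-first leaf order, and collapsing a boolean node that is left with one child are assumptions for illustration, and per-operator weights are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Leaf:
    term: str


@dataclass
class Bool:
    op: str  # "AND" or "OR" in this toy model
    children: list[Node]


Node = Union[Leaf, Bool]


def trim(node: Node, budget: int) -> tuple[Optional[Node], int]:
    """Keep at most `budget` leaves, leftmost first.

    Returns the trimmed subtree (None if nothing survives) and the unused budget.
    """
    if budget <= 0:
        return None, 0
    if isinstance(node, Leaf):
        return node, budget - 1
    kept: list[Node] = []
    for child in node.children:
        sub, budget = trim(child, budget)
        if sub is not None:
            kept.append(sub)
    if not kept:
        return None, budget
    if len(kept) == 1:
        return kept[0], budget  # an AND/OR with one child is just that child
    return Bool(node.op, kept), budget


def leaves(node: Optional[Node]) -> list[str]:
    """List leaf terms in left-to-right order."""
    if node is None:
        return []
    if isinstance(node, Leaf):
        return [node.term]
    return [t for c in node.children for t in leaves(c)]
```

Because trimming works on the parsed tree, the result is always a well-formed query, which is the advantage argued for in the discussion.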
08:40:17@gitter_fulmicoton:matrix.orgPaul Masurel (Gitter) I would certainly love a lenient mode.
08:41:08@gitter_fulmicoton:matrix.orgPaul Masurel (Gitter) As for limiting the query size... there are many ways to do this.
08:41:29@gitter_fulmicoton:matrix.orgPaul Masurel (Gitter) Trimming the AST feels a bit like overkill.
08:41:51@gitter_fulmicoton:matrix.orgPaul Masurel (Gitter) But sure, why not!
09:26:20@gitter_ppodolsky:matrix.orgPasha Podolsky (Gitter) I'd prefer AST trimming because IMO it restricts queries without 1) breaking the original query as violently as plain length limiting could, and 2) extra parsing, since we already have the parsed tree at this level.
Moreover, 1) can destroy boolean queries entirely.
So, would you mind if I open PRs/issues for both features soon, to continue the discussion there?
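To see why a plain length limit can destroy a boolean query (point 1 above), here is a small sketch that cuts a raw query string at a fixed character count and then checks what is left. The helper names are hypothetical and the checks only look at parentheses and a trailing operator.

```python
BOOLEAN_OPS = {"AND", "OR", "NOT"}


def truncate(query: str, max_len: int) -> str:
    """Naive limit: cut the raw query string at `max_len` characters."""
    return query[:max_len]


def parens_balanced(query: str) -> bool:
    """True if every '(' has a matching ')' and no ')' appears too early."""
    depth = 0
    for ch in query:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def ends_with_operator(query: str) -> bool:
    """True if the last whitespace-separated token is a dangling boolean operator."""
    tokens = query.split()
    return bool(tokens) and tokens[-1] in BOOLEAN_OPS
```

For `(a AND b) OR (c AND d)`, cutting at 15 characters gives `(a AND b) OR (c`, with an unclosed group, and cutting at 12 gives `(a AND b) OR`, with a dangling `OR`; trimming leaves on the AST never produces either.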
11:26:48@gitter_fulmicoton:matrix.orgPaul Masurel (Gitter) Yes! That sounds great
24 May 2020
07:49:58@gitter_rominf:matrix.orgRoman Inflianskas (Gitter) joined the room.
07:49:58@gitter_rominf:matrix.orgRoman Inflianskas (Gitter)

Hi.

I'm building an embedded Python NoSQL database and I want fast full-text search. Luckily, I found tantivy-py.

Here are the problems I faced (not sure if they are bugs or just my misunderstanding):
I want to save an ObjectId (similar to Mongo's ObjectId) in the doc. It is bytes; here I use b'abc' for simplicity.

>>> doc = tantivy.Document(b=b'abc')  
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-20-9693b204efef> in <module>
----> 1 doc = tantivy.Document(b=b'abc')

ValueError: Value unsupported b'abc'

If I use a separate method, it works:

>>> doc = tantivy.Document()          
>>> doc.add_bytes('b', b'abc')

However, when I want to get it back, I'm getting a list instead of bytes:

>>> doc.get_first('b')                
[97, 98, 99]

Also, when I retrieve this document (following the example in the README), it contains no 'b' field:

>>> best_doc['b']
[]
07:52:30@gitter_rominf:matrix.orgRoman Inflianskas (Gitter) (edited) ... misunderstanding): I want to save ObjectId (similar to Mongo's ObjectId) in the doc. ``` ... => ... misunderstanding): I want to save ObjectId (similar to Mongo's ObjectId) in the doc. It is bytes. Here I put `b'abc'` for simplicity. ``` ...
09:21:43@gitter_fulmicoton:matrix.orgPaul Masurel (Gitter) It is probably a bad idea to use a bytes field for this at the moment.
09:22:17@gitter_fulmicoton:matrix.orgPaul Masurel (Gitter) These are only fast fields; they are neither stored nor indexed.
09:22:49@gitter_fulmicoton:matrix.orgPaul Masurel (Gitter) I suggest you use a string field and encode your ObjectId bytes in base64 on the Python side.
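Paul's workaround as a minimal sketch using only Python's standard `base64` module; the helper names are hypothetical. One caveat, stated as an assumption about the schema: tantivy's default tokenizer lowercases text and splits on non-alphanumeric characters, so a case-sensitive base64 value would not survive as one exact term in the index. Indexing the field with tantivy's `raw` tokenizer, or using `bytes.hex()` instead, avoids that; the stored value comes back unchanged either way.

```python
import base64


def object_id_to_field(oid: bytes) -> str:
    """Encode raw ObjectId bytes as an ASCII string for a text field."""
    return base64.b64encode(oid).decode("ascii")


def field_to_object_id(value: str) -> bytes:
    """Decode the stored string back into the original bytes."""
    return base64.b64decode(value.encode("ascii"))
```

A 12-byte ObjectId encodes to exactly 16 base64 characters with no padding, since 12 is a multiple of 3.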
09:48:57@gitter_rominf:matrix.orgRoman Inflianskas (Gitter) @fulmicoton Thanks for your suggestion, answer, and tantivy* in general!
13:21:38@gitter_fulmicoton:matrix.orgPaul Masurel (Gitter) You are welcome! There seems to be interest in proper support for binary fields these days... It will likely happen soon.
25 May 2020
21:34:07@gitter_bloodbare:matrix.orgRamon Navarro Bosch (Gitter) joined the room.
21:34:07@gitter_bloodbare:matrix.orgRamon Navarro Bosch (Gitter) I've added facet filtering and aggregation logic to the Python interface: tantivy-search/tantivy-py#21
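A tantivy facet is a hierarchical path such as `/lang/rust`; filtering keeps documents under a given path, and aggregation counts matching documents per child of a chosen root. The sketch below shows those two ideas in plain Python. It is not the API added in tantivy-search/tantivy-py#21, and the function names and input shape (a list of facet paths per document) are hypothetical.

```python
from collections import Counter
from typing import Iterable


def has_facet(facets: list[str], wanted: str) -> bool:
    """Filter: True if any facet equals `wanted` or lies beneath it."""
    prefix = wanted.rstrip("/") + "/"
    return any(p == wanted or p.startswith(prefix) for p in facets)


def facet_counts(doc_facets: Iterable[list[str]], root: str = "/") -> Counter:
    """Aggregate: count documents per direct child of `root`.

    Each document counts at most once per child, even if it has several
    facets under that child.
    """
    prefix = root.rstrip("/") + "/"
    counts: Counter = Counter()
    for facets in doc_facets:
        children = set()
        for path in facets:
            if path.startswith(prefix):
                rest = path[len(prefix):]
                if rest:
                    children.add(prefix + rest.split("/", 1)[0])
        counts.update(children)
    return counts
```

Note that `/lang/rust/async` is counted under `/lang/rust`, while `/lang/rustacean` would not match a `/lang/rust` filter, because matching is done on whole path segments.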


