Skip to content

Problem with bigint range filtering #3300

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
5 tasks
zhangsanhuo opened this issue Apr 11, 2025 · 5 comments
Open
5 tasks

Problem with bigint range filtering #3300

zhangsanhuo opened this issue Apr 11, 2025 · 5 comments
Labels
bug waiting Waiting for the original poster (in most cases) or something else

Comments

@zhangsanhuo
Copy link

zhangsanhuo commented Apr 11, 2025

Bug Description:

Create Table Statement:

DROP TABLE t1;
CREATE TABLE t1 (
 id bigint,
 start_int bigint,
 last_int bigint,
 pub_date timestamp,
 up_date timestamp
);

Data Volume:
The data volume is approximately 3 million records.

Issue Description:
The following query works fine in actual use when filtering and comparing:
SELECT * FROM t1 WHERE start_int <= 16777217 AND last_int >= 16777217;
However, the following query often returns empty results. I am not certain, but in multiple tests of recreating the index, it returns empty results about 17 out of 20 times:
SELECT * FROM t1 WHERE start_int <= 16819712 AND last_int >= 16819712;
I am confident that this data exists in the index every time, as it can be retrieved using the following command:
SELECT * FROM t1 WHERE start_int = 16819712;
I truly haven't been able to identify the possible cause of this issue.

Additional Test 3:
I have tested dozens of times, and I found that it is not related to the number of imports, but seems to be related to the quantity of 8192. When the number of imports exceeds 8191, the query SELECT start_int, last_int FROM test1 WHERE start_int <= 16819712 AND last_int >= 16819712; does not work.
Here are my test results. To avoid interference from randomness, I did two rounds of testing for each row.
Table creation statement:

DROP TABLE test1;
CREATE TABLE test1 (id bigint, start_int bigint, last_int bigint);
SELECT count(*) FROM test1;

Query statements:

SELECT count(*) FROM test1;
SELECT start_int, last_int FROM test1 WHERE start_int <= 16777216 AND last_int >= 16777216;
SELECT start_int, last_int FROM test1 WHERE start_int = 16819712;
SELECT start_int, last_int FROM test1 WHERE start_int <= 16819712 AND last_int >= 16819712;

success:
Batch quantity per time 1000
Total quantity 5000

Batch quantity per time 1000
Total quantity 8000

Batch quantity per time 2000
Total quantity 8000

Batch quantity per time 3000
Total quantity 8000

Batch quantity per time 5000
Total quantity 8000

Batch quantity per time 1000
Total quantity 8100

Batch quantity per time 1000
Total quantity 8150

Batch quantity per time 1000
Total quantity 8175

Batch quantity per time 1000
Total quantity 8180

Batch quantity per time 1000
Total quantity 8190

Batch quantity per time 1000
Total quantity 8191

Batch quantity per time 8191
Total quantity 8191

failed:
Batch quantity per time 1000
Total quantity 10000

Batch quantity per time 1000
Total quantity 9000

Batch quantity per time 1000
Total quantity 8500

Batch quantity per time 1000
Total quantity 8200

Batch quantity per time 1000
Total quantity 8199

Batch quantity per time 1000
Total quantity 8195

Batch quantity per time 1000
Total quantity 8192

Batch quantity per time 8192
Total quantity 8192

Manticore Search Version:

Server version: 7.0.0 92c650401@25013002 (columnar 4.0.0 5aa8e43@25012409) (secondary 4.0.0 5aa8e43@25012409) (knn 4.0.0 5aa8e43@25012409) git branch manticore-7.0.0...origin/manticore-7.0.0

Operating System Version:

Linux hcss-ecs-1aa6 6.1.0-22-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.94-1 (2024-06-21) x86_64 GNU/Linux

Have you tried the latest development version?

None

Internal Checklist:

To be completed by the assignee. Check off tasks that have been completed or are not applicable.

  • Implementation completed
  • Tests developed
  • Documentation updated
  • Documentation reviewed
  • Changelog updated
@manticoresoftware manticoresoftware deleted a comment Apr 13, 2025
@tomatolog
Copy link
Contributor

could you provide your data that reproduce this case locally here?

As from your description it is not clear what cause this issue full-scan or secondary index

@zhangsanhuo
Copy link
Author

could you provide your data that reproduce this case locally here?

OK, I made a test data that can perfectly reproduce my problem.

Create a test table:
drop table test1;CREATE TABLE test1 (id bigint,start_int bigint,last_int bigint);select count(*) from test1;

First generate 8192 data:

import csv

def generate_continuous_csv(filename, start_value, rows, step=8):
    with open(filename, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(['start_int', 'last_int'])
        
        current_start = start_value
        for _ in range(rows):
            current_end = current_start + step
            writer.writerow([current_start, current_end])
            current_start = current_end + 1  

if __name__ == "__main__":
    FILENAME = 'manticoreTest.csv'
    INITIAL_START = 100000000  
    TOTAL_ROWS = 8192        
    
    generate_continuous_csv(FILENAME, INITIAL_START, TOTAL_ROWS)

Then batch import the import:

import csv
import json
import manticoresearch
from manticoresearch.rest import ApiException
from pprint import pprint
import time

input_csv = 'manticoreTest.csv'
batch_size = 1000  
table_name = "test1"

end_number = 8192

configuration = manticoresearch.Configuration(
    host="http://localhost:9308"
)


selected_fields = [
    'start_int', 'last_int'
]

def send_bulk(docs, api_instance):
    if not docs:
        return
    bulk_body = '\n'.join(map(json.dumps, docs))
    try:
        api_response = api_instance.bulk(bulk_body)
        pprint(api_response)
    except ApiException as e:
        print(" IndexApi->bulk error: %s\n" % e)

with manticoresearch.ApiClient(configuration) as api_client:
    api_instance = manticoresearch.IndexApi(api_client)
    
    batch_docs = []  
    total = 0       

    with open(input_csv, newline='', encoding='utf-8') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            doc_fields = {field: row.get(field, "") for field in selected_fields}
            doc = {"insert": {"table": table_name, "doc": doc_fields}}
            batch_docs.append(doc)
            total += 1
            
            if len(batch_docs) >= batch_size:
                print(f" {total - len(batch_docs) + 1} to {total} ")
                send_bulk(batch_docs, api_instance)
                batch_docs = []  
            if total ==end_number:
                print(f"total: {total}, break")
                break
    

    if batch_docs:
        print(f"last {len(batch_docs)} record, total {total} ")
        send_bulk(batch_docs, api_instance)

print("success counts:", total)

Now there are 8192 records in the test1 table. Try to execute the following statement:
SELECT start_int,last_int FROM test1 WHERE start_int <= 100000153 AND last_int >= 100000153; SELECT start_int,last_int FROM test1 WHERE start_int = 100000198;SELECT start_int,last_int FROM test1 WHERE start_int <= 100000198 AND last_int >= 100000198;

Image

You can see that the third statement clearly meets the conditions, but returns an empty set.

Let's continue to try to import only 8191 data. First, rebuild the test1 table.
drop table test1;CREATE TABLE test1 (id bigint,start_int bigint,last_int bigint);select count(*) from test1;

Import again. The difference from the import code above is only in this part:

input_csv = 'manticoreTest.csv'
batch_size = 1000
table_name = "test1"
end_number = 8191

Image

You can see that the same statement returns the result for the third line.

I don’t know what the reason is, I hope this test data set can help you reproduce the problem.

@sanikolaev
Copy link
Collaborator

Hello @zhangsanhuo

I can't reproduce your issue. To be on the same page I've created repo https://github.com/sanikolaev/issue-3300 based on the details you provided. Here's how it works for me:

git clone [email protected]:sanikolaev/issue-3300.git
cd issue-3300
docker build -t test .
docker run -it --rm test bash
searchd
mysql -P9306 -h0 -e "CREATE TABLE test1 (id bigint,start_int bigint,last_int bigint);select count(*) from test1"
python3 /generate.py
python3 /import.py
mysql -P9306 -h0 -v -e "SELECT start_int,last_int FROM test1 WHERE start_int <= 100000153 AND last_int >= 100000153; SELECT start_int,last_int FROM test1 WHERE start_int = 100000198;SELECT start_int,last_int FROM test1 WHERE start_int <= 100000198 AND last_int >= 100000198;"

The last command returns:

--------------
SELECT start_int,last_int FROM test1 WHERE start_int <= 100000153 AND last_int >= 100000153
--------------

+-----------+-----------+
| start_int | last_int  |
+-----------+-----------+
| 100000153 | 100000161 |
+-----------+-----------+
--------------
SELECT start_int,last_int FROM test1 WHERE start_int = 100000198
--------------

+-----------+-----------+
| start_int | last_int  |
+-----------+-----------+
| 100000198 | 100000206 |
+-----------+-----------+
--------------
SELECT start_int,last_int FROM test1 WHERE start_int <= 100000198 AND last_int >= 100000198
--------------

+-----------+-----------+
| start_int | last_int  |
+-----------+-----------+
| 100000198 | 100000206 |
+-----------+-----------+

i.e. I can't see this:

You can see that the third statement clearly meets the conditions, but returns an empty set.

Can you please modify the Dockerfile, so the issue can be reproduced?

@sanikolaev sanikolaev added the waiting Waiting for the original poster (in most cases) or something else label Apr 25, 2025
@zhangsanhuo
Copy link
Author

Thanks for your reply.

I'm sorry I don't know much about docker, but I looked at the Dockerfile you tested. The version of manticore search is manticore:9.2.14. Then I upgraded the manticore version on the server.

It was:
Server version: 7.0.0 92c650401@25013002 (columnar 4.0.0 5aa8e43@25012409) (secondary 4.0.0 5aa8e43@25012409) (knn 4.0.0 5aa8e43@25012409) git branch manticore-7.0.0...origin/manticore-7.0.0
Now it is:
Server version: 9.2.14 23296c0f8@25032816 (columnar 4.2.0 417f074@25032808) (secondary 4.2.0 417f074@25032808) (knn 4.2.0 417f074@25032808) git branch manticore-9.2.14...origin/manticore-9.2.14
There is indeed no problem with the test.

But if you use Server version: 7.0.0 for testing, you will find that the BUG I described will appear 9 times out of 10 times.

It is really strange. This problem puzzled me for many days.

@sanikolaev
Copy link
Collaborator

@zhangsanhuo I can't reproduce the issue in 7.0.0 either:

root@43a9c03b2418:/var/lib/manticore# searchd -v
Manticore 7.0.0 92c650401@25013002
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2024, Manticore Software LTD (https://manticoresearch.com)

root@43a9c03b2418:/var/lib/manticore# mysql -P9306 -h0 -v -e "SELECT start_int,last_int FROM test1 WHERE start_int <= 100000153 AND last_int >= 100000153; SELECT start_int,last_int FROM test1 WHERE start_int = 100000198;SELECT start_int,last_int FROM test1 WHERE start_int <= 100000198 AND last_int >= 100000198;"
--------------
SELECT start_int,last_int FROM test1 WHERE start_int <= 100000153 AND last_int >= 100000153
--------------

+-----------+-----------+
| start_int | last_int  |
+-----------+-----------+
| 100000153 | 100000161 |
+-----------+-----------+
--------------
SELECT start_int,last_int FROM test1 WHERE start_int = 100000198
--------------

+-----------+-----------+
| start_int | last_int  |
+-----------+-----------+
| 100000198 | 100000206 |
+-----------+-----------+
--------------
SELECT start_int,last_int FROM test1 WHERE start_int <= 100000198 AND last_int >= 100000198
--------------

+-----------+-----------+
| start_int | last_int  |
+-----------+-----------+
| 100000198 | 100000206 |
+-----------+-----------+

The Dockerfile is:

FROM manticoresearch/manticore:7.0.0
RUN apt -y update
RUN apt -y install python3 pip git
RUN pip install git+https://github.com/manticoresoftware/manticoresearch-python.git
ADD generate.py /
ADD import.py /

I'm afraid there's not much we can do until we can reproduce the issue. It would be really helpful if you could:

  • try to reproduce the issue in Docker
  • which might mean you'll need to change something in the Dockerfile or adjust the commands you use to reproduce the problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug waiting Waiting for the original poster (in most cases) or something else
Projects
None yet
Development

No branches or pull requests

3 participants