We are looking into this, but we aren't sure how to approach it: we could start with read-only support, but users would then ask for read-write support, which is exactly what we decided to avoid from the beginning.
First, congrats on the 1.0 GA release!
According to the FAQ section in the docs, accessing existing data in an object store is not yet supported.
Do you plan to support this feature?
Would it be easier if the existing data were treated as read-only?
Here is a compelling reason to do so: many AI/ML workloads want read-only access to public (or private) datasets that already live in (cloud) object stores, often with an average object size smaller than the JuiceFS block size. Having to copy them into JuiceFS is sub-optimal. A key reason to use JuiceFS is its separate metadata store, which makes metadata operations such as listing all the files in a tree efficient for multiple data-parallel clients. Prefix-listing a large object-store tree takes several minutes per client; we worked around this by explicitly generating a shared static manifest file that can be fetched in a second or two. An ideal usage example:
The format command would scan the read-only import URIs (checking for prefix conflicts) and create filesystem metadata for the imported data without copying any of it. Afterwards, if you run `ls /mnt/jfs/<volume-name>`, you should see one directory per import URI (named after the imported prefix) instead of an empty directory. `ls -R` should also be fast, because these directories are read-only and no rescan of the original object stores is needed.
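The import step described above could be sketched roughly as follows. This is a minimal, hypothetical Python sketch and not a JuiceFS API: `find_prefix_conflicts`, `build_manifest`, `list_dir`, the JSON manifest layout, and the rule that each import gets a top-level directory named after the last segment of its prefix are all illustrative assumptions. It checks the import URIs for overlapping prefixes, turns a one-time listing of each prefix into a static manifest (similar in spirit to the shared-manifest workaround mentioned above), and then answers directory listings from that manifest without touching the object store.

```python
# Hypothetical sketch: none of these names are JuiceFS APIs. It shows how
# read-only import URIs could be checked and turned into a static namespace.
from __future__ import annotations

import json
from urllib.parse import urlparse


def normalize(uri: str) -> tuple[str, str, str]:
    """Split 's3://bucket/a/b' into ('s3', 'bucket', 'a/b/')."""
    parsed = urlparse(uri)
    path = parsed.path.lstrip("/")
    if path and not path.endswith("/"):
        path += "/"  # trailing slash so 'data/' does not match 'database/'
    return parsed.scheme, parsed.netloc, path


def find_prefix_conflicts(uris: list[str]) -> list[tuple[str, str]]:
    """Return pairs of URIs in the same bucket where one prefix contains the other."""
    norm = [(u, normalize(u)) for u in uris]
    conflicts = []
    for i, (u1, (s1, b1, p1)) in enumerate(norm):
        for u2, (s2, b2, p2) in norm[i + 1:]:
            if (s1, b1) == (s2, b2) and (p1.startswith(p2) or p2.startswith(p1)):
                conflicts.append((u1, u2))
    return conflicts


def mount_dir_name(uri: str) -> str:
    """Assumed naming rule: last prefix segment, or the bucket for a whole-bucket import."""
    _, bucket, path = normalize(uri)
    segments = [s for s in path.split("/") if s]
    return segments[-1] if segments else bucket


def build_manifest(listings: dict[str, list[tuple[str, int]]]) -> str:
    """Turn one-time listings {uri: [(object_key, size), ...]} into a JSON manifest."""
    conflicts = find_prefix_conflicts(list(listings))
    if conflicts:
        raise ValueError(f"overlapping import prefixes: {conflicts}")
    entries: dict[str, int] = {}
    top_dirs: set[str] = set()
    for uri, objects in listings.items():
        _, _, prefix = normalize(uri)
        top = mount_dir_name(uri)
        if top in top_dirs:
            raise ValueError(f"two imports map to directory {top!r}")
        top_dirs.add(top)
        for key, size in objects:
            if not key.startswith(prefix):
                raise ValueError(f"{key!r} is outside import prefix {prefix!r}")
            rel = key[len(prefix):]
            if not rel:
                continue  # skip a directory-marker object equal to the prefix
            entries[f"{top}/{rel}"] = size
    return json.dumps(entries, sort_keys=True)


def list_dir(manifest_json: str, path: str = "") -> list[str]:
    """List the direct children of `path` using only the manifest (no object-store calls)."""
    entries = json.loads(manifest_json)
    stripped = path.strip("/")
    prefix = stripped + "/" if stripped else ""
    return sorted({p[len(prefix):].split("/", 1)[0] for p in entries if p.startswith(prefix)})


if __name__ == "__main__":
    manifest = build_manifest({
        "s3://datasets/imagenet/train/": [("imagenet/train/n01/a.jpg", 1200),
                                          ("imagenet/train/n02/b.jpg", 900)],
        "gs://public/coco/": [("coco/val/c.jpg", 800)],
    })
    print(list_dir(manifest))           # ['coco', 'train']
    print(list_dir(manifest, "train"))  # ['n01', 'n02']
```

In a real implementation the listings would presumably be gathered from the object store once, at format/import time; because the imported trees are read-only, later recursive listings could be served entirely from stored metadata without rescanning the buckets.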