Provide encoding-related APIs for editor extensions #824

Closed
haoqunjiang opened this issue Nov 30, 2015 · 61 comments
Labels
api feature-request Request for new features or functionality file-encoding File encoding type issues on-testplan plan-item VS Code - planned item for upcoming
Comments

@haoqunjiang

Currently there are only two fields in the TextEditorOptions API and a few other TextEditor-related APIs, and none of them can deal with the text buffer's encoding.
Compared to Atom's TextEdit API, that is far from enough.

Most importantly, the API limitation seems to make it impossible for vscode-editorconfig to implement features like charset support, which is a crucial need for many people.

@alexdima
Member

👍 Also trimTrailingWhitespaces is missing

@alexdima alexdima added api feature-request Request for new features or functionality labels Nov 30, 2015
@kmpm

kmpm commented Dec 1, 2015

#751 and #844 would probably be helped by this as well.
At least #751 would be solvable if working with BOM were better and my issues would completely go away if there was a way to, per editor extension, set the encoding and other stuff.
So 👍

@egamma egamma modified the milestone: Backlog Dec 10, 2015
@jrieken
Member

jrieken commented Apr 29, 2016

  • set/get encoding
  • set/get charset

@vysker

vysker commented Sep 8, 2016

I also need this for my extension, as I explain here. Did not realize this issue already existed.

Definitely want this feature!

@tomasiser

I believe that full .editorconfig support is blocked (editorconfig/editorconfig-vscode#35) until the API for buffer encoding is added. Is there any chance this fact could speed this up?

@bpasero
Member

bpasero commented Feb 13, 2025

👋 I need more input to know what extensions actually need here. My work covers 3 areas that we have in core today, but it is quite possible that I am missing an important aspect that these 3 areas would not cover, and/or that one of the areas is not useful at all:

Add a readonly encoding: string property to TextDocument

This seems obvious to me: a way for an extension to get the encoding of a text document. That property can change, when the encoding changes, signaled via onDidChangeTextDocument. The value would be the identifier of the encoding, as used in VS Code.
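
A minimal sketch of how an extension might consume such a property, assuming `encoding` lands on `TextDocument` as described and that encoding changes arrive through `onDidChangeTextDocument`. `DocumentLike` and `EncodingTracker` are hypothetical names introduced here so the logic can run outside VS Code; in a real extension, `update` would presumably be called from a `workspace.onDidChangeTextDocument` handler.

```typescript
// Hypothetical minimal document shape; only the fields this sketch needs.
interface DocumentLike {
	readonly uri: string;
	readonly encoding: string; // the proposed property, for example 'utf8'
}

// Remembers the last seen encoding per document and reports changes.
class EncodingTracker {
	private readonly seen = new Map<string, string>();

	// Returns the previous encoding when it differs from the current one,
	// otherwise undefined (first sighting or unchanged encoding).
	update(doc: DocumentLike): string | undefined {
		const previous = this.seen.get(doc.uri);
		this.seen.set(doc.uri, doc.encoding);
		return previous !== undefined && previous !== doc.encoding ? previous : undefined;
	}
}
```

Keeping the last seen value per URI is one way to tell an encoding change apart from an ordinary content edit, since both would be signaled by the same event.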

Add an options?: { encoding?: string } parameter to TextDocument.save()

This allows defining the encoding to be used when saving a TextDocument. This will take the current contents of the document and write them to disk with the specified encoding.

Add an options?: { encoding?: string } parameter to openTextDocument()

Finally, this would allow loading a document or creating an untitled one with a specific encoding. Note that if the document is dirty, it will save first. The file contents will be resolved from disk and decoded given the specified encoding.


There was also a point made about charset. I am not entirely sure what that would mean, as internally in VS Code we only deal with encoding.

Finally, are people expecting extension APIs to convert raw Uint8Array to different encodings? We are actually using the iconv-lite NPM module under the hood for all encoding matters, so I would assume extensions could do the same: https://www.npmjs.com/package/iconv-lite
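
To illustrate what converting raw bytes involves, here is a small sketch that uses the standard `TextDecoder` instead of iconv-lite, so it has no dependencies. It assumes the runtime supports the requested label and the `fatal` option (browsers and Node.js builds with ICU do); `decodeBytes` is a hypothetical helper name, and this is not how VS Code decodes internally.

```typescript
// Decode raw bytes with an explicit encoding label using the standard TextDecoder.
// `fatal: true` makes malformed input throw instead of silently yielding U+FFFD.
function decodeBytes(bytes: Uint8Array, label: string): string {
	return new TextDecoder(label, { fatal: true }).decode(bytes);
}

// The same text has different byte sequences in different encodings:
// "é" is 0xC3 0xA9 in UTF-8 but 0xE9 0x00 in UTF-16LE.
const fromUtf8 = decodeBytes(new Uint8Array([0x63, 0x61, 0x66, 0xc3, 0xa9]), 'utf-8');
const fromUtf16 = decodeBytes(
	new Uint8Array([0x63, 0x00, 0x61, 0x00, 0x66, 0x00, 0xe9, 0x00]),
	'utf-16le'
);
```

Both calls yield the same string. Note that the standard `TextEncoder` only produces UTF-8, so encoding to other character sets still needs a library such as iconv-lite.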

//cc @haoqunjiang @SamVerschueren @tomasiser

@xlilos

xlilos commented Feb 18, 2025

Thanks for taking this one on, @bpasero. For my extension, I think the APIs you are proposing would fit all the needs. Maybe there is some case for adding encoding parameters in FileSystemProvider interfaces so the provider could enforce some encoding for the underlying resource as need be. That said, someone could still get the encoding from the text document because they have the URI, so this is probably more of a "nice to have".

One other potential use case outside of the extension API could be a CLI option to set encoding for opening a file or diffing/merging files. Although you could mitigate this with user/extension encoding config defaults (until someone overrides them...).

bpasero added a commit that referenced this issue Feb 19, 2025
@bpasero
Member

bpasero commented Feb 19, 2025

Finally, are people expecting extension APIs to convert raw Uint8Array to different encodings?

Fyi we are now also pushing proposed API in #241160 to encode and decode between Uint8Array and string. The encoding to use can optionally be set, with a fallback to the rules VS Code applies. This API is outside of TextDocument and meant for raw file operations or dealing with 3rd party tools.

@bpasero
Member

bpasero commented Feb 19, 2025

All, here is an updated proposal for API related to encoding/decoding:

export interface TextDocument {
	/**
	 * The file encoding of this document that will be used when the document is saved.
	 *
	 * Use the {@link workspace.onDidChangeTextDocument onDidChangeTextDocument}-event to
	 * get notified when the document encoding changes.
	 */
	readonly encoding: string;
}

export namespace workspace {

	/**
	 * Opens a text document with the provided encoding.
	 */
	export function openTextDocument(uri: Uri, options?: { encoding?: string; }): Thenable<TextDocument>;
	export function openTextDocument(path: string, options?: { encoding?: string; }): Thenable<TextDocument>;
	export function openTextDocument(options?: { encoding?: string; }): Thenable<TextDocument>;

	/**
	 * Decodes the content from a `Uint8Array` to a `string`.
	 *
	 * If no encoding is provided, will try to pick an encoding based
	 * on user settings and the content of the buffer (for example
	 * byte order marks).
	 */
	export function decode(content: Uint8Array, uri: Uri | undefined, options?: { encoding: string }): Thenable<string>;

	/**
	 * Encodes the content of a `string` to a `Uint8Array`.
	 *
	 * If no encoding is provided, will try to pick an encoding based
	 * on user settings.
	 */
	export function encode(content: string, uri: Uri | undefined, options?: { encoding: string }): Thenable<Uint8Array>;
}

I think that TextDocument.encoding and the encode/decode methods on workspace are pretty straightforward and have a good chance of being finalised.
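
The `decode` fallback mentions byte order marks. As a rough illustration of that idea only, here is a sketch of BOM sniffing with a simple precedence order. The helper names, the identifiers `utf8bom`, `utf16le` and `utf16be` (assumed to match VS Code's naming), and the precedence (explicit option, then BOM, then a configured default) are all assumptions, not the actual rules VS Code applies.

```typescript
// Hypothetical helper: map a leading byte order mark to an encoding identifier.
// UTF-32 BOMs are ignored in this sketch.
function detectEncodingFromBom(bytes: Uint8Array): string | undefined {
	if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
		return 'utf8bom';
	}
	if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
		return 'utf16le';
	}
	if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
		return 'utf16be';
	}
	return undefined; // no BOM found
}

// Assumed precedence: explicit option, then BOM, then a configured default.
function resolveDecodeEncoding(bytes: Uint8Array, explicit?: string, configured = 'utf8'): string {
	return explicit ?? detectEncodingFromBom(bytes) ?? configured;
}
```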

As for openTextDocument there are some caveats to be aware of:

  • this will read the contents of the file from the file system and convert based on the encoding
  • this CAN result in a change of the content of the document for any extension that has the document opened
  • this method THROWS if the document is dirty because in that case we cannot force revert the file to the version on disk
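
Given these caveats, an extension would likely want to check the dirty state before asking for a different encoding. A minimal sketch, assuming the behavior listed above; `DocLike`, `OpenWithEncoding` and `reopenWithEncoding` are hypothetical names, and the `open` callback stands in for `workspace.openTextDocument` so the logic can run outside VS Code.

```typescript
// Hypothetical minimal shapes so the sketch runs without the vscode module.
interface DocLike {
	readonly uri: string;
	readonly isDirty: boolean;
	readonly encoding: string;
}
type OpenWithEncoding = (uri: string, options: { encoding: string }) => Promise<DocLike>;

// Reopen with a new encoding only when that cannot discard unsaved edits.
// Returns the resulting document, or undefined when the reopen was skipped.
async function reopenWithEncoding(
	doc: DocLike,
	encoding: string,
	open: OpenWithEncoding
): Promise<DocLike | undefined> {
	if (doc.encoding === encoding) {
		return doc; // already in the requested encoding, nothing to do
	}
	if (doc.isDirty) {
		return undefined; // the proposed API would throw here; let the caller decide
	}
	return open(doc.uri, { encoding });
}
```

Returning `undefined` for the dirty case lets the caller choose between prompting the user to save and giving up, rather than handling an exception.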

Finally, I am not seeing a good use case for the initially suggested save({ encoding: }) proposed API. I would not want to add API as final that in the end has no use cases, so I am asking for feedback here as well if there is any scenario for having it.

@SunsetTechuila
Contributor

SunsetTechuila commented Feb 20, 2025

@bpasero hello! So it would be possible to use openTextDocument to reopen an already opened document with different encoding, firing the onDidChangeTextDocument event instead of onDidOpenTextDocument? Or how to change the encoding of an already opened document?

@bpasero
Member

bpasero commented Feb 20, 2025

@SunsetTechuila openTextDocument with an encoding will always attempt to read the contents of the document from the file system and convert it based on the encoding provided. If there are characters that previously did not render or now cannot render, the content of the document changes, signalled by events.

Thing is, if the document is dirty (has changes by the user), we cannot force this to happen. We also cannot really just save the document because it may result in broken contents, given the encoding may be wrong. So we throw an error.

If you call openTextDocument with an encoding that is different than what is currently set, this will cause the encoding of the document to change.

Thinking about this more, I wonder if we need a way to change the encoding of the document without forcing it to save. We do not really offer this to the user today, we only have these 2 options:

Image

Or alternatively, as initially suggested, we go back to having the encoding in the save method 🤔

Can you clarify the use case here?

@bpasero
Member

bpasero commented Feb 20, 2025

With the new proposed APIs already pushed, one could emulate the save(encoding) routine:

  • take the value of a TextDocument as string
  • use workspace.encode to get the Uint8Array for a specific desired encoding
  • use workspace.fs to write the bytes to disk at the location of TextDocument
  • use the openTextDocument API to open the document with that encoding
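
The four steps above could be sketched roughly as follows. `EncodingApi` is a hypothetical, simplified slice of the API surface (the proposed `workspace.encode` also takes a `uri` argument), injected as a parameter so the sketch can be exercised without VS Code; `saveWithEncoding` is likewise a made-up name.

```typescript
// Hypothetical, simplified stand-ins for workspace.encode, workspace.fs.writeFile
// and workspace.openTextDocument.
interface EncodingApi {
	encode(content: string, options: { encoding: string }): Promise<Uint8Array>;
	writeFile(uri: string, bytes: Uint8Array): Promise<void>;
	openTextDocument(uri: string, options: { encoding: string }): Promise<{ encoding: string }>;
}

async function saveWithEncoding(
	api: EncodingApi,
	doc: { uri: string; getText(): string },
	encoding: string
): Promise<{ encoding: string }> {
	// 1. Take the current value of the document as a string.
	const text = doc.getText();
	// 2. Convert the string to bytes in the desired encoding.
	const bytes = await api.encode(text, { encoding });
	// 3. Write the bytes to disk at the document's location.
	await api.writeFile(doc.uri, bytes);
	// 4. Reopen the document with that encoding.
	return api.openTextDocument(doc.uri, { encoding });
}
```

Since `openTextDocument` with an encoding is described earlier in the thread as throwing for dirty documents, a real extension would probably need to handle that case around step 4.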

@bpasero
Member

bpasero commented Feb 21, 2025

Continues in #241449 for finalisation. I encourage extension authors to play with the proposed API and report issues as they encounter them. Thanks 🙏

Fyi, extension tests are here showing how to use the new API:

test('encoding: text document encodings', async () => {

@bpasero
Member

bpasero commented Apr 8, 2025

Fyi we plan to finalise the API for our April release: #246016

@vs-code-engineering vs-code-engineering bot locked and limited conversation to collaborators Apr 11, 2025