
Mixed output causes httpie to preprocess it incorrectly #1620


Open
2 tasks done
danbulant opened this issue Mar 8, 2025 · 1 comment
Labels
bug Something isn't working new Needs triage. Comments are welcome!

Comments


danbulant commented Mar 8, 2025

Checklist

  • I've searched for similar issues.
  • I'm using the latest version of HTTPie.

Minimal reproduction code and steps

  1. Send a request to a service that returns the MIME type text/html with a JSON body that contains escaped HTML (for example \u003c) inside a string.
  2. Observe the HTML getting highlighted and the escaped characters being converted to their unescaped versions.
  3. Compare with the output piped to cat, which disables the preprocessing and leaves the characters as they are.
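For anyone without an nginx proxy at hand, the steps above can be approximated with a small local server. This is only a sketch using the Python standard library: the handler name, the start_server helper and the shortened body are assumptions made for illustration, not part of the original report.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# JSON body whose string value contains escaped HTML (\u003c and \u003e).
# The raw-bytes literal keeps the backslashes exactly as written.
BODY = rb'{"name":"example.com\u003cscript\u003ealert(1)\u003c/script\u003e.","type":1}'


class MislabeledJSONHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: serves JSON but labels it as text/html."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=UTF-8")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, format, *args):
        # Silence the default per-request logging.
        pass


def start_server(port=0):
    """Start the server in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), MislabeledJSONHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Self-check: fetch the page once and show what is actually on the wire.
server = start_server()
url = f"http://127.0.0.1:{server.server_address[1]}/"
with urllib.request.urlopen(url) as response:
    content_type = response.headers["Content-Type"]
    raw_body = response.read()
server.shutdown()
server.server_close()

print(content_type)
print(raw_body.decode("ascii"))
```

To try it against HTTPie, call start_server(8080) from a process that stays alive and compare the output of http -v http://127.0.0.1:8080/ with the same command piped to cat.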

Current result

For example, proxy dns.google but set its response Content-Type to text/html (proxy_pass https://dns.google; add_header Content-Type text/html always; in nginx).

http "http://localhost/resolve?name=example.com%3Cscript%3Ealert(1)%3C%2Fscript%3E" -v | cat
GET /resolve?name=example.com%3Cscript%3Ealert(1)%3C%2Fscript%3E HTTP/1.1
Accept-Encoding: gzip, deflate, br
Accept: */*
Connection: keep-alive
User-Agent: HTTPie/3.2.4
Host: dns.google

HTTP/1.1 200 OK
X-Content-Type-Options: nosniff
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
Access-Control-Allow-Origin: *
Date: Sat, 08 Mar 2025 11:22:11 GMT
Expires: Sat, 08 Mar 2025 11:22:11 GMT
Cache-Control: private, max-age=86399
Content-Type: text/html; charset=UTF-8
Content-Encoding: gzip
Server: HTTP server (unknown)
X-XSS-Protection: 0
X-Frame-Options: SAMEORIGIN
Alt-Svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
Transfer-Encoding: chunked

{"Status":3,"TC":false,"RD":true,"RA":true,"AD":true,"CD":false,"Question":[{"name":"example.com\u003cscript\u003ealert(1)\u003c/script\u003e.","type":1}],"Authority":[{"name":".","type":6,"TTL":86399,"data":"a.root-servers.net. nstld.verisign-grs.com. 2025030800 1800 900 604800 86400"}]}

That is the raw output. Without | cat, the body gets rendered as:

{
    "AD": true,
    "Authority": [
        {
            "TTL": 86397,
            "data": "a.root-servers.net. nstld.verisign-grs.com. 2025030800 1800 900 604800 86400",
            "name": ".",
            "type": 6
        }
    ],
    "CD": false,
    "Question": [
        {
            "name": "example.com<script>alert(1)</script>.",
            "type": 1
        }
    ],
    "RA": true,
    "RD": true,
    "Status": 3,
    "TC": false
}

This is incorrect and can be confusing: the escaped \u003c and \u003e sequences are displayed as literal < and > characters.

Expected result

The same as the | cat output, since there is no real HTML to prettify.

Additional information, screenshots, or code examples

Image

@danbulant danbulant added bug Something isn't working new Needs triage. Comments are welcome! labels Mar 8, 2025

rhit-reillydj commented May 23, 2025

I’d like to clarify what’s going on under the hood:

By-design behavior for text/html
HTTPie treats any response labeled Content-Type: text/html as “opaque” text, so when you request pretty-printed JSON with --json it still (a) syntax-highlights it as HTML, and (b) hands the raw Python object to json.dumps(..., ensure_ascii=False). That parameter is explicitly chosen to improve human readability by unescaping \uXXXX sequences into their corresponding characters.

Why it feels like a bug
It only surfaces when a server mislabels a JSON payload as text/html. Because the JSON body contains escaped HTML ("\u003c"), you end up seeing < in the output, even though the original JSON literally contained \u003c.
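The round trip described above can be reproduced with nothing but Python's json module. This is a minimal sketch of the general parse-then-serialise effect, not HTTPie's actual code path; the shortened body and variable names are illustrative.

```python
import json

# Raw body as sent on the wire: "<" and ">" are written as \u003c and \u003e.
raw = r'{"name": "example.com\u003cscript\u003e"}'

# Parsing decodes every \uXXXX escape into the character it stands for.
parsed = json.loads(raw)
print(parsed["name"])  # example.com<script>

# Re-serialising writes the decoded characters back out, so the
# original escapes are gone from the pretty-printed text.
pretty = json.dumps(parsed, ensure_ascii=False)
print(pretty)  # {"name": "example.com<script>"}
```

Once the body has been parsed, the information about which characters were originally escaped is no longer available to the serialiser.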

Options to preserve your escapes

Fix upstream: Have your server use the correct Content-Type: application/json; charset=utf-8. Then HTTPie will (correctly) call json.dumps(..., ensure_ascii=True), preserving all \uXXXX sequences.

Workaround in HTTPie: You could add a flag (or patch) around that one call site in json.py to force ensure_ascii=True when you detect --json, or introduce a new option like --preserve-escapes.
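One caveat is worth checking before going down the ensure_ascii route: in Python's json module that parameter only controls how non-ASCII characters are written, and < is plain ASCII, so ensure_ascii=True on its own does not bring \u003c back. The sketch below shows this and one possible alternative that re-escapes HTML-sensitive characters after serialising. The helper name dumps_html_escaped and the HTML_ESCAPES table are hypothetical and not part of HTTPie, and the helper escapes every <, > and &, whether or not the server escaped them originally.

```python
import json

# Hypothetical mapping, not part of HTTPie: characters to write as \uXXXX.
HTML_ESCAPES = {"<": "\\u003c", ">": "\\u003e", "&": "\\u0026"}


def dumps_html_escaped(obj, **kwargs):
    """Serialise obj to JSON, then escape <, > and & as \\uXXXX sequences.

    In valid JSON these characters can only occur inside strings, so a
    plain text replacement keeps the document valid and equivalent.
    """
    text = json.dumps(obj, **kwargs)
    for char, escape in HTML_ESCAPES.items():
        text = text.replace(char, escape)
    return text


decoded = {"name": "example.com<script>"}

# ensure_ascii=True leaves ASCII characters such as "<" untouched...
print(json.dumps(decoded, ensure_ascii=True))  # {"name": "example.com<script>"}
# ...and only escapes characters outside the ASCII range.
print(json.dumps("é", ensure_ascii=True))      # "\u00e9"

# The hypothetical helper restores the escaped form for display.
print(dumps_html_escaped(decoded))  # {"name": "example.com\u003cscript\u003e"}
```

An option along these lines would approximate the | cat output for bodies like the one in this issue, but showing the response body verbatim is the only way to reproduce the original escapes exactly.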

Conclusion
The premature unescaping is indeed happening in HTTPie, but it’s an intentional readability feature for non-JSON content. The “real” bug is on the server side sending the wrong Content-Type header. If you’re blocked by a server you can’t change, we could consider adding a new HTTPie option to preserve all escapes regardless of content type. Let me know if you’d like to collaborate on implementing that!
