Display Source URL for Extracted Fields when Using custom.yaml in Katana #1182

Open
kayra-s4e opened this issue Feb 17, 2025 · 4 comments · May be fixed by #1197
Labels
Type: Bug Inconsistencies or issues which will cause an issue or problem for users or implementors.

Comments

@kayra-s4e

katana version:

v1.1.2 (latest)

Current Behavior:

When using a custom.yaml file and extracting a specific field with Katana, the output does not show the source URL from which the field was found.

Expected Behavior:

When using a field from custom.yaml, the output should include the source URL along with the extracted field.

Example Command:

katana -u https://redacted.com/ -flc custom.yaml -f tckno

Expected Output:

39478627938 ---> https://redacted.com/index.php
98575212324 ---> https://redacted.com/index.php
12345678950 ---> https://redacted.com/contact.php
44455354000 ---> https://redacted.com/index.php
23456789012 ---> https://redacted.com/about.php
10012345000 ---> https://redacted.com/index.php

Steps To Reproduce:

  1. Create a custom.yaml file with a field definition.
  2. Run Katana with the -flc and -f options.
  3. Observe that the extracted field is shown, but the source URL is missing from the output.

Anything else:

Adding the source URL to the output when using custom fields would improve traceability and usability. Implementing a flag to enable this functionality when using the -f parameter would also be beneficial.

Thank you!

kayra-s4e added the "Type: Bug" label on Feb 17, 2025
@ehsandeep
Member

@kayra-s4e Source information is part of the JSONL output. Did you see it missing in the JSONL output or in the CLI output?

@kayra-s4e
Author

Yes, I have also tried using the -j parameter. When I use -j to output in JSON format, it writes all the crawled endpoints to the output file. However, in this scenario, even though 7 TCK numbers were found, the JSON output contains 15 lines. How can I determine which TCK number belongs to which source?


dogancanbakir self-assigned this on Feb 17, 2025
@dogancanbakir
Member

@dwisiswant0 I think we can export custom fields in JSON format (as part of the JSON output) to enable this mapping.

dwisiswant0 linked a pull request on Feb 28, 2025 that will close this issue
@kayra-s4e
Author

Any update?


4 participants