Fix to_biblatex_string and to_bibtex_string #76

Open
anwaralameddin opened this issue May 5, 2025 · 0 comments

Hello,

I have a rather large and incomplete bib file, and I am writing a tool that fetches bibliographic data using DOI, arXiv, MathSciNet and similar APIs to fill in missing fields in the original file. biblatex seems to be the most developed Rust crate that supports BibLaTeX. I would like to use it, especially for the provided structures like Person and Date.

The main issue I am facing is that biblatex does not preserve the parsed fields, i.e. when a bib file is parsed as Bibliography and later serialised using to_biblatex_string or to_bibtex_string, the resulting string problematically differs from the original one. For example,

fn main() {
    use biblatex::Bibliography;
    let src = r#"
    @article{key1,
    title = {Explicit homotopy limits of $\mathrm{dg}$-categories and twisted complexes},
    }
    @article{key2,
    title = {Hyper-K\"ahler Fourfolds Fibered by Elliptic Products},
    }
    "#;
    let bibliography = Bibliography::parse(src).unwrap();
    println!("{}", bibliography.to_biblatex_string());
}

outputs

@article{key1,
title = {Explicit homotopy limits of $\\mathrm\{dg\}$-categories and twisted complexes},
}

@article{key2,
title = {Hyper-Kähler Fourfolds Fibered by Elliptic Products},
}

The output differs from the input in more than the resolved accent command (\"a becomes ä): backslashes are added and the braces are escaped. In the first entry, this corrupts the mathematical expression. The second entry is not valid BibTeX even when to_bibtex_string is used, and compiling it with bibtex results in an error:

document.bbl: error: 61: Invalid UTF-8 byte sequence (��h). \newblock Hyper-k��h

Given the project name, it is understandable if it does not support BibTeX. Still, having the function to_bibtex_string available might be confusing when its result is not valid BibTeX.
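As a stopgap on the caller's side, the exported string can be scanned for characters that an ASCII-only toolchain would reject. The following is a minimal sketch using only the standard library; the helper name non_ascii_chars is hypothetical and not part of the biblatex crate, and it assumes that the target toolchain expects ASCII input.

```rust
/// Hypothetical helper (not part of the biblatex crate): lists every
/// non-ASCII character in `text` together with its 1-based line number.
fn non_ascii_chars(text: &str) -> Vec<(usize, char)> {
    text.lines()
        .enumerate()
        .flat_map(|(i, line)| {
            line.chars()
                .filter(|c| !c.is_ascii())
                .map(move |c| (i + 1, c))
        })
        .collect()
}

fn main() {
    // The second entry from the output above, as exported.
    let exported =
        "@article{key2,\ntitle = {Hyper-Kähler Fourfolds Fibered by Elliptic Products},\n}\n";
    for (line, ch) in non_ascii_chars(exported) {
        println!("line {line}: non-ASCII character {ch:?}");
    }
}
```

This only detects the problem; it does not restore the original \"a command.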

I would like to use the structures Person and Date as they simplify comparisons. Still, I want to preserve the fetched fields, as I want to use the final result with bibtex or biber. Also, as a basic check, I would like to be able to export the parsed bibliography before processing it and compare it with the original file to ensure that it was read completely. This is because some of the entries I have contain duplicate fields, and currently, some of the duplicate fields are silently ignored.
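Until such a round-trip comparison is possible, duplicate fields could be flagged before parsing with a rough scan of the source text. The sketch below is an assumption-laden illustration, not part of the biblatex crate: the function duplicate_fields is hypothetical, and it assumes that every field starts on its own line in the form name = value, as in the example above.

```rust
use std::collections::HashMap;

/// Hypothetical pre-parse check (not part of the biblatex crate).
/// Assumes each field starts on its own line as `name = value`;
/// multi-line values and several fields on one line are not handled.
/// Returns (entry key, field name) for each field seen more than once.
fn duplicate_fields(src: &str) -> Vec<(String, String)> {
    let mut duplicates = Vec::new();
    let mut current_key = String::new();
    let mut seen: HashMap<String, usize> = HashMap::new();

    for line in src.lines().map(str::trim) {
        if line.starts_with('@') {
            // New entry: remember its key and reset the field counts.
            current_key = line
                .split_once('{')
                .map(|(_, rest)| rest.trim_end_matches(',').trim().to_string())
                .unwrap_or_default();
            seen.clear();
        } else if let Some((name, _)) = line.split_once('=') {
            // Field names are compared case-insensitively.
            let name = name.trim().to_lowercase();
            let is_field_name = !name.is_empty()
                && name.chars().all(|c| c.is_alphanumeric() || c == '-' || c == '_');
            if !is_field_name {
                continue;
            }
            let count = seen.entry(name.clone()).or_insert(0);
            *count += 1;
            // Report each duplicated field once, on its second occurrence.
            if *count == 2 {
                duplicates.push((current_key.clone(), name));
            }
        }
    }
    duplicates
}

fn main() {
    let src = r#"
    @article{key1,
    title = {First title},
    year = {2020},
    title = {Second title},
    }
    "#;
    for (key, field) in duplicate_fields(src) {
        println!("entry {key}: field {field} appears more than once");
    }
}
```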

The issue seems to stem from the fact that functions used for parsing, like ContentParser::parse_impl and resolve::flatten, make irreversible changes. While this issue does not exist for RawBibliography, and in principle, I should be able to use it, RawBibliography lacks the desired structures and serialisation functions.

I realise that my objective does not necessarily align with this project’s as I would like the exported bib file to be usable with LaTeX rather than typst. Still, I think, in general, it’s easier to reason about the code if processing is separated from parsing. Also, a basic search on Google and GitHub suggests that to_biblatex_string and to_bibtex_string are not used by other typst repositories, so fixing the above does not seem to conflict with relevant projects.

If addressing the above is of interest, one may proceed in at least one of two directions,

  • Add the structures and serialisation functions to RawBibliography, in which case, Bibliography could be thought of as ProcessedBibliography or
  • Remove the processing from all functions called by Bibliography::parse to a separate function, possibly ChunksExt::process, which would mainly be called by ChunksExt::format_sentence and ChunksExt::format_verbatim.
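To make the idea behind both directions concrete, here is a toy sketch of keeping the verbatim value and deriving the processed form on demand. Everything in it is hypothetical: RawField, to_source and process are illustrative names and do not exist in the biblatex crate, and the processing step is reduced to a single accent replacement.

```rust
/// Hypothetical sketch only; none of this exists in the biblatex crate.
/// A field that stores its value exactly as written in the source.
#[derive(Debug, Clone, PartialEq)]
struct RawField {
    name: String,
    /// Verbatim text between the outer braces.
    value: String,
}

impl RawField {
    /// Serialisation uses only the verbatim value, so it is lossless.
    fn to_source(&self) -> String {
        format!("{} = {{{}}},", self.name, self.value)
    }

    /// Toy stand-in for the processing step: resolves one accent command
    /// and leaves everything else alone. Real processing would do far more.
    fn process(&self) -> String {
        self.value.replace("\\\"a", "ä")
    }
}

fn main() {
    let field = RawField {
        name: "title".to_string(),
        value: r#"Hyper-K\"ahler Fourfolds Fibered by Elliptic Products"#.to_string(),
    };
    // Round-trips the source text unchanged.
    println!("{}", field.to_source());
    // Processed form, computed only when a formatted value is needed.
    println!("{}", field.process());
}
```

Because process never mutates the stored value, exporting after processing still yields the original text.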

I would gladly help if you would like to proceed with either direction or with an alternative solution to the issue above.

Thank you for your time and consideration!
