Perspective | Open Access

JATS and JATS4R: Its Importance in Publishing Workflows and the Dissemination of Information in the Scholarly Ecosystem

    Melissa Harrison

    Head of Production Operations, eLife Sciences, UK



In a traditional journal publishing process, an author submits a manuscript to a journal and, if the article is accepted for publication (after peer review and revision), it goes to production, where it is copy edited, quality controlled, and converted to different formats for publication. On publication, the article is hosted on the journal website, and details about the article (ie, metadata) can be sent to a number of different locations (eg, PubMed, Scopus, Web of Science, Crossref) to help promote the article to readers far and wide. Many publishers are experimenting with different forms of publication and peer review (for example, eLife, where I work, only reviews articles that have been posted as preprints). With these changes going on, having structured content and metadata in XML becomes even more useful. One of the traditional outputs of journal article publication, which many authors still see as a stamp of verification or a point of pride, is a typeset PDF in the style of the journal. However, this PDF is just one of many outputs from the process, and XML can be used (along with a template) to produce it easily, with the added advantage of making all the information available in a machine-readable format too.

Copyright © 2022 Melissa Harrison. This is an open-access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

XML stands for Extensible Markup Language, and it defines a set of rules for encoding the content of a document in a format that is both machine- and human-readable. It is a standard format that can be shared with others so they can access information, repurpose it, store it, move it about, and display it. Because XML is a good format for complex documents, which journal articles are, the National Library of Medicine created a DTD (document type definition) for publishers to use, which is now called JATS (Journal Article Tag Suite). A DTD is like a recipe, in the sense that if an item of content is tagged in a certain way, other platforms, users, and so on will recognise it as that type of content. For example, if you put a surname in a given name field, the machine reading it receives the wrong instruction. A person might be able to work it out from context, but say one platform you send content to reduces given names to initials: instead of Melissa Harrison, M Harrison, or Harrison M, you will see H Melissa. Even a human reader might then assume my surname is Melissa and my given name starts with H. For citations and citation-tracking tools this matters all the more, because I would lose a citation from my portfolio when the name is so different. Although the use of my ORCID could address this issue, that relies on journals collecting my ORCID from me and pushing it to all these endpoints, and on those endpoints using it too. Also, pure and simple, it's bad metadata!
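The name example can be sketched in a few lines of Python. This is only an illustration: the `<contrib>`, `<name>`, `<surname>` and `<given-names>` elements are JATS, but the `short_name` function is a hypothetical stand-in for whatever a downstream platform does when it reduces given names to initials.

```python
import xml.etree.ElementTree as ET

# Correctly tagged author name.
CORRECT = """
<contrib contrib-type="author">
  <name>
    <surname>Harrison</surname>
    <given-names>Melissa</given-names>
  </name>
</contrib>
"""

# Same author, but the surname and given name are in the wrong fields.
SWAPPED = """
<contrib contrib-type="author">
  <name>
    <surname>Melissa</surname>
    <given-names>Harrison</given-names>
  </name>
</contrib>
"""


def short_name(contrib_xml: str) -> str:
    """Hypothetical downstream step: initials of the given names, then the surname."""
    contrib = ET.fromstring(contrib_xml.strip())
    surname = contrib.findtext("name/surname", default="").strip()
    given = contrib.findtext("name/given-names", default="").strip()
    initials = "".join(part[0] for part in given.split())
    return f"{initials} {surname}".strip()


print(short_name(CORRECT))  # M Harrison
print(short_name(SWAPPED))  # H Melissa
```

The machine does exactly what the tags tell it to do; it has no context with which to notice that the fields are the wrong way round.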

People can be scared of the pointy brackets used in XML, but with a little bit of help and knowledge you can see the purpose and benefits (without needing to understand all of XML). However, because JATS needs to support all publishers, it cannot be too strict, otherwise some people won't be able to use it for their content. So, it's a flexible standard that allows publishers to create a complex structure (such as a journal article) and display it in different ways. This is fine if the XML is only used for that one internal structure and display. Think of one type of custom brick being used to build a house, and a different type of custom brick being used to build a second house: these two types of brick are going to struggle to come together to build a third house without a new architectural plan.
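To make the brick analogy concrete, here is a rough sketch of what flexibility can cost a reuser. Both fragments use real JATS elements and attributes (`<sec sec-type>` and `<notes notes-type>`), but the two publishers, the attribute values they chose, and the `find_data_statement` helper are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Two hypothetical publishers tagging the same statement in different ways.
PUBLISHER_A = (
    '<back><sec sec-type="data-availability">'
    "<p>Data are in repository X.</p></sec></back>"
)
PUBLISHER_B = (
    '<back><notes notes-type="data-statement">'
    "<p>Data are in repository X.</p></notes></back>"
)

# Anyone reusing the content has to maintain a list like this,
# and extend it for every new publisher-specific variant.
KNOWN_VARIANTS = [
    ("sec", "sec-type", "data-availability"),
    ("notes", "notes-type", "data-statement"),
]


def find_data_statement(xml_text: str):
    """Return the statement text, or None if no known variant matches."""
    root = ET.fromstring(xml_text)
    for tag, attr, value in KNOWN_VARIANTS:
        for element in root.iter(tag):
            if element.get(attr) == value:
                return "".join(element.itertext()).strip()
    return None


print(find_data_statement(PUBLISHER_A))  # Data are in repository X.
print(find_data_statement(PUBLISHER_B))  # Data are in repository X.
```

With a single agreed way of tagging the statement, `KNOWN_VARIANTS` would shrink to one entry and would not need maintaining.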

This is where JATS4R (JATS for Reuse) comes in. If we can standardise the bricks more, we get closer to content being reused with minimal extra effort to work around the differences. Considering that 60% of a data scientist's time is reportedly spent cleaning and organising data, if JATS4R can reduce that time, it's a win. This is important for open science and meta-research, and it can also reduce waste in the publishing ecosystem.

I work in open access and open science. Therefore, for new initiatives that come along or are emerging, I want to have a standard way to tag them in XML upfront, so everyone can recognise the important metadata in lots of articles across many publishers quickly and easily. For instance, who funded this work? What did the authors contribute? Is there any data or software associated with this article that will make it easier to reproduce the results? Are there any peer review documents associated with this article hosted elsewhere? Many of the JATS4R recommendations cover these topics.
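As a minimal sketch of answering "who funded this work?" across many articles, the snippet below reads funding metadata from a JATS fragment. The tagging loosely follows the pattern of the JATS4R funding recommendation as I understand it; the funder name, funder identifier and award number are placeholders, and `list_funding` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Placeholder funding metadata; none of these values refer to a real funder or grant.
ARTICLE_META = """
<article-meta>
  <funding-group>
    <award-group id="fund1">
      <funding-source>
        <institution-wrap>
          <institution-id institution-id-type="FundRef">https://doi.org/10.13039/000000000000</institution-id>
          <institution>Example Research Council</institution>
        </institution-wrap>
      </funding-source>
      <award-id>ABC-12345</award-id>
    </award-group>
  </funding-group>
</article-meta>
"""


def list_funding(xml_text: str) -> list:
    """Collect funder name, funder identifier and award number for each award group."""
    root = ET.fromstring(xml_text.strip())
    wrap = "funding-source/institution-wrap"
    awards = []
    for group in root.iter("award-group"):
        awards.append(
            {
                "funder": group.findtext(f"{wrap}/institution", default=""),
                "funder_id": group.findtext(f"{wrap}/institution-id", default=""),
                "award_id": group.findtext("award-id", default=""),
            }
        )
    return awards


for award in list_funding(ARTICLE_META):
    print(award["funder"], award["award_id"])  # Example Research Council ABC-12345
```

If every publisher tagged funding this way, the same few lines would work on any article without publisher-specific clean-up.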

Millions of journal articles are published every year, so it is vital for authors that journals ensure articles are indexed and disseminated far and wide, to increase the chances of each article being read and making an impact. If publishers don't make it easy for this to happen, their journals and authors will lose out.

How to Cite this paper?


APA-7 Style
Harrison, M. (2022). JATS and JATS4R: Its Importance in Publishing Workflows and the Dissemination of Information in the Scholarly Ecosystem. Trends Schol. Pub, 1(1), 9-10. https://doi.org/10.21124/2022.011

ACS Style
Harrison, M. JATS and JATS4R: Its Importance in Publishing Workflows and the Dissemination of Information in the Scholarly Ecosystem. Trends Schol. Pub 2022, 1, 9-10. https://doi.org/10.21124/2022.011

AMA Style
Harrison M. JATS and JATS4R: Its Importance in Publishing Workflows and the Dissemination of Information in the Scholarly Ecosystem. Trends in Scholarly Publishing. 2022; 1(1): 9-10. https://doi.org/10.21124/2022.011

Chicago/Turabian Style
Harrison, Melissa. 2022. "JATS and JATS4R: Its Importance in Publishing Workflows and the Dissemination of Information in the Scholarly Ecosystem." Trends in Scholarly Publishing 1, no. 1: 9-10. https://doi.org/10.21124/2022.011