Writing Chemistry with LaTeX (3/3)

Posted on Mar 23, 2010

In this last part of writing chemistry with LaTeX, we will look at the chemscheme portion of chemstyle.  It offers automatic tracking for schemes and compounds, and is a major time-saver for synthetic chemists.  Download example packages here: version 1, version 2.

Before we look at how to work with chemscheme, let me show you what it can do.  Imagine you are writing the following report: Fischer Esterification.

If you’re writing this in a word processor, your workflow would likely be to draw and number the structures in ChemDraw.  Then you would either drag them directly into the word processor, or prepare a .tiff and link/embed that into the document.  You would be careful to make sure that duplicated compounds (e.g., 1 here) get the same number.  You would be very careful to make sure that the numbers in the main text are the right numbers throughout, and you’re very proud that it all works out just right.

And then, your supervisor makes a thoughtful comment that you should include what the real target and its starting material are.  And those should go in front of all your structures.  Now you’re in a not-so-great place: the two compounds will displace all your numbering, and you have to renumber all your figures, and then track down all the stray numbers.  Our example is trivial, but imagine doing that with a paper that has over 100 compounds, spread across 200 pages!

chemstyle provides a behind-the-scenes mechanism (by way of Stephan Schenk’s chemcompounds, or Bjørn Pedersen’s bpchem) to automatically keep track of the numbering of compounds and schemes, so the numbers are always right after you hit typeset.  To make the change the supervisor suggested, all I did here was add the additional scheme, and LaTeX-chemstyle did the hard work:

In the rest of this post, we’ll look at the syntax and workflow that make this happen.  WordPress.com doesn’t color LaTeX syntax, so I’ll settle for screenshots.  You may want to follow along with the examples, which can be downloaded here: version 1, version 2.  Version 1 is the initial report; version 2 is revised with the supervisor’s suggestion.

Structure Drawing

Draw the structures in your drawing program as usual (I’m using ChemDraw here).  However, instead of manually numbering the molecules, place a tag where the number would be; the default behaviour in chemstyle is to search for “TMP”.  Your structure file would thus look like:

You will probably want to save a copy as a ChemDraw file as you usually do (in case you need to edit it), and you will need to save an additional copy as .eps (Encapsulated PostScript).  .eps, unlike .jpg or .png, is a vector format whose text LaTeX (through PSTricks) knows how to open up and replace.

In ChemDraw, choose Save As, and select .eps. On OSX, you’re given a choice of Mac or text format, and it does not matter which one you choose to use.  I like to organize my files in a /Figures/ directory, instead of saving directly with the paper.


Repeat for each scheme or structure.  When there is more than one structure you want to label, use TMP1, TMP2… to denote them from left to right.

Annotating in LaTeX

The chemstyle package provides many different options to format the numbers.  For example: numbering with 1a, 1b, 1c; custom formatting for the labels (I use magenta here to make them stand out better); see the manual for more details.  We will only be concerned with getting the basics working here.

We first need to insert two sets of lines in the preamble of our file (i.e., before \begin{document}):

A – you’re already familiar with the chemstyle line.  The auto-pst-pdf package allows you to use .eps files with pdfLaTeX by converting them into .pdf on the fly.  The two runs are required for the text substitution (I think – if someone can clarify, that’d be great).  auto-pst-pdf is included in major LaTeX distributions.

B – these lines modify the \schemerefformat command with the instructions to make the numbers magenta and italic.  I tried this as part of the tutorial in Wright’s documentation, and I like it.  It stands out better, I think.  (When you are ready to submit your manuscript, just take out these lines to make the compound numbers bold, black, and boring again.)
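
Since the screenshot is not reproduced here, the following is a minimal sketch of a preamble matching the description in A and B. It is reconstructed from that description, not copied from the example package: the xcolor line and the exact body of the \schemerefformat redefinition are assumptions.

```latex
\documentclass{article}

% A: chemstyle, plus on-the-fly .eps -> .pdf conversion for pdfLaTeX
\usepackage{chemstyle}
\usepackage[runs=2]{auto-pst-pdf}

% B: make the substituted compound numbers magenta and italic.
% Assumption: colour support comes from xcolor. Delete these two lines
% to go back to the default formatting.
\usepackage{xcolor}
\renewcommand*{\schemerefformat}{\color{magenta}\itshape}

\begin{document}
% ... main text and schemes go here ...
\end{document}
```

As comes up in the comments, auto-pst-pdf needs permission to run external commands (the write18 / shell-escape setting of your TeX distribution) before it can do the conversion.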

The preamble is thus set up.  In your main text, whenever you need a scheme, all you need to do is:

  • \begin{scheme} and \end{scheme} mark the scope of the scheme and tell LaTeX that it is a scheme in the first place – LaTeX counts quietly in the background to make sure the numbering in the caption and the text comes out right.
  • \begin{center} and \end{center} tell LaTeX you want the scheme to be in the middle (as opposed to being on the left or right side of the page).
  • \schemeref[TMP]{cmpd:PhCOOiPr} indicates that the text “TMP” in the .eps file is to be substituted with a number, and that the number belongs to a compound called “cmpd:PhCOOiPr”.  You could have named it anything you like – “isopropylbenzoate”, “1methyl2ethylbenzoate” – but it should be unique to that compound.  This is the handle by which you will refer to it in the main text.  (I like to prepend cmpd: to remind myself it’s a compound; sch: for schemes, fig: for figures, etc.)
  • The \includegraphics line simply tells LaTeX which graphic to use here.

If you have multiple tags to substitute (TMP1, TMP2), then you will need one \schemeref[]{} line for each tag.
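
In place of the missing screenshot, here is a sketch of what such a scheme block might look like with two tags. The file name, the caption, the sch: label and the ID cmpd:PhCOOH are made up for illustration; only cmpd:PhCOOiPr comes from the post.

```latex
\begin{scheme}[htb]
  \begin{center}
    % One \schemeref per tag in the .eps; with a single structure this
    % would just be \schemeref[TMP]{cmpd:PhCOOiPr}.
    \schemeref[TMP1]{cmpd:PhCOOH}     % leftmost structure (hypothetical ID)
    \schemeref[TMP2]{cmpd:PhCOOiPr}   % next structure to the right
    \includegraphics{Figures/esterification.eps}  % hypothetical file name
  \end{center}
  \caption{Fischer esterification.}   % hypothetical caption
  \label{sch:esterification}          % hypothetical label
\end{scheme}
```

The \schemeref lines come before \includegraphics, in the same order as the bullets above, so the substitutions are already set up when the graphic is read.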

In the main text, wherever you want the number for a compound to appear, you will use the command \compound{name}; for example, for the ester above, it would be \compound{cmpd:PhCOOiPr}.
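
For instance, a sentence in the main text could be written along these lines (the wording, cmpd:PhCOOH and sch:esterification are placeholders, not taken from the example package):

```latex
Isopropyl benzoate (\compound{cmpd:PhCOOiPr}) was prepared from benzoic
acid (\compound{cmpd:PhCOOH}) as shown in Scheme~\ref{sch:esterification}.
```

Each \compound{...} prints the number assigned to that ID, so the text and the scheme stay in step as long as the IDs match exactly.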

Note that the first time you typeset this, a bunch of ?? appear in the manuscript where I promised you numbers, even though the compounds in the scheme received the right numbering:

This is because LaTeX needs to go through the manuscript once to generate the numbers, and as it does so, it keeps a running tally of the name:number pairs in an auxiliary file.  The second time you typeset, it sees that there is an auxiliary file available, and uses that to fill in the ??s.

There is a bug related to this.  If the manuscript has been typeset before, and new compounds are added in, the main text does not update correctly.  To avoid that, you’ll need to trash the .aux file so that the whole numbering is redone from scratch.  (Be careful about when you trash .aux files, especially when you’re generating a bibliography as well.)

And that will get your numbering right all the time.  A very acceptable trade-off for a short learning curve… thank you to Joseph Wright and others whose work he built upon.

Addendum April 6th 2010: I heard back from Joseph about the misnumbering of compounds.  This may be of interest to everyone using chemscheme/chemcompounds:

You’re the second person who’s mentioned this to me recently. The way that chemstyle works is just by loading the chemcompounds package. Due to the way floats are processed it’s safest to include \compound* instructions before the floats to keep the numbering “in order”. I tend to have a block of \compound* lines at the start of each chapter/file so that everything stays in order.

Joseph
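
A minimal sketch of what such a block might look like, assuming the two compound IDs used as placeholders in this post (the starred form registers the compound without printing its number):

```latex
% Declare compounds in the order they should be numbered,
% before any scheme float that uses them.
\compound*{cmpd:PhCOOH}     % hypothetical ID
\compound*{cmpd:PhCOOiPr}
```

One commenter below reports an “\unskip not allowed in vertical mode” error when the block sits directly after a chapter heading; the workaround mentioned in the replies is to move it a little later, for example after the first paragraph, as long as it still precedes the first \compound.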

Addendum May 29 2011: To get the numbering right for the ~250 compounds in the 1k page thesis, I built a tool to automatically generate the \compound* block from a TeX file. It’s entirely browser-based and you can find it here.  It’s not fully functional in that you need to delete the comma from the last entry; the fully functional version got mucked up in the coffeescript->JS->wordpress integration.  It should be a simple fix, but I’m not sure when I’ll have time to get back into programming mode.

46 Comments

  1. Piotr Wawrzyniak

    It might be regarded as an advertisement (because I wrote this software), but there is also another method for automatic compound numbering in publications – using the konwerter software (http://piti.exroot.org/konwerter/). You can create the eps picture with multiple reaction schemes, and all compounds there will be numbered. Might be a good choice – I know a few people have used it and were very happy.

    Cheers
    Piotr

  2. Deepak

    Very well explained. Thank you very much.

  3. dr.keshav mohan

    great. i am interested. link to latest version please.

  4. c-top

    you have too the package chemfig

  5. Silje

    You have just made my day. I have been so worried about how to get this sorted before I start typing up my thesis (should have started months ago but I have kept putting it off) and you have just solved all my problems (apart from the obvious procrastination problems I have).

    Thank you soooo much, it was just what I was looking for.

    • toeholds

      You’re welcome – if there are other LaTeX/chemistry questions, feel free to ask – I’ve probably bumped all the walls at least once by now. Additionally, if you happen to be a python-enabled synthetic chemist,

  6. Boudewijn

    Hoi Jon,

    We spoke before about some problems I had with schemeref and compounds and stuff. The tip for \compound* works great. I just saw you made a script to automatically make a list of all the compounds; I will try it out asap (your website is acting a bit strange so I cannot download it, probably due to an update). But anyway I have two other questions for you:
    1) Can I reset the counter of the compounds somehow, so that every chapter starts with compound 1?
    2) When I start a new chapter and list all my compounds before the section, I get a strange error: \unskip not allowed in vertical mode (or something like that). Can you tell me how to fix it? Because I need to put the \compound* list at the beginning of the document, right?

    Thanx again,

    \BoudewijnD

    • Jon

      Hi Boudewijn! The link was non-functional because the drag-drop doesn’t work when it is integrated as a WordPress page, and in trying to fix it I managed to break the code completely (argh!). I’ve currently restored the link, but it’s pointing to an older, standalone version. That version is still good enough to save some time, but is neither pretty nor robust.

      As to your questions, I don’t have a direct answer to either of them (sorry), but here’s some general input. For the first, my hunch is that you may be able to make it cmpd 1.1, 1.2… 2.1, 2.2 etc., but outright labeling multiple compounds as 1 would not be possible. (This would also make references from later sections confusing, as your reader won’t know which one you’re referring to.) For the second question – the \compound* list does not need to be at the very beginning of a chapter. It merely needs to precede the first \compound. (In fact, IIRC, I have seen this error too, and I think I solved it by moving the list to some internal place, likely after the first paragraph.) I’ll be re-familiarizing myself with this over the next few weeks as the thesis gets completely finalized.

  7. Josh

    I am really interested in this package, but when I put
    \usepackage[journal=rsc]{chemstyle}

    I get the following error. I’m using MiKTeX 2.9
    Latex error: Symbol font ‘TS1letters’ is not defined

    I’ve tried searching online, but have yet to find a solution to this problem.

    • Jon

      Hi Josh – I haven’t seen this error myself either. Since this is related to symbols, you may want to try throwing in a \usepackage{amssymb}. The amssymb (American Math Soc Symbols) package allows LaTeX to use more esoteric symbols not accessible otherwise, and if these are re-used in chemstyle then it may be a prerequisite. Jon

      • Josh

        Hi Jon,

        Thanks for the help. I ended up just reinstalling MiKTeX; something must not have been installed properly and it was struggling.

        In any case, I’m happily using chemscheme now. I was wondering if you knew of a way for the schemeref to match only a part of the text below a compound.

        For example, I want a label like:

        R = H (#)

        So I tried R = H (TMP1), but it is then no longer matched by chemscheme; changing it to simply TMP1 works. Alternatively, is there a way to temporarily modify \schemerefformat? I could then modify the output of that labeled scheme number.

        Thanks again for the great post about this package. It is proving very helpful!

        • Jon

          Good to hear it’s working out. Chemscheme does not, and cannot, match part of a text – AFAIK it’s a full label substitution (using PSTricks). What I did in these cases was simply to have “R = H” and “TMP1” as separate strings, so chemscheme can replace the latter.

          While we’re talking about substitution mechanisms, be careful with font selection. At some point I had labels that were not replaced, and those cases could be resolved by simply replacing it with a font that has no space in the name. So… “times” is great, but “myriad pro” is buggy.

  8. efqi

    Aided me a lot, just what I was looking for : D.

  9. lukemelb

    I can’t seem to get this to work properly. The numbering is perfect in the text but refuses to replace TMP with the appropriate number. I am using PdfLatex in TeXworks.

    Getting the error “Package caption Warning: Unused \captionsetup[subscheme]”

    Not sure if this is the reason for the problem. The graphics are dropping in fine but with TMP still in place. Any suggestions?

    • Jon

      Hi Luke,

      I did a search for the \captionsetup{} warning, but the only things that come up are in German and I don’t grok Deutsch. Since it’s a warning, I suspect it may not be related. How are you preparing the figures? In some of the times when the TMPs are not substituted, it was because I

      (i) used fonts with spaces in them (e.g., “Myriad Pro”, default font for Illustrator) — substituting with “Times” solved it.
      (ii) had the TMP broken up by Illustrator during the saving process, so that when it’s reopened as an EPS, the grouping became “TM”, and “P” separately.

      Have you tried typesetting the bundle I have here? It’s pretty much a minimal setup, so it would be an easy starting point for troubleshooting.

      • lukemelb

        Managed to sort out my labeling problem – was mostly a font issue but I am now looking into subnumbering and hitting a bit of a wall.

        What is the best method for consistent numbering and sublabeling of compounds within the document and schemes?

        Everything I have tried so far (using chemstyle/chemscheme and bpchem) ends up numbering incorrectly.

        Eg. numbering restarts for anything with a sub-label i.e. 1a instead of 3a, irrespective of whether the parent has already been defined as “3”

        • Jon

          Hi Luke – I have not used sub-labeling or bpchem, so I have no useful advice here. Does the usual trick of pre-defining compounds with \compound*{} work here?

  10. Clemens

    Hi Jon,

    You say in your article, that “chemstyle provides a behind-the-scenes mechanism to automatically track down the numbering of compounds”.
    Actually that isn’t true: neither chemscheme nor chemstyle do the numbering. Instead either the chemcompounds package (default) or the bpchem package is used. I believe chemstyle has a package option to decide, which one you want to use.

    Regards,
    Clemens

  11. Kevyclicy

    very interesting, thanks

  12. Websites

    Aided me a lot, just what I was searching for : D.

  13. Tobi

    Hi
    thank you for this post, it already helped me a lot. Still, I cannot get the package to work properly at substituting the labels in .eps schemes. After a lot of puzzling, I have got as far as the TMP labels actually being replaced by the correct numbers, but at the same time all other letters in the .eps are removed, such as the various atoms in my structures. Do you have any idea what I could do? I use MiKTeX 2.9 on Win7 and, for all I know, have all the required packages installed along with the newest perl and postscript versions

    • Jon

      There are many moving parts in this LaTeX-chemstyle machine, so troubleshooting from afar is difficult. I see three parallel things that we can try: first, try typesetting the chemstyle documentation .tex file; second, try the test package I have here and see if it works properly. (If these fail, that would suggest your install is wonky.) Third, you can try sending me a minimal test .tex/.eps package (and its pdf output; my email is jkwchui .at. uvic .dot. ca). With these I can do a typeset on my machine and compare our outputs.

      • Tobi

        Thanks for the quick answer. On making the minimum working script I noticed that the problem was not with the script or the package but with the .PDF-viewer that comes with TeXworks. I don’t know why, but somehow this viewer trashes the letters in the schemes the second time it views the file. When I open the typeset .PDF with another program, everything is fine…

  14. Alexander

    Unfortunately I’m unable to make this package work with auto-pst-pdf.
    This option to keep track of compound-numbering without having to edit the images yourself sounds really perfect, but I can’t get it to work in pdflatex.

    It seems I will have to do this manually :(

    • Alexander

      Ok, I still had \usepackage[pdftex]{color,graphicx} in my preamble that was messing things up, so now it’s working pretty well.

      One thing though is I am wondering how you could ‘recycle’ compound-numbers through different schemes. Say I have the same starting-material in different schemes, how can I get the same number for these? I found out just placing the same TMP1 in the eps doesn’t work.
      Any advice on this?

      • Jon

        Hi Alexander. I was going through the comments chronologically – I see that you’ve resolved the original problem. Good stuff.

        The compound number is linked to the compound-ID you give to it. So for example, you had benzene in the first diagram, with the TMP1 underneath it; the code to include it would be something along the lines of (all that follows % is comment):

        \begin{scheme}[htb] % starts the scheme, placing it “here”, “top”, or “bottom”, in that sequence
        \begin{center} % center the scheme
        \schemeref[TMP1]{benzene} % Find TMP1 in the scheme, and substitute it with a number. If “benzene” exists already, re-use that number; if “benzene” does not exist, then take the latest number, add one to it, and assign that number to “benzene”. Next time LaTeX sees “benzene” in a \schemeref it will thus replace whatever is within [TMPx] with the proper number for benzene.**
        \includegraphics[width=\textwidth]{Schemes/synthesis.eps} % include the scheme, fit to width of text
        \end{center}
        \end{scheme}

        The next time you need to refer to benzene, for example, in the next figure where the associated tag for benzene happens to be TMP34, you may do something like:

        \begin{scheme}[htb]
        \begin{center}
        \schemeref[TMP34]{benzene} % As long as the ID (“benzene”) remains the same, it will make the correct substitution.
        \includegraphics[width=\textwidth]{Schemes/anothersynthesis.eps}
        \end{center}
        \end{scheme}

        Note that the compound-IDs are case-sensitive and white-space sensitive. It must be EXACT. For my thesis I had compounds with IDs like “BocAn-8-Ar-8-BocAn”, and yea, it’s different when it’s named “BOCAn-8-Ar-8-BOCAn”. Learnt this the painful way :P

        ** If you’ve kept the ID the same but it’s assigning inconsistent numbers, try the \compound*{} command mentioned in one of the addenda of the post.

        • Alexander

          Yes, I figured this out by trial and error: you need to repeat the \schemeref{} line in the new scheme in order to recycle the number.

          Well, one ‘bug’ that I encountered is that the numbering for the schemes and the \compound{} references are somewhat independent. It seems that you _have_ to refer to each compound in your text, or the numbering in the text will start to diverge from the schemes.

          • Jon

            Hi Alexander – this is where the block of \compound*{} comes in. When you’re done with the main text and are reasonably sure that the order will stay, go through the .tex and pick out the IDs inside each \schemeref[TMP]{ApplePie} (“ApplePie” in this case). Add those to a list of \compound*{} lines – the * at the end of the command suppresses the printing. I’ve built a tool to automate this process: drag-and-drop a .tex and it generates the \compound*{} block (see link in addendum). (I think the version on here still has an extra comma at the end that needs to be manually deleted.)

            The other possibility is that the numbers are simply not yet synchronized. IIRC the (final) numbering gets generated in the 2nd typeset pass – during the first and second passes some references make use of the previous, outdated numbering. At some point I automated my typesets to do three in a row; that, and the \compound*{} block, tends to get everything right.

    • Jon

      This is a bit strange – I exclusively use pdfLaTeX as the typeset engine (no DVI). Have you tried using the minimalist example? That will help in seeing where the hiccup is.

  15. Alexander

    Ok, another thing I ran in to:

    I have two schemes that are fairly narrow and I want to place them next to each other as subfloats, but I haven’t been able to, even after some tweaking.
    Since a scheme should be a floating environment, this should be possible right?

    Maybe you know how this can be done?

    • Jon

      I’m guessing so, but I’ve never needed schemes that sit together. That said, are you using the “subfig” subfigure package? (That was what I used for figures.) And then, how does it fail?

      • Alexander

        Yes I use subfig, I’ve messed with it some more and now figured it out! (a good night sleep seems to help these kind of things ;) )

        I can just do:

        \begin{scheme}
        \centering
        \schemeref[TMP1]{cmpd:first}
        \subfloat[This left.]{\label{scheme:first}\includegraphics[]{./path-to-first.eps}{}}
        \schemeref[TMP2]{cmpd:second}
        \subfloat[This right.]{\label{scheme:second}\includegraphics[]{./path-to-second.eps}{}}
        \caption{Both compound-schemes next to each other.}
        \label{scheme:both}
        \end{scheme}

        :)

  16. Daniel

    Hello!

    Thanks for the explanation. I would like to track two different types of compounds. Is that possible with chemstyle?

    Let’s say I want to number all aromatic compounds with Roman numerals and all non-aromatic ones with normal Arabic numerals,

    so they don’t get mixed together. Like 1 2 I 3 II 4 5 III 6 7 IV having a total of 11 compounds

    cheers, Daniel

    • Jon

      Apologies for the late response. As far as I know, mixed numbering like this is impossible with chemscheme.

      Chemscheme can, however, keep track of compounds with “sub-compounds”, such as 1a, 1b, […], 2a, 2b, […]. This may work for you: 1 = aromatic, 2 = non-aromatic. To do this, you’ll need to explicitly ask chemscheme to use “bpchem” instead of “chemcompounds” as the structure ledger. (See chemscheme documentation for more details.)

  17. Alexander

    Hmm, as of late I’m having an issue with (I think) auto-pst-pdf.

    In my schemes I have a number of subscripted characters (say for H2O). However, after compiling to pdf the strings with these numbers are concatenated, with the subscripted number overlaid on the normal text (HO with 2 right over the H).

    As far as I can tell my versions for chemstyle and auto-pst-pdf haven’t changed and I’m still using my old eps files (made in chemdraw). Any idea what could be causing this? (perhaps some other package or setting …)

    • Alexander

      Hmm, I actually backtracked it to this command in my preamble:

      \renewcommand{\rmdefault}{phv}

      Which sets the default font family to Helvetica.
      It’s really odd that this basically destroys any subscripted text in schemes (mostly numbers in chemical formulae).

      • Jon

        Apologies for the late response.

        I have encountered something similar with other packages. I write exams with the “exam” class, which provides simple syntax for labeling the correct answer with multiple choices. With the exam class, in answer keys, the whole answer should be bolded, but this bolding is broken with non-default font subscripts. I half-heartedly chased down to that point but gave up eventually.

        With your problem there might be a workaround. Are there macros that can change the font/formatting for the text that’s not working? If there are, you may be able to specify the default font family for *that* text-only.

  18. Nala

    Thank you for this nice explanation.
    Still I have a problem (might be because I’m an absolute beginner in terms of LaTeX)
    -> I’m using MiKTeX and TeXstudio.
    -> when I add \usepackage[runs=2]{auto-pst-pdf} an error occurs, but when I put \usepackage{epstopdf} I can add my .eps files to the document.
    -> but now TeXstudio doesn’t substitute the TMP labels with numbers, but only gives some caption above the graphic, saying: TMP1>]TMP2>]

    Can you help me?

    • Jon

      Hi Nala,

      When you say, “an error occurs” for {auto-pst-pdf}, is that the write18 error, or something else? The LaTeX engine needs permission before it can (execute the commands to) substitute within postscript files (ps/eps), but it does not need to do that if it’s just embedding them (which I think is what epstopdf does). Without knowing what the specific error is, it’s quite difficult to troubleshoot.

      Jon

      • Nala

        Hi Jon.

        Thanks for the quick response! And sorry about the unclear question.
        When I put {auto-pst-pdf} in the preamble I get an error in the line where I want to put my first scheme.

        It says: auto-pst-pdf: . Or turn off auto-pst-pdf.}
        For the line where I put the \includegraphics command it gives me the error: Package pdftex.def Error: File ‘MyTexFileName-pics.pdf’ not found…1.0textwidth{Schemes/MySchemeName.eps}
        Also the \ce command is not accepted anymore…

        What else should I print here?

        Nala

  19. kirti

    Is it possible to insert these structures in a latex table, while keeping the compound numbering?
