Getting right-to-left output in Arabic and Persian/Farsi with pdfLaTeX

Short answer: Instead of \foreignlanguage{arabic} and \foreignlanguage{farsi}, use \AR and \FR.


Firstly, the MWE given in the question (at least as of the current revision) is most certainly not Minimal. Here is something shorter:

\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[arabic,farsi,english]{babel}

\begin{document}
Arabic \foreignlanguage{arabic}{كحض}

Persian \foreignlanguage{farsi}{فروشگاه}
\end{document}

which produces

[Image: output of the MWE]

where the Arabic and Persian texts are not typeset right-to-left as they should be.

Why this happens is easy to explain: the Unicode representation of the Arabic text كحض consists of three code points:

U+0643 ARABIC LETTER KAF (ك)
U+062D ARABIC LETTER HAH (ح)
U+0636 ARABIC LETTER DAD (ض)

These three code points are supposed to be placed right-to-left (with additional rules, like those for ligatures), giving كحض. Instead, when the characters are naively placed in the order they occur in the input (something like ك x ح x ض, where I used x to separate the characters), you get the kind of incorrect output shown above. (The same happens for Persian.) So what's missing is an instruction telling TeX to place the characters in the right order.

This appears to be a bug in the babel package's support for these languages. Some comments on related questions (1, 2) refer to a \textRL command. Loading babel with \usepackage[arabic,farsi,english]{babel} as above does define a \textRL command, but the definition is broken: \show\textRL shows that it expands to \expandafter \@farsi@R {#1}, so the definition for the second language selected (farsi) overrides the one for the first (arabic).
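
One way to repeat that check without the run stopping for input (which \show does in interactive mode) is to write the meanings to the terminal and the log file with \typeout. This is a minimal diagnostic sketch assuming the same pdfLaTeX setup as the MWE above; the exact text printed may differ between babel/arabi versions, so compare it against the expansion quoted above rather than expecting an identical string.

\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[arabic,farsi,english]{babel}

\begin{document}
% Diagnostic only (assumes babel loads arabi for the arabic and farsi
% options). \typeout writes to the terminal and the .log file without
% pausing, unlike \show in interactive mode.
\typeout{textRL: \meaning\textRL}
\typeout{AR: \meaning\AR}
\typeout{FR: \meaning\FR}
\end{document}

The document body is otherwise empty, so no PDF page is produced; only the log output matters here.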

A closer look at the log reveals that this \textRL command comes from arabi, the package that babel loads for these languages. Its documentation mentions this problem and says that \textRL is deprecated; instead, it recommends \AR and \FR for Arabic and Farsi respectively. So we can use those in our MWE:

\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[arabic,farsi,english]{babel}

\begin{document}
Arabic \AR{كحض}

Persian \FR{فروشگاه}
\end{document}

which correctly produces:

[Image: output of the fixed MWE]

For the non-MWE in the question, we can just blindly replace \foreignlanguage{arabic} and \foreignlanguage{farsi} with \AR and \FR respectively, to get this output:

[Image: output for the non-MWE]
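
Applied to the question's table, the replacement looks like the following. This sketch is trimmed to the Arabic and Persian rows and to the first two example words, a cut made here only for brevity; it assumes the same pdfLaTeX and babel setup as the question.

\documentclass{article}
\usepackage{booktabs}
\usepackage[utf8]{inputenc}
\usepackage[arabic,farsi,english]{babel}

\begin{document}

% Same layout as the question's table, trimmed to two example columns.
% \AR and \FR take the place of \foreignlanguage{arabic} and
% \foreignlanguage{farsi}; nothing else in the table changes.
\begin{tabular}{p{1.8cm}ccc}
\toprule
Language & $\rho$ & 1 & 2 \\
\midrule
Arabic & 0.512 & \AR{الأحد} & \AR{كحض} \\
Persian & 0.433 & \FR{روزنامه} & \FR{فروشگاه} \\
\bottomrule
\end{tabular}

\end{document}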

Author: burr (updated on May 06, 2020)

Comments

  • burr, over 3 years

    I have an English document in which I need to inject a few example words from many different languages, including Arabic and Persian.

    I've sorta gotten it to work with the babel package and the \foreignlanguage{arabic}{الأحد} command, but the characters come out garbled, presumably because of the right-to-left (RTL) thing. If I manually reverse all the characters (\foreignlanguage{arabic}{دحألا}), they apparently do not join together the way they are supposed to... again, because of RTL.

    The template/style I am forced to use compiles with pdflatex but NOT xelatex. Attempting to use the arabtex package or bidi packages breaks the template with a firehose of mind-exploding errors.

    Any suggestions?

    PS: copy-and-pasting the literal UTF-8 encoded tex snippet from my text editor seems to correct itself to RTL in this stackexchange editor, so I'm not sure I can give you the full picture of the problem I'm dealing with... :(

    EDIT: here's a MWE...

    \documentclass[10pt]{article}
    \usepackage[usenames]{color} %used for font color
    \usepackage{amssymb} %maths
    \usepackage{amsmath} %maths
    \usepackage{booktabs}
    \usepackage[utf8]{inputenc}
    \usepackage[arabic,farsi,bulgarian,greek,magyar,frenchb,german,english]{babel}
    \usepackage{CJKutf8}
    
    \begin{document}
    
    \begin{tabular}{p{1.8cm}ccccccc}
    \toprule
    Language & $\rho$ & 1 & 2 & 3 & 4 & 5 & 6 \\
    \midrule
    German & 0.568 & weißt & überrascht & teppich & schwäche & kompetent & verbündet \\
    Hungarian & 0.506 & tegyünk & recepciós & leírás & oktat & visszaveti & rengette \\
    French & 0.500 & envoyer & vélo & randonnée & blessure & mixte & matérialisme \\
    Bulgarian & 0.505 & \foreignlanguage{bulgarian}{време} & \foreignlanguage{bulgarian}{болка} & \foreignlanguage{bulgarian}{самотен} & \foreignlanguage{bulgarian}{съдружие} & \foreignlanguage{bulgarian}{надделеят} & \foreignlanguage{bulgarian}{уязвимите} \\
    Greek & 0.491 & \foreignlanguage{greek}{πόρτα} & \foreignlanguage{greek}{πατινάζ} & \foreignlanguage{greek}{εξοχή} & \foreignlanguage{greek}{επεξεργάζομαι} & \foreignlanguage{greek}{ορίζοντας} & \foreignlanguage{greek}{εδαφικός} \\
    Arabic & 0.512 & \foreignlanguage{arabic}{الأحد} & \foreignlanguage{arabic}{كحض} & \foreignlanguage{arabic}{ةرافسلا} & \foreignlanguage{arabic}{ةظتكملا} & \foreignlanguage{arabic}{يثراك} & \foreignlanguage{arabic}{ددب} \\
    Korean & 0.495 & \begin{CJK}{UTF8}{mj}비가\end{CJK} & \begin{CJK}{UTF8}{mj}기억\end{CJK} & \begin{CJK}{UTF8}{mj}무서운\end{CJK} & \begin{CJK}{UTF8}{mj}따라서\end{CJK} & \begin{CJK}{UTF8}{mj}왜곡\end{CJK} & \begin{CJK}{UTF8}{mj}지배하는\end{CJK} \\
    Chinese & 0.482 & \begin{CJK}{UTF8}{gbsn}星期三\end{CJK} & \begin{CJK}{UTF8}{gbsn}司机\end{CJK} & \begin{CJK}{UTF8}{gbsn}要求\end{CJK} & \begin{CJK}{UTF8}{gbsn}动态\end{CJK} & \begin{CJK}{UTF8}{gbsn}翻新\end{CJK} & \begin{CJK}{UTF8}{gbsn}锲而不舍\end{CJK} \\
    Persian & 0.433 & \foreignlanguage{farsi}{روزنامه} & \foreignlanguage{farsi}{فروشگاه} & \foreignlanguage{farsi}{درد} & \foreignlanguage{farsi}{فکری} & \foreignlanguage{farsi}{تقویت} & \foreignlanguage{farsi}{نزدیکی} \\
    Japanese & 0.326 & \begin{CJK}{UTF8}{min}月\end{CJK} & \begin{CJK}{UTF8}{min}スキー\end{CJK} & \begin{CJK}{UTF8}{min}祭り\end{CJK} & \begin{CJK}{UTF8}{min}正直\end{CJK} & \begin{CJK}{UTF8}{min}地質\end{CJK} & \begin{CJK}{UTF8}{min}撤退\end{CJK} \\
    \bottomrule
    \end{tabular}
    
    \end{document}
    

    The Arabic and Persian (Farsi) words render incorrectly for me.

    UPDATE: Here is what the output looks like for me. As you can see, the Arabic and Persian (Farsi) are reversed.

    • TeXnician, over 6 years
      Please add a MWE to help us help you.
    • burr, over 6 years
      @TeXnician MWE added.
    • TeXnician, over 6 years
      For me it doesn't even compile. Do you (or your editor) use --interaction=nonstopmode?
    • burr, over 6 years
      No, I just write in an old-school text editor and compile using pdflatex with no special flags. Compiles just fine for me using the MacTeX distribution and also in LaTeXiT.
    • Michael Fraiman, over 6 years
      It won't compile for me either.
    • ShreevatsaR, over 6 years
      There is a \documentclass at the top indicating it is a LaTeX file, but there is no \begin{document} and \end{document}. How can it compile?
    • Michael Fraiman, over 6 years
      The \ctex command doesn't work for me. Your MWE should be something we can copy, paste, and compile as-is. If you want to use multiple languages, you should use XeLaTeX or LuaLaTeX with the polyglossia package. There are plenty of answers on this site on how to use it.
    • burr, over 6 years
      Not sure why \ctex was working in LaTeXiT. The updated MWE should be sufficient now. Unfortunately, I can't use polyglossia (I tried it, and it would fix this issue) because it conflicts with the template provided by the publisher to whom I'm submitting this work.
    • ShreevatsaR, over 6 years
      @burr Great! Of MWE ("Minimal Working Example"), you had an Example, now you got it Working; it only remains to make it Minimal :-) E.g. if the problem is only with Arabic and Persian, why include all the other languages? To illustrate the example, why does it have to be a table? Why do you need the amssymb or amsmath packages?
  • ShreevatsaR, over 6 years
    Searching the site for [arabi babel fr] or even [arabi fr] shows only this answer(!), similarly for [babel fr farsi] or [babel fr persian]. Or even [fr persian] or [fr farsi]. So it indeed appears this question has never been answered before, at least in this way!
  • burr, over 6 years
    FANTASTIC! Yes, this solves it. I found a lot of the more convoluted solutions you mentioned in my searches, but not this simple (and correct) one. As for the non-MWE, I suppose I wanted to make sure that proposed solutions did not break any of the other languages in question, although I agree it's far from "minimal."
  • ShreevatsaR, over 6 years
    @burr Fair enough. And this was pretty obscure to find too… tbh I'm not a great fan of the LaTeX culture of providing "solutions" that will work in the ideal case, without thinking about all the ways things can go wrong… that's why I include long-winded stuff in my answer like all that information about Unicode and how I went about finding something, so that at least it may be informative to somebody even if the exact solution doesn't work for them.