Getting right-to-left output in Arabic and Persian/Farsi with pdfLaTeX
Short answer: Instead of \foreignlanguage{arabic} and \foreignlanguage{farsi}, use \AR and \FR.
Firstly, the MWE given in the question (at least as of the current revision) is most certainly not Minimal. Here is something shorter:
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[arabic,farsi,english]{babel}
\begin{document}
Arabic \foreignlanguage{arabic}{كحض}
Persian \foreignlanguage{farsi}{فروشگاه}
\end{document}
which produces output where the Arabic and Persian texts are not typeset right-to-left as they should be.
Why this happens is easy to explain: the Unicode representation of the Arabic text كحض consists of the three code points U+0643 (ARABIC LETTER KAF), U+062D (ARABIC LETTER HAH) and U+0636 (ARABIC LETTER DAD), and these three code points are supposed to be placed right-to-left (with additional rules like those for ligatures), giving كحض. Instead, when these characters are naively placed in the order they occur in the input (something like: ك x ح x ض where I used x to separate the characters), you get the kind of incorrect output described above. (Similarly for Persian.) So what's missing are the instructions telling TeX to place the characters in the right order.
This appears to be a bug in the babel package's support for these languages. Some comments on related questions (1, 2) refer to a \textRL command: loading the babel package with \usepackage[arabic,farsi,english]{babel} as above indeed defines a \textRL command, but this has a bug: \show\textRL shows that it expands to \expandafter \@farsi@R {#1} so the second language selected overrides the first.
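To reproduce that check yourself, a minimal sketch along the following lines should be enough. It assumes the same package options as the MWE above; the expected log line in the comment is the expansion reported in this answer, and other versions of babel or arabi may print something different.

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[arabic,farsi,english]{babel}
\begin{document}
% \show writes the current meaning of \textRL to the terminal and the .log file.
% With the setup described in this answer, the reported meaning is:
%   \expandafter \@farsi@R {#1}
% (in interactive mode TeX pauses here; press Enter to continue)
\show\textRL
Nothing is typeset by the line above; look at the log instead.
\end{document}
```

Since \show is a TeX primitive, this works with plain pdflatex and needs no extra packages.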
A closer look at the logs reveals that this \textRL command comes from arabi, loaded by babel, whose documentation mentions this problem and says that \textRL is deprecated. What it recommends instead are \AR and \FR for Arabic and Farsi respectively. So we can use those in our MWE:
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[arabic,farsi,english]{babel}
\begin{document}
Arabic \AR{كحض}
Persian \FR{فروشگاه}
\end{document}
which correctly typesets both words right-to-left.
For the non-MWE in the question, we can just blindly replace \foreignlanguage{arabic} and \foreignlanguage{farsi} with \AR and \FR respectively to get the correct output.
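As an illustration of that replacement, here is a cut-down sketch of the question's table with only the Arabic and Persian rows and two word columns. The reduced column layout is my own simplification rather than the original table; the words are taken from the question, and the \AR and \FR commands are the ones defined by arabi as described above.

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[arabic,farsi,english]{babel}
\usepackage{booktabs}
\begin{document}
\begin{tabular}{lcc}
\toprule
Language & 1 & 2 \\
\midrule
% \AR and \FR replace \foreignlanguage{arabic} and \foreignlanguage{farsi}
Arabic  & \AR{الأحد}   & \AR{كحض}     \\
Persian & \FR{روزنامه} & \FR{فروشگاه} \\
\bottomrule
\end{tabular}
\end{document}
```

The other languages in the original table do not need any change, since only the right-to-left scripts are affected.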
burr
Updated on May 06, 2020
Comments
-
burr over 3 years
I have an English document in which I need to inject a few example words from many different languages, including Arabic and Persian.
I've sorta gotten it to work with the babel package and the \foreignlanguage{arabic}{الأحد} command, but the characters come out garbled, presumably because of the right-to-left (RTL) thing. If I manually reverse all the characters (\foreignlanguage{arabic}{دحألا}), they apparently do not join together the way they are supposed to... again, because of RTL.
The template/style I am forced to use compiles with pdflatex but NOT xelatex. Attempting to use the arabtex package or bidi packages breaks the template with a firehose of mind-exploding errors.
Any suggestions?
PS: copy-and-pasting the literal UTF-8 encoded tex snippet from my text editor seems to correct itself to RTL in this stackexchange editor, so I'm not sure I can give you the full picture of the problem I'm dealing with... :(
EDIT: here's a MWE...
\documentclass[10pt]{article}
\usepackage[usenames]{color} %used for font color
\usepackage{amssymb} %maths
\usepackage{amsmath} %maths
\usepackage{booktabs}
\usepackage[utf8]{inputenc}
\usepackage[arabic,farsi,bulgarian,greek,magyar,frenchb,german,english]{babel}
\usepackage{CJKutf8}
\begin{document}
\begin{tabular}{p{1.8cm}ccccccc}
\toprule
Language & $\rho$ & 1 & 2 & 3 & 4 & 5 & 6 \\
\midrule
German & 0.568 & weißt & überrascht & teppich & schwäche & kompetent & verbündet \\
Hungarian & 0.506 & tegyünk & recepciós & leírás & oktat & visszaveti & rengette \\
French & 0.500 & envoyer & vélo & randonnée & blessure & mixte & matérialisme \\
Bulgarian & 0.505 & \foreignlanguage{bulgarian}{време} & \foreignlanguage{bulgarian}{болка} & \foreignlanguage{bulgarian}{самотен} & \foreignlanguage{bulgarian}{съдружие} & \foreignlanguage{bulgarian}{надделеят} & \foreignlanguage{bulgarian}{уязвимите} \\
Greek & 0.491 & \foreignlanguage{greek}{πόρτα} & \foreignlanguage{greek}{πατινάζ} & \foreignlanguage{greek}{εξοχή} & \foreignlanguage{greek}{επεξεργάζομαι} & \foreignlanguage{greek}{ορίζοντας} & \foreignlanguage{greek}{εδαφικός} \\
Arabic & 0.512 & \foreignlanguage{arabic}{الأحد} & \foreignlanguage{arabic}{كحض} & \foreignlanguage{arabic}{ةرافسلا} & \foreignlanguage{arabic}{ةظتكملا} & \foreignlanguage{arabic}{يثراك} & \foreignlanguage{arabic}{ددب} \\
Korean & 0.495 & \begin{CJK}{UTF8}{mj}비가\end{CJK} & \begin{CJK}{UTF8}{mj}기억\end{CJK} & \begin{CJK}{UTF8}{mj}무서운\end{CJK} & \begin{CJK}{UTF8}{mj}따라서\end{CJK} & \begin{CJK}{UTF8}{mj}왜곡\end{CJK} & \begin{CJK}{UTF8}{mj}지배하는\end{CJK} \\
Chinese & 0.482 & \begin{CJK}{UTF8}{gbsn}星期三\end{CJK} & \begin{CJK}{UTF8}{gbsn}司机\end{CJK} & \begin{CJK}{UTF8}{gbsn}要求\end{CJK} & \begin{CJK}{UTF8}{gbsn}动态\end{CJK} & \begin{CJK}{UTF8}{gbsn}翻新\end{CJK} & \begin{CJK}{UTF8}{gbsn}锲而不舍\end{CJK} \\
Persian & 0.433 & \foreignlanguage{farsi}{روزنامه} & \foreignlanguage{farsi}{فروشگاه} & \foreignlanguage{farsi}{درد} & \foreignlanguage{farsi}{فکری} & \foreignlanguage{farsi}{تقویت} & \foreignlanguage{farsi}{نزدیکی} \\
Japanese & 0.326 & \begin{CJK}{UTF8}{min}月\end{CJK} & \begin{CJK}{UTF8}{min}スキー\end{CJK} & \begin{CJK}{UTF8}{min}祭り\end{CJK} & \begin{CJK}{UTF8}{min}正直\end{CJK} & \begin{CJK}{UTF8}{min}地質\end{CJK} & \begin{CJK}{UTF8}{min}撤退\end{CJK} \\
\bottomrule
\end{tabular}
\end{document}
The Arabic and Persian (Farsi) words render incorrectly for me.
UPDATE: Here is what the output looks like for me. As you can see, the Arabic and Persian (Farsi) are reversed.
-
TeXnician over 6 years
Please add a MWE to help us help you.
-
burr over 6 years
@TeXnician MWE added.
-
TeXnician over 6 years
For me it doesn't even compile. Do you (or your editor) use --interaction=nonstopmode?
-
burr over 6 years
No, I just write in an old-school text editor and compile using pdflatex with no special flags. Compiles just fine for me using the MacTeX distribution and also in LaTeXiT.
-
Michael Fraiman over 6 years
It won't compile for me either.
-
ShreevatsaR over 6 years
There is a \documentclass at the top indicating it is a LaTeX file, but there is no \begin{document} and \end{document}. How can it compile?
-
Michael Fraiman over 6 years
The \ctex command doesn't work for me. Your MWE should be something we copy-paste and it works. If you want to use multiple languages, then you should use Xe or LuaTeX and the polyglossia package. There are plenty of answers on this site on how to use it.
-
burr over 6 years
Not sure why \ctex was working in LaTeXiT. The updated MWE should be sufficient now. Unfortunately, I can't use polyglossia (I tried, and it could fix this issue) because it conflicts with the template provided by the publisher to whom I'm submitting this work.
-
ShreevatsaR over 6 years
@burr Great! Of MWE ("Minimal Working Example"), you had an Example, now you got it Working; it only remains to make it Minimal :-) E.g. if the problem is only with Arabic and Persian, why include all the other languages? To illustrate the example, why does it have to be a table? Why do you need the amssymb or amsmath packages?
-
-
ShreevatsaR over 6 years
Searching the site for [arabi babel fr] or even [arabi fr] shows only this answer(!), similarly for [babel fr farsi] or [babel fr persian]. Or even [fr persian] or [fr farsi]. So it indeed appears this question has never been answered before, at least in this way!
-
burr over 6 years
FANTASTIC! Yes, this solves it. I found a lot of the more convoluted solutions you mentioned in my searches, but not this simple (and correct) one. As for the non-MWE, I suppose I wanted to make sure that proposed solutions did not break any of the other languages in question, although I agree it's far from "minimal."
-
ShreevatsaR over 6 years
@burr Fair enough. And this was pretty obscure to find too… tbh I'm not a great fan of the LaTeX culture of providing "solutions" that will work in the ideal case, without thinking about all the ways things can go wrong… that's why I include long-winded stuff in my answer like all that information about Unicode and how I went about finding something, so that at least it may be informative to somebody even if the exact solution doesn't work for them.