Which encoding should I use in LaTeX sources so that accents are converted correctly from PDF to DOC format?


Maybe you could try these packages:

\usepackage{cmap}               % Map special characters in the PDF (load before fontenc)
\usepackage{lmodern}            % Use the Latin Modern font
\usepackage[T1]{fontenc}        % Font encoding selection
\usepackage[utf8]{inputenc}     % Source file is UTF-8
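A minimal test document built from the packages above might look like the sketch below. It assumes compilation with pdfLaTeX (the cmap package only has an effect under pdfTeX), and the Portuguese sample text is purely illustrative. Note that cmap is loaded before fontenc, as the cmap documentation requires.

```latex
% !TEX encoding = UTF-8 Unicode
\documentclass[a4paper,12pt]{article}

\usepackage{cmap}             % must precede fontenc: adds a CMap so glyphs map back to Unicode
\usepackage{lmodern}          % Latin Modern Type 1 fonts
\usepackage[T1]{fontenc}      % T1 output encoding: accented letters are single glyphs
\usepackage[utf8]{inputenc}   % input side only: the source file is UTF-8

\begin{document}
Acentuação: ação, coração, você, é, à, ü.
\end{document}
```

A quick way to check the result is to copy the accented words out of the PDF, or run `pdftotext` (from Poppler or Xpdf) on it. If the extracted text shows the accents correctly, the PDF's text layer is fine and any remaining corruption most likely comes from the converter.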

You could also try Overleaf.

Sorry, I'm late.


Author: ricardoramos

Updated on June 27, 2020

Comments

  • ricardoramos (over 3 years ago)

    I have the following configuration file in my LaTeX source:

    % !TEX encoding = UTF-8 Unicode
    \documentclass[a4paper,12pt]{article}
    
    \usepackage{lmodern}
    \usepackage[T1]{fontenc}
    \usepackage[utf8]{inputenc}
    

    After generating the PDF file, I tried converting it to Microsoft Word (.doc) format on several websites; the one that came closest to the expected result was:

    http://www.pdfonline.com/pdf-to-word-converter/

    After converting from PDF to DOC, the result was as follows:

    [Image: the converted Word document, showing the incorrectly converted accents]

    The accents are not converted correctly. Correct me if I'm wrong, but I believe the conversion error is caused by the UTF-8 encoding I used in my LaTeX source file.

    So, is there a LaTeX package that makes it easier to convert a PDF generated from LaTeX source to DOC format without corrupting the accents?

    • David Carlisle (almost 8 years ago)
      That looks like a fault in the converter, but you could try \usepackage[LY1]{fontenc}, which is closer to Latin-1 than the T1 encoding. (The input encoding has no effect on the output PDF.)
    • ricardoramos (almost 8 years ago)
      Hello @David Carlisle, thank you so much for your help. I tested it here, but it's still not working :(
    • cfr (over 5 years ago)
      You don't show the input which produces the problematic output. How are you converting the PDF to DOC? Generally, people convert TEX to DOC, perhaps using ODT or XML as an intermediate stage.
    • michal.h21 (over 5 years ago)
      You will lose all document structure (sections, tables, footnotes, math) when you convert PDF to DOC; the wrong characters are just the tip of the iceberg. As @cfr said, use a converter that can produce a usable file directly. For example, tex4ht can produce an ODT file using make4ht -f odt filename.tex. The ODT file can then be converted to Word by LibreOffice.
    • cfr (over 5 years ago)
      @michal.h21 That's how I've always done it, too. At least, that's how I've done it successfully. Converting the other way is messier, I find. Much messier.
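David Carlisle's LY1 suggestion amounts to changing a single line of the question's preamble. The sketch below assumes pdfLaTeX and an installation whose Latin Modern fonts provide LY1-encoded variants (the lmodern package lists LY1 among its supported encodings); the sample sentence is hypothetical.

```latex
% !TEX encoding = UTF-8 Unicode
\documentclass[a4paper,12pt]{article}

\usepackage{lmodern}          % Latin Modern fonts, as in the question
\usepackage[LY1]{fontenc}     % LY1 output encoding instead of T1 (closer to Latin-1)
\usepackage[utf8]{inputenc}   % input side only: does not change the glyphs in the PDF

\begin{document}
Ação, coração, você, é, à, ü.
\end{document}
```

The question's author reported that this change alone did not fix the output of that particular online converter.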
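The route michal.h21 and cfr describe starts from the .tex source instead of the PDF. Here is a minimal sketch of such a source file, with the two steps as comments. The command `make4ht -f odt filename.tex` comes from the comment; the LibreOffice command line (`soffice --headless --convert-to doc`) is an assumption about how one might script the last step instead of using Save As in LibreOffice. `filename.tex` and the sample text are placeholders.

```latex
% filename.tex (placeholder name)
% Step 1, tex4ht: produces filename.odt from this source
%   make4ht -f odt filename.tex
% Step 2, LibreOffice (assumed headless CLI): converts the ODT to Word's .doc format
%   soffice --headless --convert-to doc filename.odt
\documentclass[a4paper,12pt]{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}   % accented letters are typed directly in the UTF-8 source

\begin{document}
\section{Introdução}
Texto com acentos: ação, coração, você.
\end{document}
```

Because tex4ht works from the LaTeX run rather than from the finished PDF, the section heading should reach the ODT as a real heading, which avoids the structural loss the comment warns about.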