View Issue Details

ID: 0036852
Project: FPC
Category: Packages
View Status: public
Last Update: 2020-04-01 18:56
Reporter: Anton Kavalenka
Assigned To: Michael Van Canneyt
Priority: normal
Severity: minor
Reproducibility: have not tried
Status: closed
Resolution: fixed
Product Version: 3.3.1
Fixed in Version: 3.3.1
Summary: 0036852: fcl-pdf: Make document info supporting UNICODE
Description: Currently the string object (TPdfString) supports only one form: no encoding, with an explicit UTF-8 to ANSI conversion.
The PDF spec
https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
describes 3 encoding variants in chapter 7.9.2.2, Text String Type.

Proposed patch implements UTF16-BE encoding
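
For illustration: a PDF literal string in UTF-16BE starts with the byte order mark FE FF, followed by two bytes per UTF-16 code unit, high byte first. Below is a minimal sketch, assuming every byte is written as a three-digit octal escape (the attached patch emits shorter escapes without leading zeros). OctalEscape and Utf16BELiteral are hypothetical names used only here; they are not part of fcl-pdf.

```pascal
program utf16be_literal;

{$mode objfpc}{$H+}

// Hypothetical helper: three-digit octal escape of one byte, e.g. $FE -> '\376'.
function OctalEscape(B: Byte): AnsiString;
begin
  Result := '\' + Chr(Ord('0') + ((B shr 6) and 7))
                + Chr(Ord('0') + ((B shr 3) and 7))
                + Chr(Ord('0') + (B and 7));
end;

// Hypothetical helper: wraps S as a PDF literal string in UTF-16BE.
function Utf16BELiteral(const S: UnicodeString): AnsiString;
var
  I: Integer;
  W: Word;
begin
  // Byte order mark FE FF marks the string as UTF-16BE.
  Result := '(' + OctalEscape($FE) + OctalEscape($FF);
  for I := 1 to Length(S) do
  begin
    W := Ord(S[I]);
    // High byte first, then low byte (big-endian).
    Result := Result + OctalEscape(Hi(W)) + OctalEscape(Lo(W));
  end;
  Result := Result + ')';
end;

var
  U: UnicodeString;
begin
  // Built from code points so the example does not depend on the source codepage.
  SetLength(U, 2);
  U[1] := WideChar($042F); // Cyrillic capital Ya
  U[2] := WideChar($043A); // Cyrillic small Ka
  WriteLn(Utf16BELiteral(U)); // prints (\376\377\004\057\004\072)
end.
```

Escaping every byte keeps the output pure ASCII, so parentheses and backslashes inside the text need no separate handling.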
  
Steps To Reproduce: Run the provided test (meta_test.pas) and examine the document properties.
Without the patch, no Cyrillic text appears in the document properties. With the patch, it should appear.
Tags: No tags attached.
Fixed in Revision: 44435
FPCOldBugId:
FPCTarget: 4.0.0
Attached Files

Activities

Anton Kavalenka

2020-03-30 17:58

reporter  

fppdf.diff (1,668 bytes)   
--- /projects/fpc/packages/fcl-pdf/src/fppdf.pp	2019-11-15 22:05:12.001177469 +0300
+++ /projects/ulinuxg/fppdf.pp	2020-03-30 18:33:30.252636632 +0300
@@ -265,11 +265,11 @@
 
   TPDFString = class(TPDFAbstractString)
   private
-    FValue: AnsiString;
+    FValue: String;
   protected
     procedure Write(const AStream: TStream); override;
   public
-    constructor Create(Const ADocument : TPDFDocument; const AValue: AnsiString); overload;
+    constructor Create(Const ADocument : TPDFDocument; const AValue: String); overload;
     property    Value: AnsiString read FValue;
   end;
 
@@ -3350,11 +3350,39 @@
 
 { TPDFString }
 
+function oct_str(b:byte):string;
+begin
+  Result:='';
+  repeat
+     Result:=IntToStr(b and $7)+Result;
+     b:=b shr 3;
+  until b=0;
+end;
+
 procedure TPDFString.Write(const AStream: TStream);
 var
-  s: AnsiString;
+  i:integer;
+  w: WideString;
+  s:string;
+  wv:word;
 begin
-  s := Utf8ToAnsi(FValue);
+  w := Utf8Decode(FValue);
+  if (length(fValue)<>length(w)) then // quote
+  begin
+    s:='\376\377'; // UTF-16BE BOM
+    for i:=1 to length(w) do
+    begin
+      wv:=word(w[i]);
+      s:=s+'\'+oct_str(hi(wv));
+      s:=s+'\'+oct_str(lo(wv));
+    end;
+  end else
+  begin
+    if (Pos('(', FValue) > 0) or (Pos(')', FValue) > 0) or (Pos('\', FValue) > 0) then
+      s := InsertEscape(FValue)
+    else
+      s:=fValue;
+  end;
   WriteString('('+s+')', AStream);
 end;
 
@@ -3362,8 +3390,6 @@
 begin
   inherited Create(ADocument);
   FValue := AValue;
-  if (Pos('(', FValue) > 0) or (Pos(')', FValue) > 0) or (Pos('\', FValue) > 0) then
-    FValue := InsertEscape(FValue);
 end;
 
 { TPDFUTF8String }
meta_test.pas (1,077 bytes)   
program meta_test;

{$codepage utf-8}

uses sysutils,fpPDF;

var
  D:TpdfDocument;
  S:TPdfSection;
  P:TPdfPage;

begin
  D:=TpdfDocument.Create(nil);
  try
    D.Infos.Title := 'Урывак з паэмы "Новая Зямля"';
    D.Infos.Author := 'Якуб Колас';
    D.Infos.Producer := 'fcl-pdf';
    D.Infos.ApplicationName := 'нейкі тэст';
    D.Infos.CreationDate := Now;
    D.Infos.KeyWords:='fcl-pdf report';

    D.StartDocument;
    D.AddFont('FreeSans.ttf','FreeSans');

    D.Options := [poPageOriginAtTop,poSubsetFont,poCompressFonts,poCompressImages,poUseImageTransparency];
    S:=D.Sections.AddSection;      
   
    P:=D.Pages.AddPage;
    P.PaperType := ptA4;
    P.UnitOfMeasure := uomPixels;
    P.Orientation:=ppoPortrait;
    S.AddPage(P);

    P.SetFont(0,10);
    P.WriteText(100,100,'Мой родны кут,');
    P.WriteText(100,150,'Як ты мне мілы');
    P.WriteText(100,200,'Забыць цябе');
    P.WriteText(100,250,'Не маю сілы');
  finally
    D.SaveToFile('test.pdf');
    D.Free;
  end;
end.
test.pdf (10,289 bytes)

Anton Kavalenka

2020-03-30 18:09

reporter   ~0121768

FreeSans.ttf should be in current folder

Michael Van Canneyt

2020-03-30 18:20

administrator   ~0121769

You changed the basic string object.

That's not how the API is intended, so this patch will not be applied as it is now.

Instead of changing TPDFString, can you implement a TPDFUTF16String, similar to how it is done for TPDFUTF8String?
The same goes for TPDFUTF8Text: add a TPDFUTF16Text.

And similarly add CreateUTF16String etc.
So the user has the choice of how he wants to create/populate his PDF.

Anton Kavalenka

2020-03-30 18:31

reporter   ~0121770

Last edited: 2020-03-30 18:32


The method TPDFDocument.CreateInfoEntry is used for meta-info storage,
and this method strictly uses
IDict.AddString('Title',Infos.Title);

Or should yet another method, TPDFDocument.CreateInfoEntryUTF16BE, be implemented?

In turn, TPDFDocument.CreateInfoEntry is called from TpdfDocument.StartDocument,
so should a TpdfDocument.StartDocumentUTF16BE be implemented as well?

Anton Kavalenka

2020-03-30 18:51

reporter   ~0121772

Next approach with explicit meta-info usage of AddUtf16String
fppdf-2.diff (4,704 bytes)   
--- /projects/fpc/packages/fcl-pdf/src/fppdf.pp	2019-11-15 22:05:12.001177469 +0300
+++ /projects/ulinuxg/fppdf.pp	2020-03-30 19:50:05.078654674 +0300
@@ -273,6 +273,16 @@
     property    Value: AnsiString read FValue;
   end;
 
+  TPDFUTF16String = class(TPDFAbstractString)
+  private
+    FValue: WideString;
+  protected
+    procedure Write(const AStream: TStream); override;
+  public
+    constructor Create(Const ADocument : TPDFDocument; const AValue: WideString); overload;
+    property    Value: WideString read FValue;
+  end;
+
   { TPDFRawHexString }
 
   TPDFRawHexString = class(TPDFDocumentObject)
@@ -592,6 +602,7 @@
     procedure AddInteger(const AKey : String; AInteger : Integer);
     procedure AddReference(const AKey : String; AReference : Integer);
     procedure AddString(const AKey, AString : String);
+    procedure AddUTF16String(const AKey:string;const AString : WideString);
     function IndexOfKey(const AValue: string): integer;
     procedure Write(const AStream: TStream); override;
     procedure WriteDictionary(const AObject: integer; const AStream: TStream);
@@ -1076,6 +1087,7 @@
     function CreateCIDToGIDMap(const AFontNum: integer): integer; virtual;
     procedure CreatePageStream(APage : TPDFPage; PageNum: integer);
     Function CreateString(Const AValue : String) : TPDFString;
+    Function CreateUTF16String(Const AValue : WideString) : TPDFUTF16String;
     Function CreateUTF8String(Const AValue : UTF8String; const AFontIndex: integer) : TPDFUTF8String;
     Function CreateGlobalXRef: TPDFXRef;
     Function AddGlobalXRef(AXRef : TPDFXRef) : Integer;
@@ -3366,6 +3378,53 @@
     FValue := InsertEscape(FValue);
 end;
 
+
+{ TPDFUTF16String }
+
+constructor TPDFUTF16String.Create(Const ADocument : TPDFDocument; const AValue: Widestring);
+begin
+  inherited Create(ADocument);
+  FValue := AValue;
+end;
+
+function oct_str(b:byte):string;
+begin
+  Result:='';
+  repeat
+     Result:=IntToStr(b and $7)+Result;
+     b:=b shr 3;
+  until b=0;
+end;
+
+procedure TPDFUTF16String.Write(const AStream: TStream);
+var
+  i:integer;
+  us:utf8string;
+  s:ansistring;
+  wv:word;
+begin
+  us := Utf8Encode(FValue);
+  if (length(us)<>length(fValue)) then // quote
+  begin
+    s:='\376\377'; // UTF-16BE BOM
+    for i:=1 to length(fValue) do
+    begin
+      wv:=word(fValue[i]);
+      s:=s+'\'+oct_str(hi(wv));
+      s:=s+'\'+oct_str(lo(wv));
+    end;
+  end else
+  begin
+    if (Pos('(', FValue) > 0) or (Pos(')', FValue) > 0) or (Pos('\', FValue) > 0) then
+      s := InsertEscape(FValue)
+    else
+      s:=fValue;
+  end;
+  WriteString('('+s+')', AStream);
+end;
+
+
+
 { TPDFUTF8String }
 
 function TPDFUTF8String.RemapedText: AnsiString;
@@ -4137,6 +4196,11 @@
   AddElement(AKey,Document.CreateString(AString));
 end;
 
+procedure TPDFDictionary.AddUTF16String(const AKey:string;const AString: WideString);
+begin
+  AddElement(AKey,Document.CreateUTF16String(AString));
+end;
+
 function TPDFDictionary.IndexOfKey(const AValue: string): integer;
 var
   i: integer;
@@ -4513,7 +4577,7 @@
   FInfos.Assign(AValue);
 end;
 
-procedure TPDFDocument.SetOptions(AValue: TPDFOptions);
+procedure TPDFDocument.SetOptions(aValue: TPDFOptions);
 begin
   if FOptions=AValue then Exit;
   if (poNoEmbeddedFonts in  aValue) then
@@ -4717,16 +4781,16 @@
   Trailer.AddReference('Info', GLobalXRefCount-1);
   (Trailer.ValueByName('Size') as TPDFInteger).Value:=GLobalXRefCount;
   if Infos.Title <> '' then
-    IDict.AddString('Title',Infos.Title);
+    IDict.AddUTF16String('Title',utf8decode(Infos.Title));
   if Infos.Author <> '' then
-    IDict.AddString('Author',Infos.Author);
+    IDict.AddUTF16String('Author',utf8decode(Infos.Author));
   if Infos.ApplicationName <> '' then
-    IDict.AddString('Creator',Infos.ApplicationName);
-  IDict.AddString('Producer',Infos.Producer);
+    IDict.AddUTF16String('Creator',utf8decode(Infos.ApplicationName));
+  IDict.AddUTF16String('Producer',utf8decode(Infos.Producer));
   if Infos.CreationDate <> 0 then
     IDict.AddString('CreationDate',DateToPdfDate(Infos.CreationDate));
   if Infos.Keywords <> '' then
-    IDict.AddString('Keywords', Infos.Keywords);
+    IDict.AddUTF16String('Keywords',utf8decode(Infos.Keywords));
 end;
 
 procedure TPDFDocument.CreateMetadataEntry;
@@ -5811,6 +5875,11 @@
   Result:=TPDFString.Create(Self,AValue);
 end;
 
+function TPDFDocument.CreateUTF16String(const AValue: WideString): TPDFUTF16String;
+begin
+  Result:=TPDFUTF16String.Create(Self,AValue);
+end;
+
 function TPDFDocument.CreateUTF8String(const AValue: UTF8String; const AFontIndex: integer): TPDFUTF8String;
 begin
   Result := TPDFUTF8String.Create(self, AValue, AFontIndex);

Michael Van Canneyt

2020-03-30 18:54

administrator   ~0121773

Add a new poUTF16info to TPDFOption and use that to decide between UTF8 and UTF16.
TPDFDictionary will also need an AddString(aKey : string; aValue : Unicodestring);
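
A rough, self-contained sketch of this idea: an option in a set picks the encoding, and an overload on UnicodeString receives the decoded value. TDemoOption, Describe and InfoEncoding are hypothetical stand-ins for TPDFOption, the two AddString overloads and the info-entry code; they are not fcl-pdf declarations.

```pascal
program overload_demo;

{$mode objfpc}{$H+}

type
  // Hypothetical stand-ins for TPDFOption / TPDFOptions.
  TDemoOption = (doCompress, doUTF16Info);
  TDemoOptions = set of TDemoOption;

// Stand-in for the existing AnsiString-based AddString.
function Describe(const AValue: AnsiString): string; overload;
begin
  Result := 'ansi';
end;

// Stand-in for the new UnicodeString-based AddString overload.
function Describe(const AValue: UnicodeString): string; overload;
begin
  Result := 'utf16';
end;

// The option decides which overload is reached: UTF8Decode returns a
// UnicodeString, so the compiler selects the UnicodeString overload.
function InfoEncoding(const AOptions: TDemoOptions;
  const AUtf8Value: AnsiString): string;
begin
  if doUTF16Info in AOptions then
    Result := Describe(UTF8Decode(AUtf8Value))
  else
    Result := Describe(AUtf8Value);
end;

begin
  WriteLn(InfoEncoding([], 'fcl-pdf'));            // prints ansi
  WriteLn(InfoEncoding([doUTF16Info], 'fcl-pdf')); // prints utf16
end.
```

Because the choice is made by the static type of the argument, callers that keep passing plain strings are unaffected by the new overload.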

Anton Kavalenka

2020-03-30 19:25

reporter   ~0121774

Next step with updated test.

Funnily enough, nothing warns you that options have to be set BEFORE calling StartDocument.
fppdf-3.diff (6,197 bytes)   
--- /projects/fpc/packages/fcl-pdf/src/fppdf.pp	2019-11-15 22:05:12.001177469 +0300
+++ /projects/ulinuxg/fppdf.pp	2020-03-30 20:18:15.141243393 +0300
@@ -70,7 +70,7 @@
   TPDFUnitOfMeasure = (uomInches, uomMillimeters, uomCentimeters, uomPixels);
 
   TPDFOption = (poOutLine, poCompressText, poCompressFonts, poCompressImages, poUseRawJPEG, poNoEmbeddedFonts,
-    poPageOriginAtTop, poSubsetFont, poMetadataEntry, poNoTrailerID, poUseImageTransparency);
+    poPageOriginAtTop, poSubsetFont, poMetadataEntry, poNoTrailerID, poUseImageTransparency,poUTF16info);
   TPDFOptions = set of TPDFOption;
 
   EPDF = Class(Exception);
@@ -273,6 +273,16 @@
     property    Value: AnsiString read FValue;
   end;
 
+  TPDFUTF16String = class(TPDFAbstractString)
+  private
+    FValue: UnicodeString;
+  protected
+    procedure Write(const AStream: TStream); override;
+  public
+    constructor Create(Const ADocument : TPDFDocument; const AValue: UnicodeString); overload;
+    property    Value: UnicodeString read FValue;
+  end;
+
   { TPDFRawHexString }
 
   TPDFRawHexString = class(TPDFDocumentObject)
@@ -592,6 +602,7 @@
     procedure AddInteger(const AKey : String; AInteger : Integer);
     procedure AddReference(const AKey : String; AReference : Integer);
     procedure AddString(const AKey, AString : String);
+    procedure AddString(const AKey:string;const AString : UnicodeString);
     function IndexOfKey(const AValue: string): integer;
     procedure Write(const AStream: TStream); override;
     procedure WriteDictionary(const AObject: integer; const AStream: TStream);
@@ -1051,6 +1062,7 @@
     function CreateContentsEntry(const APageNum: integer): integer;virtual;
     function CreateCatalogEntry: integer;virtual;
     procedure CreateInfoEntry;virtual;
+    procedure CreateInfoEntryUTF16;virtual;
     procedure CreateMetadataEntry;virtual;
     procedure CreateTrailerID;virtual;
     procedure CreatePreferencesEntry;virtual;
@@ -1076,6 +1088,7 @@
     function CreateCIDToGIDMap(const AFontNum: integer): integer; virtual;
     procedure CreatePageStream(APage : TPDFPage; PageNum: integer);
     Function CreateString(Const AValue : String) : TPDFString;
+    Function CreateUTF16String(Const AValue : UnicodeString) : TPDFUTF16String;
     Function CreateUTF8String(Const AValue : UTF8String; const AFontIndex: integer) : TPDFUTF8String;
     Function CreateGlobalXRef: TPDFXRef;
     Function AddGlobalXRef(AXRef : TPDFXRef) : Integer;
@@ -3366,6 +3379,53 @@
     FValue := InsertEscape(FValue);
 end;
 
+
+{ TPDFUTF16String }
+
+constructor TPDFUTF16String.Create(Const ADocument : TPDFDocument; const AValue: Unicodestring);
+begin
+  inherited Create(ADocument);
+  FValue := AValue;
+end;
+
+function oct_str(b:byte):string;
+begin
+  Result:='';
+  repeat
+     Result:=IntToStr(b and $7)+Result;
+     b:=b shr 3;
+  until b=0;
+end;
+
+procedure TPDFUTF16String.Write(const AStream: TStream);
+var
+  i:integer;
+  us:utf8string;
+  s:ansistring;
+  wv:word;
+begin
+  us := Utf8Encode(FValue);
+  if (length(us)<>length(fValue)) then // quote
+  begin
+    s:='\376\377'; // UTF-16BE BOM
+    for i:=1 to length(fValue) do
+    begin
+      wv:=word(fValue[i]);
+      s:=s+'\'+oct_str(hi(wv));
+      s:=s+'\'+oct_str(lo(wv));
+    end;
+  end else
+  begin
+    if (Pos('(', FValue) > 0) or (Pos(')', FValue) > 0) or (Pos('\', FValue) > 0) then
+      s := InsertEscape(FValue)
+    else
+      s:=fValue;
+  end;
+  WriteString('('+s+')', AStream);
+end;
+
+
+
 { TPDFUTF8String }
 
 function TPDFUTF8String.RemapedText: AnsiString;
@@ -4137,6 +4197,11 @@
   AddElement(AKey,Document.CreateString(AString));
 end;
 
+procedure TPDFDictionary.AddString(const AKey:string;const AString: UnicodeString);
+begin
+  AddElement(AKey,Document.CreateUTF16String(AString));
+end;
+
 function TPDFDictionary.IndexOfKey(const AValue: string): integer;
 var
   i: integer;
@@ -4513,7 +4578,7 @@
   FInfos.Assign(AValue);
 end;
 
-procedure TPDFDocument.SetOptions(AValue: TPDFOptions);
+procedure TPDFDocument.SetOptions(aValue: TPDFOptions);
 begin
   if FOptions=AValue then Exit;
   if (poNoEmbeddedFonts in  aValue) then
@@ -4708,10 +4773,8 @@
 end;
 
 procedure TPDFDocument.CreateInfoEntry;
-
 var
   IDict: TPDFDictionary;
-
 begin
   IDict:=CreateGlobalXRef.Dict;
   Trailer.AddReference('Info', GLobalXRefCount-1);
@@ -4726,9 +4789,30 @@
   if Infos.CreationDate <> 0 then
     IDict.AddString('CreationDate',DateToPdfDate(Infos.CreationDate));
   if Infos.Keywords <> '' then
-    IDict.AddString('Keywords', Infos.Keywords);
+    IDict.AddString('Keywords',Infos.Keywords);
 end;
 
+procedure TPDFDocument.CreateInfoEntryUTF16;
+var
+  IDict: TPDFDictionary;
+begin
+  IDict:=CreateGlobalXRef.Dict;
+  Trailer.AddReference('Info', GLobalXRefCount-1);
+  (Trailer.ValueByName('Size') as TPDFInteger).Value:=GLobalXRefCount;
+  if Infos.Title <> '' then
+    IDict.AddString('Title',utf8decode(Infos.Title));
+  if Infos.Author <> '' then
+    IDict.AddString('Author',utf8decode(Infos.Author));
+  if Infos.ApplicationName <> '' then
+    IDict.AddString('Creator',utf8decode(Infos.ApplicationName));
+  IDict.AddString('Producer',utf8decode(Infos.Producer));
+  if Infos.CreationDate <> 0 then
+    IDict.AddString('CreationDate',DateToPdfDate(Infos.CreationDate));
+  if Infos.Keywords <> '' then
+    IDict.AddString('Keywords',utf8decode(Infos.Keywords));
+end;
+
+
 procedure TPDFDocument.CreateMetadataEntry;
 var
   lXRef: TPDFXRef;
@@ -5465,7 +5549,10 @@
   CreateRefTable;
   CreateTrailer;
   FCatalogue:=CreateCatalogEntry;
-  CreateInfoEntry;
+  if poUTF16Info in Options then
+    CreateInfoEntryUTF16
+  else
+    CreateInfoEntry;
   if poMetadataEntry in Options then
     CreateMetadataEntry;
   if not (poNoTrailerID in Options) then
@@ -5811,6 +5898,11 @@
   Result:=TPDFString.Create(Self,AValue);
 end;
 
+function TPDFDocument.CreateUTF16String(const AValue: UnicodeString): TPDFUTF16String;
+begin
+  Result:=TPDFUTF16String.Create(Self,AValue);
+end;
+
 function TPDFDocument.CreateUTF8String(const AValue: UTF8String; const AFontIndex: integer): TPDFUTF8String;
 begin
   Result := TPDFUTF8String.Create(self, AValue, AFontIndex);
meta_test-2.pas (1,095 bytes)   
program meta_test;

{$codepage utf-8}

uses sysutils,fpPDF;

var
  D:TpdfDocument;
  S:TPdfSection;
  P:TPdfPage;

begin
  D:=TpdfDocument.Create(nil);
  try
    D.Infos.Title := 'Урывак з паэмы "Новая Зямля"';
    D.Infos.Author := 'Якуб Колас';
    D.Infos.Producer := 'fcl-pdf';
    D.Infos.ApplicationName := 'нейкі тэст';
    D.Infos.CreationDate := Now;
    D.Infos.KeyWords:='fcl-pdf report';

    D.Options := [poPageOriginAtTop,poSubsetFont,poCompressFonts,poCompressImages,poUseImageTransparency,poUTF16Info];

    D.StartDocument;
    D.AddFont('FreeSans.ttf','FreeSans');

    
    S:=D.Sections.AddSection;      
   
    P:=D.Pages.AddPage;
    P.PaperType := ptA4;
    P.UnitOfMeasure := uomPixels;
    P.Orientation:=ppoPortrait;
    S.AddPage(P);

    P.SetFont(0,10);
    P.WriteText(100,100,'Мой родны кут,');
    P.WriteText(100,150,'Як ты мне мілы');
    P.WriteText(100,200,'Забыць цябе');
    P.WriteText(100,250,'Не маю сілы');
  finally
    D.SaveToFile('test.pdf');
    D.Free;
  end;
end.

Anton Kavalenka

2020-03-30 20:16

reporter   ~0121775

BTW, the PDF spec talks about ALL strings, not only the strings in document info.
So the first approach (changing TPDFString itself) was the best.

Michael Van Canneyt

2020-03-30 21:40

administrator   ~0121783

I suspect your problem could also have been solved more easily by simply adding UTF8 strings to the info dictionary; there was actually no need to add UTF16 support. But UTF16 support is welcome.

I applied the patch, and added TUTF16Text, so now you can use UTF16 for all texts if you so desire.

The reason for not hardcoding UTF16 is twofold:
1. Backwards compatibility.
2. Not all renderers may support UTF16.
Now we give the user complete control over all the options.

Many thanks for your patch and your time. I liked the poem ;-)

Issue History

Date Modified Username Field Change
2020-03-30 17:58 Anton Kavalenka New Issue
2020-03-30 17:58 Anton Kavalenka File Added: fppdf.diff
2020-03-30 17:58 Anton Kavalenka File Added: meta_test.pas
2020-03-30 17:58 Anton Kavalenka File Added: test.pdf
2020-03-30 18:09 Anton Kavalenka Note Added: 0121768
2020-03-30 18:11 Michael Van Canneyt Assigned To => Michael Van Canneyt
2020-03-30 18:11 Michael Van Canneyt Status new => assigned
2020-03-30 18:20 Michael Van Canneyt Note Added: 0121769
2020-03-30 18:31 Anton Kavalenka Note Added: 0121770
2020-03-30 18:32 Anton Kavalenka Note Edited: 0121770 View Revisions
2020-03-30 18:51 Anton Kavalenka File Added: fppdf-2.diff
2020-03-30 18:51 Anton Kavalenka Note Added: 0121772
2020-03-30 18:54 Michael Van Canneyt Note Added: 0121773
2020-03-30 19:25 Anton Kavalenka File Added: fppdf-3.diff
2020-03-30 19:25 Anton Kavalenka File Added: meta_test-2.pas
2020-03-30 19:25 Anton Kavalenka Note Added: 0121774
2020-03-30 20:16 Anton Kavalenka Note Added: 0121775
2020-03-30 21:40 Michael Van Canneyt Status assigned => resolved
2020-03-30 21:40 Michael Van Canneyt Resolution open => fixed
2020-03-30 21:40 Michael Van Canneyt Fixed in Version => 3.3.1
2020-03-30 21:40 Michael Van Canneyt Fixed in Revision => 44435
2020-03-30 21:40 Michael Van Canneyt FPCTarget => 4.0.0
2020-03-30 21:40 Michael Van Canneyt Note Added: 0121783
2020-04-01 18:56 Anton Kavalenka Status resolved => closed