View Issue Details

ID: 0032670
Project: FPC
Category: FCL
View Status: public
Last Update: 2018-10-27 19:19
Reporter: Renat Suleyman
Assigned To: Michael Van Canneyt
Priority: normal
Severity: minor
Reproducibility: always
Status: resolved
Resolution: fixed
OS: Linux, Windows etc
Target Version: 3.2.0
Fixed in Version: 3.3.1
Summary: 0032670: The function "JSONStringToString" does not decode emoji
Description: The JSONStringToString function from the fpjson unit does not decode emoji correctly.
For example, the JSON data "\ud83c\udf1f" should be decoded as "🌟", but the result is "��".
Steps To Reproduce: S:=JSONStringToString('\ud83c\udf1f');
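The one-line reproduction can be wrapped in a small program. This is a minimal sketch, assuming that TJSONStringType is UTF-8 encoded in the installed fpjson; the program name is made up for illustration. It dumps the result as hex bytes so that the check does not depend on the console encoding. U+1F31F encoded as UTF-8 is F0 9F 8C 9F, so any other byte sequence indicates the problem described here.

```pascal
program emojirepro;

{$mode objfpc}{$H+}

uses
  SysUtils, fpjson;

var
  S: TJSONStringType;
  I: Integer;
begin
  // The escaped surrogate pair from the report (U+1F31F).
  S := JSONStringToString('\ud83c\udf1f');

  // Print every byte of the result in hex instead of writing the string
  // itself, so the console code page cannot hide or cause the problem.
  for I := 1 to Length(S) do
    Write(IntToHex(Ord(S[I]), 2), ' ');
  WriteLn;
end.
```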
Additional Information: Perhaps this is a problem not only for this function, but for FPC as a whole.
I found a temporary workaround that may help you solve the problem: if UnicodeString is specified instead of TJSONStringType as the result type of the function, then everything works fine. I.e.:
-- function JSONStringToString(const S: TJSONStringType): TJSONStringType;
++ function JSONStringToString(const S: TJSONStringType): UnicodeString;
Tags: No tags attached.
Fixed in Revision: 40058
Attached Files
  • fpjson.patch (1,313 bytes)
    Index: src/fpjson.pp
    ===================================================================
    --- src/fpjson.pp	(revision 37513)
    +++ src/fpjson.pp	(working copy)
    @@ -717,8 +717,9 @@
     Var
       I,J,L : Integer;
       P : PJSONCharType;
    -  w : String;
    -
    +  w : integer;
    +  shelp : unicodestring;
    +  W2 : unicodechar;
     begin
       I:=1;
       J:=1;
    @@ -725,6 +726,7 @@
       L:=Length(S);
       Result:='';
       P:=PJSONCharType(S);
    +  W2:=#0;
       While (I<=L) do
         begin
         if (P^='\') then
    @@ -743,10 +745,22 @@
               'f' : Result:=Result+#12;
               'r' : Result:=Result+#13;
               'u' : begin
    -                W:=Copy(S,I+1,4);
    +                W:=strtoint('$'+Copy(S,I+1,4));
                     Inc(I,4);
                     Inc(P,4);
    -                Result:=Result+WideChar(StrToInt('$'+W));
    +                if w2=#0 then
    +                  begin
    +                    if (w and $d800)=0 then
    +                      result:=result+widechar(w)
    +                    else
    +                      w2:=widechar(w);
    +                  end
    +                else
    +                  begin
    +                     shelp:=w2+widechar(w);
    +                     result:=result+shelp;
    +                     w2:=#0;
    +                  end;
                     end;
             end;
             end;
    

Activities

Marco van de Voort

2017-11-11 18:26

manager  

fpjson.patch (1,313 bytes)

Marco van de Voort

2017-11-11 18:28

manager   ~0104006

Looking at the problem, the cause is probably that the surrogates are converted to UTF-8 individually.

Solution: if a \uxxxx escape is a surrogate, hold it until the next \uxxxx and then process the two together.

Please test.
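The arithmetic behind "hold it and then process" can be sketched as follows. This is an illustration only, not the code from the patch or from the final fix; the helper names are hypothetical. It uses the mask test `(W and $FC00) = $D800` to recognise a high surrogate, which is stricter than the `(w and $d800)=0` test in the attached patch (that test would also hold back an ordinary value such as $0800).

```pascal
program surrogatepair;

{$mode objfpc}{$H+}

// Hypothetical helpers, named for this sketch only.

// True for the first half of a pair ($D800..$DBFF).
function IsHighSurrogate(W: Word): Boolean;
begin
  Result := (W and $FC00) = $D800;
end;

// True for the second half of a pair ($DC00..$DFFF).
function IsLowSurrogate(W: Word): Boolean;
begin
  Result := (W and $FC00) = $DC00;
end;

// Combines a valid high/low pair into one code point above U+FFFF.
function CombineSurrogates(HighUnit, LowUnit: Word): LongInt;
begin
  Result := $10000 + ((HighUnit and $3FF) shl 10) + (LowUnit and $3FF);
end;

begin
  // The two escapes from the report: \ud83c followed by \udf1f.
  if IsHighSurrogate($D83C) and IsLowSurrogate($DF1F) then
    WriteLn('U+', HexStr(CombineSurrogates($D83C, $DF1F), 5));
end.
```

For the pair from the report the combined code point works out to U+1F31F, which is the emoji that was expected.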

Renat Suleyman

2017-11-11 20:03

reporter   ~0104007

Last edited: 2017-11-11 20:03

The new version of the function from the patch works correctly. Thank you.

Renat Suleyman

2017-11-12 14:07

reporter   ~0104031

One caveat, though: I have probably done something wrong when recompiling.
The new function works fine if I copy it into my own unit, but the fpjson unit still seems to contain the old version, even though I completely rebuilt my Lazarus application after cleaning. Note also that if I copy the old version of the function into a separate unit, it does not work either.

Luiz Americo

2017-11-13 10:54

developer   ~0104055

The best way to ensure the fix is correct is to write a new unit test and compare the results with and without the changes. fpjson comes with its own set of unit tests in packages/fcl-json/tests/
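Such a test could look roughly like the sketch below. It is a standalone fpcunit program, not a test taken from packages/fcl-json/tests/; the class name and test name are invented for illustration, and it assumes that TJSONStringType is UTF-8 encoded. It compares bytes rather than strings to avoid any implicit code page conversion.

```pascal
program testemoji;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, consoletestrunner, fpcunit, testregistry, fpjson;

type
  // Hypothetical test case, named for this sketch only.
  TTestEmojiDecoding = class(TTestCase)
  published
    procedure TestSurrogatePair;
  end;

procedure TTestEmojiDecoding.TestSurrogatePair;
const
  // UTF-8 encoding of U+1F31F.
  Expected: array[1..4] of Byte = ($F0, $9F, $8C, $9F);
var
  S: TJSONStringType;
  I: Integer;
begin
  S := JSONStringToString('\ud83c\udf1f');
  AssertEquals('byte length', 4, Integer(Length(S)));
  for I := 1 to 4 do
    AssertEquals('byte ' + IntToStr(I), Integer(Expected[I]),
      Integer(Ord(S[I])));
end;

var
  App: TTestRunner;
begin
  RegisterTest(TTestEmojiDecoding);
  App := TTestRunner.Create(nil);
  App.Initialize;
  App.Run;
  App.Free;
end.
```

The console runner normally needs an option such as --all to actually run the registered tests. Running the same program against an unpatched and a patched fpjson gives the with/without comparison suggested above.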

Marco van de Voort

2017-11-13 12:23

manager   ~0104063

I'll leave that to somebody more familiar with the unit test suites in use.

Note that the reasons I wanted a review rather than committing are:
(1) JSON is not really an area I'm deeply familiar with
(2) error handling on malformed input: <first surrogate><other char><last surrogate> will parse to <other char><surrogate encoded>, the first part of a surrogate without the second will disappear, etc.

Benito van der Zander

2018-09-24 22:42

reporter   ~0110998

Last edited: 2018-09-25 01:11

jsonscanner has the same problem, with almost the same conversion code in DoFetchToken (although there it also needs to call Utf8Encode, see 0022310).

And the error handling is a big problem for me. I need a JSON parser that reports any invalid encoding, so that I can replace it with a custom invalid marker.

It is kind of pointless to have the decoding in the scanner. If this function took a PChar and a length, the scanner could just output the token as such a PChar, and the parser could then call this function.

Benito van der Zander

2018-09-29 00:07

reporter   ~0111066

And what happens if you have an invalid surrogate followed by a correct one, e.g. "\ud83c\ud83c\udf1f"? The invalid one should not break the next one, should it?
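One possible policy for this case is sketched below: an unpaired surrogate is replaced with U+FFFD and the following code unit is still examined on its own, so a lone high surrogate cannot swallow a valid pair. This is an assumption made for illustration, not what fpjson does; the function name and the choice of U+FFFD as marker are hypothetical.

```pascal
program lonesurrogates;

{$mode objfpc}{$H+}

const
  ReplacementChar: WideChar = #$FFFD;   // assumed invalid marker

// Hypothetical helper: turns raw UTF-16 code units (as parsed from \uXXXX
// escapes) into a UnicodeString, replacing unpaired surrogates.
function UnitsToUnicode(const Units: array of Word): UnicodeString;
var
  I: Integer;
  Pending, W: Word;   // Pending = held high surrogate, 0 = none
begin
  Result := '';
  Pending := 0;
  for I := 0 to High(Units) do
  begin
    W := Units[I];
    if Pending <> 0 then
    begin
      if (W and $FC00) = $DC00 then
      begin
        // Valid pair: emit both halves together.
        Result := Result + WideChar(Pending) + WideChar(W);
        Pending := 0;
        Continue;
      end;
      // The held high surrogate has no partner: replace it, then fall
      // through so that W is still classified below.
      Result := Result + ReplacementChar;
      Pending := 0;
    end;
    if (W and $FC00) = $D800 then
      Pending := W                          // hold until the next unit
    else if (W and $FC00) = $DC00 then
      Result := Result + ReplacementChar    // low surrogate without a high
    else
      Result := Result + WideChar(W);       // ordinary BMP character
  end;
  if Pending <> 0 then
    Result := Result + ReplacementChar;     // input ended in mid-pair
end;

var
  U: UnicodeString;
  I: Integer;
begin
  // The sequence from the question: \ud83c\ud83c\udf1f
  U := UnitsToUnicode([$D83C, $D83C, $DF1F]);
  for I := 1 to Length(U) do
    Write(HexStr(LongInt(Ord(U[I])), 4), ' ');
  WriteLn;
end.
```

With this policy the input from the question should come out as FFFD D83C DF1F: one marker for the stray high surrogate, followed by the intact pair.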

Michael Van Canneyt

2018-10-27 19:19

administrator   ~0111624

Fixed both scanner and JSONStringToString.

The patch was missing several corner cases, so I did it slightly differently.

I'm not adding support for invalidly encoded Unicode characters.

If someone wants to implement it and supplies a patch, I will look at it, but I won't put time into that.

Issue History

Date Modified Username Field Change
2017-11-11 09:56 Renat Suleyman New Issue
2017-11-11 18:26 Marco van de Voort File Added: fpjson.patch
2017-11-11 18:28 Marco van de Voort Note Added: 0104006
2017-11-11 20:03 Renat Suleyman Note Added: 0104007
2017-11-11 20:03 Renat Suleyman Note Edited: 0104007
2017-11-12 14:07 Renat Suleyman Note Added: 0104031
2017-11-13 10:54 Luiz Americo Note Added: 0104055
2017-11-13 12:23 Marco van de Voort Note Added: 0104063
2018-09-24 22:42 Benito van der Zander Note Added: 0110998
2018-09-25 01:11 Benito van der Zander Note Edited: 0110998
2018-09-29 00:07 Benito van der Zander Note Added: 0111066
2018-09-30 11:23 Michael Van Canneyt Assigned To => Michael Van Canneyt
2018-09-30 11:23 Michael Van Canneyt Status new => assigned
2018-10-27 19:19 Michael Van Canneyt Fixed in Revision => 40058
2018-10-27 19:19 Michael Van Canneyt Note Added: 0111624
2018-10-27 19:19 Michael Van Canneyt Status assigned => resolved
2018-10-27 19:19 Michael Van Canneyt Fixed in Version => 3.3.1
2018-10-27 19:19 Michael Van Canneyt Resolution open => fixed
2018-10-27 19:19 Michael Van Canneyt Target Version => 3.2.0