View Issue Details

IDProjectCategoryView StatusLast Update
0038624FPCFCLpublic2021-03-15 23:57
ReporterDo-wan Kim Assigned ToMichael Van Canneyt  
PrioritynormalSeverityminorReproducibilityalways
Status closedResolutionfixed 
Product Version3.3.1 
Fixed in Version3.3.1 
Summary0038624: Not processing "\u" characters in JSON String correctly with 4bytes unicode.
DescriptionJson \u character conversion doesn't work correctly with 4bytes unicode.

work
json 2 | 2 | 2 | 2 | 2 -> unicode 2 | 2 | 2 | 2 | 2

fail
json 2 | 4 | 2 | 2 | 2 -> unicode 2 | 2 | 2 | 2

Steps To Reproduceprocedure TForm1.Button1Click(Sender: TObject);
var
    str : String;
    u : UnicodeString;
    json : TJsonObject;
    l : Integer;
begin
    str := '{"name":"\u95e8\ud867\ude3d\u88ab\u8111\u5b50\u6324\u574f\u4e86"}';
    json := GetJson(str) as TJsonObject;

    u:=UTF8Decode(json.Get('name',''));
    l:=Length(u);
    json.Free;

    if l<>9 then
      ShowMessage(Format('%s %d',[u,l]))
    else
    ShowMessage('ok');
end;

TagsNo tags attached.
Fixed in Revision48980
FPCOldBugId
FPCTarget4.0.0
Attached Files

Activities

Do-wan Kim

2021-03-15 10:22

reporter   ~0129680

38624.patch (3,141 bytes)   
Index: packages/fcl-json/src/jsonscanner.pp
===================================================================
--- packages/fcl-json/src/jsonscanner.pp	(revision 48976)
+++ packages/fcl-json/src/jsonscanner.pp	(working copy)
@@ -354,20 +354,29 @@
                         Error(SErrInvalidCharacter, [CurRow,CurColumn,FTokenStr[0]]);
                       end;
                       end;
-                    // ToDo: 4-bytes UTF16
                     if u1<>0 then
                       begin
-                      if (joUTF8 in Options) or (DefaultSystemCodePage=CP_UTF8) then
-                        S:=Utf8Encode(WideString(WideChar(u1)+WideChar(u2))) // ToDo: use faster function
-                      else
-                        S:=String(WideChar(u1)+WideChar(u2)); // WideChar converts the encoding. Should it warn on loss?
-                      u1:=0;
-                      end
-                    else
-                      begin
-                      S:='';
-                      u1:=u2;
-                      end
+                        if ((u1>=$D800) and (u1<=$DBFF)) and 
+						   ((u2>=$DC00) and (u2<=$DFFF)) then
+                          begin
+                            // 4bytes
+                            if (joUTF8 in Options) or (DefaultSystemCodePage=CP_UTF8) then
+                              S:=Utf8Encode(WideString(WideChar(u1)+WideChar(u2))) // ToDo: use faster function
+                            else
+                              S:=String(WideChar(u1)+WideChar(u2)); // WideChar converts the encoding. Should it warn on loss?
+                            u2:=0;
+                          end
+                        else
+                          begin
+                          // 2bytes
+                          if (joUTF8 in Options) or (DefaultSystemCodePage=CP_UTF8) then
+                            S:=Utf8Encode(WideString(WideChar(u1))) // ToDo: use faster function
+                          else
+                            S:=String(WideChar(u1)); // WideChar converts the encoding. Should it warn on loss?
+                          end;
+                      end else
+                        S:='';
+                    u1:=u2;
                     end;
               #0  : Error(SErrOpenString,[FCurRow]);
             else
Index: packages/fcl-json/tests/testjsondata.pp
===================================================================
--- packages/fcl-json/tests/testjsondata.pp	(revision 48976)
+++ packages/fcl-json/tests/testjsondata.pp	(working copy)
@@ -4038,6 +4038,7 @@
   // Glowing star in UTF8
   GlowingStar = #$F0#$9F#$8C#$9F;
   Chinese = #$95e8#$88ab#$8111#$5b50#$6324#$574f#$4e86;
+  Chinese4b = #$95e8#$d867#$de3d#$88ab#$8111#$5b50#$6324#$574f#$4e86;
 
 begin
   TestFrom('','');
@@ -4082,6 +4083,7 @@
   TestFrom('\u0041\u0042\u0043','ABC');
   TestFrom('\u0041\u0042\u0043\u0044','ABCD');
   TestFrom('\u95e8\u88ab\u8111\u5b50\u6324\u574f\u4e86',Utf8Encode(Chinese));
+  TestFrom('\u95e8\ud867\ude3d\u88ab\u8111\u5b50\u6324\u574f\u4e86',Utf8Encode(Chinese4b));
 end;
 
 procedure TTestJSONString.TestStringToJSONString;
38624.patch (3,141 bytes)   

Michael Van Canneyt

2021-03-15 12:37

administrator   ~0129686

Last edited: 2021-03-15 12:38

View 2 revisions

I'm sorry to report there is still something wrong with your patch. It breaks all other cases:

List of failures:
  Failure:
    Message: TTestJSONString.TestJSONStringToString: "JSONStringToString('\u95e8\ud867\ude3d\u88ab\u8111\u5b50\u6324\u574f\u4e86')='门𩸽被脑子挤坏了'" expected: <门𩸽被脑子挤坏了> but was: <门被脑子挤坏了>
    Exception class: EAssertionFailedError
    Exception message: "JSONStringToString('\u95e8\ud867\ude3d\u88ab\u8111\u5b50\u6324\u574f\u4e86')='门𩸽被脑子挤坏了'" expected: <门𩸽被脑子挤坏了> but was: <门被脑子挤坏了>
        at $00000000004794FB
  Failure:
    Message: TTestReader.TestString: ""\u0041\u0042": Event number 0" expected: <string:AB> but was: <string:BA>
    Exception class: EAssertionFailedError
    Exception message: ""\u0041\u0042": Event number 0" expected: <string:AB> but was: <string:BA>
        at $000000000048D2B2
  Failure:
    Message: TTestJSONConsumerReader.TestString: ""\u0041\u0042": Event number 0" expected: <string:AB> but was: <string:BA>
    Exception class: EAssertionFailedError
    Exception message: ""\u0041\u0042": Event number 0" expected: <string:AB> but was: <string:BA>
        at $000000000048E356
  Failure:
    Message: TTestJSONEventReader.TestString: ""\u0041\u0042": Event number 0" expected: <string:AB> but was: <string:BA>
    Exception class: EAssertionFailedError
    Exception message: ""\u0041\u0042": Event number 0" expected: <string:AB> but was: <string:BA>
        at $000000000048ED8E

Without your patch, all other cases run OK.

Do-wan Kim

2021-03-15 15:17

reporter   ~0129690

There is also conversion procedure in fpjson.pp
38624_fpjson.pp.patch (1,337 bytes)   
Index: packages/fcl-json/src/fpjson.pp
===================================================================
--- packages/fcl-json/src/fpjson.pp	(revision 48948)
+++ packages/fcl-json/src/fpjson.pp	(working copy)
@@ -1012,12 +1012,21 @@
                    Raise EJSON.Create('Invalid unicode hex code: '+Copy(S,I+1,4));
                 Inc(I,4);
                 if (U1<>0) then
+                begin
+                  if ((U1>=$D800) and (U1<=$DBFF)) and
+                     ((U2>=$DC00) and (U2<=$DFFF)) then
                   begin
-                  App:={$IFDEF FPC_HAS_CPSTRING}UTF8Encode({$ENDIF}WideChar(U1)+WideChar(U2){$IFDEF FPC_HAS_CPSTRING}){$ENDIF};
-                  U2:=0;
-                  end
-                else
-                  U1:=U2;
+                    App:={$IFDEF FPC_HAS_CPSTRING}UTF8Encode({$ENDIF}WideChar(U1)+WideChar(U2){$IFDEF FPC_HAS_CPSTRING}){$ENDIF};
+                    U2:=0;
+                  end else
+                  begin
+                    App:={$IFDEF FPC_HAS_CPSTRING}UTF8Encode({$ENDIF}WideChar(U1){$IFDEF FPC_HAS_CPSTRING}){$ENDIF};
+                    Result:=Result+App;
+                    App:='';
+                  end;
+                end else
+                  App:='';
+                U1:=U2;
                 end;
         end;
         if App<>'' then
38624_fpjson.pp.patch (1,337 bytes)   

Michael Van Canneyt

2021-03-15 15:30

administrator   ~0129691

Your patch fails to apply.
After manually applying your changes, the chinese test no longer fails, but the other tests still fail:

List of failures:
  Failure:
    Message: TTestReader.TestString: ""\u0041\u0042": Event number 0" expected: <string:AB> but was: <string:BA>
    Exception class: EAssertionFailedError
    Exception message: ""\u0041\u0042": Event number 0" expected: <string:AB> but was: <string:BA>
        at $000000000048D2B2
  Failure:
    Message: TTestJSONConsumerReader.TestString: ""\u0041\u0042": Event number 0" expected: <string:AB> but was: <string:BA>
    Exception class: EAssertionFailedError
    Exception message: ""\u0041\u0042": Event number 0" expected: <string:AB> but was: <string:BA>
        at $000000000048E356
  Failure:
    Message: TTestJSONEventReader.TestString: ""\u0041\u0042": Event number 0" expected: <string:AB> but was: <string:BA>
    Exception class: EAssertionFailedError
    Exception message: ""\u0041\u0042": Event number 0" expected: <string:AB> but was: <string:BA>
        at $000000000048ED8E

What do you see when you run the testsuite ?

Do-wan Kim

2021-03-15 16:00

reporter   ~0129692

Last edited: 2021-03-15 16:01

View 2 revisions

I forgot fixing jsonscanner.pp

Fix jsonscanner.pp patch and include fpjson.pp patch.
38624_json_unicode4bytes.patch (3,613 bytes)   
Index: packages/fcl-json/src/fpjson.pp
===================================================================
--- packages/fcl-json/src/fpjson.pp	(revision 48948)
+++ packages/fcl-json/src/fpjson.pp	(working copy)
@@ -1012,12 +1012,21 @@
                    Raise EJSON.Create('Invalid unicode hex code: '+Copy(S,I+1,4));
                 Inc(I,4);
                 if (U1<>0) then
+                begin
+                  if ((U1>=$D800) and (U1<=$DBFF)) and
+                     ((U2>=$DC00) and (U2<=$DFFF)) then
                   begin
-                  App:={$IFDEF FPC_HAS_CPSTRING}UTF8Encode({$ENDIF}WideChar(U1)+WideChar(U2){$IFDEF FPC_HAS_CPSTRING}){$ENDIF};
-                  U2:=0;
-                  end
-                else
-                  U1:=U2;
+                    App:={$IFDEF FPC_HAS_CPSTRING}UTF8Encode({$ENDIF}WideChar(U1)+WideChar(U2){$IFDEF FPC_HAS_CPSTRING}){$ENDIF};
+                    U2:=0;
+                  end else
+                  begin
+                    App:={$IFDEF FPC_HAS_CPSTRING}UTF8Encode({$ENDIF}WideChar(U1){$IFDEF FPC_HAS_CPSTRING}){$ENDIF};
+                    Result:=Result+App;
+                    App:='';
+                  end;
+                end else
+                  App:='';
+                U1:=U2;
                 end;
         end;
         if App<>'' then
Index: packages/fcl-json/src/jsonscanner.pp
===================================================================
--- packages/fcl-json/src/jsonscanner.pp	(revision 48948)
+++ packages/fcl-json/src/jsonscanner.pp	(working copy)
@@ -254,7 +254,7 @@
   I : Integer;
   OldLength, SectionLength,  tstart,tcol, u1,u2: Integer;
   C , c2: char;
-  S : String[4];
+  S : String[8];
   Line : String;
   IsStar,EOC: Boolean;
 
@@ -353,20 +353,27 @@
                         Error(SErrInvalidCharacter, [CurRow,CurColumn,FTokenStr[0]]);
                       end;
                       end;
-                    // ToDo: 4-bytes UTF16
                     if u1<>0 then
                       begin
-                      if (joUTF8 in Options) or (DefaultSystemCodePage=CP_UTF8) then
-                        S:=Utf8Encode(WideString(WideChar(u1)+WideChar(u2))) // ToDo: use faster function
-                      else
-                        S:=String(WideChar(u1)+WideChar(u2)); // WideChar converts the encoding. Should it warn on loss?
-                      u1:=0;
-                      end
-                    else
-                      begin
-                      S:='';
-                      u1:=u2;
-                      end
+                        if ((u1>=$D800) and (u1<=$DBFF)) and 
+                           ((u2>=$DC00) and (u2<=$DFFF)) then
+                          begin
+                            // 4bytes
+                            if (joUTF8 in Options) or (DefaultSystemCodePage=CP_UTF8) then
+                              S:=Utf8Encode(WideString(WideChar(u1)+WideChar(u2))) // ToDo: use faster function
+                            else
+                              S:=String(WideChar(u1)+WideChar(u2)); // WideChar converts the encoding. Should it warn on loss?
+                            u2:=0;
+                          end
+                        else
+                          begin
+                          // 2bytes
+                          MaybeAppendUnicode;
+                          S:='';
+                          end;
+                      end else
+                        S:='';
+                    u1:=u2;
                     end;
               #0  : Error(SErrOpenString,[FCurRow]);
             else
38624_json_unicode4bytes.patch (3,613 bytes)   

Michael Van Canneyt

2021-03-15 16:14

administrator   ~0129693

I fixed it myself slightly differently. It will now also raise an error on invalid surrogate pairs.

Do-wan Kim

2021-03-15 17:05

reporter   ~0129694

It need fix on line 381. And commented writeln, it works ok :)

Michael Van Canneyt

2021-03-15 19:07

administrator   ~0129697

You are of course right ! Done in 48983.

Thanks for pointing it out !

Do-wan Kim

2021-03-15 23:57

reporter   ~0129702

Thank you!

Issue History

Date Modified Username Field Change
2021-03-15 02:57 Do-wan Kim New Issue
2021-03-15 10:22 Do-wan Kim Note Added: 0129680
2021-03-15 10:22 Do-wan Kim File Added: 38624.patch
2021-03-15 12:31 Michael Van Canneyt Assigned To => Michael Van Canneyt
2021-03-15 12:31 Michael Van Canneyt Status new => assigned
2021-03-15 12:37 Michael Van Canneyt Status assigned => feedback
2021-03-15 12:37 Michael Van Canneyt FPCTarget => -
2021-03-15 12:37 Michael Van Canneyt Note Added: 0129686
2021-03-15 12:38 Michael Van Canneyt Note Edited: 0129686 View Revisions
2021-03-15 15:17 Do-wan Kim Note Added: 0129690
2021-03-15 15:17 Do-wan Kim File Added: 38624_fpjson.pp.patch
2021-03-15 15:17 Do-wan Kim Status feedback => assigned
2021-03-15 15:30 Michael Van Canneyt Note Added: 0129691
2021-03-15 16:00 Do-wan Kim Note Added: 0129692
2021-03-15 16:00 Do-wan Kim File Added: 38624_json_unicode4bytes.patch
2021-03-15 16:01 Do-wan Kim Note Edited: 0129692 View Revisions
2021-03-15 16:14 Michael Van Canneyt Status assigned => resolved
2021-03-15 16:14 Michael Van Canneyt Resolution open => fixed
2021-03-15 16:14 Michael Van Canneyt Fixed in Version => 3.3.1
2021-03-15 16:14 Michael Van Canneyt Fixed in Revision => 48980
2021-03-15 16:14 Michael Van Canneyt FPCTarget - => 4.0.0
2021-03-15 16:14 Michael Van Canneyt Note Added: 0129693
2021-03-15 17:05 Do-wan Kim Note Added: 0129694
2021-03-15 19:07 Michael Van Canneyt Note Added: 0129697
2021-03-15 23:57 Do-wan Kim Status resolved => closed
2021-03-15 23:57 Do-wan Kim Note Added: 0129702