View Issue Details

IDProjectCategoryView StatusLast Update
0012206FPCDatabasepublic2019-08-12 13:31
ReporterEugen BolzAssigned ToLacaK 
PrioritynormalSeveritymajorReproducibilityalways
Status resolvedResolutionreopened 
Product VersionProduct Build 
Target VersionFixed in Version 
Summary0012206: ODBC Query in DBGrid, some Cells are Empty (Encoding)
DescriptionDont know since when this error exists, but all data which are from ODBC ( and some Zeos-Connections too ) dont get converted to UTF8 and non-UTF8 chars dont get displayed on many controls.

This Error is exists also with Oracle ( through ODBC or Zeos connecting to a OracleXE-DB ) the Data dont get displayed.
TagsDatabase, odbc, utf-8, utf8
Fixed in Revision
FPCOldBugId0
FPCTarget-
Attached Files

Relationships

related to 0022095 resolvedLacaK FPC [Patch] odbc unicode support added 
related to 0017376 resolvedLacaK FPC TSQLite3Connection not show whole content for string field when the field is asia language 
related to 0024843 resolvedReinier Olislagers Lazarus Code issue in functions TWideStringField GetAsString & SetAsString 

Activities

2008-09-22 18:24

 

DBGrid.JPG (2,460 bytes)
DBGrid.JPG (2,460 bytes)

2008-09-22 18:24

 

msaccess.JPG (5,666 bytes)
msaccess.JPG (5,666 bytes)

2008-09-22 18:25

 

test.mdb (155,648 bytes)

Eugen Bolz

2008-09-22 18:30

reporter   ~0022362

Last edited: 2008-09-22 18:44

Im using Windows XP Pro SP2
MS-Access 2003
Oracle XE (the newest from Oracle, i think)
FPC Version 2.2.3 (also tried it with 2.3.1)

Jesus Reyes

2008-09-22 18:47

developer   ~0022363

this seems a general problem of ODBC support, it seems the unicode api needs to be used in order to retrieve unicode data.

Eugen Bolz

2008-09-28 20:11

reporter   ~0022471

Thanks to Jesus Reyes and skalogryyz i now have a workaround which is working at me fine

procedure TForm1.Button1Click(Sender: TObject);
begin
  SQLQuery1.Close;
  SQLQuery1.SQL.Text := 'select * from unicodetest';
  SQlQuery1.Open;
  SetupFields(SQlQuery1);
end;

procedure TForm1.SetUpFields(Q: TSqlQuery);
var
  i: Integer;
begin
  if not Q.Active then
    exit;
  for i:=0 to Q.FieldCount-1 do begin
    if Q.Fields[i].DataType=ftString then
    begin
      Q.Fields[i].OnGetText:=@GenericOnGetText;
      Q.Fields[i].OnSetText:=@GenericOnSetText;
    end;
  end;
end;

procedure TForm1.GenericOnGetText(Sender: TField; var aText: string;
  DisplayText: Boolean);
begin
  aText := AnsiToUTF8(Sender.AsString);
end;

procedure TForm1.GenericOnSetText(Sender: TField; const aText: string);
begin
  if Sender<>nil then
    Sender.AsString:= UTF8ToAnsi(aText);
end;


and

function TCustomDBGrid.GetEditText(aCol, aRow: Longint): string;
var
  aField: TField;
begin
  if FDataLink.Active then begin
    aField := GetFieldFromGridColumn(aCol);
    if aField<>nil then begin
      Result := AnsiToUtf8(aField.AsString);
    end;
  end;
end;

with this code the Values get shown in the DBGrid and Editors and the edited value will be also converted back to ansi :)

Dmitry Boyarintsev

2008-09-28 22:03

developer   ~0022473

Jesus Reyes is right! changes should be taken to ODBC components, not DBControls

Jesus Reyes

2008-09-29 08:33

developer   ~0022480

For the record, in r16740, dbgrid use field's GetText to get editing text this means OnGetText would be called and it's not necessary to use the GetEditText technique anymore.

Joost van der Sluis

2009-05-09 16:07

manager   ~0027461

I don't see any bug. Data retrieved from the server is passed directly to the client. When the server/odbc-api uses ansi-strings when passing data to the client, you'll get ansi-strings. Nothing I can do about that. For most servers you can set the characterset in which they send the data. If Access can't do that, you have to convert on your own.

Using the unicode-odbc-api calls will change nothing on that. Only difference is that in that case you must use a ucs16-encoded connection string. The data stays the same. (Although text-fields will be mapped to ftWidestring fields, but see below)

What could be done is using widestrings instead of ansistrings. But that doesn't solve your case either, because widestrings are converted to ansistrings by the widestringmanager before they reach the screen. So you'll have the same problem.

And also in that case, the solution is the same as you suggest in one of the comments. Although it's easier to convert ucs16 to utf8, then ansi to utf8.

Eugen Bolz

2009-05-31 01:56

reporter   ~0028158

so this is "by-design" ?

It is easy to convert ansi to utf8 if u have this litte function:

I think I found it on a russian forum:

function IsUTF8String(S: string): boolean;
var
  WS: WideString;
begin
  WS := UTF8Decode(S);

  Result := (WS <> S) and (WS <> '');
end;

Joost van der Sluis

2009-06-15 14:32

manager   ~0028548

So you suggest that we use a db-server which stores it's data using ansi, then convert it to ucs16 then send it to the client which converts it to utf-8 then convert it to ucs16 again (to see if it's really utf-8) then convert it back to utf-8 again to see if those two are the same, and if they are not, convert the original to utf-8, else nothing has to be done.

Does that sounds logical to you?!?

The proper design is: use the same character set on server and client and you're done. When the access-odbc-client can only handle ansi, you have to convert it on your own.
Adding the 'magic' above is bad design, imho.

Boguslaw Brandys

2009-06-21 19:05

reporter   ~0028673

Ok,now look from the user perspective on that example 'workaround' and imagine an application accessing such MS Access 97 database ,consisted of many forms and queries.I have such requirement now.

At least such workaround should be included into TSQLQuery as a property for example ConvertToUtf8 Boolean which will turn on such 'workaround'
or inside any GUI control connected to TDatasource or in tDatasource, NEVERMIND - just choose the simplest way - but it should NOT be in user code IMHO.

Btw Access 2003 should support UTF8 charset in odbc connection (but Access 97 surely doesn't)

martinus

2009-12-08 15:20

reporter   ~0032808

Last edited: 2009-12-08 15:21

Hi,

In many situations, it is not possible to change the dbms charset on the client side. The most usual reason is because other applications which are not maintained are using the same database and odbc source, but other reasons exist.

We must admit that many developpers do not understand well the charset issues, and what Lazarus shows currently, when using non UTF-8 databases, sounds to them like a lack of functionality or even a bug, and will prevent them to move to this EDI.

I understand, after Joost, that there is no bug, but I think it would be really useful to implement a workaround somewhere, enabling the charset output to be different of UTF-8 and converting it on the fly when necessary. May be it is possible to do things like Eugen shows in 0022471 but more integrated in the components.

I'll try myself to override a TQuery, but I'm probably not skilled enough to find the best solutions.

Regards,

Martinus

LacaK

2010-07-19 14:08

developer   ~0039453

Last edited: 2010-07-19 14:19

I agree here with martinus.

In our case we are using MS SQL Server with database with Slovak_CI_AI collation (Win1250 code page), which is also our Windows non-unicode client codepage (Win1250). So on both sides same codepage.

MS SQL Server ODBC driver does not provide ConnectionString parameter, which can describe/change encoding/codepage used in character data sent (at least I do not know abou them).
So it expect, that character data are encoded in current system codepage (in our case Win1250).

But really they are not, because LCL (so all DB aware components also) uses UTF-8 encoding. To be more clear:
1. when I insert data using DBGrid they are OK when I look at this data only with DBGrid, but when I query data using Query Analyzer(MS query tool), then I see mismasched character data.
2. when I insert data using Query Analyzer and I see at this data using DBGrid, then data are mismasched (because character data are read from ODBC as ansi-strings encoded in current windows codepage, not UTF8 encoded strings which are expected by DBGrid)

So solution will be provide some property/parameter (may be already presented CharSet property of TSQLConnection) which will force character conversion between UTF8 and actual system codepage - UTF8Decode/UTF8Encode or AnsiToUTF8/UTF8ToAnsi.

See also: http://support.microsoft.com/kb/232580

Marco van de Voort

2010-07-19 14:14

manager   ~0039454

LCL chose unilaterally to try to make the GUI UTF-8.

FPC decided to wait with full unicode support till we get a decent UTF-8 type, so that constant manual conversions are not necessary (or at least minimized)

LacaK

2010-07-19 14:29

developer   ~0039457

So now we are in "meantime", and when all string types will be implemented (ansistring, utf8string, widestring, unicodestring) as base types, then also conversions between them will be automatic and also LCL will be adjusted to use correct types where necessary ? (so for example: TEdit.Text property will be declared as "utf8string" and not as "string"?)

dusan.halicky

2011-05-25 09:47

reporter   ~0048563

I have similar problem. DBGrid shows "?" instead of regional characters.

Database Informix
ODBC settings: Server locale sk_SK.852, Client locale sk_SK.1250 (wont accept anything else)
Windows locale: CP1250
Lazarus locale: utf-8

I tried set ODBCConnection1.CharSet to various values but doesn't seems to have effect. How to set it up properly?

LacaK

2011-05-25 11:42

developer   ~0048568

TODBCConnection.CharSet is not used ATM so changing its value does not have any effect.
If your odbc driver supports UTF-8 as "client charset" then set it else you must use some kind of workaround (see comments in this bug report).

Marcos Douglas

2011-09-01 18:15

reporter   ~0051359

Something changed in TODBCConnection.CharSet?
As LacaK2 I also use MSSQL and I have the same problem.

DestinyR

2012-01-25 10:07

reporter   ~0056025

I have a similar problem! Using Firebird 2.5. Russian characters are displayed as "?". I tried various settings Charset - nothing helped.

LacaK

2019-08-12 12:59

developer   ~0117650

TODBCConnection is nowadays capable to pass information in CharSet property to string fields.
i.e. when for example I set Charset:='Windows-1250' and on Windows ODBC driver will supply character data encoded using Ansi charset Windows-1250 then such character data will be properly displayed in TDBGrid.
You must set TODBCConnection.CharSet explicitly if it is different than UTF-8!

Issue History

Date Modified Username Field Change
2008-09-22 18:24 Eugen Bolz New Issue
2008-09-22 18:24 Eugen Bolz File Added: DBGrid.JPG
2008-09-22 18:24 Eugen Bolz Widgetset => Win32
2008-09-22 18:24 Eugen Bolz File Added: msaccess.JPG
2008-09-22 18:25 Eugen Bolz File Added: test.mdb
2008-09-22 18:30 Eugen Bolz Note Added: 0022362
2008-09-22 18:40 Jesus Reyes Project Lazarus => FPC
2008-09-22 18:41 Jesus Reyes FPCOldBugId => 0
2008-09-22 18:41 Jesus Reyes Category Database => Database Components
2008-09-22 18:41 Jesus Reyes Product Version 0.9.25 (SVN) =>
2008-09-22 18:44 Eugen Bolz Note Edited: 0022362
2008-09-22 18:47 Jesus Reyes Note Added: 0022363
2008-09-28 20:11 Eugen Bolz Note Added: 0022471
2008-09-28 21:58 Jonas Maebe Status new => assigned
2008-09-28 21:58 Jonas Maebe Assigned To => Joost van der Sluis
2008-09-28 22:03 Dmitry Boyarintsev Note Added: 0022473
2008-09-29 08:33 Jesus Reyes Note Added: 0022480
2009-05-09 16:07 Joost van der Sluis Note Added: 0027461
2009-05-09 16:07 Joost van der Sluis Status assigned => feedback
2009-05-31 01:56 Eugen Bolz Note Added: 0028158
2009-06-15 14:32 Joost van der Sluis Note Added: 0028548
2009-06-21 19:05 Boguslaw Brandys Note Added: 0028673
2009-12-08 15:20 martinus Note Added: 0032808
2009-12-08 15:21 martinus Note Edited: 0032808
2010-07-19 14:08 LacaK Note Added: 0039453
2010-07-19 14:14 Marco van de Voort Note Added: 0039454
2010-07-19 14:19 LacaK Note Edited: 0039453
2010-07-19 14:29 LacaK Note Added: 0039457
2011-05-25 09:47 dusan.halicky Note Added: 0048563
2011-05-25 11:42 LacaK Note Added: 0048568
2011-09-01 18:15 Marcos Douglas Note Added: 0051359
2012-01-25 10:07 DestinyR Note Added: 0056025
2012-03-12 10:45 Marco van de Voort Summary ODBC Query in DBGrid, some Cells are Empty => ODBC Query in DBGrid, some Cells are Empty (Encoding)
2013-07-16 10:43 Reinier Olislagers Relationship added related to 0022095
2013-08-12 23:03 Juha Manninen Relationship added related to 0024843
2014-03-10 10:58 Reinier Olislagers Tag Attached: Database
2014-03-10 10:58 Reinier Olislagers Tag Attached: odbc
2014-03-10 10:58 Reinier Olislagers Tag Attached: utf-8
2014-03-10 10:58 Reinier Olislagers Tag Attached: utf8
2019-08-12 12:59 LacaK Assigned To Joost van der Sluis => LacaK
2019-08-12 12:59 LacaK Status feedback => resolved
2019-08-12 12:59 LacaK Resolution open => fixed
2019-08-12 12:59 LacaK FPCTarget => -
2019-08-12 12:59 LacaK Note Added: 0117650
2019-08-12 13:31 LacaK Status resolved => feedback
2019-08-12 13:31 LacaK Resolution fixed => reopened
2019-08-12 13:31 LacaK Relationship added related to 0017376
2019-08-12 13:31 LacaK Status feedback => resolved