View Issue Details

ID: 0037938
Project: FPC
Category: FCL
View Status: public
Last Update: 2020-10-17 19:48
Reporter: Pedro Gimeno
Assigned To:
Priority: normal
Severity: minor
Reproducibility: always
Status: new
Resolution: open
Platform: x86_64, build: trunk r47066
OS: Windows
Summary: 0037938: 3-letter country codes improperly converted to 2-letter country codes
Description: I don't use Lazarus on Windows, but while checking the code I noticed this in the Windows version of GetLanguageIDs:

    // some 2 letter codes are not the first two letters of the 3 letter code
    // there are probably more, but first let us see if there are translations
    if (Buffer='PRT') then Country:='PT';


So I decided to write a proper ISO-3166-1 3-letter country code to 2-letter country code converter, which I hereby donate to the public domain:

{ Transforms an ISO 3166-1 3-letter country code into the corresponding
  2-letter country code. The lookup string only contains those 3-letter
  codes whose first 2 letters don't match their 2-letter country code.
  If c3 is not found at a position aligned to a 3-letter boundary, it's not
  considered to be in the table, and its first 2 letters are returned
  instead. It's been verified that none of the 3-letter codes in the lookup
  string also appears at an earlier, unaligned position in that string. }
function Country3to2(c3: string): string;
var
  posn: Integer;
begin
  posn := Pos(c3, 'ALAANDAGOATAATGARMAUTBHSBGDBRBBLRBLZBENBESBIHBRNBDICPVCYM'
    + 'CAFTCDCHLCHNCOMCOGCODCOKCUWDNKSLVGNQESTSWZFLKFROGUFPYFATFGRLGRDGLPGIN'
    + 'GNBGUYIRQIRLISRJAMKAZPRKKORLBRLBYMACMDGMDVMLTMTQMYTMEXFSMMNEMOZNIUMNP'
    + 'PAKPLWPNGPRYPCNPOLPRTMAFSPMSENSRBSYCSVKSVNSLBSGSSURSWETUNTURTKMTUVUKR'
    + 'AREURYWLFESHABW');
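  // Accept only full 3-letter inputs that match on an entry boundary;
  // shorter or longer strings could otherwise match by accident.
  if (Length(c3) = 3) and (posn mod 3 = 1) then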
  begin
    posn := posn div 3 * 2;
    Exit(copy('AXADAOAQAGAMATBSBDBBBYBZBJBQBABNBICVKYCFTDCLCNKMCGCDCKCWDKSVGQEE'
      + 'SZFKFOGFPFTFGLGDGPGNGWGYIQIEILJMKZKPKRLRLYMOMGMVMTMQYTMXFMMEMZNUMPPKPW'
      + 'PGPYPNPLPTMFPMSNRSSCSKSISBGSSRSETNTRTMTVUAAEUYWFEHAW',
      posn + 1, 2));
  end;
  Exit(copy(c3, 1, 2));
end;


I hope this helps in solving the issue. The table contains 93 entries, so a good number of countries were being converted incorrectly.
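The index arithmetic in the function can be followed more easily on a miniature table. The sketch below is only an illustration: MiniCountry3to2 and its three-entry tables are made up for this example and are not part of the proposed code.

```pascal
program Country3to2Demo;

{$mode objfpc}{$H+}

{ Miniature version of the lookup with only three entries (illustration only). }
function MiniCountry3to2(const c3: string): string;
const
  Codes3 = 'ALAANDPRT';  // ALA, AND, PRT
  Codes2 = 'AXADPT';     // AX,  AD,  PT
var
  posn: Integer;
begin
  posn := Pos(c3, Codes3);
  // An aligned hit starts at position 1, 4, 7, ... so posn mod 3 = 1.
  if posn mod 3 = 1 then
    // posn div 3 is the 0-based entry index; each 2-letter code is 2 chars wide.
    Exit(Copy(Codes2, posn div 3 * 2 + 1, 2));
  // Not found (posn = 0) or unaligned hit: fall back to the first 2 letters.
  Exit(Copy(c3, 1, 2));
end;

begin
  WriteLn(MiniCountry3to2('PRT'));  // aligned hit at position 7, prints PT
  WriteLn(MiniCountry3to2('AAN'));  // unaligned hit at position 3, prints AA
  WriteLn(MiniCountry3to2('FRA'));  // not in the table, prints FR
end.
```

The 'AAN' case shows why alignment matters: the string does occur inside the table, but it straddles two entries, so it must not be treated as a hit.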
Tags: No tags attached.
Fixed in Revision
FPCOldBugId
FPCTarget
Attached Files

Activities

Pedro Gimeno

2020-10-17 19:48

reporter   ~0126375

I've checked the function above against this list: http://computer-programming-forum.com/84-vc-stl/d49bc1e00e9d0ffd.htm (I can't link to a particular post, so please check a bit past the middle of the page to find the list).

There are still mismatches, because apparently Windows is not too rigorous in following ISO 3166 (or ISO 639):
- Macao (Macau) is MAC in ISO 3166, but the Windows list shows it as MCO, so the function above produces MC instead of MO. Monaco is MCO in both Windows and ISO, which means the Windows list uses the same code twice; I don't think there's much that can be done about that one.
- The list contains Caribbean (CAR), but there's no such country code in ISO 3166, where it seems to be split into several entries (e.g. Caribbean Netherlands is BES, 2-letter code BQ, which itself covers three islands: Bonaire, Sint Eustatius and Saba). Perhaps CAR could be mapped to BES as an exception.
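One way to handle such Windows-specific cases would be to check a small exception list before the ISO lookup. This is only a sketch: WindowsCountryOverride is a hypothetical helper name, and mapping CAR to BQ is the tentative suggestion from the list above, not an established rule.

```pascal
program WindowsCountryOverrides;

{$mode objfpc}{$H+}

{ Hypothetical helper: returns True and sets c2 when the Windows country
  abbreviation needs special handling before any ISO 3166 lookup. }
function WindowsCountryOverride(const abbrev: string; out c2: string): Boolean;
begin
  Result := True;
  if abbrev = 'CAR' then
    c2 := 'BQ'   // assumption: treat "Caribbean" as BES (BQ)
  else
  begin
    c2 := '';
    Result := False;
  end;
end;

var
  c2: string;
begin
  if WindowsCountryOverride('CAR', c2) then
    WriteLn('CAR -> ', c2);
  // MCO stays ambiguous: Windows uses it for both Monaco and Macao.
  if not WindowsCountryOverride('MCO', c2) then
    WriteLn('MCO has no override');
end.
```

Callers would fall through to the regular 3-letter to 2-letter conversion whenever the helper returns False.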

A few entries are truncated and I couldn't verify them.

As for languages, the Windows doc ( https://docs.microsoft.com/en-us/windows/win32/intl/locale-sabbrev-constants ) claims that "the name is created by taking the two-letter language abbreviation from ISO Standard 639 and adding a third letter". I have not verified the language list in the same detail as the country list, but on cursory examination I noticed a glaring violation: the ISO 639 code for Chinese is zh, but the list uses CH (CHT for Taiwan's Chinese, CHS for China's Chinese), which corresponds to the ISO 639 code for Chamorro (a language used in the Marianas). Therefore, a list of exceptions probably needs to be added for languages as well.
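A language exception list could look roughly like the sketch below. WindowsLangToIso639 is a hypothetical name; only the two Chinese abbreviations mentioned above are handled, and ENU is used purely as a sample input for the default rule. A real list would need a full check of the Windows language table.

```pascal
program WindowsLanguageOverrides;

{$mode objfpc}{$H+}

uses
  SysUtils;

{ Hypothetical helper: derives a 2-letter ISO 639 code from a Windows
  3-letter language abbreviation, with explicit exceptions checked first. }
function WindowsLangToIso639(const abbrev: string): string;
begin
  // Known exceptions: Windows uses CH? for Chinese, but ISO 639 says zh.
  if (abbrev = 'CHS') or (abbrev = 'CHT') then
    Exit('zh');
  // Default rule from the Windows documentation: first two letters.
  Exit(LowerCase(Copy(abbrev, 1, 2)));
end;

begin
  WriteLn(WindowsLangToIso639('CHT'));  // prints zh
  WriteLn(WindowsLangToIso639('ENU'));  // prints en
end.
```

The exceptions are matched on the full abbreviation rather than on the CH prefix, so that other abbreviations starting with the same letters are not swept up by accident.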

Since I'm here, I also wanted to note that I don't use Windows, so I don't have the proper means to verify a complete patch for the above issue; that's why I didn't provide one. Hopefully it should be simple.

Issue History

Date Modified Username Field Change
2020-10-17 02:48 Pedro Gimeno New Issue
2020-10-17 19:48 Pedro Gimeno Note Added: 0126375