RegExpr: proposal to support national letters in "\w"
Original Reporter info from Mantis: Alextp
- Reporter name: CudaText man
Description:
This function is from ATSynEdit. It works for Unicode letters (code points below LOW_SURROGATE_BEGIN), for example:
Russian
Greek
German
Japanese
...
It is optimized for ASCII chars < 128.
uses UnicodeData;

function IsCharWord(ch: WideChar): boolean;
var
  NType: byte;
begin
  case ch of
    '0'..'9', 'a'..'z', 'A'..'Z', '_':
      exit(true);
  end;
  if Ord(ch)<128 then
    exit(false)
  else
  if Ord(ch)>=LOW_SURROGATE_BEGIN then
    exit(false)
  else
  begin
    NType:= GetProps(Ord(ch))^.Category;
    Result:= (NType<=UGC_OtherNumber);
  end;
end;
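A minimal usage sketch, assuming Free Pascal with the UnicodeData unit available. IsCharWord is copied from the report; the Show helper and the sample code points are illustrative only and are not part of ATSynEdit or RegExpr.pas.

```pascal
program IsCharWordDemo;

{$mode objfpc}{$H+}

uses
  UnicodeData;

// Copied from the report; only the layout differs.
function IsCharWord(ch: WideChar): boolean;
var
  NType: byte;
begin
  // Fast path for ASCII word characters.
  case ch of
    '0'..'9', 'a'..'z', 'A'..'Z', '_':
      exit(true);
  end;
  if Ord(ch) < 128 then
    exit(false)   // any other ASCII char is not a word char
  else if Ord(ch) >= LOW_SURROGATE_BEGIN then
    exit(false)   // code units from LOW_SURROGATE_BEGIN upward are rejected
  else
  begin
    // Letter, mark and number categories sort before UGC_OtherNumber.
    NType := GetProps(Ord(ch))^.Category;
    Result := (NType <= UGC_OtherNumber);
  end;
end;

// Hypothetical helper for this demo: prints the verdict for one character.
procedure Show(const ALabel: string; ch: WideChar);
begin
  WriteLn(ALabel, ': ', IsCharWord(ch));
end;

begin
  Show('ASCII letter a', 'a');
  Show('ASCII underscore', '_');
  Show('ASCII space', ' ');
  Show('U+0416 Cyrillic capital Zhe', WideChar($0416));
  Show('U+03B1 Greek small alpha', WideChar($03B1));
  Show('U+00DF German sharp s', WideChar($00DF));
  Show('U+3042 Hiragana a', WideChar($3042));
end.
```

The sample characters are passed as numeric code points so that the result does not depend on the source file encoding.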
Use it in RegExpr.pas. I did this in a local copy of regexpr.pas:
- Comment out the variable fWordChars and the property WordChars.
- Replace every Pos(...., fWordChars) with a call to IsCharWord(..).
- One line will look odd: it calls EmitNNNNNNN(fWordChars). There, replace fWordChars with the constant RegExprWordChars.
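The effect of swapping a Pos(...) lookup for IsCharWord can be sketched outside RegExpr.pas. The program below is only an illustration under stated assumptions: AsciiWordChars is a hypothetical stand-in for the ASCII-only word set, IsWordByList and WordRunLength are made-up helpers, and none of this is the actual TRegExpr code.

```pascal
program WordRunDemo;

{$mode objfpc}{$H+}

uses
  UnicodeData;

const
  // Hypothetical stand-in for an ASCII-only word character list.
  AsciiWordChars: UnicodeString =
    '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_';

// Copied from the report; only the layout differs.
function IsCharWord(ch: WideChar): boolean;
var
  NType: byte;
begin
  case ch of
    '0'..'9', 'a'..'z', 'A'..'Z', '_':
      exit(true);
  end;
  if Ord(ch) < 128 then
    exit(false)
  else if Ord(ch) >= LOW_SURROGATE_BEGIN then
    exit(false)
  else
  begin
    NType := GetProps(Ord(ch))^.Category;
    Result := (NType <= UGC_OtherNumber);
  end;
end;

// Old-style check: membership in a fixed character list.
function IsWordByList(ch: WideChar): boolean;
begin
  Result := Pos(ch, AsciiWordChars) > 0;
end;

// Counts the leading word characters of S, roughly what '\w+'
// would consume when anchored at the first character.
function WordRunLength(const S: UnicodeString; UseUnicode: boolean): integer;
var
  i: integer;
  IsWord: boolean;
begin
  Result := 0;
  for i := 1 to Length(S) do
  begin
    if UseUnicode then
      IsWord := IsCharWord(S[i])
    else
      IsWord := IsWordByList(S[i]);
    if not IsWord then
      break;
    Inc(Result);
  end;
end;

var
  Sample: UnicodeString;
begin
  // 'ab', then Cyrillic U+0436 and Greek U+03B1, then a space and more text.
  Sample := 'ab';
  Sample := Sample + WideChar($0436) + WideChar($03B1) + ' tail';
  WriteLn('list-based run length:    ', WordRunLength(Sample, False));
  WriteLn('Unicode-based run length: ', WordRunLength(Sample, True));
end.
```

With the list-based check the run should end at the first non-ASCII letter, while the IsCharWord-based check should continue through the Cyrillic and Greek letters and stop at the space.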
My test shows that the CudaText editor now matches Russian, Greek, and German letters with \w, even with that EmitNNNNN() call.
Mantis conversion info:
- Mantis ID: 34084
- Version: 3.0.4
- Fixed in version: 3.1.1
- Fixed in revision: 39564 (#71bbab35)
- Target version: 3.2.0