loading a Unicode file line by line
Hi.
I have a question.
I'm loading a UTF8 Unicode file into a string array like this: I load the entire text into a string, then, from the third character to the end, I split it into lines (at #13#0#10#0).
After I make some modifications to the lines, I want to load them into RichViewEdit and then save all the text as Unicode or ANSI.
I tried "AddTextNLW", but I see some strange characters in the text, and when I save I get a file that can't be loaded in any text editor.
What should I do? Please help me... Thank you.
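A note on the splitting step described above: the separator #13#0#10#0 is what a CR LF pair looks like when encoded as UTF-16LE, and skipping the first two bytes matches the size of a UTF-16 byte-order mark (FF FE). The sketch below assumes the file contents are already held in a WideString (one element per 16-bit unit) and shows the same skip-and-split logic on that representation. StripBom, SplitLines and the sample text are hypothetical names and data for illustration only; they are not part of RichView.

```pascal
program SplitUtf16Lines;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$mode delphi}{$ENDIF}

type
  TWideLines = array of WideString;

// Hypothetical helper: drops a leading byte-order mark, if present.
// The bytes FF FE, read as one little-endian 16-bit unit, give $FEFF.
function StripBom(const S: WideString): WideString;
begin
  if (Length(S) > 0) and (S[1] = WideChar($FEFF)) then
    Result := Copy(S, 2, Length(S) - 1)
  else
    Result := S;
end;

// Hypothetical helper: splits the text at CR LF pairs.
function SplitLines(const S: WideString): TWideLines;
var
  i, Start, Count: Integer;
begin
  Result := nil;
  Count := 0;
  Start := 1;
  i := 1;
  while i <= Length(S) do
    if (S[i] = #13) and (i < Length(S)) and (S[i + 1] = #10) then
    begin
      SetLength(Result, Count + 1);
      Result[Count] := Copy(S, Start, i - Start);
      Inc(Count);
      Inc(i, 2);
      Start := i;
    end
    else
      Inc(i);
  // Whatever follows the last separator is the final line
  SetLength(Result, Count + 1);
  Result[Count] := Copy(S, Start, Length(S) - Start + 1);
end;

var
  Sample: WideString;
  Lines: TWideLines;
  i: Integer;
begin
  // Sample data standing in for the file contents: BOM plus two lines
  Sample := 'first line'#13#10'second line';
  Sample := WideChar($FEFF) + Sample;

  Lines := SplitLines(StripBom(Sample));
  for i := 0 to High(Lines) do
    WriteLn(i, ': ', Length(Lines[i]), ' characters');
end.
```

Working on 16-bit units rather than raw bytes means a line break is a plain #13#10 pair, and the byte-order mark is a single character to skip.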
-
- Site Admin
- Posts: 17647
- Joined: Sat Aug 27, 2005 10:28 am
- Contact:
Yes, I do that but it's not working...
This is what I see in the editor: http://www.imagehosting.gr/show.php/974 ... e.PNG.html
PS: I checked the code that extracts the lines from the text again: it works 100% fine.
The text should begin like this: "LETHAL WEAPON 4.#13#10Riggs,...".
-
- Site Admin
- Posts: 17647
- Joined: Sat Aug 27, 2005 10:28 am
- Contact:
If I do this:
ScaleRichView.RichviewEdit.LoadTextW(FileName, 0, 0, False);
ScaleRichView.RichviewEdit.Format;
Then it works OK.
But if I do this:
Stream := TFileStream.Create(FileName, fmOpenRead);
SetLength(s, Stream.Size);
Stream.ReadBuffer(PChar(s)^, Stream.Size);
ScaleRichView.RichviewEdit.AddTextNLW(s, 0, 0, 0, False);
Stream.Free;
ScaleRichView.RichviewEdit.Format;
Then I get the text as you see in the picture. This code is from TCustomRichView.LoadTextW >> TCustomRVData.LoadTextW >> TCustomRVData.LoadTextFromStreamW.
It doesn't matter if it's a line or an entire text and it doesn't matter if I use UTF8Decode or not.
Maybe I'm doing something wrong - but what is it?
-
- Site Admin
- Posts: 17647
- Joined: Sat Aug 27, 2005 10:28 am
- Contact:
If the file can be loaded by LoadTextW, it is not a UTF-8 file but a UTF-16 file (each character = 2 bytes).
Your code is not correct. You load the file into a String, so each Unicode character is read into two adjacent characters. When you pass this string to the WideString parameter of TRichView.AddTextNLW, it is implicitly converted to WideString, which makes no sense if the string contains data like this.
Why does similar code work in TCustomRVData? Because TCustomRVData.AddTextNLW is different from TRichView.AddTextNLW and is intended for private use. While TRichView.AddTextNLW expects a WideString parameter, TCustomRVData.AddTextNLW expects a String containing data like yours (each Unicode character in two adjacent string characters).
The correct code:
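var
  s: WideString;
  Stream: TFileStream;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead);
  try
    if Stream.Size mod 2 = 1 then
      // An odd byte count cannot be UTF-16 (Exception needs SysUtils)
      raise Exception.Create('The file is not Unicode UTF-16')
    else begin
      SetLength(s, Stream.Size div 2);
      // Read the raw bytes directly into the WideString buffer
      Stream.ReadBuffer(Pointer(s)^, Stream.Size);
      ScaleRichView.RichviewEdit.AddTextNLW(s, 0, 0, 0, False);
    end;
  finally
    Stream.Free;
  end;
  ScaleRichView.RichviewEdit.Format;
end;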
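To illustrate why the byte-per-character read goes wrong, here is a small stand-alone sketch that does not use RichView at all. It assumes a pre-Unicode Delphi style 8-bit string, written explicitly as AnsiString here, and uses the four bytes of "Hi" in UTF-16LE as hypothetical sample data. Converting the 8-bit string to WideString widens every byte separately, so the result should have four characters (with #0 after each letter) instead of two.

```pascal
program Utf16Misread;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$mode delphi}{$ENDIF}

const
  // "Hi" encoded as UTF-16LE: two bytes per character
  Raw: array[0..3] of Byte = ($48, $00, $69, $00);

var
  A: AnsiString;
  W1, W2: WideString;
begin
  // Wrong: one byte per AnsiString character, as in the failing code
  SetLength(A, SizeOf(Raw));
  Move(Raw, A[1], SizeOf(Raw));
  W1 := A;  // implicit ANSI-to-wide conversion widens each byte on its own

  // Right: the same bytes copied straight into a WideString buffer
  SetLength(W2, SizeOf(Raw) div SizeOf(WideChar));
  Move(Raw, W2[1], SizeOf(Raw));

  WriteLn('Via AnsiString: ', Length(W1), ' characters');
  WriteLn('Via WideString: ', Length(W2), ' characters');
end.
```

The extra #0 characters produced by the first path are a plausible source of the strange characters seen in the editor and of the unreadable saved file.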
Just one small problem.
For example, I have the item "Hello world!" (index 0), for which IsFromNewLine returns True.
I insert a special character with Insert >> Symbol.
Now I have 3 items:
Item[0] = 'Hello'
Item[1] = special character
Item[2] = ' world!'
I understand that IsFromNewLine(2) = True (that's normal), but why does IsFromNewLine(0) also return True? The strange thing is that the new line is not visible, and when I save the text with "Save..." it's not saved either.
I ask because I don't save the text with "Save..."; instead I get the text from each item with GetTextA/W and add #13#10 if IsFromNewLine returns True. In this case Item[1] is on a new line...
What can I do...?
-
- Site Admin
- Posts: 17647
- Joined: Sat Aug 27, 2005 10:28 am
- Contact:
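If there are two paragraphs, each containing 1 item, both IsFromNewLine(0) and IsFromNewLine(1) are True: the first item of every paragraph starts a new line, including the very first one. So add the line break only for items after the first: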
text := '';
for i := 0 to Editor.ItemCount-1 do
begin
  if (i>0) and Editor.IsFromNewLine(i) then
    text := text + #13#10;
  if Editor.GetItemStyle(i)=rvsTab then
    text := text + #9
  else if Editor.GetItemStyle(i)>=0 then
    text := text + Editor.GetItemTextA(i);
end;
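The joining rule in the loop above can be tried without the editor. The following sketch replaces the editor with a hypothetical TMockItem array; the texts and flags are made-up sample data, assuming that only the first item of each paragraph reports True. It covers only the line-break logic, not tabs or non-text items.

```pascal
program ExportItems;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$mode delphi}{$ENDIF}

type
  // Hypothetical stand-in for a document item; in the real editor these
  // values would come from GetItemTextA(i) and IsFromNewLine(i).
  TMockItem = record
    Text: AnsiString;
    FromNewLine: Boolean;
  end;

const
  // First paragraph split into three items, second paragraph in one item
  Items: array[0..3] of TMockItem = (
    (Text: 'First';            FromNewLine: True),
    (Text: '*';                FromNewLine: False),
    (Text: ' paragraph';       FromNewLine: False),
    (Text: 'Second paragraph'; FromNewLine: True)
  );

var
  i: Integer;
  Joined: AnsiString;
begin
  Joined := '';
  for i := Low(Items) to High(Items) do
  begin
    // No break before item 0: it always starts the first line
    if (i > 0) and Items[i].FromNewLine then
      Joined := Joined + #13#10;
    Joined := Joined + Items[i].Text;
  end;
  WriteLn(Joined);
end.
```

With this sample data the three pieces of the first paragraph are written on one line and the second paragraph on the next, because a break is emitted only where an item after the first reports that it starts a new line.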
Sorry to bother you again, but I've come across a text where I can't use your code.
It looks like this:
Item[0] = 'LETHAL WEAPON 4.' IsFromNewLine = True
Item[1] = 'Riggs, are you ...' IsFromNewLine = True
I insert a character in the first item. Now it's like this:
Item[0] = 'LETHAL' IsFromNewLine = True
Item[1] = character IsFromNewLine = False
Item[2] = ' WEAPON 4.' IsFromNewLine = False
Item[3] = 'Riggs, are you...' IsFromNewLine = True
The problem is that IsFromNewLine(2) switched from True to False.
And it's not happening only to the first line of the text. Everywhere I break an item (which is a line) into three, the first piece has IsFromNewLine True and the last has False.
If I convert the text to RTF, load the file with LoadRtf, and test the items before and after I insert the character, it's the same thing.
PS: it's a normal text, nothing special about it but if you want I will send it to you.