RTF to HTML, and HTML to RTF

General TRichView support forum
rjwerning
Posts: 5
Joined: Mon Sep 08, 2014 10:41 pm

RTF to HTML, and HTML to RTF

Post by rjwerning »

I'm currently researching a server-side solution for converting RTF data in our DB to HTML for use in mobile/web applications. The memo may be modified by the user, so I also need to convert it back from HTML to RTF so that our Windows clients handle it correctly. The RTF stored in the database is pretty simple for the most part: bold/italics, maybe a font size change, possibly colored text. There should be no embedded images.

I came up with a convoluted solution using TWebBrowser, but then found the TRichView components. From what I've been reading, I think they may be the solution we need. What I'm looking for is input on which components, methods, demos or help file topics would help me evaluate this as quickly as possible. I am researching and reading on my own, but turning to you all could speed up the process.

What I need to be able to do:
- Load a RichView from a dataset TField that is in RTF format.
- Convert that to HTML and use it as a string
- Later, take an HTML string and convert that to RTF
- Load the RTF into a dataset TField


Thank you for any assistance you can provide,
Rich Werning
Sergey Tkachenko
Site Admin
Posts: 17632
Joined: Sat Aug 27, 2005 10:28 am

Post by Sergey Tkachenko »

Yes, you can do it. But conversion to HTML and back may lose some formatting.

- For loading RTF from the DB, you can use LoadRTFFromStream
- For saving to HTML, you can use SaveHTMLToStreamEx (or a simplified version, SaveHTMLToStream) to save to TStringStream
- For HTML loading, you need to use additional free components (TrvHtmlViewImporter (recommended) or TrvHtmlImporter), see http://www.trichview.com/resources/
- For RTF saving, you can use SaveRTFToStream
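For the last step in the original list (storing the RTF back in a dataset field), a minimal, untested sketch could pass the SaveRTFToStream output straight into a blob field, avoiding any string conversion. The helper name SaveRTFToField is hypothetical, the unit names are assumptions about a typical installation, and it assumes the RTF column is a memo/blob field (TBlobField or a descendant such as TMemoField) on a dataset already in edit or insert state.

```pascal
uses
  System.SysUtils, System.Classes, Data.DB,
  RichView; // assumed unit name for TRichView

// Hypothetical helper: writes the whole RichView document into an RTF column.
// Assumption: Field is a TBlobField descendant and its dataset is in dsEdit/dsInsert.
procedure SaveRTFToField(RV: TRichView; Field: TField);
var
  Stream: TMemoryStream;
begin
  Stream := TMemoryStream.Create;
  try
    // SelectionOnly = False: save the entire document, not just the selection.
    // A document containing tables must be formatted (RV.Format) before this call.
    if not RV.SaveRTFToStream(Stream, False) then
      raise Exception.Create('SaveRTFToStream failed');
    Stream.Position := 0;
    // Copies the raw RTF bytes into the field.
    (Field as TBlobField).LoadFromStream(Stream);
  finally
    Stream.Free;
  end;
end;
```

Usage might look like `SqlQuery1.Edit; SaveRTFToField(RichView1, SqlQuery1.FieldByName('MEMO_TEXT')); SqlQuery1.Post;`. If the column is a plain string field instead, assigning the RTF text through `AsString` (as in the code later in this thread) is the alternative.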
rjwerning

Thanks for the response

Post by rjwerning »

Thanks for the prompt response. Eventually this will all be done non-visually on the server, but for my demo I'm using visual components to help view the results. After a lot of trial and error, I have the following code working: it loads the RichView component and copies its content as HTML into a Memo component. Does this look correct to you?

Code:

  aStream2 := nil;
  aStream := nil;
  try
    Memo1.Clear;
    RichView1.clear;
    RichView1.DeleteUnusedStyles(True, True, True);
    RichView1.RTFReadProperties.TextStyleMode := rvrsAddIfNeeded;
    RichView1.RTFReadProperties.ParaStyleMode := rvrsAddIfNeeded;
    RichView1.Options := RichView1.Options + [rvoTagsArePChars];
    RichView1.RVFOptions := RichView1.RVFOptions + [rvfoSaveTextStyles, rvfoSaveParaStyles];

    Value := SqlQuery1.FieldByName('MEMO_TEXT').AsString;
    aStream := TStringStream.Create(Value);
    aStream.Position := 0;
    if not RichView1.LoadRTFFromStream(aStream) then
      ShowMessage('failed LoadRTFFromStream');
    RichView1.Format;

    aStream2 := TStringStream.create;
    RichView1.SaveHTMLToStream(aStream2, 'c:\', 'RtfToHtml','RtfToHtml',[]);
    aStream2.Position := 0;
    Memo1.Text := aStream2.ReadString(aStream2.Size);
  finally
    aStream.free;
    aStream2.Free;
  end;
Sergey Tkachenko

Post by Sergey Tkachenko »

Yes, this code looks correct.

Notes:
1) If you just need to convert RTF to HTML and do not need to display it, you can skip calling Format. Format is quite slow, and the only saving procedures that require it are RTF and DocX saving of documents containing tables.
2) I recommend including the options [rvsoUTF8, rvsoUseCheckpointsNames] in HTML saving.
3) SaveHTMLToStream generates basic HTML; SaveHTMLToStreamEx produces better results.
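Applying notes 1 and 2 to the code above, a non-visual RTF-to-HTML helper might look like the following. This is a hedged sketch rather than a tested implementation: the function name RTFToHTML is hypothetical, the unit names are assumptions, and it keeps SaveHTMLToStream (the call used in the code above) because the parameters of SaveHTMLToStreamEx are not shown in this thread. It skips Format, passes [rvsoUTF8, rvsoUseCheckpointsNames], and reads the result through a TStringStream created with TEncoding.UTF8 so the UTF-8 output is decoded as UTF-8.

```pascal
uses
  System.SysUtils, System.Classes, System.IOUtils,
  RVStyle, RichView; // assumed unit names for TRVStyle / TRichView

// Hypothetical helper: converts an RTF string (e.g. a MEMO_TEXT value) to HTML
// without any visible control. No Format call: HTML saving does not need it.
function RTFToHTML(const RTF: string): string;
var
  Style: TRVStyle;
  RV: TRichView;
  InStream, OutStream: TStringStream;
begin
  Style := nil;
  RV := nil;
  InStream := nil;
  OutStream := nil;
  try
    Style := TRVStyle.Create(nil);
    RV := TRichView.Create(nil);
    RV.Style := Style;
    RV.RTFReadProperties.TextStyleMode := rvrsAddIfNeeded;
    RV.RTFReadProperties.ParaStyleMode := rvrsAddIfNeeded;

    // RTF is normally 7-bit text (non-ASCII characters are escaped),
    // so the default encoding of TStringStream is acceptable for the input.
    InStream := TStringStream.Create(RTF);
    if not RV.LoadRTFFromStream(InStream) then
      raise Exception.Create('LoadRTFFromStream failed');

    // rvsoUTF8 makes the saver write UTF-8, so the output stream decodes UTF-8.
    OutStream := TStringStream.Create('', TEncoding.UTF8);
    // The path is where images would be written; this data has none, so a
    // temp folder is used here (an assumption, replacing 'c:\').
    RV.SaveHTMLToStream(OutStream, TPath.GetTempPath, 'RtfToHtml', 'RtfToHtml',
      [rvsoUTF8, rvsoUseCheckpointsNames]);
    Result := OutStream.DataString;
  finally
    OutStream.Free;
    InStream.Free;
    RV.Free;
    Style.Free;
  end;
end;
```

Creating all objects inside the try block with nil-initialized variables means a failure in any constructor still frees whatever was already created.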
rjwerning

HTMLViewImport

Post by rjwerning »

I can't seem to get the HtmlViewImporter working non-visually to convert the HTML back to RTF. I get the message "Control '' has no parent window" when I call LoadFromString. As this is going to run in a Windows service, there is no main form or parent control. I'll try the other HTML converter (RvHtmlImporter) tomorrow, but wanted to ask whether you had any suggestions on how to proceed with this one.

Thank you again
Rich

Code:

procedure SetupRichView(Sender: TRichView);
begin
  // Set up the RichView component
  Sender.clear;
  Sender.DeleteUnusedStyles(True, True, True);
  Sender.RTFReadProperties.TextStyleMode := rvrsAddIfNeeded;
  Sender.RTFReadProperties.ParaStyleMode := rvrsAddIfNeeded;
  Sender.Options := Sender.Options + [rvoTagsArePChars];
  Sender.RVFOptions := Sender.RVFOptions + [rvfoSaveTextStyles, rvfoSaveParaStyles];
end;

function HTMLToRTF(value: string): string;
var
  Viewer: THTMLViewer;
  Importer: TRVHTMLViewImporter;
  RichView: TRichView;
  Style: TRvStyle;
  p1 : integer;
  aStream: TStringStream;
begin
  Result := '';  // otherwise undefined when value = ''
  Viewer := nil;
  aStream := nil;
  Importer := nil;
  Style := nil;

  RichView := TRichView.create(nil);
  try
    Style := TRvStyle.Create(nil);
    RichView.Style := Style;
    SetupRichView(RichView); // Standard setup of the RichView component

    if value <> '' then
    begin
      Viewer := THTMLViewer.Create(nil);
      Viewer.Visible := False;
//      Viewer.Parent := RichView.Parent;
      Viewer.DefBackground := clWhite;

      Importer := TRVHTMLViewImporter.Create(nil);
      Viewer.LoadFromString(value);  // <<<< Error occurs here <<<
      Importer.ImportHtmlViewer(Viewer, RichView );

      aStream := TStringStream.create;
      RichView.SaveRTFToStream(aStream, False);
      aStream.Position := 0;
      result := aStream.ReadString(aStream.Size);
    end;
  finally
    RichView.Free;
    Importer.Free;
    Viewer.Free;
    aStream.Free;
    Style.Free;
  end;
end;
Sergey Tkachenko

Post by Sergey Tkachenko »

You can try to create an invisible form and place THTMLViewer in it.
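Following that suggestion, the HTMLToRTF function above could host the viewer on a form that is created but never shown, so THTMLViewer has a parent window when LoadFromString is called. This is a minimal, untested sketch: TForm.CreateNew is used so that no .dfm resource is needed, the unit names for THTMLViewer and TRVHTMLViewImporter are assumptions, and whether window creation behaves the same inside a Windows service thread has not been verified here.

```pascal
uses
  System.SysUtils, System.Classes, Vcl.Forms, Vcl.Graphics,
  HTMLView, RVStyle, RichView, RVHtmlViewImport; // assumed unit names

function HTMLToRTF(const Value: string): string;
var
  HostForm: TForm;
  Viewer: THTMLViewer;
  Importer: TRVHTMLViewImporter;
  Style: TRVStyle;
  RV: TRichView;
  Stream: TStringStream;
begin
  Result := '';
  if Value = '' then
    Exit;

  // Never shown: it only gives THTMLViewer a parent window.
  // Every component below is owned by the form and freed together with it.
  HostForm := TForm.CreateNew(nil);
  try
    Viewer := THTMLViewer.Create(HostForm);
    Viewer.Parent := HostForm;  // the missing parent caused the error
    Viewer.DefBackground := clWhite;

    Style := TRVStyle.Create(HostForm);
    RV := TRichView.Create(HostForm);
    RV.Style := Style;

    Importer := TRVHTMLViewImporter.Create(HostForm);
    Viewer.LoadFromString(Value);
    Importer.ImportHtmlViewer(Viewer, RV);
    // If the HTML can contain tables, RV.Format is needed before saving RTF.

    // RTF output is 7-bit text, so the default stream encoding is fine here.
    Stream := TStringStream.Create;
    try
      RV.SaveRTFToStream(Stream, False);
      Result := Stream.DataString;
    finally
      Stream.Free;
    end;
  finally
    HostForm.Free;
  end;
end;
```

Letting the form own the viewer, style, RichView and importer replaces the separate Free calls in the original version with a single `HostForm.Free`.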
rjwerning

Additional help if possible

Post by rjwerning »

Thank you once again; I believe I can make it work this way. I am having issues with the RTF format once I do the final conversion from HTML back to RTF: the bullet items are not displaying correctly. I am sending a demo application that demonstrates the problem to the Gmail address you have listed elsewhere.

- Rich

Instead of showing as normal bullets:

Code:

A blank window, known as a form, on which to design the UI for your application.
■	Extensive class libraries with many reusable objects.
■	An Object Inspector for examining and changing object traits.
They are showing like this:

Code:

A blank window, known as a form, on which to design the UI for your application.
â–         Extensive class libraries with many reusable objects.
â–         An Object Inspector for examining and changing object traits.
rjwerning

Follow up

Post by rjwerning »

I should have mentioned that we're using Delphi XE6. In case it matters, we're on Windows 7 Pro, US English.

I can't seem to edit a previous post that I made.
Sergey Tkachenko

Post by Sergey Tkachenko »

I received your email.

I’ll try to explain what’s happening.

Loading RTF
This RTF file does not actually contain bullets, but characters of the “Wingdings” font, each followed by a tab character. LoadRTF loads these characters as they are.

Saving HTML
But it is impossible to save a “Wingdings” character (or a character of any font with SYMBOL_CHARSET) to HTML. HTML can only contain characters belonging to the Unicode standard (currently, only IE can display SYMBOL_CHARSET characters, and only in compatibility mode).
Two symbol fonts are especially important because they are the most frequently used: “Wingdings” and “Symbol”. TRichView has special code for saving these characters to HTML: it converts characters of these fonts to the most similar Unicode characters (or named HTML entities).
So a character from your file is converted to Unicode character $25A0 (“black square”).
Tab characters cannot be saved to HTML either, so they are converted to several space characters.

Loading HTML
Your HTML has UTF-8 encoding (a good choice).
But your code performs some implicit conversions to UnicodeString, and the data are corrupted unless you explicitly specify that they are UTF-8.
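The garbled prefix in the demo output fits this explanation: U+25A0 (black square) is encoded in UTF-8 as the bytes $E2 $96 $A0, and decoding those bytes as Windows-1252 (the ANSI code page of a US English system, presumably what a TStringStream created without an explicit encoding falls back to) yields 'â', '–' and a no-break space. The small console program below uses only the Delphi RTL to reproduce that; the helper name DecodeAsWin1252 is illustrative.

```pascal
program MojibakeDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Decodes a byte sequence as Windows-1252 (ANSI code page on US English Windows).
function DecodeAsWin1252(const Bytes: TBytes): string;
var
  Win1252: TEncoding;
begin
  Win1252 := TEncoding.GetEncoding(1252);
  try
    Result := Win1252.GetString(Bytes);
  finally
    Win1252.Free; // GetEncoding returns a new instance owned by the caller
  end;
end;

var
  Square: string;
  Utf8: TBytes;
  Garbled: string;
  I: Integer;
begin
  Square := #$25A0;                         // black square, as saved to HTML
  Utf8 := TEncoding.UTF8.GetBytes(Square);  // three bytes: $E2 $96 $A0
  Garbled := DecodeAsWin1252(Utf8);         // wrong decoder: three characters
  for I := 1 to Length(Garbled) do
    Writeln(IntToHex(Ord(Garbled[I]), 4));  // prints 00E2, 2013, 00A0
end.
```

Decoding the same bytes with `TEncoding.UTF8.GetString` instead returns the single original character.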
Required changes:
1) In TForm1.btnStep2Click(Sender: TObject), aStream must be created as
aStream := TStringStream.create('', TEncoding.UTF8);
2) In TForm1.btnrvHtmlImportClick, the first time aStream must be created as
aStream := TStringStream.Create(mmoHtml.Text, TEncoding.UTF8);
3) In the same procedure, assign importer.Encoding := rvhtmleUTF8;
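The effect of changes 1) and 2) can be checked without TRichView: a TStringStream created with TEncoding.UTF8 turns UTF-8 bytes back into the original text, and one created from a string with TEncoding.UTF8 holds that text as UTF-8 bytes, which is what the importer expects once Encoding is rvhtmleUTF8. A sketch using only the Delphi RTL; the UTF-8 bytes merely stand in for what the HTML saver writes with rvsoUTF8, and the function names are hypothetical.

```pascal
program Utf8StreamDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Change 1): read UTF-8 bytes (as written by the HTML saver) via a UTF-8 stream.
function ReadUtf8(const Bytes: TBytes): string;
var
  Stream: TStringStream;
begin
  Stream := TStringStream.Create('', TEncoding.UTF8);
  try
    if Length(Bytes) > 0 then
      Stream.WriteBuffer(Bytes[0], Length(Bytes));
    Stream.Position := 0;
    Result := Stream.ReadString(Stream.Size);
  finally
    Stream.Free;
  end;
end;

// Change 2): memo text placed in a UTF-8 stream is stored as UTF-8 bytes.
function Utf8ByteCount(const Text: string): Int64;
var
  Stream: TStringStream;
begin
  Stream := TStringStream.Create(Text, TEncoding.UTF8);
  try
    Result := Stream.Size;
  finally
    Stream.Free;
  end;
end;

const
  Html = '<p>'#$25A0' Extensive class libraries</p>';

begin
  Writeln(ReadUtf8(TEncoding.UTF8.GetBytes(Html)) = Html);              // TRUE
  Writeln(Utf8ByteCount(Html) = Length(TEncoding.UTF8.GetBytes(Html))); // TRUE
end.
```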