Convert HTML to Word in .NET

Last month I had a requirement to generate documents in Word format. After a lot of searching through third-party components that would have cost us a lot of money 🙂, I realized that the simplest and cheapest approach is to convert HTML documents to Word format.

It was actually a lot easier than it looks, and I got for free what $1,000 components do. The snippet below converts an HTML file to a Word file. It automates Word through Microsoft.Office.Interop.Word, so Word has to be installed on the machine that runs it:

//Copyright © 2010 - harouny.com
//Developer : Ahmed EL-Harouny
//Date : 06/2010
//Purpose: A static class to convert from HTML document to Word document
using System;
using System.IO;
using Microsoft.Office.Interop.Word;

namespace Utilities
{
    /// <summary>
    /// A static class to convert from HTML document to Word document
    /// </summary>
    public static class HTML2WordConverter
    {

        // Shared static state: Convert is not safe to call from several threads at once
        private static Application word;
        private static Document document;

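        /// <summary>
        /// Converts an HTML file to a Word file
        /// </summary>
        /// <param name="htmlSrcFilePath">the path to the source HTML file</param>
        /// <param name="wordDestFilePath">the path of the destination Word file</param>
        /// <param name="embedImages">true to store linked images inside the Word file instead of leaving them as links</param>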
        public static void Convert(string htmlSrcFilePath, string wordDestFilePath, bool embedImages)
        {
            FileInfo SrcFile = new FileInfo(htmlSrcFilePath);
            FileInfo DestFile = new FileInfo(wordDestFilePath);
            if (!SrcFile.Exists)
            {
                throw new FileNotFoundException(htmlSrcFilePath + " doesn't exist.", htmlSrcFilePath);
            }

            word = new Application();
            document = null;
            try
            {
                // Keep Word hidden while it works in the background
                word.Visible = false;

                // Word parses the HTML itself when it opens the file
                document = word.Documents.Open(SrcFile.FullName);

                document.Activate();

                if (embedImages)
                {
                    // Embed linked inline images so the saved file is self-contained
                    foreach (InlineShape image in document.InlineShapes)
                    {
                        if (image.LinkFormat != null)
                        {
                            try
                            {
                                image.LinkFormat.SavePictureWithDocument = true;
                                image.LinkFormat.BreakLink();
                            }
                            catch (Exception) { /* skip images that cannot be embedded */ }
                        }
                    }
                }

                document.ActiveWindow.View.Type = WdViewType.wdPrintView;

                // wdFormatDocument writes the Word 97-2003 binary (.doc) format
                document.SaveAs(DestFile.FullName, WdSaveFormat.wdFormatDocument);
            }
            finally
            {
                // Always close the document and quit Word, even after a failure,
                // so that no hidden Word process is left running
                try
                {
                    if (document != null)
                    {
                        document.Close(false);
                    }
                }
                catch (Exception) { /* nothing */ }
                try
                {
                    word.Quit();
                }
                catch (Exception) { /* nothing */ }
            }
        }
    }
}

7 Responses to Convert HTML to Word in .NET

  1. Pingback: Convert Word documents to PDF and XPS documents by code in .Net | Ahmed EL-Harouny

  2. harouny says:

    You’re welcome. Glad it helped.

  3. Jack Baker says:

    Here is a nice .NET lib for generating word documents. We have used it in our project though it has some limitations. Link: http://www.devtriogorup.com/wordtextron/wordtextronfree.aspx

  4. Doua says:

    Thanks a bunch, but it’s taking too much time at the line:
    document = word.Documents.Open(SrcFile.FullName);
    It probably won’t work. Do you have any idea what is going wrong? Thanks very much anyway, and may God reward you with goodness.

  5. harouny says:

    Hi @Doua, I don’t have a clue, but did you try opening the same file in Word itself? It’s basically the same operation, so both should take about the same time.
