Tuesday, May 8, 2012

C#.net - Read Text from Image in C#.net

In this article I will show you how to read text from an image using OCR components in C#.net.


What is OCR?
OCR (Optical Character Recognition) is the recognition of printed or written text characters by a computer. This involves photoscanning of the text character-by-character, analysis of the scanned-in image, and then translation of the character image into character codes, such as ASCII, commonly used in data processing.


OCR translates images of text, such as scanned documents, into actual text characters. Also known as text recognition, OCR makes it possible to edit and reuse the text that is normally locked inside scanned images. OCR works using a form of artificial intelligence known as pattern recognition, to identify individual text characters on a page, including punctuation marks, spaces, and ends of lines.


First off, you need to have MS Office 2007 installed. Keep this dependency in mind if you deploy an application that uses the OCR capabilities in the field: it won’t work without Office installed. Furthermore, the OCR capability isn’t installed by default when you install Office; you need to add a component called ‘Microsoft Office Document Imaging’ (MODI). Later Office versions no longer list this component under Office Tools, as several readers report in the comments below.


Instructions on how to add the required MODI component.


Step 1
Click Start, click Run, type appwiz.cpl in the Open box, and then click OK.


Step 2
Click to select the Office 2007 version that you have installed.


Step 3
Click Change.


Step 4
Click Add or Remove Features, and then click Continue.


Step 5
Expand Office Tools.






Step 6
Click Microsoft Office Document Imaging, and then click Run all from My Computer.






Step 7
Click Continue.


The MODI components are now installed on your machine. Let's create an OCR application in Visual Studio.


Step 8
Create a Console Application and name the solution SolReadTextFromImage.


Step 9
Copy a sample image file into the application's base directory (./bin/debug/SampleImage.JPG).








Step 10
Add a MODI reference to the project so that we can use it to read text from the image. In Solution Explorer, right-click the project's References node and choose Add Reference, select the COM tab, then select Microsoft Office Document Imaging 12.0 Type Library.
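
Adding the reference only lets the project compile; at run time the MODI COM class must also be registered on the machine. Error 80040154, which several readers report in the comments, is the COM "class not registered" code. The sketch below is a minimal pre-flight check, not part of the original sample. The class name ModiCheck is hypothetical, and the ProgID "MODI.Document" is an assumption based on how the component is commonly scripted.

```csharp
using System;

// Hypothetical pre-flight check, not part of the original sample.
static class ModiCheck
{
    // Returns true when a COM class is registered under the ProgID "MODI.Document".
    // The ProgID is an assumption; adjust it if your installation differs.
    public static bool IsModiRegistered()
    {
        // This overload returns null instead of throwing when the ProgID is unknown
        Type modiType = Type.GetTypeFromProgID("MODI.Document");
        return modiType != null;
    }
}
```

Calling ModiCheck.IsModiRegistered() at the start of Main and printing a clear message when it returns false may be friendlier than letting the COM exception surface. A registered ProgID does not guarantee that the class can be created from a 64-bit process, so if the error persists it may be worth trying the x86 platform target in the project's build settings.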






Step 11
The code below reads text from the image and either displays it in the console or stores it in a text file. It looks like this:
        #region Methods

        /// <summary>
        ///  Read Text from Image and display in console App
        /// </summary>
        /// <param name="ImagePath">specify the Image Path</param>
        private static void ReadTextFromImage(String ImagePath)
        {
            try
            {
                // Grab Text From Image
                MODI.Document ModiObj = new MODI.Document();
                ModiObj.Create(ImagePath);
                ModiObj.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);

                //Retrieve the text gathered from the image
                MODI.Image ModiImageObj = (MODI.Image)ModiObj.Images[0];
                

                System.Console.WriteLine(ModiImageObj.Layout.Text);

                ModiObj.Close();
            }
            catch (Exception)
            {
                // Rethrow unchanged so the original exception type and stack trace are kept
                throw;
            }
        }

        /// <summary>
        ///  Read Text from Image and Store in Text File
        /// </summary>
        /// <param name="ImagePath">specify the Image Path</param>
        /// <param name="StoreTextFilePath">Specify the Store Text File</param>
        private static void ReadTextFromImage(String ImagePath, String StoreTextFilePath)
        {
            try
            {
                // Grab Text From Image
                MODI.Document ModiObj = new MODI.Document();
                ModiObj.Create(ImagePath);
                ModiObj.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);

                //Retrieve the text gathered from the image
                MODI.Image ModiImageObj = (MODI.Image)ModiObj.Images[0];
               

                // Store the recognized text in the text file (overwritten if it exists);
                // the using block closes the file even if the write fails
                using (StreamWriter WriteFileObj = new StreamWriter(StoreTextFilePath))
                {
                    WriteFileObj.Write(ModiImageObj.Layout.Text);
                }

                ModiObj.Close();
            }
            catch (Exception)
            {
                // Rethrow unchanged so the original exception type and stack trace are kept
                throw;
            }
        }

        #endregion
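
Both methods above read only the first page, Images[0]. For a multi-page image such as a multi-page TIFF, a variant along the following lines might collect the text of every page. This is only a sketch: the class and method names are hypothetical, and it assumes that the MODI reference from Step 10 is present and that the Images collection exposes a Count property.

```csharp
using System;
using System.Text;

// Hypothetical variant, not part of the original sample.
static class ModiMultiPage
{
    // Runs OCR on the document and returns the text of all pages,
    // one page after another.
    public static string ReadAllPages(string imagePath)
    {
        MODI.Document doc = new MODI.Document();
        doc.Create(imagePath);
        doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);

        StringBuilder text = new StringBuilder();

        // Assumption: Images.Count gives the number of pages in the document
        for (int i = 0; i < doc.Images.Count; i++)
        {
            MODI.Image page = (MODI.Image)doc.Images[i];
            text.AppendLine(page.Layout.Text);
        }

        doc.Close();
        return text.ToString();
    }
}
```

The returned string can then be written to the console or to a file in the same way as in the two methods above.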

        

Step 12
Call both methods from the Main function. It looks like this:
        static void Main(string[] args)
        {
            // Set Sample Image Path
            String ImagePath = AppDomain.CurrentDomain.BaseDirectory + "SampleImage.jpg";

            ReadTextFromImage(ImagePath);

            // Set Store Image Content text file Path
            String StoreTextFilePath = AppDomain.CurrentDomain.BaseDirectory + "SampleText.txt";

            ReadTextFromImage(ImagePath, StoreTextFilePath);
        }
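
The Main function above always looks for SampleImage.jpg next to the executable. If you would rather pass an image from any folder, a small helper like the one below could resolve the path first. It is a sketch, not part of the original sample: ImagePathHelper and ResolveImagePath are hypothetical names, and the fallback file name is taken from Step 9.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the original sample.
static class ImagePathHelper
{
    // Returns the image path given as the first command-line argument,
    // or SampleImage.jpg in the application base directory if none was given.
    public static string ResolveImagePath(string[] args)
    {
        if (args != null && args.Length > 0 && !string.IsNullOrEmpty(args[0]))
        {
            // Turn a relative path into an absolute one
            return Path.GetFullPath(args[0]);
        }

        // Path.Combine inserts the directory separator only when it is missing
        return Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "SampleImage.jpg");
    }
}
```

In Main, the first assignment could then become String ImagePath = ImagePathHelper.ResolveImagePath(args); and a File.Exists(ImagePath) check before calling ReadTextFromImage gives a clearer error than a failure inside MODI.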


Full Code
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;


namespace SolReadTextFromImage
{
    class Program
    {
        static void Main(string[] args)
        {
            // Set Sample Image Path
            String ImagePath = AppDomain.CurrentDomain.BaseDirectory + "SampleImage.jpg";

            ReadTextFromImage(ImagePath);

            // Set Store Image Content text file Path
            String StoreTextFilePath = AppDomain.CurrentDomain.BaseDirectory + "SampleText.txt";

            ReadTextFromImage(ImagePath, StoreTextFilePath);
        }

        #region Methods

        /// <summary>
        ///  Read Text from Image and display in console App
        /// </summary>
        /// <param name="ImagePath">specify the Image Path</param>
        private static void ReadTextFromImage(String ImagePath)
        {
            try
            {
                // Grab Text From Image
                MODI.Document ModiObj = new MODI.Document();
                ModiObj.Create(ImagePath);
                ModiObj.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);

                //Retrieve the text gathered from the image
                MODI.Image ModiImageObj = (MODI.Image)ModiObj.Images[0];
               

                System.Console.WriteLine(ModiImageObj.Layout.Text);

                ModiObj.Close();
            }
            catch (Exception)
            {
                // Rethrow unchanged so the original exception type and stack trace are kept
                throw;
            }
        }

        /// <summary>
        ///  Read Text from Image and Store in Text File
        /// </summary>
        /// <param name="ImagePath">specify the Image Path</param>
        /// <param name="StoreTextFilePath">Specify the Store Text File</param>
        private static void ReadTextFromImage(String ImagePath, String StoreTextFilePath)
        {
            try
            {
                // Grab Text From Image
                MODI.Document ModiObj = new MODI.Document();
                ModiObj.Create(ImagePath);
                ModiObj.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);

                //Retrieve the text gathered from the image
                MODI.Image ModiImageObj = (MODI.Image)ModiObj.Images[0];
               

                // Store the recognized text in the text file (overwritten if it exists);
                // the using block closes the file even if the write fails
                using (StreamWriter WriteFileObj = new StreamWriter(StoreTextFilePath))
                {
                    WriteFileObj.Write(ModiImageObj.Layout.Text);
                }

                ModiObj.Close();
            }
            catch (Exception)
            {
                // Rethrow unchanged so the original exception type and stack trace are kept
                throw;
            }
        }

        #endregion
    }
}


Output

[Screenshot: console output of the program]


79 comments:

  1. Brothers always brothers, Thank you Very Much for U re Kindness. I Am Pleasure to say This Article for me. :)

    1. I prefer xsocr tool http://www.xspdf.com/guide-ocr/text-recognition-from-image/, it's using tesseract 3 engine, and recognize text with higher accuracy

  2. Awesome!
    Thanks for share your knowledge n__n

  3. Thanks a lot. its really worth full one.

  4. Replies
    1. Naik,

      Thanks for ur Effort to build this one.From this article we can read the text from Image.Do u have any idea, to read the text(which will change dynamically like the Verification code) inside the image from the image.

    2. Most Welcome Mahendran.
      I think it's may not possible to read verification code from image.

  5. ReadTextFromImage function is throwing exception

    Object hasn't been initialized and can't be used yet...don't know why...

    1. Can you sned your solution copy to my mail id????

  6. Hi Kishor,

    This is a nice post, thanks for sharing.

    In my case, I am having image with English but the font is different, the font is "BlackJackRegular".
    How can we set the font while reading image?

    Thanks

    1. Hi Anant

      I hope you like my article.

      MODI Supports only few Standard fonts.i tried lots of font but no result.even you cant read capcha too.

  7. Thanks Kishor for sharing your valuable knowledge, It just give me the beginning to include this feature in my project, thanks a lot

  8. Thanks Brother, but i want take image from webcam then after extract from image.

  9. I will write article on taking image from webcam very soon.

  10. Thanks for your code, Can you please let met know how to read tab separated text from image using MODI object.

  11. Thanks a lot it is easier to understand and very helpful.
    works fine.

  12. is that give a result for corsive writing??

  13. thanks, can i use this code to extract information from .E01 hard disk images?

  14. Thanks its very good article , but i have one doubt , its working if the content of the image is in horizontal , but if content is in vertical its not working , please help me out in this .

    Thanks
    Mohan

  15. Great tutorial. Can this read pdf document?

  16. cant it take path from a folder in c drive,instead of the image file being in bin folder.

  17. Hi
    thanks for the code.
    @ sign is not converted in MODI.dll

    when i get the text from email address rrizwann@gmail.com then @ sign skip and return rrizwangmail.com.

  18. im getting some exception error...

  19. IS IT WORKING FOR CURSIVE TEXT IMAGES
    ?

    1. MODI Supports only Standard Fonts......it's not designed for Cursive Text...I tried but i failed to read cursive Text from Image.

  20. Hi Kishor,
    I am getting this error :
    Retrieving the COM class factory for component with CLSID {40942A6C-1520-4132-BDF8-BDC1F71F547B} failed due to the following error: 80040154.

    please let me know how to resolve it?

    thanks

    1. Did you add reference Microsoft Office Document Imaging 12.0 Type Library on your Solution????

  21. can we do the same implementation using web application?
    if yes then procedure please?

  22. Its really good.
    But i have a problem, it reads normal font accurately but it is failing to read out different font text.
    Suppose there is an image file having different font text, it doesn't produce correct output.
    Can we find out font of text using MODI so that it can give correct output.

    1. MODI Supports only Standard Fonts.......

  23. Hi Kishor
    I have Microsoft office 2013 installed in my system. I don't se the option Microsoft Office Document Imaging. Instead I see Optical Character Recognition(OCR.Should I proceed with the same?

    1. Install MODI DLL

      http://social.technet.microsoft.com/Forums/office/en-US/93d6f285-dc98-46e2-b7e0-872bba9c4e35/microsoft-office-document-imaging

  24. can u plz give some idea what we have to do if we want to read cursive text images?

  25. i want to read a text from natural scene images, what shud i do for it?

  26. Hi Kishor
    Thanks for your great post,,,
    But i had one problem...while reading it comes on unknown formart....
    Is it possible set font for reading.....It is not detecting all fonts
    Pls help me....

  27. Thanks for the good post..is it possible to read different different fonts .....???

  28. hi........... is it possible to get starting point of text from all the four sides.... Actually i have to auto crop image... plz reply ASAP.....

  29. Worked like a charm, Thanks!

  30. acually this code support only some font-families how will it support to all font families???????plzzz give me suggesion its ma task....

  31. its not supporting all fonts...can u plzz suggest me code for it

  32. Is it possible to do it in MS word 2013?

  33. Hi Kishor,

    I am using MS Visual Studio 2008, I have done all these steps, I am getting Retrieving the COM class factory for component with CLSID {40942A6C-1520-4132-BDF8-BDC1F71F547B} failed due to the following error: 80040154 error. I rechecked the reference. I could see the MODI reference added in the solution explorer, still I am getting this error. Please let me know what need to done.

  34. hi sir..
    what will be the source code for converting handwritten image text into text

  35. Let me know how to get particular text from image ?

  36. very helpful. thank you so much

  37. Does this method also work for PDF images?

  38. Thank you for this! MUCH appreciated! Thank you :)

  39. Team, this is awesome! It works perfectly with clear images and has a very good approach for dark images. But now, I have two questions:

    1. I'm using Windows Azure for my website where I have this solution installed. But, do you know, how can I install the Sharepoint Designer in the Windows Azure server?

    2. Do you know, how can I edit the size of the pictures? With this solution, the pictures that can be loaded must be from a certain MPx and size!

  40. Retrieving the COM class factory for component with CLSID {40942A6C-1520-4132-BDF8-BDC1F71F547B} failed due to the following error: 80040154.

  41. I’m searching for OCR solution recently and this ocr solution is one of my testing. Till now, it works.

    Test: .net ocr sdk, c# ocr api

  42. Step 8
    Create a Console Application and give the solution name as SolReadTextFromImage.
    pls help

    1. What you want to know?
      Please mail @ achaltrehan5@gmail.com :)
      I will help you.. :)

  43. While executing this code am getting error like 'OCR running error' please help

  44. Thanks, Kishor. This is going to be a great hit in the data entry process. Can you please tell me if the code recognizes hand written characters....

  45. thanks but there is no reliable results. can you guide how we could get accurate results ?

  46. I'm not a developer, i always use the free online ocr to recognize and scan text from image.

  47. How to convert cursive image to txt

  48. While executing this code am getting error like image below (dont have type for (MODI.Image)ModiObj.Images[0]; requre dynamic express)

    http://prntscr.com/8szq02

    Plz help me...Thank you somuch

  49. Very Nice, Thanks for sharing your knowledge

  50. i want to store text from image taken by web cam ,could u sugest me..how to do it.Thanks in advance..

  51. hi, can u plz give some idea what we have to do if we want to read cursive text images?

  52. That's a very great sharing. I just learn that we can extract text from image file. Is that possible to do using image stream instead of physical file?

  53. Did not show reference in COM Reference list....plz help

  54. Hi there. I'm trying to read text from a tiff file, but the program throws exception every time I get to the "Create" method saying that the file is empty or corrupt.

  55. a great work bro keep it up plz give me your mobile number

  56. i want to convert handwritten image to text,please anyone help me

  57. Kishor,

    I have MS office 2016, I could not find MS Office Document imaging under Office Tools while I add or remove features using Control panel. Please help me sir.

  58. thanks you very much for very much sir can this is possible in Asp,net

  59. sir i cannot convert .. i think my file is in cursive text font .. have any option ?

  60. Hello,
    Can i get Specific text from the Image file like if Image file Contains first name and last name

  61. how can it support in office 2013.bacause MODI Dll not a part of office 13 .I have already try.
