Discussions

XML & Web services: HELP! How can I access data in form fields (MS Word) with Java

  1. Hi,
    I created a form with Microsoft Word 2003 and saved it as a .doc document. The form contains complex form fields: drop-down form fields, checkboxes, text form fields, etc. After creating the form in Word, I now realize that I need to create an application that reads the data in the form for further processing. The application is to be written in Java.
     
    How do I go about this? Since I am under a bit of time pressure, I thought the best thing to do would be to simply save the form document as an XML file and then parse that file using a SAX parser. I realize that it would actually be more efficient to create an XSD schema for the form and then re-create the form according to the schema. This would save me from having to deal with WordML. The problem is that I cannot use form fields in the XSD schema for the form, and consequently there would be no form fields in the form. So I decided to stick to simply saving the form as an XML document and 'fight' with WordML.

    So the question that I have is: HOW DO I GET ACCESS TO THE DATA IN THE FORM FIELDS, i.e. after the form is filled in and saved as an XML file. Data entry for normal text fields does not seem to be a problem. The problem is rather with checkboxes and dropdown fields.

    Here is a WordML code snippet for a drop-down field that is filled in with an integer (the value should be '4', but I do not see it):

    <w:tc><w:tcPr>
    <w:tcW w:w="2589" w:type="dxa"/>
    <w:gridSpan w:val="2"/>
    <w:vAlign w:val="center"/>
    </w:tcPr><w:p><w:pPr>
    <w:pStyle w:val="BodyText"/>
    <w:jc w:val="left"/>
    </w:pPr><w:r><w:rPr>
    <w:rFonts w:cs="Tahoma"/>
    <w:sz w:val="20"/>
    <w:sz-cs w:val="20"/>
    <w:lang w:val="EN-GB"/>
    </w:rPr>
    <w:t> Number of cups: </w:t>
    </w:r>
    <aml:annotation aml:id="2" w:type="Word.Bookmark.Start" w:name="Dropdown1"/>
    <w:r>
    <w:rPr>
    <w:rFonts w:cs="Tahoma"/>
    <w:sz w:val="20"/>
    <w:sz-cs w:val="20"/>
    <w:lang w:val="EN-GB"/>
    </w:rPr>
    <w:fldChar w:fldCharType="begin">
    <w:fldData>/////wqAAAAAAAkARAByAG8AcABkAG8AdwBuADEAAAAAAAAAAAAAAAAAAAAAA
    AAAAAAAAAAA//8GAAAABAAgACAAIAAgAAEAMwABADQAAQA1AAEANgAEAG0AbwByAGUA</w:fldDa
    ta>
    </w:fldChar>
    </w:r>
    <w:r>
    <w:rPr>
    <w:rFonts w:cs="Tahoma"/>
    <w:sz w:val="20"/>
    <w:sz-cs w:val="20"/>
    <w:lang w:val="EN-GB"/>
    </w:rPr>
    <w:instrText> FORMDROPDOWN </w:instrText>
    </w:r>
    <w:r>
    <w:rPr>
    <w:rFonts w:cs="Tahoma"/>
    <w:sz w:val="20"/>
    <w:sz-cs w:val="20"/>
    <w:lang w:val="EN-GB"/>
    </w:rPr>
    </w:r>
    <w:r>
    <w:rPr>
    <w:rFonts w:cs="Tahoma"/>
    <w:sz w:val="20"/>
    <w:sz-cs w:val="20"/>
    <w:lang w:val="EN-GB"/>
    </w:rPr>
    <w:fldChar w:fldCharType="end"/>
    </w:r>
    <aml:annotation aml:id="2" w:type="Word.Bookmark.End"/>
    </w:p>
    </w:tc>
     

    Sorry if this is too long. Like I said, the value should be four, but I do not see it. And even if I could see it, what is the pattern? Where and when do I tell the SAX parser to stop? Is there a better way to do this, i.e. using Java to access form data in Word?

    1001 thanks in advance.

    Lloyd
  2. Have you checked out POI (http://jakarta.apache.org/poi/)?
    It might handle this fine. I've used it with Excel but not with Word.
  3. Help

    Hi Lloyd,

    Did you find a solution to the problem above? I am currently facing the same issue.

    Please share it if you have one.

    Regards
    Wayne
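
For what it's worth, the selected value does appear to be in the snippet: it seems to be encoded inside the Base64 w:fldData payload rather than in a w:t run. Below is a minimal Java sketch, assuming the payload follows the FFData layout as I understand it from Microsoft's published [MS-DOC] binary format documentation: a 4-byte version marker, a 16-bit flags word whose low 2 bits give the field type and whose next 5 bits give the current result, then length-prefixed UTF-16LE strings and, for drop-downs, an item table. The class and method names (WordFormFieldSketch, collectFldData, decode) are hypothetical, the handler assumes a non-namespace-aware SAX parser and the "w" prefix, and the sample payload is reassembled from the snippet above as best I can read it across the line wrap. Please check the layout against your own documents before relying on it. As for when to tell the SAX parser to stop: the handler simply buffers characters only between the start and end of w:fldData.

```java
import java.io.StringReader;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration; not part of POI or any other library.
final class WordFormFieldSketch {

    // Payload of the Dropdown1 field from the post, with the line wrap removed.
    static final String SAMPLE_FLD_DATA =
            "/////wqAAAAAAAkARAByAG8AcABkAG8AdwBuADEA"                  // header and field name
          + "AAAAAAAA" + "AAAAAAAA" + "AAAAAAAA" + "AAAAAAAA"            // 24 zero bytes
          + "//8GAAAABAAgACAAIAAgAAEAMwABADQAAQA1AAEANgAEAG0AbwByAGUA"; // drop-down item table

    /** SAX handler that collects the Base64 text of every w:fldData element. */
    static final class FldDataHandler extends DefaultHandler {
        final List<String> payloads = new ArrayList<>();
        private StringBuilder current; // non-null only while inside w:fldData

        @Override public void startElement(String uri, String local, String qName, Attributes a) {
            // Assumes a non-namespace-aware parser and the conventional "w" prefix.
            if ("w:fldData".equals(qName)) current = new StringBuilder();
        }
        @Override public void characters(char[] ch, int start, int length) {
            if (current != null) current.append(ch, start, length);
        }
        @Override public void endElement(String uri, String local, String qName) {
            if ("w:fldData".equals(qName)) { payloads.add(current.toString()); current = null; }
        }
    }

    static List<String> collectFldData(String xml) throws Exception {
        FldDataHandler handler = new FldDataHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.payloads;
    }

    /** Reads a 16-bit length followed by that many UTF-16LE characters. */
    private static String readString(ByteBuffer b, boolean nullTerminated) {
        int length = b.getShort() & 0xFFFF;
        StringBuilder s = new StringBuilder(length);
        for (int i = 0; i < length; i++) s.append(b.getChar());
        if (nullTerminated) b.getShort(); // skip the trailing 0x0000
        return s.toString();
    }

    /** Decodes one payload into (field name, current value); the layout is an assumption. */
    static Map.Entry<String, String> decode(String base64) {
        // The MIME decoder ignores the line breaks that wrap the payload in the XML.
        ByteBuffer b = ByteBuffer.wrap(Base64.getMimeDecoder().decode(base64))
                .order(ByteOrder.LITTLE_ENDIAN);
        b.getInt();                              // version marker
        int bits = b.getShort() & 0xFFFF;
        int type = bits & 0x3;                   // 0 = text, 1 = checkbox, 2 = drop-down
        int result = (bits >> 2) & 0x1F;         // checkbox state or selected item index
        b.getShort();                            // maximum text length (unused here)
        b.getShort();                            // checkbox size (unused here)
        String name = readString(b, true);
        if (type == 0) {
            return new SimpleEntry<>(name, null); // text value is not decoded here
        }
        int defaultResult = b.getShort() & 0xFFFF;
        if (result == 25) result = defaultResult; // 25 is taken to mean "use the default"
        if (type == 1) return new SimpleEntry<>(name, String.valueOf(result == 1));
        for (int i = 0; i < 5; i++) readString(b, true); // format, help, status, entry/exit macros
        b.getShort();                            // 0xFFFF marker of the item table
        int count = b.getShort() & 0xFFFF;       // number of drop-down items
        b.getShort();                            // extra bytes per item (0 in the sample)
        List<String> items = new ArrayList<>();
        for (int i = 0; i < count; i++) items.add(readString(b, false));
        return new SimpleEntry<>(name, items.get(result));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<w:r><w:fldChar w:fldCharType=\"begin\"><w:fldData>"
                + SAMPLE_FLD_DATA + "</w:fldData></w:fldChar></w:r>";
        for (String payload : collectFldData(xml)) {
            Map.Entry<String, String> field = decode(payload);
            System.out.println(field.getKey() + " = " + field.getValue()); // Dropdown1 = 4
        }
    }
}
```

For text form fields this sketch returns a null value on purpose: as the original post notes, their content is readable in the XML itself, so only checkboxes and drop-downs need the binary payload decoded.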