Adding/removing custom properties in documents with docx4j

=Start=

Background:

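A quick note on how to add and remove custom properties in Office documents (docx/xlsx/pptx) with docx4j, for anyone who needs it. The test code below works on a .docx file.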

Main text:

Solution:

Maven dependencies

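<!-- Pick ONE of the three JAXB flavours below; they are alternatives, not meant to be used together. -->
<!-- https://mvnrepository.com/artifact/org.docx4j/docx4j-JAXB-Internal -->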
<!-- use the JAXB shipped in Java 8 -->
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-JAXB-Internal</artifactId>
    <version>8.3.4</version>
</dependency>

<!-- use the JAXB Reference Implementation -->
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-JAXB-ReferenceImpl</artifactId>
    <version>8.3.4</version>
</dependency>

<!-- use the MOXy JAXB implementation -->
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-JAXB-MOXy</artifactId>
    <version>8.3.4</version>
</dependency>

<!-- https://mvnrepository.com/artifact/commons-io/commons-io -->
<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.11.0</version>
</dependency>

Test code

package com.example;

import org.docx4j.XmlUtils;
import org.docx4j.docProps.custom.Properties;
import org.docx4j.jaxb.Context;
import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.exceptions.InvalidFormatException;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.DocPropsCustomPart;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;

import java.io.File;
import java.util.List;

/**
 * @author ixyzero
 * Created on 2022-04-18
 */
public class docx4jCodeExample {
    public static void main(String[] args) {
        String filePath = "welcome.docx";
        String content = "welcome to docx4j!";
        docxCreate(filePath, content);
        docxReadProps(filePath);

        docxAddProps(filePath);
        docxReadProps(filePath);

        docxDeleteProp(filePath, "1");
        docxReadProps(filePath);

        docxDeleteProp(filePath, "1st custom property");
        docxReadProps(filePath);

        docxDeleteProp(filePath, "all_key");
        docxReadProps(filePath);
    }

    public static int docxCreate(String filePath, String contentString) {
        WordprocessingMLPackage wordMLPackage = null;
        try {
            wordMLPackage = WordprocessingMLPackage.createPackage();
        } catch (InvalidFormatException e) {
            e.printStackTrace();
            return 1;
        }
        MainDocumentPart mainDocumentPart = wordMLPackage.getMainDocumentPart();
        mainDocumentPart.addStyledParagraphOfText("Title", "Hello World!");
        mainDocumentPart.addParagraphOfText(contentString);

        File exportFile = new File(filePath);
        try {
            wordMLPackage.save(exportFile);
        } catch (Docx4JException e) {
            e.printStackTrace();
            return 1;
        }

        return 0;
    }

    public static int docxReadProps(String filePath) {
        WordprocessingMLPackage wordMLPackage = null;
        try {
            wordMLPackage = WordprocessingMLPackage.load(new File(filePath));
        } catch (Docx4JException e) {
            e.printStackTrace();
            return 1;
        }

        // 1. Core properties (getDocPropsCorePart); the part may be absent in some files
        org.docx4j.openpackaging.parts.DocPropsCorePart docPropsCorePart = wordMLPackage.getDocPropsCorePart();
        // Title of the document
        // Note: Word for Mac 2010 doesn't set title
        String title = "Missing";
        if (docPropsCorePart != null) {
            org.docx4j.docProps.core.CoreProperties coreProps = (org.docx4j.docProps.core.CoreProperties) docPropsCorePart.getJaxbElement();
            if (coreProps.getTitle() != null) {
                List<String> list = coreProps.getTitle().getValue().getContent();
                if (list.size() > 0) {
                    title = list.get(0);
                }
            }
        }
        System.out.println("'dc:title' is " + title);

        // 2. Extended properties (getDocPropsExtendedPart); this part may be absent too
        org.docx4j.openpackaging.parts.DocPropsExtendedPart docPropsExtendedPart = wordMLPackage.getDocPropsExtendedPart();
        if (docPropsExtendedPart != null) {
            org.docx4j.docProps.extended.Properties extendedProps = (org.docx4j.docProps.extended.Properties) docPropsExtendedPart.getJaxbElement();
            // Application that created the document
            System.out.println("'Application' is " + extendedProps.getApplication() + " v." + extendedProps.getAppVersion());
        }

        // 3. Custom properties (getDocPropsCustomPart)
        org.docx4j.openpackaging.parts.DocPropsCustomPart docPropsCustomPart = wordMLPackage.getDocPropsCustomPart();
        if (docPropsCustomPart == null) {
            System.out.println("[read]No Document Custom Properties. getDocPropsCustomPart returns null");
        } else {
            // traverse custom properties and print
            org.docx4j.docProps.custom.Properties customProps = (org.docx4j.docProps.custom.Properties) docPropsCustomPart.getJaxbElement();
            System.out.println(String.format("[read]size of Custom Properties = %d",customProps.getProperty().size()));
            for (org.docx4j.docProps.custom.Properties.Property prop : customProps.getProperty()) {
                // Could create a generic Object getValue() method.
                if (prop.getLpwstr() != null) {
                    System.out.println(prop.getName() + " = " + prop.getLpwstr());
                } else {
                    System.out.println(prop.getName() + ": \n " + XmlUtils.marshaltoString(prop, true, Context.jcDocPropsCustom));
                }
            }
            // System.out.println(customProps.getProperty()); // prints only object references (Properties$Property@...), not names/values
        }
        System.out.println();
        return 0;
    }

    // For testing, the new keys/values are hard-coded rather than passed in as parameters; changing that is straightforward
    public static int docxAddProps(String filePath) {
        WordprocessingMLPackage wordMLPackage = null;
        try {
            wordMLPackage = WordprocessingMLPackage.load(new File(filePath));
        } catch (Docx4JException e) {
            e.printStackTrace();
            return 1;
        }

        // The custom properties part may not exist yet (null)
        org.docx4j.openpackaging.parts.DocPropsCustomPart docPropsCustomPart = wordMLPackage.getDocPropsCustomPart();
        if (docPropsCustomPart == null) {
            System.out.println("[add]No Document Custom Properties. let's add some.");
            // Method 1 (more verbose)
            // 1. create a DocPropsCustomPart and custom.Properties
            try {
                docPropsCustomPart = new DocPropsCustomPart();
                wordMLPackage.addTargetPart(docPropsCustomPart);
            } catch (InvalidFormatException e) {
                e.printStackTrace();
                return 1;
            }
            org.docx4j.docProps.custom.ObjectFactory objFactory = new org.docx4j.docProps.custom.ObjectFactory();
            org.docx4j.docProps.custom.Properties customProps = objFactory.createProperties();
            docPropsCustomPart.setJaxbElement(customProps);

            // 2. create a custom property.
            org.docx4j.docProps.custom.Properties.Property newProp = objFactory.createPropertiesProperty();
            newProp.setName("1st custom property");
            newProp.setFmtid(DocPropsCustomPart.fmtidValLpwstr); // Magic string (static constant)
            newProp.setPid( customProps.getNextId() );
            newProp.setLpwstr("value_here");

            // 3. add it
            customProps.getProperty().add(newProp);

            // Method 2 (quick and simple): all in one step.
            // Both methods run here for demonstration; in real code one of them is enough.
            wordMLPackage.addDocPropsCustomPart();
            docPropsCustomPart = wordMLPackage.getDocPropsCustomPart();
            docPropsCustomPart.setProperty("key1", "value1");
        } else {
            org.docx4j.docProps.custom.Properties customProps = (org.docx4j.docProps.custom.Properties) docPropsCustomPart.getJaxbElement();

            // 1. createPropertiesProperty, create a custom property.
            org.docx4j.docProps.custom.ObjectFactory objFactory = new org.docx4j.docProps.custom.ObjectFactory();
            org.docx4j.docProps.custom.Properties.Property newProp = objFactory.createPropertiesProperty();
            newProp.setName("last custom prop by add");
            newProp.setFmtid(DocPropsCustomPart.fmtidValLpwstr); // Magic string (static constant)
            newProp.setPid( customProps.getNextId() );
            newProp.setLpwstr("value_here2");
            // 2. add it
            customProps.getProperty().add(newProp);
        }
        
        // save changes
        try {
            wordMLPackage.save(new File(filePath));
        } catch (Docx4JException e) {
            e.printStackTrace();
            return 1;
        }

        return 0;
    }

    private static boolean isNumeric(String str){
        return str != null && str.matches("[0-9.]+");
    }
    private static boolean isDigit(String str){
        return str != null && str.matches("[0-9]+");
    }

    public static int docxDeleteProp(String filePath, String filterStr) {
        WordprocessingMLPackage wordMLPackage = null;
        try {
            wordMLPackage = WordprocessingMLPackage.load(new File(filePath));
        } catch (Docx4JException e) {
            e.printStackTrace();
            return 1;
        }

        // The custom properties part may not exist (null)
        org.docx4j.openpackaging.parts.DocPropsCustomPart docPropsCustomPart = wordMLPackage.getDocPropsCustomPart();
        if (docPropsCustomPart == null) {
            System.out.println("[delete]No Document Custom Properties. Do nothing.");
            return 0;
        }

        org.docx4j.docProps.custom.Properties customProps = (org.docx4j.docProps.custom.Properties) docPropsCustomPart.getJaxbElement();
        List<Properties.Property> customPropList = customProps.getProperty();
        if (filterStr.equalsIgnoreCase("all_key")) {
            // delete all
            System.out.println("[delete]try to delete all Custom Properties.");
            customProps.getProperty().clear();
        } else if (isDigit(filterStr)) {
            // delete by index (0-based); an out-of-range index falls back to the last property
            System.out.println("[delete]try to delete Custom Properties by index(start from 0).");
            if (customPropList.isEmpty()) {
                // without this check, size()-1 == -1 would throw IndexOutOfBoundsException
                System.out.println("[delete]Custom Properties list is empty. Nothing to delete.");
            } else {
                int index = Integer.parseInt(filterStr);
                if (index >= customPropList.size()) {
                    index = customPropList.size() - 1;
                }
                customPropList.remove(index);
            }
        } else {
            // delete by name (case-insensitive). Calling remove() inside a for-each loop over the same
            // list throws java.util.ConcurrentModificationException, so use removeIf (Java 8+) instead.
            System.out.println("[delete]try to delete by Custom Property's key->" + filterStr);
            boolean removed = customPropList.removeIf(prop -> filterStr.equalsIgnoreCase(prop.getName()));
            if (!removed) {
                System.out.println("[delete]No Custom Property named " + filterStr);
            }
        }

        // The package must be saved, otherwise the changes are not written back to the file
        // save changes before exit
        try {
            wordMLPackage.save(new File(filePath));
        } catch (Docx4JException e) {
            e.printStackTrace();
            return 1;
        }

        return 0;
    }
}
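Under the hood, the part that docx4j exposes as DocPropsCustomPart is stored as docProps/custom.xml inside the .docx ZIP package. The following is a minimal, JDK-only sketch that reads that part directly and lists each property's name and vt:lpwstr value. It is not docx4j's API: the class name CustomPropsPeek and its helper methods are hypothetical, and the sketch assumes the part sits at the usual docProps/custom.xml path. It can be handy for checking what the docx4j code above actually wrote.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class CustomPropsPeek {
    // Assumption: the custom-properties part is stored at its usual path.
    static final String CUSTOM_PART = "docProps/custom.xml";
    static final String CUSTOM_NS =
            "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties";
    static final String VT_NS =
            "http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes";

    /** Returns name -> vt:lpwstr text for each custom property (null for non-string values). */
    static Map<String, String> readCustomProps(byte[] docx) {
        Map<String, String> props = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(docx))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!CUSTOM_PART.equals(entry.getName())) {
                    continue;
                }
                // Copy the entry first: the XML parser may close the stream it is given.
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = zip.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
                dbf.setNamespaceAware(true);
                NodeList list = dbf.newDocumentBuilder()
                        .parse(new ByteArrayInputStream(buf.toByteArray()))
                        .getElementsByTagNameNS(CUSTOM_NS, "property");
                for (int i = 0; i < list.getLength(); i++) {
                    Element p = (Element) list.item(i);
                    NodeList str = p.getElementsByTagNameNS(VT_NS, "lpwstr");
                    props.put(p.getAttribute("name"),
                            str.getLength() > 0 ? str.item(0).getTextContent() : null);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (Exception e) { // ParserConfigurationException, SAXException
            throw new IllegalStateException(e);
        }
        return props;
    }

    /** Hypothetical helper: builds a one-entry ZIP in memory for the demo. */
    static byte[] zipOf(String name, String content) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry(name));
            zos.write(content.getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        String xml = "<Properties xmlns=\"" + CUSTOM_NS + "\" xmlns:vt=\"" + VT_NS + "\">"
                + "<property fmtid=\"{D5CDD505-2E9C-101B-9397-08002B2CF9AE}\" pid=\"2\" name=\"key1\">"
                + "<vt:lpwstr>value1</vt:lpwstr></property></Properties>";
        System.out.println(readCustomProps(zipOf(CUSTOM_PART, xml))); // {key1=value1}
    }
}
```

To inspect a real file instead of the in-memory demo, pass `java.nio.file.Files.readAllBytes(java.nio.file.Paths.get("welcome.docx"))` to readCustomProps.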
References:

Java Examples for org.docx4j.docProps.custom.Properties.Property # reads the custom properties of a docx file; the code is simple and works as-is
https://www.javatips.net/api/org.docx4j.docprops.custom.properties.property
https://github.com/plutext/docx4j/blob/master/docx4j-samples-docx4j/src/main/java/org/docx4j/samples/DocProps.java

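The samples in the links above fail when adding custom properties to a docx file that has none yet. For creating custom properties from scratch, see this link: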
https://www.javatips.net/api/docx4j-master/src/scratchpad/org/docx4j/model/fields/AbstractFormattingSwitchTest.java

Java docx4j tutorial: docx4j in-depth study notes # a brief introduction to docx files, worth reading for reference
https://blog.csdn.net/weixin_33374585/article/details/114719233

Introduction To Docx4J
https://www.baeldung.com/docx4j

How to remove custom properties in docx4j
https://stackoverflow.com/questions/13143542/how-to-remove-custom-properties-in-docx4j
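Related to removing custom properties: the ConcurrentModificationException noted in the delete-by-name branch of the code is plain java.util behaviour, not something specific to docx4j. The sketch below reproduces it on an ordinary ArrayList of names. That list is a hypothetical stand-in for the list returned by customProps.getProperty(), and the class and method names are illustrative only. The sketch also shows Java 8's removeIf as a safe alternative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

class RemoveWhileIterating {
    /** Hypothetical stand-in for customProps.getProperty(): just the property names. */
    static List<String> sample() {
        return new ArrayList<>(Arrays.asList(
                "1st custom property", "key1", "last custom prop by add"));
    }

    /** Removes matches inside a for-each loop; returns true if the iterator threw CME. */
    static boolean forEachRemoveThrows(List<String> names, String target) {
        try {
            for (String name : names) {
                if (name.equalsIgnoreCase(target)) {
                    names.remove(name); // structural change behind the iterator's back
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** Safe alternative (Java 8+): removes every case-insensitive match; true if any removed. */
    static boolean removeByName(List<String> names, String target) {
        return names.removeIf(name -> name.equalsIgnoreCase(target));
    }

    public static void main(String[] args) {
        System.out.println(forEachRemoveThrows(sample(), "1st custom property")); // true
        List<String> names = sample();
        removeByName(names, "1ST CUSTOM PROPERTY");
        System.out.println(names); // [key1, last custom prop by add]
    }
}
```

With ArrayList, removing the second-to-last element inside a for-each loop happens not to throw. The loop ends before the iterator's modification check runs, so the bug is easy to miss in testing, which is one more reason to prefer removeIf.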

=END=

Notice: unless otherwise stated, all articles on ixyzero.com are original. If you repost, please credit this article with a link. Thanks!
https://ixyzero.com/blog/archives/5227.html

4 thoughts on "Adding/removing custom properties in documents with docx4j"

  1. Apache POI or docx4j for dealing with docx documents [closed]
    https://stackoverflow.com/questions/15013837/apache-poi-or-docx4j-for-dealing-with-docx-documents
    `
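    When working with Office files, should you choose Apache POI or docx4j?

    In short, for docx documents docx4j is recommended (it can actually handle xlsx/pptx documents too);
    Apache POI is generally used for xls/xlsx documents, though its performance is not great and it seems rather complex.

    Also, for Excel files you might consider Alibaba's open-source EasyExcel, a simple, memory-efficient Java project for reading and writing Excel files. It can read and write Excel files of hundreds of MB while using as little memory as possible.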
    `

  2. DrawingML – Image Properties – Image Data
    http://officeopenxml.com/drwPic-ImageData.php
    `
    The image data for a picture is specified with the <a:blip> element within the <pic:blipFill> element (BLIP = binary large image or picture). Note the namespace used is xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main". The image data can be stored locally within the same file, or linked to a location outside of the file. The state of compression for the data is also specified. All of this information is contained within the attributes for <a:blip>.

    cstate: Specifies the compression state with which the picture is stored.

    embed: Specifies a relationship in the .rels part for the part that references the picture.
    link: Specifies a relationship in the .rels part for the part that references the picture. It is used to specify an image that does not reside within the package.
    `

    DocumentFormat.OpenXml.Drawing – Blip Class
    https://docs.microsoft.com/en-us/dotnet/api/documentformat.openxml.drawing.blip?view=openxml-2.8.1#remarks
    `
    cstate (Compression State): Specifies the compression state with which the picture is stored. This allows the application to specify the amount of compression that has been applied to a picture.

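    In short, in Office Open XML the a:blip element specifies the location or source of an image. The embed attribute usually refers to an image stored inside the file (in my tests, pointing it at an external image also worked), and the link attribute refers to an image that is not stored inside the file.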

    embed (Embedded Picture Reference): Specifies the identification information for an embedded picture. This attribute is used to specify an image that resides locally within the file.

    link (Linked Picture Reference): Specifies the identification information for a linked picture. This attribute is used to specify an image that does not reside within this file.
    `
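    To see the two attributes side by side, here is a minimal JDK-only sketch that parses a DrawingML fragment and reports whether each a:blip uses r:embed or r:link. The class name BlipRefs and its output format are hypothetical. The relationship id it prints still has to be resolved through the part's .rels file to find the actual image.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class BlipRefs {
    static final String A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main";
    static final String R_NS =
            "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    /** For each a:blip, returns "embed=<rId>", "link=<rId>", both (comma-joined) or "none". */
    static List<String> describeBlips(String xml) {
        NodeList blips;
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // so the a: and r: prefixes resolve to namespaces
            blips = dbf.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getElementsByTagNameNS(A_NS, "blip");
        } catch (Exception e) { // ParserConfigurationException, SAXException, IOException
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < blips.getLength(); i++) {
            Element blip = (Element) blips.item(i);
            List<String> refs = new ArrayList<>();
            String embed = blip.getAttributeNS(R_NS, "embed"); // "" when absent
            String link = blip.getAttributeNS(R_NS, "link");
            if (embed != null && !embed.isEmpty()) refs.add("embed=" + embed);
            if (link != null && !link.isEmpty()) refs.add("link=" + link);
            out.add(refs.isEmpty() ? "none" : String.join(",", refs));
        }
        return out;
    }

    public static void main(String[] args) {
        String xml = "<pic xmlns:a=\"" + A_NS + "\" xmlns:r=\"" + R_NS + "\">"
                + "<a:blip r:embed=\"rId4\"/><a:blip r:link=\"rId5\"/><a:blip/></pic>";
        System.out.println(describeBlips(xml)); // [embed=rId4, link=rId5, none]
    }
}
```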
