Adding/Removing Custom Properties in Documents with docx4j


=Start=

Background:

A short note on how to add and delete custom properties in Office documents (docx/xlsx/pptx) with docx4j, for anyone who needs it.

Main text:

Reference solution:

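Maven dependencies (the three docx4j-JAXB-* artifacts below are alternatives; declare only the one that matches your JAXB setup)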

<!-- https://mvnrepository.com/artifact/org.docx4j/docx4j-JAXB-Internal -->
<!-- use the JAXB shipped in Java 8 -->
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-JAXB-Internal</artifactId>
    <version>8.3.4</version>
</dependency>

<!-- use the JAXB Reference Implementation -->
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-JAXB-ReferenceImpl</artifactId>
    <version>8.3.4</version>
</dependency>

<!-- use the MOXy JAXB implementation -->
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-JAXB-MOXy</artifactId>
    <version>8.3.4</version>
</dependency>

<!-- https://mvnrepository.com/artifact/commons-io/commons-io -->
<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.11.0</version>
</dependency>

Test code

package com.example;

import org.docx4j.XmlUtils;
import org.docx4j.docProps.custom.Properties;
import org.docx4j.jaxb.Context;
import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.exceptions.InvalidFormatException;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.DocPropsCustomPart;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;

import java.io.File;
import java.util.List;

/**
 * @author ixyzero
 * Created on 2022-04-18
 */
public class docx4jCodeExample {
    public static void main(String[] args) {
        String filePath = "welcome.docx";
        String content = "welcome to docx4j!";
        docxCreate(filePath, content);
        docxReadProps(filePath);

        docxAddProps(filePath);
        docxReadProps(filePath);

        docxDeleteProp(filePath, "1");
        docxReadProps(filePath);

        docxDeleteProp(filePath, "1st custom property");
        docxReadProps(filePath);

        docxDeleteProp(filePath, "all_key");
        docxReadProps(filePath);
    }

    public static int docxCreate(String filePath, String contentString) {
        WordprocessingMLPackage wordMLPackage = null;
        try {
            wordMLPackage = WordprocessingMLPackage.createPackage();
        } catch (InvalidFormatException e) {
            e.printStackTrace();
            return 1;
        }
        MainDocumentPart mainDocumentPart = wordMLPackage.getMainDocumentPart();
        mainDocumentPart.addStyledParagraphOfText("Title", "Hello World!");
        mainDocumentPart.addParagraphOfText(contentString);

        File exportFile = new File(filePath);
        try {
            wordMLPackage.save(exportFile);
        } catch (Docx4JException e) {
            e.printStackTrace();
            return 1;
        }

        return 0;
    }

    public static int docxReadProps(String filePath) {
        WordprocessingMLPackage wordMLPackage = null;
        try {
            wordMLPackage = WordprocessingMLPackage.load(new File(filePath));
        } catch (Docx4JException e) {
            e.printStackTrace();
            return 1;
        }

        // 1. Read the core properties (getDocPropsCorePart)
        org.docx4j.openpackaging.parts.DocPropsCorePart docPropsCorePart = wordMLPackage.getDocPropsCorePart();
        org.docx4j.docProps.core.CoreProperties coreProps = (org.docx4j.docProps.core.CoreProperties) docPropsCorePart.getJaxbElement();
        // Title of the document
        // Note: Word for Mac 2010 doesn't set title
        String title = "Missing";
        if (coreProps.getTitle() != null) {
            List<String> list = coreProps.getTitle().getValue().getContent();
            if (list.size() > 0) {
                title = list.get(0);
            }
        }
        System.out.println("'dc:title' is " + title);

        // 2. Read the extended properties (getDocPropsExtendedPart)
        org.docx4j.openpackaging.parts.DocPropsExtendedPart docPropsExtendedPart = wordMLPackage.getDocPropsExtendedPart();
        org.docx4j.docProps.extended.Properties extendedProps = (org.docx4j.docProps.extended.Properties) docPropsExtendedPart.getJaxbElement();
        // Document creator Application
        System.out.println("'Application' is " + extendedProps.getApplication() + " v." + extendedProps.getAppVersion());

        // 3. Read the custom properties (getDocPropsCustomPart)
        org.docx4j.openpackaging.parts.DocPropsCustomPart docPropsCustomPart = wordMLPackage.getDocPropsCustomPart();
        if (docPropsCustomPart == null) {
            System.out.println("[read]No Document Custom Properties. getDocPropsCustomPart returns null");
        } else {
            // traverse custom properties and print
            org.docx4j.docProps.custom.Properties customProps = (org.docx4j.docProps.custom.Properties) docPropsCustomPart.getJaxbElement();
            System.out.println(String.format("[read]size of Custom Properties = %d",customProps.getProperty().size()));
            for (org.docx4j.docProps.custom.Properties.Property prop : customProps.getProperty()) {
                // Could create a generic Object getValue() method.
                if (prop.getLpwstr() != null) {
                    System.out.println(prop.getName() + " = " + prop.getLpwstr());
                } else {
                    System.out.println(prop.getName() + ": \n " + XmlUtils.marshaltoString(prop, true, Context.jcDocPropsCustom));
                }
            }
            // System.out.println(customProps.getProperty()); // [org.docx4j.docProps.custom.Properties$Property@2da3b078, ...]
        }
        System.out.println();
        return 0;
    }

    // For testing, the key/value of the new property is hard-coded rather than passed in as parameters; that is easy to change.
    public static int docxAddProps(String filePath) {
        WordprocessingMLPackage wordMLPackage = null;
        try {
            wordMLPackage = WordprocessingMLPackage.load(new File(filePath));
        } catch (Docx4JException e) {
            e.printStackTrace();
            return 1;
        }

        // The custom properties part may be absent (null)
        org.docx4j.openpackaging.parts.DocPropsCustomPart docPropsCustomPart = wordMLPackage.getDocPropsCustomPart();
        if (docPropsCustomPart == null) {
            System.out.println("[add]No Document Custom Properties. let's add some.");
            // Method 1 (more verbose)
            // 1. create a DocPropsCustomPart and custom.Properties
            try {
                docPropsCustomPart = new DocPropsCustomPart();
                wordMLPackage.addTargetPart(docPropsCustomPart);
            } catch (InvalidFormatException e) {
                e.printStackTrace();
                return 1;
            }
            org.docx4j.docProps.custom.ObjectFactory objFactory = new org.docx4j.docProps.custom.ObjectFactory();
            org.docx4j.docProps.custom.Properties customProps = objFactory.createProperties();
            docPropsCustomPart.setJaxbElement(customProps);

            // 2. create a custom property.
            org.docx4j.docProps.custom.Properties.Property newProp = objFactory.createPropertiesProperty();
            newProp.setName("1st custom property");
            newProp.setFmtid(DocPropsCustomPart.fmtidValLpwstr); // Magic string (static constant)
            newProp.setPid( customProps.getNextId() );
            newProp.setLpwstr("value_here");

            // 3. add it
            customProps.getProperty().add(newProp);

            // Method 2 (shorter): an alternative to method 1, not a follow-up step.
            // Used on its own, it starts by creating the part in one call:
            //     wordMLPackage.addDocPropsCustomPart();
            //     docPropsCustomPart = wordMLPackage.getDocPropsCustomPart();
            // Here the part already exists (method 1 created it), so it is reused;
            // a second addDocPropsCustomPart() call may replace the part built above.
            docPropsCustomPart.setProperty("key1", "value1");
        } else {
            org.docx4j.docProps.custom.Properties customProps = (org.docx4j.docProps.custom.Properties) docPropsCustomPart.getJaxbElement();

            // 1. createPropertiesProperty, create a custom property.
            org.docx4j.docProps.custom.ObjectFactory objFactory = new org.docx4j.docProps.custom.ObjectFactory();
            org.docx4j.docProps.custom.Properties.Property newProp = objFactory.createPropertiesProperty();
            newProp.setName("last custom prop by add");
            newProp.setFmtid(DocPropsCustomPart.fmtidValLpwstr); // Magic string (static constant)
            newProp.setPid( customProps.getNextId() );
            newProp.setLpwstr("value_here2");
            // 2. add it
            customProps.getProperty().add(newProp);
        }
        
        // save changes
        try {
            wordMLPackage.save(new File(filePath));
        } catch (Docx4JException e) {
            e.printStackTrace();
            return 1;
        }

        return 0;
    }

    private static boolean isNumeric(String str){
        return str != null && str.matches("[0-9.]+");
    }
    private static boolean isDigit(String str){
        return str != null && str.matches("[0-9]+");
    }

    public static int docxDeleteProp(String filePath, String filterStr) {
        WordprocessingMLPackage wordMLPackage = null;
        try {
            wordMLPackage = WordprocessingMLPackage.load(new File(filePath));
        } catch (Docx4JException e) {
            e.printStackTrace();
            return 1;
        }

        // The custom properties part may be absent (null)
        org.docx4j.openpackaging.parts.DocPropsCustomPart docPropsCustomPart = wordMLPackage.getDocPropsCustomPart();
        if (docPropsCustomPart == null) {
            System.out.println("[delete]No Document Custom Properties. Do nothing.");
            return 0;
        }

        org.docx4j.docProps.custom.Properties customProps = (org.docx4j.docProps.custom.Properties) docPropsCustomPart.getJaxbElement();
        List<Properties.Property> customPropList = customProps.getProperty();
        if (filterStr.equalsIgnoreCase("all_key")) {
            // delete all
            System.out.println("[delete]try to delete all Custom Properties.");
            customProps.getProperty().clear();
        } else if (isDigit(filterStr)) {
            // delete by index
            System.out.println("[delete]try to delete Custom Properties by index(start from 0).");
            if (customPropList.isEmpty()) {
                // nothing to remove; avoids remove(-1) on an empty list
                System.out.println("[delete]Custom Properties list is empty. Do nothing.");
            } else {
                // clamp an out-of-range index to the last element
                int index = Integer.parseInt(filterStr);
                if (index >= customPropList.size()) {
                    index = customPropList.size() - 1;
                }
                customPropList.remove(index);
            }
        } else {
            // delete by name (case-insensitive; removes every property with a matching name)
            // Removing inside a for-each loop throws java.util.ConcurrentModificationException
            // (modCount != expectedModCount), so removeIf is used instead.
            boolean removed = customPropList.removeIf(prop -> filterStr.equalsIgnoreCase(prop.getName()));
            if (removed) {
                System.out.println("[delete]deleted Custom Property by key->" + filterStr);
            }
        }

        // The changes must be saved, otherwise they will not take effect
        // save changes before exit
        try {
            wordMLPackage.save(new File(filePath));
        } catch (Docx4JException e) {
            e.printStackTrace();
            return 1;
        }

        return 0;
    }
}
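
The ConcurrentModificationException mentioned in the delete-by-name branch comes from removing an element from a list while a for-each loop is iterating over it. The two usual workarounds can be sketched without docx4j, as below. `Prop` is a hypothetical stand-in for `Properties.Property` (only the name is modelled), and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class SafeRemoveSketch {
    /** Hypothetical stand-in for a custom property; only the name matters here. */
    static final class Prop {
        final String name;
        Prop(String name) { this.name = name; }
    }

    /** Removes every entry whose name matches (case-insensitive); true if anything was removed. */
    static boolean removeByName(List<Prop> props, String name) {
        return props.removeIf(p -> name.equalsIgnoreCase(p.name));
    }

    /** Same effect with an explicit Iterator; returns the number of removed entries. */
    static int removeByNameWithIterator(List<Prop> props, String name) {
        int removed = 0;
        for (Iterator<Prop> it = props.iterator(); it.hasNext(); ) {
            if (name.equalsIgnoreCase(it.next().name)) {
                it.remove(); // safe: removal goes through the iterator itself
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        List<Prop> props = new ArrayList<>();
        props.add(new Prop("1st custom property"));
        props.add(new Prop("key1"));
        boolean removed = removeByName(props, "KEY1");
        System.out.println("removed=" + removed + ", remaining=" + props.size());
    }
}
```

Both variants remove all matching entries, which differs slightly from keeping a reference to one match and removing it after the loop; since custom property names are normally unique, the difference rarely matters.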

References:

Java Examples for org.docx4j.docProps.custom.Properties.Property # reads the custom properties of a docx file; the code is simple and usable as-is
https://www.javatips.net/api/org.docx4j.docprops.custom.properties.property
https://github.com/plutext/docx4j/blob/master/docx4j-samples-docx4j/src/main/java/org/docx4j/samples/DocProps.java

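The code in the links above throws an error when adding custom properties to a docx document that has none; the following link shows how to add custom properties from scratch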
https://www.javatips.net/api/docx4j-master/src/scratchpad/org/docx4j/model/fields/AbstractFormattingSwitchTest.java
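
Whether a file already carries a custom properties part can also be checked without docx4j, because an OOXML file is a ZIP package and the custom properties conventionally live in the entry `docProps/custom.xml`. The sketch below uses only the JDK. The class and method names are hypothetical, and it assumes the conventional part name rather than resolving it through the package relationships in `_rels/.rels`.

```java
import java.io.IOException;
import java.util.zip.ZipFile;

public class CustomPropsProbe {
    /** Conventional entry name of the custom properties part in docx/xlsx/pptx packages. */
    static final String CUSTOM_PROPS_PART = "docProps/custom.xml";

    /** Returns true if the ZIP package at the given path contains docProps/custom.xml. */
    static boolean hasCustomPropsPart(String path) throws IOException {
        try (ZipFile zip = new ZipFile(path)) {
            return zip.getEntry(CUSTOM_PROPS_PART) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the file name used in the test code; pass another path as the first argument.
        String path = args.length > 0 ? args[0] : "welcome.docx";
        System.out.println(path + " has a custom properties part: " + hasCustomPropsPart(path));
    }
}
```

A result of false corresponds to the case where getDocPropsCustomPart() returns null, so the part has to be created before any property is added.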

java docx4j 使用教程_docx4j深入学习整理 (docx4j tutorial and in-depth study notes) # includes a brief introduction to the docx file format, useful for learning
https://blog.csdn.net/weixin_33374585/article/details/114719233

Introduction To Docx4J
https://www.baeldung.com/docx4j

How to remove custom properties in docx4j
https://stackoverflow.com/questions/13143542/how-to-remove-custom-properties-in-docx4j

=END=


6 comments on "Adding/Removing Custom Properties in Documents with docx4j"

  1. Apache POI or docx4j for dealing with docx documents [closed]
    https://stackoverflow.com/questions/15013837/apache-poi-or-docx4j-for-dealing-with-docx-documents
    `
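    When working with Office files, should you choose Apache POI or docx4j?

    In short, docx4j is recommended for docx documents (it can also handle xlsx/pptx documents);
    Apache POI is generally used for xls/xlsx documents, but its performance is not great and it seems rather complex.

    Also, for Excel files, consider EasyExcel, open-sourced by Alibaba. EasyExcel is a simple, memory-efficient, Java-based open-source project for reading and writing Excel files. It can read and write Excel files of hundreds of MB while using as little memory as possible.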
    `

  2. DrawingML – Image Properties – Image Data
    http://officeopenxml.com/drwPic-ImageData.php
    `
    The image data for a picture is specified with the <a:blip> element within the <pic:blipFill> element (BLIP = binary large image or picture). Note the namespace used is xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main". The image data can be stored locally within the same file, or linked to a location outside of the file. The state of compression for the data is also specified. All of this information is contained within the attributes for <a:blip>.

    cstate: Specifies the compression state with which the picture is stored.

    embed: Specifies a relationship in the .rels part for the part that references the picture.
    link: Specifies a relationship in the .rels part for the part that references the picture. It is used to specify an image that does not reside within the package.
    `

    DocumentFormat.OpenXml.Drawing – Blip Class
    https://docs.microsoft.com/en-us/dotnet/api/documentformat.openxml.drawing.blip?view=openxml-2.8.1#remarks
    `
    cstate (Compression State): Specifies the compression state with which the picture is stored. This allows the application to specify the amount of compression that has been applied to a picture.

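    In short, in Office Open XML the a:blip element specifies the location or source of the image file: the embed attribute is generally used for an image stored locally in the file (in testing, pointing it at an external image also worked), and the link attribute is used for an image that is not stored locally.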

    embed (Embedded Picture Reference): Specifies the identification information for an embedded picture. This attribute is used to specify an image that resides locally within the file.

    link (Linked Picture Reference): Specifies the identification information for a linked picture. This attribute is used to specify an image that does not reside within this file.
    `

  3. 卧底升职记——谍变
    https://mp.weixin.qq.com/s/tfhyaw1Rvx45GWVNpLtCeA
    `
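    # Document version control
    1. Protected cells cannot be modified by unauthorized users
    2. Version history of document operations, with restore
    3. Deletion is a soft delete rather than a permanent one; an administrator can restore deleted items

    # Document access control
    4. Downloading/exporting by unauthorized users is prohibited
    5. Copying by unauthorized users is prohibited
    6. Only the company/team/designated internal staff can access the document

    # Document "watermarks"
    7. Screen watermarks, which can be used to trace screenshots/photos back to their source
    8. Opening/editing offline documents is also subject to access control (not sure whether WPS is OOXML-compatible on this point; if it is, could the original content be obtained by reading the XML files directly? Or does it use a private format of its own for controlled files?)
    9. Monitoring of abnormal account logins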
    `

  4. SpringBoot 实现 PDF 添加水印,我有 5 种实现方案 (Adding watermarks to PDFs with Spring Boot: 5 approaches)
    https://mp.weixin.qq.com/s/gphcp_L80OzOXxsPAijgig
    `
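    PDF (Portable Document Format) is a popular file format that can be viewed and printed across many operating systems and applications. In some cases a watermark needs to be added to a PDF file, to make it more recognizable or to protect its copyright.

    This article describes ways to add watermarks to PDFs with Spring Boot.

    1. Using the Apache PDFBox library
    PDFBox is a popular, free library written in Java for creating, modifying and extracting PDF content. PDFBox provides many APIs, including support for adding text watermarks.

    2. Using the iText library
    iText is a popular Java PDF library for creating, reading, modifying and extracting PDF content. iText provides many APIs, including support for adding text watermarks.

    3. Using the Ghostscript command line
    Ghostscript is a popular, free, open-source PDF processing program for creating, reading, modifying and extracting PDF content. Ghostscript provides command-line parameters for adding watermarks.

    4. Free Spire.PDF for Java
    Free Spire.PDF for Java is a free Java PDF library with a simple, easy-to-use API for creating, reading, modifying and extracting PDF content. Free Spire.PDF for Java also supports adding text watermarks and image watermarks.

    5. Aspose.PDF for Java
    Aspose.PDF for Java is a powerful PDF processing library that provides watermarking support.

    This article covered several ways to add watermarks to PDFs with Spring Boot, including the Apache PDFBox library, the iText library and the Ghostscript command line. Which one to choose depends on project requirements and personal preference.

    Whichever approach is used, take care to protect the original PDF file and avoid modifying it directly unless necessary.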
    `
