Comparing text positions in a Word document

Posted on 2025-01-11 10:01:38

I'm generating a Word document by replacing placeholder text in a template document with my own values. For this I'm using GemBox.Document, specifically this code from its Find and Replace example:

var document = DocumentModel.Load("input.docx");

var firstPlaceholder = document.Content.Find("%Text1%").First();
firstPlaceholder.LoadText("Value 1");

var secondPlaceholder = document.Content.Find("%Text2%").First();
secondPlaceholder.LoadText("Value 2");

document.Save("output.docx");

That works fine.

But now I have a scenario in which the values that replace the placeholders depend on their location: specifically, whether a placeholder appears before or after a specific paragraph in the document.

I did try using something like this:

Paragraph separator = ...

string firstPlaceholderText = "%Text1%";
string separatorText = separator.Content.ToString();
string wholeDocumentText = document.Content.ToString();

if (wholeDocumentText.IndexOf(firstPlaceholderText) < wholeDocumentText.IndexOf(separatorText))
{
    // The placeholder is before the separator...
}
else
{
    // The placeholder is after the separator...
}

However, the same separatorText value might occur in multiple places in the document, so string.IndexOf() is not a viable solution for me.
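
To make the problem concrete, here is a minimal sketch on a plain string. It does not use GemBox.Document; the sample text and the assumption that the intended separator is the second "---" are made up for illustration.

```csharp
using System;

// Hypothetical flattened document text: the line "---" appears twice,
// and the paragraph we actually care about is the SECOND one.
string wholeDocumentText = "Intro\n---\n%Text1%\n---\nOutro";
string separatorText = "---";

// Assumed offset of the intended separator (the last "---" in this sample).
int intendedSeparatorOffset =
    wholeDocumentText.LastIndexOf(separatorText, StringComparison.Ordinal);

int placeholderOffset =
    wholeDocumentText.IndexOf("%Text1%", StringComparison.Ordinal);

// IndexOf always returns the FIRST match, which here is the wrong "---".
int firstMatchOffset =
    wholeDocumentText.IndexOf(separatorText, StringComparison.Ordinal);

// The text-based check compares against the first match...
bool beforeByText = placeholderOffset < firstMatchOffset;

// ...while the placeholder really does precede the intended separator.
bool beforeActual = placeholderOffset < intendedSeparatorOffset;

Console.WriteLine($"Text-based check: {beforeByText}, actual: {beforeActual}");
```

The two results disagree because the text alone cannot say which occurrence of the separator text belongs to the paragraph in question.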

Is there another way to make this comparison, or to determine the location of a placeholder relative to some other document element?

单身情人 2025-01-18 10:01:38

Try this:

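// Requires: using System; using System.Collections.Generic; using GemBox.Document;

static bool IsPositionBefore(ContentPosition position1, ContentPosition position2)
{
    // Index of each ancestor within its parent collection, ordered from the
    // top of the document tree down to the position's parent element.
    var parentIndexes1 = GetParentIndexes(position1.Parent);
    var parentIndexes2 = GetParentIndexes(position2.Parent);

    // The first level at which the two paths differ decides the order.
    int count = Math.Min(parentIndexes1.Count, parentIndexes2.Count);
    for (int i = 0; i < count; i++)
    {
        if (parentIndexes1[i] < parentIndexes2[i])
            return true;

        if (parentIndexes1[i] > parentIndexes2[i])
            return false;
    }

    // Both positions are inside the same parent element.
    // Clone the parent so its original content can be restored afterwards.
    var parent = position1.Parent;
    var parentClone = parent.Clone(true);

    // Temporarily insert two distinct control characters at the two positions.
    string positionMarker1 = "\u0001";
    string positionMarker2 = "\u0002";
    position1.LoadText(positionMarker1);
    position2.LoadText(positionMarker2);

    // Compare the markers' offsets within the parent's text.
    string parentContent = parent.Content.ToString();
    int positionOffset1 = parentContent.IndexOf(positionMarker1, StringComparison.Ordinal);
    int positionOffset2 = parentContent.IndexOf(positionMarker2, StringComparison.Ordinal);

    // Restore the original content, which removes the markers.
    parent.Content.Set(parentClone.Content);

    return positionOffset1 < positionOffset2;
}

static IList<int> GetParentIndexes(Element element)
{
    var parentIndexes = new List<int>();

    // Walk up the tree, recording each element's index among its siblings.
    while (element.Parent != null)
    {
        parentIndexes.Add(element.ParentCollection.IndexOf(element));
        element = element.Parent;
    }

    // The indexes were collected bottom-up; reverse them to top-down order.
    parentIndexes.Reverse();

    return parentIndexes;
}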
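
The first half of IsPositionBefore is a level-by-level comparison of two index paths. As a rough illustration only, the same idea can be sketched without GemBox.Document; the ComparePaths helper and the sample paths below are hypothetical and not part of the library.

```csharp
using System;
using System.Collections.Generic;

// Compares two index paths level by level.
// Returns a negative value if pathA comes first, a positive value if pathB
// comes first, and 0 if the paths agree on every level they share.
static int ComparePaths(IReadOnlyList<int> pathA, IReadOnlyList<int> pathB)
{
    int count = Math.Min(pathA.Count, pathB.Count);
    for (int i = 0; i < count; i++)
    {
        if (pathA[i] != pathB[i])
            return pathA[i].CompareTo(pathB[i]);
    }

    // Undecided: the order has to be settled some other way.
    return 0;
}

// Hypothetical paths: index of the section, then the block, then the inline.
var placeholderPath = new List<int> { 0, 2, 1 };
var separatorPath = new List<int> { 0, 5 };

// Both are in section 0, but block 2 precedes block 5.
bool placeholderFirst = ComparePaths(placeholderPath, separatorPath) < 0;
Console.WriteLine(placeholderFirst);
```

A result of 0 corresponds to the case where the loop in IsPositionBefore finishes without returning, which is where the marker comparison takes over.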

Here is how you can use the IsPositionBefore method:

if (IsPositionBefore(firstPlaceholder.Start, separator.Content.Start))
{
    // The placeholder is before the separator...
}
else
{
    // The placeholder is after the separator...
}

The tricky part is how to determine which position comes first when both positions are inside the same element.

That's because ContentPosition currently doesn't have an offset API that would tell you exactly where it's located inside the element.

So what I'm doing is temporarily inserting two distinct control characters as markers, checking which one occurs first, and then removing them.

I think this approach is safe because Word documents cannot contain control characters (Word will report such a document as corrupted), and if you try to save a DocumentModel that contains such characters you will get an exception.
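
If you would rather not rely on that assumption, a defensive check can confirm that the marker characters are not already present in the text before they are inserted. This is only a sketch on plain strings; MarkersAreUnused is a hypothetical helper, not a GemBox.Document API.

```csharp
using System;

// Returns true when neither marker character already occurs in the text,
// so a later IndexOf for a marker can only find the one inserted on purpose.
static bool MarkersAreUnused(string text) =>
    text.IndexOfAny(new[] { '\u0001', '\u0002' }) < 0;

// Ordinary text contains no control-character markers.
bool cleanText = MarkersAreUnused("Plain paragraph text");

// Text that already contains U+0001 would make the comparison unreliable.
bool dirtyText = MarkersAreUnused("Broken\u0001paragraph");

Console.WriteLine($"{cleanText}, {dirtyText}");
```

In the answer's code, the string to check would be the parent's text, read before the two LoadText calls.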
