使用Python从RDL中抓取DataSet和查询数据

发布于 2024-09-09 08:32:16 字数 1254 浏览 12 评论 0原文

我今天打算使用 Python 解析 SSRS RDL 文件 (XML),以收集数据集和查询数据。最近的一个项目让我回溯了各种报告和数据源,目的是整合和清理我们发布的内容。

我能够使用此脚本创建包含以下列的 CSV 文件: 系统路径|报告文件名|命令类型|命令文本|

它不是很优雅,但是很有效。

我希望能够通过这篇文章来请求那些已经尝试过此操作或在使用 Python 进行 XML 解析方面有经验的专家尝试清理它,并提供以下能力:

  • 包含标头,这将是 XML 标记
  • 在列中包含数据集名称
  • 将结果交付到单个文件

以下是我的“rdlparser.py”文件中的完整代码:

import sys, os

from xml.dom import minidom
xmldoc = minidom.parse(sys.argv[1])

content = ""
TargetFile = sys.argv[1].split(".", 1)[0] + ".csv"
numberOfQueryNodes = 0

queryNodes = xmldoc.getElementsByTagName('Query')
numberOfQueryNodes = queryNodes.length -1


while (numberOfQueryNodes > -1):
    content = content + os.path.abspath(sys.argv[1])+ '|'+ sys.argv[1].split(".", 1)[0]+ '|' 
    outputNode = queryNodes.__getitem__(numberOfQueryNodes)
    children = [child for child in outputNode.childNodes if child.nodeType==1]
    numberOfQueryNodes = numberOfQueryNodes - 1
    for node in children:
        if node.firstChild.nodeValue != '\n          ':
            if node.firstChild.nodeValue != 'true':
                content = content + node.firstChild.nodeValue + '|'
    content = content + '\n'

fp = open(TargetFile, 'wb')
fp.write(content)
fp.close()

I set out today with the intent to parse an SSRS RDL file (XML) using Python in order to gather the DataSet and Query data. A recent project has me back tracking on a variety of reports and data sources with the intention of consolidating and cleaning up what we have published.

I was able to use this script to create a CSV file with the following columns:
system path|report file name|command type|command text|

It's not very elegant, but it works.

What I'm hoping to be able to do with this post is solicit for any of you experts out there who have either tried this already or are experienced in XML parsing with Python to take a shot at cleaning it up and provided the ability to:

  • Include Headers, which would be XML tags
  • Include DataSet name in the column
  • Deliver results into single file

Here is the full code in my "rdlparser.py" file:

import sys, os

from xml.dom import minidom
xmldoc = minidom.parse(sys.argv[1])

content = ""
TargetFile = sys.argv[1].split(".", 1)[0] + ".csv"
numberOfQueryNodes = 0

queryNodes = xmldoc.getElementsByTagName('Query')
numberOfQueryNodes = queryNodes.length -1


while (numberOfQueryNodes > -1):
    content = content + os.path.abspath(sys.argv[1])+ '|'+ sys.argv[1].split(".", 1)[0]+ '|' 
    outputNode = queryNodes.__getitem__(numberOfQueryNodes)
    children = [child for child in outputNode.childNodes if child.nodeType==1]
    numberOfQueryNodes = numberOfQueryNodes - 1
    for node in children:
        if node.firstChild.nodeValue != '\n          ':
            if node.firstChild.nodeValue != 'true':
                content = content + node.firstChild.nodeValue + '|'
    content = content + '\n'

fp = open(TargetFile, 'wb')
fp.write(content)
fp.close()

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

飘落散花 2024-09-16 08:32:16

我知道你要求使用 Python;但我认为 Powershell 内置的 xml 处理功能将使这变得相当简单。虽然我确信它不是大师级别的,但我认为它的结果相当不错(以 # 开头的行是注释):

# The directory to search 
$searchpath = "C:\"

# List all rdl files    from the given search path recusrivley searching sub folders, store results into a variable
$files = gci $searchpath -recurse -filter "*.rdl" | SELECT FullName, DirectoryName, Name 

# for each of the found files pass the folder and file name  and the xml content
$files | % {$Directory = $_.DirectoryName; $Name = $_.Name; [xml](gc $_.FullName)}
            # in the xml content navigate to the the DataSets Element
            | % {$_.Report.DataSets} 
                    # for each query retrieve the Report directory , File Name, DataSource Name, Command Type, Command Text output thwese to a csv file
                    | % {$_.DataSet.Query} | SELECT  @{N="Path";E={$Directory}}, @{N="File";E={$Name}}, DataSourceName, CommandType, CommandText | Export-Csv Test.csv -notype   

I know you asked for Python; but I figured Powershell's built in xml handling capabilities would make this fairly simple. While I'm sure it is not guru level, I think it came out pretty nicely (the lines starting with # are comments):

# The directory to search 
$searchpath = "C:\"

# List all rdl files    from the given search path recusrivley searching sub folders, store results into a variable
$files = gci $searchpath -recurse -filter "*.rdl" | SELECT FullName, DirectoryName, Name 

# for each of the found files pass the folder and file name  and the xml content
$files | % {$Directory = $_.DirectoryName; $Name = $_.Name; [xml](gc $_.FullName)}
            # in the xml content navigate to the the DataSets Element
            | % {$_.Report.DataSets} 
                    # for each query retrieve the Report directory , File Name, DataSource Name, Command Type, Command Text output thwese to a csv file
                    | % {$_.DataSet.Query} | SELECT  @{N="Path";E={$Directory}}, @{N="File";E={$Name}}, DataSourceName, CommandType, CommandText | Export-Csv Test.csv -notype   
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文