Displaying data from a summarization dataset in TensorFlow (using TensorFlow Datasets)

Posted on 2025-02-03 09:09:39

I'm new to machine learning and to using the TensorFlow module in Python.

I'm currently working on summarization, and the TensorFlow Datasets library has many convenient datasets for training summarizers. However, I want to look at their contents before choosing one in particular. Does anyone know how to display a dataset as a table in the Python console?

So far, I have the example code (for the Opinosis dataset) from the TensorFlow website, which is the following:

# Copyright 2022 The TensorFlow Datasets Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Opinosis Opinion Dataset."""

import os

import tensorflow as tf
import tensorflow_datasets.public_api as tfds

_CITATION = """
@inproceedings{ganesan2010opinosis,
  title={Opinosis: a graph-based approach to abstractive summarization of highly redundant opinions},
  author={Ganesan, Kavita and Zhai, ChengXiang and Han, Jiawei},
  booktitle={Proceedings of the 23rd International Conference on Computational Linguistics},
  pages={340--348},
  year={2010},
  organization={Association for Computational Linguistics}
}
"""

_DESCRIPTION = """
The Opinosis Opinion Dataset consists of sentences extracted from reviews for 51 topics.
Topics and opinions are obtained from Tripadvisor, Edmunds.com and Amazon.com.
"""

_URL = "https://github.com/kavgan/opinosis-summarization/raw/master/OpinosisDataset1.0_0.zip"

_REVIEW_SENTS = "review_sents"
_SUMMARIES = "summaries"


class Opinosis(tfds.core.GeneratorBasedBuilder):
  """Opinosis Opinion Dataset."""

  VERSION = tfds.core.Version("1.0.0")

  def _info(self):
    return tfds.core.DatasetInfo(
        builder=self,
        description=_DESCRIPTION,
        features=tfds.features.FeaturesDict({
            _REVIEW_SENTS: tfds.features.Text(),
            _SUMMARIES: tfds.features.Sequence(tfds.features.Text())
        }),
        supervised_keys=(_REVIEW_SENTS, _SUMMARIES),
        homepage="http://kavita-ganesan.com/opinosis/",
        citation=_CITATION,
    )

  def _split_generators(self, dl_manager):
    """Returns SplitGenerators."""
    extract_path = dl_manager.download_and_extract(_URL)
    return [
        tfds.core.SplitGenerator(
            name=tfds.Split.TRAIN,
            gen_kwargs={"path": extract_path},
        ),
    ]

  def _generate_examples(self, path=None):
    """Yields examples."""
    topics_path = os.path.join(path, "topics")
    filenames = tf.io.gfile.listdir(topics_path)
    for filename in filenames:
      file_path = os.path.join(topics_path, filename)
      topic_name = filename.split(".txt")[0]
      with tf.io.gfile.GFile(file_path, "rb") as src_f:
        input_data = src_f.read()
      summaries_path = os.path.join(path, "summaries-gold", topic_name)
      summary_lst = []
      for summ_filename in sorted(tf.io.gfile.listdir(summaries_path)):
        file_path = os.path.join(summaries_path, summ_filename)
        with tf.io.gfile.GFile(file_path, "rb") as tgt_f:
          data = tgt_f.read().strip()
          summary_lst.append(data)
      summary_data = summary_lst
      yield filename, {_REVIEW_SENTS: input_data, _SUMMARIES: summary_data}

Comments (1)

2025-02-10 09:09:40

That is the source code for the Opinosis dataset; you don't need to copy it into your own code. This should give you a good idea of how to use TensorFlow Datasets. Opinosis doesn't make much sense displayed as a table, so to get an idea of its contents I would just print a few examples. For example:

import tensorflow_datasets as tfds

# Load the train split together with the dataset metadata.
ds, info = tfds.load('opinosis', split='train', with_info=True)

# Print the first three examples.
for example in ds.take(3):
  print(example)

If you really want to see a table, you can use:

print(tfds.as_dataframe(ds.take(3), info))
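
Part of the reason Opinosis reads poorly as a table is that each `review_sents` value is a long byte string and each `summaries` value is a list of byte strings. As a rough sketch of one way around that, the helper below decodes and truncates each field before laying the examples out in columns. The names `to_text` and `format_table` and the sample record are made up for illustration and are not part of TensorFlow Datasets; the code uses only the standard library and assumes every example is a dict with the same keys.

```python
from typing import Any, Dict, Iterable


def to_text(value: Any, max_chars: int = 60) -> str:
    """Decode bytes, flatten sequences, and truncate long text for display."""
    if hasattr(value, "tolist"):  # NumPy arrays/scalars -> plain Python objects
        value = value.tolist()
    if isinstance(value, bytes):
        value = value.decode("utf-8", errors="replace")
    elif isinstance(value, (list, tuple)):
        value = " | ".join(to_text(v, max_chars) for v in value)
    text = " ".join(str(value).split())  # collapse newlines and repeated spaces
    if len(text) > max_chars:
        text = text[: max_chars - 3] + "..."
    return text


def format_table(examples: Iterable[Dict[str, Any]], max_chars: int = 60) -> str:
    """Render example dicts as a plain-text table, one row per example."""
    rows = [{k: to_text(v, max_chars) for k, v in ex.items()} for ex in examples]
    if not rows:
        return ""
    columns = list(rows[0])
    widths = {c: max(len(c), *(len(r[c]) for r in rows)) for c in columns}
    header = " | ".join(c.ljust(widths[c]) for c in columns)
    rule = "-+-".join("-" * widths[c] for c in columns)
    body = [" | ".join(r[c].ljust(widths[c]) for c in columns) for r in rows]
    return "\n".join([header, rule, *body])


# Hypothetical record shaped like an Opinosis example (not real dataset content).
sample = [
    {
        "review_sents": b"The battery lasts all day.\nCharging is quick too.",
        "summaries": [b"Long battery life.", b"Charges fast."],
    }
]
print(format_table(sample, max_chars=40))
```

With TensorFlow Datasets installed, something like `print(format_table(tfds.as_numpy(ds.take(3))))` should work on the real data, since `tfds.as_numpy` yields dicts of NumPy values; that call is untested here and is offered only as a starting point.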