Indexes and performance

I'm new to GeoTools and facing this issue: I'm injecting about 2 MB of shapefile data (about 5800 entries) into PostGIS, and surprisingly it takes roughly 6 minutes to complete! Quite annoying, because my "real" data set may be up to 25 MB per shapefile group (shp, dbf, ...), with about 100 groups needed.

I was told that it might be an index issue, because PostgreSQL updates the table's indexes on each INSERT. Is there a way to "disable" these indexes during my bulk INSERTs and tell the database to create all the indexes at the end? Or is there a better way to do that?

Here is my code snippet:

Map<String, Object> shpparams = new HashMap<String, Object>();
shpparams.put("url", "file://" + path);
FileDataStore shpStore = (FileDataStore) shpFactory.createDataStore(shpparams);
SimpleFeatureCollection features = shpStore.getFeatureSource().getFeatures();
if (schema == null) {
    // Copy schema and change name in order to refer to the same
    // global schema for all files
    SimpleFeatureType originalSchema = shpStore.getSchema();
    Name originalName = originalSchema.getName();
    NameImpl theName = new NameImpl(originalName.getNamespaceURI(), originalName.getSeparator(), POSTGIS_TABLENAME);
    schema = factory.createSimpleFeatureType(theName, originalSchema.getAttributeDescriptors(), originalSchema.getGeometryDescriptor(),
            originalSchema.isAbstract(), originalSchema.getRestrictions(), originalSchema.getSuper(), originalSchema.getDescription());
    pgStore.createSchema(schema);
}
// String typeName = shpStore.getTypeNames()[0];
SimpleFeatureStore featureStore = (SimpleFeatureStore) pgStore.getFeatureSource(POSTGIS_TABLENAME);

// Add the shapefile features to the PostGIS table
DefaultTransaction transaction = new DefaultTransaction("create");
featureStore.setTransaction(transaction);
try {
    featureStore.addFeatures(features);
    transaction.commit();
} catch (Exception problem) {
    LOGGER.error(problem.getMessage(), problem);
    transaction.rollback();
} finally {
    transaction.close();
}
shpStore.dispose();

Thank you for your help!


So I tested your solutions, but none of them helped... The completion time is still the same. Here is my table definition:

  • fid serial 10
  • the_geom geometry 2147483647
  • xxx varchar 10
  • xxx int4 10
  • xxx varchar 3
  • xxx varchar 2
  • xxx float8 17
  • xxx float8 17
  • xxx float8 17

So I do not think that the problem is directly linked to my code or the database; maybe it is due to system limitations (RAM, buffers, ...). I will look into this in the next few days.

Do you have more ideas?

Comments (2)

我偏爱纯白色 2024-12-29 06:55:17

I'm back with the solution to this problem. After much investigation, I found that the physical network was the issue: with a local DB (local to the GeoTools app) there was no problem. The network added 200 to 300 milliseconds to each INSERT statement, and with the large amount of data injected into the DB that added up to a very long completion time!

So there is no problem with the original PostGIS config or my code snippet...

Thank you all for your participation.
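
(Editor's note, not part of the original answer: one common way to cut this kind of per-statement round-trip cost is to batch the INSERTs so that many rows travel over the network together. Below is a minimal plain-JDBC sketch of that idea, not the GeoTools feature store used in the question; the connection URL, table name, column names and SRID are placeholders invented for illustration.)

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchInsertSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder connection URL and credentials.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://dbhost:5432/mydb", "user", "password")) {
            conn.setAutoCommit(false); // one transaction for the whole import
            // Placeholder table/columns; the geometry is written from WKT with an assumed SRID of 4326.
            String sql = "INSERT INTO my_table (the_geom, name) VALUES (ST_GeomFromText(?, 4326), ?)";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                for (int i = 0; i < 5800; i++) {      // stand-in for the real feature loop
                    ps.setString(1, "POINT(" + i + " " + i + ")");
                    ps.setString(2, "feature-" + i);
                    ps.addBatch();
                    if ((i + 1) % 1000 == 0) {
                        ps.executeBatch();            // send 1000 rows per round trip
                    }
                }
                ps.executeBatch();                    // flush the remainder
            }
            conn.commit();
        }
    }
}

The PostgreSQL JDBC driver can also rewrite such batches into multi-row INSERT statements when the reWriteBatchedInserts=true connection property is set, and some GeoTools versions expose a batch insert size option on the PostGIS data store; whether either applies depends on the versions in use.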

毁我热情 2024-12-29 06:55:17

You can check if indexes or PK/FK constraints in the database are really the bottleneck with the following steps:

1) Make sure the data is inserted in a single transaction (disable autocommit)
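
As a minimal illustration of this step (a plain-JDBC sketch with a placeholder connection URL, table and column; the GeoTools snippet in the question should already behave this way, since addFeatures runs inside a single DefaultTransaction that is committed once):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class SingleTransactionImport {
    public static void main(String[] args) throws Exception {
        // Placeholder connection URL, table and column names.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://dbhost:5432/mydb", "user", "password")) {
            conn.setAutoCommit(false);                // no implicit commit after every INSERT
            try (PreparedStatement ps =
                     conn.prepareStatement("INSERT INTO my_table (name) VALUES (?)")) {
                for (int i = 0; i < 5800; i++) {
                    ps.setString(1, "feature-" + i);
                    ps.executeUpdate();               // still one statement per row...
                }
            }
            conn.commit();                            // ...but only one commit at the end
        }
    }
}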

2) Drop all indexes and re-create them after the data import (you cannot disable an index):

DROP INDEX my_index;
CREATE INDEX my_index ON my_table (my_column);

3) Drop or disable PK/FK constraints and re-create or re-enable them after the data import. You can skip the check of PK/FK constraints during the data import, without dropping them, with:

ALTER TABLE my_table DISABLE trigger ALL;
-- data import
ALTER TABLE my_table ENABLE trigger ALL;

The downside of this approach is that the PK/FK constraints are not checked for data that was inserted or updated while the check was disabled. Of course, the PK/FK constraints are also enforced for existing data when you re-create them after the data import.

You can also defer the check of PK/FK constraints to the end of a transaction. This is possible if and only if the PK/FK constraint is defined as deferrable (not the default):

ALTER TABLE my_table ADD PRIMARY KEY (id) DEFERRABLE INITIALLY DEFERRED;

START TRANSACTION;
-- data import
COMMIT; -- constraints are checked here

or

ALTER TABLE my_table ADD PRIMARY KEY (id) DEFERRABLE INITIALLY IMMEDIATE;

START TRANSACTION;
SET CONSTRAINTS ALL DEFERRED;
-- data import
COMMIT; -- constraints are checked here

EDIT:

To narrow down the cause of the problem, you can import the data with your application, make a database dump (with INSERT statements) and import that database dump again. This should give you an idea of how long the plain import takes and how much overhead the application adds.

Create a data-only dump of the database with INSERT statements (COPY statements would be faster, but your application also uses inserts so this is better for comparison):

pg_dump <database> --data-only --column-inserts -f data.sql

Create the empty database schema again and import the data (with basic timing):

date; psql <database> --single-transaction -f data.sql > /dev/null; date

Maybe you can get a little more insight into the problem with this.
