How can I parse key-value data from text using a provided list of property names and write it back in JSON format?

Posted on 2025-02-06 21:47:45

I'm trying to write a JSON file that contains the matched words from a text like this:

servicepoint ‏ 200135644 watchid ‏ 7038842

Each servicepoint and watchid should be inserted into the object table only once, using this code:

function readfile() {
  Tesseract.recognize('form.png', 'ara', {

    logger: m => console.log(m)

  }).then(({ data: { text } }) => {
    console.log(text); /* this line here */
    var obj = {
      table: []
    };
    const info = ['servicepoint', 'watchid'];
    for (k = 0; k < info.length; k++) {
      var result = text.match(new RegExp(info[k] + '\\s+(\\w+)'))[1];
      obj.table.push({
        servicepoint: /* Here i want to insert the number after servicepoint*/ ,
        watchid: /*also i want to insert the number after watchid to the Object table*/
      });
    }
    var json = JSON.stringify(obj); /* convert the object table to a JSON string */
    var fs = require('fs'); /* then write a JSON file that contains the data */
    fs.writeFile('myjsonfile.json', json, 'utf8', callback);
  })
};

Comments (2)

秋叶绚丽 2025-02-13 21:47:45

From one of my above comments ...

One could even drop the unicode flag and translate the above pattern back into /servicepoint\W+(\w+)/, which is pretty close to the OP's original code. The OP just needs to trade the \\s+ for a \\W+.
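To illustrate why the swap matters, here is a minimal sketch. It assumes that the invisible character between each key and its number in the sample text is a right-to-left mark (U+200F), written here as an explicit escape so it is visible. `\s` does not match that character, whereas `\W` matches any non-word character, including it.

```javascript
// Assumption: the OCR output separates key and value with " \u200F "
// (space, right-to-left mark, space), as the sample text appears to do.
const text = 'servicepoint \u200F 200135644 watchid \u200F 7038842';

// \s+ stops at the right-to-left mark, so the capture group never matches.
const withWhitespace = text.match(/servicepoint\s+(\w+)/);

// \W+ consumes every non-word character between the key and the number.
const withNonWord = text.match(/servicepoint\W+(\w+)/);

console.log(withWhitespace);  // null
console.log(withNonWord[1]);  // '200135644'
```

This also explains the error in the original code: `text.match(...)` returns `null`, so indexing it with `[1]` throws.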

In addition to the proposed pattern change, I would also split the OP's thenables into clearer tasks: file reading, data parsing/extraction, and JSON conversion/writing.

I also want to point out that the OP's desired data.table format cannot be an array; it has to be a pure key-value structure (an object). While iterating over the property names, one can either aggregate a single object (one entry at a time) or push single-property items into an array (one item at a time). (The OP tries to create a multi-entry object while also pushing it.)
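The two options can be sketched side by side. This is only an illustration; the `pairs` array stands in for values that have already been extracted, and the variable names are hypothetical.

```javascript
// Hypothetical, already-extracted key/value pairs.
const pairs = [
  ['servicepoint', '200135644'],
  ['watchid', '7038842'],
];

// Option 1: aggregate a single object, one entry per property name.
const asObject = pairs.reduce(
  (table, [key, value]) => Object.assign(table, { [key]: value }),
  {}
);

// Option 2: push one single-property item per property name into an array.
const asArray = pairs.map(([key, value]) => ({ [key]: value }));

console.log(JSON.stringify({ table: asObject }));
// {"table":{"servicepoint":"200135644","watchid":"7038842"}}
console.log(JSON.stringify({ table: asArray }));
// {"table":[{"servicepoint":"200135644"},{"watchid":"7038842"}]}
```

Only the first option yields one record that holds both values, which is what the question asks for.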

The code below shows the approach. The implementation follows the OP's original code; it just uses async-await syntax and mocks the file reading and writing processes.

async function readFile(fileName) {
  console.log({ fileName });

  // return await Tesseract.recognize(fileName, 'ara', {
  // 
  //   logger: m => console.log(m)
  // });

  // fake it ...
  return (await new Promise(resolve =>
    setTimeout(
      resolve,
      1500,
      { data: { text: 'servicepoint ‏ 200135644 watchid ‏ 7038842'} }
    )
  ));
}
/*async */function parseDataFromTextAndPropertyNames(text, propertyNames) {
  console.log({ text, propertyNames });

  return propertyNames
    .reduce((table, key) =>
      Object.assign(table, {

        [ key ]: RegExp(`${ key }\\W+(\\w+)`)
          .exec(text)?.[1] ?? ''

      }), {});
}
async function writeParsedTextDataAsJSON(fileName, table) {
  console.log({ table });

  // const fs = require('fs');
  // fs.writeFile(fileName, JSON.stringify({ table }), 'utf8', callback);

  // fake it ...
  return (await new Promise(resolve =>
    setTimeout(() => {

      console.log({ fileName, json: JSON.stringify({ table }) });
      resolve({ success: true });

    }, 1500)
  ));
}

console.log('... running ...');

(async () => {
  const { data: { text } } = await readFile('form.png');

  const data = /*await*/
    parseDataFromTextAndPropertyNames(text, ['servicepoint', 'watchid']);

  const result = await writeParsedTextDataAsJSON('myjsonfile.json', data);

  console.log({ result });
})();
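For completeness, the mocked writer above could be replaced by a real one. The following is a rough sketch, assuming a Node.js environment; it uses the promise-based `writeFile` from `fs/promises` instead of the callback form that is commented out above, and the function name `writeTableAsJSON` is made up for this example.

```javascript
const { writeFile } = require('node:fs/promises');

// Hypothetical non-mocked counterpart of writeParsedTextDataAsJSON.
// Serializes `{ table }` and writes it to `fileName` as UTF-8 text.
async function writeTableAsJSON(fileName, table) {
  const json = JSON.stringify({ table });

  // writeFile rejects on failure, so the caller can use try/catch.
  await writeFile(fileName, json, 'utf8');

  return { success: true, json };
}

// Usage inside an async function:
//   const result = await writeTableAsJSON('myjsonfile.json', data);
```

Because the promise rejects on I/O errors, no separate callback is needed; error handling can live in the calling async function.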

九公里浅绿 2025-02-13 21:47:45

You can use String.match to get the required variable values for your two keys: "servicepoint" and "watchid".

I would suggest using this match pattern to get your two data points.

Then you'll need to create an object from the matches and stringify it as JSON, giving you something like: {"servicepoint": 1323, "watchid": 234}

I assume you have many of these rows, so you'll want to add each key-value object to an array. You can then call JSON.stringify(dataArray) to generate the valid JSON text to write to a file.
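A minimal sketch of that multi-row idea follows. It assumes each row has the same shape as the sample in the question, with a right-to-left mark (U+200F) between key and value; the second row and the variable names are invented for illustration.

```javascript
// Assumed input: one "servicepoint ... watchid ..." pair per line.
// The second line is made-up sample data.
const text = [
  'servicepoint \u200F 200135644 watchid \u200F 7038842',
  'servicepoint \u200F 200135645 watchid \u200F 7038843',
].join('\n');

// The global flag is required by matchAll; \W+ skips spaces and the mark.
const rowPattern = /servicepoint\W+(\w+)\W+watchid\W+(\w+)/g;

// Build one object per matched row from the two capture groups.
const dataArray = Array.from(
  text.matchAll(rowPattern),
  ([, servicepoint, watchid]) => ({ servicepoint, watchid })
);

const json = JSON.stringify(dataArray);
console.log(json);
// [{"servicepoint":"200135644","watchid":"7038842"},{"servicepoint":"200135645","watchid":"7038843"}]
```

The captured values stay strings here; wrap them in `Number(...)` if numeric JSON values are preferred.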
