How can I randomly simulate an array of records based on a representative record and property-specific generator functions?
When given a structure of one record, how can I randomly simulate n records of the same structure?
Example
Consider that I have an array of records such as:
[
{
"id": 12345,
"createdAt": "2021-12-25",
"data": {
"age": {"value": 25},
"height": {"value": 100},
"weight": {"value": 160},
"n_of_kids": {"value": 0},
"fam_status": {"value": "married"},
"preferred_pet": {"value": "dog"},
"preferred_color": {"value": "purple"},
"preferred_movie": {"value": "titanic"}
}
},
{...} // another record
]
My task: I want to simulate an array of n records of the same structure as the one above.
Note. I specifically want to find a solution that would work for any given structure. So while I'm aware that the structure given here is sub-optimal (e.g., the redundant value property doesn't add much), I still want to be able to account for any possible given structure.
One way I can approach this is by creating an object whose values are regular expressions that specify what each value should look like.
const structureTemplateRegex = {
id: /^[0-9]{5}$/, // 5-digit number
createdAt: /^\d{4}-(0[1-9]|1[012])-(0[1-9]|[12][0-9]|3[01])$/, // yyyy-mm-dd
data: {
age: { value: /^(?:[0-9]|[1-9][0-9]|100)$/ }, // 0-100
height: { value: /^(?:1[0-9]|[2-9][0-9]|1[0-9]{2}|2[01][0-9]|220)$/ }, // 10-220
weight: { value: /^(?:3[0-9]|[4-9][0-9]|[12][0-9]{2}|300)$/ }, // 30-300
n_of_kids: { value: /^[0-7]$/ }, // 0-7
fam_status: { value: /^(married|single|divorced|widowed)$/ },
preferred_pet: { value: /^(dog|cat|hamster|fish|rabbit|zebra)$/ },
preferred_color: { value: /^(red|green|yellow|black|orange|blue)$/ },
preferred_movie: {
value: /^(titanic|alien|se7en|batman|goodfellas|argo)$/,
},
},
};
Well, structureTemplateRegex might be good for validation, but not for generating data. So another way to approach the problem is to write a generator function for each property in the record.
const generateId = (n = 5) => [...Array(n)].map(_=>Math.random()*10|0).join`` // https://stackoverflow.com/a/70598339/6105259
const generateDate = (start = new Date(2018, 8, 9), end = new Date(2021, 11, 15)) => new Date(start.getTime() + Math.random() * (end.getTime() - start.getTime())).toISOString().slice(0,10); // months are 0-indexed (8 = September, 11 = December); https://stackoverflow.com/a/39472913/6105259
const randomInteger = (min, max) => Math.floor(Math.random() * (max - min + 1)) + min; // https://stackoverflow.com/a/29246176/6105259
const randomElement = (arr) => arr[(Math.random() * arr.length) | 0] // https://stackoverflow.com/a/38448710/6105259
const generateFamStatus = () => randomElement(["married", "single", "divorced", "widowed"])
const generatePet = () => randomElement(["dog", "cat", "hamster", "fish", "rabbit", "zebra"])
const generateColor = () => randomElement(["red", "green", "yellow", "black", "orange", "blue"])
const generateMovie = () => randomElement(["titanic", "alien", "se7en", "batman", "goodfellas", "argo"])
// and then
const structureTemplateGenerators = {
id: generateId(), // 5-digit number
createdAt: generateDate(), // yyyy-mm-dd
data: {
age: { value: randomInteger(0, 100) }, // 0-100 (randomInteger includes both ends)
height: { value: randomInteger(10, 220) }, // 10-220
weight: { value: randomInteger(30, 300) }, // 30-300
n_of_kids: { value: randomInteger(0, 7) }, // 0-7
fam_status: { value: generateFamStatus() },
preferred_pet: { value: generatePet() },
preferred_color: { value: generateColor() },
preferred_movie: {
value: generateMovie(),
},
},
};
But I'm not really sure how to proceed down this path. I have the materials I need, but not the technique. Essentially, what I want is a function that takes as parameters (1) the structure of one representative record and (2) n, the number of records to simulate, and returns an array of length n filled with randomly generated records.
// pseudocode
generateRecords(structureTemplateGenerators, 5) // but `n` could potentially be 10 or 10000 or 3e7
// would return
const possibleOutput = [
{
id: 12045,
createdAt: '2021-02-21',
data: {
age: { value: 15 },
height: { value: 80 },
weight: { value: 100 },
n_of_kids: { value: 1 },
fam_status: { value: 'widowed' },
preferred_pet: { value: 'dog' },
preferred_color: { value: 'purple' },
preferred_movie: { value: 'se7en' },
},
},
{
id: 39847,
createdAt: '2020-12-02',
data: {
age: { value: 33 },
height: { value: 56 },
weight: { value: 210 },
n_of_kids: { value: 3 },
fam_status: { value: 'married' },
preferred_pet: { value: 'zebra' },
preferred_color: { value: 'blue' },
preferred_movie: { value: 'argo' },
},
},
{
id: 22435,
createdAt: '2018-10-10',
data: {
age: { value: 25 },
height: { value: 103 },
weight: { value: 165 },
n_of_kids: { value: 5 },
fam_status: { value: 'married' },
preferred_pet: { value: 'dog' },
preferred_color: { value: 'green' },
preferred_movie: { value: 'titanic' },
},
},
{
id: 61194,
createdAt: '2019-04-10',
data: {
age: { value: 20 },
height: { value: 90 },
weight: { value: 100 },
n_of_kids: { value: 3 },
fam_status: { value: 'divorced' },
preferred_pet: { value: 'hamster' },
preferred_color: { value: 'blue' },
preferred_movie: { value: 'batman' },
},
},
{
id: 22231,
createdAt: '2021-10-01',
data: {
age: { value: 77 },
height: { value: 160 },
weight: { value: 69 },
n_of_kids: { value: 1 },
fam_status: { value: 'divorced' },
preferred_pet: { value: 'dog' },
preferred_color: { value: 'red' },
preferred_movie: { value: 'titanic' },
},
},
];
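A minimal sketch of the generateRecords pseudocode above, assuming the template is changed so that every leaf holds a zero-argument function instead of an already generated value. The helper name instantiate, the randomDate helper and the UTC date bounds are illustrative choices, not taken from the question. Because the walker only distinguishes functions, arrays, objects and other values, it does not depend on the particular field names or nesting depth.

```javascript
// Helpers adapted from the question; randomInteger is inclusive on both ends.
const randomInteger = (min, max) =>
  Math.floor(Math.random() * (max - min + 1)) + min;
const randomElement = (arr) => arr[Math.floor(Math.random() * arr.length)];
// Hypothetical date helper: a random instant in [start, end), formatted in UTC.
const randomDate = (start, end) =>
  new Date(start.getTime() + Math.random() * (end.getTime() - start.getTime()))
    .toISOString()
    .slice(0, 10); // "yyyy-mm-dd"

// Build one value from a template node: functions are called, arrays and
// objects are walked recursively, and any other value is copied unchanged.
const instantiate = (node) => {
  if (typeof node === "function") return node();
  if (Array.isArray(node)) return node.map(instantiate);
  if (node !== null && typeof node === "object") {
    return Object.fromEntries(
      Object.entries(node).map(([key, child]) => [key, instantiate(child)])
    );
  }
  return node;
};

// n independent records, each produced by a fresh walk of the template.
const generateRecords = (template, n) =>
  Array.from({ length: n }, () => instantiate(template));

// The question's template, with each generator call wrapped in an arrow
// function so it runs once per record instead of once in total.
const structureTemplateGenerators = {
  id: () => randomInteger(10000, 99999), // 5-digit number
  createdAt: () => randomDate(new Date("2018-09-09"), new Date("2021-12-15")),
  data: {
    age: { value: () => randomInteger(0, 100) },
    height: { value: () => randomInteger(10, 220) },
    weight: { value: () => randomInteger(30, 300) },
    n_of_kids: { value: () => randomInteger(0, 7) },
    fam_status: { value: () => randomElement(["married", "single", "divorced", "widowed"]) },
    preferred_pet: { value: () => randomElement(["dog", "cat", "hamster", "fish", "rabbit", "zebra"]) },
    preferred_color: { value: () => randomElement(["red", "green", "yellow", "black", "orange", "blue"]) },
    preferred_movie: { value: () => randomElement(["titanic", "alien", "se7en", "batman", "goodfellas", "argo"]) },
  },
};

const records = generateRecords(structureTemplateGenerators, 5);
```

Wrapping each generator in () => ... is what makes the values differ from record to record. The sketch assumes the template contains only plain objects, arrays, functions and primitives; a constant such as a Date instance would come out as {}, because Object.entries returns no entries for it.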
More Context
I have a JavaScript application that accepts an array of records and performs a number of aggregate calculations over that data. Ultimately, the app returns a JSON-like output. Knowing the structure of the input records, I want to randomly simulate an array containing n records. Such a procedure would let me test my app in terms of both (1) the reliability of its calculations and (2) its speed as n increases.
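For the reliability side, the regex template from the question could be reused to check generated (or real) records before they reach the app. The sketch below assumes every pattern is a RegExp literal without the g flag (a global regex keeps lastIndex between test calls); validate is a hypothetical helper name, and the template is trimmed to a few fields.

```javascript
// Walk a template whose leaves are RegExp objects and collect the dotted
// paths of values that are missing or do not match. Numbers are converted
// to strings before testing.
const validate = (record, template, path = "") => {
  const errors = [];
  for (const [key, rule] of Object.entries(template)) {
    const here = path ? `${path}.${key}` : key;
    const value = record?.[key];
    if (rule instanceof RegExp) {
      if (value === undefined || !rule.test(String(value))) errors.push(here);
    } else if (rule !== null && typeof rule === "object") {
      // Nested template: recurse, treating a missing subtree as empty.
      errors.push(...validate(value ?? {}, rule, here));
    }
  }
  return errors;
};

// Trimmed version of structureTemplateRegex.
const template = {
  id: /^[0-9]{5}$/,
  createdAt: /^\d{4}-(0[1-9]|1[012])-(0[1-9]|[12][0-9]|3[01])$/,
  data: {
    age: { value: /^(?:[0-9]|[1-9][0-9]|100)$/ },
    n_of_kids: { value: /^[0-7]$/ },
    fam_status: { value: /^(married|single|divorced|widowed)$/ },
  },
};

const record = {
  id: 12345,
  createdAt: "2021-12-25",
  data: { age: { value: 25 }, n_of_kids: { value: 0 }, fam_status: { value: "married" } },
};

validate(record, template); // returns []
validate({ ...record, id: 123 }, template); // returns ["id"]
```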
Answer
structureTemplateGenerators should be a function that returns a newly generated data structure each time it is called. Right now, it just creates a single structure whose values are already set, because every generator runs exactly once, when the object literal is evaluated.
Then you can create an array of the desired length and call that function via map, so that each element gets a new random record.
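A minimal sketch of this suggestion, with the template trimmed to a few fields for brevity; generateRecords and makeRecord are illustrative names.

```javascript
// Helpers from the question (randomInteger is inclusive on both ends).
const randomInteger = (min, max) => Math.floor(Math.random() * (max - min + 1)) + min;
const randomElement = (arr) => arr[Math.floor(Math.random() * arr.length)];

// A function, not an object: every call builds a record with new random values.
const structureTemplateGenerators = () => ({
  id: randomInteger(10000, 99999), // 5-digit number
  data: {
    age: { value: randomInteger(0, 100) },
    fam_status: { value: randomElement(["married", "single", "divorced", "widowed"]) },
  },
});

// Create an array of length n and fill each slot by calling the factory.
const generateRecords = (makeRecord, n) => Array.from({ length: n }, () => makeRecord());

const records = generateRecords(structureTemplateGenerators, 5);
console.log(records.length); // 5
```

Array.from({ length: n }, fn) calls fn once per index, so it does the same job as [...Array(n)].map(() => fn()) without building an intermediate array first.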