获取表中所有记录的样本

发布于 2024-12-21 21:32:36 字数 411 浏览 2 评论 0原文

我有一个表,其中有几百万条记录。我想要进行采样,返回表中的每条记录。我当前的解决方案如下所示:

myQuery.Where((rec, index) => index % interval == 0);

但是 Linq to Entities 不支持此操作并引发异常。我也尝试过这个:

myQuery.Select((rec, index) => new { Index = index, Record = rec })
       .Where(x => x.Index % interval == 0);

但是 Linq to Entities 也不支持这一点。

即使确实如此,它看起来也不是很优雅。还有另一种方法可以做到这一点吗?

I have a table with a few million records in it. I want to get a sampling, where I return every nth record in the table. My current solution looks like this:

myQuery.Where((rec, index) => index % interval == 0);

However Linq to Entities does not support this and throws an exception. I also tried this:

myQuery.Select((rec, index) => new { Index = index, Record = rec })
       .Where(x => x.Index % interval == 0);

But Linq to Entities does not support that either.

Even if it did work that way, it doesn't seem very elegant. Is there another way of doing this?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

锦欢 2024-12-28 21:32:36

这里有一个类似的 Linq to SQL 讨论 LINQ to SQL Every Nth Row From Table,看看这个。

我认为 SQL 确实是一条出路。在 EF 中,您可以将 SQL 放入 DefiningQuery

There is a similar Linq to SQL discussion going on here LINQ to SQL Every Nth Row From Table, check this out.

I think SQL is really the way to go. In EF you can put your SQL in a DefiningQuery

许仙没带伞 2024-12-28 21:32:36

您似乎没有定义 n 作为表索引的顺序。由于表中的数据是无序的,如果不指定顺序,记录号是没有意义的,因此第n条记录是未定义的。这是否意味着您只想要某个随机百分比的记录?如果是这样,那么随机抽取 1% 的数据样本的示例将如下所示:

Customer[] onePercentSample = db.Customers.Take(db.Customer.Count() / 100).ToArray();

这实际上并不能保证是随机样本。它只是从数据库中的所有记录中返回一个未定义的样本,该样本可能是也可能不是多个查询中的同一组。同样,这是由于表是无序记录集的性质造成的。

It doesn't seem like you've defined an order by which n is the index of the table. Since data in a table in unordered, without specifying an order the record number is meaningless, and so the nth record is undefined. Does that mean you just want a certain random percent of the records? If so, an example of taking you a random one percent sample of the data would look like so:

Customer[] onePercentSample = db.Customers.Take(db.Customer.Count() / 100).ToArray();

This is not actually guaranteed to be a random sample. It simply returns an undefined sample out of all the records in the database, which could or could not be the same set across several queries. Again, this is due to the nature of a table being an unordered record set.

岁月如刀 2024-12-28 21:32:36

尝试按索引%间隔排序。如果有效,您可以将第一个间隔/总记录作为样本。

Try ordering by index%interval. If that works, you can take the first interval/total records as your sample.

半寸时光 2024-12-28 21:32:36

直接 SQL 方式可能是我的方式,但如果您有顺序 Id,则可以利用 Min 和 Max:

注意我使用 Linqpad 测试了这一点,它使用 linq to sql - 但我认为结果在 EF 中是相同的。

var a = ReceivedPayments.Select(c=>c.Id);
int interval = 50;
var ids = new List<int>();
int min = a.Min();
int max = a.Max();
for (int i = min; i < max; i++)
{
    if((i % interval) == 0)
    {
        ids.Add(i);
    } 
}

var b = ReceivedPayments.Where(c=>ids.Contains(c.Id));

生成的SQL

SELECT MIN([t0].[Id]) AS [value]
FROM [ReceivedPayments] AS [t0]
GO

SELECT MAX([t0].[Id]) AS [value]
FROM [ReceivedPayments] AS [t0]
GO

-- Region Parameters
DECLARE @p0 Int = 50
DECLARE @p1 Int = 100
DECLARE @p2 Int = 150
DECLARE @p3 Int = 200
DECLARE @p4 Int = 250
DECLARE @p5 Int = 300
DECLARE @p6 Int = 350
DECLARE @p7 Int = 400
DECLARE @p8 Int = 450
DECLARE @p9 Int = 500
DECLARE @p10 Int = 550
DECLARE @p11 Int = 600
DECLARE @p12 Int = 650
DECLARE @p13 Int = 700
DECLARE @p14 Int = 750
DECLARE @p15 Int = 800
DECLARE @p16 Int = 850
DECLARE @p17 Int = 900
DECLARE @p18 Int = 950
DECLARE @p19 Int = 1000
DECLARE @p20 Int = 1050
DECLARE @p21 Int = 1100
DECLARE @p22 Int = 1150
DECLARE @p23 Int = 1200
DECLARE @p24 Int = 1250
DECLARE @p25 Int = 1300
DECLARE @p26 Int = 1350
DECLARE @p27 Int = 1400
DECLARE @p28 Int = 1450
DECLARE @p29 Int = 1500
DECLARE @p30 Int = 1550
DECLARE @p31 Int = 1600
DECLARE @p32 Int = 1650
DECLARE @p33 Int = 1700
DECLARE @p34 Int = 1750
DECLARE @p35 Int = 1800
DECLARE @p36 Int = 1850
DECLARE @p37 Int = 1900
DECLARE @p38 Int = 1950
DECLARE @p39 Int = 2000
DECLARE @p40 Int = 2050
DECLARE @p41 Int = 2100
DECLARE @p42 Int = 2150
DECLARE @p43 Int = 2200
DECLARE @p44 Int = 2250
DECLARE @p45 Int = 2300
DECLARE @p46 Int = 2350
DECLARE @p47 Int = 2400
DECLARE @p48 Int = 2450
DECLARE @p49 Int = 2500
DECLARE @p50 Int = 2550
DECLARE @p51 Int = 2600
DECLARE @p52 Int = 2650
DECLARE @p53 Int = 2700
DECLARE @p54 Int = 2750
DECLARE @p55 Int = 2800
DECLARE @p56 Int = 2850
DECLARE @p57 Int = 2900
DECLARE @p58 Int = 2950
DECLARE @p59 Int = 3000
DECLARE @p60 Int = 3050
DECLARE @p61 Int = 3100
DECLARE @p62 Int = 3150
DECLARE @p63 Int = 3200
DECLARE @p64 Int = 3250
DECLARE @p65 Int = 3300
DECLARE @p66 Int = 3350
DECLARE @p67 Int = 3400
DECLARE @p68 Int = 3450
DECLARE @p69 Int = 3500
DECLARE @p70 Int = 3550
DECLARE @p71 Int = 3600
DECLARE @p72 Int = 3650
DECLARE @p73 Int = 3700
DECLARE @p74 Int = 3750
DECLARE @p75 Int = 3800
DECLARE @p76 Int = 3850
DECLARE @p77 Int = 3900
DECLARE @p78 Int = 3950
DECLARE @p79 Int = 4000
DECLARE @p80 Int = 4050
DECLARE @p81 Int = 4100
DECLARE @p82 Int = 4150
DECLARE @p83 Int = 4200
DECLARE @p84 Int = 4250
DECLARE @p85 Int = 4300
DECLARE @p86 Int = 4350
DECLARE @p87 Int = 4400
DECLARE @p88 Int = 4450
DECLARE @p89 Int = 4500
DECLARE @p90 Int = 4550
DECLARE @p91 Int = 4600
DECLARE @p92 Int = 4650
DECLARE @p93 Int = 4700
DECLARE @p94 Int = 4750
DECLARE @p95 Int = 4800
DECLARE @p96 Int = 4850
DECLARE @p97 Int = 4900
DECLARE @p98 Int = 4950
DECLARE @p99 Int = 5000
DECLARE @p100 Int = 5050
DECLARE @p101 Int = 5100
DECLARE @p102 Int = 5150
DECLARE @p103 Int = 5200
DECLARE @p104 Int = 5250
DECLARE @p105 Int = 5300
DECLARE @p106 Int = 5350
DECLARE @p107 Int = 5400
DECLARE @p108 Int = 5450
DECLARE @p109 Int = 5500
DECLARE @p110 Int = 5550
DECLARE @p111 Int = 5600
DECLARE @p112 Int = 5650
DECLARE @p113 Int = 5700
DECLARE @p114 Int = 5750
DECLARE @p115 Int = 5800
DECLARE @p116 Int = 5850
DECLARE @p117 Int = 5900
DECLARE @p118 Int = 5950
DECLARE @p119 Int = 6000
DECLARE @p120 Int = 6050
DECLARE @p121 Int = 6100
DECLARE @p122 Int = 6150
DECLARE @p123 Int = 6200
-- EndRegion
SELECT [t0].[Id], [t0].[TxnID], [t0].[TxnDate], [t0].[TotalAmount], [t0].[Memo], [t0].[AppliedToTxnTxnID], [t0].[AppliedToTxnTxnType], [t0].[AppliedToTxnAmount], [t0].[FQPrimaryKey], [t0].[RefNumber], [t0].[ARAccount_Id], [t0].[Customer_Id]
FROM [ReceivedPayments] AS [t0]
WHERE [t0].[Id] IN (@p0, @p1, @p2, @p3, @p4, @p5, @p6, @p7, @p8, @p9, @p10, @p11, @p12, @p13, @p14, @p15, @p16, @p17, @p18, @p19, @p20, @p21, @p22, @p23, @p24, @p25, @p26, @p27, @p28, @p29, @p30, @p31, @p32, @p33, @p34, @p35, @p36, @p37, @p38, @p39, @p40, @p41, @p42, @p43, @p44, @p45, @p46, @p47, @p48, @p49, @p50, @p51, @p52, @p53, @p54, @p55, @p56, @p57, @p58, @p59, @p60, @p61, @p62, @p63, @p64, @p65, @p66, @p67, @p68, @p69, @p70, @p71, @p72, @p73, @p74, @p75, @p76, @p77, @p78, @p79, @p80, @p81, @p82, @p83, @p84, @p85, @p86, @p87, @p88, @p89, @p90, @p91, @p92, @p93, @p94, @p95, @p96, @p97, @p98, @p99, @p100, @p101, @p102, @p103, @p104, @p105, @p106, @p107, @p108, @p109, @p110, @p111, @p112, @p113, @p114, @p115, @p116, @p117, @p118, @p119, @p120, @p121, @p122, @p123)

The direct SQL way is probably how I would go but if you have sequential Ids, you can leverage Min and Max:

Note I tested this with Linqpad which uses linq to sql - but I think the result will be the same in EF.

var a = ReceivedPayments.Select(c=>c.Id);
int interval = 50;
var ids = new List<int>();
int min = a.Min();
int max = a.Max();
for (int i = min; i < max; i++)
{
    if((i % interval) == 0)
    {
        ids.Add(i);
    } 
}

var b = ReceivedPayments.Where(c=>ids.Contains(c.Id));

Sql generated

SELECT MIN([t0].[Id]) AS [value]
FROM [ReceivedPayments] AS [t0]
GO

SELECT MAX([t0].[Id]) AS [value]
FROM [ReceivedPayments] AS [t0]
GO

-- Region Parameters
DECLARE @p0 Int = 50
DECLARE @p1 Int = 100
DECLARE @p2 Int = 150
DECLARE @p3 Int = 200
DECLARE @p4 Int = 250
DECLARE @p5 Int = 300
DECLARE @p6 Int = 350
DECLARE @p7 Int = 400
DECLARE @p8 Int = 450
DECLARE @p9 Int = 500
DECLARE @p10 Int = 550
DECLARE @p11 Int = 600
DECLARE @p12 Int = 650
DECLARE @p13 Int = 700
DECLARE @p14 Int = 750
DECLARE @p15 Int = 800
DECLARE @p16 Int = 850
DECLARE @p17 Int = 900
DECLARE @p18 Int = 950
DECLARE @p19 Int = 1000
DECLARE @p20 Int = 1050
DECLARE @p21 Int = 1100
DECLARE @p22 Int = 1150
DECLARE @p23 Int = 1200
DECLARE @p24 Int = 1250
DECLARE @p25 Int = 1300
DECLARE @p26 Int = 1350
DECLARE @p27 Int = 1400
DECLARE @p28 Int = 1450
DECLARE @p29 Int = 1500
DECLARE @p30 Int = 1550
DECLARE @p31 Int = 1600
DECLARE @p32 Int = 1650
DECLARE @p33 Int = 1700
DECLARE @p34 Int = 1750
DECLARE @p35 Int = 1800
DECLARE @p36 Int = 1850
DECLARE @p37 Int = 1900
DECLARE @p38 Int = 1950
DECLARE @p39 Int = 2000
DECLARE @p40 Int = 2050
DECLARE @p41 Int = 2100
DECLARE @p42 Int = 2150
DECLARE @p43 Int = 2200
DECLARE @p44 Int = 2250
DECLARE @p45 Int = 2300
DECLARE @p46 Int = 2350
DECLARE @p47 Int = 2400
DECLARE @p48 Int = 2450
DECLARE @p49 Int = 2500
DECLARE @p50 Int = 2550
DECLARE @p51 Int = 2600
DECLARE @p52 Int = 2650
DECLARE @p53 Int = 2700
DECLARE @p54 Int = 2750
DECLARE @p55 Int = 2800
DECLARE @p56 Int = 2850
DECLARE @p57 Int = 2900
DECLARE @p58 Int = 2950
DECLARE @p59 Int = 3000
DECLARE @p60 Int = 3050
DECLARE @p61 Int = 3100
DECLARE @p62 Int = 3150
DECLARE @p63 Int = 3200
DECLARE @p64 Int = 3250
DECLARE @p65 Int = 3300
DECLARE @p66 Int = 3350
DECLARE @p67 Int = 3400
DECLARE @p68 Int = 3450
DECLARE @p69 Int = 3500
DECLARE @p70 Int = 3550
DECLARE @p71 Int = 3600
DECLARE @p72 Int = 3650
DECLARE @p73 Int = 3700
DECLARE @p74 Int = 3750
DECLARE @p75 Int = 3800
DECLARE @p76 Int = 3850
DECLARE @p77 Int = 3900
DECLARE @p78 Int = 3950
DECLARE @p79 Int = 4000
DECLARE @p80 Int = 4050
DECLARE @p81 Int = 4100
DECLARE @p82 Int = 4150
DECLARE @p83 Int = 4200
DECLARE @p84 Int = 4250
DECLARE @p85 Int = 4300
DECLARE @p86 Int = 4350
DECLARE @p87 Int = 4400
DECLARE @p88 Int = 4450
DECLARE @p89 Int = 4500
DECLARE @p90 Int = 4550
DECLARE @p91 Int = 4600
DECLARE @p92 Int = 4650
DECLARE @p93 Int = 4700
DECLARE @p94 Int = 4750
DECLARE @p95 Int = 4800
DECLARE @p96 Int = 4850
DECLARE @p97 Int = 4900
DECLARE @p98 Int = 4950
DECLARE @p99 Int = 5000
DECLARE @p100 Int = 5050
DECLARE @p101 Int = 5100
DECLARE @p102 Int = 5150
DECLARE @p103 Int = 5200
DECLARE @p104 Int = 5250
DECLARE @p105 Int = 5300
DECLARE @p106 Int = 5350
DECLARE @p107 Int = 5400
DECLARE @p108 Int = 5450
DECLARE @p109 Int = 5500
DECLARE @p110 Int = 5550
DECLARE @p111 Int = 5600
DECLARE @p112 Int = 5650
DECLARE @p113 Int = 5700
DECLARE @p114 Int = 5750
DECLARE @p115 Int = 5800
DECLARE @p116 Int = 5850
DECLARE @p117 Int = 5900
DECLARE @p118 Int = 5950
DECLARE @p119 Int = 6000
DECLARE @p120 Int = 6050
DECLARE @p121 Int = 6100
DECLARE @p122 Int = 6150
DECLARE @p123 Int = 6200
-- EndRegion
SELECT [t0].[Id], [t0].[TxnID], [t0].[TxnDate], [t0].[TotalAmount], [t0].[Memo], [t0].[AppliedToTxnTxnID], [t0].[AppliedToTxnTxnType], [t0].[AppliedToTxnAmount], [t0].[FQPrimaryKey], [t0].[RefNumber], [t0].[ARAccount_Id], [t0].[Customer_Id]
FROM [ReceivedPayments] AS [t0]
WHERE [t0].[Id] IN (@p0, @p1, @p2, @p3, @p4, @p5, @p6, @p7, @p8, @p9, @p10, @p11, @p12, @p13, @p14, @p15, @p16, @p17, @p18, @p19, @p20, @p21, @p22, @p23, @p24, @p25, @p26, @p27, @p28, @p29, @p30, @p31, @p32, @p33, @p34, @p35, @p36, @p37, @p38, @p39, @p40, @p41, @p42, @p43, @p44, @p45, @p46, @p47, @p48, @p49, @p50, @p51, @p52, @p53, @p54, @p55, @p56, @p57, @p58, @p59, @p60, @p61, @p62, @p63, @p64, @p65, @p66, @p67, @p68, @p69, @p70, @p71, @p72, @p73, @p74, @p75, @p76, @p77, @p78, @p79, @p80, @p81, @p82, @p83, @p84, @p85, @p86, @p87, @p88, @p89, @p90, @p91, @p92, @p93, @p94, @p95, @p96, @p97, @p98, @p99, @p100, @p101, @p102, @p103, @p104, @p105, @p106, @p107, @p108, @p109, @p110, @p111, @p112, @p113, @p114, @p115, @p116, @p117, @p118, @p119, @p120, @p121, @p122, @p123)
听风吹 2024-12-28 21:32:36

您始终可以使用 Skip and Take 来完成此操作。但我敢打赌它的效率非常低。

int n = 10;
int currentIndex = 0;
yourEntity current;
List<yourEntity> lstSampling = new List<yourEntity>();

while((current = context.yourEntities.Skip(currentIndex).Take(1).FirstOrDefault()) != null)
{
    listSampling.Add(current);
    currentIndex += n;
}

You could always do it with Skip and Take. Bet its horribly inefficient though.

int n = 10;
int currentIndex = 0;
yourEntity current;
List<yourEntity> lstSampling = new List<yourEntity>();

while((current = context.yourEntities.Skip(currentIndex).Take(1).FirstOrDefault()) != null)
{
    listSampling.Add(current);
    currentIndex += n;
}
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文