差↓一点笑了


差↓一点笑了 2025-02-21 00:40:21

You can assign names after outer:

o <- outer(df.r$B, df.f$B, '-')
dimnames(o) <- list(df.r$ID, df.f$ID)
o
#     A_1 A_2  A_3
# B_1  40 -80 -160
# B_2 220 100   20
# B_3 350 230  150

or to each vector:

o <- outer(setNames(df.r$B, df.r$ID), setNames(df.f$B, df.f$ID), '-')
o
#     A_1 A_2  A_3
# B_1  40 -80 -160
# B_2 220 100   20
# B_3 350 230  150

Which melts well:

res <- reshape2::melt(o)
res
#   Var1 Var2 value
# 1  B_1  A_1    40
# 2  B_2  A_1   220
# 3  B_3  A_1   350
# 4  B_1  A_2   -80
# 5  B_2  A_2   100
# 6  B_3  A_2   230
# 7  B_1  A_3  -160
# 8  B_2  A_3    20
# 9  B_3  A_3   150

res[res$value >= 200 & res$value <= 300,]
#   Var1 Var2 value
# 2  B_2  A_1   220
# 6  B_3  A_2   230

Alternatively, you can do a cross join (a merge on no columns) followed by a simple assignment:

with(merge(df.f, df.r, by = c()),
     data.frame(Var1=ID.x, Var2=ID.y, Diff=B.y-B.x))
#   Var1 Var2 Diff
# 1  A_1  B_1   40
# 2  A_2  B_1  -80
# 3  A_3  B_1 -160
# 4  A_1  B_2  220
# 5  A_2  B_2  100
# 6  A_3  B_2   20
# 7  A_1  B_3  350
# 8  A_2  B_3  230
# 9  A_3  B_3  150

(and subset as desired).
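
For comparison, the same "all pairwise differences, then filter by range" idea can be sketched in plain Python. The input values here are hypothetical (the original df.r and df.f are not shown in this answer); they were picked only so that the differences agree with the output above.

```python
# Hypothetical inputs: the real df.r / df.f values are not shown in the answer.
df_f = {"A_1": 100, "A_2": 220, "A_3": 300}
df_r = {"B_1": 140, "B_2": 320, "B_3": 450}

# Every (r, f) pair with its difference r - f, in the same column-major
# order that melting the outer() matrix produces.
pairs = [
    (r_id, f_id, r_val - f_val)
    for f_id, f_val in df_f.items()
    for r_id, r_val in df_r.items()
]

# Keep only the differences inside the closed range [200, 300].
in_range = [p for p in pairs if 200 <= p[2] <= 300]
print(in_range)
```

With these assumed inputs, only the B_2/A_1 and B_3/A_2 pairs survive the filter, as in the R result.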

Pairwise subtraction of two numeric vectors and selection of the results within a given range

差↓一点笑了 2025-02-21 00:24:08

Of course there are many approaches, such as synchronous requests or promises, but in my experience the callback approach is the one to use. It fits naturally with the asynchronous behavior of JavaScript.

So your code snippet can be rewritten a little differently:

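function foo() {
    $.ajax({
        url: '...',
        success: function(response) {
            // Hand the response to the callback once it has arrived.
            myCallback(response);
        }
    });
    // Nothing useful can be returned here: the request has not finished yet.
}

function myCallback(response) {
    // Does something with the response.
}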

How do I return the response from an asynchronous call?

差↓一点笑了 2025-02-20 22:25:41

Your problem is that you never stop the timer: it stays active even after you close the workbook. When the minute is over, VBA wants to call a Sub (Save1) that is currently not available (because the workbook is closed), so VBA asks Excel to reopen the file so that it can execute the routine.

Adding a check for whether the workbook is open would not help, because by that time it is already open again.

What you need to do is stop the timer when you close the workbook. The event for that is Workbook_BeforeClose. Stopping a timer in VBA is a little tricky: you call the Application.OnTime method again and must supply exactly the same arguments that were used when the timer was last set, so both the name of the routine and the time need to match. The only difference is that you set the fourth parameter (Schedule) to False.

As a consequence, your code needs to keep track of the time that was passed to the last OnTime call. Have a look at the following code:

In the workbook module:

Private Sub Workbook_Open()
    StartTimer
End Sub

Private Sub Workbook_BeforeClose(Cancel As Boolean)
    StopTimer
End Sub

In a regular code module:

Option Explicit

Dim FireTime As Variant     ' Holds the time passed to the most recent OnTime call

Sub StartTimer()
    FireTime = Now + TimeValue("00:01:00")
    Application.OnTime FireTime, "Save1"
    ' ThisWorkbook.Sheets(1).Cells(1, 1) = FireTime
End Sub

Sub StopTimer()
    If Not IsEmpty(FireTime) Then
        Application.OnTime FireTime, "Save1", , Schedule:=False
        FireTime = Empty
    End If
End Sub

Sub Save1()
    Debug.Print "tick"
    ' put your actions here, eg saving 
    StartTimer  ' Schedule the next Timer event
End Sub

Condition to save the workbook only while it is open

差↓一点笑了 2025-02-20 01:55:02

You can use responsive flexbox:

  <div class="container">
    <div class="row">
      <div class="col p-3 rounded border text-center">
        <div class="fw-bolder">
          Item 1
        </div>
        <div>
          Desc Line 1
        </div>
        <div>
          Desc Line 2
        </div>
        <div>
          Desc Line 3
        </div>
        <a href="http://localhost:8000/products/1/edit" class="align-bottom">Edit</a>
      </div>
      <div class="col p-3 rounded border text-center d-flex flex-column justify-content-between">
        <div class="fw-bolder">
          Item 2
        </div>
        <div>
          Desc Line 1
        </div>

        <a href="http://localhost:8000/products/2/edit" class="align-text-bottom">Edit</a>
      </div>
    </div>
  </div>

The d-flex class turns the column into a flex container, and flex-column stacks its children vertically. You then only need the justify-content-between class, which distributes the children evenly along the main (here vertical) axis and so pushes the link to the bottom of the column.

How do I align a link to the bottom of a grid column?

差↓一点笑了 2025-02-18 09:48:40

Putting

 <StatusBar backgroundColor="#000000" barStyle="light-content"  />

worked for me

Android status bar color won't change from white

差↓一点笑了 2025-02-18 06:55:22

The easiest way, without changing your code structure, is not to print until the end. Use a StringBuilder (or StringBuffer) to hold the output, then strip what you don't want before you print.

import java.util.*;


public class Solution {


public static void main(String[] args) {
    Scanner scanner = new Scanner(System.in);

    StringBuilder sb = new StringBuilder();
    
    int T = scanner.nextInt();
    for(int i = 0; i <= T; i++){
        String SJ = scanner.nextLine();
        
        if(SJ.length() % 2 == 1){
            for(int k = 0; k < (SJ.length()+1)/2; k++){
                //System.out.print(SJ.charAt(k*2));
                sb.append(SJ.charAt(k*2));
            }
            
            //System.out.print(" ");
            sb.append(" ");
            
            for(int j = 0; j < (SJ.length()-1)/2; j++){
                //System.out.print(SJ.charAt(j*2+1));
                sb.append(SJ.charAt(j*2+1));
            }
        }
        else{
            for(int k = 0; k < SJ.length()/2; k++){
                //System.out.print(SJ.charAt(k*2));
                sb.append(SJ.charAt(k*2));
            }
            
            //System.out.print(" ");
            sb.append(" ");
            
            for(int j = 0; j < SJ.length()/2; j++){
                //System.out.print(SJ.charAt(j*2+1));
                sb.append(SJ.charAt(j*2+1));
            }
        }
        
        //System.out.print("\n");
        sb.append("\n");
       
    }

    String str = sb.toString().replaceFirst("^\\s+", "") ;
    
    System.out.print(str) ;
    
    scanner.close();
}

}
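
As a side note, the per-line transformation that the loops above perform (characters at even indices, a space, then characters at odd indices) can be sketched compactly with slicing. This is only an illustration in Python, not a replacement for the Java answer; the function name split_even_odd and the sample words are made up for the example.

```python
def split_even_odd(s: str) -> str:
    """Return the even-indexed characters, a space, then the odd-indexed ones."""
    return f"{s[::2]} {s[1::2]}"


# Build all output first and print once at the end, as the answer suggests.
lines = [split_even_odd(word) for word in ["Hacker", "Rank"]]
print("\n".join(lines))
```

Joining the collected lines with "\n" also avoids emitting a stray blank line at the start or end of the output.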

Extra line in the output

差↓一点笑了 2025-02-18 01:25:51

There's no such thing as a reference type without a lifetime attached, but Rust allows us to skip writing out the lifetime in common situations, called lifetime elision.

fn return_ref(passed_ref: &str) -> &str

That code is okay because the compiler sees "one reference in, one reference out" (actually any number of references out) and concludes that this should be treated exactly as if you wrote

fn return_ref<'a>(passed_ref: &'a str) -> &'a str

making the two references' lifetimes the same. But in

fn return_ref() -> &str {

there is no input lifetime to say what the output lifetime should be. Now, given the function's body, we actually do know a lifetime that it can have: 'static, since it's returning a reference to a string literal. But Rust never infers a lifetime (or (almost) any part of a function's signature) based on the function's body (that would make it too easy to change signatures and break callers), so we have to write it explicitly:

fn return_ref() -> &'static str {
    let local_ref = &"world"[..];
    local_ref
}

But what about your last function that did compile with an unused argument? Why is this allowed? Well, the elision works the same,

fn return_ref<'a>(passed_ref: &'a str) -> &'a str {
    let local_ref = &"world"[..];
    local_ref
}

and you are allowed to use a reference as if it had a shorter lifetime. So, the local_ref, when returned, has its 'static lifetime shortened to the 'a lifetime. For an example that makes more sense than the one you came up with, you could write this function:

fn display_value<'a>(value: Option<&'a str>) -> &'a str {
    match value {
        Some(s) => s,
        None => "null",
    }
}

This returns strings from two possible sources depending on the input, and that's fine, since all of the string data lives for at least 'a.

Returning a reference only works when a reference is passed in

差↓一点笑了 2025-02-17 17:42:35
local HEIGHT = 6
local WIDTH  = 6

local State = {
    "C1", "--", "--", "--", "--", "C2",
    "--", "--", "--", "o",  "--", "--",
    "--", "--", "--", "--", "--", "--",
    "--", "--", "--", "--", "--", "--",
    "--", "--", "--", "--", "--", "--",
    "C3", "--", "--", "--", "--", "C4"
}


function get_state(r, c)
        return State[(r - 1) * WIDTH + c]
end

for row = 1, HEIGHT do
        for col = 1, WIDTH do
                print(string.format("(%d, %d) -> %s", row, col, get_state(row, col)))
        end
end

This implements the general "treat a 1D data structure as a 2D grid" technique, described generically in the first link below and with a C example in the second, adjusted for Lua's 1-based tables.

https://softwareengineering.stackexchange.com/questions/212808/treating-a-1d-data-structure-as-2d-grid

http://www.bytemuse.com/post/using-a-1d-array-as-a-2d-array-in-c/
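
To make the index adjustment concrete, here is a minimal sketch of the same row-major mapping in a 0-based language (Python). The grid contents mirror the Lua table above; the helper name get_state is reused purely for illustration.

```python
WIDTH = 6
HEIGHT = 6

# Flat, row-major storage for a HEIGHT x WIDTH grid.
state = ["--"] * (WIDTH * HEIGHT)
state[0 * WIDTH + 0] = "C1"
state[0 * WIDTH + 5] = "C2"
state[1 * WIDTH + 3] = "o"
state[5 * WIDTH + 0] = "C3"
state[5 * WIDTH + 5] = "C4"


def get_state(r: int, c: int) -> str:
    """Look up the cell at 0-based row r and column c."""
    return state[r * WIDTH + c]


for row in range(HEIGHT):
    print(" ".join(get_state(row, col) for col in range(WIDTH)))
```

With 0-based indices the formula is simply r * WIDTH + c; the Lua version uses (r - 1) * WIDTH + c because both the row and the table index start at 1.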

How to get a value by row and column

差↓一点笑了 2025-02-17 12:53:12

Your problem is that you defined Var as a byte-sized variable, but you operate on it as if it were a dword-sized variable. This makes you read and write unrelated bytes next to the variable, which produces the strange numbers you observe.

To fix this, always operate on data with the correct data size. For example, do

movzx ebx, byte [Var]
inc ebx
mov [Var], bl

Note the asymmetry: we could have used mov bl, [Var], but writing to a partial register (bl being a part of ebx) can be slow. The movzx instruction writes the full register while fetching only a single byte from memory.

Or even simpler

inc byte [Var]

Incrementing a variable in NASM assembly

差↓一点笑了 2025-02-17 05:31:21
  1. You are reallocating unallocated memory.
  2. You are trying to concatenate a string onto an uninitialized buffer. After allocating the buffer, you need to either memset() it or set its first element to '\0'.
  3. You are trying to split a string into an array of strings, but you keep no information about how many tokens were written to the array. This will also be a problem when you need to iterate over the array. You can either write NULL to the last array element or pass a third, pointer argument to the split function in which the array length is stored.

P.S. There is an existing implementation of split that you can look at or use as a snippet.

The project is libxutils. Check the xstrsplit() function in the src/xstr.c file.

Example code looks something like this:

const char *pMyString = "this:is:test:string";

xarray_t *pTokens = xstrsplit(pMyString, ":");
size_t i, nUsed = XArray_Used(pTokens);

for (i = 0; i < nUsed; i++)
{
    char *pTok = (char *)XArray_GetData(pTokens, i);
    printf("token: %s, order id: %zu\n", pTok, i);
}

XArray_Destroy(pTokens);

I am trying to split a string in C and I don't know why my code crashes when doing strcat

差↓一点笑了 2025-02-17 02:04:24

You could do something like this:

# create a storage df; store this outside of your loop
total_df = pd.DataFrame()

# create your dataframe with results for one regression in the loop
results = pd.DataFrame({
                "Window":window_range,
                f"{t} Beta":beta_coef,
                f"{t}Beta STD": std_error,                
            },index=[0])

# concat the result in the loop like so
total_df = pd.concat([total_df, results], axis=0)

How do I concatenate rolling regression results | Python

差↓一点笑了 2025-02-16 13:39:08

"anywhere in string between @ and immediate next blank space" -->

 REGEXP "@\\w*off\\w*\\s"
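
A quick way to sanity-check the pattern outside the database is Python's re module, where \w and \s behave similarly for plain ASCII text. This is only an illustrative sketch: the sample strings are made up, and whether your MySQL version supports \w and \s in REGEXP is something to verify separately. Note that the doubled backslashes above are MySQL string-literal escaping; in a Python raw string a single backslash is used.

```python
import re

# "@", optional word characters, "off", optional word characters, then whitespace.
pattern = re.compile(r"@\w*off\w*\s")

# Hypothetical sample strings for illustration only.
samples = [
    "ping @takeoff now",   # "off" inside the word right after "@"
    "ping @take off now",  # "off" is a separate word, not part of the "@" token
    "mail x@office",       # no whitespace after the "@" token
]

for text in samples:
    print(repr(text), bool(pattern.search(text)))
```

The third sample shows a consequence of the trailing \s: an "@" token at the very end of the string is not matched, because no whitespace follows it.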

Regular expression for MySQL

差↓一点笑了 2025-02-16 04:47:25

To use the item slots, your wrapper component must "override" (pass through) both the regular slots and the scoped slots.

<!-- DataTable.vue -->
<template>
  <v-data-table
    ...
  >
    <template v-for="(_, name) in $slots">
      <template :slot="name">
        <slot :name="name"></slot>
      </template>
    </template>

    <template
      v-for="(_, name) in $scopedSlots"
      #[name]="data"
    >
      <slot
        :name="name"
        v-bind="data"
      ></slot>
    </template>
  </v-data-table>
</template>

How do I use Vuetify v-data-table item slots?

差↓一点笑了 2025-02-15 20:44:48

React Native does not render a bare boolean value. If you want to display a boolean in JSX, convert it to a string first:

<Text>{yourBooleanValue.toString()}</Text>

React Native: displaying a Firestore boolean value

差↓一点笑了 2025-02-15 19:26:03

Spark is a distributed processing framework. In addition to supporting the DataFrame functionality, it needs to run a JVM, a scheduler, and cross-process/machine communication, it spins up databases, and so on. So while the answer to your question is of course no, not everything is implemented the same way, the wider answer is that any distributed processing library naturally involves substantial overhead. A lot of work goes into reducing this overhead, but it will never be trivial.

Dask (another distributed processing library with a DataFrame implementation) has a great section on best practices. In it, the first recommendation is not to use dask unless you have to:

Parallelism brings extra complexity and overhead. Sometimes it’s necessary for larger problems, but often it’s not. Before adding a parallel computing system like Dask to your workload you may want to first try some alternatives:

  • Use better algorithms or data structures: NumPy, Pandas, Scikit-Learn may have faster functions for what you’re trying to do. It may be worth consulting with an expert or reading through their docs again to find a better pre-built algorithm.

  • Better file formats: Efficient binary formats that support random access can often help you manage larger-than-memory datasets efficiently and simply. See the Store Data Efficiently section below.

  • Compiled code: Compiling your Python code with Numba or Cython might make parallelism unnecessary. Or you might use the multi-core parallelism available within those libraries.

  • Sampling: Even if you have a lot of data, there might not be much advantage from using all of it. By sampling intelligently you might be able to derive the same insight from a much more manageable subset.

  • Profile: If you’re trying to speed up slow code it’s important that you first understand why it is slow. Modest time investments in profiling your code can help you to identify what is slowing you down. This information can help you make better decisions about if parallelism is likely to help, or if other approaches are likely to be more effective.

There's a very good reason for this. In-memory, single-threaded applications are always going to be much faster for small datasets.

Very simplistically, if you imagine the single-threaded runtime for your workflow is T, the wall time of a distributed workflow will be T_parallelizable / n_cores + T_not_parallelizable + overhead. For PySpark, this overhead is very significant. It's worth it a lot of the time, but it's not nothing.
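
To see how the simplified formula above plays out, here is a small sketch that evaluates it for a small and a large job. All numbers (runtimes, core count, and the fixed 5-second overhead) are made-up assumptions for illustration, not measurements of Spark or any other system.

```python
def distributed_wall_time(t_parallelizable, t_not_parallelizable, n_cores, overhead):
    """Simplified model: T_parallelizable / n_cores + T_not_parallelizable + overhead."""
    return t_parallelizable / n_cores + t_not_parallelizable + overhead


# Hypothetical workloads as (parallelizable seconds, serial seconds).
workloads = {"small": (0.8, 0.2), "large": (950.0, 50.0)}
N_CORES = 4      # assumed cluster size
OVERHEAD = 5.0   # assumed fixed startup/scheduling cost in seconds

for name, (t_par, t_ser) in workloads.items():
    single = t_par + t_ser
    distributed = distributed_wall_time(t_par, t_ser, N_CORES, OVERHEAD)
    print(f"{name}: single-threaded {single:.1f}s, distributed {distributed:.1f}s")
```

Under these assumptions the small job gets slower when distributed, because the fixed overhead dominates, while the large job benefits from the extra cores.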

Can everything that can be done with pandas DataFrames be reproduced with PySpark DataFrames?
