Adding thousands separators in a DataFrame in Julia

Posted on 2025-02-13 22:26:45

I have a DataFrame with two columns, a and b. At the moment both look like column a, but I want to add separators so that column b looks like the table below. I have tried using the Format.jl package, but I haven't gotten the result I'm after. It may be worth mentioning that both columns are Int64 and the column names a and b are of type Symbol.

 a      |    b
150000  | 1500,00 
27      | 27,00
16614   | 166,14

Is there some other way to solve this than using Format.jl? Or is Format.jl the way to go?

Comments (3)

失而复得 2025-02-20 22:26:45

Assuming you want the commas in their typical positions rather than how you wrote them, this is one way:

julia> using DataFrames, Format

julia> f(x) = format(x, commas=true)
f (generic function with 1 method)

julia> df = DataFrame(a = [1000000, 200000, 30000])
3×1 DataFrame
 Row │ a       
     │ Int64
─────┼─────────
   1 │ 1000000
   2 │  200000
   3 │   30000

julia> transform(df, :a => ByRow(f) => :a_string)
3×2 DataFrame
 Row │ a        a_string  
     │ Int64    String
─────┼────────────────────
   1 │ 1000000  1,000,000
   2 │  200000  200,000
   3 │   30000  30,000

If you instead want the column replaced, use transform(df, :a => ByRow(f), renamecols=false).
If you just want the output vector rather than changing the DataFrame, you can use format.(df.a, commas=true).

You could write your own function f to achieve the same behavior, but you might as well use the one someone already wrote inside the Format.jl package.
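
As a rough sketch of what such a hand-written function could look like, the following uses only Base Julia. The name with_commas and its sep keyword are made up for illustration and are not part of Format.jl. The regex inserts the separator at every position that has a digit before it and a whole number of three-digit groups after it.

```julia
# Hypothetical helper (not from Format.jl): put `sep` between groups of three digits.
function with_commas(n::Integer; sep::AbstractString = ",")
    # (?<=\d)            a digit must precede the insertion point (so "-" is skipped)
    # (?=(?:\d{3})+$)    only complete groups of three digits may follow, up to the end
    return replace(string(n), r"(?<=\d)(?=(?:\d{3})+$)" => sep)
end

println(with_commas(1000000))
println(with_commas(1234567; sep = " "))
```

It can be applied in the same way as f above, for example transform(df, :a => ByRow(with_commas) => :a_string).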

However, once you transform your data to Strings as above, you won't be able to filter/sort/analyze the numerical data in the DataFrame. I would suggest that you apply the formatting in the printing step (rather than modifying the DataFrame itself to contain strings) by using the PrettyTables package. This can format the entire DataFrame at once.

julia> using DataFrames, PrettyTables

julia> df = DataFrame(a = [1000000, 200000, 30000], b = [500, 6000, 70000])
3×2 DataFrame
 Row │ a        b     
     │ Int64    Int64 
─────┼────────────────
   1 │ 1000000    500
   2 │  200000   6000
   3 │   30000  70000

julia> pretty_table(df, formatters = ft_printf("%'d"))
┌───────────┬────────┐
│         a │      b │
│     Int64 │  Int64 │
├───────────┼────────┤
│ 1,000,000 │    500 │
│   200,000 │  6,000 │
│    30,000 │ 70,000 │
└───────────┴────────┘
夏の忆 2025-02-20 22:26:45

(Edited to reflect the updated specs in the question)

julia> df = DataFrame(a = [150000, 27, 16614]);

julia> function insertdecimalcomma(n)
         if n < 100
           return string(n) * ",00"
         else
           return replace(string(n), r"(..)$" => s",\1")
         end
       end
insertdecimalcomma (generic function with 1 method)

julia> df.b = insertdecimalcomma.(df.a)

julia> df
3×2 DataFrame
 Row │ a       b       
     │ Int64   String  
─────┼─────────────────
   1 │ 150000  1500,00
   2 │     27  27,00
   3 │  16614  166,14

Note that the b column will necessarily be a String after this change, as integer types cannot store formatting information in them.

If you have a lot of data and find that you need better performance, you may also want to use the InlineStrings package:

julia> # same as before, up to the function definition

julia> using InlineStrings

julia> df.b = inlinestrings(insertdecimalcomma.(df.a))
3-element Vector{String7}:
 "1500,00"
 "27,00"
 "166,14"

This stores the b column's data as fixed-size strings (String7 type here), which are generally treated like normal Strings, but can be significantly better for performance.

孤独岁月 2025-02-20 22:26:45

Maybe someone will come here looking for a space as the thousands separator. I cooked up the following regex-based conversion:

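using Printf
# nbr is the number to format; first convert the float to a string (w/o scientific notation)
str = @sprintf("%.2f", nbr)
# groups of 1-3 digits that are followed by further complete groups of three digits
matches = eachmatch(r"\d{1,3}(?=(?:\d{3})+(?!\d))", str)
# the last three integer digits together with the decimal part
decimal_match = match(r"\d{3}\.\d+$", str)
# vcat the vector of matches with the decimal part, then join with " "
join(vcat([m.match for m in matches], decimal_match.match), " ")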

You only need to handle numbers below 100.00 separately, as the regex doesn't account for them (it's simplistic, but it does the job). I use this function as a formatter when pretty-printing a DataFrame.
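
A possible variant that also covers values below 100.00 is sketched here. It swaps the eachmatch/vcat/join steps for a single replace with lookarounds, so it is not the exact method described above. The function name space_thousands is hypothetical, and the sketch assumes two decimal places, as in the @sprintf call above.

```julia
using Printf

# Hypothetical helper: two decimals, with a space between groups of three integer digits.
function space_thousands(x::Real)
    str = @sprintf("%.2f", float(x))   # fixed notation, no scientific format
    # Insert a space wherever a digit precedes the position and complete groups
    # of three digits follow, ending right before the decimal point.
    return replace(str, r"(?<=\d)(?=(?:\d{3})+\.)" => " ")
end

println(space_thousands(1234567.891))
println(space_thousands(27.0))
```

Because the lookahead is anchored on the decimal point, short numbers simply produce no match and are returned unchanged, so no special case is needed for them.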
