How do I read strings from a file when they already contain double quotes?

Posted 2025-01-30 10:18:45 · 664 characters · 3 views · 0 comments

I have a list of names in a .txt file which are in the format:

"Tim", "Dave", "Simon"

The input will always be single value names in quotes, comma separated and on a single line.

I want to read these into String[] names.

I have the following code, but the output puts each of them in double quotes, meaning it looks like:

""Tim"", ""Dave"", ""Simon""

I'm also not able to use any third party libs.

How do I get it so that each element in the String array only has one set of double quotes?

String[] names = {};

// arraylist to store strings
List<String> listOfStrings = new ArrayList<String>();

// load content of file based on specific delimiter
Scanner sc = new Scanner(new FileReader("names.txt")).useDelimiter(",");
String str;

while (sc.hasNext()) {
    str = sc.next();
    listOfStrings.add(str);
}


Comments (2)

倾听心声的旋律 2025-02-06 10:18:45

I have a list of names in a .txt file which are already in a String format:

They actually aren't; that is not 'string format'; there is in fact no such thing as 'string format'.

Given that the input file contains quotes and you know those quotes aren't literally part of the input, merely delimiting the input, we can reduce the reasonable guesses as to what format this actually is. Down to just two commonly used formats, in fact:

Standard CSV format

"CSV" (usually expanded as "Comma-Separated Values", sometimes "Character-Separated Values") is an extremely common data interchange format. Unfortunately, there is no single binding spec. But by far the most common 'take' on this format involves the following escaping rules:

  • A newline separates records.
  • Some specified character separates 2 items in a single record; usually a comma, a tab, or a semicolon - clearly a comma in your input.
  • So, what to do if one of the items contains, literally, a comma or newline? The usual answer is to enclose the item in quotes in this case, and sometimes a CSV output tool quote-delimits everything, even if it wasn't needed (such as, presumably, your example). However, this raises yet another question: what if the item itself contains quotes? Then the answer is to double them up. So, the literal string: Jane said: "Well, hello there!" becomes, in example.csv:
"Jane said: ""Well, hello there!"""

There's even an RFC describing this format: RFC 4180. It's short. Feel free to have a quick look at it.
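The quote-doubling rule described above can be sketched in a few lines of plain Java. This is only an illustration of the rule, not a full CSV implementation; the class name `CsvQuoting` and the methods `quote` and `unquote` are hypothetical names made up for this example.

```java
final class CsvQuoting {
    // Wrap a field in double quotes, doubling any quote characters inside it.
    static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Reverse of quote(): strip the outer quotes and collapse doubled quotes.
    static String unquote(String quoted) {
        if (quoted.length() < 2 || !quoted.startsWith("\"") || !quoted.endsWith("\"")) {
            return quoted; // not a quoted field; return it unchanged
        }
        return quoted.substring(1, quoted.length() - 1).replace("\"\"", "\"");
    }

    public static void main(String[] args) {
        String raw = "Jane said: \"Well, hello there!\"";
        String encoded = quote(raw);
        System.out.println(encoded);                      // the line as it would appear in example.csv
        System.out.println(unquote(encoded).equals(raw)); // round trip check
    }
}
```

Running `main` prints the encoded form of the Jane example, followed by the result of the round trip check.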

Backslash-escape CSV

An alternative, which has become more popular given that most programming languages have string literals that work like this, is to treat the backslash symbol as an escape symbol: a backslash is always followed by a character, and the pair together tells you what's actually intended based on a lookup table. The common escapes are:

  • \n -> That's a newline
  • \t -> a tab
  • \" -> a literal quote
  • \, -> a literal comma

and a few more (\r, \f, \b, \123, \u1234 are all somewhat common).
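As a rough illustration of the lookup-table idea, here is a minimal sketch that decodes only `\n`, `\t`, `\"`, `\,` and `\\`. The class name `BackslashUnescape` and the method `unescape` are hypothetical, and treating every unknown escape as "keep the character after the backslash" is an assumption of this sketch, not a rule from any spec.

```java
final class BackslashUnescape {
    // Decode a small, assumed set of backslash escapes: \n, \t, \", \, and \\.
    static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 == s.length()) {
                out.append(c); // ordinary character, or a lone trailing backslash
                continue;
            }
            char next = s.charAt(++i); // the character the backslash applies to
            switch (next) {
                case 'n': out.append('\n'); break;
                case 't': out.append('\t'); break;
                default:  out.append(next); // \" \, \\ and unknown escapes: keep the character itself
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The Java literal below is the 15 characters: Tim\, Dave\tend
        System.out.println(unescape("Tim\\, Dave\\tend"));
    }
}
```

The other escapes mentioned (`\r`, `\f`, `\b`, octal and `\u` forms) are deliberately left out to keep the sketch short.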

There's simply no way to know unless the source of this text file tells you which format it is, or by getting more complicated inputs that contain such strings. If you can control the actual literal text that is outputted, make a complicated string with newlines and commas and double quotes in the literal text, export it to this text file and see what it looks like.

So how do I parse this?

It's very complicated - code that properly parses it all is many pages long. You're in luck, though! Plenty of libraries exist.

The usual way to go is to use OpenCSV; tutorials that take you through how to use it are easy to find.
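Since the question rules out third-party libraries, a hand-rolled parser may be enough for the narrow case described there: one line, comma-separated, items in double quotes. The sketch below assumes the standard CSV convention (a doubled quote inside a quoted item means a literal quote) and assumes whitespace outside quotes can be dropped. It does not handle multi-line records or backslash escapes, and the names `SimpleCsvLine` and `parse` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class SimpleCsvLine {
    // Split one line into fields, honouring double-quoted fields and "" as an escaped quote.
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // doubled quote inside a quoted field
                    i++;                 // skip the second quote of the pair
                } else if (c == '"') {
                    inQuotes = false;    // closing quote
                } else {
                    current.append(c);   // includes commas inside quotes
                }
            } else if (c == '"') {
                inQuotes = true;         // opening quote
            } else if (c == ',') {
                fields.add(current.toString()); // separator ends the current field
                current.setLength(0);
            } else if (!Character.isWhitespace(c)) {
                current.append(c);       // unquoted content; whitespace outside quotes is dropped (assumption)
            }
        }
        fields.add(current.toString());  // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        String[] names = parse("\"Tim\", \"Dave\", \"Simon\"").toArray(new String[0]);
        System.out.println(String.join("|", names)); // prints Tim|Dave|Simon
    }
}
```

Anything beyond this simple shape (embedded newlines, other separators, malformed quoting) is where a real CSV library earns its keep.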

I just want one string with literally Tim, Dave, Simon

Well, that's just not what your input file says; clearly then your input file is in some unknown format, and you're going to have to explain how in the blazes you get from the notion that the text file contains "Tim", "Dave", "Simon" to desiring Tim, Dave, Simon in a single string variable. Perhaps the input is indeed in CSV and you simply want the items concatenated together, separated by commas. In which case, use OpenCSV to read it, and then write the very simple code required to concatenate the items. OpenCSV can give you a String[] to represent a 'line' of input - turning that into a single comma-separated string is easy:

// `opencsv` is assumed to be a com.opencsv.CSVReader opened on the input file
String[] csvLine = opencsv.readNext();
String output = String.join(", ", csvLine);
assert output.equals("Tim, Dave, Simon");

不弃不离 2025-02-06 10:18:45

Sorry. Actually this is better

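// `listOfStrings` and `str` are the variables from the question's loop; trim() drops the space left after each comma
listOfStrings.add(str.replace("\"", "").trim());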
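For completeness, here is a sketch of how that one-liner might fit into the question's Scanner loop and end up as a String[]. It assumes names never contain commas or quotes themselves, as the question states. The class name `ReadNames` and method `readNames` are hypothetical, and `main` reads from an in-memory string as a stand-in for `new Scanner(new FileReader("names.txt"))`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

final class ReadNames {
    // Read comma-separated, quoted names from the scanner and strip the quotes.
    static String[] readNames(Scanner sc) {
        List<String> names = new ArrayList<>();
        sc.useDelimiter(",");
        while (sc.hasNext()) {
            // Remove every double quote, then trim the space after the comma and any newline.
            String name = sc.next().replace("\"", "").trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Stand-in for the file contents; with a real file, pass new Scanner(new FileReader("names.txt")).
        try (Scanner sc = new Scanner("\"Tim\", \"Dave\", \"Simon\"\n")) {
            String[] names = readNames(sc);
            System.out.println(String.join("|", names)); // prints Tim|Dave|Simon
        }
    }
}
```

Note that this simply deletes all quote characters, so it is only appropriate while the input really is as simple as described.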