How else can this code be optimized for defensive programming?
For my data structures project, the goal is to read in a provided file containing over 10000 songs with artist, title and lyrics clearly marked, and each song is separated by a line with a single double quote. I've written this code to parse the text file, and it works, with a running time of just under 3 seconds to
read the 422K lines of text
create a Song object
add said Song to an ArrayList
The parsing code I wrote is:
if (songSource.canRead()) { //checks to see if file is valid to read
    readIn = new Scanner(songSource);
    while (readIn.hasNextLine()) {
        do {
            readToken = readIn.nextLine();
            if (readToken.startsWith("ARTIST=\"")) {
                artist = readToken.split("\"")[1];
            }
            if (readToken.startsWith("TITLE=\"")) {
                title = readToken.split("\"")[1];
            }
            if (readToken.startsWith("LYRICS=\"")) {
                lyrics = readToken.split("\"")[1];
            } else {
                lyrics += "\n" + readToken;
            } //end individual song if block
        } while (!readToken.startsWith("\"")); //end inner do/while loop
        songList.add(new Song(artist, title, lyrics));
    } //end while not EOF
} //end if file can be read
I was talking with my Intro to Algorithms professor about the code for this project, and he stated that I should try to be more defensive in my code to allow for inconsistencies in data provided by other people. Originally I was using if/else blocks between the Artist, Title and Lyrics fields, and on his suggestion I changed to sequential if statements. While I can see his point, using this code example, how can I be more defensive about allowing for input inconsistencies?
I would replace e.g.:
with
Other modifications would include:
You are assuming that the input is perfect. Based on a quick read of your algorithm, the way your application is currently set up expects the data to look like this
But consider the case
Based on your algorithm, you now have 2 songs characterized as
With the current algorithm, the artist and title carry over from the first song and will show up in the 2nd song even though they were not defined for it. You need to reset your three variables.
In your else, you are just dumping the complete line into lyrics. What if you had already pulled the lyrics out? You are now overriding that. Test case
Consider sending this record to an error state, so that when the batch read is completed, an error report can be generated and the bad records fixed.
Also, you only consider EOF after an artist was read in. What if the EOF occurs during the artist read, and the file does not end in "? You are going to get an exception there. In your do/while, add another check for hasNextLine().
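The points above (reset the variables for every record, collect bad records for a later report, and check hasNextLine() inside the inner loop) could be sketched roughly as follows. This is only an illustration under assumptions: the class name DefensiveSongReader, the errors list and the firstQuoted helper are made up here, a String[] stands in for the Song class, and the separator test mirrors the original startsWith("\"") check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

/** Hypothetical sketch: each parsed song is a String[]{artist, title, lyrics}. */
final class DefensiveSongReader {
    /** Rejected records, collected so a report can be produced after the batch. */
    final List<String> errors = new ArrayList<>();

    List<String[]> read(Scanner in) {
        List<String[]> songs = new ArrayList<>();
        int lineNo = 0;
        while (in.hasNextLine()) {
            // Fresh state for every record, so nothing leaks from the previous song.
            String artist = null;
            String title = null;
            StringBuilder lyrics = null;
            boolean sawContent = false;
            boolean closed = false;
            int startLine = lineNo + 1;
            // Re-check hasNextLine() so a file that lacks the final '"' cannot throw.
            while (!closed && in.hasNextLine()) {
                String line = in.nextLine();
                lineNo++;
                if (line.startsWith("ARTIST=\"")) {
                    artist = firstQuoted(line);
                } else if (line.startsWith("TITLE=\"")) {
                    title = firstQuoted(line);
                } else if (line.startsWith("LYRICS=\"")) {
                    lyrics = new StringBuilder(firstQuoted(line));
                } else if (line.startsWith("\"")) {
                    closed = true; // separator line ends the record
                } else if (lyrics != null) {
                    lyrics.append('\n').append(line); // continuation of the lyrics
                } else if (!line.trim().isEmpty()) {
                    errors.add("line " + lineNo + ": text outside any field: " + line);
                }
                sawContent |= !line.trim().isEmpty();
            }
            if (!sawContent) {
                continue; // only blank lines were left before EOF
            }
            if (artist == null || title == null || lyrics == null || !closed) {
                errors.add("record starting at line " + startLine + " is incomplete");
            } else {
                songs.add(new String[] {artist, title, lyrics.toString()});
            }
        }
        return songs;
    }

    /** Text after the first '"' up to the next '"', or to end of line if none. */
    static String firstQuoted(String line) {
        int start = line.indexOf('"') + 1;
        int end = line.indexOf('"', start);
        return end < 0 ? line.substring(start) : line.substring(start, end);
    }
}
```

Whether an incomplete record should be dropped, as it is here, or kept with placeholder values depends on what the assignment expects. Either way, the errors list lets you see what was rejected once the whole file has been read.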
In the real world, there are some guarantees made regarding data integrity. In the case of dealing with user input (whether from stdin or a file), there is some project-defined paradigm for notifying the user of a problem that requires attention.
For instance, when a compiler compiling code or a shell executing a script encounters an inconsistency, it might halt and print the line containing the inconsistency, with a second line below it that uses the "^" symbol to indicate the location of the problem.
So here are some basic questions to ask yourself:
1. Is every line guaranteed to contain every field?
2. Is the ordering of the fields guaranteed?
If those are conditions of the input contract and are violated, you should ignore/report the line. If they are not conditions of the input, then you need to handle it, which you currently do not.
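As a rough illustration of the compiler-style report described above, a helper along these lines could print the offending line with a "^" marker under the problem column. The class name ParseDiagnostic and its format method are hypothetical, and the sketch assumes the line contains no tab characters, which would throw off the alignment.

```java
/** Hypothetical helper that formats a parse problem the way a compiler might. */
final class ParseDiagnostic {
    /**
     * Builds a three-line message: a summary, the offending line,
     * and a '^' placed under the zero-based column of the problem.
     */
    static String format(int lineNumber, String line, int column, String message) {
        StringBuilder sb = new StringBuilder();
        sb.append("line ").append(lineNumber).append(": ").append(message).append('\n');
        sb.append(line).append('\n');
        for (int i = 0; i < column; i++) {
            sb.append(' '); // pad up to the problem column
        }
        sb.append('^');
        return sb.toString();
    }
}
```

For example, ParseDiagnostic.format(3, "ARTOST=\"Sting\"", 0, "unknown field") yields the summary line, the input line, and a caret under its first character.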
I see a couple of things that are missing here, Jason.
I think the if/else was fine and it won't change the logic. However, you should restrict the scope of your variables as much as possible. By declaring artist, title, etc. inside of the while loop, they will be initialized to null (or whatever) so if an entry is missing the artist then it won't get the last entry's value.
Also, what happens if the title, artist, etc. has a quote in it? How is that handled? And what about the lyrics, which seem to span multiple lines?
What happens if there is an unknown field -- maybe a misspelling? It will be added to the end of Lyrics which doesn't seem right. Only once the LYRICS field has been found should you append to it. If lyrics is null then it will start with "null".
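On the embedded-quote question, one possible approach is to take everything between the first and the last quote on the line instead of using split("\"")[1], which stops at the first inner quote. The FieldValue class and its extract method below are hypothetical names, and the sketch assumes a single-line value always ends with a closing quote.

```java
/** Hypothetical helper for pulling a field value out of a KEY="value" line. */
final class FieldValue {
    /**
     * Returns the text between the first and last '"' on the line.
     * If there is only one '"', the value is treated as continuing on later
     * lines (as the lyrics do) and everything after that quote is returned.
     * Returns null when the line has no '"' at all.
     */
    static String extract(String line) {
        int first = line.indexOf('"');
        if (first < 0) {
            return null; // malformed: no opening quote
        }
        int last = line.lastIndexOf('"');
        if (last == first) {
            return line.substring(first + 1); // unterminated, e.g. first lyrics line
        }
        return line.substring(first + 1, last);
    }
}
```

Starting the lyrics from this returned value, and appending further lines only once it is non-null, also avoids the literal "null" prefix that string concatenation would otherwise produce.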
Here are some issues that could be addressed:
Your code assumes that there is no whitespace before (for example) "ARTIST", none around the "=" sign and so on.
Your code assumes that the keywords are in all-caps. Someone could use lowercase or mixed case.
Your code assumes that a line that does not start with keyword=\" is a continuation of the song's lyrics. But what if the user entered ARTOST="Sting"? Or what if the user tried to use two lines for an artist name?
Finally, I'm not convinced that replacing "else if" with "if" in this case has made any difference to the code's robustness.
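A sketch of a more forgiving keyword check, covering stray whitespace, mixed case and misspelled keywords, might look like this. The FieldKeyword class, its regular expression and the set of known keywords are assumptions made for illustration. Note that a lyrics line that happens to look like word="..." would also match, so this is a heuristic rather than a complete answer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that recognises KEY="... lines leniently. */
final class FieldKeyword {
    // Optional leading whitespace, a word, optional spaces around '=', then a quote.
    private static final Pattern FIELD = Pattern.compile("^\\s*([A-Za-z]+)\\s*=\\s*\"");
    private static final Set<String> KNOWN =
            new HashSet<>(Arrays.asList("ARTIST", "TITLE", "LYRICS"));

    /** Returns the upper-cased keyword, or null if the line is not a field line. */
    static String keywordOf(String line) {
        Matcher m = FIELD.matcher(line);
        return m.find() ? m.group(1).toUpperCase(Locale.ROOT) : null;
    }

    /** True only for the three keywords the format defines. */
    static boolean isKnown(String keyword) {
        return keyword != null && KNOWN.contains(keyword);
    }
}
```

With this, a line such as ARTOST="Sting" yields a keyword that isKnown rejects, so it can be reported instead of being silently appended to the lyrics.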
Deal with exceptions (I guess Scanner could throw InputMismatchException for an invalid character).
It looks like the do { } while (...) can loop endlessly if the file is ill-formed, and the end of the file is reached.
Nothing prevents artist or title from being empty.
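For the last point, a small check run before each Song is constructed could catch empty fields. The SongFieldCheck class and its problems method are hypothetical, and treating whitespace-only values as missing is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-construction check for the three song fields. */
final class SongFieldCheck {
    /** Returns a description of every missing or blank field; empty if all are fine. */
    static List<String> problems(String artist, String title, String lyrics) {
        List<String> out = new ArrayList<>();
        if (isBlank(artist)) {
            out.add("missing artist");
        }
        if (isBlank(title)) {
            out.add("missing title");
        }
        if (isBlank(lyrics)) {
            out.add("missing lyrics");
        }
        return out;
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }
}
```

A caller would add the song only when the returned list is empty, and otherwise log the problems along with the record's position in the file.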