Parsing a very busy space-delimited file
I'm trying to help my dad out -- he gave me an export from a scheduling application at his work. We are trying to see if we can import it into a MySQL database so he and his co-workers can collaborate on it online.
I've tried a number of different methods but none seem to work right -- and this is not my area of expertise.
Export can be seen here: http://roikingon.com/export.txt
Any help / advice on how to go about parsing this would be greatly appreciated!
Thanks !!
Comments (4)
I've made an attempt to write a (somewhat dynamic) fixed-width-column parser. Take a look: http://codepad.org/oAiKD0e7 (it's too long for SO, but it's mostly just "data").
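What I've noticed (_ = space): some values are padded with trailing spaces, like "hello___", while others are padded with leading spaces, like "___42".
If you want to use my code, there is still some work left to do.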
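To illustrate the fixed-width idea in a few lines, here is a minimal sketch that slices one line by column offsets and trims the padding. The column names, offsets, and widths in `$columns` are hypothetical and are not taken from export.txt or from the codepad code; the real layout has to be read off the file.

```php
<?php
// Hypothetical layout: column name => [start offset, width].
// These numbers are assumptions for illustration only.
$columns = [
    'name'  => [0, 8],
    'count' => [8, 5],
];

function parseFixedWidthLine(string $line, array $columns): array
{
    $record = [];
    foreach ($columns as $name => [$start, $width]) {
        // substr() may return false on short lines in PHP 7, hence the cast;
        // trim() removes the space padding on either side.
        $record[$name] = trim((string) substr($line, $start, $width));
    }
    return $record;
}

// "hello" padded on the right to 8 chars, "42" padded on the left to 5 chars
$record = parseFixedWidthLine('hello   ' . '   42', $columns);
```

Here `$record` holds `'hello'` under `name` and `'42'` under `count`. Slicing by position keeps values that contain internal spaces in one piece, which splitting on spaces cannot do.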
With that file structure you basically need to reverse engineer a proprietary format. Yes, it is space delimited, but the format does not follow any kind of standard like CSV, YAML etc. It is completely proprietary, with what seems to be a header and separate sections with headers of their own.
I think your best bet is to see if there's some other type of export that can be done, such as Excel or XML, and work from there. If there isn't, then see if there's an HTML output of some kind that can be screen scraped and pasted into Excel, and see what you get.
Due to everything I mentioned above it will be VERY difficult to massage the file in its current form into something that can be sensibly imported into a database. (Note that from the file structure a number of tables would be needed.)
You can use split with a regular expression (one or more spaces).
I will try and let you know.
There doesn't seem to be a structure to your data.
Try this:
preg_split("/ +/", $data);
This splits the line on runs of one or more spaces, so you get a nice array that you can process. But looking at your data, there is no structure, so you will have to know which array element corresponds to what data. Good luck.
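As a small sketch of what that call returns, the example below splits one made-up line. The sample line is an assumption (the real export.txt layout may differ), and the `PREG_SPLIT_NO_EMPTY` flag and the limit argument are additions not used in the answer above.

```php
<?php
// Hypothetical sample line; not copied from export.txt.
$line = '  4 caco   Some Company  ';

// "/ +/" matches runs of one or more spaces. PREG_SPLIT_NO_EMPTY drops the
// empty strings that leading and trailing spaces would otherwise produce.
$fields = preg_split('/ +/', $line, -1, PREG_SPLIT_NO_EMPTY);

// A value with internal spaces is split apart above. Passing a limit keeps
// the remainder of the line together in the last element.
$limited = preg_split('/ +/', trim($line), 3);
```

`$fields` comes out as `['4', 'caco', 'Some', 'Company']`, while `$limited` is `['4', 'caco', 'Some Company']`. The limit only helps when the multi-word value is the last field on the line.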
Open it with Excel as a space-delimited file and decide whether to treat consecutive delimiters as one. Then save it from Excel as a CSV, which will be comma-separated and easier to import into MySQL.
EDIT:
The guy who says to use preg_split on "/ +/" is giving you essentially the same answer as I just did above.
The question is what to do after that, then.
Have you determined yet how many "row types" there are? Once you've determined that and defined their characteristics it will be a lot easier to write some code to go through it.
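One rough way to survey the row types is to reduce each line to a "signature" of its first token and count how often each signature occurs. The sketch below does that on a few made-up lines; the helper name `rowSignature`, the sample lines, and the choice of signature are all assumptions for illustration.

```php
<?php
// Hypothetical helper: describe a line by the shape of its first token.
function rowSignature(string $line): string
{
    $tokens = preg_split('/ +/', trim($line), 2);
    $first = $tokens[0];
    if ($first === '') {
        return 'blank';
    }
    // Collapse digit runs to "9" and letter runs to "a" so that rows with
    // the same shape of first token share one signature.
    return preg_replace(['/[0-9]+/', '/[A-Za-z]+/'], ['9', 'a'], $first);
}

// Made-up lines; for the real file, file('export.txt', FILE_IGNORE_NEW_LINES)
// could supply $lines instead (assuming a local copy with that name).
$lines = ['4 caco Some Company', '4 abcd Other Company', '12.5 x', '', 'smuwtfa'];

$counts = array_count_values(array_map('rowSignature', $lines));
arsort($counts); // most frequent signatures first
```

Printing `$counts` with `print_r` gives a quick inventory of candidate row types. A signature that occurs only once or twice is more likely a header or a separator than a data row.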
If you save it as CSV, you can use the PHP fgetcsv function and related functions. For each row, you would check its type and perform operations depending on the type.
I noticed that your data rows could possibly be divided on whether or not the first column's data contains a "." so here's an example of how you might loop through the file.
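// $file_handle is assumed to come from fopen() on the CSV saved from Excel
while (($row = fgetcsv($file_handle)) !== false) {
    if ($row[0] === null) {
        continue; // fgetcsv() returns [null] for a blank line
    }
    if (strpos($row[0], '.') === false) {
        // first column has no ".": do something
    } else {
        // first column contains a ".": do something else
    }
}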
"do something" would be something like "CREATE TABLE table_$row[0]" or "INSERT INTO table", etc.
OK, and here are some more observations:
Your file is really like multiple files glued together. It contains multiple formats. Notice that all the rows starting with "4" have a 4-letter company abbreviation next, followed by the full company name. One of them is "caco". If you search for "caco", you find it in multiple "tables" within the file.
I also noticed "smuwtfa" (days of the week) sprinkled around.
Use clues like that to determine the logic of how to treat each row.
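As one example of turning such a clue into code, the sketch below recognizes the company rows described above: a leading "4", a 4-letter abbreviation, then the full name. The function name `parseCompanyRow`, the sample lines, and the assumption that the abbreviation is letters only are mine, not confirmed by the file.

```php
<?php
// Hypothetical helper: returns the parsed company row, or null when the
// line does not look like one.
function parseCompanyRow(string $line): ?array
{
    // "4", spaces, exactly four letters, spaces, then the rest as the name
    if (preg_match('/^\s*4 +([A-Za-z]{4}) +(.+?)\s*$/', $line, $m) !== 1) {
        return null;
    }
    return ['abbr' => $m[1], 'name' => $m[2]];
}

// Made-up lines for illustration
$company = parseCompanyRow('4 caco   Some Company  ');
$other   = parseCompanyRow('12.5 x');
```

Here `$company` has `'caco'` as `abbr` and `'Some Company'` as `name`, and `$other` is `null`. One small recognizer per row type, tried in turn on each line, keeps the per-format logic separate and easy to adjust as more of the file is understood.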