The common recommendation is "find/change all the code by hand". When the code base gets big, this gets to be a problem, as you have observed.
I'll note your problem is much like the Y2K problem, which I characterize as a special case of "field expansion" (which has already happened with phone numbers, license plates, bar codes, and transaction IDs in large-scale stock trading systems, and will happen with social security numbers).
What is ideally needed is a tool that can identify all the instances of the problem data and, for each instance, determine what code changes are needed there. For the Y2K problem, one had to find the date fields with 2-digit years and, for each occurrence of such data in the code, patch that code (e.g., expand data declarations to include 2 more digits, remove "19"+ string concatenations that generate 4-digit dates from 2-digit dates, etc.).
Finding the data itself can be hard. How do you know something is a date (or in your case, an extended ID)? Fundamentally you need to identify the sources or sinks of such data (e.g., date fields on screens, calls to get_current_year, comparisons to other things already known to be dates, etc.) and trace where that data flows (to arguments in calls, to assignments of copies, to prints, ...). [The Y2K guys also used X mod 100 == 0 as a hint that something was a year, because this computation is likely part of a leap-year check and therefore the data involved must be a year.]
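As a rough illustration of that last hint, here is a minimal sketch in Python (chosen only for illustration; it uses the standard ast module, and the names find_year_hints and SAMPLE are made up for this example). It scans source code for comparisons of the form name % 100 == 0 (or != 0) and reports the variable names involved as likely years. A real tool would then use these names as seeds for flow tracing; this sketch does none of that.

```python
import ast

def find_year_hints(source):
    """Return the names used as `name % 100 == 0` (or `!= 0`), a likely leap-year check."""
    hints = set()
    for node in ast.walk(ast.parse(source)):
        # Match comparisons shaped like  <name> % 100 == 0  or  <name> % 100 != 0
        if (isinstance(node, ast.Compare)
                and len(node.ops) == 1
                and isinstance(node.ops[0], (ast.Eq, ast.NotEq))
                and isinstance(node.left, ast.BinOp)
                and isinstance(node.left.op, ast.Mod)
                and isinstance(node.left.left, ast.Name)
                and isinstance(node.left.right, ast.Constant)
                and node.left.right.value == 100
                and isinstance(node.comparators[0], ast.Constant)
                and node.comparators[0].value == 0):
            hints.add(node.left.left.id)
    return hints

# Hypothetical code under analysis: `yr` is a year, `n` is not.
SAMPLE = """
def is_leap(yr):
    return yr % 4 == 0 and (yr % 100 != 0 or yr % 400 == 0)

def percent(n):
    return n * 100
"""

if __name__ == "__main__":
    print(find_year_hints(SAMPLE))
```

This is only a hint generator: it finds candidate variables but says nothing about where their values come from or go to, which is the job of the flow analysis discussed here.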
Then for each use of the data, you need to decide what to do about that use: leave it alone (date copies aren't wrong if they still work when the field is extended) or fix it (e.g., remove the addition of century prefixes). For your extended IDs, what matters is what kinds of operations are performed on IDs. Can they be torn apart into a digit section and an alphanumeric section? Does the first letter signify something by itself? Based on the answers to these questions, it is generally obvious what to do at each point of use in the code.
Now in fact, you can do all of the above by hand, and that's at least more organized than "give it to the programmers and let them do as they will".
But in fact, as with the Y2K adventure, you can get tools (much better than the Y2K tools) to automate most of this. Such tools have to be capable of processing the programming languages of interest (you didn't say which ones you have) with compiler-level semantic analysis (e.g., knowing the data types of the language), must be able to match sources and sinks of the data type, must be able to follow data flows ("flow analysis" in the language of the compiler community), and must be able to mechanically apply usage-specific transformations.
The tools that can do this are called program transformation systems. Most of these tools can apply source-to-source transformations like the following:
[The example format here is that of our DMS Software Reengineering Toolkit. We are assuming that 2-digit dates are represented as strings and that we want to find and fix them. A rule has a name (so human beings can refer to the specific rule of interest, just as functions in C have names) and a source pattern and a replacement pattern separated by the keywords rewrites to. The " characters surrounding the source and target patterns are meta-quotes; they indicate that the text inside them is written in the programming language named in the domain. The \" inside the meta-quotes is backslashed so that the language's own string quotes can appear inside the pattern. The \s represents a subexpression that is part of the concatenation expression. The pattern definition allows one to match possible sources of dates.]
So the rules describe how to handle each of the cases encountered, but they have to be qualified by use of entities of the appropriate datatype; you don't want the above rule to run on every string concatenation. Most of the existing program transformation tools don't provide much help here.
DMS does provide the ability, at least for C, Java and COBOL, to do quite serious data flow tracing. So you'd have to revise the rule:
rule remove_century_prefix(s: sum): expression -> expression
" \"19\"+\s "
rewrites to
" \s " if is_date(s);
where is_date checks whether a date flows into s, using DMS's built-in flow analysis machinery together with the patterns for recognizing the generation of a date described above.
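To make the idea of a qualified rewrite concrete outside of DMS, here is a minimal sketch using Python's standard ast module. It is not DMS and says nothing about how DMS implements this. The date_names set is a hand-supplied stand-in for is_date (a real tool would compute it by flow analysis), and the class and function names are made up for this example. It rewrites "19" + s to s only when s is one of the known date variables.

```python
import ast

class RemoveCenturyPrefix(ast.NodeTransformer):
    """Rewrite  "19" + s  to  s, but only when s is a known date variable."""

    def __init__(self, date_names):
        # Stand-in for is_date: a real tool would derive this set by flow analysis.
        self.date_names = set(date_names)

    def visit_BinOp(self, node):
        self.generic_visit(node)  # rewrite nested expressions first
        if (isinstance(node.op, ast.Add)
                and isinstance(node.left, ast.Constant)
                and node.left.value == "19"
                and isinstance(node.right, ast.Name)
                and node.right.id in self.date_names):
            return node.right  # drop the century prefix
        return node

def remove_century_prefix(source, date_names):
    """Parse the source, apply the qualified rewrite, and return new source text."""
    tree = RemoveCenturyPrefix(date_names).visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))  # needs Python 3.9+

if __name__ == "__main__":
    before = 'full_year = "19" + yy\nlabel = "19" + room_number'
    print(remove_century_prefix(before, {"yy"}))
```

Only the concatenation involving yy is rewritten; the one involving room_number is left alone, which is exactly the point of qualifying the rule. Note that ast.unparse regenerates the text and loses comments and layout, something a production transformation system has to preserve.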
Using such program transformation machinery, you can automate a large part of such field expansion tasks.
You can write a Java program that edits the field. Note that you will need separate reading and writing logic for each file type.
Also, for text files, you can use Windows search to find the files that contain the field, open all of them in Notepad++, and then run "find and replace" across all open files.
For Excel files and any other format that stores the data in a non-text (binary) form, it is better to edit them with a Java program using a library such as Apache POI.
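For the plain-text case, the same find-and-replace can be scripted instead of done by hand. The following is a minimal sketch in Python, under assumptions of my own: the ID-123456 format, the zero-padding to 8 digits, and the function name are all hypothetical, and files are assumed to be UTF-8. It does not handle Excel or other binary formats.

```python
import re
import tempfile
from pathlib import Path

def replace_in_text_files(root, pattern, replacement, suffixes=(".txt", ".csv")):
    """Apply a regex replacement to every text file under root.

    Only files whose extension is in `suffixes` are touched.
    Returns the list of files that were actually changed.
    """
    changed = []
    regex = re.compile(pattern)
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        original = path.read_text(encoding="utf-8")
        updated = regex.sub(replacement, original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed

if __name__ == "__main__":
    # Demo in a throwaway directory; the 6-digit "ID-" format is an assumption.
    with tempfile.TemporaryDirectory() as root:
        sample = Path(root) / "orders.txt"
        sample.write_text("order ID-123456 shipped\n", encoding="utf-8")
        replace_in_text_files(root, r"\bID-(\d{6})\b", r"ID-00\g<1>")
        print(sample.read_text(encoding="utf-8"))
```

Try this on a copy of the data first. A textual replace knows nothing about what the matched text means, so it will also change anything that merely looks like an ID.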
There is no pattern for this, since the IDs are validated.
You need to add support in one application at a time, which means that each application should be able to handle both kinds of IDs without breaking. Also add a configuration flag that can be set to start generating IDs in the new format (but don't enable it yet).
Do this for each application and test it.
When all applications have been tested, simply change their configurations so that they start generating the new IDs.
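A minimal sketch of what that could look like inside one application, in Python. The two formats (6 digits for old IDs, one letter plus 7 digits for new ones) and all names are assumptions for illustration, since the actual formats depend on your system.

```python
import itertools
import re

# Hypothetical formats: old IDs are 6 digits, new IDs are one letter plus 7 digits.
OLD_ID = re.compile(r"\d{6}")
NEW_ID = re.compile(r"[A-Z]\d{7}")

def is_valid_id(value):
    """Accept both formats so the application keeps working during the rollout."""
    return bool(OLD_ID.fullmatch(value) or NEW_ID.fullmatch(value))

class IdGenerator:
    """Generates IDs in the old format until the configuration flag is switched on."""

    def __init__(self, use_new_format=False):
        # Configuration flag: stays False until every application has been tested.
        self.use_new_format = use_new_format
        self._counter = itertools.count(1)

    def next_id(self):
        n = next(self._counter)
        return f"A{n:07d}" if self.use_new_format else f"{n:06d}"

if __name__ == "__main__":
    print(IdGenerator().next_id())                     # old format
    print(IdGenerator(use_new_format=True).next_id())  # new format
```

The important property is that reading is widened before writing: every application accepts both formats while all of them still produce only the old one, so flipping the flag later cannot break a consumer.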
Nobody will save you this time. Move ID validation into a set of shared components managed in one place; that way you'll be able to change the format quickly next time. If all or most of the applications are on a network, you can add "load new format definition" functionality and distribute regular expressions to all the apps out there this way.
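A minimal sketch of such a shared component, in Python. The JSON layout of the format definition (a version number plus a list of regular expressions) and the class name are assumptions of mine, and how the definition reaches each application over the network is left out.

```python
import json
import re

class IdValidator:
    """Shared component: every application validates IDs through this class."""

    def __init__(self, definition_json):
        self.load(definition_json)

    def load(self, definition_json):
        """Load a new format definition.

        Hypothetical layout: {"version": 2, "patterns": ["<regex>", ...]}
        """
        definition = json.loads(definition_json)
        # Compile everything first so a bad definition leaves the old one in place.
        patterns = [re.compile(p) for p in definition["patterns"]]
        self.version = definition["version"]
        self._patterns = patterns

    def is_valid(self, value):
        return any(p.fullmatch(value) for p in self._patterns)

# Example definitions; the concrete ID formats are made up for illustration.
V1 = json.dumps({"version": 1, "patterns": [r"\d{6}"]})
V2 = json.dumps({"version": 2, "patterns": [r"\d{6}", r"[A-Z]\d{7}"]})

if __name__ == "__main__":
    validator = IdValidator(V1)
    print(validator.is_valid("A1234567"))  # rejected under version 1
    validator.load(V2)                     # "load new format definition"
    print(validator.is_valid("A1234567"))  # accepted under version 2
```

With validation in one place, the next format change becomes a matter of publishing a new definition rather than hunting through every code base.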