At least in substantial part, this is my day job. From your question, it seems you are thinking of the discipline of Machine Learning (rather than the broader rubric, AI). And I think your instincts are correct--an ML algorithm is ideally suited to fraud prediction/detection because it can generalize over a highly non-linear domain and it can adapt as new data is fed to it. Because of these two characteristics, it is far more difficult for fraudsters to discern the algorithm's "rules" for prediction--those rules are in fact a complexly reticulated set of soft constraints that change over time as the algorithm learns against new data. (I might suggest, though, setting aside A* unless you have a particular reason to believe pathfinding is a useful heuristic for your problem. I am reluctant to say there is no connection, but if there is, it's certainly an unorthodox one; I have never seen pathfinding applied to this sort of problem.)
The only fact you mentioned about the type of online fraud you are interested in identifying is multiple accounts held by a single user. No doubt a variety of techniques could be applied here, but I'll mention one analytical technique in particular because: (i) I have actually used it in the scenario you mentioned; and (ii) it is outside the scope of the other answers so far.
The technique is based in graph theory.
The premise: accounts which are owned by the same user are often best identified not by their individual behavior (clickstream) but by their relationship to one another--in other words by their network behavior.
An example: chip dumping in online poker. Here, an individual opens multiple new accounts on a poker site (using bogus information) and then claims the advertised bonus for each account (e.g., a deposit of $100 is matched by a $100 bonus). Of course, the bonus comes with highly restrictive "cash-out" rules, generally a threshold number of hands that must be played before the bonus becomes like cash and can be withdrawn from the player's account.
So the goal of chip dumping is to turn those bonus dollars into real cash. One person opens five separate accounts (as five different people), then opens one more "legitimate" account (using their genuine identity). These six players--again, actually just a single player--will play at one table against each other, and the five sham accounts will quickly lose their stacks to the legitimate account, which then cashes out its winnings: the cash-out restrictions on bonuses apply only to the accounts to which the bonuses were originally given, so the restrictions are completely circumvented.
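That fund-flow signature can be screened for directly. A minimal sketch, assuming access to hand-history records; the account names, amounts, and thresholds below are invented for illustration:

```python
from collections import defaultdict

# Hypothetical hand-history records: (loser, winner, amount).
transfers = [
    ("shamA", "real1", 95), ("shamB", "real1", 90),
    ("shamC", "real1", 100), ("shamD", "real1", 85),
    ("shamE", "real1", 92), ("real2", "real3", 40),
]

# Tally each account's winnings broken down by donor account.
inflow = defaultdict(lambda: defaultdict(float))
for loser, winner, amount in transfers:
    inflow[winner][loser] += amount

def dump_suspects(inflow, min_donors=4, min_total=300):
    """Flag accounts whose winnings come from many distinct donors
    feeding a large combined total -- the chip-dump signature."""
    suspects = []
    for winner, donors in inflow.items():
        if len(donors) >= min_donors and sum(donors.values()) >= min_total:
            suspects.append(winner)
    return suspects

print(dump_suspects(inflow))  # -> ['real1']
```

On its own this only catches the crudest version of the scheme (it says nothing about common ownership), but it is a cheap pre-filter before the network-level analysis described next.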
What's difficult about this type of scheme is that the illegal conduct is virtually impossible to detect on an individual-account basis--*the bad behavior, collusion, arises from the interaction of a group of commonly owned accounts*--in other words, the behavior of interest needs to be studied at the network level.
And therefore, graph theory is a natural framework for analysis.
The technique I have applied was based on an academic paper by Chau et al. at Carnegie Mellon, titled Detecting Fraudulent Personalities in Networks of Online Auctioneers (PDF).
The fraud scenario at the heart of this paper is this: a seller on eBay wishes to sell a very expensive item (which they likely don't even own and, in any event, have no intention of ever shipping) to a willing buyer. To induce the innocent buyer to willingly engage in the transaction, the fraudulent seller first acquires a very high (artificially high) reputation by engaging in a number of "successful" sales of items to a group of buyers; these buyers are often sham accounts controlled by the seller.
More specifically, the authors of this paper combine data across two levels (account level and network level) using a Belief Propagation algorithm over a Markov Random Field.
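To make the mechanics concrete, here is a toy loopy belief propagation sketch over a pairwise MRF with three account states (fraud, accomplice, honest). The compatibility values and node priors below are invented for illustration--they are not the paper's calibrated parameters:

```python
STATES = ["fraud", "accomplice", "honest"]

# Illustrative edge potential psi(s_i, s_j): fraudsters tend to trade
# with accomplices, honest users with honest users. Made-up numbers.
PSI = {
    ("fraud", "fraud"): 0.1,           ("fraud", "accomplice"): 0.6,
    ("fraud", "honest"): 0.3,          ("accomplice", "fraud"): 0.6,
    ("accomplice", "accomplice"): 0.1, ("accomplice", "honest"): 0.3,
    ("honest", "fraud"): 0.1,          ("honest", "accomplice"): 0.3,
    ("honest", "honest"): 0.6,
}

def belief_propagation(edges, priors, iters=10):
    """Toy loopy belief propagation over a pairwise Markov Random Field."""
    nbrs = {n: set() for n in priors}
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    # msgs[(i, j)][s]: message from node i to neighbor j about j's state s
    msgs = {(i, j): {s: 1.0 for s in STATES} for i in priors for j in nbrs[i]}
    for _ in range(iters):
        new = {}
        for (i, j) in msgs:
            out = {}
            for sj in STATES:
                total = 0.0
                for si in STATES:
                    incoming = 1.0
                    for k in nbrs[i] - {j}:
                        incoming *= msgs[(k, i)][si]
                    total += priors[i][si] * PSI[(si, sj)] * incoming
                out[sj] = total
            z = sum(out.values())
            new[(i, j)] = {s: v / z for s, v in out.items()}
        msgs = new
    beliefs = {}
    for i in priors:
        b = {s: priors[i][s] for s in STATES}
        for k in nbrs[i]:
            for s in STATES:
                b[s] *= msgs[(k, i)][s]
        z = sum(b.values())
        beliefs[i] = {s: v / z for s, v in b.items()}
    return beliefs

# A known-bad seller linked to two fresh "buyer" accounts: belief in
# "accomplice" concentrates on the buyers even with uninformative priors.
uniform = {"fraud": 1 / 3, "accomplice": 1 / 3, "honest": 1 / 3}
priors = {
    "seller": {"fraud": 0.9, "accomplice": 0.05, "honest": 0.05},
    "buyer1": dict(uniform),
    "buyer2": dict(uniform),
}
beliefs = belief_propagation([("seller", "buyer1"), ("seller", "buyer2")], priors)
print(max(beliefs["buyer1"], key=beliefs["buyer1"].get))  # -> accomplice
```

The point of the example is the direction of inference: account-level evidence (the seller's prior) propagates along transaction edges to implicate accounts that look clean in isolation.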
The signature graph structure, by the way, is known as a bipartite core, arising from a group of accounts which have a very high number of transactions among the members of this group, but very few outside of this group (i.e., with the rest of the eBay community).
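A cheap first screen for that structure is to measure, for a candidate group of accounts, how much of its transaction volume stays inside the group. The account names below are invented for illustration:

```python
def in_group_ratio(group, edges):
    """Fraction of a group's transactions that stay within the group.
    A ratio near 1.0 for a sizable group is the bipartite-core red flag."""
    group = set(group)
    inside = touching = 0
    for a, b in edges:
        if a in group and b in group:
            inside += 1       # both endpoints in the group
        elif a in group or b in group:
            touching += 1     # one endpoint in the group
    total = inside + touching
    return inside / total if total else 0.0

# A sham ring trades almost exclusively with itself; a normal user does not.
edges = [
    ("ring1", "ring2"), ("ring2", "ring3"), ("ring1", "ring3"),
    ("ring3", "ring1"), ("ring2", "outsider"),
    ("norm1", "a"), ("norm1", "b"), ("norm1", "c"),
]
print(in_group_ratio(["ring1", "ring2", "ring3"], edges))  # -> 0.8
print(in_group_ratio(["norm1"], edges))                    # -> 0.0
```

In practice the hard part is finding the candidate groups in the first place (the paper's propagation step does that); this ratio is just a way to score a group once you have one.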
If you have access to the users' game-movement logs, you could use clustering to group users who play similarly. Once you have the clusters, you could use the IP address to filter users inside each cluster.
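A sketch of that two-step idea, with made-up per-user feature vectors (e.g., aggression, hands per hour) and a naive greedy clusterer standing in for whatever clustering algorithm you actually use:

```python
from collections import defaultdict

def cluster(features, eps=0.15):
    """Greedy threshold clustering: assign a user to the first cluster
    whose center is within eps, else start a new cluster. A stand-in
    for k-means and friends."""
    centers, labels = [], {}
    for user, vec in features.items():
        for ci, center in enumerate(centers):
            dist = sum((a - b) ** 2 for a, b in zip(vec, center)) ** 0.5
            if dist <= eps:
                labels[user] = ci
                break
        else:
            centers.append(vec)
            labels[user] = len(centers) - 1
    return labels

def shared_ip_suspects(labels, ips):
    """Within each behavior cluster, group users by IP address and
    report groups with more than one account."""
    groups = defaultdict(list)
    for user, ci in labels.items():
        groups[(ci, ips[user])].append(user)
    return [users for users in groups.values() if len(users) > 1]

# Hypothetical play-style features and IP addresses.
features = {
    "alice": (0.30, 0.70), "bob": (0.90, 0.20),
    "clone1": (0.82, 0.25), "clone2": (0.85, 0.22),
}
ips = {"alice": "10.0.0.1", "bob": "10.0.0.2",
       "clone1": "10.0.0.9", "clone2": "10.0.0.9"}
print(shared_ip_suspects(cluster(features), ips))  # -> [['clone1', 'clone2']]
```

Note that the IP filter is a heuristic: shared IPs also arise legitimately (households, NAT), so treat hits as leads rather than proof.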
Another approach may be to use a supervised-learning algorithm like decision trees, IBk (k-nearest neighbors), etc. But for this to work you need a training set with samples of users you already know have cheated.
You can use the Weka data-mining software to find patterns in the data, and it has an option to connect directly to a database. It includes clustering, decision trees, IBk, and many other algorithms to try. But you need a basic understanding of each algorithm in order to interpret the results.
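If you go the Weka route, the labeled training set mentioned above would typically be an ARFF file. A hypothetical minimal example (the attribute names and values are invented for illustration):

```
% train.arff -- made-up features for a cheater classifier
@relation cheaters

@attribute avg_session_minutes numeric
@attribute accounts_per_ip numeric
@attribute win_rate_vs_new_accounts numeric
@attribute cheated {yes, no}

@data
34, 1, 0.48, no
210, 6, 0.97, yes
55, 1, 0.52, no
```

Weka's classifiers (e.g., `weka.classifiers.trees.J48` for decision trees, or `weka.classifiers.lazy.IBk`) can then be trained on this file from the Explorer GUI or the command line.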