I am following a YouTube video (https://www.youtube.com/watch?v=VtRLrQ3Ev-U) about training a model on a dataset (Colab notebook: https://colab.research.google.com/drive/1UxmeNX_MaIO0ni26cg9H6mtJcRFafWiR?usp=sharing#scrollTo=sOqdGfTza-Gl).
In this code:
X = df[df.columns[3:4]].values
y = df[df.columns[-1]].values
I used the slice "3:4" because it is the only column with numeric values (in my own dataset).
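One thing worth checking at this point: in a frame with only four columns, the slice 3:4 and the index -1 both refer to the last column, so X and y would hold the same values. A minimal sketch on a made-up one-row frame that reuses the column names from the dataset example further down (the row values are placeholders, not real data):

```python
import pandas as pd

# Hypothetical one-row frame with the same four column names as the dataset.
df = pd.DataFrame(
    [["Malware", "Other sector", "some news text", 0]],
    columns=["Category", "Sector", "News", "review"],
)

feature_cols = list(df.columns[3:4])  # slice -> list of column names
label_col = df.columns[-1]            # last column name

print(feature_cols)  # ['review']
print(label_col)     # review
```

If that matches the real file, the model would be trained to predict the label from the label itself.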
After this, I execute the following code:
over = RandomOverSampler()
X, y = over.fit_resample(X, y)
data = np.hstack((X, np.reshape(y, (-1,1))))
transformed_df = pd.DataFrame(data, columns=df.columns)
But when I execute it in Colab, I get this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-222-eaff3b92a197> in <module>()
1 data = np.hstack((X, np.reshape(y, (-1,1))))
----> 2 transformed_df = pd.DataFrame(data, columns=df.columns)
2 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/internals/construction.py in _check_values_indices_shape_match(values, index, columns)
391 passed = values.shape
392 implied = (len(index), len(columns))
--> 393 raise ValueError(f"Shape of passed values is {passed}, indices imply {implied}")
394
395
ValueError: Shape of passed values is (2270, 2), indices imply (2270, 4)
Can somebody help me fix this?
Thanks in advance!
Edit: adding all the information.
Purpose of the AI: I give it a text and, based on the 1,000 rows of my dataset, it should automatically detect whether I want to review that text or not and, if possible, classify it with a category.
Dataset example:
Category|Sector|News|review
Malware|Other sector|Joker malware infects over 500 000 Huawei Android devices - More than 500,000 Huawei users have downloaded from the company’s official Android store applications infected with Joker malware that subscribes to premium mobile services.Researchers found ten seemingly harmless apps in AppGallery that contained code for connecting to malicious command and control server to receive configurations and additional components.Masked by functional appsA report from antivirus maker Doctor Web notes that the malicious apps retained their advertised functionality but downloaded components that subscribed users to premium mobile services.To keep users in the dark the infected apps requested access to notifications, which allowed them to intercept confirmation codes delivered over SMS by the subscription service.According to the researchers, the malware could subscribe a user to a maximum of five services, although the threat actor could modify this limitation at any time.The list of malicious applications included virtual keyboards, a camera app, a launcher, an online messenger, a sticker collection, coloring programs, and a game.Most of them came from one developer (Shanxi Kuailaipai Network Technology Co., Ltd.) and two from a different one. These ten apps were downloaded by more than 538,000 Huawei users, Doctor Web says.Doctor Web informed Huawei of these apps and the company removed them from AppGallery. While new users can no longer download them, those that already have the apps running on their devices need to run a manual cleanup. 
The table below lists the name name of the application and its package:Application name Package name Super Keyboard com.nova.superkeyboard Happy Colour com.colour.syuhgbvcff Fun Color com.funcolor.toucheffects New 2021 Keyboard com.newyear.onekeyboard Camera MX - Photo Video Camera com.sdkfj.uhbnji.dsfeff BeautyPlus Camera com.beautyplus.excetwa.camera Color RollingIcon com.hwcolor.jinbao.rollingicon Funney Meme Emoji com.meme.rouijhhkl Happy Tapping com.tap.tap.duedd All-in-One Messenger com.messenger.sjdoifoThe researchers say that the same modules downloaded by the infected apps in AppGallery were also present in other apps on Google Play, used by other versions of Joker malware. The full list of indicators of compromise is available here.Once active, the malware communicates to its remote server to get the configuration file, which contains a list of tasks, websites for premium services, JavaScript that mimics user interaction.Joker malware’s history goes as far back as 2017 and constantly found its way in apps distributed through Google Play store. In October 2019, Tatyana Shishkova, Android malware analyst at Kaspersky, tweeted about more than 70 compromised apps that had made it into the official store.And the reports about the malware in Google Play kept coming. In early 2020, Google announced that since 2017, it had removed about 1,700 apps infected with Joker.Last February, Joker was still present in the store and it continued to slip past Google’s defenses even in July last year.|0
Full code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import tensorflow as tf
from imblearn.over_sampling import RandomOverSampler
df = pd.read_csv('news_parsed_end.csv', delimiter='|', error_bad_lines=False)
X = df[df.columns[3:4]].values
y = df[df.columns[-1]].values
over = RandomOverSampler()
X, y = over.fit_resample(X, y)
data = np.hstack((X, np.reshape(y, (-1,1))))
transformed_df = pd.DataFrame(data, columns=df.columns)
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.6, random_state=1000)
X_valid, X_test, y_valid, y_test = train_test_split(X_temp, y_temp, test_size=0.2, random_state=1000)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu'),  # if x <= 0 --> 0, x > 0 --> x
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1, activation="sigmoid")
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.005),
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=['accuracy'])
model.evaluate(X_train, y_train)
model.evaluate(X_valid, y_valid)
model.fit(X_train, y_train, batch_size=16, epochs=200, validation_data=(X_valid, y_valid))
model.evaluate(X_test, y_test)
Solution:
It worked, thanks to @hpaulj!
I added the following:
data = np.hstack((X, np.reshape(y, (-1,1))))
data = data.reshape(1135,4)
data.shape
And now I can do this:
transformed_df = pd.DataFrame(data, columns=df.columns)
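A caveat on the reshape: turning the (2270, 2) array into (1135, 4) places two consecutive (X, y) pairs in each row, so the four column labels no longer describe what is in the columns. An alternative that keeps one row per sample is to pass only the two matching column labels, which is what the answer suggests. A minimal sketch, with small made-up arrays standing in for the resampled X and y (the name "feature" is hypothetical):

```python
import numpy as np
import pandas as pd

# Stand-ins for the resampled arrays: X has shape (n, 1), y has shape (n,).
X = np.array([[0], [1], [1], [0]])
y = np.array([0, 1, 1, 0])
columns = ["Category", "Sector", "News", "review"]  # same names as df.columns

data = np.hstack((X, np.reshape(y, (-1, 1))))  # shape (4, 2)

# Reshaping to 4 columns folds two samples into every row.
folded = data.reshape(-1, 4)  # shape (2, 4)

# Keeping the (n, 2) shape and naming only two columns keeps one row per sample.
# "feature" is a hypothetical name for the single feature column.
transformed_df = pd.DataFrame(data, columns=["feature", columns[-1]])
print(folded.shape, transformed_df.shape)
```

With the real data, the second argument could equally be a two-element selection from df.columns, as long as its length matches the number of columns in data.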
Here's an example of what I think is happening, though I don't know if you understand enough numpy and pandas to apply it to your case. Often people take some tutorial (or worse yet, a video) and try to use their own data without much understanding of what's going on.

Anyway, let's make a 4-column frame:
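The answer's original code is not reproduced here, so the following is only a minimal sketch of such a frame; the column names a to d and the values are assumptions chosen for illustration:

```python
import pandas as pd

# Hypothetical 3-row frame with four columns; the values are arbitrary.
df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9], "d": [10, 11, 12]}
)
print(df.shape)          # (3, 4)
print(list(df.columns))  # ['a', 'b', 'c', 'd']
```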
Now use hstack to combine two columns (df[['b','d']] would have worked just as well). The key is that the result is a 2-column array, shape (3,2).
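The hstack step can be sketched as follows, again on an assumed 3-row frame with columns a to d (not the answer's original data):

```python
import numpy as np
import pandas as pd

# Hypothetical 3-row, 4-column frame.
df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9], "d": [10, 11, 12]}
)

# Take columns b and d as (3, 1) arrays and stack them side by side.
b = df["b"].values.reshape(-1, 1)
d = df["d"].values.reshape(-1, 1)
x = np.hstack((b, d))

print(x.shape)  # (3, 2)
```

Selecting the two columns directly with df[["b", "d"]].values gives the same array.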
If I try to make a frame from that as you do:
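A sketch of that failing call, on an assumed 3-row frame with columns a to d; the exception is caught so that the message can be printed:

```python
import numpy as np
import pandas as pd

# Hypothetical 3-row, 4-column frame and a 2-column array built from it.
df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9], "d": [10, 11, 12]}
)
x = df[["b", "d"]].values  # shape (3, 2)

message = None
try:
    # Two columns of data, four column labels: the shapes do not match.
    pd.DataFrame(x, columns=df.columns)
except ValueError as err:
    message = str(err)
    print(message)
```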
Note the same sort of error. There's a mismatch between the x shape (3,2) and the shape implied by the 4-element df.columns array.

If instead I select a subset of the columns, the same ones as used for the hstack, it works: 2 columns, (n,2) data.

It's all about the array shapes. You won't get far with pandas or numpy if you don't pay attention to shapes.
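The working variant described above, passing only the column labels that match the stacked data, can be sketched like this (same assumed frame with columns a to d):

```python
import pandas as pd

# Hypothetical 3-row, 4-column frame and a 2-column array built from it.
df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9], "d": [10, 11, 12]}
)
x = df[["b", "d"]].values  # shape (3, 2)

# Pick the two labels (positions 1 and 3) that correspond to the data columns.
frame = pd.DataFrame(x, columns=df.columns[[1, 3]])

print(frame.shape)          # (3, 2)
print(list(frame.columns))  # ['b', 'd']
```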