Keras pad_sequences and Tokenizer
I'm practicing NLP on a Kaggle dataset. When I tokenize the tweets and then pad them, I get an error. I searched for a solution but couldn't find an answer.
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split

# Get the max number of words in the tweets
texts = df['text']
LENGTH = texts.apply(lambda p: len(p.split()))
x = df['text']
y = df['target']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=.30, random_state=41)
tokenize = Tokenizer()
tokenize.fit_on_texts(x)
x = tokenize.texts_to_sequences(x)
print('start padding ...')
# Pad the tweets to the same length
x = pad_sequences(x, maxlen=LENGTH)
I got this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/tmp/ipykernel_34/2607522322.py in <module>
8
9 # Padding Tweets To Be The Same Length
---> 10 x = pad_sequences(x ,maxlen=LENGTH)
/opt/conda/lib/python3.7/site-packages/keras/preprocessing/sequence.py in pad_sequences(sequences, maxlen, dtype, padding, truncating, value)
152 return sequence.pad_sequences(
153 sequences, maxlen=maxlen, dtype=dtype,
--> 154 padding=padding, truncating=truncating, value=value)
155
156 keras_export(
/opt/conda/lib/python3.7/site-packages/keras_preprocessing/sequence.py in pad_sequences(sequences, maxlen, dtype, padding, truncating, value)
83 .format(dtype, type(value)))
84
---> 85 x = np.full((num_samples, maxlen) + sample_shape, value, dtype=dtype)
86 for idx, s in enumerate(sequences):
87 if not len(s):
/opt/conda/lib/python3.7/site-packages/numpy/core/numeric.py in full(shape, fill_value, dtype, order, like)
340 fill_value = asarray(fill_value)
341 dtype = fill_value.dtype
--> 342 a = empty(shape, dtype, order)
343 multiarray.copyto(a, fill_value, casting='unsafe')
344 return a
TypeError: 'Series' object cannot be interpreted as an integer
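The last frame shows where it breaks: pad_sequences builds its output with np.full((num_samples, maxlen) + sample_shape, value, dtype=dtype), so maxlen ends up inside a NumPy shape tuple, which only accepts integers. The sketch below reproduces that in isolation with NumPy and pandas only; the three tweets are made up, and max_len and raised are illustrative names that are not part of the original code.

```python
import numpy as np
import pandas as pd

# Made-up tweets standing in for df['text'].
texts = pd.Series(["forest fire near la ronge", "all residents asked to shelter", "ok"])

# apply() returns one word count per tweet, i.e. a Series, not a single number.
LENGTH = texts.apply(lambda p: len(p.split()))
print(type(LENGTH).__name__, LENGTH.tolist())

# pad_sequences passes maxlen into a shape tuple like this one.
raised = False
try:
    np.full((len(texts), LENGTH), 0, dtype="int32")
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# A single int, e.g. the word count of the longest tweet, is a valid shape entry.
max_len = int(LENGTH.max())
print(np.full((len(texts), max_len), 0, dtype="int32").shape)  # (3, 5)
```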
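The problem is that LENGTH is not an integer but a pandas Series: texts.apply(...) returns one word count per tweet, whereas maxlen expects a single integer, such as the word count of the longest tweet (LENGTH.max()). Try something like this:
If you want to use post-padding, run: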
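The fix amounts to passing a single integer as maxlen, for example pad_sequences(x, maxlen=int(LENGTH.max())), and adding padding='post' for post-padding. To show what the two padding modes produce without depending on Keras, the sketch below uses a hypothetical NumPy helper, pad_like_keras, that imitates the default behavior of pad_sequences (padding='pre', truncating='pre', fill value 0, int32 output). It is an illustration under those assumptions, not Keras's implementation, and the token ids are made up.

```python
import numpy as np

def pad_like_keras(sequences, maxlen, padding="pre", truncating="pre", value=0):
    """Hypothetical helper imitating keras pad_sequences; maxlen must be a positive int."""
    out = np.full((len(sequences), maxlen), value, dtype="int32")
    for i, seq in enumerate(sequences):
        if len(seq) == 0:
            continue  # empty sequences stay filled with `value`
        # Drop extra tokens from the front ("pre") or the back ("post").
        trunc = seq[-maxlen:] if truncating == "pre" else seq[:maxlen]
        if padding == "post":
            out[i, :len(trunc)] = trunc   # tokens first, zeros after
        else:
            out[i, -len(trunc):] = trunc  # zeros first, tokens after
    return out

# Made-up token ids, shaped like the output of texts_to_sequences.
seqs = [[3, 1, 4, 1, 5], [9, 2], [6]]
max_len = max(len(s) for s in seqs)  # a plain int, like int(LENGTH.max())

print(pad_like_keras(seqs, max_len))
# [[3 1 4 1 5]
#  [0 0 0 9 2]
#  [0 0 0 0 6]]
print(pad_like_keras(seqs, max_len, padding="post"))
# [[3 1 4 1 5]
#  [9 2 0 0 0]
#  [6 0 0 0 0]]
```

With padding='pre' (the Keras default) the zeros go before each sequence; with padding='post' they go after. The default truncating='pre' likewise drops tokens from the start of any sequence longer than maxlen.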