Converting sparse integer class labels to vector form in TensorFlow and computing loss and accuracy (with notes on understanding the cross_entropy loss in the MNIST example)


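This post looks at how to handle sparse labels in deep learning and explains two ways to compute cross entropy: computing it directly from the sparse labels, or converting the sparse labels to one-hot vectors first. It also covers how both are implemented in TensorFlow.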


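For N classes, the labels in our own data are usually integers such as 0, 1, 2, 3, 4, ..., N-1.
The labels in the official MNIST example, however, are vectors (one-hot). With 5 classes, for example, the five labels are: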
[ 1 , 0 , 0 , 0 , 0]
[ 0 , 1 , 0 , 0 , 0]
[ 0 , 0 , 1 , 0 , 0]
[ 0 , 0 , 0 , 1 , 0]
[ 0 , 0 , 0 , 0 , 1]
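In the MNIST example, the outputs of the final 10 neurons are fed into a softmax layer, which produces a 10-dimensional vector; each component is the probability of the corresponding class.
So the MNIST example computes cross_entropy like this: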
cross_entropy = tf.reduce_mean( -tf.reduce_sum( ys * tf.log(prediction),
reduction_indices=[1]) )
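ys and prediction both have shape batch_size * num_classes (num_classes is 10 for MNIST).
ys * tf.log(prediction) multiplies the two matrices element by element. Call the product result; its shape is still batch_size * num_classes.
In tf.reduce_sum( result, reduction_indices=[1] ), reduction_indices=[1] means the column dimension is collapsed,
that is, all elements in each row of result are summed. After tf.reduce_sum() the result is therefore a vector of length batch_size (the reduced dimension is dropped by default, so it is not batch_size * 1). Note the '-' sign in front of it:
each entry is the cross_entropy of one sample.

The outer tf.reduce_mean() has no dimension specified, so it averages the cross_entropy over all samples, and that average is the final loss.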
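To make the shapes concrete, here is a small sketch of the same computation in plain Python, without TensorFlow. The two-sample, three-class batch is made up for illustration; only the order of operations (element-wise product, sum over classes, negate, mean over samples) follows the formula above.

```python
import math

# Hypothetical toy batch: 2 samples, 3 classes (values chosen for illustration).
ys = [[1.0, 0.0, 0.0],
      [0.0, 0.0, 1.0]]            # one-hot labels, shape [batch_size, num_classes]
prediction = [[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]]    # softmax outputs, each row sums to 1

# Element-wise product ys * log(prediction); same shape as the inputs.
result = [[y * math.log(p) for y, p in zip(y_row, p_row)]
          for y_row, p_row in zip(ys, prediction)]

# Sum across the classes of each row and negate: one value per sample.
per_sample = [-sum(row) for row in result]

# Mean over all samples: the scalar loss.
cross_entropy = sum(per_sample) / len(per_sample)

print(per_sample)
print(cross_entropy)
```

Because each label row has a single 1, the sum over a row keeps only the log-probability of the true class, so each per-sample value is just -log(p_true).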

So what do we do with sparse integer labels like 0, 1, 2, 3, 4, ..., N-1?
Suppose the inputs are defined as follows:
crop_shape = [ 42,42,4 ] # ignore this; copied straight from my own code
......
xs = tf.placeholder(tf.float32, [None, crop_shape[0],crop_shape[1],crop_shape[2]])
ys = tf.placeholder(tf.int64, [None,])
keep_prob = tf.placeholder(tf.float32)
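The output of the network (model):
logits = naive_net(xs,NUM_CLASSES,keep_prob) # note: logits is the output of the fully connected layer, before softmax
prediction = tf.nn.softmax( logits )
Method 1: Compute directly from the sparse integer labels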

## compute loss
loss = tf.nn.sparse_softmax_cross_entropy_with_logits( logits=logits,labels= ys )
loss = tf.reduce_mean(loss) # average loss over the batch
## compute accuracy
correct_prediction = tf.equal( tf.argmax( prediction,1 ), ys )
accuracy = tf.reduce_mean( tf.cast(correct_prediction,tf.float32) )
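As a rough picture of what the sparse loss computes per sample, the sketch below reproduces the math in plain Python: -log(softmax(logits)[label]), written in log-sum-exp form. This is my reading of the formula, not TensorFlow's actual implementation; the helper name and the toy batch are hypothetical.

```python
import math

def sparse_softmax_cross_entropy(logits_row, label):
    """Return -log(softmax(logits_row)[label]) for one sample (hypothetical helper)."""
    m = max(logits_row)  # subtract the max before exp for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits_row))
    return log_sum_exp - logits_row[label]

# Hypothetical batch: 3 samples, 4 classes.
logits = [[2.0, 1.0, 0.1, -1.0],
          [0.5, 2.5, 0.3, 0.0],
          [1.0, 1.2, 3.0, 0.2]]
ys = [0, 1, 3]  # integer class labels, one per sample

losses = [sparse_softmax_cross_entropy(row, y) for row, y in zip(logits, ys)]
loss = sum(losses) / len(losses)  # plays the role of tf.reduce_mean

# Accuracy: index of the largest logit in each row, compared with the label.
predicted = [row.index(max(row)) for row in logits]
accuracy = sum(float(p == y) for p, y in zip(predicted, ys)) / len(ys)

print(loss, accuracy)
```

Softmax preserves the ordering of the logits, so taking the argmax of logits gives the same class as taking the argmax of prediction.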

Method 2: Convert to MNIST-style vector labels (dense_labels), then compute

# Convert the sparse labels ys to vector form with shape [batch_size, NUM_CLASSES]. Note: the ys placeholder above must be declared as int32 instead of int64, so that it matches the int32 output of tf.range in tf.concat
batch_size = train_batch_size # number of samples, say 100
sparse_labels = tf.reshape(ys, [batch_size, 1])
print(np.shape(sparse_labels)) # shape 100 x 1
indices = tf.reshape(tf.range(batch_size), [batch_size, 1]) # index of each sample
print(np.shape(indices)) # shape 100 x 1
concated = tf.concat( [indices, sparse_labels], 1 )
print(np.shape(concated)) # shape 100 x 2
dense_labels = tf.sparse_to_dense(concated,
[batch_size, NUM_CLASSES],
1.0, 0.0) # shape 100 x 7; here NUM_CLASSES = 7
# conversion done
# the corresponding loss:
loss = tf.nn.softmax_cross_entropy_with_logits( logits=logits,labels= dense_labels )
loss = tf.reduce_mean(loss) # average loss over the batch
# the corresponding accuracy:
correct_prediction = tf.equal( tf.argmax( prediction,1 ), tf.argmax( dense_labels,1 ) )
accuracy = tf.reduce_mean( tf.cast(correct_prediction,tf.float32) )
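The index juggling above is easier to follow on a tiny example. The sketch below mimics the same steps in plain Python: pair each sample index with its label, then write 1.0 at each (row, label) position of an all-zero matrix. The four labels are made up; the variable names mirror the ones in the code above. As far as I know, tf.one_hot(ys, NUM_CLASSES) produces the same matrix in a single call and does not need a fixed batch_size, but I have not swapped it into the code above.

```python
# Hypothetical inputs for illustration.
NUM_CLASSES = 7
ys = [3, 0, 6, 2]                 # integer labels, one per sample
batch_size = len(ys)

# Pair each sample index with its label: [[0, 3], [1, 0], [2, 6], [3, 2]].
concated = [[i, label] for i, label in zip(range(batch_size), ys)]

# Start from an all-zero [batch_size, NUM_CLASSES] matrix and write 1.0
# at every (row, label) position.
dense_labels = [[0.0] * NUM_CLASSES for _ in range(batch_size)]
for row, col in concated:
    dense_labels[row][col] = 1.0

# The argmax of each one-hot row recovers the original integer label,
# which is why accuracy can compare argmax(prediction) with argmax(dense_labels).
recovered = [row.index(max(row)) for row in dense_labels]

for row in dense_labels:
    print(row)
```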

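Summary:
Two functions above deserve attention:
1. tf.nn.sparse_softmax_cross_entropy_with_logits( logits=logits,labels= ys )
takes sparse integer labels directly and computes the cross_entropy loss
2. tf.nn.softmax_cross_entropy_with_logits( logits=logits,labels= dense_labels )
takes vector-form labels directly and computes the cross_entropy loss
How they differ: the type of labels they accept.
What they share: the logits passed in must not have been normalized by softmax already.
So, coming back to the MNIST cross_entropy from the beginning:
cross_entropy = tf.reduce_mean( -tf.reduce_sum( ys * tf.log(prediction),
reduction_indices=[1]) )
the prediction here has already been through softmax, i.e.
prediction = tf.nn.softmax( logits )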
The softmax here really just does normalization; for details see: http://www.jianshu.com/p/fb119d0ff6a6

So here's the question: given correct inputs, are these three equivalent? ..... Too lazy, haven't tried it yet -_-|||
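On paper the three should agree whenever each label is a valid one-hot vector, because all of them reduce to -log(softmax(logits)[true class]). The sketch below checks this on a made-up batch in plain Python. It re-implements the formulas rather than calling TensorFlow, so it is a sanity check of the math under that assumption, not a test of the library functions themselves.

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(z - m) for z in row]
    total = sum(exps)
    return [e / total for e in exps]

def log_softmax(row):
    m = max(row)
    lse = m + math.log(sum(math.exp(z - m) for z in row))
    return [z - lse for z in row]

# Hypothetical batch: 2 samples, 3 classes.
logits = [[2.0, 1.0, 0.1],
          [0.5, 2.5, 0.3]]
ys = [0, 2]
num_classes = 3
dense = [[1.0 if c == y else 0.0 for c in range(num_classes)] for y in ys]

# (a) hand-written MNIST form: -sum(ys * log(softmax(logits)))
manual = [-sum(d * math.log(p) for d, p in zip(d_row, softmax(row)))
          for d_row, row in zip(dense, logits)]
# (b) sparse form: pick the log-probability of the true class
sparse = [-log_softmax(row)[y] for row, y in zip(logits, ys)]
# (c) dense form: -sum(dense_labels * log_softmax(logits))
dense_loss = [-sum(d * lp for d, lp in zip(d_row, log_softmax(row)))
              for d_row, row in zip(dense, logits)]

for a, b, c in zip(manual, sparse, dense_loss):
    print(a, b, c)
```

One practical difference I would still expect: the hand-written form takes the log of an already normalized probability, so it can run into log(0) when a probability underflows, while forms that work on the logits directly avoid that.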