Heloowird

TensorFlow pitfall: ever-growing memory usage and processing time

Problem description

While using a fine-tuned image-classification model to extract features from a batch of images, I noticed that, as the run went on, each image took longer to process and memory usage kept growing. TensorFlow has a similar issue reported on GitHub.

Problematic code

...
with tf.Graph().as_default():
    with slim.arg_scope(inception_resnet_v2.inception_resnet_v2_arg_scope()):
        # build graph
        preprocessed_image = tf.placeholder(tf.float32, shape=(image_size, image_size, 3), name="preprocessed_image")
        processed_image = tf.expand_dims(preprocessed_image, 0)
        logits, end_points = inception_resnet_v2.inception_resnet_v2(processed_image, num_classes=_NUM_CLASSES, is_training=False)
        probabilities = logits
        init_fn = slim.assign_from_checkpoint_fn(sys.argv[1], slim.get_model_variables('InceptionResnetV2'))
        with tf.Session() as sess:
            # initialize graph
            init_fn(sess)
            # run graph
            for line in sys.stdin:
                start_time = time.time()
                line = line.strip(" \r\n")
                if len(line) == 0:
                    continue
                try:
                    image_string_tmp = tf.gfile.FastGFile(line, 'rb').read()
                    # the next two calls create NEW graph ops on every iteration
                    image_decode_tmp = tf.image.decode_image(image_string_tmp, channels=3)
                    preprocessed_image_tmp = inception_preprocessing.preprocess_image(image_decode_tmp, image_size, image_size, is_training=False)
                    preprocessed_image_tmp_val = sess.run([preprocessed_image_tmp])
                    np_probabilities = sess.run(probabilities, {"preprocessed_image:0": preprocessed_image_tmp_val[0]})
                    np_probabilities = np_probabilities[0, 0:]
                    imgfea = np_probabilities.tolist()
                    sys.stdout.write("%s\t%s\n" % (line, " ".join(["%.17f" % x for x in imgfea])))
                except Exception:
                    pass
                print >>sys.stderr, (time.time() - start_time) * 1000

Debugging process

A TensorFlow program normally builds the whole graph up front, with placeholders standing in for the inputs, and then runs it: build once, run many times. Intuitively, the code above has one likely problem: the graph-building call inception_preprocessing.preprocess_image sits inside the run phase. So the first attempt was to move inception_preprocessing.preprocess_image from the run phase into the graph-construction phase; however, that did not solve the problem. I then looked into related discussions and, following the approach in the issue, recorded the time and memory cost of each step in detail. Concretely, time.time() and resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024 were used to record each step's elapsed time and memory usage, respectively. For example:

...
import time
import resource
...
end_read_time = time.time()
image_decode_tmp = tf.image.decode_image(image_string_tmp, channels=3)
end_decode_time = time.time()
print >>sys.stderr, "[decode image] timecost=%f memory_usage=%f" % (end_decode_time - end_read_time, resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024)
...
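
The per-step logging shown above can be packaged into a small reusable helper. Below is a sketch using only the standard library; StepLogger is a hypothetical name, and note that ru_maxrss is reported in kilobytes on Linux (bytes on macOS):

```python
import resource
import sys
import time


class StepLogger(object):
    """Context manager that logs wall time and peak RSS (in MB) for a named step."""

    def __init__(self, name, stream=sys.stderr):
        self.name = name
        self.stream = stream

    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.elapsed = time.time() - self.start
        # ru_maxrss is in kilobytes on Linux, so divide by 1024 to get MB
        self.max_rss_mb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0
        self.stream.write("[%s] timecost=%f memory_usage=%f\n"
                          % (self.name, self.elapsed, self.max_rss_mb))
        return False  # do not swallow exceptions from the wrapped block
```

Each suspicious step is then wrapped as `with StepLogger("decode image"): ...`, which prints one log line per step to stderr.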

The logs showed that it was the tf.image.decode_image step whose time and memory kept growing, so this step also had to be moved into the graph-construction phase.
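
The growth can also be confirmed directly by counting graph ops per iteration with Graph.get_operations(). A minimal sketch of the leak pattern (the image bytes here are a fake stand-in; op creation does not read them, so no real image is needed):

```python
import tensorflow as tf

g = tf.Graph()
with g.as_default():
    # stand-in for real image bytes; decoding is never actually run
    image_bytes = tf.constant("fake-image-bytes")
    op_counts = []
    for _ in range(3):
        # mimics the bug: a fresh decode subgraph is created on every iteration
        tf.image.decode_image(image_bytes, channels=3)
        op_counts.append(len(g.get_operations()))
# op_counts increases strictly: the graph never stops growing
```

The op count rising on every loop iteration is exactly the signature of building the graph inside the run phase.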

Solution

with tf.Graph().as_default():
    with slim.arg_scope(inception_resnet_v2.inception_resnet_v2_arg_scope()):
        # build graph: decode and preprocessing ops are now defined once, up front
        image_str = tf.placeholder(tf.string)
        image_decode = tf.image.decode_image(image_str, channels=3)
        image_tensor = tf.placeholder(tf.uint8, shape=[None, None, 3])
        preprocessed_image = inception_preprocessing.preprocess_image(image_tensor, image_size, image_size, is_training=False)
        processed_image = tf.expand_dims(preprocessed_image, 0)
        logits, end_points = inception_resnet_v2.inception_resnet_v2(processed_image, num_classes=_NUM_CLASSES, is_training=False)
        init_fn = slim.assign_from_checkpoint_fn(sys.argv[1], slim.get_model_variables('InceptionResnetV2'))
        with tf.Session() as sess:
            # initialize graph
            init_fn(sess)
            # run graph: the loop only calls sess.run(), never creates new ops
            for line in sys.stdin:
                start_time = time.time()
                line = line.strip(" \r\n")
                if len(line) == 0:
                    continue
                try:
                    with open(line, "rb") as f:
                        image_string_tmp = f.read()
                    image_decode_tmp = sess.run([image_decode], {image_str: image_string_tmp})
                    image_feature = sess.run(logits, {image_tensor: image_decode_tmp[0]})
                    image_feature = image_feature[0, 0:]
                    imgfea = image_feature.tolist()
                    sys.stdout.write("%s\t%s\n" % (line, " ".join(["%.17f" % x for x in imgfea])))
                except Exception:
                    sys.stderr.write("%s" % traceback.format_exc())
                # log to stderr so stdout carries only the feature output
                print >>sys.stderr, "cost:", (time.time() - start_time) * 1000

Summary

tf.image.decode_image merely decodes an image (converting image bytes into a tensor). It looks harmless, but it hides a trap: every call adds new nodes to the graph. My guess is that each graph-building call allocates memory for the new tensors, so continuously building graph pieces at run time makes memory balloon; why the elapsed time also keeps growing remains to be investigated. So when using TensorFlow, define all tensor operations in the graph once, up front, and avoid building graph nodes during the run phase.
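
One way to guard against this whole class of bug, assuming the graph-based workflow used above, is tf.Graph.finalize(): once the graph is frozen, any later attempt to add an op raises an error instead of silently growing the graph. A minimal sketch:

```python
import tensorflow as tf

g = tf.Graph()
with g.as_default():
    tf.constant([1.0, 2.0, 3.0])  # all graph construction happens here, once
g.finalize()  # freeze the graph: op creation from now on raises RuntimeError

graph_grew = True
try:
    with g.as_default():
        tf.constant(4.0)  # would silently grow the graph if not finalized
except RuntimeError:
    graph_grew = False
```

Calling finalize() right before the run loop would have turned this slow leak into an immediate, easy-to-locate exception.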