TensorFlow.js prediction time differs between the first trial and subsequent trials
I am loading a TensorFlow.js model and trying to measure how many milliseconds a prediction takes. For example, the first prediction takes about 300 milliseconds, but from the second trial onward the time drops to 13~20 milliseconds. I am not including the model-loading time in the measurement; I am timing only the prediction, after the model has been loaded.
Can anyone explain why the prediction time decreases?
// Calling the TensorFlow.js model
import * as tf from '@tensorflow/tfjs';

const MODEL_URL = 'https://xxxx-xxxx-xxxx.xxx.xxx-xxxx-x.xxxxxx.com/model.json';
let model;
let prediction;

export async function getModel(input) {
  console.log("From helper function: Model is being retrieved from the server...");
  model = await tf.loadLayersModel(MODEL_URL);
  // measure prediction time
  const startTime = Date.now();
  prediction = model.predict(input); // predict() is synchronous and returns a Tensor
  const elapsed = Date.now() - startTime;
  console.log("Prediction time for TensorFlow: " + elapsed);
  console.log(prediction.arraySync());
  ...
}
1 Answer
Usually the first prediction takes longer because the model still needs to be loaded into memory from the API request; once that's done, it is cached and you do not need to make the same API request again.
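One way to keep that one-time cost out of your measurements is to run a throwaway "warm-up" prediction right after loading, and only time the calls after it. A minimal sketch, assuming a single-output model and a made-up [1, 10] input shape (MODEL_URL is the same constant as in the question; use your model's real input shape):

import * as tf from '@tensorflow/tfjs';

export async function loadAndWarmUp(input) {
  const model = await tf.loadLayersModel(MODEL_URL);

  // Warm-up: the first predict() pays one-time costs (e.g. uploading weights
  // to the GPU, compiling WebGL shader programs), so run it on dummy data.
  // The [1, 10] shape is an assumption -- match your model's actual input.
  const warmup = tf.zeros([1, 10]);
  model.predict(warmup).dispose();
  warmup.dispose();

  // Now time a real prediction without the first-call overhead.
  const start = performance.now();
  const out = model.predict(input);
  await out.data(); // wait until the result is actually computed
  console.log('Steady-state prediction time: ' +
              (performance.now() - start).toFixed(1) + ' ms');
  return out;
}

Waiting on out.data() (or using arraySync()) before stopping the clock matters on the WebGL backend, since predict() only schedules the work on the GPU.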
If you want to see the actual prediction time, repeat the process of timing the predictions many times (perhaps 1000) and then take the 99th quantile, which shows the prediction time for 99% of the cases (you can change the quantile as well, e.g. to 90 or 50).
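As a rough sketch of that suggestion (the helper name, input tensor, and run count are illustrative, not from the original answer):

// Run `runs` timed predictions and report a few percentiles of the latencies.
async function benchmarkPredict(model, input, runs = 1000) {
  const times = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    const out = model.predict(input);
    await out.data();  // force the computation to finish before stopping the clock
    times.push(performance.now() - start);
    out.dispose();     // free the result tensor so memory doesn't grow across runs
  }
  times.sort((a, b) => a - b);
  // p is a fraction: 0.99 -> 99th percentile, 0.9 -> 90th, 0.5 -> median.
  const pct = (p) => times[Math.min(times.length - 1, Math.floor(p * times.length))];
  console.log(`p50: ${pct(0.5).toFixed(1)} ms, ` +
              `p90: ${pct(0.9).toFixed(1)} ms, ` +
              `p99: ${pct(0.99).toFixed(1)} ms`);
}

// e.g. await benchmarkPredict(model, someInput, 1000);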