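A while ago someone ran into a problem: when configuring a Serving service through a Workload, they used the model_config_file option to specify multiple models, with a config file that looked roughly like this.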
➜ tmp cat model.config
model_config_list {
  config {
    name: '10062'
    base_path: 's3://xxx-ai/humanoid/10062'
    model_platform: 'tensorflow'
  }
  config {
    name: '10075'
    base_path: 's3://xxx-ai/humanoid/10075'
    model_platform: 'tensorflow'
  }
}
However, the Serving process failed at startup with the error Could not find base path xxxxxx. So the base path could not be found?
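The cause is visible in the config file: the base_path values have no trailing slash /. In S3, a path without a trailing / is treated as an object. The Serving source that reads models from the base path (PollFileSystemForServable, quoted below) shows that Serving first checks that the base path exists and then lists the files under it. With an S3 path like this one, no object with that name exists, so the existence check fails and the error above is returned. The fix is simply to append a slash / to the end of the base_path value, for example s3://xxx-ai/humanoid/10075/. The problem does not occur when the models are on the local file system.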
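The fix described above can also be applied automatically before the config file is written. The following is a minimal sketch, not part of TensorFlow Serving: the helper name NormalizeBasePath, the RunNormalizeExamples function, and the local path /models/10062 are assumptions made up for illustration. It appends a trailing / only to s3:// paths that lack one and leaves every other path unchanged.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a TensorFlow Serving API): returns base_path with a
// trailing '/' appended when it is an s3:// URI that lacks one. Paths on other
// file systems are returned unchanged.
std::string NormalizeBasePath(const std::string& base_path) {
  const std::string kS3Scheme = "s3://";
  const bool is_s3 = base_path.compare(0, kS3Scheme.size(), kS3Scheme) == 0;
  if (is_s3 && base_path.back() != '/') {
    return base_path + "/";
  }
  return base_path;
}

// Small self-check using the paths from the config file above.
void RunNormalizeExamples() {
  // Missing slash on an S3 path: one is appended.
  assert(NormalizeBasePath("s3://xxx-ai/humanoid/10062") ==
         "s3://xxx-ai/humanoid/10062/");
  // Already ends with a slash: unchanged.
  assert(NormalizeBasePath("s3://xxx-ai/humanoid/10075/") ==
         "s3://xxx-ai/humanoid/10075/");
  // Local path (hypothetical): unchanged.
  assert(NormalizeBasePath("/models/10062") == "/models/10062");
}
```

Calling RunNormalizeExamples() from a main function exercises the three cases. Restricting the change to the s3:// scheme keeps local paths exactly as written, since the problem was only observed with S3.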
// Like PollFileSystemForConfig(), but for a single servable.
Status PollFileSystemForServable(
    const FileSystemStoragePathSourceConfig::ServableToMonitor& servable,
    std::vector<ServableData<StoragePath>>* versions) {
  // First, determine whether the base path exists. This check guarantees that
  // we don't emit an empty aspired-versions list for a non-existent (or
  // transiently unavailable) base-path. (On some platforms, GetChildren()
  // returns an empty list instead of erring if the base path isn't found.)
  if (!Env::Default()->FileExists(servable.base_path()).ok()) {
    return errors::InvalidArgument("Could not find base path ",
                                   servable.base_path(), " for servable ",
                                   servable.servable_name());
  }

  // Retrieve a list of base-path children from the file system.
  std::vector<string> children;
  TF_RETURN_IF_ERROR(
      Env::Default()->GetChildren(servable.base_path(), &children));

  // GetChildren() returns all descendants instead for cloud storage like GCS.
  // In such case we should filter out all non-direct descendants.
  std::set<string> real_children;
  for (int i = 0; i < children.size(); ++i) {
    const string& child = children[i];
    real_children.insert(child.substr(0, child.find_first_of('/')));
  }
  children.clear();
  children.insert(children.begin(), real_children.begin(), real_children.end());

  const std::map<int64 /* version */, string /* child */> children_by_version =
      IndexChildrenByVersion(children);

  bool at_least_one_version_found = false;
  switch (servable.servable_version_policy().policy_choice_case()) {
    case FileSystemStoragePathSourceConfig::ServableVersionPolicy::
        POLICY_CHOICE_NOT_SET:
      TF_FALLTHROUGH_INTENDED;  // Default policy is kLatest.
    case FileSystemStoragePathSourceConfig::ServableVersionPolicy::kLatest:
      at_least_one_version_found =
          AspireLatestVersions(servable, children_by_version, versions);
      break;
    case FileSystemStoragePathSourceConfig::ServableVersionPolicy::kAll:
      at_least_one_version_found =
          AspireAllVersions(servable, children, versions);
      break;
    case FileSystemStoragePathSourceConfig::ServableVersionPolicy::kSpecific: {
      at_least_one_version_found =
          AspireSpecificVersions(servable, children_by_version, versions);
      break;
    }
    default:
      return errors::Internal("Unhandled servable version_policy: ",
                              servable.servable_version_policy().DebugString());
  }

  if (!at_least_one_version_found) {
    LOG(WARNING) << "No versions of servable " << servable.servable_name()
                 << " found under base path " << servable.base_path();
  }

  return Status::OK();
}
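The comment in the quoted source notes that GetChildren() may return all descendants on cloud storage, and the loop after it keeps only the first path segment of each entry. The following stand-alone sketch reproduces just that filtering step so it can be tried in isolation. The function names DirectChildren and RunDirectChildrenExample and the sample listing are assumptions made up for illustration, not part of TensorFlow Serving.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-alone version of the filtering loop in the quoted source:
// reduces a listing of descendants to the set of direct children.
std::vector<std::string> DirectChildren(
    const std::vector<std::string>& children) {
  std::set<std::string> real_children;
  for (const std::string& child : children) {
    // find_first_of returns npos when there is no '/', and substr(0, npos)
    // yields the whole string, so plain names pass through unchanged.
    real_children.insert(child.substr(0, child.find_first_of('/')));
  }
  // std::set removes duplicates and keeps the names in sorted order.
  return std::vector<std::string>(real_children.begin(), real_children.end());
}

// Example with a made-up listing of version directories and their contents.
void RunDirectChildrenExample() {
  const std::vector<std::string> listed = {
      "1/saved_model.pb", "1/variables/variables.index", "2/saved_model.pb",
      "3"};
  const std::vector<std::string> expected = {"1", "2", "3"};
  assert(DirectChildren(listed) == expected);
}
```

Each distinct first segment appears once in the result, which is what lets the later version lookup treat a nested listing the same way as a flat one.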