Problem Symptoms
When deploying a private Harbor registry with Ceph S3 as the storage backend, pushes of large images may repeatedly retry. The problem is especially pronounced when publishing AI model files packaged as OCI artifacts, where a single image layer exceeds 5 GB. The main symptoms are as follows:
docker images
REPOSITORY                  TAG       IMAGE ID       CREATED      SIZE
x.x.x.x:80/library/centos   latest    adf05892850f   3 days ago   7.61GB
docker push x.x.x.x:80/library/centos
Using default tag: latest
The push refers to repository [x.x.x.x:80/library/centos]
9e607bb861a7: Retrying in 4 seconds
- The load balancer in front of RGW, nginx, and the radosgw logs all report 404:
{
"@timestamp":"2022-04-15T17:24:25 08:00",
"@fields":{
"remote_addr":"x.x.x.x",
"remote_user":"",
"body_bytes_sent":"223",
"request_time":"0.057",
"status":"404",
"request":"PUT /xxx-harbor/docker/registry/v2/blobs/sha256/72/729ec3a6ada3a6d26faca9b4779a037231f1762f759ef34c08bdd61bf52cd704/data?partNumber=1&uploadId=2~EE0O6o35Ceuqg9ZII4MAT8_gnaEkr3n HTTP/1.1",
"request_method":"PUT",
"request_header":"{"x-amz-copy-source-range":"bytes=0-10485759","x-amz-content-sha256":"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855","accept-encoding":"gzip","host":"x.x.x.x","user-agent":"aws-sdk-go\/1.15.11 (go1.17.7; linux; amd64)","x-amz-date":"20220415T092425Z","x-amz-copy-source":"xxx-harbor\/docker\/registry\/v2\/repositories\/library\/centos\/_uploads\/9731d0e9-5d78-4fa3-a42b-2cf8f08847f4\/data","content-length":"0","authorization":"AWS4-HMAC-SHA256 Credential=7IRCELLC8J9BTBNQV87C\/20220415\/default\/s3\/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-copy-source;x-amz-copy-source-range;x-amz-date, Signature=7232dce2e274a01a9838760c502d78f5ce1ad12b653c2f27986fe585c7594eb3"}"
}
}
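Note what the failing request actually is: a PUT carrying partNumber and uploadId query parameters together with x-amz-copy-source and x-amz-copy-source-range headers. In other words, it is an S3 UploadPartCopy call: the registry is copying an already-uploaded blob into a multipart upload in 10 MB ranges (bytes=0-10485759), and it is this copy that fails with 404.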
Configuration
Ceph S3 account information
account: cloudsre
tenant:  legacy
bucket:  harbor
Harbor configuration for the Ceph S3 backend
storage:
  s3:
    region: default
    bucket: harbor
    regionendpoint: http://x.x.x.x
    multipartcopythresholdsize: 5368709120
  cache:
    layerinfo: redis
  maintenance:
    uploadpurging:
      enabled: false
  delete:
    enabled: true
  redirect:
    disable: false
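Note that multipartcopythresholdsize is given in bytes: 5368709120 = 5 × 1024³, i.e. 5 GiB, which is the maximum value the S3 driver allows.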
Rough flow of a Harbor image push
Harbor's current setting is multipartcopythresholdsize = 5G (default 32 MB, maximum 5 GB). The observed behavior is as follows (see the sketch after this list):
- If the image contains a single file larger than the threshold, the large file is copied in parts; RGW returns 404 and the push keeps retrying.
- If no single file in the image exceeds the threshold, every file is copied as a whole object and the push succeeds.
- If Harbor's S3 account is replaced with an account under the default tenant, the push succeeds whether or not large files are present.
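The threshold matters because it decides between a whole-object copy and a multipart copy. Below is a minimal sketch of that decision, assuming illustrative names and the 10 MB part size used in the fix further down; it is not the actual docker/distribution S3 driver code:

package main

import "fmt"

const (
	multipartCopyThresholdSize = 5 * 1024 * 1024 * 1024 // 5 GiB, Harbor's current setting
	multipartCopyChunkSize     = 10 * 1024 * 1024       // 10 MiB per part (see the fix below)
)

// copyBlob mirrors the behaviour described above: small blobs are copied as a
// whole object, blobs above the threshold are copied part by part.
func copyBlob(size int64) {
	if size <= multipartCopyThresholdSize {
		fmt.Println("whole-object copy: RGW resolves the tenant from x-amz-copy-source, push succeeds")
		return
	}
	parts := (size + multipartCopyChunkSize - 1) / multipartCopyChunkSize
	fmt.Printf("multipart copy in %d UploadPartCopy calls: RGW falls back to the default tenant, 404, push retries\n", parts)
}

func main() {
	copyBlob(7_610_000_000) // a 7.61 GB layer, as in the failing push above
	copyBlob(100_000_000)   // a small layer
}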
Root Cause
代码语言:javascript复制"x-amz-copy-source":"xxx-harbor\/docker\/registry\/v2\/repositories\/library\/centos\/_uploads\/9731d0e9-5d78-4fa3-a42b-2cf8f08847f4\/data"
- When Harbor copies an object as a whole through the AWS S3 API, Ceph RGW can resolve the tenant (legacy) and account (cloudsre) from the request's x-amz-copy-source parameter, so it locates the source object correctly.
- When Harbor performs a multipart copy, i.e. issues UploadPartCopyInput requests, Ceph RGW cannot derive the correct tenant and account from x-amz-copy-source. It falls back to the bucket xxx-harbor under the "default" tenant as the copy source, does not find the object there, and returns 404.
From the relevant RGW code, the bucket path that is correct for a multipart copy, and that can be explicitly specified, is: legacy:xxx-harbor
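In other words, when the source object lives under a non-default tenant, the copy source has to be written as tenant:bucket/key. The SDK program below reproduces both the failing and the working form.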
Reproducing with the AWS S3 SDK for Go
package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/credentials"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

var (
	// legacy/cloudsre
	accessKey = "your-accessKey"
	secretKey = "your-secretKey"
	endPoint  = "http://x.x.x.x:7480"
)

func main() {
	sess, err := session.NewSession(&aws.Config{
		Credentials:      credentials.NewStaticCredentials(accessKey, secretKey, ""),
		Endpoint:         aws.String(endPoint),
		Region:           aws.String("default"),
		DisableSSL:       aws.Bool(true),
		S3ForcePathStyle: aws.Bool(true),
	})
	if err != nil {
		panic(err)
	}
	svc := s3.New(sess)

	// Start a multipart upload for the target object.
	res, err := svc.CreateMultipartUpload(&s3.CreateMultipartUploadInput{
		Bucket: aws.String("xxx-harbor"),
		Key:    aws.String("cp"),
	})
	if err != nil {
		panic(err)
	}
	uploadID := res.UploadId
	fmt.Println(*uploadID)

	// Copy an existing object into the upload as part 1.
	// Without the tenant prefix, RGW looks up the source under the
	// "default" tenant and returns 404:
	//   CopySource: aws.String("xxx-harbor/x.c"),
	// With the tenant explicitly specified, the copy succeeds:
	copyInput := &s3.UploadPartCopyInput{
		Bucket:     aws.String("xxx-harbor"),
		CopySource: aws.String("legacy:xxx-harbor/x.c"),
		Key:        aws.String("cp"),
		PartNumber: aws.Int64(1),
		UploadId:   uploadID,
	}
	result, err := svc.UploadPartCopy(copyInput)
	fmt.Println(result, err)
	if err != nil {
		panic(err)
	}

	// Complete the multipart upload with the single copied part.
	cParts := []*s3.CompletedPart{{
		ETag:       result.CopyPartResult.ETag,
		PartNumber: aws.Int64(1),
	}}
	res1, err := svc.CompleteMultipartUpload(&s3.CompleteMultipartUploadInput{
		Bucket:          aws.String("xxx-harbor"),
		Key:             aws.String("cp"),
		UploadId:        uploadID,
		MultipartUpload: &s3.CompletedMultipartUpload{Parts: cParts},
	})
	fmt.Println(res1, err)
}
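Running this program against the same RGW endpoint shows both behaviours: with CopySource set to xxx-harbor/x.c the UploadPartCopy call returns 404 because RGW looks for the source under the default tenant, while with legacy:xxx-harbor/x.c the copy and the subsequent CompleteMultipartUpload succeed.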
Solution
storage:
  s3:
    region: default
    bucket: legacy:harbor-prod    # was: harbor-prod; prefix the bucket with its tenant
    regionendpoint: http://x.x.x.x
    chunksize: 10485760
    multipartcopychunksize: 10485760
    multipartcopythresholdsize: 5368709120
  cache:
    layerinfo: redis
  maintenance:
    uploadpurging:
      enabled: false
  delete:
    enabled: true
  redirect:
    disable: false
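To sanity-check the new configuration outside of Harbor, the following is a minimal sketch that lists objects from the tenant-qualified bucket, assuming RGW accepts the same tenant:bucket form here as it does for the copy source above; the endpoint and credentials are placeholders:

package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/credentials"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	// Endpoint and credentials are placeholders for the real values.
	sess, err := session.NewSession(&aws.Config{
		Credentials:      credentials.NewStaticCredentials("your-accessKey", "your-secretKey", ""),
		Endpoint:         aws.String("http://x.x.x.x:7480"),
		Region:           aws.String("default"),
		DisableSSL:       aws.Bool(true),
		S3ForcePathStyle: aws.Bool(true),
	})
	if err != nil {
		panic(err)
	}
	svc := s3.New(sess)

	// List a few objects from the tenant-qualified bucket; a successful
	// response confirms RGW resolves the "tenant:bucket" form.
	out, err := svc.ListObjectsV2(&s3.ListObjectsV2Input{
		Bucket:  aws.String("legacy:harbor-prod"),
		MaxKeys: aws.Int64(5),
	})
	fmt.Println(out, err)
}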