TiFlash disaggregated storage and compute architecture is not fully compatible with OSS: the lifecycle method is not supported

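TiFlash's support for OSS is still imperfect. With the default configuration parameters, an error prevents historical data from being deleted after a merge or split, so the deletion is retried indefinitely.
The AK in use already has full-control permissions: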

The errors reported by the TiFlash write_node are:

[2024/02/02 18:36:47.373 +08:00] [WARN] [S3Common.cpp:145] ["tag=AWSErrorMarshaller message=Encountered Unknown AWSError 'NotImplemented': A header you provided implies functionality that is not implemented."] [source=AWSClient] [thread_id=974]
[2024/02/02 18:36:47.373 +08:00] [ERROR] [S3Common.cpp:145] ["tag=AWSXmlClient message=HTTP response code: 400\nResolved remote host IP address: 100.118.78.2:443\nRequest ID: 65BCC5BFE7447A3130E9E56E\nException name: NotImplemented\nError message: Unable to parse ExceptionName: NotImplemented Message: A header you provided implies functionality that is not implemented.\n7 response headers:\nconnection : close\ncontent-length : 334\ncontent-type : application/xml\ndate : Fri, 02 Feb 2024 10:36:47 GMT\nserver : AliyunOSS\nx-amz-request-id : 65BCC5BFE7447A3130E9E56E\nx-oss-server-time : 0"] [source=AWSClient] [thread_id=974]
[2024/02/02 18:36:47.374 +08:00] [ERROR] [S3Common.cpp:572] ["S3 PutEmptyObject failed: Unable to parse ExceptionName: NotImplemented Message: A header you provided implies functionality that is not implemented., request_id=65BCC5BFE7447A3130E9E56E bucket=mysql-dts-migrate root=perf-asset-tiflash-data/ key=s1069816/data/t_80772/dmf_2.del"] [source="bucket=mysql-dts-migrate root=perf-asset-tiflash-data/"] [thread_id=974]

[2024/02/02 18:36:47.396 +08:00] [ERROR] [S3LockService.cpp:136] ["DB Exception: S3 PutEmptyObject failed, bucket=mysql-dts-migrate root=perf-asset-tiflash-data/ key=s1069816/data/t_80772/dmf_2.del s3error=UNKNOWN s3exception_name=NotImplemented s3msg=Unable to parse ExceptionName: NotImplemented Message: A header you provided implies functionality that is not implemented. request_id=65BCC5BFDE10FD3531A4B28E\n\n       0x809e528\tDB::Exception::Exception<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&>(int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) [tiflash+134866216]\n                \tdbms/src/Common/Exception.h:53\n       0x808e9ce\tDB::Exception DB::S3::fromS3Error<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&>(Aws::S3::S3Error const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) [tiflash+134801870]\n                
\tdbms/src/Storages/S3/S3Common.h:57\n       0x808fb28\tDB::S3::uploadEmptyFile(DB::S3::TiFlashS3Client const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) [tiflash+134806312]\n                \tdbms/src/Storages/S3/S3Common.cpp:584\n       0x89f3cd9\tDB::S3::S3LockService::tryMarkDelete(disaggregated::TryMarkDeleteRequest const*, disaggregated::TryMarkDeleteResponse*) [tiflash+144653529]\n                \tdbms/src/Flash/Disaggregated/S3LockService.cpp:117\n       0x86fa5ea\tDB::FlashService::tryMarkDelete(grpc::ServerContext*, disaggregated::TryMarkDeleteRequest const*, disaggregated::TryMarkDeleteResponse*) [tiflash+141534698]\n                \tdbms/src/Flash/FlashService.cpp:874\n       0x98cb9d7\tgrpc::internal::RpcMethodHandler<tikvpb::Tikv::Service, disaggregated::TryMarkDeleteRequest, disaggregated::TryMarkDeleteResponse, google::protobuf::MessageLite, google::protobuf::MessageLite>::RunHandler(grpc::internal::MethodHandler::HandlerParameter const&) [tiflash+160217559]\n                \tcontrib/grpc/include/grpcpp/impl/codegen/method_handler.h:113\n       0x9242130\tgrpc::Server::SyncRequest::ContinueRunAfterInterception() [tiflash+153362736]\n                \tcontrib/grpc/src/cpp/server/server_cc.cc:433\n       0x9241f61\tgrpc::Server::SyncRequest::Run(std::__1::shared_ptr<grpc::Server::GlobalCallbacks> const&, bool) [tiflash+153362273]\n                \tcontrib/grpc/src/cpp/server/server_cc.cc:421\n       0x9254155\tgrpc::ThreadManager::WorkerThread::WorkerThread(grpc::ThreadManager*)::$_0::__invoke(void*) [tiflash+153436501]\n                \tcontrib/grpc/src/cpp/thread_manager/thread_manager.cc:36\n       0x95ee66a\tgrpc_core::(anonymous namespace)::ThreadInternalsPosix::ThreadInternalsPosix(char const*, void (*)(void*), void*, bool*, grpc_core::Thread::Options const&)::'lambda'(void*)::__invoke(void*) 
[tiflash+157214314]\n                \tcontrib/grpc/src/core/lib/gprpp/thd_posix.cc:110\n  0x7f89afcecac3\t<unknown symbol> [libc.so.6+608963]\n  0x7f89afd7e850\t<unknown symbol> [libc.so.6+1206352]"] [thread_id=974]
[2024/02/02 18:36:47.396 +08:00] [ERROR] [S3LockClient.cpp:121] ["meets error, code=13 msg=S3 PutEmptyObject failed, bucket=mysql-dts-migrate root=perf-asset-tiflash-data/ key=s1069816/data/t_80772/dmf_2.del s3error=UNKNOWN s3exception_name=NotImplemented s3msg=Unable to parse ExceptionName: NotImplemented Message: A header you provided implies functionality that is not implemented. request_id=65BCC5BFDE10FD3531A4B28E"] [source="<key=s1069816/data/t_80772/dmf_2,type=MarkDelete>"] [thread_id=966]

This is the monitoring at the time of the error; you can see that it keeps retrying.

Then it is probably a syntax compatibility issue on the TiFlash side.

OSS isn't fast, is it?

Is this Alibaba Cloud public cloud or a private cloud?

After all, it is only compatible with the S3 protocol…

Would this kind of object storage be faster than a local SSD?

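Update: following guidance from @flow-PingCAP, adding profiles.default.remote_gc_method: 2 to the write-node configuration makes TiFlash delete objects through the original scan method instead.
The parameter is described as: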

the method of running GC task on the remote store. 1 - lifecycle, 2 - scan.
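
For a TiUP-managed cluster, the workaround might look like the topology fragment below. This is only a sketch: the host address is hypothetical, and placing the dotted key under the write node's `config` block (alongside `flash.disaggregated_mode`) is an assumption about the deployment. Only `profiles.default.remote_gc_method: 2` comes from this thread.

```yaml
tiflash_servers:
  - host: 10.0.1.11                          # hypothetical write-node address
    config:
      flash.disaggregated_mode: tiflash_write   # assumed: this node is the write node
      # 1 - lifecycle (default), 2 - scan.
      # Switch to scan so GC does not depend on lifecycle support in the object store.
      profiles.default.remote_gc_method: 2
```

The write node would need its configuration reloaded (for example through a rolling restart) before the new GC method takes effect.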

True, but it can be worked around by other means.

Wow, please share what you learned~ :+1: :+1: :+1:

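You can look at the test data in this post: the first query reads directly from S3 and is considerably slower, but once the data is cached, subsequent queries perform on par with a local SSD.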