678. 【数据库评测】Cloudwave 4.0 集群版(2节点) VS Starrocks 3.0 集群版(2节点)

一、测试结果

测试结论1:2台64核256g阿里云服务器组成的2节点集群,hadoop3.2.2 作为分布式存储,Cloudwave4.0在 SSB30g 国际标准测试集下,整体性能优于Starrocks3.0近0.4倍。

数据库 数据集 响应时间(ms) CPU 最大占用率 存储压缩比 数据导入时间
Cloudwave4.0 ssb30 748 10.8%(696%/6400%) 56.7%(10.2g/18g) 82秒
Starrocks3.0 ssb30 1057 33.3%(2131%/6400%) 46.1%(8.3g/18g) 95秒

测试结论2:2台64核256g阿里云服务器组成的2节点集群,hadoop3.2.2 作为分布式存储,Cloudwave4.0在 SSB100g 国际标准测试集下,整体性能优于Starrocks3.0近0.9倍。

数据库 数据集 响应时间(ms) CPU 最大占用率 存储压缩比 数据导入时间
Cloudwave4.0 ssb100 1128 44.3%(2834%/6400%) 58.6%(34.6g/59g) 9分钟24秒
Starrocks3.0 ssb100 2191 53.7%(3437%/6400%) 49.15%(29g/59g) 6分钟

二、评测环境

  • 硬件环境:2台 64核256g 云服务器(组成2节点的集群),essd pl1 高效云盘
  • 软件环境:jdk19(Cloudwave4.0官方推荐版本,官方基于jdk19版本里头的的vector api,实现全面向量化引擎)、jdk8(starrocks安装推荐jdk版本,主要用于fe,亦可少踩坑)、mysql8(作为starrocks的客户端)、hadoop 3.2.2(作为cloudwave 和 starrocks 共同的分布式存储,副本数=2)
  • 软件版本:Cloudwave 4.0(最新版在2023年5月份发版),Starrocks 3.0(最新版在2023年4月份发版)
  • 评测数据集:ssb30,ssb100

硬件环境

jdk版本

mysql版本

hadoop版本

Cloudwave版本

Starrocks版本

三、评测方法

  • 执行19轮测试脚本,每轮执行13条标准测试sql,去除第1轮的测试数据(由于IO原因,第1次查询两边的性能均受IO影响,本测试主要测数据库引擎的算法在同等计算资源的条件下的优劣,因此去除第一轮测试数据),将余下的18轮测试数据做平均,获得每条sql的平均耗时
  • 观察最大CPU占用
  • 观察存储压缩比
  • 观察数据加载时间

四、开始测试

  1. 启动hadoop
    两节点的hadoop启动完毕
hdfs namenode -format
start-dfs.sh
jps

2. [cloudwave]启动 Cloudwave4.0

./start-all-server.sh
jps

3. [cloudwave]创建 cloudwave 数据上传目录

dfs dfs -mkdir -p /cloudwave/uploads

4. [cloudwave]上传 ssb30 数据

hdfs dfs -put ssb30 /cloudwave/uploads/

5. [cloudwave]加载数据

  • 执行数据导入命令 loaddata ssb30
./cplus_go.bin -s 'loaddata ssb30'
  • [cloudwave]查看数据导入情况
  • 可以看到30g的数据,82s就导入完成了
  • 通过 hdfs 命令,可以看到cloudwave做了数据压缩,ssb30数据的原始大小是18G,导入cloudwave数据库之后,压缩到了10.2g(图中的20.3G 表示hdfs两个数据副本的总大小)
  1. [cloudwave]开始测试
  • 执行测试脚本./test_ssb.sh,七镜观察到cloudwave 的2节点集群测ssb30 CPU最大占用是696%/6400%
  • 执行分析脚本./analysis.sh cloudwave "$(ls n*txt)" +
  1. [cloudwave]按上述步骤测试ssb100
  • 上传数据到hdfs

  • 查看上传的数据

  • 执行导入数据命令
    9分钟导入完成

  • ssb100数据的原始大小是59G,导入cloudwave数据库之后,压缩到了34.6g

  • cloudwave 的2节点集群测ssb100 CPU最大占用是2834%/6400%

  • 分析测试结果

  1. [starrocks] 启动 starrocks3.0 fe
./start_fe.sh --daemon

9. [starrocks] 添加starrocks3.0 be

mysql -uroot -h127.0.0.1 -P9030
ALTER SYSTEM ADD BACKEND "172.17.161.33:9050"; 
ALTER SYSTEM ADD BACKEND "172.17.161.30:9050"; 

10. [starrocks] 启动 starrocks 3.0 be

./start_be.sh --daemon
ps -ef | grep starrocks

11. [starrocks] 验证集群状态,两个节点的 Alive=true 即可。

  1. [starrocks] 创建表

  2. [starrocks] 开始导入数据,ssb30导入时间是95s

date && ./bin/stream_load.sh data_dir/ssb30 && date

14. [starrocks] 查看ssb30 压缩比,ssb30数据的原始大小是18G,导入starrocks数据库之后,压缩到了8.3g

  1. [starrocks] 开始测试
  • 执行测试脚本./test_ssb.sh,七镜观察到 starrocks 的2节点集群测ssb30 CPU最大占用是2131%/6400%
  • 执行分析脚本./analysis.sh starrocks “$(ls n*txt)” +
  1. [starrocks]按上述步骤测试ssb100
  • 创建表

  • 导数据,6分钟导入完成ssb100g数据

  • 查看压缩比,ssb100数据的原始大小是59G,导入starrocks数据库之后,压缩到了29g

  • 执行测试,starrocks 的2节点集群测ssb100 CPU最大占用是3437%/6400%

  • 分析测试结果

五、附加

  1. Cloudwave 测试脚本
#!/bin/bash


# Program:

#       test ssb

# History:

# 2023/03/17    junfenghe.cloud@qq.com  version:0.0.1




rm -rf ./n*txt

for ((i=1; i<20; i++))

do




    cat sql_ssb.sql |./cplus.sh > n${i}.txt


done



  1. Starrocks 测试脚本
#!/bin/bash


# Program:

#       test ssb

# History:

# 2023/03/17    junfenghe.cloud@qq.com  version:0.0.1




rm -rf ./n*txt

for ((i=1; i<20; i++))

do




    cat sql_ssb.sql | mysql -uroot -P 9030 -h 127.0.0.1 -v -vv -vvv >n${i}.txt


done



  1. 分析脚本
#!/bin/bash


#Program:
#       analysis cloudwave/starrocks logs of base compute
#History:
#2023/02/20     junfenghe.cloud@qq.com  version:0.0.1



path=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/sbin:/usr/local/bin:~/bin
export path





suff="(s)#####"

if [ -z "${1}" ]
then

        echo "Please input database'name"
        exit -1

fi

if [ -z "$2" ]
then

        echo "Please input times of scanner"
        exit -f
fi

if [ -n "${3}" ]
then
        suff=${3}
fi

for current in ${2}
do
        result_time=""

        if [ "${1}" == "starrocks" ]
        then
            for time in $( cat ${current} | grep sec  | awk -F '('  '{print $2}' | awk -F ' ' '{print $1}' )
            do
                result_time="${result_time}${time}${suff}"
            done
        elif [ "${1}" == "cloudwave" ]
        then
            for time in $( cat ${current} | grep Elapsed | awk '{print $2}'| sed 's/:/*60+/g'| sed 's/+00\*60//g ; s/+0\*60//g ; s/^0\*60+//g' )
            do
                result_time="${result_time}${time}${suff}"
            done
        fi

        echo ${result_time%${suff}*}

done


exit 0

4. sql_ssb.sql

use ssb100;
select sum(lo_revenue) as revenue from lineorder,dates where lo_orderdate = d_datekey and d_year = 1993 and lo_discount between 1 and 3 and lo_quantity < 25;
select sum(lo_revenue) as revenue from lineorder,dates where lo_orderdate = d_datekey and d_yearmonthnum = 199401 and lo_discount between 4 and 6 and lo_quantity between 26 and 35;
select sum(lo_revenue) as revenue from lineorder,dates where lo_orderdate = d_datekey and d_weeknuminyear = 6 and d_year = 1994 and lo_discount between 5 and 7 and lo_quantity between 26 and 35;
select sum(lo_revenue) as lo_revenue, d_year, p_brand from lineorder ,dates,part,supplier where lo_orderdate = d_datekey and lo_partkey = p_partkey and lo_suppkey = s_suppkey and p_category = 'MFGR#12' and s_region = 'AMERICA' group by d_year, p_brand order by d_year, p_brand;
select sum(lo_revenue) as lo_revenue, d_year, p_brand from lineorder,dates,part,supplier where lo_orderdate = d_datekey and lo_partkey = p_partkey and lo_suppkey = s_suppkey and p_brand between 'MFGR#2221' and 'MFGR#2228' and s_region = 'ASIA' group by d_year, p_brand order by d_year, p_brand;
select sum(lo_revenue) as lo_revenue, d_year, p_brand from lineorder,dates,part,supplier where lo_orderdate = d_datekey and lo_partkey = p_partkey and lo_suppkey = s_suppkey and p_brand = 'MFGR#2239' and s_region = 'EUROPE' group by d_year, p_brand order by d_year, p_brand;
select c_nation, s_nation, d_year, sum(lo_revenue) as lo_revenue from lineorder,dates,customer,supplier where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and c_region = 'ASIA' and s_region = 'ASIA'and d_year >= 1992 and d_year <= 1997 group by c_nation, s_nation, d_year order by d_year asc, lo_revenue desc;
select c_city, s_city, d_year, sum(lo_revenue) as lo_revenue from lineorder,dates,customer,supplier where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and  c_nation = 'UNITED STATES' and s_nation = 'UNITED STATES' and d_year >= 1992 and d_year <= 1997 group by c_city, s_city, d_year order by d_year asc, lo_revenue desc;
select c_city, s_city, d_year, sum(lo_revenue) as lo_revenue from lineorder,dates,customer,supplier where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and (c_city='UNITED KI1' or c_city='UNITED KI5') and (s_city='UNITED KI1' or s_city='UNITED KI5') and d_year >= 1992 and d_year <= 1997 group by c_city, s_city, d_year order by d_year asc, lo_revenue desc;
select c_city, s_city, d_year, sum(lo_revenue) as lo_revenue from lineorder,dates,customer,supplier where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and (c_city='UNITED KI1' or c_city='UNITED KI5') and (s_city='UNITED KI1' or s_city='UNITED KI5') and d_yearmonth  = 'Dec1997' group by c_city, s_city, d_year order by d_year asc, lo_revenue desc;
select d_year, c_nation, sum(lo_revenue) - sum(lo_supplycost) as profit from lineorder,dates,customer,supplier,part where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and lo_partkey = p_partkey and c_region = 'AMERICA' and s_region = 'AMERICA' and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2') group by d_year, c_nation order by d_year, c_nation;
select d_year, s_nation, p_category, sum(lo_revenue) - sum(lo_supplycost) as profit from lineorder,dates,customer,supplier,part where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and lo_partkey = p_partkey and c_region = 'AMERICA'and s_region = 'AMERICA' and (d_year = 1997 or d_year = 1998) and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2') group by d_year, s_nation, p_category order by d_year, s_nation, p_category;
select d_year, s_city, p_brand, sum(lo_revenue) - sum(lo_supplycost) as profit from lineorder,dates,customer,supplier,part where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and lo_partkey = p_partkey and c_region = 'AMERICA'and s_nation = 'UNITED STATES' and (d_year = 1997 or d_year = 1998) and p_category = 'MFGR#14' group by d_year, s_city, p_brand order by d_year, s_city, p_brand;

七镜还将带来Cloudwave 4.0 集群版 VS Starrocks 3.0 集群版 在 1T SSB数据集上的评测。

© 版权声明
THE END
喜欢就支持一下吧
点赞0

Warning: mysqli_query(): (HY000/3): Error writing file '/tmp/MYEc6eEb' (Errcode: 28 - No space left on device) in /www/wwwroot/583.cn/wp-includes/class-wpdb.php on line 2345
admin的头像-五八三
评论 抢沙发
头像
欢迎您留下宝贵的见解!
提交
头像

昵称

图形验证码
取消
昵称代码图片