Benchmark adds four new data types: STRING BLOB TIMESTAMP DATE #433

Closed
wants to merge 2 commits into from

YangYumings
Contributor

This PR adds four new data types (STRING, BLOB, TIMESTAMP, DATE) to SensorType.java and implements random data generation for them in generateworkLoad.java. It also adds the conversions for the new types in the genTablet method and sets the encoding methods for the new data types in the configuration file.

@YangYumings YangYumings deleted the send branch July 31, 2024 06:38
# Proportion of data types for inserted data, BOOLEAN:INT32:INT64:FLOAT:DOUBLE:TEXT
# INSERT_DATATYPE_PROPORTION=1:1:1:1:1:1
# Proportion of data types for inserted data, BOOLEAN:INT32:INT64:FLOAT:DOUBLE:TEXT:STRING:BLOB:TIMESTAMP:DATE
INSERT_DATATYPE_PROPORTION=1:1:1:1:1:1:1:1:1:1
Contributor

INSERT_DATATYPE_PROPORTION=1:1:1:1:1:1:1:1:1:1

@@ -112,6 +112,12 @@
<artifactId>opencsv</artifactId>
<version>5.5.2</version>
</dependency>
<dependency>
Contributor

Is depending only on this tsfile snapshot version not enough, because the session version also has to be updated? If so, just depend on the session snapshot version directly.

Contributor Author

The dependency has been removed here.

* Proportion of inserted data by data type, D1:D2:D3:D4:D5:D6 D1: BOOLEAN D2: INT32 D3: INT64 D4: FLOAT D5: DOUBLE D6:
* TEXT
* Proportion of inserted data by data type, D1:D2:D3:D4:D5:D6:D7:D8:D9:D9:D10 D1: BOOLEAN D2: INT32 D3: INT64 D4: FLOAT
* D5: DOUBLE D6:TEXT D7: STRING D8: BLOB D9: TIMESTAMP D10: DATE 0:0:0:0:0:0:0:0:1:0
Contributor

What is the trailing 0:0:0:0:0:0:0:0:1:0 at the end?

Contributor Author

Removed. It was the data type proportion used earlier while debugging.

@@ -211,6 +211,12 @@ private void loadProps() {
config.setENCODING_DOUBLE(
properties.getProperty("ENCODING_DOUBLE", config.getENCODING_DOUBLE()));
config.setENCODING_TEXT(properties.getProperty("ENCODING_TEXT", config.getENCODING_TEXT()));
config.setENCODING_STRING(
properties.getProperty("ENCODING_STRING", config.getENCODING_STRING()));
config.setENCODING_BLOB(properties.getProperty("ENCODING_BLOB", config.getENCODING_BLOB()));
Contributor

Add the corresponding configuration parameter now, even if there is only one option. If more encodings are supported in the future, no new benchmark parameter will have to be added.

Contributor Author

This has already been added to the configuration file.

@@ -536,7 +544,8 @@ private double[] generateProbabilities(int typeNumber) {
// Origin proportion array
double[] proportions = new double[typeNumber];
LOGGER.info(
"Init SensorTypes: BOOLEAN:INT32:INT64:FLOAT:DOUBLE:TEXT= {}", INSERT_DATATYPE_PROPORTION);
"Init SensorTypes: BOOLEAN:INT32:INT64:FLOAT:DOUBLE:TEXT:STRING:BLOB:TIMESTAMP:DATE= {}",
Contributor

Consider compatibility: a configuration file from an older version needs special handling.

Contributor Author

INSERT_DATATYPE_PROPORTION is now special-cased when ConfigDescriptor initializes the configuration file, and the corresponding data type proportion is output.
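
One possible shape for this special case is to pad a legacy six-field proportion with zeros for the four new types. This is a minimal sketch only: the class name `ProportionCompat` and the method `padProportion` are hypothetical and do not appear in the PR, and the actual handling in ConfigDescriptor may differ.

```java
import java.util.Arrays;

// Hypothetical helper, not taken from the PR: pads a legacy proportion
// string so that it covers all ten data types.
final class ProportionCompat {
  // BOOLEAN:INT32:INT64:FLOAT:DOUBLE:TEXT:STRING:BLOB:TIMESTAMP:DATE
  static final int TYPE_COUNT = 10;

  static String padProportion(String raw) {
    String[] parts = raw.trim().split(":");
    if (parts.length > TYPE_COUNT) {
      throw new IllegalArgumentException("Too many fields in proportion: " + raw);
    }
    String[] padded = Arrays.copyOf(parts, TYPE_COUNT);
    // Missing trailing types get weight 0, so an old config never generates them.
    for (int i = parts.length; i < TYPE_COUNT; i++) {
      padded[i] = "0";
    }
    return String.join(":", padded);
  }

  public static void main(String[] args) {
    // prints 1:1:1:1:1:1:0:0:0:0
    System.out.println(padProportion("1:1:1:1:1:1"));
  }
}
```

Padding with zeros keeps an old configuration's behavior unchanged, since the new types are never selected unless the user opts in.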

@@ -133,6 +135,19 @@ private static long getCurrentTimestampStatic(long stepOffset) {
return Constants.START_TIMESTAMP * timeStampConst + offset + timestamp;
}

private static long generateRandomTimestamp(long startTimeMillis, long endTimeMillis) {
Random random = new Random();
Contributor

Avoid creating so many temporary Random objects. Consider reusing dataRandom or using ThreadLocalRandom.
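
The ThreadLocalRandom option could look roughly like the sketch below. The method name and parameters follow the diff hunk above; the wrapping class `RandomTimestampSketch` and the method body are assumptions made for illustration, not the code that was merged.

```java
import java.util.concurrent.ThreadLocalRandom;

// Illustrative only: the enclosing class name is hypothetical.
final class RandomTimestampSketch {
  // Returns a value in [startTimeMillis, endTimeMillis) without allocating
  // a new Random on every call. Requires startTimeMillis < endTimeMillis.
  static long generateRandomTimestamp(long startTimeMillis, long endTimeMillis) {
    return ThreadLocalRandom.current().nextLong(startTimeMillis, endTimeMillis);
  }

  public static void main(String[] args) {
    System.out.println(generateRandomTimestamp(0, System.currentTimeMillis()));
  }
}
```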

Contributor Author

Removed. The value now uses the long generation distribution in Function.

// Dates are valid starting from 1000-01-01
LocalDate start = LocalDate.of(1000, 1, 1);
int daysBetween = (int) ChronoUnit.DAYS.between(start, end);
Random random = new Random();
Contributor

Same as above.

Contributor Author

Removed. The value now uses the long generation distribution in Function, and the resulting int is then converted to a LocalDate.
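
How the generated number is mapped to a date is not shown in this conversation. The sketch below is one way to do it, assuming the value is folded into the valid range starting at 1000-01-01 with a modulo; the class `DateFromLongSketch` and the method `toDate` are hypothetical names, and the folding step is an assumption that also distorts the original distribution.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, not taken from the PR.
final class DateFromLongSketch {
  // Lower bound of the valid range, as stated in the diff comment.
  static final LocalDate START = LocalDate.of(1000, 1, 1);

  // Maps an arbitrary generated value onto a date in [START, end].
  static LocalDate toDate(long generated, LocalDate end) {
    long span = ChronoUnit.DAYS.between(START, end) + 1;
    // floorMod keeps the offset non-negative even for negative inputs.
    long offset = Math.floorMod(generated, span);
    return START.plusDays(offset);
  }

  public static void main(String[] args) {
    System.out.println(toDate(12345L, LocalDate.now()));
  }
}
```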

value = new Binary(blob);
break;
case TIMESTAMP:
value = generateRandomTimestamp(0, System.currentTimeMillis());
Contributor

Consider using a generation distribution similar to the one for long; see Function.getValueByFunctionIdAndParam(param, currentTimestamp);

value = generateRandomTimestamp(0, System.currentTimeMillis());
break;
case DATE:
value = generateRandomDate(LocalDate.now());
Contributor

Consider using a generation distribution similar to the one for long; see FunctionParam and decide whether to extract a few distributions.

Contributor Author

This now uses the long generation distribution in Function.

Binary[] sensorsBlob = (Binary[]) values[recordValueIndex];
sensorsBlob[recordIndex] =
binaryCache.computeIfAbsent(
String.valueOf((Binary) record.getRecordDataValue().get(recordValueIndex)),
Contributor

Align the data types here. Could everything upstream simply use String? Otherwise BLOB values get converted back and forth.

Contributor Author

Changed to String here to reduce the number of type conversions.
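
The idea of keeping the generated value as a String and converting it once, through the cache, can be sketched as follows. The real Binary type is not reproduced here: `Bytes` is a stand-in, and `BlobCacheSketch` and `toBytes` are hypothetical names, so this shows only the caching pattern, not the PR's code.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: all names in this block are hypothetical.
final class BlobCacheSketch {
  // Stand-in for the binary wrapper type used by the tablet.
  static final class Bytes {
    final byte[] data;

    Bytes(byte[] data) {
      this.data = data;
    }
  }

  private final Map<String, Bytes> cache = new HashMap<>();

  // The record value stays a String; it is converted to bytes at most once
  // per distinct value, so there is no Binary -> String -> Binary round trip.
  Bytes toBytes(String generated) {
    return cache.computeIfAbsent(
        generated, s -> new Bytes(s.getBytes(StandardCharsets.UTF_8)));
  }

  public static void main(String[] args) {
    BlobCacheSketch sketch = new BlobCacheSketch();
    // Equal strings share one cached instance.
    System.out.println(sketch.toBytes("abc") == sketch.toBytes("abc"));
  }
}
```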

2 participants