Showing 3 changed files with 247 additions and 249 deletions
readme-cn.md
Deleted
100644 → 0
| 1 | -[English](readme.md) | 简体中文 | ||
| 2 | - | ||
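| 3 | -# Introduction to go-stash | ||
| 4 | - | ||
| 5 | -go-stash is an efficient tool that reads data from Kafka, processes it according to configured rules, and sends it to an ElasticSearch cluster. | ||
| 6 | - | ||
| 7 | -go-stash delivers roughly 5x the throughput of logstash and is simple to deploy: a single executable file is all it takes. | ||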
| 8 | - | ||
| 9 | - | ||
| 10 | - | ||
| 11 | - | ||
| 12 | -### Installation | ||
| 13 | - | ||
| 14 | -```shell | ||
| 15 | -cd stash && go build stash.go | ||
| 16 | -``` | ||
| 17 | - | ||
| 18 | -### Quick Start | ||
| 19 | - | ||
| 20 | -```shell | ||
| 21 | -./stash -f etc/config.yaml | ||
| 22 | -``` | ||
| 23 | - | ||
| 24 | -An example config.yaml: | ||
| 25 | - | ||
| 26 | -```yaml | ||
| 27 | -Clusters: | ||
| 28 | -- Input: | ||
| 29 | - Kafka: | ||
| 30 | - Name: go-stash | ||
| 31 | - Log: | ||
| 32 | - Mode: file | ||
| 33 | - Brokers: | ||
| 34 | - - "172.16.48.41:9092" | ||
| 35 | - - "172.16.48.42:9092" | ||
| 36 | - - "172.16.48.43:9092" | ||
| 37 | - Topic: ngapplog | ||
| 38 | - Group: stash | ||
| 39 | - Conns: 3 | ||
| 40 | - Consumers: 10 | ||
| 41 | - Processors: 60 | ||
| 42 | - MinBytes: 1048576 | ||
| 43 | - MaxBytes: 10485760 | ||
| 44 | - Offset: first | ||
| 45 | - Filters: | ||
| 46 | - - Action: drop | ||
| 47 | - Conditions: | ||
| 48 | - - Key: status | ||
| 49 | - Value: 503 | ||
| 50 | - Type: contains | ||
| 51 | - - Key: type | ||
| 52 | - Value: "app" | ||
| 53 | - Type: match | ||
| 54 | - Op: and | ||
| 55 | - - Action: remove_field | ||
| 56 | - Fields: | ||
| 57 | - - message | ||
| 58 | - - source | ||
| 59 | - - beat | ||
| 60 | - - fields | ||
| 61 | - - input_type | ||
| 62 | - - offset | ||
| 63 | - - "@version" | ||
| 64 | - - _score | ||
| 65 | - - _type | ||
| 66 | - - clientip | ||
| 67 | - - http_host | ||
| 68 | - - request_time | ||
| 69 | - Output: | ||
| 70 | - ElasticSearch: | ||
| 71 | - Hosts: | ||
| 72 | - - "http://172.16.188.73:9200" | ||
| 73 | - - "http://172.16.188.74:9200" | ||
| 74 | - - "http://172.16.188.75:9200" | ||
| 75 | - Index: "go-stash-{{yyyy.MM.dd}}" | ||
| 76 | - MaxChunkBytes: 5242880 | ||
| 77 | - GracePeriod: 10s | ||
| 78 | - Compress: false | ||
| 79 | - TimeZone: UTC | ||
| 80 | -``` | ||
| 81 | - | ||
| 82 | -## Details | ||
| 83 | - | ||
| 84 | -### input | ||
| 85 | - | ||
| 86 | -```shell | ||
| 87 | -Conns: 3 | ||
| 88 | -Consumers: 10 | ||
| 89 | -Processors: 60 | ||
| 90 | -MinBytes: 1048576 | ||
| 91 | -MaxBytes: 10485760 | ||
| 92 | -Offset: first | ||
| 93 | -``` | ||
| 94 | -#### Conns | ||
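| 95 | - Number of connections to Kafka. Size it by the CPU core count; generally <= the number of CPU cores. | ||
| 96 | - | ||
| 97 | -#### Consumers | ||
| 98 | - Number of threads opened per connection. The total is Conns * Consumers, which should not exceed the topic's partition count; for example, with 30 partitions, Conns * Consumers <= 30. | ||
| 99 | - | ||
| 100 | -#### Processors | ||
| 101 | - Number of threads that process data. Size it by the CPU core count and raise it moderately if needed. Recommended: Conns * Consumers * 2 or Conns * Consumers * 3, for example 60 or 90. | ||
| 102 | - | ||
| 103 | -#### MinBytes MaxBytes | ||
| 104 | - Size range of each block of data fetched from Kafka, 1M to 10M by default. It can be raised moderately when network and IO are good. | ||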
| 105 | - | ||
| 106 | -#### Offset | ||
| 107 | - Accepts first or last and defaults to last; first reads data from the beginning of the Kafka topic. | ||
| 108 | - | ||
| 109 | - | ||
| 110 | -### Filters | ||
| 111 | - | ||
| 112 | -```shell | ||
| 113 | -- Action: drop | ||
| 114 | - Conditions: | ||
| 115 | - - Key: k8s_container_name | ||
| 116 | - Value: "-rpc" | ||
| 117 | - Type: contains | ||
| 118 | - - Key: level | ||
| 119 | - Value: info | ||
| 120 | - Type: match | ||
| 121 | - Op: and | ||
| 122 | -- Action: remove_field | ||
| 123 | - Fields: | ||
| 124 | - - message | ||
| 125 | - - _source | ||
| 126 | - - _type | ||
| 127 | - - _score | ||
| 128 | - - _id | ||
| 129 | - - "@version" | ||
| 130 | - - topic | ||
| 131 | - - index | ||
| 132 | - - beat | ||
| 133 | - - docker_container | ||
| 134 | - - offset | ||
| 135 | - - prospector | ||
| 136 | - - source | ||
| 137 | - - stream | ||
| 138 | -- Action: transfer | ||
| 139 | - Field: message | ||
| 140 | - Target: data | ||
| 141 | - | ||
| 142 | -``` | ||
| 143 | -#### - Action: drop | ||
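| 144 | - - Drop marker: data matching these conditions is removed during processing and never reaches ES | ||
| 145 | - - For each drop condition, specify the Key field and its Value; Type can be contains or match | ||
| 146 | - - Conditions are joined with Op: and; or is also accepted | ||
| 147 | - | ||
| 148 | -#### - Action: remove_field | ||
| 149 | - Remove-field marker: list the fields to remove underneath | ||
| 150 | - | ||
| 151 | -#### - Action: transfer | ||
| 152 | - Transfer-field marker: for example, the message field can be redefined as the data field | ||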
| 153 | - | ||
| 154 | - | ||
| 155 | -### Output | ||
| 156 | - | ||
| 157 | -#### Index | ||
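| 158 | - Index name. In indexname-{{yyyy.MM.dd}} the suffix means year.month.day; {{yyyy-MM-dd}} also works, and the format is yours to define. | ||
| 159 | - | ||
| 160 | -#### MaxChunkBytes | ||
| 161 | - Size of each bulk request submitted to ES, 5M by default; adjust it to suit the ES IO load. | ||
| 162 | - | ||
| 163 | -#### GracePeriod | ||
| 164 | - Defaults to 10s. After shutdown is requested, the program has 10s to finish the remaining consumption and data before exiting gracefully. | ||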
| 165 | - | ||
| 166 | -#### Compress | ||
| 167 | - Data compression. Compression reduces the amount of data transferred but adds some processing overhead. Accepts true or false, default false. | ||
| 168 | - | ||
| 169 | -#### TimeZone | ||
| 170 | - Defaults to UTC, Coordinated Universal Time. | ||
| 171 | - | ||
| 172 | - | ||
| 173 | - | ||
| 174 | - | ||
| 175 | - | ||
| 176 | -## ES write performance test | ||
| 177 | - | ||
| 178 | - | ||
| 179 | -### Test environment | ||
| 180 | -- stash servers: 3 machines, 4 cores, 8G | ||
| 181 | -- es servers: 15 machines, 16 cores, 64G | ||
| 182 | - | ||
| 183 | -### Key configuration | ||
| 184 | - | ||
| 185 | -```shell | ||
| 186 | -- Input: | ||
| 187 | - Conns: 3 | ||
| 188 | - Consumers: 10 | ||
| 189 | - Processors: 60 | ||
| 190 | - MinBytes: 1048576 | ||
| 191 | - MaxBytes: 10485760 | ||
| 192 | - Filters: | ||
| 193 | - - Action: remove_field | ||
| 194 | - Fields: | ||
| 195 | - - message | ||
| 196 | - - source | ||
| 197 | - - beat | ||
| 198 | - - fields | ||
| 199 | - - input_type | ||
| 200 | - - offset | ||
| 201 | - - request_time | ||
| 202 | - Output: | ||
| 203 | - Index: "nginx_pro-{{yyyy.MM.d}}" | ||
| 204 | - Compress: false | ||
| 205 | - MaxChunkBytes: 5242880 | ||
| 206 | - TimeZone: UTC | ||
| 207 | -``` | ||
| 208 | - | ||
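| 209 | -### Average write speed above 150,000/s | ||
| 210 | - | ||
| 211 | - | ||
| 212 | - | ||
| 213 | -### WeChat discussion group | ||
| 214 | - | ||
| 215 | -Before joining the group, please give the project a star; a small star motivates the authors to answer questions. | ||
| 216 | - | ||
| 217 | -If you have a question the documentation does not cover, feel free to ask in the group and we will reply as soon as possible. | ||
| 218 | - | ||
| 219 | -You can suggest improvements in the group; we will weigh them and make changes as soon as possible. | ||
| 220 | - | ||
| 221 | -If you find a bug, please file an issue promptly; we will confirm and fix it as soon as possible. | ||
| 222 | - | ||
| 223 | -Add me on WeChat: kevwan, and mention go-stash; I will add you to the go-stash community group 🤝 | ||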
| 224 | - | ||
| 225 | -### --END |
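The drop rules described in the Filters section of the removed readme-cn.md can be illustrated with a small sketch. This is not go-stash's implementation: the helper names `condition_holds` and `should_drop` are hypothetical, `match` is assumed to mean exact equality, and conditions are assumed to be folded left to right with each condition's `Op`.

```python
def condition_holds(event, cond):
    """Return True if a single condition is satisfied by the event dict."""
    actual = str(event.get(cond["Key"], ""))
    expected = str(cond["Value"])
    if cond["Type"] == "contains":
        return expected in actual
    if cond["Type"] == "match":
        # Assumption: "match" is treated as exact equality.
        return actual == expected
    raise ValueError(f"unknown condition type: {cond['Type']}")


def should_drop(event, conditions):
    """Fold the conditions left to right, joining each with its Op."""
    if not conditions:
        return False
    result = condition_holds(event, conditions[0])
    for cond in conditions[1:]:
        holds = condition_holds(event, cond)
        # Assumption: a missing Op defaults to "and".
        if cond.get("Op", "and") == "or":
            result = result or holds
        else:
            result = result and holds
    return result


# The two conditions from the Filters example in the README.
CONDITIONS = [
    {"Key": "k8s_container_name", "Value": "-rpc", "Type": "contains"},
    {"Key": "level", "Value": "info", "Type": "match", "Op": "and"},
]
```

Under these assumptions, an event from a container named `user-rpc` at level `info` would be dropped, while the same container at level `error` would pass through.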
readme-en.md
0 → 100644
| 1 | +English | [简体中文](readme-cn.md) | ||
| 2 | + | ||
| 3 | +# go-stash | ||
| 4 | + | ||
| 5 | +go-stash is a high performance, free and open source server-side data processing pipeline that ingests data from Kafka, processes it, and then sends it to ElasticSearch. | ||
| 6 | + | ||
| 7 | +go-stash has about 4x the throughput of logstash and is easy to deploy: just one executable file. | ||
| 8 | + | ||
| 9 | + | ||
| 10 | + | ||
| 11 | +## Quick Start | ||
| 12 | + | ||
| 13 | +```shell | ||
| 14 | +gostash -f etc/config.yaml | ||
| 15 | +``` | ||
| 16 | + | ||
| 17 | +A config.yaml example is shown below: | ||
| 18 | + | ||
| 19 | +```yaml | ||
| 20 | +Clusters: | ||
| 21 | +- Input: | ||
| 22 | + Kafka: | ||
| 23 | + Name: gostash | ||
| 24 | + Brokers: | ||
| 25 | + - "172.16.186.16:19092" | ||
| 26 | + - "172.16.186.17:19092" | ||
| 27 | + Topics: | ||
| 28 | + - k8slog | ||
| 29 | + Group: pro | ||
| 30 | + Consumers: 16 | ||
| 31 | + Filters: | ||
| 32 | + - Action: drop | ||
| 33 | + Conditions: | ||
| 34 | + - Key: k8s_container_name | ||
| 35 | + Value: "-rpc" | ||
| 36 | + Type: contains | ||
| 37 | + - Key: level | ||
| 38 | + Value: info | ||
| 39 | + Type: match | ||
| 40 | + Op: and | ||
| 41 | + - Action: remove_field | ||
| 42 | + Fields: | ||
| 43 | + - message | ||
| 44 | + - _source | ||
| 45 | + - _type | ||
| 46 | + - _score | ||
| 47 | + - _id | ||
| 48 | + - "@version" | ||
| 49 | + - topic | ||
| 50 | + - index | ||
| 51 | + - beat | ||
| 52 | + - docker_container | ||
| 53 | + - Action: transfer | ||
| 54 | + Field: message | ||
| 55 | + Target: data | ||
| 56 | + Output: | ||
| 57 | + ElasticSearch: | ||
| 58 | + Hosts: | ||
| 59 | + - "172.16.141.4:9200" | ||
| 60 | + - "172.16.141.5:9200" | ||
| 61 | + # {.event} is the value of the json attribute from input | ||
| 62 | + # {{yyyy-MM-dd}} means date, like 2020-09-09 | ||
| 63 | + Index: {.event}-{{yyyy-MM-dd}} | ||
| 64 | +``` |
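The two comments on `Index` in the config above say that `{.event}` is filled from a JSON attribute of the input and `{{yyyy-MM-dd}}` from the date. A minimal sketch of that expansion follows; `render_index` is a hypothetical helper, not a go-stash function, and it only assumes the `yyyy`, `MM`, and `dd` tokens that appear in this README.

```python
import re
from datetime import datetime

# Date tokens used in the README, mapped to strftime directives.
_TOKENS = (("yyyy", "%Y"), ("MM", "%m"), ("dd", "%d"))


def render_index(pattern, event, when):
    """Expand {{date}} and {.attribute} placeholders in an index pattern."""

    def date_part(match):
        fmt = match.group(1)
        for token, directive in _TOKENS:
            fmt = fmt.replace(token, directive)
        return when.strftime(fmt)

    def attr_part(match):
        # Look the attribute up in the event (a dict parsed from JSON).
        return str(event[match.group(1)])

    rendered = re.sub(r"\{\{(.+?)\}\}", date_part, pattern)
    return re.sub(r"\{\.(\w+)\}", attr_part, rendered)


index = render_index(
    "{.event}-{{yyyy-MM-dd}}",
    {"event": "k8slog"},
    datetime(2020, 9, 9),
)
print(index)  # k8slog-2020-09-09
```

The date is passed in explicitly so the result is reproducible; a real pipeline would take it from the event or the clock in the configured time zone.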
| 1 | -English | [简体中文](readme-cn.md) | 1 | +[English](readme.md) | 简体中文 |
| 2 | 2 | ||
| 3 | -# go-stash | 3 | +# Introduction to go-stash |
| 4 | 4 | ||
| 5 | -go-stash is a high performance, free and open source server-side data processing pipeline that ingests data from Kafka, processes it, and then sends it to ElasticSearch. | 5 | +go-stash is an efficient tool that reads data from Kafka, processes it according to configured rules, and sends it to an ElasticSearch cluster. |
| 6 | 6 | ||
| 7 | -go-stash has about 4x the throughput of logstash and is easy to deploy: just one executable file. | 7 | +go-stash delivers roughly 5x the throughput of logstash and is simple to deploy: a single executable file is all it takes. |
| 8 | 8 | ||
| 9 | - | 9 | + |
| 10 | 10 | ||
| 11 | -## Quick Start | 11 | + |
| 12 | +### Installation | ||
| 13 | + | ||
| 14 | +```shell | ||
| 15 | +cd stash && go build stash.go | ||
| 16 | +``` | ||
| 17 | + | ||
| 18 | +### Quick Start | ||
| 12 | 19 | ||
| 13 | ```shell | 20 | ```shell |
| 14 | -gostash -f etc/config.yaml | 21 | +./stash -f etc/config.yaml |
| 15 | ``` | 22 | ``` |
| 16 | 23 | ||
| 17 | -A config.yaml example is shown below: | 24 | +An example config.yaml: |
| 18 | 25 | ||
| 19 | ```yaml | 26 | ```yaml |
| 20 | Clusters: | 27 | Clusters: |
| 21 | - Input: | 28 | - Input: |
| 22 | Kafka: | 29 | Kafka: |
| 23 | - Name: gostash | 30 | + Name: go-stash |
| 31 | + Log: | ||
| 32 | + Mode: file | ||
| 24 | Brokers: | 33 | Brokers: |
| 25 | - - "172.16.186.16:19092" | ||
| 26 | - - "172.16.186.17:19092" | ||
| 27 | - Topics: | ||
| 28 | - - k8slog | ||
| 29 | - Group: pro | ||
| 30 | - Consumers: 16 | 34 | + - "172.16.48.41:9092" |
| 35 | + - "172.16.48.42:9092" | ||
| 36 | + - "172.16.48.43:9092" | ||
| 37 | + Topic: ngapplog | ||
| 38 | + Group: stash | ||
| 39 | + Conns: 3 | ||
| 40 | + Consumers: 10 | ||
| 41 | + Processors: 60 | ||
| 42 | + MinBytes: 1048576 | ||
| 43 | + MaxBytes: 10485760 | ||
| 44 | + Offset: first | ||
| 31 | Filters: | 45 | Filters: |
| 32 | - Action: drop | 46 | - Action: drop |
| 33 | Conditions: | 47 | Conditions: |
| 48 | + - Key: status | ||
| 49 | + Value: 503 | ||
| 50 | + Type: contains | ||
| 51 | + - Key: type | ||
| 52 | + Value: "app" | ||
| 53 | + Type: match | ||
| 54 | + Op: and | ||
| 55 | + - Action: remove_field | ||
| 56 | + Fields: | ||
| 57 | + - message | ||
| 58 | + - source | ||
| 59 | + - beat | ||
| 60 | + - fields | ||
| 61 | + - input_type | ||
| 62 | + - offset | ||
| 63 | + - "@version" | ||
| 64 | + - _score | ||
| 65 | + - _type | ||
| 66 | + - clientip | ||
| 67 | + - http_host | ||
| 68 | + - request_time | ||
| 69 | + Output: | ||
| 70 | + ElasticSearch: | ||
| 71 | + Hosts: | ||
| 72 | + - "http://172.16.188.73:9200" | ||
| 73 | + - "http://172.16.188.74:9200" | ||
| 74 | + - "http://172.16.188.75:9200" | ||
| 75 | + Index: "go-stash-{{yyyy.MM.dd}}" | ||
| 76 | + MaxChunkBytes: 5242880 | ||
| 77 | + GracePeriod: 10s | ||
| 78 | + Compress: false | ||
| 79 | + TimeZone: UTC | ||
| 80 | +``` | ||
| 81 | + | ||
| 82 | +## Details | ||
| 83 | + | ||
| 84 | +### input | ||
| 85 | + | ||
| 86 | +```shell | ||
| 87 | +Conns: 3 | ||
| 88 | +Consumers: 10 | ||
| 89 | +Processors: 60 | ||
| 90 | +MinBytes: 1048576 | ||
| 91 | +MaxBytes: 10485760 | ||
| 92 | +Offset: first | ||
| 93 | +``` | ||
| 94 | +#### Conns | ||
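| 95 | + Number of connections to Kafka. Size it by the CPU core count; generally <= the number of CPU cores. | ||
| 96 | + | ||
| 97 | +#### Consumers | ||
| 98 | + Number of threads opened per connection. The total is Conns * Consumers, which should not exceed the topic's partition count; for example, with 30 partitions, Conns * Consumers <= 30. | ||
| 99 | + | ||
| 100 | +#### Processors | ||
| 101 | + Number of threads that process data. Size it by the CPU core count and raise it moderately if needed. Recommended: Conns * Consumers * 2 or Conns * Consumers * 3, for example 60 or 90. | ||
| 102 | + | ||
| 103 | +#### MinBytes MaxBytes | ||
| 104 | + Size range of each block of data fetched from Kafka, 1M to 10M by default. It can be raised moderately when network and IO are good. | ||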
| 105 | + | ||
| 106 | +#### Offset | ||
| 107 | + Accepts first or last and defaults to last; first reads data from the beginning of the Kafka topic. | ||
| 108 | + | ||
| 109 | + | ||
| 110 | +### Filters | ||
| 111 | + | ||
| 112 | +```shell | ||
| 113 | +- Action: drop | ||
| 114 | + Conditions: | ||
| 34 | - Key: k8s_container_name | 115 | - Key: k8s_container_name |
| 35 | Value: "-rpc" | 116 | Value: "-rpc" |
| 36 | Type: contains | 117 | Type: contains |
| @@ -38,7 +119,7 @@ Clusters: | @@ -38,7 +119,7 @@ Clusters: | ||
| 38 | Value: info | 119 | Value: info |
| 39 | Type: match | 120 | Type: match |
| 40 | Op: and | 121 | Op: and |
| 41 | - - Action: remove_field | 122 | +- Action: remove_field |
| 42 | Fields: | 123 | Fields: |
| 43 | - message | 124 | - message |
| 44 | - _source | 125 | - _source |
| @@ -50,15 +131,93 @@ Clusters: | @@ -50,15 +131,93 @@ Clusters: | ||
| 50 | - index | 131 | - index |
| 51 | - beat | 132 | - beat |
| 52 | - docker_container | 133 | - docker_container |
| 53 | - - Action: transfer | 134 | + - offset |
| 135 | + - prospector | ||
| 136 | + - source | ||
| 137 | + - stream | ||
| 138 | +- Action: transfer | ||
| 54 | Field: message | 139 | Field: message |
| 55 | Target: data | 140 | Target: data |
| 141 | + | ||
| 142 | +``` | ||
| 143 | +#### - Action: drop | ||
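| 144 | + - Drop marker: data matching these conditions is removed during processing and never reaches ES | ||
| 145 | + - For each drop condition, specify the Key field and its Value; Type can be contains or match | ||
| 146 | + - Conditions are joined with Op: and; or is also accepted | ||
| 147 | + | ||
| 148 | +#### - Action: remove_field | ||
| 149 | + Remove-field marker: list the fields to remove underneath | ||
| 150 | + | ||
| 151 | +#### - Action: transfer | ||
| 152 | + Transfer-field marker: for example, the message field can be redefined as the data field | ||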
| 153 | + | ||
| 154 | + | ||
| 155 | +### Output | ||
| 156 | + | ||
| 157 | +#### Index | ||
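| 158 | + Index name. In indexname-{{yyyy.MM.dd}} the suffix means year.month.day; {{yyyy-MM-dd}} also works, and the format is yours to define. | ||
| 159 | + | ||
| 160 | +#### MaxChunkBytes | ||
| 161 | + Size of each bulk request submitted to ES, 5M by default; adjust it to suit the ES IO load. | ||
| 162 | + | ||
| 163 | +#### GracePeriod | ||
| 164 | + Defaults to 10s. After shutdown is requested, the program has 10s to finish the remaining consumption and data before exiting gracefully. | ||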
| 165 | + | ||
| 166 | +#### Compress | ||
| 167 | + Data compression. Compression reduces the amount of data transferred but adds some processing overhead. Accepts true or false, default false. | ||
| 168 | + | ||
| 169 | +#### TimeZone | ||
| 170 | + Defaults to UTC, Coordinated Universal Time. | ||
| 171 | + | ||
| 172 | + | ||
| 173 | + | ||
| 174 | + | ||
| 175 | + | ||
| 176 | +## ES write performance test | ||
| 177 | + | ||
| 178 | + | ||
| 179 | +### Test environment | ||
| 180 | +- stash servers: 3 machines, 4 cores, 8G | ||
| 181 | +- es servers: 15 machines, 16 cores, 64G | ||
| 182 | + | ||
| 183 | +### Key configuration | ||
| 184 | + | ||
| 185 | +```shell | ||
| 186 | +- Input: | ||
| 187 | + Conns: 3 | ||
| 188 | + Consumers: 10 | ||
| 189 | + Processors: 60 | ||
| 190 | + MinBytes: 1048576 | ||
| 191 | + MaxBytes: 10485760 | ||
| 192 | + Filters: | ||
| 193 | + - Action: remove_field | ||
| 194 | + Fields: | ||
| 195 | + - message | ||
| 196 | + - source | ||
| 197 | + - beat | ||
| 198 | + - fields | ||
| 199 | + - input_type | ||
| 200 | + - offset | ||
| 201 | + - request_time | ||
| 56 | Output: | 202 | Output: |
| 57 | - ElasticSearch: | ||
| 58 | - Hosts: | ||
| 59 | - - "172.16.141.4:9200" | ||
| 60 | - - "172.16.141.5:9200" | ||
| 61 | - # {.event} is the value of the json attribute from input | ||
| 62 | - # {{yyyy-MM-dd}} means date, like 2020-09-09 | ||
| 63 | - Index: {.event}-{{yyyy-MM-dd}} | 203 | + Index: "nginx_pro-{{yyyy.MM.d}}" |
| 204 | + Compress: false | ||
| 205 | + MaxChunkBytes: 5242880 | ||
| 206 | + TimeZone: UTC | ||
| 64 | ``` | 207 | ``` |
| 208 | + | ||
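| 209 | +### Average write speed above 150,000/s | ||
| 210 | + | ||
| 211 | + | ||
| 212 | + | ||
| 213 | +### WeChat discussion group | ||
| 214 | + | ||
| 215 | +Before joining the group, please give the project a star; a small star motivates the authors to answer questions. | ||
| 216 | + | ||
| 217 | +If you have a question the documentation does not cover, feel free to ask in the group and we will reply as soon as possible. | ||
| 218 | + | ||
| 219 | +You can suggest improvements in the group; we will weigh them and make changes as soon as possible. | ||
| 220 | + | ||
| 221 | +If you find a bug, please file an issue promptly; we will confirm and fix it as soon as possible. | ||
| 222 | + | ||
| 223 | +Add me on WeChat: kevwan, and mention go-stash; I will add you to the go-stash community group 🤝 |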
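The sizing guidance for Conns, Consumers, and Processors in the README above can be turned into a quick sanity check. The sketch below is only an illustration: `check_sizing` is a hypothetical helper, and reading "Conns * Consumers * 2 or * 3" as an acceptable range from 2x to 3x is an interpretation, not something the README states.

```python
def check_sizing(cpu_cores, partitions, conns, consumers, processors):
    """Return a list of warnings; an empty list means every rule holds."""
    warnings = []
    total_consumers = conns * consumers

    # Rule: Conns is generally <= the number of CPU cores.
    if conns > cpu_cores:
        warnings.append(f"Conns ({conns}) exceeds CPU cores ({cpu_cores})")

    # Rule: Conns * Consumers should not exceed the partition count.
    if total_consumers > partitions:
        warnings.append(
            f"Conns * Consumers ({total_consumers}) exceeds partitions ({partitions})"
        )

    # Rule (interpreted as a range): Processors is 2x to 3x Conns * Consumers.
    low, high = 2 * total_consumers, 3 * total_consumers
    if not low <= processors <= high:
        warnings.append(
            f"Processors ({processors}) is outside the suggested {low}..{high}"
        )
    return warnings


# Values from the README: 4-core stash server, 30 partitions,
# Conns 3, Consumers 10, Processors 60.
print(check_sizing(4, 30, 3, 10, 60))  # []
```

With the README's own numbers every rule holds; raising Consumers to 16 on the same 30-partition topic would trigger both the partition warning and the Processors warning.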