正在显示
3 个修改的文件
包含
247 行增加
和
249 行删除
readme-cn.md
已删除
100644 → 0
1 | -[English](readme.md) | 简体中文 | ||
2 | - | ||
3 | -# go-stash简介 | ||
4 | - | ||
5 | -go-stash是一个高效的从Kafka获取,根据配置的规则进行处理,然后发送到ElasticSearch集群的工具。 | ||
6 | - | ||
7 | -go-stash有大概logstash 5倍的吞吐性能,并且部署简单,一个可执行文件即可。 | ||
8 | - | ||
9 | -![go-stash](https://pro-public.xiaoheiban.cn/icon/84cc2f235035d7f1da6df512d4ba97b7.png) | ||
10 | - | ||
11 | - | ||
12 | -### 安装 | ||
13 | - | ||
14 | -```shell | ||
15 | -cd stash && go build stash.go | ||
16 | -``` | ||
17 | - | ||
18 | -### Quick Start | ||
19 | - | ||
20 | -```shell | ||
21 | -./stash -f etc/config.yaml | ||
22 | -``` | ||
23 | - | ||
24 | -config.yaml示例如下: | ||
25 | - | ||
26 | -```yaml | ||
27 | -Clusters: | ||
28 | -- Input: | ||
29 | - Kafka: | ||
30 | - Name: go-stash | ||
31 | - Log: | ||
32 | - Mode: file | ||
33 | - Brokers: | ||
34 | - - "172.16.48.41:9092" | ||
35 | - - "172.16.48.42:9092" | ||
36 | - - "172.16.48.43:9092" | ||
37 | - Topic: ngapplog | ||
38 | - Group: stash | ||
39 | - Conns: 3 | ||
40 | - Consumers: 10 | ||
41 | - Processors: 60 | ||
42 | - MinBytes: 1048576 | ||
43 | - MaxBytes: 10485760 | ||
44 | - Offset: first | ||
45 | - Filters: | ||
46 | - - Action: drop | ||
47 | - Conditions: | ||
48 | - - Key: status | ||
49 | - Value: 503 | ||
50 | - Type: contains | ||
51 | - - Key: type | ||
52 | - Value: "app" | ||
53 | - Type: match | ||
54 | - Op: and | ||
55 | - - Action: remove_field | ||
56 | - Fields: | ||
57 | - - message | ||
58 | - - source | ||
59 | - - beat | ||
60 | - - fields | ||
61 | - - input_type | ||
62 | - - offset | ||
63 | - - "@version" | ||
64 | - - _score | ||
65 | - - _type | ||
66 | - - clientip | ||
67 | - - http_host | ||
68 | - - request_time | ||
69 | - Output: | ||
70 | - ElasticSearch: | ||
71 | - Hosts: | ||
72 | - - "http://172.16.188.73:9200" | ||
73 | - - "http://172.16.188.74:9200" | ||
74 | - - "http://172.16.188.75:9200" | ||
75 | - Index: "go-stash-{{yyyy.MM.dd}}" | ||
76 | - MaxChunkBytes: 5242880 | ||
77 | - GracePeriod: 10s | ||
78 | - Compress: false | ||
79 | - TimeZone: UTC | ||
80 | -``` | ||
81 | - | ||
82 | -## 详细说明 | ||
83 | - | ||
84 | -### input | ||
85 | - | ||
86 | -```shell | ||
87 | -Conns: 3 | ||
88 | -Consumers: 10 | ||
89 | -Processors: 60 | ||
90 | -MinBytes: 1048576 | ||
91 | -MaxBytes: 10485760 | ||
92 | -Offset: first | ||
93 | -``` | ||
94 | -#### Conns | ||
95 | - 链接kafka的链接数,链接数依据cpu的核数,一般<= CPU的核数; | ||
96 | - | ||
97 | -#### Consumers | ||
98 | - 每个连接数打开的线程数,计算规则为Conns * Consumers,不建议超过分片总数,比如topic分片为30,Conns *Consumers <= 30 | ||
99 | - | ||
100 | -#### Processors | ||
101 | - 处理数据的线程数量,依据CPU的核数,可以适当增加,建议配置:Conns * Consumers * 2 或 Conns * Consumers * 3,例如:60 或 90 | ||
102 | - | ||
103 | -#### MinBytes MaxBytes | ||
104 | - 每次从kafka获取数据块的区间大小,默认为1M~10M,网络和IO较好的情况下,可以适当调高 | ||
105 | - | ||
106 | -#### Offset | ||
107 | - 可选last和false,默认为last,表示从头从kafka开始读取数据 | ||
108 | - | ||
109 | - | ||
110 | -### Filters | ||
111 | - | ||
112 | -```shell | ||
113 | -- Action: drop | ||
114 | - Conditions: | ||
115 | - - Key: k8s_container_name | ||
116 | - Value: "-rpc" | ||
117 | - Type: contains | ||
118 | - - Key: level | ||
119 | - Value: info | ||
120 | - Type: match | ||
121 | - Op: and | ||
122 | -- Action: remove_field | ||
123 | - Fields: | ||
124 | - - message | ||
125 | - - _source | ||
126 | - - _type | ||
127 | - - _score | ||
128 | - - _id | ||
129 | - - "@version" | ||
130 | - - topic | ||
131 | - - index | ||
132 | - - beat | ||
133 | - - docker_container | ||
134 | - - offset | ||
135 | - - prospector | ||
136 | - - source | ||
137 | - - stream | ||
138 | -- Action: transfer | ||
139 | - Field: message | ||
140 | - Target: data | ||
141 | - | ||
142 | -``` | ||
143 | -#### - Action: drop | ||
144 | - - 删除标识:满足此条件的数据,在处理时将被移除,不进入es | ||
145 | - - 按照删除条件,指定key字段及Value的值,Type字段可选contains(包含)或match(匹配) | ||
146 | - - 拼接条件Op: and,也可写or | ||
147 | - | ||
148 | -#### - Action: remove_field | ||
149 | - 移除字段标识:需要移除的字段,在下面列出即可 | ||
150 | - | ||
151 | -#### - Action: transfer | ||
152 | - 转移字段标识:例如可以将message字段,重新定义为data字段 | ||
153 | - | ||
154 | - | ||
155 | -### Output | ||
156 | - | ||
157 | -#### Index | ||
158 | - 索引名称,indexname-{{yyyy.MM.dd}}表示年.月.日,也可以用{{yyyy-MM-dd}},格式自己定义 | ||
159 | - | ||
160 | -#### MaxChunkBytes | ||
161 | - 每次往ES提交的bulk大小,默认是5M,可依据ES的io情况,适当的调整 | ||
162 | - | ||
163 | -#### GracePeriod | ||
164 | - 默认为10s,在程序关闭后,在10s内用于处理余下的消费和数据,优雅退出 | ||
165 | - | ||
166 | -#### Compress | ||
167 | - 数据压缩,压缩会减少传输的数据量,但会增加一定的处理性能,可选值true/false,默认为false | ||
168 | - | ||
169 | -#### TimeZone | ||
170 | - 默认值为UTC,世界标准时间 | ||
171 | - | ||
172 | - | ||
173 | - | ||
174 | - | ||
175 | - | ||
176 | -## ES性能写入测试 | ||
177 | - | ||
178 | - | ||
179 | -### 测试环境 | ||
180 | -- stash服务器:3台 4核 8G | ||
181 | -- es服务器: 15台 16核 64G | ||
182 | - | ||
183 | -### 关键配置 | ||
184 | - | ||
185 | -```shell | ||
186 | -- Input: | ||
187 | - Conns: 3 | ||
188 | - Consumers: 10 | ||
189 | - Processors: 60 | ||
190 | - MinBytes: 1048576 | ||
191 | - MaxBytes: 10485760 | ||
192 | - Filters: | ||
193 | - - Action: remove_field | ||
194 | - Fields: | ||
195 | - - message | ||
196 | - - source | ||
197 | - - beat | ||
198 | - - fields | ||
199 | - - input_type | ||
200 | - - offset | ||
201 | - - request_time | ||
202 | - Output: | ||
203 | - Index: "nginx_pro-{{yyyy.MM.d}}" | ||
204 | - Compress: false | ||
205 | - MaxChunkBytes: 5242880 | ||
206 | - TimeZone: UTC | ||
207 | -``` | ||
208 | - | ||
209 | -### 写入速度平均在15W/S以上 | ||
210 | -![go-stash](https://pro-public.xiaoheiban.cn/icon/ee207a1cb094c0b3dcaa91ae75b118b8.png) | ||
211 | - | ||
212 | - | ||
213 | -### 微信交流群 | ||
214 | - | ||
215 | -加群之前有劳给一个star,一个小小的star是作者们回答问题的动力。 | ||
216 | - | ||
217 | -如果文档中未能覆盖的任何疑问,欢迎您在群里提出,我们会尽快答复。 | ||
218 | - | ||
219 | -您可以在群内提出使用中需要改进的地方,我们会考虑合理性并尽快修改。 | ||
220 | - | ||
221 | -如果您发现bug请及时提issue,我们会尽快确认并修改。 | ||
222 | - | ||
223 | -添加我的微信:kevwan,请注明go-stash,我拉进go-stash社区群🤝 | ||
224 | - | ||
225 | -### --END |
readme-en.md
0 → 100644
1 | +English | [简体中文](readme-cn.md) | ||
2 | + | ||
3 | +# go-stash | ||
4 | + | ||
5 | +go-stash is a high performance, free and open source server-side data processing pipeline that ingests data from Kafka, processes it, and then sends it to ElasticSearch. | ||
6 | + | ||
7 | +go-stash is about 4x throughput more than logstash, and easy to deploy, only one executable file. | ||
8 | + | ||
9 | +![go-stash](doc/flow.png) | ||
10 | + | ||
11 | +## Quick Start | ||
12 | + | ||
13 | +```shell | ||
14 | +gostash -f etc/config.yaml | ||
15 | +``` | ||
16 | + | ||
17 | +config.yaml example as below: | ||
18 | + | ||
19 | +```yaml | ||
20 | +Clusters: | ||
21 | +- Input: | ||
22 | + Kafka: | ||
23 | + Name: gostash | ||
24 | + Brokers: | ||
25 | + - "172.16.186.16:19092" | ||
26 | + - "172.16.186.17:19092" | ||
27 | + Topics: | ||
28 | + - k8slog | ||
29 | + Group: pro | ||
30 | + Consumers: 16 | ||
31 | + Filters: | ||
32 | + - Action: drop | ||
33 | + Conditions: | ||
34 | + - Key: k8s_container_name | ||
35 | + Value: "-rpc" | ||
36 | + Type: contains | ||
37 | + - Key: level | ||
38 | + Value: info | ||
39 | + Type: match | ||
40 | + Op: and | ||
41 | + - Action: remove_field | ||
42 | + Fields: | ||
43 | + - message | ||
44 | + - _source | ||
45 | + - _type | ||
46 | + - _score | ||
47 | + - _id | ||
48 | + - "@version" | ||
49 | + - topic | ||
50 | + - index | ||
51 | + - beat | ||
52 | + - docker_container | ||
53 | + - Action: transfer | ||
54 | + Field: message | ||
55 | + Target: data | ||
56 | + Output: | ||
57 | + ElasticSearch: | ||
58 | + Hosts: | ||
59 | + - "172.16.141.4:9200" | ||
60 | + - "172.16.141.5:9200" | ||
61 | + # {.event} is the value of the json attribute from input | ||
62 | + # {{yyyy-MM-dd}} means date, like 2020-09-09 | ||
63 | + Index: {.event}-{{yyyy-MM-dd}} | ||
64 | +``` |
1 | -English | [简体中文](readme-cn.md) | 1 | +[English](readme.md) | 简体中文 |
2 | 2 | ||
3 | -# go-stash | 3 | +# go-stash简介 |
4 | 4 | ||
5 | -go-stash is a high performance, free and open source server-side data processing pipeline that ingests data from Kafka, processes it, and then sends it to ElasticSearch. | 5 | +go-stash是一个高效的从Kafka获取,根据配置的规则进行处理,然后发送到ElasticSearch集群的工具。 |
6 | 6 | ||
7 | -go-stash is about 4x throughput more than logstash, and easy to deploy, only one executable file. | 7 | +go-stash有大概logstash 5倍的吞吐性能,并且部署简单,一个可执行文件即可。 |
8 | 8 | ||
9 | -![go-stash](doc/flow.png) | 9 | +![go-stash](https://pro-public.xiaoheiban.cn/icon/84cc2f235035d7f1da6df512d4ba97b7.png) |
10 | 10 | ||
11 | -## Quick Start | 11 | + |
12 | +### 安装 | ||
13 | + | ||
14 | +```shell | ||
15 | +cd stash && go build stash.go | ||
16 | +``` | ||
17 | + | ||
18 | +### Quick Start | ||
12 | 19 | ||
13 | ```shell | 20 | ```shell |
14 | -gostash -f etc/config.yaml | 21 | +./stash -f etc/config.yaml |
15 | ``` | 22 | ``` |
16 | 23 | ||
17 | -config.yaml example as below: | 24 | +config.yaml示例如下: |
18 | 25 | ||
19 | ```yaml | 26 | ```yaml |
20 | Clusters: | 27 | Clusters: |
21 | - Input: | 28 | - Input: |
22 | Kafka: | 29 | Kafka: |
23 | - Name: gostash | 30 | + Name: go-stash |
31 | + Log: | ||
32 | + Mode: file | ||
24 | Brokers: | 33 | Brokers: |
25 | - - "172.16.186.16:19092" | ||
26 | - - "172.16.186.17:19092" | ||
27 | - Topics: | ||
28 | - - k8slog | ||
29 | - Group: pro | ||
30 | - Consumers: 16 | 34 | + - "172.16.48.41:9092" |
35 | + - "172.16.48.42:9092" | ||
36 | + - "172.16.48.43:9092" | ||
37 | + Topic: ngapplog | ||
38 | + Group: stash | ||
39 | + Conns: 3 | ||
40 | + Consumers: 10 | ||
41 | + Processors: 60 | ||
42 | + MinBytes: 1048576 | ||
43 | + MaxBytes: 10485760 | ||
44 | + Offset: first | ||
31 | Filters: | 45 | Filters: |
32 | - Action: drop | 46 | - Action: drop |
33 | Conditions: | 47 | Conditions: |
48 | + - Key: status | ||
49 | + Value: 503 | ||
50 | + Type: contains | ||
51 | + - Key: type | ||
52 | + Value: "app" | ||
53 | + Type: match | ||
54 | + Op: and | ||
55 | + - Action: remove_field | ||
56 | + Fields: | ||
57 | + - message | ||
58 | + - source | ||
59 | + - beat | ||
60 | + - fields | ||
61 | + - input_type | ||
62 | + - offset | ||
63 | + - "@version" | ||
64 | + - _score | ||
65 | + - _type | ||
66 | + - clientip | ||
67 | + - http_host | ||
68 | + - request_time | ||
69 | + Output: | ||
70 | + ElasticSearch: | ||
71 | + Hosts: | ||
72 | + - "http://172.16.188.73:9200" | ||
73 | + - "http://172.16.188.74:9200" | ||
74 | + - "http://172.16.188.75:9200" | ||
75 | + Index: "go-stash-{{yyyy.MM.dd}}" | ||
76 | + MaxChunkBytes: 5242880 | ||
77 | + GracePeriod: 10s | ||
78 | + Compress: false | ||
79 | + TimeZone: UTC | ||
80 | +``` | ||
81 | + | ||
82 | +## 详细说明 | ||
83 | + | ||
84 | +### input | ||
85 | + | ||
86 | +```shell | ||
87 | +Conns: 3 | ||
88 | +Consumers: 10 | ||
89 | +Processors: 60 | ||
90 | +MinBytes: 1048576 | ||
91 | +MaxBytes: 10485760 | ||
92 | +Offset: first | ||
93 | +``` | ||
94 | +#### Conns | ||
95 | + 链接kafka的链接数,链接数依据cpu的核数,一般<= CPU的核数; | ||
96 | + | ||
97 | +#### Consumers | ||
98 | + 每个连接数打开的线程数,计算规则为Conns * Consumers,不建议超过分片总数,比如topic分片为30,Conns *Consumers <= 30 | ||
99 | + | ||
100 | +#### Processors | ||
101 | + 处理数据的线程数量,依据CPU的核数,可以适当增加,建议配置:Conns * Consumers * 2 或 Conns * Consumers * 3,例如:60 或 90 | ||
102 | + | ||
103 | +#### MinBytes MaxBytes | ||
104 | + 每次从kafka获取数据块的区间大小,默认为1M~10M,网络和IO较好的情况下,可以适当调高 | ||
105 | + | ||
106 | +#### Offset | ||
107 | + 可选last和false,默认为last,表示从头从kafka开始读取数据 | ||
108 | + | ||
109 | + | ||
110 | +### Filters | ||
111 | + | ||
112 | +```shell | ||
113 | +- Action: drop | ||
114 | + Conditions: | ||
34 | - Key: k8s_container_name | 115 | - Key: k8s_container_name |
35 | Value: "-rpc" | 116 | Value: "-rpc" |
36 | Type: contains | 117 | Type: contains |
@@ -38,7 +119,7 @@ Clusters: | @@ -38,7 +119,7 @@ Clusters: | ||
38 | Value: info | 119 | Value: info |
39 | Type: match | 120 | Type: match |
40 | Op: and | 121 | Op: and |
41 | - - Action: remove_field | 122 | +- Action: remove_field |
42 | Fields: | 123 | Fields: |
43 | - message | 124 | - message |
44 | - _source | 125 | - _source |
@@ -50,15 +131,93 @@ Clusters: | @@ -50,15 +131,93 @@ Clusters: | ||
50 | - index | 131 | - index |
51 | - beat | 132 | - beat |
52 | - docker_container | 133 | - docker_container |
53 | - - Action: transfer | 134 | + - offset |
135 | + - prospector | ||
136 | + - source | ||
137 | + - stream | ||
138 | +- Action: transfer | ||
54 | Field: message | 139 | Field: message |
55 | Target: data | 140 | Target: data |
141 | + | ||
142 | +``` | ||
143 | +#### - Action: drop | ||
144 | + - 删除标识:满足此条件的数据,在处理时将被移除,不进入es | ||
145 | + - 按照删除条件,指定key字段及Value的值,Type字段可选contains(包含)或match(匹配) | ||
146 | + - 拼接条件Op: and,也可写or | ||
147 | + | ||
148 | +#### - Action: remove_field | ||
149 | + 移除字段标识:需要移除的字段,在下面列出即可 | ||
150 | + | ||
151 | +#### - Action: transfer | ||
152 | + 转移字段标识:例如可以将message字段,重新定义为data字段 | ||
153 | + | ||
154 | + | ||
155 | +### Output | ||
156 | + | ||
157 | +#### Index | ||
158 | + 索引名称,indexname-{{yyyy.MM.dd}}表示年.月.日,也可以用{{yyyy-MM-dd}},格式自己定义 | ||
159 | + | ||
160 | +#### MaxChunkBytes | ||
161 | + 每次往ES提交的bulk大小,默认是5M,可依据ES的io情况,适当的调整 | ||
162 | + | ||
163 | +#### GracePeriod | ||
164 | + 默认为10s,在程序关闭后,在10s内用于处理余下的消费和数据,优雅退出 | ||
165 | + | ||
166 | +#### Compress | ||
167 | + 数据压缩,压缩会减少传输的数据量,但会增加一定的处理性能,可选值true/false,默认为false | ||
168 | + | ||
169 | +#### TimeZone | ||
170 | + 默认值为UTC,世界标准时间 | ||
171 | + | ||
172 | + | ||
173 | + | ||
174 | + | ||
175 | + | ||
176 | +## ES性能写入测试 | ||
177 | + | ||
178 | + | ||
179 | +### 测试环境 | ||
180 | +- stash服务器:3台 4核 8G | ||
181 | +- es服务器: 15台 16核 64G | ||
182 | + | ||
183 | +### 关键配置 | ||
184 | + | ||
185 | +```shell | ||
186 | +- Input: | ||
187 | + Conns: 3 | ||
188 | + Consumers: 10 | ||
189 | + Processors: 60 | ||
190 | + MinBytes: 1048576 | ||
191 | + MaxBytes: 10485760 | ||
192 | + Filters: | ||
193 | + - Action: remove_field | ||
194 | + Fields: | ||
195 | + - message | ||
196 | + - source | ||
197 | + - beat | ||
198 | + - fields | ||
199 | + - input_type | ||
200 | + - offset | ||
201 | + - request_time | ||
56 | Output: | 202 | Output: |
57 | - ElasticSearch: | ||
58 | - Hosts: | ||
59 | - - "172.16.141.4:9200" | ||
60 | - - "172.16.141.5:9200" | ||
61 | - # {.event} is the value of the json attribute from input | ||
62 | - # {{yyyy-MM-dd}} means date, like 2020-09-09 | ||
63 | - Index: {.event}-{{yyyy-MM-dd}} | 203 | + Index: "nginx_pro-{{yyyy.MM.d}}" |
204 | + Compress: false | ||
205 | + MaxChunkBytes: 5242880 | ||
206 | + TimeZone: UTC | ||
64 | ``` | 207 | ``` |
208 | + | ||
209 | +### 写入速度平均在15W/S以上 | ||
210 | +![go-stash](https://pro-public.xiaoheiban.cn/icon/ee207a1cb094c0b3dcaa91ae75b118b8.png) | ||
211 | + | ||
212 | + | ||
213 | +### 微信交流群 | ||
214 | + | ||
215 | +加群之前有劳给一个star,一个小小的star是作者们回答问题的动力。 | ||
216 | + | ||
217 | +如果文档中未能覆盖的任何疑问,欢迎您在群里提出,我们会尽快答复。 | ||
218 | + | ||
219 | +您可以在群内提出使用中需要改进的地方,我们会考虑合理性并尽快修改。 | ||
220 | + | ||
221 | +如果您发现bug请及时提issue,我们会尽快确认并修改。 | ||
222 | + | ||
223 | +添加我的微信:kevwan,请注明go-stash,我拉进go-stash社区群🤝 |
-
请 注册 或 登录 后发表评论