[ZBXNEXT-1572] Active / Active High Availability Zabbix Created: 2013 Jan 14 Updated: 2022 May 25 Resolved: 2022 May 25 |
Status: | Closed |
Project: | ZABBIX FEATURE REQUESTS |
Component/s: | Server (S) |
Affects Version/s: | 2.0.4, 3.0.29, 4.0.17, 4.4.5, 5.0.0alpha1 |
Fix Version/s: | None |
Type: | New Feature Request | Priority: | Major |
Reporter: | Simon Tsang | Assignee: | Unassigned |
Resolution: | Workaround proposed | Votes: | 35 |
Labels: | highavailability | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
Linux |
Description |
Currently, Active/Active is not possible with Zabbix without many modifications (according to the Zabbix howto wiki). Is Active/Active HA on the roadmap for Zabbix in the near future? It would be very nice if Zabbix could support that. Are there any plans for Zabbix to support Postgres-XC as well? Thank you.
Comments |
Comment by richlv [ 2013 Jan 14 ] |
To clarify, is this referring to the possibility of active/active Zabbix server clustering?
Comment by David Israel [ 2013 Mar 09 ] |
What if the active agent configuration (https://www.zabbix.com/documentation/2.0/manual/appendix/items/activepassive) could be used to send the same data to two Zabbix servers? From the documentation it is not clear whether the checks would then be run twice, once per active server. If not, this is already a reasonable HA solution.
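For reference, pointing an agent at two servers can be sketched in zabbix_agentd.conf roughly as below (addresses and hostname are placeholders). Note that the agent requests its active-check list from each listed server independently, so each server schedules its own copy of the checks:

```
# zabbix_agentd.conf fragment — example addresses only
ServerActive=192.0.2.10,192.0.2.11   # agent asks each server for active checks
Server=192.0.2.10,192.0.2.11         # allow passive checks from both servers
Hostname=web01.example.com
```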
Comment by Onno Steenbergen [ 2013 May 01 ] |
My view on active/active is having the possibility to run multiple servers against a single database. With one server it behaves as normal; running a second Zabbix server against the same database distributes the load. Agents can report to both servers, or to a roaming IP (which resolves to one server). It should also be possible to run a distributed monitoring setup as Active/Active. The result is similar to the above, only instead of distributing the load randomly, hosts are monitored by a specific node. My current situation:
Currently, if node 2 fails I need to restore the virtual machine from a backup, which results in downtime. Also, if a device in preproduction is promoted, it needs to be removed from node 3 and added to node 1 or 2, losing all its data.

The ideal situation: the database has a list of all hosts/items and a list of preferred nodes for each host. The Zabbix servers monitor each other with keep-alives, and if all preferred nodes are unavailable, another node tries to monitor the hosts. A second list would be needed to assign which nodes may replace another node (A can be replaced by B and C, but not by D and E).

On a side note: some DB clusters, such as PostgreSQL, do not support Active/Active but do support read-only standby nodes. The frontend could use read-only DB nodes to reduce the load on the DB server. Other tweaks to reduce the number of connections to the DB are probably necessary. To summarize:
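The preferred-node idea above can be sketched in a few lines (all names and the function are illustrative, not Zabbix internals):

```python
# Sketch of the preferred-node failover scheme: each host has an ordered
# list of preferred nodes, plus a map saying which nodes may stand in
# for a failed node. Hypothetical names throughout.

def pick_monitoring_node(preferred, replacements, alive):
    """Return the node that should monitor a host, or None."""
    # First choice: the first preferred node that is alive.
    for node in preferred:
        if node in alive:
            return node
    # Fallback: an allowed replacement for any preferred node.
    for node in preferred:
        for substitute in replacements.get(node, []):
            if substitute in alive:
                return substitute
    return None  # no node can take over the host

# A can be replaced by B and C, but not by D or E.
replacements = {"A": ["B", "C"]}
print(pick_monitoring_node(["A"], replacements, {"A", "B"}))  # A
print(pick_monitoring_node(["A"], replacements, {"B", "D"}))  # B
print(pick_monitoring_node(["A"], replacements, {"D", "E"}))  # None
```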
Comment by Murat KoƧ [ 2013 May 01 ] |
Use Galera (http://codership.com/content/using-galera-cluster) with MySQL, or Oracle RAC (if you have enough money). I also suggest putting HAProxy in front of the Galera cluster, both to distribute the load and to present a single database IP to the Zabbix servers. We use Galera in different kinds of production systems and are happy with it. Since you are using virtual machines, you can clone the VM and use it as a failover virtual machine. I think that setup will solve all of your problems.
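A minimal HAProxy front for a Galera cluster might look like this (addresses and server names are placeholders; a production setup would add proper MySQL-aware health checks):

```
# haproxy.cfg fragment — example addresses only
listen galera
    bind 0.0.0.0:3306
    mode tcp
    balance leastconn
    option tcpka
    server db1 192.0.2.21:3306 check
    server db2 192.0.2.22:3306 check
    server db3 192.0.2.23:3306 check backup
```

The Zabbix servers then connect to the HAProxy address as if it were a single database.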
Comment by Onno Steenbergen [ 2013 May 15 ] |
Replacing the DB with a master-master cluster doesn't solve all issues.
I know active-active databases have mediators to resolve split-brain situations, but if you have two data centers and the connection between them is lost, the database is only active in the one where the mediator is. You then have no monitoring in the other data center, as it cannot store any data.
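The split-brain point comes down to quorum: a partition may only stay writable if it can see a strict majority of the voting members, mediator included. A toy check, with hypothetical names:

```python
def has_quorum(visible_members, total_members):
    """A partition keeps the database writable only if it sees a
    strict majority of all voting members (mediator included)."""
    return len(visible_members) > total_members / 2

# Two data centers plus a mediator in DC1: if the inter-DC link drops,
# DC1 sees 2 of 3 votes and stays active; DC2 sees only itself.
print(has_quorum({"dc1_db", "mediator"}, 3))  # True
print(has_quorum({"dc2_db"}, 3))              # False
```

This is exactly the scenario described above: the data center without the mediator loses quorum and stops storing data, so monitoring there goes dark.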
Comment by Onno Steenbergen [ 2013 May 15 ] |
Maybe it is easier to assume that each Zabbix node has its own DB and needs to be able to sync with the other nodes. In case of failure, the remaining Zabbix nodes divide the labour, provided they can reach the failed node's network.
Comment by Sol Arioto [ 2013 Nov 07 ] |
Where are we with this? Not having an Active/Active solution is a major upset, given the downtime involved in upgrades and maintenance.
Comment by jagadeeswar Reddy [ 2017 Feb 06 ] |
A high-availability Zabbix cluster in which every component of the system fails over when issues come up.
Comment by Oleksii Zagorskyi [ 2022 May 25 ] |
In 6.0 we have a clustering solution implemented, so many of the things discussed here are no longer relevant. If needed, new requests should be created. Let's close this one.
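For readers landing here now: the 6.0 solution referred to is the native server HA mode, enabled roughly as follows in each node's zabbix_server.conf (node name and address below are examples; see the Zabbix 6.0 documentation for details):

```
# zabbix_server.conf on node 1 — example values
HANodeName=zabbix-node-1
NodeAddress=192.0.2.31:10051
```

One node runs as active at a time and the others stand by, which addresses failover but is active/standby rather than the active/active load distribution originally requested here.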