Comments on Eran Gampel Blog: Dragonflow Distributed DHCP for OpenStack Neutron done the SDN way<br />Blog author: Eran Gampel (http://www.blogger.com/profile/03346930681002907532)<br />Feed updated: 2023-02-14T23:24:39.195+00:00<br /><br />Comment posted 2015-09-30T08:03:30.571+01:00:<br />Armando,
<br /><br />I agree with you that some parts could have been phrased better, but I wanted to add my point of view on this.<br /><br />Management / HA / Scalability<br /><br />The root problem here, in my eyes, is the large number of entities that are used to implement a specific feature (in our case, all these namespaces with dnsmasq).<br /><br />This brings many problems in terms of deployment and configuration.<br />It makes debugging and management hard for operators and users, there are many possible points of failure, and when management/monitoring software for OpenStack matures it will be hard for it to keep track.<br /><br />Yes, as you mentioned, there are solutions (for example the process monitor), and you can deploy things at various points in your setup, but this again makes the code more complex and harder to maintain, and it introduces many more possible failure scenarios.<br />It also makes it harder for deployers to find this "sweet spot", as you call it.<br /><br />As for HA, I think a problematic aspect of the reference implementation is that HA design is approached on a per-feature/per-area basis instead of from a global point of view.<br />Up until now, these solutions have involved duplicating the entities involved (the dnsmasq namespaces in this case, and the virtual router namespaces in the L3 case).<br /><br />This again introduces all the above problems (management complexity, code complexity), in addition to new synchronisation issues and all the different implementations and dependencies used to solve redundancy for each individual feature.<br /><br />Performance-wise, I agree with you that posting declarations without actual experiments doesn't make much sense, but after spending a lot of time on performance improvement I can honestly say that even with tests it's hard to get a clear picture, because there are so many different cases and 
possible environments that it's hard to reach a conclusive statement.<br />That's why I think it's important to understand the alternatives and the various solutions.<br />I think the solution described by Eran solves many of the problems mentioned above and also decreases broadcast/unicast DHCP traffic and resource consumption (which, when you look at the overall deployment, adds up even if it doesn't look like it when you just consider DHCP).<br /><br />As for distributing dnsmasq namespaces to the compute nodes, this solves the traffic problem but makes all the other problems much worse, and it also introduces many other possible abnormal scenarios when we start syncing everything using RPC (not to mention HA and code complexity).<br /><br />Yes, there are also advantages to the current solution, mainly the fact that it's well tested and already deployed.<br />I think, however, that we all strive to make the Neutron implementation better in an open way, and the approach proposed here, like other projects that follow the same path, tries to do so in an open source way (at least that's my own intention and goal).<br />I agree with you that we must strive to be very accurate in our descriptions, because it's hard for users who don't have enough time to invest in investigating each different implementation (maybe we should make something formal in this area).<br /><br />I am personally very happy to see all the possible alternative implementations for Neutron, and I try to get familiar with as many as my bandwidth allows and also to participate as an active contributor. I think we should continue to critically judge the various designs, as each has its pros and cons, but we should also strive to solve things in a simple way that is easy for our users to manage and maintain.<br /><br /> <br />Just my take on things :)<br 
/>Comment author: Anonymous (https://www.blogger.com/profile/11597491831744678991)<br /><br />Comment posted 2015-09-29T21:25:28.754+01:00:<br />Thanks for the comment, Armando.
<br />The stock DHCP is not a bad solution; it works, and it's probably good enough for many users.<br /><br />But as you said, it is all about choices, and in this post we highlighted why we chose a different path from the stock DHCP implementation. We wanted to show the problems in that implementation that led us and others to seek a distributed DHCP solution.<br /><br />I think that comparison is inevitable when you have multiple choices and alternative implementations, but by no means did I intend to say that one implementation is “bad”.<br /><br />In the end, I agree it is all about the numbers and the scale that you seek. I will try to get the numbers that led my company's public cloud solution to look for a distributed DHCP solution instead of using the DHCP agent, and if possible publish them.<br /><br />I think a namespace + (vNIC) + dnsmasq process at the least per subnet, extra network broadcast traffic for the DHCP messages, and maintenance of the DHCP agents are a significant resource overhead when compared to a local controller handling it without any overhead outside the local VM compute node.<br />Comment author: Eran Gampel (https://www.blogger.com/profile/03346930681002907532)<br /><br />Comment posted 2015-09-29T05:05:10.750+01:00:<br />Neutron's DHCP implementation is far from being perfect, but there are a number of inaccuracies in the issues as stated above, and you are painting a picture far worse than it actually is.<br /><br />a) Management - You need to configure and manage multiple instances of DNSMASQ. One of the known stability issues in Neutron is related to these processes sometime silently & inexplicably dying. 
<br /><br />Nothing dies inexplicably or silently, and you should know that Neutron has built-in process monitors that revive processes should they die.<br /><br />b) HA is achieved by running an additional DHCP server instance per subnet on a different Network Node which adds another layer of complexity. <br /><br />HA is achieved by running multiple DHCP servers per network; yes, you may need more than one network node, but you can distribute the DHCP function across compute nodes too. This is a deployment option you have.<br /><br />c) Scalability - As a centralized solution that depends on the Network Node, it has serious limitations in scale. As the number of tenants/subnets grow, you add more and more running instances of DNSMASQ, all on the same Network Node. If you want to split the DNSMASQs to different Network Nodes, you end-up with significantly worse management complexity. <br /><br />The reference architecture based on network nodes is 'one' deployment architecture; it is not the only one. As I mentioned above, you can hit the sweet spot for your environment if you know what you are doing.<br /><br />Having said that, Neutron community developers have also worked on pushing the architecture boundaries with:<br /><br />https://blueprints.launchpad.net/neutron/+spec/distributed-dhcp<br /><br />d) Performance - Using both a dedicated Namespace and a dedicated DNSMASQ instance per subnet is very resource heavy. <br /><br />Without actual numbers and a definition of what you mean by heavy, this is a very shallow statement: I could search and replace Namespace and DNSMASQ with Dragonflow and I might get away with it just as well. But I typically refrain from making such claims. 
Namespaces are very lightweight and process isolation is very lean: you can run thousands of networks on a single node before it goes belly up, but then again that depends on a multitude of variables.<br /><br />Do not get me wrong: I think that controller-based approaches like Dragonflow are great, and I very much welcome these types of initiatives within the OpenStack ecosystem. Choice is great, and I wish we had had as many of these SDN-based options going around when Neutron started as we see today.<br /><br />You don't have to label a choice as 'bad' to promote another; they are all choices after all, and if you are impartial you'll have better chances of being heard.<br /><br />Armando Migliaccio - aka armax<br />Neutron Project Team Lead<br />Comment author: Anonymous