Fix SDN get and watch resource workflow #241
Tested on a multi-node dev cluster.
So I guess the overall idea is that NewListWatchFromClient() does what watchAndGetResource() was trying to do, except without the bug? The patches seem good, though I want to look through some parts of them again (and it would still be great if other people looked at them too).
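For context on the list-then-watch idea: a Kubernetes ListWatch, as driven by a reflector, lists the current objects, remembers the resourceVersion of that list, and then starts the watch from that version, so a change that lands between the two calls is not lost. Below is a minimal, self-contained sketch of that ordering using a hypothetical in-memory fakeAPI; it is not the real client, not NewListWatchFromClient() itself, and it does not attempt to reproduce watchAndGetResource() or its bug.

```go
package main

import (
	"fmt"
	"sort"
)

// Event is a hypothetical change notification stamped with the resource
// version at which it happened.
type Event struct {
	Type            string // "ADDED", "MODIFIED" or "DELETED"
	Name            string
	ResourceVersion int
}

// fakeAPI is a hypothetical stand-in for the API server: an append-only
// event log where the Nth event has resource version N.
type fakeAPI struct {
	log []Event
}

// List returns the current object names plus the resource version the
// snapshot corresponds to (the version of the last event applied).
func (a *fakeAPI) List() ([]string, int) {
	objs := map[string]bool{}
	for _, e := range a.log {
		if e.Type == "DELETED" {
			delete(objs, e.Name)
		} else {
			objs[e.Name] = true
		}
	}
	names := make([]string, 0, len(objs))
	for n := range objs {
		names = append(names, n)
	}
	sort.Strings(names)
	return names, len(a.log)
}

// Watch returns every event newer than fromVersion.
func (a *fakeAPI) Watch(fromVersion int) []Event {
	return append([]Event(nil), a.log[fromVersion:]...)
}

// listThenWatch lists first, then watches from the list's version, so
// changes made in between (simulated by afterList) still show up.
func listThenWatch(api *fakeAPI, afterList func()) ([]string, []Event) {
	initial, rv := api.List()
	if afterList != nil {
		afterList()
	}
	return initial, api.Watch(rv)
}

func main() {
	api := &fakeAPI{log: []Event{{"ADDED", "svc-a", 1}, {"ADDED", "svc-b", 2}}}
	initial, events := listThenWatch(api, func() {
		api.log = append(api.log, Event{"DELETED", "svc-a", 3})
	})
	fmt.Println(initial)                                    // [svc-a svc-b]
	fmt.Println(len(events), events[0].Type, events[0].Name) // 1 DELETED svc-a
}
```

The key design point is that the watch starts from the version returned by the list, not from "now", which is what closes the gap.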
Looks like a good cleanup to me, with the exception of the 5-second wait for service watching. I wish we could make that more reliable.
@@ -399,18 +303,18 @@ func (registry *Registry) DeleteNetNamespace(name string) error {
 }

 func (registry *Registry) GetServicesForNamespace(namespace string) ([]osdnapi.Service, error) {
-	services, _, err := registry.getServices(namespace)
+	services, err := registry.getServices(namespace)
 	return services, err
With the removal of the startVersion argument, this can just be return registry.getServices(namespace). But you could also move the code from getServices() into this function and have GetServices() call GetServicesForNamespace(kapi.NamespaceAll).
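The suggested refactor can be sketched as follows. This is a minimal, self-contained approximation: Service, serviceLister and fakeClient are hypothetical stand-ins for osdnapi.Service and the real API client, and namespaceAll mirrors kapi.NamespaceAll (the empty string, meaning "all namespaces").

```go
package main

import "fmt"

// namespaceAll mirrors kapi.NamespaceAll: the empty string means "all namespaces".
const namespaceAll = ""

// Service is a hypothetical stand-in for osdnapi.Service.
type Service struct {
	Name      string
	Namespace string
}

// serviceLister is a hypothetical stand-in for the API client.
type serviceLister interface {
	ListServices(namespace string) ([]Service, error)
}

type Registry struct {
	client serviceLister
}

// GetServicesForNamespace now holds the logic that used to live in getServices().
func (r *Registry) GetServicesForNamespace(namespace string) ([]Service, error) {
	svcs, err := r.client.ListServices(namespace)
	if err != nil {
		return nil, fmt.Errorf("listing services in namespace %q: %v", namespace, err)
	}
	return svcs, nil
}

// GetServices becomes a thin wrapper over the namespaced variant.
func (r *Registry) GetServices() ([]Service, error) {
	return r.GetServicesForNamespace(namespaceAll)
}

// fakeClient filters an in-memory list the way a namespaced list call would.
type fakeClient struct{ all []Service }

func (f fakeClient) ListServices(namespace string) ([]Service, error) {
	var out []Service
	for _, s := range f.all {
		if namespace == namespaceAll || s.Namespace == namespace {
			out = append(out, s)
		}
	}
	return out, nil
}

func main() {
	r := &Registry{client: fakeClient{all: []Service{{"web", "ns1"}, {"db", "ns2"}}}}
	all, _ := r.GetServices()
	ns1, _ := r.GetServicesForNamespace("ns1")
	fmt.Println(len(all), len(ns1)) // 2 1
}
```

This leaves a single code path for listing services, so there is no private helper to keep in sync with the exported method.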
Assuming it passes test/extended/networking.sh and fixes bug 1275904, LGTM.
Force-pushed from e9eafe7 to 47df54d.
Force-pushed from 47df54d to 8debfa8.
Rebased onto the master branch. Added some more changes to get rid of the ugly 5-second wait for service watching. Ready for review/merge.
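One common way to replace a fixed sleep is to have the watcher signal when its initial list has been processed and have the consumer block on that signal. The sketch below shows that pattern with a closed channel; serviceCache, run, waitForSync and count are hypothetical names, and this is not necessarily how the patch removes the 5-second wait.

```go
package main

import (
	"fmt"
	"sync"
)

// serviceCache is a hypothetical cache filled by a background watcher.
type serviceCache struct {
	mu       sync.Mutex
	services map[string]string // service name -> namespace
	synced   chan struct{}     // closed once the initial list is loaded
}

func newServiceCache() *serviceCache {
	return &serviceCache{services: map[string]string{}, synced: make(chan struct{})}
}

// run loads the initial list, signals readiness, and would then go on to
// process watch events (omitted here).
func (c *serviceCache) run(initial map[string]string) {
	c.mu.Lock()
	for name, ns := range initial {
		c.services[name] = ns
	}
	c.mu.Unlock()
	close(c.synced) // instead of "sleep 5 seconds and hope the list is loaded"
}

// waitForSync blocks until the initial list has been processed.
func (c *serviceCache) waitForSync() { <-c.synced }

// count returns the number of cached services.
func (c *serviceCache) count() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.services)
}

func main() {
	c := newServiceCache()
	go c.run(map[string]string{"web": "ns1", "db": "ns2"})
	c.waitForSync()
	fmt.Println(c.count()) // 2
}
```

Closing a channel wakes every waiter at once and can never fire early, whereas a sleep is either too long on a fast cluster or too short on a slow one.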
Tested and passed the extended networking tests (test/extended/networking.sh).
Good cleanups... Should we log something in the Watch* functions when the eventQueue fails, as long as the failure isn't "I'm done"? Otherwise transient errors could quietly kill the event queues and we'd be none the wiser from the logs. Also, I assume that's how we know when to terminate: eventQueue.Pop() returns some "done" error?

Next, outside the registry: how do subnets.go::watchSubnets() and watchNodes() know when to break out of the for() loop now that 'oc.stop' is gone?

One behavioral change (though I'm not sure whether it matters) is that on startup the "get all the things" calls no longer block the main goroutine; they now run from a goroutine instead. That might cause races, since I think most of the startup work is currently done synchronously. Not sure, though.
Force-pushed from 8debfa8 to bd7cf1d.
I think this will cause problems with endpoint filtering: the first call to OnEndpointsUpdate() might happen before WatchPods() has completely filled in registry.namespaceOfPodIP, causing warnings and incorrect filtering. It will eventually recover (since it is asked to filter the entire list of endpoints every time any service changes), but it might cause problems at startup.
Other than that, though, LGTM.
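To make the startup race concrete, here is a minimal sketch in which the pod-IP map is filled synchronously from the initial pod list before endpoint filtering runs. Only the name namespaceOfPodIP comes from the discussion; registry, loadInitialPods and filterEndpoints are hypothetical, and the filtering policy (keep addresses whose pod is in the service's namespace, report unknown IPs) is an assumption for illustration, not the actual openshift-sdn rule.

```go
package main

import (
	"fmt"
	"sync"
)

// registry holds a hypothetical pod-IP -> namespace map.
type registry struct {
	mu               sync.RWMutex
	namespaceOfPodIP map[string]string
}

// loadInitialPods fills the map synchronously; a pod watch goroutine would
// be started only after this returns (omitted here).
func (r *registry) loadInitialPods(pods map[string]string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for ip, ns := range pods {
		r.namespaceOfPodIP[ip] = ns
	}
}

// filterEndpoints keeps addresses whose pod lives in the service's namespace
// (assumed policy) and reports IPs it cannot resolve yet, which is what
// produces spurious warnings if it runs before the map is populated.
func (r *registry) filterEndpoints(serviceNS string, ips []string) (kept, unknown []string) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	for _, ip := range ips {
		ns, ok := r.namespaceOfPodIP[ip]
		switch {
		case !ok:
			unknown = append(unknown, ip)
		case ns == serviceNS:
			kept = append(kept, ip)
		}
	}
	return kept, unknown
}

func main() {
	r := &registry{namespaceOfPodIP: map[string]string{}}
	ips := []string{"10.1.0.2", "10.1.1.3"}

	// Filtering before the pod list is loaded: every IP is unknown.
	_, unknown := r.filterEndpoints("ns1", ips)
	fmt.Println(len(unknown)) // 2

	r.loadInitialPods(map[string]string{"10.1.0.2": "ns1", "10.1.1.3": "ns2"})
	kept, unknown := r.filterEndpoints("ns1", ips)
	fmt.Println(kept, len(unknown)) // [10.1.0.2] 0
}
```

Loading the initial pods before registering the endpoints handler removes the window in which every IP is unknown, at the cost of a blocking list call at startup.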
On Wed, Apr 6, 2016 at 1:56 PM, Dan Williams wrote:
A more challenging issue is synchronizing the SDN master and node; I have seen 2 bugs
Updated; fixed endpoint filtering during startup.
LGTM now, I think.
GetNetNamespaces and GetServices. No longer needed. Remove unused methods GetPods, GetNodes and GetNamespaces.
This will ensure subsequent VNID lookups in watchNamespaces(), watchNetNamespaces() and watchServices() succeed.
so handle appropriately when watching resources (services, namespaces, etc.)
Force-pushed from 1868d12 to a7c0b12.
Rebased and resolved merge conflicts.
Trello card: https://trello.com/c/hB3SyOLw
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1275904