Eucalyptus - cannot find nodes

We recently bought some nice machines to set up as Node Controllers (NCs) for our cloud master machine (which runs the Walrus Controller, Cloud Controller, Storage Controller, and Cluster Controller).

Our current setup is:

SERVER1 - Cloud master, an i3 with 12GB of memory
SERVER2 and SERVER3 - Node Controllers, each an i7 3.0 with 16GB of memory

With this setup, the cloud should report 62 available instances. However, because of a problem between the cloud master and the NCs (see http://open.eucalyptus.com/wiki/EucalyptusKnownBugs_v1.5.2), issuing the command


#> euca-describe-availability-zones verbose


AVAILABILITYZONE kinten-cloud 122.2.1.4
AVAILABILITYZONE |- vm types free / max   cpu   ram  disk
AVAILABILITYZONE |- m1.small 0031 / 0031   1    512    10
AVAILABILITYZONE |- c1.medium 0015 / 0015   1   1024    15
AVAILABILITYZONE |- m1.large 0007 / 0007   2   2048    20
AVAILABILITYZONE |- m1.xlarge 0003 / 0003   4   4096    30
AVAILABILITYZONE |- c1.xlarge 0001 / 0001   8   8192    40

gives a result that is not correct. It only sees one NC (node controller), so it reports 31 m1.small slots instead of 62. To fix this, I looked at axis2c.log under the /var/log/eucalyptus directory and saw the errors below:


[Tue Jun 14 06:04:40 2011] [error] rampart_timestamp_token.c(179) [rampart]Timestamp not valid: Created time is not valid
[Tue Jun 14 06:04:40 2011] [error] error.c(94) OXS ERROR [euca_axis.c:364 in verify_node] element failed , Validation failed for Timestamp with ID = #SigID-232793f0-9609-1e01-3638
[Tue Jun 14 06:04:40 2011] [error] euca_axis.c(322) [rampart][eucalyptus-verify] "Failed to verify location of signed elements!"
[Tue Jun 14 06:04:40 2011] [error] rampart_engine.c(159) [rampart][rampart_engine] Cannot get saved rampart_context
[Tue Jun 14 06:04:40 2011] [error] rampart_out_handler.c(136) [rampart][rampart_out_handler] ramaprt_context creation failed.
[Tue Jun 14 06:04:40 2011] [error] phase.c(233) Handler RampartOutHandler invoke failed within phase MessageOut
[Tue Jun 14 06:04:40 2011] [error] engine.c(696) Invoking phase MessageOut failed
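
If the log is long, a quick count of these timestamp failures can tell you whether you are hitting the same thing. Here is a minimal sketch; the function name count_timestamp_errors is my own, and the default path is simply the log location mentioned above.

```shell
# Count rampart timestamp-validation failures in an axis2c log.
# The function name is made up for this sketch. The default path is the
# log location used in this post; pass another file to override it.
count_timestamp_errors() {
    log="${1:-/var/log/eucalyptus/axis2c.log}"
    # grep -c prints the number of matching lines (0 if there are none)
    grep -c 'Timestamp not valid' "$log"
}
```

Calling count_timestamp_errors with no argument reads the default log. A count that keeps growing while the NC is registered suggests the clocks are still out of step.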


This had to be an NTP problem: the clocks on the cloud master and the NC were not in sync at all. To fix this, I issued

#> ntpdate 192.168.1.1

where 192.168.1.1 is the IP of the cloud master inside the private network shared by server1, server2 and server3. The IP 122.2.1.4 shown in the output above is on the LAN.
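
To get a rough idea of how far apart two clocks are before changing anything, you can compare their epoch times. This is only a sketch: clock_skew is a helper name I made up, and the two timestamps in the comment are invented example values, not readings from my servers.

```shell
# Print the absolute difference, in seconds, between two epoch timestamps.
# clock_skew is a hypothetical helper written for this sketch.
clock_skew() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( -diff ))
    fi
    echo "$diff"
}

# Example with invented values: run `date +%s` on the cloud master and on
# an NC at about the same moment, then compare the two numbers:
#   clock_skew 1308031480 1308031795
```

Running ntpdate with the -q option (ntpdate -q 192.168.1.1) should also report the offset without setting the clock, which is a gentler first check.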

Before running ntpdate, make sure that your ntp daemon is not running. Alternatively, you can restart the daemon:

#> /etc/init.d/ntp restart
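
The stop, sync, start order can be wrapped in a small function. This is a sketch under my own assumptions: run, resync_clock and the DRY_RUN variable are names invented here, and the init script path is the one used above. With DRY_RUN=1 it only prints the commands, so you can check them before running as root.

```shell
# Run a command, or only print it when DRY_RUN=1.
# DRY_RUN and run are helper names made up for this sketch.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop the ntp daemon, set the clock from the given server, then start
# the daemon again. Each step runs only if the previous one succeeded.
resync_clock() {
    server="$1"
    run /etc/init.d/ntp stop &&
    run ntpdate "$server" &&
    run /etc/init.d/ntp start
}
```

For example, setting DRY_RUN=1 and calling resync_clock 192.168.1.1 prints the three commands in order without touching the system.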

Be sure you have properly edited /etc/ntp.conf and added your server there. Mine is:

server 192.168.1.1
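
A quick way to confirm that the server line is really active (and not commented out) is a small check like the one below. It is a sketch only; has_ntp_server is a name I made up, and it simply looks for a line whose first field is server and whose second field is the given IP.

```shell
# Succeed if the given ntp.conf-style file has an active "server <ip>" line.
# has_ntp_server is a hypothetical helper written for this sketch.
has_ntp_server() {
    conf="$1"
    ip="$2"
    # Commented-out lines start with "#", so their first field is never
    # exactly "server" and they are skipped.
    awk -v ip="$ip" '$1 == "server" && $2 == ip { found = 1 } END { exit !found }' "$conf"
}
```

For example, has_ntp_server /etc/ntp.conf 192.168.1.1 returns success when the line is present. Once the daemon is running again, ntpq -p should list the peers it is actually talking to.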

After all that, both nodes were found:

#> euca-describe-availability-zones verbose
AVAILABILITYZONE kinten-cloud 122.2.1.4
AVAILABILITYZONE |- vm types free / max   cpu   ram  disk
AVAILABILITYZONE |- m1.small 0062 / 0062   1    512    10
AVAILABILITYZONE |- c1.medium 0030 / 0030   1   1024    15
AVAILABILITYZONE |- m1.large 0014 / 0014   2   2048    20
AVAILABILITYZONE |- m1.xlarge 0006 / 0006   4   4096    30
AVAILABILITYZONE |- c1.xlarge 0002 / 0002   8   8192    40



I hope this fixes the problem on your end.


Comments

Anonymous said…
I have the same problem with Eucalyptus 2.0.0 on Ubuntu 10.10 (fully updated). Your fix works but only sporadically.
Toytoy said…
I agree with you. I actually manage this by creating a cron job to keep things working, and it has been stable as far as I can tell. I'm going to update this post regarding that. Thank you for posting comments and visiting my blog.
