Monday, December 14, 2009

Oracle RAC One Node – part 2

In this post I want to share some scenarios covering RAC One Node and client behaviour. In this part I'm using a 10g SQL*Plus client, so there is no FAN notification; the next part will cover a FAN-enabled client. The main difference between using and not using FAN is that a FAN client gets feedback from the Oracle cluster (called Grid Infrastructure in 11gR2 and Oracle Clusterware in previous releases) and therefore knows the current cluster status. A non-FAN client has to wait for a TCP/IP timeout on its current session before switching to the other node. More information about client failover and cluster configuration can be found here.
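How long a non-FAN client waits can be bounded on the client side in sqlnet.ora. A minimal sketch using standard parameters; the values are only examples, not recommendations:

# client-side sqlnet.ora – cap waits against a dead or unreachable node
# seconds allowed to establish a new connection
SQLNET.OUTBOUND_CONNECT_TIMEOUT = 5
# seconds to wait for data on an already established session
SQLNET.RECV_TIMEOUT = 30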

My server and client configuration was presented in part 1 (including the entry from TNSNAMES.ORA), so here is only a short recap; a sketch of such an entry is repeated after the list:
  • Two nodes – RAC1 and RAC2 with SCAN configured
  • SQL*Plus 10g Client
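
A minimal sketch of the client-side TNSNAMES.ORA entry – the SCAN host name and port below are assumptions, and SERVICEONE matches the ora.testone.serviceone.svc resource shown later in this post:

# hypothetical TNSNAMES.ORA entry – HOST and PORT are assumed values
TESTONE =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac-scan)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = SERVICEONE)
    )
  )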


I want to present three scenarios:

  1. Only one node in the cluster is running
  2. Two nodes are running – the current instance crashes
  3. Fixing the configuration

One node in the cluster is running

Due to a failure or maintenance, only one node in the cluster is running. The RAC One Node database starts on that node and is opened for all sessions. Regardless of which node was the current one before, the instance is now open on the node that is running. The current status can be checked via raconestatus:

[oracle@rac1 ~]$ raconestatus

RAC One Node databases on this cluster:


Database UP Fix Required Current Server                 Candidate Server Names
======== == ============ ============================== ========================================
testone  Y  N            rac1                           rac1 rac2


Available Free Servers:

As we can see, there is no available server to move our instance to. If we start the Omotion process, we will hit an error:


[oracle@rac1 ~]$ Omotion

RAC One Node databases on this cluster:

#   Database Server                         Fix Required
=== ======== ============================== ============
[1] testone  rac1                           N

Enter number of the database to migrate [1]:

Specify maximum time in minutes for migration to complete (max 30) [30]: 2

Available Target Server(s):
#   Server             Available
=== ================== =========
ERROR: Cannot find any available servers in this cluster.

Exiting...

[oracle@rac1 ~]$

If node RAC1 is shut down, all sessions will be terminated, just as in a single-node configuration. Whichever server is started first after both nodes go down becomes the current one.
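
Besides raconestatus, the instance placement can also be checked with srvctl. A minimal sketch; the output is what I would expect in this lab, with the database name testone used throughout this post:

[oracle@rac1 ~]$ srvctl status database -d testone
Instance testone_2 is running on node rac1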

Two nodes are running – the current instance crashes
Both servers are up and the instance is running on the RAC1 server. The current cluster configuration can be checked via the crsctl command:



[oracle@rac1 grid]$ $GRID_HOME/bin/crsctl status resource -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       rac1
               ONLINE  ONLINE       rac2
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac1
               ONLINE  ONLINE       rac2
ora.asm
               ONLINE  ONLINE       rac1                     Started
               ONLINE  ONLINE       rac2                     Started
ora.eons
               ONLINE  ONLINE       rac1
               ONLINE  ONLINE       rac2
ora.gsd
               OFFLINE OFFLINE      rac1
               OFFLINE OFFLINE      rac2
ora.net1.network
               ONLINE  ONLINE       rac1
               ONLINE  ONLINE       rac2
ora.ons
               ONLINE  ONLINE       rac1
               ONLINE  ONLINE       rac2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac1
ora.oc4j
      1        OFFLINE OFFLINE
ora.rac1.vip
      1        ONLINE  ONLINE       rac1
ora.rac2.vip
      1        ONLINE  ONLINE       rac2
ora.scan1.vip
      1        ONLINE  ONLINE       rac1
ora.testone.db
      1        ONLINE  ONLINE       rac1                     Open
ora.testone.serviceone.svc
      1        ONLINE  ONLINE       rac1
[oracle@rac1 grid]$


A session is connected to the RAC1 server:


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

SQL> select * from test;

ID
----------
1
2
3
4
5
6

6 rows selected.

SQL> insert into test values (7);

1 row created.

SQL> commit;

Commit complete.

SQL> insert into test values (8);

1 row created.

SQL> select instance_name, host_name from v$instance;

INSTANCE_NAME    HOST_NAME
---------------- ----------------------------------------------------------------
testone_2        rac1

SQL>

Active query
The first case: I repeat a select command in SQL*Plus and at the same time power off the RAC1 server. The active statement is cancelled and the client then tries to reconnect.


SQL> select to_char(sysdate,'hh24:mi') from dual;

TO_CH
-----
09:55

SQL> select to_char(sysdate,'hh24:mi') from dual;
select to_char(sysdate,'hh24:mi') from dual
*
ERROR at line 1:
ORA-03135: connection lost contact


SQL> select to_char(sysdate,'hh24:mi') from dual;
ERROR:
ORA-03114: not connected to ORACLE

As you can see, the command has been cancelled and the session has been terminated. There is no other instance up and running yet, so there is no target to fail the session over to. In the client session network trace file you can find the following sequence of errors:

  • ORA-12547 – TNS: lost contact
  • ORA-12151 – TNS: received bad packet type from network layer
  • ORA-03135 – connection lost contact
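
Such a client network trace can be enabled in the client's sqlnet.ora. A minimal sketch using standard tracing parameters; the trace directory is an assumption:

# client-side sqlnet.ora – trace every session into its own file
TRACE_LEVEL_CLIENT = 16
TRACE_DIRECTORY_CLIENT = /tmp/client_trace
TRACE_UNIQUE_CLIENT = ON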


After that the client tries to reconnect the session to another node, like in a normal RAC configuration. The client calls the SCAN address to get another VIP address, but the cluster is still in the same state – the reconfiguration has only just started – and the cluster returns rac1-vip once again, as it is the only VIP configured for the database service.

Keep in mind that the current, uncommitted transaction will be rolled back in that case.
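
After the instance finally comes up and the client reconnects, the rollback can be observed directly. A sketch reusing the test table from above – row 8 was inserted but never committed, so it should be gone:

SQL> select * from test where id > 6;

        ID
----------
         7

SQL> -- only the committed row 7 survives; the uncommitted insert was rolled back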


Idle session
When a session is idle, the failover takes place when the first subsequent SQL statement is executed.
See the example:

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

SQL> select to_char(sysdate,'hh24:mi') from dual;

TO_CH
-----
15:30


SQL> select instance_name, host_name from v$instance;

INSTANCE_NAME    HOST_NAME
---------------- ----------------------------------------------------------------
testone_2        rac1


SQL> select to_char(sysdate,'hh24:mi') from dual;

TO_CH
-----
15:38

SQL>
SQL> select instance_name, host_name from v$instance;

INSTANCE_NAME    HOST_NAME
---------------- ----------------------------------------------------------------
testone_1        rac2

SQL>

As you can see, the connection has been migrated between servers without any error. This is because a new instance was already up and running before the last query. If we take a look at the client network trace file, we will see the following errors:
  • ORA-12151 – TNS: received bad packet type from network layer
  • ORA-03113 – end-of-file on communication channel
After that the client reconnects the session to the other node. The client calls the SCAN address to get another VIP address and the cluster returns a new VIP (rac2-vip), as the database service has been successfully migrated to the other node.
Keep in mind that any current, uncommitted transaction has to be rolled back after the session migration.


Fixing configuration

After an unexpected crash it is possible that the RAC One Node configuration needs to be fixed.
I have figured out two possible scenarios:

  • After a node crash and migration of the instance and service, the crashed node has been rebooted and there are two running instances:

    [oracle@rac1 grid]$ raconestatus

    RAC One Node databases on this cluster:


    Database UP Fix Required Current Server                 Candidate Server Names
    ======== == ============ ============================== ========================================
    testone  Y  Y            rac1 rac2                      rac1 rac2


    Available Free Servers:

    [oracle@rac1 grid]$

    Fixing the configuration will close one of the instances (the one on the rebooted node) and clean up the configuration.

    [oracle@rac1 rac1]$ raconefix

    RAC One Node databases on this cluster:

    #   Database Server                         Fix Required
    === ======== ============================== ============
    [1] testone  rac1                           Y

    Enter number of the database to fix [1]:

    If the database is up, it will be checked and cleaned after the previous failover.


  • A very similar situation to the previous point, but only one instance is working and there is a “mess” in the RAC service configuration. The Oracle RAC One Node fix (raconefix) will clean up the configuration; a quick way to verify the result is sketched below.
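
After raconefix completes, the cleaned-up state can be cross-checked with srvctl. A sketch using the names from this post; the exact instance and node shown depend on where the service ended up:

[oracle@rac1 ~]$ srvctl status database -d testone
Instance testone_1 is running on node rac2
[oracle@rac1 ~]$ srvctl status service -d testone -s serviceone
Service serviceone is running on instance(s) testone_1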


At the end I want to mention that I have hit some strange behaviour related to migrating the instance between servers and starting and stopping them. I'm not sure if it is a bug or not, but after a restart of the whole cluster the instances were swapped between the servers.
The first configuration was:

  • RAC1 – instance name : testone_2
  • RAC2 – instance name : testone_1
Yes, I know the naming was my own inconsistency, so Oracle decided to fix it – no, not at the beginning, but after a week of testing and restarts. Now it looks like this:

  • RAC1 – instance name : testone_1
  • RAC2 – instance name : testone_2

Unfortunately, after that I was unable to play with RAC One Node any more, and I had to clear the whole configuration and recreate it from scratch.

The next part will be about FAN-enabled clients and more about transactions.
