
Module Summary.txt




====================== Module 2 Engine ======================

3 types of MATRIXX installation:
- Bare Metal - most used for production
- Virtualization
- Containers

Maximum Blades per Cluster:
- Processing - 3 (all are active at the same time)
- Publishing - 2 (1 active, 1 standby)
- Checkpointing - 1

Processing Cluster
-> Messages for the Engine come in via business applications (North side) and the operator network (South side).
-> The Processing Cluster receives the incoming messages and processes them using round-robin.
-> Transaction Records are then used by all Processing Blades to update their IMDBs at the same time.

Processing Blade
-> Processes any incoming traffic using round-robin and creates Transaction Records that all Processing Blades use to update their IMDBs.
-> Triggers subscriber notifications as per configuration; these are sent to the Notification Blade, where they are formatted to be sent out via the configured delivery channels.

Publishing Cluster
-> Bundles the Transaction Records received from the Processing Cluster and writes them as Transaction Log files to the SAN.
-> Is also in charge of creating Event Files, which contain only billing-related information and can be stored on a SAN or streamed to an external system.
-> 2 Publishing Blades are deployed but only 1 is active at any one time, and only the active Publishing Blade can write information to the SAN. All other Engine Blades have read-only access to it.
-> The Publishing Cluster has 2 optional features:
1. Event Loading - loading the Event Detail Records from the MEFs into the Event Repository (ER) for long-term storage. The ER is based on MongoDB.
2. Event Publishing - publishing MEFs to another location outside of the MDCP for external consumption, e.g. for billing or accounting applications, for analytics, CRM or a data warehouse.

Publishing Blade
-> Uses the Transaction Records to update its IMDB before the active Publishing Blade bundles them and writes them as Transaction Log files onto the SAN.
-> Can also generate Event Streams for streaming real-time data to external systems using the Streaming Framework.
-> Generates MATRIXX Event Files that are loaded into the Event Repository outside of the Engine for either temporary or long-term storage.
-> Only the active Publishing Blade can write to the SAN (all other Engine Blades have read-only access).

Checkpointing Cluster / Checkpointing Blade
-> A single Blade.
-> Updates its IMDB by reading the Transaction Log files from the SAN.
-> Creates Checkpoints on a regular basis (every few hours). Checkpoints are file dumps of the IMDB at a particular moment in time.
-> Checkpoints are saved temporarily on the Checkpointing Blade and rsynced over to the Publishing Cluster; the active Publishing Blade then writes them to the SAN.
-> Checkpoints are used to restore the IMDB when the active Engine is started. Because the IMDB is held in memory only, the Engine rebuilds its content at startup from the most recent Checkpoint, and any Transaction Log files created since then and written to the SAN are replayed to restore the data to the state the IMDB was in prior to shutdown.
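Worked example of the restore logic above: if a Checkpoint was created at 12:00 and Transaction Log files were written to the SAN at 12:10 and 12:20, an Engine starting at 12:30 loads the 12:00 Checkpoint first and then replays the 12:10 and 12:20 Transaction Log files, bringing the IMDB back to its last committed state.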
-> The Checkpointing Blade does not require special MATRIXX scripts; you can use the same Cluster tools that are used on the other Clusters, ex. print_blade_stats.py -C to check the Cluster status.
-> Configuration for the creation and management of Checkpoints is performed using the Engine's create_config.info. Here you will find several questions, ex.:
- How many Checkpoints do you want to keep on the SAN?
- How frequently should Checkpoints be created?
-> Checkpoints are commonly created automatically every 4-6 hours.
-> Checkpoints need to be included in your backup considerations, as only the most recent Checkpoints are stored on the SAN; they must be backed up before the "sliding window" removes them.
-> Checkpoints are presented as directory structures, and the directory name is made up of the MATRIXX version number and a UNIX timestamp. Inside the Checkpoint directory is a number of zipped files for the entities within the IMDB.

print_data_container_file.py
-> As data is stored in MDC format, we can use this script to convert the MDC format into human-readable form.
-> It is advised to pipe the output into a text file.
-> Use this tool on most files that are available in MDC format, like Transaction Log files, MEFs and Checkpoints; it can also be used to view the contents of a SEF.

create_checkpoint.py
-> Script to manually create a Checkpoint.
-> You can manually create Checkpoints only from the Checkpointing Blade.

Validating Checkpoints
-> Checkpoints are vital for restoring the IMDB and need to be validated regularly:
- validate nightly or at a preferred off-peak time,
- validate before any MATRIXX upgrade,
- validate after upgrading the MATRIXX software.
-> The duration of the validation depends on the size of your IMDB and may take a long time.
-> validateCheckpoint.jar - Java program in the bin directory (java -jar /opt/mtx/bin/validateCheckpoint.jar) that verifies the Checkpoint and creates a log file in the directory you provide.
* If errors occur, raise a support case / contact support to fix the issues.
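Illustrative checkpoint maintenance session (the bracketed directory and file names are placeholders, and the exact argument forms are assumptions, not taken from the source):
ex. create_checkpoint.py (run on the Checkpointing Blade)
ex. java -jar /opt/mtx/bin/validateCheckpoint.jar <checkpoint_dir> <log_dir>
ex. print_data_container_file.py <checkpoint_dir>/<entity_file> > checkpoint_dump.txt (pipe the output to a text file as advised above)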
Engine Chain
-> 2 or 3 Engines are deployed in production to prevent a single point of failure. Only 1 of these Engines is active.
-> The active Engine's Publishing Cluster forwards the Transaction Records to the standby Engine's Processing Cluster, where they are replayed to update the IMDB.
-> They are then forwarded to the standby Engine's Publishing Cluster to update its IMDB before being bundled and written to the local SAN.
-> The standby Engine's Checkpointing Blade reads the Transaction Log files from the local SAN to update its IMDB and creates Checkpoints to be written to the local SAN.

3rd Engine
-> The standby Engine's Publishing Cluster forwards Transaction Records to this Engine.
-> Also called the 2nd standby Engine or Upgrade Engine, as it can be used to perform upgrades while running a 2-Engine chain (the other 2 are active and standby).

- Engine Processes -
ps -ef | grep mtx -> to see the list of running processes related to MATRIXX.

Production Engine
-> You will typically work with one of two MATRIXX user accounts: mtx or tra.
-> tra is reserved for the Traffic Routing Agent and Network Enabler; everything else runs in the mtx user context.
-> If at least 1 Cluster is not running, the Engine will not start.

mtx_process_ctrl -f
mtx_sync_dpages -f
-> MTX Process Controller - the master or parent process that controls all other processes.

Processing Blade processes:
snmp_agent, transaction_server, charging_server, mdc_gateway, diameter_gateway, camel_gateway, task_manager, cluster_manager

snmp_agent -> used for monitoring purposes.

Messages
-> Messages to be processed by the Charging Server (charging_server) need to be converted to MDC (MATRIXX Data Container) format.
-> Some messages are converted outside of the Engine, like BSS messages via the RS Gateway; those are received by the MDC Gateway (mdc_gateway) and require no further conversion.
-> Other messages are converted by the Processing Blades:
- Diameter messages via the Diameter Gateway (diameter_gateway)
- SS7/SIGTRAN messages via the CAMEL Gateway (camel_gateway)

Charging Server (charging_server) -> processes the messages by going through different stages: policy processing, rating and charging.
Task Manager (task_manager) -> handles the subscriber-management-related tasks.
Cluster Manager (cluster_manager) -> enables intra-Cluster communication. Each Cluster has a leader, which is the Blade with the lowest ID.
Transaction Server (transaction_server) -> coordinates and commits transactions to the IMDB and forwards them to the Publishing Cluster.

Publishing Blade processes:
snmp_agent, transaction_server, charging_server, mdc_gateway, task_manager, cluster_manager, event_loader, event_stream_server
-> The gateway processes that perform conversion are no longer running, as messages are already in the required MDC format by the time they get to the Publishing Cluster.
Event Loader -> loads Events / Event Detail Records from Event Files into the Event Repository.
Event Stream Server -> streams events to external systems.

Checkpointing Blade processes:
snmp_agent, transaction_server, charging_server, mdc_gateway, task_manager, cluster_manager
-> The Checkpointing Blade has fewer processes, as its main job is to keep its IMDB updated and then create periodic Checkpoints; this is all handled by the processes covered earlier.

- MATRIXX Engine -
-> The MATRIXX Engine is made up of 3 Clusters.
-> Engine Clusters and Blades have fixed IDs assigned to them: Processing is Cluster 1, Publishing is Cluster 2, Checkpointing is Cluster 3.
-> You can control any Engine Blade from any other Engine Blade.
-> Passwordless logins are configured for the MATRIXX user accounts from within the Engine, and you refer to each Blade using the E:C:B reference.
-> E:C:B stands for Engine : Cluster : Blade.

3 networks/planes to which an Engine Blade is connected:
1. Management Plane (.4 subnet)
-> Carries signaling and control messages for management purposes.
-> SNMP traffic.
-> Used to connect remotely to any Blade using SSH.
2. Data Plane (.6 subnet)
-> Carries operator network traffic and business application traffic to and from the Engine.
-> Transaction replay traffic between Clusters and Engines.
3. Transaction Plane (.5 subnet)
-> High-speed network used for transaction communication between Blades in the Processing and Publishing Clusters.

To access any Engine Blade remotely -> connect as the mtx user via the IP address of the Management Network or the DNS name of the Blade.

- Basic Control Scripts -
Management commands can be executed from one location.
Simple command -> _.py start | stop | check | restart engine | cluster | blade; additional info can be specified.
-> If no location is specified, the script will use the target that you are executing it on.
-> To specify a remote location, use the E:C:B reference as an argument. 2 formats:
1.) -e id -c id -b id
2.) E:C:B notation, with IDs separated by colons, ex. 1:1:2 -> this is more common.
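For illustration, assuming the naming pattern above (stop_blade.py is a hypothetical name following that pattern; only engine-level scripts such as check_engine.py and restart_engine.py are confirmed elsewhere in these notes), the two location formats would look like:
ex. stop_blade.py -e 1 -c 1 -b 2
ex. stop_blade.py 1:1:2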
Complex command -> run_cmd_on_.py " "
-> Convenient if you need to execute a script on each Engine Blade.
-> Can be executed locally rather than logging on to each Blade.
-> Note that the script name to execute has to be enclosed in double quotes "".

Common arguments:
--help -h
--debug -d
--skip_local -s
--ssh_debug
--as_user -u

Domain / Sub-Domain
-> A set of subscribers which co-exist within a homogeneous pricing configuration.
-> You might see 4 digits as part of the E:C:B reference; the leading digit refers to the Domain/Sub-Domain.
-> You can have 15 million users/subscribers per Sub-Domain.

- Useful Engine Scripts -
check_engine_start_prereqs.py
-> Can be executed on any Engine Blade.
-> Should be run when the Engine is stopped.
-> Run it before you wish to start the Engine for the very first time, and after a configuration change or software upgrade.
-> Checks that local and shared directories are configured so that Transaction Records can be saved locally and data read from the SAN.

check_engine.py (relies on the local SNMP Agent on each Blade to be up and running)
-> Verifies that the Engine is operational.
-> Does not show if all Engine Blades are running.
-> The Cluster Leader is the Blade with the lowest ID and reports whether the Cluster is ready.
-> If all Clusters are ready, the Engine is shown as started.

print_blade_stats.py -C (relies on the local SNMP Agent on each Blade to be up and running)
-> To find out the details of all Blades of a Cluster.
-> Shows the details of the Cluster you are running it on.
-> Confirms the current Blade reference (E:C:B) and the software version.

print_engine_stats.py (relies on the local SNMP Agent on each Blade to be up and running)
-> Shows various statistics for the Engine.
-> Latency and number of errors per Blade.
-> Calculates the number of transactions per second for each Processing Blade.

* print is the prefix usually used by scripts that view/show details.

- Directories -
/opt/mtx -> where MATRIXX is installed.
/var/log/mtx -> Transaction Logs and error logs, stored on the Blade's local SSD.
/var/mtx -> core files are written here if a Blade stops unexpectedly.
/var/run/mtx -> process ID files for Engine processes currently running on the Blade.
/shared -> contains the shared storage directories of the SAN.

- Engine Configuration Simplified -
mtx_config.xml
-> Each Engine Blade reads this local file before starting up the Engine.
-> Located in /opt/mtx/conf.
-> Determines the role of each Engine Blade and the functionality that each Blade is to provide.
-> Gets created or updated using create_config.py.

create_config.info
mtx - /opt/mtx/custom
-> Modified to change the behavior of the Engine Blades.
-> Contains the list of questions asked by the create_config.py script and the answers provided.
-> Where the configuration begins.
-> Configured for the mtx user, as all core Engine functionality runs under that user context.
-> There is a single create_config.info file for all the Engine Blades that encompasses the whole Engine configuration.

tra - /opt/tra/custom
-> Separate create_config.info file that contains a subset of the Engine's configuration, holding only the details that are relevant for the tra user account.

mtx_config_base_xml_sed.extra
-> Used to make any configuration changes that are not covered by create_config.info.
-> Unlike create_config.info, it does not use questions and answers but uses sed stream editor syntax to make advanced config changes, and it does not have consistency checks.
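A minimal illustrative mtx_config_base_xml_sed.extra entry; the XML element shown is hypothetical and only demonstrates the sed substitution syntax:
s|<heartbeat_interval>5</heartbeat_interval>|<heartbeat_interval>10</heartbeat_interval>|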
create_config.py
-> Initial information is provided by /opt/mtx/conf/mtx_config_base.xml, which includes all the capabilities that the Engine can provide.
-> The parameters of this input are modified by the contents of create_config.info and mtx_config_base_xml_sed.extra, resulting in the mtx_config.xml output and the modified create_config.info file, which is written back as required.

=============== Module 3 Traffic Routing Agent and Network Enabler ======================

= Traffic Routing Agent =
-> The TRA/NE has its own SNMP Agent, which receives traffic on a dedicated port.
-> The TRA always runs in pairs in a production environment.
-> "Nodes" means TRA load balancers.
-> The TRA Load Balancers need to be up and running before you can start the Engine.
-> traffic_manager is the name of the TRA process.

5 functions of the TRA:
Engine level (within the Engine)
1. TRA-LB - Load Balancer, Processing Cluster
2. TRA-LB-PUB - Load Balancer, Publishing Cluster
Site level (outside the Engine)
3. TRA-SI - Site Independence
4. TRA-DR - Disaster Recovery
5. TRA-RT - Sub-Domain Routing

TRA-LB
-> Receives traffic from Site-level TRAs.
-> Runs in pairs, one on each Processing Blade. If you have 3 Processing Blades, the TRA-LB will be installed on the first 2 in the Cluster.
-> They share a virtual IP address (VIP); only one of them actively routes incoming messages, while the other acts as standby to prevent a single point of failure.
-> Diameter traffic and MDC Gateway traffic are routed using a round-robin approach across all Processing Blades.
-> SNMP Agent traffic will always be sent to the Cluster Leader, which is the Blade with the lowest ID in the Cluster.

TRA-LB-PUB
-> Receives traffic from the Processing Cluster.
-> If streaming is enabled, also receives traffic from the RS Gateway via the Site-level TRA.
-> There is only 1 active Publishing Blade, so all traffic is routed there.

Virtual Servers
-> Receive traffic from outside the Cluster and forward it to the correct process and port on the Cluster Blades.
-> 5 virtual servers are available on the TRA-LB.
-> Labelled as "VS" followed by the ID.
-> Sample purpose: "internal MDC" for replay traffic.
-> Sample type: "vip" - the virtual IP address that receives the traffic.

- TRA-LB Configuration Files -
/opt/mtx/custom/create_config.info
-> Same file for all Engine Blades.
/opt/tra/custom/create_config.info
-> Same file on the 2 Processing Blades that run the TRA-LB.
-> Both TRA-LBs need to be configured with the same copy of this file.
/opt/tra/conf/tra_config.xml
-> Contains the routing configuration.
-> Both TRA-LBs need to be configured with the same copy of this file.
-> The default file is stored in /opt/tra/conf and contains default parameters out of the box.
-> 2 main sections:
1. Parameters section - contains a reference to the other tra_config file, tra_config_network_topology.xml, and generic parameters related to overall TRA operation.
2. TM-Cluster section - contains the configuration for high-availability parameters, including the IP addresses of the TRA-LB nodes. TM stands for "Traffic Manager", the previous name of the Traffic Routing Agent.
/opt/tra/conf/tra_config_network_topology.xml
-> Contains the routing configuration.
-> Both TRA-LBs need to be configured with the same copy of this file.
-> Used as a starting point.
-> 3 main sections:
1. Pools
-> The pool of available Processing Blades that can receive incoming traffic.
-> Identified by a name; "Monitor" determines how the Cluster Management Interface (CMI) monitors the pool.
-> "balance method" defines how traffic is load balanced or routed to individual Pool noes. -> ex. "cmi-node-active-cluster-active" - selects any Processing Blade from the active Pool nodes. 2. VIP address -> requires to configure the 2 VIP used : Management and Data Network plane. 3. Virtual Servers -> List all availabe servies and their details -> Best practice is to seperate traffic into 3 network planes. ->99% of the details here are automatically populated as a result of running the create_tra_config.py script. /opt/tra/conf/process_control.cfg -> It contains all the list of all TRA processes that can run on any of the Blades that provide TRA functionality. -> It contains further TRA and NE configuration parameters and comments inside the file to describe each section. Engine-leve TRAs -snmp -traffic ->any services not required should be commented out using the hash(#) chracater. Site-leve TRAs -snmp -traffic -ne -rcc/rcp -> requires more services to be running. create_tra_config.py -> script to read the Engine's mtx_config.xml and extract the routing details for TRA Load Balancers. -> Should be run as mtx user as the mtx user has access to the Engine configuration files. -> creates its output by default /opt/mtx/conf/tra_config_output. sub-directory gets under that directory is created for Processing Cluster and Publishing Cluster. -> Those files have to be copied to their respective /opt/tra/conf directory for the tra user on the Processing and Publishing Blades. - Basic TRA Control Scripts - -> On the Site-level TRAs both TRA and NE are controlled at the same time. start_tra_cluster.py stop_tra_cluster.py start_tra_node.py stop_tra_node.py -> you can start or stop the Cluster as whole -> Or via indvidual node by specifying 1 or 2 for the node you wish to control ps -ef | grep tra -> to see all TRA/NE processes. mtx_ping_accept -> supports ping request to the TRA - TRA Utilities - validate_tra_config.py -> needs to run after configuration changes to the TRA to ensure both nodes are configured identically. -> can run the script at any time agains any TRA component to verify that the configuration matches. print_tra_cluster_status.py -> show the status of both TRA or NE nodes. print_snmp_stats.py -> can be used for a variety of results. -> The arguments used and the output shown vary depending on where you execute the script. -> Executing the scipt on the Site-level TRAs we can use this to check the communication state between the TRA/NE and the Engine. -> Executing the scipt on the TRA-LB we can verify the current configuration without the need to open any of the configuration files. ex. we can see VIPs, network interfaces, pools, ports, etc. = Network Enabler = -> Runs alongside the Site-level TRAs when the SS7/SIGTRAN messages are to be routed to the engine. -> NE works different to the TRA, no load balancing agents running on the Engine. -> Load Balancing to all Processing Blade is performed by the NE itself. -> CAMEL Gateway process on each Blade converts the data to MDC format before it is passed on to the Charging Server process. -> network_enabler is the name of the NE process , all NE functionality is implemented by this process. Route Cache Controller and Route Cache -> For multiple Sub-Domain TRA and NE require additional RT component which runs on a Site-level TRAs. -> a lookup table in which all users/subscriber are listed and the Sub-Domain they are hosted on. The Site-level TRAs receive this information from the CRM via the RS Gateway. 
= Network Enabler =
-> Runs alongside the Site-level TRAs when SS7/SIGTRAN messages are to be routed to the Engine.
-> The NE works differently to the TRA: there are no load balancing agents running on the Engine; load balancing across all Processing Blades is performed by the NE itself.
-> The CAMEL Gateway process on each Blade converts the data to MDC format before it is passed on to the Charging Server process.
-> network_enabler is the name of the NE process; all NE functionality is implemented by this process.

Route Cache Controller and Route Cache
-> For multiple Sub-Domains, the TRA and NE require the additional RT component, which runs on the Site-level TRAs.
-> The Route Cache is a lookup table in which all users/subscribers are listed together with the Sub-Domain they are hosted on. The Site-level TRAs receive this information from the CRM via the RS Gateway.

To prevent a single point of failure, 2 Site-level TRAs with TRA-RT are included.
-> TRA-RT queries the Route Cache to determine the correct Sub-Domain.
-> TRA-SI/DR then forwards the messages to the matching active Engine.
TRA-RT
-> Both TRA and NE use this feature to identify which Sub-Domain a user/subscriber is hosted on.
-> It is only required when you use multiple Sub-Domains in the MDCP.

=============== Module 4 SBA Gateway ======================

- SBA Gateway -
-> Service Based Architecture Gateway.
-> Enables 5G applications to communicate with the Engine.
-> Sits in front of the Site-level TRAs.
-> Each SBA Gateway registers its URI with the NRF and sends regular heartbeats to it. This enables other 5G Core network elements to discover and use the Converged Charging Services offered by the MATRIXX DCP.
-> Converts JSON messages to and from MDC format, as all incoming traffic going to the MATRIXX Engine needs to be converted to MATRIXX Data Container format before it can be processed.
-> Is an executable JAR file which implements 2 Service Based Interfaces through the 5G CHF interface (see below).
-> At least 2 SBA Gateways will be deployed in production to prevent a single point of failure.
-> The MATRIXX 5G SBA GW interfaces with the following 5G Core network elements:
SMF -> Session Management Function -> similar to a PCEF (Policy and Charging Enforcement Function).
PCF -> Policy Control Function -> similar to a PCRF (Policy and Charging Rules Function).
NRF -> Network Repository Function -> manages the 5G Core entities.
-> Within the MDCP, the SBA GW interacts with:
Site-level TRAs -> for routing of messages to the active Engine for processing, and responses in return.
ActiveMQ -> for routing notification messages initiated by the Engine to the SBA GW, such as reauthorization requests and spending limit notifications.
-> The SBA Gateway converts the routing and notification messages from the Engine into HTTP/JSON and forwards them to either the SMF or the PCF.
-> The Engine acts as a Converged Charging Service (CCS) to the network, where users are authorized and charged to use the network or applications according to their subscription and usage characteristics.
-> SBA Gateways (like all MATRIXX software components) are available as either RPM installation packages or Docker container images.
-> The SBA Gateway provides its functionality using a wide range of components:
Transport Layer Security (TLS) and HTTP/2 -> part of the protocol stack.
TLS Mutual Authentication (TLS/mAuth) -> for authentication.
OAuth 2.0 -> for authorization.
NRF client -> for discovery.
Dynamic Message Mapper -> for JSON/MDC conversion.
ActiveMQ -> for outbound traffic notification.
Common Config -> for all network functions.
-> The SBA Gateway executable JAR includes all necessary libraries and logic to build a 5G Network Function (NF).
-> When the SBA Gateway starts, it loads several files from the ../conf/sba and ../lib/sba sub-directories.
-> SBA directories:
1. There is no custom directory for the SBA GW.
2. There are multiple sub-directories under the conf directory on the SBA GW.
3. There are multiple sub-directories under the lib directory on the SBA GW.
/opt/mtx/conf/sba
/opt/mtx/conf/chf
/opt/mtx/conf/nrf
/opt/mtx/lib/sba
/opt/mtx/lib/chf
/opt/mtx/lib/nrf
/var/log/mtx/nf.log

SBA Logs
-> The SBA Gateway uses log4j2 to configure logging.
-> The SBA Gateway JAR file includes a default log4j2 configuration file, which can be modified with environment variables or overridden.
-> Logs are written to /var/log/mtx/nf.log by default.
Charging Function interface (CHF)
-> The MDCP can connect into an operator's 5G Core network through CHF functionality.
-> 2 Service Based Interfaces the MDCP provides:
Nchf_ConvergedCharging -> for the SMF; used to inform the SMF about any business conditions that have occurred that may affect charging, and about error conditions.
Nchf_SpendingLimitControl -> for the PCF; used to inform the PCF about policy counter status changes, to end the session, and to inform it about error conditions.

Network Function implementations -> Java/Groovy classes that implement an SBI interface in the SBA Gateway.
SBI interface -> using this, NF implementations can register URIs and implement NF service logic.
NF service logic -> logic that is common to all Network Functions is provided by the SBA Gateway with a set of default configurations (which can be overridden if required).

- SBA Gateway Operations -
docker ps -a
-> Identify the Docker container; shows a list of all containers and the image each was created from.
docker start
-> Starts the respective container created from the selected image.
docker exec -it /bin/bash
-> Opens an interactive shell inside the container.
* By default you will be logged in as root; switch to the mtx user to perform these operations.

Health check
-> curl -v http://localhost:9098/healthcheck - command to verify that the SBA Gateway is responsive, i.e. to run a basic health check.
-> You can run the curl command from any Blade as long as it can reach the SBA GW.

=============== Module 5 Proxy ======================

Different components running on a Proxy Blade:
Apache Tomcat
-> On which all MATRIXX web applications are based.
RS Gateway
-> The REST Service Gateway.
-> The connecting and conversion piece between the BSS and the Engine.
-> Converts REST or Java traffic to/from MDC format.
MATRIXX CM
-> MATRIXX Customer Manager.
-> A GUI for managing subscriber data.
My MATRIXX
-> The application for configuring pricing plans for the Engine.
MATRIXX web applications
-> To configure all MATRIXX web applications you can use tomcat_matrixx.conf.
-> There is a global configuration for all MATRIXX web applications in /etc/tomcat/conf.d/tomcat_matrixx.conf.
MDC Gateway Proxy
-> The connecting piece between the web applications and the Engine.
-> Protects the Engine from unauthenticated traffic.
-> Can be used for non-authenticated or authenticated traffic:
- non-authenticated traffic via port 4070,
- authenticated traffic via port 4080; requires credentials for a successful connection and is what is recommended for production environments.
Event Streaming Framework
-> Receives event data streams from the Publishing Cluster and forwards them to one or more streaming targets such as Apache Kafka.
* Each MATRIXX web application (as well as the MDC Gateway Proxy and the Event Streaming Framework) has its own properties file in /opt/mtx/conf.

- Control Scripts -
The MDC Gateway Proxy service needs to be started first and must be running on each Proxy Blade:
start_proxy.py
stop_proxy.py
restart_proxy.py
For MATRIXX web applications, you configure Tomcat to include those you wish to run, and then control Tomcat as a whole using sudo.
-> My MATRIXX, RS Gateway and Customer Manager are applications deployed to the Apache Tomcat service.
-> sudo service tomcat {start | stop | status | restart}
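Illustrative startup and check sequence on a Proxy Blade, assembled from the commands above:
ex. start_proxy.py (MDC Gateway Proxy first)
ex. sudo service tomcat start (then the Tomcat web applications)
ex. sudo service tomcat status
ex. ps -ef | grep mtx (verify the proxy process is running)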
Directories for MDC Gateway Proxy & Streaming:
/opt/mtx/bin
-> Contains the MDC Gateway Proxy executable and Python scripts:
gateway_proxy.jar, start_proxy.py, stop_proxy.py, restart_proxy.py, mtx_event_streamer.sh
-> mtx_event_streamer.sh - script to control the Streaming Framework.
/opt/mtx/data
-> Contains template files; the one you need to configure for the MDC Gateway Proxy has to be made available in the /opt/mtx/conf directory by changing its name to gateway_proxy.properties.
-> gateway_proxy_Sample.properties
/opt/mtx/conf
-> Configuration file for the MDC Gateway Proxy (gateway_proxy.properties).
-> The Streaming Framework also has its own configuration file in this directory (mtx_event_streamer.properties).
/var/log/mtx
-> gateway_proxy.log - logs for the MDC Gateway Proxy.
-> events.json
-> mtx_event_streamer.log

Directories for Tomcat web applications:
/opt/mtx
/var/mtx_catalog_builder
/var/cache/tomcat/temp
/var/run/mtx
/var/log/tomcat

My MATRIXX
-> Catalog Builder is the previous name of My MATRIXX.
-> GUI-based web application.
-> Uses Subversion for the file management of pricing files in the background.
/var/mtx_catalog_builder/data
-> Master Subversion directory.
-> Each Domain or Sub-Domain has its own working data directory structure.
-> Used to coordinate changes made by multiple users.
-> "Local workspaces" - a temporary directory structure used to check out the files for the required Domains.

====================== Module 6 Logs ======================

- MATRIXX Engine software version number -
-> Referenced in the following:
- control script results (print_blade_stats.py -C)
- logs (Transaction Log files)
- event files (MEFs)
- Checkpoints, etc.
-> print_mtx_version.py - dedicated script that shows further details:
- outputs the current version and revision number of the MATRIXX software,
- including the date the installation package was changed.

- 5 types of MATRIXX logs -
(Debug Logs, Transaction Log Files, System Logs, Tomcat Logs, MATRIXX Component Logs)

1. Debug Logs
-> Main Engine application status log file; the main application log files for the Engine and TRA.
-> Stored in the mtx log directory $MTX_LOG_DIR, /var/log/mtx/.
-> File name: mtx_debug.log.
* TRA components use their own mtx_debug.log in /var/log/tra.
-> Troubleshooting starting point.
-> Each Engine Blade creates its own log file.
-> Set the default logging level for the Engine with 6 different levels:
LM_TRACE - highest level / largest amount of detail; should only be enabled temporarily
LM_DEBUG
LM_INFO
LM_WARN - warning; mostly used
LM_ERROR - error; mostly used
LM_CRIT - lowest level of detail, critical errors only
-> Debug log parts (in order; see the sample line below):
- log message level (ex. LM_INFO)
- process ID and parent process ID (ex. 2340|2390)
- date and time stamp (ex. 2020-07-17 14:12:38.303414)
- service logging the message (ex. transaction_server)
- FQ ID / Fully Qualified ID - the 4-digit E:C:B reference; if you don't have Sub-Domains configured, the leading digit refers to the Domain as Domain ID 1, ex. (1:1:1:1)
-> merge_mtx_debug_logs.py
- Merges multiple mtx_debug.log files (can be used on any Cluster/Blade).
- Produces a chronologically sorted output file.
- Accepts input files and input directories.
- ex. merge_mtx_debug_logs.py --input_dir /home/mtx/log --input_dir /home/mtx/archive > mtxlog
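Putting the parts together, a single mtx_debug.log entry would look roughly like this (illustrative only; the field separators are assumed):
LM_INFO|2340|2390|2020-07-17 14:12:38.303414|transaction_server|1:1:1:1|<message text>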
2. Transaction Log Files
-> Logs of all transactions; business-critical logs generated by the Publishing Blade.
-> Bundled Transaction Records that are collected and written to the SAN by the active Publishing Blade.
-> Located in the shared directory $MTX_SHARED_DIR.
-> Transaction Log files must be actively managed on the SAN.
-> Transaction Records are temporarily stored locally on the Processing Blades and are automatically deleted once they have been written to the SAN by the Publishing Blade as Transaction Log files.
-> Transaction Log files have to be deleted regularly from the SAN only; the Transaction Server automatically deletes the local files once they are written to the SAN.
-> delete_old_transaction_logs.py
- Utility to delete old Transaction Log files from the SAN.
- "-u" target max used disk: specify how much disk space to reserve.
- "-f" target free disk: specify how much disk space to keep free.
-> "Obsolete" Transaction Log files are older Transaction Log files that are no longer required to restore the IMDB to its most recent state.
* When we rebuild the IMDBs we use the most recent Checkpoint and any Transaction Log files that were created afterwards.

3. System Logs
-> Logs for Linux OS-reported application messages; operating system logs.
-> Located at /var/log/messages.
-> Can be accessed either as root or using the sudo command.
-> Certain MATRIXX components also write to the Linux OS syslog file, like the Process Controller (mtx_process_ctrl).

4. Tomcat Logs
-> Status logs for the Tomcat web services (My MATRIXX, RS Gateway and Customer Manager).
-> Located at /var/log/tomcat.
-> Can be accessed either as root or using the sudo command.

5. MATRIXX Component Logs
-> Diameter and MDC Gateway, Price Loader, MDC Gateway Proxy, Event Loader, Event Stream Log, etc.

- Setting Debug Log Levels -
A. Static configuration - 5 steps (a sample sequence follows this list):
1. Back up create_config.info from the /opt/mtx/custom directory. You can make Engine configuration changes on any Engine Blade, but it is advised to always use the same Blade so as not to confuse yourself.
2. Edit the answer in create_config.info to the question "What log level do you want to use?"
3. Run create_config.py to update your local Engine configuration files and update the mtx_config.xml file.
4. Run configure_engine.py to propagate the changes to all other Engine Blades.
5. Run restart_engine.py for the changes to take effect.
-> These 5 steps also apply if you want to reconfigure the Engine in any other way.
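As a sketch, the 5 static steps on one Blade might look like this (the backup file name is illustrative):
ex. cp /opt/mtx/custom/create_config.info /opt/mtx/custom/create_config.info.bak (step 1)
ex. vi /opt/mtx/custom/create_config.info (step 2: change the log level answer)
ex. create_config.py (step 3)
ex. configure_engine.py (step 4)
ex. restart_engine.py (step 5)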
B. Dynamic configuration
-> set_trace.py
- Used for advanced troubleshooting or as instructed by Technical Support.
- The 6 log levels are also available here.
- You can set either debug or trace level for one or more targeted processes.
- You have to run the script twice to apply dynamic logging for the 2 levels:
1. Enable tracing (per Blade). The 1st time you run it, specify 2 arguments. ex. set_trace.py diam debug
2. Specify logging options for debug or trace level. The 2nd time you run it, specify 4 parameters:
- the process/service name,
- which process task you wish to include (a specific task name, or "all" to include all of the service's tasks),
- whether you want debug output in your terminal (binary debug flag on/off, hexadecimal),
- the trace flag (a hexadecimal value; add the values of the messages you wish to include).
ex. set_trace.py diam all, 0x00, 0x03
- Once both steps are executed, the chosen level is immediately written to the mtx debug log file; no restart of any services is required.
- Only works on a per-Blade basis. To enable dynamic logging on all Processing Cluster Blades you could use the run_cmd_on_cluster script, ex. run_cmd_on_cluster -e 1 -c 1 ""
- Once completed, run set_trace.py a 3rd time to switch back to your chosen default logging.

-> set_subscriber_trace.py
- Specify a particular subscriber. Ex. set_subscriber_trace.py -IMSI 1234567

check_engine_start_prereqs.py - what it checks:
-> MATRIXX Transaction Record directory hierarchy is mounted.
-> MATRIXX Transaction Record directory hierarchy is empty.
-> Shared directory is mounted.
-> /staging/temp is empty.
- 2 things to focus on:
1.) any log messages other than INFO,
2.) the script result.
- Situations where it is useful to run the script:
-> before starting the Engine for the first time,
-> to verify access to the storage directories for Transaction Records and Transaction Log files,
-> after receiving an Engine start error.
- You can add the following arguments (see the example after this list):
-> -T, --txn-analysis
- Analyzes Transaction Logs to verify whether there are missing transactions.
- If the results show that there is a gap, see if you can fix it by restoring a file from a backup.
-> -w, --write-checkpoint-restart-file
- You can add this argument along with -T if you are still unable to fix the problem.
- Writes the results of -T to a temp file, recording up to which point all transactions are complete.
- A subsequent restart will filter out transactions following the first detected missing transaction; your Engine will then only load transactions up to this point when restoring the IMDBs. This ensures your databases are consistent.
-> -A, --activate-cluster - to run the script against the active Cluster.
-> -S, --standby-cluster - to run the script against the standby Cluster.
-> -F, --check-file-system - to ensure the fsck file system scan comes back clean.
-> -s, --shared-dir
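A sketch of the transaction-analysis flow described above, using the listed arguments:
ex. check_engine_start_prereqs.py -T (analyze Transaction Logs for gaps)
ex. check_engine_start_prereqs.py -T -w (if a gap cannot be fixed from backup, record the last complete point for the next restart)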
recover_transaction_logs.py
-> To be executed locally on one of the MATRIXX Blades.
-> Technical Support may ask you to execute this script when experiencing a total system failure / catastrophic failure where your whole Engine Chain is down in the production environment.
-> Recovers orphaned Transaction Logs prior to restart.
-> Runs in 2 modes:
1. Interactive (the default) - prompts whenever input is required.
- Iteratively takes action on the Transaction Logs identified.
- Recommended to write recovery info to a checkpoint restart file before restarting.
2. Best effort mode - not recommended.
- Exits on failure, non-iterative, and does not automatically restart.
-> On running the script it will do the following:
1. The Publishing Cluster goes through multiple stages in an attempt to retrieve any leftover Transaction Records from the Processing Cluster. It may prompt you for further input if required.
- Note that one of the startup requirements is that the local Transaction Record directories on the Processing Cluster are empty.
2. In the final step the recovery script asks if you wish to start the Engine.

Status tools
-> print_blade_stats.py
- To be executed locally on one of the MATRIXX Blades.
- One of the most versatile monitoring scripts.
- You can add the following arguments when executing this script:
-> -C - shows the Cluster information.
-> -Y - shows the system statistics for the Cluster: monitoring interval, number of processing errors, total amount of memory allocated for MATRIXX databases, heartbeat information, etc.
-> -B - shows the details for the IMDBs; can be used to verify that subscribers were loaded successfully into the database.
-> -N - shows notification details.
-> check_engine.py
- To be executed locally on one of the MATRIXX Blades.
- Verifies if the Engine is running.
- Shows a brief summary for every Cluster Leader.

Off-Engine tools
-> Monitoring tools that should be executed outside of the Engine, ex. they can be executed in a Network Operations Center (NOC).
-> cluster_mgr_cli.py
- cluster_mgr_cli.py [options/arguments] command
- You should execute it with the following arguments:
-> -t TARGET, --target=TARGET - the target is the virtual IP address and the external port used for Cluster control, ex. 10.0.6.110:4480
-> ex. commands (see the illustrative invocations after this list):
get cluster_state - returns the HA state of the target cluster.
get excluded_nodes - returns a list of nodes that are excluded from cluster membership due to failures.
get peer_clusters - returns the VIPs and HA states of peer clusters.
get schema_version - returns the schema version of the cluster.
shutdown cluster - requests shutdown of the target cluster; returns 0 on success.
switchover active_cluster - requests an HA switchover of the peer cluster; returns 0 on success.
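Illustrative invocations using the example target above:
ex. cluster_mgr_cli.py -t 10.0.6.110:4480 get cluster_state
ex. cluster_mgr_cli.py -t 10.0.6.110:4480 switchover active_cluster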
-> print_topology.py
- Displays information about the software topology, including:
-> sites
-> domains
-> load balancers
-> IDs for Engines, Clusters, Blades
-> IP addresses, etc.

Unix level - top
-> c - shows the details in the COMMAND column.
-> V - shows processes as a hierarchy.
-> etc.

SNMP
-> Simple Network Management Protocol.
-> Provides direct monitoring options.
-> Integration point for an SNMP manager application, e.g.:
- Prometheus/Grafana
- HP BTO/OpenView
- Netcool
- etc.
-> A standard application-layer protocol that allows a Network Operations Center (NOC) to poll the SNMP agents running on each Blade in the MATRIXX Engine.
-> Allows the MATRIXX SNMP agents to send unsolicited notifications to NOCs in the form of traps to identify important system events.
-> When you run one of the MATRIXX print scripts, the script queries the SNMP agent of your chosen target to get the requested details.
-> Most MATRIXX monitoring scripts rely on the MATRIXX SNMP agents to be running.
-> SNMP should be used for continuous monitoring; MATRIXX scripts can be used for ad-hoc monitoring.
-> MATRIXX software running under the mtx user context and MATRIXX software running under the tra user context are monitored via separate SNMP ports.
-> Hardware and operating system monitoring are the responsibility of the vendors. Our focus is only on the MATRIXX software and its integration with 3rd-party SNMP manager applications such as:
Prometheus
- acts as the storage backend and accepts SNMP traps.
- should be installed outside of the Engine.
Grafana
- the interface for analysis and visualization.
- should be installed outside of the Engine.
-> MATRIXX provides several Management Information Base (MIB) files to load into your preferred monitoring solution.
-> MATRIXX SNMP agents support both SNMP versions 2 and 3.
-> SNMP MIBs are available in the data folder: /snmp/mibs. ex. matrixx_mib.txt contains the Engine monitoring MIB.
-> To use MIBs, load them into the SNMP console in your NOC and then point your SNMP system at the targets you wish to monitor.
-> MATRIXX SNMP agents for the Engine are configured using the Engine's create_config.info. This includes the SNMP trap destination, ex. where in your NOC the SNMP agents should send notifications to.

curl
-> Command-line browser.
-> Useful for testing connectivity within the MDCP.
-> Accepts URIs as input and provides its output on standard out (stdout).
-> Demonstrates end-to-end connectivity across the system.
-> Also reports on the domain used for loaded pricing.
-> Use the IP address of the Blade where the RS Gateway resides.
-> Can be run from any Blade.
-> All the connections this tool checks when executing this command: http://:8080/rsgateway/data/v3/pricing/status
1. RS Gateway
2. MDC Gateway Proxy
3. Site-level TRA
4. TRA-LB
5. MDC Gateway
6. Charging Service
-> If this tool fails, ensure that:
1. each component listed is running,
2. each component is configured with the correct connection details.
- Troubleshoot using the sequence starting from outside of the Engine.
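Illustrative end-to-end check (the RS Gateway host is elided in the source; a placeholder is used here):
ex. curl http://<rsgateway_blade_ip>:8080/rsgateway/data/v3/pricing/status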
====================== Module 8 Maintenance ======================

- Periodic Maintenance -
Local data on Engine Blades
-> Automatically deleted when it is no longer required.
-> You may archive old data:
- Transaction Log files (SAN)
- MATRIXX Event Files (MEFs)
- Checkpoints
- Pricing plans

Monitoring tools and items to monitor:
1. Cluster Manager
- The Cluster Manager for each of the Engine Clusters provides built-in monitoring capabilities and will automatically correct basic operational issues, such as Blade failures.
2. Memory usage
- Due to the way data is stored in the Engine's IMDBs, memory usage needs to be monitored.
- Consider any changes in your environment, both technical and commercial, that can have an impact: topology changes, subscriber management, new Product Offers, promotions, etc.
3. Disk usage - SSD and SAN
- Each Blade is typically configured with mirrored SSDs; this is where the OS, prerequisites and MATRIXX software are installed.
4. Log files
- ex. mtx_debug.log - needs to be managed with log rotation (check the docs for log rotation configuration details).
- The SAN will also need to be monitored regularly.
5. SNMP counters
- A wide range of SNMP counters is available to meet your monitoring requirements, including tracking the synchronization status between the Primary and Secondary Sites of your Engine Chains.
6. Checkpoints
- ex. validate nightly on the spare Publishing Blade.

Backup and Recovery
-> The Project Integration team will advise on what information to back up and how frequently this should be performed, in line with the customer requirements.
-> Reasons to back up MATRIXX data in addition to configuration files:
1. Data is typically only stored for a short time on a MATRIXX system.
2. To prevent data loss due to hardware errors.
3. To analyze or investigate potential issues that occurred in the past.
* See the notes on Checkpoints above.

- Deploying Pricing -
My MATRIXX -> used to create Pricing Catalogs for production.
-> To deploy a new Pricing Catalog into production, ensure:
- the new catalog has been fully tested and approved,
- Pricing Administration has created a new XML price file and provided it to Operations.
-> Execute the following command on a Processing Blade:
load_pricing.py -f <filename>
- Once a compiled pricing configuration file has been tested and approved, you load it into the Engine using this script; <filename> = name of the file.
- You can load either XML or ZIP files.
load_pricing.py -r
- If the Pricing team realizes that the new pricing file contains incorrect pricing details after loading it, you can switch back using this script argument.
- You can only go back one step.
-> Pricing files can only be loaded from Processing Blades.
-> It is recommended to always use the home directory on the first Processing Blade; this avoids confusion and lets you keep a history of pricing files in one place.
-> Pricing files should be backed up on a regular basis.
-> What happens when loading a pricing file (see the example after these steps):
1. The pricing file gets verified to ensure it is intact.
2. The currently loaded pricing file gets stored in the custom directory with _PREV appended to indicate that it is the previous pricing file, ex. mtx_pricing_.xml_PREV
3. The new pricing file is loaded and a copy also gets stored in the custom directory.
-> Ex.
$HOME - myPricing.xml
/opt/mtx/custom - mtx_pricing_.xml_PREV
/opt/mtx/custom - mtx_pricing_.xml
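Illustrative deployment and rollback from the first Processing Blade's home directory (file name from the example above):
ex. load_pricing.py -f myPricing.xml (load the new pricing file)
ex. load_pricing.py -r (switch back one step if the new file turns out to be wrong)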
Transaction Log Files
-> Written to the SAN by the active Publishing Blade.
-> Contain the Transaction Records that were created by the Processing Blades and sent over to the Publishing Cluster.
-> The main purpose of creating Transaction Log files is to be able to restore the system back to an operational state: they are used in conjunction with the most recent Checkpoint to restore the IMDB whenever the active Engine starts.
-> File extension: log.gz, ex. transaction_1_2_1_155731623_1.log.gz
-> Storage directory: /shared/txnlogs/blade_1_2_1

- Checkpoints and TXN/Transaction Log Files -
-> The system will automatically restart using:
- the most recent Checkpoint,
- subsequently committed Transaction Log files.
-> The steps/concept:
1. The Publishing Cluster bundles the Transaction Records it receives from the Processing Cluster and writes them to the SAN as Transaction Log files.
2. The Checkpointing Blade continuously reads those Transaction Log files from the SAN to update its IMDB.
3. The Checkpointing Blade creates Checkpoints on a regular basis and rsyncs them over to the Publishing Cluster.
4. The active Publishing Blade then writes them to the SAN.
-> The most recent Checkpoint and all subsequent Transaction Log files are automatically used when the active Engine starts to rebuild the IMDBs.
-> Whenever the active Engine starts, each Cluster Leader reads the most recent information from the SAN and coordinates the rebuilding of the IMDBs within its own Cluster.
-> If you experience an emergency, for ex. the whole Engine Chain being down, and need to recover, immediately contact MATRIXX Support. They can help you if you want to use an alternative Checkpoint and/or not replay Transaction Log files.

- Alternative Restore Operations -
-> For non-production environments, ex. testing systems, you can provide only the Checkpoint and subsequent Transaction Log files to be used in the shared area and then start the Engine.
- To restore a database state:
-> replace the archived Checkpoint and Transaction Log files in the /shared area,
-> restart the Engine.

Production catastrophic failure
-> Call Technical Support.
-> Run recover_transaction_logs.py in interactive mode (the default) as instructed by Technical Support.

====================== Module 9 MEF and SEF ======================

MATRIXX Event Files
-> Contain Event Detail Records (EDRs) in an event type hierarchy.
- EDRs contain the list of usage and non-usage events generated during transaction processing.
-> The MATRIXX Engine generates EDRs for all activities that can trigger rating, i.e. usage and non-usage event types. Non-usage includes events like Catalog Item purchases and cancellations.
-> Event generation is configured using My MATRIXX.
-> Generated by the active Publishing Blade.
-> The active Publishing Blade generates Transaction Log files and writes them to the SAN; those form the basis for the MEFs, which are generated afterwards and also written to the SAN.
-> Meant to be published to (and consumed) outside of the MDCP; can be input into third-party applications.
-> MEFs only contain records that are billing-related, including Diameter and SS7 events and subscriber notifications (this is true for both v1 and v2 MEFs).
- Ex. of billable event types:
1. Usage
2. Recurring
3. First use of a balance
4. Purchase
5. Cancel
6. Balance adjustment
7. Balance top-up
8. Suspend (Product Offer)
9. Resume (Product Offer)
10. Balance write-off
11. Refund
-> Anything non-billable, such as certain subscriber management events, is not included in MEFs.
-> There are currently two different MEF versions available.

1.) MEF version 1
- Default.
- 1:1 relationship with the Transaction Log files -> for every Transaction Log file there will be one MEF.
* Even if a Transaction Log file has no billable events, a MEF will still be created in MEFv1.
-> Stores each financially significant transaction record to produce the MEF (v1).
- Not chronological -> records are not chronologically sorted but appear in the same order as they are inside the Transaction Log files.
- Does not contain Balance value data -> only includes the changes to be made to Balances.
- Contains notification and Balance tracking records.
- Available in 3 formats:
-> Compact MDC ("MATRIXX Data Container")
-> XML elements
-> XML attributes
- Will be deprecated in the future.
- Not directly configurable -> however, since it has a 1:1 relationship with the Transaction Log files, we can use the Engine's create_config.info to configure those:
-> by answering "no" to the question whether we want to use the default transaction logging characteristics and running create_config.py, we will be prompted for all the subsequent questions,
-> the compression level ranges from 0 (none) and 1 (fastest) to 9 (produces the smallest files),
-> several conditions for closing files can be specified; we set multiple thresholds, and whichever threshold is crossed first causes the file to be closed and a new one to be created.
- MEFs have the file extension mef.gz, ex. transaction_1_2_1_155731623_1.mef.gz
-> Storage directory: /shared/event_files
- Publishing of MEFs is optional with MEFv1, as MEFs are automatically saved to the SAN. You can enable publishing in create_config.info and specify a location outside of the MDCP to copy MEFs to.
- The default data format is Compact MDC (you can see the written output format for the MEF generation task in mtx_config.xml).
- Use the gunzip -d command to extract MEF files for viewing (see the example below).

2.) MEF version 2
- Additional configuration required.
- Created from Stream Event Files (SEFs).
- Fully configurable -> closing files based on time, file size, idle time, or number of records; configurable independently of the Transaction Log files, using create_config.info. Whichever threshold is crossed first will lead to file closure.
- Chronological (Global Transaction Counter - GTC) -> chronologically sorted using the GTC, a unique integer value that is incremented and assigned to each message as it comes in to the Processing Cluster.
- Balance values are calculated -> calculates the actual Balance value.
- Available formats:
-> Compact MDC
-> XML elements

- Compact MDC vs XML -
Compact MDC -> advantage: smaller file sizes.
XML -> advantages: highly compatible, easier to process, easier to read.
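To inspect a MEF by hand (file name from the example above; the argument form of print_data_container_file.py is an assumption):
ex. gunzip -d transaction_1_2_1_155731623_1.mef.gz
ex. print_data_container_file.py transaction_1_2_1_155731623_1.mef > mef_dump.txt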
MEF implementation considerations
-> Which version to implement: MEF v1 vs v2.
- v2 is recommended, since v1 will be deprecated in the future.
- When switching from v1 to v2, ensure that the files are stored in separate directories.
-> Retention
- Keep -> how long to retain EDRs for:
- online - 6 months,
- offline - varies in line with country regulations.
-> Backup
- All files are gzipped, independent of what version or format was selected. ex. for MEFv1 using Compact MDC: 17,000 events produce about 2.5 MB compressed.

MEFv1 and Streaming
- MEFv1 can be combined with streaming. Streaming provides EDRs as data streams to external systems in near real-time.
-> Streaming follows strict rules, ex. events have to be streamed in chronological order and the same events must not be streamed multiple times.
-> The Streaming Server on the Publishing Cluster generates the data streams, and the active Blade sends them over to the Streaming Framework on the Proxy Blade.
-> This is where one or more connectors are configured to stream the data on to external systems, which include Apache Kafka, ActiveMQ and Google Pub/Sub. Disk and console connectors are mainly used for testing.
-> Compliance with the streaming rules is achieved with the help of a cursor that the Streaming Framework sends via the RS Gateway to the Site-level TRA, on to the TRA-LB-PUB on the Publishing Cluster, and from there to the Streaming Server.
-> The cursor specifies the GTC value of the next event expected to be streamed.
-> If streaming is enabled (it is optional), Event Loading will use Streamed Event Files (SEFs) that both Publishing Blades generate for restore purposes.
-> This enables the Publishing Blades on a restart to identify the last events that were streamed successfully and to resend any if required.

MEFv2 and Streaming (highlighted differences from v1)
-> Streaming is required to be enabled; SEFs now provide the basis for both MEF creation and Event Loading.
-> Event Publishing has to be enabled/configured, as MEFs are not automatically saved to the SAN; you can use the publishing settings to implement this.
-> All other components work as before.

- Streaming Architecture Impacts -
-> The Publishing Cluster functions in real-time high availability.
- Active / hot standby; failover is managed by the TRA-LB-PUB.
- Both Blades actively prepare data for streaming, but only the active Publishing Blade sends it to the Streaming Framework on the Proxy Blade.
- This ensures real-time high availability, where the hot-standby Publishing Blade can take over with minimal disruption.
-> Current release of the Event Streaming Framework on Proxy Blades:
- software distributed with the Proxy RPM; the Streaming Framework is included in the Proxy RPM for installation on one or more Proxy Blades.
-> Event Stream Server
- The Event Stream Server is included in the Engine RPM but has to be enabled.
- The Event Stream Server on the Publishing Blade automatically removes Streamed Event Files (SEFs) from both local storage and the SAN by default:
-> local storage (Publishing Cluster) after 24 hours,
-> shared storage (SAN) after 26 hours.
-> SEFs are temporarily stored locally on the Publishing Cluster and on the SAN.
- Streaming Framework Configuration -
mtx_event_streamer.properties -> used to configure the Streaming Framework on the Proxy Blade.
1. We specify the connect parameters for the RS Gateway, for the cursors sent towards the Publishing Cluster.
2. As the RS Gateway runs on the same Blade as the Streaming Framework, we can again use localhost.
- In a production environment the "host" parameter would be the VIP of the Site-level TRA.
- Streaming uses the dedicated port 4100.

Stream
-> We need to configure at least one stream.
-> A stream is made up of an optional filter and configuration details.
- Filter options: whitelist and blacklist.
-> Most streaming consumers support either JSON and/or RAW (= Compact MDC).
-> In this example our connector is of type disk, using a local directory that we write JSON files to, for testing.

- Managing the Streaming Framework -
mtx_event_streamer.sh {start | stop | restart | status}
-> Run the command on the Proxy Blade.
-> Script to control the Streaming Framework; includes start, stop, restart and status arguments.
-> Starts up and connects to the Event Streaming Server on the Publishing Blade.
-> To check the current status you can use ps -ef | grep mtx to view the running processes; the Streaming Framework will be displayed with a reference to the properties file.

- SEF Characteristics -
-> The Streaming Server on the Publishing Cluster maintains a local directory structure to assemble individual SEFs in SEF directories that are saved to the SAN.
-> There are 3 subdirectories inside the local stream_events directory, created by the Streaming Server on the Publishing Blade: /var/log/mtx/local/stream_events
-> Current event records are appended in the cur_events directory.
-> From there they are moved to a dir_ directory (while active) based on a threshold value.
-> Once the directory is "full" (based on a threshold value), it is moved over to the SAN and renamed to sef_, including the timestamp and the range of GTCs that are included in that directory (sef_ when complete).
-> Copies of each SEF directory that is closed go to /shared/stream_events with the same name.
-> The number of records per SEF is determined by mtx_config.xml; you can use the defaults in a production environment.
- Use sed to edit mtx_config_base_xml_sed.extra -> set the max records per file for the event_write_task.

====================== Module 10 Export ======================

2 options to export data from the IMDB:
1. Use a Checkpoint.
2. Use a MEF directory (v1 or v2).
* The tool cannot export the live IMDB.

data_export.jar
-> Java export tool: java -jar data_export.jar
-> Available in /opt/mtx/bin.
-> The tool requires 5 parameters as input (see the sketch below):
1. Input folder (Checkpoint or MEF directory)
- Checkpoint: /shared/checkpoints/mtx_ckpt_v<release_number>
- MEFs: /shared/event_files
2. Output folder (ensure it exists and that it is empty before running the tool).
3. Config file that contains your private/custom MDCs (reference to the custom MDC config file).
4. MATRIXX version number.
5. Export configuration properties file.
- Checkpoints and MEFs do not use the same export config file.
- A template file for exporting a Checkpoint is available in the /opt/mtx/data directory.
- A template file for exporting MEFs is available in the /opt/mtx/data directory.
* You can also add the optional SQL schema name file, but the 5 parameters above are required and sufficient.
-> The tool produces the following output:
1. A create_tables file for creating the required tables in MySQL.
2. A load_tables file for loading the exported data into MySQL.
3. Multiple CSV files that contain the exported data.
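A sketch of an export run over a Checkpoint (the bracketed values are placeholders for the 5 parameters listed above; the exact argument order is an assumption):
ex. java -jar /opt/mtx/bin/data_export.jar /shared/checkpoints/mtx_ckpt_v<release_number> /tmp/export_out <custom_mdc_config_file> <matrixx_version> <export_config.properties>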
====================== Module 11 Event Repository ======================

Event Repository
-> Optional.
-> Uses MongoDB for long-term storage of Event Detail Records (EDRs).
-> The Publishing Cluster loads the EDRs, which are contained in either MEFs or SEFs depending on the configuration.
-> EDRs are loaded into one or more Event Collections on MongoDB using the Event Loader Service.
-> An additional feature is to configure General Ledger account names and transaction types using My MATRIXX.
   - The General Ledger (GL) is a record of a company's financial transactions.
   - GL processing of the EDRs inside MongoDB is a separate job that is typically scheduled to run overnight.
   - Afterwards, GL posting produces XML GL Posting files that can then be imported into accounting or Enterprise Resource Planning applications.

API access
-> Although EDRs have been moved out of the Engine into the Event Repository, they remain accessible via the API.
   - GET commands will also target the active IMDB, even though they appear to be directed to the Event Store only.
   - Commands for /eventstore will not automatically target both the Event Repository and the IMDB.

Deployment options for MongoDB
-> 2 MongoDB servers that hold data: one active (primary) and one standby (secondary).
-> Arbiter
   - a 3rd server, added to avoid conflicts where both data-holding servers could assume that they are the primary.
   - it does not hold data but casts a vote as to which of the other 2 servers is the primary.

SQL Equivalents
-> MongoDB Collections = SQL Tables
-> MongoDB Documents = SQL Data Rows

MtxEventDatabase
-> holds 3 types of Collections:
1. EventCollections [YYYYMMDD]
   - hold the actual events (MATRIXX EDRs) that were loaded into MongoDB.
   - there is a separate Collection for each calendar month.
2. LoaderStatsCollection
   - provides details on the Event Loader's backlog and performance.
3. LoaderTraceCollection
   - includes the Loader trace, heartbeats and Event File statuses.

- Stats & Status Scripts -
-> MATRIXX provides 6 different scripts: 3 for the Event Repository as a whole and a further 3 for the Event Loader.

Database (Event Repository)
-> check_event_repository.py - displays the Event Repository configuration information.
-> print_event_repository_status.py - displays the status of your MongoDB servers.
-> print_event_repository_stats.py - displays Event Store event database statistics, ex. event count and data sizes.

Event Loader
-> print_event_repository_loader_status.py - displays the status of the Event Repository Event Loader services running on the Publishing Cluster.
-> print_event_repository_loader_stats.py - displays the Event Repository Event Loader statistics for a running Event Loader.
-> print_event_repository_loader_trace.py - displays the status of a running Event Loader service, as well as the traces of its SEF/MEF processing and the associated statuses.

- MongoDB Credentials -
Username
-> To log in to MongoDB as an administrator, you can type mongo admin and wait to be prompted for credentials, or add the username with the -u or --user command line option.
-> The default MATRIXX username is MtxAdmin.
-> You can save the username for MongoDB connections in a hidden file, ex. the .mtx_event_repository_username file.
-> If authorization is enabled on MongoDB, the MATRIXX scripts shown earlier require credentials too.
Password
-> prompt, the -p command line option, or cached.
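ex. a quick login sketch using the defaults above (-p without a value makes the mongo shell prompt for the password):

   # log in to the admin database as the default MATRIXX administrator
   mongo admin -u MtxAdmin -p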
- MongoDB Tools -
Monitoring
-> mongo shell admin commands
   - getReplicationInfo
   - printSlaveReplicationInfo
-> mongostat
-> iostat
Monitoring, Backup, Restore
-> Ops Manager

====================== Module 12 Notifications ======================

MDCP
-> can generate real-time user notification messages for transfer to external systems,
   ex. e-mail, push notifications and/or SMS.

Notifications
-> can be generated when Product Offers are purchased or cancelled, Balances expire, or Credit Limits and Thresholds are reached.
-> Pricing Administrators define whether notifications are triggered and, if so, when they should be triggered.
   - They configure this in My MATRIXX and the resulting pricing configuration file gets loaded into the Engine.

The following components are required in order to use the Notification Framework:
-> 3 main components
1. ActiveMQ server
2. Engine (incl. ActiveMQ Gateway)
3. Notifier

Notification flow:
1.) The Engine functionality needs to be enabled using create_config.info.
2.) The resulting mtx_config.xml will enable the ActiveMQ Gateway as part of the MDC Gateway on all the Processing Blades.
3.) The Charging Server on the Processing Blades triggers notifications as configured and hands them over to the MDC Gateway.
4.) The ActiveMQ Gateway then passes those notifications on to the Apache ActiveMQ server, where outgoing notifications are queued in the mtx_notification_queue.
5.) ActiveMQ sends the notifications on to the Notifier, a MATRIXX software component based on Apache Camel.
6.) The Notifier then formats the content and passes it on to your configured delivery channels, like e-mail servers.
7.) From there they are sent out to the respective users.
8.) The Notifier also sends an acknowledgement message back to the Engine for each notification that was successfully sent, using the same route in reverse.
* Note the separate queue for responses on the ActiveMQ server.
* In production the ActiveMQ server and the Notifier run on separate Blades.

- Notifier Configuration -
MATRIXX Notifier
-> uses Apache Camel FreeMarker templates to format notification messages (see the template sketch after this section).
-> When formatting a notification message, the Notifier first checks if there is a custom template for the specific notification type.
-> Preformatted templates are available in /opt/mtx/data/mtx_notifier_templates_Sample.zip.
-> MATRIXX recommends that you customize the notification templates for your specific business needs.

/opt/mtx/conf
-> contains the Notifier configuration files:
- mtx_notifier_camel.properties
   -> configures the connection and authentication parameters for the Notifier,
      ex. credentials and URIs for the ActiveMQ queues, SMTP servers, push notification servers etc.
- mtx_notifier_camel.xml
   -> configures the Apache Camel routes and endpoints,
      ex. describing the possible connections to delivery systems.
- application_mtx_notifier_camel.properties
   -> configures additional *.jar files and *.classes.
- mtx_notifier_camel_log4j2.xml
   -> contains the logging parameters for the Notifier,
      ex. the name and location of the log file to be created.

/opt/mtx/data
- mtx_notifier_templates_Sample.zip
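ex. a minimal FreeMarker sketch of what a custom template could look like - the field names below are hypothetical; the real ones are defined by the sample templates in the zip file:

   <#-- illustrative template; ThresholdName and RemainingAmount are assumed field names -->
   Dear customer,
   <#if ThresholdName??>
   your balance has crossed the "${ThresholdName}" threshold.
   </#if>
   Remaining amount: ${RemainingAmount}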
- Notification Framework Operations -
-> ActiveMQ is the system service that needs to be running on the ActiveMQ server.
-> To start or stop the Notifier, we log in as the mtx user and use any of the following control commands:
   - start_notifier_camel.py
   - stop_notifier_camel.py
   - /etc/init.d/mtx_notifier_camel (start | stop)
-> To monitor the notifications from the Engine, we use print_blade_stats.py -N.
   - This will show a summary of messages sent and acknowledged, error types and re-try attempts.
-> Notifications are included in:
   - both types of event files, MEF and SEF
   - Checkpoints

- Notification Framework Logs -
3 main logs to consult when troubleshooting the Notification Framework (see the tailing sketch after this list):
1. MATRIXX ActiveMQ Gateway - /var/log/mtx/mtx_debug.log
   - If the issue is on the Engine, consult the mtx_debug.log on the Processing Cluster.
   - This will include the ActiveMQ Gateway details, as it is part of the MDC Gateway process.
2. Apache ActiveMQ - /opt/activemq/data/activemq.log
   - If you suspect an issue with the message queues on the ActiveMQ server, go to /opt/activemq/data where you will find the activemq.log.
   - You will need root/sudo privileges to access this file.
3. MATRIXX Notifier - /var/log/mtx/mtx_notifier_camel.log
   - For troubleshooting the Notifier, the default location is /var/log/mtx where you will find the mtx_notifier_camel.log.
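ex. a quick troubleshooting sketch using the three log locations above (the grep filter string is illustrative):

   # follow the Notifier log in real time
   tail -f /var/log/mtx/mtx_notifier_camel.log

   # the ActiveMQ log requires elevated privileges
   sudo tail -n 100 /opt/activemq/data/activemq.log

   # look for gateway activity in the Engine-side debug log on the Processing Cluster
   grep -i activemq /var/log/mtx/mtx_debug.log | tail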
====================== Module 13 HA - High Availability ======================

Geo-Redundancy
-> refers to the MDCP as a whole and is implemented at the MDCP level.
-> underlines the need to implement the Engines that belong to the same Engine Chain in at least two geographical locations for the best high availability.
-> ensures business continuity in the event of a major system failure and protects us from a wider range of single points of failure.
-> Engines are assigned an identifier that never changes.
   - Primary refers to Engine 1, which never changes.
   - While the identifier never changes, the status of an Engine can change.
   - Active Engine refers to a status, which can change.

Steps on how geo-redundancy is implemented in the MDCP:

DIAMETER/MDC Traffic
Step 1
-> With our primary Engine currently being active, any Diameter or MDC messages will be forwarded by the Site-level TRA to this Engine for processing.
-> The TRA-LBs on the active Processing Cluster receive the traffic and the active TRA-LB node applies round robin to balance the load across all Processing Blades.
Step 2
-> The local IMDBs are updated and the Transaction Records are forwarded on to the Publishing Cluster, where they are replayed to update the local IMDBs.
Step 3
-> The active Publishing Cluster then forwards the Transaction Records on to the standby Engine's Processing Cluster, where they are replayed locally again to update the IMDBs.
Step 4
-> The standby Engine's Processing Cluster forwards the Transaction Records on to the standby Engine's Publishing Cluster for replay.
-> Please note: whenever the standby Engine starts up, it uses Transaction Records from the active Engine to rebuild its IMDBs, rather than information from the local SAN.

SS7/SIGTRAN Messages
Step 1
-> Messages are forwarded by the Network Enabler on to all Camel Gateway processes of the active Processing Cluster using its integrated load balancing method.
-> Steps 2 to 4 are the same as before.
-> The MDCP replay flow is independent of the incoming message type.
-> The only difference between the types of incoming traffic is the routing of the messages to the Processing Cluster. Once processed, there is no difference in the flow of the replay traffic.

- Cluster States -
Peer Cluster
-> the active Cluster/source that a standby Cluster receives its information from.

Cluster States
-> When an Engine moves from Active to Standby, the Cluster state changes in multiple steps:
   - Unknown - the Cluster shows this status while each Blade reads its mtx_config.xml and starts up the respective MATRIXX processes.
   - Pre-Init
   - Init
   - Post-Init
   - Active
   - Standby
   - Active-Sync
   - Standby-Sync

- Secondary Engine -
-> Most of the management scripts can also be applied to the standby Engine.
-> If you are logged in to the active Engine, you can use the standby Engine's reference as an argument.

Manual Changeover
-> activate_engine.py -e EngineID
   - the most common method to manually change over the active Engine (see the sketch after this list).
   - this initiates a controlled changeover to ensure no data loss.
-> stop_engine.py
   - execute on the active Engine (assume this is the primary Engine).
   - the active Cluster will shut down and hand control over to the standby Engine.
   - once the primary Engine is restarted and resynced, it will go to the standby state.
   - running stop_engine.py on the secondary Engine will switch control back to the primary Engine.
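ex. a hedged changeover sketch - "2" is an illustrative EngineID; use the identifier of your standby Engine:

   # initiate a controlled changeover so that Engine 2 becomes the active Engine
   activate_engine.py -e 2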
