Network and System Asset Protection Controls


Network and system asset protection controls are meant to minimize the risks to your network infrastructure that you can’t eliminate. These controls include:

  • Physical security to prevent access to secure areas.
  • The elimination of single failure points for critical systems with redundancy and fault-tolerance strategies.
  • Backups to ensure that data can be restored even after it is lost.
  • Business continuity strategies so mission-critical functions continue to operate in the event of disaster.

At the core of this approach lies the concept of Defence in Depth, which refers to the implementation of several layers of security protection. A single layer of protection does little on its own, because if that one layer fails, you have nothing left to protect you. Defence in depth is fostered in three main ways:

  • Control diversity: The use of different security control types, such as technical controls, administrative controls, and physical controls. For example, technical security controls such as firewalls, intrusion detection systems (IDSs), and proxy servers help protect a network. Physical security controls can provide extra protection for the server room or other areas where these devices are located. Administrative controls such as vulnerability assessments and penetration tests can help verify that these controls are working as expected.
  • Vendor diversity: The implementation of security controls from different vendors to increase security, so that if a vulnerability is discovered in the product of one vendor, it doesn’t compromise the entire system.
  • User training: Educate users about risky behaviors, such as downloading and installing files from unknown sources or responding to phishing emails, that give attackers a path into an organization’s network.

Physical Asset Security

Physical security controls are tools that limit the physical ability to access, alter, or remove assets, and include hardware locks, fences, identification badges, and security cameras. These controls are implemented at different boundaries:

  • Perimeters: Typically involve fences with security guards and barricades at gates to control access.
  • Buildings: Guards and locked doors restrict entry to authorized personnel only; additionally, many buildings include lighting and video cameras to monitor the entrances and exits.
  • Server/networking rooms: Servers and network devices are normally stored in locked areas where only IT personnel can access them as a means of providing additional physical security. These rooms typically have:
    • Locked wiring closets to prevent an attacker from installing monitoring hardware.
    • Locking cabinets to protect servers and other equipment installed in the equipment bays.
    • Cable locks to protect laptop computers.
    • A safe for smaller items.
  • Work areas: Access to specific work areas is restricted when employees perform classified or restricted-access tasks.
    • Airgap: A physical security control that ensures that a computer or network is physically isolated from another computer or network. 

Using Signs: A simple physical security control is a sign that deters many people from entering a restricted area.

Many organizations use security guards to control access to buildings and secure spaces in order to:

  • Check employee badges/identification prior to granting access.
  • Restrict access by checking people’s identity against a preapproved access control list and record all access in an access log.
  • Deter tailgating incidents by observing personnel when they use their proximity card to gain access to a secure area.

Comparing Door Lock Types

Although it seems mundane, good physical security starts with a door lock. Most facilities use what is referred to as a door access system, which only allows access after some control mechanism (cipher lock/proximity cards/biometrics) successfully verifies the user.
When implementing door access systems, consider:

  • Limiting the number of entry and exit points, as each one requires its own set of access controls.
  • Personnel safety in the event of a fire. A door access system must allow personnel to exit the building without any form of authentication.

Cipher locks: Have four or five buttons labeled with numbers that get pressed in a certain order to unlock the door.

  • Cipher locks can be electronic or manual.
  • An electronic cipher lock automatically unlocks the door when the correct code is entered into the keypad.
  • A manual cipher lock requires turning a handle after entering the code.
    • Some manual cipher locks require that two numbers be entered at the same time to add complexity and hinder brute-force attacks.
  • Drawbacks of cipher locks include that:
    • They don’t identify the users.
    • Users can distribute the cipher code to unauthorized individuals without understanding the risks.
    • Shoulder surfers can discover the code by watching users as they enter.
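As a rough illustration of why these locks are vulnerable to brute force, the code space of a small keypad is tiny. The sketch below counts the possible codes under an assumed model (each button pressed once, in order), which is a simplification rather than a description of any specific lock:

```python
from math import factorial

def cipher_keyspace(buttons: int, code_length: int) -> int:
    """Ordered presses without repetition: P(buttons, code_length)."""
    return factorial(buttons) // factorial(buttons - code_length)

# A five-button lock with a four-press code has only 120 possible codes:
print(cipher_keyspace(5, 4))  # 120
```

With a space that small, a shared or shoulder-surfed code is trivially exploitable, which is why the drawbacks above matter.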

Card Access: Used for access points, such as the entry to a building or the entry to a controlled area within a building. These systems use an electronic lock that only unlocks when the user passes a proximity card in front of a card reader or inserts a smart card into a smart card reader.

Proximity cards don’t require their own power source. The electronics in the card include a capacitor and a coil that accept a charge from the proximity card reader: when the card passes close to the reader, the reader excites the coil and stores a charge in the capacitor; the card then uses that charge to transmit its information to the reader over a radio frequency.

  • When used with door access systems, proximity card systems include details on the user and record when the user enters or exits the area.
  • Can combine a proximity card reader with a keypad that requires a personal identification number (PIN) to provide multifactor authentication.
    • The user has something (the proximity card) and knows something (a PIN).
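The two-factor check can be sketched as a toy lookup. Everything here (the table, names, and card IDs) is hypothetical illustration, not a real door-controller API:

```python
# Hypothetical card-ID -> PIN table; a real system would query a secure backend.
AUTHORIZED = {"card-1001": "4821", "card-1002": "9377"}

def grant_access(card_id: str, pin: str) -> bool:
    """Both factors must match: the card (something you have)
    and the PIN (something you know)."""
    return AUTHORIZED.get(card_id) == pin

print(grant_access("card-1001", "4821"))  # True: both factors match
print(grant_access("card-1001", "0000"))  # False: right card, wrong PIN
```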

Biometrics Door Access: Some biometric methods provide both identification and authentication if they are connected to a database of authorized individuals, allowing these systems to record activity, such as who entered the area and when.

Mantraps

A physical security mechanism, mantraps are designed to control access to a secure area through a buffer zone and prevent tailgating; tailgating/piggybacking occurs when one user follows closely behind another user without using credentials. High-traffic areas are most susceptible to tailgating attacks and the best solution is a mantrap.

Mantraps get their name due to their ability to lock a person between two areas, such as an open access area and a secure access area, but not all of them are that sophisticated.

Example of circular mantraps.

Simple mantrap: Usually similar to a turnstile.
Sophisticated mantrap: A room that creates a buffer area between the secure and unsecured areas, with access through the entry/exit doors tightly controlled, either by guards or by access cards.


Video Surveillance

Organizations often use video cameras within a work environment to protect employees and enhance security in the workplace. Video surveillance provides the most reliable proof of a person’s location and activity; access logs provide a record, but it’s possible to circumvent the security of an access log. Security cameras and closed-circuit television (CCTV) are used for video surveillance in the workplace and the surrounding area, including:

  • Parking lots.
  • Building entrances and exits.
  • High-security areas, such as the entrance of a data center or server room.

Beyond security, CCTV provides enhanced safety by deterring threats.  Most systems include a recording element, and they can verify if someone is stealing the company’s assets. By recording activity, videos can be played back later for investigation and even prosecution.



When using video surveillance in a work environment, it’s important to respect privacy and to be aware of privacy laws. Some things to consider are:

  • Only record activity in public areas. People have a reasonable expectation of privacy in certain areas, such as locker rooms and restrooms, and it is often illegal to record activity in these areas.
  • Notify employees of the surveillance. If employees aren’t notified of the surveillance, legal issues related to the video surveillance can arise. This is especially true if the recordings are used when taking legal and/or disciplinary actions against an employee.
  • Do not record audio. Recording audio is illegal in many jurisdictions, without the express consent of all parties being recorded. Many companies won’t even sell surveillance cameras that record audio.

Fencing, Lighting, and Alarms

Fences: Provide a barrier around a property, deter people from entering, and control access to the area via specific gates and mantraps. In some situations, fencing isn’t enough to deter potential attackers, so organizations erect barricades. As an alternative to barricades, which can be intimidating, you can use bollards: short vertical posts of reinforced concrete and/or steel, placed in front of entrances about three or four feet apart, that prevent vehicles from being driven through the front of buildings.

Lights: At the entrances to a building or internal restricted areas, lights can deter attackers from trying to break in. Many organizations use a combination of automation, light dimmers, and motion sensors to save on electricity costs without sacrificing security. The lights automatically turn on at dusk, but in a low, dimmed mode; when the motion sensors detect any movement, the lights turn on at full capacity, and they automatically turn off at dawn. If an attacker can remove the light bulbs, it defeats the control, so place the lights high enough that they can’t easily be reached, or protect them with a metal cage.
Alarms: Provide security protection by monitoring entry points such as doors and windows, detecting when someone opens them. You can also combine motion detection systems with burglary prevention systems to detect movement within monitored areas using microwave technologies and trigger alarms. Infrared detectors sense infrared radiation, sometimes called infrared light, which effectively sees the difference between objects of different temperatures. This can help eliminate false alarms, because the detector senses not just motion, but motion from objects of different temperatures.

Locking It Down.....

Companies that don’t have the resources to employ advanced security systems often use hardware locks to prevent access to secure areas. Although not sophisticated, they are much better than leaving the rooms open and the equipment exposed.  Proper key management is important when using hardware locks and ensures that only authorized personnel can access the physical keys. 

Cable locks: Functioning much like a bicycle cable lock, they provide a theft deterrent for mobile computers as well as desktop computers, especially in unsupervised locations, like labs.

Locking cabinets: Larger companies with large server rooms have advanced security to restrict access, and within the server room they use locking cabinets (as seen below) to secure equipment mounted within equipment bays, which hold servers, routers, and other IT equipment.

Locking server storage cabinets.


Safes: You can store smaller devices, such as external USB drives or USB flash drives, in an office safe or locking cabinet when they aren’t in use.

Asset Management: Involves tracking valuable assets throughout their life cycles with specific operational processes to track hardware (servers, desktop computers, laptop computers, routers, and switches), which reduces vulnerabilities in a number of ways:

  • Architectural/design weaknesses: Reduced because equipment purchases go through a formal approval process that evaluates whether the equipment fits in the overall network architecture.
    • Unapproved assets weaken security by adding resources that aren’t managed.
  • System sprawl and undocumented assets: Occur when an organization has more systems than it needs, so they are underutilized; asset management begins prior to purchasing hardware by evaluating the purchase, and once purchased, the hardware is added to the asset management tracking system.
    • Many organizations use automated methods for inventory control
    • As mobile devices are easy to lose, organizations often use asset-tracking methods to reduce losses, especially when people leave the company.

Environmental Security

Environmental controls contribute to the availability of systems by ensuring temperature and humidity controls are operating properly, fire suppression systems are in place, and proper procedures are used when running cables. Environmental monitoring includes temperature and humidity controls and makes adjustments as necessary to keep the temperature and humidity constant.

Some large-scale data centers log the environmental data, recording the temperature/humidity at different times during the day, allowing administrators to see if the HVAC system can keep up with the demands within the data center.

Heating, ventilation, and air conditioning (HVAC) systems: Computers and other electronic equipment can’t handle drastic changes in temperature, and if systems overheat, chips can burn themselves out, so HVAC systems are important physical security controls that enhance the availability of systems. HVAC systems also include:

  • Thermostats: Ensure that the air temperature is controlled and maintained.
  • Humidity controls: Ensure that the humidity is controlled, because:
    • High humidity can cause condensation on the equipment, leading to water damage.
    • Low humidity allows a higher incidence of electrostatic discharge.

Cooling capacity of HVAC systems is measured as tonnage, and one ton of cooling equals 12,000 British thermal units per hour (Btu/hour). For a point of reference, a typical home HVAC system is a three-ton unit.
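The tonnage conversion is simple arithmetic; a quick sketch:

```python
BTU_PER_TON = 12_000  # one ton of cooling = 12,000 Btu/hour

def tons_to_btu_per_hour(tons: float) -> float:
    """Convert HVAC cooling tonnage to Btu/hour."""
    return tons * BTU_PER_TON

# A typical three-ton home unit:
print(tons_to_btu_per_hour(3))  # 36000.0 Btu/hour
```

Data-center units are sized the same way, just with far larger tonnage figures.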

Most server rooms are built with raised flooring so air conditioning can flow through the space under the raised floor, while servers are kept in rack-mountable cases in equipment cabinets with locking perforated doors in the front and rear that provide air circulation and physical security. Cold air comes in the front, passes over and through the servers to keep them cool, and warmer air exits out the rear.

Hot and Cold Aisles:  Consider what happens if all the cabinets face the same way; hot air pumped out the back of one row of cabinets would be sent to the front of the cabinets behind them, so a server room layout is designed to regulate the cooling in data centers:

  • Hot Aisle: The backs of all the cabinets in one row face the backs of all the cabinets in an adjacent row. Because hot air exits the back of the cabinets, the aisle where the backs face each other is the hot aisle.
  • Cold Aisle: The fronts of the cabinets in one row face the fronts of the cabinets in the adjacent row. Cool air is pumped through the floor into this cold aisle, and perforated floor tiles in the raised flooring allow the cool air to rise.

Fire Suppression: Most organizations include fixed systems to control fires and place portable fire extinguishers in different areas around the organization. A fixed system can detect a fire and automatically activate to extinguish it. Individuals use portable fire extinguishers to extinguish or suppress small fires. The components of a fire are heat, oxygen, fuel, and a chain reaction creating the fire. Fire suppression involves disrupting one of these elements to extinguish a fire by:

  • Removing heat: Fire extinguishers commonly use chemical agents or water to remove the heat; water should never be used on an electrical fire.
  • Removing oxygen: Many methods use a gas, such as carbon dioxide (CO2), to displace the oxygen; this is a common method of fighting electrical fires because CO2 and similar gases are harmless to electrical equipment.
    • HVAC systems are often integrated with fire alarm systems to help prevent fire from spreading.  If an HVAC system operates normally while a fire is active, it feeds the fire by continuing to pump oxygen.  An HVAC system integrated with the fire alarm system can control the airflow to prevent the rapid spread of the fire. Many current HVAC systems have dampers that can control airflow to specific areas of a building. Other HVAC systems automatically turn off when fire suppression systems detect a fire.
  • Disrupt the chain reaction: Some chemicals can disrupt the chain reaction of fires to stop them.

When implementing any fire suppression system, it’s important to consider the safety of personnel:

  • If a fire suppression system displaces oxygen with carbon dioxide, it’s important to ensure that personnel can get out before the oxygen is displaced.
  • Card-secured exit doors: If a fire starts and power to the building is lost, a card reader won’t work, and if the door can’t open, employees will be trapped. It’s important to ensure that an alternative allows personnel to exit even if the proximity card reader loses power, while not introducing a vulnerability.

Shielding: Helps prevent electromagnetic interference (EMI) and radio frequency interference (RFI) from interfering with normal signal transmissions, as well as preventing unwanted emissions that would let attackers capture network traffic. As data travels along copper wire, an induction field is created around the wire, so with the right tools you can capture the signal; the shielding in STP cable blocks this, and fiber-optic cable is not susceptible to this type of attack. While shielding used to block interference and leakage from both EMI and RFI sources is often referred to simply as EMI shielding, the difference between the two is:

  • EMI comes from different types of motors, power lines, and even fluorescent lights. 
  • RFI comes from radio frequency (RF) sources such as AM or FM transmitters. 

Attackers often use different types of eavesdropping methods to capture network traffic. If the data is emanating outside of the wire or outside of an enclosure, attackers may be able to capture and read the data. EMI shielding fulfills the dual purpose of keeping interference out and preventing attackers from capturing network traffic.

  • Protected Cabling: Twisted-pair cable (CAT5e and CAT6) comes as either shielded twisted-pair (STP) or unshielded twisted-pair (UTP); shielding helps prevent an attacker from capturing network traffic and blocks interference from corrupting the data.
  • Protected Distribution of Cabling: Physical security includes cable routing, because if a cable is accessible, attackers can cut it, splice in a hub, and then capture all the traffic going through the hub with a protocol analyzer. A way to avoid this is running cables through cable troughs, which are long metal containers, about 4 inches wide by 4 inches high, so they aren’t as accessible to potential attackers. Additionally, keep the cables away from EMI sources like fluorescent lighting fixtures, which can disrupt the signals on the cables; the result is intermittent connectivity for users.
  • Faraday Cage: A structure designed to prevent signals from escaping it by reflecting radio waves that reach the periphery back toward the center of the room, preventing signal emanation outside the Faraday cage. This also provides shielding to prevent outside interference such as EMI and RFI from entering the room.

System Redundancy and Fault Tolerance

Computer systems and networks eventually fail, but adding redundancy into systems and networks increases their reliability and, therefore, their availability. Although IT personnel recognize the risks with single points of failure, they often overlook them until a disaster occurs.
Redundancy: Duplicates critical system components to provide fault tolerance, allowing service to continue as if a fault never occurred, by eliminating single points of failure (a component that, if it fails, can cause an entire system to crash) at any number of levels:

  • Disk redundancies using RAID: If a server uses a single drive, the system will crash if the drive fails; a redundant array of inexpensive disks (RAID) is an inexpensive method of adding fault tolerance to a system.
  • Server redundancies: If a server provides a critical service and its failure halts the service, it is a single point of failure. Failover clusters provide fault tolerance for critical servers.
  • Power redundancies: If there is only one source of power for critical systems, the power is a single point of failure; uninterruptible power supplies (UPSs) and power generators provide fault tolerance for power outages. Power is a critical utility to consider when reviewing redundancies: for mission-critical systems, a UPS provides short-term power and can protect against power fluctuations, while generators provide long-term power in extended outages. Together they provide both fault tolerance and high availability.
  • Site redundancies: Adding hot, cold, or warm sites.

Disk Redundancies

Computers have four primary resources:

  • Hard disk
  • Memory
  • Network interface
  • Processor

As any administrator knows, the disk is the slowest and most susceptible to failure, so it often pays to upgrade disk subsystems to improve their performance and redundancy.

A redundant array of inexpensive disks (RAID) provides disk fault tolerance, improving system availability; most RAID subsystems tolerate disk failure, so the system continues to operate. RAID systems are becoming much more affordable as the price of drives steadily falls and disk capacity steadily increases. RAID systems come in a variety of flavors:

RAID-0 (striping): Includes two or more physical disks, but doesn’t provide any redundancy or fault tolerance as files stored on a RAID-0 array are spread across each of the disks.

  • Increases read and write performance, as different parts of the file can be read from/written to simultaneously on each of the disks because the file is spread across multiple physical disks. If a RAID-0 array has three 400 GB drives, you have 1,200 GB (1.2 TB) of storage space.

RAID-1 (mirroring): Data written to one disk is also written to the other disk so if one of the disks fails, the other disk still has all the data and the system can continue to operate without any data loss.  Mirroring all the drives in a system allows a system to keep running even if half of the drives die.

  • Disk duplexing, which gives each disk its own disk controller, removes the disk controller as a single point of failure.
  • If you have two 500 GB drives used in a RAID-1, you have 500 GB of storage space, as the second 500 GB of storage space is the fault-tolerant, mirrored volume.

RAID-5: A system with three or more disks striped together similar to RAID-0 with the equivalent of one drive including parity information striped across each of the drives for fault tolerance. If one of the drives fails, the system can read the information on the remaining drives and determine what the actual data should be. If two of the drives fail in a RAID-5, the data is lost.
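The parity idea behind RAID-5 can be illustrated with XOR: the parity block is the XOR of the data blocks, so any single missing block can be rebuilt by XOR-ing the survivors. This is a minimal sketch of the concept, not a real RAID implementation:

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together (how parity is computed)."""
    return bytes(reduce(lambda a, b: a ^ b, group) for group in zip(*blocks))

d1, d2 = b"\x0f\xf0", b"\x33\xcc"   # data blocks on two drives
parity = xor_blocks([d1, d2])       # parity stripe on the third drive
rebuilt = xor_blocks([parity, d2])  # drive 1 fails: rebuild from the rest
print(rebuilt == d1)  # True
```

With two failed drives there is only one equation and two unknowns, which is why RAID-5 loses data when a second drive dies.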

RAID-6: Requires four disks at a minimum and extends RAID-5 with an additional parity block; it continues operating even if two disk drives fail.

RAID-10/RAID 1+0: Combines mirroring (RAID-1) and striping (RAID-0) while requiring at least four drives to start, and you must add drives in multiples of two. If you have four 100 GB drives used in a RAID-10, you have 200 GB of usable storage.
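The capacity arithmetic for these levels can be summarized in a short helper, a sketch that assumes equal-size, healthy drives:

```python
def usable_gb(level: int, drives: int, drive_gb: int) -> int:
    """Usable capacity per RAID level, for equal-size drives."""
    if level == 0:
        return drives * drive_gb           # striping: all space is usable
    if level == 1:
        return drive_gb                    # mirroring: one copy of the data
    if level == 5:
        return (drives - 1) * drive_gb     # one drive's worth of parity
    if level == 6:
        return (drives - 2) * drive_gb     # two parity blocks
    if level == 10:
        return (drives // 2) * drive_gb    # mirrored pairs, then striped
    raise ValueError("unsupported RAID level")

print(usable_gb(0, 3, 400))   # 1200, matching the RAID-0 example above
print(usable_gb(1, 2, 500))   # 500, matching the RAID-1 example
print(usable_gb(10, 4, 100))  # 200, matching the RAID-10 example
```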

Servers: Availability and Redundancy

High-availability systems remain operational with almost zero downtime by utilizing redundancy and fault-tolerance methods, like high-capacity failover clusters, in order to maintain “five nines” (99.999%) uptime. Although achievable, this is expensive; but if the cost of an outage is high, redundant technology costs are justified.

Distributive allocation: Provides both high availability and scalability by configuring multiple computers (nodes) to work together within a local network. A central processor:

  • Divides a complex problem into smaller problems
  • Coordinates tasking the individual nodes and collecting the results.
  • High availability: If any single node fails, the central processor stops tasking it, but overall processing continues.
  • High scalability: Easy to add additional nodes and task them when they come online.
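A minimal sketch of the divide/task/collect pattern described above, using local threads to stand in for networked nodes:

```python
from concurrent.futures import ThreadPoolExecutor

def node_task(chunk):
    """Each node solves its smaller piece of the problem."""
    return sum(chunk)

data = list(range(1, 101))                            # the complex problem
chunks = [data[i:i + 25] for i in range(0, 100, 25)]  # divided into pieces
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(node_task, chunks))      # task nodes, collect results
print(sum(partials))  # 5050
```

If one “node” disappeared, the coordinator would simply stop handing it chunks; that is where the high availability comes from, and adding workers is where the scalability comes from.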
As you read on, keep in mind that failover clusters are commonly used for applications, such as database applications, while load balancers are often used for services, such as web servers in a web farm.

Failover Clusters: Provide high availability for a service offered by a server by using two or more servers (“nodes”) in a cluster configuration, with at least one server active and at least one inactive; if an active node fails, the inactive node can take over the load without interruption to clients.

A simple two-node failover cluster.

The diagram above shows a two-node active-passive failover cluster. Both nodes are individual application servers that, in addition to monitoring each other, have access to external data storage used by the active server.

Imagine that Node 1 is the active node. When any of the clients connect, the cluster software (installed on both nodes) ensures that the clients connect to the active node. If Node 1 fails, Node 2 senses the failure through their connection and takes over as the active node. Since both nodes access the shared storage, there is no data loss. Note that the shared storage could be a single point of failure if not set up as a RAID; a RAID ensures that even if a hard drive in the shared storage fails, the service will continue. If this were an active-active cluster, there would be a load balancer that shares the load between both servers.
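The failover behavior can be modeled as a toy class. This is purely illustrative; real cluster software uses heartbeats, quorum, and fencing, none of which is modeled here:

```python
class Node:
    def __init__(self, name: str):
        self.name = name
        self.healthy = True

class FailoverCluster:
    """Active-passive pair: clients are always routed to the active node."""
    def __init__(self, active: Node, passive: Node):
        self.active, self.passive = active, passive

    def route_client(self) -> str:
        if not self.active.healthy:                      # failure detected
            self.active, self.passive = self.passive, self.active
        return self.active.name                          # clients connect here

cluster = FailoverCluster(Node("node1"), Node("node2"))
print(cluster.route_client())   # node1 serves clients
cluster.active.healthy = False  # node1 fails
print(cluster.route_client())   # node2 takes over
```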

Load Balancers: Software or hardware, often sitting in front of servers located in the DMZ, that optimizes and distributes data loads across network and system resources. Many load balancers can also detect when a server fails: if a server stops responding, the load-balancing software no longer sends clients to it. Load balancing primarily provides scalability, the ability to serve more clients without decreasing performance, whereas availability ensures that systems are up and operational when needed; spreading the load among multiple systems ensures individual systems are not overloaded, which contributes to overall high availability. Load balancing can be done in two ways:

  • A hardware-based load balancer accepts traffic and directs it to servers based on factors such as processor utilization and the number of current connections to the server.
  • A software-based load balancer uses software running on each of the servers in the load-balanced cluster to balance the load using a virtual IP; clients send requests to this virtual IP address, and the load-balancing software redirects the request to one of the servers in the web farm using their private IP addresses. In this scenario, the shared IP address is referred to as a virtual IP.
Simple load balancing configuration.

The diagram above is an example of a simple load balancer with multiple web servers, each with the same web application. A load balancer uses a scheduling technique to determine where to send new requests. 

  • Some load balancers simply send new requests to the servers in a round-robin fashion.  
  • Other load balancers automatically detect the load on individual servers and send new clients to the least used server.
  • Some load balancers use source address affinity to direct the requests. Source affinity sends requests to the same server based on the requestor’s IP address.
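Two of these scheduling techniques, round-robin and source address affinity, can be sketched in a few lines (the server names and the hash-based affinity scheme are illustrative assumptions, not a specific product’s behavior):

```python
from itertools import cycle
from hashlib import sha256

SERVERS = ["web1", "web2", "web3"]
_rr = cycle(SERVERS)

def round_robin() -> str:
    """Hand each new request to the next server in turn."""
    return next(_rr)

def source_affinity(client_ip: str) -> str:
    """Same client IP always maps to the same server."""
    digest = int(sha256(client_ip.encode()).hexdigest(), 16)
    return SERVERS[digest % len(SERVERS)]

print([round_robin() for _ in range(4)])  # ['web1', 'web2', 'web3', 'web1']
print(source_affinity("10.0.0.5") == source_affinity("10.0.0.5"))  # True
```

Affinity matters for applications that keep per-client session state on one server; round-robin is simpler but assumes servers are interchangeable.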

Backing Up Data

If you work with computers for any amount of time, experiencing a data loss is inevitable, so you need to back up your data so it can be restored. Without a backup, all of the data, regardless of the redundancy of your infrastructure, is gone. Organizations create a written backup policy to identify issues such as what data to back up, how often to back it up, how to test the backups, and how long to retain them, as well as geographic considerations, such as:

  • Off-site backups: A copy of a backup should be stored in a separate geographic location, protecting critical data against a disaster such as a fire or flood at the main site.
  • Distance: Many organizations have specific requirements related to the distance between the main site and the off-site location. 
  • Location selection: The location is often dependent on environmental issues like flood plains and earthquake zones.
  • Legal implications: Depend on the data stored in the backups; if the backups include Personally Identifiable Information (PII) or Protected Health Information (PHI), the backups need to be protected according to governing laws.
  • Data sovereignty: Refers to the legal implications when data is stored off-site in a different country as they are subject to the laws of that country; an issue  if the backups are stored in a cloud location, and the cloud servers are in a different country.

Tapes are the most common backup media, as they store more data at a lower cost than other media, though some organizations use hard drives for backups. However, the type of media doesn’t affect the backup type. Most organizations balance time and money and use either a full/differential or a full/incremental backup strategy. The following backup types, listed in order of descending comprehensiveness, are commonly used:

  • Full backup: Backs up all data specified in the backup. While every backup strategy starts with a full backup, it’s rare to do a full backup on a daily basis in most production environments. This is because of:
    • Time Requirements: Can take several hours to complete and can interfere with operations.
    • Capital Requirements: Daily full backups require more media, and the cost can be prohibitive, so organizations often combine full backups with differential or incremental backups.

Restoring a full backup is the easiest and quickest of the restorations as you only need to restore the single full backup and you’re done.  

  • Differential backup: Backs up data that has changed since the last full backup. Starting with a full backup, differential backups back up data that has changed or is different since the last full backup. Full/differential strategies reduce the amount of time needed to restore backups and are used in organizations where being able to recover systems quickly is a priority.
  • Incremental backup: Backs up all the data that has changed since the last full or incremental backup. An incremental backup strategy also starts with a full backup. After the full backup, incremental backups then back up data that has changed since the last  full backup, or incremental backup.

    Because incremental backups often back up different data each day of the week, each of the incremental backups must be restored in chronological order. Full/incremental strategies reduce the amount of time needed to perform backups and are used in organizations that don’t have a lot of time to do maintenance.
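The difference between the two strategies shows up at restore time. A sketch, assuming a full backup on day 0 and a failure three days later:

```python
def restore_chain(strategy: str, days_since_full: int) -> list:
    """Which backup sets must be restored, in order."""
    if strategy == "full/differential":
        # The full backup plus only the most recent differential.
        return ["full", f"differential day {days_since_full}"]
    if strategy == "full/incremental":
        # The full backup plus every incremental, in chronological order.
        return ["full"] + [f"incremental day {d}"
                           for d in range(1, days_since_full + 1)]
    raise ValueError("unknown strategy")

print(restore_chain("full/differential", 3))
# ['full', 'differential day 3']
print(restore_chain("full/incremental", 3))
# ['full', 'incremental day 1', 'incremental day 2', 'incremental day 3']
```

The differential chain is always two sets, at the cost of each nightly differential growing through the week; the incremental chain keeps nightly backups small but lengthens the restore.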

  • Snapshot: Sometimes referred to as an image backup or a checkpoint; captures the data at a point in time and is commonly used with virtual machines.

Testing Backups: Next to not backing up data at all, discovering that backups have failed is an administrator's worst nightmare. The only way to validate a backup is to perform a test restore. In addition to confirming that the backups work, performing regular test restores allows administrators to become familiar with the process outside of a crisis.
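Part of a test restore can be automated by hashing the original data and the restored copy and comparing them. A minimal sketch: the file names and directory layout are hypothetical, and a plain directory copy stands in for a real restore job.

```python
import hashlib
import pathlib
import shutil
import tempfile

def sha256(path):
    """Hash one file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def verify_restore(source_dir, restore_dir):
    """A test restore passes only if every file in the original data set
    exists in the restored copy and hashes identically."""
    src = pathlib.Path(source_dir)
    dst = pathlib.Path(restore_dir)
    for f in src.rglob("*"):
        if f.is_file():
            restored = dst / f.relative_to(src)
            if not restored.is_file() or sha256(f) != sha256(restored):
                return False
    return True

# Demo: "back up" a directory with a plain copy, then verify the restore.
with tempfile.TemporaryDirectory() as tmp:
    data = pathlib.Path(tmp, "data")
    data.mkdir()
    (data / "payroll.csv").write_text("id,amount\n1,100\n")
    restore = pathlib.Path(tmp, "restore")
    shutil.copytree(data, restore)          # stands in for a real restore job
    print(verify_restore(data, restore))    # True
```

A real test restore should also exercise the actual backup software and media, not just the file comparison, but the hash check catches silent corruption that a "restore completed" message can hide.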

Backup Protection: Backup media should be protected at the same level as the data it holds. Protection covers:

  • Storage: Using labeling to identify the data and physical security protection to prevent others from easily accessing it while it’s stored.
  • Transfer: Data should be protected any time it is transferred from one location to another. This is especially true when transferring a copy of the backup to a separate geographical location.
  • Destruction: When backups are no longer needed, the media should be destroyed or sanitized by degaussing, shredding, burning, or scrubbing.

Business Continuity Planning

Organizations create a business continuity plan (BCP) to prepare for outages of critical services and functions, ensuring that critical business operations continue and the organization survives. The plan includes disaster recovery elements that provide the steps used to return critical functions to operation after an outage. Disasters come from many sources, and the goal is to predict the relevant disasters and their impact, then develop recovery strategies to mitigate them.

Trouble Never Sends A Warning

The time to make these decisions is not during a crisis. Instead, the organization completes a business impact analysis (BIA) in advance. A BIA does not recommend solutions, but it gives management the information needed to focus on critical business functions. It collects information from throughout the organization by identifying:

  • Mission-essential systems and components
  • Vulnerable business processes: the processes that support mission-essential functions. If critical systems and components fail, mission-essential functions cannot be completed and the business goes kaput.

Used appropriately, the BIA addresses the following questions:

  • What are the critical systems and functions?
  • Are there any dependencies related to these critical systems and functions?
  • What is the maximum downtime limit of these critical systems and functions?
    • Identifying the maximum downtime limit drives decisions related to recovery objectives, helping an organization identify contingency plans and policies.
  • What scenarios are most likely to impact these critical systems and functions?
  • What is the potential loss from these scenarios?

Disaster Impact

A BIA attempts to identify and evaluate the impact of a range of disaster scenarios by answering questions such as:

  • Loss of life? Is there a way to minimize the risk to personnel?
  • Loss of property?
  • Reduction of safety for personnel or property?
  • Financial losses to the organization?
  • Organizational reputation damage?

Privacy Threshold and Impact Assessments: Two tools that organizations can use when completing a BIA.

  • Privacy Threshold Assessment: Its primary purpose is to identify PII within a system; the system owner or data owner completes it by answering a simple questionnaire. If PII is present in the system, the next step is to conduct a privacy impact assessment.
  • Privacy Impact Assessment:  Attempts to proactively identify potential risks related to the PII by reviewing how the information is handled with the goal being to ensure that the system is complying with applicable laws, regulations, and guidelines.

Recovery Time Objective (RTO): The maximum acceptable amount of time to restore a system after an outage, derived from the maximum acceptable (or maximum tolerable) outage time for mission-essential functions and critical systems. If an outage lasts longer than the RTO, the impact is unacceptable to the organization.

Recovery Point Objective (RPO): Identifies the maximum amount of data, measured in time, that the organization can afford to lose. For example, management might decide that some data loss is acceptable but that they must always be able to recover data from at least the previous month; the RPO is then one month, and administrators ensure that monthly backups are available to meet it.
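Put another way, the backup interval must not exceed the RPO, because the worst-case data loss is the time since the last good backup. A rough sketch (the one-month and one-week figures below are illustrative, with a month approximated as 31 days):

```python
from datetime import timedelta

def meets_rpo(backup_interval, rpo):
    """Worst-case data loss equals the time since the last good backup,
    so the backup interval must not exceed the RPO."""
    return backup_interval <= rpo

# A one-month RPO is met by monthly backups...
print(meets_rpo(timedelta(days=31), timedelta(days=31)))   # True
# ...but a one-week RPO is not.
print(meets_rpo(timedelta(days=31), timedelta(days=7)))    # False
```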

Mean Time Between Failures/Mean Time To Recover
When working with a BIA, experts often attempt to predict the possibility of a failure.  The following two terms are often used to predict potential failures:

  • Mean time between failures (MTBF): A measure of a system's reliability, usually expressed in hours; a higher MTBF indicates a more reliable system. MTBF is used to predict potential outages.
  • Mean time to recover (MTTR): The average time it takes to restore a failed system. Maintenance contracts often specify, but do not guarantee, the MTTR; an individual repair might take a little longer or a little less, with the average defined by the MTTR.
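Taken together, MTBF and MTTR give a system's expected steady-state availability, the fraction of time the system is up. A quick sketch; the 10,000-hour MTBF and 4-hour MTTR figures are made up for illustration.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: the fraction of time a system is up,
    given its mean time between failures and mean time to recover."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# A system that fails on average every 10,000 hours and takes 4 hours
# to recover is up about 99.96% of the time.
print(f"{availability(10_000, 4):.4%}")
```

The same formula shows why both numbers matter: halving the MTTR improves availability just as surely as doubling the MTBF.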

Planning for Continuity of Operations

Planning for the continuity of operations is an important part of network and system asset protection controls; it focuses on restoring mission-essential functions at a recovery site after a critical outage. A disaster recovery process typically includes the following steps:

  • Activate the disaster recovery plan (DRP): Some disasters occur without much warning, so the plan is activated after the disaster; others provide warning, so the plan is activated when the disaster is imminent.
  • Implement contingencies: If the recovery plan requires an alternate site, move critical functions to that site. If the disaster destroyed on-site backups, retrieve the backups from the off-site location.
  • Recover critical systems: After the disaster, recover critical systems in the order defined by the system hierarchy in the DRP documents. This step also includes reviewing change management documentation to ensure that recovered systems include approved changes.
  • Test recovered systems: Before bringing systems online, test and verify them. This may include comparing the restored system with a performance baseline to verify functionality.
  • After-action report: The final phase of disaster recovery is a review of the disaster, including a lessons-learned session to identify what went right and what went wrong; the organization then updates the plan to incorporate those lessons.

The rest of this section discusses the various components of the disaster recovery process.

Disaster Recovery Plan: As part of an overall business continuity plan, organizations use a business impact analysis to identify critical systems and components, then develop disaster recovery strategies and disaster recovery plans (DRPs). A DRP includes a hierarchical list of critical systems identifying which systems to restore after a disaster and in what order; systems often have interdependencies that require them to be restored in a specific sequence. This hierarchical list is also valuable when using alternate hot/warm/cold sites: when the organization moves operations to an alternate site, it will want the most important systems and functions restored first.

As a best practice, return the least critical functions to the primary site first; the critical organizational functions are already operational at the alternate site and can stay there as long as necessary. If a site has just gone through a disaster, it's very likely that some problems remain undiscovered. By moving the least critical functions first, those problems surface and can be resolved without significantly affecting mission-essential functions.

If the DRP doesn't prioritize the systems, the individuals restoring them will use their own judgment, which might not meet the overall needs of the organization. Similarly, the DRP often prioritizes the services to restore after an outage. As a rule, critical business functions and security services are restored first; support services are restored last.
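The interdependencies behind a DRP's hierarchical list can be sketched as a topological sort: each system is restored only after everything it depends on. The systems and dependencies below are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each system lists the systems it depends on,
# which must be restored before it.
dependencies = {
    "directory services": [],
    "database": ["directory services"],
    "file server": ["directory services"],
    "web app": ["database"],
}

# static_order() yields systems with every dependency restored first.
restore_order = list(TopologicalSorter(dependencies).static_order())
print(restore_order)
```

With this map, directory services always comes first and the web app always comes after the database; writing the order down in the DRP removes that judgment call from whoever happens to be on shift during the disaster.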

Failover is the process of moving these mission-essential functions to a recovery/alternate site.

Recovery Sites: An alternate data processing site used after a disaster. Alternate locations could be office space within a building, an entire building, or even a group of buildings. Common types include:

  • Hot site: Often another active business location; operational 24 hours a day, seven days a week, and able to take over for the primary site quickly after a failure. It includes all the capabilities and up-to-date data of the primary site, and copies of backup tapes are often stored there. Hot sites are the most effective, and the most expensive, disaster recovery solution for high-availability requirements.
  • Cold site: Requires only power and connectivity; the organization brings all the equipment, software, and data to the site when it activates it. A cold site is the cheapest to maintain, but it is also the most difficult to test.
  • Warm Site: A compromise between hot and cold sites that an organization can tailor to meet its needs.
  • Mobile site: A self-contained, transportable unit with the required equipment.
  • Mirrored site: Identical to the primary location, providing 100 percent availability through real-time transfers that send modifications from the primary location to the mirrored site. While a hot site can be up and operational within an hour, a mirrored site is always up and operational.

Testing Plans with Exercises: Testing validates that the plan works as desired and often includes testing redundancies and backups. Several types of testing are used with BCPs and DRPs. NIST SP 800-84, "Guide to Test, Training, and Exercise Programs for IT Plans and Capabilities," provides detailed guidance on testing BCPs and DRPs and identifies two primary types of exercises:

  • Tabletop exercise: Discussion-based; a coordinator leads employees through one or more scenarios to generate discussion about team members' roles and responsibilities and the decision-making process during an incident. Ideally, this validates the plan; sometimes it reveals flaws that can then be worked into the disaster recovery plan.
  • Functional exercise: Provides an opportunity to test plans in a simulated environment; participants go through the steps in a controlled manner without affecting the actual system. A full-blown test goes through all the steps of the plan, verifying both that the plan works and how long it takes to execute.

Common elements of testing include:

  • Backups: Tested by restoring the data from the backup.
  • Server restoration:  Participants rebuild a server using a test system without touching the live system.
  • Server redundancy: If a server is in a failover cluster, test the cluster by taking the primary node offline; another node within the cluster should automatically assume the role of the offline node.
  • Alternate sites:  Test an alternate site by moving some of the functionality to the alternate site and ensuring the alternate site works as desired.

Summary: Network and System Asset Protection Controls

  • Layered security uses control diversity, implementing administrative, technical, and physical security controls.
  • Vendor diversity utilizes controls from different vendors.
  • User training informs users of threats, helping them avoid common attacks.
  • Proximity cards are credit card-sized access cards. Users pass the card near a proximity card reader and the card reader then reads data on the card. Some access control points use proximity cards with PINs for authentication.
  • Door access systems include cipher locks, proximity cards, and biometrics. Cipher locks do not identify users. Proximity cards can identify and authenticate users when combined with a PIN. Biometrics can also identify and authenticate users.
  • Tailgating is a social engineering tactic that occurs when one user follows closely behind another user without using credentials. Mantraps, which allow only a single person to pass at a time, are used to prevent tailgating. 
  • Video surveillance provides reliable proof of a person’s location and activity. It can identify who enters and exits secure areas and can record theft of assets. 
  • Higher-tonnage HVAC systems provide more cooling capacity. This keeps server rooms at lower operating temperatures and results in fewer failures.
  • Failover clusters are one method of server redundancy and they provide high availability for servers. They can remove a server as a single point of failure. Load balancing increases the overall processing power of a service by sharing the load among multiple servers. Configurations can be active-passive, or active-active. Scheduling methods include round-robin and source IP address affinity. Source IP address affinity scheduling ensures clients are redirected to the same server for an entire session.
  • The recovery time objective (RTO) identifies the maximum amount of time it should take to restore a system after an outage. It is derived from the maximum allowable outage time identified in the BIA. The recovery point objective (RPO) refers to the amount of data you can afford to lose.
  • A disaster recovery plan (DRP) includes a hierarchical list of critical systems and often prioritizes services to restore after an outage. Testing validates the plan. The final phase of disaster recovery includes a review to identify any lessons learned and may include an update of the plan.
  • Validate business continuity plans through testing. Tabletop exercises are discussion-based only and are typically performed in a classroom or conference setting. Functional exercises are hands-on exercises.
