Knowing how to manage storage on Linux is an essential skill for both hackers and administrators. Administrators need to optimize how files and data are stored and managed, and hackers need to know how to find this data quickly within complex filesystems. This article from Secur teaches you how to manage storage on Linux and covers:
- The basics of how Linux handles storage devices
- Managing data in the Linux environment.
Linux Storage Management Basics
Linux handles hard disk drives (HDDs) and solid-state drives (SSDs) the same way, because their management mostly depends on the method used to connect the drives to the Linux system. HDDs are the most commonly used form of persistent data storage on computer systems; they are physical devices that store data on a set of spinning magnetic disk platters, using a movable read/write head that writes and retrieves magnetic images on the platters. SSDs are another form of persistent storage; they use integrated circuits to store data electronically, and because they have no moving parts they are faster and more resilient than HDDs.
Linux Drive Connections
When you connect a drive to a Linux system, the kernel assigns the drive a file in the “/dev” folder, called a “raw device”; it provides a path directly to the drive from the Linux system. Data written to the file is written to the drive, and reading the file reads data directly from the drive. HDDs and SSDs both interface with the Linux system using one of three main types of drive connections:
- Parallel Advanced Technology Attachment (PATA): connects drives using a parallel interface and supports two devices per adapter. The raw device file is named /dev/hdx, where x is a letter representing the individual drive, starting with letter “a”.
- Serial Advanced Technology Attachment (SATA): connects drives with a serial interface; faster than PATA and supports up to four devices per adapter.
- Small Computer System Interface (SCSI): connects using a parallel interface; has the speed of SATA and supports up to eight devices per adapter.
For SATA and SCSI devices, the raw device file Linux uses is “/dev/sdx”, where x is a letter representing the individual drive, again starting with “a”. To reference the first SATA device on the system, use /dev/sda, then /dev/sdb for the second device, and so on.
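The naming rule above can be sketched in a few lines of Python. This is only an illustration of the convention described in this article; the function name `drive_device_name` and its `interface` parameter are hypothetical, and the sketch stops at the first 26 drives.

```python
import string


def drive_device_name(index: int, interface: str = "sata") -> str:
    """Return the raw device file for the Nth drive (0-based) on an interface.

    Follows the convention described above: /dev/hdX for PATA drives and
    /dev/sdX for SATA and SCSI drives. Only drives a through z are covered.
    """
    if not 0 <= index < 26:
        raise ValueError("this sketch only covers drives a through z")
    prefix = "hd" if interface.lower() == "pata" else "sd"
    return f"/dev/{prefix}{string.ascii_lowercase[index]}"


print(drive_device_name(0))          # first SATA/SCSI drive
print(drive_device_name(1))          # second SATA/SCSI drive
print(drive_device_name(0, "pata"))  # first PATA drive
```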
Partitioning Drives In Linux
A partition is a self-contained section within a drive that Linux treats as a separate storage space; Linux, like most operating systems, allows you to partition a drive into multiple sections. Partitioning helps organize data by separating operating system data from user data, so if a user fills up the disk space with data, the operating system still has room on its separate partition. When partitioning a disk to manage data on Linux, remember the following:
- Linux creates /dev files for each separate disk partition.
- Linux attaches a partition number to the end of the device name and numbers the primary partitions starting at 1.
- As an example, the first primary partition on the first SATA drive would be /dev/sda1.
- Partitions are tracked by a drive indexing system.
- Systems using BIOS boot loader manage disk partitions with the Master Boot Record (MBR) method.
- It supports four primary partitions on a drive; one of them can be defined as an extended partition that holds multiple logical partitions.
- Logical partitions inside the extended partition are numbered starting at 5, so the first one is assigned the file /dev/sda5.
- Systems with the UEFI boot loader manage partitions with the GUID Partition Table (GPT).
- Supports up to 128 partitions per drive.
- Linux assigns partition numbers in the order that the partition appears on the drive, starting with 1.
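As a rough illustration of the numbering rules above, the following Python sketch splits a partition device file name into its drive and partition number and classifies the number under MBR rules. The helper name `describe_partition` is hypothetical, and the sketch only handles /dev/hdX and /dev/sdX style names.

```python
import re


def describe_partition(device: str, scheme: str = "mbr") -> dict:
    """Split a name such as /dev/sda5 into drive and partition number.

    Under MBR, numbers 1 to 4 are primary slots (one of which may hold the
    extended partition) and numbers from 5 up are logical partitions. Under
    GPT, partitions are simply numbered in order.
    """
    match = re.fullmatch(r"(/dev/[a-z]+)(\d+)", device)
    if match is None or int(match.group(2)) < 1:
        raise ValueError(f"not a partition device name: {device!r}")
    drive, number = match.group(1), int(match.group(2))
    if scheme == "mbr":
        kind = "primary" if number <= 4 else "logical"
    else:
        kind = "gpt"
    return {"drive": drive, "number": number, "kind": kind}


print(describe_partition("/dev/sda1"))
print(describe_partition("/dev/sda5"))
```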
Automatic Storage Drive Detection on Linux
Linux detects drives and partitions at boot time and assigns each one a unique device file name. On a running system, it uses the udev application, which runs in the background, to automatically detect newly connected hardware and assign it a unique device file name in the /dev folder. Another function of udev is the creation of persistent storage device files:
- Normally, adding/removing removable storage devices changes the /dev name assigned to it, making it difficult for applications to find storage devices.
- udev solves this by using the /dev/disk folder to create links to the /dev storage device files based on unique attributes of the device; device links let you specifically reference a storage device by a permanent identifier rather than where or when it was plugged into the Linux system.
- There are four folders created by udev for storing links:
- /dev/disk/by-id: Links storage devices by their manufacturer make/model/serial number.
- /dev/disk/by-label: Links storage devices by the label assigned to them.
- /dev/disk/by-path: Links storage devices by the physical hardware port they are connected to.
- /dev/disk/by-uuid: Links storage devices by the devices’ 128-bit universally unique identifiers (UUID).
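The links that udev creates are ordinary symbolic links, so they can be resolved back to the raw device files. The sketch below, assuming a system where /dev/disk/by-uuid exists, lists each link and the device it points to; the function name `resolve_disk_links` is hypothetical, and the function returns an empty result when the folder is missing (for example inside a container).

```python
import os


def resolve_disk_links(link_dir: str = "/dev/disk/by-uuid") -> dict:
    """Map each persistent link name in link_dir to the device it points at."""
    if not os.path.isdir(link_dir):
        return {}  # folder may be absent, for example in a container
    links = {}
    for name in sorted(os.listdir(link_dir)):
        path = os.path.join(link_dir, name)
        if os.path.islink(path):
            # realpath follows the symbolic link to the raw device file
            links[name] = os.path.realpath(path)
    return links


if __name__ == "__main__":
    for identifier, device in resolve_disk_links().items():
        print(f"{identifier} -> {device}")
```

Passing "/dev/disk/by-id", "/dev/disk/by-label", or "/dev/disk/by-path" as `link_dir` lists the other three folders in the same way.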
Linux Disk Partitioning Tools
After connecting a drive to a Linux system, you need to partition it, even if there’s only a single partition, and there are several tools for creating partitions on raw storage devices. The following sections cover the most popular partitioning tools you’ll run across in Linux. If you are working on a virtual machine and need to add another disk to help with this exercise, follow these instructions.
Partitioning with fdisk
The “fdisk” utility is the most common command-line partitioning tool; it allows creating, viewing, deleting, and modifying partitions on drives that use MBR partition indexing. To use the “fdisk” application, you must specify the drive device name (not the partition name) of the device you want to work with. The fdisk command doesn’t allow you to alter the size of an existing partition; you need to delete the existing partition and rebuild it from scratch. We created a new disk, sdb, for this exercise. You can see it in the screenshot below. The table below the screenshot shows the common “fdisk” commands.
Command | Description |
---|---|
a | Toggles a bootable flag. To be able to boot the system from a partition, you must set the boot flag for the partition. The bootable partitions are indicated in the output listing with an asterisk in the screen. |
b | Edit bsd disk label |
c | Toggle the DOS compatibility flag. |
d | Delete a partition. |
g | Create a new empty GPT partition table. |
G | Create an IRIX (SGI) partition table. |
l (Lower case “L”) | List known partition types. |
m | Print this menu. |
n | Add a new partition. |
o | Create a new empty DOS partition table. |
p | Print the partition table. |
q | Quit without saving changes. |
s | Create a new empty Sun disklabel. |
t | Change a partition’s system ID. |
u | Change display/entry units. |
v | Verify the partition table. |
w | Write table to disk and exit. If you make any changes to the drive partitions, you must exit using the w command to write the changes to the drive. |
x | Extra functionality |
The p command displays the current partition scheme on the drive; below you can find two examples of its usage.
Example 1: In this example, the new /dev/sdb drive is not partitioned.
Example 2:
- The /dev/sda drive is partitioned into three partitions: sda1, sda2, and sda5.
- Partitions from 1 to 4 are primary partitions.
- Partitions numbered 5 and above are logical partitions.
- The Id and Type columns refer to the type of filesystem the partition is formatted to handle.
- The first partition is formatted for Windows and Linux and is 512M.
- The second is formatted as extended and is 19.6G.
- In the DOS partitioning scheme, if you use logical partitions you define a pointer within one of the primary partitions for these. At this pointer the BIOS will find further information.
- The pointer here is “sda2”; fdisk shows its Id as 5, “Extended”, which extends the partitioning scheme beyond the default four partitions normally possible. The system consists of two partitions:
- The primary, bootable partition: sda1
- The logical partition: sda5
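To make the Id column and the extended-partition pointer more concrete, here is a minimal Python sketch that decodes the four primary entries of an MBR sector. It relies on the standard MBR layout (a 512-byte sector, four 16-byte entries starting at offset 446, and the 0x55 0xAA signature at the end). The sample sector is built in memory with hypothetical values, so nothing is read from a real drive, and the names `parse_mbr` and `build_sample_mbr` are illustrative.

```python
import struct

# A few well-known MBR type Ids; fdisk's "l" command lists many more.
PARTITION_TYPES = {0x05: "Extended", 0x82: "Linux swap", 0x83: "Linux", 0x8E: "Linux LVM"}

# status, 3 CHS bytes (skipped), type Id, 3 CHS bytes (skipped), start LBA, sector count
ENTRY_FORMAT = "<B3xB3xII"


def parse_mbr(sector: bytes) -> list:
    """Decode the four primary partition entries from a 512-byte MBR sector."""
    if len(sector) != 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    entries = []
    for slot in range(4):
        raw = sector[446 + slot * 16: 446 + (slot + 1) * 16]
        status, type_id, lba_start, sectors = struct.unpack(ENTRY_FORMAT, raw)
        if type_id == 0:
            continue  # unused slot
        entries.append({
            "number": slot + 1,
            "bootable": status == 0x80,
            "id": type_id,
            "type": PARTITION_TYPES.get(type_id, "Unknown"),
            "start": lba_start,
            "sectors": sectors,
        })
    return entries


def build_sample_mbr() -> bytes:
    """Build a hypothetical MBR: one bootable Linux partition, one extended."""
    sector = bytearray(512)
    sector[446:462] = struct.pack(ENTRY_FORMAT, 0x80, 0x83, 2048, 1048576)
    sector[462:478] = struct.pack(ENTRY_FORMAT, 0x00, 0x05, 1050624, 41943040)
    sector[510:512] = b"\x55\xaa"
    return bytes(sector)


for entry in parse_mbr(build_sample_mbr()):
    print(entry)
```

The logical partitions themselves are described by further records inside the extended partition, which this sketch does not follow.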
Partitioning with gdisk
If you’re working with drives that use the GPT indexing method, use the “gdisk” program; it identifies the type of formatting used on the drive, and if the drive doesn’t use the GPT method, “gdisk” offers the option to convert it to a GPT drive. As seen in the table below the screenshots, “gdisk” commands are similar to those of “fdisk”. The second screenshot shows the use of:
- The “n” option creates a partition.
- The “i” option displays partition information.
Command | Description |
---|---|
b | Back up GPT data to a file. |
c | Change a partition’s name. |
d | Delete a partition. |
i | Show detailed information on a partition. |
l | List known partition types. |
n | Add a new partition. |
o | Create a new empty GUID partition table (GPT). |
p | Print the partition table |
q | Quit without saving changes |
r | Recovery and transformation options. |
s | Sort partitions. |
t | Change a partition’s type code. |
v | Verify disk. |
w | Write table to disk and exit. |
x | Extra functionality. |
? | Print this menu. |
The GNU parted Command
GParted: Graphical GNOME Partition Editor
GParted, as seen in the screenshot above, uses a graphical layout to display all the drives (and their respective partitions) on a system one at a time; right-click a partition to select options for mounting or unmounting, formatting, deleting, or resizing the partition.
Understanding Linux Filesystems
Similar to a filing cabinet, Linux requires an organizational method to be a useful data management tool; this is accomplished using filesystems, a method of maintaining a map to locate each file placed on the storage device. Unlike a Windows path, which tells you exactly which physical device a file is stored on, Linux uses a virtual directory structure that consolidates the file paths from all the storage devices installed on the system into a single directory structure.
The Linux Virtual Directory
While the main admin user account in Linux is named “root”, it is not related to the root virtual directory folder; similar to Windows, the Linux virtual directory structure contains a single base directory, the “root” directory, which lists all the files and folders beneath it based on the folder path used to get to them.
A typical Linux file path looks like this:
/home/ComputerGeek/Documents/PersonalNetWorthStatement.doc
The path only indicates that the file PersonalNetWorthStatement.doc is stored in the Documents folder for the user ComputerGeek; unlike Windows, Linux:
- Uses forward slashes in paths instead of backslashes.
- Does not use drive letters.
- Doesn’t give you any clues as to which physical device contains the file.
- Places physical devices in the virtual directory using mount points, which are folder placeholders within the virtual directory pointing to a specific physical device.
- As demonstrated in the image below, two drives are used on the Linux system. The one on the left is associated with the root of the virtual directory. The second drive is mounted in the virtual directory at the location /home, so the files and folders stored on that drive are available under the /home folder.
While the Linux Filesystem Hierarchy Standard (FHS) attempts to standardize the Linux virtual filesystem, not all Linux distributions follow it completely, so do some research before tinkering with things. Typically, the FHS defines:
- Core folder names and their locations (seen in the table below the diagram).
- What type of data should be in each folder on a Linux system.
Folder | Description |
---|---|
/boot | Holds bootloader files used to boot the system |
/home | Contains user data files. |
/media | Used as a mount point for removable devices. |
/mnt | Used as a mount point for removable devices. |
/opt | Holds data for optional third-party programs. |
/tmp | Contains temporary files created by system users. |
/usr | Contains data for standard Linux programs. |
/usr/bin | Contains local user programs and data. |
/usr/local | Holds the data related to programs unique to the local installation. |
/usr/sbin | Contains system programs and data. |
Navigating the Linux Filesystem
There are two ways to navigate to a file:
- The Absolute Path: The absolute path to a file always starts at the root folder (/) and includes every folder along the virtual directory tree to the file.
<command> /home/ComputerGeek/Documents/PersonalNetWorthStatement.doc
- The Relative Path: The relative path to a file denotes the location of a file relative to your current location within the virtual directory tree structure. If you were already in the Documents folder, you’d just need to type:
<command> PersonalNetWorthStatement.doc
When Linux sees that the path doesn’t start with a forward slash, it assumes the path is relative to the current directory.
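The rule Linux applies here can be reproduced with Python's `posixpath` module, which implements Linux-style path handling on any platform. This is a small sketch of the idea; the `resolve` helper and its `current_dir` parameter are hypothetical names standing in for the shell's notion of the current directory.

```python
import posixpath


def resolve(path: str, current_dir: str) -> str:
    """Return the absolute form of path.

    A path starting with "/" is already absolute; anything else is treated
    as relative to current_dir.
    """
    if posixpath.isabs(path):
        return posixpath.normpath(path)
    return posixpath.normpath(posixpath.join(current_dir, path))


documents = "/home/ComputerGeek/Documents"
print(resolve("PersonalNetWorthStatement.doc", documents))
print(resolve("/home/ComputerGeek/Documents/PersonalNetWorthStatement.doc", "/tmp"))
```

Both calls resolve to the same file, which is the point of the two styles: the absolute path works from anywhere, and the relative one only works from the right starting folder.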
Formatting Filesystems
Before assigning a drive partition to a mount point, you must format it using a filesystem; Linux supports different filesystems with different features and capabilities. In the next part of this article, Secur walks you through a range of popular filesystems.
Formatting Linux Filesystems
There are several main filesystems that you can choose from if you are looking for a Linux-specific filesystem:
- btrfs: Supports files/total filesystem size up to 16 exbibytes (EiB) in size and can perform:
- A form of Redundant Array of Inexpensive Disks (RAID)
- Logical volume management (LVM).
- Includes additional advanced features:
- built-in snapshots for backup
- improved fault tolerance, and
- data compression on the fly.
- eCryptfs/Enterprise Cryptographic File System: applies POSIX-compliant encryption to data prior to storing it.
- Only the operating system that created the filesystem can read data from it.
- eCryptfs filesystem partitions appear in the /etc/crypttab file and will be mounted automatically at boot time.
- ext3/ext3fs: Based on the original Linux ext filesystem.
- Supports:
- Files up to 2 tebibytes (TiB) and a total filesystem size of 16TiB.
- Journaling
- Faster startup/recovery
- ext4/ext4fs: Current version of the original Linux filesystem; it is used on most Linux distributions these days and supports:
- Files up to 16TiB
- Total filesystem size of 1EiB
- Journaling: a method of tracking data not yet written to the drive in a log file, called the journal. If the system fails before the data can be written to the drive, the journal data can be recovered and stored upon the next system boot.
- Improved performance features.
- reiserFS:
- Created before the Linux ext3fs filesystem
- Used on older Linux systems,
- most features now found in ext3fs/ext4fs.
- Linux dropped support for reiser4fs.
- swap: The swap filesystem allows you to create virtual memory for your system using space on a physical drive. The system can then swap data out of normal memory into the swap space, providing a method of adding additional memory to your system. This is not intended for storing persistent data.
Formatting Non-Linux Filesystems
Inherent in Linux is its ability to read data stored on devices formatted for other operating systems, facilitating the sharing of data between systems built on different operating systems. As a rule of practice, don’t format a partition with a non-Linux filesystem if you are using the drive only with Linux systems. Linux supports these filesystems mainly as a method for sharing data with other operating systems. The more common non-Linux filesystems include:
- CIFS/Common Internet File System: Protocol created by Microsoft that uses a network storage device for reading and writing data across a network.
- Released to the public for use on all operating systems.
- HFS/Hierarchical File System: Developed by Apple.
- Linux also works with the HFS+ filesystem.
- ISO-9660: Used for creating filesystems on CD-ROM devices.
- NFS/Network File System: An open-source standard for reading/writing data across a network using a network storage device.
- NTFS/The New Technology File System: Used by the Microsoft NT operating system/versions of Windows.
- Linux can read and write data on an NTFS partition as of kernel 2.6.x.
- SMB/The Server Message Block: Proprietary Microsoft filesystem for network storage and interacting with other network devices.
- Support for SMB allows Linux clients/servers to interact with network based Microsoft clients/servers.
- UDF/Universal Disc Format: Used on DVD-ROM devices for storing data.
- Linux can read/write data using this filesystem.
- VFAT/Virtual File Allocation Table: Extension of Microsoft File Allocation Table (FAT) filesystem.
- Commonly used for removable storage devices
- XFS/X File System: Created by Silicon Graphics for its advanced graphical workstations.
- ZFS/The Zettabyte File System: Sun Microsystems filesystem meant for Unix workstations/servers.
- Has features similar to the btrfs Linux filesystem.
How to Create A Linux Filesystem
Before formatting a disk, make sure you really want to format it, or be prepared for a resume-creating event ;). When you format a partition, any existing data on the partition is lost. If you specify the wrong partition name, you could lose important data (as well as your job) or make your Linux system unable to boot. As an administrator, you are going to use the “mkfs” application a lot, as it serves as a front end to the individual tools for filesystem creation. The screenshot below shows the use of “mkfs” to make an “ext4” filesystem: specify the “-t” option followed by the filesystem type, and then the device file name of the partition to be formatted. The mkfs program creates all of the index files and tables necessary for the specific filesystem, as each filesystem type has its own method for indexing files/folders and tracking file access.
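Because a typo in the device name is so costly, one cautious habit is to assemble the command and look at it before running anything. The sketch below only builds and prints the command line described above; it never executes it. The function name `build_mkfs_command` and its rule of rejecting whole-drive names such as /dev/sdb are assumptions made for this illustration, not behavior of mkfs itself.

```python
import re
import shlex


def build_mkfs_command(fs_type: str, partition: str) -> list:
    """Build (but do not run) an mkfs command line for one partition."""
    # Require a trailing partition number so a whole drive is never targeted
    # by accident; this check is a choice of this sketch, not an mkfs rule.
    if not re.fullmatch(r"/dev/[a-z]+\d+", partition):
        raise ValueError(f"{partition!r} does not look like a partition device file")
    return ["mkfs", "-t", fs_type, partition]


command = build_mkfs_command("ext4", "/dev/sdb1")
print(shlex.join(command))  # review this line before running it as root
```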
Mounting Linux Filesystems
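After formatting a partition with a filesystem, there are two ways to mount it in the virtual directory:
- Use the command line to manually mount the partition within the virtual directory structure.
- Let Linux do the heavy lifting, allowing it to automatically mount the partition at boot time.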
Manually Mounting A Linux Filesystem
Use the “mount” command to temporarily mount a filesystem in the Linux virtual directory with the following syntax (the “-t” command-line option specifies the filesystem type of the device):
mount -t <file_system_type> <device> <mount_point>
The command only temporarily mounts the device in the virtual directory. When you reboot the system, you have to manually mount the devices again. The “mount” command uses the “-o” option to specify additional features of the filesystem, such as:
- Mounting it in read-only mode.
- User permissions assigned to the mount point.
- How data is stored on the device.
These options are shown in the output of the mount command. Usually you can omit the -o option to use the system defaults for the new mount point. To remove a mounted drive from the virtual directory, use the “umount” command, specifying either the device file name or the mount point directory. The screenshot below shows the use of the mount command to mount the filesystem on sdb1.
Also seen in the screenshot above is that using the “mount” command alone (with no parameters) displays all of the devices mounted on the system; as most Linux distributions mount lots of virtual devices in the virtual directory to provide information about system resources, there will be a lot of output.
As you can see, the main hard drive device (“/dev/sda”) contains a partition, sda1, mounted at “/boot/efi”, and the second hard drive has one partition, sdb1, mounted at “/mnt”.
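Since most of that output concerns virtual devices, it can help to filter it down to real drives. The following Python sketch parses text in the /proc/mounts layout (device, mount point, filesystem type, options, then two numeric fields) and keeps only entries backed by a /dev device file. The sample text is hypothetical and merely mirrors the mounts described above; the function names are illustrative.

```python
SAMPLE_MOUNTS = """\
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
/dev/sda5 / ext4 rw,relatime 0 0
/dev/sda1 /boot/efi vfat rw,relatime 0 0
/dev/sdb1 /mnt ext4 rw,relatime 0 0
"""


def parse_mounts(text: str) -> list:
    """Parse /proc/mounts style text into a list of dictionaries."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type, options = fields[:4]
        mounts.append({
            "device": device,
            "mount_point": mount_point,
            "fs_type": fs_type,
            "options": options.split(","),
        })
    return mounts


def device_backed(mounts: list) -> list:
    """Keep only the mounts whose source is a device file under /dev."""
    return [m for m in mounts if m["device"].startswith("/dev/")]


for mount in device_backed(parse_mounts(SAMPLE_MOUNTS)):
    print(f"{mount['device']} on {mount['mount_point']} type {mount['fs_type']}")
```

On a live system, the same functions can be fed the contents of /proc/mounts instead of the sample text.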
Automatically Mounting Devices in Linux
To have Linux mount a filesystem automatically at boot time, add an entry for it to the /etc/fstab file. Each entry specifies:
- The drive device file (either the raw file or one of the permanent udev file names).
- The mount point location.
- The filesystem type.
- Any additional required drive mounting options.
In the /etc/fstab file on the example system:
- Devices are referenced by their udev UUID value. This ensures that the correct drive partition is accessed no matter the order in which it appears in the raw device table.
- The first partition is mounted at the /boot/efi mount point in the virtual directory.
- The fifth partition is mounted at the root (/) of the virtual directory.
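An /etc/fstab entry is a single line of whitespace-separated fields, so it is easy to build and check programmatically. The sketch below formats and parses such a line. Besides the four items listed above, real entries usually carry two more numeric fields (a dump flag and an fsck pass order), which this article does not cover and which default to 0 here. The UUID shown is a made-up placeholder and the function names are hypothetical.

```python
FIELDS = ("device", "mount_point", "fs_type", "options", "dump", "fsck_pass")


def format_fstab_entry(device, mount_point, fs_type,
                       options="defaults", dump=0, fsck_pass=0) -> str:
    """Return one /etc/fstab line built from the given fields."""
    values = (device, mount_point, fs_type, options, dump, fsck_pass)
    return " ".join(str(value) for value in values)


def parse_fstab_line(line: str):
    """Return a dict for one fstab entry, or None for blanks and comments."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    parts = stripped.split()
    if len(parts) < 4:
        raise ValueError(f"too few fields: {line!r}")
    # The two trailing numeric fields are optional and default to 0.
    parts += ["0"] * (len(FIELDS) - len(parts))
    return dict(zip(FIELDS, parts))


# Placeholder UUID for illustration only; use the value from /dev/disk/by-uuid.
entry = format_fstab_entry("UUID=0f3c1d2e-1111-2222-3333-444455556666", "/mnt", "ext4")
print(entry)
print(parse_fstab_line(entry))
```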
Managing Linux Filesystems
If you have been following along, you’ll notice that we:
- Created a filesystem
- Mounted the filesystem to the virtual directory
So now you need to know how to maintain a filesystem and your friends at Secur are here to walk you through the tools you need to keep yourself employed as a Linux administrator by showing you some of the Linux utilities for managing the filesystems.
Retrieving Linux Filesystem Stats
As an administrator, your job is to keep the network humming at top speed in order to get a job promotion, so you’ll need to produce some network performance stats to back up your job demands. Here are a few different tools available to help you do that:
- “df” displays disk usage by partition. Using the “-i” option with the df command shows you the percentage of inodes used on a filesystem, which can be a lifesaver.
- Some filesystems, such as ext3 and ext4, allocate a specific number of inodes when created. If the filesystem runs out of inode entries in the table, you can’t create any more files, even if there’s available space on the drive.
- “du” displays disk usage by directory, good for finding users or applications that are taking up the most disk space.
- “iostat” displays a real-time chart of disk statistics by partition.
- “lsblk” displays current partition sizes and mount points.
- /proc and /sys folders: The kernel records system statistics here.
- /proc/partitions and /proc/mounts files: Provide information on system partitions and mount points.
- /sys/block folder: Contains separate folders for each mounted drive with partitions and kernel-level stats.
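The inode figure that “df -i” reports can be approximated from Python with `os.statvfs`, which returns the total and free inode counts for the filesystem containing a path. This is a rough sketch rather than a reimplementation of df; the function names are hypothetical, and filesystems that report zero total inodes are shown as 0% here.

```python
import os


def inode_usage_percent(total: int, free: int) -> float:
    """Return the percentage of inodes in use, rounded to one decimal place."""
    if total == 0:
        return 0.0  # some filesystems do not report a fixed inode count
    return round(100.0 * (total - free) / total, 1)


def inode_usage_for(path: str = "/") -> float:
    """Look up inode usage for the filesystem that contains path."""
    stats = os.statvfs(path)
    # f_files is the total number of inodes, f_ffree the number still free
    return inode_usage_percent(stats.f_files, stats.f_ffree)


if __name__ == "__main__" and hasattr(os, "statvfs"):
    print(f"inode usage on /: {inode_usage_for('/')}%")
```

A value creeping toward 100% on a filesystem with plenty of free space is the situation described above: lots of small files exhausting the inode table.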
Linux Filesystem Tools
The “e2fsprogs” package of tools provides a number of utilities for fine-tuning parameters on ext filesystems. These include:
- “blkid” displays information about block devices, such as storage drives.
- “chattr” changes file attributes on the filesystem.
- “debugfs” manually views and modifies the filesystem structure, such as undeleting a file or extracting a corrupt file.
- “dumpe2fs” displays block and superblock group information.
- “e2label” changes the label on the filesystem.
- “resize2fs” expands or shrinks a filesystem.
- “tune2fs” modifies filesystem parameters.
The XFS filesystem also has management utilities available for tuning the filesystem, although they can’t fix filesystem errors (the XFS module for fsck does not repair XFS filesystems, so use the xfs_repair tool for that):
- xfs_admin: displays or changes filesystem parameters such as the label or UUID assigned.
- xfs_info: displays information about a mounted filesystem, including the block sizes and sector sizes as well as label and UUID information.
fsck: Fixing Linux Filesystem Errors
Like the “mkfs” command, the “fsck” application is a front end to a group of programs that check filesystems, matching the index against the actual files stored in the filesystem. If any issues are found, run “fsck” in repair mode to attempt to reconcile the discrepancies and fix the filesystem. The screenshot below shows the execution of the fsck command after unmounting the drive.
Linux Storage Alternatives
Storage device partitions have a number of limitations:
- Once you create and format a partition, it’s not easy to modify its size.
- Partitions live on physical disks, so if a disk fails, all the data stored in its partitions is lost.
To overcome these limitations, Linux has a number of storage options, as well as fault-tolerance features.
Linux Multipathing
Linux supports Device Mapper Multipathing (DM-multipathing), which aggregates multiple paths between a Linux system and network storage devices. This provides increased throughput while all of the paths are active, as well as fault tolerance if one of the paths goes down.
Linux DM-multipathing tools include:
- dm-multipath: A kernel module providing multipath support by using the dynamic /dev/mapper device file folder in Linux.
- Linux creates a /dev/mapper device file named “mpathN” for each new multipath storage device you add to the system.
- N is the number of the multipath drive.
- This file acts as a device file, allowing the creation of partitions/filesystems on the multipath device like a normal drive partition.
- multipath: A command-line tool for viewing multipath devices.
- multipathd: A background process for monitoring paths and activating/deactivating paths
- kpartx: A command-line tool for creating device entries for multipath storage devices.
Linux Logical Volume Manager
The benefit of the Linux Logical Volume Manager (LVM) is that you can add and remove physical partitions as needed to a logical volume, expanding and shrinking the logical volume as required. Similar to Linux DM-multipathing, LVM relies on the “/dev/mapper” dynamic device folder to create virtual drive devices. LVM aggregates multiple physical drive partitions into virtual volumes, which you then treat as a single partition on your system.
Each physical partition must be marked as a Linux LVM filesystem type in “fdisk” or “gdisk“. You then make use of one of several LVM tools to create/manage the logical volumes:
- pvcreate: Creates a physical volume.
- vgcreate: Groups physical volumes into a volume group.
- lvcreate: Creates a logical volume from partitions in each physical volume.
These tools create entries in the “/dev/mapper” folder representing an LVM device you can format with a filesystem and use like a normal partition. The three screenshots below show the logical volume creation process.
As seen above, there are 3 physical drives each with 3 partitions.
- Logical volume 1: Consists of the first two partitions of the first drive.
- Logical volume 2: Spans 2 drives; combines the third partition of the first drive with the first/second partitions of the second drive to create one volume.
- Logical volume 3: Consists of the third partition of the second drive and the first two partitions of the third drive.
The third partition of the third drive is left unassigned and can be added later to any of the logical volumes when needed.
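The layout just described can be modeled as plain data to see how logical volume sizes add up. In the sketch below, the drive names (sdb, sdc, sdd) and all partition sizes are hypothetical, since the article gives none, and the model ignores volume groups and LVM metadata overhead. It only illustrates that a logical volume's capacity is the sum of the partitions assigned to it.

```python
# Hypothetical partition sizes in GiB for three drives with three partitions each.
PARTITION_SIZES = {
    "sdb1": 10, "sdb2": 10, "sdb3": 10,
    "sdc1": 20, "sdc2": 20, "sdc3": 20,
    "sdd1": 30, "sdd2": 30, "sdd3": 30,
}

# Assignment of partitions to logical volumes, following the description above.
LOGICAL_VOLUMES = {
    "lv1": ["sdb1", "sdb2"],
    "lv2": ["sdb3", "sdc1", "sdc2"],
    "lv3": ["sdc3", "sdd1", "sdd2"],
}


def volume_sizes(volumes: dict, sizes: dict) -> dict:
    """Return the combined size of the partitions behind each logical volume."""
    return {name: sum(sizes[part] for part in parts) for name, parts in volumes.items()}


def unassigned(volumes: dict, sizes: dict) -> list:
    """Return partitions that are not part of any logical volume yet."""
    used = {part for parts in volumes.values() for part in parts}
    return sorted(part for part in sizes if part not in used)


print(volume_sizes(LOGICAL_VOLUMES, PARTITION_SIZES))
print(unassigned(LOGICAL_VOLUMES, PARTITION_SIZES))
```

Adding the leftover partition to a volume later corresponds to appending it to one of the lists, which grows that volume's total without touching the others.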
Using RAID Technology
RAID technology provides:
- Improved data access performance and reliability
- Data redundancy for fault tolerance by combining multiple drives into one virtual drive.
Ironically, the downside of RAID is that dedicated RAID storage hardware can be somewhat expensive and impractical for most home uses. To address this, Linux implements a software RAID system that can deploy RAID features on any disk system. The “mdadm” utility lets you specify multiple partitions to be used in any type of RAID environment. The RAID device appears as a single device (mdadm typically creates it under /dev, for example /dev/md0), which you can then partition and format with a specific filesystem.
The common RAID versions are:
- RAID-0: Disk striping, spreads data across multiple disks for faster access.
- RAID-1: Disk mirroring duplicates data across two drives.
- RAID-10: Disk mirroring and striping provides striping for performance and mirroring for fault tolerance.
- RAID-4: Disk striping with parity adds a parity bit stored on a separate disk so that data on a failed data disk can be recovered.
- RAID-5: Disk striping with distributed parity adds a parity bit to the data stripe so that it appears on all of the disks so that any failed disk can be recovered.
- RAID-6: Disk striping with double parity stripes both the data and the parity bit so two failed drives can be recovered.
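The parity idea behind RAID-4 and RAID-5 can be shown with a few bytes and the XOR operation: the parity block is the XOR of the data blocks, and XOR-ing the surviving blocks with the parity rebuilds a lost one. The sketch below is a toy model with three in-memory "disks" holding made-up data; it says nothing about how mdadm lays out real stripes.

```python
from functools import reduce


def xor_blocks(blocks: list) -> bytes:
    """XOR equal-length byte blocks together, position by position."""
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("need at least one block, all of equal length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data "disks" holding one block each (illustrative contents).
data = [b"LINU", b"X RA", b"ID!!"]
parity = xor_blocks(data)

# Simulate losing the second disk and rebuilding it from the rest plus parity.
recovered = xor_blocks([data[0], data[2], parity])
print(recovered == data[1])
```

RAID-6 keeps a second, differently computed parity so that two simultaneous failures can be survived; that calculation is beyond this sketch.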
Summary: How To Manage Storage on Linux
In most people’s minds, the ability of a computer to permanently store data is its defining feature; the design of the Linux kernel facilitates this in a number of ways, as it supports:
- Hard drive disk and solid-state drive technologies for persistently storing data.
- Alternative storage solutions:
- Fault tolerant multi-path IO
- Logical volumes
- Software based RAID technology.
- Three main types of drive connections.
- PATA, SATA, and SCSI.
- Drive partitioning with a range of tools including
- fdisk
- parted
- gparted
- gdisk
- Filesystems, created with mkfs, including:
- ext4
- btrfs
- xfs
- zfs
- Windows based filesystems, vfat and ntfs.
If you want to read a summary of the topic, be sure to read Secur’s article Filesystem and Storage Device Management in Linux