This book is dedicated to Michael Kerrisk and the team at the Linux Documentation
Project.
Copyright Info:
Published by LinuxManFiles.com
Unit 12 / 30 Upper Queen Street
Auckland, New Zealand 1010
All rights reserved. No part of this book shall be reproduced, stored in a retrieval system,
or transmitted by any means, electronic, mechanical, photocopying, recording, or
otherwise, without written permission from the publisher. No patent liability is assumed
with respect to the use of the information contained herein. Although every precaution has
been taken in the preparation of this book, the publisher and author assume no
responsibility for errors or omissions. Neither is any liability assumed for damages
resulting from the use of the information contained herein.
Copyright 2016 by LinuxManFiles.com
First Edition 2016
ISBN 978-0-9941357-1-1
Published in New Zealand
Compilation and Editing: Gareth Morgan Thomas
Disclaimer:
To the best of our knowledge, all text published in this manual, unless otherwise stated, is
in the public domain. We take seriously our commitment to the public domain. If you have
reason to believe that any text published by LinuxManFiles.com is not yet in the public
domain, please send an email message to: editor at linuxmanfiles.com.
CentOS Administrator Man Pages
Volume Two
www.LinuxManFiles.com
Table of contents
IPMADDR
› NAME
ipmaddr - adds, deletes, and displays multicast addresses
› SYNOPSIS
/usr/sbin/ipmaddr [<operation>] [<args>]
› NOTE
This program is obsolete. Use the ip maddr command from ip(8) instead.
› DESCRIPTION
The ipmaddr command can perform one of the following operations:
add - add a multicast address
del - delete a multicast address
show - list multicast addresses
› SEE ALSO
ip(8).
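Since ipmaddr is obsolete, its three operations map directly onto ip maddr from the iproute2 suite. A quick sketch of the equivalents (the device name and link-layer address below are illustrative, not from this manual):

```shell
# List multicast addresses (old: "ipmaddr show") on the loopback device:
ip maddr show dev lo

# The add/del operations map the same way. These need root privileges and a
# real device, so they are shown as comments only (eth0 and the address are
# illustrative):
#   ipmaddr add 33:33:00:00:00:01 dev eth0  ->  ip maddr add 33:33:00:00:00:01 dev eth0
#   ipmaddr del 33:33:00:00:00:01 dev eth0  ->  ip maddr del 33:33:00:00:00:01 dev eth0
```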
IPRCONFIG
› NAME
iprconfig - IBM Power RAID storage adapter configuration/recovery utility
› SYNOPSIS
iprconfig [-e editor] [-k dir] [-c command]
iprconfig --version --debug --force
› DESCRIPTION
iprconfig is used to configure IBM Power RAID storage adapters, display information
about them, and to perform adapter and disk unit recovery. The menu options are:
1. Display hardware status. This option can be used to display various information
regarding the IBM Power RAID adapters attached to the system and the disk units
controlled by them. For each adapter and disk unit, their /dev name, physical location,
description, vendor/product ID, and hardware status will be available. Beside each
resource is an OPT field. By entering a 1 beside any resource, detailed information about
that resource can be obtained. For an adapter resource, this will display the adapter
firmware version and the physical location amongst other things.
2. Work with Disk Arrays is used to present a second menu containing disk array
related commands.
Display disk array status is used to display the status of disk arrays on the system.
Create a disk array is used to create a disk array.
Delete a disk array is used to delete disk arrays. Selecting this option will provide you
with a list of disk arrays which can be deleted.
Add a device to a disk array is used to include devices of similar capacity into an
existing disk array. This function is currently only supported for RAID 5 and RAID 6 disk
arrays.
Format device for advanced function is used to format disks to 522 bytes/sector so that
they may be used in a disk array. Only disks which are not formatted for advanced
function or are formatted for advanced function but are not known to be zeroed will be
available for selection for this function.
Format device for JBOD function (512) is used to format disks to 512 bytes/sector so
that they may be used as standalone disks. Only disks which are not formatted for JBOD
function or are formatted for JBOD function and are in the Format Required state will be
available for this function.
Work with hot spares is used to create a hot spare which designates a device as a
dedicated hot spare. It is also used to delete a hot spare which unconfigures a previously
configured hot spare.
Work with asymmetric access is used to select which path of a disk array will be the
primary path in a dual controller environment. Asymmetric Access must be enabled on the
adapter first. Not all adapters support asymmetric access and adapters that do provide
support may require updated microcode.
Force RAID Consistency Check is used to force a consistency check on a RAID array.
All ipr adapters continually perform background consistency checking when idle. This
option can be used to force a consistency check to be performed.
Migrate disk array protection is used to change the RAID protection level for an array
to another supported level. In some cases, this will require adding more disks to the array.
In other cases, disks will be freed.
3. Work with disk unit recovery is used to perform the following disk unit recovery
actions:
Concurrent add device is used to concurrently add a new disk to a running system. This
feature is only supported with SES (SCSI Enclosure Services) packaging.
Concurrent remove device is used to concurrently remove a disk from a running system.
This feature is only supported with SES (SCSI Enclosure Services) packaging.
Initialize and format disk unit is used to issue a SCSI format command to attached
devices. A format unit command has special meaning to the adapter and is used as a
service action for certain error conditions. Formatting a disk unit will lose all data on that
drive. If the disk is attached to an ipr adapter that does not support RAID, the drive will be
formatted to 512 bytes/sector. If the disk is attached to an ipr RAID adapter, the block size
will not be changed. To change the block size, use the format menu options under the disk
arrays menu.
Reclaim IOA cache storage is used to repair cache error conditions. ATTENTION: Use
this option with care. This is used to discard data from the cache and may result in data
loss. This option is designed to be used by authorized IBM hardware customer engineers.
Rebuild disk unit data is generally used following concurrent maintenance. Select this
option after a failing array member device has been replaced to reconstruct the device as
an active array member.
Work with resources containing cache battery packs is used to display information
regarding rechargeable cache battery packs and to force rechargeable cache battery packs
into an error state so that they can be replaced prior to failure. ATTENTION: Once an
error has been forced on a rechargeable cache battery pack write caching will be disabled
until the battery pack is replaced.
4. Work with SCSI bus configuration is used to change configurable SCSI bus
attributes, such as maximum SCSI bus speed, SCSI initiator ID, etc.
5. Work with driver configuration is used to change driver configurable attributes, such
as log_level.
6. Work with disk configuration is used to change configurable disk attributes, such as
queue depth.
7. Work with adapter configuration is used to change configurable adapter attributes,
such as dual adapter settings. Refer to the following command line options: primary,
secondary, query-ha-mode, set-ha-mode, set-ioa-asymmetric-access and set-array-
asymmetric-access for more information regarding these settings.
8. Download microcode is used to download microcode to ipr adapters and attached SCSI
disks.
9. Analyze Log is used to analyze /var/log/messages* files. By default it
will use vi as the editor to open the concatenated error log files. This can be changed by
using option 6 on the Kernel Messages Log menu. Selecting option 1 on the Kernel
Messages Log menu will display only the most recent errors logged by the ipr device
driver and may be useful to filter out some of the clutter. Option 2 will display all recorded
errors logged by the ipr device driver. Option 3 will display all kernel messages. Option 4
will display errors logged by the iprconfig utility. This may be useful for debugging
problems. Option 5 can be used to change where the tool looks to find the kernel messages
files. The default is to look in /var/log.
› OPTIONS
-e editor
Default editor for viewing error logs. The default editor is vi, but can be changed
with this parameter.
-k directory
Kernel messages root directory. Root directory to look for kernel messages. Default
is /var/log.
-c command
Command line, non-interactive commands. Currently supported commands include:
show-config
Show ipr configuration.
show-alt-config
Show alternate ipr configuration information.
show-ioas
Show all ipr adapters.
show-arrays
Show all ipr arrays.
show-battery-info [IOA]
Show cache battery information for the specified IOA. Example: iprconfig -c
show-battery-info sg5
show-details [device]
Show device details for specified device. Example: iprconfig -c show-details sda
show-hot-spares
Show all configured hot spares.
show-af-disks
Show disks formatted for Advanced Function that are not configured in an array or as
a hot spare.
show-all-af-disks
Show all disks formatted for Advanced Function.
show-jbod-disks
Show all disks formatted for JBOD Function.
show-slots
Show all disks slots available on the system.
status [device]
Show the status of the specified device. This is the same status that appears in
the last column of the Display hardware status menu. Either a /dev/sdX name or a
/dev/sgX name can be specified. Example: iprconfig -c status /dev/sda
alt-status [device]
Show the status of the specified device. This is the same status as above with the
exception of when a long running command is executing to the device, in which case
the percent complete for the long running command is printed.
query-raid-create [IOA]
Show all devices attached to the specified IOA that are candidates for being used in a
RAID array. Example: iprconfig -c query-raid-create sg5
query-raid-delete [IOA]
Show all RAID arrays attached to the specified IOA that can be deleted. Example:
iprconfig -c query-raid-delete sg5
query-hot-spare-create [IOA]
Show all devices attached to the specified IOA that are candidates for being hot
spares.
query-hot-spare-delete [IOA]
Show all hot spares attached to the specified IOA that can be deleted.
query-raid-consistency-check
Show all RAID arrays that are candidates for a RAID consistency check.
query-format-for-jbod
Show all disks that can be reformatted for JBOD function.
query-reclaim
Show all IOAs that may need a reclaim cache storage.
query-arrays-raid-include
Show all RAID arrays that can have disks included in them to increase their capacity.
query-devices-raid-include [array]
Show all disks that can be added to the specified array to increase its capacity.
query-supported-raid-levels [IOA]
Show all RAID levels supported by the specified adapter.
query-include-allowed [IOA] [raid level]
Some RAID levels allow for adding additional disks to existing disk arrays to
increase their capacity. Prints “yes” to stdout if the specified RAID level supports this
function, else prints “no”.
query-max-devices-in-array [IOA] [raid level]
Print the maximum number of devices allowed in a RAID array of the specified
RAID level for the specified RAID adapter.
query-min-devices-in-array [IOA] [raid level]
Print the minimum number of devices allowed in a RAID array of the specified
RAID level for the specified RAID adapter.
query-min-mult-in-array [IOA] [raid level]
Print the minimum multiple of devices required in a RAID array of the specified
RAID level for the specified RAID adapter.
query-supp-stripe-sizes [IOA] [raid level]
Print all supported stripe sizes supported for RAID arrays of the specified RAID level
on the specified RAID adapter. Stripe sizes are printed in units of KB.
query-recommended-stripe-size [IOA] [raid level]
Print the default/recommended stripe size for RAID arrays of the specified RAID
level on the specified RAID adapter. Stripe size is in units of KB.
query-recovery-format
Show all disks that can be formatted for error recovery purposes.
query-raid-rebuild
Show all disks in RAID arrays that can be rebuilt.
query-format-for-raid
Show all disks that can be formatted such that they can be used in a RAID array or as
a hot spare.
query-ucode-level [device]
Show the microcode level that is currently loaded on the specified device. Note: The
device specified may be the sg device associated with an IOA, in which case the
IOA’s microcode level will be shown.
query-format-timeout [device]
Show the current format timeout to be used when formatting the specified disk. This
value is only applicable when the device is currently in Advanced Function format.
query-qdepth [device]
Show the queue depth currently being used for the specified disk.
query-tcq-enable [device]
Print 1 to stdout if tagged queuing is enabled for the specified device, else print 0 to
stdout.
query-log-level [IOA]
Print the current log level being used for the specified IOA. Can be a number from 0
to n.
query-add-device
Show all empty disk slots that can have a disk concurrently added.
query-remove-device
Show all disk slots which are either empty or have disks in them which can be
concurrently removed from the running system.
query-initiator-id [IOA] [busno]
Show the current SCSI initiator ID used by the IOA for the specified SCSI bus.
query-bus-speed [IOA] [busno]
Show the current maximum SCSI bus speed allowed on the specified SCSI bus.
query-bus-width [IOA] [busno]
Show the current SCSI bus width in units of bits for the specified SCSI bus.
query-path-status [IOA]
Show the current dual path state for the SAS devices attached to the specified IOA.
query-path-details [device]
Show the current dual path details for the specified SAS device.
query-arrays-raid-migrate
Show the arrays that can be migrated to a different protection level.
query-devices-raid-migrate [array]
Show the AF disks that are candidates to be used in a migration for a given array.
query-raid-levels-raid-migrate [array]
Show the protection levels to which the given array can be migrated.
query-stripe-sizes-raid-migrate [array] [raid level]
Given an array and a protection level, show the valid stripe sizes to which the array
can be migrated.
query-devices-min-max-raid-migrate [array] [raid level]
Show the number of devices that will be removed for a migration to a protection level
that requires fewer devices. Or, show the minimum number of devices required, the
maximum number of devices allowed, and the multiple of the number of devices
required for a migration that requires more devices.
query-ioas-asymmetric-access
Show the IOAs that support asymmetric access.
query-arrays-asymmetric-access
Show the disk arrays that are candidates for setting their asymmetric access mode to
Optimized or Non-Optimized.
query-ioa-asymmetric-access-mode [IOA]
Show the current asymmetric access mode for the given IOA.
query-array-asymmetric-access-mode [array]
Show the current asymmetric access mode for the given disk array.
query-ioa-caching [IOA]
Show whether the user-requested caching mode for the given IOA is set to Default
or Disabled.
query-array-label [label]
Show the device name of the array with the specified label. Label must have been
specified when creating the RAID array. See raid-create command.
query-array-rebuild-rate [IOA]
Show the array rebuild rate for the given IOA.
query-array-rebuild-verify [IOA]
Show whether array rebuild verification is enabled for the given IOA.
query-array [location]
Show the device name of the array that contains a disk with the specified
platform location code.
query-device [location]
Show the device name of the disk that has the specified platform location code.
query-location [device]
The device specified can be either the device name of a disk or the device name of a
single disk RAID 0 array. If the specified device name is a disk, the platform location
code will be displayed. If the specified device name is a single device RAID 0 array,
the platform location of the disk which is a member of the specified array will be
displayed.
query-write-cache-policy [device]
Show the current write cache policy for [device].
raid-create [-r raid_level] [-s stripe_size_in_kb] [-l label] [--skip-format] [devices…]
Create a RAID array. RAID level can be any supported RAID level for the given
adapter, such as 0, 10, 5, 6. Currently supported stripe sizes in kb include 16, 64, and
256. If raid_level is not specified, it will default to RAID 5. If stripe size is not
specified, it will default to the recommended stripe size for the selected RAID level.
Devices are specified by their full name; either the /dev/sd name or the /dev/sg
name is acceptable. On some RAID adapters, a label can also be specified. Example:
iprconfig -c raid-create -r 5 -s 64 /dev/sda /dev/sdb /dev/sdc This would create a
RAID 5 array with a 64k stripe size using the specified devices.
raid-delete [RAID device]
Delete the specified RAID array. Specify either the /dev/sd name or the /dev/sg name.
Only 1 array can be deleted with a single command. Example: iprconfig -c
raid-delete /dev/sda This would delete the disk array represented by /dev/sda.
raid-include [array] [disk] … [disk]
Add the specified devices to the specified disk array to increase its capacity.
Example: iprconfig -c raid-include sda sg6 sg7
raid-migrate -r raid_level [-s stripe_size_in_kb] array [disk] … [disk]
Migrate an existing RAID array to a new RAID protection level. Optionally, a new
stripe size can be given. In some cases one or more new disks must be added for the
migration to succeed. Example: iprconfig -c raid-migrate -r 10 -s 64 sda sg5 sg6
format-for-raid [disk] … [disk]
Format the specified disks for Advanced Function so they can be used in a RAID
array or as a hot spare.
format-for-jbod [disk] … [disk]
Format the specified disks for JBOD Function so they can be used as standalone
disks.
recovery-format [disk] … [disk]
Format the specified disks as directed by the reference guide for error recovery
purposes.
hot-spare-create [disk]
Create a hot spare using the specified Advanced Function disk.
hot-spare-delete [disk]
Delete the specified hot spare.
disrupt-device [disk]
Force the specified Advanced Function device into the Failed state.
reclaim-cache [IOA]
Reclaim the specified IOA’s write cache. ATTENTION: Use this option with care.
This is used to discard data from the cache and may result in data loss. This option is
designed to be used by authorized IBM hardware customer engineers.
reclaim-unknown-cache [IOA]
Reclaim the specified IOA’s write cache and allow unknown data loss.
ATTENTION: Use this option with care. This is used to discard data from the cache
and WILL result in data loss. This option is designed to be used by authorized IBM
hardware customer engineers.
raid-consistency-check [array]
Force a full RAID consistency check on the specified array. This command will
return before the RAID consistency check has completed. Use the status command to
check the status of the command.
raid-rebuild [disk]
Following a disk replacement for a failed disk in a RAID array, use this command to
rebuild the failed disk’s data onto the new disk and return the disk array to the Active
state.
update-ucode [device] [microcode file]
Update the microcode on the specified device (IOA or disk) with the specified
microcode file. ATTENTION: Limited checking of the microcode image is done.
Make sure the specified microcode file is the correct file for the specified device.
set-format-timeout [disk] [timeout in hours]
Set the format timeout to be used when formatting the specified disk.
set-qdepth [device] [queue depth]
Set the queue depth for the specified device or disk array.
set-tcq-enable [device] [0 = disable, 1 = enable]
Enable/disable tagged command queueing for the specified device.
set-log-level [IOA] [log level]
Set the error logging verbosity to use for the specified IOA. Default is 2.
set-write-cache-policy [device] [writeback|writethrough]
Set the write cache policy for [device]. Available policies are writeback and
writethrough. This feature is only supported by JBOD disks. Example: iprconfig -c
set-write-cache-policy sdp writeback
identify-disk [disk] [0 = turn off identify LED, 1 = turn on identify LED]
Turn on/off the disk identify LED for the specified disk. This function may or may
not be available depending on the hardware packaging.
identify-slot [location] [0 = turn off identify LED, 1 = turn on identify LED]
Turn on/off the disk identify LED for the specified location. This function may or
may not be available depending on the hardware packaging. Example: iprconfig -c
identify-slot 0000:d8:01.0/0:1:1: 1
remove-disk [disk] [0 = turn off identify LED, 1 = turn on identify LED]
Turn on/off the disk remove identify LED for the specified device. When 1 is
specified as the second argument, the specified disk is set to the remove state. When
in this state, the disk may be removed. Once the disk has been physically removed,
iprconfig must be invoked again with the second argument set to 0. This turns off the
slot identifier light and logically removes the disk from the host operating system.
remove-slot [location] [0 = turn off identify LED, 1 = turn on identify LED]
Turn on/off the disk remove identify LED for the specified location. When 1 is
specified as the second argument, the specified location is set to the remove state.
When in this state, the disk may be removed. Once the disk has been physically
removed, iprconfig must be invoked again with the second argument set to 0. This
turns off the slot identifier light and logically removes the disk from the host
operating system. Example: iprconfig -c remove-slot 0000:d8:01.0/0:1:1: 1
add-slot [location] [0 = turn off identify LED, 1 = turn on identify LED]
Turn on/off the disk insert identify LED for the specified location. When 1 is
specified as the second argument, the specified location is set to the insert state.
When in this state, the disk may be inserted. Once the disk has been physically
inserted, iprconfig must be invoked again with the second argument set to 0. This
turns off the slot identifier light and logically adds the disk to the host operating
system. Example: iprconfig -c add-slot 0000:d8:01.0/0:1:1: 1
set-initiator-id [IOA] [busno] [initiator id]
Set the IOA’s SCSI initiator ID for the specified bus. Must be a value between 0 and
7 and must not conflict with any other device on the SCSI bus.
set-bus-speed [IOA] [busno] [speed in MB/sec]
Set the maximum SCSI bus speed allowed on the specified SCSI bus.
set-bus-width [IOA] [busno] [bus width in # bits]
Set the SCSI bus width to use for the specified SCSI bus. Example: iprconfig -c
set-bus-width sg5 0 16
primary [IOA]
Set the adapter as the preferred primary adapter. This is used in dual initiator RAID
configurations to indicate which adapter should be the primary adapter. The
primary adapter should be the adapter receiving the majority of the I/O. Example:
iprconfig -c primary sg5
secondary [IOA]
Set the adapter to indicate it is not the preferred primary adapter. See the notes for
the preferred primary for additional information. Example: iprconfig -c secondary
sg5
set-all-primary
Set all attached ipr adapters as the preferred primary adapter. This can be used
when running a dual initiator RAID HA configuration. This command can be run on
the primary system to quickly enable the preferred primary mode for all attached
adapters. Refer to /etc/ha.d/resource.d/iprha for an example of how this might be
used. Example: iprconfig -c set-all-primary
set-all-secondary
Set all attached ipr adapters to indicate they are not the preferred primary adapter.
Example: iprconfig -c set-all-secondary
query-ha-mode [IOA]
When an adapter is configured in a highly available dual adapter configuration, it
may be able to be configured in one of two ways. The default mode is Normal. This
mode is used for all SCSI adapters and many SAS adapters. Some SAS adapters also
support a JBOD dual adapter configuration. This mode is to be used when the dual
adapter configuration is to consist of JBOD disks rather than RAID arrays. If the
adapter is NOT going to be used in a dual adapter configuration, this mode MUST be
set to Normal. Example: iprconfig -c query-ha-mode sg5
set-ha-mode [IOA] [Normal | RAID]
Used to set the high-availability mode of the adapter. Refer to the query-ha-mode
command for more information regarding these settings. Example: iprconfig -c
set-ha-mode sg5 Normal
set-array-asymmetric-access-mode [array] [Optimized | Non-Optimized]
Used to set the asymmetric access mode of the disk array. Example: iprconfig -c
set-array-asymmetric-access-mode sda Optimized
set-ioa-asymmetric-access-mode [IOA] [Enabled | Disabled]
Used to set the asymmetric access mode of the IOA. Example: iprconfig -c
set-ioa-asymmetric-access-mode sg5 Enabled
set-ioa-caching [IOA] [Default | Disabled]
Used to set the requested caching mode of the IOA. Example: iprconfig -c
set-ioa-caching sg5 Disabled
set-array-rebuild-verify [IOA] [enable | disable | default]
Used to define whether verification is performed during an array rebuild. Enabling
this can affect performance. The default value is disabled. Example: iprconfig -c
set-array-rebuild-verify sg5 disable
set-array-rebuild-rate [IOA] [Rebuild Rate | default]
Used to set the rebuild rate of the IOA. [Rebuild Rate] must be in the range 0..100.
If ‘default’ is used, the IOA will reset to the implementation default rate. Example:
iprconfig -c set-array-rebuild-rate sg5 10
get-live-dump [IOA]
Dump the IOA’s implementation unique critical information. The dump data will be
saved in the /var/log/ directory with the pattern
ipr-CCIN-PCI_ADDRESS-dump-TIMESTAMP. Example: iprconfig -c get-live-dump sg5
dump
Display detailed hardware and system information on standard output. In case a
report file is needed, the iprsos command will create one at /var/log/iprsos.log.
Example: iprconfig -c dump
--version
Print the version number of iprconfig.
--debug
Enable additional error logging. Enabling this will result in additional errors
being logged to /var/log/messages.
--force
Disable safety checks in iprconfig. This will allow you to format devices that are
not at the appropriate code levels. Only use this option if you really know what
you are doing.
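The -c commands above all share one shape, iprconfig -c &lt;command&gt; [args], which makes them easy to drive from scripts. A minimal sketch that only composes the command strings, so nothing below touches an adapter (the sg5 name and device list are assumptions for illustration):

```shell
# Compose (but do not run) iprconfig non-interactive invocations.
ipr_cmd() {
    printf 'iprconfig -c %s\n' "$*"
}

ipr_cmd show-ioas
ipr_cmd query-supported-raid-levels sg5
ipr_cmd raid-create -r 5 -s 64 /dev/sda /dev/sdb /dev/sdc
```

In a real script each composed line would be executed and its output parsed; the query-* commands are designed for exactly this kind of non-interactive use.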
› AUTHOR
Brian King <[email protected]>
› NOTES
Notes on using under iSeries 5250 telnet
Only use this utility across 5250 telnet when there are no other options available to you.
Since there may be occasions when 5250 telnet is your only option to access your iSeries
Linux console, every attempt has been made to make this utility usable under 5250 telnet.
By following a few guidelines, you can make your 5250 telnet experience more
productive and much less frustrating.
1. First of all, it will be helpful to know how the keys are mapped under 5250 telnet. From
the 5250 telnet window, hit ESC. This will get you to the Send TELNET Control
Functions menu. Take option 6 to display the keyboard map. Take note of how TAB,
ESC, CTLC, and SENDWOCR are bound. They will be useful in the future.
2. When selecting menu options, enter the menu number, followed by the enter key, same
as usual.
3. When typing single character commands (e.g. r=Refresh), type the single character
followed by the SENDWOCR key (F11 by default).
4. When on a device/array/IOA selection screen (e.g. Display Disk Unit Details), do NOT
use the arrow keys to navigate. Instead use the TAB key (F7 by default) to navigate these
screens.
5. Beware of the backspace and delete keys. As a rule do NOT use them.
6. When editing the root kernel message log directory or the default editor, you may use
the arrow keys, but not the backspace and delete keys. Use the space bar to remove
already typed characters.
IPRDBG
› NAME
iprdbg - IBM Power RAID storage adapter debug utility
› SYNOPSIS
This executable is part of the package ‘iprutils’: Utilities for the IBM Power Linux RAID
Adapters
› DESCRIPTION
iprdbg is used to debug IBM Power RAID storage adapters.
› EXIT STATUS
-EINVAL
Invalid input.
-ENXIO
No IOA devices.
Or an exit status of specified command.
› FILES
/etc/iprdbg.conf
/var/log/iprdbg
› NOTES
iprdbg is part of iprutils package
› SEE ALSO
iprconfig(8), iprdump(8), iprinit(8), iprupdate(8)
› AUTHOR
Wayne Boyer <[email protected]>
IPRDUMP
› NAME
iprdump - IBM Power RAID adapter dump utility
› SYNOPSIS
iprdump [-d directory]
iprdump --version --debug --use-polling --use-uevents
› DESCRIPTION
iprdump is used to gather information in the event of an adapter failure. The dump data
will by default be saved in the /var/log/ directory with the prefix iprdump.# where # will
be the dump ID of the file. The ipr dump utility will make an entry in the system error
log when a dump is taken. iprdump should be started as a service daemon rather than run
directly. Nevertheless, it can also be started at any time and will properly handle
adapters being dynamically added to and removed from the system. When run directly it
will stay running in order to process any dump that might occur. You can start it with
“iprdump --daemon” to force it into the background.
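Because iprdump is meant to run as a service, a distribution will normally ship a service definition for it. A hypothetical systemd unit sketch is shown below; the unit name and ExecStart path are assumptions, not part of the package documentation:

```ini
# /etc/systemd/system/iprdump.service (hypothetical sketch)
[Unit]
Description=IBM Power RAID adapter dump utility
After=syslog.target

[Service]
# Run in the foreground; systemd handles backgrounding, so --daemon is not needed.
ExecStart=/sbin/iprdump
Restart=on-failure

[Install]
WantedBy=multi-user.target
```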
› OPTIONS
--version
Print the version number of iprdump.
--debug
Enable additional error logging. Enabling this will result in additional errors
being logged to /var/log/messages.
-d <directory>
Directory where dump data is to be stored. Default is /var/log/.
--use-polling
Do not use netlink/uevent notification, but rather poll for adapter and device
configuration changes.
--use-uevents
Use netlink/uevent notification rather than polling for adapter and device
configuration changes. If not specified, polling will be used until the first uevent
notification appears, then netlink will be used.
--daemon
Force the daemon to run in the background.
› AUTHOR
Originally written by Michael Anderson.
MODEMMANAGER
› NAME
ModemManager - modem management daemon
› SYNOPSIS
ModemManager [--version] | [--help]
ModemManager [--debug] [--log-level=<level>] [--log-file=<filename>] [--timestamps]
[--relative-timestamps]
› DESCRIPTION
The ModemManager daemon provides a unified high level API for communicating with
(mobile broadband) modems. While the basic commands are standardized, the more
advanced operations (like signal quality monitoring while connected) vary a lot.
ModemManager is a system daemon and is not meant to be used directly from the
command line.
› OPTIONS
The following options are supported:
--version
Print the ModemManager software version and exit.
--help
Print ModemManager’s available options and exit.
--debug
Runs ModemManager with “DEBUG” log level and without daemonizing. This is
useful for debugging, as it directs log output to the controlling terminal in addition to
syslog.
--log-level=<level>
Sets how much information ModemManager sends to the log destination (usually
syslog’s “daemon” facility). By default, only informational, warning, and error
messages are logged. Given level must be one of “ERR”, “WARN”, “INFO” or
“DEBUG”.
--log-file=<filename>
Specify location of the file where ModemManager will dump its log messages,
instead of syslog.
--timestamps
Include absolute timestamps in the log output.
--relative-timestamps
Include timestamps, relative to the start time of the daemon, in the log output.
› SEE ALSO
NetworkManager(8).
NETWORKMANAGER
› NAME
NetworkManager - network management daemon
› SYNOPSIS
NetworkManager [OPTIONS…]
› DESCRIPTION
The NetworkManager daemon attempts to make networking configuration and operation
as painless and automatic as possible by managing the primary network connection and
other network interfaces, like Ethernet, WiFi, and Mobile Broadband devices.
NetworkManager will connect any network device when a connection for that device
becomes available, unless that behavior is disabled. Information about networking is
exported via a D-Bus interface to any interested application, providing a rich API with
which to inspect and control network settings and operation.
› DISPATCHER SCRIPTS
NetworkManager will execute scripts in the /etc/NetworkManager/dispatcher.d directory
or subdirectories in alphabetical order in response to network events. Each script should
be a regular executable file owned by root. Furthermore, it must not be writable by group
or other, and not setuid.
Each script receives two arguments: the first is the interface name of the device the
operation just happened on, and the second is the action.
The actions are:
pre-up
The interface is connected to the network but is not yet fully activated. Scripts acting
on this event must be placed or symlinked into the
/etc/NetworkManager/dispatcher.d/pre-up.d directory, and NetworkManager will wait
for script execution to complete before indicating to applications that the interface is
fully activated.
up
The interface has been activated.
pre-down
The interface will be deactivated but has not yet been disconnected from the network.
Scripts acting on this event must be placed or symlinked into the
/etc/NetworkManager/dispatcher.d/pre-down.d directory, and NetworkManager will
wait for script execution to complete before disconnecting the interface from its
network. Note that this event is not emitted for forced disconnections, like when
carrier is lost or a wireless signal fades. It is only emitted when there is an
opportunity to cleanly handle a network disconnection event.
down
The interface has been deactivated.
vpn-pre-up
The VPN is connected to the network but is not yet fully activated. Scripts acting on
this event must be placed or symlinked into the
/etc/NetworkManager/dispatcher.d/pre-up.d directory, and NetworkManager will wait
for script execution to complete before indicating to applications that the VPN is
fully activated.
vpn-up
A VPN connection has been activated.
vpn-pre-down
The VPN will be deactivated but has not yet been disconnected from the network.
Scripts acting on this event must be placed or symlinked into the
/etc/NetworkManager/dispatcher.d/pre-down.d directory, and NetworkManager will
wait for script execution to complete before disconnecting the VPN from its network.
Note that this event is not emitted for forced disconnections, like when the VPN
terminates unexpectedly or general connectivity is lost. It is only emitted when there
is an opportunity to cleanly handle a VPN disconnection event.
vpn-down
A VPN connection has been deactivated.
hostname
The system hostname has been updated. Use gethostname(2) to retrieve it. The
interface name (first argument) is empty and no environment variable is set for this
action.
dhcp4-change
The DHCPv4 lease has changed (renewed, rebound, etc.).
dhcp6-change
The DHCPv6 lease has changed (renewed, rebound, etc.).
The environment contains more information about the interface and the connection. The
following variables are available for the use in the dispatcher scripts:
CONNECTION_UUID
The UUID of the connection profile.
CONNECTION_ID
The name (ID) of the connection profile.
CONNECTION_FILENAME
The backing file name of the connection profile.
DEVICE_IFACE
The interface name of the device.
IP4_ADDRESS_N
The IPv4 address in the format "address/prefix gateway", where N is a number from
0 to (# IPv4 addresses - 1). The gateway item in this variable is deprecated; use
IP4_GATEWAY instead.
IP4_NUM_ADDRESSES
The variable contains the number of IPv4 addresses the script may expect.
IP4_GATEWAY
The gateway IPv4 address.
IP4_ROUTE_N
The IPv4 route in the format “address/prefix next-hop metric”, where N is a number
from 0 to (# IPv4 routes - 1).
IP4_NUM_ROUTES
The variable contains the number of IPv4 routes the script may expect.
IP4_NAMESERVERS
The variable contains a space-separated list of the DNS server addresses.
IP4_DOMAINS
The variable contains a space-separated list of the search domains.
DHCP4_<dhcp-option-name>
If the connection used DHCP for address configuration, the received DHCP
configuration is passed in the environment using standard DHCP option names,
prefixed with “DHCP4_”, like “DHCP4_HOST_NAME=foobar”.
The same variables as for IPv4 are available for IPv6, but the prefixes are IP6_ and
DHCP6_ instead.
In case of VPN, VPN_IP_IFACE is set, and IP4_*, IP6_* variables with VPN prefix are
exported too, like VPN_IP4_ADDRESS_0, VPN_IP4_NUM_ADDRESSES.
Dispatcher scripts are run one at a time, but asynchronously from the main
NetworkManager process, and will be killed if they run for too long. If your script might
take arbitrarily long to complete, you should spawn a child process and have the parent
return immediately. Also beware that once a script is queued, it will always be run, even if
a later event renders it obsolete. (E.g., if an interface goes up and then back down again
quickly, it is possible that one or more "up" scripts will be run after the interface has gone
down.)
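As a sketch of the mechanism described above, a dispatcher script might look like the following. The file name 30-log-events and the log messages are illustrative, not part of NetworkManager; the script only relies on the documented calling convention (interface name as the first argument, action as the second) and the CONNECTION_ID environment variable.

```shell
#!/bin/sh
# Hypothetical dispatcher script: /etc/NetworkManager/dispatcher.d/30-log-events
# NetworkManager invokes it with the interface name as $1 and the action as $2.
handle_event() {
    iface="$1"
    action="$2"
    case "$action" in
        up)       echo "$iface: fully activated (connection: $CONNECTION_ID)" ;;
        pre-down) echo "$iface: about to be disconnected" ;;
        hostname) echo "system hostname changed" ;;  # $1 is empty for this action
        *)        : ;;                               # ignore other actions
    esac
}

handle_event "$1" "$2"
```

Remember that the script must be a regular executable file owned by root, not writable by group or other, and not setuid, or NetworkManager will refuse to run it.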
› OPTIONS
The following options are understood:
--version | -V
Print the NetworkManager software version and exit.
--help | -h
Print NetworkManager's available options and exit.
--no-daemon | -n
Do not daemonize.
--debug | -d
Do not daemonize, and direct log output to the controlling terminal in addition to
syslog.
--pid-file | -p
Specify location of a PID file. The PID file is used for storing the PID of the running
process and prevents running multiple instances.
--state-file
Specify a file for storing the state of NetworkManager persistently. If not specified, the
default value of /var/lib/NetworkManager/NetworkManager.state is used.
--config
Specify a configuration file. If not specified, the default value of
/etc/NetworkManager/NetworkManager.conf is used.
--plugins
List plugins used to manage system-wide connection settings. This list has preference
over plugins specified in the configuration file. Currently supported plugins are:
keyfile, ifcfg-rh, ifcfg-suse, ifupdown.
--log-level
Sets how much information NetworkManager sends to the log destination (usually
syslog's "daemon" facility). By default, only informational, warning, and error
messages are logged. See the section on logging in NetworkManager.conf(5) for
more information.
--log-domains
A comma-separated list specifying which operations are logged to the log destination
(usually syslog). By default, most domains are logging-enabled. See the section on
logging in NetworkManager.conf(5) for more information.
› UDEV PROPERTIES
The udev(7) device manager is used for network device discovery. The following property
influences how NetworkManager manages devices:
NM_UNMANAGED
No default connection will be created and automatic activation will not be attempted
when this property of a device is set to a true value (“1” or “true”). You will still be
able to attach a connection to the device manually or observe externally added
configuration such as addresses or routes.
Create a udev rule that sets this property to prevent NetworkManager from
interfering with virtual Ethernet device interfaces that are managed by virtualization
tools.
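For instance, a rule along the following lines could mark veth interfaces as unmanaged. The file name and the match on KERNEL=="veth*" are illustrative assumptions; consult udev(7) for the exact match syntax on your system.

```
# /etc/udev/rules.d/99-nm-unmanaged.rules (hypothetical file name)
# Mark virtual Ethernet devices as unmanaged so NetworkManager ignores them.
ACTION=="add", SUBSYSTEM=="net", KERNEL=="veth*", ENV{NM_UNMANAGED}="1"
```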
› DEBUGGING
The following environment variables are supported to help debugging. When used in
conjunction with the --no-daemon option (thus echoing PPP and DHCP helper output to
stdout) these can quickly help pinpoint the source of connection issues. Also see
--log-level and --log-domains to enable debug logging inside NetworkManager itself.
NM_PPP_DEBUG: When set to anything, causes NetworkManager to turn on PPP
debugging in pppd, which logs all PPP and PPTP frames and client/server exchanges.
› SEE ALSO
NetworkManager.conf(5), nmcli(1), nmcli-examples(5), nm-online(1), nm-settings(5),
nm-applet(1), nm-connection-editor(1), udev(7)
PAM
› NAME
PAM, pam - Pluggable Authentication Modules for Linux
› DESCRIPTION
This manual is intended to offer a quick introduction to Linux-PAM. For more
information the reader is directed to the Linux-PAM system administrators’ guide.
Linux-PAM is a system of libraries that handle the authentication tasks of applications
(services) on the system. The library provides a stable general interface (Application
Programming Interface - API) that privilege-granting programs (such as login(1) and
su(1)) defer to in order to perform standard authentication tasks.
The principal feature of the PAM approach is that the nature of the authentication is
dynamically configurable. In other words, the system administrator is free to choose how
individual service-providing applications will authenticate users. This dynamic
configuration is set by the contents of the single Linux-PAM configuration file
/etc/pam.conf. Alternatively, the configuration can be set by individual configuration files
located in the /etc/pam.d/ directory. The presence of this directory will cause Linux-PAM
to ignore /etc/pam.conf.
From the point of view of the system administrator, for whom this manual is provided, it is
not of primary importance to understand the internal behavior of the Linux-PAM library.
The important point to recognize is that the configuration file(s) define the connection
between applications (services) and the pluggable authentication modules (PAMs) that
perform the actual authentication tasks.
Linux-PAM separates the tasks of authentication into four independent management
groups: account management; authentication management; password management; and
session management. (We highlight the abbreviations used for these groups in the
configuration file.)
Simply put, these groups take care of different aspects of a typical user’s request for a
restricted service:
account - provide account verification types of service: has the user’s password expired?;
is this user permitted access to the requested service?
authentication - authenticate a user and set up user credentials. Typically this is via some
challenge-response request that the user must satisfy: if you are who you claim to be
please enter your password. Not all authentications are of this type: there exist
hardware-based authentication schemes (such as the use of smart-cards and biometric
devices); with suitable modules, these may be substituted seamlessly for more standard
approaches to authentication - such is the flexibility of Linux-PAM.
password - this group’s responsibility is the task of updating authentication mechanisms.
Typically, such services are strongly coupled to those of the auth group. Some
authentication mechanisms lend themselves well to being updated with such a function.
Standard UN*X password-based access is the obvious example: please enter a
replacement password.
session - this group of tasks covers things that should be done prior to a service being given
and after it is withdrawn. Such tasks include the maintenance of audit trails and the
mounting of the user’s home directory. The session management group is important as it
provides both an opening and closing hook for modules to affect the services available to a
user.
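The four management groups map directly onto the lines of a service file. A minimal /etc/pam.d/ service file might look as follows; this is a sketch only (the service name is invented), using pam_unix.so, the classic UN*X password module, for every group:

```
# /etc/pam.d/myservice (illustrative)
auth     required   pam_unix.so
account  required   pam_unix.so
password required   pam_unix.so
session  required   pam_unix.so
```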
› FILES
/etc/pam.conf
/etc/pam.d
ip route add
add new route
ip route change
change route
ip route replace
change an existing route or add a new one
to TYPE PREFIX (default)
the destination prefix of the route. If TYPE is omitted, ip assumes type unicast. Other
values of TYPE are listed above. PREFIX is an IP or IPv6 address optionally
followed by a slash and the prefix length. If the length of the prefix is missing, ip
assumes a full-length host route. There is also a special PREFIX default - which is
equivalent to IP 0/0 or to IPv6 ::/0.
tos TOS
dsfield TOS
the Type Of Service (TOS) key. This key has no associated mask and the longest
match is understood as: First, compare the TOS of the route and of the packet. If they
are not equal, then the packet may still match a route with a zero TOS. TOS is either
an 8 bit hexadecimal number or an identifier from /etc/iproute2/rt_dsfield.
metric NUMBER
preference NUMBER
the preference value of the route. NUMBER is an arbitrary 32-bit number.
table TABLEID
the table to add this route to. TABLEID may be a number or a string from the file
/etc/iproute2/rt_tables. If this parameter is omitted, ip assumes the main table, with
the exception of local, broadcast and nat routes, which are put into the local table by
default.
dev NAME
the output device name.
via ADDRESS
the address of the nexthop router. Actually, the sense of this field depends on the
route type. For normal unicast routes it is either the true next hop router or, if it is a
direct route installed in BSD compatibility mode, it can be a local address of the
interface. For NAT routes it is the first address of the block of translated IP
destinations.
src ADDRESS
the source address to prefer when sending to the destinations covered by the route
prefix.
realm REALMID
the realm to which this route is assigned. REALMID may be a number or a string
from the file /etc/iproute2/rt_realms.
mtu MTU
mtu lock MTU
the MTU along the path to the destination. If the modifier lock is not used, the MTU
may be updated by the kernel due to Path MTU Discovery. If the modifier lock is
used, no path MTU discovery will be tried; all packets will be sent without the DF bit
in the IPv4 case, or fragmented to the MTU for IPv6.
window NUMBER
the maximal window for TCP to advertise to these destinations, measured in bytes. It
limits maximal data bursts that our TCP peers are allowed to send to us.
rtt TIME
the initial RTT ('Round Trip Time') estimate. If no suffix is specified, the units are
raw values passed directly to the routing code to maintain compatibility with
previous releases. Otherwise, a suffix of s, sec or secs specifies seconds, and ms,
msec or msecs specifies milliseconds.
rttvar TIME (2.3.15+ only)
the initial RTT variance estimate. Values are specified as with rtt above.
rto_min TIME (2.6.23+ only)
the minimum TCP Retransmission TimeOut to use when communicating with this
destination. Values are specified as with rtt above.
ssthresh NUMBER (2.3.15+ only)
an estimate for the initial slow start threshold.
cwnd NUMBER (2.3.15+ only)
the clamp for congestion window. It is ignored if the lock flag is not used.
initcwnd NUMBER (2.5.70+ only)
the initial congestion window size for connections to this destination. Actual window
size is this value multiplied by the MSS (“Maximal Segment Size”) for same
connection. The default is zero, meaning to use the values specified in RFC2414.
initrwnd NUMBER (2.6.33+ only)
the initial receive window size for connections to this destination. Actual window
size is this value multiplied by the MSS of the connection. The default value is zero,
meaning to use Slow Start value.
quickack BOOL (3.11+ only)
Enable or disable quick ack for connections to this destination.
advmss NUMBER (2.3.15+ only)
the MSS (‘Maximal Segment Size’) to advertise to these destinations when
establishing TCP connections. If it is not given, Linux uses a default value calculated
from the first-hop device MTU. (If the path to these destinations is asymmetric, this
guess may be wrong.)
reordering NUMBER (2.3.15+ only)
Maximal reordering on the path to this destination. If it is not given, Linux uses the
value selected with sysctl variable net/ipv4/tcp_reordering.
nexthop NEXTHOP
the nexthop of a multipath route. NEXTHOP is a complex value with its own syntax
similar to the top level argument lists:
via ADDRESS - is the nexthop router.
dev NAME - is the output device.
weight NUMBER - is a weight for this element of a multipath route reflecting its
relative bandwidth or quality.
scope SCOPE_VAL
the scope of the destinations covered by the route prefix. SCOPE_VAL may be a
number or a string from the file /etc/iproute2/rt_scopes. If this parameter is omitted,
ip assumes scope global for all gatewayed unicast routes, scope link for direct
unicast and broadcast routes and scope host for local routes.
protocol RTPROTO
the routing protocol identifier of this route. RTPROTO may be a number or a string
from the file /etc/iproute2/rt_protos. If the routing protocol ID is not given, ip
assumes protocol boot (i.e. it assumes the route was added by someone who doesn’t
understand what they are doing). Several protocol values have a fixed interpretation.
Namely:
redirect - the route was installed due to an ICMP redirect.
kernel - the route was installed by the kernel during autoconfiguration.
boot - the route was installed during the bootup sequence. If a routing daemon starts,
it will purge all of them.
static - the route was installed by the administrator to override dynamic routing.
Routing daemon will respect them and, probably, even advertise them to its peers.
ra - the route was installed by Router Discovery protocol.
The rest of the values are not reserved and the administrator is free to assign (or not
to assign) protocol tags.
onlink
pretend that the nexthop is directly attached to this link, even if it does not match any
interface prefix.
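The TIME suffix rules used by the rtt, rttvar and rto_min attributes above can be sketched as a small normalization routine. The helper name to_msec is my own, not part of iproute2; it only encodes the suffix convention described in the text (s/sec/secs for seconds, ms/msec/msecs for milliseconds, bare numbers passed through as raw values).

```shell
#!/bin/sh
# Normalize a TIME value to milliseconds, per the suffix rules above.
to_msec() {
    case "$1" in
        *msecs) echo "${1%msecs}" ;;            # milliseconds, long suffix
        *msec)  echo "${1%msec}" ;;
        *ms)    echo "${1%ms}" ;;
        *secs)  echo "$(( ${1%secs} * 1000 ))" ;;  # seconds -> ms
        *sec)   echo "$(( ${1%sec} * 1000 ))" ;;
        *s)     echo "$(( ${1%s} * 1000 ))" ;;
        *)      echo "$1" ;;                     # raw value, passed through
    esac
}

to_msec 3sec    # prints 3000
to_msec 250ms   # prints 250
```

Note that the case patterns are ordered longest-suffix first, so "3sec" is not misparsed by the bare *s pattern.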
ip route add default via 192.168.1.1 dev eth0
Adds a default route (for all addresses) via the local gateway 192.168.1.1 that can be
reached on device eth0.
› SEE ALSO
ip(8)
› AUTHOR
Original Manpage by Michail Litvak <[email protected]>
IP-RULE
› NAME
ip-rule - routing policy database management
› SYNOPSIS
ip [ OPTIONS ] rule { COMMAND | help }
ip rule [ list | add | del | flush ] SELECTOR ACTION
SELECTOR := [ from PREFIX ] [ to PREFIX ] [ tos TOS ] [ fwmark FWMARK[/MASK]
] [ iif STRING ] [ oif STRING ] [ pref NUMBER ]
ACTION := [ table TABLE_ID ] [ nat ADDRESS ] [ prohibit | unreachable ] [ realms
[SRCREALM/]DSTREALM ]
TABLE_ID := [ local | main | default | NUMBER ]
› DESCRIPTION
ip rule manipulates rules in the routing policy database, which controls the route selection
algorithm.
Classic routing algorithms used in the Internet make routing decisions based only on the
destination address of packets (and in theory, but not in practice, on the TOS field).
In some circumstances we want to route packets differently depending not only on
destination addresses, but also on other packet fields: source address, IP protocol,
transport protocol ports or even packet payload. This task is called ‘policy routing’.
To solve this task, the conventional destination based routing table, ordered according to
the longest match rule, is replaced with a ‘routing policy database’ (or RPDB), which
selects routes by executing some set of rules.
Each policy routing rule consists of a selector and an action predicate. The RPDB is
scanned in order of decreasing priority. The selector of each rule is applied to {source
address, destination address, incoming interface, tos, fwmark} and, if the selector matches
the packet, the action is performed. The action predicate may return success; in that
case, it will either give a route or a failure indication, and the RPDB lookup is
terminated. Otherwise, the RPDB program continues with the next rule.
Semantically, the natural action is to select the nexthop and the output device.
At startup time the kernel configures the default RPDB consisting of three rules:
1.
Priority: 0, Selector: match anything, Action: lookup routing table local (ID 255).
The local table is a special routing table containing high priority control routes for
local and broadcast addresses.
2.
Priority: 32766, Selector: match anything, Action: lookup routing table main (ID
254). The main table is the normal routing table containing all non-policy routes.
This rule may be deleted and/or overridden with other ones by the administrator.
3.
Priority: 32767, Selector: match anything, Action: lookup routing table default (ID
253). The default table is empty. It is reserved for some post-processing if no
previous default rules selected the packet. This rule may also be deleted.
Each RPDB entry has additional attributes. For example, each rule has a pointer to some
routing table. NAT and masquerading rules have an attribute to select the new IP address
to translate/masquerade. Besides that, rules have some optional attributes that routes have,
namely realms. These values do not override those contained in the routing tables. They
are only used if the route did not select any attributes.
The RPDB may contain rules of the following types:
unicast - the rule prescribes to return the route found in the routing table referenced
by the rule.
blackhole - the rule prescribes to silently drop the packet.
unreachable - the rule prescribes to generate a ‘Network is unreachable’ error.
prohibit - the rule prescribes to generate a 'Communication is administratively
prohibited' error.
nat - the rule prescribes to translate the source address of the IP packet into some
other value.
ip rule flush - also dumps all the deleted rules. This command has no arguments.
ip rule show - list rules. This command has no arguments. The options list or lst are
synonyms with show.
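Putting the selectors and actions together, a simple policy-routing setup might look like the following session. It requires root, and the source prefix, gateway and table number 100 are assumed values for illustration, not from the man page:

```
# Route traffic sourced from 10.0.0.0/24 via a dedicated table:
ip rule add from 10.0.0.0/24 table 100 pref 1000
ip route add default via 10.0.0.1 table 100
ip rule show
```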
› SEE ALSO
ip(8)
› AUTHOR
Original Manpage by Michail Litvak <[email protected]>
IP-TCP_METRICS
› NAME
ip-tcp_metrics - management for TCP Metrics
› SYNOPSIS
ip [ OPTIONS ] tcp_metrics { COMMAND | help }
ip tcp_metrics { show | flush } SELECTOR
ip tcp_metrics delete [ address ] ADDRESS
SELECTOR := [ [ address ] PREFIX ]
› DESCRIPTION
ip tcp_metrics is used to manipulate entries in the kernel that keep TCP information for
IPv4 and IPv6 destinations. The entries are created when TCP sockets want to share
information for destinations and are stored in a cache keyed by the destination address.
The saved information may include values for metrics (initially obtained from routes),
recent TSVAL for TIME-WAIT recycling purposes, state for the Fast Open feature, etc.
For performance reasons the cache cannot grow above a configured limit, and older
entries are replaced with fresh information, sometimes reclaimed and used for new
destinations. The kernel itself never removes entries; they can be flushed only with this tool.
ip tcp_metrics flush ::/0
Removes all IPv6 entries from the cache, keeping the IPv4 entries.
› SEE ALSO
ip(8)
› AUTHOR
Original Manpage by Julian Anastasov <[email protected]>
IP-TOKEN
› NAME
ip-token - tokenized interface identifier support
› SYNOPSIS
ip token { COMMAND | help }
ip token { set } TOKEN dev DEV
ip token { get } dev DEV
ip token { list }
› DESCRIPTION
IPv6 tokenized interface identifier support is used for assigning well-known host-part
addresses to nodes whilst still obtaining a global network prefix from router
advertisements. The primary target for tokenized identifiers are server platforms where
addresses are usually manually configured, rather than using DHCPv6 or SLAAC. By
using tokenized identifiers, hosts can still determine their network prefix by use of
SLAAC, but can more readily be automatically renumbered should their network prefix
change [1]. Tokenized IPv6 identifiers are described in the draft [1]: <draft-chown-6man-
tokenised-ipv6-identifiers-02>.
set the interface token to the kernel. Once a token is set, it cannot be removed from the
interface, only overwritten.
TOKEN
the interface identifier token address.
dev DEV
the networking interface.
list all tokenized interface identifiers for the networking interfaces from the kernel.
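For example (requires root; the device eth0 and the token value ::1a:2b:3c:4d are assumed for illustration):

```
# Set a fixed host-part token on eth0, then read it back:
ip token set ::1a:2b:3c:4d dev eth0
ip token get dev eth0
ip token list
```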
› SEE ALSO
ip(8)
› AUTHOR
Manpage by Daniel Borkmann
IP-TUNNEL
› NAME
ip-tunnel - tunnel configuration
› SYNOPSIS
ip [ OPTIONS ] tunnel { COMMAND | help }
ip tunnel { add | change | del | show | prl } [ NAME ] [ mode MODE ] [ remote ADDR ]
[ local ADDR ] [ [i|o]seq ] [ [i|o]key KEY ] [ [i|o]csum ] ] [ encaplimit ELIM ] [ ttl TTL ]
[ tos TOS ] [ flowlabel FLOWLABEL ] [ prl-default ADDR ] [ prl-nodefault ADDR ] [
prl-delete ADDR ] [ [no]pmtudisc ] [ dev PHYS_DEV ]
MODE := { ipip | gre | sit | isatap | ip6ip6 | ipip6 | any }
ADDR := { IP_ADDRESS | any }
TOS := { STRING | 00..ff | inherit | inherit/STRING | inherit/00..ff }
ELIM := { none | 0..255 }
TTL := { 1..255 | inherit }
KEY := { DOTTED_QUAD | NUMBER }
TIME := NUMBER[s|ms]
› DESCRIPTION
tunnel objects are tunnels, encapsulating packets in IP packets and then sending them over
the IP infrastructure. The encapsulating (or outer) address family is specified by the -f
option. The default is IPv4.
ip tunnel add
add a new tunnel
ip tunnel change
change an existing tunnel
ip tunnel delete
destroy a tunnel
name NAME (default)
select the tunnel device name.
mode MODE
set the tunnel mode. Available modes depend on the encapsulating address family.
Modes for IPv4 encapsulation available: ipip, sit, isatap and gre. Modes for IPv6
encapsulation available: ip6ip6, ipip6 and any.
remote ADDRESS
set the remote endpoint of the tunnel.
local ADDRESS
set the fixed local address for tunneled packets. It must be an address on another
interface of this host.
ttl N
set a fixed TTL N on tunneled packets. N is a number in the range 1-255. 0 is a
special value meaning that packets inherit the TTL value. The default value for IPv4
tunnels is inherit. The default value for IPv6 tunnels is 64.
tos T
dsfield T
tclass T
set the type of service (IPv4) or traffic class (IPv6) field on tunneled packets, which
can be specified as either a two-digit hex value (e.g. c0) or a predefined string (e.g.
internet). The value inherit causes the field to be copied from the original IP header.
The values inherit/STRING or inherit/00..ff will set the field to STRING or 00..ff
when tunneling non-IP packets. The default value is 00.
dev NAME
bind the tunnel to the device NAME so that tunneled packets will only be routed via
this device and will not be able to escape to another device when the route to
endpoint changes.
nopmtudisc
disable Path MTU Discovery on this tunnel. It is enabled by default. Note that a fixed
ttl is incompatible with this option: tunneling with a fixed ttl always makes pmtu
discovery.
key K
ikey K
okey K
(only GRE tunnels) use keyed GRE with key K. K is either a number or an
IP-address-like dotted quad. The key parameter sets the key to use in both directions.
The ikey and okey parameters set different keys for input and output.
csum, icsum, ocsum
(only GRE tunnels) generate/require checksums for tunneled packets. The ocsum
flag calculates checksums for outgoing packets. The icsum flag requires that all input
packets have the correct checksum. The csum flag is equivalent to the combination
icsum ocsum.
seq, iseq, oseq
(only GRE tunnels) serialize packets. The oseq flag enables sequencing of
outgoing packets. The iseq flag requires that all input packets are serialized. The seq
flag is equivalent to the combination iseq oseq. This feature does not work reliably,
so do not use it.
encaplim ELIM
(only IPv6 tunnels) set a fixed encapsulation limit. Default is 4.
flowlabel FLOWLABEL
(only IPv6 tunnels) set a fixed flowlabel.
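Combining the options above, a keyed GRE tunnel could be created like this (requires root; the interface name, endpoint addresses and key are assumed values for illustration):

```
# A GRE tunnel between 203.0.113.1 (local) and 203.0.113.10 (remote),
# with checksums and a shared key used in both directions:
ip tunnel add gre1 mode gre local 203.0.113.1 remote 203.0.113.10 ttl 64 key 42 csum
ip link set gre1 up
```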
ID
is specified by a source address, destination address, transform protocol XFRM-
PROTO, and/or Security Parameter Index SPI. (For IP Payload Compression, the
Compression Parameter Index or CPI is used for SPI.)
XFRM-PROTO
specifies a transform protocol: IPsec Encapsulating Security Payload (esp), IPsec
Authentication Header (ah), IP Payload Compression (comp), Mobile IPv6 Type 2
Routing Header (route2), or Mobile IPv6 Home Address Option (hao).
ALGO-LIST
contains one or more algorithms to use. Each algorithm ALGO is specified by:
the algorithm type: encryption (enc), authentication (auth or auth-trunc),
authenticated encryption with associated data (aead), or compression (comp) the
algorithm name ALGO-NAME (see below) (for all except comp) the keying material
ALGO-KEYMAT, which may include both a key and a salt or nonce value; refer to
the corresponding RFC (for auth-trunc only) the truncation length ALGO-TRUNC-
LEN in bits (for aead only) the Integrity Check Value length ALGO-ICV-LEN in bits
source and destination addresses, with masks, source and destination ports and
protocol for selection of packets. The source and destination ports are only legal if
the transport protocol is TCP or UDP. A port can be specified as either decimal,
hexadecimal (leading 0x), octal (leading 0) or a name listed in the first column of
/etc/services. A transport protocol can be specified as either decimal, hexadecimal
(leading 0x), octal (leading 0) or a name listed in the first column of /etc/protocols. If
a transport protocol or port is not specified then it defaults to 0 which means all
protocols or all ports respectively.
protocol (proto), indicating (together with the effective destination and the security
parameters index) which Security Association should be used to process the packet
Security Parameters Index (spi), indicating (together with the effective destination
and protocol) which Security Association should be used to process the packet (must
be larger than or equal to 0x100)
effective destination (edst), where the packet should be forwarded after processing
(normally the other security gateway)
OR
SAID (said), indicating which Security Association should be used to process the
packet
Addresses are written as IPv4 dotted quads or IPv6 coloned hex, protocol is one of “ah”,
“esp”, “comp” or “tun” and SPIs are prefixed hexadecimal numbers where ‘.’ represents
IPv4 and ‘:’ stands for IPv6.
SAIDs are written as “protoafSPI@address”. There are also 5 “magic” SAIDs which have
special meaning:
%reject means that matches are to be dropped and an ICMP returned, if possible, to
inform the sender
%trap means that matches are to trigger an ACQUIRE message to the Key
Management daemon(s), and a hold eroute will be put in place to prevent subsequent
packets also triggering ACQUIRE messages
%hold means that matches are to be stored until the eroute is replaced or until that
eroute gets reaped
%pass means that matches are to be allowed to pass without IPSEC processing
tunnelling code
tunnel-xmit
pfkey
xform
eroute
spi
radij
esp
ah
ipcomp
ip compression transforms code
verbose
give even more information. BEWARE: (a) this will print authentication and
encryption keys in the logs; (b) this will probably trample the 4k kernel printk buffer,
giving inaccurate output.
All KLIPS debug output appears as kernel.info messages to syslogd(8). Most systems are
set up to log these messages to /var/log/messages. Beware that klipsdebug --all produces
a lot of output and the log file will grow quickly.
The file format for /proc/net/ipsec_klipsdebug is discussed in ipsec_klipsdebug(5).
› EXAMPLES
klipsdebug --all
IKE’s Job
Pluto
pluto runs as a daemon with userid root. Before running it, a few things must be set up.
pluto requires a working IPsec stack.
pluto supports multiple public networks (that is, networks that are considered insecure and
thus need to have their traffic encrypted or authenticated). It discovers the public
interfaces to use by looking at all interfaces that are configured (the --interface option
can be used to limit the interfaces considered). It does this only when whack tells it to
--listen, so the interfaces must be configured by then. Each interface with a name of the
form ipsec[0-9] is taken as a KLIPS virtual public interface. Another network interface
with the same IP address (the first one found will be used) is taken as the corresponding
real public interface. The --listen option can be used to limit listening to only one IP
address of a certain interface. ifconfig(8) or ip(8) with the -a flag will show the name and
status of each network interface.
pluto requires a database of preshared secrets and RSA private keys. This is described in
ipsec.secrets(5). pluto is told of RSA public keys via whack commands. If the
connection is Opportunistic, and no RSA public key is known, pluto will attempt to fetch
RSA keys using the Domain Name System.
The most basic network topology that pluto supports has two security gateways
negotiating on behalf of client subnets. The diagram of RGB’s testbed is a good example
(see klips/doc/rgb_setup.txt).
The file INSTALL in the base directory of this distribution explains how to start setting up
the whole system, including KLIPS.
Make sure that the security gateways have routes to each other. This is usually covered by
the default route, but may require issuing route(8) commands. The route must go through
a particular IP interface (we will assume it is eth0, but it need not be). The interface that
connects the security gateway to its client must be a different one.
It is necessary to issue an ipsec_tncfg(8) command on each gateway. The required
command is:
ipsec tncfg --attach --virtual ipsec0 --physical eth0
A command to set up the ipsec0 virtual interface will also need to be run. It will have the
same parameters as the command used to set up the physical interface to which it has just
been connected using ipsec_tncfg(8).
No special requirements are necessary to use NETKEY - it ships with all modern versions
of Linux 2.4 and 2.6. However, note that certain vendors or older distributions use old
versions or backports of NETKEY which are broken. If possible, use a NETKEY version
that is at least based on, or backported from, Linux 2.6.11 or newer.
ipsec.secrets file
A pluto daemon and another IKE daemon (for example, another instance of pluto) must
convince each other that they are who they are supposed to be before any negotiation can
succeed. This authentication is accomplished by using either secrets that have been shared
beforehand (manually) or by using RSA signatures. There are other techniques, but they
have not been implemented in pluto.
The file /etc/ipsec.secrets is used to keep preshared secret keys, RSA private keys, X.509
encoded keyfiles and XAUTH passwords. Smartcards are handled via NSS. For
debugging, there is an argument to the pluto command to use a different file. This file is
described in ipsec.secrets(5).
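For illustration, a minimal ipsec.secrets sketch; the addresses, identity, and secrets here are placeholders, not working values (the full syntax is in ipsec.secrets(5)):

```
# Preshared secret shared by the two gateway addresses (placeholders)
10.0.0.1 10.0.0.2 : PSK "hypothetical-shared-secret"

# XAUTH password for one user (placeholder)
@user1 : XAUTH "hypothetical-password"
```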
Running Pluto
To fire up the daemon, just type pluto (be sure to be running as the superuser). The default
IKE port number is 500, the UDP port assigned by IANA for IKE Daemons. pluto must
be run by the superuser to be able to use the UDP 500 port. If pluto is told to enable NAT-
Traversal, then UDP port 4500 is also taken by pluto to listen on.
Pluto supports different IPsec stacks on different operating systems. This can be configured
using one of the options —use-netkey (the default), —use-klips, —use-mast, —use-
bsdkame, —use-win2k or —use-nostack. The latter is meant for testing only - no actual
IPsec connections will be loaded into the kernel. The option —use-auto has been
obsoleted. On startup, pluto might also read the protostack= option to select the IPsec
stack to use if —config /etc/ipsec.conf is given as argument to pluto. If both —use-XXX
and —config /etc/ipsec.conf are specified, the last command line argument specified takes
precedence.
Pluto supports RFC 3947 NAT-Traversal. The allowed range behind the NAT routers is
submitted using the —virtual-private option. See ipsec.conf(5) for the syntax. The option
—force-keepalive forces the sending of the keep-alive packets, which are sent to prevent
the NAT router from closing its port when there is not enough traffic on the IPsec
connection. The —keep-alive option sets the delay (in seconds) of these keep-alive packets. The
newer NAT-T standards support port floating, and Libreswan enables this per default.
Pluto supports the use of X.509 certificates and sends its certificate when needed. This can
confuse IKE implementations that do not implement this, such as the old FreeS/WAN
implementation. The —nocrsend option prevents pluto from sending these. At startup, pluto
loads all the X.509 related files from the directories /etc/ipsec.d/certs, /etc/ipsec.d/cacerts,
/etc/ipsec.d/aacerts, /etc/ipsec.d/private and /etc/ipsec.d/crls. The Certificate Revocation
Lists can also be retrieved from an URL. The option —crlcheckinterval sets the time
between checking for CRL expiration and issuing new fetch commands. The first attempt
to update a CRL is started at 2*crlcheckinterval before the next update time. Pluto logs a
warning if no valid CRL was loaded or obtained for a connection. If —strictcrlpolicy is
given, the connection will be rejected until a valid CRL has been loaded.
Pluto can also use helper children to off-load cryptographic operations. This behavior can
be fine-tuned using the —nhelpers option. Pluto will start (n-1) of them, where n is the number
of CPUs you have (including hyperthreaded CPUs). A value of 0 forces pluto to do all
operations in the main process. A value of -1 tells pluto to perform the above calculation.
Any other value forces the number to that amount.
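The automatic calculation above can be sketched as a few lines of shell; nproc stands in for however the CPU count is obtained and is not part of pluto itself:

```shell
# Sketch of the -1 (automatic) helper calculation described above:
# pluto starts n-1 helpers, where n is the CPU count (hyperthreads included).
n=$(nproc)
helpers=$((n - 1))
echo "$helpers"
```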
Pluto uses the NSS crypto library as its random source. Some government Three Letter
Agencies require that pluto read 440 bits from /dev/random and feed these into the NSS
RNG before drawing random data from the NSS library, despite the NSS library itself already
seeding its internal state. As this process can block pluto for an extended time, the default
is to not perform this redundant seeding. The —seedbits option can be used to specify the
number of bits that will be pulled from /dev/random and seeded into the NSS RNG. This
can also be accomplished by specifying seedbits in the “config setup” section of
ipsec.conf. This option should not be used by most people.
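The equivalent ipsec.conf form mentioned above would look like this sketch:

```
config setup
    seedbits=440
```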
pluto attempts to create a lockfile with the name /var/run/pluto/pluto.pid. If the lockfile
cannot be created, pluto exits - this prevents multiple plutos from competing. Any
“leftover” lockfile must be removed before pluto will run. pluto writes its PID into this
file so that scripts can find it. This lock will not function properly if it is on an NFS
volume (but sharing locks on multiple machines doesn’t make sense anyway).
pluto then forks and the parent exits. This is the conventional “daemon fork”. It can make
debugging awkward, so there is an option to suppress this fork. In certain configurations,
pluto might also launch helper programs to assist with DNS queries or to offload
cryptographic operations.
All logging, including diagnostics, is sent to syslog(3) with facility=authpriv; it decides
where to put these messages (possibly in /var/log/secure or /var/log/auth.log). Since this
too can make debugging awkward, the option —stderrlog is used to steer logging to
stderr.
Alternatively, —logfile can be used to send all logging information to a specific file.
If the —perpeerlog option is given, then pluto will open a log file per connection. By
default, this is in /var/log/pluto/peer, in a subdirectory formed by turning every dot (.) [IPv4]
or colon (:) [IPv6] in the peer's address into a slash (/).
The base directory can be changed with the —perpeerlogbase option.
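The subdirectory naming can be sketched with tr; the base path and peer address here are illustrative examples:

```shell
# Sketch: per-peer log directory, formed by replacing dots (IPv4) or
# colons (IPv6) in the peer address with slashes under the base directory.
base=/var/log/pluto/peer      # default base directory
peer=10.0.1.2                 # example peer address
dir="$base/$(printf '%s' "$peer" | tr '.:' '//')"
echo "$dir"                   # /var/log/pluto/peer/10/0/1/2
```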
Once pluto is started, it waits for requests from whack.
To understand how to use pluto, it is helpful to understand a little about its internal state.
Furthermore, the terminology is needed to decipher some of the diagnostic messages.
Pluto supports food groups, and X.509 certificates. These are located in /etc/ipsec.d, or
another directory as specified by —ipsecdir.
Pluto may core dump. It will normally do so into the current working directory. You can
specify the —coredir option for pluto, or specify the dumpdir= option in ipsec.conf.
If you are investigating a potential memory leak in pluto, start pluto with the —leak-
detective option. Before the leak causes the system or pluto to die, shut down pluto in the
regular way. pluto will display a list of leaks it has detected.
The (potential) connection database describes attributes of a connection. These include the
IP addresses of the hosts and client subnets and the security characteristics desired. pluto
requires this information (simply called a connection) before it can respond to a request to
build an SA. Each connection is given a name when it is created, and all references are
made using this name.
During the IKE exchange to build an SA, the information about the negotiation is
represented in a state object. Each state object reflects how far the negotiation has
reached. Once the negotiation is complete and the SA established, the state object remains
to represent the SA. When the SA is terminated, the state object is discarded. Each State
object is given a serial number and this is used to refer to the state objects in logged
messages.
Each state object corresponds to a connection and can be thought of as an instantiation of
that connection. At any particular time, there may be any number of state objects
corresponding to a particular connection. Often there is one representing an ISAKMP SA
and another representing an IPsec SA.
KLIPS hooks into the routing code in a LINUX kernel. Traffic to be processed by an
IPsec SA must be directed through KLIPS by routing commands. Furthermore, the
processing to be done is specified by ipsec eroute(8) commands. pluto takes the
responsibility of managing both of these special kinds of routes.
NETKEY requires no special routing.
Each connection may be routed, and must be while it has an IPsec SA. The connection
specifies the characteristics of the route: the interface on this machine, the “gateway” (the
nexthop), and the peer’s client subnet. Two connections may not be simultaneously routed
if they are for the same peer’s client subnet but use different interfaces or gateways
(pluto's logic does not reflect any advanced routing capabilities).
On KLIPS, each eroute is associated with the state object for an IPsec SA because it has
the particular characteristics of the SA. Two eroutes conflict if they specify the identical
local and remote clients (unlike for routes, the local clients are taken into account).
When pluto needs to install a route for a connection, it must make sure that no conflicting
route is in use. If another connection has a conflicting route, that route will be taken down,
as long as there is no IPsec SA instantiating that connection. If there is such an IPsec SA,
the attempt to install a route will fail.
There is an exception. If pluto, as Responder, needs to install a route to a fixed client
subnet for a connection, and there is already a conflicting route, then the SAs using the
route are deleted to make room for the new SAs. The rationale is that the new connection
is probably more current. The need for this usually is a product of Road Warrior
connections (these are explained later; they cannot be used to initiate).
When pluto needs to install an eroute for an IPsec SA (for a state object), first the state
object’s connection must be routed (if this cannot be done, the eroute and SA will not be
installed). If a conflicting eroute is already in place for another connection, the eroute and
SA will not be installed (but note that the routing exception mentioned above may have
already deleted potentially conflicting SAs). If another IPsec SA for the same connection
already has an eroute, all its outgoing traffic is taken over by the new eroute. The
incoming traffic will still be processed. This characteristic is exploited during rekeying.
All of these routing characteristics are expected to change when KLIPS and NETKEY
merge into a single new stack.
Using whack
whack is used to command a running pluto. whack uses a UNIX domain socket to speak
to pluto (by default, /var/pluto.ctl).
whack has an intricate argument syntax. This syntax allows many different functions to be
specified. The help form shows the usage or version information. The connection form
gives pluto a description of a potential connection. The public key form informs pluto of
the RSA public key for a potential peer. The delete form deletes a connection description
and all SAs corresponding to it. The listen form tells pluto to start or stop listening on the
public interfaces for IKE requests from peers. The route form tells pluto to set up routing
for a connection; the unroute form undoes this. The initiate form tells pluto to negotiate an
SA corresponding to a connection. The terminate form tells pluto to remove all SAs
corresponding to a connection, including those being negotiated. The status form displays
pluto's internal state. The debug form tells pluto to change the selection of debugging
output “on the fly”. The shutdown form tells pluto to shut down, deleting all SAs.
The crash option asks pluto to consider a particular target IP to have crashed, and to
attempt to restart all connections with that IP address as a gateway. In general, you should
use Dead Peer Detection to detect this kind of situation automatically, but this is not
always possible.
Most options are specific to one of the forms, and will be described with that form. There
are three options that apply to all forms.
—ctlbase path
path.ctl is used as the UNIX domain socket for talking to pluto. This option
facilitates debugging.
—label string
—version
The connection form describes a potential connection to pluto. pluto needs to know what
connections can and should be negotiated. When pluto is the initiator, it needs to know
what to propose. When pluto is the responder, it needs to know enough to decide whether
it is willing to set up the proposed connection.
The description of a potential connection can specify a large number of details. Each
connection has a unique name. This name will appear in an updown shell command, so it
should not contain punctuation that would make the command ill-formed.
—name connection-name
—id identity
the identity of the end. Currently, this can be an IP address (specified as dotted quad
or as a Fully Qualified Domain Name, which will be resolved immediately) or as a
Fully Qualified Domain Name itself (prefixed by “@” to signify that it should not be
resolved), or as user@FQDN, or an X.509 DN, or as the magic value %myid. Pluto
only authenticates the identity, and does not use it for addressing, so, for example, an
IP address need not be the one to which packets are to be sent. If the option is absent,
the identity defaults to the IP address specified by —host. %myid allows the identity
to be separately specified (by the pluto or whack option —myid or by the
ipsec.conf(5) config setup parameter myid). Otherwise, pluto tries to guess what
%myid should stand for: the IP address of %defaultroute, if it is supported by a
suitable TXT record in the reverse domain for that IP address, or the system’s
hostname, if it is supported by a suitable TXT record in its forward domain.
—host ip-address
the IP address of the end (generally the public interface). If pluto is to act as a
responder for IKE negotiations initiated from unknown IP addresses (the “Road
Warrior” case), the IP address should be specified as %any (currently, the obsolete
notation 0.0.0.0 is also accepted for this). If pluto is to opportunistically initiate the
connection, use %opportunistic.
—cert filename
The filename of the X.509 certificate. This must be the public key certificate only,
and cannot be the PKCS#12 certificate file. See ipsec.conf(5) on how to extract this
from the PKCS#12 file.
—ca distinguished name
the X.509 Certificate Authority’s Distinguished Name (DN) used as trust anchor for
this connection. This is the CA certificate that signed the host certificate, as well as
the certificate of the incoming client.
—sendcert yes|forced|always|ifasked|no|never
Whether or not to send our X.509 certificate credentials. This could potentially give
an attacker too much information about which identities are allowed to connect to
an attacker too much information about which identities are allowed to connect to
this host. The default is to use ifasked when we are a Responder, and to use yes
(which is the same as forced and always) when we are an Initiator. The values no and
never are equivalent. NOTE: "forced" does not seem to be actually implemented - do
not use it.
—sendca none|issuer|all
How much of our available X.509 trust chain to send with the end certificate,
excluding any root CAs. Specifying issuer sends just the issuing intermediate CA,
while all will send the entire chain of intermediate CAs. none will not send any CA
certs. The default is none, which maintains the current libreswan behavior.
—certtype number
—ikeport port-number
the UDP port that IKE listens to on that host. The default is 500. (pluto on this
machine uses the port specified by its own command line argument, so this only
affects where pluto sends messages.)
—nexthop ip-address
where to route packets for the peer’s client (presumably for the peer too, but it will
not be used for this). When pluto installs an IPsec SA, it issues a route command. It
uses the nexthop as the gateway. The default is the peer’s IP address (this can be
explicitly written as %direct; the obsolete notation 0.0.0.0 is accepted). This option
is necessary if pluto's host's interface used for sending packets to the peer is neither
point-to-point nor directly connected to the peer.
—client subnet
the subnet for which the IPsec traffic will be destined. If not specified, the host will
be the client. The subnet can be specified in any of the forms supported by
ipsec_atosubnet(3). The general form is address/mask. The address can be either a
domain name or four decimal numbers (specifying octets) separated by dots. The
most convenient form of the mask is a decimal integer, specifying the number of
leading one bits in the mask. So, for example, 10.0.0.0/8 would specify the class A
network “Net 10”.
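As a hypothetical illustration of the mask notation (not a pluto command), the decimal-integer form simply counts leading one bits; expanding it into the equivalent dotted-quad mask:

```shell
# Sketch: expand a prefix length (number of leading one bits) into a
# dotted-quad netmask, e.g. /8 for the class A network "Net 10".
prefix=8
m=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
mask="$(( (m >> 24) & 255 )).$(( (m >> 16) & 255 )).$(( (m >> 8) & 255 )).$(( m & 255 ))"
echo "10.0.0.0/$prefix has netmask $mask"
```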
—clientwithin subnet
This option is obsolete and will be removed. Do not use this option anymore.
—clientprotoport protocol/port
specify the Port Selectors (filters) to be used on this connection. The general form is
protocol/port. This is most commonly used to limit the connection to L2TP traffic
only by specifying a value of 17/1701 for UDP (protocol 17) and port 1701. The
notation 17/%any can be used to allow all UDP traffic and is needed for L2TP
connections with Windows XP machines before Service Pack 2.
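In ipsec.conf(5) terms, the same L2TP filter is usually expressed with the protoport parameters; a sketch (the connection name is arbitrary):

```
conn l2tp-example
    leftprotoport=17/1701
    rightprotoport=17/%any
```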
—srcip ip-address
the IP address for this host to use when transmitting a packet to the remote IPsec
gateway itself. This option is used to make the gateway itself use its internal IP,
which is part of the —client subnet. Otherwise it will use its nearest IP address,
which is its public IP address, which is not part of the subnet-to-subnet IPsec tunnel and
would therefore not get encrypted.
—xauthserver
this end is an xauthserver. It will look up the xauth username and password and
verify these before allowing the connection to be established.
—xauthclient
this end is an xauthclient. Bringing this connection up with —initiate also
requires the client to specify —xauthuser username and —xauthpass password.
—xauthuser
The username for the xauth authentication. This option is normally passed along by
ipsec_auto(8) when an xauth connection is started using ipsec auto —up conn
—xauthpass
The password for the xauth authentication. This option is normally passed along by
ipsec_auto(8) when an xauth connection is started using ipsec auto —up conn
—modecfgserver
—modecfgclient
—modecfgdns1
The IP address of the first DNS server to pass along to the ModeConfig Client
—modecfgdns2
The IP address of the second DNS server to pass along to the ModeConfig Client
—dnskeyondemand
specifies that when an RSA public key is needed to authenticate this host, and it isn’t
already known, fetch it from DNS.
—updown updown
—to
separates the specification of the left and right ends of the connection. Pluto tries to
decide whether it is left or right based on the information provided on both sides of
this option.
The potential connection description also specifies characteristics of rekeying and security.
—psk
Propose and allow preshared secret authentication for IKE peers. This authentication
requires that each side use the same secret. May be combined with —rsasig; at least
one must be specified.
—rsasig
Propose and allow RSA signatures for authentication of IKE peers. This
authentication requires that each side have a private key of its own and know
the public key of its peer. May be combined with —psk; at least one must be
specified.
—encrypt
All proposed or accepted IPsec SAs will include non-null ESP. The actual choices of
transforms are wired into pluto.
—authenticate
All proposed IPsec SAs will include AH. All accepted IPsec SAs will include AH or
ESP with authentication. The actual choices of transforms are wired into pluto. Note
that this has nothing to do with IKE authentication.
—compress
All proposed IPsec SAs will include IPCOMP (compression). This will be ignored if
KLIPS is not configured with IPCOMP support.
—tunnel
the IPsec SA should use tunneling. Implicit if the SA is for clients. Must only be used
with —authenticate or —encrypt.
—ipv4
The host addresses will be interpreted as IPv4 addresses. This is the default. Note that
for a connection, all host addresses must be of the same Address Family (IPv4 and
IPv6 use different Address Families).
—ipv6
The host addresses (including nexthop) will be interpreted as IPv6 addresses. Note
that for a connection, all host addresses must be of the same Address Family (IPv4
and IPv6 use different Address Families).
—tunnelipv4
The client addresses will be interpreted as IPv4 addresses. The default is to match
what the host will be. This does not imply —tunnel so the flag can be safely used
when no tunnel is actually specified. Note that for a connection, all tunnel addresses
must be of the same Address Family.
—tunnelipv6
The client addresses will be interpreted as IPv6 addresses. The default is to match
what the host will be. This does not imply —tunnel so the flag can be safely used
when no tunnel is actually specified. Note that for a connection, all tunnel addresses
must be of the same Address Family.
—pfs
There should be Perfect Forward Secrecy - new keying material will be generated for
each IPsec SA rather than being derived from the ISAKMP SA keying material. Since
the group to be used cannot be negotiated (a dubious feature of the standard), pluto
will propose the same group that was used during Phase 1. We don’t implement a
stronger form of PFS which would require that the ISAKMP SA be deleted after the
IPSEC SA is negotiated.
—disablearrivalcheck
If the connection is a tunnel, allow packets arriving through the tunnel to have any
source and destination addresses.
—esp esp-algos
—aggrmode
This tunnel is using aggressive mode ISAKMP negotiation. The default is main
mode. Aggressive mode is less secure than main mode as it reveals your identity to
an eavesdropper, but is needed to support road warriors using PSK keys or to
interoperate with other buggy implementations insisting on using aggressive mode.
—modecfgpull
—dpddelay seconds
Set the delay (in seconds) between Dead Peer Detection (RFC 3706) keepalives
(R_U_THERE, R_U_THERE_ACK) that are sent for this connection (default 30
seconds).
—timeout seconds
Set the length of time (in seconds) we will idle without hearing either an
R_U_THERE poll from our peer, or an R_U_THERE_ACK reply. After this period
has elapsed with no response and no traffic, we will declare the peer dead, and
remove the SA (default 120 seconds).
—dpdaction action
When a DPD-enabled peer is declared dead, what action should be taken.
hold (the default) means the eroute will be put into %hold status, while clear means the
eroute and SA will both be cleared. clear is really only useful on the server of a
Road Warrior config. The action restart is used on tunnels that need to be
permanently up and have static IP addresses. The action restart_by_peer has been
obsoleted and its functionality has been moved into the restart action.
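In ipsec.conf(5) terms, the DPD settings above map onto per-conn parameters; a sketch for a permanently-up tunnel (the connection name is arbitrary):

```
conn permanent-tunnel
    dpddelay=30
    dpdtimeout=120
    dpdaction=restart
```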
—forceencaps
In some cases, for example when ESP packets are filtered or when a broken IPsec
peer does not properly recognise NAT, it can be useful to force RFC-3948
encapsulation using this option. It causes pluto to lie and tell the remote peer that RFC-3948
encapsulation (ESP in UDP port 4500 packets) is required.
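In ipsec.conf(5), the same behavior is normally requested with forceencaps; a sketch (the connection name is arbitrary):

```
conn behind-broken-nat
    forceencaps=yes
```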
Only initiate the connection when we have traffic to send over the connection.
—pass
—drop
—reject
Drop unencrypted traffic and send an ICMP message notifying the other end.
to be documented
—failpass
to be documented
—faildrop
to be documented
—failreject
to be documented
—listall
—listpubkeys
list all the public keys that have been successfully loaded.
—listcerts
—checkpubkeys
list all the loaded X.509 certificates which are about to expire or have expired.
—listcacerts
list all the X.509 Certificate Authority (CA) certificates that are currently loaded.
—listacerts
list all the X.509 Attribute certificates that are currently loaded
—listaacerts
—listgroups
—listcrls
—ikelifetime seconds
how long pluto will propose that an ISAKMP SA be allowed to live. The default is
3600 (one hour) and the maximum is 86400 (one day). This option will not affect what
is accepted. pluto will reject proposals that exceed the maximum.
—ipseclifetime seconds
how long pluto will propose that an IPsec SA be allowed to live. The default is
28800 (eight hours) and the maximum is 86400 (one day). This option will not affect
what is accepted. pluto will reject proposals that exceed the maximum.
—rekeymargin seconds
how long before an SA’s expiration should pluto try to negotiate a replacement SA.
This will only happen if pluto was the initiator. The default is 540 (nine minutes).
—rekeyfuzz percentage
—keyingtries count
how many times pluto should try to negotiate an SA, either for the first time or for
rekeying. A value of 0 is interpreted as a very large number: never give up. The
default is three.
—dontrekey
A misnomer. Only rekey a connection if we were the Initiator and there was recent
traffic on the existing connection. This applies to Phase 1 and Phase 2. This is
currently the only automatic way for a connection to terminate. It may be useful with
Road Warrior or Opportunistic connections. Since SA lifetime negotiation is take-it-or-leave-it,
a Responder normally uses the shorter of the negotiated or the configured
lifetime. This only works because if the lifetime is shorter than negotiated, the
Responder will rekey in time so that everything works. This interacts badly with
—dontrekey. In this case, the Responder will end up rekeying to rectify a shortfall in
an IPsec SA lifetime; for an ISAKMP SA, the Responder will accept the negotiated
lifetime.
—delete
when used in the connection form, it causes any previous connection with this name
to be deleted before this one is added. Unlike a normal delete, no diagnostic is
produced if there was no previous connection to delete. Any routing in place for the
connection is undone.
The delete form deletes a named connection description and any SAs established or
negotiations initiated using this connection. Any routing in place for the connection is
undone.
—deletestate state-number
The deletestate form deletes the state object with the specified serial number. This is
useful for selectively deleting instances of connections.
The route form of the whack command tells pluto to set up routing for a connection.
It is like a traditional route, except that it uses an ipsec device as a virtual interface. Once
routing is set up, no packets will be sent “in the clear” to the peer’s client specified in the
connection. A TRAP shunt eroute will be installed; if outbound traffic is caught, Pluto will
initiate the connection. An explicit whack route is not always needed: if it hasn’t been
done when an IPsec SA is being installed, one will be automatically attempted.
—route, —name connection-name
When a routing is attempted for a connection, there must not already be a routing for
a different connection with the same subnet but different interface or destination, or if
there is, it must not be being used by an IPsec SA. Otherwise the attempt will fail.
The unroute form of the whack command tells pluto to undo a routing. pluto will
refuse if an IPsec SA is using the connection. If another connection is sharing the
same routing, it will be left in place. Without a routing, packets will be sent without
encryption or authentication.
The initiate form tells pluto to initiate a negotiation with another pluto (or other IKE
daemon) according to the named connection. Initiation requires a route that —route
would provide; if none is in place at the time an IPsec SA is being installed, pluto
attempts to set one up.
—initiate, —name connection-name, —asynchronous
The initiate form of the whack command will relay back from pluto status
information via the UNIX domain socket (unless —asynchronous is specified). The
status information is meant to look a bit like that from FTP. Currently whack simply
copies this to stderr. When the request is finished (e.g. the SAs are established or
pluto gives up), pluto closes the channel, causing whack to terminate.
This will cause pluto to attempt to opportunistically initiate a connection from here
to there, even if a previous attempt had been made. The whack log will show the
progress of this attempt.
Ending a connection
—terminate, —name connection-name
the terminate form tells pluto to delete any SAs that use the specified connection and
to stop any negotiations in progress. It does not prevent new negotiations from starting
(the delete form has this effect).
—crash ip-address
If the remote peer has crashed, and therefore did not notify us, we keep sending
encrypted traffic, and rejecting all plaintext (non-IKE) traffic from that remote peer.
The —crash option brings our end down as well for all the known connections to the
specified ip-address.
—whackrecordfilename, —whackstoprecord
this causes pluto to open the given filename for writing, and record each of the
messages received from whack or addconn. This continues until the whackstoprecord
option is used. This option may not be combined with any other command. The
start/stop commands are not recorded themselves. These files are usually used to
create input files for unit tests, particularly for complex setups where policies may in
fact overlap.
The format of the file consists of a line starting with #!pluto-whack and the date that
the file was started, as well as the hostname, and a linefeed. What follows are binary
format records consisting of a 32-bit record length in bytes, (including the length
record itself), a 64-bit timestamp, and then the literal contents of the whack message
that was received. All integers are in host format. In order to unambiguously determine
the host order, the first record is an empty record that contains only the current
WHACK_MAGIC value. This record is 16 bytes long.
The public key form informs pluto of the RSA public key for a potential peer. Private keys
must be kept secret, so they are kept in ipsec.secrets(5).
—keyid id
specifies the identity of the peer for which a public key should be used. Its form is
identical to the identity in the connection. If no public key is specified, pluto
attempts to find KEY records from DNS for the id (if a FQDN) or through reverse
lookup (if an IP address). Note that there are several interesting ways in which this is not
secure.
—addkey
specifies that the new key is added to the collection; otherwise the new key replaces
any old ones.
—pubkeyrsa key
specifies the value of the RSA public key. It is a sequence of bytes as described in
RFC 2537 “RSA/MD5 KEYs and SIGs in the Domain Name System (DNS)”. It is
denoted in a way suitable for ipsec_ttodata(3). For example, a base 64 numeral starts
with 0s.
The listen form tells pluto to start listening for IKE requests on its public interfaces. To
avoid race conditions, it is normal to load the appropriate connections into pluto before
allowing it to listen. If pluto isn’t listening, it is pointless to initiate negotiations, so it will
refuse requests to do so. Whenever the listen form is used, pluto looks for public
interfaces and will notice when new ones have been added and when old ones have been
removed. This is also the trigger for pluto to read the ipsec.secrets file. So the listen
form may usefully be used more than once.
—listen
—unlisten
The trafficstatus form will display the xauth username, add_time, and the total in and out
bytes of the IPsec SAs.
—trafficstatus
The shutdown form is the proper way to shut down pluto. It will tear down the SAs on
this machine that pluto has negotiated. It does not inform its peers, so the SAs on their
machines remain.
—shutdown
Examples
It would be normal to start pluto in one of the system initialization scripts. It needs to be
run by the superuser. Generally, no arguments are needed. To run it manually, the
superuser can simply type
ipsec pluto
The command will immediately return, but a pluto process will be left running, waiting
for requests from whack or a peer.
Using whack, several potential connections would be described:
ipsec whack —name silly —host 127.0.0.1 —to —host 127.0.0.2 —ikelifetime 900 —
ipseclifetime 800 —keyingtries 3
Since this silly connection description specifies neither encryption, authentication, nor
tunneling, it could only be used to establish an ISAKMP SA.
ipsec whack —name conn_name —host 10.0.0.1 —client 10.0.1.0/24 —to —
host 10.0.0.2 —client 10.0.2.0/24 —encrypt
This is something that must be done on both sides. If the other side is pluto, the same
whack command could be used on it (the command syntax is designed to not distinguish
which end is ours).
Now that the connections are specified, pluto is ready to handle requests and replies via
the public interfaces. We must tell it to discover those interfaces and start accepting
messages from peers:
ipsec whack —listen
If we don’t immediately wish to bring up a secure connection between the two clients, we
might wish to prevent insecure traffic. The routing form asks pluto to cause the packets
sent from our client to the peer’s client to be routed through the ipsec0 device; if there is
no SA, they will be discarded:
ipsec whack —route conn_name
Finally, we are ready to get pluto to initiate negotiation for an IPsec SA (and implicitly, an
ISAKMP SA):
ipsec whack —initiate —name conn_name
A small log of interesting events will appear on standard output (other logging is sent to
syslog).
whack can also be used to terminate pluto cleanly, tearing down all SAs that it has
negotiated.
ipsec whack —shutdown
Notification of any IPsec SA deletion, but not ISAKMP SA deletion, is sent to the peer.
Unfortunately, such Notification is not reliable. Furthermore, pluto itself ignores
Notifications.
The updown command
Whenever pluto brings a connection up or down, it invokes the updown command. This
command is specified using the —updown option. This allows for customized control
over routing and firewall manipulation.
The updown is invoked for five different operations. Each of these operations can be for
our client subnet or for our host itself.
prepare-host or prepare-client
is run before bringing up a new connection if no other connection with the same
clients is up. Generally, this is useful for deleting a route that might have been set up
before pluto was run or perhaps by some agent not known to pluto.
route-host or route-client
is run when bringing up a connection for a new peer client subnet (even if prepare-
host or prepare-client was run). The command should install a suitable route.
Routing decisions are based only on the destination (peer’s client) subnet address,
unlike eroutes which discriminate based on source too.
unroute-host or unroute-client
is run when bringing down the last connection for a particular peer client subnet. It
should undo what the route-host or route-client did.
up-host or up-client
is run when bringing up a tunnel eroute with a pair of client subnets that does not
already have a tunnel eroute. This command should install firewall rules as
appropriate. It is generally a good idea to allow IKE messages (UDP port 500) to travel
between the hosts.
down-host or down-client
is run when bringing down the eroute for a pair of client subnets. This command
should delete firewall rules as appropriate. Note that there may remain some inbound
IPsec SAs with these client subnets.
The script is passed a large number of environment variables to specify what needs to be
done.
PLUTO_VERSION
indicates what version of this interface is being used. This document describes
version 1.1. This is upwardly compatible with version 1.0.
PLUTO_VERB
PLUTO_CONNECTION
PLUTO_NEXT_HOP
is the next hop to which packets bound for the peer must be sent.
PLUTO_INTERFACE
PLUTO_ME
is the IP address of our host.
PLUTO_MY_CLIENT
is the IP address / count of our client subnet. If the client is just the host, this will be
the host’s own IP address / max (where max is 32 for IPv4 and 128 for IPv6).
PLUTO_MY_CLIENT_NET
is the IP address of our client net. If the client is just the host, this will be the host’s
own IP address.
PLUTO_MY_CLIENT_MASK
is the mask for our client net. If the client is just the host, this will be
255.255.255.255.
PLUTO_PEER
PLUTO_PEER_CLIENT
is the IP address / count of the peer’s client subnet. If the client is just the peer, this
will be the peer’s own IP address / max (where max is 32 for IPv4 and 128 for IPv6).
PLUTO_PEER_CLIENT_NET
is the IP address of the peer’s client net. If the client is just the peer, this will be the
peer’s own IP address.
PLUTO_PEER_CLIENT_MASK
is the mask for the peer’s client net. If the client is just the peer, this will be
255.255.255.255.
PLUTO_MY_PROTOCOL
PLUTO_PEER_PROTOCOL
lists the protocols the peer allows over this IPsec SA.
PLUTO_MY_PORT
lists the ports our host allows over this IPsec SA.
PLUTO_MY_ID
PLUTO_PEER_ID
PLUTO_PEER_CA
All output sent by the script to stderr or stdout is logged. The script should return an exit
status of 0 if and only if it succeeds.
Pluto waits for the script to finish and will not do any other processing while it is waiting.
The script may assume that pluto will not change anything while the script runs. The
script should avoid doing anything that takes much time and it should not issue any
command that requires processing by pluto. Either of these activities could be performed
by a background subprocess of the script.
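The five operations above can be handled by a simple dispatcher on PLUTO_VERB. The following is a minimal sketch of ours, not the shipped updown script; the echoed messages stand in for the real routing and firewall commands a production script would run.

```shell
#!/bin/sh
# Hypothetical updown skeleton: dispatch on PLUTO_VERB and report the
# action a real script would take (a real one would run ip/iptables).
updown() {
    case "$PLUTO_VERB" in
        prepare-host|prepare-client)
            echo "would remove any stale route to $PLUTO_PEER_CLIENT" ;;
        route-host|route-client)
            echo "would add route to $PLUTO_PEER_CLIENT via $PLUTO_NEXT_HOP" ;;
        unroute-host|unroute-client)
            echo "would delete route to $PLUTO_PEER_CLIENT" ;;
        up-host|up-client)
            echo "would open firewall for $PLUTO_PEER_CLIENT" ;;
        down-host|down-client)
            echo "would close firewall for $PLUTO_PEER_CLIENT" ;;
        *)
            echo "unknown verb: $PLUTO_VERB" >&2
            return 1 ;;
    esac
}

# pluto would set these variables itself before invoking the script:
PLUTO_VERB=route-host PLUTO_PEER_CLIENT=10.0.2.0/24 PLUTO_NEXT_HOP=10.0.0.2 updown
```

As noted above, such a script should exit 0 only on success and must avoid long-running work while pluto waits for it.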
Rekeying
When an SA that was initiated by pluto has only a bit of lifetime left, pluto will initiate
the creation of a new SA. This applies to ISAKMP and IPsec SAs. The rekeying will be
initiated when the SA’s remaining lifetime is less than the rekeymargin plus a random
percentage, between 0 and rekeyfuzz, of the rekeymargin.
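As a sketch, the initiator-side rekey point can be computed as follows. The function name and the explicit third argument are ours, added so the random draw can be pinned down for illustration.

```shell
# Rekeying starts when remaining lifetime drops below rekeymargin plus
# a random share (0..rekeyfuzz percent) of rekeymargin.
rekey_threshold() {
    margin=$1                                   # rekeymargin, in seconds
    fuzz=$2                                     # rekeyfuzz, a percentage
    pct=${3:-$(( ${RANDOM:-0} % (fuzz + 1) ))}  # random draw, overridable
    echo $(( margin + margin * pct / 100 ))
}

# margin 540s, fuzz 100%, a draw of 50% gives a threshold of 810s:
rekey_threshold 540 100 50
```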
Similarly, when an SA that was initiated by the peer has only a bit of lifetime left, pluto
will try to initiate the creation of a replacement. To give preference to the initiator, this
rekeying will only be initiated when the SA’s remaining lifetime is half of rekeymargin. If
rekeying is done by the responder, the roles will be reversed: the responder for the old SA
will be the initiator for the replacement. The former initiator might also initiate rekeying,
so there may be redundant SAs created. To avoid these complications, make sure that
rekeymargin is generous.
One risk of having the former responder initiate is that perhaps none of its proposals is
acceptable to the former initiator (they have not been used in a successful negotiation). To
reduce the chances of this happening, and to prevent loss of security, the policy settings
are taken from the old SA (this is the case even if the former initiator is initiating). These
may be stricter than those of the connection.
pluto will not rekey an SA if that SA is not the most recent of its type (IPsec or ISAKMP)
for its potential connection. This avoids creating redundant SAs.
The random component in the rekeying time (rekeyfuzz) is intended to make certain
pathological patterns of rekeying unstable. If both sides decide to rekey at the same time,
twice as many SAs as necessary are created. This could become a stable pattern without
the randomness.
Another more important case occurs when a security gateway has SAs with many other
security gateways. Each of these connections might need to be rekeyed at the same time.
This would cause a high peak demand for resources (network bandwidth, CPU time,
entropy for random numbers). The rekeyfuzz can be used to stagger the rekeying times.
Once a new set of SAs has been negotiated, pluto will never send traffic on a superseded
one. Traffic will be accepted on an old SA until it expires.
When pluto receives an initial Main Mode message, it needs to decide which connection
this message is for. It picks based solely on the source and destination IP addresses of the
message. There might be several connections with suitable IP addresses, in which case one
of them is arbitrarily chosen. (The ISAKMP SA proposal contained in the message could
be taken into account, but it is not.)
The ISAKMP SA is negotiated before the parties pass further identifying information, so
all ISAKMP SA characteristics specified in the connection description should be the same
for every connection with the same two host IP addresses. At the moment, the only
characteristic that might differ is authentication method.
Up to this point, all configuring has presumed that the IP addresses are known to all
parties ahead of time. This will not work when either end is mobile (or assigned a dynamic
IP address for other reasons). We call this situation “Road Warrior”. It is fairly tricky and
has some important limitations, most of which are features of the IKE protocol.
Only the initiator may be mobile: the initiator may have an IP number unknown to the
responder. When the responder doesn’t recognize the IP address on the first Main Mode
packet, it looks for a connection with itself as one end and %any as the other. If it cannot
find one, it refuses to negotiate. If it does find one, it creates a temporary connection that
is a duplicate except with the %any replaced by the source IP address from the packet; if
there was no identity specified for the peer, the new IP address will be used.
When pluto is using one of these temporary connections and needs to find the preshared
secret or RSA private key in ipsec.secrets, and the connection specified no identity for
the peer, %any is used as its identity. After all, the real IP address was apparently
unknown to the configuration, so it is unreasonable to require that it be used in this table.
Part way into the Phase 1 (Main Mode) negotiation using one of these temporary
connection descriptions, pluto will receive an Identity Payload. At this point, pluto
checks for a more appropriate connection, one with an identity for the peer that matches
the payload but which would use the same keys so far used for authentication. If it finds
one, it will switch to using this better connection (or a temporary derived from this, if it
has %any for the peer’s IP address). It may even turn out that no connection matches the
newly discovered identity, including the current connection; if so, pluto terminates
negotiation.
Unfortunately, if preshared secret authentication is being used, the Identity Payload is
encrypted using this secret, so the secret must be selected by the responder without
knowing this payload. This limits the responder to at most one preshared secret for all
Road Warrior systems connecting to it. RSA Signature authentication does not require that
the responder know how to select the initiator’s public key until after the initiator’s
Identity Payload is decoded (using the responder’s private key, so that must be
preselected).
When pluto is responding to a Quick Mode negotiation via one of these temporary
connection descriptions, it may well find that the subnets specified by the initiator don’t
match those in the temporary connection description. If so, it will look for a connection
with matching subnets, its own host address, a peer address of %any and matching
identities. If it finds one, a new temporary connection is derived from this one and used
for the Quick Mode negotiation of IPsec SAs. If it does not find one, pluto terminates
negotiation.
Be sure to specify an appropriate nexthop for the responder to send a message to the
initiator: pluto has no way of guessing it (if forwarding isn’t required, use an explicit
%direct as the nexthop and the IP address of the initiator will be filled in; the obsolete
notation 0.0.0.0 is still accepted).
pluto has no special provision for the initiator side. The current (possibly dynamic) IP
address and nexthop must be used in defining connections. These must be properly
configured each time the initiator’s IP address changes. pluto has no mechanism to do this
automatically.
Although we call this Road Warrior Support, it could also be used to support encrypted
connections with anonymous initiators. The responder’s organization could announce the
preshared secret that would be used with unrecognized initiators and let anyone connect.
Of course the initiator’s identity would not be authenticated.
If any Road Warrior connections are supported, pluto cannot reject an exchange initiated
by an unknown host until it has determined that the secret is not shared or the signature is
invalid. This must await the third Main Mode message from the initiator. If no Road
Warrior connection is supported, the first message from an unknown source would be
rejected. This has implications for ease of debugging configurations and for denial of
service attacks.
Although a Road Warrior connection must be initiated by the mobile side, the other side
can and will rekey using the temporary connection it has created. If the Road Warrior
wishes to be able to disconnect, it is probably wise to set —keyingtries to 1 in the
connection on the non-mobile side to prevent it trying to rekey the connection.
Unfortunately, there is no mechanism to unroute the connection automatically.
Debugging
pluto accepts several optional arguments, useful mostly for debugging. Except for
—interface, each should appear at most once.
—interface interfacename
specifies that the named real public network interface should be considered. The
interface name specified should not be ipsecN. If the option doesn’t appear, all
interfaces are considered. To specify several interfaces, use the option once for each.
One use of this option is to specify which interface should be used when two or more
share the same IP address.
—ikeport port-number
changes the UDP port that pluto will use (default, specified by IANA: 500)
—ctlbase path
basename for control files. path.ctl is the socket through which whack communicates
with pluto. path.pid is the lockfile to prevent multiple pluto instances. The default is
/var/run/pluto/pluto.
—secretsfile file
specifies the file for authentication secrets (default: /etc/ipsec.secrets). This name is
subject to “globbing” as in sh(1), so every file with a matching name is processed.
Quoting is generally needed to prevent the shell from doing the globbing.
—adns pathname
specifies where to find pluto’s helper program for asynchronous DNS lookup. pluto
can be built to use _pluto_adns. By default, pluto will look for the program in
$IPSEC_DIR (if that environment variable is defined) or, failing that, in the same
directory as pluto.
—nofork
disable “daemon fork” (default is to fork). In addition, after the lock file and control
socket are created, print the line “Pluto initialized” to standard out.
—uniqueids
if this option has been selected, whenever a new ISAKMP SA is established, any
connection with the same Peer ID but a different Peer IP address is unoriented
(causing all its SAs to be deleted). This helps clean up dangling SAs when a
connection is lost and then regained at another IP address.
—force-busy
if this option has been selected, pluto is forced to act “busy”. In this state, which
normally occurs only during a Denial of Service attack, pluto requires cookies before
accepting new incoming IKE packets. Cookies are sent and required in IKEv1
Aggressive Mode and in IKEv2. This option is mostly used for testing purposes, but
can be selected by paranoid administrators as well.
—stderrlog
log to standard error instead of syslog
For example
pluto —secretsfile ipsec.secrets —ctlbase pluto.base —ikeport 8500 —nofork —use-nostack —stderrlog
—debug-all
—debug-raw
—debug-crypt
—debug-parsing
—debug-emitting
—debug-controlmore
—debug-lifecycle
—debug-klips
—debug-pfkey
—debug-dns
show pluto‘s interaction with DNS for KEY and TXT records
—debug-dpd
—debug-natt
—debug-oppo
show why pluto didn’t find a suitable DNS TXT record to authorize opportunistic
initiation
—debug-oppoinfo
log when connections are initiated due to acquires from the kernel. This is often
useful to know, but can be extremely chatty on a busy system.
—debug-whackwatch
if set, causes pluto not to release the whack —initiate channel until the SA is
completely up. This will cause the requestor to possibly wait forever while pluto
unsuccessfully negotiates. Used often in test cases.
—debug-private
The debug form of the whack command will change the selection in a running pluto. If a
connection name is specified, the flags are added whenever pluto has identified that it is
dealing with that connection. Unfortunately, this is often part way into the operation being
observed.
For example, to start a pluto with a display of the structure of input and output:
pluto —debug-emitting —debug-parsing
To later change this pluto to only display raw bytes:
whack —debug-raw
For testing, SSH’s IKE test page is quite useful:
http://isakmp-test.ssh.fi/
Hint: ISAKMP SAs are often kept alive by IKEs even after the IPsec SA is established.
This allows future IPsec SAs to be negotiated directly. If one of the IKEs is restarted, the
other may try to use the ISAKMP SA but the new IKE won’t know about it. This can lead
to much confusion. pluto is not yet smart enough to get out of such a mess.
When pluto doesn’t understand or accept a message, it just ignores the message. It is not
yet capable of communicating the problem to the other IKE daemon (in the future it might
use Notifications to accomplish this in many cases). It does log a diagnostic.
When pluto gets no response from a message, it resends the same message (a message
will be sent at most three times). This is appropriate: UDP is unreliable.
When pluto gets a message that it has already seen, there are many cases when it notices
and discards it. This too is appropriate for UDP.
Combine these three rules, and you can explain many apparently mysterious behaviours.
In a pluto log, retrying isn’t usually the interesting event. The critical thing is either earlier
(pluto got a message which it didn’t like and so ignored, so it was still awaiting an
acceptable message and got impatient) or on the other system (pluto didn’t send a reply
because it wasn’t happy with the previous message).
Notes
If pluto is compiled without -DKLIPS, it negotiates Security Associations but never asks
the kernel to put them in place and never makes routing changes. This allows pluto to be
tested on systems without KLIPS, but makes it rather useless.
Each IPsec SA is assigned an SPI, a 32-bit number used to refer to the SA. The IKE
protocol lets the destination of the SA choose the SPI. The range 0 to 0xFF is reserved for
IANA. Pluto also avoids choosing an SPI in the range 0x100 to 0xFFF, leaving these SPIs
free for manual keying. Remember that the peer, if not pluto, may well choose SPIs in this
range.
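These ranges can be summarized in a small classifier; the helper below is a sketch of ours, not a pluto interface.

```shell
# Classify an SPI against the ranges described above: 0x0-0xFF is
# reserved for IANA, 0x100-0xFFF is left by pluto for manual keying.
spi_class() {
    spi=$(( $1 ))                       # accepts 0x-prefixed hex input
    if   [ "$spi" -le $(( 0xFF ))  ]; then echo "reserved for IANA"
    elif [ "$spi" -le $(( 0xFFF )) ]; then echo "free for manual keying"
    else                                   echo "usable by pluto"
    fi
}

spi_class 0x101    # prints: free for manual keying
```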
Policies
This catalogue of policies may be of use when trying to configure Pluto and another IKE
implementation to interoperate.
In Phase 1, only Main Mode is supported. We are not sure that Aggressive Mode is secure.
For one thing, it does not support identity protection. It may allow more severe Denial Of
Service attacks.
No Informational Exchanges are supported. These are optional and since their delivery is
not assured, they must not matter. It is the case that some IKE implementations won’t
interoperate without Informational Exchanges, but we feel they are broken.
No Informational Payloads are supported. These are optional, but useful. It is of concern
that these payloads are not authenticated in Phase 1, nor in those Phase 2 messages
authenticated with HASH(3).
Diffie Hellman Groups MODP 1024 and MODP 1536 (2 and 5) are supported. Group
MODP768 (1) is not supported because it is too weak.
3DES CBC (Cipher Block Chaining mode) is the only encryption supported, both for
ISAKMP SAs and IPSEC SAs.
MD5 and SHA1 hashing are supported for packet authentication in both kinds of
SAs.
ESP, AH, or AH plus ESP are supported. If, and only if, AH and ESP are
combined, the ESP need not have its own authentication component. The selection is
controlled by the —encrypt and —authenticate flags.
Each of these may be combined with IPCOMP Deflate compression, but only if the
potential connection specifies compression and only if KLIPS is configured with
IPCOMP support.
The IPSEC SAs may be tunnel or transport mode, where appropriate. The —tunnel
flag controls this when pluto is initiating.
PFS is acceptable, and will be proposed if the —pfs flag was specified. The DH
group proposed will be the same as negotiated for Phase 1.
› SIGNALS
Pluto responds to SIGHUP by issuing a suggestion that “whack —listen” might have
been intended.
Pluto exits when it receives SIGTERM.
› EXIT STATUS
pluto normally forks a daemon process, so the exit status is normally a very preliminary
result.
0
10
The first (comment) line, indicating the nature and date of the key, and giving a host name,
is used by ipsec_showhostkey(8) when generating some forms of key output.
The commented-out pubkey= line contains the public key, the public exponent and the
modulus combined in approximately RFC 2537 format (the one deviation is that the
combined value is given with a 0s prefix, rather than in unadorned base-64), suitable for
use in the ipsec.conf file.
The Modulus, PublicExponent and PrivateExponent lines give the basic signing and
verification data.
The Prime1 and Prime2 lines give the primes themselves (aka p and q), largest first. The
Exponent1 and Exponent2 lines give the private exponent mod p-1 and q-1 respectively.
The Coefficient line gives the Chinese Remainder Theorem coefficient, which is the
inverse of q, mod p. These additional numbers (which must all be kept as secret as the
private exponent) are precomputed aids to rapid signature generation. When NSS is used,
these values are not available outside the NSS security database (software token or
hardware token) and are instead filled in with the CKA_ID.
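With toy numbers of our own choosing (far too small for real keys), the relationships among these values can be checked directly:

```shell
# Toy RSA CRT values: primes p=13, q=11 (largest first), private
# exponent d=103 for public exponent e=7 and modulus n=143.
p=13; q=11; d=103
exp1=$(( d % (p - 1) ))                   # Exponent1 = d mod (p-1)
exp2=$(( d % (q - 1) ))                   # Exponent2 = d mod (q-1)
coeff=1                                   # Coefficient = q^-1 mod p,
while [ $(( q * coeff % p )) -ne 1 ]; do  # found by brute force here
    coeff=$(( coeff + 1 ))
done
echo "$exp1 $exp2 $coeff"                 # prints: 7 3 6
```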
No attempt is made to break long lines.
The US patent on the RSA algorithm expired 20 Sept 2000.
› EXAMPLES
ipsec rsasigkey —verbose 4096 >mykey.txt
generates a 4096-bit signature key and puts it in the file mykey.txt, with running
commentary on standard error. The file contents can be inserted verbatim into a
suitable entry in the ipsec.secrets file (see ipsec_secrets(5)), and the public key can
then be extracted and edited into the ipsec.conf (see ipsec_showhostkey(8)).
› FILES
/dev/random, /dev/urandom
› SEE ALSO
random(4), rngd(8), ipsec_showhostkey(8), Applied Cryptography, 2nd. ed., by Bruce
Schneier, Wiley 1996, RFCs 2537, 2313, GNU MP, the GNU multiple precision arithmetic
library, edition 2.0.2, by Torbjörn Granlund
› HISTORY
Originally written for the Linux FreeS/WAN project <http://www.freeswan.org> by
Henry Spencer. Updated for the Libreswan Project by Paul Wouters.
The —round and —noopt options were obsoleted as these were only used with the old
non-library crypto code.
The —random device is only used for seeding the crypto library, not as a direct source of
randomness for generating keys.
› BUGS
There is an internal limit on nbits, currently 20000.
rsasigkey‘s run time is difficult to predict, since /dev/random output can be arbitrarily
delayed if the system’s entropy pool is low on randomness, and the time taken by the
search for primes is also somewhat unpredictable. Specifically, embedded systems and
most virtual machines are low on entropy. In such a situation, consider generating the RSA
key on another machine, and copying ipsec.secrets and the ipsec.d/*db files to the
embedded platform. Note that NSS embeds the full path in the DB files, so the path on
the proxy machine must be identical to the path on the destination machine.
› AUTHOR
Paul Wouters
The —version option causes the version of the binary to be emitted, and nothing else.
The —verbose option may be present one or more times. Each occurrence increases the verbosity
level.
The —dhclient option causes the output to be suitable for inclusion in dhclient.conf(5) as
part of configuring WAVEsec. See <http://www.wavesec.org>.
Normally, the default key for this host (the one with no host identities specified for it) is
the one extracted. The —id option overrides this, causing extraction of the key labeled
with the specified identity, if any. The specified identity must exactly match the identity in
the file; in particular, the comparison is case-sensitive.
There may also be multiple keys with the same identity. All keys are numbered based
upon their linear sequence in the file (including all include directives).
The —file option overrides the default for where the key information should be found, and
takes it from the specified secretfile.
› DIAGNOSTICS
A complaint about “no pubkey line found” indicates that the host has a key but it was
generated with an old version of FreeS/WAN and does not contain the information that
showhostkey needs.
› FILES
/etc/ipsec.secrets
› SEE ALSO
ipsec.secrets(5), ipsec.conf(5), ipsec_rsasigkey(8)
› HISTORY
Written for the Linux FreeS/WAN project <http://www.freeswan.org> by Henry
Spencer. Updated by Paul Wouters for the IPSECKEY format.
› BUGS
Arguably, rather than just reporting the no-IN-KEY-line-found problem, showhostkey
should be smart enough to run the existing key through rsasigkey with the —oldkey
option, to generate a suitable output line.
The —id option assumes that the identity appears on the same line as the : RSA { that
begins the key proper.
› AUTHOR
Paul Wouters
<SA> means: —af (inet | inet6) —edst daddr —spi spi —proto proto OR —said said
<life> means: —life (soft | hard)-(allocations | bytes | addtime | usetime | packets)=value[,…]
ipsec spi <SA> —src src —ah (hmac-md5-96 | hmac-sha1-96) [—replay_window replayw] [<life>] —authkey akey
ipsec spi <SA> —src src —esp (3des | 3des-md5-96 | 3des-sha1-96) [—replay_window replayw] [<life>] —enckey ekey
ipsec spi <SA> —src src —esp [—replay_window replayw] [<life>] —enckey ekey —authkey akey
ipsec spi <SA> —src src —comp deflate
ipsec spi <SA> —ip4 —src encap-src —dst encap-dst
ipsec spi <SA> —ip6 —src encap-src —dst encap-dst
ipsec spi <SA> —del
ipsec spi —help
ipsec spi —version
ipsec spi —clear
› DESCRIPTION
Spi
creates and deletes IPSEC Security Associations. A Security Association (SA) is a
transform through which packet contents are to be processed before being forwarded. A
transform can be an IPv4-in-IPv4 or an IPv6-in-IPv6 encapsulation, an IPSEC
Authentication Header (authentication with no encryption), or an IPSEC Encapsulation
Security Payload (encryption, possibly including authentication).
When a packet is passed from a higher networking layer through an IPSEC virtual
interface, a search in the extended routing table (see ipsec_eroute(8)) yields an effective
destination address, a Security Parameters Index (SPI) and an IP protocol number. When an
IPSEC packet arrives from the network, its ostensible destination, an SPI and an IP
protocol specified by its outermost IPSEC header are used. The destination/SPI/protocol
combination is used to select a relevant SA. (See ipsec_spigrp(8) for discussion of how
multiple transforms are combined.)
The af, daddr, spi and proto arguments specify the SA to be created or deleted. af is the
address family (inet for IPv4, inet6 for IPv6). Daddr is a destination address in dotted-
decimal notation for IPv4 or in a coloned hex notation for IPv6. Spi is a number, preceded
by ‘0x’ for hexadecimal, between 0x100 and 0xffffffff; values from 0x0 to 0xff are
reserved. Proto is an ASCII string, “ah”, “esp”, “comp” or “tun”, specifying the IP
protocol. The protocol must agree with the algorithm selected.
Alternatively, the said argument can also specify an SA to be created or deleted. Said
combines the three parameters above, such as: “tun.101@1.2.3.4” or “tun:101@1:2::3:4”,
where the address family is specified by “.” for IPv4 and “:” for IPv6. The address family
indicators substitute the “0x” for hexadecimal.
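The notation is easy to take apart with ordinary shell parameter expansion; this helper is a sketch of ours, not part of the ipsec tools.

```shell
# Split an SA ID of the form proto.SPI@addr (IPv4) or proto:SPI@addr
# (IPv6); the separator doubles as the address-family and hex marker.
said_parts() {
    said=$1
    addr=${said#*@}
    front=${said%@*}
    case $front in
        *.*) fam=inet;  proto=${front%%.*}; spi=${front#*.} ;;
        *:*) fam=inet6; proto=${front%%:*}; spi=${front#*:} ;;
        *)   return 1 ;;
    esac
    echo "$proto $fam 0x$spi $addr"      # restore the implied 0x
}

said_parts "tun:101@1:2::3:4"            # prints: tun inet6 0x101 1:2::3:4
```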
The source address, src, must also be provided for the inbound policy check to function.
The source address does not need to be included if inbound policy checking has been
disabled.
Key vectors must be entered as hexadecimal or base64 numbers. They should be
cryptographically strong random numbers.
All hexadecimal numbers are entered as strings of hexadecimal digits (0-9 and a-f),
without spaces, preceded by ‘0x’, where each hexadecimal digit represents 4 bits. All
base64 numbers are entered as strings of base64 digits (0-9, A-Z, a-z, ‘+’ and ‘/’), without
spaces, preceded by ‘0s’, where each base64 digit represents 6 bits and ‘=’ is used
for padding.
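One way (among many) to generate suitably random key material in both notations on a typical Linux system:

```shell
# 16 random bytes (128 bits) rendered as '0x' hex and '0s' base64;
# /dev/urandom, od and base64 are assumed to be available.
hexkey="0x$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')"
b64key="0s$(head -c 16 /dev/urandom | base64)"
echo "$hexkey"     # 0x followed by 32 hex digits
echo "$b64key"     # 0s followed by 24 base64 characters ('=' padded)
```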
The deletion of an SA which has been grouped will result in the entire chain being deleted.
The form with no additional arguments lists the contents of /proc/net/ipsec_spi. The
format of /proc/net/ipsec_spi is discussed in ipsec_spi(5).
The lifetime severity of soft sets a limit when the key management daemons are asked to
rekey the SA. The lifetime severity of hard sets a limit when the SA must expire. The
lifetime type allocations tells the system when to expire the SA because it is being shared
by too many eroutes (not currently used). The lifetime type of bytes tells the system to
expire the SA after a certain number of bytes have been processed with that SA. The
lifetime type of addtime tells the system to expire the SA a certain number of seconds
after the SA was installed. The lifetime type of usetime tells the system to expire the SA a
certain number of seconds after that SA has processed its first packet. The lifetime type of
packets tells the system to expire the SA after a certain number of packets have been
processed with that SA.
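These severities and types combine into the comma-separated —life argument described under OPTIONS. A small parser sketch (the values here are invented for illustration) shows the shape:

```shell
# Parse a lifetime argument like "soft-addtime=3600,hard-bytes=100000"
# into its severity/type/value triples.
parse_life() {
    for spec in $(printf '%s' "$1" | tr ',' ' '); do
        sev=${spec%%-*}
        rest=${spec#*-}
        echo "severity=$sev type=${rest%%=*} value=${rest#*=}"
    done
}

parse_life "soft-addtime=3600,hard-bytes=100000"
```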
› OPTIONS
—af
specifies the address family (inet for IPv4, inet6 for IPv6)
—edst
—spi
—proto
—said
—ah
hmac-md5-96
transform following the HMAC and MD5 standards, using a 128-bit key to produce a
96-bit authenticator (RFC2403)
hmac-sha1-96
transform following the HMAC and SHA1 standards, using a 160-bit key to produce
a 96-bit authenticator (RFC2404)
—esp
3des
3des-sha1-96
—replay_window replayw
—life life_param[,life_param]
sets the lifetime expiry; the format of life_param consists of a comma-separated list
of lifetime specifications without spaces; a lifetime specification is comprised of a
severity of soft or hard followed by a ‘-‘, followed by a lifetime type of allocations,
bytes, addtime, usetime or packets followed by an ‘=’ and finally by a value
—comp
deflate
—ip4
—ip6
—src
specifies the source end of an IP-in-IP tunnel from encap-src to encap-dst and also
specifies the source address of the Security Association to be used in inbound policy
checking and must be the same address family as af and edst
—dst
—del
—clear
—help
display synopsis
—version
groups 3 SAs together, all destined for gw2, but with an IPv4-in-IPv4 tunnel SA
applied first with SPI 0x113, then an ESP header to encrypt the packet with SPI
0x115, and finally an AH header to authenticate the packet with SPI 0x116.
groups 3 SAs together, all destined for 3049:1::1, but with an IPv6-in-IPv6 tunnel SA
applied first with SPI 0x233, then an ESP header to encrypt the packet with SPI
0x235, and finally an AH header to authenticate the packet with SPI 0x236.
ipsec spigrp inet6 3049:1::1 0x233 tun inet6 3049:1::1 0x235 esp inet6 3049:1::1
0x236 ah
› FILES
/proc/net/ipsec_spigrp, /usr/local/bin/ipsec
› SEE ALSO
ipsec(8), ipsec_manual(8), ipsec_tncfg(8), ipsec_eroute(8), ipsec_spi(8),
ipsec_klipsdebug(8), ipsec_spigrp(5)
› HISTORY
Written for the Linux FreeS/WAN project <http://www.freeswan.org/> by Richard
Guy Briggs.
› BUGS
Yes, it really is limited to a maximum of four SAs, although admittedly it’s hard to see
why you would need more.
› AUTHOR
Paul Wouters
IKE’s Job
Pluto
pluto runs as a daemon with userid root. Before running it, a few things must be set up.
pluto requires a working IPsec stack.
pluto supports multiple public networks (that is, networks that are considered insecure and
thus need to have their traffic encrypted or authenticated). It discovers the public
interfaces to use by looking at all interfaces that are configured (the —interface option
can be used to limit the interfaces considered). It does this only when whack tells it to
—listen, so the interfaces must be configured by then. Each interface with a name of the
form ipsec[0-9] is taken as a KLIPS virtual public interface. Another network interface
with the same IP address (the first one found will be used) is taken as the corresponding
real public interface. The —listen option can be used to limit listening to a single IP
address of a certain interface. ifconfig(8) or ip(8) with the -a flag will show the name and
status of each network interface.
pluto requires a database of preshared secrets and RSA private keys. This is described in
ipsec.secrets(5). pluto is told of RSA public keys via whack commands. If the
connection is Opportunistic, and no RSA public key is known, pluto will attempt to fetch
RSA keys using the Domain Name System.
The most basic network topology that pluto supports has two security gateways
negotiating on behalf of client subnets. The diagram of RGB’s testbed is a good example
(see klips/doc/rgb_setup.txt).
The file INSTALL in the base directory of this distribution explains how to start setting up
the whole system, including KLIPS.
Make sure that the security gateways have routes to each other. This is usually covered by
the default route, but may require issuing route(8) commands. The route must go through
a particular IP interface (we will assume it is eth0, but it need not be). The interface that
connects the security gateway to its client must be a different one.
It is necessary to issue an ipsec_tncfg(8) command on each gateway. The required
command is:
ipsec tncfg —attach —virtual ipsec0 —physical eth0
A command to set up the ipsec0 virtual interface will also need to be run. It will have the
same parameters as the command used to set up the physical interface to which it has just
been connected using ipsec_tncfg(8).
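Assuming, for illustration, that eth0 is the public interface with a hypothetical address of 192.0.2.10/24, the two setup steps might look like this (a sketch; substitute your own addresses):

```shell
# attach the virtual IPsec interface to the physical one
ipsec tncfg --attach --virtual ipsec0 --physical eth0
# bring ipsec0 up with the same parameters as eth0 (hypothetical address)
ifconfig ipsec0 192.0.2.10 netmask 255.255.255.0 up
```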
No special setup is needed to use NETKEY - it ships with all modern versions
of Linux 2.4 and 2.6. However, note that certain vendors or older distributions use old
versions or backports of NETKEY which are broken. If possible, use a NETKEY version
that is at least based on, or backported from, Linux 2.6.11 or newer.
ipsec.secrets file
A pluto daemon and another IKE daemon (for example, another instance of pluto) must
convince each other that they are who they are supposed to be before any negotiation can
succeed. This authentication is accomplished by using either secrets that have been shared
beforehand (manually) or by using RSA signatures. There are other techniques, but they
have not been implemented in pluto.
The file /etc/ipsec.secrets is used to keep preshared secret keys, RSA private keys, X.509
encoded keyfiles and XAUTH passwords. Smartcards are handled via NSS. For
debugging, there is an argument to the pluto command to use a different file. This file is
described in ipsec.secrets(5).
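For illustration, a minimal /etc/ipsec.secrets might contain entries like the following (the addresses, identity, and secret are hypothetical; see ipsec.secrets(5) for the authoritative syntax):

```shell
# preshared secret shared by the two gateway addresses (hypothetical)
192.0.2.1 192.0.2.2 : PSK "hypothetical-shared-secret"

# RSA private key for this host, referenced by an NSS certificate nickname
@gw1.example.com : RSA "gw1-nickname"
```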
Running Pluto
To fire up the daemon, just type pluto (be sure to be running as the superuser). The default
IKE port number is 500, the UDP port assigned by IANA for IKE Daemons. pluto must
be run by the superuser to be able to use the UDP 500 port. If pluto is told to enable NAT-
Traversal, then UDP port 4500 is also taken by pluto to listen on.
Pluto supports different IPsec stacks on different operating systems. This can be configured
using one of the options —use-netkey (the default), —use-klips, —use-mast, —use-
bsdkame, —use-win2k or —use-nostack. The latter is meant for testing only - no actual
IPsec connections will be loaded into the kernel. The option —use-auto has been
obsoleted. On startup, pluto might also read the protostack= option to select the IPsec
stack to use if —config /etc/ipsec.conf is given as argument to pluto. If both —use-XXX
and —config /etc/ipsec.conf are specified, the last command line argument specified takes
precedence.
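The same stack selection can be made in the “config setup” section of ipsec.conf, which pluto reads when started with —config /etc/ipsec.conf (a sketch):

```shell
# /etc/ipsec.conf fragment; when both --use-XXX and --config are given,
# the last command line argument wins
config setup
    protostack=netkey   # or klips, mast
```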
Pluto supports RFC 3947 NAT-Traversal. The allowed range behind the NAT routers is
submitted using the —virtual-private option. See ipsec.conf(5) for the syntax. The option
—force-keepalive forces the sending of keep-alive packets, which are sent to prevent
the NAT router from closing its port when there is not enough traffic on the IPsec
connection. The —keep-alive option sets the delay (in seconds) between these keep-alive
packets. The newer NAT-T standards support port floating, and Libreswan enables this by
default.
Pluto supports the use of X.509 certificates and sends its certificate when needed. This can
confuse IKE implementations that do not implement this, such as the old FreeS/WAN
implementation. The —nocrsend option prevents pluto from sending these. At startup, pluto
loads all the X.509 related files from the directories /etc/ipsec.d/certs, /etc/ipsec.d/cacerts,
/etc/ipsec.d/aacerts, /etc/ipsec.d/private and /etc/ipsec.d/crls. Certificate Revocation
Lists can also be retrieved from a URL. The option —crlcheckinterval sets the time
between checking for CRL expiration and issuing new fetch commands. The first attempt
to update a CRL is started at 2*crlcheckinterval before the next update time. Pluto logs a
warning if no valid CRL was loaded or obtained for a connection. If —strictcrlpolicy is
given, the connection will be rejected until a valid CRL has been loaded.
Pluto can also use helper children to off-load cryptographic operations. This behavior can
be fine-tuned using the —nhelpers option. Pluto will start (n-1) of them, where n is the
number of CPUs you have (including hyperthreaded CPUs). A value of 0 forces pluto to do all
operations in the main process. A value of -1 tells pluto to perform the above calculation.
Any other value forces the number to that amount.
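The default calculation can be sketched in shell (assuming nproc reports the CPU count, hyperthreads included):

```shell
# pluto's default: start (n - 1) crypto helpers for n CPUs
cpus=$(nproc)
helpers=$((cpus - 1))
echo "pluto would start $helpers helper process(es)"
```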
Pluto uses the NSS crypto library as its random source. A certain government Three Letter
Agency requires that pluto read 440 bits from /dev/random and feed this into the NSS
RNG before drawing random data from the NSS library, despite the NSS library itself
already seeding its internal state. As this process can block pluto for an extended time, the default
is to not perform this redundant seeding. The —seedbits option can be used to specify the
number of bits that will be pulled from /dev/random and seeded into the NSS RNG. This
can also be accomplished by specifying seedbits in the “config setup” section of
ipsec.conf. This option should not be used by most people.
pluto attempts to create a lockfile with the name /var/run/pluto/pluto.pid. If the lockfile
cannot be created, pluto exits - this prevents multiple plutos from competing. Any
“leftover” lockfile must be removed before pluto will run. pluto writes its PID into this
file so that scripts can find it. This lock will not function properly if it is on an NFS
volume (but sharing locks on multiple machines doesn’t make sense anyway).
pluto then forks and the parent exits. This is the conventional “daemon fork”. It can make
debugging awkward, so there is an option to suppress this fork. In certain configurations,
pluto might also launch helper programs to assist with DNS queries or to offload
cryptographic operations.
All logging, including diagnostics, is sent to syslog(3) with facility=authpriv; it decides
where to put these messages (possibly in /var/log/secure or /var/log/auth.log). Since this
too can make debugging awkward, the option —stderrlog is used to steer logging to
stderr.
Alternatively, —logfile can be used to send all logging information to a specific file.
If the —perpeerlog option is given, then pluto will open a log file per connection. By
default, this is in /var/log/pluto/peer, in a subdirectory formed by turning all dots (.) [IPv4]
or colons (:) [IPv6] into slashes (/).
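The path mangling can be reproduced with tr (a sketch using a hypothetical peer address):

```shell
# dots (IPv4) and colons (IPv6) in the peer address become slashes
peer=192.0.2.1
subdir=$(printf '%s' "$peer" | tr '.:' '//')
echo "/var/log/pluto/peer/$subdir"   # /var/log/pluto/peer/192/0/2/1
```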
The base directory can be changed with the —perpeerlogbase option.
Once pluto is started, it waits for requests from whack.
To understand how to use pluto, it is helpful to understand a little about its internal state.
Furthermore, the terminology is needed to decipher some of the diagnostic messages.
Pluto supports food groups, and X.509 certificates. These are located in /etc/ipsec.d, or
another directory as specified by —ipsecdir.
Pluto may core dump. It will normally do so into the current working directory. You can
specify the —coredir option for pluto, or specify the dumpdir= option in ipsec.conf.
If you are investigating a potential memory leak in pluto, start pluto with the —leak-
detective option. Before the leak causes the system or pluto to die, shut down pluto in the
regular way. pluto will display a list of leaks it has detected.
The (potential) connection database describes attributes of a connection. These include the
IP addresses of the hosts and client subnets and the security characteristics desired. pluto
requires this information (simply called a connection) before it can respond to a request to
build an SA. Each connection is given a name when it is created, and all references are
made using this name.
During the IKE exchange to build an SA, the information about the negotiation is
represented in a state object. Each state object reflects how far the negotiation has
reached. Once the negotiation is complete and the SA established, the state object remains
to represent the SA. When the SA is terminated, the state object is discarded. Each state
object is given a serial number and this is used to refer to the state objects in logged
messages.
Each state object corresponds to a connection and can be thought of as an instantiation of
that connection. At any particular time, there may be any number of state objects
corresponding to a particular connection. Often there is one representing an ISAKMP SA
and another representing an IPsec SA.
KLIPS hooks into the routing code in the Linux kernel. Traffic to be processed by an
IPsec SA must be directed through KLIPS by routing commands. Furthermore, the
processing to be done is specified by ipsec eroute(8) commands. pluto takes the
responsibility of managing both of these special kinds of routes.
NETKEY requires no special routing.
Each connection may be routed, and must be while it has an IPsec SA. The connection
specifies the characteristics of the route: the interface on this machine, the “gateway” (the
nexthop), and the peer’s client subnet. Two connections may not be simultaneously routed
if they are for the same peer’s client subnet but use different interfaces or gateways
(pluto's logic does not reflect any advanced routing capabilities).
On KLIPS, each eroute is associated with the state object for an IPsec SA because it has
the particular characteristics of the SA. Two eroutes conflict if they specify the identical
local and remote clients (unlike for routes, the local clients are taken into account).
When pluto needs to install a route for a connection, it must make sure that no conflicting
route is in use. If another connection has a conflicting route, that route will be taken down,
as long as there is no IPsec SA instantiating that connection. If there is such an IPsec SA,
the attempt to install a route will fail.
There is an exception. If pluto, as Responder, needs to install a route to a fixed client
subnet for a connection, and there is already a conflicting route, then the SAs using the
route are deleted to make room for the new SAs. The rationale is that the new connection
is probably more current. The need for this usually is a product of Road Warrior
connections (these are explained later; they cannot be used to initiate).
When pluto needs to install an eroute for an IPsec SA (for a state object), first the state
object’s connection must be routed (if this cannot be done, the eroute and SA will not be
installed). If a conflicting eroute is already in place for another connection, the eroute and
SA will not be installed (but note that the routing exception mentioned above may have
already deleted potentially conflicting SAs). If another IPsec SA for the same connection
already has an eroute, all its outgoing traffic is taken over by the new eroute. The
incoming traffic will still be processed. This characteristic is exploited during rekeying.
All of these routing characteristics are expected to change when KLIPS and NETKEY
merge into a single new stack.
Using whack
whack is used to command a running pluto. whack uses a UNIX domain socket to speak
to pluto (by default, /var/pluto.ctl).
whack has an intricate argument syntax. This syntax allows many different functions to be
specified. The help form shows the usage or version information. The connection form
gives pluto a description of a potential connection. The public key form informs pluto of
the RSA public key for a potential peer. The delete form deletes a connection description
and all SAs corresponding to it. The listen form tells pluto to start or stop listening on the
public interfaces for IKE requests from peers. The route form tells pluto to set up routing
for a connection; the unroute form undoes this. The initiate form tells pluto to negotiate an
SA corresponding to a connection. The terminate form tells pluto to remove all SAs
corresponding to a connection, including those being negotiated. The status form displays
pluto's internal state. The debug form tells pluto to change the selection of debugging
output “on the fly”. The shutdown form tells pluto to shut down, deleting all SAs.
The crash option asks pluto to consider a particular target IP to have crashed, and to
attempt to restart all connections with that IP address as a gateway. In general, you should
use Dead Peer Detection to detect this kind of situation automatically, but this is not
always possible.
Most options are specific to one of the forms, and will be described with that form. There
are three options that apply to all forms.
—ctlbase path
path.ctl is used as the UNIX domain socket for talking to pluto. This option
facilitates debugging.
—label string
—version
The connection form describes a potential connection to pluto. pluto needs to know what
connections can and should be negotiated. When pluto is the initiator, it needs to know
what to propose. When pluto is the responder, it needs to know enough to decide whether
it is willing to set up the proposed connection.
The description of a potential connection can specify a large number of details. Each
connection has a unique name. This name will appear in an updown shell command, so it
should not contain punctuation that would make the command ill-formed.
—name connection-name
—id id
the identity of the end. Currently, this can be an IP address (specified as dotted quad
or as a Fully Qualified Domain Name, which will be resolved immediately) or as a
Fully Qualified Domain Name itself (prefixed by “@” to signify that it should not be
resolved), or as user@FQDN, or an X.509 DN, or as the magic value %myid. Pluto
only authenticates the identity, and does not use it for addressing, so, for example, an
IP address need not be the one to which packets are to be sent. If the option is absent,
the identity defaults to the IP address specified by —host. %myid allows the identity
to be separately specified (by the pluto or whack option —myid or by the
ipsec.conf(5) config setup parameter myid). Otherwise, pluto tries to guess what
%myid should stand for: the IP address of %defaultroute, if it is supported by a
suitable TXT record in the reverse domain for that IP address, or the system’s
hostname, if it is supported by a suitable TXT record in its forward domain.
—host ip-address
the IP address of the end (generally the public interface). If pluto is to act as a
responder for IKE negotiations initiated from unknown IP addresses (the “Road
Warrior” case), the IP address should be specified as %any (currently, the obsolete
notation 0.0.0.0 is also accepted for this). If pluto is to opportunistically initiate the
connection, use %opportunistic.
—cert filename
The filename of the X.509 certificate. This must be the public key certificate only,
and cannot be the PKCS#12 certificate file. See ipsec.conf(5) on how to extract this
from the PKCS#12 file.
—ca distinguished name
the X.509 Certificate Authority’s Distinguished Name (DN) used as trust anchor for
this connection. This is the CA certificate that signed the host certificate, as well as
the certificate of the incoming client.
—sendcert yes|forced|always|ifasked|no|never
Whether or not to send our X.509 certificate credentials. This could potentially give
an attacker too much information about which identities are allowed to connect to
this host. The default is to use ifasked when we are a Responder, and to use yes
(which is the same as forced and always) if we are an Initiator. The values no and
never are equivalent. NOTE: “forced” does not seem to be actually implemented - do
not use it.
—sendca none|issuer|all
How much of our available X.509 trust chain to send with the end certificate,
excluding any root CAs. Specifying issuer sends just the issuing intermediate CA,
while all will send the entire chain of intermediate CAs. none will not send any CA
certs. The default is none, which maintains the current libreswan behavior.
—certtype number
—ikeport port-number
the UDP port that IKE listens to on that host. The default is 500. (pluto on this
machine uses the port specified by its own command line argument, so this only
affects where pluto sends messages.)
—nexthop ip-address
where to route packets for the peer’s client (presumably for the peer too, but it will
not be used for this). When pluto installs an IPsec SA, it issues a route command. It
uses the nexthop as the gateway. The default is the peer’s IP address (this can be
explicitly written as %direct; the obsolete notation 0.0.0.0 is accepted). This option
is necessary if pluto's host's interface used for sending packets to the peer is neither
point-to-point nor directly connected to the peer.
—client subnet
the subnet for which the IPsec traffic will be destined. If not specified, the host will
be the client. The subnet can be specified in any of the forms supported by
ipsec_atosubnet(3). The general form is address/mask. The address can be either a
domain name or four decimal numbers (specifying octets) separated by dots. The
most convenient form of the mask is a decimal integer, specifying the number of
leading one bits in the mask. So, for example, 10.0.0.0/8 would specify the class A
network “Net 10”.
—clientwithin subnet
This option is obsolete and will be removed. Do not use this option anymore.
—clientprotoport protocol/port
specify the Port Selectors (filters) to be used on this connection. The general form is
protocol/port. This is most commonly used to limit the connection to L2TP traffic
only by specifying a value of 17/1701 for UDP (protocol 17) and port 1701. The
notation 17/%any can be used to allow all UDP traffic and is needed for L2TP
connections with Windows XP machines before Service Pack 2.
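For example, the L2TP-only restriction described above might be expressed as follows (a sketch; the connection name is hypothetical and the other arguments are elided):

```shell
# limit the connection to UDP (protocol 17) port 1701, i.e. L2TP traffic
ipsec whack --name l2tp-conn ... --clientprotoport 17/1701 ...

# or, for Windows XP clients before Service Pack 2, allow any UDP port
ipsec whack --name l2tp-conn ... --clientprotoport '17/%any' ...
```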
—srcip ip-address
the IP address for this host to use when transmitting a packet to the remote IPsec
gateway itself. This option is used to make the gateway itself use its internal IP,
which is part of the —client subnet. Otherwise it will use its nearest IP address,
which is its public IP address, which is not part of the subnet-to-subnet IPsec tunnel, and
would therefore not get encrypted.
—xauthserver
this end is an xauth server. It will look up the xauth user name and password and
verify these before allowing the connection to be established.
—xauthclient
this end is an xauth client. Bringing this connection up with —initiate also
requires the client to specify —xauthuser username and —xauthpass password.
—xauthuser
The username for the xauth authentication. This option is normally passed along by
ipsec_auto(8) when an xauth connection is started using ipsec auto —up conn.
—xauthpass
The password for the xauth authentication. This option is normally passed along by
ipsec_auto(8) when an xauth connection is started using ipsec auto —up conn.
—modecfgserver
—modecfgclient
—modecfgdns1
The IP address of the first DNS server to pass along to the ModeConfig Client
—modecfgdns2
The IP address of the second DNS server to pass along to the ModeConfig Client
—dnskeyondemand
specifies that when an RSA public key is needed to authenticate this host, and it isn’t
already known, fetch it from DNS.
—updown updown
—to
separates the specification of the left and right ends of the connection. Pluto tries to
decide whether it is left or right based on the information provided on both sides of
this option.
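Putting several of these options together, a connection description might be handed to pluto like this (a sketch; the connection name, identities, addresses, and subnets are all hypothetical):

```shell
# left end, then --to, then right end; pluto works out which end it is
ipsec whack --name gw1-gw2 \
    --id @gw1.example.com --host 192.0.2.1 --client 10.0.1.0/24 \
    --to \
    --id @gw2.example.com --host 192.0.2.2 --client 10.0.2.0/24 \
    --encrypt --tunnel --rsasig
```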
The potential connection description also specifies characteristics of rekeying and security.
—psk
Propose and allow preshared secret authentication for IKE peers. This authentication
requires that each side use the same secret. May be combined with —rsasig; at least
one must be specified.
—rsasig
Propose and allow RSA signatures for authentication of IKE peers. This
authentication requires that each side have a private key of its own and know
the public key of its peer. May be combined with —psk; at least one must be
specified.
—encrypt
All proposed or accepted IPsec SAs will include non-null ESP. The actual choices of
transforms are wired into pluto.
—authenticate
All proposed IPsec SAs will include AH. All accepted IPsec SAs will include AH or
ESP with authentication. The actual choices of transforms are wired into pluto. Note
that this has nothing to do with IKE authentication.
—compress
All proposed IPsec SAs will include IPCOMP (compression). This will be ignored if
KLIPS is not configured with IPCOMP support.
—tunnel
the IPsec SA should use tunneling. Implicit if the SA is for clients. Must only be used
with —authenticate or —encrypt.
—ipv4
The host addresses will be interpreted as IPv4 addresses. This is the default. Note that
for a connection, all host addresses must be of the same Address Family (IPv4 and
IPv6 use different Address Families).
—ipv6
The host addresses (including nexthop) will be interpreted as IPv6 addresses. Note
that for a connection, all host addresses must be of the same Address Family (IPv4
and IPv6 use different Address Families).
—tunnelipv4
The client addresses will be interpreted as IPv4 addresses. The default is to match
what the host will be. This does not imply —tunnel so the flag can be safely used
when no tunnel is actually specified. Note that for a connection, all tunnel addresses
must be of the same Address Family.
—tunnelipv6
The client addresses will be interpreted as IPv6 addresses. The default is to match
what the host will be. This does not imply —tunnel so the flag can be safely used
when no tunnel is actually specified. Note that for a connection, all tunnel addresses
must be of the same Address Family.
—pfs
There should be Perfect Forward Secrecy - new keying material will be generated for
each IPsec SA rather than being derived from the ISAKMP SA keying material. Since
the group to be used cannot be negotiated (a dubious feature of the standard), pluto
will propose the same group that was used during Phase 1. We don’t implement a
stronger form of PFS which would require that the ISAKMP SA be deleted after the
IPSEC SA is negotiated.
—pfsgroup modp-group
—disablearrivalcheck
If the connection is a tunnel, allow packets arriving through the tunnel to have any
source and destination addresses.
—esp esp-algos
ESP encryption/authentication algorithm to be used for the connection (phase2 aka
IPsec SA). The options must be suitable as a value of ipsec_spi(8). See ipsec.conf(5)
for a detailed description of the algorithm format.
—aggrmode
This tunnel is using aggressive mode ISAKMP negotiation. The default is main
mode. Aggressive mode is less secure than main mode as it reveals your identity to
an eavesdropper, but is needed to support road warriors using PSK keys or to
interoperate with other buggy implementations insisting on using aggressive mode.
—modecfgpull
Pull the Mode Config network information from the peer.
—dpddelay seconds
Set the delay (in seconds) between Dead Peer Detection (RFC 3706) keepalives
(R_U_THERE, R_U_THERE_ACK) that are sent for this connection (default 30
seconds).
—timeout seconds
Set the length of time (in seconds) we will idle without hearing either an
R_U_THERE poll from our peer, or an R_U_THERE_ACK reply. After this period
has elapsed with no response and no traffic, we will declare the peer dead, and
remove the SA (default 120 seconds).
—dpdaction action
When a DPD-enabled peer is declared dead, what action should be taken.
hold (the default) means the eroute will be put into %hold status, while clear means the
eroute and SA will both be cleared. clear is really only useful on the server of a
Road Warrior config. The action restart is used on tunnels that need to be
permanently up and have static IP addresses. The action restart_by_peer has been
obsoleted and its functionality has been moved into the restart action.
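For a tunnel that should stay permanently up, the DPD options described above might be combined as follows (the connection name is hypothetical; other arguments are elided):

```shell
# poll every 30 seconds; after 120 silent seconds declare the peer dead
# and restart the connection
ipsec whack --name static-tunnel ... \
    --dpddelay 30 --timeout 120 --dpdaction restart
```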
—forceencaps
In some cases, for example when ESP packets are filtered or when a broken IPsec
peer does not properly recognise NAT, it can be useful to force RFC 3948
encapsulation using this option. It causes pluto to lie and tell the remote peer that
RFC 3948 encapsulation (ESP in UDP port 4500 packets) is required.
If none of the —encrypt, —authenticate, —compress, or —pfs flags is given, then
initiating the connection will only build an ISAKMP SA. For such a connection, client
subnets have no meaning and must not be specified.
Apart from initiating directly using the —initiate option, a tunnel can be loaded with a
different policy:
—initiateontraffic
Only initiate the connection when we have traffic to send over the connection
—pass
Allow unencrypted traffic to flow until the tunnel is initiated.
—drop
Drop unencrypted traffic silently.
—reject
Drop unencrypted traffic, but send an ICMP message notifying the other end.
These options need to be documented
—failnone
to be documented
—failpass
to be documented
—faildrop
to be documented
—failreject
to be documented
pluto supports various X.509 Certificate related options.
—utc
display all times in UTC.
—listall
lists all of the X.509 information known to pluto.
—listpubkeys
list all the public keys that have been successfully loaded.
—listcerts
list all the X.509 certificates that are currently loaded.
—checkpubkeys
list all the loaded X.509 certificates which are about to expire or have expired.
—listcacerts
list all the X.509 Certificate Authority (CA) certificates that are currently loaded.
—listacerts
list all the X.509 Attribute certificates that are currently loaded
—listaacerts
—listgroups
—listcrls
list all the loaded Certificate Revocation Lists (CRLs)
The corresponding options —rereadsecrets, —rereadall, —rereadcacerts,
—rereadacerts, —rereadaacerts, and —rereadcrls reread this information from their
respective sources and purge all information obtained online. The option —listevents
lists all pending CRL fetch commands.
More work is needed to allow for flexible policies. Currently policy is hardwired in the
source file spdb.c. The ISAKMP SAs may use Oakley groups MODP1024 and
MODP1536; AES or 3DES encryption; SHA1-96 and MD5-96 authentication. The IPsec
SAs may use AES or 3DES and MD5-96 or SHA1-96 for ESP, or just MD5-96 or SHA1-
96 for AH. IPCOMP Compression is always Deflate.
—ikelifetime seconds
how long pluto will propose that an ISAKMP SA be allowed to live. The default is
3600 (one hour) and the maximum is 86400 (1 day). This option will not affect what
is accepted. pluto will reject proposals that exceed the maximum.
—ipseclifetime seconds
how long pluto will propose that an IPsec SA be allowed to live. The default is
28800 (eight hours) and the maximum is 86400 (one day). This option will not affect
what is accepted. pluto will reject proposals that exceed the maximum.
—rekeymargin seconds
how long before an SA’s expiration should pluto try to negotiate a replacement SA.
This will only happen if pluto was the initiator. The default is 540 (nine minutes).
—rekeyfuzz percentage
maximum size of random component to add to rekeymargin, expressed as a
percentage of rekeymargin. pluto will select a delay uniformly distributed within this
range. By default, the percentage will be 100. If greater determinism is desired,
specify 0. It may be appropriate for the percentage to be much larger than 100.
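The effect of rekeyfuzz can be sketched with shell arithmetic (example values):

```shell
# with rekeymargin=540 and rekeyfuzz=100, pluto begins rekeying somewhere
# between 540 and 540 + 540*100/100 = 1080 seconds before the SA expires
margin=540
fuzz=100
max_margin=$((margin + margin * fuzz / 100))
echo "rekeying starts $margin to $max_margin seconds before expiry"
```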
—keyingtries count
how many times pluto should try to negotiate an SA, either for the first time or for
rekeying. A value of 0 is interpreted as a very large number: never give up. The
default is three.
—dontrekey
A misnomer. Only rekey a connection if we were the Initiator and there was recent
traffic on the existing connection. This applies to Phase 1 and Phase 2. This is
currently the only automatic way for a connection to terminate. It may be useful with
Road Warrior or Opportunistic connections. Since SA lifetime negotiation is
take-it-or-leave-it, a Responder normally uses the shorter of the negotiated or the configured
lifetime. This only works because if the lifetime is shorter than negotiated, the
Responder will rekey in time so that everything works. This interacts badly with —
dontrekey. In this case, the Responder will end up rekeying to rectify a shortfall in
an IPsec SA lifetime; for an ISAKMP SA, the Responder will accept the negotiated
lifetime.
—delete
when used in the connection form, it causes any previous connection with this name
to be deleted before this one is added. Unlike a normal delete, no diagnostic is
produced if there was no previous connection to delete. Any routing in place for the
connection is undone.
—delete, —name connection-name
The delete form deletes a named connection description and any SAs established or
negotiations initiated using this connection. Any routing in place for the connection is
undone.
—deletestate state-number
The deletestate form deletes the state object with the specified serial number. This is
useful for selectively deleting instances of connections.
The route form of the whack command tells pluto to set up routing for a connection. This
is much like a traditional route, except that it uses an ipsec device as a virtual interface.
Once routing is set up, no packets will be sent “in the clear” to the peer's client specified in the
connection. A TRAP shunt eroute will be installed; if outbound traffic is caught, Pluto will
initiate the connection. An explicit whack route is not always needed: if it hasn’t been
done when an IPsec SA is being installed, one will be automatically attempted.
—route, —name connection-name
When a routing is attempted for a connection, there must not already be a routing for
a different connection with the same subnet but different interface or destination, or if
there is, it must not be being used by an IPsec SA. Otherwise the attempt will fail.
—unroute, —name connection-name
The unroute form of the whack command tells pluto to undo a routing. pluto will
refuse if an IPsec SA is using the connection. If another connection is sharing the
same routing, it will be left in place. Without a routing, packets will be sent without
encryption or authentication.
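For example (hypothetical connection name):

```shell
# install the TRAP shunt eroute; outbound traffic will trigger negotiation
ipsec whack --route --name gw1-gw2
# undo the routing again
ipsec whack --unroute --name gw1-gw2
```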
The initiate form tells pluto to initiate a negotiation with another pluto (or other IKE
daemon) according to the named connection. Initiation requires a route that —route
would provide; if none is in place at the time an IPsec SA is being installed, pluto
attempts to set one up.
—initiate, —name connection-name, —asynchronous
The initiate form of the whack command will relay back from pluto status
information via the UNIX domain socket (unless —asynchronous is specified). The
status information is meant to look a bit like that from FTP. Currently whack simply
copies this to stderr. When the request is finished (e.g. the SAs are established or
pluto gives up), pluto closes the channel, causing whack to terminate.
The opportunistic initiate form is mainly used for debugging.
—tunnelipv4, —tunnelipv6, —oppohere ip-address, —oppothere ip-address
This will cause pluto to attempt to opportunistically initiate a connection from here
to there, even if a previous attempt had been made. The whack log will show the
progress of this attempt.
Ending a connection
—terminate, —name connection-name
The terminate form tells pluto to delete any SAs that use the specified connection and
to stop any negotiations in progress. It does not prevent new negotiations from starting
(the delete form has this effect).
—crash ip-address
If the remote peer has crashed, and therefore did not notify us, we keep sending
encrypted traffic, and rejecting all plaintext (non-IKE) traffic from that remote peer.
The —crash option brings our end down as well for all the known connections to the
specified ip-address.
—whackrecordfilename, —whackstoprecord
This causes pluto to open the given filename for write, and record each of the
messages received from whack or addconn. This continues until the whackstoprecord
option is used. This option may not be combined with any other command. The
start/stop commands are not recorded themselves. These files are usually used to
create input files for unit tests, particularly for complex setups where policies may in
fact overlap.
The format of the file consists of a line starting with #!pluto-whack and the date that
the file was started, as well as the hostname, and a linefeed. What follows are binary
format records consisting of a 32-bit record length in bytes, (including the length
record itself), a 64-bit timestamp, and then the literal contents of the whack message
that was received. All integers are in host format. In order to unambiguously determine
the host order, the first record is an empty record that contains only the current
WHACK_MAGIC value. This record is 16 bytes long.
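A minimal Python sketch of this record layout may make it concrete. The WHACK_MAGIC value below is a placeholder; the real constant is defined in pluto's sources and changes between releases.

```python
import struct
import time

# Placeholder magic value -- the real WHACK_MAGIC lives in pluto's whack.h.
WHACK_MAGIC = 0x77686B31

def pack_record(payload: bytes, timestamp: int) -> bytes:
    """One record: a 32-bit length (counting the length field itself),
    a 64-bit timestamp, then the literal whack message bytes.
    '=' keeps every integer in host byte order, as the text requires."""
    return struct.pack("=Iq", 4 + 8 + len(payload), timestamp) + payload

def read_records(blob: bytes):
    """Walk concatenated records, yielding (timestamp, payload) pairs."""
    offset = 0
    while offset < len(blob):
        length, timestamp = struct.unpack_from("=Iq", blob, offset)
        yield timestamp, blob[offset + 12:offset + length]
        offset += length

# The file opens with a text header line, then the special first record
# holding only WHACK_MAGIC: 4 + 8 + 4 = 16 bytes, which lets a reader
# determine the writer's host byte order unambiguously.
header = b"#!pluto-whack 2016-01-01 examplehost\n"
magic_record = pack_record(struct.pack("=I", WHACK_MAGIC), int(time.time()))
assert len(magic_record) == 16
```

A reader would check the first record against the known magic (and its byte-swapped form) before trusting the remaining lengths and timestamps.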
The public key form informs pluto of the RSA public key for a potential peer. Private keys
must be kept secret, so they are kept in ipsec.secrets(5).
—keyid id
specifies the identity of the peer for which a public key should be used. Its form is
identical to the identity in the connection. If no public key is specified, pluto
attempts to find KEY records from DNS for the id (if a FQDN) or through reverse
lookup (if an IP address). Note that there are several interesting ways in which this is not
secure.
—addkey
specifies that the new key is added to the collection; otherwise the new key replaces
any old ones.
—pubkeyrsa key
specifies the value of the RSA public key. It is a sequence of bytes as described in
RFC 2537 “RSA/MD5 KEYs and SIGs in the Domain Name System (DNS)”. It is
denoted in a way suitable for ipsec_ttodata(3). For example, a base 64 numeral starts
with 0s.
The listen form tells pluto to start listening for IKE requests on its public interfaces. To
avoid race conditions, it is normal to load the appropriate connections into pluto before
allowing it to listen. If pluto isn’t listening, it is pointless to initiate negotiations, so it will
refuse requests to do so. Whenever the listen form is used, pluto looks for public
interfaces and will notice when new ones have been added and when old ones have been
removed. This is also the trigger for pluto to read the ipsec.secrets file. So listen may be
useful more than once.
—listen
start listening for IKE traffic on public interfaces.
—unlisten
stop listening for IKE traffic on public interfaces.
The status form will display information about the internal state of pluto: information
about each potential connection, about each state object, and about each shunt that pluto is
managing without an associated connection.
—status
The trafficstatus form will display the xauth username, add_time and the total in and out
bytes of the IPsec SAs.
—trafficstatus
The shutdown form is the proper way to shut down pluto. It will tear down the SAs on
this machine that pluto has negotiated. It does not inform its peers, so the SAs on their
machines remain.
—shutdown
Examples
It would be normal to start pluto in one of the system initialization scripts. It needs to be
run by the superuser. Generally, no arguments are needed. To run it manually, the
superuser can simply type
ipsec pluto
The command will immediately return, but a pluto process will be left running, waiting
for requests from whack or a peer.
Using whack, several potential connections would be described:
ipsec whack —name silly —host 127.0.0.1 —to —host 127.0.0.2 —ikelifetime 900 —
ipseclifetime 800 —keyingtries 3
Since this silly connection description specifies neither encryption, authentication, nor
tunneling, it could only be used to establish an ISAKMP SA.
ipsec whack —name conn_name —host 10.0.0.1 —client 10.0.1.0/24 —to —
host 10.0.0.2 —client 10.0.2.0/24 —encrypt
This is something that must be done on both sides. If the other side is pluto, the same
whack command could be used on it (the command syntax is designed to not distinguish
which end is ours).
Now that the connections are specified, pluto is ready to handle requests and replies via
the public interfaces. We must tell it to discover those interfaces and start accepting
messages from peers:
ipsec whack —listen
If we don’t immediately wish to bring up a secure connection between the two clients, we
might wish to prevent insecure traffic. The routing form asks pluto to cause the packets
sent from our client to the peer’s client to be routed through the ipsec0 device; if there is
no SA, they will be discarded:
ipsec whack —route conn_name
Finally, we are ready to get pluto to initiate negotiation for an IPsec SA (and implicitly, an
ISAKMP SA):
ipsec whack —initiate —name conn_name
A small log of interesting events will appear on standard output (other logging is sent to
syslog).
whack can also be used to terminate pluto cleanly, tearing down all SAs that it has
negotiated.
ipsec whack —shutdown
Notification of any IPSEC SA deletion, but not ISAKMP SA deletion, is sent to the peer.
Unfortunately, such Notification is not reliable. Furthermore, pluto itself ignores
Notifications.
XAUTH
Whenever pluto brings a connection up or down, it invokes the updown command. This
command is specified using the —updown option. This allows for customized control
over routing and firewall manipulation.
The updown command is invoked for five different operations. Each of these operations can be for
our client subnet or for our host itself.
prepare-host or prepare-client
is run before bringing up a new connection if no other connection with the same
clients is up. Generally, this is useful for deleting a route that might have been set up
before pluto was run or perhaps by some agent not known to pluto.
route-host or route-client
is run when bringing up a connection for a new peer client subnet (even if prepare-
host or prepare-client was run). The command should install a suitable route.
Routing decisions are based only on the destination (peer’s client) subnet address,
unlike eroutes which discriminate based on source too.
unroute-host or unroute-client
is run when bringing down the last connection for a particular peer client subnet. It
should undo what the route-host or route-client did.
up-host or up-client
is run when bringing up a tunnel eroute with a pair of client subnets that does not
already have a tunnel eroute. This command should install firewall rules as
appropriate. It is generally a good idea to allow IKE messages (UDP port 500) to travel
between the hosts.
down-host or down-client
is run when bringing down the eroute for a pair of client subnets. This command
should delete firewall rules as appropriate. Note that there may remain some inbound
IPsec SAs with these client subnets.
The script is passed a large number of environment variables to specify what needs to be
done.
PLUTO_VERSION
indicates what version of this interface is being used. This document describes
version 1.1. This is upwardly compatible with version 1.0.
PLUTO_VERB
specifies the name of the operation to be performed (prepare-host, prepare-client,
up-host, up-client, down-host, or down-client). If the address family for security
gateway to security gateway communications is IPv6, then a suffix of -v6 is added to
the verb.
PLUTO_CONNECTION
is the name of the connection for which we are routing.
PLUTO_NEXT_HOP
is the next hop to which packets bound for the peer must be sent.
PLUTO_INTERFACE
is the name of the ipsec interface to be used.
PLUTO_ME
is the IP address of our host.
PLUTO_MY_CLIENT
is the IP address / count of our client subnet. If the client is just the host, this will be
the host’s own IP address / max (where max is 32 for IPv4 and 128 for IPv6).
PLUTO_MY_CLIENT_NET
is the IP address of our client net. If the client is just the host, this will be the host’s
own IP address.
PLUTO_MY_CLIENT_MASK
is the mask for our client net. If the client is just the host, this will be
255.255.255.255.
PLUTO_PEER
is the IP address of our peer.
PLUTO_PEER_CLIENT
is the IP address / count of the peer’s client subnet. If the client is just the peer, this
will be the peer’s own IP address / max (where max is 32 for IPv4 and 128 for IPv6).
PLUTO_PEER_CLIENT_NET
is the IP address of the peer’s client net. If the client is just the peer, this will be the
peer’s own IP address.
PLUTO_PEER_CLIENT_MASK
is the mask for the peer’s client net. If the client is just the peer, this will be
255.255.255.255.
PLUTO_MY_PROTOCOL
lists the protocols allowed over this IPsec SA.
PLUTO_PEER_PROTOCOL
lists the protocols the peer allows over this IPsec SA.
PLUTO_MY_PORT
lists the ports allowed over this IPsec SA.
PLUTO_PEER_PORT
lists the ports the peer allows over this IPsec SA.
PLUTO_MY_ID
lists our id.
PLUTO_PEER_ID
lists our peer’s id.
PLUTO_PEER_CA
lists the peer’s CA.
All output sent by the script to stderr or stdout is logged. The script should return an exit
status of 0 if and only if it succeeds.
Pluto waits for the script to finish and will not do any other processing while it is waiting.
The script may assume that pluto will not change anything while the script runs. The
script should avoid doing anything that takes much time and it should not issue any
command that requires processing by pluto. Either of these activities could be performed
by a background subprocess of the script.
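As a rough illustration of how an updown hook dispatches on these variables, here is a minimal sketch. Real hooks are normally shell scripts that run route and firewall commands; the PLUTO_* names follow the list above, while the returned strings are illustrative only.

```python
def updown(env: dict) -> str:
    # Sketch of an updown hook's dispatch on PLUTO_VERB; a real script
    # would invoke route/iptables commands instead of returning strings.
    verb = env["PLUTO_VERB"]
    conn = env.get("PLUTO_CONNECTION", "unknown")
    peer_net = env.get("PLUTO_PEER_CLIENT", "")
    if verb.startswith("prepare"):
        return f"{conn}: clearing stale route to {peer_net}"
    if verb.startswith("route"):
        return f"{conn}: installing route to {peer_net}"
    if verb.startswith("unroute"):
        return f"{conn}: removing route to {peer_net}"
    if verb.startswith("up"):
        return f"{conn}: adding firewall rules for {peer_net}"
    if verb.startswith("down"):
        return f"{conn}: deleting firewall rules for {peer_net}"
    return f"{conn}: unknown verb {verb}"
```

Note the ordering: the unroute branch must be tested before the up branch would matter if the prefixes overlapped, and a real hook must exit 0 only on success, since pluto blocks while it runs.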
Rekeying
When an SA that was initiated by pluto has only a bit of lifetime left, pluto will initiate
the creation of a new SA. This applies to ISAKMP and IPsec SAs. The rekeying will be
initiated when the SA’s remaining lifetime is less than the rekeymargin plus a random
percentage, between 0 and rekeyfuzz, of the rekeymargin.
Similarly, when an SA that was initiated by the peer has only a bit of lifetime left, pluto
will try to initiate the creation of a replacement. To give preference to the initiator, this
rekeying will only be initiated when the SA’s remaining lifetime is half of rekeymargin. If
rekeying is done by the responder, the roles will be reversed: the responder for the old SA
will be the initiator for the replacement. The former initiator might also initiate rekeying,
so there may be redundant SAs created. To avoid these complications, make sure that
rekeymargin is generous.
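The two rekeying thresholds described above can be sketched as follows (times in seconds; the formulas follow this text, not pluto's actual source):

```python
import random

def initiator_rekey_point(lifetime, rekeymargin, rekeyfuzz_percent):
    # For SAs this end initiated: rekey once remaining lifetime drops below
    # rekeymargin plus a random 0..rekeyfuzz-percent share of rekeymargin.
    fuzz = rekeymargin * random.uniform(0.0, rekeyfuzz_percent) / 100.0
    return lifetime - (rekeymargin + fuzz)

def responder_rekey_point(lifetime, rekeymargin):
    # For SAs the peer initiated: wait until only half of rekeymargin is
    # left, giving the original initiator the first chance to rekey.
    return lifetime - rekeymargin / 2.0
```

With a one-hour lifetime and a ten-minute margin, the initiator rekeys somewhere in the last 10 to 10-plus-fuzz minutes, while the responder holds off until only five minutes remain.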
One risk of having the former responder initiate is that perhaps none of its proposals is
acceptable to the former initiator (they have not been used in a successful negotiation). To
reduce the chances of this happening, and to prevent loss of security, the policy settings
are taken from the old SA (this is the case even if the former initiator is initiating). These
may be stricter than those of the connection.
pluto will not rekey an SA if that SA is not the most recent of its type (IPsec or ISAKMP)
for its potential connection. This avoids creating redundant SAs.
The random component in the rekeying time (rekeyfuzz) is intended to make certain
pathological patterns of rekeying unstable. If both sides decide to rekey at the same time,
twice as many SAs as necessary are created. This could become a stable pattern without
the randomness.
Another more important case occurs when a security gateway has SAs with many other
security gateways. Each of these connections might need to be rekeyed at the same time.
This would cause a high peak requirement for resources (network bandwidth, CPU time,
entropy for random numbers). The rekeyfuzz can be used to stagger the rekeying times.
Once a new set of SAs has been negotiated, pluto will never send traffic on a superseded
one. Traffic will be accepted on an old SA until it expires.
When pluto receives an initial Main Mode message, it needs to decide which connection
this message is for. It picks based solely on the source and destination IP addresses of the
message. There might be several connections with suitable IP addresses, in which case one
of them is arbitrarily chosen. (The ISAKMP SA proposal contained in the message could
be taken into account, but it is not.)
The ISAKMP SA is negotiated before the parties pass further identifying information, so
all ISAKMP SA characteristics specified in the connection description should be the same
for every connection with the same two host IP addresses. At the moment, the only
characteristic that might differ is authentication method.
Up to this point, all configuring has presumed that the IP addresses are known to all
parties ahead of time. This will not work when either end is mobile (or assigned a dynamic
IP address for other reasons). We call this situation “Road Warrior”. It is fairly tricky and
has some important limitations, most of which are features of the IKE protocol.
Only the initiator may be mobile: the initiator may have an IP number unknown to the
responder. When the responder doesn’t recognize the IP address on the first Main Mode
packet, it looks for a connection with itself as one end and %any as the other. If it cannot
find one, it refuses to negotiate. If it does find one, it creates a temporary connection that
is a duplicate except with the %any replaced by the source IP address from the packet; if
there was no identity specified for the peer, the new IP address will be used.
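The responder's lookup just described can be sketched as below; the dictionary keys are illustrative, not pluto's internal representation.

```python
def match_connection(connections, src_ip):
    # Sketch of the Road Warrior lookup: prefer a connection whose peer
    # address matches the packet's source exactly; otherwise instantiate a
    # %any template, filling in the source address (and using it as the
    # peer identity when none was configured). No match means refusal.
    for conn in connections:
        if conn["peer"] == src_ip:
            return conn
    for conn in connections:
        if conn["peer"] == "%any":
            instance = dict(conn, peer=src_ip, temporary=True)
            if instance.get("peer_id") is None:
                instance["peer_id"] = src_ip
            return instance
    return None  # refuse to negotiate
```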
When pluto is using one of these temporary connections and needs to find the preshared
secret or RSA private key in ipsec.secrets, and the connection specified no identity for
the peer, %any is used as its identity. After all, the real IP address was apparently
unknown to the configuration, so it is unreasonable to require that it be used in this table.
Part way into the Phase 1 (Main Mode) negotiation using one of these temporary
connection descriptions, pluto will receive an Identity Payload. At this point, pluto
checks for a more appropriate connection, one with an identity for the peer that matches
the payload but which would use the same keys so-far used for authentication. If it finds
one, it will switch to using this better connection (or a temporary derived from this, if it
has %any for the peer’s IP address). It may even turn out that no connection matches the
newly discovered identity, including the current connection; if so, pluto terminates
negotiation.
Unfortunately, if preshared secret authentication is being used, the Identity Payload is
encrypted using this secret, so the secret must be selected by the responder without
knowing this payload. This limits a host to at most one preshared secret for all Road
Warrior systems connecting to it. RSA Signature authentication does not require that
the responder know how to select the initiator’s public key until after the initiator’s
Identity Payload is decoded (using the responder’s private key, so that must be
preselected).
When pluto is responding to a Quick Mode negotiation via one of these temporary
connection descriptions, it may well find that the subnets specified by the initiator don’t
match those in the temporary connection description. If so, it will look for a connection
with matching subnets, its own host address, a peer address of %any and matching
identities. If it finds one, a new temporary connection is derived from this one and used
for the Quick Mode negotiation of IPsec SAs. If it does not find one, pluto terminates
negotiation.
Be sure to specify an appropriate nexthop for the responder to send a message to the
initiator: pluto has no way of guessing it (if forwarding isn’t required, use an explicit
%direct as the nexthop and the IP address of the initiator will be filled in; the obsolete
notation 0.0.0.0 is still accepted).
pluto has no special provision for the initiator side. The current (possibly dynamic) IP
address and nexthop must be used in defining connections. These must be properly
configured each time the initiator’s IP address changes. pluto has no mechanism to do this
automatically.
Although we call this Road Warrior Support, it could also be used to support encrypted
connections with anonymous initiators. The responder’s organization could announce the
preshared secret that would be used with unrecognized initiators and let anyone connect.
Of course the initiator’s identity would not be authenticated.
If any Road Warrior connections are supported, pluto cannot reject an exchange initiated
by an unknown host until it has determined that the secret is not shared or the signature is
invalid. This must await the third Main Mode message from the initiator. If no Road
Warrior connection is supported, the first message from an unknown source would be
rejected. This has implications for ease of debugging configurations and for denial of
service attacks.
Although a Road Warrior connection must be initiated by the mobile side, the other side
can and will rekey using the temporary connection it has created. If the Road Warrior
wishes to be able to disconnect, it is probably wise to set —keyingtries to 1 in the
connection on the non-mobile side to prevent it trying to rekey the connection.
Unfortunately, there is no mechanism to unroute the connection automatically.
Debugging
pluto accepts several optional arguments, useful mostly for debugging. Except for —
interface, each should appear at most once.
—interface interfacename
specifies that the named real public network interface should be considered. The
interface name specified should not be ipsecN. If the option doesn’t appear, all
interfaces are considered. To specify several interfaces, use the option once for each.
One use of this option is to specify which interface should be used when two or more
share the same IP address.
—ikeport port-number
changes the UDP port that pluto will use (default, specified by IANA: 500)
—ctlbase path
basename for control files. path.ctl is the socket through which whack communicates
with pluto. path.pid is the lockfile to prevent multiple pluto instances. The default is
/var/run/pluto/pluto).
—secretsfile file
specifies the file for authentication secrets (default: /etc/ipsec.secrets). This name is
subject to “globbing” as in sh(1), so every file with a matching name is processed.
Quoting is generally needed to prevent the shell from doing the globbing.
—adns path to adns
specifies where to find pluto's helper program for asynchronous DNS lookup. pluto
can be built to use _pluto_adns. By default, pluto will look for the program in
$IPSEC_DIR (if that environment variable is defined) or, failing that, in the same
directory as pluto.
—nofork
disable “daemon fork” (default is to fork). In addition, after the lock file and control
socket are created, print the line “Pluto initialized” to standard out.
—uniqueids
if this option has been selected, whenever a new ISAKMP SA is established, any
connection with the same Peer ID but a different Peer IP address is unoriented
(causing all its SAs to be deleted). This helps clean up dangling SAs when a
connection is lost and then regained at another IP address.
—force-busy
if this option has been selected, pluto will be forced to be “busy”. In this state, which
normally occurs during a Denial of Service attack, pluto requires cookies before
accepting new incoming IKE packets. Cookies are sent and required in IKEv1
Aggressive Mode and in IKEv2. This option is mostly used for testing purposes, but
can be selected by paranoid administrators as well.
—stderrlog
log goes to stderr (default is to use syslogd(8)).
For example
pluto —secretsfile ipsec.secrets —ctlbase pluto.base —ikeport 8500 —nofork —use-nostack —stderrlog
lets one test pluto without using the superuser account.
pluto is willing to produce a prodigious amount of debugging information. To do so, it
must be compiled with -DDEBUG. There are several classes of debugging output, and
pluto may be directed to produce a selection of them. All lines of debugging output are
prefixed with “| ” to distinguish them from error messages.
When pluto is invoked, it may be given arguments to specify which classes to output. The
current options are:
—debug-none
disable all debugging
—debug-all
enable all debugging
—debug-raw
show the raw bytes of messages
—debug-crypt
show the encryption and decryption of messages
—debug-parsing
show the structure of input messages
—debug-emitting
show the structure of output messages
—debug-control
show pluto's decision making
—debug-controlmore
show even more detailed pluto decision making
—debug-lifecycle
[this option is temporary] log more detail of lifecycle of SAs
—debug-klips
show pluto's interaction with KLIPS
—debug-pfkey
show pluto's PFKEY interface communication
—debug-dns
show pluto's interaction with DNS for KEY and TXT records
—debug-dpd
show pluto's Dead Peer Detection handling
—debug-natt
show pluto's NAT Traversal handling
—debug-oppo
show why pluto didn’t find a suitable DNS TXT record to authorize opportunistic
initiation
—debug-oppoinfo
log when connections are initiated due to acquires from the kernel. This is often
useful to know, but can be extremely chatty on a busy system.
—debug-whackwatch
if set, causes pluto not to release the whack —initiate channel until the SA is
completely up. This will cause the requestor to possibly wait forever while pluto
unsuccessfully negotiates. Used often in test cases.
—debug-private
allow debugging output with private keys.
The debug form of the whack command will change the selection in a running pluto. If a
connection name is specified, the flags are added whenever pluto has identified that it is
dealing with that connection. Unfortunately, this is often part way into the operation being
observed.
For example, to start a pluto with a display of the structure of input and output:
pluto —debug-emitting —debug-parsing
To later change this pluto to only display raw bytes:
whack —debug-raw
For testing, SSH’s IKE test page is quite useful:
http://isakmp-test.ssh.fi/
Hint: ISAKMP SAs are often kept alive by IKEs even after the IPsec SA is established.
This allows future IPsec SA’s to be negotiated directly. If one of the IKEs is restarted, the
other may try to use the ISAKMP SA but the new IKE won’t know about it. This can lead
to much confusion. pluto is not yet smart enough to get out of such a mess.
When pluto doesn’t understand or accept a message, it just ignores the message. It is not
yet capable of communicating the problem to the other IKE daemon (in the future it might
use Notifications to accomplish this in many cases). It does log a diagnostic.
When pluto gets no response from a message, it resends the same message (a message
will be sent at most three times). This is appropriate: UDP is unreliable.
When pluto gets a message that it has already seen, there are many cases when it notices
and discards it. This too is appropriate for UDP.
Combine these three rules, and you can explain many apparently mysterious behaviours.
In a pluto log, retrying isn’t usually the interesting event. The critical thing is either earlier
(pluto got a message which it didn’t like and so ignored, so it was still awaiting an
acceptable message and got impatient) or on the other system (pluto didn’t send a reply
because it wasn’t happy with the previous message).
Notes
If pluto is compiled without -DKLIPS, it negotiates Security Associations but never asks
the kernel to put them in place and never makes routing changes. This allows pluto to be
tested on systems without KLIPS, but makes it rather useless.
Each IPsec SA is assigned an SPI, a 32-bit number used to refer to the SA. The IKE
protocol lets the destination of the SA choose the SPI. The range 0 to 0xFF is reserved for
IANA. Pluto also avoids choosing an SPI in the range 0x100 to 0xFFF, leaving these SPIs
free for manual keying. Remember that the peer, if not pluto, may well choose SPIs in this
range.
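The reserved ranges can be summarized in a short sketch; the constants follow the text above.

```python
import random

IANA_RESERVED_MAX = 0xFF   # SPIs 0..0xFF are reserved for IANA
MANUAL_KEYING_MAX = 0xFFF  # pluto also leaves 0x100..0xFFF for manual keying

def pluto_picks(spi: int) -> bool:
    # pluto only chooses SPIs above both reserved ranges.
    return spi > MANUAL_KEYING_MAX

def pick_spi(rng=random.randrange) -> int:
    # Sketch: draw a 32-bit SPI outside the reserved ranges.
    return rng(MANUAL_KEYING_MAX + 1, 2**32)
```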
Policies
This catalogue of policies may be of use when trying to configure Pluto and another IKE
implementation to interoperate.
In Phase 1, only Main Mode is supported. We are not sure that Aggressive Mode is secure.
For one thing, it does not support identity protection. It may allow more severe Denial Of
Service attacks.
No Informational Exchanges are supported. These are optional and since their delivery is
not assured, they must not matter. It is the case that some IKE implementations won’t
interoperate without Informational Exchanges, but we feel they are broken.
No Informational Payloads are supported. These are optional, but useful. It is of concern
that these payloads are not authenticated in Phase 1, nor in those Phase 2 messages
authenticated with HASH(3).
Diffie Hellman Groups MODP 1024 and MODP 1536 (2 and 5) are supported. Group
MODP768 (1) is not supported because it is too weak.
Host authentication can be done by RSA Signatures or Pre-Shared Secrets.
3DES CBC (Cipher Block Chaining mode) is the only encryption supported, both for
ISAKMP SAs and IPSEC SAs.
MD5 and SHA1 hashing are supported for packet authentication in both kinds of
SAs.
ESP, AH, or AH plus ESP are supported. If, and only if, AH and ESP are
combined, the ESP need not have its own authentication component. The selection is
controlled by the —encrypt and —authenticate flags.
Each of these may be combined with IPCOMP Deflate compression, but only if the
potential connection specifies compression and only if KLIPS is configured with
IPCOMP support.
The IPSEC SAs may be tunnel or transport mode, where appropriate. The —tunnel
flag controls this when pluto is initiating.
When responding to an ISAKMP SA proposal, the maximum acceptable lifetime is
eight hours. The default is one hour. There is no minimum. The —ikelifetime flag
controls this when pluto is initiating.
When responding to an IPSEC SA proposal, the maximum acceptable lifetime is one
day. The default is eight hours. There is no minimum. The —ipseclifetime flag
controls this when pluto is initiating.
PFS is acceptable, and will be proposed if the —pfs flag was specified. The DH
group proposed will be the same as negotiated for Phase 1.
› SIGNALS
Pluto responds to SIGHUP by issuing a suggestion that “whack —listen” might
have been intended.
Pluto exits when it receives SIGTERM.
› EXIT STATUS
pluto normally forks a daemon process, so the exit status is normally a very
preliminary result.
0
means that all is OK so far.
1
means that something was wrong.
10
means that the lock file already exists.
If whack detects a problem, it will return an exit status of 1. If it received progress
messages from pluto, it returns as status the value of the numeric prefix from the last such
message that was not a message sent to syslog or a comment (but the prefix for success is
treated as 0). Otherwise, the exit status is 0.
› FILES
/var/run/pluto/pluto.pid
/var/run/pluto/pluto.ctl
/etc/ipsec.secrets
/dev/urandom
› ENVIRONMENT
IPSEC_EXECDIR
IPSECmyid
PLUTO_CORE_DIR
› SEE ALSO
The rest of the Libreswan distribution, in particular ipsec(8).
ipsec_auto(8) is designed to make using pluto more pleasant. Use it!
ipsec.secrets(5) describes the format of the secrets file.
ipsec_atoaddr(3), part of the Libreswan distribution, describes the forms that IP
addresses may take. ipsec_atosubnet(3), part of the Libreswan distribution, describes
the forms that subnet specifications may take.
For more information on IPsec, the mailing list, and the relevant documents, see:
https://datatracker.ietf.org/wg/ipsecme/charter/
At the time of writing, the most relevant IETF RFCs are:
RFC5996 Internet Key Exchange Protocol Version 2 (IKEv2)
The Libreswan web site <https://libreswan.org> and the mailing lists described there.
› HISTORY
This code is released under the GPL terms. See the accompanying files COPYING
and CREDITS.* for more details.
This software was originally written for the FreeS/WAN project
<http://www.freeswan.org>, founded by John Gilmore and managed by Hugh
Daniel. It was written by Angelos D. Keromytis ([email protected]), in
May/June 1997, in Athens, Greece. Thanks go to John Ioannidis for his help.
FreeS/WAN’s Pluto was developed/maintained from 2000-2004 by D. Hugh
Redelmeier ([email protected]), in Canada. The regulations of Greece and Canada
allow the code to be freely redistributable.
Richard Guy Briggs <[email protected]> was the main resource on KLIPS
development
IKE version 2 was initially written by Michael Richardson, Antony Antony and Paul
Wouters. It has since been extended by Avesh Agarwal, D. Hugh Redelmeier, Matt
Rogers, Antony Antony and Paul Wouters.
From 2003 onwards, the code was developed and maintained by The Openswan
Project by developers worldwide and distributed from the Netherlands and Finland.
Due to a lawsuit by Xelerance over the trademark, the project was forced to rename
itself and the code to The Libreswan Project in 2012.
See further: the CHANGES/CREDITS files in the main directory and the doc/
directory.
› BUGS
Please see <https://bugs.libreswan.org> for a list of currently known bugs and
missing features.
Bugs should be reported to the <[email protected]> mailing list.
› AUTHOR
Paul Wouters
iptables-extensions
› NAME
iptables-extensions – list of extensions in the standard iptables distribution
› SYNOPSIS
ip6tables [-m name [module-options…]] [-j target-name [target-options…]]
iptables [-m name [module-options…]] [-j target-name [target-options…]]
› MATCH EXTENSIONS
iptables can use extended packet matching modules with the -m or —match options,
followed by the matching module name; after these, various extra command line
options become available, depending on the specific module. You can specify
multiple extended match modules in one line, and you can use the -h or —help
options after the module has been specified to receive help specific to that module.
The extended match modules are evaluated in the order they are specified in the rule.
If the -p or —protocol was specified and if and only if an unknown option is
encountered, iptables will try to load a match module of the same name as the protocol,
to try making the option available.
addrtype
This module matches packets based on their address type. Address types are used
within the kernel networking stack and categorize addresses into various groups. The
exact definition of that group depends on the specific layer three protocol.
The following address types are possible:
UNSPEC
an unspecified address (i.e. 0.0.0.0)
UNICAST
a unicast address
LOCAL
a local address
BROADCAST
a broadcast address
ANYCAST
an anycast packet
MULTICAST
a multicast address
BLACKHOLE
a blackhole address
UNREACHABLE
an unreachable address
PROHIBIT
a prohibited address
THROW
FIXME
NAT
FIXME
XRESOLVE
[!] --src-type type
Matches if the source address is of the given type.
[!] --dst-type type
Matches if the destination address is of the given type.
--limit-iface-in
The address type checking can be limited to the interface the packet is coming in on.
This option is only valid in the PREROUTING, INPUT and FORWARD chains. It
cannot be specified with the --limit-iface-out option.
--limit-iface-out
The address type checking can be limited to the interface the packet is going out on.
This option is only valid in the POSTROUTING, OUTPUT and FORWARD
chains. It cannot be specified with the --limit-iface-in option.
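As a sketch of typical addrtype usage (the rules and interfaces are illustrative, not taken from this manpage), the match is often used to tighten anti-spoofing and noise-filtering policies:

```shell
# Drop inbound packets whose source claims to be one of our own
# addresses (a common spoofing pattern); PREROUTING permits the
# --limit-iface-in restriction.
iptables -t mangle -A PREROUTING ! -i lo -m addrtype --src-type LOCAL --limit-iface-in -j DROP

# Silently discard traffic addressed to broadcast addresses
iptables -A INPUT -m addrtype --dst-type BROADCAST -j DROP
```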
ah (IPv6-specific)
ah (IPv4-specific)
bpf
Match using Linux Socket Filter. Expects a BPF program in decimal format. This is the
format generated by the nfbpf_compile utility.
--bytecode code
Pass the BPF byte code format (described in the example below).
The code format is similar to the output of the tcpdump -ddd command: one line that
stores the number of instructions, followed by one line for each instruction. Instruction
lines follow the pattern 'u16 u8 u8 u32' in decimal notation. Fields encode the operation,
jump offset if true, jump offset if false and generic multiuse field 'K'. Comments are not
supported.
For example, to read only packets matching 'ip proto 6', insert the following, without the
comments or trailing whitespace:

4        # number of instructions
48 0 0 9 # load byte  ip->proto
21 0 1 6 # jump equal IPPROTO_TCP
6 0 0 1  # return     pass (non-zero)
6 0 0 0  # return     fail (zero)

You can pass this filter to the bpf match with the following command:
iptables -A OUTPUT -m bpf --bytecode '4,48 0 0 9,21 0 1 6,6 0 0 1,6 0 0 0' -j ACCEPT
Or instead, you can invoke the nfbpf_compile utility:
iptables -A OUTPUT -m bpf --bytecode "`nfbpf_compile RAW 'ip proto 6'`" -j ACCEPT
You may want to learn more about BPF from FreeBSD's bpf(4) manpage.
cgroup
cluster
Allows you to deploy gateway and back-end load-sharing clusters without the need for
load-balancers.
This match requires that all the nodes see the same packets. Thus, the cluster match
decides whether this node has to handle a packet given the following options:
--cluster-total-nodes num
Set the total number of nodes in the cluster.
[!] --cluster-local-node num
Set the local node number ID.
[!] --cluster-local-nodemask mask
Set the local node number ID mask. You can use this option instead of
--cluster-local-node.
--cluster-hash-seed value
Set the seed value of the Jenkins hash.
Example:
iptables -A PREROUTING -t mangle -i eth1 -m cluster --cluster-total-nodes 2
--cluster-local-node 1 --cluster-hash-seed 0xdeadbeef -j MARK --set-mark 0xffff
iptables -A PREROUTING -t mangle -i eth2 -m cluster --cluster-total-nodes 2
--cluster-local-node 1 --cluster-hash-seed 0xdeadbeef -j MARK --set-mark 0xffff
iptables -A PREROUTING -t mangle -i eth1 -m mark ! --mark 0xffff -j DROP
iptables -A PREROUTING -t mangle -i eth2 -m mark ! --mark 0xffff -j DROP
And the following commands to make all nodes see the same packets:
ip maddr add 01:00:5e:00:01:01 dev eth1
ip maddr add 01:00:5e:00:01:02 dev eth2
arptables -A OUTPUT -o eth1 --h-length 6 -j mangle --mangle-mac-s 01:00:5e:00:01:01
arptables -A INPUT -i eth1 --h-length 6 --destination-mac 01:00:5e:00:01:01 -j
mangle --mangle-mac-d 00:zz:yy:xx:5a:27
arptables -A OUTPUT -o eth2 --h-length 6 -j mangle --mangle-mac-s 01:00:5e:00:01:02
arptables -A INPUT -i eth2 --h-length 6 --destination-mac 01:00:5e:00:01:02 -j
mangle --mangle-mac-d 00:zz:yy:xx:5a:27
NOTE: the arptables commands above use mainstream syntax. If you are using
arptables-jf, included in some RedHat, CentOS and Fedora versions, you will hit syntax
errors and will have to adapt these commands to the arptables-jf syntax.
In the case of TCP connections, the pickup facility has to be disabled to avoid marking
TCP ACK packets coming in the reply direction as valid:
echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose
comment
connbytes
Match by how many bytes or packets a connection (or one of the two flows constituting
the connection) has transferred so far, or by average bytes per packet.
The counters are 64-bit and are thus not expected to overflow ;)
The primary use is to detect long-lived downloads and mark them to be scheduled using a
lower priority band in traffic control.
The transferred bytes per connection can also be viewed through `conntrack -L` and
accessed via ctnetlink.
NOTE that for connections which have no accounting information, the match will always
return false. The "net.netfilter.nf_conntrack_acct" sysctl flag controls whether new
connections will be byte/packet counted. Existing connection flows will not gain or lose
the accounting structure when the sysctl flag is flipped.
[!] --connbytes from[:to]
Match packets from a connection whose packets/bytes/average packet size is more
than FROM and less than TO bytes/packets. If TO is omitted, only the FROM check
is done. "!" is used to match packets not falling in the range.
--connbytes-dir {original|reply|both}
which packets to consider
--connbytes-mode {packets|bytes|avgpkt}
whether to check the amount of packets, number of bytes transferred or the average
size (in bytes) of all packets received so far. Note that when "both" is used together
with "avgpkt", and data is going (mainly) only in one direction (for example HTTP),
the average packet size will be about half of the actual data packets.
Example:
iptables .. -m connbytes --connbytes 10000:100000 --connbytes-dir both --connbytes-mode bytes ...
connlabel
connlimit
Allows you to restrict the number of parallel connections to a server per client IP address
(or client address block).
--connlimit-upto n
Match if the number of existing connections is below or equal to n.
--connlimit-above n
Match if the number of existing connections is above n.
--connlimit-mask prefix_length
Group hosts using the prefix length. For IPv4, this must be a number between
(including) 0 and 32. For IPv6, between 0 and 128. If not specified, the maximum
prefix length for the applicable protocol is used.
--connlimit-saddr
Apply the limit onto the source group. This is the default if --connlimit-daddr is not
specified.
--connlimit-daddr
Apply the limit onto the destination group.
Examples:
# allow 2 telnet connections per client host
iptables -A INPUT -p tcp --syn --dport 23 -m connlimit --connlimit-above 2 -j REJECT
# you can also match the other way around:
iptables -A INPUT -p tcp --syn --dport 23 -m connlimit --connlimit-upto 2 -j ACCEPT
# limit the number of parallel HTTP requests to 16 per class C sized source network
(24 bit netmask)
iptables -p tcp --syn --dport 80 -m connlimit --connlimit-above 16 --connlimit-mask 24 -j REJECT
# limit the number of parallel HTTP requests to 16 for the link local network (ipv6)
ip6tables -p tcp --syn --dport 80 -s fe80::/64 -m connlimit --connlimit-above 16
--connlimit-mask 64 -j REJECT
# Limit the number of connections to a particular host:
ip6tables -p tcp --syn --dport 49152:65535 -d 2001:db8::1 -m connlimit
--connlimit-above 100 -j REJECT
connmark
This module matches the netfilter mark field associated with a connection (which can be
set using the CONNMARK target below).
[!] --mark value[/mask]
Matches packets in connections with the given mark value (if a mask is specified, this
is logically ANDed with the mark before the comparison).
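For illustration (an assumed setup, not taken from this manpage), a mark set once per connection with the CONNMARK target can later be matched on every packet of that connection:

```shell
# Mark new SSH connections with 7 in the mangle table...
iptables -t mangle -A PREROUTING -p tcp --dport 22 -m conntrack --ctstate NEW -j CONNMARK --set-mark 7

# ...then match any packet of a connection carrying that mark,
# masking to the low byte
iptables -t mangle -A PREROUTING -m connmark --mark 7/0xff -j RETURN
```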
conntrack
This module, when combined with connection tracking, allows access to the connection
tracking state for this packet/connection.
[!] --ctstate statelist
statelist is a comma separated list of the connection states to match. Possible states
are listed below.
[!] --ctproto l4proto
Layer-4 protocol to match (by number or name).
[!] --ctorigsrc address[/mask]
[!] --ctorigdst address[/mask]
[!] --ctreplsrc address[/mask]
[!] --ctrepldst address[/mask]
Match against original/reply source/destination address.
[!] --ctorigsrcport port[:port]
[!] --ctorigdstport port[:port]
[!] --ctreplsrcport port[:port]
[!] --ctrepldstport port[:port]
Match against original/reply source/destination port (TCP/UDP/etc.) or GRE key.
Matching against port ranges is only supported in kernel versions above 2.6.38.
[!] --ctstatus statuslist
statuslist is a comma separated list of the connection statuses to match. Possible
statuses are listed below.
[!] --ctexpire time[:time]
Match remaining lifetime in seconds against given value or range of values
(inclusive).
--ctdir {ORIGINAL|REPLY}
Match packets that are flowing in the specified direction. If this flag is not specified
at all, matches packets in both directions.
States for —ctstate:
INVALID
The packet is associated with no known connection.
NEW
The packet has started a new connection, or is otherwise associated with a connection
which has not seen packets in both directions.
ESTABLISHED
The packet is associated with a connection which has seen packets in both directions.
RELATED
The packet is starting a new connection, but is associated with an existing
connection, such as an FTP data transfer or an ICMP error.
UNTRACKED
The packet is not tracked at all, which happens if you explicitly untrack it by using
-j CT --notrack in the raw table.
SNAT
A virtual state, matching if the original source address differs from the reply
destination.
DNAT
A virtual state, matching if the original destination differs from the reply source.
Statuses for —ctstatus:
NONE
None of the below.
EXPECTED
This is an expected connection (i.e. a conntrack helper set it up).
SEEN_REPLY
Conntrack has seen packets in both directions.
ASSURED
Conntrack entry should never be early-expired.
CONFIRMED
Connection is confirmed: originating packet has left box.
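The states above are most often combined into the classic stateful-firewall pattern; a minimal sketch (policy choices are illustrative):

```shell
# Accept packets belonging to, or related to, established connections
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

# Accept NEW connections only for selected services (SSH here)
iptables -A INPUT -p tcp --dport 22 -m conntrack --ctstate NEW -j ACCEPT

# Drop packets conntrack cannot classify
iptables -A INPUT -m conntrack --ctstate INVALID -j DROP
```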
cpu
dccp
devgroup
dscp
This module matches the 6 bit DSCP field within the TOS field in the IP header. DSCP has
superseded TOS within the IETF.
[!] --dscp value
Match against a numeric (decimal or hex) value [0-63].
[!] --dscp-class class
Match the DiffServ class. This value may be any of the BE, EF, AFxx or CSx classes.
It will then be converted into its corresponding numeric value.
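As an illustrative use (the chain and interface are assumed), the two forms of the match are equivalent for the EF class, whose numeric DSCP value is 46:

```shell
# Match by DiffServ class name...
iptables -A FORWARD -o eth0 -m dscp --dscp-class EF -j ACCEPT

# ...or by the equivalent numeric DSCP value
iptables -A FORWARD -o eth0 -m dscp --dscp 46 -j ACCEPT
```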
dst (IPv6-specific)
ecn
This allows you to match the ECN bits of the IPv4/IPv6 and TCP header. ECN is the
Explicit Congestion Notification mechanism as specified in RFC 3168.
[!] --ecn-tcp-cwr
This matches if the TCP ECN CWR (Congestion Window Received) bit is set.
[!] --ecn-tcp-ece
This matches if the TCP ECN ECE (ECN Echo) bit is set.
[!] --ecn-ip-ect num
This matches a particular IPv4/IPv6 ECT (ECN-Capable Transport). You have to
specify a number between '0' and '3'.
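A hedged sketch of the options above (rules without a -j target simply count matching packets, which is the typical use here):

```shell
# Count TCP packets signalling congestion via the ECE bit
iptables -A FORWARD -p tcp -m ecn --ecn-tcp-ece

# Count packets from non-ECN-capable transports (ECT codepoint 0)
iptables -A FORWARD -m ecn --ecn-ip-ect 0
```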
esp
eui64 (IPv6-specific)
This module matches the EUI-64 part of a stateless autoconfigured IPv6 address. It
compares the EUI-64 derived from the source MAC address in the Ethernet frame with
the lower 64 bits of the IPv6 source address. The "Universal/Local" bit is not compared.
This module does not match other link layer frames, and is only valid in the
PREROUTING, INPUT and FORWARD chains.
frag (IPv6-specific)
hashlimit
hashlimit uses hash buckets to express a rate limiting match (like the limit match) for a
group of connections using a single iptables rule. Grouping can be done per-hostgroup
(source and/or destination address) and/or per-port. It gives you the ability to express "N
packets per time quantum per group" or "N bytes per second" (see below for some
examples).
A hash limit option (--hashlimit-upto, --hashlimit-above) and --hashlimit-name are
required.
--hashlimit-upto amount[/second|/minute|/hour|/day]
Match if the rate is below or equal to amount/quantum. It is specified either as a
number, with an optional time quantum suffix (the default is 3/hour), or as
amountb/second (number of bytes per second).
--hashlimit-above amount[/second|/minute|/hour|/day]
Match if the rate is above amount/quantum.
--hashlimit-burst amount
Maximum initial number of packets to match: this number gets recharged by one
every time the limit specified above is not reached, up to this number; the default is
5. When byte-based rate matching is requested, this option specifies the amount of
bytes that can exceed the given rate. This option should be used with caution: if the
entry expires, the burst value is reset too.
--hashlimit-mode {srcip|srcport|dstip|dstport},...
A comma-separated list of objects to take into consideration. If no --hashlimit-mode
option is given, hashlimit acts like limit, but at the expense of doing the hash
housekeeping.
--hashlimit-srcmask prefix
When --hashlimit-mode srcip is used, all source addresses encountered will be
grouped according to the given prefix length and the so-created subnet will be subject
to hashlimit. prefix must be between (inclusive) 0 and 32. Note that
--hashlimit-srcmask 0 is basically doing the same thing as not specifying srcip for
--hashlimit-mode, but is technically more expensive.
--hashlimit-dstmask prefix
Like --hashlimit-srcmask, but for destination addresses.
--hashlimit-name foo
The name for the /proc/net/ipt_hashlimit/foo entry.
--hashlimit-htable-size buckets
The number of buckets of the hash table.
--hashlimit-htable-max entries
Maximum entries in the hash.
--hashlimit-htable-expire msec
After how many milliseconds do hash entries expire.
--hashlimit-htable-gcinterval msec
How many milliseconds between garbage collection intervals.
Examples:
matching on source host
"1000 packets per second for every host in 192.168.0.0/16" =>
-s 192.168.0.0/16 --hashlimit-mode srcip --hashlimit-upto 1000/sec
matching on source port
"100 packets per second for every service of 192.168.1.1" =>
-s 192.168.1.1 --hashlimit-mode srcport --hashlimit-upto 100/sec
matching on subnet
"10000 packets per minute for every /28 subnet (groups of 16 addresses) in 10.0.0.0/8" =>
-s 10.0.0.0/8 --hashlimit-srcmask 28 --hashlimit-upto 10000/min
matching bytes per second
"flows exceeding 512kbyte/s" =>
--hashlimit-mode srcip,dstip,srcport,dstport --hashlimit-above 512kb/s
matching bytes per second
"hosts that exceed 512kbyte/s, but permit up to 1Megabytes without matching" =>
--hashlimit-mode dstip --hashlimit-above 512kb/s --hashlimit-burst 1mb
hbh (IPv6-specific)
helper
hl (IPv6-specific)
This module matches the Hop Limit field in the IPv6 header.
[!] --hl-eq value
Matches if Hop Limit equals value.
--hl-lt value
Matches if Hop Limit is less than value.
--hl-gt value
Matches if Hop Limit is greater than value.
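For instance (the threshold is chosen arbitrarily for illustration), the hl match can drop IPv6 packets that have nearly exhausted their Hop Limit and would expire soon anyway:

```shell
# Drop forwarded IPv6 packets whose Hop Limit has fallen below 2
ip6tables -A FORWARD -m hl --hl-lt 2 -j DROP
```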
icmp (IPv4-specific)
This extension can be used if '--protocol icmp' is specified. It provides the following
option:
[!] --icmp-type {type[/code]|typename}
This allows specification of the ICMP type, which can be a numeric ICMP type, a
type/code pair, or one of the ICMP type names shown by the command
iptables -p icmp -h
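A common sketch using this option (the policy choices are illustrative, not a recommendation):

```shell
# Allow incoming echo requests (ping)...
iptables -A INPUT -p icmp --icmp-type echo-request -j ACCEPT

# ...and let the matching replies out
iptables -A OUTPUT -p icmp --icmp-type echo-reply -j ACCEPT
```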
icmp6 (IPv6-specific)
iprange
ipv6header (IPv6-specific)
This module matches IPv6 extension headers and/or upper layer header.
--soft
Matches if the packet includes any of the headers specified with --header.
[!] --header header[,header...]
Matches packets which include exactly all of the specified headers. Headers
encapsulated within an ESP header are out of scope. Possible header types can be:
hop|hop-by-hop
Hop-by-Hop Options header
dst
Destination Options header
route
Routing header
frag
Fragment header
auth
Authentication header
esp
Encapsulating Security Payload header
none
No Next header, which matches 59 in the 'Next Header' field of the IPv6 header or
of any IPv6 extension header
proto
Matches any upper layer protocol header. A protocol name from /etc/protocols or a
numeric value is also allowed. The number 255 is equivalent to proto.
ipvs
length
This module matches the length of the layer-3 payload (e.g. layer-4 packet) of a packet
against a specific value or range of values.
[!] --length length[:length]
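An illustrative rule (the size range is an arbitrary example, not a recommendation):

```shell
# Drop oversized echo requests, e.g. as a crude ping-flood mitigation
iptables -A INPUT -p icmp --icmp-type echo-request -m length --length 86:65535 -j DROP
```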
limit
This module matches at a limited rate using a token bucket filter. A rule using this
extension will match until this limit is reached. It can be used in combination with the
LOG target to give limited logging, for example.
xt_limit has no negation support - you will have to use -m hashlimit ! --hashlimit rate in
this case whilst omitting --hashlimit-mode.
--limit rate[/second|/minute|/hour|/day]
Maximum average matching rate: specified as a number, with an optional '/second',
'/minute', '/hour', or '/day' suffix; the default is 3/hour.
--limit-burst number
Maximum initial number of packets to match: this number gets recharged by one
every time the limit specified above is not reached, up to this number; the default is
5.
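The canonical pairing with the LOG target looks like this (the port and log prefix are assumed for illustration):

```shell
# Log at most 5 packets per minute (after an initial burst of 5)...
iptables -A INPUT -p tcp --dport 23 -m limit --limit 5/minute --limit-burst 5 -j LOG --log-prefix "telnet probe: "

# ...then drop everything, logged or not
iptables -A INPUT -p tcp --dport 23 -j DROP
```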
mac
mark
This module matches the netfilter mark field associated with a packet (which can be set
using the MARK target below).
[!] --mark value[/mask]
Matches packets with the given unsigned mark value (if a mask is specified, this is
logically ANDed with the mask before the comparison).
mh (IPv6-specific)
multiport
This module matches a set of source or destination ports. Up to 15 ports can be specified.
A port range (port:port) counts as two ports. It can only be used in conjunction with one of
the following protocols: tcp, udp, udplite, dccp and sctp.
[!] --source-ports,--sports port[,port|,port:port]...
Match if the source port is one of the given ports. The flag --sports is a convenient
alias for this option. Multiple ports or port ranges are separated using a comma, and a
port range is specified using a colon. 53,1024:65535 would therefore match ports 53
and all from 1024 through 65535.
[!] --destination-ports,--dports port[,port|,port:port]...
Match if the destination port is one of the given ports. The flag --dports is a
convenient alias for this option.
[!] --ports port[,port|,port:port]...
Match if either the source or destination ports are equal to one of the given ports.
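For example (the port selection is illustrative), several services can share one rule instead of three:

```shell
# Accept HTTP, HTTPS and a high port range in a single rule
iptables -A INPUT -p tcp -m multiport --dports 80,443,8000:8100 -j ACCEPT
```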
nfacct
The nfacct match provides the extended accounting infrastructure for iptables. You have to
use this match together with the standalone user-space utility nfacct(8).
The only option available for this match is the following:
--nfacct-name name
This allows you to specify the existing object name that will be used for accounting
the traffic that this rule-set is matching.
To use this extension, you have to create an accounting object:
nfacct add http-traffic
Then, you have to attach it to the accounting object via iptables:
iptables -I INPUT -p tcp --sport 80 -m nfacct --nfacct-name http-traffic
iptables -I OUTPUT -p tcp --dport 80 -m nfacct --nfacct-name http-traffic
Then, you can check for the amount of traffic that the rules match:
nfacct get http-traffic
{ pkts = 00000000000000000156, bytes = 00000000000000151786 } = http-traffic;
You can obtain nfacct(8) from http://www.netfilter.org or, alternatively, from the
git.netfilter.org repository.
osf
The osf module does passive operating system fingerprinting. The module compares
some data (Window Size, MSS, options and their order, TTL, DF, and others) from
packets with the SYN bit set.
[!] --genre string
Match an operating system genre by using a passive fingerprinting.
--ttl level
Do additional TTL checks on the packet to determine the operating system. level can
be one of the following values:
0 - True IP address and fingerprint TTL comparison. This generally works for LANs.
1 - Check if the IP header's TTL is less than the fingerprint one. Works for
globally-routable addresses.
2 - Do not compare the TTL at all.
--log level
Log determined genres into dmesg even if they do not match the desired one. level
can be one of the following values:
0 - Log all matched or unknown signatures
1 - Log only the first one
2 - Log all known matched signatures
You may find something like this in syslog:
Windows [2000:SP3:Windows XP Pro SP1, 2000 SP3]: 11.22.33.55:4024 ->
11.22.33.44:139 hops=3 Linux [2.5-2.6:] : 1.2.3.4:42624 -> 1.2.3.5:22 hops=4
OS fingerprints are loadable using the nfnl_osf program. To load fingerprints from a file,
use:
nfnl_osf -f /usr/share/xtables/pf.os
To remove them again:
nfnl_osf -f /usr/share/xtables/pf.os -d
The fingerprint database can be downloaded from
http://www.openbsd.org/cgi-bin/cvsweb/src/etc/pf.os .
owner
This module attempts to match various characteristics of the packet creator, for locally
generated packets. This match is only valid in the OUTPUT and POSTROUTING chains.
Forwarded packets do not have any socket associated with them. Packets from kernel
threads do have a socket, but usually no owner.
[!] --uid-owner username
[!] --uid-owner userid[-userid]
Matches if the packet socket's file structure (if it has one) is owned by the given user.
You may also specify a numerical UID, or a UID range.
[!] --gid-owner groupname
[!] --gid-owner groupid[-groupid]
Matches if the packet socket's file structure is owned by the given group. You may
also specify a numerical GID, or a GID range.
[!] --socket-exists
Matches if the packet is associated with a socket.
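A sketch of a typical owner-based egress policy (the "proxy" account and port are assumed for illustration):

```shell
# Let only the "proxy" account originate outbound HTTP from this host...
iptables -A OUTPUT -p tcp --dport 80 -m owner --uid-owner proxy -j ACCEPT

# ...and reject attempts by any other local user
iptables -A OUTPUT -p tcp --dport 80 -j REJECT
```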
physdev
This module matches on the bridge port input and output devices enslaved to a bridge
device. This module is a part of the infrastructure that enables a transparent bridging IP
firewall and is only useful for kernel versions above version 2.5.44.
[!] --physdev-in name
Name of a bridge port via which a packet is received (only for packets entering the
INPUT, FORWARD and PREROUTING chains). If the interface name ends in a
"+", then any interface which begins with this name will match. If the packet didn't
arrive through a bridge device, this packet won't match this option, unless '!' is used.
[!] --physdev-out name
Name of a bridge port via which a packet is going to be sent (for packets entering the
FORWARD, OUTPUT and POSTROUTING chains). If the interface name ends in
a "+", then any interface which begins with this name will match. Note that in the nat
and mangle OUTPUT chains one cannot match on the bridge output port, however
one can in the filter OUTPUT chain. If the packet won't leave by a bridge device or
if it is yet unknown what the output device will be, then the packet won't match this
option, unless '!' is used.
[!] --physdev-is-in
Matches if the packet has entered through a bridge interface.
[!] --physdev-is-out
Matches if the packet will leave through a bridge interface.
[!] --physdev-is-bridged
Matches if the packet is being bridged and therefore is not being routed. This is only
useful in the FORWARD and POSTROUTING chains.
pkttype
policy
This module matches the policy used by IPsec for handling a packet.
--dir {in|out}
Used to select whether to match the policy used for decapsulation or the policy that
will be used for encapsulation. in is valid in the PREROUTING, INPUT and
FORWARD chains, out is valid in the POSTROUTING, OUTPUT and
FORWARD chains.
--pol {none|ipsec}
Matches if the packet is subject to IPsec processing. --pol none cannot be combined
with --strict.
--strict
Selects whether to match the exact policy or match if any rule of the policy matches
the given policy.
For each policy element that is to be described, one can use one or more of the following
options. When --strict is in effect, at least one must be used per element.
[!] --reqid id
Matches the reqid of the policy rule. The reqid can be specified with setkey(8) using
unique:id as level.
[!] --spi spi
Matches the SPI of the SA.
[!] --proto {ah|esp|ipcomp}
Matches the encapsulation protocol.
[!] --mode {tunnel|transport}
Matches the encapsulation mode.
[!] --tunnel-src addr[/mask]
Matches the source end-point address of a tunnel mode SA. Only valid with
--mode tunnel.
[!] --tunnel-dst addr[/mask]
Matches the destination end-point address of a tunnel mode SA. Only valid with
--mode tunnel.
--next
Start the next element in the policy specification. Can only be used with --strict.
quota
Implements network quotas by decrementing a byte counter with each packet. The
condition matches until the byte counter reaches zero. Behavior is reversed with negation
(i.e. the condition does not match until the byte counter reaches zero).
[!] --quota bytes
The quota in bytes.
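A minimal sketch (the byte limit is arbitrary): once the counter is exhausted the first rule stops matching and packets fall through to the second:

```shell
# Allow roughly the first 50 MB of traffic to this service, then drop
iptables -A INPUT -p tcp --dport 80 -m quota --quota 52428800 -j ACCEPT
iptables -A INPUT -p tcp --dport 80 -j DROP
```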
rateest
The rate estimator can match on estimated rates as collected by the RATEEST target. It
supports matching on absolute bps/pps values, comparing two rate estimators and
matching on the difference between two rate estimators.
For a better understanding of the available options, these are all possible combinations:
rateest operator rateest-bps
rateest operator rateest-pps
(rateest minus rateest-bps1) operator rateest-bps2
(rateest minus rateest-pps1) operator rateest-pps2
rateest1 operator rateest2 rateest-bps (without rate!)
rateest1 operator rateest2 rateest-pps (without rate!)
(rateest1 minus rateest-bps1) operator (rateest2 minus rateest-bps2)
(rateest1 minus rateest-pps1) operator (rateest2 minus rateest-pps2)
--rateest-delta
For each estimator (either absolute or relative mode), calculate the difference
between the estimator-determined flow rate and the static value chosen with the
BPS/PPS options. If the flow rate is higher than the specified BPS/PPS, 0 will be
used instead of a negative value. In other words, "max(0, rateest#_rate -
rateest#_bps)" is used.
[!] --rateest-lt
Match if rate is less than given rate/estimator.
[!] --rateest-gt
Match if rate is greater than given rate/estimator.
[!] --rateest-eq
Match if rate is equal to given rate/estimator.
In the so-called "absolute mode", only one rate estimator is used and compared against a
static value, while in "relative mode", two rate estimators are compared against each
other.
--rateest name
Name of the one rate estimator for absolute mode.
--rateest1 name
--rateest2 name
The names of the two rate estimators for relative mode.
--rateest-bps [value]
--rateest-pps [value]
--rateest-bps1 [value]
--rateest-bps2 [value]
--rateest-pps1 [value]
--rateest-pps2 [value]
Compare the estimator(s) by bytes or packets per second, and compare against the
chosen value. See the above list for which option is to be used in which case. A
unit suffix may be used - available ones are: bit, [kmgt]bit, [KMGT]ibit, Bps,
[KMGT]Bps, [KMGT]iBps.
Example: This is what can be used to route outgoing data connections from an FTP server
over two lines based on the available bandwidth at the time the data connection was
started:
# Estimate outgoing rates
iptables -t mangle -A POSTROUTING -o eth0 -j RATEEST --rateest-name eth0
--rateest-interval 250ms --rateest-ewma 0.5s
iptables -t mangle -A POSTROUTING -o ppp0 -j RATEEST --rateest-name ppp0
--rateest-interval 250ms --rateest-ewma 0.5s
# Mark based on available bandwidth
iptables -t mangle -A balance -m conntrack --ctstate NEW -m helper --helper ftp -m
rateest --rateest-delta --rateest1 eth0 --rateest-bps1 2.5mbit --rateest-gt --rateest2 ppp0
--rateest-bps2 2mbit -j CONNMARK --set-mark 1
iptables -t mangle -A balance -m conntrack --ctstate NEW -m helper --helper ftp -m
rateest --rateest-delta --rateest1 ppp0 --rateest-bps1 2mbit --rateest-gt --rateest2 eth0
--rateest-bps2 2.5mbit -j CONNMARK --set-mark 2
iptables -t mangle -A balance -j CONNMARK --restore-mark
realm (IPv4-specific)
This matches the routing realm. Routing realms are used in complex routing setups
involving dynamic routing protocols like BGP.
[!] --realm value[/mask]
Matches a given realm number (and optionally mask). If not a number, value can be a
named realm from /etc/iproute2/rt_realms (mask cannot be used in that case).
recent
Allows you to dynamically create a list of IP addresses and then match against that list in a
few different ways.
For example, you can create a “badguy” list out of people attempting to connect to port
139 on your firewall and then DROP all future packets from them without considering
them.
--set, --rcheck, --update and --remove are mutually exclusive.
--name name
Specify the list to use for the commands. If no name is given then DEFAULT will be
used.
[!] --set
This will add the source address of the packet to the list. If the source address is
already in the list, this will update the existing entry. This will always return success
(or failure if ! is passed in).
--rsource
Match/save the source address of each packet in the recent list table. This is the
default.
--rdest
Match/save the destination address of each packet in the recent list table.
--mask netmask
Netmask that will be applied to this recent list.
[!] --rcheck
Check if the source address of the packet is currently in the list.
[!] --update
Like --rcheck, except it will update the "last seen" timestamp if it matches.
[!] --remove
Check if the source address of the packet is currently in the list and if so that address
will be removed from the list and the rule will return true. If the address is not found,
false is returned.
--seconds seconds
This option must be used in conjunction with one of --rcheck or --update. When
used, this will narrow the match to only happen when the address is in the list and
was seen within the last given number of seconds.
--reap
This option can only be used in conjunction with --seconds. When used, this will
cause entries older than the last given number of seconds to be purged.
--hitcount hits
This option must be used in conjunction with one of --rcheck or --update. When
used, this will narrow the match to only happen when the address is in the list and
packets had been received greater than or equal to the given value. This option may
be used along with --seconds to create an even narrower match requiring a certain
number of hits within a specific time frame. The maximum value for the hitcount
parameter is given by the "ip_pkt_list_tot" parameter of the xt_recent kernel module.
Exceeding this value on the command line will cause the rule to be rejected.
--rttl
This option may only be used in conjunction with one of --rcheck or --update.
When used, this will narrow the match to only happen when the address is in the list
and the TTL of the current packet matches that of the packet which hit the --set rule.
This may be useful if you have problems with people faking their source address in
order to DoS you via this module by sending bogus packets to you, thereby denying
others access to your site.
Examples:
iptables -A FORWARD -m recent --name badguy --rcheck --seconds 60 -j DROP
iptables -A FORWARD -p tcp -i eth0 --dport 139 -m recent --name badguy --set -j
DROP
/proc/net/xt_recent/* are the current lists of addresses and information about each entry
of each list.
Each file in /proc/net/xt_recent/ can be read from to see the current list, or written to
using the following commands to modify the list:
echo +addr >/proc/net/xt_recent/DEFAULT
to add addr to the DEFAULT list
echo -addr >/proc/net/xt_recent/DEFAULT
to remove addr from the DEFAULT list
echo / >/proc/net/xt_recent/DEFAULT
to flush the DEFAULT list (remove all entries).
The module itself accepts parameters, defaults shown:
ip_list_tot=100
Number of addresses remembered per table.
ip_pkt_list_tot=20
Number of packets per address remembered.
ip_list_hash_size=0
Hash table size. 0 means to calculate it based on ip_list_tot, default: 512.
ip_list_perms=0644
Permissions for /proc/net/xt_recent/* files.
ip_list_uid=0
Numerical UID for ownership of /proc/net/xt_recent/* files.
ip_list_gid=0
Numerical GID for ownership of /proc/net/xt_recent/* files.
rpfilter
Performs a reverse path filter test on a packet. If a reply to the packet would be sent via
the same interface that the packet arrived on, the packet will match. Note that, unlike the
in-kernel rp_filter, packets protected by IPSec are not treated specially. Combine this
match with the policy match if you want this. Also, packets arriving via the loopback
interface are always permitted. This match can only be used in the PREROUTING chain
of the raw or mangle table.
--loose
Used to specify that the reverse path filter test should match even if the selected
output device is not the expected one.
--validmark
Also use the packets' nfmark value when performing the reverse path route lookup.
--accept-local
This will permit packets arriving from the network with a source address that is also
assigned to the local machine.
--invert
This will invert the sense of the match. Instead of matching packets that passed the
reverse path filter test, match those that have failed it.
Example to log and drop packets failing the reverse path filter test:
iptables -t raw -N RPFILTER
iptables -t raw -A RPFILTER -m rpfilter -j RETURN
iptables -t raw -A RPFILTER -m limit --limit 10/minute -j NFLOG --nflog-prefix
"rpfilter drop"
iptables -t raw -A RPFILTER -j DROP
iptables -t raw -A PREROUTING -j RPFILTER
Example to drop failed packets, without logging:
iptables -t raw -A RPFILTER -m rpfilter --invert -j DROP
rt (IPv6-specific)
sctp
set
socket
This matches if an open TCP/UDP socket can be found by doing a socket lookup on the
packet. It matches if there is an established or non-zero bound listening socket (possibly
with a non-local address). The lookup is performed using the packet tuple of TCP/UDP
packets, or the original TCP/UDP header embedded in an ICMP/ICMPv6 error packet.
--transparent
Ignore non-transparent sockets.
--nowildcard
Do not ignore sockets bound to 'any' address. The socket match won't accept zero-
bound listeners by default, since then local services could intercept traffic that would
otherwise be forwarded. This option therefore has security implications when used to
match traffic being forwarded to redirect such packets to the local machine with
policy routing. When using the socket match to implement fully transparent proxies
bound to non-local addresses it is recommended to use the --transparent option instead.
Example (assuming packets with mark 1 are delivered locally):
-t mangle -A PREROUTING -m socket --transparent -j MARK --set-mark 1
state
The “state” extension is a subset of the “conntrack” module. “state” allows access to the
connection tracking state for this packet.
[!] --state state
Where state is a comma separated list of the connection states to match. Only a
subset of the states understood by "conntrack" are recognized: INVALID,
ESTABLISHED, NEW, RELATED or UNTRACKED. For their description, see
the "conntrack" heading in this manpage.
statistic
This module matches packets based on some statistic condition. It supports two distinct
modes settable with the --mode option.
Supported options:
--mode mode
Set the matching mode of the matching rule, supported modes are random and nth.
[!] --probability p
Set the probability for a packet to be randomly matched. It only works with the
random mode. p must be between 0.0 and 1.0. The supported granularity is in
1/2147483648th increments.
[!] --every n
Match one packet every nth packet. It works only with the nth mode (see also the
--packet option).
--packet p
Set the initial counter value (0 <= p <= n-1, default 0) for the nth mode.
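The two modes can be pictured with a small Python sketch. This is illustrative only;
`match_random` and `NthMatcher` are hypothetical names, and the kernel's internal
counter handling may differ in detail:

```python
import random

GRANULARITY = 2147483648  # probabilities are stored in 1/2^31 increments

def match_random(p, rng=random.random):
    """random mode: match a packet with probability p (rounded to granularity)."""
    scaled = int(p * GRANULARITY)        # p as the kernel's integer representation
    return rng() * GRANULARITY < scaled

class NthMatcher:
    """nth mode: match one packet in every `every`, offset by --packet."""
    def __init__(self, every, packet=0):
        self.every = every               # --every n
        self.count = packet              # --packet p (0 <= p <= n-1)
    def match(self):
        hit = self.count == 0            # counter at zero: this is "the nth" packet
        self.count = (self.count + 1) % self.every
        return hit
```

With `NthMatcher(3)`, packets 1, 4, 7, ... match; a nonzero `--packet` value shifts
which packet in the cycle is the first hit.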
string
This module matches a given string by using some pattern matching strategy. It requires a
Linux kernel >= 2.6.14.
--algo {bm|kmp}
Select the pattern matching strategy. (bm = Boyer-Moore, kmp = Knuth-Morris-Pratt)
--from offset
Set the offset from which it starts looking for any matching. If not passed, default is
0.
--to offset
Set the offset up to which should be scanned. That is, byte offset-1 (counting from 0)
is the last one that is scanned. If not passed, default is the packet size.
[!] --string pattern
Matches the given pattern.
[!] --hex-string pattern
Matches the given pattern in hex notation.
Examples:
# The string pattern can be used for simple text characters.
iptables -A INPUT -p tcp --dport 80 -m string --algo bm --string 'GET /index.html' -j LOG
# The hex string pattern can be used for non-printable characters, like |0D 0A| or |0D0A|.
iptables -p udp --dport 53 -m string --algo bm --from 40 --to 57 --hex-string '|03|www|09|netfilter|03|org|00|'
tcp
These extensions can be used if `--protocol tcp' is specified. It provides the following
options:
[!] --source-port,--sport port[:port]
Source port or port range specification. This can either be a service name or a port
number. An inclusive range can also be specified, using the format first:last. If the
first port is omitted, "0" is assumed; if the last is omitted, "65535" is assumed. If the
first port is greater than the second one they will be swapped. The flag --sport is a
convenient alias for this option.
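The range rules above can be sketched in Python. The helper name is hypothetical and
service-name lookup is omitted for brevity:

```python
def parse_port_range(spec):
    """Parse an iptables-style port[:port] spec per the rules above."""
    if ':' in spec:
        first, last = spec.split(':', 1)
        lo = int(first) if first else 0          # omitted first port -> 0
        hi = int(last) if last else 65535        # omitted last port -> 65535
        if lo > hi:                              # reversed bounds are swapped
            lo, hi = hi, lo
    else:
        lo = hi = int(spec)                      # single port
    return lo, hi
```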
[!] --destination-port,--dport port[:port]
Destination port or port range specification. The flag --dport is a convenient alias
for this option.
[!] --tcp-flags mask comp
Match when the TCP flags are as specified. The first argument mask is the flags
which we should examine, written as a comma-separated list, and the second
argument comp is a comma-separated list of flags which must be set. Flags are: SYN
ACK FIN RST URG PSH ALL NONE. Hence the command iptables -A
FORWARD -p tcp --tcp-flags SYN,ACK,FIN,RST SYN will only match packets with
the SYN flag set, and the ACK, FIN and RST flags unset.
[!] --syn
Only match TCP packets with the SYN bit set and the ACK, RST and FIN bits
cleared. Such packets are used to request TCP connection initiation; for example,
blocking such packets coming in an interface will prevent incoming TCP
connections, but outgoing TCP connections will be unaffected. It is equivalent to
--tcp-flags SYN,RST,ACK,FIN SYN. If the "!" flag precedes the "--syn", the sense
of the option is inverted.
[!] --tcp-option number
Match if the TCP option is set.
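The mask/comp test above amounts to `(flags & mask) == comp`. A minimal Python
sketch, with flag bit values as they appear in the TCP header's flags octet (the helper
names are illustrative):

```python
# TCP flag bits in the TCP header's flags octet
FLAGS = {'FIN': 0x01, 'SYN': 0x02, 'RST': 0x04, 'PSH': 0x08,
         'ACK': 0x10, 'URG': 0x20}

def flag_bits(names):
    """Turn a comma-separated flag list (or ALL/NONE) into a bitmask."""
    if names == 'ALL':
        return 0x3F
    if names == 'NONE':
        return 0
    return sum(FLAGS[n] for n in names.split(','))

def tcp_flags_match(packet_flags, mask, comp):
    """--tcp-flags mask comp: examine only the mask bits, require exactly comp."""
    return packet_flags & flag_bits(mask) == flag_bits(comp)
```

For example, `tcp_flags_match(FLAGS['SYN'], 'SYN,ACK,FIN,RST', 'SYN')` is true for a
bare SYN but false for a SYN/ACK, which is exactly the --syn shorthand.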
tcpmss
This matches the TCP MSS (maximum segment size) field of the TCP header. You can
only use this on TCP SYN or SYN/ACK packets, since the MSS is only negotiated during
the TCP handshake at connection startup time.
[!] --mss value[:value]
Match a given TCP MSS value or range.
time
This matches if the packet arrival time/date is within a given range. All options are
optional, but are ANDed when specified. All times are interpreted as UTC by default.
--datestart YYYY[-MM[-DD[Thh[:mm[:ss]]]]]
--datestop YYYY[-MM[-DD[Thh[:mm[:ss]]]]]
Only match during the given time, which must be in ISO 8601 "T" notation. The
possible time range is 1970-01-01T00:00:00 to 2038-01-19T04:17:07.
If --datestart or --datestop are not specified, it will default to 1970-01-01 and 2038-
01-19, respectively.
--timestart hh:mm[:ss]
--timestop hh:mm[:ss]
Only match during the given daytime. The possible time range is 00:00:00 to
23:59:59. Leading zeroes are allowed (e.g. "06:03") and correctly interpreted as base-
10.
[!] --monthdays day[,day...]
Only match on the given days of the month. Possible values are 1 to 31. Note that
specifying 31 will of course not match on months which do not have a 31st day; the
same goes for 28- or 29-day February.
[!] --weekdays day[,day...]
Only match on the given weekdays. Possible values are Mon, Tue, Wed, Thu, Fri,
Sat, Sun, or values from 1 to 7, respectively. You may also use two-character
variants (Mo, Tu, etc.).
--contiguous
When --timestop is smaller than the --timestart value, match this as a single time
period instead of two distinct intervals. See EXAMPLES.
--kerneltz
Use the kernel timezone instead of UTC to determine whether a packet meets the
time regulations.
About kernel timezones: Linux keeps the system time in UTC, and always does so. On
boot, system time is initialized from a referential time source. Where this time source has
no timezone information, such as the x86 CMOS RTC, UTC will be assumed. If the time
source is however not in UTC, userspace should provide the correct system time and
timezone to the kernel once it has the information.
Local time is a feature on top of the (timezone independent) system time. Each process
has its own idea of local time, specified via the TZ environment variable. The kernel also
has its own timezone offset variable. The TZ userspace environment variable specifies
how the UTC-based system time is displayed, e.g. when you run date(1), or what you see
on your desktop clock. The TZ string may resolve to different offsets at different dates,
which is what enables the automatic time-jumping in userspace when DST changes. The
kernel's timezone offset variable is used when it has to convert non-UTC sources,
such as FAT filesystems, to UTC (since the latter is what the rest of the system uses).
The caveat with the kernel timezone is that Linux distributions may neglect to set the
kernel timezone, and instead only set the system time. Even if a particular distribution
does set the timezone at boot, it usually does not keep the kernel timezone offset - which
is what changes on DST - up to date. ntpd will not touch the kernel timezone, so running it
will not resolve the issue. As such, one may encounter a timezone that is always +0000, or
one that is wrong half of the year. Using --kerneltz is therefore highly
discouraged.
EXAMPLES. To match on weekends, use:
-m time --weekdays Sa,Su
Or, to match (once) on a national holiday block:
-m time --datestart 2007-12-24 --datestop 2007-12-27
Since the stop time is actually inclusive, you would need the following stop time to not
match the first second of the new day:
-m time --datestart 2007-01-01T17:00 --datestop 2007-01-01T23:59:59
During lunch hour:
-m time --timestart 12:30 --timestop 13:30
The fourth Friday in the month:
-m time --weekdays Fr --monthdays 22,23,24,25,26,27,28
(Note that this exploits a certain mathematical property. It is not possible to say “fourth
Thursday OR fourth Friday” in one rule. It is possible with multiple rules, though.)
Matching across days might not do what is expected. For instance,
-m time --weekdays Mo --timestart 23:00 --timestop 01:00
will match Monday, for one hour from midnight to 1 a.m., and then again for another
hour from 23:00 onwards. If this is unwanted, e.g. if you would like 'match for two
hours from Monday 23:00 onwards', you need to also specify the --contiguous option
in the example above.
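The cross-midnight behavior can be modeled in Python. This is an illustrative sketch of
the semantics described above, not the kernel implementation; `time_match` is a
hypothetical helper:

```python
from datetime import datetime, time

def time_match(dt, weekdays, start, stop, contiguous=False):
    """Model -m time with --weekdays and --timestart/--timestop (UTC).

    weekdays: set of ISO weekday numbers (1=Mon .. 7=Sun).
    """
    t = dt.time()
    if start <= stop:
        return dt.isoweekday() in weekdays and start <= t <= stop
    if not contiguous:
        # stop < start: two separate intervals on each listed day
        return dt.isoweekday() in weekdays and (t >= start or t <= stop)
    # --contiguous: one period, from `start` on a listed day until
    # `stop` on the following day
    if dt.isoweekday() in weekdays and t >= start:
        return True
    prev = dt.isoweekday() - 1 or 7
    return prev in weekdays and t <= stop
```

With the Monday 23:00-01:00 example: without --contiguous, Monday 00:30 matches;
with it, the early-morning hour matches on Tuesday instead.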
tos
This module matches the 8-bit Type of Service field in the IPv4 header (i.e. including the
“Precedence” bits) or the (also 8-bit) Priority field in the IPv6 header.
[!] --tos value[/mask]
Matches packets with the given TOS mark value. If a mask is specified, it is logically
ANDed with the TOS mark before the comparison.
[!] --tos symbol
You can specify a symbolic name when using the tos match for IPv4. The list of
recognized TOS names can be obtained by calling iptables with -m tos -h. Note that
this implies a mask of 0x3F, i.e. all but the ECN bits.
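The value/mask test reduces to one AND and one comparison. A minimal Python sketch
(the helper name and mask constant are illustrative):

```python
def tos_match(packet_tos, value, mask=0xFF):
    """--tos value[/mask]: AND the TOS byte with mask, compare to value."""
    return (packet_tos & mask) == value

# Symbolic TOS names imply mask 0x3F, per the text above.
TOS_NAME_MASK = 0x3F
```

So a packet whose TOS byte has extra high bits set still matches its symbolic TOS name
under mask 0x3F.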
ttl (IPv4-specific)
u32
U32 tests whether quantities of up to 4 bytes extracted from a packet have specified
values. The specification of what to extract is general enough to find data at given offsets
from tcp headers or payloads.
[!] --u32 tests
The argument amounts to a program in a small language described below.
tests := location "=" value | tests "&&" location "=" value
value := range | value "," range
range := number | number ":" number
a single number, n, is interpreted the same as n:n. n:m is interpreted as the range of
numbers >=n and <=m.
location := number | location operator number
operator := "&" | "<<" | ">>" | "@"
The operators &, <<, >> and && mean the same as in C. The = is really a set membership
operator and the value syntax describes a set. The @ operator is what allows moving to
the next header and is described further below.
There are currently some artificial implementation limits on the size of the tests:
*
no more than 10 "="s (and 9 "&&"s) in the u32 argument
*
no more than 10 ranges (and 9 commas) per value
*
no more than 10 numbers (and 9 operators) per location
To describe the meaning of location, imagine the following machine that interprets it.
There are three registers:
A is of type char *, initially the address of the IP header
B and C are unsigned 32 bit integers, initially zero
The instructions are:
number B = number;
C = (*(A+B)<<24) + (*(A+B+1)<<16) + (*(A+B+2)<<8) + *(A+B+3)
&number C = C & number
<< number C = C << number
>> number C = C >> number
@number A = A + C; then do the instruction number
Any access of memory outside [skb->data,skb->end] causes the match to fail. Otherwise
the result of the computation is the final value of C.
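The register machine above can be written out directly in Python. This is an
illustrative sketch: `u32_location` takes the location as a pre-tokenized list of
numbers and operator strings rather than parsing the textual syntax:

```python
def u32_read(data, a, b):
    """C = big-endian 32-bit word at offset A+B; fail outside the packet."""
    if a + b < 0 or a + b + 4 > len(data):
        raise ValueError("access outside packet")
    chunk = data[a + b:a + b + 4]
    return (chunk[0] << 24) | (chunk[1] << 16) | (chunk[2] << 8) | chunk[3]

def u32_location(data, tokens):
    """Evaluate a location: a number, then alternating operators and numbers."""
    a = 0                                  # A: offset into the packet
    c = u32_read(data, a, tokens[0])       # leading number loads C
    i = 1
    while i < len(tokens):
        op, num = tokens[i], tokens[i + 1]
        if op == '&':
            c &= num
        elif op == '<<':
            c = (c << num) & 0xFFFFFFFF
        elif op == '>>':
            c >>= num
        elif op == '@':
            a += c                         # move to the next header
            c = u32_read(data, a, num)     # then do the instruction `num`
        i += 2
    return c
```

Running the first example on a fake IPv4 header with IHL=5 and total length 0x12C
reproduces the walk-through: `[0]` reads bytes 0-3, and the IHL expression
`[0, '>>', 22, '&', 0x3C]` yields the header length in bytes.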
Whitespace is allowed but not required in the tests. However, the characters that do occur
there are likely to require shell quoting, so it is a good idea to enclose the arguments in
quotes.
Example:
match IP packets with total length >= 256
The IP header contains a total length field in bytes 2-3.
--u32 "0 & 0xFFFF = 0x100:0xFFFF"
read bytes 0-3
AND that with 0xFFFF (giving bytes 2-3), and test whether that is in the range
[0x100:0xFFFF]
Example: (more realistic, hence more complicated)
match ICMP packets with icmp type 0
First test that it is an ICMP packet, true iff byte 9 (protocol) = 1
--u32 "6 & 0xFF = 1 && ...
read bytes 6-9, use & to throw away bytes 6-8 and compare the result to 1. Next test
that it is not a fragment. (If so, it might be part of such a packet but we cannot always
tell.) N.B.: This test is generally needed if you want to match anything beyond the IP
header. The last 6 bits of byte 6 and all of byte 7 are 0 iff this is a complete packet
(not a fragment). Alternatively, you can allow first fragments by only testing the last
5 bits of byte 6.
... 4 & 0x3FFF = 0 && ...
Last test: the first byte past the IP header (the type) is 0. This is where we have to use
the @ syntax. The length of the IP header (IHL) in 32 bit words is stored in the right
half of byte 0 of the IP header itself.
... 0 >> 22 & 0x3C @ 0 >> 24 = 0"
The first 0 means read bytes 0-3, >>22 means shift that 22 bits to the right. Shifting
24 bits would give the first byte, so only 22 bits is four times that plus a few more
bits. &3C then eliminates the two extra bits on the right and the first four bits of the
first byte. For instance, if IHL=5, then the IP header is 20 (4 x 5) bytes long. In this
case, bytes 0-1 are (in binary) xxxx0101 yyzzzzzz, >>22 gives the 10 bit value
xxxx0101yy and &3C gives 010100. @ means to use this number as a new offset
into the packet, and read four bytes starting from there. This is the first 4 bytes of the
ICMP payload, of which byte 0 is the ICMP type. Therefore, we simply shift the
value 24 to the right to throw out all but the first byte and compare the result with 0.
Example:
bytes 8-12 of the TCP payload are any of 1, 2, 5 or 8
First we test that the packet is a tcp packet (similar to ICMP).
--u32 "6 & 0xFF = 6 && ...
Next, test that it is not a fragment (same as above).
... 0 >> 22 & 0x3C @ 12 >> 26 & 0x3C @ 8 = 1,2,5,8"
0>>22&3C as above computes the number of bytes in the IP header. @ makes this
the new offset into the packet, which is the start of the TCP header. The length of the
TCP header (again in 32 bit words) is the left half of byte 12 of the TCP header. The
12>>26&3C computes this length in bytes (similar to the IP header before). “@”
makes this the new offset, which is the start of the TCP payload. Finally, 8 reads
bytes 8-12 of the payload and = checks whether the result is any of 1, 2, 5 or 8.
udp
These extensions can be used if `--protocol udp' is specified. It provides the following
options:
[!] --source-port,--sport port[:port]
Source port or port range specification. See the description of the --source-port
option of the TCP extension for details.
[!] --destination-port,--dport port[:port]
Destination port or port range specification. See the description of the --destination-
port option of the TCP extension for details.
unclean (IPv4-specific)
This module takes no options, but attempts to match packets which seem malformed or
unusual. This is regarded as experimental.
› TARGET EXTENSIONS
iptables can use extended target modules: the following are included in the standard
distribution.
AUDIT
This target creates audit records for packets hitting the target. It can be used
to record accepted, dropped, and rejected packets. See auditd(8) for additional details.
--type {accept|drop|reject}
Set the type of audit record.
Example:
iptables -N AUDIT_DROP
iptables -A AUDIT_DROP -j AUDIT --type drop
iptables -A AUDIT_DROP -j DROP
CHECKSUM
This target allows you to selectively work around broken/old applications. It can only be
used in the mangle table.
--checksum-fill
Compute and fill in the checksum in a packet that lacks a checksum. This is
particularly useful if you need to work around old applications, such as DHCP
clients, that do not work well with checksum offloads, but don't want to disable
checksum offload on your device.
CLASSIFY
This module allows you to set the skb->priority value (and thus classify the packet into a
specific CBQ class).
--set-class major:minor
Set the major and minor class value. The values are always interpreted as
hexadecimal even if no 0x prefix is given.
CLUSTERIP (IPv4-specific)
This module allows you to configure a simple cluster of nodes that share a certain IP and
MAC address without an explicit load balancer in front of them. Connections are statically
distributed between the nodes in this cluster.
--new
Create a new ClusterIP. You always have to set this on the first rule for a given
ClusterIP.
--hashmode mode
Specify the hashing mode. Has to be one of sourceip, sourceip-sourceport,
sourceip-sourceport-destport.
--clustermac mac
Specify the ClusterIP MAC address. Has to be a link-layer multicast address.
--total-nodes num
Number of total nodes within this cluster.
--local-node num
Local node number within this cluster.
--hash-init rnd
Specify the random seed used for hash initialization.
CONNMARK
This module sets the netfilter mark value associated with a connection. The mark is 32 bits
wide.
--set-xmark value[/mask]
Zero out the bits given by mask and XOR value into the ctmark.
--save-mark [--nfmask nfmask] [--ctmask ctmask]
Copy the packet mark (nfmark) to the connection mark (ctmark) using the given
masks. The new ctmark value is determined as follows:
ctmark = (ctmark & ~ctmask) ^ (nfmark & nfmask)
i.e. ctmask defines what bits to clear and nfmask what bits of the nfmark to XOR into
the ctmark. ctmask and nfmask default to 0xFFFFFFFF.
--restore-mark [--nfmask nfmask] [--ctmask ctmask]
Copy the connection mark (ctmark) to the packet mark (nfmark) using the given
masks. The new nfmark value is determined as follows:
nfmark = (nfmark & ~nfmask) ^ (ctmark & ctmask);
i.e. nfmask defines what bits to clear and ctmask what bits of the ctmark to XOR into
the nfmark. ctmask and nfmask default to 0xFFFFFFFF.
--restore-mark is only valid in the mangle table.
The following mnemonics are available for --set-xmark:
--and-mark bits
Binary AND the ctmark with bits. (Mnemonic for --set-xmark 0/invbits, where
invbits is the binary negation of bits.)
--or-mark bits
Binary OR the ctmark with bits. (Mnemonic for --set-xmark bits/bits.)
--xor-mark bits
Binary XOR the ctmark with bits. (Mnemonic for --set-xmark bits/0.)
--set-mark value[/mask]
Set the connection mark. If a mask is specified then only those bits set in the mask
are modified.
--save-mark [--mask mask]
Copy the nfmark to the ctmark. If a mask is specified, only those bits are copied.
--restore-mark [--mask mask]
Copy the ctmark to the nfmark. If a mask is specified, only those bits are copied. This
is only valid in the mangle table.
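The save/restore formulas can be checked with a small Python sketch (the helper names
are hypothetical; the expressions are exactly the ones given above):

```python
M = 0xFFFFFFFF  # marks are 32 bits wide

def save_mark(ctmark, nfmark, nfmask=M, ctmask=M):
    """--save-mark: ctmark = (ctmark & ~ctmask) ^ (nfmark & nfmask)."""
    return ((ctmark & ~ctmask) ^ (nfmark & nfmask)) & M

def restore_mark(ctmark, nfmark, nfmask=M, ctmask=M):
    """--restore-mark: nfmark = (nfmark & ~nfmask) ^ (ctmark & ctmask)."""
    return ((nfmark & ~nfmask) ^ (ctmark & ctmask)) & M
```

With the default all-ones masks, save_mark simply copies the nfmark into the ctmark;
with partial masks, only the masked bits move while the rest of the target mark is
cleared or preserved as the formula dictates.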
CONNSECMARK
This module copies security markings from packets to connections (if unlabeled), and
from connections back to packets (also only if unlabeled). Typically used in conjunction
with SECMARK, it is valid in the security table (for backwards compatibility with older
kernels, it is also valid in the mangle table).
—save
If the packet has a security marking, copy it to the connection if the connection is not
marked.
—restore
If the packet does not have a security marking, and the connection does, copy the
security marking from the connection to the packet.
CT
The CT target allows you to set parameters for a packet or its associated connection. The
target attaches a "template" connection tracking entry to the packet, which is then used by
the conntrack core when initializing a new ct entry. This target is thus only valid in the
"raw" table.
--notrack
Disables connection tracking for this packet.
--helper name
Use the helper identified by name for the connection. This is more flexible than
loading the conntrack helper modules with preset ports.
--ctevents event[,...]
Only generate the specified conntrack events for this connection. Possible event types
are: new, related, destroy, reply, assured, protoinfo, helper, mark (this refers to
the ctmark, not nfmark), natseqinfo, secmark (ctsecmark).
--expevents event[,...]
Only generate the specified expectation events for this connection. Possible event
types are: new.
--zone id
Assign this packet to zone id and only have lookups done in that zone. By default,
packets have zone 0.
--timeout name
Use the timeout policy identified by name for the connection. This provides more
flexible timeout policy definition than the global timeout values available at
/proc/sys/net/netfilter/nf_conntrack_*_timeout_*.
DNAT
This target is only valid in the nat table, in the PREROUTING and OUTPUT chains,
and user-defined chains which are only called from those chains. It specifies that the
destination address of the packet should be modified (and all future packets in this
connection will also be mangled), and rules should cease being examined. It takes the
following options:
--to-destination [ipaddr[-ipaddr]][:port[-port]]
which can specify a single new destination IP address, or an inclusive range of IP
addresses. Optionally a port range, if the rule also specifies one of the following
protocols: tcp, udp, dccp or sctp. If no port range is specified, then the destination
port will never be modified. If no IP address is specified then only the destination
port will be modified. In Kernels up to 2.6.10 you can add several --to-destination
options. For those kernels, if you specify more than one destination address, either
via an address range or multiple --to-destination options, a simple round-robin (one
after another in cycle) load balancing takes place between these addresses. Later
Kernels (>= 2.6.11-rc1) don't have the ability to NAT to multiple ranges anymore.
--random
If option --random is used then port mapping will be randomized (kernel >=
2.6.22).
--persistent
Gives a client the same source-/destination-address for each connection. This
supersedes the SAME target. Support for persistent mappings is available from
2.6.29-rc2.
IPv6 support available since Linux kernels >= 3.7.
DSCP
This target allows you to alter the value of the DSCP bits within the TOS header of the
IPv4 packet. As this manipulates a packet, it can only be used in the mangle table.
--set-dscp value
Set the DSCP field to a numerical value (can be decimal or hex)
--set-dscp-class class
Set the DSCP field to a DiffServ class.
ECN (IPv4-specific)
This target allows you to selectively work around known ECN blackholes. It can only be
used in the mangle table.
--ecn-tcp-remove
Remove all ECN bits from the TCP header. Of course, it can only be used in
conjunction with -p tcp.
HL (IPv6-specific)
This is used to modify the Hop Limit field in the IPv6 header. The Hop Limit field is
similar to what is known as the TTL value in IPv4. Setting or incrementing the Hop Limit
field can potentially be very dangerous, so it should be avoided at any cost. This target is
only valid in the mangle table.
Don't ever set or increment the value on packets that leave your local network!
--hl-set value
Set the Hop Limit to `value'.
--hl-dec value
Decrement the Hop Limit `value' times.
--hl-inc value
Increment the Hop Limit `value' times.
HMARK
Like MARK, i.e. set the fwmark, but the mark is calculated from hashing packet selectors
of your choice. You also have to specify the mark range and, optionally, the offset to start
from. ICMP error messages are inspected and used to calculate the hashing.
Existing options are:
--hmark-tuple tuple
Possible tuple members are: src meaning source address (IPv4, IPv6 address), dst
meaning destination address (IPv4, IPv6 address), sport meaning source port (TCP,
UDP, UDPlite, SCTP, DCCP), dport meaning destination port (TCP, UDP, UDPlite,
SCTP, DCCP), spi meaning Security Parameter Index (AH, ESP), and ct meaning the
usage of the conntrack tuple instead of the packet selectors.
--hmark-mod value (must be > 0)
Modulus for hash calculation (to limit the range of possible marks)
--hmark-offset value
Offset to start marks from.
For advanced usage, instead of using --hmark-tuple, you can specify custom
prefixes and masks:
--hmark-src-prefix cidr
The source address mask in CIDR notation.
--hmark-dst-prefix cidr
The destination address mask in CIDR notation.
--hmark-sport-mask value
A 16 bit source port mask in hexadecimal.
--hmark-dport-mask value
A 16 bit destination port mask in hexadecimal.
--hmark-spi-mask value
A 32 bit field with spi mask.
--hmark-proto-mask value
An 8 bit field with layer 4 protocol number.
--hmark-rnd value
A 32 bit random custom value to feed hash calculation.
Examples:
iptables -t mangle -A PREROUTING -m conntrack --ctstate NEW -j HMARK --hmark-tuple ct,src,dst,proto --hmark-offset 10000 --hmark-mod 10 --hmark-rnd 0xfeedcafe
iptables -t mangle -A PREROUTING -j HMARK --hmark-offset 10000 --hmark-tuple src,dst,proto --hmark-mod 10 --hmark-rnd 0xdeafbeef
IDLETIMER
This target can be used to identify when interfaces have been idle for a certain period of
time. Timers are identified by labels and are created when a rule is set with a new label.
The rules also take a timeout value (in seconds) as an option. If more than one rule uses
the same timer label, the timer will be restarted whenever any of the rules get a hit. One
entry for each timer is created in sysfs. This attribute contains the time remaining until
the timer expires. The attributes are located under the xt_idletimer class:
/sys/class/xt_idletimer/timers/<label>
When the timer expires, the target module sends a sysfs notification to the userspace,
which can then decide what to do (e.g. disconnect to save power).
--timeout amount
This is the time in seconds that will trigger the notification.
--label string
This is a unique identifier for the timer. The maximum length for the label string is 27
characters.
LED
This creates an LED-trigger that can then be attached to system indicator lights, to blink or
illuminate them when certain packets pass through the system. One example might be to
light up an LED for a few minutes every time an SSH connection is made to the local
machine. The following options control the trigger behavior:
--led-trigger-id name
This is the name given to the LED trigger. The actual name of the trigger will be
prefixed with "netfilter-".
--led-delay ms
This indicates how long (in milliseconds) the LED should be left illuminated when a
packet arrives before being switched off again. The default is 0 (blink as fast as
possible.) The special value inf can be given to leave the LED on permanently once
activated. (In this case the trigger will need to be manually detached and reattached to
the LED device to switch it off again.)
--led-always-blink
Always make the LED blink on packet arrival, even if the LED is already on. This
allows notification of new packets even with long delay values (which otherwise
would result in a silent prolonging of the delay time.)
Example:
Create an LED trigger for incoming SSH traffic:
iptables -A INPUT -p tcp --dport 22 -j LED --led-trigger-id ssh
Then attach the new trigger to an LED:
echo netfilter-ssh >/sys/class/leds/ledname/trigger
LOG
Turn on kernel logging of matching packets. When this option is set for a rule, the Linux
kernel will print some information on all matching packets (like most IP/IPv6 header
fields) via the kernel log (where it can be read with dmesg(1) or read in the syslog).
This is a “non-terminating target”, i.e. rule traversal continues at the next rule. So if you
want to LOG the packets you refuse, use two separate rules with the same matching
criteria, first using target LOG then DROP (or REJECT).
--log-level level
Level of logging, which can be (system-specific) numeric or a mnemonic. Possible
values are (in decreasing order of priority): emerg, alert, crit, error, warning,
notice, info or debug.
--log-prefix prefix
Prefix log messages with the specified prefix; up to 29 letters long, and useful for
distinguishing messages in the logs.
--log-tcp-sequence
Log TCP sequence numbers. This is a security risk if the log is readable by users.
--log-tcp-options
Log options from the TCP packet header.
--log-ip-options
Log options from the IP/IPv6 packet header.
--log-uid
Log the userid of the process which generated the packet.
MARK
This target is used to set the Netfilter mark value associated with the packet. It can, for
example, be used in conjunction with routing based on fwmark (needs iproute2). If you
plan on doing so, note that the mark needs to be set in the PREROUTING chain of the
mangle table to affect routing. The mark field is 32 bits wide.
--set-xmark value[/mask]
Zeroes out the bits given by mask and XORs value into the packet mark ("nfmark").
If mask is omitted, 0xFFFFFFFF is assumed.
--set-mark value[/mask]
Zeroes out the bits given by mask and ORs value into the packet mark. If mask is
omitted, 0xFFFFFFFF is assumed.
The following mnemonics are available:
--and-mark bits
Binary AND the nfmark with bits. (Mnemonic for --set-xmark 0/invbits, where
invbits is the binary negation of bits.)
--or-mark bits
Binary OR the nfmark with bits. (Mnemonic for --set-xmark bits/bits.)
--xor-mark bits
Binary XOR the nfmark with bits. (Mnemonic for --set-xmark bits/0.)
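The mnemonic expansions can be verified with a short Python sketch: each mnemonic is
expressed through set-xmark exactly as described above (function names are illustrative):

```python
M = 0xFFFFFFFF  # the mark field is 32 bits wide

def set_xmark(nfmark, value, mask=M):
    """--set-xmark value/mask: clear the mask bits, then XOR in value."""
    return ((nfmark & ~mask) ^ value) & M

def and_mark(nfmark, bits):   # --and-mark bits == --set-xmark 0/invbits
    return set_xmark(nfmark, 0, (~bits) & M)

def or_mark(nfmark, bits):    # --or-mark bits == --set-xmark bits/bits
    return set_xmark(nfmark, bits, bits)

def xor_mark(nfmark, bits):   # --xor-mark bits == --set-xmark bits/0
    return set_xmark(nfmark, bits, 0)
```

Each expansion reproduces the plain bitwise operation, which is why the man page can
describe them as mere mnemonics.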
MASQUERADE
This target is only valid in the nat table, in the POSTROUTING chain. It should only be
used with dynamically assigned IP (dialup) connections: if you have a static IP address,
you should use the SNAT target. Masquerading is equivalent to specifying a mapping to
the IP address of the interface the packet is going out, but also has the effect that
connections are forgotten when the interface goes down. This is the correct behavior when
the next dialup is unlikely to have the same interface address (and hence any established
connections are lost anyway).
--to-ports port[-port]
This specifies a range of source ports to use, overriding the default SNAT source
port-selection heuristics (see above). This is only valid if the rule also specifies one
of the following protocols: tcp, udp, dccp or sctp.
--random
If option --random is used then port mapping will be randomized (kernel >=
2.6.21).
IPv6 support available since Linux kernels >= 3.7.
MIRROR (IPv4-specific)
This is an experimental demonstration target which inverts the source and destination
fields in the IP header and retransmits the packet. It is only valid in the INPUT,
FORWARD and PREROUTING chains, and user-defined chains which are only called
from those chains. Note that the outgoing packets are NOT seen by any packet filtering
chains, connection tracking or NAT, to avoid loops and other problems.
NETMAP
This target allows you to statically map a whole network of addresses onto another
network of addresses. It can only be used from rules in the nat table.
--to address[/mask]
Network address to map to. The resulting address will be constructed in the following
way: All 'one' bits in the mask are filled in from the new `address'. All bits that are
zero in the mask are filled in from the original address.
IPv6 support available since Linux kernels >= 3.7.
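The address construction rule can be sketched in Python with the standard ipaddress
module (a hypothetical `netmap` helper, IPv4 shown for simplicity):

```python
import ipaddress

def netmap(orig, target_net):
    """NETMAP --to: network bits come from the target network, while the
    remaining (host) bits are kept from the original address."""
    net = ipaddress.ip_network(target_net)
    mask = int(net.netmask)
    mapped = ((int(net.network_address) & mask) |
              (int(ipaddress.ip_address(orig)) & ~mask))
    return str(ipaddress.IPv4Address(mapped))
```

For example, mapping 192.168.1.5 through `--to 10.5.0.0/16` keeps the 1.5 host part
and yields 10.5.1.5.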
NFLOG
This target provides logging of matching packets. When this target is set for a rule, the
Linux kernel will pass the packet to the loaded logging backend to log the packet. This is
usually used in combination with nfnetlink_log as logging backend, which will multicast
the packet through a netlink socket to the specified multicast group. One or more
userspace processes may subscribe to the group to receive the packets. Like LOG, this is a
non-terminating target, i.e. rule traversal continues at the next rule.
--nflog-group nlgroup
The netlink group (0 - 2^16-1) to which packets are sent (only applicable for
nfnetlink_log). The default value is 0.
--nflog-prefix prefix
A prefix string to include in the log message, up to 64 characters long, useful for
distinguishing messages in the logs.
--nflog-range size
The number of bytes to be copied to userspace (only applicable for nfnetlink_log).
nfnetlink_log instances may specify their own range, this option overrides it.
--nflog-threshold size
Number of packets to queue inside the kernel before sending them to userspace (only
applicable for nfnetlink_log). Higher values result in less overhead per packet, but
increase delay until the packets reach userspace. The default value is 1.
NFQUEUE
This target passes the packet to userspace using the nfnetlink_queue handler. The packet
is put into the queue identified by its 16-bit queue number. Userspace can inspect and
modify the packet if desired. Userspace must then drop or reinject the packet into the
kernel. Please see libnetfilter_queue for details. nfnetlink_queue was added in Linux
2.6.14. The queue-balance option was added in Linux 2.6.31, queue-bypass in 2.6.39.
--queue-num value
This specifies the QUEUE number to use. Valid queue numbers are 0 to 65535. The
default value is 0.
--queue-balance value:value
This specifies a range of queues to use. Packets are then balanced across the given
queues. This is useful for multicore systems: start multiple instances of the userspace
program on queues x, x+1, .. x+n and use "--queue-balance x:x+n". Packets
belonging to the same connection are put into the same nfqueue.
--queue-bypass
By default, if no userspace program is listening on an NFQUEUE, then all packets
that are to be queued are dropped. When this option is used, the NFQUEUE rule
behaves like ACCEPT instead, and the packet will move on to the next table.
--queue-cpu-fanout
Available starting Linux kernel 3.10. When used together with --queue-balance this
will use the CPU ID as an index to map packets to the queues. The idea is that you
can improve performance if there is a queue per CPU. This requires --queue-balance
to be specified.
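The options above can be combined in a single rule. As an illustrative sketch (queue numbers chosen for a hypothetical four-core machine), the following queues forwarded TCP packets to userspace, spread across queues 0-3 by CPU, and falls back to ACCEPT when no listener is attached:

```shell
# Sketch only: balance forwarded TCP packets across four queues,
# one per CPU, accepting packets if no userspace listener is bound.
# --queue-cpu-fanout requires kernel >= 3.10.
iptables -A FORWARD -p tcp \
  -j NFQUEUE --queue-balance 0:3 --queue-bypass --queue-cpu-fanout
```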
NOTRACK
This extension disables connection tracking for all packets matching that rule. It is
equivalent to -j CT --notrack. Like CT, NOTRACK can only be used in the raw table.
RATEEST
The RATEEST target collects statistics, performs rate estimation calculation and saves the
results for later evaluation using the rateest match.
--rateest-name name
Count matched packets into the pool referred to by name, which is freely choosable.
--rateest-interval amount{s|ms|us}
Rate measurement interval, in seconds, milliseconds or microseconds.
--rateest-ewmalog value
Rate measurement averaging time constant.
REDIRECT
This target is only valid in the nat table, in the PREROUTING and OUTPUT chains,
and user-defined chains which are only called from those chains. It redirects the packet to
the machine itself by changing the destination IP to the primary address of the incoming
interface (locally-generated packets are mapped to the localhost address, 127.0.0.1 for
IPv4 and ::1 for IPv6).
--to-ports port[-port]
This specifies a destination port or range of ports to use: without this, the destination
port is never altered. This is only valid if the rule also specifies one of the following
protocols: tcp, udp, dccp or sctp.
--random
If option --random is used then port mapping will be randomized (kernel >=
2.6.22).
IPv6 support available starting Linux kernels >= 3.7.
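As a hypothetical example of a transparent-proxy setup (the proxy port 3128 and interface name are arbitrary choices), the following fragment would redirect inbound HTTP traffic arriving on eth0 to a proxy listening locally:

```shell
# Sketch only: redirect HTTP arriving on eth0 to a local proxy
# listening on port 3128 (port and interface are illustrative).
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 \
  -j REDIRECT --to-ports 3128
```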
REJECT (IPv6-specific)
This is used to send back an error packet in response to the matched packet: otherwise it is
equivalent to DROP so it is a terminating TARGET, ending rule traversal. This target is
only valid in the INPUT, FORWARD and OUTPUT chains, and user-defined chains
which are only called from those chains. The following option controls the nature of the
error packet returned:
--reject-with type
The type given can be icmp6-no-route, no-route, icmp6-adm-prohibited, adm-
prohibited, icmp6-addr-unreachable, addr-unreach, or icmp6-port-unreachable,
which return the appropriate ICMPv6 error message (icmp6-port-unreachable is the
default). Finally, the option tcp-reset can be used on rules which only match the TCP
protocol: this causes a TCP RST packet to be sent back. This is mainly useful for
blocking ident (113/tcp) probes which frequently occur when sending mail to broken
mail hosts (which won’t accept your mail otherwise). tcp-reset can only be used with
kernel versions 2.6.14 or later.
REJECT (IPv4-specific)
This is used to send back an error packet in response to the matched packet: otherwise it is
equivalent to DROP so it is a terminating TARGET, ending rule traversal. This target is
only valid in the INPUT, FORWARD and OUTPUT chains, and user-defined chains
which are only called from those chains. The following option controls the nature of the
error packet returned:
--reject-with type
The type given can be icmp-net-unreachable, icmp-host-unreachable, icmp-port-
unreachable, icmp-proto-unreachable, icmp-net-prohibited, icmp-host-
prohibited, or icmp-admin-prohibited (*), which return the appropriate ICMP error
message (icmp-port-unreachable is the default). The option tcp-reset can be used
on rules which only match the TCP protocol: this causes a TCP RST packet to be sent
back. This is mainly useful for blocking ident (113/tcp) probes which frequently
occur when sending mail to broken mail hosts (which won’t accept your mail
otherwise).
(*) Using icmp-admin-prohibited with kernels that do not support it will result in a plain
DROP instead of REJECT
SAME (IPv4-specific)
Similar to SNAT/DNAT depending on chain: it takes a range of addresses (--to 1.2.3.4-
1.2.3.7) and gives a client the same source-/destination-address for each connection.
N.B.: The DNAT target's --persistent option replaced the SAME target.
--to ipaddr[-ipaddr]
Addresses to map source to. May be specified more than once for multiple ranges.
--nodst
Don't use the destination-ip in the calculations when selecting the new source-ip.
--random
Port mapping will be forcibly randomized to avoid attacks based on port prediction
(kernel >= 2.6.21).
SECMARK
This is used to set the security mark value associated with the packet for use by security
subsystems such as SELinux. It is valid in the security table (for backwards compatibility
with older kernels, it is also valid in the mangle table). The mark is 32 bits wide.
--selctx security_context
Set the security context (e.g. an SELinux context) whose mark value is applied to matching packets.
SET
This module adds and/or deletes entries from IP sets which can be defined by ipset(8).
--add-set setname flag[,flag…]
add the address(es)/port(s) of the packet to the set
--del-set setname flag[,flag…]
delete the address(es)/port(s) of the packet from the set
where flag(s) are src and/or dst specifications and there can be no more than six of
them.
--timeout value
when adding an entry, the timeout value to use instead of the default one from the set
definition
--exist
when adding an entry, if it already exists, reset the timeout value to the specified one
or to the default from the set definition
Use of -j SET requires that ipset kernel support is provided, which, for standard kernels, is
the case since Linux 2.6.39.
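As a hypothetical sketch, assuming an ipset named badguys of type hash:ip has already been created with ipset(8), the following adds the source address of each new SSH connection to that set, refreshing the timeout of entries that already exist:

```shell
# Sketch only; assumes this set was created beforehand:
#   ipset create badguys hash:ip timeout 600
# Record the source address of new SSH connections in the set,
# resetting the timeout when the entry is already present.
iptables -A INPUT -p tcp --dport 22 -m conntrack --ctstate NEW \
  -j SET --add-set badguys src --exist
```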
SNAT
This target is only valid in the nat table, in the POSTROUTING and INPUT chains, and
user-defined chains which are only called from those chains. It specifies that the source
address of the packet should be modified (and all future packets in this connection will
also be mangled), and rules should cease being examined. It takes the following options:
--to-source [ipaddr[-ipaddr]][:port[-port]]
which can specify a single new source IP address, an inclusive range of IP addresses,
and optionally a port range (valid only if the rule also specifies one of the following
protocols: tcp, udp, dccp or sctp). If no port range is specified, then source ports
below 512 will be mapped to other ports below 512: those between 512 and 1023
inclusive will be mapped to ports below 1024, and other ports will be mapped to
1024 or above. Where possible, no port alteration will occur. In kernels up to 2.6.10,
you can add several --to-source options. For those kernels, if you specify more than
one source address, either via an address range or multiple --to-source options, a
simple round-robin (one after another in cycle) takes place between these addresses.
Later kernels (>= 2.6.11-rc1) don't have the ability to NAT to multiple ranges
anymore.
--random
If option --random is used then port mapping will be randomized (kernel >=
2.6.21).
--persistent
Gives a client the same source-/destination-address for each connection. This
supersedes the SAME target. Support for persistent mappings is available from
2.6.29-rc2.
Kernels prior to 2.6.36-rc1 don’t have the ability to SNAT in the INPUT chain.
IPv6 support available since Linux kernels >= 3.7.
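A typical use is source-NATing a private LAN behind a router. In the following sketch, the addresses are documentation examples rather than values from this manual:

```shell
# Sketch only: rewrite the source of LAN traffic leaving eth0 to the
# router's public address (203.0.113.5 is a documentation address).
# --persistent requires kernel >= 2.6.29-rc2.
iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -o eth0 \
  -j SNAT --to-source 203.0.113.5 --persistent
```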
SNPT (IPv6-specific)
Provides stateless source IPv6-to-IPv6 Network Prefix Translation (as described by RFC
6296).
You have to use this target in the mangle table, not in the nat table. It takes the following
options:
--src-pfx [prefix/length]
Set source prefix that you want to translate and length
--dst-pfx [prefix/length]
Set destination prefix that you want to use in the translation and length
You have to use the DNPT target to undo the translation. Example:
ip6tables -t mangle -I POSTROUTING -s fd00::/64 -o vboxnet0 -j SNPT --src-pfx fd00::/64 --dst-pfx 2001:e20:2000:40f::/64
ip6tables -t mangle -I PREROUTING -i wlan0 -d 2001:e20:2000:40f::/64 -j DNPT --src-pfx 2001:e20:2000:40f::/64 --dst-pfx fd00::/64
You may need to enable IPv6 neighbor proxy:
sysctl -w net.ipv6.conf.all.proxy_ndp=1
You also have to use the NOTRACK target to disable connection tracking for translated
flows.
TCPMSS
This target allows altering the MSS value of TCP SYN packets, to control the maximum
packet size for that connection (usually limiting it to your outgoing interface's MTU
minus 40 for IPv4 or 60 for IPv6, respectively). Of course, it can only be used in
conjunction with -p tcp.
This target is used to overcome criminally braindead ISPs or servers which block “ICMP
Fragmentation Needed” or “ICMPv6 Packet Too Big” packets. The symptoms of this
problem are that everything works fine from your Linux firewall/router, but machines
behind it can never exchange large packets:
1. Web browsers connect, then hang with no data received.
2. Small mail works fine, but large emails hang.
3. ssh works fine, but scp hangs after initial handshaking.
Workaround: activate this option and add a rule to your firewall configuration like:
iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
--set-mss value
Explicitly sets the MSS option to the specified value. If the MSS of the packet is already
lower than value, it will not be increased (from Linux 2.6.25 onwards) to avoid more
problems with hosts relying on a proper MSS.
--clamp-mss-to-pmtu
Automatically clamp the MSS value to (path MTU minus 40 for IPv4, minus 60 for
IPv6). This may not function as desired where asymmetric routes with differing path
MTU exist: the kernel uses the path MTU which it would use to send packets from
itself to the source and destination IP addresses. Prior to Linux 2.6.25, only the path
MTU to the destination IP address was considered by this option; subsequent kernels
also consider the path MTU to the source IP address.
These options are mutually exclusive.
TCPOPTSTRIP
This target will strip TCP options off a TCP packet. (It will actually replace them by
NO-OPs.) As such, you will need to add the -p tcp parameters.
--strip-options option[,option…]
Strip the given option(s). The options may be specified by TCP option number or by
symbolic name. The list of recognized options can be obtained by calling iptables
with -j TCPOPTSTRIP -h.
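For instance, the following sketch strips the TCP timestamp option from outgoing SYN packets (the choice of option and chain is illustrative):

```shell
# Sketch only: replace the timestamp option with NO-OPs on
# outgoing SYN packets.
iptables -t mangle -A POSTROUTING -p tcp --tcp-flags SYN SYN \
  -j TCPOPTSTRIP --strip-options timestamp
```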
TEE
The TEE target will clone a packet and redirect this clone to another machine on the local
network segment. In other words, the nexthop must be the target, or you will have to
configure the nexthop to forward it further if so desired.
--gateway ipaddr
Send the cloned packet to the host reachable at the given IP address. Use of 0.0.0.0
(for IPv4 packets) or :: (IPv6) is invalid.
To forward all incoming traffic on eth0 to a network-layer logging box:
-t mangle -A PREROUTING -i eth0 -j TEE --gateway 2001:db8::1
TOS
This module sets the Type of Service field in the IPv4 header (including the “precedence”
bits) or the Priority field in the IPv6 header. Note that TOS shares the same bits as DSCP
and ECN. The TOS target is only valid in the mangle table.
--set-tos value[/mask]
Zeroes out the bits given by mask (see NOTE below) and XORs value into the
TOS/Priority field. If mask is omitted, 0xFF is assumed.
--set-tos symbol
You can specify a symbolic name when using the TOS target for IPv4. It implies a
mask of 0xFF (see NOTE below). The list of recognized TOS names can be obtained
by calling iptables with -j TOS -h.
The following mnemonics are available:
--and-tos bits
Binary AND the TOS value with bits. (Mnemonic for --set-tos 0/invbits, where
invbits is the binary negation of bits. See NOTE below.)
--or-tos bits
Binary OR the TOS value with bits. (Mnemonic for --set-tos bits/bits. See NOTE
below.)
--xor-tos bits
Binary XOR the TOS value with bits. (Mnemonic for --set-tos bits/0. See NOTE
below.)
NOTE: In Linux kernels up to and including 2.6.38, with the exception of longterm
releases 2.6.32 (>=.42), 2.6.33 (>=.15), and 2.6.35 (>=.14), there is a bug whereby IPv6
TOS mangling does not behave as documented and differs from the IPv4 version. The
TOS mask indicates the bits one wants to zero out, so it needs to be inverted before
applying it to the original TOS field. However, the aforementioned kernels forgo the
inversion, which breaks --set-tos and its mnemonics.
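The value/mask arithmetic behind these mnemonics, new TOS = (old TOS AND NOT mask) XOR value on the 8-bit field, can be checked with ordinary shell arithmetic. This is only a sketch of the documented semantics, not code from iptables:

```shell
# new_tos = (old_tos & ~mask) ^ value, restricted to 8 bits.
set_tos() {  # usage: set_tos <old> <value> <mask>
    echo $(( ( ($1 & (255 ^ $3)) ^ $2 ) & 255 ))
}

old=0x1e
# --and-tos 0x0c  is  --set-tos 0x00/0xf3  (0xf3 = ~0x0c in 8 bits)
set_tos $old 0x00 0xf3   # prints 12, i.e. 0x1e & 0x0c
# --or-tos 0x01   is  --set-tos 0x01/0x01
set_tos $old 0x01 0x01   # prints 31, i.e. 0x1e | 0x01
# --xor-tos 0xff  is  --set-tos 0xff/0x00
set_tos $old 0xff 0x00   # prints 225, i.e. 0x1e ^ 0xff
```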
TPROXY
This target is only valid in the mangle table, in the PREROUTING chain and user-
defined chains which are only called from this chain. It redirects the packet to a local
socket without changing the packet header in any way. It can also change the mark value
which can then be used in advanced routing rules. It takes three options:
--on-port port
This specifies a destination port to use. It is a required option; 0 means the new
destination port is the same as the original. This is only valid if the rule also specifies
-p tcp or -p udp.
--on-ip address
This specifies a destination address to use. By default the address is the IP address of
the incoming interface. This is only valid if the rule also specifies -p tcp or -p udp.
--tproxy-mark value[/mask]
Marks packets with the given value/mask. The fwmark value set here can be used by
advanced routing. (Required for transparent proxying to work: otherwise these
packets will get forwarded, which is probably not what you want.)
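Putting the three options together, a transparent HTTP proxy setup might look like the following sketch (the mark value 0x1, port 8080, and routing table 100 are arbitrary choices); the two ip(8) commands are the usual companion policy-routing setup that delivers marked packets to the local stack:

```shell
# Sketch only: divert port-80 traffic to a local transparent proxy on
# port 8080, marking diverted packets for policy routing.
iptables -t mangle -A PREROUTING -p tcp --dport 80 \
  -j TPROXY --on-port 8080 --tproxy-mark 0x1/0x1

# Route marked packets to the local stack (table 100 is arbitrary).
ip rule add fwmark 0x1/0x1 lookup 100
ip route add local 0.0.0.0/0 dev lo table 100
```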
TRACE
This target marks packets so that the kernel will log every rule which matches the packets
as they traverse the tables, chains and rules.
A logging backend, such as ip(6)t_LOG or nfnetlink_log, must be loaded for this to be
visible. The packets are logged with the string prefix: “TRACE:
tablename:chainname:type:rulenum ” where type can be “rule” for plain rule, “return” for
implicit rule at the end of a user defined chain and “policy” for the policy of the built in
chains. It can only be used in the raw table.
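As a hypothetical illustration (the address is a documentation address), the following traces packets exchanged with a single test host; as noted above, a logging backend must be loaded for the TRACE: lines to appear in the kernel log:

```shell
# Sketch only: trace packets from/to one test host through the
# ruleset (192.0.2.10 is a documentation address).
iptables -t raw -A PREROUTING -s 192.0.2.10 -j TRACE
iptables -t raw -A OUTPUT   -d 192.0.2.10 -j TRACE
```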
TTL (IPv4-specific)
This is used to modify the IPv4 TTL header field. The TTL field determines how many
hops (routers) a packet can traverse until its time to live is exceeded.
Setting or incrementing the TTL field can potentially be very dangerous, so it should be
avoided at any cost. This target is only valid in mangle table.
Don’t ever set or increment the value on packets that leave your local network!
--ttl-set value
Set the TTL value to 'value'.
--ttl-dec value
Decrement the TTL value 'value' times.
--ttl-inc value
Increment the TTL value 'value' times.
ULOG (IPv4-specific)
This is the deprecated IPv4-only predecessor of the NFLOG target. It provides userspace
logging of matching packets. When this target is set for a rule, the Linux kernel will
multicast this packet through a netlink socket. One or more userspace processes may then
subscribe to various multicast groups and receive the packets. Like LOG, this is a
"non-terminating target", i.e. rule traversal continues at the next rule.
--ulog-nlgroup nlgroup
This specifies the netlink group (1-32) to which the packet is sent. Default value is 1.
--ulog-prefix prefix
Prefix log messages with the specified prefix; up to 32 characters long, and useful for
distinguishing messages in the logs.
--ulog-cprange size
Number of bytes to be copied to userspace. A value of 0 always copies the entire
packet, regardless of its size. Default is 0.
--ulog-qthreshold size
Number of packets to queue inside the kernel. Setting this value to, e.g., 10 accumulates
ten packets inside the kernel and transmits them as one netlink multipart message to
userspace. Default is 1 (for backwards compatibility).
IPTABLES-RESTORE
› NAME
iptables-restore - Restore IP Tables
ip6tables-restore - Restore IPv6 Tables
› SYNOPSIS
iptables-restore [-chntv] [-M modprobe]
ip6tables-restore [-chntv] [-M modprobe] [-T name]
› DESCRIPTION
iptables-restore and ip6tables-restore are used to restore IP and IPv6 Tables from
data specified on STDIN. Use I/O redirection provided by your shell to read from a
file.
-c, --counters
restore the values of all packet and byte counters
-h, --help
Print a short option summary.
-n, --noflush
don't flush the previous contents of the table. If not specified, both commands flush
(delete) all previous contents of the respective table.
-t, --test
Only parse and construct the ruleset, but do not commit it.
-v, --verbose
Print additional debug info during ruleset processing.
-M, --modprobe modprobe_program
Specify the path to the modprobe program. By default, iptables-restore will inspect
/proc/sys/kernel/modprobe to determine the executable's path.
-T, --table name
Restore only the named table even if the input stream contains other ones.
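For example, a minimal ruleset in iptables-save format can be written to a file and then fed to iptables-restore (the filename is an arbitrary choice; the restore itself requires root):

```shell
# Sketch only: create a minimal ruleset file in iptables-save format.
cat > rules.v4 <<'EOF'
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -p tcp --dport 22 -j ACCEPT
COMMIT
EOF

# As root, parse-check it first, then load it:
#   iptables-restore --test < rules.v4
#   iptables-restore < rules.v4
```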
› BUGS
None known as of iptables-1.2.1 release
› AUTHORS
Harald Welte <[email protected]> wrote iptables-restore based on code from
Rusty Russell. Andras Kis-Szabo <[email protected]> contributed ip6tables-restore.
› SEE ALSO
iptables-save(8), iptables(8)
The iptables-HOWTO, which details more iptables usage, the NAT-HOWTO, which
details NAT, and the netfilter-hacking-HOWTO which details the internals.
IPTABLES-SAVE
› NAME
iptables-save - dump iptables rules to stdout
ip6tables-save - dump ip6tables rules to stdout
› SYNOPSIS
iptables-save [-M, --modprobe modprobe] [-c] [-t table]
ip6tables-save [-M, --modprobe modprobe] [-c] [-t table]
› DESCRIPTION
iptables-save and ip6tables-save are used to dump the contents of the IP or IPv6
tables in an easily parseable format to STDOUT. Use I/O redirection provided by
your shell to write to a file.
-M, --modprobe modprobe_program
Specify the path to the modprobe program. By default, iptables-save will inspect
/proc/sys/kernel/modprobe to determine the executable's path.
-c, --counters
include the current values of all packet and byte counters in the output
-t, --table tablename
restrict output to only one table. If not specified, output includes all available tables.
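A typical round trip looks like the following sketch (the backup path is illustrative; both commands require root):

```shell
# Sketch only: save the current ruleset with counters to a file, and
# dump just the nat table to the terminal.
iptables-save -c > /root/iptables.backup
iptables-save -t nat
```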
› BUGS
None known as of iptables-1.2.1 release
› AUTHORS
Harald Welte <[email protected]> Rusty Russell <[email protected]>
Andras Kis-Szabo <[email protected]> contributed ip6tables-save.
› SEE ALSO
iptables-restore(8), iptables(8)
The iptables-HOWTO, which details more iptables usage, the NAT-HOWTO, which
details NAT, and the netfilter-hacking-HOWTO which details the internals.
IPTABLES
› NAME
iptables/ip6tables - administration tool for IPv4/IPv6 packet filtering and NAT
› SYNOPSIS
iptables [-t table] {-A|-C|-D} chain rule-specification
ip6tables [-t table] {-A|-C|-D} chain rule-specification
iptables [-t table] -I chain [rulenum] rule-specification
iptables [-t table] -R chain rulenum rule-specification
iptables [-t table] -D chain rulenum
iptables [-t table] -S [chain [rulenum]]
iptables [-t table] {-F|-L|-Z} [chain [rulenum]] [options…]
iptables [-t table] -N chain
iptables [-t table] -X [chain]
iptables [-t table] -P chain target
iptables [-t table] -E old-chain-name new-chain-name
rule-specification = [matches…] [target]
match = -m matchname [per-match-options]
target = -j targetname [per-target-options]
› DESCRIPTION
Iptables and ip6tables are used to set up, maintain, and inspect the tables of IPv4
and IPv6 packet filter rules in the Linux kernel. Several different tables may be
defined. Each table contains a number of built-in chains and may also contain user-
defined chains.
Each chain is a list of rules which can match a set of packets. Each rule specifies
what to do with a packet that matches. This is called a 'target', which may be a jump
to a user-defined chain in the same table.
› TARGETS
A firewall rule specifies criteria for a packet and a target. If the packet does not
match, the next rule in the chain is examined; if it does match, then the next rule is
specified by the value of the target, which can be the name of a user-defined chain,
one of the targets described in iptables-extensions(8), or one of the special values
ACCEPT, DROP or RETURN.
ACCEPT means to let the packet through. DROP means to drop the packet on the
floor. RETURN means stop traversing this chain and resume at the next rule in the
previous (calling) chain. If the end of a built-in chain is reached or a rule in a built-in
chain with target RETURN is matched, the target specified by the chain policy
determines the fate of the packet.
› TABLES
There are currently five independent tables (which tables are present at any time
depends on the kernel configuration options and which modules are present).
-t, --table table
This option specifies the packet matching table which the command should operate
on. If the kernel is configured with automatic module loading, an attempt will be
made to load the appropriate module for that table if it is not already there.
The tables are as follows:
filter:
This is the default table (if no -t option is passed). It contains the built-in chains
INPUT (for packets destined to local sockets), FORWARD (for packets being routed
through the box), and OUTPUT (for locally-generated packets).
nat:
This table is consulted when a packet that creates a new connection is encountered. It
consists of three built-ins: PREROUTING (for altering packets as soon as they
come in), OUTPUT (for altering locally-generated packets before routing), and
POSTROUTING (for altering packets as they are about to go out). IPv6 NAT
support is available since kernel 3.7.
mangle:
This table is used for specialized packet alteration. Until kernel 2.4.17 it had two
built-in chains: PREROUTING (for altering incoming packets before routing) and
OUTPUT (for altering locally-generated packets before routing). Since kernel
2.4.18, three other built-in chains are also supported: INPUT (for packets coming
into the box itself), FORWARD (for altering packets being routed through the box),
and POSTROUTING (for altering packets as they are about to go out).
raw:
This table is used mainly for configuring exemptions from connection tracking in
combination with the NOTRACK target. It registers at the netfilter hooks with higher
priority and is thus called before ip_conntrack, or any other IP tables. It provides the
following built-in chains: PREROUTING (for packets arriving via any network
interface) and OUTPUT (for packets generated by local processes).
security:
This table is used for Mandatory Access Control (MAC) networking rules, such as
those enabled by the SECMARK and CONNSECMARK targets. Mandatory
Access Control is implemented by Linux Security Modules such as SELinux. The
security table is called after the filter table, allowing any Discretionary Access
Control (DAC) rules in the filter table to take effect before MAC rules. This table
provides the following built-in chains: INPUT (for packets coming into the box
itself), OUTPUT (for altering locally-generated packets before routing), and
FORWARD (for altering packets being routed through the box).
› OPTIONS
The options that are recognized by iptables and ip6tables can be divided into several
different groups.
COMMANDS
These options specify the desired action to perform. Only one of them can be
specified on the command line unless otherwise stated below. For long versions of
the command and option names, you need to use only enough letters to ensure that
iptables can differentiate it from all other options.
-A, --append chain rule-specification
Append one or more rules to the end of the selected chain. When the source and/or
destination names resolve to more than one address, a rule will be added for each
possible address combination.
-C, --check chain rule-specification
Check whether a rule matching the specification does exist in the selected chain. This
command uses the same logic as -D to find a matching entry, but does not alter the
existing iptables configuration and uses its exit code to indicate success or failure.
-D, --delete chain rule-specification
-D, --delete chain rulenum
Delete one or more rules from the selected chain. There are two versions of this
command: the rule can be specified as a number in the chain (starting at 1 for the first
rule) or a rule to match.
-I, --insert chain [rulenum] rule-specification
Insert one or more rules in the selected chain as the given rule number. So, if the rule
number is 1, the rule or rules are inserted at the head of the chain. This is also the
default if no rule number is specified.
-R, --replace chain rulenum rule-specification
Replace a rule in the selected chain. If the source and/or destination names resolve to
multiple addresses, the command will fail. Rules are numbered starting at 1.
-L, --list [chain]
List all rules in the selected chain. If no chain is selected, all chains are listed. Like
every other iptables command, it applies to the specified table (filter is the default),
so NAT rules get listed by iptables -t nat -n -L. Please note that it is often used
with the -n option, in order to avoid long reverse DNS lookups. It is legal to specify
the -Z (zero) option as well, in which case the chain(s) will be atomically listed and
zeroed. The exact output is affected by the other arguments given. The exact rules are
suppressed until you use iptables -L -v.
-S, --list-rules [chain]
Print all rules in the selected chain. If no chain is selected, all chains are printed like
iptables-save. Like every other iptables command, it applies to the specified table
(filter is the default).
-F, --flush [chain]
Flush the selected chain (all the chains in the table if none is given). This is
equivalent to deleting all the rules one by one.
-Z, --zero [chain [rulenum]]
Zero the packet and byte counters in all chains, or only the given chain, or only the
given rule in a chain. It is legal to specify the -L, --list (list) option as well, to see the
counters immediately before they are cleared. (See above.)
-N, --new-chain chain
Create a new user-defined chain by the given name. There must be no target of that
name already.
-X, --delete-chain [chain]
Delete the optional user-defined chain specified. There must be no references to the
chain. If there are, you must delete or replace the referring rules before the chain can
be deleted. The chain must be empty, i.e. not contain any rules. If no argument is
given, it will attempt to delete every non-builtin chain in the table.
-P, --policy chain target
Set the policy for the chain to the given target. See the section TARGETS for the
legal targets. Only built-in (non-user-defined) chains can have policies, and neither
built-in nor user-defined chains can be policy targets.
-E, --rename-chain old-chain new-chain
Rename the user specified chain to the user supplied name. This is cosmetic, and has
no effect on the structure of the table.
-h
Help. Give a (currently very brief) description of the command syntax.
PARAMETERS
The following parameters make up a rule specification (as used in the add, delete, insert,
replace and append commands).
-4, --ipv4
This option has no effect in iptables and iptables-restore. If a rule using the -4 option
is inserted with (and only with) ip6tables-restore, it will be silently ignored. Any
other uses will throw an error. This option allows putting both IPv4 and IPv6 rules in
a single rule file for use with both iptables-restore and ip6tables-restore.
-6, --ipv6
If a rule using the -6 option is inserted with (and only with) iptables-restore, it will be
silently ignored. Any other uses will throw an error. This option allows putting both
IPv4 and IPv6 rules in a single rule file for use with both iptables-restore and
ip6tables-restore. This option has no effect in ip6tables and ip6tables-restore.
[!] -p, --protocol protocol
The protocol of the rule or of the packet to check. The specified protocol can be one
of tcp, udp, udplite, icmp, icmpv6, esp, ah, sctp, mh or the special keyword "all",
or it can be a numeric value, representing one of these protocols or a different one. A
protocol name from /etc/protocols is also allowed. A “!” argument before the protocol
inverts the test. The number zero is equivalent to all. “all” will match with all
protocols and is taken as default when this option is omitted. Note that, in ip6tables,
IPv6 extension headers except esp are not allowed. esp and ipv6-nonext can be used
with Kernel version 2.6.11 or later. The number zero is equivalent to all, which
means that you cannot test the protocol field for the value 0 directly. To match on a
HBH header, even if it were the last, you cannot use -p 0, but always need -m hbh.
[!] -s, --source address[/mask][,…]
Source specification. Address can be either a network name, a hostname, a network
IP address (with /mask), or a plain IP address. Hostnames will be resolved once only,
before the rule is submitted to the kernel. Please note that specifying any name to be
resolved with a remote query such as DNS is a really bad idea. The mask can be
either an IPv4 network mask (for iptables) or a plain number, specifying the number
of 1's at the left side of the network mask. Thus, an iptables mask of 24 is equivalent
to 255.255.255.0. A "!" argument before the address specification inverts the sense of
the address. The flag --src is an alias for this option. Multiple addresses can be
specified, but this will expand to multiple rules (when adding with -A), or will
cause multiple rules to be deleted (with -D).
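The prefix-length/netmask equivalence mentioned above (a mask of 24 is 255.255.255.0) is simple bit arithmetic; the following shell sketch converts a prefix length to its dotted-quad mask:

```shell
# Convert a prefix length (count of leading 1 bits) to a dotted-quad
# netmask, as in: 24 -> 255.255.255.0.
prefix_to_mask() {
    local bits=$(( 0xffffffff ^ ((1 << (32 - $1)) - 1) ))
    echo "$(( (bits >> 24) & 255 )).$(( (bits >> 16) & 255 )).$(( (bits >> 8) & 255 )).$(( bits & 255 ))"
}

prefix_to_mask 24   # prints 255.255.255.0
prefix_to_mask 8    # prints 255.0.0.0
```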
[!] -d, --destination address[/mask][,…]
Destination specification. See the description of the -s (source) flag for a detailed
description of the syntax. The flag --dst is an alias for this option.
-m, --match match
Specifies a match to use, that is, an extension module that tests for a specific
property. The set of matches makes up the condition under which a target is invoked.
Matches are evaluated first to last as specified on the command line and work in
short-circuit fashion, i.e. if one extension yields false, evaluation will stop.
-j, --jump target
This specifies the target of the rule; i.e., what to do if the packet matches it. The
target can be a user-defined chain (other than the one this rule is in), one of the
special builtin targets which decide the fate of the packet immediately, or an
extension (see EXTENSIONS below). If this option is omitted in a rule (and -g is
not used), then matching the rule will have no effect on the packet’s fate, but the
counters on the rule will be incremented.
-g, --goto chain
This specifies that the processing should continue in a user-specified chain. Unlike
the --jump option, return will not continue processing in this chain but instead in the
chain that called us via --jump.
[!] -i, --in-interface name
Name of an interface via which a packet was received (only for packets entering the
INPUT, FORWARD and PREROUTING chains). When the “!” argument is used
before the interface name, the sense is inverted. If the interface name ends in a “+”,
then any interface which begins with this name will match. If this option is omitted,
any interface name will match.
[!] -o, --out-interface name
Name of an interface via which a packet is going to be sent (for packets entering the
FORWARD, OUTPUT and POSTROUTING chains). When the “!” argument is
used before the interface name, the sense is inverted. If the interface name ends in a
“+”, then any interface which begins with this name will match. If this option is
omitted, any interface name will match.
[!] -f, --fragment
This means that the rule only refers to second and further IPv4 fragments of
fragmented packets. Since there is no way to tell the source or destination ports of
such a packet (or ICMP type), such a packet will not match any rules which specify
them. When the "!" argument precedes the "-f" flag, the rule will only match head
fragments, or unfragmented packets. This option is IPv4-specific; it is not available in
ip6tables.
-c, --set-counters packets bytes
This enables the administrator to initialize the packet and byte counters of a rule
(during INSERT, APPEND, REPLACE operations).
OTHER OPTIONS
iscsiadm supports the iSNS (isns) or SendTargets (st) discovery type. An SLP
implementation is under development.
› EXIT STATUS
On success 0 is returned. On error one of the return codes below will be returned.
For commands that operate on multiple objects (sessions, records, etc.),
iscsiadm/iscsistart will return the first error that is encountered. iscsiadm/iscsistart
will attempt to execute the operation on the objects it can. If no objects are found,
ISCSI_ERR_NO_OBJS_FOUND is returned.
0
ISCSI_SUCCESS - command executed successfully.
1
ISCSI_ERR - generic error code.
2
ISCSI_ERR_SESS_NOT_FOUND - session could not be found.
3
ISCSI_ERR_NOMEM - could not allocate resource for operation.
4
ISCSI_ERR_TRANS - connect problem caused operation to fail.
5
ISCSI_ERR_LOGIN - generic iSCSI login failure.
6
ISCSI_ERR_IDBM - error accessing/managing iSCSI DB.
7
ISCSI_ERR_INVAL - invalid argument.
8
ISCSI_ERR_TRANS_TIMEOUT - connection timer expired while trying to connect.
9
ISCSI_ERR_INTERNAL - generic internal iscsid/kernel failure.
10
ISCSI_ERR_LOGOUT - iSCSI logout failed.
11
ISCSI_ERR_PDU_TIMEOUT - iSCSI PDU timed out.
12
ISCSI_ERR_TRANS_NOT_FOUND - iSCSI transport module not loaded in kernel
or iscsid.
13
ISCSI_ERR_ACCESS - did not have proper OS permissions to access iscsid or
execute iscsiadm command.
14
ISCSI_ERR_TRANS_CAPS - transport module did not support operation.
15
ISCSI_ERR_SESS_EXISTS - session is logged in.
16
ISCSI_ERR_INVALID_MGMT_REQ - invalid IPC MGMT request.
17
ISCSI_ERR_ISNS_UNAVAILABLE - iSNS service is not supported.
18
ISCSI_ERR_ISCSID_COMM_ERR - a read/write to iscsid failed.
19
ISCSI_ERR_FATAL_LOGIN - fatal iSCSI login error.
20
ISCSI_ERR_ISCSID_NOTCONN - could not connect to iscsid.
21
ISCSI_ERR_NO_OBJS_FOUND - no records/targets/sessions/portals found to
execute operation on.
22
ISCSI_ERR_SYSFS_LOOKUP - could not lookup object in sysfs.
23
ISCSI_ERR_HOST_NOT_FOUND - could not lookup host.
24
ISCSI_ERR_LOGIN_AUTH_FAILED - login failed due to authorization failure.
25
ISCSI_ERR_ISNS_QUERY - iSNS query failure.
26
ISCSI_ERR_ISNS_REG_FAILED - iSNS registration/deregistration failed.
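The return codes above lend themselves to table-driven handling in scripts. The sketch below maps a few of them back to their symbolic names; the helper function name is our own invention, not part of the open-iscsi tools.

```shell
#!/bin/sh
# Map an iscsiadm/iscsistart exit code to its symbolic name.
# The table is transcribed from the list above (subset shown here).
iscsi_err_name() {
    case "$1" in
        0)  echo ISCSI_SUCCESS ;;
        1)  echo ISCSI_ERR ;;
        2)  echo ISCSI_ERR_SESS_NOT_FOUND ;;
        4)  echo ISCSI_ERR_TRANS ;;
        8)  echo ISCSI_ERR_TRANS_TIMEOUT ;;
        15) echo ISCSI_ERR_SESS_EXISTS ;;
        21) echo ISCSI_ERR_NO_OBJS_FOUND ;;
        24) echo ISCSI_ERR_LOGIN_AUTH_FAILED ;;
        *)  echo "UNKNOWN($1)" ;;
    esac
}

# A caller might retry only on transient transport errors:
#   iscsiadm --mode node --targetname "$target" --login
#   case "$(iscsi_err_name $?)" in
#       ISCSI_ERR_TRANS|ISCSI_ERR_TRANS_TIMEOUT) retry ;;
#   esac
iscsi_err_name 8
```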
› EXAMPLES
Discover targets at a given IP address:
iscsiadm --mode discoverydb --type sendtargets --portal 192.168.1.10 --discover
Login (must use a node record id found by the discovery):
iscsiadm --mode node --targetname iqn.2001-05.com.doe:test --portal 192.168.1.1:3260 --login
Logout:
iscsiadm --mode node --targetname iqn.2001-05.com.doe:test --portal 192.168.1.1:3260 --logout
List node records:
iscsiadm --mode node
Display all data for a given node record:
iscsiadm --mode node --targetname iqn.2001-05.com.doe:test --portal 192.168.1.1:3260
› FILES
/etc/iscsi/iscsid.conf
The configuration file read by iscsid and iscsiadm on startup.
/etc/iscsi/initiatorname.iscsi
The file containing the iSCSI InitiatorName and InitiatorAlias read by iscsid and
iscsiadm on startup.
/var/lib/iscsi/nodes/
This directory contains the nodes with their targets.
/var/lib/iscsi/send_targets
This directory contains the portals.
› SEE ALSO
iscsid(8)
› AUTHORS
Open-iSCSI project <http://www.open-iscsi.org/> Alex Aizman
<[email protected]> Dmitry Yusupov <[email protected]>
ISCSID
› NAME
iscsid - Open-iSCSI daemon
› SYNOPSIS
iscsid [OPTION]
› DESCRIPTION
iscsid implements the control path of the iSCSI protocol, plus some management
facilities. For example, the daemon could be configured to automatically re-start
discovery at startup, based on the contents of the persistent iSCSI database.
› OPTIONS
[-c|--config=]config-file
Read configuration from config-file rather than the default /etc/iscsi/iscsid.conf file.
[-i|--initiatorname=]iname-file
Read the initiator name from iname-file rather than the default
/etc/iscsi/initiatorname.iscsi file.
[-f|--foreground]
Run iscsid in the foreground.
[-d|--debug=]debug_level
Print debugging information. Valid values for debug_level are 0 to 8.
[-u|--uid=]uid
Run under user ID uid (default is the current user ID).
[-g|--gid=]gid
Run under group ID gid (default is the current group ID).
[-n|--no-pid-file]
Do not write a process ID file.
[-p|--pid=]pid-file
Write the process ID to pid-file rather than the default /var/run/iscsid.pid.
[-h|--help]
Display this help and exit.
[-v|--version]
Display version and exit.
› FILES
/etc/iscsi/iscsid.conf
The configuration file read by iscsid and iscsiadm on startup.
/etc/iscsi/initiatorname.iscsi
The file containing the iSCSI InitiatorName and InitiatorAlias read by iscsid and
iscsiadm on startup.
/etc/iscsi/nodes
Open-iSCSI persistent configuration database
› SEE ALSO
iscsiadm(8)
› AUTHORS
Open-iSCSI project <http://www.open-iscsi.org/> Alex Aizman
<[email protected]> Dmitry Yusupov <[email protected]>
ISCSISTART
› NAME
iscsistart - iSCSI boot tool
› SYNOPSIS
iscsistart [OPTION]
› DESCRIPTION
iscsistart will start a session using the settings passed in, or using the iBFT or Open
Firmware [OF] boot information. This program should not be run to manage
sessions. Its primary use is to start sessions used for iSCSI root boot.
› OPTIONS
[-i|--initiatorname=]name
Set InitiatorName to name (Required if not using iBFT or OF)
[-t|--targetname=]name
Set TargetName to name (Required if not using iBFT or OF)
[-g|--tgpt=]N
Set target portal group tag to N (Required if not using iBFT or OF)
[-a|--address=]A.B.C.D
Set IP address to A.B.C.D (Required if not using iBFT or OF)
[-p|--port=]N
Set port to N (Optional. Default 3260)
[-u|--username=]N
Set username to N (Optional)
[-w|--password=]N
Set password to N (Optional)
[-U|--username_in=]N
Set incoming username to N (Optional)
[-W|--password_in=]N
Set incoming password to N (Optional)
[-d|--debug=]debug_level
Print debugging information
[-b|--fwparam_connect]
Create a session to the target using the iBFT or OF info
[-N|--fwparam_network]
Bring up the network as specified by iBFT or OF
[-f|--fwparam_print]
Print the iBFT or OF info to STDOUT
[-P|--param=]NAME=VALUE
Set the parameter with the name NAME to VALUE. NAME is one of the settings in
the node record or iscsid.conf. Multiple params can be passed in.
[-h|--help]
Display this help and exit
[-v|--version]
Display version and exit
› SEE ALSO
iscsiadm(8)
› AUTHORS
Open-iSCSI project <http://www.open-iscsi.org/> Mike Christie
<[email protected]>
iscsiuio
› NAME
iscsiuio - iSCSI UserSpace I/O driver
› SYNOPSIS
iscsiuio [-d <debuglevel>] [-f] [-v]
› DESCRIPTION
iscsiuio is the UserSpace I/O driver for the Broadcom NetXtreme II
BCM5706/5708/5709 series PCI/PCI-X Gigabit Ethernet Network Interface Card
(NIC) and for the Broadcom NetXtreme II
BCM57710/57711/57712/57800/57810/57840 series PCI-E 10 Gigabit Ethernet
Network Interface Card. The driver has been tested on 2.6.28 kernels and above.
Refer to the README.TXT from the driver package on how to compile and install
the driver.
Refer to the relevant Linux documentation on how to configure network protocols
and addresses.
› DRIVER DEPENDENCIES
› PARAMETERS
There are very few parameters when running this application.
-d <debuglevel>
Enable debug mode; debug messages will be sent to stdout.
ISOSIZE
› NAME
isosize - output the length of an iso9660 filesystem
› NOTES
The size of the file (or block device) holding an iso9660 filesystem can be marginally
larger than the actual size of the iso9660 filesystem. One reason for this is that cd writers
are allowed to add “run out” sectors at the end of an iso9660 image.
› AVAILABILITY
The isosize command is part of the util-linux package and is available from Linux
Kernel Archive
IW
› NAME
iw - show / manipulate wireless devices and their configuration
› SYNOPSIS
iw [ OPTIONS ] { help [ command ] | OBJECT COMMAND }
OBJECT := { dev | phy | reg }
OPTIONS := { --version | --debug }
› OPTIONS
--version
print version information and exit.
--debug
enable netlink message debugging.
› IW - COMMAND SYNTAX
OBJECT
Specifies the object to operate on.
COMMAND
Specifies the action to perform on the object. The set of possible actions depends on the
object type. iw help will print all supported commands, while iw help command will
print the help for all matching commands.
› SEE ALSO
ip(8), crda(8), regdbdump(8), regulatory.bin(5)
http://wireless.kernel.org/en/users/Documentation/iw
KBDRATE
› NAME
kbdrate - reset the keyboard repeat rate and delay time
› SYNOPSIS
kbdrate [ -s ] [ -r rate ] [ -d delay ]
› DESCRIPTION
kbdrate is used to change the keyboard repeat rate and delay time. The delay is the
amount of time that a key must be depressed before it will start to repeat.
Using kbdrate without any options will reset the repeat rate to 10.9 characters per
second (cps) and the delay to 250 milliseconds (ms) for Intel- and M68K-based
systems. These are the IBM defaults. On SPARC-based systems it will reset the
repeat rate to 20 cps and the delay to 200 ms.
› OPTIONS
-s
Silent. No messages are printed.
-r rate
Change the keyboard repeat rate to rate cps. For Intel-based systems, the allowable
range is from 2.0 to 30.0 cps. Only certain, specific values are possible, and the
program will select the nearest possible value to the one specified. The possible
values are given, in characters per second, as follows: 2.0, 2.1, 2.3, 2.5, 2.7, 3.0, 3.3,
3.7, 4.0, 4.3, 4.6, 5.0, 5.5, 6.0, 6.7, 7.5, 8.0, 8.6, 9.2, 10.0, 10.9, 12.0, 13.3, 15.0, 16.0,
17.1, 18.5, 20.0, 21.8, 24.0, 26.7, 30.0. For SPARC-based systems, the allowable
range is from 0 (no repeat) to 50 cps.
-d delay
Change the delay to delay milliseconds. For Intel-based systems, the allowable range
is from 250 to 1000 ms, in 250 ms steps. For SPARC systems, possible values are
between 10 ms and 1440 ms, in 10 ms steps.
-V
Display a version number and exit.
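kbdrate performs this rounding internally; as a sketch of the "nearest possible value" rule for -r, using only the Intel rate table given above (the nearest_rate helper is ours, for illustration):

```shell
#!/bin/sh
# Pick the supported Intel repeat rate closest to the requested one,
# mirroring how kbdrate selects "the nearest possible value".
nearest_rate() {
    echo "2.0 2.1 2.3 2.5 2.7 3.0 3.3 3.7 4.0 4.3 4.6 5.0 5.5 6.0 6.7 7.5 8.0 8.6 9.2 10.0 10.9 12.0 13.3 15.0 16.0 17.1 18.5 20.0 21.8 24.0 26.7 30.0" |
    tr ' ' '\n' |
    awk -v want="$1" '
        { d = $1 - want; if (d < 0) d = -d          # absolute distance
          if (NR == 1 || d < best) { best = d; rate = $1 } }
        END { print rate }'
}

nearest_rate 11   # a request for 11 cps rounds to a nearby table entry
```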
› BUGS
Not all keyboards support all rates.
Not all keyboards have the rates mapped in the same way.
Setting the repeat rate on the Gateway AnyKey keyboard does not work. If someone
with a Gateway figures out how to program the keyboard, please send mail to
[email protected].
All this is very architecture dependent. Nowadays kbdrate first tries the
KDKBDREP and KIOCSRATE ioctls. (The former usually works on an m68k
machine, the latter for SPARC.) When these ioctls fail an ioport interface as on i386
is assumed.
› FILES
/etc/rc.local /dev/port /dev/kbd
KDUMP
› NAME
kdump - This is just a placeholder until a real man page has been written
› SYNOPSIS
kdump [options] start_address…
› DESCRIPTION
kdump does not have a man page yet.
› OPTIONS
› SEE ALSO
› AUTHOR
kdump was written by Eric Biederman.
This manual page was written by Khalid Aziz <[email protected]>, for the Debian
project (but may be used by others).
KDUMPCTL
› NAME
kdumpctl - control interface for kdump
› SYNOPSIS
kdumpctl COMMAND
› DESCRIPTION
kdumpctl is used to check or control the kdump service. In most cases, you should
use systemctl to start / stop / enable the kdump service instead. However, kdumpctl
provides more details for debugging and a helper to set up ssh key authentication.
› COMMANDS
start
Start the service.
stop
Stop the service.
status
Prints the current status of kdump service. It returns non-zero value if kdump is not
operational.
restart
Is equal to stop followed by start.
propagate
Helps to set up key authentication for ssh storage, since it is impossible to use
password authentication during kdump.
showmem
Prints the size of the memory reserved for the crash kernel in megabytes.
› SEE ALSO
kdump.conf(5), mkdumprd(8)
KERNEL-INSTALL
› NAME
kernel-install - Add and remove kernel and initramfs images to and from /boot
› SYNOPSIS
kernel-install COMMAND KERNEL-VERSION [KERNEL-IMAGE]
› DESCRIPTION
kernel-install
is used to install and remove kernel and initramfs images to and from /boot.
kernel-install will execute the files located in the directory /usr/lib/kernel/install.d/
and the local administration directory /etc/kernel/install.d/. All files are collectively
sorted and executed in lexical order, regardless of the directory in which they live.
However, files with identical filenames replace each other. Files in
/etc/kernel/install.d/ take precedence over files with the same name in
/usr/lib/kernel/install.d/. This can be used to override a system-supplied executable
with a local file if needed; a symbolic link in /etc/kernel/install.d/ with the same name
as an executable in /usr/lib/kernel/install.d/, pointing to /dev/null, disables the
executable entirely. Executables must have the extension “.install”; other extensions
are ignored.
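The drop-in merging described above (lexical order, with /etc winning over /usr/lib on a name clash) can be simulated with throwaway directories; the filenames below are typical examples, not a guaranteed set:

```shell
#!/bin/sh
# Simulate kernel-install's drop-in resolution with temp directories
# standing in for /usr/lib/kernel/install.d and /etc/kernel/install.d:
# merge by filename (the /etc copy wins), take files in lexical order,
# and keep only the ".install" extension.
usr=$(mktemp -d); etc=$(mktemp -d)
: > "$usr/50-depmod.install"
: > "$usr/90-loaderentry.install"
: > "$etc/90-loaderentry.install"   # same name: overrides the /usr/lib copy

{ ls "$usr"; ls "$etc"; } | sort -u | while read -r f; do
    if [ -e "$etc/$f" ]; then src=etc; else src=usr; fi
    case "$f" in *.install) echo "$f ($src)" ;; esac
done
rm -rf "$usr" "$etc"
```

Running it prints each executable once, tagged with the tree it would be taken from: the 50- file from the usr tree, the overridden 90- file from the etc tree.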
› COMMANDS
The following commands are understood:
add KERNEL-VERSION KERNEL-IMAGE
kernel-install creates the directory /boot/MACHINE-ID/KERNEL-VERSION/ and
calls every executable /usr/lib/kernel/install.d/*.install and
/etc/kernel/install.d/*.install with the arguments
add KERNEL-VERSION /boot/MACHINE-ID/KERNEL-VERSION/
› FILES
/etc/kernel/cmdline /proc/cmdline
The content of the file /etc/kernel/cmdline specifies the kernel command line to use.
If that file does not exist, /proc/cmdline is used.
/etc/machine-id
The content of the file specifies the machine identification MACHINE-ID.
/etc/os-release /usr/lib/os-release
The content of the file specifies the operating system title PRETTY_NAME.
› SEE ALSO
machine-id(5), os-release(5), Boot loader specification[1]
› NOTES
1.
Boot loader specification
http://www.freedesktop.org/wiki/Specifications/BootLoaderSpec
kexec
› NAME
kexec - directly boot into a new kernel
› SYNOPSIS
/sbin/kexec [-v (--version)] [-f (--force)] [-x (--no-ifdown)] [-l (--load)]
[-p (--load-panic)] [-u (--unload)] [-e (--exec)] [-t (--type)]
[--mem-min=addr] [--mem-max=addr]
› DESCRIPTION
kexec is a system call that enables you to load and boot into another kernel from the
currently running kernel. kexec performs the function of the boot loader from within
the kernel. The primary difference between a standard system boot and a kexec boot
is that the hardware initialization normally performed by the BIOS or firmware
(depending on architecture) is not performed during a kexec boot. This has the effect
of reducing the time required for a reboot.
Make sure you have selected CONFIG_KEXEC=y when configuring the kernel.
The CONFIG_KEXEC option enables the kexec system call.
› USAGE
Using kexec consists of
(1) loading the kernel to be rebooted into memory, and
(2) actually rebooting to the loaded kernel.
After this kernel is loaded, it can be booted to at any time using the command:
kexec -e
› OPTIONS
-d (--debug)
Enable debugging messages.
-e (--exec)
Run the currently loaded kernel. Note that it will reboot into the loaded kernel
without calling shutdown(8).
-f (--force)
Force an immediate kexec call; do not call shutdown(8) (contrary to the default
action without any option parameter). This option performs the same actions as
executing -l and -e in one call.
-h (--help)
Open a help file for kexec.
-l (--load) kernel
Load the specified kernel into the current kernel.
-p (--load-panic)
Load the new kernel for use on panic.
-t (--type=type)
Specify that the new kernel is of this type.
-u (--unload)
Unload the current kexec target kernel. If a capture kernel is being unloaded then
specify -p with -u.
-v (--version)
Return the version number of the installed utility.
-x (--no-ifdown)
Shut down the running kernel, but restore the interface on reload. (If this option is
used, it must be specified last.)
--mem-min=addr
Specify the lowest memory address addr to load code into.
--mem-max=addr
Specify the highest memory address addr to load code into.
--entry=addr
Specify the jump back address. (0 means it is not jump back or preserve context.)
--load-preserve-context
Load the new kernel and preserve the context of the current kernel during kexec.
--load-jump-back-helper
Load a helper image to jump back to the original kernel.
--reuseinitrd
Reuse the initrd from the first boot.
› SUPPORTED KERNEL FILE TYPES AND OPTIONS
Beoboot-x86
--args-elf
Pass ELF boot notes.
--args-linux
Pass Linux kernel style options.
--real-mode
Use the kernel’s real mode entry point.
elf-x86
--append=string
Append string to the kernel command line.
--command-line=string
Set the kernel command line to string.
--reuse-cmdline
Use the command line from the running system. When a panic kernel is loaded, it
strips the crashkernel parameter automatically. The BOOT_IMAGE parameter is also
stripped.
--initrd=file
Use file as the kernel’s initial ramdisk.
--ramdisk=file
Use file as the kernel’s initial ramdisk.
bzImage-x86
--append=string
Append string to the kernel command line.
--command-line=string
Set the kernel command line to string.
--reuse-cmdline
Use the command line from the running system. When a panic kernel is loaded, it
strips the crashkernel parameter automatically. The BOOT_IMAGE parameter is also
stripped.
--initrd=file
Use file as the kernel’s initial ramdisk.
--ramdisk=file
Use file as the kernel’s initial ramdisk.
--real-mode
Use the real-mode entry point.
multiboot-x86
--command-line=string
Set the kernel command line to string.
--reuse-cmdline
Use the command line from the running system. When a panic kernel is loaded, it
strips the crashkernel parameter automatically. The BOOT_IMAGE parameter is also
stripped.
--module=mod arg1 arg2 …
Load module mod with command-line arguments arg1 arg2 … This parameter can be
specified multiple times.
› ARCHITECTURE OPTIONS
--console-serial
Enable the serial console.
--console-vga
Enable the VGA console.
--elf32-core-headers
Prepare core headers in ELF32 format.
--elf64-core-headers
Prepare core headers in ELF64 format.
--reset-vga
Attempt to reset a standard VGA device.
--serial=port
Specify the serial port for debug output.
--serial-baud=baud_rate
Specify the baud rate of the serial port.
KEY.DNS_RESOLVER
› NAME
key.dns_resolver - Upcall for request-key to handle dns_resolver keys
› SYNOPSIS
/sbin/key.dns_resolver <key>
/sbin/key.dns_resolver -D [-v] [-v] <keydesc> <calloutinfo>
› DESCRIPTION
This program is invoked by request-key on behalf of the kernel when kernel services
(such as NFS, CIFS and AFS) want to perform a hostname lookup and the kernel
does not have the key cached. It is not ordinarily intended to be called directly.
It can be called in debugging mode to test its functionality by passing a -D flag on the
command line. For this to work, the key description and the callout information must
be supplied. Verbosity can be increased by supplying one or more -v flags.
› ERRORS
All errors will be logged to the syslog.
› SEE ALSO
request-key(8), request-key.conf(5)
KILLALL5
› NAME
killall5 - send a signal to all processes.
› SYNOPSIS
killall5 -signalnumber [-o omitpid[,omitpid..]] [-o omitpid[,omitpid..]..]
› DESCRIPTION
killall5 is the SystemV killall command. It sends a signal to all processes except
kernel threads and the processes in its own session, so it won’t kill the shell that is
running the script it was called from. Its primary (only) use is in the rc scripts found
in the /etc/init.d directory.
› OPTIONS
-o omitpid
Tells killall5 to omit processes with that process id.
› NOTES
killall5 can also be invoked as pidof, which is simply a (symbolic) link to the killall5
program.
› EXIT STATUS
The program returns zero if it killed processes. It returns 2 if no processes were
killed, and 1 if it was unable to find any processes (/proc/ is missing).
› SEE ALSO
halt(8), reboot(8), pidof(8)
› AUTHOR
Miquel van Smoorenburg, [email protected]
KMOD
› NAME
kmod - Program to manage Linux Kernel modules
› SYNOPSIS
kmod [OPTIONS…] [COMMAND] [COMMAND_OPTIONS…]
› DESCRIPTION
kmod
is a multi-call binary which implements the programs used to control Linux Kernel
modules. Most users will only run it using its other names.
› OPTIONS
-V, --version
Show the program version and exit.
-h, --help
Show the help message.
› COMMANDS
help
Show the help message.
list
List the currently loaded modules.
static-nodes
Output the static device nodes information provided by the modules of the currently
running kernel version.
› COPYRIGHT
This manual page originally Copyright 2014, Marco d’Itri. Maintained by Lucas De
Marchi and others.
› SEE ALSO
lsmod(8), rmmod(8), insmod(8), modinfo(8), modprobe(8), depmod(8)
› AUTHOR
Lucas De Marchi <[email protected]>
Developer
KPARTX
› NAME
kpartx - Create device maps from partition tables
› SYNOPSIS
kpartx [-a | -d | -l] [-v] wholedisk
› DESCRIPTION
This tool, derived from util-linux’ partx, reads partition tables on the specified device
and creates device maps over the partition segments detected. It is called from
hotplug upon device map creation and deletion.
› OPTIONS
-a
Add partition mappings
-r
Read-only partition mappings
-d
Delete partition mappings
-u
Update partition mappings
-l
List the partition mappings that would be added by -a
-p
set device name-partition number delimiter
-f
force creation of mappings; overrides ‘no_partitions’ feature
-g
force GUID partition table (GPT)
-v
Operate verbosely
-s
Sync mode. Don’t return until the partitions are created
› EXAMPLE
To mount all the partitions in a raw disk image:
kpartx -av disk.img
LASTLOG
› NAME
lastlog - reports the most recent login of all users or of a given user
› SYNOPSIS
lastlog [options]
› OPTIONS
-h, --help
Display help message and exit.
-R, --root CHROOT_DIR
Apply changes in the CHROOT_DIR directory and use the configuration files from
the CHROOT_DIR directory.
-t, --time DAYS
Print the lastlog records more recent than DAYS.
-u, --user LOGIN|RANGE
Print the lastlog record of the specified user(s).
The users can be specified by a login name, a numerical user ID, or a RANGE of
users. This RANGE of users can be specified with min and max values (UID_MIN-
UID_MAX), a max value (-UID_MAX), or a min value (UID_MIN-).
If the user has never logged in, the message ** Never logged in** will be displayed instead
of the port and time.
Only the entries for the current users of the system will be displayed. Other entries may
exist for users that were deleted previously.
› NOTE
The lastlog file is a database which contains info on the last login of each user. You
should not rotate it. It is a sparse file, so its size on the disk is usually much smaller
than the one shown by “ls -l” (which can indicate a really big file if you have users
with a high UID in passwd). You can display its real size with “ls -s”.
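The sparse-file effect is easy to reproduce with a scratch file (not lastlog itself): write a single byte at a large offset and compare the apparent size with the blocks actually allocated.

```shell
#!/bin/sh
# A sparse file's apparent size ("ls -l") can dwarf its allocated size
# ("ls -s", du), which is exactly the situation with /var/log/lastlog.
f=$(mktemp)
# One byte written 100 MiB into the file leaves a hole before it.
dd if=/dev/zero of="$f" bs=1 count=1 seek=104857600 2>/dev/null
apparent=$(wc -c < "$f")            # what "ls -l" reports (bytes)
allocated=$(du -k "$f" | cut -f1)   # what "ls -s"/du report (KiB)
echo "apparent=$apparent allocated_kib=$allocated"
rm -f "$f"
```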
› FILES
/var/log/lastlog
Database of times of previous user logins.
› CAVEATS
Large gaps in UID numbers will cause the lastlog program to run longer with no
output to the screen (i.e., if there are no entries in the lastlog database for users with
UIDs between 170 and 800, lastlog will appear to hang as it processes entries with
UIDs 171-799).
LD.SO
› NAME
ld.so, ld-linux.so* - dynamic linker/loader
› SYNOPSIS
The dynamic linker can be run either indirectly by running some dynamically linked
program or library (in which case no command-line options to the dynamic linker can
be passed and, in the ELF case, the dynamic linker which is stored in the .interp
section of the program is executed) or directly by running:
/lib/ld-linux.so.* [OPTIONS] [PROGRAM [ARGUMENTS]]
› DESCRIPTION
The programs ld.so and ld-linux.so* find and load the shared libraries needed by a
program, prepare the program to run, and then run it.
Linux binaries require dynamic linking (linking at run time) unless the -static option
was given to ld(1) during compilation.
The program ld.so handles a.out binaries, a format used long ago; ld-linux.so*
handles ELF (/lib/ld-linux.so.1 for libc5, /lib/ld-linux.so.2 for glibc2), which
everybody has been using for years now. Otherwise both have the same behavior, and
use the same support files and programs ldd(1), ldconfig(8) and /etc/ld.so.conf.
When resolving library dependencies, the dynamic linker first inspects each
dependency string to see if it contains a slash (this can occur if a library pathname
containing slashes was specified at link time). If a slash is found, then the
dependency string is interpreted as a (relative or absolute) pathname, and the library
is loaded using that pathname.
If a library dependency does not contain a slash, then it is searched for in the
following order:
o
(ELF only) Using the directories specified in the DT_RPATH dynamic section
attribute of the binary if present and DT_RUNPATH attribute does not exist. Use of
DT_RPATH is deprecated.
o
Using the environment variable LD_LIBRARY_PATH. Except if the executable is a
set-user-ID/set-group-ID binary, in which case it is ignored.
o
(ELF only) Using the directories specified in the DT_RUNPATH dynamic section
attribute of the binary if present.
o
From the cache file /etc/ld.so.cache, which contains a compiled list of candidate
libraries previously found in the augmented library path. If, however, the binary was
linked with the -z nodeflib linker option, libraries in the default library paths are
skipped. Libraries installed in hardware capability directories (see below) are
preferred to other libraries.
o
In the default path /lib, and then /usr/lib. If the binary was linked with the -z nodeflib
linker option, this step is skipped.
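On a glibc system the search order above can be observed directly: LD_DEBUG=libs makes the dynamic linker trace, on standard error, each path it tries while resolving a binary's dependencies. A minimal sketch, assuming /bin/true is a dynamically linked glibc binary:

```shell
#!/bin/sh
# Trace the library search performed for a trivially small dynamic
# binary. glibc-specific; the trace goes to stderr.
LD_DEBUG=libs /bin/true 2>&1 | head -n 5
```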
On systems that provide multiple versions of a shared library (in different directories
in the search path) that have different minimum kernel ABI version requirements,
LD_ASSUME_KERNEL can be used to select the version of the library that is used
(dependent on the directory search order). Historically, the most common use of the
LD_ASSUME_KERNEL feature was to manually select the older LinuxThreads
POSIX threads implementation on systems that provided both LinuxThreads and
NPTL (the latter of which was typically the default on such systems); see pthreads(7).
› ENVIRONMENT
Among the more important environment variables are the following:
LD_BIND_NOT
(glibc since 2.2) Don’t update the Global Offset Table (GOT) and Procedure Linkage
Table (PLT) when resolving a symbol.
LD_BIND_NOW
(libc5; glibc since 2.1.1) If set to a nonempty string, causes the dynamic linker to
resolve all symbols at program startup instead of deferring function call resolution to
the point when they are first referenced. This is useful when using a debugger.
LD_LIBRARY_PATH
A colon-separated list of directories in which to search for ELF libraries at execution
time. Similar to the PATH environment variable. Ignored in set-user-ID and set-
group-ID programs.
LD_PRELOAD
A list of additional, user-specified, ELF shared libraries to be loaded before all others.
The items of the list can be separated by spaces or colons. This can be used to
selectively override functions in other shared libraries. The libraries are searched for
using the rules given under DESCRIPTION. For set-user-ID/set-group-ID ELF
binaries, preload pathnames containing slashes are ignored, and libraries in the
standard search directories are loaded only if the set-user-ID permission bit is enabled
on the library file.
LD_TRACE_LOADED_OBJECTS
(ELF only) If set to a nonempty string, causes the program to list its dynamic library
dependencies, as if run by ldd(1), instead of running normally.
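Setting the variable is a one-liner and is essentially what ldd(1) does internally; on a glibc system the program's dependencies are printed and the program itself never runs. A sketch, assuming /bin/true is dynamically linked:

```shell
#!/bin/sh
# List /bin/true's shared-library dependencies without executing it.
# Equivalent in spirit to "ldd /bin/true"; glibc-specific.
LD_TRACE_LOADED_OBJECTS=1 /bin/true
```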
Then there are lots of more or less obscure variables, many obsolete or only for internal
use.
LD_AOUT_LIBRARY_PATH
(libc5) Version of LD_LIBRARY_PATH for a.out binaries only. Old versions of ld-
linux.so.1 also supported LD_ELF_LIBRARY_PATH.
LD_AOUT_PRELOAD
(libc5) Version of LD_PRELOAD for a.out binaries only. Old versions of ld-
linux.so.1 also supported LD_ELF_PRELOAD.
LD_AUDIT
(glibc since 2.4) A colon-separated list of user-specified, ELF shared objects to be
loaded before all others in a separate linker namespace (i.e., one that does not intrude
upon the normal symbol bindings that would occur in the process). These libraries
can be used to audit the operation of the dynamic linker. LD_AUDIT is ignored for
set-user-ID/set-group-ID binaries.
The dynamic linker will notify the audit libraries at so-called auditing checkpoints
(for example, loading a new library, resolving a symbol, or calling a symbol from
another shared object) by calling an appropriate function within the audit library. For
details, see rtld-audit(7). The auditing interface is largely compatible with that
provided on Solaris, as described in its Linker and Libraries Guide, in the chapter
Runtime Linker Auditing Interface.
LD_BIND_NOT
(glibc since 2.1.95) Do not update the GOT (global offset table) and PLT (procedure
linkage table) after resolving a symbol.
LD_DEBUG
(glibc since 2.1) Output verbose debugging information about the dynamic linker. If
set to all, prints all the debugging information it has; if set to help, prints a help
message about the categories that can be specified in this environment variable. Since
glibc 2.3.4, LD_DEBUG is ignored for set-user-ID/set-group-ID binaries.
LD_DEBUG_OUTPUT
(glibc since 2.1) File in which LD_DEBUG output should be written. The default is
standard output. LD_DEBUG_OUTPUT is ignored for set-user-ID/set-group-ID
binaries.
LD_DYNAMIC_WEAK
(glibc since 2.1.91) Allow weak symbols to be overridden (reverting to old glibc
behavior). For security reasons, since glibc 2.3.4, LD_DYNAMIC_WEAK is
ignored for set-user-ID/set-group-ID binaries.
LD_HWCAP_MASK
(glibc since 2.1) Mask for hardware capabilities.
LD_KEEPDIR
(a.out only)(libc5) Don’t ignore the directory in the names of a.out libraries to be
loaded. Use of this option is strongly discouraged.
LD_NOWARN
(a.out only)(libc5) Suppress warnings about a.out libraries with incompatible minor
version numbers.
LD_ORIGIN_PATH
(glibc since 2.1) Path where the binary is found (for non-set-user-ID programs). For
security reasons, since glibc 2.4, LD_ORIGIN_PATH is ignored for set-user-ID/set-
group-ID binaries.
LD_POINTER_GUARD
(glibc since 2.4) Set to 0 to disable pointer guarding. Any other value enables pointer
guarding, which is also the default. Pointer guarding is a security mechanism
whereby some pointers to code stored in writable program memory (return addresses
saved by setjmp(3) or function pointers used by various glibc internals) are mangled
semi-randomly to make it more difficult for an attacker to hijack the pointers for use
in the event of a buffer overrun or stack-smashing attack.
LD_PROFILE
(glibc since 2.1) Shared object to be profiled, specified either as a pathname or a
soname. Profiling output is written to the file whose name is:
“$LD_PROFILE_OUTPUT/$LD_PROFILE.profile”.
LD_PROFILE_OUTPUT
(glibc since 2.1) Directory where LD_PROFILE output should be written. If this
variable is not defined, or is defined as an empty string, then the default is /var/tmp.
LD_PROFILE_OUTPUT is ignored for set-user-ID and set-group-ID programs,
which always use /var/profile.
LD_SHOW_AUXV
(glibc since 2.1) Show auxiliary array passed up from the kernel. For security
reasons, since glibc 2.3.5, LD_SHOW_AUXV is ignored for set-user-ID/set-group-
ID binaries.
LD_USE_LOAD_BIAS
By default (i.e., if this variable is not defined) executables and prelinked shared
objects will honor base addresses of their dependent libraries and (nonprelinked)
position-independent executables (PIEs) and other shared objects will not honor
them. If LD_USE_LOAD_BIAS is defined with the value 1, both executables and PIEs
will honor the base addresses. If LD_USE_LOAD_BIAS is defined with the value 0,
neither executables nor PIEs will honor the base addresses. This variable is ignored
by set-user-ID and set-group-ID programs.
LD_VERBOSE
(glibc since 2.1) If set to a nonempty string, output symbol versioning information
about the program if the LD_TRACE_LOADED_OBJECTS environment variable
has been set.
LD_WARN
(ELF only)(glibc since 2.1.3) If set to a nonempty string, warn about unresolved
symbols.
LDD_ARGV0
(libc5) argv[0] to be used by ldd(1) when none is present.
› FILES
/lib/ld.so
a.out dynamic linker/loader
/lib/ld-linux.so.{1,2}
ELF dynamic linker/loader
/etc/ld.so.cache
File containing a compiled list of directories in which to search for libraries and an
ordered list of candidate libraries.
/etc/ld.so.preload
File containing a whitespace-separated list of ELF shared libraries to be loaded
before the program.
lib*.so*
shared libraries
› NOTES
The ld.so functionality is available for executables compiled using libc version 4.4.3
or greater. ELF functionality is available since Linux 1.1.52 and libc5.
› SEE ALSO
ldd(1), sln(1), getauxval(3), rtld-audit(7), ldconfig(8)
› COLOPHON
This page is part of release 3.53 of the Linux man-pages project. A description of the
project, and information about reporting bugs, can be found at
http://www.kernel.org/doc/man-pages/.
LDATTACH
› NAME
ldattach - attach a line discipline to a serial line
› SYNOPSIS
ldattach [-dhV78neo12] [-s speed] [-i iflag] ldisc device
› DESCRIPTION
The ldattach daemon opens the specified device file (which should refer to a serial
device) and attaches the line discipline ldisc to it for processing of the sent and/or
received data. It then goes into the background keeping the device open so that the
line discipline stays loaded.
The line discipline ldisc may be specified either by name or by number.
In order to detach the line discipline, kill(1) the ldattach process.
With no arguments, ldattach prints usage information.
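The attach/detach cycle described above can be sketched as follows (the device name /dev/ttyS0 and the presence of PPS support are assumptions; attaching requires root, so the sketch is guarded to be a no-op without the hardware):

```shell
# Sketch: attach the PPS line discipline (18) to an assumed serial port at
# 4800 baud, 8 data bits, no parity, one stop bit.
DEV=/dev/ttyS0
if command -v ldattach >/dev/null 2>&1 && [ -c "$DEV" ]; then
    ldattach -8 -n -1 -s 4800 pps "$DEV" 2>/dev/null || true
    # ldattach stays in the background holding the device open; detach the
    # line discipline by killing that background process.
    pkill -x ldattach 2>/dev/null || true
fi
echo "done"
```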
› LINE DISCIPLINES
Depending on the kernel release, the following line disciplines are supported:
TTY(0)
The default line discipline, providing transparent operation (raw mode) as well as the
habitual terminal line editing capabilities (cooked mode).
SLIP(1)
Serial Line IP (SLIP) protocol processor for transmitting TCP/IP packets over serial
lines.
MOUSE(2)
Device driver for RS232 connected pointing devices (serial mice).
PPP(3)
Point to Point Protocol (PPP) processor for transmitting network packets over serial
lines.
STRIP(4)
AX25(5)
X25(6)
Line driver for transmitting X.25 packets over asynchronous serial lines.
6PACK(7)
R3964(9)
Driver for Simatic R3964 module.
IRDA(11)
Linux IrDa (infrared data transmission) driver - see http://irda.sourceforge.net/
HDLC(13)
Synchronous HDLC driver.
SYNC_PPP(14)
Synchronous PPP driver.
HCI(15)
Bluetooth HCI UART driver.
GIGASET_M101(16)
Driver for Siemens Gigaset M101 serial DECT adapter.
PPS(18)
Driver for serial line Pulse Per Second (PPS) source.
› OPTIONS
-d | --debug
Causes ldattach to stay in the foreground so that it can be interrupted or debugged,
and to print verbose messages about its progress to the standard error output.
-h | --help
Prints a usage message and exits.
-V | --version
Prints the program version.
-s value | --speed value
Set the speed of the serial line to the specified value.
-7 | --sevenbits
Sets the character size of the serial line to 7 bits.
-8 | --eightbits
Sets the character size of the serial line to 8 bits.
-n | --noparity
Sets the parity of the serial line to none.
-e | --evenparity
Sets the parity of the serial line to even.
-o | --oddparity
Sets the parity of the serial line to odd.
-1 | --onestopbit
Sets the number of stop bits of the serial line to one.
-2 | --twostopbits
Sets the number of stop bits of the serial line to two.
-i value | --iflag [-]value{,…}
Sets the specified bits in the c_iflag word of the serial line. Value may be a number or
a symbolic name. If value is prefixed by a minus sign, clear the specified bits instead.
Several comma-separated values may be given in order to set and clear multiple bits.
› SEE ALSO
inputattach(1), ttys(4)
› AUTHOR
Tilman Schmidt ([email protected])
› AVAILABILITY
The ldattach command is part of the util-linux package and is available from
ftp://ftp.kernel.org/pub/linux/utils/util-linux/.
LDCONFIG
› NAME
ldconfig - configure dynamic linker run-time bindings
› SYNOPSIS
/sbin/ldconfig [ -nNvXV ] [ -f conf ] [ -C cache ] [ -r root ] directory …
/sbin/ldconfig -l [ -v ] library …
/sbin/ldconfig -p
› DESCRIPTION
ldconfig creates the necessary links and cache to the most recent shared libraries
found in the directories specified on the command line, in the file /etc/ld.so.conf, and
in the trusted directories (/lib and /usr/lib). The cache is used by the run-time linker,
ld.so or ld-linux.so. ldconfig checks the header and filenames of the libraries it
encounters when determining which versions should have their links updated.
ldconfig will attempt to deduce the type of ELF libs (i.e., libc5 or libc6/glibc) based
on what C libs, if any, the library was linked against.
Some existing libs do not contain enough information to allow the deduction of their
type. Therefore, the /etc/ld.so.conf file format allows the specification of an expected
type. This is used only for those ELF libs whose type cannot be worked out. The format is
“dirname=TYPE”, where TYPE can be libc4, libc5, or libc6. (This syntax also works
on the command line.) Spaces are not allowed. Also see the -p option. ldconfig
should normally be run by the superuser, as it may require write permission on some
root-owned directories and files.
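A typical workflow might look like the following sketch (rebuilding the cache requires root, so that step is shown as a comment; inspecting the cache is guarded in case ldconfig is not on the PATH):

```shell
# After installing a shared library into /usr/lib or a directory listed in
# /etc/ld.so.conf, rebuild the cache as root:
#     ldconfig
# Anyone can inspect the current cache without modifying it:
if command -v ldconfig >/dev/null 2>&1; then
    ldconfig -p | head -n 3    # summary line plus the first cached entries
fi
echo "ok"
```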
› OPTIONS
-v
Verbose mode. Print current version number, the name of each directory as it is
scanned, and any links that are created. Overrides quiet mode.
-n
Only process directories specified on the command line. Don’t process the trusted
directories (/lib and /usr/lib) nor those specified in /etc/ld.so.conf. Implies -N.
-N
Don’t rebuild the cache. Unless -X is also specified, links are still updated.
-X
Don’t update links. Unless -N is also specified, the cache is still rebuilt.
-f conf
Use conf instead of /etc/ld.so.conf.
-C cache
Use cache instead of /etc/ld.so.cache.
-r root
Change to and use root as the root directory.
-l
Library mode. Manually link individual libraries. Intended for use by experts only.
-p
Print the lists of directories and candidate libraries stored in the current cache.
› FILES
/lib/ld.so
run-time linker/loader
/etc/ld.so.conf
File containing a list of colon, space, tab, newline, or comma-separated directories in
which to search for libraries.
/etc/ld.so.cache
File containing an ordered list of libraries found in the directories specified in
/etc/ld.so.conf, as well as those found in /lib and /usr/lib.
› SEE ALSO
ldd(1), ld.so(8)
› COLOPHON
This page is part of release 3.53 of the Linux man-pages project. A description of the
project, and information about reporting bugs, can be found at
http://www.kernel.org/doc/man-pages/.
ledctl
› NAME
ledctl - Intel(R) LED control application for storage enclosures.
› SYNOPSIS
ledctl [OPTIONS] pattern_name=list_of_devices …
› DESCRIPTION
The ledctl application is a user space application designed to control LEDs associated
with each slot in an enclosure or a drive bay. The LEDs of devices listed in
list_of_devices are set to the given pattern pattern_name, and all other LEDs are
turned off. The user must have root privileges to use this application.
There are two types of systems: 2-LED systems (Activity LED, Status LED) and 3-
LED systems (Activity LED, Locate LED, Fail LED). The ledctl application uses the SGPIO
and SES-2 protocols to control LEDs. The program implements the IBPI patterns of the
SFF-8489 specification for SGPIO. Please note that some enclosures do not adhere
closely to the SFF-8489 specification; it might happen that an enclosure's processor
accepts an IBPI pattern but blinks the LEDs at variance with the SFF-8489
specification, or supports only a limited number of patterns.
LED management (AHCI) and SAF-TE protocols are not supported.
The ledctl application has been verified to work with Intel(R) storage controllers (i.e.
Intel(R) AHCI controller and Intel(R) SAS controller). The application might work with
storage controllers of other vendors (especially SCSI/SAS controllers). However,
storage controllers of other vendors have not been tested.
The ledmon application has the highest priority when accessing LEDs. This means that
some patterns set by ledctl may have no effect if ledmon is running (except the Locate
pattern).
The ledctl application is a part of Intel(R) Enclosure LED Utilities.
Pattern Names
The ledctl application accepts the following names for pattern_name argument
according to SFF-8489 specification.
locate
Turns Locate LED associated with the given device(s) or empty slot(s) on.
locate_off
Turns only Locate LED off.
normal
Turns Status LED, Failure LED and Locate LED off.
off
Turns only Status LED and Failure LED off.
ica or degraded
Visualizes “In a Critical Array” pattern.
rebuild or rebuild_p
Visualizes “Rebuild” pattern.
ifa or failed_array
Visualizes “In a Failed Array” pattern.
hotspare
Visualizes “Hotspare” pattern.
pfa
Visualizes “Predicted Failure Analysis” pattern.
failure or disk_failed
Visualizes “Failure” pattern.
ses_abort
SES-2 R/R ABORT
ses_rebuild
SES-2 REBUILD/REMAP
ses_ifa
SES-2 IN FAILED ARRAY
ses_ica
SES-2 IN CRIT ARRAY
ses_cons_check
SES-2 CONS CHECK
ses_hotspare
SES-2 HOT SPARE
ses_rsvd_dev
SES-2 RSVD DEVICE
ses_ok
SES-2 OK
ses_ident
SES-2 IDENT
ses_rm
SES-2 REMOVE
ses_insert
SES-2 INSERT
ses_missing
SES-2 MISSING
ses_dnr
SES-2 DO NOT REMOVE
ses_active
SES-2 ACTIVE
ses_enable_bb
SES-2 ENABLE BYP B
ses_enable_ba
SES-2 ENABLE BYP A
ses_devoff
SES-2 DEVICE OFF
ses_fault
SES-2 FAULT
Patterns Translation
When a non-SES-2 pattern is sent to a device in an enclosure, it is automatically translated as follows.
locate
locate is translated to ses_ident
locate_off
locate_off is translated to ~ses_ident
normal
normal is translated to ses_ok
off
off is translated to ses_ok
degraded
degraded is translated to ses_ica
rebuild
rebuild is translated to ses_rebuild
rebuild_p
rebuild_p is translated to ses_rebuild
failed
failed is translated to ses_ifa
hotspare
hotspare is translated to ses_hotspare
pfa
pfa is translated to ses_rsvd_dev
failure
failure is translated to ses_fault
disk_failed
disk_failed is translated to ses_fault
List of Devices
The application accepts a list of devices in two formats. The first format is a list with
comma-separated elements. The second format is a list in curly braces with elements
separated by spaces. See the EXAMPLES section below for details.
A device is a path to a file in the /dev directory or in the /sys/block directory. It may
identify a block device, a RAID device, or a container device. In the case of a RAID
device or a container device, the state is set for all associated block devices.
The LEDs of devices listed in list_of_devices are set to the given pattern pattern_name
and all other LEDs are turned off.
› OPTIONS
-c or --config=path
Sets a path to a local configuration file. If this option is specified, the global
configuration file and user configuration file have no effect.
-l or --log=path
Sets a path to a local log file. If this option is specified, the global log file
/var/log/ledctl.log is not used.
-h or --help
Prints this help text and exits.
-v or --version
Displays the version of ledctl and information about the license, and exits.
› FILES
/var/log/ledctl.log
Global log file, used by all instances of the ledctl application. To force logging to a
user-defined file, use the -l option.
~/.ledctl
User configuration file, shared between ledmon and all ledctl application instances.
/etc/ledcfg.conf
Global configuration file, shared between ledmon and all ledctl application instances.
› EXAMPLES
The following example illustrates how to locate a single block device.
ledctl locate=/dev/sda
The following example illustrates how to turn Locate LED off for the same block
device.
ledctl locate_off=/dev/sda
The following example illustrates how to locate disks of a RAID device and how to set
rebuild pattern for two block devices at the same time. This example uses both
formats of device list.
ledctl locate=/dev/md127 rebuild={ /sys/block/sd[a-b] }
The following example illustrates how to turn Status LED and Failure LED off for the
given device(s).
ledctl off={ /dev/sda /dev/sdb }
The following example illustrates how to locate three block devices. This example
uses the first format of the device list.
ledctl locate=/dev/sda,/dev/sdb,/dev/sdc
› LICENSE
Copyright (c) 2009-2013 Intel Corporation.
This program is distributed under the terms of the GNU General Public License as
published by the Free Software Foundation. See the built-in help for details on the
License and the lack of warranty.
› SEE ALSO
ledmon(8), ledctl.conf(5)
› AUTHOR
This manual page was written by Artur Wojcik <[email protected]>. It may be
used by others.
ledmon
› NAME
ledmon - Intel(R) LED monitor service for storage enclosures.
› SYNOPSIS
ledmon [OPTIONS]
› DESCRIPTION
The ledmon application is a daemon process used to monitor the state of software RAID
devices (md only) or the state of block devices. The state is visualized on the LEDs
associated with each slot in an enclosure or a drive bay. There are two types of systems:
2-LED systems (Activity LED, Status LED) and 3-LED systems (Activity LED, Locate
LED, Fail LED). This application has the highest priority when accessing the LEDs.
The ledmon application uses the SGPIO and SES-2 protocols to control LEDs. The program
implements the IBPI patterns of the SFF-8489 specification for SGPIO. Please note that
some enclosures do not adhere closely to the SFF-8489 specification; it might happen
that an enclosure processor accepts an IBPI pattern but blinks the LEDs at variance
with the SFF-8489 specification, or supports only a limited number of patterns.
LED management (AHCI) and SAF-TE protocols are not supported.
There is no method provided to specify which RAID volumes should be monitored and
which should not; the ledmon application monitors all RAID devices and visualizes
their state.
The ledmon application has been verified to work with Intel(R) storage controllers
(i.e. Intel(R) AHCI controller and Intel(R) SAS controller). The application might work
with storage controllers of other vendors (especially SAS/SCSI controllers). However,
storage controllers of other vendors have not been tested.
The ledmon application is part of Intel(R) Enclosure LED Utilities. Only a single
instance of the application is allowed.
› OPTIONS
-c or --config=path
Sets a path to a local configuration file. If this option is specified, the global
configuration file and user configuration file have no effect.
-l or --log=path
Sets a path to a local log file. If this option is specified, the global log file
/var/log/ledmon.log is not used.
-t or --interval=seconds
Sets the time interval between scans of sysfs. The value is given in seconds. The
minimum is 5 seconds; no maximum is specified.
--quiet or --error or --warning or --info or --debug or --all
Verbose level - ‘quiet’ means no logging at all and ‘all’ means to log everything. The
levels are given in order. If the user specifies more than one verbose option, the last
option takes effect.
-h or --help
Prints this help text and exits.
-v or --version
Displays the version of ledmon and information about the license, and exits.
› FILES
/var/log/ledmon.log
Global log file, used by the ledmon application. To force logging to a user-defined
file, use the -l option.
~/.ledctl
User configuration file, shared between ledmon and all ledctl application instances.
/etc/ledcfg.conf
Global configuration file, shared between ledmon and all ledctl application instances.
› LICENSE
Copyright (c) 2009-2013 Intel Corporation.
This program is distributed under the terms of the GNU General Public License as
published by the Free Software Foundation. See the built-in help for details on the
License and the lack of warranty.
› BUGS
The ledmon application does not recognize PFA state (Predicted Failure Analysis),
hence the PFA pattern from SFF-8489 specification is not visualized.
› SEE ALSO
ledctl(8), ledctl.conf(5)
› AUTHOR
This manual page was written by Artur Wojcik <[email protected]>. It may be
used by others.
NSS-MYHOSTNAME
› NAME
nss-myhostname, libnss_myhostname.so.2 - Provide hostname resolution for the
locally configured system hostname.
› SYNOPSIS
libnss_myhostname.so.2
› DESCRIPTION
nss-myhostname is a plugin for the GNU Name Service Switch (NSS) functionality
of the GNU C Library (glibc) primarily providing hostname resolution for the locally
configured system hostname as returned by gethostname(2). The precise hostnames
resolved by this module are:
The local, configured hostname is resolved to all locally configured IP addresses
ordered by their scope, or – if none are configured – the IPv4 address 127.0.0.2
(which is on the local loopback) and the IPv6 address ::1 (which is the local host).
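Assuming nss-myhostname is listed on the hosts line of /etc/nsswitch.conf, the resolution described above can be exercised with getent(1); this is a sketch, since the module may not be configured on every system:

```shell
# Resolve the locally configured hostname through NSS. With nss-myhostname
# enabled this succeeds even if the hostname is absent from /etc/hosts,
# yielding either the scoped local addresses or 127.0.0.2 / ::1.
getent hosts "$(hostname)" || echo "nss-myhostname not configured"
```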
SYSCONFDIR/libvirtd.conf
The default configuration file used by libvirtd, unless overridden on the command
line using the -f|--config option.
LOCALSTATEDIR/run/libvirt/libvirt-sock
LOCALSTATEDIR/run/libvirt/libvirt-sock-ro
The sockets libvirtd will use.
SYSCONFDIR/pki/CA/cacert.pem
The TLS Certificate Authority certificate libvirtd will use.
SYSCONFDIR/pki/libvirt/servercert.pem
The TLS Server certificate libvirtd will use.
SYSCONFDIR/pki/libvirt/private/serverkey.pem
The TLS Server private key libvirtd will use.
LOCALSTATEDIR/run/libvirtd.pid
The PID file to use, unless overridden by the -p|--pid-file option.
$XDG_CONFIG_HOME/libvirtd.conf
The default configuration file used by libvirtd, unless overridden on the command
line using the -f|--config option.
$XDG_RUNTIME_DIR/libvirt/libvirt-sock
The socket libvirtd will use.
$HOME/.pki/libvirt/cacert.pem
The TLS Certificate Authority certificate libvirtd will use.
$HOME/.pki/libvirt/servercert.pem
The TLS Server certificate libvirtd will use.
$HOME/.pki/libvirt/serverkey.pem
The TLS Server private key libvirtd will use.
$XDG_RUNTIME_DIR/libvirt/libvirtd.pid
The PID file to use, unless overridden by the -p|--pid-file option.
If $XDG_CONFIG_HOME is not set in your environment, libvirtd will use
$HOME/.config
If $XDG_RUNTIME_DIR is not set in your environment, libvirtd will use
$HOME/.cache
› EXAMPLES
To retrieve the version of libvirtd:
# libvirtd --version
libvirtd (libvirt) 0.8.2
IEEE DCBX is the default DCBX mode for a DCB capable interface. Therefore the
default and configured IEEE DCBX TLVs will be transmitted when the interface comes
up. lldpad can be globally configured to support one of the legacy DCBX versions (CIN
or CEE). If the remote LLDP agent does not transmit any IEEE DCBX TLVs and does
transmit a legacy DCBX TLV which matches the configured legacy DCBX version, then
the DCBX mode will drop back to legacy DCBX mode. It will not transition back to IEEE
DCBX mode until the next link reset. If lldpad has dropped back to legacy DCBX mode
for a given interface and the daemon is stopped and restarted, the legacy DCBX mode for
that interface will be used instead of starting out in IEEE DCBX mode. This behavior only
applies to the case where lldpad is restarted and is not persistent across a system reboot.
The DCBX mode can be queried and configured by using the special tlvid IEEE-DCBX.
There is no actual TLV which corresponds to this tlvid; it is used only to query and reset
the IEEE DCBX mode. When queried, the IEEE DCBX mode can take the following values:
auto - IEEE DCBX will be used (initially) if lldpad is restarted. An exception is if the
DCBX mode has been forced to a legacy DCBX mode, in which case the specified legacy
mode will be used. See the description of the ‘dcbx’ parameter in dcbtool(8) for more
details about this exception.
CEE - CEE DCBX will be used if lldpad is restarted
CIN - CIN DCBX will be used if lldpad is restarted
The IEEE DCBX mode can be reset to auto by setting the mode argument to the value
reset
› DESCRIPTION - DCBX CONFIGURATION
The detailed configuration of the IEEE DCBX TLVs can be found in related lldptool
man pages for each IEEE DCBX TLV (see lldptool-pfc(8), lldptool-ets(8) and
lldptool-app(8)).
The detailed configuration of the CIN and CEE DCBX TLVs is performed using
dcbtool (see dcbtool(8)). However, lldptool can configure the enableTx parameter of
the CIN and CEE DCBX TLVs (as it can with most other TLVs). Since lldpad only
transmits TLVs for one version of DCBX on any given interface, the enableTx
parameter for the CIN and CEE DCBX TLVs (and for the IEEE DCBX feature
TLVs) takes effect only when the corresponding DCBX version is active.
› ARGUMENTS
mode=reset
Reset the DCBX mode that will be used if lldpad is restarted by setting the mode
argument to reset using the special tlvid IEEE-DCBX.
enableTx
Enable the specified DCBX TLV (CIN-DCBX or CEE-DCBX) to be transmitted in
the LLDPDU if that DCBX mode for the specified interface has been selected.
› EXAMPLE & USAGE
Query the current DCBX mode that will be used if lldpad is restarted. (this is not a
persistent setting)
lldptool -t -i eth3 -V IEEE-DCBX -c mode
Reset the DCBX mode to be ‘auto’ (start in IEEE DCBX mode) after the next lldpad
restart
lldptool -T -i eth3 -V IEEE-DCBX mode=reset
Enable transmission of the CEE DCBX TLV
lldptool -T -i eth3 -V CEE-DCBX enableTx=yes
Disable transmission of the CIN DCBX TLV
lldptool -T -i eth3 -V CIN-DCBX enableTx=no
Query the configuration of enableTx for the CEE DCBX TLV
lldptool -t -i eth3 -V CEE-DCBX -c enableTx
› NOTES
› SEE ALSO
dcbtool(8), lldptool(8), lldptool-ets(8), lldptool-pfc(8), lldptool-app(8), lldpad(8)
› AUTHOR
Eric Multanen
lldptool
› NAME
ETS-{CFG|REC} - Show / manipulate ETS TLV configuration
› SYNOPSIS
lldptool -t -i ethx -V ETS-CFG <-c CONFIG_ARG …>
CONFIG_ARG := { enableTx | willing | tsa | up2tc | numtcs | tcbw }
lldptool -T -i ethx -V ETS-CFG CONFIG_ARG=value …
CONFIG_ARG :=
enableTx = {yes|no} |
willing = {yes|no} |
tsa = tc:{ets|strict|vendor},… |
up2tc = priority:tc,… |
tcbw = bw1,bw2,…
lldptool -t -i ethx -V ETS-REC <-c CONFIG_ARG …>
lldptool -T -i ethx -V ETS-REC CONFIG_ARG=value …
CONFIG_ARG :=
enableTx = {yes|no} |
tsa = tc:{ets|strict|vendor},… |
up2tc = priority:tc,… |
tcbw = bw1,bw2,…
› DESCRIPTION
The Enhanced Transmission Selection (ETS) feature has a recommendation TLV and
a configuration TLV configured with ETS-REC and ETS-CFG respectively.
› ARGUMENTS
enableTx
Enables the ETS TLV to be transmitted
willing
Sets the ETS-CFG willing bit
tsa
Transmission selection algorithm: sets a comma-separated list mapping traffic classes
to the corresponding selection algorithm. Valid algorithms include “ets”, “strict” and
“vendor”.
up2tc
Comma-separated list mapping user priorities to traffic classes.
tcbw
Comma-separated list of bandwidths for each traffic class, the first value being
assigned to traffic class 0, the second to traffic class 1, and so on. Undefined
bandwidths are presumed to be 0.
numtcs
Displays the number of ETS supported traffic classes.
› THEORY OF OPERATIONS
IEEE 802.1Qaz is enabled by default on hardware that supports this mode indicated
by support for the DCBNL interface. Kernel config option CONFIG_DCB. The ETS-
CFG TLV is advertised by default with the attributes indicated by querying the
hardware for its current configuration. A valid configuration is to map all priorities to
a single traffic class and use the link strict selection algorithm. This is equivalent to
being in a non-DCB enabled mode.
To support legacy DCBX (CIN or CEE) the ETS-CFG and ETS-REC TLVs are
disabled if a legacy DCBX TLV is received and no valid IEEE DCBX TLVs are
received. The hardware DCBX mode will also be set to the legacy mode and IEEE
mode is disabled. This allows switches to be configured and end nodes will then be
configured accordingly without any configuration required on the end node. See
lldpad(8) for more information about the operation of the DCBX selection
mechanism.
Mapping applications and protocols to traffic classes is required for ETS to be useful.
User space programs can encode the priority of an application with the
SO_PRIORITY option. The net_prio cgroup can be used to assign application traffic
to specific priorities. See the kernel documentation and cgdcbxd(8) for net_prio
cgroup information.
› EXAMPLE & USAGE
Configure willing bit for interface eth2
lldptool -T -i eth2 -V ETS-CFG willing=yes
Configure traffic classes for ETS and strict priority on eth2
lldptool -T -i eth2 -V ETS-CFG tsa=0:ets,1:ets,2:ets,3:ets,4:strict,5:strict
Configure 1:1 mapping from priority to traffic classes on eth2
lldptool -T -i eth2 -V ETS-CFG up2tc=0:0,1:1,2:2,3:3,4:4,5:5,6:6,7:7
Display local configured ETS-CFG parameters for tcbw
lldptool -t -i eth2 -V ETS-CFG -c tcbw
Display last transmitted ETS-CFG TLV
lldptool -t -i eth2 -V ETS-CFG
Configure ETS-CFG and ETS-REC for default DCB on eth2
lldptool -T -i eth2 -V ETS-CFG tsa=0:ets,1:ets,2:ets,3:ets,4:ets,5:ets,6:ets,7:ets
up2tc=0:0,1:1,2:2,3:3,4:4,5:5,6:6,7:7 tcbw=12,12,12,12,13,13,13,13
lldptool -T -i eth2 -V ETS-REC tsa=0:ets,1:ets,2:ets,3:ets,4:ets,5:ets,6:ets,7:ets
up2tc=0:0,1:1,2:2,3:3,4:4,5:5,6:6,7:7 tcbw=12,12,12,12,13,13,13,13
› SOURCE
o
IEEE 802.1Qaz (http://www.ieee802.org/1/pages/dcbridges.html)
› NOTES
Support for tc-mqprio was added in kernel 2.6.38; on older kernels, other mechanisms
may need to be used to map applications to traffic classes.
› SEE ALSO
lldptool(8), lldptool-app(8), lldpad(8), tc-mqprio(8)
› AUTHOR
John Fastabend
lldptool
› NAME
evb - Show / manipulate EVB TLV configuration
› SYNOPSIS
lldptool -t -g ncb -i ethx -V evbCfg -c enableTx
lldptool -T -g ncb -i ethx -V evbCfg -c enableTx=[yes|no]
lldptool -t -g ncb -i ethx -V evbCfg -c fmode
lldptool -T -g ncb -i ethx -V evbCfg -c fmode=[bridge|reflectiverelay]
lldptool -t -g ncb -i ethx -V evbCfg -c capabilities
lldptool -T -g ncb -i ethx -V evbCfg -c capabilities=[rte|ecp|vdp|none]
lldptool -t -g ncb -i ethx -V evbCfg -c rte
lldptool -T -g ncb -i ethx -V evbCfg -c rte=[<rte>]
lldptool -t -g ncb -i ethx -V evbCfg -c vsis
lldptool -T -g ncb -i ethx -V evbCfg -c vsis=[<number of vsis>]
› DESCRIPTION
The EVB TLV is a TLV to announce the station’s and bridge’s edge virtual bridging
(EVB) capabilities and to request the bridge forwarding mode. If both sides agree
on edge control protocol (ECP) and VSI discovery protocol (VDP) capabilities, both
sides can exchange VDP TLVs using ECP frames. The vsis parameter sets the
maximum number of VSIs and shows the number of currently configured VSIs.
› ARGUMENTS
enableTx
Enables the EVB TLV to be transmitted
fmode
shows or sets the forwarding mode: either bridge (default) or reflectiverelay (RR).
capabilities
shows or sets the local capabilities that are announced to the adjacent switch in the
TLV. This parameter will accept any combination of rte, vdp or ecp, separated by “,”.
Use the keyword “none” if you do not want to set any capabilities.
rte
shows or sets the local run time exponent (RTE). The RTE will be used as the base for
the timing of the ECP and VDP protocols.
vsis
shows or sets the number of virtual station interfaces (VSIs) that are announced to the
adjacent switch in the TLV. This parameter expects a number between 0 and 65535.
› THEORY OF OPERATION
The EVB TLV is used to announce and exchange supported parameters between the
station and an adjacent switch. The TLV uses the nearest customer bridge agent.
If “reflectiverelay” is set as forwarding mode, the switch will allow “reflection” of
frames coming from different sources at the same port back to the port. This will
allow communication between virtual machines on the same host via the switch.
The capabilities parameter is used to set RTE, ECP and VDP support. VDP TLVs in
ECP frames can only be exchanged if both sides agree on ECP and VDP as
capabilities. RTE will be used as the base timing parameter for ECP and VDP.
› EXAMPLE & USAGE
Display locally configured values for eth8
lldptool -t -g ncb -i eth8 -V evbCfg
Display remotely configured values for eth8
lldptool -n -g ncb -t -i eth8 -V evbCfg
Display whether the EVB TLV is configured for tx on eth8
lldptool -t -g ncb -i eth8 -V evbCfg -c enableTx
Display the currently requested forwarding mode for eth8
lldptool -t -g ncb -i eth8 -V evbCfg -c fmode
Set the forwarding mode to reflective relay
lldptool -T -g ncb -i eth8 -V evbCfg -c fmode=reflectiverelay
Display the currently configured capabilities
lldptool -t -g ncb -i ethx -V evbCfg -c capabilities
Set the locally possible capabilities to RTE, ECP and VDP
lldptool -T -g ncb -i ethx -V evbCfg -c capabilities=rte,ecp,vdp
Resets the locally possible capabilities to “none”
lldptool -T -g ncb -i ethx -V evbCfg -c capabilities=none
Display the locally configured value for RTE
lldptool -t -g ncb -i ethx -V evbCfg -c rte
Set the value for RTE
lldptool -T -g ncb -i ethx -V evbCfg -c rte=[<rte>]
Display the configured maximum number of VSIs
lldptool -t -g ncb -i ethx -V evbCfg -c vsis
Set the maximum number of VSIs
lldptool -T -g ncb -i ethx -V evbCfg -c vsis=[<number of vsis>]
› SOURCE
o
IEEE 802.1Qbg (http://www.ieee802.org/1/pages/802.1bg.html)
› NOTES
Currently the code in lldpad reflects draft 0 of the upcoming standard. EVB TLVs on
the wire can be decoded with Wireshark versions newer than 1.6.
› SEE ALSO
lldptool-vdp(8), lldptool(8), lldpad(8)
› AUTHOR
Jens Osterkamp
lldptool
› NAME
evb22 - Show / manipulate EVB IEEE 802.1 Ratified Standard TLV configuration
› SYNOPSIS
lldptool -t -i ethx -g ncb -V evb
lldptool -t -i ethx -g ncb -V evb -c
lldptool -t -i ethx -g ncb -V evb -c enabletx
lldptool -T -i ethx -g ncb -V evb -c enabletx=[yes|no]
lldptool -t -i ethx -g ncb -V evb -c evbmode
lldptool -T -i ethx -g ncb -V evb -c evbmode=[bridge|station]
lldptool -t -i ethx -g ncb -V evb -c evbrrreq
lldptool -T -i ethx -g ncb -V evb -c evbrrreq=[yes|no]
lldptool -t -i ethx -g ncb -V evb -c evbrrcap
lldptool -T -i ethx -g ncb -V evb -c evbrrcap=[yes|no]
lldptool -t -i ethx -g ncb -V evb -c evbgpid
lldptool -T -i ethx -g ncb -V evb -c evbgpid=[yes|no]
lldptool -t -i ethx -g ncb -V evb -c ecpretries
lldptool -T -i ethx -g ncb -V evb -c ecpretries=[0..7]
lldptool -t -i ethx -g ncb -V evb -c ecprte
lldptool -T -i ethx -g ncb -V evb -c ecprte=[0..31]
lldptool -t -i ethx -g ncb -V evb -c vdprwd
lldptool -T -i ethx -g ncb -V evb -c vdprwd=[0..31]
lldptool -t -i ethx -g ncb -V evb -c vdprka
lldptool -T -i ethx -g ncb -V evb -c vdprka=[0..31]
› DESCRIPTION
The Edge Virtual Bridge (EVB) TLV is a TLV to announce the station and bridge’s
edge virtual bridging (EVB) capabilities and may request the bridge port to be set
into reflective relay (hairpin) mode. If both sides agree on the modes and time out
values, the edge control protocol (ECP) will be used to exchange VSI discovery
protocol (VDP) data using ECP frames between the host interface and the adjacent
switch port facing the host interface.
This man page describes the IEEE 802.1Qbg ratified standard dated July 5th,
2012. The arguments and parameters differ from those of the IEEE 802.1Qbg draft 0.2,
which is also implemented. The EVB protocol version to be used depends on the
organizationally unique identifier (OUI) of the EVB TLV in the LLDP data stream. An
OUI value of 0x001b3f stands for the IEEE 802.1Qbg draft 0.2; an OUI value of
0x0080c2 stands for the IEEE 802.1Qbg ratified standard. The version of the ECP
and VDP protocols is determined by the ethernet type field in the ethernet header.
The ethernet type value for IEEE 802.1Qbg draft 0.2 is 0x88b7; the value for the IEEE
802.1Qbg ratified standard is 0x8890. Note that the EVB protocol is exchanged
between nearest customer bridges only, employing the reserved multicast MAC
address 01:80:c2:00:00:00 as the destination MAC address. lldpad(8) supports both
versions; the switch port configuration determines which version will be used. The
switch port configuration should select only one protocol version, never both.
The command line options and arguments are explained in the lldptool(8) man pages.
Only the EVB, ECP and VDP protocol specific parameters are detailed in this manual
page.
› ARGUMENTS
The invocation without the command line option ‘-c’ and argument displays the complete
EVB, ECP and VDP protocol settings in a pretty-printed form. See below for a detailed
description of how to interpret the output.
-c text
Use of the command line option ‘-c’ without any argument displays all known
parameters in the format key=value, which is suitable for post-processing. Use the
command line option ‘-c’ with one of the following arguments to display and set
individual parameters. Text can be one of the following values:
enabletx
Enables or disables transmission of the EVB TLV. When set to disabled, no EVB
TLV will be included in the LLDP data stream, and the output of the complete EVB
settings without option ‘-c’ will be empty.
evbmode
Displays the current role or sets the role to the given value. Supported values are
either “station” or “bridge”.
evbrrreq
Shows or sets the current reflective relay (hairpin) request mode. If the value is
“yes”, the station requests that the switch port facing the interface be set in
reflective relay (hairpin) mode. This field is only valid for stations (the output of
evbmode equals “station”).
evbrrcap
Shows the current reflective relay (hairpin) capabilities or sets the reflective relay
(hairpin) capabilities. If the value is “yes”, the switch port will be set in reflective
relay (hairpin) mode. This field is only valid for switches, i.e. when the output of
evbmode equals “bridge”.
gpid
Shows the current station or switch support for grouping or turns on/off the station or
switch support for grouping. If set to true, the station or switch wants to use group
identifiers in VDP protocols.
ecpretries
Shows or sets the maximum number of retries for ECP frames to be retransmitted. A
retransmit occurs when no ECP acknowledgement message has been received during
a given time period.
ecprte
Shows or sets the local run time exponent (RTE). The RTE will be used as the base
for the timing of the ECP protocol timeouts and retransmits. The wait time is
calculated as 10 * 2^ecprte microseconds.
vdprwd
Shows or sets the resource wait delay value. The wait time is calculated as
10 * 2^vdprwd microseconds and determines the maximum wait time for VDP protocol
acknowledgements.
vdprka
Shows or sets the re-init keep alive value. The wait time is calculated as
10 * 2^vdprka microseconds and determines the wait time for the VDP protocol to send a
keep alive message.
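A quick sketch of the exponent arithmetic described above (the exponent values here are illustrative):

```shell
# Illustration only: ECP/VDP wait times are 10 * 2^exponent microseconds.
echo $((10 * (1 << 8)))    # exponent 8  -> 2560 microseconds (2.56 ms)
echo $((10 * (1 << 31)))   # exponent 31 -> 21474836480 microseconds (~6 hours)
```

Small exponents therefore give millisecond-scale timeouts, while the largest exponents give very long wait times.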
› THEORY OF OPERATION
The EVB TLV is used to announce and exchange supported parameters between the
station and an adjacent switch. If reflective relay is active, the switch sends back
ethernet frames on the very same port it received the frame on. This is an extension to
the current bridging standard and allows communication between virtual machines on
the same host through the switch port.
› EXAMPLE & USAGE
Display locally configured values for eth0
lldptool -t -g ncb -i eth0 -V evb
EVB Configuration TLV
    bridge:(00)
    station:rrreq,rrstat(0x5)
    retries:7 rte:31
    mode:station r/l:0 rwd:31
    r/l:0 rka:8
This output is displayed when enabletx has been enabled. The first line shows the
currently known status of the bridge. The second line shows the currently known status of
the station. The status is displayed verbosely, followed by the hexadecimal value in
parentheses. The verbose output uses the bit naming convention of the standard
document. The third line displays the ECP protocol values: the number of retransmits
(retries) and the retransmit timeout exponent (rte). The fourth line shows the current mode of
operation, either bridge or station, the resource wait delay value (rwd) and an indication if
the local (0) or remote (1) rwd value is used. The fifth line displays the value of the re-init
keep alive counter (rka) and an indication if the local (0) or remote (1) rka value is used.
Display the currently requested forwarding mode for eth0
lldptool -t -g ncb -i eth0 -V evb -c evbrrreq
Display the locally configured value for RTE
lldptool -t -g ncb -i eth0 -V evb -c evbrte
Set the value for RTE to its maximum value
lldptool -T -g ncb -i eth0 -V evb -c rte=7
Set the value for enabletx to yes
lldptool -T -g ncb -i eth0 -V evb -c enabletx=yes
› NOTES
Currently the code in lldpad reflects IEEE 802.1Qbg draft 0.2 of the upcoming
standard. Wireshark support for the IEEE 802.1Qbg ratified standard TLVs is currently
missing. Support for the IEEE 802.1Qbg ratified standard protocols ECP and VDP is
currently under development and not fully functional.
› SEE ALSO
lldptool-vdp(8), lldptool(8), lldpad(8) IEEE 802.1Qbg
(http://www.ieee802.org/1/pages/802.1bg.html)
› AUTHOR
Thomas Richter
LLDPTOOL-MED
› NAME
LLDP-MED - Show / manipulate MED TLV configurations
› SYNOPSIS
lldptool -t -i ethx -V [ TLV_TYPE ] enableTx
lldptool -T -i ethx -V [ TLV_TYPE ] enableTx = { yes | no }
lldptool -T -i ethx -V LLDP-MED devtype = { class1 | class2 | class3 | none }
lldptool -t -i ethx -V LLDP-MED devtype
TLV_TYPE : = {LLDP-MED | medCap | medPolicy | medLoc | medPower |
medHwRev | medFwRev | medSwRev | medSerNum | medManuf | medModel |
medAssetID }
› DESCRIPTION
The LLDP-MED extensions support the Link Layer Discovery Protocol for Media
Endpoint Devices defined in the ANSI/TIA-1057-2006 document. The device can be
configured as a class1, class2 or class3 device. Class I devices are the most basic
class of Endpoint Device, Class II devices support media stream capabilities, and
Class III devices directly support end users of the IP communication system. See
ANSI/TIA-1057 for clarification of the class types.
› ARGUMENTS
enableTx
Enables the TLV to be transmitted
devtype
Set or query the class type of the device.
› TLV_TYPE
List of supported TLV specifiers applicable to Media Endpoint Devices.
LLDP-MED
apply arguments to all supported MED TLVs.
medCap
LLDP-MED Capabilities TLV
medPolicy
LLDP-MED Network Policy TLV
medLoc
LLDP-MED Location TLV
medPower
LLDP-MED Extended Power-via-MDI TLV
medHwRev
LLDP-MED Hardware Revision TLV
medFwRev
LLDP-MED Firmware Revision TLV
medSwRev
LLDP-MED Software Revision TLV
medSerNum
LLDP-MED Serial Number TLV
medManuf
LLDP-MED Manufacturer Name TLV
medModel
LLDP-MED Model Name TLV
medAssetID
LLDP-MED Asset ID TLV
› EXAMPLE & USAGE
Enable class1 MED device on eth2
lldptool -T -i eth2 -V LLDP-MED enableTx=yes devtype=class1
Query class type of MED on eth2
lldptool -t -i eth2 -V LLDP-MED -c devtype
Query transmit state of medPolicy on device eth2
lldptool -t -i eth2 -V medPolicy -c enableTx
› SOURCE
Link Layer Discovery Protocol for Media Endpoint Devices
(http://www.tiaonline.org/standards/technology/voip/documents/ANSI-TIA-
1057_final_for_publication.pdf)
› SEE ALSO
lldptool(8), lldpad(8)
› AUTHOR
John Fastabend
LLDPTOOL-PFC
› NAME
PFC - Show / manipulate PFC TLV configuration
› SYNOPSIS
lldptool -t -i ethx -V PFC [ -c [ enableTx | willing | enabled | delay ] ]
lldptool -T -i ethx -V PFC <CONFIG_ARG=value …>
CONFIG_ARG:=
enableTx=<yes|no>
willing=<yes|no|0|1>
enabled=<none|[0..7],[0..7],…>
delay=<integer value>
› DESCRIPTION
The PFC TLV is used to display and set current PFC TLV attributes.
› ARGUMENTS
enableTx
Enable the PFC TLV to be transmitted in the LLDP PDU for the specified interface.
willing
Display or set the willing attribute. If set to yes and a peer TLV is received then the
peer PFC attributes will be used. If set to no then locally configured attributes are
used.
enabled
Display or set the priorities with PFC enabled. The set attribute takes a comma
separated list of priorities to enable, or the string none to disable all priorities.
delay
Display or set the delay attribute used to configure PFC thresholds in hardware
buffers. If PFC is enabled and frames continue to be dropped due to full hardware
buffers then increasing this value may help.
› THEORY OF OPERATIONS
The PFC TLV uses the Symmetric attribute passing state machine defined in IEEE
802.1Qaz. This means the attributes used will depend on the willing bit. If the willing
bit is set to 1 and a peer TLV is received then the peer's attributes will be used. If the
willing bit is set to 0 the local attributes will be used. When both the peer and local
configuration are willing a tie breaking scheme is used. For more detailed coverage
see the specification.
› EXAMPLE & USAGE
Enable PFC for priorities 1, 2, and 4 on eth2
lldptool -T -i eth2 -V PFC enabled=1,2,4
Disable PFC for all priorities on eth2
lldptool -T -i eth2 -V PFC enabled=none
Display configuration of PFC enabled priorities for eth2
lldptool -t -i eth2 -V PFC -c enabled
Display last transmitted PFC TLV on eth2
lldptool -t -i eth2 -V PFC
› SOURCE
IEEE 802.1Qaz (http://www.ieee802.org/1/pages/dcbridges.html)
› SEE ALSO
lldptool(8), lldpad(8)
› AUTHOR
John Fastabend
LLDPTOOL-VDP
› NAME
vdp - Show / manipulate VDP TLV configuration
› SYNOPSIS
lldptool -t -i ethx -V vdp -c enableTx
lldptool -T -i ethx -V vdp -c enableTx=[yes|no]
lldptool -t -i ethx -V vdp -c mode
lldptool -T -i ethx -V vdp -c mode=<mode>,<mgrid>,<typeid>,<typeidversion>,
<instanceid>,<mac>,<vlan>
lldptool -t -i ethx -V vdp -c role
lldptool -T -i ethx -V vdp -c role=[station|bridge]
› DESCRIPTION
The VSI discovery protocol (VDP) is NOT a TLV in the LLDP sense but rather a
protocol to manage the association and deassociation of virtual station interfaces
(VSIs) between the station and an adjacent switch. VDP uses ECP as transport for
VDP TLVs. An ECP frame may contain multiple VDP TLVs. Each VDP TLV
contains a mode, typeid, version, instanceid, mac and vlan for a VSI.
› ARGUMENTS
enableTx
Enables or disables VDP
mode
shows or sets modes for VSIs with the following parameters:
<mode>
mode (0=preassociate, 1=preassociate with RR, 2=associate, 3=deassociate)
<mgrid>
manager (database) id
<typeid>
VSI type id
<version>
VSI type id version
<instanceid>
VSI instance id
<format>
VDP filter info format
<mac>
VSI mac address
<vlan>
VSI vlan id
role
shows or sets the role of the local machine to act as either station (default) or bridge.
› THEORY OF OPERATION
The VDP protocol is used to pre-associate, associate or deassociate VSIs with an
adjacent switch. Information about the VSIs is formatted into VDP TLVs which are
then handed to ECP for lower-level transport. Each ECP frame may contain multiple
VDP TLVs.
Two ways to receive VSI information exist in lldpad: via netlink or with lldptool.
netlink is used by libvirt to communicate VSIs to lldpad. lldptool can be used to
associate/deassociate VSIs from the command line. This is especially helpful for
testing purposes.
› EXAMPLE & USAGE
Display if vdp is enabled on eth8
lldptool -t -i eth8 -V vdp -c enableTx
Enable vdp on eth8
lldptool -T -i eth8 -V vdp -c enableTx=yes
Display the currently configured VSIs for eth8
lldptool -t -i eth8 -V vdp -c mode
Associate a VSI on eth8
lldptool -T -i eth8 -V vdp -c mode=2,12,1193046,1,fa9b7fff-b0a0-4893-8e0e-
beef4ff18f8f,2,52:54:00:C7:3E:CE,3
Display the locally configured role for VDP on eth8
lldptool -t -i eth8 -V vdp -c role
Set the local role for VDP on eth8
lldptool -T -i eth8 -V vdp -c role=bridge
› SOURCE
IEEE 802.1Qbg (http://www.ieee802.org/1/pages/802.1bg.html)
› NOTES
Currently the code in lldpad reflects draft 0 of the upcoming standard. ECP/VDP
TLVs on the wire can be decoded with wireshark > v1.8.
› SEE ALSO
lldptool-evb(8), lldptool(8), lldpad(8)
› AUTHOR
Jens Osterkamp
LLDPTOOL
› NAME
lldptool - manage the LLDP settings and status of lldpad
› SYNOPSIS
lldptool <command> [options] [argument]
› DESCRIPTION
lldptool is used to query and configure lldpad. It connects to the client interface of
lldpad to perform these operations. lldptool will operate in interactive mode if it is
executed without a command. In interactive mode, lldptool will also function as an
event listener to print out events as they are received asynchronously from lldpad. It
will use libreadline for interactive input when available.
› OPTIONS
-i [ifname]
specifies the network interface to which the command applies. Most lldptool
commands require specifying a network interface.
-V [tlvid]
specifies the TLV identifier. The tlvid is an integer value used to identify specific
LLDP TLVs. The tlvid value is the Type value for types not equal to 127 (the
organizationally specific type). For organizationally specific TLVs, the tlvid is the
value represented by the 3 byte OUI and 1 byte subtype - where the subtype is the
lowest order byte of the tlvid. The tlvid can be entered as a numerical value (e.g. 10
or 0xa), or for supported TLVs, as a keyword. Review the lldptool help output to see
the list of supported TLV keywords.
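As a sketch of how such a numeric tlvid is assembled, the IEEE 802.3 OUI 00-12-0F combined with subtype 1 yields the identifier used elsewhere in this page for the MAC/PHY Configuration Status TLV:

```shell
# Sketch: tlvid = 3-byte OUI shifted left one byte, subtype in the low byte.
# IEEE 802.3 OUI 00-12-0F, subtype 1 (MAC/PHY Configuration Status TLV):
printf '0x%x\n' $(( (0x00120f << 8) | 0x01 ))    # 0x120f01
```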
-n
“neighbor” option for commands which can use it (e.g. get-tlv)
-g [bridge scope]
specify the bridge scope this command operates on. Allows setting and querying all
LLDP TLV modules for “nearest_bridge” (short: “nb”), “nearest_customer_bridge”
(“ncb”) and “nearest_nontpmr_bridge” (“nntpmrb”) group mac addresses.
Configurations are saved into independent sections in lldpad.conf. If no bridge scope
is supplied this defaults to “nearest bridge” to preserve the previous behaviour.
-c <argument list>
“config” option for TLV queries. Indicates that the query is for the configuration
elements for the specified TLV. The argument list specifies the specific elements to
query. If no arguments are listed, then all configuration elements for the TLV are
returned.
-r
show raw client interface messages
-R
show only raw client interface messages
› COMMANDS
license
show license information
-h, help
show usage information
-v, version
show version information
-S, stats
get LLDP statistics for the specified interface
-t, get-tlv
get TLV information for the specified interface
-T, set-tlv
set TLV information for the specified interface
-l, get-lldp
get LLDP parameters for the specified interface
-L, set-lldp
set LLDP parameters for the specified interface
-p, ping
display the process identifier of the running lldpad process
-q, quit
exit from interactive mode
› ARGUMENTS
This section lists arguments which are available for administration of LLDP
parameters. Arguments for basic TLVs (non-organizationally specific TLVs) are also
described. See the SEE ALSO section for references to other lldptool man pages
which contain usage details and arguments for various organizationally specific
TLVs.
adminStatus
Argument for the get-lldp/set-lldp commands. Configures the LLDP adminStatus
parameter for the specified interface. Valid values are: disabled, rx, tx, rxtx
enableTx
Argument for the get-tlv/set-tlv commands. May be applied per interface for a
specified TLV. Valid values are: yes, no. If the DCBX TLV enableTx is set to no, then
all of the DCB feature TLVs DCBX advertise settings will be turned off as well.
Setting enableTx to yes will enable the DCBX advertise settings.
ipv4
Argument for the get-tlv/set-tlv commands with respect to the Management Address
TLV. The get command will retrieve the configured value. Set values take the form of
an IPv4 address: A.B.C.D
ipv6
Argument for the get-tlv/set-tlv commands with respect to the Management Address
TLV. The get command will retrieve the configured value. Set values take the form of
an IPv6 address: 1111:2222:3333:4444:5555:6666:7777:8888 and various shorthand
variations.
› EXAMPLES
Configure LLDP adminStatus to Receive and Transmit for interface eth2
lldptool -L -i eth2 adminStatus=rxtx
lldptool set-lldp -i eth2 adminStatus=rxtx
Query the LLDP adminStatus for interface eth3
lldptool -l -i eth3 adminStatus
lldptool get-lldp -i eth3 adminStatus
Query the LLDP statistics for interface eth3
lldptool -S -i eth3 adminStatus
lldptool stats -i eth3 adminStatus
Query the local TLVs which are being transmitted for a given interface:
lldptool -t -i eth3
lldptool get-tlv -i eth3
Query the received neighbor TLVs received on a given interface:
lldptool -t -n -i eth3
lldptool get-tlv -n -i eth3
Query the value of the System Description TLV as received from the neighbor on a given
interface:
lldptool -t -n -i eth3 -V sysDesc
lldptool get-tlv -n -i eth3 -V 6
Disable transmit of the IEEE 802.3 MAC/PHY Configuration Status TLV for a given
interface:
lldptool -T -i eth3 -V macPhyCfg enableTx=no
lldptool set-tlv -i eth3 -V 0x120f01 enableTx=no
Query value of the transmit setting for the Port Description TLV for a given interface:
lldptool -t -i eth3 -V portDesc -c enableTx
lldptool get-tlv -i eth3 -V 4 -c enableTx
Set a Management Address TLV on eth3 to carry IPv4 address 192.168.10.10
lldptool -T -i eth3 -V mngAddr ipv4=192.168.10.10
Set a Management Address TLV on eth3 to carry IPv6 address ::192.168.10.10
lldptool -T -i eth3 -V mngAddr ipv6=::192.168.10.10
Get the configured IPv4 address for the Management Address TLV on eth3
lldptool -t -i eth3 -V mngAddr -c ipv4
Get all configured attributes for the Management Address TLV on eth3
lldptool -t -i eth3 -V mngAddr -c
Enable transmit of the Edge Virtual Bridging TLV for interface eth4
lldptool -i eth4 -T -g ncb -V evbCfg enableTx=yes
Enable transmit of VDP for interface eth4
lldptool -i eth4 -T -V vdp enableTx=yes
Display process identifier of lldpad
lldptool -p
› SEE ALSO
lldptool-dcbx(8), lldptool-ets(8), lldptool-pfc(8), lldptool-app(8), lldptool-med(8),
lldptool-vdp(8), lldptool-evb(8), lldptool-evb22(8), dcbtool(8), lldpad(8)
› COPYRIGHT
lldptool - LLDP agent configuration utility Copyright(c) 2007-2012 Intel
Corporation. Portions of lldptool are based on:
hostapd-0.5.7
Copyright
(c) 2004-2008, Jouni Malinen <[email protected]>
› LICENSE
This program is free software; you can redistribute it and/or modify it under the terms
and conditions of the GNU General Public License, version 2, as published by the
Free Software Foundation.
This program is distributed in the hope it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details.
You should have received a copy of the GNU General Public License along with this
program; if not, write to the Free Software Foundation, Inc., 51 Franklin St - Fifth
Floor, Boston, MA 02110-1301 USA.
The full GNU General Public License is included in this distribution in the file called
“COPYING”.
› SUPPORT
Contact Information: open-lldp Mailing List <[email protected]>
SMTP
› NAME
smtp - Postfix SMTP+LMTP client
› SYNOPSIS
smtp [generic Postfix daemon options]
› DESCRIPTION
The Postfix SMTP+LMTP client implements the SMTP and LMTP mail delivery
protocols. It processes message delivery requests from the queue manager. Each
request specifies a queue file, a sender address, a domain or host to deliver to, and
recipient information. This program expects to be run from the master(8) process
manager.
The SMTP+LMTP client updates the queue file and marks recipients as finished, or it
informs the queue manager that delivery should be tried again at a later time.
Delivery status reports are sent to the bounce(8), defer(8) or trace(8) daemon as
appropriate.
The SMTP+LMTP client looks up a list of mail exchanger addresses for the
destination host, sorts the list by preference, and connects to each listed address until
it finds a server that responds.
When a server is not reachable, or when mail delivery fails due to a recoverable error
condition, the SMTP+LMTP client will try to deliver the mail to an alternate host.
After a successful mail transaction, a connection may be saved to the scache(8)
connection cache server, so that it may be used by any SMTP+LMTP client for a
subsequent transaction.
By default, connection caching is enabled temporarily for destinations that have a
high volume of mail in the active queue. Connection caching can be enabled
permanently for specific destinations.
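Permanent caching for specific destinations is controlled by the smtp_connection_cache_destinations parameter; a minimal sketch, assuming a standard Postfix installation (the destination name is hypothetical):

```
# Hypothetical destination list; enables permanent connection caching for it.
postconf -e 'smtp_connection_cache_destinations = example.com'
postfix reload
```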
› SMTP DESTINATION SYNTAX
SMTP destinations have the following form:
domainname
domainname:port
Look up the mail exchangers for the specified domain, and connect to the specified
port (default: smtp).
[hostname]
[hostname]:port
Look up the address(es) of the specified host, and connect to the specified port
(default: smtp).
[address]
[address]:port
Connect to the host at the specified address, and connect to the specified port
(default: smtp). An IPv6 address must be formatted as [ipv6:address].
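As a sketch of where these destination forms appear, a main.cf relay setting can use any of them (host names and addresses here are hypothetical):

```
# Domain form: MX lookup for the domain, default smtp port:
relayhost = example.com
# [hostname] form: no MX lookup, connect to this host on port 587:
relayhost = [mail.example.com]:587
# [address] form with an IPv6 literal:
relayhost = [ipv6:2001:db8::25]
```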
› LMTP DESTINATION SYNTAX
LMTP destinations have the following form:
unix:pathname
Connect to the local UNIX-domain server that is bound to the specified pathname. If
the process runs chrooted, an absolute pathname is interpreted relative to the Postfix
queue directory.
inet:hostname
inet:hostname:port
inet:[address]
inet:[address]:port
Connect to the specified TCP port on the specified local or remote host. If no port is
specified, connect to the port defined as lmtp in services(4). If no such service is
found, the lmtp_tcp_port configuration parameter (default value of 24) will be used.
An IPv6 address must be formatted as [ipv6:address].
› SECURITY
The SMTP+LMTP client is moderately security-sensitive. It talks to SMTP or LMTP
servers and to DNS servers on the network. The SMTP+LMTP client can be run
chrooted at fixed low privilege.
› STANDARDS
RFC 821 (SMTP protocol)
RFC 822 (ARPA Internet Text Messages)
RFC 1651 (SMTP service extensions)
RFC 1652 (8bit-MIME transport)
RFC 1870 (Message Size Declaration)
RFC 2033 (LMTP protocol)
RFC 2034 (SMTP Enhanced Error Codes)
RFC 2045 (MIME: Format of Internet Message Bodies)
RFC 2046 (MIME: Media Types)
RFC 2554 (AUTH command)
RFC 2821 (SMTP protocol)
RFC 2920 (SMTP Pipelining)
RFC 3207 (STARTTLS command)
RFC 3461 (SMTP DSN Extension)
RFC 3463 (Enhanced Status Codes)
RFC 4954 (AUTH command)
RFC 5321 (SMTP protocol)
› DIAGNOSTICS
Problems and transactions are logged to syslogd(8). Corrupted message files are
marked so that the queue manager can move them to the corrupt queue for further
inspection.
Depending on the setting of the notify_classes parameter, the postmaster is notified
of bounces, protocol problems, and of other trouble.
› BUGS
SMTP and LMTP connection caching does not work with TLS. The necessary
support for TLS object passivation and re-activation does not exist without closing
the session, which defeats the purpose.
SMTP and LMTP connection caching assumes that SASL credentials are valid for all
destinations that map onto the same IP address and TCP port.
› CONFIGURATION PARAMETERS
Before Postfix version 2.3, the LMTP client was a separate program that implemented
only a subset of the functionality available with SMTP: there was no support for TLS,
and connections were cached in-process, making it ineffective when the client was used
for multiple domains.
Most smtp_xxx configuration parameters have an lmtp_xxx “mirror” parameter for
the equivalent LMTP feature. This document describes only those LMTP-related
parameters that aren’t simply “mirror” parameters.
Changes to main.cf are picked up automatically, as smtp(8) processes run for only a
limited amount of time. Use the command “postfix reload” to speed up a change.
The text below provides only a parameter summary. See postconf(5) for more details
including examples.
› COMPATIBILITY CONTROLS
ignore_mx_lookup_error (no)
Ignore DNS MX lookups that produce no response.
smtp_always_send_ehlo (yes)
Always send EHLO at the start of an SMTP session.
smtp_never_send_ehlo (no)
Never send EHLO at the start of an SMTP session.
smtp_defer_if_no_mx_address_found (no)
Defer mail delivery when no MX record resolves to an IP address.
smtp_line_length_limit (998)
The maximal length of message header and body lines that Postfix will send via
SMTP.
smtp_pix_workaround_delay_time (10s)
How long the Postfix SMTP client pauses before sending “.<CR><LF>” in order to
work around the PIX firewall “<CR><LF>.<CR><LF>” bug.
smtp_pix_workaround_threshold_time (500s)
How long a message must be queued before the Postfix SMTP client turns on the PIX
firewall “<CR><LF>.<CR><LF>” bug workaround for delivery through firewalls
with “smtp fixup” mode turned on.
smtp_pix_workarounds (disable_esmtp, delay_dotcrlf)
A list that specifies zero or more workarounds for CISCO PIX firewall bugs.
smtp_pix_workaround_maps (empty)
Lookup tables, indexed by the remote SMTP server address, with per-destination
workarounds for CISCO PIX firewall bugs.
smtp_quote_rfc821_envelope (yes)
Quote addresses in Postfix SMTP client MAIL FROM and RCPT TO commands as
required by RFC 5321.
smtp_reply_filter (empty)
A mechanism to transform replies from remote SMTP servers one line at a time.
smtp_skip_5xx_greeting (yes)
Skip remote SMTP servers that greet with a 5XX status code (go away, do not try
again later).
smtp_skip_quit_response (yes)
Do not wait for the response to the SMTP QUIT command.
Additional remote client information is made available via the following environment
variables:
CLIENT_ADDRESS
Remote client network address. Available as of Postfix 2.2.
CLIENT_HELO
Remote client EHLO command parameter. Available as of Postfix 2.2.
CLIENT_HOSTNAME
Remote client hostname. Available as of Postfix 2.2.
CLIENT_PROTOCOL
Remote client protocol. Available as of Postfix 2.2.
SASL_METHOD
SASL authentication method specified in the remote client AUTH command.
Available as of Postfix 2.2.
SASL_SENDER
SASL sender address specified in the remote client MAIL FROM command.
Available as of Postfix 2.2.
SASL_USERNAME
SASL username specified in the remote client AUTH command. Available as of
Postfix 2.2.
The PATH environment variable is always reset to a system-dependent default path, and
environment variables whose names are blessed by the export_environment
configuration parameter are exported unchanged.
The current working directory is the mail queue directory.
The local(8) daemon prepends a “From sender time_stamp” envelope header to each
message, prepends an X-Original-To: header with the recipient address as given to
Postfix, prepends an optional Delivered-To: header with the final recipient envelope
address, prepends a Return-Path: header with the sender envelope address, and appends
no empty line.
› EXTERNAL FILE DELIVERY
The delivery format depends on the destination filename syntax. The default is to use
UNIX-style mailbox format. Specify a name ending in / for qmail-compatible
maildir delivery.
The allow_mail_to_files configuration parameter restricts delivery to external files.
The default setting (alias, forward) forbids file destinations in :include: files.
In the case of UNIX-style mailbox delivery, the local(8) daemon prepends a “From
sender time_stamp” envelope header to each message, prepends an X-Original-To:
header with the recipient address as given to Postfix, prepends an optional Delivered-
To: header with the final recipient envelope address, prepends a > character to lines
beginning with “From “, and appends an empty line. The envelope sender address is
available in the Return-Path: header. When the destination is a regular file, it is
locked for exclusive access while delivery is in progress. In case of problems, an
attempt is made to truncate a regular file to its original length.
In the case of maildir delivery, the local daemon prepends an optional Delivered-To:
header with the final envelope recipient address, and prepends an X-Original-To:
header with the recipient address as given to Postfix. The envelope sender address is
available in the Return-Path: header.
› ADDRESS EXTENSION
The optional recipient_delimiter configuration parameter specifies how to separate
address extensions from local recipient names.
For example, with “recipient_delimiter = +“, mail for name+foo is delivered to the
alias name+foo or to the alias name, to the destinations listed in ~name/.forward+foo
or in ~name/.forward, to the mailbox owned by the user name, or it is sent back as
undeliverable.
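The split can be sketched with shell parameter expansion (illustrative only; this is not how Postfix implements it):

```shell
# Illustration: with recipient_delimiter=+, the localpart "name+foo"
# splits into a base localpart and an address extension.
addr="name+foo"
echo "${addr%%+*}"   # name (base localpart, tried as alias/user)
echo "${addr#*+}"    # foo  (extension, tried as .forward+foo)
```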
› DELIVERY RIGHTS
Deliveries to external files and external commands are made with the rights of the
receiving user on whose behalf the delivery is made. In the absence of a user context,
the local(8) daemon uses the owner rights of the :include: file or alias database.
When those files are owned by the superuser, delivery is made with the rights
specified with the default_privs configuration parameter.
› STANDARDS
RFC 822 (ARPA Internet Text Messages) RFC 3463 (Enhanced status codes)
› DIAGNOSTICS
Problems and transactions are logged to syslogd(8). Corrupted message files are
marked so that the queue manager can move them to the corrupt queue afterwards.
Depending on the setting of the notify_classes parameter, the postmaster is notified
of bounces and of other trouble.
› SECURITY
The local(8) delivery agent needs a dual personality 1) to access the private Postfix
queue and IPC mechanisms, 2) to impersonate the recipient and deliver to recipient-
specified files or commands. It is therefore security sensitive.
The local(8) delivery agent disallows regular expression substitution of $1 etc. in
alias_maps, because that would open a security hole.
The local(8) delivery agent will silently ignore requests to use the proxymap(8)
server within alias_maps. Instead it will open the table directly. Before Postfix
version 2.2, the local(8) delivery agent will terminate with a fatal error.
› BUGS
For security reasons, the message delivery status of external commands or of external
files is never checkpointed to file. As a result, the program may occasionally deliver
more than once to a command or external file. Better safe than sorry.
Mutually-recursive aliases or ~/.forward files are not detected early. The resulting
mail forwarding loop is broken by the use of the Delivered-To: message header.
› CONFIGURATION PARAMETERS
Changes to main.cf are picked up automatically, as local(8) processes run for only a
limited amount of time. Use the command “postfix reload” to speed up a change.
The text below provides only a parameter summary. See postconf(5) for more details
including examples.
› COMPATIBILITY CONTROLS
biff (yes)
Whether or not to use the local biff service.
expand_owner_alias (no)
When delivering to an alias “aliasname” that has an “owner-aliasname” companion
alias, set the envelope sender address to the expansion of the “owner-aliasname”
alias.
owner_request_special (yes)
Give special treatment to owner-listname and listname-request address localparts:
don’t split such addresses when the recipient_delimiter is set to “-“.
sun_mailtool_compatibility (no)
Obsolete SUN mailtool compatibility feature.
The first few lines set global options; in the example, logs are compressed after they
are rotated. Note that comments may appear anywhere in the config file as long as
the first non-whitespace character on the line is a #.
The next section of the config file defines how to handle the log file
/var/log/messages. The log will go through five weekly rotations before being
removed. After the log file has been rotated (but before the old version of the log has
been compressed), the command /sbin/killall -HUP syslogd will be executed.
The next section defines the parameters for both /var/log/httpd/access.log and
/var/log/httpd/error.log. Each is rotated whenever it grows over 100k in size, and
the old log files are mailed (uncompressed) to [email protected] after going through 5
rotations, rather than being removed. The sharedscripts directive means that the postrotate
script will only be run once (after the old logs have been compressed), not once for
each log which is rotated. Note that the double quotes around the first filename at the
beginning of this section allows logrotate to rotate logs with spaces in the name.
Normal shell quoting rules apply, with ‘, “, and \ characters supported.
The next section defines the parameters for all of the files in /var/log/news. Each file
is rotated on a monthly basis. This is considered a single rotation directive and if
errors occur for more than one file, the log files are not compressed.
The last section uses tilde expansion to rotate log files in the home directory of the
current user. This is only available if your glob library supports tilde expansion;
GNU glob does.
Please use wildcards with caution. If you specify *, logrotate will rotate all files,
including previously rotated ones. A way around this is to use the olddir directive or
a more exact wildcard (such as *.log).
Here is more information on the directives which may be included in a logrotate
configuration file:
compress
Old versions of log files are compressed with gzip(1) by default. See also
nocompress.
compresscmd
Specifies which command to use to compress log files. The default is gzip. See also
compress.
uncompresscmd
Specifies which command to use to uncompress log files. The default is gunzip.
compressext
Specifies which extension to use on compressed logfiles, if compression is enabled.
The default follows that of the configured compression command.
compressoptions
Command line options may be passed to the compression program, if one is in use.
The default, for gzip(1), is “-6” (biased towards high compression at the expense of
speed). If you use a different compression command, you may need to change the
compressoptions to match.
copy
Make a copy of the log file, but don’t change the original at all. This option can be
used, for instance, to make a snapshot of the current log file, or when some other
utility needs to truncate or parse the file. When this option is used, the create option
will have no effect, as the old log file stays in place.
copytruncate
Truncate the original log file in place after creating a copy, instead of moving the old
log file and optionally creating a new one. It can be used when some program cannot
be told to close its logfile and thus might continue writing (appending) to the
previous log file forever. Note that there is a very small time slice between copying
the file and truncating it, so some logging data might be lost. When this option is
used, the create option will have no effect, as the old log file stays in place.
create mode owner group, create owner group
Immediately after rotation (before the postrotate script is run) the log file is created
(with the same name as the log file just rotated). mode specifies the mode for the log
file in octal (the same as chmod(2)), owner specifies the user name who will own the
log file, and group specifies the group the log file will belong to. Any of the log file
attributes may be omitted, in which case those attributes for the new file will use the
same values as the original log file for the omitted attributes. This option can be
disabled using the nocreate option.
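As a sketch, a stanza using create might look like this (the path, mode, owner, and group are illustrative):

```
/var/log/myapp.log {
    weekly
    rotate 4
    # a new empty log owned by myapp:adm, mode 0640, is created
    # immediately after each rotation, before postrotate runs
    create 0640 myapp adm
}
```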
createolddir mode owner group
If the directory specified by olddir directive does not exist, it is created. mode
specifies the mode for the olddir directory in octal (the same as chmod(2)), owner
specifies the user name who will own the olddir directory, and group specifies the
group the olddir directory will belong to. This option can be disabled using the
nocreateolddir option.
daily
Log files are rotated every day.
dateext
Archive old versions of log files adding a date extension like YYYYMMDD instead
of simply adding a number. The extension may be configured using the dateformat
and dateyesterday options.
dateformat format_string
Specify the extension for dateext using the notation similar to strftime(3) function.
Only %Y %m %d and %s specifiers are allowed. The default value is -%Y%m%d.
Note that also the character separating log name from the extension is part of the
dateformat string. The system clock must be set past Sep 9th 2001 for %s to work
correctly. Note that the datestamps generated by this format must be lexically sortable
(i.e., first the year, then the month then the day. e.g., 2001/12/01 is ok, but
01/12/2001 is not, since 01/11/2002 would sort lower while it is later). This is
because when using the rotate option, logrotate sorts all rotated filenames to find out
which logfiles are older and should be removed.
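For illustration (log path hypothetical), dateext and dateformat combine like this:

```
/var/log/myapp.log {
    daily
    rotate 7
    dateext
    # rotations are named e.g. myapp.log-20160101,
    # which sorts lexically as the text above requires
    dateformat -%Y%m%d
}
```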
dateyesterday
Use yesterday’s instead of today’s date to create the dateext extension, so that the
rotated log file has a date in its name that is the same as the timestamps within it.
delaycompress
Postpone compression of the previous log file to the next rotation cycle. This only
has effect when used in combination with compress. It can be used when some
program cannot be told to close its logfile and thus might continue writing to the
previous log file for some time.
extension ext
Log files with ext extension can keep it after the rotation. If compression is used, the
compression extension (normally .gz) appears after ext. For example you have a
logfile named mylog.foo and want to rotate it to mylog.1.foo.gz instead of
mylog.foo.1.gz.
hourly
Log files are rotated every hour. Note that usually logrotate is configured to be run by
cron daily. You have to change this configuration and run logrotate hourly to be able
to really rotate logs hourly.
ifempty
Rotate the log file even if it is empty, overriding the notifempty option (ifempty is
the default).
include file_or_directory
Reads the file given as an argument as if it was included inline where the include
directive appears. If a directory is given, most of the files in that directory are read in
alphabetic order before processing of the including file continues. The only files
which are ignored are files which are not regular files (such as directories and named
pipes) and files whose names end with one of the taboo extensions, as specified by
the tabooext directive.
mail address
When a log is rotated out-of-existence, it is mailed to address. If no mail should be
generated by a particular log, the nomail directive may be used.
mailfirst
When using the mail command, mail the just-rotated file, instead of the about-to-
expire file.
maillast
When using the mail command, mail the about-to-expire file, instead of the just-
rotated file (this is the default).
maxage count
Remove rotated logs older than <count> days. The age is only checked if the logfile
is to be rotated. The files are mailed to the configured address if maillast and mail
are configured.
maxsize size
Log files are rotated when they grow bigger than size bytes even before the
additionally specified time interval (daily, weekly, monthly, or yearly). The related
size option is similar except that it is mutually exclusive with the time interval
options, and it causes log files to be rotated without regard for the last rotation time.
When maxsize is used, both the size and timestamp of a log file are considered.
minsize size
Log files are rotated when they grow bigger than size bytes, but not before the
additionally specified time interval (daily, weekly, monthly, or yearly). The related
size option is similar except that it is mutually exclusive with the time interval
options, and it causes log files to be rotated without regard for the last rotation time.
When minsize is used, both the size and timestamp of a log file are considered.
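As an illustration of the combined size-and-time condition (path and size hypothetical):

```
/var/log/myapp.log {
    weekly
    # rotate on the weekly boundary, but only if the file
    # has reached at least 1M by then
    minsize 1M
    rotate 4
}
```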
missingok
If the log file is missing, go on to the next one without issuing an error message. See
also nomissingok.
monthly
Log files are rotated the first time logrotate is run in a month (this is normally on the
first day of the month).
nocompress
Old versions of log files are not compressed. See also compress.
nocopy
Do not copy the original log file and leave it in place. (this overrides the copy
option).
nocopytruncate
Do not truncate the original log file in place after creating a copy (this overrides the
copytruncate option).
nocreate
New log files are not created (this overrides the create option).
nocreateolddir
olddir directory is not created by logrotate when it does not exist.
nodelaycompress
Do not postpone compression of the previous log file to the next rotation cycle (this
overrides the delaycompress option).
nodateext
Do not archive old versions of log files with date extension (this overrides the
dateext option).
nomail
Don’t mail old log files to any address.
nomissingok
If a log file does not exist, issue an error. This is the default.
noolddir
Logs are rotated in the same directory the log normally resides in (this overrides the
olddir option).
nosharedscripts
Run prerotate and postrotate scripts for every log file which is rotated (this is the
default, and overrides the sharedscripts option). The absolute path to the log file is
passed as first argument to the script. If the scripts exit with error, the remaining
actions will not be executed for the affected log only.
noshred
Do not use shred when deleting old log files. See also shred.
notifempty
Do not rotate the log if it is empty (this overrides the ifempty option).
olddir directory
Logs are moved into directory for rotation. The directory must be on the same
physical device as the log file being rotated, unless copy, copytruncate or
renamecopy option is used. The directory is assumed to be relative to the directory
holding the log file unless an absolute path name is specified. When this option is
used all old versions of the log end up in directory. This option may be overridden by
the noolddir option.
postrotate/endscript
The lines between postrotate and endscript (both of which must appear on lines by
themselves) are executed (using /bin/sh) after the log file is rotated. These directives
may only appear inside a log file definition. Normally, the absolute path to the log
file is passed as first argument to the script. If sharedscripts is specified, whole
pattern is passed to the script. See also prerotate. See sharedscripts and
nosharedscripts for error handling.
prerotate/endscript
The lines between prerotate and endscript (both of which must appear on lines by
themselves) are executed (using /bin/sh) before the log file is rotated and only if the
log will actually be rotated. These directives may only appear inside a log file
definition. Normally, the absolute path to the log file is passed as first argument to the
script. If sharedscripts is specified, whole pattern is passed to the script. See also
postrotate. See sharedscripts and nosharedscripts for error handling.
firstaction/endscript
The lines between firstaction and endscript (both of which must appear on lines by
themselves) are executed (using /bin/sh) once before all log files that match the
wildcarded pattern are rotated, before prerotate script is run and only if at least one
log will actually be rotated. These directives may only appear inside a log file
definition. Whole pattern is passed to the script as first argument. If the script exits
with error, no further processing is done. See also lastaction.
lastaction/endscript
The lines between lastaction and endscript (both of which must appear on lines by
themselves) are executed (using /bin/sh) once after all log files that match the
wildcarded pattern are rotated, after postrotate script is run and only if at least one log
is rotated. These directives may only appear inside a log file definition. Whole
pattern is passed to the script as first argument. If the script exits with error, just an
error message is shown (as this is the last action). See also firstaction.
preremove/endscript
The lines between preremove and endscript (both of which must appear on lines by
themselves) are executed (using /bin/sh) once just before removal of a log file. The
logrotate will pass the name of file which is soon to be removed. See also firstaction.
rotate count
Log files are rotated count times before being removed or mailed to the address
specified in a mail directive. If count is 0, old versions are removed rather than
rotated.
size size
Log files are rotated only if they grow bigger than size bytes. If size is followed by k,
the size is assumed to be in kilobytes. If M is used, the size is in megabytes, and if
G is used, the size is in gigabytes. So size 100, size 100k, size 100M and size 100G
are all valid.
sharedscripts
Normally, prerotate and postrotate scripts are run for each log which is rotated and
the absolute path to the log file is passed as first argument to the script. That means a
single script may be run multiple times for log file entries which match multiple files
(such as the /var/log/news/* example). If sharedscripts is specified, the scripts are
only run once, no matter how many logs match the wildcarded pattern, and whole
pattern is passed to them. However, if none of the logs in the pattern require rotating,
the scripts will not be run at all. If the scripts exit with error, the remaining actions
will not be executed for any logs. This option overrides the nosharedscripts option
and implies create option.
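A sketch of a shared postrotate script for a wildcard entry (the daemon name is illustrative):

```
/var/log/myapp/*.log {
    weekly
    sharedscripts
    postrotate
        # runs once per rotation pass; the whole pattern, not each
        # individual file, is passed to the script
        /usr/bin/killall -HUP myappd
    endscript
}
```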
shred
Delete log files using shred -u instead of unlink(). This should ensure that logs are
not readable after their scheduled deletion; this is off by default. See also noshred.
shredcycles count
Asks GNU shred(1) to overwrite log files count times before deletion. Without this
option, shred‘s default will be used.
start count
This is the number to use as the base for rotation. For example, if you specify 0, the
logs will be created with a .0 extension as they are rotated from the original log files.
If you specify 9, log files will be created with a .9, skipping 0-8. Files will still be
rotated the number of times specified with the rotate directive.
su user group
Rotate log files set under this user and group instead of using default user/group
(usually root). user specifies the user name used for rotation and group specifies the
group used for rotation.
tabooext [+] list
The current taboo extension list is changed (see the include directive for information
on the taboo extensions). If a + precedes the list of extensions, the current taboo
extension list is augmented, otherwise it is replaced. At startup, the taboo extension
list contains .rpmsave, .rpmorig, ~, .disabled, .dpkg-old, .dpkg-dist, .dpkg-new,
.cfsaved, .ucf-old, .ucf-dist, .ucf-new, .rpmnew, .swp, .cfsaved, .rhn-cfg-tmp-*
weekly
Log files are rotated if the current weekday is less than the weekday of the last
rotation or if more than a week has passed since the last rotation. This is normally the
same as rotating logs on the first day of the week, but it works better if logrotate is
not run every night.
yearly
Log files are rotated if the current year is not the same as the last rotation.
› FILES
/var/lib/logrotate.status
Default state file.
/etc/logrotate.conf
Configuration options.
› SEE ALSO
gzip(1)
<http://fedorahosted.org/logrotate/>
› AUTHORS
Erik Troan, Preston Brown, Jan Kaluza. <logrotate-
[email protected]>
LOGSAVE
› NAME
logsave - save the output of a command in a logfile
› SYNOPSIS
logsave [ -asv ] logfile cmd_prog [ … ]
› DESCRIPTION
The logsave program will execute cmd_prog with the specified argument(s), and save
a copy of its output to logfile. If the containing directory for logfile does not exist,
logsave will accumulate the output in memory until it can be written out. A copy of
the output will also be written to standard output.
If cmd_prog is a single hyphen ('-'), then instead of executing a program, logsave
will take its input from standard input and save it in logfile.
logsave is useful for saving the output of initial boot scripts until the /var partition is
mounted, so the output can be written to /var/log.
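As a sketch of the boot-time use case above (log paths and device illustrative), logsave might be invoked like this:

```
# logsave /var/log/fsck.log fsck -a /dev/sda1
# dmesg | logsave -a /var/log/boot.log -
```

The second command uses the single-hyphen form to append standard input to an existing logfile.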
› OPTIONS
-a
This option will cause the output to be appended to logfile, instead of replacing its
current contents.
-s
This option will cause logsave to skip writing to the log file text which is bracketed
with a control-A (ASCII 001 or Start of Header) and control-B (ASCII 002 or Start of
Text). This allows progress bar information to be visible to the user on the console,
while not being written to the log file.
-v
This option will make logsave more verbose in its output to the user.
› AUTHOR
Theodore Ts’o ([email protected])
› SEE ALSO
fsck(8)
LOSETUP
› NAME
losetup - set up and control loop devices
› SYNOPSIS
Get info:
losetup loopdev
losetup -l [-a]
losetup -j file [-o offset]
Delete loop:
losetup -d loopdev…
Delete all used loop devices:
losetup -D
Print name of first unused loop device:
losetup -f
Setup loop device:
losetup [-o offset] [--sizelimit size] [-p pfd] [-rP] {-f[--show]|loopdev} file
Resize loop device:
losetup -c loopdev
› DESCRIPTION
losetup is used to associate loop devices with regular files or block devices, to detach
loop devices and to query the status of a loop device. If only the loopdev argument is
given, the status of the corresponding loop device is shown.
Note that the old output format (e.g. losetup -a) with comma delimited strings is
deprecated in favour of the --list output format (e.g. losetup -a -l).
› OPTIONS
The size and offset arguments may be followed by the multiplicative suffixes
KiB=1024, MiB=1024*1024, and so on for GiB, TiB, PiB, EiB, ZiB and YiB (the
"iB" is optional, e.g. "K" has the same meaning as "KiB") or the suffixes KB=1000,
MB=1000*1000, and so on for GB, TB, PB, EB, ZB and YB.
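The distinction between the binary and decimal suffixes can be checked with shell arithmetic (a sketch; losetup performs this conversion internally):

```shell
# "MiB" is the binary multiplier: 100MiB = 100 * 1024 * 1024 bytes
mib=$((100 * 1024 * 1024))
# "MB" is the decimal multiplier: 100MB = 100 * 1000 * 1000 bytes
mb=$((100 * 1000 * 1000))
echo "100MiB = $mib bytes, 100MB = $mb bytes"
# prints: 100MiB = 104857600 bytes, 100MB = 100000000 bytes
```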
-a, --all
show status of all loop devices. Note that not all information is accessible for non-
root users. See also --list. The old output format (as printed without --list) is
deprecated.
-c, --set-capacity loopdev
force loop driver to reread size of the file associated with the specified loop device
-d, --detach loopdev…
detach the file or device associated with the specified loop device(s)
-D, --detach-all
detach all associated loop devices
-f, --find
find the first unused loop device. If a file argument is present, use this device.
Otherwise, print its name
-h, --help
print help
-j, --associated file
show status of all loop devices associated with given file
-l, --list
if a loop device or the -a option is specified, print default columns for either the
specified loop device or all loop devices; the default is to print info about all devices
-o, --offset offset
the data start is moved offset bytes into the specified file or device
-O, --output columns
specify which columns are to be printed for the --list output
--sizelimit size
the data end is set to no more than size bytes after the data start
-P, --partscan
force kernel to scan partition table on newly created loop device
-r, --read-only
set up a read-only loop device
--show
print device name if the -f option and a file argument are present
-v, --verbose
verbose mode
› ENCRYPTION
Cryptoloop is no longer supported in favor of dm-crypt. For more details see
cryptsetup(8).
› RETURN VALUE
losetup returns 0 on success, nonzero on failure. When losetup displays the status of
a loop device, it returns 1 if the device is not configured and 2 if an error occurred
which prevented from determining the status of the device.
› FILES
/dev/loop[0..N]
loop block devices
/dev/loop-control
loop control device
› EXAMPLE
The following commands can be used as an example of using the loop device.
# dd if=/dev/zero of=~/file.img bs=1MiB count=10
# losetup --find --show ~/file.img
/dev/loop0
# mkfs -t ext2 /dev/loop0
# mount /dev/loop0 /mnt
...
# umount /dev/loop0
# losetup --detach /dev/loop0
› AUTHORS
Karel Zak <[email protected]>, based on original version from Theodore Ts’o
<[email protected]>
› AVAILABILITY
The losetup command is part of the util-linux package and is available from
ftp://ftp.kernel.org/pub/linux/utils/util-linux/.
lpadmin
› NAME
lpadmin - configure cups printers and classes
› SYNOPSIS
lpadmin [ -E ] [ -U username ] [ -h server[:port] ] -d destination
lpadmin [ -E ] [ -U username ] [ -h server[:port] ] -p destination [ -R name-default ] option(s)
lpadmin [ -E ] [ -U username ] [ -h server[:port] ] -x destination
› DESCRIPTION
lpadmin configures printer and class queues provided by CUPS. It can also be used to
set the server default printer or class.
When specified before the -d, -p, or -x options, the -E option forces encryption when
connecting to the server.
The first form of the command (-d) sets the default printer or class to destination.
Subsequent print jobs submitted via the lp(1) or lpr(1) commands will use this
destination unless the user specifies otherwise with the lpoptions(1) command.
The second form of the command (-p) configures the named printer or class. The
additional options are described below.
The third form of the command (-x) deletes the printer or class destination. Any jobs
that are pending for the destination will be removed and any job that is currently
printing will be aborted.
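The three forms might be exercised in sequence like this (queue name and device URI are illustrative):

```
# lpadmin -p laser1 -E -v socket://192.0.2.10:9100
# lpadmin -d laser1
# lpadmin -x laser1
```

The first command creates (or modifies) and enables the queue, the second makes it the server default, and the third deletes it.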
› CONFIGURATION OPTIONS
The following options are recognized when configuring a printer queue:
-c class
Adds the named printer to class. If class does not exist it is created automatically.
-i interface
Sets a System V style interface script for the printer. This option cannot be specified
with the -P option (PPD file) and is intended for providing support for legacy printer
drivers.
-m model
Sets a standard System V interface script or PPD file for the printer from the model
directory or using one of the driver interfaces. Use the -m option with the lpinfo(8)
command to get a list of supported models.
-o cupsIPPSupplies=true
-o cupsIPPSupplies=false
Specifies whether IPP supply level values should be reported.
-o cupsSNMPSupplies=true
-o cupsSNMPSupplies=false
Specifies whether SNMP supply level (RFC 3805) values should be reported.
-o job-k-limit=value
Sets the kilobyte limit for per-user quotas. The value is an integer number of
kilobytes; one kilobyte is 1024 bytes.
-o job-page-limit=value
Sets the page limit for per-user quotas. The value is the integer number of pages that
can be printed; double-sided pages are counted as two pages.
-o job-quota-period=value
Sets the accounting period for per-user quotas. The value is an integer number of
seconds; 86,400 seconds are in one day.
-o job-sheets-default=banner
-o job-sheets-default=banner,banner
Sets the default banner page(s) to use for print jobs.
-o name=value
Sets a PPD option for the printer. PPD options can be listed using the -l option with
the lpoptions(1) command.
-o name-default=value
Sets a default server-side option for the destination. Any print-time option can be
defaulted, e.g. “-o cpi-default=17” to set the default “cpi” option value to 17.
-o port-monitor=name
Sets the binary communications program to use when printing, “none”, “bcp”, or
“tbcp”. The default program is “none”. The specified port monitor must be listed in
the printer’s PPD file.
-o printer-error-policy=name
Sets the error policy to be used when the printer backend is unable to send the job to
the printer. The name must be one of “abort-job”, “retry-job”, “retry-current-job”, or
“stop-printer”. The default error policy is “stop-printer” for printers and “retry-
current-job” for classes.
-o printer-is-shared=true/false
Sets the destination to shared/published or unshared/unpublished. Shared/published
destinations are publicly announced by the server on the LAN based on the browsing
configuration in cupsd.conf, while unshared/unpublished destinations are not
announced. The default value is “true”.
-o printer-op-policy=name
Sets the IPP operation policy associated with the destination. The name must be
defined in the cupsd.conf in a Policy section. The default operation policy is
“default”.
-R name-default
Deletes the named option from printer.
-r class
Removes the named printer from class. If the resulting class becomes empty it is
removed.
-u allow:user,user,@group
-u deny:user,user,@group
-u allow:all
-u deny:none
Sets user-level access control on a destination. Names starting with “@” are
interpreted as UNIX groups. The latter two forms turn user-level access control off.
-v “device-uri”
Sets the device-uri attribute of the printer queue. Use the -v option with the lpinfo(8)
command to get a list of supported device URIs and schemes.
-D “info”
Provides a textual description of the destination.
-E
Enables the destination and accepts jobs; this is the same as running the
cupsaccept(8) and cupsenable(8) programs on the destination.
-L “location”
Provides a textual location of the destination.
-P ppd-file
Specifies a PostScript Printer Description file to use with the printer. If specified, this
option overrides the -i option (interface script).
› COMPATIBILITY
Unlike the System V printing system, CUPS allows printer names to contain any
printable character except SPACE, TAB, “/”, or “#”. Also, printer and class names
are not case-sensitive. Finally, the CUPS version of lpadmin may ask the user for an
access password depending on the printing system configuration. This differs from
the System V version which requires the root user to execute this command.
› LIMITATIONS
The CUPS version of lpadmin does not support all of the System V or Solaris
printing system configuration options.
› SEE ALSO
cupsaccept(8), cupsenable(8), lpinfo(8), lpoptions(1), http://localhost:631/help
› COPYRIGHT
Copyright 2007-2013 by Apple Inc.
lpc
› NAME
lpc - line printer control program
› SYNOPSIS
lpc [ command [ parameter(s) ] ]
› DESCRIPTION
lpc provides limited control over printer and class queues provided by CUPS. It can
also be used to query the state of queues.
If no command is specified on the command-line, lpc will display a prompt and
accept commands from the standard input.
› COMMANDS
The lpc program accepts a subset of commands accepted by the Berkeley lpc
program of the same name:
exit
Exits the command interpreter.
help [command]
? [command]
Displays a short help message.
quit
Exits the command interpreter.
status [queue]
Displays the status of one or more printer or class queues.
› LIMITATIONS
Since lpc is geared towards the Berkeley printing system, it is impossible to use lpc
to configure printer or class queues provided by CUPS. To configure printer or class
queues you must use the lpadmin(8) command or another CUPS-compatible client
with that functionality.
› COMPATIBILITY
The CUPS version of lpc does not implement all of the standard Berkeley or LPRng
commands.
› SEE ALSO
cancel(1), cupsaccept(8), cupsenable(8), lp(1), lpr(1), lprm(1), lpstat(1),
http://localhost:631/help
› COPYRIGHT
Copyright 2007-2013 by Apple Inc.
lpinfo
› NAME
lpinfo - show available devices or drivers
› SYNOPSIS
lpinfo [ -E ] [ -U username ] [ -h server[:port] ] [ -l ] [ --device-id device-id-string ]
[ --exclude-schemes scheme-list ] [ --include-schemes scheme-list ] [ --language
locale ] [ --make-and-model name ] [ --product name ] -m
lpinfo [ -E ] [ -U username ] [ -h server[:port] ] [ -l ] [ --exclude-schemes
scheme-list ] [ --include-schemes scheme-list ] [ --timeout seconds ] -v
› DESCRIPTION
lpinfo lists the available devices or drivers known to the CUPS server. The first form
(-m) lists the available drivers, while the second form (-v) lists the available devices.
› OPTIONS
lpinfo accepts the following options:
-E
Forces encryption when connecting to the server.
-U username
Sets the username to use when connecting to the server.
-h server[:port]
Selects an alternate server.
-l
Shows a “long” listing of devices or drivers.
--device-id device-id-string
Specifies the IEEE-1284 device ID to match when listing drivers with the -m option.
--exclude-schemes scheme-list
Specifies a comma-separated list of device or PPD schemes that should be excluded
from the results. Static PPD files use the "file" scheme.
--include-schemes scheme-list
Specifies a comma-separated list of device or PPD schemes that should be included
in the results. Static PPD files use the "file" scheme.
--language locale
Specifies the language to match when listing drivers with the -m option.
--make-and-model name
Specifies the make and model to match when listing drivers with the -m option.
--product name
Specifies the product to match when listing drivers with the -m option.
--timeout seconds
Specifies the timeout when listing devices with the -v option.
› COMPATIBILITY
The lpinfo command is unique to CUPS.
› SEE ALSO
lpadmin(8), http://localhost:631/help
› COPYRIGHT
Copyright 2007-2013 by Apple Inc.
lpmove
› NAME
lpmove - move a job or all jobs to a new destination
› SYNOPSIS
lpmove [ -E ] [ -h server[:port] ] [ -U username ] job destination
lpmove [ -E ] [ -h server[:port] ] [ -U username ] source destination
› DESCRIPTION
lpmove moves the specified job or all jobs from source to destination. job can be the
job ID number or the old destination and job ID:
lpmove 123 newprinter
lpmove oldprinter-123 newprinter
› OPTIONS
The lpmove command supports the following options:
-E
Forces encryption when connecting to the server.
-U username
Specifies an alternate username.
-h server[:port]
Specifies an alternate server.
› SEE ALSO
cancel(1), lp(1), http://localhost:631/help
› COPYRIGHT
Copyright 2007-2013 by Apple Inc.
LSBLK
› NAME
lsblk - list block devices
› SYNOPSIS
lsblk [options] [device…]
› DESCRIPTION
lsblk lists information about all available or the specified block devices. The lsblk
command reads the sysfs filesystem to gather information.
The command prints all block devices (except RAM disks) in a tree-like format by
default. Use lsblk --help to get a list of all available columns.
The default output, as well as the default output from options like --fs and --
topology, is subject to change. So whenever possible, you should avoid using default
outputs in your scripts. Always explicitly define expected columns by using --
output columns-list in environments where a stable output is required.
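Following that advice, a script that needs stable output would pin its columns explicitly, for example (device name illustrative):

```
# lsblk --output NAME,SIZE,TYPE,MOUNTPOINT /dev/sda
```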
› OPTIONS
-a, --all
Also list empty devices. (By default they are skipped.)
-b, --bytes
Print the SIZE column in bytes rather than in a human-readable format.
-D, --discard
Print information about the discarding capabilities (TRIM, UNMAP) for each device.
-d, --nodeps
Do not print holder devices or slaves. For example, lsblk --nodeps /dev/sda prints
information about the sda device only.
-e, --exclude list
Exclude the devices specified by the comma-separated list of major device numbers.
Note that RAM disks (major=1) are excluded by default. The filter is applied to the
top-level devices only.
-f, --fs
Output info about filesystems. This option is equivalent to
-o NAME,FSTYPE,LABEL,MOUNTPOINT. The authoritative information about
filesystems and raids is provided by the blkid(8) command.
-h, --help
Print a help text and exit.
-I, --include list
Include devices specified by the comma-separated list of major device numbers. The
filter is applied to the top-level devices only.
-i, --ascii
Use ASCII characters for tree formatting.
-l, --list
Produce output in the form of a list.
-m, --perms
Output info about device owner, group and mode. This option is equivalent to
-o NAME,SIZE,OWNER,GROUP,MODE.
-n, --noheadings
Do not print a header line.
-o, --output list
Specify which output columns to print. Use --help to get a list of all supported
columns.
The default list of columns may be extended if list is specified in the format +list
(e.g. lsblk -o +UUID).
-P, --pairs
Produce output in the form of key="value" pairs. All potentially unsafe characters are
hex-escaped (\x<code>).
-p, --paths
Print full device paths.
-r, --raw
Produce output in raw format. All potentially unsafe characters are hex-escaped
(\x<code>) in the NAME, KNAME, LABEL, PARTLABEL and MOUNTPOINT
columns.
-S, --scsi
Output info about SCSI devices only. All partitions, slaves and holder devices are
ignored.
-s, --inverse
Print dependencies in inverse order.
-t, --topology
Output info about block-device topology. This option is equivalent to
-o NAME,ALIGNMENT,MIN-IO,OPT-IO,PHY-SEC,LOG-SEC,ROTA,SCHED,RQ-SIZE,WSAME.
-V, --version
Output version information and exit.
› NOTES
For partitions, some information (e.g. queue attributes) is inherited from the parent
device.
The lsblk command needs to be able to look up each block device by major:minor
numbers, which is done by using /sys/dev/block. This sysfs block directory appeared
in kernel 2.6.27 (October 2008). In case of problems with a new enough kernel,
check that CONFIG_SYSFS was enabled at the time of the kernel build.
› AUTHORS
Milan Broz <[email protected]> Karel Zak <[email protected]>
› ENVIRONMENT
Setting LIBMOUNT_DEBUG=0xffff enables debug output.
› SEE ALSO
findmnt(8), blkid(8), ls(1)
› AVAILABILITY
The lsblk command is part of the util-linux package and is available from
ftp://ftp.kernel.org/pub/linux/utils/util-linux/.
LSLOCKS
› NAME
lslocks - list local system locks
› SYNOPSIS
lslocks [options]
› DESCRIPTION
lslocks lists information about all the currently held file locks in a Linux system.
› OPTIONS
-h, --help
Print a help text and exit.
-n, --noheadings
Do not print a header line.
-o, --output list
Specify which output columns to print. Use --help to get a list of all supported
columns.
The default list of columns may be extended if list is specified in the format +list
(e.g. lslocks -o +BLOCKER).
-p, --pid pid
Display only the locks held by the process with this pid.
-r, --raw
Use the raw output format.
-u, --notruncate
Do not truncate text in columns.
› OUTPUT
COMMAND
The command name of the process holding the lock.
PID
The process ID of the process which holds the lock.
TYPE
The type of lock; can be FLOCK (created with flock(2)) or POSIX (created with
fcntl(2) and lockf(3)).
SIZE
Size of the locked file.
MODE
The lock’s access permissions (read, write). If the process is blocked and waiting for
the lock, then the mode is postfixed with an ‘*’ (asterisk).
M
Whether the lock is mandatory; 0 means no (meaning the lock is only advisory), 1
means yes. (See fcntl(2)).
START
Relative byte offset of the lock.
END
Ending offset of the lock.
PATH
Full path of the lock. If none is found, or there are no permissions to read the path,
it will fall back to the device's mountpoint. The path might be truncated; use
--notruncate to get the full path.
BLOCKER
The PID of the process which blocks the lock.
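The TYPE and MODE columns can be exercised with a short sketch (Python, Unix-only; the function name is hypothetical): the parent takes an exclusive POSIX lock via lockf, the kind lslocks reports as TYPE POSIX with MODE WRITE, and a forked child's non-blocking attempt on the same file fails because the lock is held:

```python
import fcntl
import os
import tempfile

def posix_lock_conflicts():
    """Take a POSIX lock (reported by lslocks as TYPE POSIX) and verify that
    another process cannot acquire it. Illustrative sketch only."""
    tmp_fd, path = tempfile.mkstemp()
    os.close(tmp_fd)
    fd = os.open(path, os.O_RDWR)
    fcntl.lockf(fd, fcntl.LOCK_EX)        # exclusive whole-file POSIX lock
    pid = os.fork()
    if pid == 0:                          # child: try the same lock, non-blocking
        cfd = os.open(path, os.O_RDWR)
        try:
            fcntl.lockf(cfd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            os._exit(1)                   # unexpected: lock was not held
        except OSError:
            os._exit(0)                   # expected: parent holds the lock
    status = os.waitpid(pid, 0)[1]
    fcntl.lockf(fd, fcntl.LOCK_UN)
    os.close(fd)
    os.remove(path)
    return os.WEXITSTATUS(status) == 0    # True when the lock conflicted
```

While the parent holds the lock, running `lslocks -p <pid>` in another terminal would show the corresponding row; if the child used a blocking lockf instead, its PID would appear in the BLOCKER column.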
› NOTES
The lslocks command is meant to replace the lslk(8) command, originally
written by Victor A. Abell <[email protected]> and unmaintained since
2001.
› AUTHORS
Davidlohr Bueso <[email protected]>
› SEE ALSO
flock(1), fcntl(2), lockf(3)
› AVAILABILITY
The lslocks command is part of the util-linux package and is available from
ftp://ftp.kernel.org/pub/linux/utils/util-linux/.
LSMOD
› NAME
lsmod - Show the status of modules in the Linux Kernel
› SYNOPSIS
lsmod
› DESCRIPTION
lsmod is a trivial program which nicely formats the contents of /proc/modules,
showing what kernel modules are currently loaded.
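The formatting lsmod performs can be approximated in a few lines. This sketch (the parse_modules helper is hypothetical) splits each whitespace-separated /proc/modules record into the Module, Size and Used-by columns lsmod prints:

```python
def parse_modules(text):
    """Parse /proc/modules content into (name, size, refcount, users) tuples,
    roughly the columns lsmod prints. Illustrative only."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        name, size, refcount, users = fields[0], int(fields[1]), fields[2], fields[3]
        # "-" means no dependent modules; otherwise a comma-terminated list.
        rows.append((name, size, refcount, "" if users == "-" else users.rstrip(",")))
    return rows

# Feeding it the live file reproduces lsmod's data:
# with open("/proc/modules") as f:
#     for name, size, refcount, users in parse_modules(f.read()):
#         print("%-19s %8d  %s %s" % (name, size, refcount, users))
```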
› COPYRIGHT
This manual page originally Copyright 2002, Rusty Russell, IBM Corporation.
Maintained by Jon Masters and others.
› SEE ALSO
insmod(8), modprobe(8), modinfo(8)
› AUTHORS
Jon Masters <[email protected]>
Developer
LSOF
› NAME
lsof - list open files
› DESCRIPTION
(See the DISTRIBUTION section of this manual page for information on how to
obtain the latest lsof revision.)
An open file may be a regular file, a directory, a block special file, a character special
file, an executing text reference, a library, a stream or a network file (Internet socket,
NFS file or UNIX domain socket.) A specific file or all the files in a file system may
be selected by path.
Instead of a formatted display, lsof will produce output that can be parsed by other
programs. See the -F, option description, and the OUTPUT FOR OTHER
PROGRAMS section for more information.
In addition to producing a single output list, lsof will run in repeat mode. In repeat
mode it will produce output, delay, then repeat the output operation until stopped
with an interrupt or quit signal. See the +|-r [t[m<fmt>]] option description for more
information.
› OPTIONS
In the absence of any options, lsof lists all open files belonging to all active
processes.
If any list request option is specified, other list requests must be specifically
requested - e.g., if -U is specified for the listing of UNIX socket files, NFS files
won’t be listed unless -N is also specified; or if a user list is specified with the -u
option, UNIX domain socket files, belonging to users not in the list, won’t be listed
unless the -U option is also specified.
Normally list options that are specifically stated are ORed - i.e., specifying the -i
option without an address and the -ufoo option produces a listing of all network files
OR files belonging to processes owned by user “foo”. The exceptions are:
1)
the `^’ (negated) login name or user ID (UID), specified with the -u option;
2)
the `^’ (negated) process ID (PID), specified with the -p option;
3)
the `^’ (negated) process group ID (PGID), specified with the -g option;
4)
the `^’ (negated) command, specified with the -c option;
5)
the (`^’) negated TCP or UDP protocol state names, specified with the -s [p:s] option.
Since they represent exclusions, they are applied without ORing or ANDing and take
effect before any other selection criteria are applied.
The -a option may be used to AND the selections. For example, specifying -a, -U, and -
ufoo produces a listing of only UNIX socket files that belong to processes owned by user
“foo”.
Caution: the -a option causes all list selection options to be ANDed; it can’t be used to
cause ANDing of selected pairs of selection options by placing it between them, even
though its placement there is acceptable. Wherever -a is placed, it causes the ANDing of
all selection options.
Items of the same selection set - command names, file descriptors, network addresses,
process identifiers, user identifiers, zone names, security contexts - are joined in a single
ORed set and applied before the result participates in ANDing. Thus, for example,
specifying [email protected], [email protected], -a, and -ufff,ggg will select the listing of files that
belong to either login “fff” OR “ggg” AND have network connections to either host
aaa.bbb OR ccc.ddd.
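The OR-within-a-set, AND-across-sets behavior just described can be made concrete with a small model (illustrative Python only, not lsof code): each selection set is ORed internally, then -a decides whether the per-set results are ORed or ANDed together:

```python
def matches(file_rec, selection_sets, and_mode=False):
    """Model lsof selection logic: each set is ORed internally; the per-set
    results are ORed together by default, or ANDed when -a is given."""
    results = []
    for key, wanted in selection_sets.items():
        # membership test models ORing all members of one selection set
        results.append(file_rec.get(key) in wanted)
    return all(results) if and_mode else any(results)
```

With sets {"user": {"fff", "ggg"}, "host": {"aaa.bbb", "ccc.ddd"}}, and_mode=True selects only records matching both a listed user and a listed host, mirroring the -ufff,ggg plus address example above.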
Options may be grouped together following a single prefix - e.g., the option set
"-a -b -C" may be stated as -abC. However, since values are optional following
+|-f, -F, -g, -i, +|-L, -o, +|-r, -s, -S, -T, -x and -z, when you have no values
for them be careful that the following character isn't ambiguous. For example,
-Fn might represent the -F and -n options, or it might represent the n field
identifier character following the -F option. When ambiguity is possible, start
a new option with a `-' character - e.g., "-F -n". If the next option is a file
name, follow the possibly ambiguous option with "--" - e.g., "-F -- name".
Either the `+’ or the `-‘ prefix may be applied to a group of options. Options that don’t
take on separate meanings for each prefix - e.g., -i - may be grouped under either prefix.
Thus, for example, “+M -i” may be stated as “+Mi” and the group means the same as the
separate options. Be careful of prefix grouping when one or more options in the group
does take on separate meanings under different prefixes - e.g., +|-M; “-iM” is not the same
request as “-i +M”. When in doubt, use separate options with appropriate prefixes.
-? -h
These two equivalent options select a usage (help) output list. Lsof displays a
shortened form of this output when it detects an error in the options supplied to it,
after it has displayed messages explaining each error. (Escape the `?’ character as
your shell requires.)
-a
causes list selection options to be ANDed, as described above.
-A A
is available on systems configured for AFS whose AFS kernel code is implemented
via dynamic modules. It allows the lsof user to specify A as an alternate name list file
where the kernel addresses of the dynamic modules might be found. See the lsof FAQ
(The FAQ section gives its location.) for more information about dynamic modules,
their symbols, and how they affect lsof.
-b
causes lsof to avoid kernel functions that might block - lstat(2), readlink(2), and
stat(2).
See the BLOCKS AND TIMEOUTS and AVOIDING KERNEL BLOCKS
sections for information on using this option.
-c c
selects the listing of files for processes executing the command that begins with the
characters of c. Multiple commands may be specified, using multiple -c options.
They are joined in a single ORed set before participating in AND option selection.
If c begins with a `^’, then the following characters specify a command name whose
processes are to be ignored (excluded.)
If c begins and ends with a slash (‘/’), the characters between the slashes are
interpreted as a regular expression. Shell meta-characters in the regular expression
must be quoted to prevent their interpretation by the shell. The closing slash may be
followed by these modifiers:
b - the regular expression is a basic one.
i - ignore the case of letters.
x - the regular expression is an extended one (default).
See the lsof FAQ (The FAQ section gives its location.) for more information on basic
and extended regular expressions.
The simple command specification is tested first. If that test fails, the command
regular expression is applied. If the simple command test succeeds, the command
regular expression test isn’t made. This may result in “no command found for regex:”
messages when lsof’s -V option is specified.
+c w
defines the maximum number of initial characters of the name, supplied by the UNIX
dialect, of the UNIX command associated with a process to be printed in the
COMMAND column. (The lsof default is nine.)
Note that many UNIX dialects do not supply all command name characters to lsof in
the files and structures from which lsof obtains command name. Often dialects limit
the number of characters supplied in those sources. For example, Linux 2.4.27 and
Solaris 9 both limit command name length to 16 characters.
If w is zero (‘0’), all command characters supplied to lsof by the UNIX dialect will be
printed.
If w is less than the length of the column title, “COMMAND”, it will be raised to that
length.
-C
disables the reporting of any path name components from the kernel’s name cache.
See the KERNEL NAME CACHE section for more information.
+d s
causes lsof to search for all open instances of directory s and the files and directories
it contains at its top level. +d does NOT descend the directory tree, rooted at s. The
+D D option may be used to request a full-descent directory tree search, rooted at
directory D.
Processing of the +d option does not follow symbolic links within s unless the
-x or -x l option is also specified. Nor does it search for open files on file
system mount points on subdirectories of s unless the -x or -x f option is also
specified.
Note: the authority of the user of this option limits it to searching for files that the
user has permission to examine with the system stat(2) function.
-d s
specifies a list of file descriptors (FDs) to exclude from or include in the output
listing. The file descriptors are specified in the comma-separated set s - e.g.,
“cwd,1,3”, “^6,^2”. (There should be no spaces in the set.)
The list is an exclusion list if all entries of the set begin with `^’. It is an inclusion list
if no entry begins with `^’. Mixed lists are not permitted.
A file descriptor number range may be in the set as long as neither member is empty,
both members are numbers, and the ending member is larger than the starting one -
e.g., “0-7” or “3-10”. Ranges may be specified for exclusion if they have the `^’
prefix - e.g., “^0-7” excludes all file descriptors 0 through 7.
Multiple file descriptor numbers are joined in a single ORed set before participating
in AND option selection.
When there are exclusion and inclusion members in the set, lsof reports them as
errors and exits with a non-zero return code.
See the description of File Descriptor (FD) output values in the OUTPUT section for
more information on file descriptor names.
+D D
causes lsof to search for all open instances of directory D and all the files and
directories it contains to its complete depth.
Processing of the +D option does not follow symbolic links within D unless the -x or
-x l option is also specified. Nor does it search for open files on file system mount
points on subdirectories of D unless the -x or -x f option is also specified.
Note: the authority of the user of this option limits it to searching for files that the
user has permission to examine with the system stat(2) function.
Further note: lsof may process this option slowly and require a large amount of
dynamic memory to do it. This is because it must descend the entire directory tree,
rooted at D, calling stat(2) for each file and directory, building a list of all the files it
finds, and searching that list for a match with every open file. When directory D is
large, these steps can take a long time, so use this option prudently.
-D D
directs lsof’s use of the device cache file. The use of this option is sometimes
restricted. See the DEVICE CACHE FILE section and the sections that follow it for
more information on this option.
-D must be followed by a function letter; the function letter may optionally be
followed by a path name. Lsof recognizes these function letters:
? - report device cache file paths
b - build the device cache file
i - ignore the device cache file
r - read the device cache file
u - read and update the device cache file
The b, r, and u functions, accompanied by a path name, are sometimes restricted.
When these functions are restricted, they will not appear in the description of the -D
option that accompanies -h or -? option output. See the DEVICE CACHE FILE
section and the sections that follow it for more information on these functions and
when they’re restricted.
The ? function reports the read-only and write paths that lsof can use for the device
cache file, the names of any environment variables whose values lsof will examine
when forming the device cache file path, and the format for the personal device cache
file path. (Escape the `?’ character as your shell requires.)
When available, the b, r, and u functions may be followed by the device cache file’s
path. The standard default is .lsof_hostname in the home directory of the real user ID
that executes lsof, but this could have been changed when lsof was configured and
compiled. (The output of the -h and -? options show the current default prefix - e.g.,
“.lsof”.) The suffix, hostname, is the first component of the host’s name returned by
gethostname(2).
When available, the b function directs lsof to build a new device cache file at the
default or specified path.
The i function directs lsof to ignore the default device cache file and obtain its
information about devices via direct calls to the kernel.
The r function directs lsof to read the device cache at the default or specified path,
but prevents it from creating a new device cache file when none exists or the existing
one is improperly structured. The r function, when specified without a path name,
prevents lsof from updating an incorrect or outdated device cache file, or creating a
new one in its place. The r function is always available when it is specified without a
path name argument; it may be restricted by the permissions of the lsof process.
When available, the u function directs lsof to read the device cache file at the default
or specified path, if possible, and to rebuild it, if necessary. This is the default device
cache file function when no -D option has been specified.
+|-e s
exempts the file system whose path name is s from being subjected to kernel function
calls that might block. The +e option exempts stat(2), lstat(2) and most readlink(2)
kernel function calls. The -e option exempts only stat(2) and lstat(2) kernel function
calls. Multiple file systems may be specified with separate +|-e specifications and
each may have readlink(2) calls exempted or not.
This option is currently implemented only for Linux.
CAUTION: this option can easily be mis-applied to other than the file system of
interest, because it uses path name rather than the more reliable device and inode
numbers. (Device and inode numbers are acquired via the potentially blocking stat(2)
kernel call and are thus not available, but see the +|-m m option as a possible
alternative way to supply device numbers.) Use this option with great care and
fully specify the path name of the file system to be exempted.
When open files on exempted file systems are reported, it may not be possible to
obtain all their information. Therefore, some information columns will be blank, the
characters “UNKN” preface the values in the TYPE column, and the applicable
exemption option is added in parentheses to the end of the NAME column. (Some
device number information might be made available via the +|-m m option.)
+|-f [cfgGn]
f by itself clarifies how path name arguments are to be interpreted. When followed by
c, f, g, G, or n in any combination it specifies that the listing of kernel file structure
information is to be enabled (`+’) or inhibited (`-‘).
Normally a path name argument is taken to be a file system name if it matches a
mounted-on directory name reported by mount(8), or if it represents a block device,
named in the mount output and associated with a mounted directory name. When +f
is specified, all path name arguments will be taken to be file system names, and lsof
will complain if any are not. This can be useful, for example, when the file system
name (mounted-on device) isn’t a block device. This happens for some CD-ROM file
systems.
When -f is specified by itself, all path name arguments will be taken to be simple
files. Thus, for example, the “-f — /” arguments direct lsof to search for open files
with a `/’ path name, not all open files in the `/’ (root) file system.
Be careful to make sure +f and -f are properly terminated and aren’t followed by a
character (e.g., of the file or file system name) that might be taken as a parameter. For
example, use "--" after +f and -f as in these examples.
$ lsof +f -- /file/system/name
$ lsof -f -- /file/name
The listing of information from kernel file structures, requested with the +f [cfgGn]
option form, is normally inhibited, and is not available in whole or part for some
dialects - e.g., /proc-based Linux kernels below 2.6.22. When the prefix to f is a plus
sign (`+’), these characters request file structure information:
c - file structure use count (not Linux)
f - file structure address (not Linux)
g - file flag abbreviations (Linux 2.6.22 and up)
G - file flags in hexadecimal (Linux 2.6.22 and up)
n - file structure node address (not Linux)
When the prefix is minus (`-‘) the same characters disable the listing of the indicated
values.
File structure addresses, use counts, flags, and node addresses may be used to detect
more readily identical files inherited by child processes and identical files in use by
different processes. Lsof column output can be sorted by output columns holding the
values and listed to identify identical file use, or lsof field output can be parsed by an
AWK or Perl post-filter script, or by a C program.
-F f
specifies a character list, f, that selects the fields to be output for processing by
another program, and the character that terminates each output field. Each field to be
output is specified with a single character in f. The field terminator defaults to NL,
but may be changed to NUL (000). See the OUTPUT FOR OTHER PROGRAMS
section for a description of the field identification characters and the field output
process.
When the field selection character list is empty, all standard fields are selected
(except the raw device field, security context and zone field for compatibility
reasons) and the NL field terminator is used.
When the field selection character list contains only a zero (`0’), all fields are
selected (except the raw device field for compatibility reasons) and the NUL
terminator character is used.
Other combinations of fields and their associated field terminator character must be
set with explicit entries in f, as described in the OUTPUT FOR OTHER
PROGRAMS section.
When a field selection character identifies an item lsof does not normally list - e.g.,
PPID, selected with -R - specification of the field character - e.g., “-FR” - also
selects the listing of the item.
When the field selection character list contains the single character `?’, lsof will
display a help list of the field identification characters. (Escape the `?’ character as
your shell requires.)
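As the section above notes, field output exists to be consumed by other programs. A minimal sketch of such a consumer (Python; it assumes NL-terminated output from a hypothetical `lsof -F pcfn` run has already been captured as a string, and the function name is illustrative):

```python
def parse_field_output(text):
    """Group lsof -F pcfn output (one tagged field per line) into
    (pid, command, [(fd, name), ...]) records. Illustrative sketch."""
    procs = []
    pid = cmd = fd = None
    files = []
    for line in text.splitlines():
        if not line:
            continue
        tag, value = line[0], line[1:]
        if tag == "p":                   # new process set begins
            if pid is not None:
                procs.append((pid, cmd, files))
            pid, cmd, files = int(value), None, []
        elif tag == "c":                 # command name
            cmd = value
        elif tag == "f":                 # file descriptor (starts a file set)
            fd = value
        elif tag == "n":                 # file name, comment, or address
            files.append((fd, value))
    if pid is not None:
        procs.append((pid, cmd, files))
    return procs
```

The same structure extends to any other field characters selected in f; only the per-tag branches change.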
-g [s]
excludes or selects the listing of files for the processes whose optional process group
IDentification (PGID) numbers are in the comma-separated set s - e.g., “123” or
“123,^456”. (There should be no spaces in the set.)
PGID numbers that begin with `^’ (negation) represent exclusions.
Multiple PGID numbers are joined in a single ORed set before participating in AND
option selection. However, PGID exclusions are applied without ORing or ANDing
and take effect before other selection criteria are applied.
The -g option also enables the output display of PGID numbers. When specified
without a PGID set that’s all it does.
-i [i]
selects the listing of files any of whose Internet address matches the address specified
in i. If no address is specified, this option selects the listing of all Internet and x.25
(HP-UX) network files.
If -i4 or -i6 is specified with no following address, only files of the indicated IP
version, IPv4 or IPv6, are displayed. (An IPv6 specification may be used only if the
dialects supports IPv6, as indicated by “[46]” and “IPv[46]” in lsof’s -h or -? output.)
Sequentially specifying -i4, followed by -i6 is the same as specifying -i, and
vice-versa. Specifying -i4, or -i6 after -i is the same as specifying -i4 or
-i6 by itself.
Multiple addresses (up to a limit of 100) may be specified with multiple -i options. (A
port number or service name range is counted as one address.) They are joined in a
single ORed set before participating in AND option selection.
An Internet address is specified in the form (Items in square brackets are optional.):
[46][protocol][@hostname|hostaddr][:service|port]
where:
46 specifies the IP version, IPv4 or IPv6, that applies to the following
address. '6' may be specified only if the UNIX dialect supports IPv6. If
neither '4' nor '6' is specified, the following address applies to all IP
versions.
protocol is a protocol name - TCP, UDP.
hostname is an Internet host name. Unless a specific IP version is specified,
open network files associated with host names of all versions will be
selected.
hostaddr is a numeric Internet IPv4 address in dot form; or an IPv6 numeric
address in colon form, enclosed in brackets, if the UNIX dialect supports
IPv6. When an IP version is selected, only its numeric addresses may be
specified.
service is an /etc/services name - e.g., smtp - or a list of them.
port is a port number, or a list of them.
IPv6 options may be used only if the UNIX dialect supports IPv6. To see if the
dialect supports IPv6, run lsof and specify the -h or -? (help) option. If the displayed
description of the -i option contains “[46]” and “IPv[46]”, IPv6 is supported.
IPv4 host names and addresses may not be specified if network file selection is
limited to IPv6 with -i 6. IPv6 host names and addresses may not be specified if
network file selection is limited to IPv4 with -i 4. When an open IPv4 network file’s
address is mapped in an IPv6 address, the open file’s type will be IPv6, not IPv4, and
its display will be selected by ‘6’, not ‘4’.
At least one address component - 4, 6, protocol, hostname, hostaddr, or service -
must be supplied. The `@’ character, leading the host specification, is always
required; as is the `:’, leading the port specification. Specify either hostname or
hostaddr. Specify either service name list or port number list. If a service name list is
specified, the protocol may also need to be specified if the TCP, UDP and UDPLITE
port numbers for the service name are different. Use any case - lower or upper - for
protocol.
Service names and port numbers may be combined in a list whose entries are
separated by commas and whose numeric range entries are separated by minus signs.
There may be no embedded spaces, and all service names must belong to the
specified protocol. Since service names may contain embedded minus signs, the
starting entry of a range can’t be a service name; it can be a port number, however.
Here are some sample addresses:
-i6 - IPv6 only
TCP:25 - TCP and port 25
@1.2.3.4 - Internet IPv4 host address 1.2.3.4
@[3ffe:1ebc::1]:1234 - Internet IPv6 host address 3ffe:1ebc::1, port 1234
UDP:who - UDP who service port
[email protected]:513 - TCP, port 513 and host name lsof.itap
tcp@foo:1-10,smtp,99 - TCP, ports 1 through 10, service name smtp, port 99, host name foo
tcp@bar:1-smtp - TCP, ports 1 through smtp, host bar
:time - either TCP, UDP or UDPLITE time service port
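The [46][protocol][@hostname|hostaddr][:service|port] grammar can be sanity-checked mechanically. This sketch (Python; the regex is a deliberate simplification and is both looser and stricter than lsof's real parser in places) splits an -i argument into its components:

```python
import re

# Rough model of lsof's -i address grammar; not the real parser.
ADDR_RE = re.compile(
    r"^(?P<ver>[46])?"
    r"(?P<proto>tcp|udp|udplite)?"
    r"(?:@(?P<host>\[[0-9A-Fa-f:]+\]|[^:@]+))?"
    r"(?::(?P<port>[\w,-]+))?$",
    re.IGNORECASE,
)

def parse_i_arg(arg):
    """Split an lsof -i argument into version/protocol/host/port parts.
    Simplified illustration; at least one component is required."""
    m = ADDR_RE.match(arg)
    if m is None or not any(m.groupdict().values()):
        return None
    return m.groupdict()
```

For example, "tcp@foo:1-10,smtp,99" yields protocol tcp, host foo and the mixed port/service list, matching the sample addresses above.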
-K
selects the listing of tasks (threads) of processes, on dialects where task (thread)
reporting is supported. (If help output - i.e., the output of the -h or -? options - shows
this option, then task (thread) reporting is supported by the dialect.)
When -K and -a are both specified on Linux, and the tasks of a main process are
selected by other options, the main process will also be listed as though it were a
task, but without a task ID. (See the description of the TID column in the OUTPUT
section.)
Where the FreeBSD version supports threads, all threads will be listed with their IDs.
In general threads and tasks inherit the files of the caller, but may close some and
open others, so lsof always reports all the open files of threads and tasks.
-k k
specifies a kernel name list file, k, in place of /vmunix, /mach, etc. -k is not available
under AIX on the IBM RISC/System 6000.
-l
inhibits the conversion of user ID numbers to login names. It is also useful when
login name lookup is working improperly or slowly.
+|-L [l]
enables (`+’) or disables (`-‘) the listing of file link counts, where they are available -
e.g., they aren’t available for sockets, or most FIFOs and pipes.
When +L is specified without a following number, all link counts will be listed.
When -L is specified (the default), no link counts will be listed.
When +L is followed by a number, only files having a link count less than that
number will be listed. (No number may follow -L.) A specification of the form
“+L1” will select open files that have been unlinked. A specification of the form
“+aL1 <file_system>” will select unlinked open files on the specified file system.
For other link count comparisons, use field output (-F) and a post-processing script or
program.
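The "+L1" case - an open file that has been unlinked - is easy to manufacture for experimentation (Python sketch; the comments note what lsof would show, which is an assumption about a live system rather than part of this code):

```python
import os
import tempfile

# Create the classic "deleted but still open" file that lsof +L1 reports:
fd, path = tempfile.mkstemp()
os.write(fd, b"still reachable through the descriptor")
os.unlink(path)                      # link count drops to 0; inode persists
os.lseek(fd, 0, os.SEEK_SET)
data = os.read(fd, 64)               # the data is still readable via the fd
os.close(fd)                         # only now is the disk space released
```

While the descriptor is open, `lsof +L1` run from another shell would list this file with a link count of 0.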
+|-m m
specifies an alternate kernel memory file or activates mount table supplement
processing.
The option form -m m specifies a kernel memory file, m, in place of /dev/kmem or
/dev/mem - e.g., a crash dump file.
The option form +m requests that a mount supplement file be written to the standard
output file. All other options are silently ignored.
There will be a line in the mount supplement file for each mounted file system,
containing the mounted file system directory, followed by a single space, followed by
the device number in hexadecimal “0x” format - e.g.,
/ 0x801
Lsof can use the mount supplement file to get device numbers for file systems when it
can’t get them via stat(2) or lstat(2).
The option form +m m identifies m as a mount supplement file.
Note: the +m and +m m options are not available for all supported dialects. Check
the output of lsof’s -h or -? options to see if the +m and +m m options are available.
+|-M
Enables (+) or disables (-) the reporting of portmapper registrations for local TCP,
UDP and UDPLITE ports, where port mapping is supported. (See the last paragraph
of this option description for information about where portmapper registration
reporting is supported.)
The default reporting mode is set by the lsof builder with the HASPMAPENABLED
#define in the dialect’s machine.h header file; lsof is distributed with the
HASPMAPENABLED #define deactivated, so portmapper reporting is disabled by
default and must be requested with +M. Specifying lsof’s -h or -? option will report
the default mode. Disabling portmapper registration when it is already disabled or
enabling it when already enabled is acceptable. When portmapper registration
reporting is enabled, lsof displays the portmapper registration (if any) for local TCP,
UDP or UDPLITE ports in square brackets immediately following the port numbers
or service names - e.g., “:1234[name]” or “:name[100083]”. The registration
information may be a name or number, depending on what the registering program
supplied to the portmapper when it registered the port.
When portmapper registration reporting is enabled, lsof may run a little more slowly
or even become blocked when access to the portmapper becomes congested or
stopped. Reverse the reporting mode to determine if portmapper registration
reporting is slowing or blocking lsof.
For purposes of portmapper registration reporting lsof considers a TCP, UDP or
UDPLITE port local if: it is found in the local part of its containing kernel structure;
or if it is located in the foreign part of its containing kernel structure and the local and
foreign Internet addresses are the same; or if it is located in the foreign part of its
containing kernel structure and the foreign Internet address is
INADDR_LOOPBACK (127.0.0.1). This rule may make lsof ignore some foreign
ports on machines with multiple interfaces when the foreign Internet address is on a
different interface from the local one.
See the lsof FAQ (The FAQ section gives its location.) for further discussion of
portmapper registration reporting issues.
Portmapper registration reporting is supported only on dialects that have RPC header
files. (Some Linux distributions with GlibC 2.14 do not have them.) When
portmapper registration reporting is supported, the -h or -? help output will show the
+|-M option.
-n
inhibits the conversion of network numbers to host names for network files.
Inhibiting conversion may make lsof run faster. It is also useful when host name
lookup is not working properly.
-N
selects the listing of NFS files.
-o
directs lsof to display file offset at all times. It causes the SIZE/OFF output column
title to be changed to OFFSET. Note: on some UNIX dialects lsof can’t obtain
accurate or consistent file offset information from its kernel data sources, sometimes
just for particular kinds of files (e.g., socket files.) Consult the lsof FAQ (The FAQ
section gives its location.) for more information.
The -o and -s options are mutually exclusive; they can’t both be specified. When
neither is specified, lsof displays whatever value - size or offset - is appropriate and
available for the type of the file.
-o o
defines the number of decimal digits (o) to be printed after the “0t” for a file offset
before the form is switched to “0x…”. An o value of zero (unlimited) directs lsof to
use the “0t” form for all offset output.
This option does NOT direct lsof to display offset at all times; specify -o (without a
trailing number) to do that. -o o only specifies the number of digits after “0t” in either
mixed size and offset or offset-only output. Thus, for example, to direct lsof to
display offset at all times with a decimal digit count of 10, use:
-o -o 10 or -oo10
The default number of digits allowed after “0t” is normally 8, but may have been
changed by the lsof builder. Consult the description of the -o o option in the output of
the -h or -? option to determine the default that is in effect.
-O
directs lsof to bypass the strategy it uses to avoid being blocked by some kernel
operations - i.e., doing them in forked child processes. See the BLOCKS AND
TIMEOUTS and AVOIDING KERNEL BLOCKS sections for more information
on kernel operations that may block lsof.
While use of this option will reduce lsof startup overhead, it may also cause lsof to
hang when the kernel doesn’t respond to a function. Use this option cautiously.
-p s
excludes or selects the listing of files for the processes whose optional process
IDentification (PID) numbers are in the comma-separated set s - e.g., “123” or
“123,^456”. (There should be no spaces in the set.)
PID numbers that begin with `^’ (negation) represent exclusions.
Multiple process ID numbers are joined in a single ORed set before participating in
AND option selection. However, PID exclusions are applied without ORing or
ANDing and take effect before other selection criteria are applied.
-P
inhibits the conversion of port numbers to port names for network files. Inhibiting the
conversion may make lsof run a little faster. It is also useful when port name lookup
is not working properly.
+|-r [t[m<fmt>]]
puts lsof in repeat mode. There lsof lists open files as selected by other options,
delays t seconds (default fifteen), then repeats the listing, delaying and listing
repetitively until stopped by a condition defined by the prefix to the option.
If the prefix is a `-‘, repeat mode is endless. Lsof must be terminated with an interrupt
or quit signal.
If the prefix is `+', repeat mode will end the first cycle in which no open files
are listed - and of course when lsof is stopped with an interrupt or quit
signal. When repeat mode
ends because no files are listed, the process exit code will be zero if any open files
were ever listed; one, if none were ever listed.
Lsof marks the end of each listing: if field output is in progress (the -F, option has
been specified), the default marker is `m’; otherwise the default marker is
“========”. The marker is followed by a NL character.
The optional “m<fmt>” argument specifies a format for the marker line. The <fmt>
characters following `m’ are interpreted as a format specification to the strftime(3)
function, when both it and the localtime(3) function are available in the dialect’s C
library. Consult the strftime(3) documentation for what may appear in its format
specification. Note that when field output is requested with the -F option, <fmt>
cannot contain the NL format, “%n”. Note also that when <fmt> contains spaces or
other characters that affect the shell’s interpretation of arguments, <fmt> must be
quoted appropriately.
Repeat mode reduces lsof startup overhead, so it is more efficient to use this mode
than to call lsof repetitively from a shell script, for example.
To use repeat mode most efficiently, accompany +|-r with specification of other lsof
selection options, so the amount of kernel memory access lsof does will be kept to a
minimum. Options that filter at the process level - e.g., -c, -g, -p, -u - are the most
efficient selectors.
Repeat mode is useful when coupled with field output (see the -F option description)
and a supervising awk or Perl script, or a C program.
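As a sketch of such a supervising script, assuming `-F` field output with the default `m` marker line between repeat-mode cycles (the function name is hypothetical):

```python
def split_cycles(field_output):
    """Split lsof repeat-mode field output into per-cycle lists of lines.
    In field mode each line starts with a field character (p, c, n, ...)
    and cycles are separated by lines beginning with the 'm' marker."""
    cycles, current = [], []
    for line in field_output.splitlines():
        if line.startswith("m"):      # marker line ends a cycle
            cycles.append(current)
            current = []
        else:
            current.append(line)
    if current:                       # trailing cycle without a marker
        cycles.append(current)
    return cycles

# Hand-written sample of two identical cycles of -F pcn output:
sample = "p123\ncbash\nn/dev/tty\nm\np123\ncbash\nn/dev/tty\nm\n"
```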
-R
directs lsof to list the Parent Process IDentification number in the PPID column.
-s [p:s]
s alone directs lsof to display file size at all times. It causes the SIZE/OFF output
column title to be changed to SIZE. If the file does not have a size, nothing is
displayed.
The optional -s p:s form is available only for selected dialects, and only when the -h
or -? help output lists it.
When the optional form is available, the s may be followed by a protocol name (p),
either TCP or UDP, a colon (`:’), and a comma-separated protocol state name list.
The option causes open TCP and UDP files to be excluded if their state name(s) are
in the list (s) and preceded by a `^’; or included if their name(s) are not preceded by
a `^’.
When an inclusion list is defined, only network files with state names in the list will
be present in the lsof output. Thus, specifying one state name means that only
network files with that lone state name will be listed.
Case is unimportant in the protocol or state names, but there may be no spaces and
the colon (`:’) separating the protocol name (p) and the state name list (s) is required.
If only TCP and UDP files are to be listed, as controlled by the specified exclusions
and inclusions, the -i option must be specified, too. If only a single protocol’s files
are to be listed, add its name as an argument to the -i option.
For example, to list only network files with TCP state LISTEN, use:
-iTCP -sTCP:LISTEN
Or, for example, to list network files with all UDP states except Idle, use:
-iUDP -sUDP:^Idle
State names vary with UNIX dialects, so it’s not possible to provide a complete list.
Some common TCP state names are: CLOSED, IDLE, BOUND, LISTEN,
ESTABLISHED, SYN_SENT, SYN_RCVD, CLOSE_WAIT, FIN_WAIT1,
CLOSING, LAST_ACK, FIN_WAIT_2, and TIME_WAIT. Two common UDP
state names are Unbound and Idle.
See the lsof FAQ (The FAQ section gives its location.) for more information on how
to use protocol state exclusion and inclusion, including examples.
The -o (without a following decimal digit count) and -s option (without a following
protocol and state name list) are mutually exclusive; they can’t both be specified.
When neither is specified, lsof displays whatever value - size or offset - is appropriate
and available for the type of file.
Since some types of files don’t have true sizes - sockets, FIFOs, pipes, etc. - lsof
displays for their sizes the content amounts in their associated kernel buffers, if
possible.
-S [t]
specifies an optional time-out seconds value for kernel functions - lstat(2),
readlink(2), and stat(2) - that might otherwise deadlock. The minimum for t is two;
the default, fifteen; when no value is specified, the default is used.
See the BLOCKS AND TIMEOUTS section for more information.
-T [t]
controls the reporting of some TCP/TPI information, also reported by netstat(1),
following the network addresses. In normal output the information appears in
parentheses, each item except TCP or TPI state name identified by a keyword,
followed by `=’, separated from others by a single space:
<TCP or TPI state name> QR=<read queue length> QS=<send queue length>
SO=<socket options and values> SS=<socket states> TF=<TCP flags and
values> WR=<window read length> WW=<window write length>
Not all values are reported for all UNIX dialects. Item values (when available) are
reported after the item name and ‘=’.
When the field output mode is in effect (See OUTPUT FOR OTHER
PROGRAMS.) each item appears as a field with a `T’ leading character.
-T with no following key characters disables TCP/TPI information reporting.
-T with following characters selects the reporting of specific TCP/TPI information:
f selects reporting of socket options, states and values, and TCP flags
and values. q selects queue length reporting. s selects connection
state reporting. w selects window size reporting.
Not all selections are enabled for some UNIX dialects. State may be selected for all
dialects and is reported by default. The -h or -? help output for the -T option will
show what selections may be used with the UNIX dialect.
When -T is used to select information - i.e., it is followed by one or more selection
characters - the displaying of state is disabled by default, and it must be explicitly
selected again in the characters following -T. (In effect, then, the default is equivalent
to -Ts.) For example, if queue lengths and state are desired, use -Tqs.
Socket options, socket states, some socket values, TCP flags and one TCP value may
be reported (when available in the UNIX dialect) in the form of the names that
commonly appear after SO_, so_, SS_, TCP_ and TF_ in the dialect’s header files -
most often <sys/socket.h>, <sys/socketvar.h> and <netinet/tcp_var.h>. Consult those
header files for the meaning of the flags, options, states and values.
“SO=” precedes socket options and values; “SS=”, socket states; and “TF=”, TCP
flags and values.
If a flag or option has a value, the value will follow an ‘=’ and the name — e.g.,
“SO=LINGER=5”, “SO=QLIM=5”, “TF=MSS=512”. The following values may be
reported:

Name Reported  Description (Common Symbol)
KEEPALIVE      keep alive time (SO_KEEPALIVE)
LINGER         linger time (SO_LINGER)
MSS            maximum segment size (TCP_MAXSEG)
PQLEN          partial listen queue connections
QLEN           established listen queue connections
QLIM           established listen queue limit
RCVBUF         receive buffer length (SO_RCVBUF)
SNDBUF         send buffer length (SO_SNDBUF)
Details on what socket options and values, socket states, and TCP flags and values
may be displayed for particular UNIX dialects may be found in the answer to the
“Why doesn’t lsof report socket options, socket states, and TCP flags and values for
my dialect?” and “Why doesn’t lsof report the partial listen queue connection count
for my dialect?” questions in the lsof FAQ (The FAQ section gives its location.)
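The parenthesized TCP/TPI annotation described above - state name first, then `keyword=value` items separated by single spaces - parses mechanically; a sketch (the function name is this example's, not lsof's):

```python
def parse_tcp_tpi(text):
    """Parse an lsof '(STATE QR=0 QS=0 TF=MSS=512)' TCP/TPI annotation
    into a (state, items) pair; item values keep everything after the
    first '=', so compound values like 'MSS=512' survive intact."""
    parts = text.strip("()").split(" ")
    state = parts[0]                  # the bare state name has no '='
    items = {}
    for part in parts[1:]:
        key, _, value = part.partition("=")
        items[key] = value            # e.g. 'TF' -> 'MSS=512'
    return state, items
```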
-t
specifies that lsof should produce terse output with process identifiers only and no
header - e.g., so that the output may be piped to kill(1). -t selects the -w option.
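The terse output is one PID per line with no header, which is why it pipes cleanly to kill(1) - e.g., `kill $(lsof -t /some/file)` in the shell. A sketch of consuming it from a script (the helper name is hypothetical):

```python
def pids_from_terse(output):
    """Convert lsof -t output (one PID per line, no header) to ints."""
    return [int(line) for line in output.split() if line]

# e.g., feed this the stdout of:
#   subprocess.run(["lsof", "-t", path], capture_output=True, text=True)
```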
-u s
selects the listing of files for the user whose login names or user ID numbers are in
the comma-separated set s - e.g., “abe”, or “548,root”. (There should be no spaces in
the set.)
Multiple login names or user ID numbers are joined in a single ORed set before
participating in AND option selection.
If a login name or user ID is preceded by a `^’, it becomes a negation - i.e., files of
processes owned by the login name or user ID will never be listed. A negated login
name or user ID selection is neither ANDed nor ORed with other selections; it is
applied before all other selections and absolutely excludes the listing of the files of
the process. For example, to direct lsof to exclude the listing of files belonging to root
processes, specify “-u^root” or “-u^0”.
-U
selects the listing of UNIX domain socket files.
-v
selects the listing of lsof version information, including: revision number; when the
lsof binary was constructed; who constructed the binary and where; the name of the
compiler used to construct the lsof binary; the version number of the compiler when
readily available; the compiler and loader flags used to construct the lsof binary; and
system information, typically the output of uname’s -a option.
-V
directs lsof to indicate the items it was asked to list and failed to find - command
names, file names, Internet addresses or files, login names, NFS files, PIDs, PGIDs,
and UIDs.
When other options are ANDed to search options, or compile-time options restrict the
listing of some files, lsof may not report that it failed to find a search item when an
ANDed option or compile-time option prevents the listing of the open file containing
the located search item.
For example, “lsof -V -iTCP@foobar -a -d 999” may not report a failure to locate
open files at “TCP@foobar” and may not list any, if none have a file descriptor
number of 999. A similar situation arises when HASSECURITY and
HASNOSOCKSECURITY are defined at compile time and they prevent the listing
of open files.
+|-w
Enables (+) or disables (-) the suppression of warning messages.
The lsof builder may choose to have warning messages disabled or enabled by
default. The default warning message state is indicated in the output of the -h or -?
option. Disabling warning messages when they are already disabled or enabling them
when already enabled is acceptable.
The -t option selects the -w option.
-x [fl]
may accompany the +d and +D options to direct their processing to cross over
symbolic links and|or file system mount points encountered when scanning the
directory (+d) or directory tree (+D).
If -x is specified by itself without a following parameter, cross-over processing of
both symbolic links and file system mount points is enabled. Note that when -x is
specified without a parameter, the next argument must begin with ‘-‘ or ‘+’.
The optional ‘f’ parameter enables file system mount point cross-over processing; ‘l’,
symbolic link cross-over processing.
The -x option may not be supplied without also supplying a +d or +D option.
-X
This is a dialect-specific option.
AIX:
This IBM AIX RISC/System 6000 option requests the reporting of executed text file
and shared library references.
WARNING: because this option uses the kernel readx() function, its use on a busy
AIX system might cause an application process to hang so completely that it can
neither be killed nor stopped. I have never seen this happen or had a report of its
happening, but I think there is a remote possibility it could happen.
By default use of readx() is disabled. On AIX 5L and above lsof may need setuid-root
permission to perform the actions this option requests.
The lsof builder may specify that the -X option be restricted to processes whose real
UID is root. If that has been done, the -X option will not appear in the -h or -? help
output unless the real UID of the lsof process is root. The default lsof distribution
allows any UID to specify -X, so by default it will appear in the help output.
When AIX readx() use is disabled, lsof may not be able to report information for all
text and loader file references, but it may also avoid exacerbating an AIX kernel
directory search kernel error, known as the Stale Segment ID bug.
The readx() function, used by lsof or any other program to access some sections of
kernel virtual memory, can trigger the Stale Segment ID bug. It can cause the kernel’s
dir_search() function to believe erroneously that part of an in-memory copy of a file
system directory has been zeroed. Another application process, distinct from lsof,
asking the kernel to search the directory - e.g., by using open(2) - can cause
dir_search() to loop forever, thus hanging the application process.
Consult the lsof FAQ (The FAQ section gives its location.) and the 00README file
of the lsof distribution for a more complete description of the Stale Segment ID bug,
its APAR, and methods for defining readx() use when compiling lsof.
Linux:
This Linux option requests that lsof skip the reporting of information on all open
TCP, UDP and UDPLITE IPv4 and IPv6 files.
This Linux option is most useful when the system has an extremely large number of
open TCP, UDP and UDPLITE files, the processing of whose information in the
/proc/net/tcp* and /proc/net/udp* files would take lsof a long time, and whose
reporting is not of interest.
Use this option with care and only when you are sure that the information you want
lsof to display isn’t associated with open TCP, UDP or UDPLITE socket files.
Solaris 10 and above:
This Solaris 10 and above option requests the reporting of cached paths for files that
have been deleted - i.e., removed with rm(1) or unlink(2).
The cached path is followed by the string “ (deleted)” to indicate that the path by
which the file was opened has been deleted.
Because intervening changes made to the path - i.e., renames with mv(1) or
rename(2) - are not recorded in the cached path, what lsof reports is only the path by
which the file was opened, not its possibly different final path.
-z [z]
specifies how Solaris 10 and higher zone information is to be handled.
Without a following argument - e.g., NO z - the option specifies that zone names are
to be listed in the ZONE output column.
The -z option may be followed by a zone name, z. That causes lsof to list only open
files for processes in that zone. Multiple -z z option and argument pairs may be
specified to form a list of named zones. Any open file of any process in any of the
zones will be listed, subject to other conditions specified by other options and
arguments.
-Z [Z]
specifies how SELinux security contexts are to be handled. It and ‘Z’ field output
character support are inhibited when SELinux is disabled in the running Linux
kernel. See OUTPUT FOR OTHER PROGRAMS for more information on the ‘Z’
field output character.
Without a following argument - e.g., NO Z - the option specifies that security
contexts are to be listed in the SECURITY-CONTEXT output column.
The -Z option may be followed by a wildcard security context name, Z. That causes
lsof to list only open files for processes in that security context. Multiple -Z Z option
and argument pairs may be specified to form a list of security contexts. Any open file
of any process in any of the security contexts will be listed, subject to other
conditions specified by other options and arguments. Note that Z can be A:B:C or
*:B:C or A:B:* or *:*:C to match against the A:B:C context.
--
The double minus sign option is a marker that signals the end of the keyed options. It
may be used, for example, when the first file name begins with a minus sign. It may
also be used when the absence of a value for the last keyed option must be signified
by the presence of a minus sign in the following option and before the start of the file
names.
names
These are path names of specific files to list. Symbolic links are resolved before use.
The first name may be separated from the preceding options with the “--” option.
If a name is the mounted-on directory of a file system or the device of the file system,
lsof will list all the files open on the file system. To be considered a file system, the
name must match a mounted-on directory name in mount(8) output, or match the
name of a block device associated with a mounted-on directory name. The +|-f option
may be used to force lsof to consider a name a file system identifier (+f) or a simple
file (-f).
If name is a path to a directory that is not the mounted-on directory name of a file
system, it is treated just as a regular file is treated - i.e., its listing is restricted to
processes that have it open as a file or as a process-specific directory, such as the root
or current working directory. To request that lsof look for open files inside a directory
name, use the +d s and +D D options.
If a name is the base name of a family of multiplexed files - e.g., AIX’s /dev/pt[cs] -
lsof will list all the associated multiplexed files on the device that are open - e.g.,
/dev/pt[cs]/1, /dev/pt[cs]/2, etc.
If a name is a UNIX domain socket name, lsof will usually search for it by the
characters of the name alone - exactly as it is specified and is recorded in the kernel
socket structure. (See the next paragraph for an exception to that rule for Linux.)
Specifying a relative path - e.g., ./file - in place of the file’s absolute path - e.g.,
/tmp/file - won’t work because lsof must match the characters you specify with what
it finds in the kernel UNIX domain socket structures.
If a name is a Linux UNIX domain socket name, in one case lsof is able to search for
it by its device and inode number, allowing name to be a relative path. The case
requires that the absolute path - i.e., one beginning with a slash (‘/’) - be used by the
process that created the socket, and hence be stored in the /proc/net/unix file; and it
requires that lsof be able to obtain the device and node numbers of both the absolute
path in /proc/net/unix and name via successful stat(2) system calls. When those
conditions are met, lsof will be able to search for the UNIX domain socket when
some path to it is specified in name. Thus, for example, if the path is /dev/log, and
an lsof search is initiated when the working directory is /dev, then name could be
./log.
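The device/inode comparison lsof relies on here can be illustrated with stat(2): a relative and an absolute path to the same file yield the same (st_dev, st_ino) pair. This is only an illustration of that condition, not lsof's code:

```python
import os
import tempfile

# Create a file, then stat it via its absolute path and via a relative
# path from its directory; both refer to the same device and inode.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "sock-stand-in")   # stand-in for a socket path
    open(path, "w").close()
    st_abs = os.stat(path)
    cwd = os.getcwd()
    os.chdir(d)
    try:
        st_rel = os.stat("./sock-stand-in")
    finally:
        os.chdir(cwd)
    same = (st_abs.st_dev, st_abs.st_ino) == (st_rel.st_dev, st_rel.st_ino)
```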
If a name is none of the above, lsof will list any open files whose device and inode
match that of the specified path name.
If you have also specified the -b option, the only names you may safely specify are
file systems for which your mount table supplies alternate device numbers. See the
AVOIDING KERNEL BLOCKS and ALTERNATE DEVICE NUMBERS
sections for more information.
Multiple file names are joined in a single ORed set before participating in AND
option selection.
› AFS
Lsof supports the recognition of AFS files for these dialects (and AFS versions):
AIX 4.1.4 (AFS 3.4a)
HP-UX 9.0.5 (AFS 3.4a)
Linux 1.2.13 (AFS 3.3)
Solaris 2.[56] (AFS 3.4a)
It may recognize AFS files on other versions of these dialects, but has not been tested
there. Depending on how AFS is implemented, lsof may recognize AFS files in other
dialects, or may have difficulties recognizing AFS files in the supported dialects.
Lsof may have trouble identifying all aspects of AFS files in supported dialects when
AFS kernel support is implemented via dynamic modules whose addresses do not
appear in the kernel’s variable name list. In that case, lsof may have to guess at the
identity of AFS files, and might not be able to obtain volume information from the
kernel that is needed for calculating AFS volume node numbers. When lsof can’t
compute volume node numbers, it reports blank in the NODE column.
The -A A option is available in some dialect implementations of lsof for specifying
the name list file where dynamic module kernel addresses may be found. When this
option is available, it will be listed in the lsof help output, presented in response to
the -h or -? option.
See the lsof FAQ (The FAQ section gives its location.) for more information about
dynamic modules, their symbols, and how they affect lsof options.
Because AFS path lookups don’t seem to participate in the kernel’s name cache
operations, lsof can’t identify path name components for AFS files.
› SECURITY
Lsof has three features that may cause security concerns. First, its default compilation
mode allows anyone to list all open files with it. Second, by default it creates a user-
readable and user-writable device cache file in the home directory of the real user ID
that executes lsof. (The list-all-open-files and device cache features may be disabled
when lsof is compiled.) Third, its -k and -m options name alternate kernel name list
or memory files.
Restricting the listing of all open files is controlled by the compile-time
HASSECURITY and HASNOSOCKSECURITY options. When HASSECURITY is
defined, lsof will allow only the root user to list all open files. The non-root user may
list only open files of processes with the same user IDentification number as the real
user ID number of the lsof process (the one that its user logged on with).
However, if HASSECURITY and HASNOSOCKSECURITY are both defined,
anyone may list open socket files, provided they are selected with the -i option.
When HASSECURITY is not defined, anyone may list all open files.
Help output, presented in response to the -h or -? option, gives the status of the
HASSECURITY and HASNOSOCKSECURITY definitions.
See the Security section of the 00README file of the lsof distribution for
information on building lsof with the HASSECURITY and
HASNOSOCKSECURITY options enabled.
Creation and use of a user-readable and user-writable device cache file is controlled
by the compile-time HASDCACHE option. See the DEVICE CACHE FILE section
and the sections that follow it for details on how its path is formed. For security
considerations it is important to note that in the default lsof distribution, if the real
user ID under which lsof is executed is root, the device cache file will be written in
root’s home directory - e.g., / or /root. When HASDCACHE is not defined, lsof does
not write or attempt to read a device cache file.
When HASDCACHE is defined, the lsof help output, presented in response to the -h,
-D?, or -? options, will provide device cache file handling information. When
HASDCACHE is not defined, the -h or -? output will have no -D option description.
Before you decide to disable the device cache file feature - enabling it improves the
performance of lsof by reducing the startup overhead of examining all the nodes in
/dev (or /devices) - read the discussion of it in the 00DCACHE file of the lsof
distribution and the lsof FAQ (The FAQ section gives its location.)
WHEN IN DOUBT, YOU CAN TEMPORARILY DISABLE THE USE OF THE
DEVICE CACHE FILE WITH THE -Di OPTION.
When the lsof user declares alternate kernel name list or memory files with the -k and -m
options, lsof checks the user’s authority to read them with access(2). This is intended
to prevent whatever special power lsof’s modes might confer on it from letting it read
files not normally accessible via the authority of the real user ID.
› OUTPUT
This section describes the information lsof lists for each open file. See the OUTPUT
FOR OTHER PROGRAMS section for additional information on output that can
be processed by another program.
Lsof only outputs printable (declared so by isprint(3)) 8 bit characters. Non-printable
characters are printed in one of three forms: the C “\[bfrnt]” form; the control
character `^’ form (e.g., “^@”); or hexadecimal leading “\x” form (e.g., “\xab”).
Space is non-printable in the COMMAND column (“\x20”) and printable elsewhere.
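The three escape forms can be sketched as a single rendering function; this illustrates the rules as stated (leaving aside the COMMAND-column space exception), and is not lsof's actual code:

```python
C_ESCAPES = {"\b": "\\b", "\f": "\\f", "\r": "\\r", "\n": "\\n", "\t": "\\t"}

def render_char(ch):
    """Render one 8-bit character per the rules above: the C backslash
    form, the control-character '^' form, or the hex '\\x' form."""
    if ch in C_ESCAPES:
        return C_ESCAPES[ch]          # C "\[bfrnt]" form
    code = ord(ch)
    if 0x20 <= code < 0x7f:
        return ch                     # printable ASCII passes through
    if code < 0x20:
        return "^" + chr(code + 0x40) # control chars: "^@", "^A", ...
    return "\\x%02x" % code           # everything else: hex form
```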
For some dialects - if HASSETLOCALE is defined in the dialect’s machine.h header
file - lsof will print the extended 8 bit characters of a language locale. The lsof
process must be supplied a language locale environment variable (e.g., LANG)
whose value represents a known language locale in which the extended characters are
considered printable by isprint(3). Otherwise lsof considers the extended characters
non-printable and prints them according to its rules for non-printable characters,
stated above. Consult your dialect’s setlocale(3) man page for the names of other
environment variables that may be used in place of LANG - e.g., LC_ALL,
LC_CTYPE, etc.
Lsof’s language locale support for a dialect also covers wide characters - e.g., UTF-8
- when HASSETLOCALE and HASWIDECHAR are defined in the dialect’s
machine.h header file, and when a suitable language locale has been defined in the
appropriate environment variable for the lsof process. Wide characters are printable
under those conditions if iswprint(3) reports them to be. If HASSETLOCALE,
HASWIDECHAR and a suitable language locale aren’t defined, or if iswprint(3)
reports wide characters that aren’t printable, lsof considers the wide characters non-
printable and prints each of their 8 bits according to its rules for non-printable
characters, stated above.
Consult the answers to the “Language locale support” questions in the lsof FAQ (The
FAQ section gives its location.) for more information.
Lsof dynamically sizes the output columns each time it runs, guaranteeing that each
column is a minimum size. It also guarantees that each column is separated from its
predecessor by at least one space.
COMMAND
contains the first nine characters of the name of the UNIX command associated with
the process. If a non-zero w value is specified to the +c w option, the column contains
the first w characters of the name of the UNIX command associated with the process
up to the limit of characters supplied to lsof by the UNIX dialect. (See the description
of the +c w command or the lsof FAQ for more information. The FAQ section gives
its location.)
If w is less than the length of the column title, “COMMAND”, it will be raised to that
length.
If a zero w value is specified to the +c w option, the column contains all the
characters of the name of the UNIX command associated with the process.
All command name characters maintained by the kernel in its structures are displayed
in field output when the command name descriptor (`c’) is specified. See the
OUTPUT FOR OTHER PROGRAMS section for information on selecting field
output and the associated command name descriptor.
PID
is the Process IDentification number of the process.
TID
is the task (thread) IDentification number, if task (thread) reporting is supported by
the dialect and a task (thread) is being listed. (If help output - i.e., the output of the -h
or -? options - shows this option, then task (thread) reporting is supported by the
dialect.)
A blank TID column in Linux indicates a process - i.e., a non-task.
ZONE
is the Solaris 10 and higher zone name. This column must be selected with the -z
option.
SECURITY-CONTEXT
is the SELinux security context. This column must be selected with the -Z option.
Note that the -Z option is inhibited when SELinux is disabled in the running Linux
kernel.
PPID
is the Parent Process IDentification number of the process. It is only displayed when
the -R option has been specified.
PGID
is the process group IDentification number associated with the process. It is only
displayed when the -g option has been specified.
USER
is the user ID number or login name of the user to whom the process belongs, usually
the same as reported by ps(1). However, on Linux USER is the user ID number or
login that owns the directory in /proc where lsof finds information about the process.
Usually that is the same value reported by ps(1), but may differ when the process has
changed its effective user ID. (See the -l option description for information on when a
user ID number or login name is displayed.)
FD
is the File Descriptor number of the file or:
cwd   current working directory;
Lnn   library references (AIX);
err   FD information error (see NAME column);
jld   jail directory (FreeBSD);
ltx   shared library text (code and data);
Mxx   hex memory-mapped type number xx;
m86   DOS Merge mapped file;
mem   memory-mapped file;
mmap  memory-mapped device;
pd    parent directory;
rtd   root directory;
tr    kernel trace file (OpenBSD);
txt   program text (code and data);
v86   VP/ix mapped file;
FD is followed by one of these characters, describing the mode under which the file
is open:
r      for read access;
w      for write access;
u      for read and write access;
space  if mode unknown and no lock character follows;
`-‘    if mode unknown and lock character follows.
The mode character is followed by one of these lock characters, describing the type
of lock applied to the file:
N      for a Solaris NFS lock of unknown type;
r      for a read lock on part of the file;
R      for a read lock on the entire file;
w      for a write lock on part of the file;
W      for a write lock on the entire file;
u      for a read and write lock of any length;
U      for a lock of unknown type;
x      for an SCO OpenServer Xenix lock on part of the file;
X      for an SCO OpenServer Xenix lock on the entire file;
space  if there is no lock.
See the LOCKS section for more information on the lock information character.
The FD column contents constitute a single field for parsing in post-processing
scripts.
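Since the FD column is a single field, a post-processing script typically splits it into descriptor, mode character, and lock character; a sketch (the regular expression and function name are this example's, not lsof's):

```python
import re

# descriptor: digits or a short name (cwd, txt, mem, ...), then an
# optional mode character, then an optional lock character
FD_RE = re.compile(r"^(\d+|[a-zA-Z]\w{1,3})([rwu-]?)([NrRwWuUxX]?)$")

def split_fd(field):
    """Split an lsof FD field like '3u', 'cwd', or '4uW' into
    (descriptor, mode, lock); empty strings mean 'not present'."""
    m = FD_RE.match(field)
    if m is None:
        raise ValueError("unrecognized FD field: %r" % field)
    return m.groups()
```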
TYPE
is the type of the node associated with the file - e.g., GDIR, GREG, VDIR, VREG,
etc.
or “IPv4” for an IPv4 socket;
or “IPv6” for an open IPv6 network file - even if its address is IPv4, mapped in an
IPv6 address;
or “ax25” for a Linux AX.25 socket;
or “inet” for an Internet domain socket;
or “lla” for a HP-UX link level access file;
or “rte” for an AF_ROUTE socket;
or “sock” for a socket of unknown domain;
or “unix” for a UNIX domain socket;
or “x.25” for an HP-UX x.25 socket;
or “BLK” for a block special file;
or “CHR” for a character special file;
or “DEL” for a Linux map file that has been deleted;
or “DIR” for a directory;
or “DOOR” for a VDOOR file;
or “FIFO” for a FIFO special file;
or “KQUEUE” for a BSD style kernel event queue file;
or “LINK” for a symbolic link file;
or “MPB” for a multiplexed block file;
or “MPC” for a multiplexed character file;
or “NOFD” for a Linux /proc/<PID>/fd directory that can’t be opened — the
directory path appears in the NAME column, followed by an error message;
or “PAS” for a /proc/as file;
or “PAXV” for a /proc/auxv file;
or “PCRE” for a /proc/cred file;
or “PCTL” for a /proc control file;
or “PCUR” for the current /proc process;
or “PCWD” for a /proc current working directory;
or “PDIR” for a /proc directory;
or “PETY” for a /proc executable type (etype);
or “PFD” for a /proc file descriptor;
or “PFDR” for a /proc file descriptor directory;
or “PFIL” for an executable /proc file;
or “PFPR” for a /proc FP register set;
or “PGD” for a /proc/pagedata file;
or “PGID” for a /proc group notifier file;
or “PIPE” for pipes;
or “PLC” for a /proc/lwpctl file;
or “PLDR” for a /proc/lpw directory;
or “PLDT” for a /proc/ldt file;
or “PLPI” for a /proc/lpsinfo file;
or “PLST” for a /proc/lstatus file;
or “PLU” for a /proc/lusage file;
or “PLWG” for a /proc/gwindows file;
or “PLWI” for a /proc/lwpsinfo file;
or “PLWS” for a /proc/lwpstatus file;
or “PLWU” for a /proc/lwpusage file;
or “PLWX” for a /proc/xregs file;
or “PMAP” for a /proc map file (map);
or “PMEM” for a /proc memory image file;
or “PNTF” for a /proc process notifier file;
or “POBJ” for a /proc/object file;
or “PODR” for a /proc/object directory;
or “POLP” for an old format /proc light weight process file;
or “POPF” for an old format /proc PID file;
or “POPG” for an old format /proc page data file;
or “PORT” for a SYSV named pipe;
or “PREG” for a /proc register file;
or “PRMP” for a /proc/rmap file;
or “PRTD” for a /proc root directory;
or “PSGA” for a /proc/sigact file;
or “PSIN” for a /proc/psinfo file;
or “PSTA” for a /proc status file;
or “PSXSEM” for a POSIX semaphore file;
or “PSXSHM” for a POSIX shared memory file;
or “PUSG” for a /proc/usage file;
or “PW” for a /proc/watch file;
or “PXMP” for a /proc/xmap file;
or “REG” for a regular file;
or “SMT” for a shared memory transport file;
or “STSO” for a stream socket;
or “UNNM” for an unnamed type file;
or “XNAM” for an OpenServer Xenix special file of unknown type;
or “XSEM” for an OpenServer Xenix semaphore file;
or “XSD” for an OpenServer Xenix shared data file;
or the four type number octets if the corresponding name isn’t known.
FILE-ADDR
contains the kernel file structure address when f has been specified to +f;
FCT
contains the file reference count from the kernel file structure when c has been
specified to +f;
FILE-FLAG
when g or G has been specified to +f, this field contains the contents of the f_flag[s]
member of the kernel file structure and the kernel’s per-process open file flags (if
available); `G’ causes them to be displayed in hexadecimal; `g’, as short-hand names;
two lists may be displayed with entries separated by commas, the lists separated by a
semicolon (`;’); the first list may contain short-hand names for f_flag[s] values from
the following table:
AIO  asynchronous I/O (e.g., FAIO)
AP   append
ASYN asynchronous I/O (e.g., FASYNC)
BAS  block, test, and set in use
BKIU block if in use
BL   use block offsets
BSK  block seek
CA   copy avoid
CIO  concurrent I/O
CLON clone
CLRD CL read
CR   create
DF   defer
DFI  defer IND
DFLU data flush
DIR  direct
DLY  delay
DOCL do clone
DSYN data-only integrity
DTY  must be a directory
EVO  event only
EX   open for exec
EXCL exclusive open
FSYN synchronous writes
GCDF defer during unp_gc() (AIX)
GCMK mark during unp_gc() (AIX)
GTTY accessed via /dev/tty
HUP  HUP in progress
KERN kernel
KIOC kernel-issued ioctl
LCK  has lock
LG   large file
MBLK stream message block
MK   mark
MNT  mount
MSYN multiplex synchronization
NATM don't update atime
NB   non-blocking I/O
NBDR no BDRM check
NBIO SYSV non-blocking I/O
NBF  n-buffering in effect
NC   no cache
ND   no delay
NDSY no data synchronization
NET  network
NFLK don't follow links
NMFS NM file system
NOTO disable background stop
NSH  no share
NTTY no controlling TTY
OLRM OLR mirror
PAIO POSIX asynchronous I/O
PP   POSIX pipe
R    read
RC   file and record locking cache
REV  revoked
RSH  shared read
RSYN read synchronization
RW   read and write access
SL   shared lock
SNAP cooked snapshot
SOCK socket
SQSH Sequent shared set on open
SQSV Sequent SVM set on open
SQR  Sequent set repair on open
SQS1 Sequent full shared open
SQS2 Sequent partial shared open
STPI stop I/O
SWR  synchronous read
SYN  file integrity while writing
TCPM avoid TCP collision
TR   truncate
W    write
WKUP parallel I/O synchronization
WTG  parallel I/O synchronization
VH   vhangup pending
VTXT virtual text
XL   exclusive lock
this list of names was derived from F* #define’s in dialect header files <fcntl.h>,
<linux/fs.h>, <sys/fcntl.h>, <sys/fcntlcom.h>, and <sys/file.h>; see the lsof.h header
file for a list showing the correspondence between the above short-hand names and
the header file definitions;
the second list (after the semicolon) may contain short-hand names for kernel per-
process open file flags from this table:
ALLC allocated
BR   the file has been read
BHUP activity stopped by SIGHUP
BW   the file has been written
CLSG closing
CX   close-on-exec (see fcntl(F_SETFD))
LCK  lock was applied
MP   memory-mapped
OPIP open pending - in progress
RSVW reserved wait
SHMT UF_FSHMAT set (AIX)
USE  in use (multi-threaded)
NODE-ID
(or INODE-ADDR for some dialects) contains a unique identifier for the file node
(usually the kernel vnode or inode address, but also occasionally a concatenation of
device and node number) when n has been specified to +f;
DEVICE
contains the device numbers, separated by commas, for a character special, block
special, regular, directory or NFS file;
or “memory” for a memory file system node under Tru64 UNIX;
or the address of the private data area of a Solaris socket stream;
or a kernel reference address that identifies the file (The kernel reference address may
be used for FIFO’s, for example.);
or the base address or device name of a Linux AX.25 socket device.
Usually only the lower thirty-two bits of Tru64 UNIX kernel addresses are displayed.
SIZE, SIZE/OFF, or OFFSET
is the size of the file or the file offset in bytes. A value is displayed in this column
only if it is available. Lsof displays whatever value - size or offset - is appropriate for
the type of the file and the version of lsof.
On some UNIX dialects lsof can’t obtain accurate or consistent file offset information
from its kernel data sources, sometimes just for particular kinds of files (e.g., socket
files.) In other cases, files don’t have true sizes - e.g., sockets, FIFOs, pipes - so lsof
displays for their sizes the content amounts it finds in their kernel buffer descriptors
(e.g., socket buffer size counts or TCP/IP window sizes.) Consult the lsof FAQ (The
FAQ section gives its location.) for more information.
The file size is displayed in decimal; the offset is normally displayed in decimal with
a leading “0t” if it contains 8 digits or less; in hexadecimal with a leading “0x” if it is
longer than 8 digits. (Consult the -o o option description for information on when 8
might default to some other value.)
Thus the leading “0t” and “0x” identify an offset when the column may contain both
a size and an offset (i.e., its title is SIZE/OFF).
If the -o option is specified, lsof always displays the file offset (or nothing if no offset
is available) and labels the column OFFSET. The offset always begins with “0t” or
“0x” as described above.
The lsof user can control the switch from “0t” to “0x” with the -o o option. Consult
its description for more information.
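The size/offset formatting rule above can be sketched as follows. This is only an illustration of the documented rule, not lsof's own code; the eight-digit threshold is the default and may be changed with the -o o option.

```python
# Sketch of the documented SIZE/OFF display rule: an offset of 8 decimal
# digits or fewer is shown as "0t<decimal>"; a longer one as
# "0x<hexadecimal>". Illustrative only; not taken from lsof's source.

def format_offset(offset, threshold=8):
    dec = str(offset)
    if len(dec) <= threshold:
        return "0t" + dec            # short offsets: decimal with "0t"
    return "0x%x" % offset           # long offsets: hexadecimal with "0x"
```

For example, an offset of 1024 would display as “0t1024”, while a nine-digit offset switches to the hexadecimal form.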
If the -s option is specified, lsof always displays the file size (or nothing if no size is
available) and labels the column SIZE. The -o and -s options are mutually exclusive;
they can’t both be specified.
For files that don’t have a fixed size - e.g., don’t reside on a disk device - lsof will
display appropriate information about the current size or position of the file if it is
available in the kernel structures that define the file.
NLINK
contains the file link count when +L has been specified;
NODE
is the node number of a local file;
or the inode number of an NFS file in the server host;
or the Internet protocol type - e.g., “TCP”;
or “STR” for a stream;
or “CCITT” for an HP-UX x.25 socket;
or the IRQ or inode number of a Linux AX.25 socket device.
NAME
is the name of the mount point and file system on which the file resides;
or the name of a file specified in the names option (after any symbolic links have
been resolved);
or the name of a character special or block special device;
or the local and remote Internet addresses of a network file; the local host name or IP
number is followed by a colon (‘:’), the port, “->”, and the two-part remote address;
IP addresses may be reported as numbers or names, depending on the +|-M, -n, and -
P options; colon-separated IPv6 numbers are enclosed in square brackets; IPv4
INADDR_ANY and IPv6 IN6_IS_ADDR_UNSPECIFIED addresses, and zero port
numbers are represented by an asterisk (‘*’); a UDP destination address may be
followed by the amount of time elapsed since the last packet was sent to the
destination; TCP, UDP and UDPLITE remote addresses may be followed by
TCP/TPI information in parentheses - state (e.g., “(ESTABLISHED)”, “(Unbound)”),
queue sizes, and window sizes (not all dialects) - in a fashion similar to what
netstat(1) reports; see the -T option description or the description of the TCP/TPI
field in OUTPUT FOR OTHER PROGRAMS for more information on state,
queue size, and window size;
or the address or name of a UNIX domain socket, possibly including a stream clone
device name, a file system object’s path name, local and foreign kernel addresses,
socket pair information, and a bound vnode address;
or the local and remote mount point names of an NFS file;
or “STR”, followed by the stream name;
or a stream character device name, followed by “->” and the stream name or a list of
stream module names, separated by “->”;
or “STR:” followed by the SCO OpenServer stream device and module names,
separated by “->”;
or system directory name, “ — ”, and as many components of the path name as lsof
can find in the kernel’s name cache for selected dialects (See the KERNEL NAME
CACHE section for more information.);
or “PIPE->”, followed by a Solaris kernel pipe destination address;
or “COMMON:”, followed by the vnode device information structure’s device name,
for a Solaris common vnode;
or the address family, followed by a slash (`/’), followed by fourteen comma-
separated bytes of a non-Internet raw socket address;
or the HP-UX x.25 local address, followed by the virtual connection number (if any),
followed by the remote address (if any);
or “(dead)” for disassociated Tru64 UNIX files - typically terminal files that have
been flagged with the TIOCNOTTY ioctl and closed by daemons;
or “rd=<offset>” and “wr=<offset>” for the values of the read and write offsets of a
FIFO;
or “clone n:/dev/event” for SCO OpenServer file clones of the /dev/event device,
where n is the minor device number of the file;
or “(socketpair: n)” for a Solaris 2.6, 8, 9 or 10 UNIX domain socket, created by the
socketpair(3N) network function;
or “no PCB” for socket files that do not have a protocol block associated with them,
optionally followed by “, CANTSENDMORE” if sending on the socket has been
disabled, or “, CANTRCVMORE” if receiving on the socket has been disabled (e.g.,
by the shutdown(2) function);
or the local and remote addresses of a Linux IPX socket file in the form <net>:
[<node>:]<port>, followed in parentheses by the transmit and receive queue sizes,
and the connection state;
or “dgram” or “stream” for the type UnixWare 7.1.1 and above in-kernel UNIX
domain sockets, followed by a colon (‘:’) and the local path name when available,
followed by “->” and the remote path name or kernel socket address in hexadecimal
when available;
or the association value, association index, endpoint value, local address, local port,
remote address and remote port for Linux SCTP sockets;
or “protocol: ” followed by the Linux socket’s protocol attribute.
For dialects that support a “namefs” file system, allowing one file to be attached to
another with fattach(3C), lsof will add “(FA:<address1><direction><address2>)” to the
NAME column. <address1> and <address2> are hexadecimal vnode addresses.
<direction> will be “<-” if <address2> has been fattach’ed to this vnode whose address is
<address1>; and “->” if <address1>, the vnode address of this vnode, has been fattach’ed
to <address2>. <address1> may be omitted if it already appears in the DEVICE column.
Lsof may add two parenthetical notes to the NAME column for open Solaris 10 files: “(?)”
if lsof considers the path name of questionable accuracy; and “(deleted)” if the -X option
has been specified and lsof detects the open file’s path name has been deleted. Consult the
lsof FAQ (The FAQ section gives its location.) for more information on these NAME
column additions.
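For scripts that post-process lsof output, the “local->remote (STATE)” shape of a network file's NAME entry described above can be split apart as in this sketch. The helper parse_net_name is hypothetical; actual NAME formats vary by dialect and by the +|-M, -n, and -P options, so treat it as illustrative only.

```python
# Hypothetical helper: split a network-file NAME value such as
# "10.0.0.5:22->10.0.0.9:51234 (ESTABLISHED)" into local address,
# remote address, and TCP/TPI state. Not part of lsof itself.

def parse_net_name(name):
    state = None
    # TCP/TPI state, if present, trails in parentheses.
    if name.endswith(")") and " (" in name:
        name, state = name.rsplit(" (", 1)
        state = state[:-1]
    # "->" separates the local and remote addresses; a listening or
    # unconnected socket has no remote part.
    local, _, remote = name.partition("->")
    return {"local": local, "remote": remote or None, "state": state}
```

A listening socket such as “*:111” yields only a local address, with no remote part or state.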
› LOCKS
Lsof can’t adequately report the wide variety of UNIX dialect file locks in a single
character. What it reports in a single character is a compromise between the
information it finds in the kernel and the limitations of the reporting format.
Moreover, when a process holds several byte level locks on a file, lsof only reports
the status of the first lock it encounters. If it is a byte level lock, then the lock
character will be reported in lower case - i.e., `r’, `w’, or `x’ - rather than the upper
case equivalent reported for a full file lock.
Generally lsof can only report on locks held by local processes on local files. When a
local process sets a lock on a remotely mounted (e.g., NFS) file, the remote server
host usually records the lock state. One exception is Solaris - at some patch levels of
2.3, and in all versions above 2.4, the Solaris kernel records information on remote
locks in local structures.
Lsof has trouble reporting locks for some UNIX dialects. Consult the BUGS section
of this manual page or the lsof FAQ (The FAQ section gives its location.) for more
information.
› OUTPUT FOR OTHER PROGRAMS
When the -F option is specified, lsof produces output that is suitable for processing
by another program - e.g., an awk or Perl script, or a C program.
Each unit of information is output in a field that is identified with a leading character
and terminated by a NL (012) (or a NUL (000) if the 0 (zero) field identifier character
is specified.) The data of the field follows immediately after the field identification
character and extends to the field terminator.
It is possible to think of field output as process and file sets. A process set begins
with a field whose identifier is `p’ (for process IDentifier (PID)). It extends to the
beginning of the next PID field or the beginning of the first file set of the process,
whichever comes first. Included in the process set are fields that identify the
command, the process group IDentification (PGID) number, the task (thread) ID
(TID), and the user ID (UID) number or login name.
A file set begins with a field whose identifier is `f’ (for file descriptor). It is followed
by lines that describe the file’s access mode, lock state, type, device, size, offset,
inode, protocol, name and stream module names. It extends to the beginning of the
next file or process set, whichever comes first.
When the NUL (000) field terminator has been selected with the 0 (zero) field
identifier character, lsof ends each process and file set with a NL (012) character.
Lsof always produces one field, the PID (`p’) field. All other fields may be declared
optionally in the field identifier character list that follows the -F option. When a field
selection character identifies an item lsof does not normally list - e.g., PPID, selected
with -R - specification of the field character - e.g., “-FR” - also selects the listing of
the item.
It is entirely possible to select a set of fields that cannot easily be parsed - e.g., if the
field descriptor field is not selected, it may be difficult to identify file sets. To help
you avoid this difficulty, lsof supports the -F option; it selects the output of all fields
with NL terminators (the -F0 option pair selects the output of all fields with NUL
terminators). For compatibility reasons neither -F nor -F0 select the raw device field.
These are the fields that lsof will produce. The single character listed first is the field
identifier.
a    file access mode
c    process command name (all characters from proc or user structure)
C    file structure share count
d    file's device character code
D    file's major/minor device number (0x<hexadecimal>)
f    file descriptor
F    file structure address (0x<hexadecimal>)
G    file flaGs (0x<hexadecimal>; names if +fg follows)
g    process group ID
i    file's inode number
K    tasK ID
k    link count
l    file's lock status
L    process login name
m    marker between repeated output
n    file name, comment, Internet address
N    node identifier (0x<hexadecimal>)
o    file's offset (decimal)
p    process ID (always selected)
P    protocol name
r    raw device number (0x<hexadecimal>)
R    parent process ID
s    file's size (decimal)
S    file's stream identification
t    file's type
T    TCP/TPI information, identified by prefixes (the `=' is part of the prefix):
     QR=<read queue size>
     QS=<send queue size>
     SO=<socket options and values> (not all dialects)
     SS=<socket states> (not all dialects)
     ST=<connection state>
     TF=<TCP flags and values> (not all dialects)
     WR=<window read size> (not all dialects)
     WW=<window write size> (not all dialects)
     (TCP/TPI information isn't reported for all supported UNIX dialects.
     The -h or -? help output for the -T option will show what TCP/TPI
     reporting can be requested.)
u    process user ID
z    Solaris 10 and higher zone name
Z    SELinux security context (inhibited when SELinux is disabled)
0    use NUL field terminator character in place of NL
1-9  dialect-specific field identifiers (The output of -F? identifies the
     information to be found in dialect-specific fields.)
You can get on-line help information on these characters and their descriptions by
specifying the -F? option pair. (Escape the `?’ character as your shell requires.)
Additional information on field content can be found in the OUTPUT section.
As an example, “-F pcfn” will select the process ID (`p’), command name (`c’), file
descriptor (`f’) and file name (`n’) fields with an NL field terminator character; “-F
pcfn0” selects the same output with a NUL (000) field terminator character.
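The process-set and file-set structure described above can be consumed by a short script. The sketch below is a minimal illustrative parser for “-F pcfn” style output (NL terminators), not one of the sample scripts shipped in the lsof distribution.

```python
# Minimal sketch: parse `lsof -F pcfn` output into per-process file lists.
# Each line starts with a field identifier character (p, c, f, n) followed
# by the field data; a `p` line begins a new process set and an `f` line a
# new file set, as described in OUTPUT FOR OTHER PROGRAMS.

def parse_lsof_fields(text):
    processes = []
    proc = None
    file_ = None
    for line in text.splitlines():
        if not line:
            continue
        tag, value = line[0], line[1:]
        if tag == "p":                  # new process set
            proc = {"pid": int(value), "cmd": None, "files": []}
            processes.append(proc)
            file_ = None
        elif tag == "c":                # command name for current process
            proc["cmd"] = value
        elif tag == "f":                # new file set
            file_ = {"fd": value, "name": None}
            proc["files"].append(file_)
        elif tag == "n":                # name for current file set
            file_["name"] = value
    return processes

sample = "p1234\ncbash\nfcwd\nn/home/abe\nf1\nn/dev/pts/0\n"
procs = parse_lsof_fields(sample)
```

The same structure applies to NUL-terminated (“-F pcfn0”) output after splitting on NUL instead of NL.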
Lsof doesn’t produce all fields for every process or file set, only those that are
available. Some fields are mutually exclusive: file device characters and file
major/minor device numbers; file inode number and protocol name; file name and
stream identification; file size and offset. One or the other member of these mutually
exclusive sets will appear in field output, but not both.
Normally lsof ends each field with a NL (012) character. The 0 (zero) field identifier
character may be specified to change the field terminator character to a NUL (000). A
NUL terminator may be easier to process with xargs (1), for example, or with
programs whose quoting mechanisms may not easily cope with the range of
characters in the field output. When the NUL field terminator is in use, lsof ends each
process and file set with a NL (012).
Three aids to producing programs that can process lsof field output are included in
the lsof distribution. The first is a C header file, lsof_fields.h, that contains symbols
for the field identification characters, indexes for storing them in a table, and
explanation strings that may be compiled into programs. Lsof uses this header file.
The second aid is a set of sample scripts that process field output, written in awk,
Perl 4, and Perl 5. They’re located in the scripts subdirectory of the lsof distribution.
The third aid is the C library used for the lsof test suite. The test suite is written in C
and uses field output to validate the correct operation of lsof. The library can be
found in the tests/LTlib.c file of the lsof distribution. The library uses the first aid, the
lsof_fields.h header file.
› BLOCKS AND TIMEOUTS
Lsof can be blocked by some kernel functions that it uses - lstat(2), readlink(2), and
stat(2). These functions are stalled in the kernel, for example, when the hosts where
mounted NFS file systems reside become inaccessible.
Lsof attempts to break these blocks with timers and child processes, but the
techniques are not wholly reliable. When lsof does manage to break a block, it will
report the break with an error message. The messages may be suppressed with the -t
and -w options.
The default timeout value may be displayed with the -h or -? option, and it may be
changed with the -S [t] option. The minimum for t is two seconds, but you should
avoid small values, since slow system responsiveness can cause short timeouts to
expire unexpectedly and perhaps stop lsof before it can produce any output.
When lsof has to break a block during its access of mounted file system information,
it normally continues, although with less information available to display about open
files.
Lsof can also be directed to avoid the protection of timers and child processes when
using the kernel functions that might block by specifying the -O option. While this
will allow lsof to start up with less overhead, it exposes lsof completely to the kernel
situations that might block it. Use this option cautiously.
› AVOIDING KERNEL BLOCKS
You can use the -b option to tell lsof to avoid using kernel functions that would
block. Some cautions apply.
First, using this option usually requires that your system supply alternate device
numbers in place of the device numbers that lsof would normally obtain with the
lstat(2) and stat(2) kernel functions. See the ALTERNATE DEVICE NUMBERS
section for more information on alternate device numbers.
Second, you can’t specify names for lsof to locate unless they’re file system names.
This is because lsof needs to know the device and inode numbers of files listed with
names in the lsof options, and the -b option prevents lsof from obtaining them.
Moreover, since lsof only has device numbers for the file systems that have
alternates, its ability to locate files on file systems depends completely on the
availability and accuracy of the alternates. If no alternates are available, or if they’re
incorrect, lsof won’t be able to locate files on the named file systems.
Third, if the names of your file system directories that lsof obtains from your
system’s mount table are symbolic links, lsof won’t be able to resolve the links. This
is because the -b option causes lsof to avoid the kernel readlink(2) function it uses to
resolve symbolic links.
Finally, using the -b option causes lsof to issue warning messages when it needs to
use the kernel functions that the -b option directs it to avoid. You can suppress these
messages by specifying the -w option, but if you do, you won’t see the alternate
device numbers reported in the warning messages.
› ALTERNATE DEVICE NUMBERS
On some dialects, when lsof has to break a block because it can’t get information
about a mounted file system via the lstat(2) and stat(2) kernel functions, or because
you specified the -b option, lsof can obtain some of the information it needs - the
device number and possibly the file system type - from the system mount table.
When that is possible, lsof will report the device number it obtained. (You can
suppress the report by specifying the -w option.)
You can assist this process if your mount table is supported with an /etc/mtab or
/etc/mnttab file that contains an options field by adding a “dev=xxxx” field for mount
points that do not have one in their options strings. Note: you must be able to edit the
file - i.e., some mount tables like recent Solaris /etc/mnttab or Linux /proc/mounts are
read-only and can’t be modified.
You may also be able to supply device numbers using the +m and +m m options,
provided they are supported by your dialect. Check the output of lsof’s -h or -?
options to see if the +m and +m m options are available.
The “xxxx” portion of the field is the hexadecimal value of the file system’s device
number. (Consult the st_dev field of the output of the lstat(2) and stat(2) functions for
the appropriate values for your file systems.) Here’s an example from a Sun Solaris
2.6 /etc/mnttab for a file system remotely mounted via NFS:
nfs ignore,noquota,dev=2a40001
Look for standard error file warning messages that begin “assuming ‘dev=xxxx’ from
…”.
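The st_dev-to-hexadecimal computation mentioned above can be sketched as follows. This is an illustration only; whether the resulting value matches your mount table's encoding is dialect-dependent, and dev_option_field is a hypothetical helper, not an lsof tool.

```python
import os

# Sketch: compute a hexadecimal "dev=xxxx" options-field value for a mount
# point from the st_dev member returned by stat(2), as described above.

def dev_option_field(path):
    return "dev=%x" % os.stat(path).st_dev

print(dev_option_field("/"))
```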
› KERNEL NAME CACHE
Lsof is able to examine the kernel’s name cache or use other kernel facilities (e.g., the
ADVFS 4.x tag_to_path() function under Tru64 UNIX) on some dialects for most
file system types, excluding AFS, and extract recently used path name components
from it. (AFS file system path lookups don’t use the kernel’s name cache; some
Solaris VxFS file system operations apparently don’t use it, either.)
Lsof reports the complete paths it finds in the NAME column. If lsof can’t report all
components in a path, it reports in the NAME column the file system name, followed
by a space, two `-‘ characters, another space, and the name components it has
located, separated by the `/’ character.
When lsof is run in repeat mode - i.e., with the -r option specified - the extent to
which it can report path name components for the same file may vary from cycle to
cycle. That’s because other running processes can cause the kernel to remove entries
from its name cache and replace them with others.
Lsof’s use of the kernel name cache to identify the paths of files can lead it to report
incorrect components under some circumstances. This can happen when the kernel
name cache uses device and node number as a key (e.g., SCO OpenServer) and a key
on a rapidly changing file system is reused. If the UNIX dialect’s kernel doesn’t
purge the name cache entry for a file when it is unlinked, lsof may find a reference to
the wrong entry in the cache. The lsof FAQ (The FAQ section gives its location.) has
more information on this situation.
Lsof can report path name components for these dialects:
FreeBSD HP-UX Linux NetBSD NEXTSTEP OpenBSD OPENSTEP SCO OpenServer
SCO|Caldera UnixWare Solaris Tru64 UNIX
If you want to know why lsof can’t report path name components for some dialects,
see the lsof FAQ (The FAQ section gives its location.)
› DEVICE CACHE FILE
Examining all members of the /dev (or /devices) node tree with stat(2) functions can
be time consuming. What’s more, the information that lsof needs - device number,
inode number, and path - rarely changes.
Consequently, lsof normally maintains an ASCII text file of cached /dev (or /devices)
information (exception: the /proc-based Linux lsof where it’s not needed.) The local
system administrator who builds lsof can control the way the device cache file path is
formed, selecting from these options:
Path from the -D option;
Path from an environment variable;
System-wide path;
Personal path (the default);
Personal path, modified by an environment variable.
Consult the output of the -h, -D?, or -? help options for the current state of device
cache support. The help output lists the default read-mode device cache file path that
is in effect for the current invocation of lsof. The -D? option output lists the read-only
and write device cache file paths, the names of any applicable environment variables,
and the personal device cache path format.
Lsof can detect that the current device cache file has been accidentally or maliciously
modified by integrity checks, including the computation and verification of a sixteen
bit Cyclic Redundancy Check (CRC) sum on the file’s contents. When lsof senses
something wrong with the file, it issues a warning and attempts to remove the current
cache file and create a new copy, but only to a path that the process can legitimately
write.
The path from which a lsof process may attempt to read a device cache file may not
be the same as the path to which it can legitimately write. Thus when lsof senses that
it needs to update the device cache file, it may choose a different path for writing it
from the path from which it read an incorrect or outdated version.
If available, the -Dr option will inhibit the writing of a new device cache file. (It’s
always available when specified without a path name argument.)
When a new device is added to the system, the device cache file may need to be
recreated. Since lsof compares the mtime of the device cache file with the mtime and
ctime of the /dev (or /devices) directory, it usually detects that a new device has been
added; in that case lsof issues a warning message and attempts to rebuild the device
cache file.
Whenever lsof writes a device cache file, it sets its ownership to the real UID of the
executing process, and its permission modes to 0600, thus restricting its reading and
writing to the file’s owner.
› LSOF PERMISSIONS THAT AFFECT DEVICE CACHE FILE
ACCESS
Two permissions of the lsof executable affect its ability to access device cache files.
The permissions are set by the local system administrator when lsof is installed.
The first and rarer permission is setuid-root. It comes into effect when lsof is
executed; its effective UID is then root, while its real (i.e., that of the logged-on user)
UID is not. The lsof distribution recommends that versions for these dialects run
setuid-root.
HP-UX 11.11 and 11.23 Linux
The second and more common permission is setgid. It comes into effect when the
effective group IDentification number (GID) of the lsof process is set to one that can
access kernel memory devices - e.g., “kmem”, “sys”, or “system”.
An lsof process that has setgid permission usually surrenders the permission after it
has accessed the kernel memory devices. When it does that, lsof can allow more
liberal device cache path formations. The lsof distribution recommends that versions
for these dialects run setgid and be allowed to surrender setgid permission.
AIX 5.[12] and 5.3-ML1
Apple Darwin 7.x Power Macintosh systems
FreeBSD 4.x, 4.1x, 5.x and [6789].x for x86-based systems
FreeBSD 5.x and [6789].x for Alpha, AMD64 and Sparc64-based systems
HP-UX 11.00
NetBSD 1.[456], 2.x and 3.x for Alpha, x86, and SPARC-based systems
NEXTSTEP 3.[13] for NEXTSTEP architectures
OpenBSD 2.[89] and 3.[0-9] for x86-based systems
OPENSTEP 4.x
SCO OpenServer Release 5.0.6 for x86-based systems
SCO|Caldera UnixWare 7.1.4 for x86-based systems
Solaris 2.6, 8, 9 and 10
Tru64 UNIX 5.1
(Note: lsof for AIX 5L and above needs setuid-root permission if its -X option is
used.)
Lsof for these dialects does not support a device cache, so the permissions given to
the executable don’t apply to the device cache file.
Linux
› DEVICE CACHE FILE PATH FROM THE -D OPTION
The -D option provides limited means for specifying the device cache file path. Its ?
function will report the read-only and write device cache file paths that lsof will use.
When the -D b, r, and u functions are available, you can use them to request that the
cache file be built in a specific location (b[path]); read but not rebuilt (r[path]); or
read and rebuilt (u[path]). The b, r, and u functions are restricted under some
conditions. They are restricted when the lsof process is setuid-root. The path
specified with the r function is always read-only, even when it is available.
The b, r, and u functions are also restricted when the lsof process runs setgid and lsof
doesn’t surrender the setgid permission. (See the LSOF PERMISSIONS THAT
AFFECT DEVICE CACHE FILE ACCESS section for a list of implementations
that normally don’t surrender their setgid permission.)
A further -D function, i (for ignore), is always available.
When available, the b function tells lsof to read device information from the kernel
with the stat(2) function and build a device cache file at the indicated path.
When available, the r function tells lsof to read the device cache file, but not update
it. When a path argument accompanies -Dr, it names the device cache file path. The r
function is always available when it is specified without a path name argument. If
lsof is not running setuid-root and surrenders its setgid permission, a path name
argument may accompany the r function.
When available, the u function tells lsof to attempt to read and use the device cache
file. If it can’t read the file, or if it finds the contents of the file incorrect or outdated,
it will read information from the kernel, and attempt to write an updated version of
the device cache file, but only to a path it considers legitimate for the lsof process
effective and real UIDs.
› DEVICE CACHE PATH FROM AN ENVIRONMENT VARIABLE
Lsof’s second choice for the device cache file is the contents of the
LSOFDEVCACHE environment variable. It avoids this choice if the lsof process is
setuid-root, or the real UID of the process is root.
A further restriction applies to a device cache file path taken from the
LSOFDEVCACHE environment variable: lsof will not write a device cache file to
the path if the lsof process doesn’t surrender its setgid permission. (See the LSOF
PERMISSIONS THAT AFFECT DEVICE CACHE FILE ACCESS section for
information on implementations that don’t surrender their setgid permission.)
The local system administrator can disable the use of the LSOFDEVCACHE
environment variable or change its name when building lsof. Consult the output of -
D? for the environment variable’s name.
› SYSTEM-WIDE DEVICE CACHE PATH
The local system administrator may choose to have a system-wide device cache file
when building lsof. That file will generally be constructed by a special system
administration procedure when the system is booted or when the contents of /dev (or
/devices) change. If defined, it is lsof’s third device cache file path choice.
You can tell that a system-wide device cache file is in effect for your local installation
by examining the lsof help option output - i.e., the output from the -h or -? option.
Lsof will never write to the system-wide device cache file path by default. It must be
explicitly named with a -D function in a root-owned procedure. Once the file has
been written, the procedure must change its permission modes to 0644 (owner-read
and owner-write, group-read, and other-read).
› PERSONAL DEVICE CACHE PATH (DEFAULT)
The default device cache file path of the lsof distribution is one recorded in the home
directory of the real UID that executes lsof. Added to the home directory is a second
path component of the form .lsof_hostname.
This is lsof’s fourth device cache file path choice, and is usually the default. If a
system-wide device cache file path was defined when lsof was built, this fourth
choice will be applied when lsof can’t find the system-wide device cache file. This is
the only time lsof uses two paths when reading the device cache file.
The hostname part of the second component is the base name of the executing host,
as returned by gethostname(2). The base name is defined to be the characters
preceding the first ‘.’ in the gethostname(2) output, or all of the gethostname(2)
output if it contains no ‘.’.
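As a sketch, the base-name rule above can be expressed with ordinary shell parameter expansion; the host name shown is hypothetical, and `hostname` prints the same value as gethostname(2) on most systems:

```shell
# Base name = characters preceding the first '.' of the gethostname(2)
# output (the host name below is a hypothetical example).
host=vic.cc.purdue.edu
base=${host%%.*}        # strip the first '.' and everything after it
echo "$base"            # prints "vic"
```

If the name contains no ‘.’, the expansion leaves it unchanged, matching the rule above.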
The device cache file belongs to the user ID and is readable and writable by the user
ID alone - i.e., its modes are 0600. Each distinct real user ID on a given host that
executes lsof has a distinct device cache file. The hostname part of the path
distinguishes device cache files in an NFS-mounted home directory into which
device cache files are written from several different hosts.
The personal device cache file path formed by this method represents a device cache
file that lsof will attempt to read, and will attempt to write should it not exist or
should its contents be incorrect or outdated.
The -Dr option without a path name argument will inhibit the writing of a new device
cache file.
The -D? option will list the format specification for constructing the personal device
cache file. The conversions used in the format specification are described in the
00DCACHE file of the lsof distribution.
› MODIFIED PERSONAL DEVICE CACHE PATH
If this option is defined by the local system administrator when lsof is built, the
LSOFPERSDCPATH environment variable contents may be used to add a component
of the personal device cache file path.
The LSOFPERSDCPATH variable contents are inserted in the path at the place
marked by the local system administrator with the “%p” conversion in the
HASPERSDC format specification of the dialect’s machine.h header file. (It’s placed
right after the home directory in the default lsof distribution.)
Thus, for example, if LSOFPERSDCPATH contains “LSOF”, the home directory is
“/Homes/abe”, the host name is “vic.cc.purdue.edu”, and the HASPERSDC format
is the default (“%h/%p.lsof_%L”), the modified personal device cache file path is:
/Homes/abe/LSOF/.lsof_vic
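The expansion of that format can be sketched in shell. This assumes lsof itself supplies the trailing path separator for the “%p” insertion, so the variable below carries a trailing slash; the values are the example values above:

```shell
# Hypothetical expansion of the default HASPERSDC format "%h/%p.lsof_%L":
# %h = home directory, %p = LSOFPERSDCPATH insertion, %L = host base name.
h=/Homes/abe
p=LSOF/            # assumption: lsof appends the '/' separator for "%p"
L=vic
echo "${h}/${p}.lsof_${L}"    # prints "/Homes/abe/LSOF/.lsof_vic"
```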
The warning message may be suppressed with the -w option. It may also have been
suppressed by the system administrator when lsof was compiled by the setting of the
WARNDEVACCESS definition. In this case, the output from the help options will
include the message:
Inaccessible /dev warnings are disabled.
Inaccessible device warning messages usually disappear after lsof has created a
working device cache file.
› EXAMPLES
For a more extensive set of examples, documented more fully, see the
00QUICKSTART file of the lsof distribution.
To list all open files, use:
lsof
To list all open Internet, X.25 (HP-UX), and UNIX domain files, use:
lsof -i -U
To list all open IPv4 network files in use by the process whose PID is 1234, use:
lsof -i 4 -a -p 1234
Presuming the UNIX dialect supports IPv6, to list only open IPv6 network files, use:
lsof -i 6
To list all files using any protocol on ports 513, 514, or 515 of host
wonderland.cc.purdue.edu, use:
lsof -i @wonderland.cc.purdue.edu:513-515
To list all files using any protocol on any port of mace.cc.purdue.edu (cc.purdue.edu is the
default domain), use:
lsof -i @mace
To list all open files for login name “abe”, or user ID 1234, or process 456, or process 123,
or process 789, use:
lsof -p 456,123,789 -u 1234,abe
To list all open files on device /dev/hd4, use:
lsof /dev/hd4
To find the process that has /u/abe/foo open, use:
lsof /u/abe/foo
To send a SIGHUP to the processes that have /u/abe/bar open, use:
kill -HUP `lsof -t /u/abe/bar`
To find any open file, including an open UNIX domain socket file, with the name /dev/log,
use:
lsof /dev/log
To find processes with open files on the NFS file system named /nfs/mount/point whose
server is inaccessible, and presuming your mount table supplies the device number for
/nfs/mount/point, use:
lsof -b /nfs/mount/point
To do the preceding search with warning messages suppressed, use:
lsof -bw /nfs/mount/point
To ignore the device cache file, use:
lsof -Di
To obtain PID and command name field output for each process, file descriptor, file device
number, and file inode number for each file of each process, use:
lsof -FpcfDi
To list the files at descriptors 1 and 3 of every process running the lsof command for login
ID “abe” every 10 seconds, use:
lsof -c lsof -a -d 1 -d 3 -u abe -r10
To list the current working directory of processes running a command that is exactly four
characters long and has an ‘o’ or ‘O’ in character three, use this regular expression form of
the -c option:
lsof -c /^..o.$/i -a -d cwd
To find an IP version 4 socket file by its associated numeric dot-form address, use:
lsof -i@128.210.15.17
To find an IP version 6 socket file (when the UNIX dialect supports IPv6) by its associated
numeric colon-form address, use:
lsof -i@[0:1:2:3:4:5:6:7]
To find an IP version 6 socket file (when the UNIX dialect supports IPv6) by an associated
numeric colon-form address that has a run of zeroes in it - e.g., the loop-back address -
use:
lsof -i@[::1]
To obtain a repeat mode marker line that contains the current time, use:
lsof -rm====%T====
To add spaces to the previous marker line, use:
lsof -r "m==== %T ===="
› BUGS
Since lsof reads kernel memory in its search for open files, rapid changes in kernel
memory may produce unpredictable results.
When a file has multiple record locks, the lock status character (following the file
descriptor) is derived from a test of the first lock structure, not from any combination
of the individual record locks that might be described by multiple lock structures.
Lsof can’t search for files with restrictive access permissions by name unless it is
installed with root set-UID permission. Otherwise it is limited to searching for files to
which its user or its set-GID group (if any) has access permission.
The display of the destination address of a raw socket (e.g., for ping) depends on the
UNIX operating system. Some dialects store the destination address in the raw
socket’s protocol control block, some do not.
Lsof can’t always represent Solaris device numbers in the same way that ls(1) does.
For example, the major and minor device numbers that the lstat(2) and stat(2)
functions report for the directory on which CD-ROM files are mounted (typically
/cdrom) are not the same as the ones that it reports for the device on which CD-ROM
files are mounted (typically /dev/sr0). (Lsof reports the directory numbers.)
The support for /proc file systems is available only for BSD and Tru64 UNIX
dialects, Linux, and dialects derived from SYSV R4 - e.g., FreeBSD, NetBSD,
OpenBSD, Solaris, UnixWare.
Some /proc file items - device number, inode number, and file size - are unavailable
in some dialects. Searching for files in a /proc file system may require that the full
path name be specified.
No text (txt) file descriptors are displayed for Linux processes. All entries for files
other than the current working directory, the root directory, and numerical file
descriptors are labeled mem descriptors.
Lsof can’t search for Tru64 UNIX named pipes by name, because their kernel
implementation of lstat(2) returns an improper device number for a named pipe.
Lsof can’t report fully or correctly on HP-UX 9.01, 10.20, and 11.00 locks because of
insufficient access to kernel data or errors in the kernel data. See the lsof FAQ (The
FAQ section gives its location.) for details.
The AIX SMT file type is a fabrication. It’s made up for file structures whose type
(15) isn’t defined in the AIX /usr/include/sys/file.h header file. One way to create
such file structures is to run X clients with the DISPLAY variable set to “:0.0”.
The +|-f[cfgGn] option is not supported under /proc-based Linux lsof, because it
doesn’t read kernel structures from kernel memory.
› ENVIRONMENT
Lsof may access these environment variables.
LANG
defines a language locale. See setlocale(3) for the names of other variables that can
be used in place of LANG - e.g., LC_ALL, LC_CTYPE, etc.
LSOFDEVCACHE
defines the path to a device cache file. See the DEVICE CACHE PATH FROM AN
ENVIRONMENT VARIABLE section for more information.
LSOFPERSDCPATH
defines the middle component of a modified personal device cache file path. See the
MODIFIED PERSONAL DEVICE CACHE PATH section for more information.
› FAQ
Frequently-asked questions and their answers (an FAQ) are available in the 00FAQ
file of the lsof distribution.
That file is also available via anonymous ftp from lsof.itap.purdue.edu at
pub/tools/unix/lsof/FAQ. The URL is:
ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/FAQ
› FILES
/dev/kmem
kernel virtual memory device
/dev/mem
physical memory device
/dev/swap
system paging device
.lsof_hostname
lsof’s device cache file (The suffix, hostname, is the first component of the host’s
name returned by gethostname(2).)
› AUTHORS
Lsof was written by Victor A. Abell <abe@purdue.edu> of Purdue University. Many
others have contributed to lsof. They’re listed in the 00CREDITS file of the lsof
distribution.
› DISTRIBUTION
The latest distribution of lsof is available via anonymous ftp from the host
lsof.itap.purdue.edu. You’ll find the lsof distribution in the pub/tools/unix/lsof
directory.
You can also use this URL:
ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof
Lsof is also mirrored elsewhere. When you access lsof.itap.purdue.edu and change to its
pub/tools/unix/lsof directory, you’ll be given a list of some mirror sites. The
pub/tools/unix/lsof directory also contains a more complete list in its mirrors file. Use
mirrors with caution - not all mirrors always have the latest lsof revision.
Some pre-compiled Lsof executables are available on lsof.itap.purdue.edu, but their use is
discouraged - it’s better that you build your own from the sources. If you feel you must
use a pre-compiled executable, please read the cautions that appear in the README files
of the pub/tools/unix/lsof/binaries subdirectories and in the 00* files of the distribution.
More information on the lsof distribution can be found in its README.lsof_<version>
file. If you intend to get the lsof distribution and build it, please read
README.lsof_<version> and the other 00* files of the distribution before sending
questions to the author.
› SEE ALSO
Not all the following manual pages may exist in every UNIX dialect to which lsof has
been ported.
access(2), awk(1), crash(1), fattach(3C), ff(1), fstat(8), fuser(1), gethostname(2),
isprint(3), kill(1), localtime(3), lstat(2), modload(8), mount(8), netstat(1), ofiles(8L),
perl(1), ps(1), readlink(2), setlocale(3), stat(2), strftime(3), time(2), uname(1).
lspci
› NAME
lspci - list all PCI devices
› SYNOPSIS
lspci [options]
› DESCRIPTION
lspci is a utility for displaying information about PCI buses in the system and devices
connected to them.
By default, it shows a brief list of devices. Use the options described below to request
either a more verbose output or output intended for parsing by other programs.
If you are going to report bugs in PCI device drivers or in lspci itself, please include
the output of “lspci -vvx” or, even better, “lspci -vvxxx” (however, see below for
possible caveats).
Some parts of the output, especially in the highly verbose modes, are probably
intelligible only to experienced PCI hackers. For exact definitions of the fields,
please consult either the PCI specifications or the header.h and
/usr/include/linux/pci.h include files.
Access to some parts of the PCI configuration space is restricted to root on many
operating systems, so the features of lspci available to normal users are limited.
However, lspci tries its best to display as much as is available and marks all other
information with <access denied> text.
› OPTIONS
Basic display modes
-m
Dump PCI device data in a backward-compatible machine readable form. See below
for details.
-mm
Dump PCI device data in a machine readable form for easy parsing by scripts. See
below for details.
-t
Show a tree-like diagram containing all buses, bridges, devices and connections
between them.
Display options
-v
Be verbose and display detailed information about all devices.
-vv
Be very verbose and display more details. This level includes everything deemed
useful.
-vvv
Be even more verbose and display everything we are able to parse, even if it doesn’t
look interesting at all (e.g., undefined memory regions).
-k
Show kernel drivers handling each device and also kernel modules capable of
handling it. Turned on by default when -v is given in the normal mode of output.
(Currently works only on Linux with kernel 2.6 or newer.)
-x
Show hexadecimal dump of the standard part of the configuration space (the first 64
bytes or 128 bytes for CardBus bridges).
-xxx
Show hexadecimal dump of the whole PCI configuration space. It is available only to
root as several PCI devices crash when you try to read some parts of the config space
(this behavior probably doesn’t violate the PCI standard, but it’s at least very stupid).
However, such devices are rare, so you needn’t worry much.
-xxxx
Show hexadecimal dump of the extended (4096-byte) PCI configuration space
available on PCI-X 2.0 and PCI Express buses.
-b
Bus-centric view. Show all IRQ numbers and addresses as seen by the cards on the
PCI bus instead of as seen by the kernel.
-D
Always show PCI domain numbers. By default, lspci suppresses them on machines
which have only domain 0.
-n
Show PCI vendor and device codes as numbers instead of looking them up in the PCI
ID list.
-nn
Show PCI vendor and device codes as both numbers and names.
-q
Use DNS to query the central PCI ID database if a device is not found in the local
pci.ids file. If the DNS query succeeds, the result is cached in ~/.pciids-cache and it
is recognized in subsequent runs even if -q is not given any more. Please use this
switch inside automated scripts only with caution to avoid overloading the database
servers.
-qq
Same as -q, but the local cache is reset.
-Q
Query the central database even for entries which are recognized locally. Use this if
you suspect that the displayed entry is wrong.
-s [[[[<domain>]:]<bus>]:][<slot>][.[<func>]]
Show only devices in the specified domain (in case your machine has several host
bridges, they can either share a common bus number space or each of them can
address a PCI domain of its own; domains are numbered from 0 to ffff), bus (0 to ff),
slot (0 to 1f) and function (0 to 7). Each component of the device address can be
omitted or set to "*", both meaning "any value". All numbers are hexadecimal. E.g.,
"0:" means all devices on bus 0, "0" means all functions of device 0 on any bus,
"0.3" selects the third function of device 0 on all buses and ".4" shows only the fourth
function of each device.
-d [<vendor>]:[<device>]
Show only devices with the specified vendor and device ID. Both IDs are given in
hexadecimal and may be omitted or given as "*", both meaning "any value".
Other options
-i <file>
Use <file> as the PCI ID list instead of /usr/share/hwdata/pci.ids.
-p <file>
Use <file> as the map of PCI ID’s handled by kernel modules. By default, lspci uses
/lib/modules/kernel_version/modules.pcimap. Applies only to Linux systems with
recent enough module tools.
-M
Invoke bus mapping mode which performs a thorough scan of all PCI devices,
including those behind misconfigured bridges, etc. This option gives meaningful
results only with a direct hardware access mode, which usually requires root
privileges. Please note that the bus mapper only scans PCI domain 0.
--version
Shows the lspci version. This option should be used stand-alone.
The PCI utilities use the PCI library to talk to PCI devices (see pcilib(7) for details). You
can use the following options to influence its behavior:
-A <method>
The library supports a variety of methods to access the PCI hardware. By default, it
uses the first access method available, but you can use this option to override this
decision. See -A help for a list of available methods and their descriptions.
-O <param>=<value>
The behavior of the library is controlled by several named parameters. This option
allows you to set the value of any of the parameters. Use -O help for a list of known
parameters and their default values.
-H1
Use direct hardware access via Intel configuration mechanism 1. (This is a shorthand
for -A intel-conf1.)
-H2
Use direct hardware access via Intel configuration mechanism 2. (This is a shorthand
for -A intel-conf2.)
-F <file>
Instead of accessing real hardware, read the list of devices and values of their
configuration registers from the given file produced by an earlier run of lspci -x. This
is very useful for analysis of user-supplied bug reports, because you can display the
hardware configuration in any way you want without disturbing the user with
requests for more dumps.
-G
Increase debug level of the library.
› MACHINE READABLE OUTPUT
If you intend to process the output of lspci automatically, please use one of the
machine-readable output formats (-m, -vm, -vmm) described in this section. All
other formats are likely to change between versions of lspci.
All numbers are always printed in hexadecimal. If you want to process numeric ID’s
instead of names, please add the -n switch.
In the simple format, each device is described on a single line, which is formatted as
parameters suitable for passing to a shell script, i.e., values separated by whitespaces,
quoted and escaped if necessary. Some of the arguments are positional: slot, class,
vendor name, device name, subsystem vendor name and subsystem name (the last
two are empty if the device has no subsystem); the remaining arguments are option-
like:
-rrev
Revision number.
-pprogif
Programming interface.
The relative order of positional arguments and options is undefined. New options can be
added in future versions, but they will always have a single argument not separated from
the option by any spaces, so they can be easily ignored if not recognized.
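Because the simple format is quoted and escaped for the shell, the positional arguments can be recovered with the shell’s own word splitting. A minimal sketch; the sample line is hypothetical `lspci -m` output, not taken from real hardware:

```shell
# Hypothetical `lspci -m` line: slot, class, vendor, device, subsystem
# vendor and subsystem name are positional; "-r10" is an option-like
# revision argument.
sample='00:1f.3 "Audio device" "Intel Corporation" "Device a348" -r10 "Vendor 1043" "Device 8694"'
eval "set -- $sample"          # reuse the shell's quoting rules to split fields
echo "slot=$1 class=$2 vendor=$3"
```

In a real script, `eval` should only be applied to trusted lspci output, since it executes shell syntax.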
The verbose output is a sequence of records separated by blank lines. Each record
describes a single device by a sequence of lines, each line containing a single ‘tag: value’
pair. The tag and the value are separated by a single tab character. Neither the records nor
the lines within a record are in any particular order. Tags are case-sensitive.
The following tags are defined:
Slot
The name of the slot where the device resides ([domain:]bus:device.function). This
tag is always the first in a record.
Class
Name of the class.
Vendor
Name of the vendor.
Device
Name of the device.
SVendor
Name of the subsystem vendor (optional).
SDevice
Name of the subsystem (optional).
PhySlot
The physical slot where the device resides (optional, Linux only).
Rev
Revision number (optional).
ProgIf
Programming interface (optional).
Driver
Kernel driver currently handling the device (optional, Linux only).
Module
Kernel module reporting that it is capable of handling the device (optional, Linux
only).
New tags can be added in future versions, so you should silently ignore any tags you don’t
recognize.
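A minimal sketch of consuming the tab-separated ‘tag: value’ lines described above; the sample record is hypothetical:

```shell
# Split each line of a (hypothetical) verbose-format record on the single
# tab character that separates tag from value.
tab=$(printf '\t')
printf 'Slot:\t00:02.0\nClass:\tVGA compatible controller\n' |
while IFS=$tab read -r tag value; do
    echo "tag=${tag%:} value=$value"    # strip the trailing ':' from the tag
done
```

Unrecognized tags can simply be skipped inside the loop, per the note above.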
In the backward-compatible verbose mode (-vm), lspci tries to be perfectly compatible
with its old versions. It’s almost the same as the regular verbose format, but the Device
tag is used for both the slot and the device name, so it occurs twice in a single record.
Please avoid using this format in any new code.
› FILES
/usr/share/hwdata/pci.ids
A list of all known PCI ID’s (vendors, devices, classes and subclasses). Maintained at
http://pciids.sourceforge.net/, use the update-pciids utility to download the most
recent version.
/usr/share/hwdata/pci.ids.gz
If lspci is compiled with support for compression, this file is tried before pci.ids.
~/.pciids-cache
All ID’s found in the DNS query mode are cached in this file.
› BUGS
Sometimes, lspci is not able to decode the configuration registers completely. This
usually happens when not enough documentation was available to the authors. In
such cases, it at least prints the <?> mark to signal that there is potentially something
more to say. If you know the details, patches will be of course welcome.
Access to the extended configuration space is currently supported only by the
linux_sysfs back-end.
› SEE ALSO
setpci(8), update-pciids(8), pcilib(7)
› AUTHOR
The PCI Utilities are maintained by Martin Mares <mj@ucw.cz>.
lsscsi
› NAME
lsscsi - list SCSI devices (or hosts) and their attributes
› SYNOPSIS
lsscsi [--classic] [--device] [--generic] [--help] [--hosts] [--kname] [--list]
[--lunhex] [--long] [--protection] [--protmode] [--scsi_id] [--size] [--sysfsroot=PATH]
[--transport] [--verbose] [--version] [--wwn] [H:C:T:L]
› DESCRIPTION
Uses information in sysfs (Linux kernel series 2.6 and later) to list SCSI devices (or
hosts) currently attached to the system. Options can be used to control the amount
and form of information provided for each device.
If a H:C:T:L argument is given then it acts as a filter and only devices that match it
are listed. The colons don’t have to be present, and ‘-’, ‘*’, ‘?’ or missing arguments
at the end are interpreted as wildcards. The default is ‘*:*:*:*’ which means to match
everything. Any filter string using ‘*’ or ‘?’ should be surrounded by single or double
quotes to stop shell expansions. If ‘-’ is used as a wildcard then the whole filter
argument should be prefixed by ‘-- ’ to tell this utility there are no more options on
the command line to be interpreted. A leading ‘[’ and trailing ‘]’ are permitted (e.g.
‘[1:0:0]’ matches all LUNs on 1:0:0). The argument may also be used to filter --hosts,
in which case only the H is active and may be either a number or in the form
“host<n>” where <n> is a host number.
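The wildcard matching above can be pictured with ordinary shell patterns; this is a sketch only (lsscsi implements its own matcher internally), and the tuple is hypothetical:

```shell
# Does a hypothetical device tuple match the filter '1:0:0:*'?
tuple=1:0:0:3
case $tuple in
    1:0:0:*) echo "match" ;;     # all LUNs on 1:0:0
    *)       echo "no match" ;;
esac
```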
By default in this utility device node names (e.g. “/dev/sda” or “/dev/root_disk”) are
obtained by noting the major and minor numbers for the listed device obtained from
sysfs (e.g. the contents of “/sys/block/sda/dev”) and then looking for a match in the
“/dev” directory. This “match by major and minor” will allow devices that have been
given a different name by udev (for example) to be correctly reported by this utility.
In some situations it may be useful to see the device node name that Linux would
produce by default, so the --kname option is provided. An example of where this
may be useful is kernel error logs, which tend to report disk error messages using the
disk’s default kernel name.
› OPTIONS
Arguments to long options are mandatory for short options as well. The options are
arranged in alphabetical order based on the long option name.
-c, --classic
The output is similar to that obtained from ‘cat /proc/scsi/scsi’.
-d, --device
After outputting the (probable) SCSI device name the device node major and minor
numbers are shown in brackets (e.g. “/dev/sda[8:0]”).
-g, --generic
Output the SCSI generic device file name. Note that if the sg driver is a module it
may need to be loaded, otherwise ‘-’ may appear.
-h, --help
Output the usage message and exit.
-H, --hosts
List the SCSI hosts currently attached to the system. If this option is not given then
SCSI devices are listed.
-k, --kname
Use the Linux default algorithm for naming devices (e.g. block major 8, minor 0 is
“/dev/sda”) rather than the “match by major and minor” in the “/dev” directory as
discussed above.
-L, --list
Output additional information in <attribute_name>=<value> pairs, one pair per line
preceded by two spaces. This option has the same effect as ‘-lll’.
-l, --long
Output additional information for each SCSI device (host). Can be used multiple
times for more output, in which case the shorter option form is more convenient (e.g.
‘-lll’). When used three times (i.e. ‘-lll’) outputs SCSI device (host) attributes one per
line, preceded by two spaces, in the form “<attribute_name>=<value>”.
-x, --lunhex
When this option is used once the LUN in the tuple (at the start of each device line) is
shown in “T10” format, which is up to 16 hexadecimal digits. It is prefixed by “0x” to
distinguish the LUN from the decimal value shown in the absence of this option.
Also, hierarchical LUNs are shown with a “_” character separating the levels. For
example the two-level LUN 0x0355006600000000 will appear as 0x0355_0066. If
this option is given twice (e.g. using the short form ‘-xx’) then the full 16
hexadecimal digits are shown for each LUN, prefixed by “0x”.
-p, --protection
Output target (DIF) and initiator (DIX) protection types.
-P, --protmode
Output the effective protection information mode for each disk device.
-i, --scsi_id
Outputs the udev-derived matching id found in /dev/disk/by-id/scsi*. This is only for
disk (and disk-like) devices. If no match is found then “dm-uuid-mpath*” and “usb*”
are searched in the same directory. If there is still no match then the
/sys/class/block/<disk>/holders directory is searched. The matching id is printed
following the device name (e.g. /dev/sdc) and if there is no match “-” is output.
-s, --size
Print disk capacity in human readable form.
-t, --transport
Output transport information. This will be target-related information or, if --hosts is
given, initiator-related information. When used without --list, a name or identifier (or
both) are output on a single line, usually prefixed by the type of transport. For
devices this information replaces the normal vendor, product and revision strings.
When the --list option is also given then additionally multiple lines of
attribute_name=value pairs are output, each indented by two spaces. See the section
on transports below.
-v, --verbose
Outputs directory names where information is found. Use multiple times for more
output.
-V, --version
Outputs version information then exits.
-w, --wwn
Outputs the WWN for disks instead of manufacturer, model and revision (or instead
of transport information). The World Wide Name (WWN) is typically 64 bits long
(16 hex digits) but could be up to 128 bits long. To indicate the WWN is
hexadecimal, it is prefixed by “0x”.
-y, --sysfsroot=PATH
Assumes sysfs is mounted at PATH instead of the default ‘/sys’. If this option is given,
PATH should be an absolute path (i.e. start with ‘/’).
› TRANSPORTS
This utility lists SCSI devices, which are known as logical units (LU) in the SCSI
Architecture Model (ref: SAM-4 at http://www.t10.org), or hosts when the --hosts
option is given. A host is called an initiator in SAM-4. A SCSI command travels out
via an initiator, across some transport to a target and then onwards to a logical unit. A
target device may contain several logical units. A target device has one or more ports
that can be viewed as transport end points. Each FC and SAS disk is a single target
that has two ports and contains one logical unit. If both target ports on a FC or SAS
disk are connected and visible to a machine, then lsscsi will show two entries.
Initiators (i.e. hosts) also have one or more ports and some HBAs in Linux have a
host entry per initiator port while others have a host entry per initiator device.
When the --transport option is given for devices (i.e. --hosts not given) then most of
the information produced by lsscsi is associated with the target, or more precisely: the
target port, through which SCSI commands pass that access a logical unit.
Typically this utility provides one line of output per “device” or host. Significantly
more information can be obtained by adding the --list option. When used together
with the --transport option, after the summary line, multiple lines of transport-
specific information in the form “<attribute_name>=<value>” are output, each
indented by two spaces. Using a filter argument will reduce the volume of output if a
lot of devices or hosts are present.
The transports that are currently recognized are: IEEE 1394, ATA, FC, iSCSI, SAS,
SATA, SPI and USB.
For IEEE 1394 (a.k.a. FireWire, and “SBP” when storage is involved), the EUI-64
based target port name is output when --transport is given, in the absence of the
--hosts option. When the --hosts option is given then the EUI-64 initiator port name is
output. Output on the summary line specific to the IEEE 1394 transport is prefixed by
“sbp:”.
To detect ATA and SATA, a crude check is performed on the driver name (after the
checks for other transports are exhausted). Based on the driver name either the ATA or
SATA transport type is chosen. Output on the summary line is either “ata:” or “sata:”.
No other attributes are given. Most devices and hosts flagged as “ata:” will use the
parallel ATA transport (PATA).
For Fibre Channel (FC) the port name and port identifier are output when --transport
is given. In the absence of the --hosts option these ids will be for the target port
associated with the device (logical unit) being listed. When the --hosts option is
given then the ids are for the initiator port used by the host. Output on the summary
line specific to the FC transport is prefixed by “fc:”. If FCoE (over Ethernet) is
detected the prefix is changed to “fcoe:”.
For iSCSI the target port name is output when --transport is given, in the absence of
the --hosts option. This is made up of the iSCSI name and the target portal group tag.
Since the iSCSI name starts with “iqn” no further prefix is used. When the --hosts
option is given then only “iscsi:” is output on the summary line.
For Serial Attached SCSI the SAS address of the target port (or initiator port if the
--hosts option is also given) is output. This will be a naa-5 address. For SAS HBAs and
SAS targets (such as SAS disks and tape drives) the SAS address will be world wide
unique. For SATA disks attached to a SAS expander, the expander provides the SAS
address by adding a non-zero value to its (i.e. the expander’s) SAS address (e.g.
expander_sas_address + phy_id + 1). SATA disks directly attached to SAS HBAs
seem to have an indeterminate SAS address. Output on the summary line specific to
the SAS transport is prefixed by “sas:”.
For the SCSI Parallel Interface (SPI) the target port identifier (usually a number
between 0 and 15 inclusive) is output when --transport is given, in the absence of the
--hosts option. When the --hosts option is given then only “spi:” is output on the
summary line.
When a USB transport is detected, the summary line will contain “usb:” followed by
a USB device name. The USB device name has the form “<b>-<p1>[.<p2>[.<p3>]]:
<c>.<i>” where <b> is the USB bus number, <p1> is the port on the host. <p2> is a
port on a host connected hub, if present. If needed <p3> is a USB hub port closer to
the USB storage device. <c> refers to the configuration number while <i> is the
interface number. There is a separate SCSI host for each USB (SCSI) target. A USB
SCSI target may contain multiple logical units. Thus the same “usb: <device_name>”
string appears for a USB SCSI host and all logical units that belong to the USB SCSI
target associated with that USB SCSI host.
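Pulling apart a USB device name of that form can be sketched with shell parameter expansion; the name below is hypothetical:

```shell
# Hypothetical USB device name of the form <b>-<p1>[.<p2>...]:<c>.<i>:
# bus 1, host port 1, hub port 2, configuration 1, interface 0.
name=1-1.2:1.0
bus=${name%%-*}        # <b>  -> 1
rest=${name#*-}
ports=${rest%%:*}      # <p1>.<p2>... -> 1.2
cfg_if=${rest#*:}
echo "bus=$bus ports=$ports config=${cfg_if%%.*} iface=${cfg_if#*.}"
```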
› LUNS
For historical reasons, and as used by several other Unix-based operating systems,
Linux uses a tuple of integers to describe (a path to) a SCSI device (also known as a
Logical Unit (LU)). The last element of that tuple is the so-called Logical Unit
Number (LUN). Originally in SCSI a LUN was an integer, at first 3 bits long,
then 8, then 16 bits. SCSI LUNs today (SAM-5 section 4.7) are 64 bits, but SCSI
standards now consider a LUN to be an array of 8 bytes.
Up until 2013, Linux mapped SCSI LUNs to a 32 bit integer by taking the first 4
bytes of the SCSI LUN and ignoring the last 4 bytes. Linux treated the first two bytes
of the SCSI LUN as a unit (a word) and it became the least significant 16 bits in the
Linux LUN integer. The next two bytes of the SCSI LUN became the upper 16 bits in
the Linux LUN integer. The rationale for this was to map commonly used LUNs to
small Linux LUN integers. The most common LUN (by far) in SCSI LUN (hex)
notation is 00 00 00 00 00 00 00 00 and this becomes the Linux LUN integer 0. The
next most common LUN is 00 01 00 00 00 00 00 00 and this becomes the Linux
LUN integer 1.
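The pre-2013 32-bit mapping described above can be sketched in shell; the byte order follows the man page’s hex notation, and the sample LUN is the “next most common LUN” from the text:

```shell
# Map the first four bytes of a SCSI LUN to a 32-bit Linux LUN integer:
# the first byte pair becomes the low 16 bits, the second pair the high 16
# bits; the last four bytes are ignored, as described above.
scsi_lun="00 01 00 00 00 00 00 00"    # the next most common LUN, per the text
set -- $scsi_lun
low=$(printf '%d' "0x$1$2")           # first word -> least significant 16 bits
high=$(printf '%d' "0x$3$4")          # second word -> upper 16 bits
echo $(( (high << 16) | low ))        # prints 1
```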
In 2013 it was proposed to widen Linux LUNs to a 64 bit integer by extending the
mapping outlined above. In that case all information that can be represented in a
SCSI LUN is mapped to a Linux LUN (64 bit) integer, and the mapping can be
reversed without losing information.
This version of the utility supports both 32 and 64 bit Linux LUN integers. By
default the LUN shown at the end of the tuple commencing each line is a Linux LUN
as a decimal integer. When the --lunhex option is given then the LUN is in SCSI
LUN format with the 8 bytes run together, with the output in hexadecimal and
prefixed by ‘0x’. The LUN is decoded according to SAM-5’s description and trailing
zeros (i.e. digits to the right) are not shown. So LUN 0 (i.e. 00 00 00 00 00 00 00 00)
is shown as 0x0000 and LUN 65 (i.e. 00 41 00 00 00 00 00 00) is shown as 0x0041.
If the --lunhex option is given twice then the full 64 bits (i.e. 16 hexadecimal digits)
are shown.
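The trailing-zero trimming can be illustrated with a short sketch. This is an approximation of the described output (the rule of keeping at least two bytes and whole byte pairs is inferred from the 0x0000/0x0041 examples), not lsscsi's actual code:

```python
def format_lun_hex(lun_bytes, full=False):
    """Render an 8-byte SCSI LUN roughly the way --lunhex is described:
    hex digits prefixed with '0x'; unless full output is requested,
    trailing zeros are dropped, keeping whole byte pairs and at least
    the first two bytes (an assumption based on the manual's examples)."""
    hexstr = lun_bytes.hex()                 # 16 hex digits
    if not full:
        hexstr = hexstr.rstrip('0') or '0'
        if len(hexstr) % 2:                  # keep whole byte pairs
            hexstr += '0'
        hexstr = hexstr.ljust(4, '0')        # show at least two bytes
    return '0x' + hexstr
```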
If the --lunhex option is not given on the command line then the environment
variable LSSCSI_LUNHEX_OPT is checked. If LSSCSI_LUNHEX_OPT is present
then its associated value becomes the number of times the --lunhex option is applied
internally. So, for example, ‘LSSCSI_LUNHEX_OPT=2 lsscsi’ and ‘lsscsi -xx’ are equivalent.
› EXAMPLES
Information about this utility including examples can also be found at:
http://sg.danny.cz/scsi/lsscsi.html .
› NOTES
Information for this command is derived from the sysfs file system, which is assumed
to be mounted at /sys unless specified otherwise by the user. SCSI (pseudo) devices
that have been detected by the SCSI mid level will be listed even if the required
upper level drivers (i.e. sd, sr, st, osst or ch) have not been loaded. If the appropriate
upper level driver has not been loaded then the device file name will appear as ‘-’
rather than something like ‘/dev/st0’. Note that some devices (e.g. scanners and
medium changers) do not have a primary upper level driver and can only be accessed
via a SCSI generic (sg) device name.
Generic SCSI devices can also be accessed via the bsg driver in Linux. By default,
the bsg driver’s device node names are of the form ‘/dev/bsg/H:C:T:L‘. So, for
example, the SCSI device shown by this utility on a line starting with the tuple
‘6:0:1:2’ could be accessed via the bsg driver with the ‘/dev/bsg/6:0:1:2’ device node
name.
lsscsi version 0.21 or later is required to correctly display SCSI devices in Linux
kernel 2.6.26 (and possibly later) when the CONFIG_SYSFS_DEPRECATED_V2
kernel option is not defined.
› AUTHOR
Written by Doug Gilbert
› REPORTING BUGS
Report bugs to <dgilbert at interlog dot com>.
› COPYRIGHT
Copyright © 2003-2013 Douglas Gilbert This software is distributed under the GPL
version 2. There is NO warranty; not even for MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE.
› SEE ALSO
lspci(8), lsusb(8).
LSUSB
› NAME
lsusb - list USB devices
› SYNOPSIS
lsusb [ options ]
› DESCRIPTION
lsusb is a utility for displaying information about USB buses in the system and the
devices connected to them.
› OPTIONS
-v, --verbose
Tells lsusb to be verbose and display detailed information about the devices shown.
This includes configuration descriptors for the device’s current speed. Class
descriptors will be shown, when available, for USB device classes including hub,
audio, HID, communications, and chipcard.
-s [[bus]:][devnum]
Show only devices in specified bus and/or devnum. Both ID’s are given in decimal
and may be omitted.
-d [vendor]:[product]
Show only devices with the specified vendor and product ID. Both ID’s are given in
hexadecimal.
-D device
Do not scan the /dev/bus/usb directory; instead, display only information about the
device whose device file is given. The device file should be something like
/dev/bus/usb/001/001. This option displays detailed information like the -v option;
you must be root to do this.
-t
Tells lsusb to dump the physical USB device hierarchy as a tree. This overrides the -v
option.
-V, --version
Print version information on standard output, then exit successfully.
› RETURN VALUE
If the specified device is not found, a non-zero exit code is returned.
› FILES
/usr/share/hwdata/usb.ids
A list of all known USB ID’s (vendors, products, classes, subclasses and protocols).
› SEE ALSO
lspci(8), usbview(8).
› AUTHOR
Thomas Sailer, <[email protected]>.
LVCHANGE
› NAME
lvchange - change attributes of a logical volume
› SYNOPSIS
lvchange [--addtag Tag] [-A|--autobackup {y|n}] [-a|--activate [a|e|s|l]{y|n}]
[--activationmode {complete|degraded|partial}] [-k|--setactivationskip {y|n}]
[-K|--ignoreactivationskip] [--alloc AllocationPolicy] [--cachepolicy policy]
[--cachesettings key=value] [--commandprofile ProfileName] [-C|--contiguous
{y|n}] [-d|--debug] [--degraded] [--deltag Tag] [--detachprofile] [--discards
{ignore|nopassdown|passdown}] [--errorwhenfull {y|n}] [--resync] [-h|-?|--help]
[--ignorelockingfailure] [--ignoremonitoring] [--ignoreskippedcluster]
[--monitor {y|n}] [--poll {y|n}] [--[raid]maxrecoveryrate Rate]
[--[raid]minrecoveryrate Rate] [--[raid]syncaction {check|repair}]
[--[raid]writebehind IOCount] [--[raid]writemostly PhysicalVolume[:{y|n|t}]]
[--sysinit] [--noudevsync] [--metadataprofile ProfileName] [-M|--persistent {y|n}
[--minor minor] [--major major]] [-P|--partial] [-p|--permission {r|rw}]
[-r|--readahead {ReadAheadSectors|auto|none}] [--refresh] [-S|--select Selection]
[-t|--test] [-v|--verbose] [-Z|--zero {y|n}] [LogicalVolumePath…]
› DESCRIPTION
lvchange allows you to change the attributes of a logical volume, including making
it known to the kernel and ready for use.
› OPTIONS
See lvm(8) for common options.
-a, --activate [a|e|s|l]{y|n}
Controls the availability of the logical volumes for use. Communicates with the
kernel device-mapper driver via libdevmapper to activate (-ay) or deactivate (-an) the
logical volumes.
Activation of a logical volume creates a symbolic link
/dev/VolumeGroupName/LogicalVolumeName pointing to the device node. This link
is removed on deactivation. All software and scripts should access the device through
this symbolic link and present this as the name of the device. The location and name
of the underlying device node may depend on the distribution and configuration (e.g.
udev) and might change from release to release.
If autoactivation option is used (-aay), the logical volume is activated only if it
matches an item in the activation/auto_activation_volume_list set in lvm.conf. If this
list is not set, then all volumes are considered for activation. The -aay option should
also be used during system boot so it’s possible to select which volumes to activate
using the activation/auto_activation_volume_list setting.
In a clustered VG, clvmd is used for activation, and the following options are
possible:
With -aey, clvmd activates the LV in exclusive mode (with an exclusive lock),
allowing a single node to activate the LV.
With -asy, clvmd activates the LV in shared mode (with a shared lock), allowing
multiple nodes to activate the LV concurrently. If the LV type prohibits shared access,
such as an LV with a snapshot, the ‘s’ option is ignored and an exclusive lock is used.
With -ay (no mode specified), clvmd activates the LV in shared mode if the LV type
allows concurrent access, such as a linear LV. Otherwise, clvmd activates the LV in
exclusive mode.
With -aey, -asy, and -ay, clvmd attempts to activate the LV on all nodes. If exclusive
mode is used, then only one of the nodes will be successful.
With -an, clvmd attempts to deactivate the LV on all nodes.
With -aly, clvmd activates the LV only on the local node, and -aln deactivates only on
the local node. If the LV type allows concurrent access, then shared mode is used,
otherwise exclusive.
LVs with snapshots are always activated exclusively because they can only be used
on one node at once.
For local VGs, -ay, -aey, and -asy are all equivalent.
--activationmode {complete|degraded|partial}
The activation mode determines whether logical volumes are allowed to activate
when there are physical volumes missing (e.g. due to a device failure). complete is
the most restrictive; allowing only those logical volumes to be activated that are not
affected by the missing PVs. degraded allows RAID logical volumes to be activated
even if they have PVs missing. (Note that the “mirror” segment type is not considered
a RAID logical volume. The “raid1” segment type should be used instead.) Finally,
partial allows any logical volume to be activated even if portions are missing due to a
missing or failed PV. This last option should only be used when performing recovery
or repair operations. degraded is the default mode. To change it, modify
activation_mode in lvm.conf(5).
-k, --setactivationskip {y|n}
Controls whether Logical Volumes are persistently flagged to be skipped during
activation. By default, thin snapshot volumes are flagged for activation skip. To
activate such volumes, an extra -K/--ignoreactivationskip option must be used. The
flag is not applied during deactivation. To see whether the flag is attached, use the lvs
command, where the state of the flag is reported within the lv_attr bits.
-K, --ignoreactivationskip
Ignore the flag to skip Logical Volumes during activation.
--cachepolicy policy, --cachesettings key=value
Only applicable to cached LVs; see also lvmcache(7). Sets the cache policy and its
associated tunable settings. In most use-cases, default values should be adequate.
-C, --contiguous {y|n}
Tries to set or reset the contiguous allocation policy for logical volumes. It is only
possible to change a non-contiguous logical volume’s allocation policy to contiguous
if all of the allocated physical extents are already contiguous.
--detachprofile
Detach any metadata configuration profiles attached to given Logical Volumes. See
lvm.conf(5) for more information about metadata profiles.
--discards {ignore|nopassdown|passdown}
Set this to ignore to ignore any discards received by a thin pool Logical Volume. Set
to nopassdown to process such discards within the thin pool itself and allow the no-
longer-needed extents to be overwritten by new data. Set to passdown (the default) to
process them both within the thin pool itself and to pass them down to the underlying
device.
--errorwhenfull {y|n}
Sets thin pool behavior when data space is exhausted. See lvcreate(8) for information.
--resync
Forces the complete resynchronization of a mirror. In normal circumstances you
should not need this option because synchronization happens automatically. Data is
read from the primary mirror device and copied to the others, so this can take a
considerable amount of time - and during this time you are without a complete
redundant copy of your data.
--metadataprofile ProfileName
Uses and attaches ProfileName configuration profile to the logical volume metadata.
Whenever the logical volume is processed next time, the profile is automatically
applied. If the volume group has another profile attached, the logical volume profile
is preferred. See lvm.conf(5) for more information about metadata profiles.
—minor minor
Set the minor number.
--major major
Sets the major number. This option is supported only on older systems (kernel
version 2.4) and is ignored on modern Linux systems where major numbers are
dynamically assigned.
--monitor {y|n}
Start or stop monitoring a mirrored or snapshot logical volume with dmeventd, if it is
installed. If a device used by a monitored mirror reports an I/O error, the failure is
handled according to mirror_image_fault_policy and mirror_log_fault_policy set
in lvm.conf.
--poll {y|n}
Without polling, a logical volume’s backgrounded transformation process will never
complete. If there is an incomplete pvmove or lvconvert (for example, on rebooting
after a crash), use --poll y to restart the process from its last checkpoint. However, it
may not be appropriate to poll a logical volume immediately when it is activated; use
--poll n to defer and then --poll y to restart the process.
--[raid]maxrecoveryrate Rate[bBsSkKmMgG]
Sets the maximum recovery rate for a RAID logical volume. Rate is specified as an
amount per second for each device in the array. If no suffix is given, then
KiB/sec/device is assumed. Setting the recovery rate to 0 means it will be unbounded.
--[raid]minrecoveryrate Rate[bBsSkKmMgG]
Sets the minimum recovery rate for a RAID logical volume. Rate is specified as an
amount per second for each device in the array. If no suffix is given, then
KiB/sec/device is assumed. Setting the recovery rate to 0 means it will be unbounded.
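Assuming lvm2's usual size-suffix meanings (b = bytes, s = 512-byte sectors, k/m/g = KiB/MiB/GiB; that mapping is an assumption here, not stated by this page), a Rate argument could be normalised to KiB/sec like this sketch:

```python
def rate_to_kib(rate_str):
    """Parse a Rate[bBsSkKmMgG] argument into KiB/sec (illustrative
    sketch; unit meanings assumed from lvm2 conventions). With no
    suffix, KiB/sec is assumed, as the manual states."""
    units = {'b': 1, 's': 512, 'k': 1024, 'm': 1024 ** 2, 'g': 1024 ** 3}
    s = rate_str.strip()
    if s and s[-1].lower() in units:
        value, mult = int(s[:-1]), units[s[-1].lower()]
    else:
        value, mult = int(s), 1024           # default: KiB
    return value * mult // 1024
```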
--[raid]syncaction {check|repair}
This argument is used to initiate various RAID synchronization operations. The check
and repair options provide a way to check the integrity of a RAID logical volume
(often referred to as “scrubbing”). These options cause the RAID logical volume to
read all of the data and parity blocks in the array and check for any discrepancies
(e.g. mismatches between mirrors or incorrect parity values). If check is used, the
discrepancies will be counted but not repaired. If repair is used, the discrepancies
will be corrected as they are encountered. The ‘lvs’ command can be used to show
the number of discrepancies found or repaired.
--[raid]writebehind IOCount
Specify the maximum number of outstanding writes that are allowed to devices in a
RAID1 logical volume that are marked as write-mostly. Once this value is exceeded,
writes become synchronous (i.e. all writes to the constituent devices must complete
before the array signals the write has completed). Setting the value to zero clears the
preference and allows the system to choose the value arbitrarily.
--[raid]writemostly PhysicalVolume[:{y|n|t}]
Mark a device in a RAID1 logical volume as write-mostly. All reads to these drives
will be avoided unless absolutely necessary. This keeps the number of I/Os to the
drive to a minimum. The default behavior is to set the write-mostly attribute for the
specified physical volume in the logical volume. It is also possible to remove the
write-mostly flag by appending “:n” to the physical volume, or to toggle the value by
specifying “:t”. The --writemostly argument can be specified more than once in a
single command, making it possible to toggle the write-mostly attributes for all the
physical volumes in a logical volume at once.
--sysinit
Indicates that lvchange(8) is being invoked from early system initialisation scripts
(e.g. rc.sysinit or an initrd), before writeable filesystems are available. As such, some
functionality needs to be disabled and this option acts as a shortcut which selects an
appropriate set of options. Currently this is equivalent to using
--ignorelockingfailure, --ignoremonitoring, --poll n and setting the
LVM_SUPPRESS_LOCKING_FAILURE_MESSAGES environment variable.
If --sysinit is used while lvmetad(8) is enabled and running, autoactivation is
preferred over manual activation via a direct lvchange call. Logical
volumes are autoactivated according to auto_activation_volume_list set in
lvm.conf(5).
--noudevsync
Disable udev synchronisation. The process will not wait for notification from udev. It
will continue irrespective of any possible udev processing in the background. You
should only use this if udev is not running or has rules that ignore the devices LVM2
creates.
--ignoremonitoring
Make no attempt to interact with dmeventd unless --monitor is specified. Do not use
this if dmeventd is already monitoring a device.
-M, --persistent {y|n}
Set to y to make the minor number specified persistent. Change of persistent numbers
is not supported for pool volumes.
-p, --permission {r|rw}
Change access permission to read-only or read/write.
-r, --readahead {ReadAheadSectors|auto|none}
Sets the read ahead sector count of this logical volume. For volume groups with metadata
in lvm1 format, this must be a value between 2 and 120 sectors. The default value is
“auto”, which allows the kernel to choose a suitable value automatically. “none” is
equivalent to specifying zero.
--refresh
If the logical volume is active, reload its metadata. This is not necessary in normal
operation, but may be useful if something has gone wrong or if you’re doing
clustering manually without a clustered lock manager.
-Z, --zero {y|n}
Set zeroing mode for thin pool. Note: blocks already provisioned from a pool in non-
zero mode are not cleared in unwritten parts when zeroing is later set to y.
› ENVIRONMENT VARIABLES
LVM_SUPPRESS_LOCKING_FAILURE_MESSAGES
Suppress locking failure messages.
› EXAMPLES
Changes the permission on volume lvol1 in volume group vg00 to be read-only:
lvchange -pr vg00/lvol1
› SEE ALSO
lvm(8), lvmcache(7), lvmthin(7), lvcreate(8), vgchange(8)
LVCONVERT
› NAME
lvconvert - convert a logical volume from linear to mirror or snapshot
› SYNOPSIS
lvconvert -m|--mirrors Mirrors [--type SegmentType] [--mirrorlog
{disk|core|mirrored}] [--corelog] [-R|--regionsize MirrorLogRegionSize]
[--stripes Stripes [-I|--stripesize StripeSize]] [-A|--alloc AllocationPolicy]
[-b|--background] [-f|--force] [-i|--interval Seconds] [--commandprofile ProfileName]
[-h|-?|--help] [--noudevsync] [-v|--verbose] [-y|--yes] [--version]
LogicalVolume[Path] [PhysicalVolume[Path][:PE[-PE]]…]
lvconvert --split [--commandprofile ProfileName] [-h|-?|--help] [--noudevsync]
[-v|--verbose] SplitableLogicalVolume{Name|Path}
lvconvert --splitcache|--uncache [--commandprofile ProfileName] [-h|-?|--help]
[--noudevsync] [-v|--verbose] [--version] CacheLogicalVolume{Name|Path}
lvconvert --splitmirrors Images [--name SplitLogicalVolumeName]
[--trackchanges] MirrorLogicalVolume[Path] [--commandprofile ProfileName]
[SplittablePhysicalVolume[Path][:PE[-PE]]…]
lvconvert --splitsnapshot [--commandprofile ProfileName] [-h|-?|--help]
[--noudevsync] [-v|--verbose] [--version] SnapshotLogicalVolume[Path]
lvconvert -s|--snapshot [-c|--chunksize ChunkSize[bBsSkK]] [-Z|--zero {y|n}]
[--commandprofile ProfileName] [-h|-?|--help] [--noudevsync] [-v|--verbose]
[--version] OriginalLogicalVolume[Path] SnapshotLogicalVolume[Path]
lvconvert --merge [-b|--background] [-i|--interval Seconds] [--commandprofile
ProfileName] [-h|-?|--help] [-v|--verbose] [--version] LogicalVolume[Path]…
lvconvert --repair [--stripes Stripes [-I|--stripesize StripeSize]]
[--commandprofile ProfileName] [-h|-?|--help] [-v|--verbose] [--version]
LogicalVolume[Path] [PhysicalVolume[Path]…]
lvconvert --replace PhysicalVolume [--commandprofile ProfileName] [-h|-?|--help]
[-v|--verbose] [--version] LogicalVolume[Path] [PhysicalVolume[Path]…]
lvconvert --type thin[-pool]|-T|--thin [--originname
NewExternalOriginVolumeName] [--thinpool ThinPoolLogicalVolume{Name|Path}
[-c|--chunksize ChunkSize[bBsSkKmMgG]] [--discards
{ignore|nopassdown|passdown}] [--poolmetadata
ThinPoolMetadataLogicalVolume{Name|Path} | --poolmetadatasize
ThinPoolMetadataSize[bBsSkKmMgG]] [-r|--readahead
{ReadAheadSectors|auto|none}] [--stripes Stripes [-I|--stripesize StripeSize]]]
[--poolmetadataspare {y|n}] [-Z|--zero {y|n}]] [--commandprofile ProfileName]
[-h|-?|--help] [-v|--verbose] [--version]
[[ExternalOrigin|ThinPool]LogicalVolume{Name|Path}] [PhysicalVolume[Path][:PE[-PE]]…]
lvconvert --type cache[-pool]|-H|--cache [--cachepool
CachePoolLogicalVolume{Name|Path}] [-c|--chunksize ChunkSize[bBsSkKmMgG]]
[--cachemode {writeback|writethrough}] [--cachepolicy policy] [--cachesettings
key=value] [--poolmetadata CachePoolMetadataLogicalVolume{Name|Path} |
--poolmetadatasize CachePoolMetadataSize[bBsSkKmMgG]] [--poolmetadataspare
{y|n}] [--commandprofile ProfileName] [-h|-?|--help] [-v|--verbose] [--version]
LogicalVolume{Name|Path} [PhysicalVolume[Path][:PE[-PE]]…]
› DESCRIPTION
lvconvert is used to change the segment type (i.e. linear, mirror, etc) or characteristics
of a logical volume. For example, it can add or remove the redundant images of a
logical volume, change the log type of a mirror, or designate a logical volume as a
snapshot repository. If the conversion requires allocation of physical extents (for
example, when converting from linear to mirror) and you specify one or more
PhysicalVolumes (optionally with ranges of physical extents), allocation of physical
extents will be restricted to these physical extents. If the conversion frees physical
extents (for example, when converting from a mirror to a linear, or reducing mirror
legs) and you specify one or more PhysicalVolumes, the freed extents come first from
the specified PhysicalVolumes.
› OPTIONS
See lvm(8) for common options. Exactly one of --cache, --corelog, --merge,
--mirrorlog, --mirrors, --repair, --replace, --snapshot, --split, --splitcache,
--splitsnapshot, --splitmirrors, --thin, --type or --uncache arguments is required.
-b, --background
Run the daemon in the background.
-H, --cache, --type cache
Converts a logical volume to a cached LV using the cache pool specified with
--cachepool. For more information on cache pool LVs and cache LVs, see
lvmcache(7).
--cachepolicy policy
Only applicable to cached LVs; see also lvmcache(7). Sets the cache policy. mq is the
basic policy name; smq is a more advanced version available in newer kernels.
--cachepool CachePoolLV
This argument is necessary when converting a logical volume to a cache LV. For
more information on cache pool LVs and cache LVs, see lvmcache(7).
--cachesettings key=value
Only applicable to cached LVs; see also lvmcache(7). Sets the cache tunable settings.
In most use-cases, default values should be adequate. Special string value default
switches setting back to its default kernel value and removes it from the list of
settings stored in lvm2 metadata.
-m, --mirrors Mirrors
Specifies the degree of the mirror you wish to create. For example, “-m 1” would
convert the original logical volume to a mirror volume with two sides; that is, a linear
volume plus one copy. There are two implementations of mirroring which correspond
to the “raid1” and “mirror” segment types. The default mirroring segment type is “raid1”.
If the legacy “mirror” segment type is desired, the --type argument must be used to
explicitly select the desired type. The --mirrorlog and --corelog options below are
only relevant to the legacy “mirror” segment type.
--mirrorlog {disk|core|mirrored}
Specifies the type of log to use. The default is disk, which is persistent and requires a
small amount of storage space, usually on a separate device from the data being
mirrored. core may be useful for short-lived mirrors: it means the mirror is
regenerated by copying the data from the first device again every time the device is
activated - perhaps, for example, after every reboot. Using mirrored will create a
persistent log that is itself mirrored.
--corelog
The optional argument --corelog is the same as specifying --mirrorlog core.
-R, --regionsize MirrorLogRegionSize
A mirror is divided into regions of this size (in MB), and the mirror log uses this
granularity to track which regions are in sync.
--type SegmentType
Used to convert a logical volume to another segment type, like cache, cache-pool,
raid1, snapshot, thin, or thin-pool. When converting a logical volume to a cache LV,
the --cachepool argument is required. When converting a logical volume to a thin
LV, the --thinpool argument is required. See lvmcache(7) for more info about
caching support and lvmthin(7) for thin provisioning support.
-i, --interval Seconds
Report progress as a percentage at regular intervals.
--noudevsync
Disables udev synchronisation. The process will not wait for notification from udev.
It will continue irrespective of any possible udev processing in the background. You
should only use this if udev is not running or has rules that ignore the devices LVM2
creates.
--splitmirrors Images
The number of redundant Images of a mirror to be split off and used to form a new
logical volume. A name must be supplied for the newly-split-off logical volume using
the --name argument, unless the --trackchanges argument is given.
-n, --name Name
The name to apply to a logical volume which has been split off from a mirror logical
volume.
--trackchanges
Used with --splitmirrors on a raid1 device, this tracks changes so that the read-only
detached image can be merged efficiently back into the mirror later. Only the regions
of the detached device where the data changed get resynchronized.
Please note that this feature is only supported with the new md-based mirror
implementation and not with the original device-mapper mirror implementation.
--split
Separates SplitableLogicalVolume. This option aggregates the various split commands
and tries to detect the necessary split operation from its arguments.
--splitcache
Separates CacheLogicalVolume from its cache pool. Before the logical volume becomes
uncached, the cache is flushed. The cache pool volume is then left unused and could be
used e.g. for caching another volume. See also the option --uncache for uncaching
and removing the cache pool in one command.
--splitsnapshot
Separates SnapshotLogicalVolume from its origin. The volume that is split off
contains the chunks that differ from the origin along with the metadata describing
them. This volume can be wiped and then destroyed with lvremove. The inverse of
--snapshot.
-s, --snapshot, --type snapshot
Recreates a snapshot from constituent logical volumes (or copies of them) after
having been separated using --splitsnapshot. For this to work correctly, no changes
may be made to the contents of either volume after the split.
-c, --chunksize ChunkSize[bBsSkKmMgG]
Gives the size of chunk for snapshot, cache pool and thin pool logical volumes.
The default unit is kilobytes.
For snapshots the value must be a power of 2 between 4KiB and 512KiB and the
default value is 4.
For cache pools the value must be between 32KiB and 1GiB and the default value is
64.
For thin pools the value must be between 64KiB and 1GiB and the default value
starts with 64 and scales up to fit the pool metadata size within 128MiB, if the pool
metadata size is not specified. The value must be a multiple of 64KiB. (Early kernel
support until thin target version 1.4 required the value to be a power of 2. Discards
weren’t supported for non-power of 2 values until thin target version 1.5.)
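The documented chunk-size limits can be collected into a small validation sketch (an illustration of the ranges above, in KiB; this is not lvm2's actual validation code):

```python
def valid_chunksize_kib(pool_type, kib):
    """Check a chunk size (in KiB) against the documented limits for
    each pool type (illustrative sketch, not lvm2 code)."""
    if pool_type == 'snapshot':        # power of 2, 4KiB..512KiB
        return 4 <= kib <= 512 and (kib & (kib - 1)) == 0
    if pool_type == 'cache-pool':      # 32KiB..1GiB
        return 32 <= kib <= 1024 * 1024
    if pool_type == 'thin-pool':       # 64KiB..1GiB, multiple of 64KiB
        return 64 <= kib <= 1024 * 1024 and kib % 64 == 0
    raise ValueError('unknown pool type: %r' % pool_type)
```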
--discards {ignore|nopassdown|passdown}
Specifies whether or not discards will be processed by the thin layer in the kernel and
passed down to the Physical Volume. This option is currently supported only with thin
pools. The default is passdown.
-Z, --zero {y|n}
Controls zeroing of the first 4KiB of data in the snapshot. If the volume is read-only
the snapshot will not be zeroed. For thin pool volumes it controls zeroing of
provisioned blocks. Note: Provisioning of large zeroed chunks negatively impacts
performance.
--merge
Merges a snapshot into its origin volume, or merges a raid1 image that has been split
from its mirror with --trackchanges back into its mirror.
To check if your kernel supports the snapshot merge feature, look for ‘snapshot-
merge’ in the output of dmsetup targets. If both the origin and snapshot volume are
not open the merge will start immediately. Otherwise, the merge will start the first
time either the origin or snapshot are activated and both are closed. Merging a
snapshot into an origin that cannot be closed, for example a root filesystem, is
deferred until the next time the origin volume is activated. When merging starts, the
resulting logical volume will have the origin’s name, minor number and UUID.
While the merge is in progress, reads and writes to the origin appear as if they were
directed to the snapshot being merged. When the merge finishes, the merged snapshot
is removed. Multiple snapshots may be specified on the command line, or a @tag may
be used to specify that multiple snapshots be merged to their respective origins.
--originname NewExternalOriginVolumeName
The new name for the original logical volume, which becomes the external origin volume
for a thin logical volume that will use the given --thinpool. Without this option a default
name of “lvol<n>” will be generated, where <n> is the LVM internal number of the
logical volume. This volume will be read-only and cannot be further modified as
long as it is being used as the external origin.
--poolmetadata PoolMetadataLogicalVolume{Name|Path}
Specifies the cache or thin pool metadata logical volume. The size should be
between 2MiB and 16GiB. A cache pool is specified with the option --cachepool;
a thin pool is specified with the option --thinpool. When the specified pool already
exists, the pool’s metadata volume will be swapped with the given LV. Pool properties
(like chunk size, discards or zero) are preserved by default in this case. This can be
useful for pool metadata repair or its offline resize, since the metadata volume is then
available as a regular volume for use with the thin provisioning tools cache_dump(8),
cache_repair(8), cache_restore(8), thin_dump(8), thin_repair(8) and thin_restore(8).
--poolmetadatasize PoolMetadataSize[bBsSkKmMgG]
Sets the size of the cache or thin pool’s metadata logical volume, if the pool metadata
volume is undefined. The pool is specified with the option --cachepool or --thinpool.
For a thin pool the supported value is in the range between 2MiB and 16GiB. The default
value is estimated with the formula (Pool_LV_size / Pool_LV_chunk_size * 64b).
The default unit is megabytes.
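The default-size formula above, 64 bytes of metadata per chunk clamped to the supported 2MiB..16GiB range, can be sketched as follows (an illustration of the stated formula, not lvm2's exact estimator):

```python
def estimate_thin_metadata_bytes(pool_size_bytes, chunk_size_bytes):
    """Estimate the default thin pool metadata size per the documented
    formula (Pool_LV_size / Pool_LV_chunk_size * 64b), clamped to the
    supported 2MiB..16GiB range. Illustrative sketch, not lvm2 code."""
    mib, gib = 1024 * 1024, 1024 ** 3
    est = pool_size_bytes // chunk_size_bytes * 64   # 64 bytes per chunk
    return max(2 * mib, min(est, 16 * gib))
```

For example, a 1TiB pool with 64KiB chunks has 16 million chunks, giving a 1GiB metadata estimate.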
--poolmetadataspare {y|n}
Controls creation and maintenance of the pool metadata spare logical volume that will
be used for automated pool recovery. Only one such volume is maintained within a
volume group, with the size of the biggest pool metadata volume. The default is yes.
-r, --readahead {ReadAheadSectors|auto|none}
Sets the read ahead sector count of the thin pool metadata logical volume. The default
value is “auto”, which allows the kernel to choose a suitable value automatically.
“none” is equivalent to specifying zero.
--repair
Repair a mirror after suffering a disk failure, or try to fix thin pool metadata.
The mirror will be brought back into a consistent state. By default, the original
number of mirrors will be restored if possible. Specify -y on the command line to
skip the prompts. Use -f if you do not want any replacement. Additionally, you may
use --use-policies to use the device replacement policy specified in lvm.conf(5), viz.
activation/mirror_log_fault_policy or activation/mirror_device_fault_policy.
Thin pool repair automates the use of the thin_repair(8) tool. Only inactive thin pool
volumes can be repaired. There is no validation of metadata between the kernel and
lvm2; this requires further manual work. After a successful repair the old unmodified
metadata is still available in the “<pool>_meta<n>” LV.
--replace PhysicalVolume
Remove the specified device (PhysicalVolume) and replace it with one that is
available in the volume group or from the specific list provided. This option is only
available to RAID segment types (e.g. raid1, raid5, etc).
--stripes Stripes
Gives the number of stripes. This is equal to the number of physical volumes over
which to scatter the logical volume. This does not apply to existing allocated space;
only newly allocated space can be striped.
-I, --stripesize StripeSize
Gives the number of kilobytes for the granularity of the stripes. StripeSize must be
2^n (n = 2 to 9) for metadata in LVM1 format. For metadata in LVM2 format, the
stripe size may be a larger power of 2 but must not exceed the physical extent size.
-T, --thin, --type thin
Converts the logical volume into a thin logical volume of the thin pool specified with
--thinpool. The original logical volume ExternalOriginLogicalVolume is renamed
into a new read-only logical volume. For a non-default name for this volume use
--originname. The volume cannot be further modified as long as it is used as an
external origin volume for unprovisioned areas of any thin logical volume.
--thinpool ThinPoolLogicalVolume{Name|Path}
Specifies or converts a logical volume into a thin pool’s data volume. The content of
the converted volume is lost. The thin pool’s metadata logical volume can be specified
with the option --poolmetadata or allocated with --poolmetadatasize. See lvmthin(7)
for more info about thin provisioning support.
--uncache
Uncaches CacheLogicalVolume. Before the volume becomes uncached, the cache is
flushed. Unlike with --splitcache, the cache pool volume is removed. This option
could be seen as the inverse of --cache.
› EXAMPLES
Converts the linear logical volume “vg00/lvol1” to a two-way mirror logical volume:
lvconvert -m1 vg00/lvol1
Converts the linear logical volume “vg00/lvol1” to a two-way RAID1 logical
volume:
lvconvert —type raid1 -m1 vg00/lvol1
Converts a mirror with a disk log to a mirror with an in-memory log:
lvconvert —mirrorlog core vg00/lvol1
Converts a mirror with an in-memory log to a mirror with a disk log:
lvconvert —mirrorlog disk vg00/lvol1
Converts a mirror logical volume to a linear logical volume:
lvconvert -m0 vg00/lvol1
Converts a mirror logical volume to a RAID1 logical volume with the same number
of images:
lvconvert —type raid1 vg00/mirror_lv
Converts logical volume “vg00/lvol2” to snapshot of original volume “vg00/lvol1”:
lvconvert -s vg00/lvol1 vg00/lvol2
Converts linear logical volume “vg00/lvol1” to a two-way mirror, using physical
extents /dev/sda:0-15 and /dev/sdb:0-15 for allocation of new extents:
lvconvert -m1 vg00/lvol1 /dev/sda:0-15 /dev/sdb:0-15
Converts mirror logical volume “vg00/lvmirror1” to linear, freeing physical extents
from /dev/sda:
lvconvert -m0 vg00/lvmirror1 /dev/sda
Merges “vg00/lvol1_snap” into its origin:
lvconvert —merge vg00/lvol1_snap
If “vg00/lvol1”, “vg00/lvol2” and “vg00/lvol3” are all tagged with “some_tag” each
snapshot logical volume will be merged serially, e.g.: “vg00/lvol1”, then
“vg00/lvol2”, then “vg00/lvol3”. If —background were used it would start all
snapshot logical volume merges in parallel.
lvconvert —merge @some_tag
Extracts one image from the mirror, making it a new logical volume named
“lv_split”. The mirror the image is extracted from is reduced accordingly. If it was a
2-way mirror (created with ‘-m 1’), then the resulting original volume will be linear.
lvconvert —splitmirrors 1 —name lv_split vg00/lvmirror1
A mirrored logical volume created with —type raid1 can use the —trackchanges
argument when splitting off an image. Detach one image from the mirrored logical
volume lv_raid1 as a separate read-only device and track the changes made to the
mirror while it is detached. The split-off device has a name of the form
lv_raid1_rimage_N, where N is a number, and it cannot be renamed.
lvconvert —splitmirrors 1 —trackchanges vg00/lv_raid1
Merge an image that was detached temporarily from its mirror with the —
trackchanges argument back into its original mirror and bring its contents back up-to-
date.
lvconvert —merge vg00/lv_raid1_rimage_1
Replaces the physical volume “/dev/sdb1” in the RAID1 logical volume “my_raid1”
with the specified physical volume “/dev/sdf1”. Had the argument “/dev/sdf1” been
left out, lvconvert would attempt to find a suitable device from those available in the
volume group.
lvconvert —replace /dev/sdb1 vg00/my_raid1 /dev/sdf1
Convert the logical volume “vg00/lvpool” into a thin pool with chunk size 128KiB
and convert “vg00/lv1” into a thin volume using this pool. The original “vg00/lv1” is
used as an external read-only origin, and all writes to the thin volume are stored in
“vg00/lvpool”.
lvconvert —type thin —thinpool vg00/lvpool -c 128 lv1
Convert the logical volume “vg00/origin” into a thin volume from the thin pool
“vg00/lvpool”. This thin volume will use “vg00/origin” as an external origin volume
for unprovisioned areas in this volume. The read-only external origin is given the new
name “vg00/external”.
lvconvert -T —thinpool vg00/lvpool —originname external vg00/origin
Convert an existing logical volume to a cache pool LV using the given cache
metadata LV.
lvconvert —type cache-pool —poolmetadata vg00/lvx_meta vg00/lvx_data
lvrename vg00/lvx_data vg00/lvx_cachepool
Convert an existing logical volume to a cache LV using the given cache pool LV and
chunk size 128KiB.
lvconvert —cache —cachepool vg00/lvx_cachepool -c 128 vg00/lvx
Detach cache pool from an existing cached logical volume “vg00/lvol1” and leave
cache pool unused.
lvconvert —splitcache vg00/lvol1
Drop cache pool from an existing cached logical volume “vg00/lvol1”.
lvconvert —uncache vg00/lvol1
› SEE ALSO
lvm(8), lvm.conf(5), lvmcache(7), lvmthin(7), lvdisplay(8), lvextend(8),
lvreduce(8), lvremove(8), lvrename(8), lvscan(8), vgcreate(8), cache_dump(8),
cache_repair(8), cache_restore(8), thin_dump(8), thin_repair(8), thin_restore(8)
LVCREATE
› NAME
lvcreate - create a logical volume in an existing volume group
› SYNOPSIS
lvcreate [-a|—activate [a|e|l]{y|n}] [—addtag Tag] [—alloc AllocationPolicy] [-A|
—autobackup {y|n}] [-H|—cache] [—cachemode
{passthrough|writeback|writethrough}] [—cachepolicy policy] [—cachepool
CachePoolLogicalVolume{Name|Path}] [—cachesettings key=value] [-c|—
chunksize ChunkSize[bBsSkKmMgG]] [—commandprofile ProfileName] [-C|—
contiguous {y|n}] [-d|—debug] [—discards {ignore|nopassdown|passdown}] [—
errorwhenfull {y|n}] [{-l|—extents LogicalExtentsNumber[%{FREE|PVS|VG}] | -L|
—size LogicalVolumeSize[bBsSkKmMgGtTpPeE]} [-i|—stripes Stripes [-I|—
stripesize StripeSize]]] [-h|-?|—help] [-K|—ignoreactivationskip] [—
ignoremonitoring] [—minor minor [-j|—major major]] [—metadataprofile
ProfileName] [-m|—mirrors Mirrors [{—corelog | —mirrorlog
{disk|core|mirrored}}] [—nosync] [-R|—regionsize
MirrorLogRegionSize[bBsSkKmMgG]]] [—monitor {y|n}] [-n|—name
LogicalVolume{Name|Path}] [—noudevsync] [-p|—permission {r|rw}] [-M|—
persistent {y|n}] [—poolmetadatasize MetadataVolumeSize[bBsSkKmMgG]] [—
poolmetadataspare {y|n}] [—[raid]maxrecoveryrate Rate]
[—[raid]minrecoveryrate Rate] [-r|—readahead {ReadAheadSectors|auto|none}]
[-k|—setactivationskip {y|n}] [-s|—snapshot [-V|—virtualsize
VirtualSize[bBsSkKmMgGtTpPeE]] [-t|—test] [-T|—thin] [—thinpool
ThinPoolLogicalVolume{Name|Path}] [—type SegmentType] [-v|—verbose] [-W|—
wipesignatures {y|n}] [-Z|—zero {y|n}] [VolumeGroup{Name|Path}[/
{ExternalOrigin|Origin|Pool}LogicalVolumeName]] [PhysicalVolumePath[:PE[-
PE]]…]
lvcreate [-l|—extents LogicalExtentsNumber[%{FREE|ORIGIN|PVS|VG}] | -L|—
size LogicalVolumeSize[bBsSkKmMgGtTpPeE]] [-c|—chunksize
ChunkSize[bBsSkKmMgG]] [—commandprofile ProfileName] [—noudevsync] [—
ignoremonitoring] [—metadataprofile ProfileName] [—monitor {y|n}] [-n|—
name SnapshotLogicalVolume{Name|Path}] -s|—snapshot|-H|—cache
{[VolumeGroup{Name|Path}/]OriginalLogicalVolumeName -V|—virtualsize
VirtualSize[bBsSkKmMgGtTpPeE]}
› DESCRIPTION
lvcreate creates a new logical volume in a volume group (see vgcreate(8),
vgchange(8)) by allocating logical extents from the free physical extent pool of that
volume group. If there are not enough free physical extents then the volume group
can be extended (see vgextend(8)) with other physical volumes or by reducing
existing logical volumes of this volume group in size (see lvreduce(8)). If you
specify one or more PhysicalVolumes, allocation of physical extents will be restricted
to these volumes. The second form supports the creation of snapshot logical volumes
which keep the contents of the original logical volume for backup purposes.
› OPTIONS
See lvm(8) for common options.
-a, —activate {y|ay|n|ey|en|ly|ln}
Controls the availability of the Logical Volumes for immediate use after the
command finishes running. By default, new Logical Volumes are activated (-ay). If it
is technically possible, -an will leave the new Logical Volume inactive, but, for
example, snapshots of an active origin can only be created in the active state, so -an
cannot be used with —type snapshot. This does not apply to thin volume snapshots,
which are by default created with a flag to skip their activation (-ky). Normally the
—zero n argument has to be supplied too, because zeroing (the default behaviour) also
requires activation. If autoactivation (-aay) is used, the logical volume is
activated only if it matches an item in the activation/auto_activation_volume_list set
in lvm.conf(5). For autoactivated logical volumes, —zero n and —wipesignatures n
are always assumed and cannot be overridden. If clustered locking is enabled, -aey
will activate exclusively on one node and -a{a|l}y will activate only on the local
node.
-H, —cache
Creates a cache or cache pool logical volume, or both. Specifying the optional
argument —size will cause the creation of the cache logical volume. Specifying both
arguments will cause the creation of a cache with its cache pool volume. When the
volume group name is specified together with an existing logical volume name which
is NOT a cache pool name, that volume is treated as the cache origin volume and a
cache pool is created. In this case —size is used to specify the size of the cache pool
volume. See lvmcache(7) for more information about caching support. Note that the
cache segment type requires dm-cache kernel module version 1.3.0 or greater.
—cachemode {passthrough|writeback|writethrough}
Specifying a cache mode determines when the writes to a cache LV are considered
complete. When writeback is specified, a write is considered complete as soon as it is
stored in the cache pool LV. If writethrough is specified, a write is considered
complete only when it has been stored in the cache pool LV and on the origin LV.
While writethrough may be slower for writes, it is more resilient if something should
happen to a device associated with the cache pool LV.
—cachepolicy policy
Only applicable to cached LVs; see also lvmcache(7). Sets the cache policy. mq is the
basic policy name. smq is a more advanced version available in newer kernels.
—cachepool CachePoolLogicalVolume{Name|Path}
Specifies the name of the cache pool volume. Alternatively, the pool name can be
appended to the volume group name argument.
—cachesettings key=value
Only applicable to cached LVs; see also lvmcache(7). Sets the cache tunable settings.
In most use-cases, default values should be adequate. Special string value default
switches setting back to its default kernel value and removes it from the list of
settings stored in lvm2 metadata.
-c, —chunksize ChunkSize[bBsSkKmMgG]
Gives the size of chunk for snapshot, cache pool and thin pool logical volumes.
Default unit is kilobytes. For snapshots the value must be a power of 2 between
4KiB and 512KiB, and the default value is 4KiB. For cache pools the value must be a
multiple of 32KiB between 32KiB and 1GiB. The default is 64KiB. For thin pools
the value must be a multiple of 64KiB between 64KiB and 1GiB. Default value starts
with 64KiB and grows up to fit the pool metadata size within 128MiB, if the pool
metadata size is not specified. See lvm.conf(5) setting
allocation/thin_pool_chunk_size_policy to select different calculation policy. Thin
pool target version <1.4 requires this value to be a power of 2. For target version <1.5
discard is not supported for non power of 2 values.
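As a sketch of the numeric constraints above, the following hypothetical validator (not an LVM tool) encodes the cache pool rule (multiple of 32KiB, 32KiB to 1GiB) and the thin pool rule (multiple of 64KiB, 64KiB to 1GiB):

```shell
# Hypothetical validator for the documented chunk-size rules (KiB values).
chunk_ok() {
  local type=$1 kib=$2 step
  case $type in
    cache) step=32 ;;   # cache pool: multiple of 32KiB, 32KiB..1GiB
    thin)  step=64 ;;   # thin pool: multiple of 64KiB, 64KiB..1GiB
    *) return 1 ;;
  esac
  [ "$kib" -ge "$step" ] && [ "$kib" -le $(( 1024 * 1024 )) ] \
    && [ $(( kib % step )) -eq 0 ]
}

chunk_ok cache 96 && echo "96KiB is a valid cache pool chunk"
chunk_ok thin 96 || echo "96KiB is not a valid thin pool chunk"
```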
-C, —contiguous {y|n}
Sets or resets the contiguous allocation policy for logical volumes. Default is no
contiguous allocation based on a next free principle.
—corelog
This is a shortcut for the option —mirrorlog core.
—discards {ignore|nopassdown|passdown}
Sets discards behavior for thin pool. Default is passdown.
—errorwhenfull {y|n}
Configures thin pool behaviour when data space is exhausted. Default is no: the
device will queue I/O operations until the target timeout expires (see the dm-thin-pool
kernel module option no_space_timeout), giving the administrator time to e.g. extend
the size of the thin pool data device. When set to yes, I/O operations are immediately
errored.
-K, —ignoreactivationskip
Ignore the flag to skip Logical Volumes during activation. Use the —setactivationskip
option to set or reset the activation skipping flag persistently for a logical volume.
—ignoremonitoring
Make no attempt to interact with dmeventd unless —monitor is specified.
-l, —extents LogicalExtentsNumber[%{VG|PVS|FREE|ORIGIN}]
Gives the number of logical extents to allocate for the new logical volume. The total
number of physical extents allocated will be greater than this, for example, if the
volume is mirrored. The number can also be expressed as a percentage of the total
space in the Volume Group with the suffix %VG, as a percentage of the remaining
free space in the Volume Group with the suffix %FREE, as a percentage of the
remaining free space for the specified PhysicalVolume(s) with the suffix %PVS, or
(for a snapshot) as a percentage of the total space in the Origin Logical Volume with
the suffix %ORIGIN (i.e. 100%ORIGIN provides space for the whole origin). When
expressed as a percentage, the number is treated as an approximate upper limit for the
total number of physical extents to be allocated (including extents used by any
mirrors, for example).
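The percentage forms translate into extent counts roughly as follows. This arithmetic sketch is illustrative only (the VG numbers are made up, and LVM treats the result as an approximate upper limit):

```shell
# Hypothetical illustration of the percentage math behind -l.
extents_from_pct() {
  local base_extents=$1 pct=$2
  echo $(( base_extents * pct / 100 ))
}

# 20%FREE of a VG with 2560 free extents -> up to 512 extents
extents_from_pct 2560 20
# 100%ORIGIN of an origin with 1280 extents -> 1280 extents
extents_from_pct 1280 100
```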
-j, —major major
Sets the major number. Major numbers are not supported with pool volumes. This
option is supported only on older systems (kernel version 2.4) and is ignored on
modern Linux systems where major numbers are dynamically assigned.
—metadataprofile ProfileName
Uses and attaches the ProfileName configuration profile to the logical volume
metadata. Whenever the logical volume is processed next time, the profile is
automatically applied. If the volume group has another profile attached, the logical
volume profile is preferred. See lvm.conf(5) for more information about metadata
profiles.
—minor minor
Sets the minor number. Minor numbers are not supported with pool volumes.
-m, —mirrors Mirrors
Creates a mirrored logical volume with Mirrors copies. For example, specifying -m 1
would result in a mirror with two sides; that is, a linear volume plus one copy.
Specifying the optional argument —nosync will cause the creation of the mirror to
skip the initial resynchronization. Any data written afterwards will be mirrored, but
the original contents will not be copied. This is useful for skipping a potentially long
and resource intensive initial sync of an empty device.
There are two implementations of mirroring, corresponding to the “raid1” and
“mirror” segment types. The default is “raid1”. See the —type option for
more information if you would like to use the legacy “mirror” segment type. See the
lvm.conf(5) settings global/mirror_segtype_default and
global/raid10_segtype_default to configure the default mirror segment type. The
options —mirrorlog and —corelog apply to the legacy “mirror” segment type only.
—mirrorlog {disk|core|mirrored}
Specifies the type of log to be used for logical volumes utilizing the legacy “mirror”
segment type. The default is disk, which is persistent and requires a small amount of
storage space, usually on a separate device from the data being mirrored. Using core
means the mirror is regenerated by copying the data from the first device each time
the logical volume is activated, like after every reboot. Using mirrored will create a
persistent log that is itself mirrored.
—monitor {y|n}
Starts or avoids monitoring a mirrored, snapshot or thin pool logical volume with
dmeventd, if it is installed. If a device used by a monitored mirror reports an I/O
error, the failure is handled according to activation/mirror_image_fault_policy and
activation/mirror_log_fault_policy set in lvm.conf(5).
-n, —name LogicalVolume{Name|Path}
Sets the name for the new logical volume. Without this option a default name of
“lvol#” will be generated where # is the LVM internal number of the logical volume.
—nosync
Causes the creation of the mirror to skip the initial resynchronization.
—noudevsync
Disables udev synchronisation. The process will not wait for notification from udev.
It will continue irrespective of any possible udev processing in the background. You
should only use this if udev is not running or has rules that ignore the devices LVM2
creates.
-p, —permission {r|rw}
Sets access permissions to read only (r) or read and write (rw). Default is read and
write.
-M, —persistent {y|n}
Set to y to make the minor number specified persistent. Pool volumes cannot have
persistent major and minor numbers. Defaults to yes only when major or minor
number is specified. Otherwise it is no.
—poolmetadatasize MetadataVolumeSize[bBsSkKmMgG]
Sets the size of the pool’s metadata logical volume. Supported values range between
2MiB and 16GiB for a thin pool, and up to 16GiB for a cache pool. The minimum
value is computed from the pool’s data size. The default value for a thin pool is
(Pool_LV_size / Pool_LV_chunk_size * 64b). Default unit is megabytes.
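The default formula can be checked with a quick calculation. This helper is illustrative, not an LVM interface:

```shell
# Hypothetical calculator for the documented thin pool default:
# metadata size = Pool_LV_size / Pool_LV_chunk_size * 64 bytes.
default_thin_meta_bytes() {
  local pool_bytes=$1 chunk_bytes=$2
  echo $(( pool_bytes / chunk_bytes * 64 ))
}

# A 100GiB pool with 64KiB chunks needs about 100MiB of metadata.
default_thin_meta_bytes $(( 100 * 1024 ** 3 )) $(( 64 * 1024 ))
```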
—poolmetadataspare {y|n}
Controls creation and maintenance of the pool metadata spare logical volume that will be
used for automated pool recovery. Only one such volume is maintained within a
volume group with the size of the biggest pool metadata volume. Default is yes.
—[raid]maxrecoveryrate Rate[bBsSkKmMgG]
Sets the maximum recovery rate for a RAID logical volume. Rate is specified as an
amount per second for each device in the array. If no suffix is given, then
KiB/sec/device is assumed. Setting the recovery rate to 0 means it will be unbounded.
—[raid]minrecoveryrate Rate[bBsSkKmMgG]
Sets the minimum recovery rate for a RAID logical volume. Rate is specified as an
amount per second for each device in the array. If no suffix is given, then
KiB/sec/device is assumed. Setting the recovery rate to 0 means it will be unbounded.
-r, —readahead {ReadAheadSectors|auto|none}
Sets read ahead sector count of this logical volume. For volume groups with metadata
in lvm1 format, this must be a value between 2 and 120. The default value is auto
which allows the kernel to choose a suitable value automatically. None is equivalent
to specifying zero.
-R, —regionsize MirrorLogRegionSize[bBsSkKmMgG]
A mirror is divided into regions of this size (in MiB), and the mirror log uses this
granularity to track which regions are in sync.
-k, —setactivationskip {y|n}
Controls whether Logical Volumes are persistently flagged to be skipped during
activation. By default, thin snapshot volumes are flagged for activation skip. See
lvm.conf(5) activation/auto_set_activation_skip for how to change its default behaviour.
To activate such volumes, an extra -K|—ignoreactivationskip option must be used.
The flag is not applied during deactivation. Use lvchange —setactivationskip {y|n}
command to change the skip flag for existing volumes. To see whether the flag is
attached, use lvs command where the state of the flag is reported within lv_attr bits.
-L, —size LogicalVolumeSize[bBsSkKmMgGtTpPeE]
Gives the size to allocate for the new logical volume. A size suffix of B for bytes, S
for sectors as 512 bytes, K for kilobytes, M for megabytes, G for gigabytes, T for
terabytes, P for petabytes or E for exabytes is optional. Default unit is megabytes.
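A minimal sketch of how these suffixes scale (S is 512-byte sectors; the rest are binary multiples). This converter is hypothetical and only handles a single trailing suffix letter:

```shell
# Hypothetical suffix-to-bytes converter mirroring the documented units.
to_bytes() {
  local n=${1%?} suf=${1#"${1%?}"}
  case $suf in
    [bB]) echo "$n" ;;
    [sS]) echo $(( n * 512 )) ;;          # 512-byte sectors
    [kK]) echo $(( n * 1024 )) ;;
    [mM]) echo $(( n * 1024 ** 2 )) ;;
    [gG]) echo $(( n * 1024 ** 3 )) ;;
    [tT]) echo $(( n * 1024 ** 4 )) ;;
  esac
}

to_bytes 100M   # default lvcreate unit: megabytes
to_bytes 8S     # 8 sectors = 4096 bytes
```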
-s, —snapshot OriginalLogicalVolume{Name|Path}
Creates a snapshot logical volume (or snapshot) for an existing, so called original
logical volume (or origin). Snapshots provide a ‘frozen image’ of the contents of the
origin while the origin can still be updated. They enable consistent backups and
online recovery of removed/overwritten data/files. A thin snapshot is created when the
origin is a thin volume and the size IS NOT specified. A thin snapshot shares the same
blocks within the thin pool volume. A non-thin snapshot with a specified size does not
need the same amount of storage the origin has; in a typical scenario, 15-20% might
be enough. If the snapshot runs out of storage, use lvextend(8) to grow it. Shrinking a
snapshot is supported by lvreduce(8) as well. Run lvs(8) on the snapshot to check how
much data is allocated to it. Note: a small amount of the space you allocate to the
snapshot is used to track the locations of the chunks of data, so you should allocate
slightly more space than you actually need and monitor (—monitor) the rate at which
the snapshot data is growing so you can avoid running out of space. If —thinpool is
specified, a thin volume is created that will use the given original logical volume as
an external origin that serves unprovisioned blocks. Only read-only volumes can be
used as external origins. To make the volume an external origin, lvm expects the
volume to be inactive. An external origin volume can be used/shared by many thin
volumes, even from different thin pools. See lvconvert(8) for online conversion to
thin volumes with an external origin.
-i, —stripes Stripes
Gives the number of stripes. This is equal to the number of physical volumes to
scatter the logical volume. When creating a RAID 4/5/6 logical volume, the extra
devices which are necessary for parity are internally accounted for. Specifying -i3
would use 3 devices for striped logical volumes, 4 devices for RAID 4/5, and 5
devices for RAID 6. Alternatively, RAID 4/5/6 will stripe across all PVs in the
volume group or all of the PVs specified if the -i argument is omitted.
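The device accounting described above can be sketched as follows (hypothetical helper; not part of lvcreate):

```shell
# Hypothetical helper: -i gives the number of data stripes; parity
# devices are added internally depending on the segment type.
devices_needed() {
  local type=$1 stripes=$2
  case $type in
    raid4|raid5) echo $(( stripes + 1 )) ;;  # one parity device
    raid6)       echo $(( stripes + 2 )) ;;  # two parity devices
    *)           echo "$stripes" ;;          # plain striped
  esac
}

devices_needed striped 3   # 3 devices
devices_needed raid5 3     # 4 devices
devices_needed raid6 3     # 5 devices
```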
-I, —stripesize StripeSize
Gives the number of kilobytes for the granularity of the stripes. StripeSize must be
2^n (n = 2 to 9) for metadata in LVM1 format. For metadata in LVM2 format, the
stripe size may be a larger power of 2 but must not exceed the physical extent size.
-T, —thin
Creates thin pool or thin logical volume or both. Specifying the optional argument —
size or —extents will cause the creation of the thin pool logical volume. Specifying
the optional argument —virtualsize will cause the creation of the thin logical volume
from given thin pool volume. Specifying both arguments will cause the creation of
both thin pool and thin volume using this pool. See lvmthin(7) for more info about
thin provisioning support. Thin provisioning requires device mapper kernel driver
from kernel 3.2 or greater.
—thinpool ThinPoolLogicalVolume{Name|Path}
Specifies the name of the thin pool volume. Alternatively, the pool name can be
appended to the volume group name argument.
—type SegmentType
Creates a logical volume with the specified segment type. Supported types are:
cache, cache-pool, error, linear, mirror, raid1, raid4, raid5_la, raid5_ls (= raid5),
raid5_ra, raid5_rs, raid6_nc, raid6_nr, raid6_zr (= raid6), raid10, snapshot, striped,
thin, thin-pool or zero. Segment type may have a commandline switch alias that will
enable its use. When the type is not explicitly specified, an implicit type is selected
from a combination of options: -H|—cache|—cachepool (cache or cachepool), -T|—
thin|—thinpool (thin or thinpool), -m|—mirrors (raid1 or mirror), -s|—snapshot|-
V|—virtualsize (snapshot or thin), -i|—stripes (striped). Default type is linear.
-V, —virtualsize VirtualSize[bBsSkKmMgGtTpPeE]
Creates a thinly provisioned device or a sparse device of the given size (in MiB by
default). See lvm.conf(5) settings global/sparse_segtype_default to configure default
sparse segment type. See lvmthin(7) for more info about thin provisioning support.
Anything written to a sparse snapshot will be returned when reading from it. Reading
from other areas of the device will return blocks of zeros. Virtual snapshot is
implemented by creating a hidden virtual device of the requested size using the zero
target. A suffix of _vorigin is used for this device. Note: using sparse snapshots is not
efficient for larger device sizes (GiB); thin provisioning should be used in this case.
-W, —wipesignatures {y|n}
Controls wiping of detected signatures on newly created Logical Volume. If this
option is not specified, then by default signature wiping is done each time the zeroing
(-Z/—zero) is done. This default behaviour can be controlled by
allocation/wipe_signatures_when_zeroing_new_lvs setting found in lvm.conf(5). If
blkid wiping is used allocation/use_blkid_wiping setting in lvm.conf(5)) and LVM2
is compiled with blkid wiping support, then blkid(8) library is used to detect the
signatures (use blkid -k command to list the signatures that are recognized).
Otherwise, native LVM2 code is used to detect signatures (MD RAID, swap and
LUKS signatures are detected only in this case). Logical volume is not wiped if the
read only flag is set.
-Z, —zero {y|n}
Controls zeroing of the first 4KiB of data in the new logical volume. Default is yes.
Snapshot COW volumes are always zeroed. Logical volume is not zeroed if the read
only flag is set.
Warning: trying to mount an unzeroed logical volume can cause the system to hang.
› EXAMPLES
Creates a striped logical volume with 3 stripes, a stripe size of 8KiB and a size of
100MiB in the volume group named vg00. The logical volume name will be chosen
by lvcreate:
lvcreate -i 3 -I 8 -L 100M vg00
Creates a mirror logical volume with 2 sides with a usable size of 500 MiB. This
operation would require 3 devices (or option —alloc anywhere ) - two for the mirror
devices and one for the disk log:
lvcreate -m1 -L 500M vg00
Creates a mirror logical volume with 2 sides with a usable size of 500 MiB. This
operation would require 2 devices - the log is “in-memory”:
lvcreate -m1 —mirrorlog core -L 500M vg00
Creates a snapshot logical volume named “vg00/snap” which has access to the
contents of the original logical volume named “vg00/lvol1” at snapshot logical
volume creation time. If the original logical volume contains a file system, you can
mount the snapshot logical volume on an arbitrary directory in order to access the
contents of the filesystem to run a backup while the original filesystem continues to
get updated:
lvcreate —size 100m —snapshot —name snap /dev/vg00/lvol1
Creates a snapshot logical volume named “vg00/snap” with size for overwriting 20%
of the original logical volume named “vg00/lvol1”:
lvcreate -s -l 20%ORIGIN —name snap vg00/lvol1
Creates a sparse device named /dev/vg1/sparse of size 1TiB with space for just under
100MiB of actual data on it:
lvcreate —virtualsize 1T —size 100M —snapshot —name sparse vg1
Creates a linear logical volume “vg00/lvol1” using physical extents /dev/sda:0-7 and
/dev/sdb:0-7 for allocation of extents:
lvcreate -L 64M -n lvol1 vg00 /dev/sda:0-7 /dev/sdb:0-7
Creates a 5GiB RAID5 logical volume “vg00/my_lv”, with 3 stripes (plus a parity
drive for a total of 4 devices) and a stripesize of 64KiB:
lvcreate —type raid5 -L 5G -i 3 -I 64 -n my_lv vg00
Creates a RAID5 logical volume “vg00/my_lv”, using all of the free space in the VG
and spanning all the PVs in the VG:
lvcreate —type raid5 -l 100%FREE -n my_lv vg00
Creates a 5GiB RAID10 logical volume “vg00/my_lv”, with 2 stripes on 2 2-way
mirrors. Note that the -i and -m arguments behave differently. The -i specifies the
number of stripes. The -m specifies the number of additional copies:
lvcreate —type raid10 -L 5G -i 2 -m 1 -n my_lv vg00
Creates a 100MiB thin pool logical volume built with 2 stripes of 64KiB and a chunk
size of 256KiB, together with a 1TiB thinly provisioned logical volume
“vg00/thin_lv”:
lvcreate -i 2 -I 64 -c 256 -L100M -T vg00/pool -V 1T —name thin_lv
Creates a thin snapshot volume “thinsnap” of thin volume “thinvol” that will share
the same blocks within the thin pool. Note: the size MUST NOT be specified,
otherwise the non-thin snapshot is created instead:
lvcreate -s vg00/thinvol —name thinsnap
Creates a thin snapshot volume of read-only inactive volume “origin” which then
becomes the thin external origin for the thin snapshot volume in vg00 that will use an
existing thin pool “vg00/pool”:
lvcreate -s —thinpool vg00/pool origin
Create a cache pool LV that can later be used to cache one logical volume.
lvcreate —type cache-pool -L 1G -n my_lv_cachepool vg /dev/fast1
If there is an existing cache pool LV, create the large slow device (i.e. the origin LV)
and link it to the supplied cache pool LV, creating a cache LV.
lvcreate —cache -L 100G -n my_lv vg/my_lv_cachepool /dev/slow1
If there is an existing logical volume, create the small and fast cache pool LV and link
it to the supplied existing logical volume (i.e. the origin LV), creating a cache LV.
lvcreate —type cache -L 1G -n my_lv_cachepool vg/my_lv /dev/fast1
› SEE ALSO
lvm(8), lvm.conf(5), lvmcache(7), lvmthin(7), lvconvert(8), lvchange(8),
lvextend(8), lvreduce(8), lvremove(8), lvrename(8) lvs(8), lvscan(8), vgcreate(8),
blkid(8)
ExternalOrigin | Origin | Pool}LogicalVolumeName] [PhysicalVolumePath[:PE[-
PE]]…]
lvcreate [-l|—extents LogicalExtentsNumber[%{FREE|ORIGIN|PVS|VG}] | -L|—
size LogicalVolumeSize[bBsSkKmMgGtTpPeE]] [-c|—chunksize
ChunkSize[bBsSkKmMgG]] [—commandprofile Profilename] [—noudevsync] [—
ignoremonitoring] [—metadataProfile ProfileName] [—monitor {y|n}] [-n|—
name SnapshotLogicalVolume{Name|Path}] -s|—snapshot|-H|—cache
{[VolumeGroup{Name|Path}/]OriginalLogicalVolumeName -V|—virtualsize
VirtualSize[bBsSkKmMgGtTpPeE]}
› DESCRIPTION
lvcreate creates a new logical volume in a volume group (see
vgcreate(8),vgchange(8)) by allocating logical extents from the free physical extent
pool of that volume group. If there are not enough free physical extents then the
volume group can be extended (see vgextend(8)) with other physical volumes or by
reducing existing logical volumes of this volume group in size (see lvreduce(8)). If
you specify one or more PhysicalVolumes, allocation of physical extents will be
restricted to these volumes. The second form supports the creation of snapshot logical
volumes which keep the contents of the original logical volume for backup purposes.
› OPTIONS
See lvm(8) for common options.
-a,—activate{y|ay|n|ey|en|ly|ln}
Controls the availability of the Logical Volumes for immediate use after the
command finishes running. By default, new Logical Volumes are activated (-ay). If it
is possible technically, -an will leave the new Logical Volume inactive. But for
example, snapshots of active origin can only be created in the active state so -an
cannot be used with —type snapshot. This does not apply to thin volume snapshots,
which are by default created with flag to skip their activation (-ky). Normally the —
zero n argument has to be supplied too because zeroing (the default behaviour) also
requires activation. If autoactivation option is used (-aay), the logical volume is
activated only if it matches an item in the activation/auto_activation_volume_list set
in lvm.conf(5). For autoactivated logical volumes, —zero n and —wipesignatures n
is always assumed and it can’t be overridden. If the clustered locking is enabled, -aey
will activate exclusively on one node and -a{a|l}y will activate only on the local
node.
-H,—cache
Creates cache or cache pool logical volume or both. Specifying the optional argument
—size will cause the creation of the cache logical volume. Specifying both arguments
will cause the creation of cache with its cache pool volume. When the Volume group
name is specified together with existing logical volume name which is NOT a cache
pool name, such volume is treaded as cache origin volume and cache pool is created.
In this case the —size is used to specify size of cache pool volume. See lvmcache(7)
for more info about caching support. Note that the cache segment type requires a dm-
cache kernel module version 1.3.0 or greater.
—cachemode{passthrough|writeback|writethrough}
Specifying a cache mode determines when the writes to a cache LV are considered
complete. When writeback is specified, a write is considered complete as soon as it is
stored in the cache pool LV. If writethough is specified, a write is considered
complete only when it has been stored in the cache pool LV and on the origin LV.
While writethrough may be slower for writes, it is more resilient if something should
happen to a device associated with the cache pool LV.
—cachepolicy policy
Only applicable to cached LVs; see also lvmcache(7). Sets the cache policy. mq is the
basic policy name. smq is more advanced version available in newer kernels.
—cachepoolCachePoolLogicalVolume{Name|Path}
Specifies the name of cache pool volume name. The other way to specify pool name
is to append name to Volume group name argument.
—cachesettingskey=value
Only applicable to cached LVs; see also lvmcache(7). Sets the cache tunable settings.
In most use-cases, default values should be adequate. Special string value default
switches setting back to its default kernel value and removes it from the list of
settings stored in lvm2 metadata.
-c,—chunksizeChunkSize[bBsSkKmMgG]
Gives the size of chunk for snapshot, cache pool and thin pool logical volumes. The
default unit is kilobytes. For snapshots the value must be a power of 2 between
4KiB and 512KiB; the default value is 4KiB. For cache pools the value must be a
multiple of 32KiB between 32KiB and 1GiB; the default is 64KiB. For thin pools
the value must be a multiple of 64KiB between 64KiB and 1GiB. If the pool metadata
size is not specified, the default value starts at 64KiB and grows until the pool
metadata size fits within 128MiB. See the lvm.conf(5) setting
allocation/thin_pool_chunk_size_policy to select a different calculation policy. Thin
pool target versions <1.4 require this value to be a power of 2. For target versions
<1.5, discard is not supported for non-power-of-2 values.
-C,—contiguous{y|n}
Sets or resets the contiguous allocation policy for logical volumes. The default is no
contiguous allocation, based on a next-free principle.
—corelog
This is a shortcut for the option —mirrorlog core.
—discards{ignore|nopassdown|passdown}
Sets discards behavior for thin pool. Default is passdown.
—errorwhenfull{y|n}
Configures thin pool behaviour when the data space is exhausted. The default is no:
the device queues I/O operations until the target timeout expires (see the dm-thin-pool
kernel module option no_space_timeout), giving the configured system time to e.g.
extend the size of the thin pool data device. When set to yes, the I/O operation is
immediately errored.
-K,—ignoreactivationskip
Ignore the flag to skip Logical Volumes during activation. Use the —setactivationskip
option to set or reset the activation skipping flag persistently for a logical volume.
—ignoremonitoring
Make no attempt to interact with dmeventd unless —monitor is specified.
-l,—extentsLogicalExtentsNumber[%{VG|PVS|FREE|ORIGIN}]
Gives the number of logical extents to allocate for the new logical volume. The total
number of physical extents allocated will be greater than this, for example, if the
volume is mirrored. The number can also be expressed as a percentage of the total
space in the Volume Group with the suffix %VG, as a percentage of the remaining
free space in the Volume Group with the suffix %FREE, as a percentage of the
remaining free space for the specified PhysicalVolume(s) with the suffix %PVS, or
(for a snapshot) as a percentage of the total space in the Origin Logical Volume with
the suffix %ORIGIN (i.e. 100%ORIGIN provides space for the whole origin). When
expressed as a percentage, the number is treated as an approximate upper limit for the
total number of physical extents to be allocated (including extents used by any
mirrors, for example).
-j,—majormajor
Sets the major number. Major numbers are not supported with pool volumes. This
option is supported only on older systems (kernel version 2.4) and is ignored on
modern Linux systems where major numbers are dynamically assigned.
—metadataprofileProfileName
Uses and attaches the ProfileName configuration profile to the logical volume
metadata. Whenever the logical volume is processed next time, the profile is
automatically applied. If the volume group has another profile attached, the logical
volume profile is preferred. See lvm.conf(5) for more information about metadata
profiles.
—minor minor
Sets the minor number. Minor numbers are not supported with pool volumes.
-m,—mirrorsMirrors
Creates a mirrored logical volume with Mirrors copies. For example, specifying -m 1
would result in a mirror with two sides; that is, a linear volume plus one copy.
Specifying the optional argument —nosync will cause the creation of the mirror to
skip the initial resynchronization. Any data written afterwards will be mirrored, but
the original contents will not be copied. This is useful for skipping a potentially long
and resource-intensive initial sync of an empty device.
There are two implementations of mirroring, corresponding to the “raid1” and
“mirror” segment types. The default is “raid1”. See the —type option for
more information if you would like to use the legacy “mirror” segment type. See the
lvm.conf(5) settings global/mirror_segtype_default and
global/raid10_segtype_default to configure the default mirror segment type. The options
—mirrorlog and —corelog apply to the legacy “mirror” segment type only.
—mirrorlog{disk|core|mirrored}
Specifies the type of log to be used for logical volumes utilizing the legacy “mirror”
segment type. The default is disk, which is persistent and requires a small amount of
storage space, usually on a separate device from the data being mirrored. Using core
means the mirror is regenerated by copying the data from the first device each time
the logical volume is activated, like after every reboot. Using mirrored will create a
persistent log that is itself mirrored.
—monitor{y|n}
Starts or avoids monitoring a mirrored, snapshot or thin pool logical volume with
dmeventd, if it is installed. If a device used by a monitored mirror reports an I/O
error, the failure is handled according to activation/mirror_image_fault_policy and
activation/mirror_log_fault_policy set in lvm.conf(5).
-n,—nameLogicalVolume{Name|Path}
Sets the name for the new logical volume. Without this option a default name of
“lvol#” will be generated where # is the LVM internal number of the logical volume.
—nosync
Causes the creation of the mirror to skip the initial resynchronization.
—noudevsync
Disables udev synchronisation. The process will not wait for notification from udev.
It will continue irrespective of any possible udev processing in the background. You
should only use this if udev is not running or has rules that ignore the devices LVM2
creates.
-p,—permission{r|rw}
Sets access permissions to read only (r) or read and write (rw). Default is read and
write.
-M,—persistent{y|n}
Set to y to make the minor number specified persistent. Pool volumes cannot have
persistent major and minor numbers. Defaults to yes only when major or minor
number is specified. Otherwise it is no.
—poolmetadatasizeMetadataVolumeSize[bBsSkKmMgG]
Sets the size of the pool’s metadata logical volume. Supported values are in the range
2MiB to 16GiB for a thin pool, and up to 16GiB for a cache pool. The minimum value
is computed from the pool’s data size. The default value for a thin pool is
(Pool_LV_size / Pool_LV_chunk_size * 64b). The default unit is megabytes.
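The default-size formula above can be checked with a little shell arithmetic; the 100GiB pool and 64KiB chunk size used here are hypothetical figures chosen for the sketch, not values defined by this page:

```shell
# Default thin pool metadata size = Pool_LV_size / Pool_LV_chunk_size * 64 bytes.
pool_bytes=$((100 * 1024 * 1024 * 1024))   # hypothetical 100GiB data LV
chunk_bytes=$((64 * 1024))                 # hypothetical 64KiB chunk size
meta_bytes=$((pool_bytes / chunk_bytes * 64))
echo "metadata LV: $((meta_bytes / 1024 / 1024)) MiB"   # prints "metadata LV: 100 MiB"
```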
—poolmetadataspare{y|n}
Controls creation and maintenance of the pool metadata spare logical volume that
will be used for automated pool recovery. Only one such volume is maintained within
a volume group, with the size of the biggest pool metadata volume. The default is yes.
—[raid]maxrecoveryrateRate[bBsSkKmMgG]
Sets the maximum recovery rate for a RAID logical volume. Rate is specified as an
amount per second for each device in the array. If no suffix is given, then
KiB/sec/device is assumed. Setting the recovery rate to 0 means it will be unbounded.
—[raid]minrecoveryrateRate[bBsSkKmMgG]
Sets the minimum recovery rate for a RAID logical volume. Rate is specified as an
amount per second for each device in the array. If no suffix is given, then
KiB/sec/device is assumed. Setting the recovery rate to 0 means it will be unbounded.
-r,—readahead{ReadAheadSectors|auto|none}
Sets read ahead sector count of this logical volume. For volume groups with metadata
in lvm1 format, this must be a value between 2 and 120. The default value is auto
which allows the kernel to choose a suitable value automatically. None is equivalent
to specifying zero.
-R,—regionsizeMirrorLogRegionSize[bBsSkKmMgG]
A mirror is divided into regions of this size (in MiB), and the mirror log uses this
granularity to track which regions are in sync.
-k,—setactivationskip{y|n}
Controls whether Logical Volumes are persistently flagged to be skipped during
activation. By default, thin snapshot volumes are flagged for activation skip. See the
lvm.conf(5) setting activation/auto_set_activation_skip for how to change this default
behaviour. To activate such volumes, the extra -K|—ignoreactivationskip option must
be used. The flag is not applied during deactivation. Use the lvchange
—setactivationskip {y|n} command to change the skip flag for existing volumes. To see
whether the flag is attached, use the lvs command; the state of the flag is reported
within the lv_attr bits.
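As an illustrative sequence (the volume group and LV names are invented for this sketch):

```shell
# Create an LV with the activation-skip flag set persistently (-ky).
lvcreate -ky -L 100M -n skipvol vg00
# A plain activation attempt skips the volume; -K overrides the flag.
lvchange -ay -K vg00/skipvol
# Clear the flag persistently so normal activation applies again.
lvchange --setactivationskip n vg00/skipvol
```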
-L,—sizeLogicalVolumeSize[bBsSkKmMgGtTpPeE]
Gives the size to allocate for the new logical volume. A size suffix of B for bytes, S
for sectors as 512 bytes, K for kilobytes, M for megabytes, G for gigabytes, T for
terabytes, P for petabytes or E for exabytes is optional. Default unit is megabytes.
-s,—snapshotOriginalLogicalVolume{Name|Path}
Creates a snapshot logical volume (or snapshot) for an existing, so-called original
logical volume (or origin). Snapshots provide a ‘frozen image’ of the contents of the
origin while the origin can still be updated. They enable consistent backups and
online recovery of removed/overwritten data/files. A thin snapshot is created when the
origin is a thin volume and the size IS NOT specified. A thin snapshot shares the same
blocks within the thin pool volume. A non-thin snapshot with a specified size does not
need the same amount of storage the origin has; in a typical scenario, 15-20% might be
enough. If the snapshot runs out of storage, use lvextend(8) to grow it. Shrinking a
snapshot is supported by lvreduce(8) as well. Run lvs(8) on the snapshot in order to
check how much data is allocated to it. Note: a small amount of the space you allocate
to the snapshot is used to track the locations of the chunks of data, so you should
allocate slightly more space than you actually need and monitor (—monitor) the rate
at which the snapshot data is growing so you can avoid running out of space. If
—thinpool is specified, a thin volume is created that will use the given original
logical volume as an external origin that serves unprovisioned blocks. Only
read-only volumes can be used as external origins. To use a volume as an external
origin, LVM expects the volume to be inactive. An external origin volume can be
used/shared by many thin volumes, even from different thin pools. See lvconvert(8)
for online conversion to thin volumes with external origin.
-i,—stripesStripes
Gives the number of stripes. This is equal to the number of physical volumes over
which to scatter the logical volume. When creating a RAID 4/5/6 logical volume, the
extra devices which are necessary for parity are internally accounted for. Specifying
-i 3 would use 3 devices for striped logical volumes, 4 devices for RAID 4/5, and 5
devices for RAID 6. Alternatively, RAID 4/5/6 will stripe across all PVs in the
volume group, or across all of the PVs specified, if the -i argument is omitted.
-I,—stripesizeStripeSize
Gives the number of kilobytes for the granularity of the stripes. StripeSize must be
2^n (n = 2 to 9) for metadata in LVM1 format. For metadata in LVM2 format, the
stripe size may be a larger power of 2 but must not exceed the physical extent size.
-T,—thin
Creates a thin pool, a thin logical volume, or both. Specifying the optional argument
—size or —extents will cause the creation of the thin pool logical volume. Specifying
the optional argument —virtualsize will cause the creation of the thin logical volume
from the given thin pool volume. Specifying both arguments will cause the creation of
both the thin pool and a thin volume using that pool. See lvmthin(7) for more info
about thin provisioning support. Thin provisioning requires a device mapper kernel
driver from kernel 3.2 or greater.
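The three forms can be sketched as follows (the volume group, pool and LV names are illustrative):

```shell
# Pool only: -T with --size creates just the thin pool vg00/pool.
lvcreate -L 100M -T vg00/pool
# Thin volume only: -T with --virtualsize allocates from an existing pool.
lvcreate -V 1G -T vg00/pool -n thin_lv
# Both at once: create a pool and a 1GiB thin volume that uses it.
lvcreate -L 100M -V 1G -T vg00/pool2 -n thin_lv2
```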
—thinpoolThinPoolLogicalVolume{Name|Path}
Specifies the name of the thin pool volume. The other way to specify the pool name is
to append it to the volume group name argument.
—type SegmentType
Creates a logical volume with the specified segment type. Supported types are:
cache, cache-pool, error, linear, mirror, raid1, raid4, raid5_la, raid5_ls(=raid5),
raid5_ra, raid5_rs, raid6_nc, raid6_nr, raid6_zr(=raid6), raid10, snapshot, striped,
thin, thin-pool or zero. A segment type may have a command-line switch alias that
enables its use. When the type is not explicitly specified, an implicit type is
selected from the combination of options: -H|—cache|—cachepool (cache or cache-pool),
-T|—thin|—thinpool (thin or thin-pool), -m|—mirrors (raid1 or mirror),
-s|—snapshot|-V|—virtualsize (snapshot or thin), -i|—stripes (striped). The default
type is linear.
-V,—virtualsizeVirtualSize[bBsSkKmMgGtTpPeE]
Creates a thinly provisioned device or a sparse device of the given size (in MiB by
default). See lvm.conf(5) settings global/sparse_segtype_default to configure default
sparse segment type. See lvmthin(7) for more info about thin provisioning support.
Anything written to a sparse snapshot will be returned when reading from it. Reading
from other areas of the device will return blocks of zeros. Virtual snapshot is
implemented by creating a hidden virtual device of the requested size using the zero
target. A suffix of _vorigin is used for this device. Note: using sparse snapshots is not
efficient for larger device sizes (GiB), thin provisioning should be used for this case.
-W,—wipesignatures{y|n}
Controls wiping of detected signatures on newly created Logical Volume. If this
option is not specified, then by default signature wiping is done each time the zeroing
(-Z/—zero) is done. This default behaviour can be controlled by
allocation/wipe_signatures_when_zeroing_new_lvs setting found in lvm.conf(5). If
blkid wiping is used (see the allocation/use_blkid_wiping setting in lvm.conf(5)) and
LVM2 is compiled with blkid wiping support, then the blkid(8) library is used to
detect the signatures (use the blkid -k command to list the signatures that are
recognized). Otherwise, native LVM2 code is used to detect signatures (MD RAID,
swap and LUKS signatures are detected only in this case). The logical volume is not
wiped if the read-only flag is set.
-Z,—zero{y|n}
Controls zeroing of the first 4KiB of data in the new logical volume. Default is yes.
Snapshot COW volumes are always zeroed. Logical volume is not zeroed if the read
only flag is set.
Warning: trying to mount an unzeroed logical volume can cause the system to hang.
› EXAMPLES
Creates a striped logical volume with 3 stripes, a stripe size of 8KiB and a size of
100MiB in the volume group named vg00. The logical volume name will be chosen
by lvcreate:
lvcreate -i 3 -I 8 -L 100M vg00
Creates a mirror logical volume with 2 sides and a usable size of 500 MiB. This
operation would require 3 devices (or the option —alloc anywhere) - two for the mirror
devices and one for the disk log:
lvcreate -m1 -L 500M vg00
Creates a mirror logical volume with 2 sides and a usable size of 500 MiB. This
operation would require 2 devices - the log is “in-memory”:
lvcreate -m1 —mirrorlog core -L 500M vg00
Creates a snapshot logical volume named “vg00/snap” which has access to the
contents of the original logical volume named “vg00/lvol1” at snapshot logical
volume creation time. If the original logical volume contains a file system, you can
mount the snapshot logical volume on an arbitrary directory in order to access the
contents of the filesystem to run a backup while the original filesystem continues to
get updated:
lvcreate —size 100m —snapshot —name snap /dev/vg00/lvol1
Creates a snapshot logical volume named “vg00/snap” with space for overwriting 20%
of the original logical volume named “vg00/lvol1”:
lvcreate -s -l 20%ORIGIN —name snap vg00/lvol1
Creates a sparse device named /dev/vg1/sparse of size 1TiB with space for just under
100MiB of actual data on it:
lvcreate —virtualsize 1T —size 100M —snapshot —name sparse vg1
Creates a linear logical volume “vg00/lvol1” using physical extents /dev/sda:0-7 and
/dev/sdb:0-7 for allocation of extents:
lvcreate -L 64M -n lvol1 vg00 /dev/sda:0-7 /dev/sdb:0-7
Creates a 5GiB RAID5 logical volume “vg00/my_lv”, with 3 stripes (plus a parity
drive for a total of 4 devices) and a stripesize of 64KiB:
lvcreate —type raid5 -L 5G -i 3 -I 64 -n my_lv vg00
Creates a RAID5 logical volume “vg00/my_lv”, using all of the free space in the VG
and spanning all the PVs in the VG:
lvcreate —type raid5 -l 100%FREE -n my_lv vg00
Creates a 5GiB RAID10 logical volume “vg00/my_lv”, with 2 stripes on 2 2-way
mirrors. Note that the -i and -m arguments behave differently. The -i specifies the
number of stripes. The -m specifies the number of additional copies:
lvcreate —type raid10 -L 5G -i 2 -m 1 -n my_lv vg00
Creates a 100MiB thin pool logical volume built with 2 stripes of 64KiB and a chunk
size of 256KiB, together with a 1TiB thin provisioned logical volume
“vg00/thin_lv”:
lvcreate -i 2 -I 64 -c 256 -L100M -T vg00/pool -V 1T —name thin_lv
Creates a thin snapshot volume “thinsnap” of thin volume “thinvol” that will share
the same blocks within the thin pool. Note: the size MUST NOT be specified,
otherwise the non-thin snapshot is created instead:
lvcreate -s vg00/thinvol —name thinsnap
Creates a thin snapshot volume of read-only inactive volume “origin” which then
becomes the thin external origin for the thin snapshot volume in vg00 that will use an
existing thin pool “vg00/pool”:
lvcreate -s —thinpool vg00/pool origin
Creates a cache pool LV that can later be used to cache one logical volume:
lvcreate —type cache-pool -L 1G -n my_lv_cachepool vg /dev/fast1
If there is an existing cache pool LV, creates the large slow device (i.e. the origin
LV) and links it to the supplied cache pool LV, creating a cache LV:
lvcreate —cache -L 100G -n my_lv vg/my_lv_cachepool /dev/slow1
If there is an existing logical volume, creates the small, fast cache pool LV and
links it to the supplied existing logical volume (i.e. the origin LV), creating a
cache LV:
lvcreate —type cache -L 1G -n my_lv_cachepool vg/my_lv /dev/fast1
› SEE ALSO
lvm(8), lvm.conf(5), lvmcache(7), lvmthin(7), lvconvert(8), lvchange(8),
lvextend(8), lvreduce(8), lvremove(8), lvrename(8), lvs(8), lvscan(8), vgcreate(8),
blkid(8)
LVDISPLAY
› NAME
lvdisplay – display attributes of a logical volume
› SYNOPSIS
lvdisplay [-a|—all] [-c|—colon] [—commandprofile ProfileName] [-d|—debug] [-
h|-?|—help] [—ignorelockingfailure] [—ignoreskippedcluster] [—maps] [—
nosuffix] [-P|—partial] [-S|—select Selection] [—units hHbBsSkKmMgGtTpPeE] [-
v|—verbose] [—version] [VolumeGroupName|LogicalVolume{Name|Path} …]
lvdisplay -C|—columns [—aligned] [—binary] [-a|—all] [—commandprofile
ProfileName] [-d|—debug] [-h|-?|—help] [—ignorelockingfailure] [—
ignoreskippedcluster] [—noheadings] [—nosuffix] [-o|—options
[+]Field[,Field…]] [-O|—sort [+|-]Key1[,[+|-]Key2…]] [-P|—partial] [—
segments] [-S|—select Selection] [—separator Separator] [—unbuffered] [—units
hHbBsSkKmMgGtTpPeE] [-v|—verbose] [—version]
[VolumeGroupName|LogicalVolume{Name|Path} …]
› DESCRIPTION
lvdisplay allows you to see the attributes of a logical volume, such as size,
read/write status, snapshot information, etc.
lvs(8) is an alternative that provides the same information in the style of ps(1). lvs(8)
is recommended over lvdisplay.
› OPTIONS
See lvm(8) for common options and lvs(8) for options given with —columns.
—all
Include information in the output about internal Logical Volumes that are
components of normally-accessible Logical Volumes, such as mirrors, but which are
not independently accessible (e.g. not mountable). For example, after creating a
mirror using lvcreate -m1 —mirrorlog disk, this option will reveal three internal
Logical Volumes, with suffixes mimage_0, mimage_1, and mlog.
-C, —columns
Display output in columns, the equivalent of lvs(8). Options listed are the same as
options given in lvs(8).
-c, —colon
Generate colon-separated output for easier parsing in scripts or programs. N.B. lvs(8)
provides considerably more control over the output. The values, in order, are:
logical volume name
volume group name
logical volume access
logical volume status
internal logical volume number
open count of logical volume
logical volume size in sectors
current logical extents associated to logical volume
allocated logical extents of logical volume
allocation policy of logical volume
read ahead sectors of logical volume
major device number of logical volume
minor device number of logical volume
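Because the fields are colon-separated and fixed in order, the output is easy to consume from scripts. A minimal sketch, running awk over a fabricated sample line (the values are illustrative, not real lvdisplay output): field 1 is the LV name and field 7 the size in 512-byte sectors:

```shell
# Fabricated lvdisplay -c style line: 204800 sectors = 100 MiB.
line="/dev/vg00/lvol2:vg00:3:1:-1:0:204800:25:-1:0:-1:253:1"
# Convert sectors (field 7) to MiB alongside the LV name (field 1).
echo "$line" | awk -F: '{ printf "%s %d MiB\n", $1, $7 * 512 / 1024 / 1024 }'
# prints "/dev/vg00/lvol2 100 MiB"
```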
-m, —maps
Display the mapping of logical extents to physical volumes and physical extents. To
map physical extents to logical extents use:
pvs —segments -o+lv_name,seg_start_pe,segtype
› EXAMPLES
Shows attributes of that logical volume. If snapshot logical volumes have been
created for this original logical volume, this command shows a list of all snapshot
logical volumes and their status (active or inactive) as well:
lvdisplay -v vg00/lvol2
Shows the attributes of this snapshot logical volume and also which original logical
volume it is associated with:
lvdisplay vg00/snapshot
› SEE ALSO
lvm(8), lvcreate(8), lvs(8), lvscan(8), pvs(8)
LVEXTEND
› NAME
lvextend – extend the size of a logical volume
› SYNOPSIS
lvextend [—alloc AllocationPolicy] [-A|—autobackup {y|n}] [—commandprofile
ProfileName] [-d|—debug] [-h|-?|—help] [-f|—force] [-i|—stripes Stripes [-I|—
stripesize StripeSize]] {-l|—extents [+]LogicalExtentsNumber[%
{VG|LV|PVS|FREE|ORIGIN}] | -L|—size
[+]LogicalVolumeSize[bBsSkKmMgGtTpPeE]} [-n|—nofsck] [—noudevsync] [-r|—
resizefs] [—use-policies] [-t|—test] [-v|—verbose] LogicalVolumePath
[PhysicalVolumePath[:PE[-PE]]…]
› DESCRIPTION
lvextend allows you to extend the size of a logical volume. Extension of snapshot
logical volumes (see lvcreate(8) for information to create snapshots) is supported as
well. But to change the number of copies in a mirrored logical volume use
lvconvert(8).
› OPTIONS
See lvm(8) for common options.
-f, —force
Proceed with size extension without prompting.
-l, —extents [+]LogicalExtentsNumber[%{VG|LV|PVS|FREE|ORIGIN}]
Extend or set the logical volume size in units of logical extents. With the ‘+’ sign the
value is added to the actual size of the logical volume and without it, the value is
taken as an absolute one. The total number of physical extents allocated will be
greater than this, for example, if the volume is mirrored. The number can also be
expressed as a percentage of the total space in the Volume Group with the suffix
%VG, relative to the existing size of the Logical Volume with the suffix %LV, of the
remaining free space for the specified PhysicalVolume(s) with the suffix %PVS, as a
percentage of the remaining free space in the Volume Group with the suffix %FREE,
or (for a snapshot) as a percentage of the total space in the Origin Logical Volume
with the suffix %ORIGIN. The resulting value is rounded upward. N.B. In a future
release, when expressed as a percentage with PVS, VG or FREE, the number will be
treated as an approximate upper limit for the total number of physical extents to be
allocated (including extents used by any mirrors, for example). The code may
currently allocate more space than you might otherwise expect.
-L, —size [+]LogicalVolumeSize[bBsSkKmMgGtTpPeE]
Extend or set the logical volume size in units of megabytes. A size suffix of M for
megabytes, G for gigabytes, T for terabytes, P for petabytes or E for exabytes is
optional. With the + sign the value is added to the actual size of the logical volume
and without it, the value is taken as an absolute one.
-i, —stripes Stripes
Gives the number of stripes for the extension. Not applicable to LVs using the
original metadata LVM format, which must use a single value throughout.
-I, —stripesize StripeSize
Gives the number of kilobytes for the granularity of the stripes. Not applicable to LVs
using the original metadata LVM format, which must use a single value throughout.
StripeSize must be 2^n (n = 2 to 9).
-n, —nofsck
Do not perform fsck before extending the filesystem when the filesystem requires it.
You may need to use —force to proceed with this option.
—noudevsync
Disable udev synchronisation. The process will not wait for notification from udev. It
will continue irrespective of any possible udev processing in the background. You
should only use this if udev is not running or has rules that ignore the devices LVM2
creates.
-r, —resizefs
Resize underlying filesystem together with the logical volume using fsadm(8).
—use-policies
Resizes the logical volume according to configured policy. See lvm.conf(5) for some
details.
› EXAMPLES
Extends the size of the logical volume “vg01/lvol10” by 54MiB on physical volume
/dev/sdk3. This is only possible if /dev/sdk3 is a member of volume group vg01 and
there are enough free physical extents in it:
lvextend -L +54 /dev/vg01/lvol10 /dev/sdk3
Extends the size of logical volume “vg01/lvol01” by the amount of free space on
physical volume /dev/sdk3. This is equivalent to specifying “-l +100%PVS” on the
command line:
lvextend /dev/vg01/lvol01 /dev/sdk3
Extends a logical volume “vg01/lvol01” by 16MiB using physical extents /dev/sda:8-
9 and /dev/sdb:8-9 for allocation of extents:
lvextend -L+16M vg01/lvol01 /dev/sda:8-9 /dev/sdb:8-9
› SEE ALSO
fsadm(8), lvm(8), lvm.conf(5), lvcreate(8), lvconvert(8), lvreduce(8), lvresize(8),
lvchange(8)
LVMCONFIG
› NAME
lvmconfig, lvm dumpconfig, lvm config – Display LVM configuration
› SYNOPSIS
lvmconfig [-f|—file filename] [—type
{current|default|diff|full|list|missing|new|profilable|profilable-command|profilable-
metadata} [—atversion version] [—ignoreadvanced] [—ignoreunsupported] [—
ignorelocal] [-l|—list] [—config ConfigurationString] [—commandprofile
ProfileName] [—profile ProfileName] [—metadataprofile ProfileName] [—
mergedconfig] [—showdeprecated] [—showunsupported] [—validate] [—
withsummary] [—withcomments] [—withspaces] [—withversions]
[ConfigurationNode…]
› DESCRIPTION
lvmconfig produces formatted output from the LVM configuration tree. The
command was added in release 2.02.119 and has an identical longer form lvm
dumpconfig.
› OPTIONS
-f, —file filename
Send output to a file named ‘filename’.
-l, —list
List configuration settings with summarizing comment. This is the same as using
lvmconfig —type list —withsummary.
—type {current|default|diff|full|list|missing|new|profilable|profilable-
command|profilable-metadata}
Select the type of configuration to display. The configuration settings displayed have
either default values or currently-used values assigned based on the type selected. If
no type is selected, —type current is used by default. Whenever a configuration
setting with a default value is commented out, it means the setting does not have any
concrete default value defined. Output can be saved and used as a proper lvm.conf(5)
file.
current
Display the current lvm.conf configuration merged with any tag config if used. See
also lvm.conf(5) for more info about LVM configuration methods.
default
Display all possible configuration settings with default values assigned.
diff
Display all configuration settings for which the values used differ from defaults. The
value assigned for each configuration setting is the value currently used. This is
actually minimal LVM configuration which can be used without a change to current
configured behaviour.
full
Display full configuration tree - a combination of current configuration tree (—type
current) and tree of settings for which default values are used (—type missing). This
is exactly the configuration tree that LVM2 uses during command execution. Using
this type also implies the use of —mergedconfig option. If comments are displayed
(see —withcomments and —withsummary options), then for each setting found in
existing configuration and for which defaults are not used, there’s an extra comment
line printed to denote this.
list
Display plain list of configuration settings.
missing
Display all configuration settings with default values assigned which are missing in
the configuration currently used and for which LVM automatically falls back to using
those default values.
new
Display all new configuration settings introduced in current LVM version or specific
version as defined by —atversion option.
profilable
Display all profilable configuration settings with default values assigned. See
lvm.conf(5) for more info about profile config method.
profilable-command
Display all profilable configuration settings with default values assigned that can be
used in a command profile. This is a subset of the settings displayed by —type
profilable.
profilable-metadata
Display all profilable configuration settings with default values assigned that can be
used in a metadata profile. This is a subset of the settings displayed by —type
profilable.
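As a brief sketch of how these types combine with other options (the output file name is illustrative):

```shell
# Show only the settings whose values differ from the built-in defaults.
lvmconfig --type diff
# Dump the full merged configuration tree, with one-line summaries,
# into a file that is itself usable as an lvm.conf.
lvmconfig --type full --withsummary -f lvm-full.conf
```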
—atversion version Specify an LVM version in x.y.z format, where x is the major version,
y is the minor version and z is the patchlevel (e.g. 2.2.106). When the configuration is
displayed, only the configuration settings recognized at this LVM version will be
considered. This can be used to display a configuration that a certain LVM version
understands and which does not contain any newer settings for which LVM would issue a
warning message when checking the configuration.
—ignoreadvanced Exclude advanced configuration settings from the output.
—ignoreunsupported Exclude unsupported configuration settings from the output. These
settings are either used for debugging and development purposes only, or their support is
not yet complete and they are not meant to be used in production. The current and diff
types include unsupported settings in their output by default; all the other types ignore
unsupported settings.
—ignorelocal Ignore local section.
—config ConfigurationString Use ConfigurationString to override existing
configuration. This configuration is then applied for the lvmconfig command itself. See
also lvm.conf(5) for more info about config cascade.
—commandprofile ProfileName Use ProfileName to override existing configuration.
This configuration is then applied for the lvmconfig command itself. See also —
mergedconfig option and lvm.conf(5) for more info about config cascade.
—profile ProfileName The same as using —commandprofile but the configuration is not
applied for the lvmconfig command itself.
—metadataprofile ProfileName Use ProfileName to override existing configuration. The
configuration defined in metadata profile has no effect for the lvmconfig command itself.
lvmconfig displays the configuration only. See also —mergedconfig option and
lvm.conf(5) for more info about config cascade.
—mergedconfig When the lvmconfig command is run with the —config option and/or the
—commandprofile (or the LVM_COMMAND_PROFILE environment variable), —
profile, or —metadataprofile option, merge all the contents of the config cascade before
displaying it. Without the —mergedconfig option, only the configuration at the front
of the cascade is displayed. See also lvm.conf(5) for more info about the config cascade.
—showdeprecated Include deprecated configuration settings in the output. These settings
are deprecated since a certain version. If a concrete version is specified with the —
atversion option, deprecated settings are automatically included if the specified version
is lower than the version in which the settings were deprecated. The current and diff
types include deprecated settings in their output by default; all the other types ignore
deprecated settings.
—showunsupported Include unsupported configuration settings in the output. These
settings are either used for debugging and development purposes only, or their support is
not yet complete and they are not meant to be used in production. The current and diff
types include unsupported settings in their output by default; all the other types ignore
unsupported settings.
—validate Validate current configuration used and exit with appropriate return code. The
validation is done only for the configuration at the front of the config cascade. To validate
the whole merged configuration tree, use also the —mergedconfig option. The validation
is done even if config/checks lvm.conf(5) option is disabled.
—withsummary Display a one line comment for each configuration node.
—withcomments Display a full comment for each configuration node. For deprecated
settings, also display comments about the deprecation.
—withspaces Where appropriate, add more spaces in output for better readability.
—withversions Also display a comment containing the version of introduction for each
configuration node. If the setting is deprecated, also display the version since which it is
deprecated.
› SEE ALSO
lvm(8), lvmconf(8), lvm.conf(5)
LVPOLL
› NAME
lvpoll – Internal command used by lvmpolld to complete some Logical Volume
operations.
› SYNOPSIS
lvm lvpoll --polloperation {pvmove|convert|merge|merge_thin} [--abort] [-A|--autobackup {y|n}] [--commandprofile ProfileName] [-d|--debug] [-h|-?|--help] [--handlemissingpvs] [-i|--interval Seconds] [-t|--test] [-v|--verbose] [--version] LogicalVolume[Path]
› DESCRIPTION
lvpoll is an internal command used by lvmpolld(8) to monitor and complete
lvconvert(8) and pvmove(8) operations. lvpoll itself does not initiate these
operations and you should never normally need to invoke it directly.
LogicalVolume The Logical Volume undergoing conversion or, in the case of
pvmove, the name of the internal pvmove Logical Volume (see EXAMPLES).
› OPTIONS
See lvm(8) for common options.
--polloperation {convert|merge|merge_thin|pvmove}
Mandatory option. pvmove refers to a pvmove operation that is moving data. convert
refers to an operation that is increasing the number of redundant copies of data
maintained by a mirror. merge indicates a merge operation that doesn’t involve thin
volumes. merge_thin indicates a merge operation involving thin snapshots.
pvmove(8) and lvconvert(8) describe how to initiate these operations.
--abort
Abort pvmove in progress. See pvmove(8).
--handlemissingpvs
Used when the polling operation needs to handle missing PVs to be able to continue.
This can happen when lvconvert(8) is repairing a mirror with one or more faulty
devices.
-i, --interval Seconds
Report progress at regular intervals.
› EXAMPLES
Resume polling of a pvmove operation identified by the Logical Volume
vg00/pvmove0:
lvm lvpoll --polloperation pvmove vg00/pvmove0
Abort the same pvmove operation:
lvm lvpoll --polloperation pvmove --abort vg00/pvmove0
To find out the name of the pvmove Logical Volume resulting from an original pvmove /dev/sda1 command, you may use the following lvs command. (Remove the square brackets from the LV name.)
lvs -a -S move_pv=/dev/sda1
Resume polling of mirror conversion vg00/lvmirror:
lvm lvpoll --polloperation convert vg00/lvmirror
Complete mirror repair:
lvm lvpoll --polloperation convert vg/damaged_mirror --handlemissingpvs
Process snapshot merge:
lvm lvpoll --polloperation merge vg/snapshot_old
Finish thin snapshot merge:
lvm lvpoll --polloperation merge_thin vg/thin_snapshot
› SEE ALSO
lvconvert(8), lvm(8), lvmpolld(8), lvs(8), pvmove(8)
LVM
› NAME
lvm – LVM2 tools
› SYNOPSIS
lvm [command | file]
› DESCRIPTION
lvm provides the command-line tools for LVM2. A separate manual page describes
each command in detail.
If lvm is invoked with no arguments it presents a readline prompt (assuming it was compiled with readline support). LVM commands may be entered interactively at this prompt, with readline facilities including history and completion of command names and options. Refer to readline(3) for details.
If lvm is invoked with argv[0] set to the name of a specific LVM command (for
example by using a hard or soft link) it acts as that command.
On invocation, lvm requires that only the standard file descriptors stdin, stdout and stderr are available. If others are found, they are closed and messages are issued warning about the leak. This warning can be suppressed by setting the environment variable LVM_SUPPRESS_FD_WARNINGS.
Where commands take VG or LV names as arguments, the full path name is optional.
An LV called “lvol0” in a VG called “vg0” can be specified as “vg0/lvol0”. Where a
list of VGs is required but is left empty, a list of all VGs will be substituted. Where a
list of LVs is required but a VG is given, a list of all the LVs in that VG will be
substituted. So lvdisplay vg0 will display all the LVs in “vg0”. Tags can also be used; see --addtag below.
One advantage of using the built-in shell is that configuration information gets
cached internally between commands.
A file containing a simple script with one command per line can also be given on the
command line. The script can also be executed directly if the first line is #! followed
by the absolute path of lvm.
› BUILT-IN COMMANDS
The following commands are built into lvm without links normally being created in
the filesystem for them.
config – The same as lvmconfig(8) below.
devtypes – Display the recognised built-in block device types.
dumpconfig – The same as lvmconfig(8) below.
formats – Display recognised metadata formats.
help – Display the help text.
lvpoll – Internal command used by lvmpolld to complete some Logical Volume
operations.
pvdata – Not implemented in LVM2.
segtypes – Display recognised Logical Volume segment types.
systemid – Display the system ID, if any, currently set on this host.
tags – Display any tags defined on this host.
version – Display version information.
› COMMANDS
The following commands implement the core LVM functionality.
pvchange – Change attributes of a Physical Volume.
pvck – Check Physical Volume metadata.
pvcreate – Initialize a disk or partition for use by LVM.
pvdisplay – Display attributes of a Physical Volume.
pvmove – Move Physical Extents.
pvremove – Remove a Physical Volume.
pvresize – Resize a disk or partition in use by LVM2.
pvs – Report information about Physical Volumes.
pvscan – Scan all disks for Physical Volumes.
vgcfgbackup – Backup Volume Group descriptor area.
vgcfgrestore – Restore Volume Group descriptor area.
vgchange – Change attributes of a Volume Group.
vgck – Check Volume Group metadata.
vgconvert – Convert Volume Group metadata format.
vgcreate – Create a Volume Group.
vgdisplay – Display attributes of Volume Groups.
vgexport – Make Volume Groups unknown to the system.
vgextend – Add Physical Volumes to a Volume Group.
vgimport – Make exported Volume Groups known to the system.
vgimportclone – Import and rename duplicated Volume Group (e.g. a hardware snapshot).
vgmerge – Merge two Volume Groups.
vgmknodes – Recreate Volume Group directory and Logical Volume special files.
vgreduce – Reduce a Volume Group by removing one or more Physical Volumes.
vgremove – Remove a Volume Group.
vgrename – Rename a Volume Group.
vgs – Report information about Volume Groups.
vgscan – Scan all disks for Volume Groups and rebuild caches.
vgsplit – Split a Volume Group into two, moving any logical
volumes from one Volume Group to another by moving entire Physical Volumes.
lvchange – Change attributes of a Logical Volume.
lvconvert – Convert a Logical Volume from linear to mirror or snapshot.
lvcreate – Create a Logical Volume in an existing Volume Group.
lvdisplay – Display attributes of a Logical Volume.
lvextend – Extend the size of a Logical Volume.
lvmchange – Change attributes of the Logical Volume Manager.
lvmconfig – Display the configuration information after
loading lvm.conf(5) and any other configuration files.
lvmdiskscan – Scan for all devices visible to LVM2.
lvmdump – Create lvm2 information dumps for diagnostic purposes.
lvreduce – Reduce the size of a Logical Volume.
lvremove – Remove a Logical Volume.
lvrename – Rename a Logical Volume.
lvresize – Resize a Logical Volume.
lvs – Report information about Logical Volumes.
lvscan – Scan (all disks) for Logical Volumes.
The following commands are not implemented in LVM2 but might be in the future:
lvmsadc, lvmsar, pvdata.
› OPTIONS
The following options are available for many of the commands. They are
implemented generically and documented here rather than repeated on individual
manual pages.
Additional hyphens within option names are ignored. For example, --readonly and --read-only are both accepted.
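The hyphen-folding rule can be sketched as a small normalisation step (a hypothetical helper for illustration, not LVM's actual parser):

```python
def normalize_option(opt: str) -> str:
    # Fold extra hyphens inside a long option name, so that
    # --read-only and --readonly compare equal (a sketch of the
    # rule described above).
    if opt.startswith("--"):
        return "--" + opt[2:].replace("-", "")
    return opt

print(normalize_option("--read-only"))  # --readonly
```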
-h, -?, --help
Display the help text.
--version
Display version information.
-v, --verbose
Set verbose level. Repeat from 1 to 3 times to increase the detail of messages sent to
stdout and stderr. Overrides config file setting.
-d, --debug
Set debug level. Repeat from 1 to 6 times to increase the detail of messages sent to
the log file and/or syslog (if configured). Overrides config file setting.
-q, --quiet
Suppress output and log messages. Overrides -d and -v. Repeat once to also suppress
any prompts with answer ‘no’.
--yes
Don’t prompt for confirmation interactively but instead always assume the answer is
‘yes’. Take great care if you use this!
-t, --test
Run in test mode. Commands will not update metadata. This is implemented by
disabling all metadata writing but nevertheless returning success to the calling
function. This may lead to unusual error messages in multi-stage operations if a tool
relies on reading back metadata it believes has changed but hasn’t.
--driverloaded {y|n}
Whether or not the device-mapper kernel driver is loaded. If you set this to n, no
attempt will be made to contact the driver.
-A, --autobackup {y|n}
Whether or not metadata should be backed up automatically after a change. You are strongly advised not to disable this! See vgcfgbackup(8).
-P, --partial
When set, the tools will do their best to provide access to Volume Groups that are only partially available (one or more Physical Volumes belonging to the Volume Group are missing from the system). Where part of a logical volume is missing, /dev/ioerror will be substituted, and you could use dmsetup(8) to set this up to return I/O errors when accessed, or create it as a large block device of nulls. Metadata may not be changed with this option. To insert a replacement Physical Volume of the same or larger size, use pvcreate -u to set the uuid to match the original, followed by vgcfgrestore(8).
-S, --select Selection
For reporting commands, display only rows that match selection criteria. All rows are
displayed with the additional “selected” column (-o selected) showing 1 if the row
matches the Selection and 0 otherwise. For non-reporting commands which process
LVM entities, the selection can be used to match items to process. See SELECTION
CRITERIA section of this man page for more information about the way the
selection criteria are constructed.
-M, --metadatatype Type
Specifies which type of on-disk metadata to use, such as lvm1 or lvm2, which can be
abbreviated to 1 or 2 respectively. The default (lvm2) can be changed by setting
format in the global section of the config file.
--ignorelockingfailure
This lets you proceed with read-only metadata operations such as lvchange -ay and
vgchange -ay even if the locking module fails. One use for this is in a system init
script if the lock directory is mounted read-only when the script runs.
--ignoreskippedcluster
Use to avoid exiting with a non-zero status code if the command is run without clustered locking and some clustered Volume Groups have to be skipped over.
--readonly
Run the command in a special read-only mode which will read on-disk metadata
without needing to take any locks. This can be used to peek inside metadata used by a
virtual machine image while the virtual machine is running. It can also be used to
peek inside the metadata of clustered Volume Groups when clustered locking is not
configured or running. No attempt will be made to communicate with the device-
mapper kernel driver, so this option is unable to report whether or not Logical
Volumes are actually in use.
--foreign
Cause the command to access foreign VGs that would otherwise be skipped. It can be used to report or display a VG that is owned by another host. This option can cause a command to perform poorly because lvmetad caching is not used and metadata is read from disks.
--shared
Cause the command to access shared VGs that would otherwise be skipped when lvmlockd is not being used. It can be used to report or display a lockd VG without locking.
--addtag Tag
Add the tag Tag to a PV, VG or LV. Supply this argument multiple times to add more
than one tag at once. A tag is a word that can be used to group LVM2 objects of the
same type together. Tags can be given on the command line in place of PV, VG or LV
arguments. Tags should be prefixed with @ to avoid ambiguity. Each tag is expanded
by replacing it with all objects possessing that tag which are of the type expected by
its position on the command line. PVs can only possess tags while they are part of a
Volume Group: PV tags are discarded if the PV is removed from the VG. As an
example, you could tag some LVs as database and others as userdata and then
activate the database ones with lvchange -ay @database. Objects can possess
multiple tags simultaneously. Only the new LVM2 metadata format supports tagging:
objects using the LVM1 metadata format cannot be tagged because the on-disk
format does not support it. Characters allowed in tags are: A-Z a-z 0-9 _ + . - and as
of version 2.02.78 the following characters are also accepted: / = ! : # &
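The tag expansion described above can be sketched as follows; the LV names and tags here are hypothetical, and real expansion is performed inside the LVM tools themselves:

```python
# Hypothetical table of LVs and their tags (not from this manual).
lvs = {
    "vg0/db1": {"database"},
    "vg0/db2": {"database"},
    "vg0/home": {"userdata"},
}

def expand_arg(arg, objects):
    # An @tag argument expands to every object carrying that tag;
    # a plain name passes through unchanged.
    if arg.startswith("@"):
        tag = arg[1:]
        return sorted(name for name, tags in objects.items() if tag in tags)
    return [arg]

print(expand_arg("@database", lvs))  # ['vg0/db1', 'vg0/db2']
```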
--deltag Tag
Delete the tag Tag from a PV, VG or LV, if it’s present. Supply this argument multiple
times to remove more than one tag at once.
--alloc {anywhere|contiguous|cling|inherit|normal}
Selects the allocation policy when a command needs to allocate Physical Extents
from the Volume Group. Each Volume Group and Logical Volume has an allocation
policy defined. The default for a Volume Group is normal which applies common-
sense rules such as not placing parallel stripes on the same Physical Volume. The
default for a Logical Volume is inherit which applies the same policy as for the
Volume Group. These policies can be changed using lvchange(8) and vgchange(8)
or overridden on the command line of any command that performs allocation. The
contiguous policy requires that new Physical Extents be placed adjacent to existing
Physical Extents. The cling policy places new Physical Extents on the same Physical
Volume as existing Physical Extents in the same stripe of the Logical Volume. If
there are sufficient free Physical Extents to satisfy an allocation request but normal
doesn’t use them, anywhere will - even if that reduces performance by placing two
stripes on the same Physical Volume.
--commandprofile ProfileName
Selects the command configuration profile to use when processing an LVM command. See also lvm.conf(5) for more information about command profile config and the way it fits with other LVM configuration methods. Using the --commandprofile option overrides any command profile specified via the LVM_COMMAND_PROFILE environment variable.
--metadataprofile ProfileName
Selects the metadata configuration profile to use when processing an LVM command. When a metadata profile is used during Volume Group or Logical Volume creation, the profile name is saved in the metadata. When such a Volume Group or Logical Volume is processed next time, the metadata profile is applied automatically and the --metadataprofile option is not necessary. See also lvm.conf(5) for more information about metadata profile config and the way it fits with other LVM configuration methods.
--profile ProfileName
A short form of --metadataprofile for the vgcreate, lvcreate, vgchange and lvchange commands, and a short form of --commandprofile for any other command (with the exception of the lvmconfig command, where --profile has a special meaning; see lvmconfig(8) for more information).
--config ConfigurationString
Uses the ConfigurationString as a direct string representation of the configuration, overriding the existing configuration. The ConfigurationString is of exactly the same format as used in any LVM configuration file. See lvm.conf(5) for more information about direct config override on the command line and the way it fits with other LVM configuration methods.
› VALID NAMES
The valid characters for VG and LV names are: a-z A-Z 0-9 + _ . -
VG and LV names cannot begin with a hyphen. There are also various reserved names used internally by lvm that cannot be used as LV or VG names. A VG
cannot be called anything that exists in /dev/ at the time of creation, nor can it be
called ‘.’ or ‘..’. An LV cannot be called ‘.’, ‘..’, ‘snapshot’ or ‘pvmove’. The LV
name may also not contain any of the following strings: ‘_cdata’, ‘_cmeta’, ‘_corig’,
‘_mlog’, ‘_mimage’, ‘_pmspare’, ‘_rimage’, ‘_rlog’, ‘_tdata’ or ‘_tmeta’. A directory
bearing the name of each Volume Group is created under /dev when any of its
Logical Volumes are activated. Each active Logical Volume is accessible from this
directory as a symbolic link leading to a device node. Links or nodes in /dev/mapper
are intended only for internal use and the precise format and escaping might change
between releases and distributions. Other software and scripts should use the
/dev/VolumeGroupName/LogicalVolumeName format to reduce the chance of
needing amendment when the software is updated. Should you need to process the
node names in /dev/mapper, you may use dmsetup splitname to separate out the
original VG, LV and internal layer names.
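The naming rules above can be summarised in a short validity check. This is an illustrative sketch only; lvm itself performs these and further checks internally:

```python
import re

# Reserved names and internal-layer substrings listed in this section.
RESERVED_LV_NAMES = {".", "..", "snapshot", "pvmove"}
FORBIDDEN_SUBSTRINGS = ("_cdata", "_cmeta", "_corig", "_mlog", "_mimage",
                        "_pmspare", "_rimage", "_rlog", "_tdata", "_tmeta")

def is_valid_lv_name(name: str) -> bool:
    # Apply the LV naming rules above: allowed characters only,
    # no leading hyphen, no reserved names, no internal-layer substrings.
    if name in RESERVED_LV_NAMES or name.startswith("-"):
        return False
    if any(sub in name for sub in FORBIDDEN_SUBSTRINGS):
        return False
    return re.fullmatch(r"[A-Za-z0-9+_.-]+", name) is not None

print(is_valid_lv_name("lvol0"))     # True
print(is_valid_lv_name("snapshot"))  # False
```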
› ALLOCATION
When an operation needs to allocate Physical Extents for one or more Logical
Volumes, the tools proceed as follows:
First of all, they generate the complete set of unallocated Physical Extents in the
Volume Group. If any ranges of Physical Extents are supplied at the end of the
command line, only unallocated Physical Extents within those ranges on the specified
Physical Volumes are considered.
Then they try each allocation policy in turn, starting with the strictest policy
(contiguous) and ending with the allocation policy specified using --alloc or set as
the default for the particular Logical Volume or Volume Group concerned. For each
policy, working from the lowest-numbered Logical Extent of the empty Logical
Volume space that needs to be filled, they allocate as much space as possible
according to the restrictions imposed by the policy. If more space is needed, they
move on to the next policy.
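The policy cascade just described (strictest first, ending at the requested policy) can be sketched as follows. The ordering is taken from this section; inherit, which resolves to the Volume Group's own policy, is assumed to be resolved before the cascade runs:

```python
# Allocation policies ordered from strictest to most permissive,
# as described in this section.
POLICY_ORDER = ["contiguous", "cling", "normal", "anywhere"]

def policies_to_try(requested: str):
    # The tools try each policy in turn, starting with the strictest
    # and ending with the requested (or default) policy.
    return POLICY_ORDER[:POLICY_ORDER.index(requested) + 1]

print(policies_to_try("normal"))  # ['contiguous', 'cling', 'normal']
```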
The restrictions are as follows:
Contiguous requires that the physical location of any Logical Extent that is not the
first Logical Extent of a Logical Volume is adjacent to the physical location of the
Logical Extent immediately preceding it.
Cling requires that the Physical Volume used for any Logical Extent to be added to an
existing Logical Volume is already in use by at least one Logical Extent earlier in that
Logical Volume. If the configuration parameter allocation/cling_tag_list is defined,
then two Physical Volumes are considered to match if any of the listed tags is present
on both Physical Volumes. This allows groups of Physical Volumes with similar
properties (such as their physical location) to be tagged and treated as equivalent for
allocation purposes.
When a Logical Volume is striped or mirrored, the above restrictions are applied
independently to each stripe or mirror image (leg) that needs space.
Normal will not choose a Physical Extent that shares the same Physical Volume as a
Logical Extent already allocated to a parallel Logical Volume (i.e. a different stripe or
mirror image/leg) at the same offset within that parallel Logical Volume.
When allocating a mirror log at the same time as Logical Volumes to hold the mirror
data, Normal will first try to select different Physical Volumes for the log and the
data. If that’s not possible and the allocation/mirror_logs_require_separate_pvs
configuration parameter is set to 0, it will then allow the log to share Physical
Volume(s) with part of the data.
When allocating thin pool metadata, similar considerations to those of a mirror log in
the last paragraph apply based on the value of the
allocation/thin_pool_metadata_require_separate_pvs configuration parameter.
If you rely upon any layout behaviour beyond that documented here, be aware that it
might change in future versions of the code.
For example, if you supply on the command line two empty Physical Volumes that
have an identical number of free Physical Extents available for allocation, the current
code considers using each of them in the order they are listed, but there is no
guarantee that future releases will maintain that property. If it is important to obtain a
specific layout for a particular Logical Volume, then you should build it up through a
sequence of lvcreate(8) and lvconvert(8) steps such that the restrictions described
above applied to each step leave the tools no discretion over the layout.
To view the way the allocation process currently works in any specific case, read the
debug logging output, for example by adding -vvvv to a command.
› LOGICAL VOLUME TYPES
Some logical volume types are simple to create and can be done with a single
lvcreate(8) command. The linear and striped logical volume types are examples of
this. Other logical volume types may require more than one command to create. The
cache (lvmcache(7)) and thin provisioning (lvmthin(7)) types are examples of this.
› SELECTION CRITERIA
The selection criteria are a set of statements combined by logical and grouping operators. Each statement consists of a column name for which a set of valid values is defined using comparison operators. For a complete list of column names (fields) that can be used in selection, see the output of <lvm reporting command> -S help.
Comparison operators (cmp_op):
=~ – Matching regular expression.
!~ – Not matching regular expression.
= – Equal to.
!= – Not equal to.
>= – Greater than or equal to.
> – Greater than.
<= – Less than or equal to.
< – Less than.
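As an illustration, a single "field op value" statement could be evaluated like this toy sketch. lvm's real -S grammar also supports the grouping and logical operators mentioned above, which are not shown here, and the row data is hypothetical:

```python
import re

# The comparison operators from the table above, in sketch form.
OPS = {
    "=~": lambda v, p: re.search(p, v) is not None,
    "!~": lambda v, p: re.search(p, v) is None,
    "=":  lambda v, p: v == p,
    "!=": lambda v, p: v != p,
    ">=": lambda v, p: float(v) >= float(p),
    ">":  lambda v, p: float(v) > float(p),
    "<=": lambda v, p: float(v) <= float(p),
    "<":  lambda v, p: float(v) < float(p),
}

def matches(row, field, op, value):
    # Evaluate one statement against one report row.
    return OPS[op](row[field], value)

row = {"lv_name": "lvol0", "lv_size": "512"}
print(matches(row, "lv_name", "=~", "^lvol"))  # True
```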
LVMDISKSCAN
› NAME
lvmdiskscan – scan for all devices visible to LVM2
› SYNOPSIS
lvmdiskscan [--commandprofile ProfileName] [-d|--debug] [-h|-?|--help] [-l|--lvmpartition] [-v|--verbose]
› DESCRIPTION
lvmdiskscan scans all SCSI and (E)IDE disks, multiple devices (MD) and a number of other block devices in the system, looking for LVM Physical Volumes. The size reported is the real device size. Define a filter in lvm.conf(5) to restrict the scan, for example to avoid a CD-ROM.
› OPTIONS
See lvm(8) for common options.
-l, --lvmpartition
Only report Physical Volumes.
› SEE ALSO
lvm(8), lvm.conf(5), pvscan(8), vgscan(8)
LVMDUMP
› NAME
lvmdump – create lvm2 information dumps for diagnostic purposes
› SYNOPSIS
lvmdump [-a] [-c] [-d directory] [-h] [-l] [-m] [-p] [-s] [-u]
› DESCRIPTION
lvmdump is a tool to dump various information concerning LVM2. By default, it
creates a tarball suitable for submission along with a problem report.
The content of the tarball is as follows:
- dmsetup info
- table of currently running processes
- recent entries from /var/log/messages (containing system messages)
- complete lvm configuration and cache (content of /etc/lvm)
- list of device nodes present under /dev
- list of files present in /sys/block
- list of files present in /sys/devices/virtual/block
- if enabled with -m, a metadata dump will also be included
- if enabled with -a, debug output of vgscan and pvscan and a list of all available volume groups, physical volumes and logical volumes will be included
- if enabled with -c, cluster status info
- if enabled with -l, lvmetad state, if running
- if enabled with -p, lvmpolld state, if running
- if enabled with -s, system info and context
- if enabled with -u, udev info and context
› OPTIONS
-a
Advanced collection. WARNING: if lvm is already hung, then this script may hang
as well if -a is used.
-c
If clvmd is running, gather cluster data as well.
-d directory
Dump into a directory instead of a tarball. By default, lvmdump will produce a single compressed tarball containing all the information. Using this option, it can be instructed to only produce the raw dump tree, rooted in directory.
-h
Print help message.
-l
Include lvmetad(8) daemon dump if it is running. The dump contains cached
information that is currently stored in lvmetad: VG metadata, PV metadata and
various mappings in between these metadata for quick access.
-m
Gather LVM metadata from the PVs. This option generates a 1:1 dump of the
metadata area from all PVs visible to the system, which can cause the dump to
increase in size considerably. However, the metadata dump may represent a valuable
diagnostic resource.
-p
Include lvmpolld(8) daemon dump if it is running. The dump contains all in-progress operations currently monitored by the daemon, and partial history for all yet-uncollected results of polling operations that have already finished, including the reason they finished.
-s
Gather system info and context. Currently, this encompasses systemd info and
context only: overall state of systemd units present in the system, more detailed status
of units controlling LVM functionality and the content of systemd journal for current
boot.
-u
Gather udev info and context: the /etc/udev/udev.conf file, the udev daemon version (output of the ‘udevadm info --version’ command), the udev rules currently used in the system (content of the /lib/udev/rules.d and /etc/udev/rules.d directories), the list of files in the /lib/udev directory and a dump of the current udev database content (output of the ‘udevadm info --export-db’ command).
› ENVIRONMENT VARIABLES
LVM_BINARY
The LVM2 binary to use. Defaults to “lvm”. Sometimes you might need to set this to
“/sbin/lvm.static”, for example.
DMSETUP_BINARY
The dmsetup binary to use. Defaults to “dmsetup”.
› SEE ALSO
lvm(8)
LVMETAD
› NAME
lvmetad – LVM metadata cache daemon
› SYNOPSIS
lvmetad [-l {all|wire|debug}] [-p pidfile_path] [-s socket_path] [-f] [-h] [-V] [-?]
› DESCRIPTION
The lvmetad daemon caches LVM metadata, so that LVM commands can read
metadata without scanning disks.
Metadata caching can be an advantage because scanning disks is time consuming and
may interfere with the normal work of the system and disks.
lvmetad does not read metadata from disks itself. The ‘pvscan --cache’ command scans disks, reads the LVM metadata and sends it to lvmetad.
New LVM disks that appear on the system must be scanned by pvscan before
lvmetad knows about them. If lvmetad does not know about a disk, then LVM
commands using lvmetad will also not know about it. When disks are added or
removed from the system, lvmetad must be updated.
lvmetad is usually combined with event-based system services that automatically run
pvscan —cache on new disks. This way, the lvmetad cache is automatically updated
with metadata from new disks when they appear. LVM udev rules and systemd
services implement this automation. Automatic scanning is usually combined with
automatic activation. For more information, see pvscan(8).
If lvmetad is started or restarted after disks have been added to the system, or if the global_filter has changed, the cache must be updated by running ‘pvscan --cache’.
When lvmetad is not used, LVM commands revert to scanning disks for LVM
metadata.
Use of lvmetad is enabled or disabled by the lvm.conf(5) setting global/use_lvmetad. For more information on this setting, see: lvmconfig --withcomments global/use_lvmetad
To make LVM (e.g. lvmetad, pvscan) ignore disks at the system level, use the lvm.conf(5) setting devices/global_filter. For more information on this setting, see: lvmconfig --withcomments devices/global_filter
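As an illustration, enabling the cache and rejecting a CD-ROM at the system level might look like the following lvm.conf(5) fragment; the device patterns are examples only and must be adjusted to your system:

```
global {
    # Enable the lvmetad metadata cache daemon.
    use_lvmetad = 1
}
devices {
    # Reject the CD-ROM, accept everything else. Devices rejected
    # by global_filter are ignored by lvmetad and pvscan as well.
    global_filter = [ "r|/dev/cdrom|", "a|.*|" ]
}
```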
› OPTIONS
To run the daemon in a test environment both the pidfile_path and the socket_path
should be changed from the defaults.
-f
Don’t fork, but run in the foreground.
-h, -?
Show help information.
-l {all|wire|debug}
Select the type of log messages to generate. Messages are logged by syslog.
Additionally, when -f is given they are also sent to standard error. Since release
2.02.98, there are two classes of messages: wire and debug. Selecting ‘all’ supplies
both and is equivalent to a comma-separated list -l wire,debug. Prior to release
2.02.98, repeating -d from 1 to 3 times, viz. -d, -dd, -ddd, increased the detail of
messages.
-p pidfile_path
Path to the pidfile. This overrides both the built-in default (/run/lvmetad.pid) and the
environment variable LVM_LVMETAD_PIDFILE. This file is used to prevent
more than one instance of the daemon running simultaneously.
-s socket_path
Path to the socket file. This overrides both the built-in default
(/run/lvm/lvmetad.socket) and the environment variable
LVM_LVMETAD_SOCKET. To communicate successfully with lvmetad, all
LVM2 processes should use the same socket path.
-V
Display the version of lvmetad daemon.
› ENVIRONMENT VARIABLES
LVM_LVMETAD_PIDFILE
Path for the pid file.
LVM_LVMETAD_SOCKET
Path for the socket file.
› SEE ALSO
lvm(8), lvmconfig(8), lvm.conf(5), pvscan(8)
LVMPOLLD
› NAME
lvmpolld – LVM poll daemon
› SYNOPSIS
lvmpolld [-l|--log {all|wire|debug}] [-p|--pidfile pidfile_path] [-s|--socket socket_path] [-B|--binary lvm_binary_path] [-t|--timeout timeout_value] [-f|--foreground] [-h|--help] [-V|--version]
lvmpolld [--dump]
› DESCRIPTION
lvmpolld is the polling daemon for LVM. The daemon receives requests to poll
operations already initialised by an LVM2 command-line tool. The polling
requests originate in the lvconvert, pvmove, lvchange or vgchange LVM2
commands.
The purpose of lvmpolld is to reduce the number of background processes
spawned per polling operation: for each otherwise unique operation there should
be only one. It also eliminates the possibility of unsolicited termination of a
background process by external factors.
lvmpolld is used by LVM only if it is enabled in lvm.conf(5) by specifying the
global/use_lvmpolld setting. If this is not defined explicitly in the LVM
configuration, the default setting is used instead (see the output of the
lvmconfig --type default global/use_lvmpolld command).
› OPTIONS
To run the daemon in a test environment both the pidfile_path and the socket_path
should be changed from the defaults.
-f, --foreground
Don't fork, but run in the foreground.
-h, --help
Show help information.
-l, --log {all|wire|debug}
Select the type of log messages to generate. Messages are logged by syslog.
Additionally, when -f is given they are also sent to standard error. There are two
classes of messages: wire and debug. Selecting 'all' supplies both and is equivalent to
the comma-separated list -l wire,debug.
-p, --pidfile pidfile_path
Path to the pidfile. This overrides both the built-in default (/run/lvmpolld.pid) and the
environment variable LVM_LVMPOLLD_PIDFILE. This file is used to prevent
more than one instance of the daemon running simultaneously.
-s, --socket socket_path
Path to the socket file. This overrides both the built-in default
(/run/lvm/lvmpolld.socket) and the environment variable
LVM_LVMPOLLD_SOCKET.
-t, --timeout timeout_value
The daemon may shut down after being idle for the given time (in seconds). When the
option is omitted or the value given is zero, the daemon never shuts down on idle.
-B, --binary lvm_binary_path
Optional path to an alternative LVM binary (default: /usr/sbin/lvm). Use for testing
purposes only.
-V, --version
Display the version of the lvmpolld daemon.
--dump
Contact the running lvmpolld daemon to obtain its complete state and print it out in a
raw format.
› ENVIRONMENT VARIABLES
LVM_LVMPOLLD_PIDFILE
Path for the pid file.
LVM_LVMPOLLD_SOCKET
Path for the socket file.
› SEE ALSO
lvm(8), lvm.conf(5)
LVMSADC
› NAME
lvmsadc – LVM system activity data collector
› SYNOPSIS
lvmsadc
› DESCRIPTION
lvmsadc is not currently supported under LVM2.
› SEE ALSO
lvm(8)
LVMSAR
› NAME
lvmsar – LVM system activity reporter
› SYNOPSIS
lvmsar
› DESCRIPTION
lvmsar is not currently supported under LVM2.
› SEE ALSO
lvm(8)
LVREDUCE
› NAME
lvreduce – reduce the size of a logical volume
› SYNOPSIS
lvreduce [-A|--autobackup {y|n}] [--commandprofile ProfileName] [-d|--debug]
[-h|--help] [-t|--test] [-v|--verbose] [--version] [-f|--force] [--noudevsync]
{-l|--extents [-]LogicalExtentsNumber[%{VG|LV|FREE|ORIGIN}] | -L|--size
[-]LogicalVolumeSize[bBsSkKmMgGtTpPeE]} [-n|--nofsck] [-r|--resizefs]
LogicalVolume{Name|Path}
› DESCRIPTION
lvreduce allows you to reduce the size of a logical volume. Be careful when reducing
a logical volume’s size, because data in the reduced part is lost!!! You should
therefore ensure that any filesystem on the volume is resized before running lvreduce
so that the extents that are to be removed are not in use. Shrinking snapshot logical
volumes (see lvcreate(8) for information on creating snapshots) is supported as well.
But to change the number of copies in a mirrored logical volume use lvconvert(8).
Sizes will be rounded if necessary - for example, the volume size must be an exact
number of extents and the size of a striped segment must be a multiple of the number
of stripes.
› OPTIONS
See lvm(8) for common options.
-f, --force
Force size reduction without prompting even when it may cause data loss.
-l, --extents [-]LogicalExtentsNumber[%{VG|LV|FREE|ORIGIN}]
Reduce or set the logical volume size in units of logical extents. With the - sign the
value will be subtracted from the logical volume’s actual size and without it the value
will be taken as an absolute size. The total number of physical extents freed will be
greater than this logical value if, for example, the volume is mirrored. The number
can also be expressed as a percentage of the total space in the Volume Group with the
suffix %VG, relative to the existing size of the Logical Volume with the suffix %LV,
as a percentage of the remaining free space in the Volume Group with the suffix
%FREE, or (for a snapshot) as a percentage of the total space in the Origin Logical
Volume with the suffix %ORIGIN. The resulting value for the subtraction is rounded
downward, for the absolute size it is rounded upward. N.B. In a future release, when
expressed as a percentage with VG or FREE, the number will be treated as an
approximate total number of physical extents to be freed (including extents used by
any mirrors, for example). The code may currently release more space than you
might otherwise expect.
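The downward rounding described above can be sketched with plain shell arithmetic. The extent counts below are hypothetical, not taken from any real volume group:

```shell
# Hypothetical LV of 125 logical extents, reduced with "-l -33%LV".
LV_EXTENTS=125
PCT=33

# The value subtracted is rounded downward, as described above:
# 125 * 33 / 100 = 41.25, so 41 extents are removed.
SUBTRACT=$(( LV_EXTENTS * PCT / 100 ))   # integer division rounds down
NEW_EXTENTS=$(( LV_EXTENTS - SUBTRACT ))
echo "$NEW_EXTENTS"
```

The actual number of physical extents freed may be larger, e.g. for mirrored volumes, as the text above notes.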
-L, --size [-]LogicalVolumeSize[bBsSkKmMgGtTpPeE]
Reduce or set the logical volume size in units of megabytes. A size suffix of k for
kilobytes, m for megabytes, g for gigabytes, t for terabytes, p for petabytes or e for
exabytes is optional. With the - sign the value will be subtracted from the logical
volume's actual size; without it, the value will be taken as an absolute size.
-n, --nofsck
Do not perform fsck before resizing the filesystem when the filesystem requires it. You may
need to use --force to proceed with this option.
--noudevsync
Disable udev synchronisation. The process will not wait for notification from udev. It
will continue irrespective of any possible udev processing in the background. You
should only use this if udev is not running or has rules that ignore the devices LVM2
creates.
-r, --resizefs
Resize underlying filesystem together with the logical volume using fsadm(8).
› EXAMPLES
Reduce the size of logical volume lvol1 in volume group vg00 by 3 logical extents:
lvreduce -l -3 vg00/lvol1
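When shrinking a volume carrying a filesystem without -r, the filesystem must be resized first. A hedged dry-run sketch of the order of operations for an ext4 filesystem; the device name and sizes are examples, and the commands are only printed here, not executed:

```shell
# Dry run: print the safe shrink sequence rather than executing it.
# Filesystem first, then the LV, so removed extents are never in use.
for STEP in \
  "e2fsck -f /dev/vg00/lvol1" \
  "resize2fs /dev/vg00/lvol1 400M" \
  "lvreduce -L 400M vg00/lvol1"
do
  echo "$STEP"
done
```

Passing -r to lvreduce performs the filesystem resize via fsadm(8) in the same invocation, collapsing the sequence to a single command.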
› SEE ALSO
fsadm(8), lvchange(8), lvconvert(8), lvcreate(8), lvextend(8), lvm(8), lvresize(8),
vgreduce(8)
LVREMOVE
› NAME
lvremove – remove a logical volume
› SYNOPSIS
lvremove [-A|--autobackup {y|n}] [--commandprofile ProfileName] [-d|--debug]
[-h|--help] [-S|--select Selection] [-t|--test] [-v|--verbose] [--version]
[-f|--force] [--noudevsync] [LogicalVolume{Name|Path}…]
› DESCRIPTION
lvremove removes one or more logical volumes. Confirmation will be requested
before deactivating any active logical volume prior to removal. Logical volumes
cannot be deactivated or removed while they are open (e.g. if they contain a mounted
filesystem). Removing an origin logical volume will also remove all dependent
snapshots.
If the logical volume is clustered then it must be deactivated on all nodes in the
cluster before it can be removed. A single lvchange command issued from one node
can do this.
› OPTIONS
See lvm(8) for common options.
-f, --force
Remove active logical volumes without confirmation. The tool will try to deactivate
unused volumes. To proceed with damaged pools, use -ff.
--noudevsync
Disable udev synchronisation. The process will not wait for notification from udev. It
will continue irrespective of any possible udev processing in the background. You
should only use this if udev is not running or has rules that ignore the devices LVM2
creates.
› EXAMPLES
Remove the active logical volume lvol1 in volume group vg00 without asking for
confirmation:
lvremove -f vg00/lvol1
Remove all logical volumes in volume group vg00:
lvremove vg00
› SEE ALSO
lvcreate(8), lvdisplay(8), lvchange(8), lvm(8), lvs(8), lvscan(8), vgremove(8)
LVRENAME
› NAME
lvrename – rename a logical volume
› SYNOPSIS
lvrename [-A|--autobackup {y|n}] [--commandprofile ProfileName] [-d|--debug]
[-h|--help] [-t|--test] [-v|--verbose] [--version] [-f|--force]
[--noudevsync] {OldLogicalVolume{Name|Path} NewLogicalVolume{Name|Path} |
VolumeGroupName OldLogicalVolumeName NewLogicalVolumeName}
› DESCRIPTION
lvrename renames an existing logical volume from OldLogicalVolume{Name|Path}
to NewLogicalVolume{Name|Path}.
› OPTIONS
See lvm(8) for common options.
--noudevsync
Disable udev synchronisation. The process will not wait for notification from udev. It
will continue irrespective of any possible udev processing in the background. You
should only use this if udev is not running or has rules that ignore the devices LVM2
creates.
› EXAMPLE
To rename lvold in volume group vg02 to lvnew:
lvrename /dev/vg02/lvold vg02/lvnew
An alternate syntax to rename this logical volume is:
lvrename vg02 lvold lvnew
› SEE ALSO
lvm(8), lvchange(8), vgcreate(8), vgrename(8)
LVRESIZE
› NAME
lvresize – resize a logical volume
› SYNOPSIS
lvresize [--alloc AllocationPolicy] [--noudevsync] [--commandprofile
ProfileName] [-i|--stripes Stripes [-I|--stripesize StripeSize]]
{-l|--extents [+|-]LogicalExtentsNumber[%{VG|LV|PVS|FREE|ORIGIN}] |
-L|--size [+|-]LogicalVolumeSize[bBsSkKmMgGtTpPeE]}
[--poolmetadatasize [+]MetadataVolumeSize[bBsSkKmMgG]] [-f|--force]
[-n|--nofsck] [-r|--resizefs]
LogicalVolume{Name|Path} [PhysicalVolumePath[:PE[-PE]]…]
› DESCRIPTION
lvresize allows you to resize a logical volume. Be careful when reducing a logical
volume’s size, because data in the reduced part is lost!!! You should therefore ensure
that any filesystem on the volume is shrunk first so that the extents that are to be
removed are not in use. Resizing snapshot logical volumes (see lvcreate(8) for
information about creating snapshots) is supported as well. But to change the number
of copies in a mirrored logical volume use lvconvert(8).
› OPTIONS
See lvm(8) for common options.
-f, --force
Force resize without prompting even when it may cause data loss.
-n, --nofsck
Do not perform fsck before resizing the filesystem when the filesystem requires it. You may
need to use --force to proceed with this option.
-r, --resizefs
Resize underlying filesystem together with the logical volume using fsadm(8).
-l, --extents [+|-]LogicalExtentsNumber[%{VG|LV|PVS|FREE|ORIGIN}]
Change or set the logical volume size in units of logical extents. With the + or - sign
the value is added to or subtracted from the actual size of the logical volume and
without it, the value is taken as an absolute one. The total number of physical extents
affected will be greater than this if, for example, the volume is mirrored. The number
can also be expressed as a percentage of the total space in the Volume Group with the
suffix %VG, relative to the existing size of the Logical Volume with the suffix %LV,
as a percentage of the remaining free space of the PhysicalVolumes on the command
line with the suffix %PVS, as a percentage of the remaining free space in the Volume
Group with the suffix %FREE, or (for a snapshot) as a percentage of the total space
in the Origin Logical Volume with the suffix %ORIGIN. The resulting value is
rounded downward for the subtraction otherwise it is rounded upward. N.B. In a
future release, when expressed as a percentage with PVS, VG or FREE, the number
will be treated as an approximate total number of physical extents to be allocated or
freed (including extents used by any mirrors, for example). The code may currently
allocate or remove more space than you might otherwise expect.
-L, --size [+|-]LogicalVolumeSize[bBsSkKmMgGtTpPeE]
Change or set the logical volume size in units of megabytes. A size suffix of M for
megabytes, G for gigabytes, T for terabytes, P for petabytes or E for exabytes is
optional. With the + or - sign the value is added or subtracted from the actual size of
the logical volume and rounded to the full extent size and without it, the value is
taken as an absolute one.
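The round-to-full-extent behaviour for relative sizes can be sketched numerically. A hypothetical 4 MiB extent size is assumed; real volume groups report theirs via vgdisplay:

```shell
# Hypothetical VG extent size and a "+14M" request that is not extent-aligned.
EXTENT_MIB=4
REQUEST_MIB=14

# Relative growth is rounded up to a whole number of extents:
# ceil(14 / 4) = 4 extents, i.e. 16 MiB actually added.
EXTENTS=$(( (REQUEST_MIB + EXTENT_MIB - 1) / EXTENT_MIB ))
GRANTED_MIB=$(( EXTENTS * EXTENT_MIB ))
echo "$GRANTED_MIB"
```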
-i, --stripes Stripes
Gives the number of stripes to use when extending a Logical Volume. Defaults to
whatever the last segment of the Logical Volume uses. Not applicable to LVs using
the original metadata LVM format, which must use a single value throughout.
--poolmetadatasize [+]MetadataVolumeSize[bBsSkKmMgG]
Change or set the thin pool metadata logical volume size. With the + sign the value is
added to the actual size of the metadata volume and rounded to the full extent size
and without it, the value is taken as an absolute one. Maximal size is 16GiB. Default
unit is megabytes.
-I, --stripesize StripeSize
Gives the number of kilobytes for the granularity of the stripes. Defaults to whatever
the last segment of the Logical Volume uses. Not applicable to LVs using the original
metadata LVM format, which must use a single value throughout. StripeSize must be
2^n (n = 2 to 9) for metadata in LVM1 format. For metadata in LVM2 format, the
stripe size may be a larger power of 2 but must not exceed the physical extent size.
--noudevsync
Disable udev synchronisation. The process will not wait for notification from udev. It
will continue irrespective of any possible udev processing in the background. You
should only use this if udev is not running or has rules that ignore the devices LVM2
creates.
› EXAMPLES
Extend a logical volume vg1/lv1 by 16MB using physical extents /dev/sda:0-1 and
/dev/sdb:0-1 for allocation of extents:
lvresize -L+16M vg1/lv1 /dev/sda:0-1 /dev/sdb:0-1
› SEE ALSO
fsadm(8), lvm(8), lvconvert(8), lvcreate(8), lvreduce(8), lvchange(8)
LVS
› NAME
lvs – report information about logical volumes
› SYNOPSIS
lvs [--aligned] [--binary] [-a|--all] [--commandprofile ProfileName]
[-d|--debug] [-h|-?|--help] [--ignorelockingfailure] [--ignoreskippedcluster]
[--nameprefixes] [--noheadings] [--nosuffix] [-o|--options [+]Field[,Field]]
[-O|--sort [+|-]Key1[,[+|-]Key2[,…]]] [-P|--partial] [--rows]
[-S|--select Selection] [--separator Separator] [--segments] [--unbuffered]
[--units hHbBsSkKmMgGtTpPeE] [--unquoted] [-v|--verbose] [--version]
[VolumeGroupName|LogicalVolume{Name|Path}
[VolumeGroupName|LogicalVolume{Name|Path} …]]
› DESCRIPTION
lvs produces formatted output about logical volumes.
› OPTIONS
See lvm(8) for common options.
--aligned
Use with --separator to align the output columns.
—binary
Use binary values “0” or “1” instead of descriptive literal values for columns that
have exactly two valid values to report (not counting the “unknown” value which
denotes that the value could not be determined).
-a, --all
Include information in the output about internal Logical Volumes that are
components of normally-accessible Logical Volumes, such as mirrors, but which are
not independently accessible (e.g. not mountable). The names of such Logical
Volumes are enclosed within square brackets in the output. For example, after
creating a mirror using lvcreate -m1 --mirrorlog disk, this option will reveal three
internal Logical Volumes, with suffixes mimage_0, mimage_1, and mlog.
--nameprefixes
Add an "LVM2_" prefix plus the field name to the output. Useful with --noheadings
to produce a list of field=value pairs that can be used to set environment variables
(for example, in udev(7) rules).
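Such field=value pairs are easy to consume in a script. A sketch, where the sample line is a hypothetical stand-in for real `lvs --noheadings --nameprefixes` output:

```shell
# Hypothetical line, shaped like lvs --noheadings --nameprefixes output.
LINE="LVM2_LV_NAME='lvol1' LVM2_LV_SIZE='500.00m'"

# eval turns the pairs into shell variables, much as udev rules consume them.
eval "$LINE"
echo "$LVM2_LV_NAME is $LVM2_LV_SIZE"
```

Combining with --unquoted (described below in this section) drops the quotes around each value.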
--noheadings
Suppress the headings line that is normally the first line of output. Useful if grepping
the output.
--nosuffix
Suppress the suffix on output sizes. Use with --units (except h and H) if processing
the output.
-o, --options
Comma-separated ordered list of columns. Precede the list with '+' to append to the
default selection of columns instead of replacing it.
Use -o lv_all to select all logical volume columns, and -o seg_all to select all logical
volume segment columns.
Use -o help to view the full list of columns available.
Column names include: chunk_size, convert_lv, copy_percent, data_lv, devices,
discards, lv_attr, lv_host, lv_kernel_major, lv_kernel_minor, lv_kernel_read_ahead,
lv_major, lv_minor, lv_name, lv_path, lv_profile, lv_read_ahead, lv_size, lv_tags,
lv_time, lv_uuid, metadata_lv, mirror_log, modules, move_pv, origin, origin_size,
pool_lv, raid_max_recovery_rate, raid_min_recovery_rate, raid_mismatch_count,
raid_sync_action, raid_write_behind, region_size, segtype, seg_count,
seg_pe_ranges, seg_size, seg_size_pe, seg_start, seg_start_pe, seg_tags,
snap_percent, stripes, stripe_size, sync_percent, thin_count, transaction_id, zero.
With --segments, any "seg_" prefixes are optional; otherwise any "lv_" prefixes are
optional. Columns mentioned in vgs(8) can also be chosen.
The lv_attr bits are:
1
Volume type: (C)ache, (m)irrored, (M)irrored without initial sync, (o)rigin, (O)rigin
with merging snapshot, (r)aid, (R)aid without initial sync, (s)napshot, merging
(S)napshot, (p)vmove, (v)irtual, mirror or raid (i)mage, mirror or raid (I)mage out-of-
sync, mirror (l)og device, under (c)onversion, thin (V)olume, (t)hin pool, (T)hin pool
data, raid or pool m(e)tadata or pool metadata spare.
2
Permissions: (w)riteable, (r)ead-only, (R)ead-only activation of non-read-only
volume
3
Allocation policy: (a)nywhere, (c)ontiguous, (i)nherited, c(l)ing, (n)ormal This is
capitalised if the volume is currently locked against allocation changes, for example
during pvmove(8).
4
fixed (m)inor
5
State: (a)ctive, (s)uspended, (I)nvalid snapshot, invalid (S)uspended snapshot,
snapshot (m)erge failed, suspended snapshot (M)erge failed, mapped (d)evice present
without tables, mapped device present with (i)nactive table, (X) unknown
6
device (o)pen, (X) unknown
7
Target type: (C)ache, (m)irror, (r)aid, (s)napshot, (t)hin, (u)nknown, (v)irtual. This
groups logical volumes related to the same kernel target together. So, for example,
mirror images, mirror logs as well as mirrors themselves appear as (m) if they use the
original device-mapper mirror kernel driver; whereas the raid equivalents using the
md raid kernel driver all appear as (r). Snapshots using the original device-mapper
driver appear as (s); whereas snapshots of thin volumes using the new thin
provisioning driver appear as (t).
8
Newly-allocated data blocks are overwritten with blocks of (z)eroes before use.
9
Volume Health: (p)artial, (r)efresh needed, (m)ismatches exist, (w)ritemostly, (X)
unknown. (p)artial signifies that one or more of the Physical Volumes this Logical
Volume uses is missing from the system. (r)efresh signifies that one or more of the
Physical Volumes this RAID Logical Volume uses had suffered a write error. The
write error could be due to a temporary failure of that Physical Volume or an
indication that it is failing. The device should be refreshed or replaced. (m)ismatches
signifies that the RAID logical volume has portions of the array that are not coherent.
Inconsistencies are detected by initiating a “check” on a RAID logical volume. (The
scrubbing operations, “check” and “repair”, can be performed on a RAID logical
volume via the ‘lvchange’ command.) (w)ritemostly signifies the devices in a RAID
1 logical volume that have been marked write-mostly.
10
s(k)ip activation: this volume is flagged to be skipped during activation.
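The positional bits above can be picked apart mechanically. A minimal sketch; the attribute string is a hypothetical example of a writeable, inherited-allocation, active, open linear volume:

```shell
# Hypothetical lv_attr value; character positions match the numbered list above.
ATTR="-wi-ao----"

# cut -c N extracts the Nth character (1-based), i.e. attribute bit N.
PERM=$(echo "$ATTR" | cut -c2)    # bit 2: permissions
STATE=$(echo "$ATTR" | cut -c5)   # bit 5: state
OPEN=$(echo "$ATTR" | cut -c6)    # bit 6: device open
echo "perm=$PERM state=$STATE open=$OPEN"
```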
-O, --sort [+|-]Key1[,[+|-]Key2[,…]]
Comma-separated ordered list of columns to sort by. Replaces the default
selection. Precede any column with '-' for a reverse sort on that column.
--rows
Output columns as rows.
-S, --select Selection
Display only rows that match Selection criteria. All rows are displayed with the
additional "selected" column (-o selected) showing 1 if the row matches the
Selection and 0 otherwise. The Selection criteria are defined by specifying column
names and their valid values (that can include reserved values) while making use of
supported comparison operators. See lvm(8) and its -S, --select description for more
detailed information about constructing the Selection criteria. As a quick help, and
to see the full list of column names that can be used in Selection, including the list
of reserved values and the set of supported selection operators, check the output of
the lvs -S help command.
--segments
Use default columns that emphasize segment information.
--separator Separator
String to use to separate each column. Useful if grepping the output.
--unbuffered
Produce output immediately without sorting or aligning the columns properly.
--units hHbBsSkKmMgGtTpPeE
All sizes are output in these units: (h)uman-readable, (b)ytes, (s)ectors, (k)ilobytes,
(m)egabytes, (g)igabytes, (t)erabytes, (p)etabytes, (e)xabytes. Capitalise to use
multiples of 1000 (S.I.) instead of 1024. Can also specify custom units, e.g. --units 3M.
--unquoted
When used with --nameprefixes, output values in the field=value pairs are not quoted.
› SEE ALSO
lvm(8), lvdisplay(8), pvs(8), vgs(8)
LVSCAN
› NAME
lvscan – scan (all disks) for Logical Volumes
› SYNOPSIS
lvscan [-a|--all] [-b|--blockdevice] [--commandprofile ProfileName]
[-d|--debug] [-h|--help] [--ignorelockingfailure] [-P|--partial] [-v|--verbose]
› DESCRIPTION
lvscan scans all known volume groups or all supported LVM block devices in the
system for defined Logical Volumes. The output consists of one line for each Logical
Volume indicating whether or not it is active, a snapshot or origin, the size of the
device and its allocation policy. Use lvs(8) or lvdisplay(8) to obtain more
comprehensive information about the Logical Volumes.
› OPTIONS
See lvm(8) for common options.
--all
Include information in the output about internal Logical Volumes that are
components of normally-accessible Logical Volumes, such as mirrors, but which are
not independently accessible (e.g. not mountable). For example, after creating a
mirror using lvcreate -m1 --mirrorlog disk, this option will reveal three internal
Logical Volumes, with suffixes mimage_0, mimage_1, and mlog.
-b, —blockdevice
This option is now ignored. Instead, use lvs(8) or lvdisplay(8) to obtain the device
number.
--cache LogicalVolume
Applicable only when lvmetad(8) is in use (see also lvm.conf(5),
global/use_lvmetad). This command issues a rescan of physical volume labels and
metadata areas of all PVs that the logical volume uses. In particular, this can be used
when a RAID logical volume becomes degraded, to update information about
physical volume availability. This is only necessary if the logical volume is not being
monitored by dmeventd (see lvchange(8), option --monitor).
› SEE ALSO
lvm(8), lvcreate(8), lvdisplay(8) lvs(8)
MAKEDELTARPM
› NAME
makedeltarpm - create a deltarpm from two rpms
› SYNOPSIS
makedeltarpm [-v] [-V version] [-z compression] [-m mbytes] [-s seqfile] [-r] [-u]
oldrpm newrpm deltarpm
makedeltarpm [-v] [-V version] [-z compression] [-s seqfile] [-u] -p oldrpmprint
oldpatchrpm oldrpm newrpm deltarpm
› DESCRIPTION
makedeltarpm creates a deltarpm from two rpms. The deltarpm can later be used to
recreate the new rpm from either filesystem data or the old rpm. Use the -v option to
make makedeltarpm more verbose about its work (use it twice to make it even more
verbose).
If you want to create a smaller and faster-to-combine "rpm-only" deltarpm which
does not work with filesystem data, specify the -r option.
makedeltarpm normally produces a V3 format deltarpm, use the -V option to specify
a different version if desired. The -z option can be used to specify a different
compression method, the default is to use the same compression method as used in
the new rpm.
The -s option makes makedeltarpm write out the sequence id to the specified file
seqfile.
If you also use patch rpms you should use the -p option to specify the rpm-print of
oldrpm and the created patch rpm. This option tells makedeltarpm to exclude the files
that were not included in the patch rpm but are not byteswise identical to the ones in
oldrpm.
makedeltarpm can also create an “identity” deltarpm by adding the -u switch. In this
case only one rpm has to be specified. An identity deltarpm can be useful to just
replace the signature header of a rpm or to reconstruct a rpm from the filesystem.
› MEMORY CONSIDERATIONS
makedeltarpm normally needs about three to four times the size of the rpm’s
uncompressed payload. You can use the -m option to enable a sliding block algorithm
that needs mbytes megabytes of memory. This trades memory usage with the size of
the created deltarpm. Furthermore, the uncompressed deltarpm payload is currently
also stored in memory when this option is used, but it tends to be small in most cases.
› SEE ALSO
applydeltarpm(8) combinedeltarpm(8)
› AUTHOR
Michael Schroeder <[email protected]>
MAKEDUMPFILE
› NAME
makedumpfile - make a small dumpfile of kdump
› SYNOPSIS
makedumpfile [OPTION] [-x VMLINUX|-i VMCOREINFO] VMCORE DUMPFILE
makedumpfile -F [OPTION] [-x VMLINUX|-i VMCOREINFO] VMCORE
makedumpfile [OPTION] -x VMLINUX [—config FILTERCONFIGFILE] [—eppic
EPPICMACRO] VMCORE DUMPFILE
makedumpfile -R DUMPFILE
makedumpfile —split [OPTION] [-x VMLINUX|-i VMCOREINFO] VMCORE
DUMPFILE1 DUMPFILE2 [DUMPFILE3 ..]
makedumpfile —reassemble DUMPFILE1 DUMPFILE2 [DUMPFILE3 ..] DUMPFILE
makedumpfile -g VMCOREINFO -x VMLINUX
makedumpfile -E [—xen-syms XEN-SYMS|—xen-vmcoreinfo VMCOREINFO]
VMCORE DUMPFILE
makedumpfile —dump-dmesg [-x VMLINUX|-i VMCOREINFO] VMCORE LOGFILE
makedumpfile [OPTION] -x VMLINUX —diskset=VMCORE1 —diskset=VMCORE2
[—diskset=VMCORE3 ..] DUMPFILE
makedumpfile -h
makedumpfile -v
› DESCRIPTION
With kdump, the memory image of the first kernel (called “panicked kernel”) can be
taken as /proc/vmcore while the second kernel (called “kdump kernel” or “capture
kernel”) is running. This document represents /proc/vmcore as VMCORE.
makedumpfile makes a small DUMPFILE by compressing dump data or by
excluding unnecessary pages for analysis, or both. makedumpfile needs the first
kernel’s debug information, so that it can distinguish unnecessary pages by analyzing
how the first kernel uses the memory. The information can be taken from VMLINUX
or VMCOREINFO.
makedumpfile can exclude the following types of pages while copying VMCORE to
DUMPFILE, and a user can choose which types of pages will be excluded:
- Pages filled with zero
- Cache pages without private pages
- All cache pages with private pages
- User process data pages
- Free pages
makedumpfile provides two DUMPFILE formats (the ELF format and the kdump-
compressed format). By default, makedumpfile makes a DUMPFILE in the kdump-
compressed format. The kdump-compressed format is readable only with the crash
utility, and it can be smaller than the ELF format because of the compression support.
The ELF format is readable with GDB and the crash utility. If a user wants to use
GDB, DUMPFILE format has to be explicitly specified to be the ELF format.
Apart from the exclusion of unnecessary pages mentioned above, makedumpfile
allows users to filter out targeted kernel data. The filter config file can be used to
specify kernel/module symbols and their members that need to be filtered out through
the erase command syntax. makedumpfile reads the filter config and builds the list of
memory addresses and their sizes after processing the filter commands. The memory
locations that need to be filtered out are then poisoned with the character 'X' (58 in
hex). Refer to makedumpfile.conf(5) for the file format.
Eppic macros can also be used to specify kernel symbols and their members that need
to be filtered. Eppic provides C semantics, including language constructs such as
conditional statements, logical and arithmetic operators, functions, and nested loops
to traverse and erase kernel data. —eppic requires eppic_makedumpfile.so and the
eppic library. eppic_makedumpfile.so can be built from the makedumpfile source.
Refer to http://code.google.com/p/eppic/ to build the eppic library libeppic.a and for
more information on writing eppic macros.
To analyze the first kernel’s memory usage, makedumpfile can refer to
VMCOREINFO instead of VMLINUX. VMCOREINFO contains the first kernel’s
information (structure size, field offset, etc.), and VMCOREINFO is small enough to
be included into the second kernel’s initrd. If the second kernel is running on its initrd
without mounting a root file system, makedumpfile cannot refer to VMLINUX
because the second kernel’s initrd cannot include a large file like VMLINUX. To solve
the problem, makedumpfile makes VMCOREINFO beforehand, and it refers to
VMCOREINFO instead of VMLINUX while the second kernel is running. VMCORE
has contained VMCOREINFO since linux-2.6.24, and a user does not need to specify
either the -x or the -i option.
If the second kernel is running on its initrd without mounting any file system, a user
needs to transport the dump data to a remote host. To transport the dump data by
SSH, makedumpfile outputs the dump data in the intermediate format (the flattened
format) to the standard output. By piping the output data to SSH, a user can transport
the dump data to a remote host. Note that analysis tools (crash utility before version
5.1.2 or GDB) cannot read the flattened format directly, so on a remote host the
received data in the flattened format needs to be rearranged to a readable DUMPFILE
format by makedumpfile (or makedumpfile-R.pl).
makedumpfile can read a DUMPFILE in the kdump-compressed format instead of
VMCORE and re-filter it. This feature is useful in situations where users need to
reduce the file size of a DUMPFILE before sending it somewhere by ftp/scp/etc. (If
all of the page types specified by the new dump_level are already excluded from the
original DUMPFILE, the new DUMPFILE is the same as the original.) For example,
makedumpfile can create a DUMPFILE of dump_level 31 from one of dump_level 3
as follows:
# makedumpfile -c -d 3 /proc/vmcore dumpfile.1
# makedumpfile -c -d 31 dumpfile.1 dumpfile.2
makedumpfile can read VMCORE(s) in three kinds of sadump formats: single
partition format, diskset format and media backup format, and can convert each of
them into kdump-compressed format with filtering and compression processing. Note
that for VMCORE(s) created by sadump, you always need to pass VMLINUX with -x
option. Also, to pass multiple VMCOREs created on diskset configuration, you need
to use —diskset option.
› OPTIONS
-c, -l, -p
Compress dump data page by page using zlib for the -c option, lzo for the -l option or
snappy for the -p option. (The -l option needs USELZO=on and the -p option needs
USESNAPPY=on when building.) A user cannot specify this option with the -E
option, because the ELF format does not support compressed data.
Example: # makedumpfile -c -d 31 -x vmlinux /proc/vmcore dumpfile
-d dump_level
Specify the types of unnecessary pages to exclude from analysis. Pages of the
specified types are not copied to DUMPFILE. The page types marked in the
following table are excluded. A user can specify multiple page types by setting
dump_level to the sum of each page type's value; the maximum dump_level is 31.
Note that a dump_level for Xen dump filtering is 0 or 1 on machines other than
x86_64. On an x86_64 machine, a dump level of 2 or bigger will also be effective if
you specify domain-0's vmlinux with the -x option; the pages are then excluded only
from domain-0. If multiple dump_levels are specified with the delimiter ',',
makedumpfile retries creating a DUMPFILE with the next dump_level when a "No
space on device" error happens. For example, if dump_level is "11,31" and
makedumpfile fails with dump_level 11, it retries with dump_level 31.
Example:
# makedumpfile -d 11 -x vmlinux /proc/vmcore dumpfile
# makedumpfile -d 11,31 -x vmlinux /proc/vmcore dumpfile
       |      | cache | cache |      |
  dump | zero |without| with  | user | free
 level | page |private|private| data | page
-------+------+-------+-------+------+------
     0 |      |       |       |      |
     1 |  X   |       |       |      |
     2 |      |   X   |       |      |
     3 |  X   |   X   |       |      |
     4 |      |   X   |   X   |      |
     5 |  X   |   X   |   X   |      |
     6 |      |   X   |   X   |      |
     7 |  X   |   X   |   X   |      |
     8 |      |       |       |  X   |
     9 |  X   |       |       |  X   |
    10 |      |   X   |       |  X   |
    11 |  X   |   X   |       |  X   |
    12 |      |   X   |   X   |  X   |
    13 |  X   |   X   |   X   |  X   |
    14 |      |   X   |   X   |  X   |
    15 |  X   |   X   |   X   |  X   |
    16 |      |       |       |      |  X
    17 |  X   |       |       |      |  X
    18 |      |   X   |       |      |  X
    19 |  X   |   X   |       |      |  X
    20 |      |   X   |   X   |      |  X
    21 |  X   |   X   |   X   |      |  X
    22 |      |   X   |   X   |      |  X
    23 |  X   |   X   |   X   |      |  X
    24 |      |       |       |  X   |  X
    25 |  X   |       |       |  X   |  X
    26 |      |   X   |       |  X   |  X
    27 |  X   |   X   |       |  X   |  X
    28 |      |   X   |   X   |  X   |  X
    29 |  X   |   X   |   X   |  X   |  X
    30 |      |   X   |   X   |  X   |  X
    31 |  X   |   X   |   X   |  X   |  X
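The table is simply the bit decomposition of dump_level: each page type contributes a fixed value, and the level is their sum. A sketch of the arithmetic:

```shell
# Bit value of each excludable page type, per the table above.
ZERO=1          # pages filled with zero
CACHE=2         # cache pages without private pages
CACHE_PRIV=4    # cache pages with private pages
USERDATA=8      # user process data pages
FREE=16         # free pages

# dump_level 11 = zero pages + cache without private + user data.
LEVEL=$(( ZERO + CACHE + USERDATA ))
echo "$LEVEL"
```

Summing all five values gives 31, the maximum dump_level, which excludes every supported page type.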
-E
Create DUMPFILE in the ELF format. This option cannot be specified with either the
-c or the -l option, because the ELF format does not support compressed data.
Example: # makedumpfile -E -d 31 -x vmlinux /proc/vmcore dumpfile
-f
Force existing DUMPFILE to be overwritten. Example: # makedumpfile -f -d 31 -x
vmlinux /proc/vmcore dumpfile This command overwrites DUMPFILE even if it
already exists.
-x VMLINUX
Specify the first kernel’s VMLINUX with debug information to analyze the first
kernel’s memory usage. This option is necessary if VMCORE does not contain
VMCOREINFO, [-i VMCOREINFO] is not specified, and dump_level is 2 or more.
The page size of the first kernel and the second kernel should match. Example: #
makedumpfile -d 31 -x vmlinux /proc/vmcore dumpfile
-i VMCOREINFO
Specify VMCOREINFO instead of VMLINUX for analyzing the first kernel’s memory
usage. VMCOREINFO should be made beforehand by makedumpfile with -g option,
and it contains the first kernel’s information. This option is necessary if VMCORE
does not contain VMCOREINFO, [-x VMLINUX] is not specified, and dump_level is
2 or more. Example: # makedumpfile -d 31 -i vmcoreinfo /proc/vmcore dumpfile
-g VMCOREINFO
Generate VMCOREINFO from the first kernel’s VMLINUX with debug information.
VMCOREINFO must be generated on the system that is running the first kernel. With
the -i option, a user can specify VMCOREINFO generated on another system that is
running the same first kernel. [-x VMLINUX] must be specified. Example: #
makedumpfile -g vmcoreinfo -x vmlinux
—config FILTERCONFIGFILE
Used in conjunction with -x VMLINUX option, to specify the filter config file
FILTERCONFIGFILE that contains erase commands to filter out desired kernel data
from vmcore while creating DUMPFILE. For filter command syntax please refer to
makedumpfile.conf(5).
—eppic EPPICMACRO
Used in conjunction with -x VMLINUX option, to specify the eppic macro file that
contains filter rules or directory that contains eppic macro files to filter out desired
kernel data from vmcore while creating DUMPFILE. When directory is specified, all
the eppic macros in the directory are processed.
-F
Output the dump data in the flattened format to the standard output for transporting
the dump data by SSH. Analysis tools (crash utility before version 5.1.2 or GDB)
cannot read the flattened format directly. For analysis, the dump data in the flattened
format should be rearranged to a normal DUMPFILE (readable with analysis tools)
by the -R option. The format of the rearranged DUMPFILE is fixed by the options
given along with the -F option; in other words, it is impossible to choose the
DUMPFILE format at the time the dump data is rearranged with the -R option. If the -E
option is specified with the -F option, the format of the rearranged DUMPFILE is the ELF format.
Otherwise, it is the kdump-compressed format. All the messages are output to
standard error output by -F option because standard output is used for the dump data.
Example: # makedumpfile -F -c -d 31 -x vmlinux /proc/vmcore \ | ssh user@host
“cat > dumpfile.tmp” # makedumpfile -F -c -d 31 -x vmlinux /proc/vmcore \ | ssh
user@host “makedumpfile -R dumpfile” # makedumpfile -F -E -d 31 -i vmcoreinfo
/proc/vmcore \ | ssh user@host “makedumpfile -R dumpfile” # makedumpfile -F -E
—xen-vmcoreinfo VMCOREINFO /proc/vmcore \ | ssh user@host “makedumpfile -R
dumpfile”
-R
Rearrange the dump data in the flattened format from the standard input to a normal
DUMPFILE (readable with analysis tools). Example: # makedumpfile -R dumpfile <
dumpfile.tmp # makedumpfile -F -d 31 -x vmlinux /proc/vmcore \ | ssh user@host
“makedumpfile -R dumpfile”
Instead of the -R option, the perl script “makedumpfile-R.pl” can also rearrange the
dump data in the flattened format into a normal DUMPFILE. The script does not
depend on the architecture, and most systems have the perl command. Even if a remote host
does not have makedumpfile, it is possible to rearrange the dump data in the flattened
format to a readable DUMPFILE on a remote host by running this script. Example: #
makedumpfile -F -d 31 -x vmlinux /proc/vmcore \ | ssh user@host “makedumpfile-
R.pl dumpfile”
—split
Split the dump data to multiple DUMPFILEs in parallel. If specifying DUMPFILEs
on different storage devices, a device can share I/O load with other devices and it
reduces the time for saving the dump data. The size of each DUMPFILE is smaller
than the system memory size divided by the number of DUMPFILEs. This
feature supports only the kdump-compressed format. Example: # makedumpfile —
split -d 31 -x vmlinux /proc/vmcore dumpfile1 dumpfile2
—reassemble
Reassemble multiple DUMPFILEs, which are created by —split option, into one
DUMPFILE. In the following example, dumpfile1 and dumpfile2 are reassembled into
dumpfile. Example: # makedumpfile —reassemble dumpfile1 dumpfile2
dumpfile
-b <order>
Cache 2^order pages in RAM when generating DUMPFILE before writing to output.
The default value is 4.
—cyclic-buffer buffer_size
Specify the buffer size in kilobytes for analysis in the cyclic mode. In the cyclic
mode, the number of cycles is represented as:
num_of_cycles = system_memory /
(buffer_size * 1024 * bit_per_bytes * page_size )
The fewer the cycles, the faster the expected working speed. By default,
buffer_size is calculated automatically depending on the system memory size, so
ordinary users do not need to specify this option.
Example: # makedumpfile —cyclic-buffer 1024 -d 31 -x vmlinux /proc/vmcore
dumpfile
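The cycle-count formula above can be checked with shell arithmetic. The memory size below is an assumption chosen for illustration, not a value from the text:

```shell
system_memory=$(( 64 * 1024 * 1024 * 1024 ))  # assume 64 GiB of RAM
buffer_size=1024                              # --cyclic-buffer 1024 (kilobytes)
bit_per_bytes=8
page_size=4096

num_of_cycles=$(( system_memory / (buffer_size * 1024 * bit_per_bytes * page_size) ))
echo "num_of_cycles=$num_of_cycles"           # 2 cycles for this configuration
```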
—splitblock-size splitblock_size
Specify the splitblock size in kilobytes for analysis in the cyclic mode with —split.
If —splitblock-size N is specified, the sizes of the split dumpfiles differ by at most N
kilobytes. Example: # makedumpfile —splitblock-size 1024 -d 31 -x vmlinux —
split /proc/vmcore dumpfile1 dumpfile2
—non-cyclic
Run in the non-cyclic mode, which uses the old filtering logic, the same as v1.4.4
and earlier. If you find the cyclic mode too slow, please try this mode. Example: #
makedumpfile —non-cyclic -d 31 -x vmlinux /proc/vmcore dumpfile
—non-mmap
Never use mmap(2) to read VMCORE even if it supports mmap(2). Generally,
reading VMCORE with mmap(2) is faster than without it, so ordinary users don’t
need to specify this option. This option is mainly for debugging. Example: #
makedumpfile —non-mmap -d 31 -x vmlinux /proc/vmcore dumpfile
—xen-syms XEN-SYMS
Specify the XEN-SYMS with debug information to analyze the xen’s memory usage.
This option extracts the part of xen and domain-0. -E option must be specified with
this option. Example: # makedumpfile -E —xen-syms xen-syms /proc/vmcore
dumpfile
—xen-vmcoreinfo VMCOREINFO
Specify VMCOREINFO instead of XEN-SYMS for analyzing the xen’s memory
usage. VMCOREINFO should be made beforehand by makedumpfile with -g option,
and it contains the xen’s information. -E option must be specified with this option.
Example: # makedumpfile -E —xen-vmcoreinfo VMCOREINFO /proc/vmcore
dumpfile
-X
Exclude all the user domain pages from Xen kdump’s VMCORE, and extract the
part of xen and domain-0. If VMCORE contains VMCOREINFO for Xen, it is not
necessary to specify —xen-syms and —xen-vmcoreinfo. -E option must be specified
with this option. Example: # makedumpfile -E -X /proc/vmcore dumpfile
—xen_phys_start xen_phys_start_address
This option is only for x86_64. Specify the xen_phys_start_address, if the xen
code/data is relocatable and VMCORE does not contain xen_phys_start_address in
the CRASHINFO. xen_phys_start_address can be taken from the line of “Hypervisor
code and data” in /proc/iomem. For example, specify 0xcee00000 as
xen_phys_start_address if /proc/iomem contains the following:
# cat /proc/iomem
…
cee00000-cfd99999 : Hypervisor code and data
…
Example: # makedumpfile -E -X —xen_phys_start 0xcee00000 /proc/vmcore
dumpfile
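The address can be pulled out of /proc/iomem with a one-liner. A possible sketch (the sample line mirrors the example above; on a real system the input would be /proc/iomem itself):

```shell
# Sample /proc/iomem content, mirroring the example above:
iomem='cee00000-cfd99999 : Hypervisor code and data'

# Extract the range start and prefix it with 0x for --xen_phys_start:
start=$(printf '%s\n' "$iomem" | awk -F'[- ]' '/Hypervisor code and data/ {print "0x" $1}')
echo "$start"   # 0xcee00000
```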
—message-level message_level
Specify the message types. Users can restrict outputs printed by specifying
message_level with this option. The message type marked with an X in the following
table is printed. For example, according to the table, specifying 7 as message_level
means progress indicator, common message, and error message are printed, and this
is a default value. Note that the maximum value of message_level is 31.
 message | progress | common  | error   | debug   | report
  level  | indicator| message | message | message | message
 --------+----------+---------+---------+---------+---------
      0  |          |         |         |         |
      1  |    X     |         |         |         |
      2  |          |    X    |         |         |
      3  |    X     |    X    |         |         |
      4  |          |         |    X    |         |
      5  |    X     |         |    X    |         |
      6  |          |    X    |    X    |         |
    * 7  |    X     |    X    |    X    |         |
      8  |          |         |         |    X    |
      9  |    X     |         |         |    X    |
     10  |          |    X    |         |    X    |
     11  |    X     |    X    |         |    X    |
     12  |          |         |    X    |    X    |
     13  |    X     |         |    X    |    X    |
     14  |          |    X    |    X    |    X    |
     15  |    X     |    X    |    X    |    X    |
     16  |          |         |         |         |    X
     17  |    X     |         |         |         |    X
     18  |          |    X    |         |         |    X
     19  |    X     |    X    |         |         |    X
     20  |          |         |    X    |         |    X
     21  |    X     |         |    X    |         |    X
     22  |          |    X    |    X    |         |    X
     23  |    X     |    X    |    X    |         |    X
     24  |          |         |         |    X    |    X
     25  |    X     |         |         |    X    |    X
     26  |          |    X    |         |    X    |    X
     27  |    X     |    X    |         |    X    |    X
     28  |          |         |    X    |    X    |    X
     29  |    X     |         |    X    |    X    |    X
     30  |          |    X    |    X    |    X    |    X
     31  |    X     |    X    |    X    |    X    |    X
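A message_level can be decoded back into its message types with bit tests. The decode_message_level helper below is hypothetical, written only to illustrate the bit layout:

```shell
# Bit values behind message_level, per the table above:
#   1 = progress indicator, 2 = common message, 4 = error message,
#   8 = debug message, 16 = report message
decode_message_level() {
  level=$1; types=""
  [ $(( level & 1 ))  -ne 0 ] && types="$types progress"
  [ $(( level & 2 ))  -ne 0 ] && types="$types common"
  [ $(( level & 4 ))  -ne 0 ] && types="$types error"
  [ $(( level & 8 ))  -ne 0 ] && types="$types debug"
  [ $(( level & 16 )) -ne 0 ] && types="$types report"
  echo "${types# }"
}

decode_message_level 7    # prints "progress common error" (the default)
decode_message_level 31   # prints "progress common error debug report"
```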
—vtop virtual_address
This option is useful when debugging virtual address translation problems. If
virtual_address is specified, its physical address is printed, which makes debugging
easy: compare the output of this option with that of the “vtop” subcommand of the
crash utility. The —vtop option only prints the translation output; it does not affect
the dumpfile creation.
—dump-dmesg
This option overrides the normal behavior of makedumpfile. Instead of compressing
and filtering a VMCORE to make it smaller, it simply extracts the dmesg log from a
VMCORE and writes it to the specified LOGFILE. If a VMCORE does not contain
VMCOREINFO for dmesg, it is necessary to specify [-x VMLINUX] or [-i
VMCOREINFO].
Example: # makedumpfile —dump-dmesg /proc/vmcore dmesgfile # makedumpfile
—dump-dmesg -x vmlinux /proc/vmcore dmesgfile
—mem-usage
This option is only for x86_64. It shows how many pages of the current system are in
each kind of use, and it should be executed in the first kernel. With its help, a user
can know how many pages are dumpable when a given dump_level is specified. It
analyzes the ‘System Ram’ and ‘kernel text’ program segments of /proc/kcore
excluding the crashkernel range, then calculates the number of pages of each kind
per vmcoreinfo. Currently, /proc/kcore needs to be specified explicitly.
Example: # makedumpfile —mem-usage /proc/kcore
—diskset=VMCORE
Specify this option once for each of the multiple VMCOREs created on a sadump
diskset configuration, in increasing order from left to right. The VMCOREs are
assembled into a single DUMPFILE.
Example: # makedumpfile -x vmlinux —diskset=vmcore1 —diskset=vmcore2
dumpfile
-D
Print debugging message.
-h (—help)
Show help message and LZO/snappy support status (enabled/disabled).
-v
Show the version of makedumpfile.
› ENVIRONMENT VARIABLES
TMPDIR
This environment variable controls the location of the temporary memory bitmap
file, and applies only in the non-cyclic mode. If your machine has a lot of memory
and you use tmpfs on /tmp, makedumpfile can fail due to a shortage of memory in
the 2nd kernel, because it creates a very large temporary memory bitmap file in that
case. To avoid this failure, set the TMPDIR environment variable. If TMPDIR is not
set, makedumpfile uses the /tmp directory for the temporary bitmap file by default.
› DIAGNOSTICS
makedumpfile exits with the following value.
0 : makedumpfile succeeded.
1 : makedumpfile failed for a reason other than those below.
2 : makedumpfile failed due to a version mismatch between VMLINUX and VMCORE.
3 : makedumpfile failed due to an error analyzing the memory.
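A wrapper script can act on these exit values. The run_dump helper below is hypothetical, and the stand-in commands only simulate exit codes; substitute a real makedumpfile invocation in practice:

```shell
# Hypothetical wrapper mapping makedumpfile exit codes to messages.
run_dump() {
  "$@"   # in real use: makedumpfile -d 31 -x vmlinux /proc/vmcore dumpfile
  case $? in
    0) echo "succeeded" ;;
    2) echo "VMLINUX/VMCORE version mismatch" ;;
    3) echo "memory analysis error" ;;
    *) echo "failed" ;;
  esac
}

run_dump true                 # prints "succeeded"
run_dump sh -c 'exit 2'       # prints "VMLINUX/VMCORE version mismatch"
```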
› AUTHORS
Written by Masaki Tachibana, and Ken’ichi Ohmichi.
› SEE ALSO
crash(8), gdb(1), kexec(8), makedumpfile.conf(5)
MANDB
› NAME
mandb - create or update the manual page index caches
› SYNOPSIS
mandb [-dqsucpt?V] [-C file] [manpath] mandb [-dqsut] [-C file] -f filename …
› DESCRIPTION
mandb is used to initialise or manually update index database caches that are usually
maintained by man. The caches contain information relevant to the current state of
the manual page system and the information stored within them is used by the man-
db utilities to enhance their speed and functionality.
When creating or updating an index, mandb will warn of bad ROFF .so requests,
bogus manual page filenames and manual pages from which the whatis cannot be
parsed.
Supplying mandb with an optional colon-delimited path will override the internal
system manual page hierarchy search path, determined from information found
within the man-db configuration file.
› DATABASE CACHES
mandb can be compiled with support for any one of the following database types.
Name Type Async Filename
MAPSCRN
› NAME
mapscrn - load screen output mapping table
› DESCRIPTION
A command like
mapscrn trivial
sets up a one-to-one direct-to-font table where user bytes directly address the font. This is
useful for fonts that are in the same order as the character set one uses. A command like
mapscrn 8859-2
sets up a user-to-unicode table that assumes that the user uses ISO 8859-2.
› INPUT FORMAT
The mapscrn command can read the map in either of two formats:
1. 256 or 512 bytes of binary data
2. a two-column text file
Format (1) is a direct image of the translation table. The 256-byte tables are
direct-to-font, the 512-byte tables are user-to-unicode tables. Format (2) is used to
fill the table as follows: the cell at the offset given in the first column is filled with
the value given in the second column. When values larger than 255 occur, or values
are written using the U+xxxx notation, the table is assumed to be a user-to-unicode
table; otherwise it is a direct-to-font table.
Values in the file may be specified in one of several formats:
1. Decimal: a string of decimal digits not starting with ‘0’.
2. Octal: a string of octal digits beginning with ‘0’.
3. Hexadecimal: a string of hexadecimal digits preceded by “0x”.
4. Unicode: a string of four hexadecimal digits preceded by “U+”.
5. Character: a single character enclosed in single quotes (its binary value is used).
Note that blank, comma, tab and ‘#’ cannot be specified with this format.
6. UTF-8 Character: a single (possibly multi-byte) UTF-8 character, enclosed in
single quotes.
Note that control characters (with codes < 32) cannot be re-mapped with mapscrn
because they have special meaning for the driver.
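Formats 1 through 4 can all be normalized to a plain number. The to_decimal helper below is hypothetical, shown only to illustrate how the notations relate (the quoted-character formats 5 and 6 are omitted):

```shell
# Hypothetical helper: normalize formats 1-4 above to a decimal value.
to_decimal() {
  case $1 in
    U+*) printf '%d\n' "0x${1#U+}" ;;  # U+xxxx: four hex digits
    *)   printf '%d\n' "$1" ;;         # printf handles 0x hex, leading-0 octal, decimal
  esac
}

to_decimal 65       # decimal       -> 65
to_decimal 0101     # octal         -> 65
to_decimal 0x41     # hexadecimal   -> 65
to_decimal U+0041   # unicode       -> 65
```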
› FILES
/lib/kbd/consoletrans is the default directory for screen mappings.
› SEE ALSO
setfont(8)
› AUTHOR
Copyright (C) 1993 Eugene G. Crosser <[email protected]> This software and
documentation may be distributed freely.
MASTER
› NAME
master - Postfix master process
› SYNOPSIS
master [-Ddtvw] [-c config_dir] [-e exit_time]
› DESCRIPTION
The master(8) daemon is the resident process that runs Postfix daemons on demand:
daemons to send or receive messages via the network, daemons to deliver mail
locally, etc. These daemons are created on demand up to a configurable maximum
number per service.
Postfix daemons terminate voluntarily, either after being idle for a configurable
amount of time, or after having serviced a configurable number of requests.
Exceptions to this rule are the resident queue manager, address verification server,
and the TLS session cache and pseudo-random number server.
The behavior of the master(8) daemon is controlled by the master.cf configuration
file, as described in master(5).
Options:
-c config_dir
Read the main.cf and master.cf configuration files in the named directory instead of
the default configuration directory. This also overrides the configuration files for
other Postfix daemon processes.
-D
After initialization, run a debugger on the master process. The debugging command
is specified with the debugger_command in the main.cf global configuration file.
-d
Do not redirect stdin, stdout or stderr to /dev/null, and do not discard the controlling
terminal. This must be used for debugging only.
-e exit_time
Terminate the master process after exit_time seconds. Child processes terminate at
their convenience.
-t
Test mode. Return a zero exit status when the master.pid lock file does not exist or
when that file is not locked. This is evidence that the master(8) daemon is not
running.
-v
Enable verbose logging for debugging purposes. This option is passed on to child
processes. Multiple -v options make the software increasingly verbose.
-w
Wait in a dummy foreground process, while the real master daemon initializes in a
background process. The dummy foreground process returns a zero exit status only if
the master daemon initialization is successful, and if it completes in a reasonable
amount of time.
This feature is available in Postfix 2.10 and later.
Signals:
SIGHUP
Upon receipt of a HUP signal (e.g., after “postfix reload”), the master process
re-reads its configuration files. If a service has been removed from the master.cf file, its
running processes are terminated immediately. Otherwise, running processes are
allowed to terminate as soon as is convenient, so that changes in configuration
settings affect only new service requests.
SIGTERM
Upon receipt of a TERM signal (e.g., after “postfix abort”), the master process
passes the signal on to its child processes and terminates. This is useful for an
emergency shutdown. Normally one would terminate only the master (“postfix stop”)
and allow running processes to finish what they are doing.
› DIAGNOSTICS
Problems are reported to syslogd(8). The exit status is non-zero in case of problems,
including problems while initializing as a master daemon process in the background.
› ENVIRONMENT
MAIL_DEBUG
After initialization, start a debugger as specified with the debugger_command
configuration parameter in the main.cf configuration file.
MAIL_CONFIG
Directory with Postfix configuration files.
› CONFIGURATION PARAMETERS
Unlike most Postfix daemon processes, the master(8) server does not automatically
pick up changes to main.cf. Changes to master.cf are never picked up automatically.
Use the “postfix reload” command after a configuration change.
› RESOURCE AND RATE CONTROLS
default_process_limit (100)
The default maximal number of Postfix child processes that provide a given service.
max_idle (100s)
The maximum amount of time that an idle Postfix daemon process waits for an
incoming connection before terminating voluntarily.
max_use (100)
The maximal number of incoming connections that a Postfix daemon process will
service before terminating voluntarily.
service_throttle_time (60s)
How long the Postfix master(8) waits before forking a server that appears to be
malfunctioning.
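Collected together, the defaults above correspond to the following main.cf fragment. This is an illustrative sketch restating the documented defaults, not a tuning recommendation:

```
# main.cf fragment: master(8) resource and rate controls (documented defaults)
default_process_limit = 100     # children per service
max_idle = 100s                 # idle lifetime before voluntary exit
max_use = 100                   # connections served before voluntary exit
service_throttle_time = 60s     # delay before re-forking a failing service
```

As noted under CONFIGURATION PARAMETERS, run “postfix reload” for master(8) to pick up such changes.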
MDADM
› NAME
mdadm - manage MD devices aka Linux Software RAID
› DESCRIPTION
If a device is given before any options, or if the first option is one of —add, —re-add, —
add-spare, —fail, —remove, or —replace, then the MANAGE mode is assumed.
Anything other than these will cause the Misc mode to be assumed.
› OPTIONS THAT ARE NOT MODE-SPECIFIC ARE:
-h, —help
Display general help message or, after one of the above options, a mode-specific help
message.
—help-options
Display more detailed help about command line parsing and some commonly used
options.
-V, —version
Print version information for mdadm.
-v, —verbose
Be more verbose about what is happening. This can be used twice to be extra-
verbose. The extra verbosity currently only affects —detail —scan and —examine
—scan.
-q, —quiet
Avoid printing purely informative messages. With this, mdadm will be silent unless
there is something really important to report.
-f, —force
Be more forceful about certain operations. See the various modes for the exact
meaning of this option in different contexts.
-c, —config=
Specify the config file or directory. Default is to use /etc/mdadm.conf and
/etc/mdadm.conf.d, or if those are missing then /etc/mdadm/mdadm.conf and
/etc/mdadm/mdadm.conf.d. If the config file given is partitions then nothing will
be read, but mdadm will act as though the config file contained exactly DEVICE
partitions containers and will read /proc/partitions to find a list of devices to scan,
and /proc/mdstat to find a list of containers to examine. If the word none is given for
the config file, then mdadm will act as though the config file were empty.
If the name given is of a directory, then mdadm will collect all the files contained in
the directory with a name ending in .conf, sort them lexically, and process all of those
files as config files.
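The directory handling can be mimicked in the shell, since globs expand in lexical order, matching the sorting mdadm applies. The file names below are made up for illustration:

```shell
# Build a throwaway config directory with out-of-order names:
dir=$(mktemp -d)
touch "$dir/20-arrays.conf" "$dir/10-devices.conf" "$dir/notes.txt"

# Only *.conf files are collected, in lexical order -- as mdadm does:
for f in "$dir"/*.conf; do
  echo "would parse: ${f##*/}"
done
rm -rf "$dir"
```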
-s, —scan
Scan config file or /proc/mdstat for missing information. In general, this option
gives mdadm permission to get any missing information (like component devices,
array devices, array identities, and alert destination) from the configuration file (see
previous option); one exception is MISC mode when using —detail or —stop, in
which case —scan says to get a list of array devices from /proc/mdstat.
-e, —metadata=
Declare the style of RAID metadata (superblock) to be used. The default is 1.2 for —
create, and to guess for other operations. The default can be overridden by setting the
metadata value for the CREATE keyword in mdadm.conf.
Options are:
0, 0.90
Use the original 0.90 format superblock. This format limits arrays to 28 component
devices and limits component devices of levels 1 and greater to 2 terabytes. It is also
possible for there to be confusion about whether the superblock applies to a whole
device or just the last partition, if that partition starts on a 64K boundary.
1, 1.0, 1.1, 1.2 default
Use the new version-1 format superblock. This has fewer restrictions. It can easily be
moved between hosts with different endian-ness, and a recovery operation can be
checkpointed and restarted. The different sub-versions store the superblock at
different locations on the device, either at the end (for 1.0), at the start (for 1.1) or 4K
from the start (for 1.2). “1” is equivalent to “1.2” (the commonly preferred 1.x
format). “default” is equivalent to “1.2”.
ddf
Use the “Industry Standard” DDF (Disk Data Format) format defined by SNIA.
When creating a DDF array a CONTAINER will be created, and normal arrays can
be created in that container.
imsm
Use the Intel(R) Matrix Storage Manager metadata format. This creates a
CONTAINER which is managed in a similar manner to DDF, and is supported by an
option-rom on some platforms:
http://www.intel.com/design/chipsets/matrixstorage_sb.htm
—homehost= This will override any HOMEHOST setting in the config file and provides
the identity of the host which should be considered the home for any arrays.
When creating an array, the homehost will be recorded in the metadata. For version-1
superblocks, it will be prefixed to the array name. For version-0.90 superblocks, part of
the SHA1 hash of the hostname will be stored in the latter half of the UUID.
When reporting information about an array, any array which is tagged for the given
homehost will be reported as such.
When using Auto-Assemble, only arrays tagged for the given homehost will be allowed to
use ‘local’ names (i.e. not ending in ‘_’ followed by a digit string). See below under Auto
Assembly.
—prefer= When mdadm needs to print the name for a device it normally finds the name
in /dev which refers to the device and is shortest. When a path component is given with —
prefer mdadm will prefer a longer name if it contains that component. For example —
prefer=by-uuid will prefer a name in a subdirectory of /dev called by-uuid.
This functionality is currently only provided by —detail and —monitor.
› FOR CREATE, BUILD, OR GROW:
-n, —raid-devices=
Specify the number of active devices in the array. This, plus the number of spare
devices (see below) must equal the number of component-devices (including
“missing” devices) that are listed on the command line for —create. Setting a value
of 1 is probably a mistake and so requires that —force be specified first. A value of 1
will then be allowed for linear, multipath, RAID0 and RAID1. It is never allowed for
RAID4, RAID5 or RAID6. This number can only be changed using —grow for
RAID1, RAID4, RAID5 and RAID6 arrays, and only on kernels which provide the
necessary support.
-x, —spare-devices=
Specify the number of spare (eXtra) devices in the initial array. Spares can also be
added and removed later. The number of component devices listed on the command
line must equal the number of RAID devices plus the number of spare devices.
-z, —size=
Amount (in Kibibytes) of space to use from each drive in RAID levels 1/4/5/6. This
must be a multiple of the chunk size, and must leave about 128Kb of space at the end
of the drive for the RAID superblock. If this is not specified (as it normally is not) the
smallest drive (or partition) sets the size, though if there is a variance among the
drives of greater than 1%, a warning is issued.
A suffix of ‘M’ or ‘G’ can be given to indicate Megabytes or Gigabytes respectively.
Sometimes a replacement drive can be a little smaller than the original drives though
this should be minimised by IDEMA standards. Such a replacement drive will be
rejected by md. To guard against this it can be useful to set the initial size slightly
smaller than the smaller device with the aim that it will still be larger than any
replacement.
This value can be set with —grow for RAID level 1/4/5/6 though CONTAINER
based arrays such as those with IMSM metadata may not be able to support this. If
the array was created with a size smaller than the currently active drives, the extra
space can be accessed using —grow. The size can be given as max which means to
choose the largest size that fits on all current drives.
Before reducing the size of the array (with —grow —size=) you should make sure
that space isn’t needed. If the device holds a filesystem, you would need to resize the
filesystem to use less space.
After reducing the array size you should check that the data stored in the device is
still available. If the device holds a filesystem, then an ‘fsck’ of the filesystem is a
minimum requirement. If there are problems the array can be made bigger again with
no loss with another —grow —size= command.
This value cannot be used when creating a CONTAINER such as with DDF and
IMSM metadata, though it is perfectly valid when creating an array inside a container.
-Z, —array-size=
This is only meaningful with —grow and its effect is not persistent: when the array is
stopped and restarted the default array size will be restored.
Setting the array-size causes the array to appear smaller to programs that access the
data. This is particularly needed before reshaping an array so that it will be smaller.
As the reshape is not reversible, but setting the size with —array-size is, it is
required that the array size is reduced as appropriate before the number of devices in
the array is reduced.
Before reducing the size of the array you should make sure that space isn’t needed. If
the device holds a filesystem, you would need to resize the filesystem to use less
space.
After reducing the array size you should check that the data stored in the device is
still available. If the device holds a filesystem, then an ‘fsck’ of the filesystem is a
minimum requirement. If there are problems the array can be made bigger again with
no loss with another —grow —array-size= command.
A suffix of ‘M’ or ‘G’ can be given to indicate Megabytes or Gigabytes respectively.
A value of max restores the apparent size of the array to be whatever the real amount
of available space is.
-c, —chunk=
Specify chunk size in kibibytes. The default when creating an array is 512KB. To
ensure compatibility with earlier versions, the default when building an array with no
persistent metadata is 64KB. This is only meaningful for RAID0, RAID4, RAID5,
RAID6, and RAID10.
RAID4, RAID5, RAID6, and RAID10 require the chunk size to be a power of 2. In
any case it must be a multiple of 4KB.
A suffix of ‘M’ or ‘G’ can be given to indicate Megabytes or Gigabytes respectively.
—rounding=
Specify rounding factor for a Linear array. The size of each component will be
rounded down to a multiple of this size. This is a synonym for —chunk but
highlights the different meaning for Linear as compared to other RAID levels. The
default is 64K if a kernel earlier than 2.6.16 is in use, and is 0K (i.e. no rounding) in
later kernels.
-l, —level=
Set RAID level. When used with —create, options are: linear, raid0, 0, stripe, raid1,
1, mirror, raid4, 4, raid5, 5, raid6, 6, raid10, 10, multipath, mp, faulty, container.
Obviously some of these are synonymous.
When a CONTAINER metadata type is requested, only the container level is
permitted, and it does not need to be explicitly given.
When used with —build, only linear, stripe, raid0, 0, raid1, multipath, mp, and faulty
are valid.
Can be used with —grow to change the RAID level in some cases. See LEVEL
CHANGES below.
-p, —layout=
This option configures the fine details of data layout for RAID5, RAID6, and
RAID10 arrays, and controls the failure modes for faulty.
The layout of the RAID5 parity block can be one of left-asymmetric, left-
symmetric, right-asymmetric, right-symmetric, la, ra, ls, rs. The default is left-
symmetric.
It is also possible to cause RAID5 to use a RAID4-like layout by choosing parity-
first, or parity-last.
Finally for RAID5 there are DDF-compatible layouts, ddf-zero-restart, ddf-N-
restart, and ddf-N-continue.
These same layouts are available for RAID6. There are also 4 layouts that will
provide an intermediate stage for converting between RAID5 and RAID6. These
provide a layout which is identical to the corresponding RAID5 layout on the first N-
1 devices, and has the ‘Q’ syndrome (the second ‘parity’ block used by RAID6) on
the last device. These layouts are: left-symmetric-6, right-symmetric-6, left-
asymmetric-6, right-asymmetric-6, and parity-first-6.
When setting the failure mode for level faulty, the options are: write-transient, wt,
read-transient, rt, write-persistent, wp, read-persistent, rp, write-all, read-
fixable, rf, clear, flush, none.
Each failure mode can be followed by a number, which is used as a period between
fault generation. Without a number, the fault is generated once on the first relevant
request. With a number, the fault will be generated after that many requests, and will
continue to be generated every time the period elapses.
Multiple failure modes can be in effect simultaneously by using the —grow option to
set subsequent failure modes.
“clear” or “none” will remove any pending or periodic failure modes, and “flush”
will clear any persistent faults.
Finally, the layout options for RAID10 are one of ‘n’, ‘o’ or ‘f’ followed by a small
number. The default is ‘n2’. The supported options are:
n signals ‘near’ copies. Multiple copies of one data block are at similar offsets in
different devices.
o signals ‘offset’ copies. Rather than the chunks being duplicated within a stripe,
whole stripes are duplicated but are rotated by one device so duplicate blocks are on
different devices. Thus subsequent copies of a block are in the next drive, and are one
chunk further down.
f signals ‘far’ copies (multiple copies have very different offsets). See md(4) for more
detail about ‘near’, ‘offset’, and ‘far’.
The number is the number of copies of each datablock. 2 is normal, 3 can be useful.
This number can be at most equal to the number of devices in the array. It does not
need to divide evenly into that number (e.g. it is perfectly legal to have an ‘n2’ layout
for an array with an odd number of devices).
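Whatever the layout letter, the copy count determines usable capacity. An illustrative calculation (the device count and size are assumptions, not values from the text):

```shell
devices=5            # an odd device count is legal with 'n2'
device_size_gib=100  # per-device size (illustrative)
copies=2             # the '2' in layouts such as n2, o2, f2

# Usable capacity is roughly total raw capacity divided by the copy count:
usable_gib=$(( devices * device_size_gib / copies ))
echo "${usable_gib} GiB usable"   # prints "250 GiB usable"
```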
When an array is converted between RAID5 and RAID6 an intermediate RAID6
layout is used in which the second parity block (Q) is always on the last device. To
convert a RAID5 to RAID6 and leave it in this new layout (which does not require
re-striping) use —layout=preserve. This will try to avoid any restriping.
The converse of this is —layout=normalise which will change a non-standard
RAID6 layout into a more standard arrangement.
—parity=
same as —layout (thus explaining the p of -p).
-b, —bitmap=
Specify a file to store a write-intent bitmap in. The file should not exist unless —
force is also given. The same file should be provided when assembling the array. If
the word internal is given, then the bitmap is stored with the metadata on the array,
and so is replicated on all devices. If the word none is given with —grow mode, then
any bitmap that is present is removed.
To help catch typing errors, the filename must contain at least one slash (‘/’) if it is a
real file (not ‘internal’ or ‘none’).
Note: external bitmaps are only known to work on ext2 and ext3. Storing bitmap files
on other filesystems may result in serious problems.
When creating an array on devices which are 100G or larger, mdadm automatically
adds an internal bitmap as it will usually be beneficial. This can be suppressed with
—bitmap=none.
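Two hypothetical invocations sketch the bitmap behaviour described above (device names are placeholders): one stores the write-intent bitmap with the array metadata, the other suppresses the automatic internal bitmap on large devices.

```shell
# Hypothetical example: keep the write-intent bitmap in the array metadata.
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
      --bitmap=internal /dev/sdb1 /dev/sdc1

# Hypothetical example: suppress the internal bitmap that mdadm would
# otherwise add automatically on devices of 100G or larger.
mdadm --create /dev/md1 --level=1 --raid-devices=2 \
      --bitmap=none /dev/sdd1 /dev/sde1
```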
—bitmap-chunk=
Set the chunksize of the bitmap. Each bit corresponds to that many Kilobytes of
storage. When using a file-based bitmap, the default is to use the smallest size that is
at least 4 and requires no more than 2^21 chunks. When using an internal bitmap,
the chunksize defaults to 64Meg, or larger if necessary to fit the bitmap into the
available space.
A suffix of ‘M’ or ‘G’ can be given to indicate Megabytes or Gigabytes respectively.
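The file-based default stated above can be sketched as a small calculation. This is an illustration of the stated rule only, not mdadm's actual code, and it assumes the chunk size grows in powers of two:

```shell
# Sketch of the rule: the default file-based bitmap chunk is the smallest
# power-of-two size in KiB that is at least 4 and requires no more than
# 2^21 (2097152) chunks to cover the device.
bitmap_chunk_kib() {
    size_kib=$1
    chunk=4
    # Double the chunk size until the number of chunks fits in 2^21.
    while [ $(( (size_kib + chunk - 1) / chunk )) -gt 2097152 ]; do
        chunk=$(( chunk * 2 ))
    done
    echo "$chunk"
}

bitmap_chunk_kib 8388608    # 8 GiB device -> 4
bitmap_chunk_kib 16777216   # 16 GiB device -> 8
```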
-W, —write-mostly
subsequent devices listed in a —build, —create, or —add command will be flagged
as ‘write-mostly’. This is valid for RAID1 only and means that the ‘md’ driver will
avoid reading from these devices if at all possible. This can be useful if mirroring
over a slow link.
—write-behind=
Specify that write-behind mode should be enabled (valid for RAID1 only). If an
argument is specified, it will set the maximum number of outstanding writes allowed.
The default value is 256. A write-intent bitmap is required in order to use write-
behind mode, and write-behind is only attempted on drives marked as write-mostly.
—assume-clean
Tell mdadm that the array pre-existed and is known to be clean. It can be useful when
trying to recover from a major failure as you can be sure that no data will be affected
unless you actually write to the array. It can also be used when creating a RAID1 or
RAID10 if you want to avoid the initial resync, however this practice – while
normally safe – is not recommended. Use this only if you really know what you are
doing.
When the devices that will be part of a new array were filled with zeros before
creation the operator knows the array is actually clean. If that is the case, such as
after running badblocks, this argument can be used to tell mdadm the facts the
operator knows.
When an array is resized to a larger size with —grow —size= the new space is
normally resynced in the same way that the whole array is resynced at creation.
From Linux version 3.0, —assume-clean can be used with that command to avoid
the automatic resync.
—backup-file=
This is needed when —grow is used to increase the number of raid-devices in a
RAID5 or RAID6 if there are no spare devices available, or to shrink, change RAID
level or layout. See the GROW MODE section below on RAID-DEVICES
CHANGES. The file must be stored on a separate device, not on the RAID array
being reshaped.
—data-offset=
Arrays with 1.x metadata can leave a gap between the start of the device and the start
of array data. This gap can be used for various metadata. The start of data is known
as the data-offset. Normally an appropriate data offset is computed automatically.
However it can be useful to set it explicitly such as when re-creating an array which
was originally created using a different version of mdadm which computed a different
offset.
Setting the offset explicitly over-rides the default. The value given is in Kilobytes
unless an ‘M’ or ‘G’ suffix is given.
Since Linux 3.4, —data-offset can also be used with —grow for some RAID levels
(initially on RAID10). This allows the data-offset to be changed as part of the
reshape process. When the data offset is changed, no backup file is required as the
difference in offsets is used to provide the same functionality.
When the new offset is earlier than the old offset, the number of devices in the array
cannot shrink. When it is after the old offset, the number of devices in the array
cannot increase.
When creating an array, —data-offset can be specified as variable. In this case each
member device is expected to have an offset appended to its name, separated by a
colon. This makes it possible to recreate exactly an array which has varying data
offsets (as can happen when different versions of mdadm are used to add different
devices).
—continue
This option is complementary to the —freeze-reshape option for assembly. It is
needed when a —grow operation has been interrupted and was not restarted
automatically because —freeze-reshape was used during array assembly. It is given
together with the -G ( —grow ) command and the device whose pending reshape is to
be continued. All parameters required to continue the reshape are read from the array
metadata. If the initial —grow command required the —backup-file= option, the
continuation requires exactly the same backup file.
Any other parameter passed together with —continue is ignored.
-N, —name=
Set a name for the array. This is currently only effective when creating an array with
a version-1 superblock, or an array in a DDF container. The name is a simple textual
string that can be used to identify array components when assembling. If name is
needed but not specified, it is taken from the basename of the device that is being
created. e.g. when creating /dev/md/home the name will default to home.
-R, —run
Insist that mdadm run the array, even if some of the components appear to be active
in another array or filesystem. Normally mdadm will ask for confirmation before
including such components in an array. This option causes that question to be
suppressed.
-f, —force
Insist that mdadm accept the geometry and layout specified without question.
Normally mdadm will not allow creation of an array with only one device, and will
try to create a RAID5 array with one missing drive (as this makes the initial resync
work faster). With —force, mdadm will not try to be so clever.
-o, —readonly
Start the array read only rather than read-write as normal. No writes will be allowed
to the array, and no resync, recovery, or reshape will be started.
-a, —auto{=yes,md,mdp,part,p}{NN}
Instruct mdadm how to create the device file if needed, possibly allocating an unused
minor number. “md” causes a non-partitionable array to be used (though since Linux
2.6.28, these array devices are in fact partitionable). “mdp”, “part” or “p” causes a
partitionable array (2.6 and later) to be used. “yes” requires the named md device to
have a ‘standard’ format, and the type and minor number will be determined from
this. With mdadm 3.0, device creation is normally left up to udev so this option is
unlikely to be needed. See DEVICE NAMES below.
The argument can also come immediately after “-a”. e.g. “-ap”.
If —auto is not given on the command line or in the config file, then the default will
be —auto=yes.
If —scan is also given, then any auto= entries in the config file will override the —
auto instruction given on the command line.
For partitionable arrays, mdadm will create the device file for the whole array and for
the first 4 partitions. A different number of partitions can be specified at the end of
this option (e.g. —auto=p7). If the device name ends with a digit, the partition names
add a ‘p’, and a number, e.g. /dev/md/home1p3. If there is no trailing digit, then the
partition names just have a number added, e.g. /dev/md/scratch3.
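The partition-naming rule just described is deterministic; the following is an illustration of that rule only (not mdadm source code):

```shell
# Illustration (not mdadm source) of the partition-naming rule above:
# if the md device name ends in a digit, a 'p' is inserted before the
# partition number; otherwise the number is appended directly.
part_name() {
    dev=$1; num=$2
    case "$dev" in
        *[0-9]) echo "${dev}p${num}" ;;  # trailing digit: add 'p' first
        *)      echo "${dev}${num}"  ;;  # no trailing digit: plain number
    esac
}

part_name /dev/md/home1 3    # /dev/md/home1p3
part_name /dev/md/scratch 3  # /dev/md/scratch3
```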
If the md device name is in a ‘standard’ format as described in DEVICE NAMES,
then it will be created, if necessary, with the appropriate device number based on that
name. If the device name is not in one of these formats, then an unused device number
will be allocated. The device number will be considered unused if there is no active
array for that number, and there is no entry in /dev for that number and with a non-
standard name. Names that are not in ‘standard’ format are only allowed in
“/dev/md/”.
This is meaningful with —create or —build.
-a, —add
This option can be used in Grow mode in two cases.
If the target array is a Linear array, then —add can be used to add one or more
devices to the array. They are simply catenated on to the end of the array. Once
added, the devices cannot be removed.
If the —raid-disks option is being used to increase the number of devices in an array,
then —add can be used to add some extra devices to be included in the array. In most
cases this is not needed as the extra devices can be added as spares first, and then the
number of raid-disks can be changed. However for RAID0, it is not possible to add
spares. So to increase the number of devices in a RAID0, it is necessary to set the
new number of devices, and to add the new devices, in the same command.
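The RAID0 case above can be sketched with a hypothetical command (placeholder names): because RAID0 cannot hold spares, the new device count and the new device must be given together.

```shell
# Hypothetical example: grow a RAID0 array from 2 to 3 devices, setting
# the new device count and adding the new device in the same command.
mdadm --grow /dev/md0 --raid-devices=3 --add /dev/sdd1
```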
› FOR ASSEMBLE:
-u, —uuid=
uuid of array to assemble. Devices which don’t have this uuid are excluded
-m, —super-minor=
Minor number of device that array was created for. Devices which don’t have this
minor number are excluded. If you create an array as /dev/md1, then all superblocks
will contain the minor number 1, even if the array is later assembled as /dev/md2.
Giving the literal word “dev” for —super-minor will cause mdadm to use the minor
number of the md device that is being assembled. e.g. when assembling /dev/md0, —
super-minor=dev will look for super blocks with a minor number of 0.
—super-minor is only relevant for v0.90 metadata, and should not normally be used.
Using —uuid is much safer.
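A hypothetical assembly by UUID might look as follows; the UUID shown is a placeholder (the real value is reported by —examine on a member device), and the device names are examples only:

```shell
# Hypothetical example: assemble by UUID rather than by --super-minor.
# The UUID and device names below are placeholders.
mdadm --assemble /dev/md1 \
      --uuid=aaaaaaaa:bbbbbbbb:cccccccc:dddddddd \
      /dev/sdb1 /dev/sdc1
```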
-N, —name=
Specify the name of the array to assemble. This must be the name that was specified
when creating the array. It must either match the name stored in the superblock
exactly, or it must match with the current homehost prefixed to the start of the given
name.
-f, —force
Assemble the array even if the metadata on some devices appears to be out-of-date. If
mdadm cannot find enough working devices to start the array, but can find some
devices that are recorded as having failed, then it will mark those devices as working
so that the array can be started. An array which requires —force to be started may
contain data corruption. Use it carefully.
-R, —run
Attempt to start the array even if fewer drives were given than were present last time
the array was active. Normally if not all the expected drives are found and —scan is
not used, then the array will be assembled but not started. With —run an attempt will
be made to start it anyway.
—no-degraded
This is the reverse of —run in that it inhibits the startup of the array unless all expected
drives are present. This is only needed with —scan, and can be used if the physical
connections to devices are not as reliable as you would like.
-a, —auto{=no,yes,md,mdp,part}
See this option under Create and Build options.
-b, —bitmap=
Specify the bitmap file that was given when the array was created. If an array has an
internal bitmap, there is no need to specify this when assembling the array.
—backup-file=
If —backup-file was used while reshaping an array (e.g. changing number of devices
or chunk size) and the system crashed during the critical section, then the same —
backup-file must be presented to —assemble to allow possibly corrupted data to be
restored, and the reshape to be completed.
—invalid-backup
If the file needed for the above option is not available for any reason an empty file
can be given together with this option to indicate that the backup file is invalid. In
this case the data that was being rearranged at the time of the crash could be
irrecoverably lost, but the rest of the array may still be recoverable. This option
should only be used as a last resort if there is no way to recover the backup file.
-U, —update=
Update the superblock on each device while assembling the array. The argument
given to this flag can be one of sparc2.2, summaries, uuid, name, homehost,
resync, byteorder, devicesize, no-bitmap, bbl, no-bbl, metadata, or super-minor.
The sparc2.2 option will adjust the superblock of an array that was created on a
Sparc machine running a patched 2.2 Linux kernel. This kernel got the alignment of
part of the superblock wrong. You can use the —examine —sparc2.2 option to
mdadm to see what effect this would have.
The super-minor option will update the preferred minor field on each superblock to
match the minor number of the array being assembled. This can be useful if —
examine reports a different “Preferred Minor” to —detail. In some cases this update
will be performed automatically by the kernel driver. In particular the update happens
automatically at the first write to an array with redundancy (RAID level 1 or greater)
on a 2.6 (or later) kernel.
The uuid option will change the uuid of the array. If a UUID is given with the —
uuid option that UUID will be used as a new UUID and will NOT be used to help
identify the devices in the array. If no —uuid is given, a random UUID is chosen.
The name option will change the name of the array as stored in the superblock. This
is only supported for version-1 superblocks.
The homehost option will change the homehost as recorded in the superblock. For
version-0 superblocks, this is the same as updating the UUID. For version-1
superblocks, this involves updating the name.
The resync option will cause the array to be marked dirty meaning that any
redundancy in the array (e.g. parity for RAID5, copies for RAID1) may be incorrect.
This will cause the RAID system to perform a “resync” pass to make sure that all
redundant information is correct.
The byteorder option allows arrays to be moved between machines with different
byte-order. When assembling such an array for the first time after a move, giving —
update=byteorder will cause mdadm to expect superblocks to have their byteorder
reversed, and will correct that order before assembling the array. This is only valid
with original (Version 0.90) superblocks.
The summaries option will correct the summaries in the superblock. That is the
counts of total, working, active, failed, and spare devices.
The devicesize option will rarely be of use. It applies to version 1.1 and 1.2 metadata
only (where the metadata is at the start of the device) and is only useful when the
component device has changed size (typically become larger). The version 1
metadata records the amount of the device that can be used to store data, so if a
device in a version 1.1 or 1.2 array becomes larger, the metadata will still be visible,
but the extra space will not. In this case it might be useful to assemble the array with
—update=devicesize. This will cause mdadm to determine the maximum usable
amount of space on each device and update the relevant field in the metadata.
The metadata option only works on v0.90 metadata arrays and will convert them to
v1.0 metadata. The array must not be dirty (i.e. it must not need a sync) and it must
not have a write-intent bitmap.
The old metadata will remain on the devices, but will appear older than the new
metadata and so will usually be ignored. The old metadata (or indeed the new
metadata) can be removed by giving the appropriate —metadata= option to —zero-
superblock.
The no-bitmap option can be used when an array has an internal bitmap which is
corrupt in some way so that assembling the array normally fails. It will cause any
internal bitmap to be ignored.
The bbl option will reserve space in each device for a bad block list. This will be 4K
in size and positioned near the end of any free space between the superblock and the
data.
The no-bbl option will cause any reservation of space for a bad block list to be
removed. If the bad block list contains entries, this will fail, as removing the list
could cause data corruption.
—freeze-reshape
This option is intended for use in start-up scripts during the initrd boot phase. When
an array that is under reshape is assembled during the initrd phase, this option stops
the reshape after the reshape-critical section has been restored. This happens before
the file system pivot operation and so avoids loss of file system context. Losing file
system context would cause the reshape to be broken.
Reshape can be continued later using the —continue option for the grow command.
› FOR MANAGE MODE:
-t, —test
Unless a more serious error occurred, mdadm will exit with a status of 2 if no
changes were made to the array and 0 if at least one change was made. This can be
useful when an indirect specifier such as missing, detached or faulty is used in
requesting an operation on the array. —test will report failure if these specifiers
didn’t find any match.
-a, —add
hot-add listed devices. If a device appears to have recently been part of the array
(possibly it failed or was removed) the device is re-added as described in the next
point. If that fails or the device was never part of the array, the device is added as a
hot-spare. If the array is degraded, it will immediately start to rebuild data onto that
spare.
Note that this and the following options are only meaningful on arrays with
redundancy. They don’t apply to RAID0 or Linear.
—re-add
re-add a device that was previously removed from an array. If the metadata on the
device reports that it is a member of the array, and the slot that it used is still vacant,
then the device will be added back to the array in the same position. This will
normally cause the data for that device to be recovered. However, based on the event
count on the device, the recovery may only require sections that are flagged in a
write-intent bitmap to be recovered, or may not require any recovery at all.
When used on an array that has no metadata (i.e. it was built with —build) it will be
assumed that bitmap-based recovery is enough to make the device fully consistent
with the array.
When used with v1.x metadata, —re-add can be accompanied by —
update=devicesize, —update=bbl, or —update=no-bbl. See the description of
these option when used in Assemble mode for an explanation of their use.
If the device name given is missing then mdadm will try to find any device that looks
like it should be part of the array but isn’t and will try to re-add all such devices.
If the device name given is faulty then mdadm will find all devices in the array that
are marked faulty, remove them and attempt to immediately re-add them. This can be
useful if you are certain that the reason for failure has been resolved.
—add-spare
Add a device as a spare. This is similar to —add except that it does not attempt —re-
add first. The device will be added as a spare even if it looks like it could be a recent
member of the array.
-r, —remove
remove listed devices. They must not be active. i.e. they should be failed or spare
devices.
As well as the name of a device file (e.g. /dev/sda1), the words failed, detached and
names like set-A can be given to —remove. The first causes all failed devices to be
removed. The second causes any device which is no longer connected to the system
(i.e. an ‘open’ returns ENXIO) to be removed. The third will remove a set as described
below under —fail.
-f, —fail
Mark listed devices as faulty. As well as the name of a device file, the word detached
or a set name like set-A can be given. The former will cause any device that has been
detached from the system to be marked as failed. It can then be removed.
For RAID10 arrays where the number of copies evenly divides the number of
devices, the devices can be conceptually divided into sets where each set contains a
single complete copy of the data on the array. Sometimes a RAID10 array will be
configured so that these sets are on separate controllers. In this case all the devices in
one set can be failed by giving a name like set-A or set-B to —fail. The appropriate
set names are reported by —detail.
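A hypothetical invocation (array name is a placeholder) failing one complete copy-set at once:

```shell
# Hypothetical example: fail every device in one copy-set of a RAID10
# array. Set names such as set-A are reported by 'mdadm --detail'.
mdadm /dev/md0 --fail set-A
```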
—set-faulty
same as —fail.
—replace
Mark listed devices as requiring replacement. As soon as a spare is available, it will
be rebuilt and will replace the marked device. This is similar to marking a device as
faulty, but the device remains in service during the recovery process to increase
resilience against multiple failures. When the replacement process finishes, the
replaced device will be marked as faulty.
—with
This can follow a list of —replace devices. The devices listed after —with will be
preferentially used to replace the devices listed after —replace. These device must
already be spare devices in the array.
—write-mostly
Subsequent devices that are added or re-added will have the ‘write-mostly’ flag set.
This is only valid for RAID1 and means that the ‘md’ driver will avoid reading from
these devices if possible.
—readwrite
Subsequent devices that are added or re-added will have the ‘write-mostly’ flag
cleared.
Each of these options requires that the first device listed is the array to be acted upon, and
the remainder are component devices to be added, removed, marked as faulty, etc. Several
different operations can be specified for different devices, e.g.
mdadm /dev/md0 —add /dev/sda1 —fail /dev/sdb1 —remove /dev/sdb1
Each operation applies to all devices listed until the next operation.
If an array is using a write-intent bitmap, then devices which have been removed can be
re-added in a way that avoids a full reconstruction but instead just updates the blocks that
have changed since the device was removed. For arrays with persistent metadata
(superblocks) this is done automatically. For arrays created with —build, mdadm needs to
be told, with —re-add, that this device was removed recently.
Devices can only be removed from an array if they are not in active use, i.e. they must be
spares or failed devices. To remove an active device, it must first be marked as faulty.
› FOR MISC MODE:
-Q, —query
Examine a device to see (1) if it is an md device and (2) if it is a component of an md
array. Information about what is discovered is presented.
-D, —detail
Print details of one or more md devices.
—detail-platform
Print details of the platform’s RAID capabilities (firmware / hardware topology) for a
given metadata format. If used without argument, mdadm will scan all controllers
looking for their capabilities. Otherwise, mdadm will only look at the controller
specified by the argument in form of an absolute filepath or a link, e.g.
/sys/devices/pci0000:00/0000:00:1f.2.
-Y, —export
When used with —detail, —detail-platform, —examine, or —incremental output
will be formatted as key=value pairs for easy import into the environment.
With —incremental, the value MD_STARTED indicates whether an array was
started (yes) or not, which may include a reason (unsafe, nothing, no). Also the
value MD_FOREIGN indicates if the array is expected on this host (no), or seems to
be from elsewhere (yes).
-E, —examine
Print contents of the metadata stored on the named device(s). Note the contrast
between —examine and —detail. —examine applies to devices which are
components of an array, while —detail applies to a whole array which is currently
active.
—sparc2.2
If an array was created on a SPARC machine with a 2.2 Linux kernel patched with
RAID support, the superblock will have been created incorrectly, or at least
incompatibly with 2.4 and later kernels. Using the —sparc2.2 flag with —examine
will fix the superblock before displaying it. If this appears to do the right thing, then
the array can be successfully assembled using —assemble —update=sparc2.2.
-X, —examine-bitmap
Report information about a bitmap file. The argument is either an external bitmap file
or an array component in case of an internal bitmap. Note that running this on an
array device (e.g. /dev/md0) does not report the bitmap for that array.
—examine-badblocks
List the bad-blocks recorded for the device, if a bad-blocks list has been configured.
Currently only 1.x metadata supports bad-blocks lists.
—dump=directory
—restore=directory
Save metadata from listed devices, or restore metadata to listed devices.
-R, —run
start a partially assembled array. If —assemble did not find enough devices to fully
start the array, it might leave it partially assembled. If you wish, you can then use
—run to start the array in degraded mode.
-S, —stop
deactivate array, releasing all resources.
-o, —readonly
mark array as readonly.
-w, —readwrite
mark array as readwrite.
—zero-superblock
If the device contains a valid md superblock, the block is overwritten with zeros.
With —force the block where the superblock would be is overwritten even if it
doesn’t appear to be valid.
—kill-subarray=
If the device is a container and the argument to —kill-subarray specifies an inactive
subarray in the container, then the subarray is deleted. Deleting all subarrays will
leave an ‘empty-container’ or spare superblock on the drives. See —zero-superblock
for completely removing a superblock. Note that some formats depend on the
subarray index for generating a UUID; this command will fail if it would change the
UUID of an active subarray.
—update-subarray=
If the device is a container and the argument to —update-subarray specifies a
subarray in the container, then attempt to update the given superblock field in the
subarray. See below in MISC MODE for details.
-t, —test
When used with —detail, the exit status of mdadm is set to reflect the status of the
device. See below in MISC MODE for details.
-W, —wait
For each md device given, wait for any resync, recovery, or reshape activity to finish
before returning. mdadm will return with success if it actually waited for every
device listed, otherwise it will return failure.
—wait-clean
For each md device given, or each device in /proc/mdstat if —scan is given, arrange
for the array to be marked clean as soon as possible. mdadm will return with success
if the array uses external metadata and we successfully waited. For native arrays this
returns immediately as the kernel handles dirty-clean transitions at shutdown. No
action is taken if safe-mode handling is disabled.
—action=
Set the “sync_action” for all md devices given to one of idle, frozen, check, repair.
Setting to idle will abort any currently running action though some actions will
automatically restart. Setting to frozen will abort any current action and ensure no
other action starts automatically.
Details of check and repair can be found in md(4) under SCRUBBING AND
MISMATCHES.
› FOR INCREMENTAL ASSEMBLY MODE:
—rebuild-map, -r
Rebuild the map file (/run/mdadm/map) that mdadm uses to help track which arrays
are currently being assembled.
—run, -R
Run any array assembled as soon as a minimal number of devices are available,
rather than waiting until all expected devices are present.
—scan, -s
Only meaningful with -R, this will scan the map file for arrays that are being
incrementally assembled and will try to start any that are not already started. If any
such array is listed in mdadm.conf as requiring an external bitmap, that bitmap will
be attached first.
—fail, -f
This allows the hot-plug system to remove devices that have fully disappeared from
the kernel. It will first fail and then remove the device from any array it belongs to.
The device name given should be a kernel device name such as “sda”, not a name in
/dev.
—path=
Only used with —fail. The ‘path’ given will be recorded so that if a new device
appears at the same location it can be automatically added to the same array. This
allows the failed device to be automatically replaced by a new device without
metadata if it appears at specified path. This option is normally only set by a udev
script.
› FOR MONITOR MODE:
-m, —mail
Give a mail address to send alerts to.
-p, —program, —alert
Give a program to be run whenever an event is detected.
-y, —syslog
Cause all events to be reported through ‘syslog’. The messages have facility of
‘daemon’ and varying priorities.
-d, —delay
Give a delay in seconds. mdadm polls the md arrays and then waits this many
seconds before polling again. The default is 60 seconds. Since 2.6.16, there is no
need to reduce this as the kernel alerts mdadm immediately when there is any change.
-r, —increment
Give a percentage increment. mdadm will generate RebuildNN events with the given
percentage increment.
-f, —daemonise
Tell mdadm to run as a background daemon if it decides to monitor anything. This
causes it to fork and run in the child, and to disconnect from the terminal. The
process id of the child is written to stdout. This is useful with —scan which will only
continue monitoring if a mail address or alert program is found in the config file.
-i, —pid-file
When mdadm is running in daemon mode, write the pid of the daemon process to the
specified file, instead of printing it on standard output.
-1, —oneshot
Check arrays only once. This will generate NewArray events and more significantly
DegradedArray and SparesMissing events. Running mdadm —monitor —scan -1
from a cron script will ensure regular notification of any degraded arrays.
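For example, a system crontab entry along these lines (the schedule and path are illustrative) would mail a daily report of any degraded arrays:

```shell
# Illustrative /etc/crontab fragment: one-shot check every day at 06:00.
0 6 * * * root /sbin/mdadm --monitor --scan --oneshot
```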
-t, —test
Generate a TestMessage alert for every array found at startup. This alert gets mailed
and passed to the alert program. This can be used for testing that alert messages do
get through successfully.
—no-sharing
This inhibits the functionality for moving spares between arrays. Only one
monitoring process started with —scan but without this flag is allowed, otherwise the
two could interfere with each other.
› ASSEMBLE MODE
Usage: mdadm —assemble md-device options-and-component-devices…
Usage: mdadm —assemble —scan md-devices-and-options…
Usage: mdadm —assemble —scan options…
This usage assembles one or more RAID arrays from pre-existing components. For each
array, mdadm needs to know the md device, the identity of the array, and a number of
component-devices. These can be found in a number of ways.
In the first usage example (without the —scan) the first device given is the md device. In
the second usage example, all devices listed are treated as md devices and assembly is
attempted. In the third (where no devices are listed) all md devices that are listed in the
configuration file are assembled. If no arrays are described by the configuration file, then
any arrays that can be found on unused devices will be assembled.
If precisely one device is listed, but —scan is not given, then mdadm acts as though —
scan was given and identity information is extracted from the configuration file.
The identity can be given with the —uuid option, the —name option, or the
—super-minor option; otherwise it will be taken from the md-device record in the config
file, or from the superblock of the first component-device listed on the command line.
Devices can be given on the —assemble command line or in the config file. Only devices
which have an md superblock which contains the right identity will be considered for any
array.
The config file is only used if explicitly named with —config or requested with (a
possibly implicit) —scan. In the latter case, /etc/mdadm.conf or
/etc/mdadm/mdadm.conf is used.
If —scan is not given, then the config file will only be used to find the identity of md
arrays.
Normally the array will be started after it is assembled. However if —scan is not given
and not all expected drives were listed, then the array is not started (to guard against usage
errors). To insist that the array be started in this case (as may work for RAID1, 4, 5, 6, or
10), give the —run flag.
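A hypothetical invocation of the case just described (device names are placeholders): starting a degraded array when one expected member is absent.

```shell
# Hypothetical example: insist on starting a RAID5 array even though one
# expected member device is missing. Device names are placeholders.
mdadm --assemble --run /dev/md0 /dev/sdb1 /dev/sdc1
```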
If udev is active, mdadm does not create any entries in /dev but leaves that to udev. It does
record information in /run/mdadm/map which will allow udev to choose the correct
name.
If mdadm detects that udev is not configured, it will create the devices in /dev itself.
In Linux kernels prior to version 2.6.28 there were two distinctly different types of md
devices that could be created: one that could be partitioned using standard partitioning
tools and one that could not. Since 2.6.28 that distinction is no longer relevant as both
types of device can be partitioned. mdadm will normally create the type that originally could
not be partitioned as it has a well defined major number (9).
Prior to 2.6.28, it is important that mdadm chooses the correct type of array device to use.
This can be controlled with the —auto option. In particular, a value of “mdp” or “part” or
“p” tells mdadm to use a partitionable device rather than the default.
In the no-udev case, the value given to —auto can be suffixed by a number. This tells
mdadm to create that number of partition devices rather than the default of 4.
The value given to —auto can also be given in the configuration file as a word starting
auto= on the ARRAY line for the relevant array.
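As a sketch, an ARRAY line carrying the auto= word might look like this (the device name and UUID below are invented for illustration):

```text
# /etc/mdadm.conf - request a partitionable device with 6 partition devices
ARRAY /dev/md/home UUID=3aaa0122:29827cfa:5331ad66:ca767371 auto=part6
```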
Auto Assembly
When —assemble is used with —scan and no devices are listed, mdadm will first attempt
to assemble all the arrays listed in the config file.
If no arrays are listed in the config (other than those marked <ignore>) it will look
through the available devices for possible arrays and will try to assemble anything that it
finds. Arrays which are tagged as belonging to the given homehost will be assembled and
started normally. Arrays which do not obviously belong to this host are given names that
are expected not to conflict with anything local, and are started “read-auto” so that nothing
is written to any device until the array is written to; i.e. automatic resync etc. is delayed.
If mdadm finds a consistent set of devices that look like they should comprise an array,
and if the superblock is tagged as belonging to the given home host, it will automatically
choose a device name and try to assemble the array. If the array uses version-0.90
metadata, then the minor number as recorded in the superblock is used to create a name in
/dev/md/ so for example /dev/md/3. If the array uses version-1 metadata, then the name
from the superblock is used to similarly create a name in /dev/md/ (the name will have
any ‘host’ prefix stripped first).
This behaviour can be modified by the AUTO line in the mdadm.conf configuration file.
This line can indicate that specific metadata type should, or should not, be automatically
assembled. If an array is found which is not listed in mdadm.conf and has a metadata
format that is denied by the AUTO line, then it will not be assembled. The AUTO line can
also request that all arrays identified as being for this homehost should be assembled
regardless of their metadata type. See mdadm.conf(5) for further details.
Note: Auto assembly cannot be used for assembling and activating some arrays which are
undergoing reshape. In particular as the backup-file cannot be given, any reshape which
requires a backup-file to continue cannot be started by auto assembly. An array which is
growing to more devices and has passed the critical section can be assembled using auto-
assembly.
› BUILD MODE
Usage: mdadm —build md-device —chunk=X —level=Y —raid-devices=Z devices
This usage is similar to —create. The difference is that it creates an array without a
superblock. With these arrays there is no difference between initially creating the array
and subsequently assembling the array, except that hopefully there is useful data there in
the second case.
The level may be raid0, linear, raid1, raid10, multipath, or faulty, or one of their synonyms.
All devices must be listed and the array will be started once complete. It will often be
appropriate to use —assume-clean with levels raid1 or raid10.
› CREATE MODE
Usage: mdadm —create md-device —chunk=X —level=Y —raid-devices=Z
devices
This usage will initialise a new md array, associate some devices with it, and activate the
array.
The named device will normally not exist when mdadm —create is run, but will be created
by udev once the array becomes active.
As devices are added, they are checked to see if they contain RAID superblocks or
filesystems. They are also checked to see if the variance in device size exceeds 1%.
If any discrepancy is found, the array will not automatically be run, though the presence of
a —run can override this caution.
To create a “degraded” array in which some devices are missing, simply give the word
“missing” in place of a device name. This will cause mdadm to leave the corresponding
slot in the array empty. For a RAID4 or RAID5 array at most one slot can be “missing”; for a RAID6 array, at most two slots. For a RAID1 array, only one real device needs to be given. All of the others can be “missing”.
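The “missing” placeholder can be sketched as follows. The device names are hypothetical, and the commands are built as strings and echoed rather than executed, so the sketch is safe to run as-is; run the printed commands as root against real block devices (note the actual CLI uses a double dash for long options):

```shell
# Create a 3-disk RAID5 with one slot left empty (hypothetical devices).
cmd_raid5="mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 missing"
# A RAID1 mirror needs only one real device; the second half can be added later.
cmd_raid1="mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdd1 missing"
echo "$cmd_raid5"
echo "$cmd_raid1"
```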
When creating a RAID5 array, mdadm will automatically create a degraded array with an
extra spare drive. This is because building the spare into a degraded array is in general
faster than resyncing the parity on a non-degraded, but not clean, array. This feature can
be overridden with the —force option.
When creating an array with version-1 metadata a name for the array is required. If this is
not given with the —name option, mdadm will choose a name based on the last
component of the name of the device being created. So if /dev/md3 is being created, then
the name 3 will be chosen. If /dev/md/home is being created, then the name home will be
used.
When creating a partition based array, using mdadm with version-1.x metadata, the
partition type should be set to 0xDA (non fs-data). This type selection allows for greater precision, since using any other type [RAID auto-detect (0xFD) or a GNU/Linux partition (0x83)] might create problems in the event of array recovery through a live cdrom.
A new array will normally get a randomly assigned 128bit UUID which is very likely to
be unique. If you have a specific need, you can choose a UUID for the array by giving the
—uuid= option. Be warned that creating two arrays with the same UUID is a recipe for
disaster. Also, using —uuid= when creating a v0.90 array will silently override any —
homehost= setting.
If the array type supports a write-intent bitmap, and if the devices in the array exceed 100G in size, an internal write-intent bitmap will automatically be added unless some other option is explicitly requested with the —bitmap option. In any case space for a bitmap will be reserved so that one can be added later with —grow —bitmap=internal.
If the metadata type supports it (currently only 1.x metadata), space will be allocated to
store a bad block list. This allows a modest number of bad blocks to be recorded, allowing
the drive to remain in service while only partially functional.
When creating an array within a CONTAINER mdadm can be given either the list of
devices to use, or simply the name of the container. The former case gives control over
which devices in the container will be used for the array. The latter case allows mdadm to
automatically choose which devices to use based on how much spare space is available.
The General Management options that are valid with —create are:
—run
insist on running the array even if some devices look like they might be in use.
—readonly
start the array readonly – not supported yet.
› MANAGE MODE
Usage: mdadm device options… devices…
This usage will allow individual devices in an array to be failed, removed or added. It is possible to perform multiple operations with one command. For example: mdadm
/dev/md0 -f /dev/hda1 -r /dev/hda1 -a /dev/hda1 will firstly mark /dev/hda1 as faulty in
/dev/md0 and will then remove it from the array and finally add it back in as a spare.
However only one md array can be affected by a single command.
When a device is added to an active array, mdadm checks to see if it has metadata on it
which suggests that it was recently a member of the array. If it does, it tries to “re-add” the
device. If there have been no changes since the device was removed, or if the array has a
write-intent bitmap which has recorded whatever changes there were, then the device will
immediately become a full member of the array and those differences recorded in the
bitmap will be resolved.
› MISC MODE
Usage: mdadm options … devices …
MISC mode includes a number of distinct operations that operate on distinct devices. The
operations are:
—query
The device is examined to see if it is (1) an active md array, or (2) a component of an
md array. The information discovered is reported.
—detail
The device should be an active md device. mdadm will display a detailed description
of the array. —brief or —scan will cause the output to be less detailed and the format
to be suitable for inclusion in mdadm.conf. The exit status of mdadm will normally
be 0 unless mdadm failed to get useful information about the device(s); however, if
the —test option is given, then the exit status will be:
0
The array is functioning normally.
1
The array has at least one failed device.
2
The array has multiple failed devices such that it is unusable.
4
There was an error while trying to get information about the device.
—detail-platform Print detail of the platform’s RAID capabilities (firmware / hardware
topology). If the metadata is specified with -e or —metadata= then the return status will
be:
0
metadata successfully enumerated its platform components on this system
1
metadata is platform independent
2
metadata failed to find its platform components on this system
—update-subarray= If the device is a container and the argument to —update-subarray
specifies a subarray in the container, then attempt to update the given superblock field in
the subarray. Similar to updating an array in “assemble” mode, the field to update is
selected by -U or —update= option. Currently only name is supported.
The name option updates the subarray name in the metadata; it may not affect the device node name or the device node symlink until the subarray is re-assembled. If updating
name would change the UUID of an active subarray this operation is blocked, and the
command will end in an error.
—examine The device should be a component of an md array. mdadm will read the md
superblock of the device and display the contents. If —brief or —scan is given, then
multiple devices that are components of the one array are grouped together and reported in
a single entry suitable for inclusion in mdadm.conf.
Having —scan without listing any devices will cause all devices listed in the config file to
be examined.
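A common use of this is regenerating config entries from component superblocks; the sketch below only echoes the command line rather than executing it (drop the echo and run as root to actually append to the file; the real CLI uses a double dash for long options):

```shell
# Scan all configured devices and emit ARRAY lines suitable for mdadm.conf.
scan_cmd="mdadm --examine --scan"
echo "$scan_cmd >> /etc/mdadm.conf"
```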
—dump=directory If the device contains RAID metadata, a file will be created in the
directory and the metadata will be written to it. The file will be the same size as the device and have the metadata written in the file at the same location that it occupies in the device.
However the file will be “sparse” so that only those blocks containing metadata will be
allocated. The total space used will be small.
The file name used in the directory will be the base name of the device. Further if any
links appear in /dev/disk/by-id which point to the device, then hard links to the file will be
created in directory based on these by-id names.
Multiple devices can be listed and their metadata will all be stored in the one directory.
—restore=directory This is the reverse of —dump. mdadm will locate a file in the
directory that has a name appropriate for the given device and will restore metadata from
it. Names that match /dev/disk/by-id names are preferred, however if two of those refer to
different files, mdadm will not choose between them but will abort the operation.
If a file name is given instead of a directory then mdadm will restore from that file to a
single device, always provided the size of the file matches that of the device, and the file
contains valid metadata.
—stop The devices should be active md arrays which will be deactivated, as long as they
are not currently in use.
—run This will fully activate a partially assembled md array.
—readonly This will mark an active array as read-only, providing that it is not currently
being used.
—readwrite This will change a readonly array back to being read/write.
—scan For all operations except —examine, —scan will cause the operation to be
applied to all arrays listed in /proc/mdstat. For —examine, —scan causes all devices
listed in the config file to be examined.
-b, —brief Be less verbose. This is used with —detail and —examine. Using —brief
with —verbose gives an intermediate level of verbosity.
› MONITOR MODE
Usage: mdadm —monitor options… devices…
This usage causes mdadm to periodically poll a number of md arrays and to report on any
events noticed. mdadm will never exit once it decides that there are arrays to be checked,
so it should normally be run in the background.
As well as reporting events, mdadm may move a spare drive from one array to another if
they are in the same spare-group or domain and if the destination array has a failed drive
but no spares.
If any devices are listed on the command line, mdadm will only monitor those devices.
Otherwise all arrays listed in the configuration file will be monitored. Further, if —scan is
given, then any other md devices that appear in /proc/mdstat will also be monitored.
The result of monitoring the arrays is the generation of events. These events are passed to
a separate program (if specified) and may be mailed to a given E-mail address.
When passing events to a program, the program is run once for each event, and is given 2
or 3 command-line arguments: the first is the name of the event (see below), the second is
the name of the md device which is affected, and the third is the name of a related device
if relevant (such as a component device that has failed).
If —scan is given, then a program or an E-mail address must be specified on the
command line or in the config file. If neither are available, then mdadm will not monitor
anything. Without —scan, mdadm will continue monitoring as long as something was
found to monitor. If no program or email is given, then each event is reported to stdout.
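A minimal event-handler program might look like the sketch below; the log-message format is an assumption, but the argument order (event name, md device, optional component device) follows the description above:

```shell
# Sketch of a handler that mdadm --monitor could invoke for each event.
# mdadm runs it as: handler EVENT MD_DEVICE [COMPONENT_DEVICE]
handle_event() {
    event=$1; array=$2; component=${3:-}
    msg="mdadm event: $event on $array"
    if [ -n "$component" ]; then
        msg="$msg (component $component)"
    fi
    echo "$msg"
}

# Example invocations, as mdadm would make them:
handle_event Fail /dev/md0 /dev/sda1
handle_event DeviceDisappeared /dev/md1
```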
The different events are:
DeviceDisappeared
An md array which previously was configured appears to no longer be configured.
(syslog priority: Critical)
If mdadm was told to monitor an array which is RAID0 or Linear, then it will report
DeviceDisappeared with the extra information Wrong-Level. This is because
RAID0 and Linear do not support the device-failed, hot-spare and resync operations
which are monitored.
RebuildStarted
An md array started reconstruction (e.g. recovery, resync, reshape, check, repair).
(syslog priority: Warning)
RebuildNN
Where NN is a two-digit number (e.g. 05, 48). This indicates that rebuild has passed
that many percent of the total. The events are generated with fixed increment since 0.
Increment size may be specified with a commandline option (default is 20). (syslog
priority: Warning)
RebuildFinished
An md array that was rebuilding, isn’t any more, either because it finished normally
or was aborted. (syslog priority: Warning)
Fail
An active component device of an array has been marked as faulty. (syslog priority:
Critical)
FailSpare
A spare component device which was being rebuilt to replace a faulty device has
failed. (syslog priority: Critical)
SpareActive
A spare component device which was being rebuilt to replace a faulty device has
been successfully rebuilt and has been made active. (syslog priority: Info)
NewArray
A new md array has been detected in the /proc/mdstat file. (syslog priority: Info)
DegradedArray
A newly noticed array appears to be degraded. This message is not generated when
mdadm notices a drive failure which causes degradation, but only when mdadm
notices that an array is degraded when it first sees the array. (syslog priority: Critical)
MoveSpare
A spare drive has been moved from one array in a spare-group or domain to another
to allow a failed drive to be replaced. (syslog priority: Info)
SparesMissing
If mdadm has been told, via the config file, that an array should have a certain
number of spare devices, and mdadm detects that it has fewer than this number when
it first sees the array, it will report a SparesMissing message. (syslog priority:
Warning)
TestMessage
An array was found at startup, and the —test flag was given. (syslog priority: Info)
Only Fail, FailSpare, DegradedArray, SparesMissing and TestMessage cause Email to
be sent. All events cause the program to be run. The program is run with two or three
arguments: the event name, the array device and possibly a second device.
Each event has an associated array device (e.g. /dev/md1) and possibly a second device.
For Fail, FailSpare, and SpareActive the second device is the relevant component
device. For MoveSpare the second device is the array that the spare was moved from.
For mdadm to move spares from one array to another, the different arrays need to be
labeled with the same spare-group or the spares must be allowed to migrate through
matching POLICY domains in the configuration file. The spare-group name can be any
string; it is only necessary that different spare groups use different names.
When mdadm detects that an array in a spare group has fewer active devices than
necessary for the complete array, and has no spare devices, it will look for another array in
the same spare group that has a full complement of working drives and a spare. It will then attempt to remove the spare from the second array and add it to the first. If the removal
succeeds but the adding fails, then it is added back to the original array.
If the spare group for a degraded array is not defined, mdadm will look at the rules of
spare migration specified by POLICY lines in mdadm.conf and then follow similar steps
as above if a matching spare is found.
› GROW MODE
The GROW mode is used for changing the size or shape of an active array. For this to
work, the kernel must support the necessary change. Various types of growth are
being added during 2.6 development.
Currently the supported changes include
change the “size” attribute for RAID1, RAID4, RAID5 and RAID6. increase or
decrease the “raid-devices” attribute of RAID0, RAID1, RAID4, RAID5, and
RAID6. change the chunk-size and layout of RAID0, RAID4, RAID5, RAID6 and
RAID10. convert between RAID1 and RAID5, between RAID5 and RAID6, between
RAID0, RAID4, and RAID5, and between RAID0 and RAID10 (in the near-2 mode).
add a write-intent bitmap to any array which supports these bitmaps, or remove a
write-intent bitmap from such an array.
Using GROW on containers is currently supported only for Intel’s IMSM container
format. The number of devices in a container can be increased - which affects all
arrays in the container - or an array in a container can be converted between levels
where those levels are supported by the container, and the conversion is one of those
listed above. Resizing arrays in an IMSM container with —grow —size is not yet
supported.
Grow functionality (e.g. expand a number of raid devices) for Intel’s IMSM container
format has an experimental status. It is guarded by the
MDADM_EXPERIMENTAL environment variable which must be set to ‘1’ for a
GROW command to succeed. This is for the following reasons:
1.
Intel’s native IMSM check-pointing is not fully tested yet. This can cause IMSM
incompatibility during the grow process: an array which is growing cannot roam
between Microsoft Windows(R) and Linux systems.
2.
Interrupting a grow operation is not recommended, because it has not been fully
tested for Intel’s IMSM container format yet.
Note: Intel’s native checkpointing doesn’t use the —backup-file option and is transparent to the assembly feature.
SIZE CHANGES
Normally when an array is built the “size” is taken from the smallest of the drives. If all
the small drives in an array are, one at a time, removed and replaced with larger drives,
then you could have an array of large drives with only a small amount used. In this
situation, changing the “size” with “GROW” mode will allow the extra space to start
being used. If the size is increased in this way, a “resync” process will start to make sure
the new parts of the array are synchronised.
Note that when an array changes size, any filesystem that may be stored in the array will
not automatically grow or shrink to use or vacate the space. The filesystem will need to be
explicitly told to use the extra space after growing, or to reduce its size prior to shrinking
the array.
Also the size of an array cannot be changed while it has an active bitmap. If an array has a
bitmap, it must be removed before the size can be changed. Once the change is complete a
new bitmap can be created.
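The remove/resize/re-create sequence can be sketched as follows for a hypothetical /dev/md0; the commands are built as strings and echoed rather than executed, so the sketch is safe to display (the real CLI uses a double dash for long options; run the printed commands as root):

```shell
# 1. Remove the active bitmap, 2. grow the array, 3. re-create the bitmap.
step1="mdadm --grow /dev/md0 --bitmap=none"
step2="mdadm --grow /dev/md0 --size=max"
step3="mdadm --grow /dev/md0 --bitmap=internal"
printf '%s\n' "$step1" "$step2" "$step3"
```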
RAID-DEVICES CHANGES
A RAID1 array can work with any number of devices from 1 upwards (though 1 is not
very useful). There may be times when you want to increase or decrease the number of
active devices. Note that this is different to hot-add or hot-remove which changes the
number of inactive devices.
When reducing the number of devices in a RAID1 array, the slots which are to be removed
from the array must already be vacant. That is, the devices which were in those slots must
be failed and removed.
When the number of devices is increased, any hot spares that are present will be activated
immediately.
Changing the number of active devices in a RAID5 or RAID6 requires much more effort. Every
block in the array will need to be read and written back to a new location. From 2.6.17, the
Linux Kernel is able to increase the number of devices in a RAID5 safely, including
restarting an interrupted “reshape”. From 2.6.31, the Linux Kernel is able to increase or
decrease the number of devices in a RAID5 or RAID6.
From 2.6.35, the Linux Kernel is able to convert a RAID0 into a RAID4 or RAID5.
mdadm uses this functionality and the ability to add devices to a RAID4 to allow devices
to be added to a RAID0. When requested to do this, mdadm will convert the RAID0 to a
RAID4, add the necessary disks and make the reshape happen, and then convert the
RAID4 back to RAID0.
When decreasing the number of devices, the size of the array will also decrease. If there
was data in the array, it could get destroyed and this is not reversible, so you should firstly
shrink the filesystem on the array to fit within the new size. To help prevent accidents,
mdadm requires that the size of the array be decreased first with mdadm —grow —
array-size. This is a reversible change which simply makes the end of the array
inaccessible. The integrity of any data can then be checked before the non-reversible
reduction in the number of devices is requested.
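Assuming a hypothetical 4-device RAID5 at /dev/md0 being reduced to 3 devices, with an ext4 filesystem and invented sizes and paths, the ordering described above might look like this (the commands are echoed rather than executed; run them as root, and note the real CLI uses a double dash for long options):

```shell
# 1. Shrink the filesystem first, 2. make the reversible array-size reduction,
# 3. then request the non-reversible reshape with a backup file.
fs_shrink="resize2fs /dev/md0 900G"
md_shrink="mdadm --grow /dev/md0 --array-size=943718400"   # 900 GiB in KiB
reshape="mdadm --grow /dev/md0 --raid-devices=3 --backup-file=/root/md0-backup"
printf '%s\n' "$fs_shrink" "$md_shrink" "$reshape"
```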
When relocating the first few stripes on a RAID5 or RAID6, it is not possible to keep the
data on disk completely consistent and crash-proof. To provide the required safety, mdadm
disables writes to the array while this “critical section” is reshaped, and takes a backup of
the data that is in that section. For grows, this backup may be stored in any spare devices that the array has; however, it can also be stored in a separate file specified with the —backup-file option, which is required to be specified for shrinks, RAID level changes and layout changes. If this option is used, and the system does crash during the critical period,
the same file must be passed to —assemble to restore the backup and reassemble the
array. When shrinking rather than growing the array, the reshape is done from the end
towards the beginning, so the “critical section” is at the end of the reshape.
LEVEL CHANGES
Changing the RAID level of any array happens instantaneously. However in the RAID5 to
RAID6 case this requires a non-standard layout of the RAID6 data, and in the RAID6 to
RAID5 case that non-standard layout is required before the change can be accomplished.
So while the level change is instant, the accompanying layout change can take quite a long
time. A —backup-file is required. If the array is not simultaneously being grown or
shrunk, so that the array size will remain the same - for example, reshaping a 3-drive
RAID5 into a 4-drive RAID6 - the backup file will be used not just for a “critical section”
but throughout the reshape operation, as described below under LAYOUT CHANGES.
Changing the chunk-size or layout without also changing the number of devices at the same time will involve re-writing all blocks in-place. To ensure against data loss in the
case of a crash, a —backup-file must be provided for these changes. Small sections of the
array will be copied to the backup file while they are being rearranged. This means that all
the data is copied twice, once to the backup and once to the new layout on the array, so
this type of reshape will go very slowly.
If the reshape is interrupted for any reason, this backup file must be made available to
mdadm —assemble so the array can be reassembled. Consequently the file cannot be
stored on the device being reshaped.
BITMAP CHANGES
A write-intent bitmap can be added to, or removed from, an active array. Either internal
bitmaps, or bitmaps stored in a separate file, can be added. Note that if you add a bitmap
stored in a file which is in a filesystem that is on the RAID array being affected, the
system will deadlock. The bitmap must be on a separate filesystem.
› INCREMENTAL MODE
Usage: mdadm —incremental [—run] [—quiet] component-device [optional-
aliases-for-device]
Usage: mdadm —incremental —fail component-device
Usage: mdadm —incremental —rebuild-map
Usage: mdadm —incremental —run —scan
› FILES
/proc/mdstat
If you’re using the /proc filesystem, /proc/mdstat lists all active md devices with information about them. mdadm uses this to find arrays when —scan is given in Misc mode, and to monitor array reconstruction in Monitor mode.
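The /proc/mdstat format can be parsed with standard tools; the sketch below runs awk over a typical sample excerpt rather than reading a live system:

```shell
# List active arrays and their levels from a /proc/mdstat-style listing.
# (On a real system, replace the sample with: cat /proc/mdstat)
sample='Personalities : [raid1] [raid5]
md0 : active raid1 sda1[0] sdb1[1]
      1048512 blocks [2/2] [UU]
md1 : active raid5 sdc1[0] sdd1[1] sde1[2]
      2096128 blocks level 5, 64k chunk [3/3] [UUU]'
printf '%s\n' "$sample" | awk '/^md[0-9]+ : active/ {print $1, $4}'
```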
/etc/mdadm.conf
The config file lists which devices may be scanned to see if they contain MD super
block, and gives identifying information (e.g. UUID) about known MD arrays. See
mdadm.conf(5) for more details.
/etc/mdadm.conf.d
A directory containing configuration files which are read in lexical order.
/run/mdadm/map
When —incremental mode is used, this file gets a list of arrays currently being
created.
› DEVICE NAMES
mdadm understands two sorts of names for array devices.
The first is the so-called ‘standard’ format name, which matches the names used by
the kernel and which appear in /proc/mdstat.
The second sort can be freely chosen, but must reside in /dev/md/. When giving a
device name to mdadm to create or assemble an array, either the full path name, such as /dev/md0 or /dev/md/home, or just the suffix of the second sort of name, such as home, can be given.
When mdadm chooses device names during auto-assembly or incremental assembly,
it will sometimes add a small sequence number to the end of the name to avoid conflicts between multiple arrays that have the same name. If mdadm can
reasonably determine that the array really is meant for this host, either by a hostname
in the metadata, or by the presence of the array in mdadm.conf, then it will leave off
the suffix if possible. Also if the homehost is specified as <ignore> mdadm will only
use a suffix if a different array of the same name already exists or is listed in the
config file.
The standard names for non-partitioned arrays (the only sort of md array available in
2.4 and earlier) are of the form
/dev/mdNN
where NN is a number. The standard names for partitionable arrays (as available from 2.6
onwards) are of the form:
/dev/md_dNN
Partition numbers should be indicated by adding “pMM” to these, thus “/dev/md/d1p2”.
From kernel version 2.6.28 the “non-partitioned array” can actually be partitioned. So the
“md_dNN” names are no longer needed, and partitions such as “/dev/mdNNpXX” are
possible.
From kernel version 2.6.29 standard names can be non-numeric following the form:
/dev/md_XXX
where XXX is any string. These names are supported by mdadm since version 3.3
provided they are enabled in mdadm.conf.
› NOTE
mdadm was previously known as mdctl.
› SEE ALSO
For further information on mdadm usage, MD and the various levels of RAID, see:
http://raid.wiki.kernel.org/
sync_action - resync-to-idle
Notify the metadata handler that a resync may have completed. If a resync process is idled before it completes, this event allows the metadata handler to checkpoint resync.
sync_action - recover-to-idle
A spare may have completed rebuilding, so tell the metadata handler about the state of each disk. This is the metadata handler’s opportunity to clear any “out-of-sync” bits and clear the volume’s degraded status. If a recovery process is idled before it completes, this event allows the metadata handler to checkpoint recovery.
<disk>/state - faulty
A disk failure kicks off a series of events. First, notify the metadata handler that a disk has failed, and then notify the kernel that it can unblock writes that were dependent on this disk. After unblocking the kernel, this disk is set to be removed+ from the member array. Finally the disk is marked failed in all other member arrays in the container.
+ Note: This behavior differs slightly from native MD arrays where removal is reserved for a mdadm —remove event. In the external metadata case the container holds the final reference on a block device and a mdadm —remove <container> <victim> call is still required.
Containers:
External metadata formats, like DDF, differ from the native MD metadata formats in that
they define a set of disks and a series of sub-arrays within those disks. MD metadata in
comparison defines a 1:1 relationship between a set of block devices and a RAID array.
For example to create 2 arrays at different RAID levels on a single set of disks, MD
metadata requires the disks be partitioned and then each array can be created with a subset
of those partitions. The supported external formats perform this disk carving internally.
Container devices simply hold references to all member disks and allow tools like mdmon
to determine which active arrays belong to which container. Some array management
commands like disk removal and disk add are now only valid at the container level.
Attempts to perform these actions on member arrays are blocked with error messages like:
“mdadm: Cannot remove disks from a ‘member’ array, perform this operation on the
parent container”
Containers are identified in /proc/mdstat with a metadata version string “external:
<metadata name>”. Member devices are identified by “external:/<container
device>/<member index>”, or “external:-<container device>/<member index>” if the
array is to remain readonly.
› OPTIONS
CONTAINER
The container device to monitor. It can be a full path like /dev/md/container, or a
simple md device name like md127.
—foreground
Normally, mdmon will fork and continue in the background. Adding this option will
skip that step and run mdmon in the foreground.
—takeover
This instructs mdmon to replace any active mdmon which is currently monitoring the
array. This is primarily used late in the boot process to replace any mdmon which was
started from an initramfs before the root filesystem was mounted. This avoids
holding a reference on that initramfs indefinitely and ensures that the pid and sock
files used to communicate with mdmon are in a standard place.
—all
This tells mdmon to find any active containers and start monitoring each of them if
appropriate. This is normally used with —takeover late in the boot sequence. A
separate mdmon process is started for each container as the —all argument is overwritten with the name of the container. To allow for containers with names longer than 5 characters, this argument can be arbitrarily extended, e.g. to —all-active-arrays.
Note that mdmon is automatically started by mdadm when needed and so does not need to
be considered when working with RAID arrays. The only time it is run other than by mdadm is when the boot scripts need to restart it after mounting the new root filesystem.
› START UP AND SHUTDOWN
As mdmon needs to be running whenever any filesystem on the monitored device is
mounted there are special considerations when the root filesystem is mounted from
an mdmon monitored device. Note that in general mdmon is needed even if the
filesystem is mounted read-only as some filesystems can still write to the device in
those circumstances, for example to replay a journal after an unclean shutdown.
When the array is assembled by the initramfs code, mdadm will automatically start
mdmon as required. This means that mdmon must be installed on the initramfs and
there must be a writable filesystem (typically tmpfs) in which mdmon can create a
.pid and .sock file. The particular filesystem to use is given to mdmon at compile
time and defaults to /run/mdadm.
This filesystem must persist through to shutdown time.
After the final root filesystem has been instantiated (usually with pivot_root) mdmon
should be run with —all —takeover so that the mdmon running from the initramfs
can be replaced with one running in the main root, and so the memory used by the
initramfs can be released.
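The late-boot hand-off described above can be sketched as a small script fragment. This is an illustration only, not the distribution's actual boot script; the guard is an assumption so the fragment is also safe on systems without mdmon installed.

```shell
# After pivot_root to the real root: restart mdmon so the copy started
# from the initramfs can be replaced and the initramfs memory released.
if command -v mdmon >/dev/null 2>&1; then
    mdmon --all --takeover && status=replaced || status=failed
else
    status=skipped    # mdmon not installed on this system
fi
echo "mdmon takeover: $status"
```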
At shutdown time, mdmon should not be killed along with other processes. Also as it
holds a file (socket actually) open in /dev (by default) it will not be possible to
unmount /dev if it is a separate filesystem.
› EXAMPLES
mdmon --all-active-arrays --takeover
Any mdmon which is currently running is killed and a new instance is started. This
should be run late in the boot sequence if an initramfs was used, so that any mdmon
running from the initramfs will not hold the initramfs active.
› SEE ALSO
mdadm(8), md(4).
MII-DIAG
› NAME
mii-diag - Network adapter control and monitoring
› SYNOPSIS
mii-diag [options] <interface>
› DESCRIPTION
This manual page briefly documents the mii-diag network adapter control and
monitoring command. Additional documentation is available from
http://scyld.com/diag/index.html.
The mii-diag command configures, controls and monitors the transceiver
management registers for network interfaces, and configures driver operational
parameters. For transceiver control mii-diag uses the Media Independent Interface
(MII) standard (thus the command name). It also has additional Linux-specific
controls to communicate parameters such as message enable settings and buffer sizes
to the underlying device driver.
The MII standard defines registers that control and report network transceiver
capabilities, link settings and errors. Examples are link speed, duplex, capabilities
advertised to the link partner, status LED indications and link error counters.
› OPTIONS
The mii-diag command supports both single character and long option names. Short
options use a single dash ('-') in front of the option character. For options without
parameters, multiple options may be concatenated after a single dash. Long options
are prefixed by two dashes ('--'), and may be abbreviated with a unique prefix. A
long option may take a parameter of the form --arg=param or --arg param.
A summary of options is as follows.
-A, --advertise <speed|setting>
-F, --fixed-speed <speed|setting>
Speed is one of: 100baseT4, 100baseTx, 100baseTx-FD, 100baseTx-HD, 10baseT,
10baseT-FD, 10baseT-HD. For more precise control an explicit numeric register
setting is also allowed.
-a, --all-interfaces
Show the status of all interfaces. This option is not recommended with any other
option, especially ones that change settings.
-s, --status
Return exit status 2 if there is no link beat.
-D
Increase the debugging level. This may be used to understand the actions the
command is taking.
-g, --read-parameters
Show driver-specific parameters.
-G, --set-parameters value[,value…]
Set adapter-specific driver parameters. Parameters are comma separated, with
missing elements retaining the existing value.
-v
Increase the verbosity level. Additional “-v” options increase the level further.
-V
Show the program version information.
-w, --watch
Continuously monitor the transceiver and report changes.
-?
Emit usage information.
› DESCRIPTION
Calling the command with just the interface name produces extensive output
describing the transceiver capabilities, configuration and current status.
The '--monitor' option allows scripting link beat changes.
This option is similar to --watch, but with lower overhead and simplified output. It
polls the interface only once a second and the output format is a single line per link
change with three fixed words:
<unknown|down|negotiating|up> <STATUS> <PARTNER-CAP>
Example output from mii-diag --monitor eth0:
down 0x7809 0x0000
negotiating 0x7829 0x45e1
up 0x782d 0x45e1
down 0x7809 0x0000
This may be used as:
mii-diag --monitor eth0 |
while read linkstatus bmsr linkpar; do
case $linkstatus in
up) ifup eth0 ;;
down) ifdown eth0 ;;
esac
done
It may be useful to shorten the DHCP client daemon timeout if it does not receive an
address by adding the following setting to /etc/sysconfig/network:
DHCPCDARGS="-t 3"
› SEE ALSO
ether-wake(8), net-diag(8), mii-tool(8). Additional documentation is available from
http://scyld.com/diag/index.html.
› KNOWN BUGS
The --all-interfaces option is quirky. There are very few settings that are usefully
applied to all interfaces.
› AUTHOR
The manual pages, diagnostic commands, and many of the underlying Linux network
drivers were written by Donald Becker for the Scyld Beowulf™ cluster system.
MII-TOOL
› NAME
mii-tool - view, manipulate media-independent interface status
› SYNOPSIS
mii-tool [-v, --verbose] [-V, --version] [-R, --reset] [-r, --restart] [-w, --watch]
[-l, --log] [-A, --advertise=media,…] [-F, --force=media] [-p, --phy=addr]
interface …
› NOTE
This program is obsolete. For replacement check ethtool.
› DESCRIPTION
This utility checks or sets the status of a network interface’s Media Independent
Interface (MII) unit. Most fast ethernet adapters use an MII to autonegotiate link
speed and duplex setting.
Most intelligent network devices use an autonegotiation protocol to communicate
what media technologies they support, and then select the fastest mutually supported
media technology. The -A or —advertise options can be used to tell the MII to only
advertise a subset of its capabilities. Some passive devices, such as single-speed
hubs, are unable to autonegotiate. To handle such devices, the MII protocol also
allows for establishing a link by simply detecting either a 10baseT or 100baseT link
beat. The -F or --force options can be used to force the MII to operate in one mode,
instead of autonegotiating. The -A and -F options are mutually exclusive.
The default short output reports the negotiated link speed and link status for each
interface.
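The autonegotiation controls described above can be sketched as a dry run. The interface name eth0 and the echo wrapper are assumptions for illustration; on a real system, set MII=mii-tool and run as root to execute the commands.

```shell
# Dry-run wrapper: prints what would be run instead of touching hardware.
MII="echo mii-tool"
$MII eth0                                 # short status: negotiated speed and link state
$MII -F 100baseTx-FD eth0                 # force 100 Mb/s full duplex, no autonegotiation
$MII -A 100baseTx-FD,100baseTx-HD eth0    # advertise only 100 Mb/s modes
```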
› OPTIONS
-v, --verbose
Display more detailed MII status information. If used twice, also display raw MII
register contents. Warning: if used three times, all MII registers are read, including
non-standard ones. The PHY is not guaranteed to return valid answers for these, and
PHY communication can even hang. With the e1000e driver, reading register 0x07
will fail.
-V, --version
Display program version information.
-R, --reset
Reset the MII to its default configuration.
-r, --restart
Restart autonegotiation.
-w, --watch
Watch interface(s) and report changes in link status. The MII interfaces are polled at
one-second intervals.
-l, --log
Used with -w, records link status changes in the system log instead of printing on
standard output.
-F media, --force=media
Disable autonegotiation, and force the MII to either 100baseTx-FD, 100baseTx-HD,
10baseT-FD, or 10baseT-HD operation.
-A media,…, --advertise=media,…
Enable and restart autonegotiation, and advertise only the specified media
technologies. Multiple technologies should be separated by commas. Valid media are
100baseT4, 100baseTx-FD, 100baseTx-HD, 10baseT-FD, and 10baseT-HD.
-p addr, --phy=addr
Override the MII address provided by the kernel with the value addr.
› DIAGNOSTICS
SIOCGMIIPHY on ‘eth?’ failed: Invalid argument
If the interface is not running (up), the kernel will refuse to report its link state.
SIOCGMIIPHY on ‘eth?’ failed: Operation not permitted
Most kernels restrict access to root.
SIOCGMIIPHY on ‘eth?’ failed: No such device
This error is shown if the kernel does not know about the named device.
SIOCGMIIPHY on ‘eth?’ failed: Operation not supported
The interface in question does not support MII queries. Most likely, it does not have
MII transceivers at all.
› SEE ALSO
ethtool(8)
› AUTHORS
David Hinds - [email protected] Donald Becker - [email protected]
Bernd Eckenfels - [email protected]
› SEE ALSO
http://net-tools.sourceforge.net - Homepage of the net-tools project
MISSION-CONTROL-5
› NAME
mission-control-5 - Telepathy account manager/channel dispatcher
› SYNOPSIS
/usr/libexec/mission-control-5
› DESCRIPTION
Mission Control 5 implements the AccountManager and ChannelDispatcher services
described in the Telepathy D-Bus specification, allowing clients like empathy(1) to
store account details, connect to accounts, request communication channels, and have
channels dispatched to them.
It is a D-Bus service which runs on the session bus, and should usually be started
automatically by D-Bus activation. However, it might be useful to start it manually
for debugging.
› OPTIONS
There are no command-line options.
› ENVIRONMENT
MC_DEBUG=all or MC_DEBUG=category[,category…]
May be set to “all” for full debug output from Mission Control and telepathy-glib, or
various undocumented category names (which may change from release to release) to
filter the output. See Mission Control and telepathy-glib source code for the available
categories.
MC_DEBUG=level
Set a numeric debug level for Mission Control itself (but not telepathy-glib). Level 0
logs nothing, level 1 logs most messages, and level 2 logs all messages.
MC_TP_DEBUG=type
May be set to “all” for full debug output from telepathy-glib, or various
undocumented options (which may change from telepathy-glib release to release) to
filter the output. See telepathy-glib source code for the available options.
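A debugging launch using the environment variables above might look like the following sketch. The command line is held in a variable and only echoed here; drop the echo to launch Mission Control for real inside a D-Bus session.

```shell
# Full debug output from both Mission Control and telepathy-glib.
cmd='MC_DEBUG=all MC_TP_DEBUG=all /usr/libexec/mission-control-5'
echo "$cmd"
```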
› SEE ALSO
http://telepathy.freedesktop.org/
MKFS.FAT
› NAME
mkfs.fat - create an MS-DOS filesystem under Linux
› SYNOPSIS
mkfs.fat [ -a ] [ -A ] [ -b sector-of-backup ] [ -c ] [ -l filename ] [ -C ] [ -f number-of-
FATs ] [ -F FAT-size ] [ -h number-of-hidden-sectors ] [ -i volume-id ] [ -I ] [ -m
message-file ] [ -n volume-name ] [ -r root-dir-entries ] [ -R number-of-reserved-
sectors ] [ -s sectors-per-cluster ] [ -S logical-sector-size ] [ -v ] device [ block-count
]
› DESCRIPTION
mkfs.fat is used to create an MS-DOS filesystem under Linux on a device (usually a
disk partition). device is the special file corresponding to the device (e.g. /dev/hdXX).
block-count is the number of blocks on the device. If omitted, mkfs.fat automatically
determines the filesystem size.
› OPTIONS
-a
Normally, for any filesystem except very small ones, mkfs.fat will align all the data
structures to cluster size, to make sure that as long as the partition is properly aligned,
so will all the data structures in the filesystem. This option disables alignment; this
may provide a handful of additional clusters of storage at the expense of a significant
performance degradation on RAIDs, flash media or large-sector hard disks.
-A
Use the Atari variation of the MS-DOS filesystem. This is the default if mkfs.fat is
run on an Atari; in that case this option turns Atari format off. There are some differences when
using Atari format: If not directed otherwise by the user, mkfs.fat will always use 2
sectors per cluster, since GEMDOS doesn’t like other values very much. It will also
obey the maximum number of sectors GEMDOS can handle. Larger filesystems are
managed by raising the logical sector size. Under Atari format, an Atari-compatible
serial number for the filesystem is generated, and a 12 bit FAT is used only for
filesystems that have one of the usual floppy sizes (720k, 1.2M, 1.44M, 2.88M), a 16
bit FAT otherwise. This can be overridden with the -F option. Some PC-specific boot
sector fields aren’t written, and a boot message (option -m) is ignored.
-b sector-of-backup
Selects the location of the backup boot sector for FAT32. Default depends on number
of reserved sectors, but usually is sector 6. The backup must be within the range of
reserved sectors.
-c
Check the device for bad blocks before creating the filesystem.
-C
Create the file given as device on the command line, and write the to-be-created
filesystem to it. This can be used to create the new filesystem in a file instead of on a
real device, and to avoid using dd in advance to create a file of appropriate size. With
this option, the block-count must be given, because otherwise the intended size of the
filesystem wouldn’t be known. The file created is a sparse file, which actually only
contains the meta-data areas (boot sector, FATs, and root directory). The data portions
won’t be stored on the disk, but the file nevertheless will have the correct size. The
resulting file can be copied later to a floppy disk or other device, or mounted through
a loop device.
-f number-of-FATs
Specify the number of file allocation tables in the filesystem. The default is 2.
Currently the Linux MS-DOS filesystem does not support more than 2 FATs.
-F FAT-size
Specifies the type of file allocation tables used (12, 16 or 32 bit). If nothing is
specified, mkfs.fat will automatically select between 12, 16 and 32 bit, whatever fits
better for the filesystem size.
-h number-of-hidden-sectors
Select the number of hidden sectors in the volume. Apparently some digital cameras
get indigestion if you feed them a CF card without such hidden sectors; this option
allows you to satisfy them. Assumes ‘0’ if no value is given on the command line.
-i volume-id
Sets the volume ID of the newly created filesystem; volume-id is a 32-bit
hexadecimal number (for example, 2e24ec82). The default is a number which
depends on the filesystem creation time.
-I
It is typical for fixed disk devices to be partitioned so, by default, you are not
permitted to create a filesystem across the entire device. mkfs.fat will complain and
tell you that it refuses to work. This is different when using MO disks. One doesn’t
always need partitions on MO disks. The filesystem can go directly to the whole disk.
Under other OSes this is known as the ‘superfloppy’ format.
This switch will force mkfs.fat to work properly.
-l filename
Read the bad blocks list from filename.
-m message-file
Sets the message the user receives on attempts to boot this filesystem without having
properly installed an operating system. The message file must not exceed 418 bytes
once line feeds have been converted to carriage return-line feed combinations, and
tabs have been expanded. If the filename is a hyphen (-), the text is taken from
standard input.
-n volume-name
Sets the volume name (label) of the filesystem. The volume name can be up to 11
characters long. The default is no label.
-r root-dir-entries
Select the number of entries available in the root directory. The default is 112 or 224
for floppies and 512 for hard disks.
-R number-of-reserved-sectors
Select the number of reserved sectors. With FAT32 format at least 2 reserved sectors
are needed, the default is 32. Otherwise the default is 1 (only the boot sector).
-s sectors-per-cluster
Specify the number of disk sectors per cluster. Must be a power of 2, i.e. 1, 2, 4, 8, …
128.
-S logical-sector-size
Specify the number of bytes per logical sector. Must be a power of 2 and greater than
or equal to 512, i.e. 512, 1024, 2048, 4096, 8192, 16384, or 32768.
-v
Verbose execution.
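The -C option described above can be exercised without any real device, since it writes the filesystem into an ordinary file. The image path and sizes below are illustrative, and the call is guarded in case dosfstools is not installed.

```shell
# Build a FAT32 image in a plain file: -C creates the (sparse) file itself,
# so the explicit block count is required. 65536 blocks of 1 KiB = 64 MiB.
img=/tmp/fat-demo-$$.img
rm -f "$img"                      # -C wants to create the file itself
if command -v mkfs.fat >/dev/null 2>&1; then
    mkfs.fat -C -F 32 "$img" 65536 && made=yes || made=failed
else
    made=skipped                  # dosfstools not installed here
fi
rm -f "$img"
```

The resulting image can later be copied to a device or mounted through a loop device, as the -C description notes.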
› BUGS
mkfs.fat cannot create bootable filesystems. This isn't as easy as you might think at
first glance, for various reasons, and has been discussed a lot already. mkfs.fat simply
will not support it ;)
› AUTHOR
Dave Hudson - <[email protected]>; modified by Peter Anvin
<[email protected]>. Fixes and additions by Roman Hodek <[email protected]>
for Debian GNU/Linux.
› ACKNOWLEDGMENTS
mkfs.fat is based on code from mke2fs (written by Remy Card -
<[email protected]>) which is itself based on mkfs (written by Linus Torvalds -
<[email protected]>).
› SEE ALSO
fsck.fat(8), fatlabel(8), mkfs(8)
MKDUMPRD
› NAME
mkdumprd - creates initial ramdisk images for kdump crash recovery
› SYNOPSIS
mkdumprd [OPTION]
› DESCRIPTION
mkdumprd creates an initial RAM file system for use in conjunction with the booting
of a kernel within the kdump framework for crash recovery. mkdumprd's purpose is
to create an initial RAM filesystem capable of copying the crashed system's vmcore
image to a location specified in /etc/kdump.conf.
mkdumprd interrogates the running system to determine which modules need to be
loaded in the initramfs (based on configuration retrieved from /etc/kdump.conf).
mkdumprd adds a new dracut module, 99kdumpbase, and uses the dracut utility to
generate the initramfs.
mkdumprd was not intended for casual use outside of the service initialization script
for the kdump utility, and should not be run manually. If you require a custom kdump
initramfs image, it is suggested that you use the kdump service infrastructure to
create one, and then manually unpack, modify and repack the image.
› OPTIONS
All options here are passed directly to dracut; please refer to the dracut
documentation for details.
› SEE ALSO
dracut(8)
MKE2FS
› NAME
mke2fs - create an ext2/ext3/ext4 filesystem
› SYNOPSIS
mke2fs [ -c | -l filename ] [ -b block-size ] [ -D ] [ -f fragment-size ] [ -g blocks-per-
group ] [ -G number-of-groups ] [ -i bytes-per-inode ] [ -I inode-size ] [ -j ] [ -J
journal-options ] [ -N number-of-inodes ] [ -n ] [ -m reserved-blocks-percentage ] [ -
o creator-os ] [ -O feature[,…] ] [ -q ] [ -r fs-revision-level ] [ -E extended-options ] [
-v ] [ -F ] [ -L volume-label ] [ -M last-mounted-directory ] [ -S ] [ -t fs-type ] [ -T
usage-type ] [ -U UUID ] [ -V ] device [ blocks-count ]
mke2fs -O journal_dev [ -b block-size ] [ -L volume-label ] [ -n ] [ -q ] [ -v ]
external-journal [ blocks-count ]
› DESCRIPTION
mke2fs is used to create an ext2, ext3, or ext4 filesystem, usually in a disk partition.
device is the special file corresponding to the device (e.g. /dev/hdXX). blocks-count is
the number of blocks on the device. If omitted, mke2fs automagically figures the file
system size. If called as mkfs.ext3 a journal is created as if the -j option was
specified.
The defaults of the parameters for the newly created filesystem, if not overridden by
the options listed below, are controlled by the /etc/mke2fs.conf configuration file.
See the mke2fs.conf(5) manual page for more details.
› OPTIONS
-b block-size
Specify the size of blocks in bytes. Valid block-size values are 1024, 2048 and 4096
bytes per block. If omitted, block-size is heuristically determined by the filesystem
size and the expected usage of the filesystem (see the -T option). If block-size is
preceded by a negative sign (‘-‘), then mke2fs will use heuristics to determine the
appropriate block size, with the constraint that the block size will be at least block-
size bytes. This is useful for certain hardware devices which require that the
blocksize be a multiple of 2k.
-c
Check the device for bad blocks before creating the file system. If this option is
specified twice, then a slower read-write test is used instead of a fast read-only test.
-C cluster-size
Specify the size of cluster in bytes for filesystems using the bigalloc feature. Valid
cluster-size values are from 2048 to 256M bytes per cluster. This can only be
specified if the bigalloc feature is enabled. (See the ext4 (5) man page for more
details about bigalloc.) The default cluster size if bigalloc is enabled is 16 times the
block size.
-D
Use direct I/O when writing to the disk. This avoids mke2fs dirtying a lot of buffer
cache memory, which may impact other applications running on a busy server. This
option will cause mke2fs to run much more slowly, however, so there is a tradeoff to
using direct I/O.
-E extended-options
Set extended options for the filesystem. Extended options are comma separated, and
may take an argument using the equals (‘=’) sign. The -E option used to be -R in
earlier versions of mke2fs. The -R option is still accepted for backwards
compatibility, but is deprecated. The following extended options are supported:
mmp_update_interval=interval
Adjust the initial MMP update interval to interval seconds. Specifying an interval of
0 means to use the default interval. The specified interval must be less than 300
seconds. Requires that the mmp feature be enabled.
stride=stride-size
Configure the filesystem for a RAID array with stride-size filesystem blocks. This is
the number of blocks read or written to disk before moving to the next disk, which is
sometimes referred to as the chunk size. This mostly affects placement of filesystem
metadata like bitmaps at mke2fs time to avoid placing them on a single disk, which
can hurt performance. It may also be used by the block allocator.
stripe_width=stripe-width
Configure the filesystem for a RAID array with stripe-width filesystem blocks per
stripe. This is typically stride-size * N, where N is the number of data-bearing disks
in the RAID (e.g. for RAID 5 there is one parity disk, so N will be the number of
disks in the array minus 1). This allows the block allocator to prevent read-modify-
write of the parity in a RAID stripe if possible when the data is written.
resize=max-online-resize
Reserve enough space so that the block group descriptor table can grow to support a
filesystem that has max-online-resize blocks.
lazy_itable_init[= <0 to disable, 1 to enable>]
If enabled and the uninit_bg feature is enabled, the inode table will not be fully
initialized by mke2fs. This speeds up filesystem initialization noticeably, but it
requires the kernel to finish initializing the filesystem in the background when the
filesystem is first mounted. If the option value is omitted, it defaults to 1 to enable
lazy inode table zeroing.
lazy_journal_init[= <0 to disable, 1 to enable>]
If enabled, the journal inode will not be fully zeroed out by mke2fs. This speeds up
filesystem initialization noticeably, but carries some small risk if the system crashes
before the journal has been overwritten entirely one time. If the option value is
omitted, it defaults to 1 to enable lazy journal inode zeroing.
root_owner[=uid:gid]
Specify the numeric user and group ID of the root directory. If no UID:GID is
specified, use the user and group ID of the user running mke2fs. In mke2fs 1.42 and
earlier the UID and GID of the root directory were set by default to the UID and GID
of the user running the mke2fs command. The root_owner= option allows explicitly
specifying these values, and avoid side-effects for users that do not expect the
contents of the filesystem to change based on the user running mke2fs.
test_fs
Set a flag in the filesystem superblock indicating that it may be mounted using
experimental kernel code, such as the ext4dev filesystem.
discard
Attempt to discard blocks at mkfs time (discarding blocks initially is useful on solid
state devices and sparse / thin-provisioned storage). When the device advertises that
discard also zeroes data (any subsequent read after the discard and before write
returns zero), then mark all not-yet-zeroed inode tables as zeroed. This significantly
speeds up filesystem initialization. This is the default.
nodiscard
Do not attempt to discard blocks at mkfs time.
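The stride and stripe_width arithmetic described under -E can be sketched as follows. All the numbers are assumed for illustration (RAID 5 over 4 disks, 64 KiB chunk, 4 KiB filesystem blocks), and /dev/mdX is a placeholder device name.

```shell
# stride = chunk size in filesystem blocks
# stripe_width = stride * number of data-bearing disks (parity excluded)
chunk_kb=64; block_kb=4; disks=4; parity=1
stride=$((chunk_kb / block_kb))                 # 64/4 = 16 blocks per chunk
stripe_width=$((stride * (disks - parity)))     # 16*3 = 48 blocks per stripe
echo "mke2fs -E stride=$stride,stripe_width=$stripe_width /dev/mdX"
```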
MKFS.BTRFS
› NAME
mkfs.btrfs - create a btrfs filesystem
› SYNOPSIS
mkfs.btrfs [options] <device> [<device>…]
› OPTIONS
-b|--byte-count <size>
Specify the size of the resultant filesystem. If this option is not used, mkfs.btrfs uses
all the available storage for the filesystem.
-d|--data <type>
Specify how the data must be spanned across the devices specified. Valid values are
raid0, raid1, raid5, raid6, raid10 or single.
-f|--force
Force overwrite when an existing filesystem is detected on the device. By default,
mkfs.btrfs will not write to the device if it suspects that there is a filesystem or
partition table on the device already.
-l|--leafsize <size>
Alias for --nodesize. Deprecated.
-n|--nodesize <size>
Specify the nodesize, the tree block size in which btrfs stores data. The default value
is 16KB (16384) or the page size, whichever is bigger. Must be a multiple of the
sectorsize, but not larger than 65536. Leafsize always equals nodesize and the options
are aliases.
-L|--label <name>
Specify a label for the filesystem.
Note: the -r option is handled completely in userland and does not need root
privilege to mount the filesystem.
-K|--nodiscard
Do not perform whole device TRIM operation by default.
-O|--features <feature1>[,<feature2>…]
A list of filesystem features turned on at mkfs time. Not all features are supported by
old kernels.
To see all features run:
mkfs.btrfs -O list-all
-U|--uuid <UUID>
Create the filesystem with the specified UUID, which must not already exist on the
system.
-V|--version
Print the mkfs.btrfs version and exit.
-h
Print help.
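mkfs.btrfs also works on regular files, which makes a safe demonstration possible without a real disk. The image path, size, and label below are illustrative, and the call is guarded in case btrfs-progs is not installed.

```shell
# Create a btrfs filesystem with a label inside a 256 MiB sparse file,
# using -f to force overwrite of whatever the file previously held.
img=/tmp/btrfs-demo-$$.img
truncate -s 256M "$img"
if command -v mkfs.btrfs >/dev/null 2>&1; then
    mkfs.btrfs -f -L demo "$img" && made=yes || made=failed
else
    made=skipped                  # btrfs-progs not installed here
fi
rm -f "$img"
```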
› UNIT
As default the unit is the byte, however it is possible to append a suffix to the
arguments like k for KBytes, m for MBytes…
› AVAILABILITY
btrfs is part of btrfs-progs. Please refer to the btrfs wiki
http://btrfs.wiki.kernel.org for further details.
› SEE ALSO
btrfs(8)
MKFS.CRAMFS
› NAME
mkfs.cramfs - make compressed ROM file system
› SYNOPSIS
mkfs.cramfs [options] directory file
› DESCRIPTION
Files on cramfs file systems are zlib-compressed one page at a time to allow random
read access. The metadata is not compressed, but is expressed in a terse
representation that is more space-efficient than conventional file systems.
The file system is intentionally read-only to simplify its design; random write access
for compressed files is difficult to implement. cramfs ships with a utility (mkcramfs)
to pack files into new cramfs images.
File sizes are limited to less than 16MB.
Maximum file system size is a little under 272MB. (The last file on the file system
must begin before the 256MB block, but can extend past it.)
› ARGUMENTS
The directory is simply the root of the directory tree that we want to generate a
compressed filesystem out of.
The file will contain the cram file system, which later can be mounted.
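The directory-to-image workflow above can be sketched as follows. The temporary paths are illustrative, and the call is guarded since mkfs.cramfs may not be installed.

```shell
# Pack a tiny directory tree into a compressed cramfs image.
src=$(mktemp -d)
echo "hello" > "$src/readme.txt"
img=/tmp/cramfs-demo-$$.img
if command -v mkfs.cramfs >/dev/null 2>&1; then
    mkfs.cramfs "$src" "$img" && made=yes || made=failed
else
    made=skipped                  # mkfs.cramfs not installed here
fi
rm -rf "$src"
rm -f "$img"
```

The resulting image would normally be mounted read-only, e.g. through a loop device.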
› OPTIONS
-v
Enable verbose messaging.
-E
Treat all warnings as errors, which are reflected as command return value.
-b blocksize
Use defined block size, which has to be divisible by page size.
-e edition
Use defined file system edition number in superblock.
-N big|little|host
Use defined endianness. Value defaults to host.
-i file
Insert a file into the cramfs file system.
-n name
Set name of the cramfs file system.
-p
Pad by 512 bytes for boot code.
-s
This option is ignored. Originally the -s turned on directory entry sorting.
-z
Make explicit holes. This option requires kernel 2.3.39 or newer.
-V
Display version information and exit.
-h
Display help and exit.
› EXIT STATUS
0
success
8
operation error, such as unable to allocate memory
› SEE ALSO
mount(8), fsck.cramfs(8)
› AVAILABILITY
The mkfs.cramfs command is part of the util-linux package and is available from the
Linux Kernel Archive.
MKE2FS
› NAME
mke2fs - create an ext2/ext3/ext4 filesystem
› SYNOPSIS
mke2fs [ -c | -l filename ] [ -b block-size ] [ -D ] [ -f fragment-size ] [ -g blocks-per-
group ] [ -G number-of-groups ] [ -i bytes-per-inode ] [ -I inode-size ] [ -j ] [ -J
journal-options ] [ -N number-of-inodes ] [ -n ] [ -m reserved-blocks-percentage ] [ -
o creator-os ] [ -O feature[,…] ] [ -q ] [ -r fs-revision-level ] [ -E extended-options ] [
-v ] [ -F ] [ -L volume-label ] [ -M last-mounted-directory ] [ -S ] [ -t fs-type ] [ -T
usage-type ] [ -U UUID ] [ -V ] device [ blocks-count ]
mke2fs -O journal_dev [ -b block-size ] [ -L volume-label ] [ -n ] [ -q ] [ -v ]
external-journal [ blocks-count ]
› DESCRIPTION
mke2fs is used to create an ext2, ext3, or ext4 filesystem, usually in a disk partition.
device is the special file corresponding to the device (e.g /dev/hdXX). blocks-count is
the number of blocks on the device. If omitted, mke2fs automagically figures the file
system size. If called as mkfs.ext3 a journal is created as if the -j option was
specified.
The defaults of the parameters for the newly created filesystem, if not overridden by
the options listed below, are controlled by the /etc/mke2fs.conf configuration file.
See the mke2fs.conf(5) manual page for more details.
› OPTIONS
-b block-size
Specify the size of blocks in bytes. Valid block-size values are 1024, 2048 and 4096
bytes per block. If omitted, block-size is heuristically determined by the filesystem
size and the expected usage of the filesystem (see the -T option). If block-size is
preceded by a negative sign (‘-‘), then mke2fs will use heuristics to determine the
appropriate block size, with the constraint that the block size will be at least block-
size bytes. This is useful for certain hardware devices which require that the
blocksize be a multiple of 2k.
-c
Check the device for bad blocks before creating the file system. If this option is
specified twice, then a slower read-write test is used instead of a fast read-only test.
-C cluster-size
Specify the size of cluster in bytes for filesystems using the bigalloc feature. Valid
cluster-size values are from 2048 to 256M bytes per cluster. This can only be
specified if the bigalloc feature is enabled. (See the ext4 (5) man page for more
details about bigalloc.) The default cluster size if bigalloc is enabled is 16 times the
block size.
-D
Use direct I/O when writing to the disk. This avoids mke2fs dirtying a lot of buffer
cache memory, which may impact other applications running on a busy server. This
option will cause mke2fs to run much more slowly, however, so there is a tradeoff to
using direct I/O.
-E extended-options
Set extended options for the filesystem. Extended options are comma separated, and
may take an argument using the equals (‘=’) sign. The -E option used to be -R in
earlier versions of mke2fs. The -R option is still accepted for backwards
compatibility, but is deprecated. The following extended options are supported:
mmp_update_interval=interval
Adjust the initial MMP update interval to interval seconds. Specifying an interval of
0 means to use the default interval. The specified interval must be less than 300
seconds. Requires that the mmp feature be enabled.
stride=stride-size
Configure the filesystem for a RAID array with stride-size filesystem blocks. This is
the number of blocks read or written to disk before moving to the next disk, which is
sometimes referred to as the chunk size. This mostly affects placement of filesystem
metadata like bitmaps at mke2fs time to avoid placing them on a single disk, which
can hurt performance. It may also be used by the block allocator.
stripe_width=stripe-width
Configure the filesystem for a RAID array with stripe-width filesystem blocks per
stripe. This is typically stride-size * N, where N is the number of data-bearing disks
in the RAID (e.g. for RAID 5 there is one parity disk, so N will be the number of
disks in the array minus 1). This allows the block allocator to prevent read-modify-
write of the parity in a RAID stripe if possible when the data is written.
resize=max-online-resize
Reserve enough space so that the block group descriptor table can grow to support a
filesystem that has max-online-resize blocks.
lazy_itable_init[= <0 to disable, 1 to enable>]
If enabled and the uninit_bg feature is enabled, the inode table will not be fully
initialized by mke2fs. This speeds up filesystem initialization noticeably, but it
requires the kernel to finish initializing the filesystem in the background when the
filesystem is first mounted. If the option value is omitted, it defaults to 1 to enable
lazy inode table zeroing.
lazy_journal_init[= <0 to disable, 1 to enable>]
If enabled, the journal inode will not be fully zeroed out by mke2fs. This speeds up
filesystem initialization noticeably, but carries some small risk if the system crashes
before the journal has been overwritten entirely one time. If the option value is
omitted, it defaults to 1 to enable lazy journal inode zeroing.
root_owner[=uid:gid]
Specify the numeric user and group ID of the root directory. If no UID:GID is
specified, use the user and group ID of the user running mke2fs. In mke2fs 1.42 and
earlier the UID and GID of the root directory were set by default to the UID and GID
of the user running the mke2fs command. The root_owner= option allows explicitly
specifying these values, and avoid side-effects for users that do not expect the
contents of the filesystem to change based on the user running mke2fs.
test_fs
Set a flag in the filesystem superblock indicating that it may be mounted using
experimental kernel code, such as the ext4dev filesystem.
discard
Attempt to discard blocks at mkfs time (discarding blocks initially is useful on solid
state devices and sparse / thin-provisioned storage). When the device advertises that
discard also zeroes data (any subsequent read after the discard and before write
returns zero), all not-yet-zeroed inode tables are marked as zeroed. This significantly
speeds up filesystem initialization. This is the default.
nodiscard
Do not attempt to discard blocks at mkfs time.
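The stride and stripe_width extended options described above can be derived from the RAID geometry. A minimal shell sketch, assuming a hypothetical 4-disk RAID5 array with a 64 KiB chunk size, a 4 KiB filesystem block size, and the device name /dev/md0 (all of these values are assumptions, not taken from the manual):

```shell
# Assumed RAID geometry (hypothetical): 4 disks in RAID5, 64 KiB chunk, 4 KiB fs block.
chunk_kib=64
block_kib=4
disks=4
parity=1                                        # RAID5 has one parity disk
stride=$(( chunk_kib / block_kib ))             # filesystem blocks per RAID chunk
stripe_width=$(( stride * (disks - parity) ))   # data-bearing disks only

# Print the resulting mke2fs command rather than running it (it would be
# destructive on a real device):
echo "mke2fs -t ext4 -E stride=${stride},stripe_width=${stripe_width} /dev/md0"
```

With these assumed values the sketch yields stride=16 and stripe_width=48, i.e. one full stripe of data per 192 KiB.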
MKFS.MINIX
› NAME
mkfs.minix - make a Minix filesystem
› SYNOPSIS
mkfs.minix [options] device [size-in-blocks]
› DESCRIPTION
mkfs.minix creates a Minix filesystem on a device (usually a disk partition).
The size-in-blocks parameter is the desired size of the file system, in blocks. It is present
only for backwards compatibility. If omitted, the size will be determined automatically.
Only block counts strictly greater than 10 and strictly less than 65536 are allowed.
› OPTIONS
-c
Check the device for bad blocks before creating the filesystem. If any are found, the
count is printed.
-n namelength
Specify the maximum length of filenames. Currently, the only allowable values are
14 and 30. The default is 30. Note that kernels older than 0.99p7 only accept
namelength 14.
-i inodecount
Specify the number of inodes for the filesystem.
-l filename
Read the list of bad blocks from filename. The file has one bad-block number per
line. The count of bad blocks read is printed.
-1
Make a Minix version 1 filesystem.
-2, -v
Make a Minix version 2 filesystem.
-3
Make a Minix version 3 filesystem.
› EXIT CODES
The exit code returned by mkfs.minix is one of the following:
0
No errors
8
Operational error
16
Usage or syntax error
› SEE ALSO
mkfs(8), fsck(8), reboot(8)
› AVAILABILITY
The mkfs.minix command is part of the util-linux package and is available from
ftp://ftp.kernel.org/pub/linux/utils/util-linux/.
MKFS.FAT
› NAME
mkfs.fat - create an MS-DOS filesystem under Linux
› SYNOPSIS
mkfs.fat [ -a ] [ -A ] [ -b sector-of-backup ] [ -c ] [ -l filename ] [ -C ] [ -f number-of-
FATs ] [ -F FAT-size ] [ -h number-of-hidden-sectors ] [ -i volume-id ] [ -I ] [ -m
message-file ] [ -n volume-name ] [ -r root-dir-entries ] [ -R number-of-reserved-
sectors ] [ -s sectors-per-cluster ] [ -S logical-sector-size ] [ -v ] device [ block-count
]
› DESCRIPTION
mkfs.fat is used to create an MS-DOS filesystem under Linux on a device (usually a
disk partition). device is the special file corresponding to the device (e.g. /dev/hdXX).
block-count is the number of blocks on the device. If omitted, mkfs.fat automatically
determines the filesystem size.
› OPTIONS
-a
Normally, for any filesystem except very small ones, mkfs.fat will align all the data
structures to cluster size, to make sure that as long as the partition is properly aligned,
so will all the data structures in the filesystem. This option disables alignment; this
may provide a handful of additional clusters of storage at the expense of a significant
performance degradation on RAIDs, flash media or large-sector hard disks.
-A
Use the Atari variation of the MS-DOS filesystem. This is the default if mkfs.fat is run
on an Atari; in that case this option turns Atari format off. There are some differences when
using Atari format: If not directed otherwise by the user, mkfs.fat will always use 2
sectors per cluster, since GEMDOS doesn’t like other values very much. It will also
obey the maximum number of sectors GEMDOS can handle. Larger filesystems are
managed by raising the logical sector size. Under Atari format, an Atari-compatible
serial number for the filesystem is generated, and a 12 bit FAT is used only for
filesystems that have one of the usual floppy sizes (720k, 1.2M, 1.44M, 2.88M), a 16
bit FAT otherwise. This can be overridden with the -F option. Some PC-specific boot
sector fields aren’t written, and a boot message (option -m) is ignored.
-b sector-of-backup
Selects the location of the backup boot sector for FAT32. Default depends on number
of reserved sectors, but usually is sector 6. The backup must be within the range of
reserved sectors.
-c
Check the device for bad blocks before creating the filesystem.
-C
Create the file given as device on the command line, and write the to-be-created
filesystem to it. This can be used to create the new filesystem in a file instead of on a
real device, and to avoid using dd in advance to create a file of appropriate size. With
this option, the block-count must be given, because otherwise the intended size of the
filesystem wouldn’t be known. The file created is a sparse file, which actually only
contains the meta-data areas (boot sector, FATs, and root directory). The data portions
won’t be stored on the disk, but the file nevertheless will have the correct size. The
resulting file can be copied later to a floppy disk or other device, or mounted through
a loop device.
-f number-of-FATs
Specify the number of file allocation tables in the filesystem. The default is 2.
Currently the Linux MS-DOS filesystem does not support more than 2 FATs.
-F FAT-size
Specifies the type of file allocation tables used (12, 16 or 32 bit). If nothing is
specified, mkfs.fat will automatically select between 12, 16 and 32 bit, whatever fits
better for the filesystem size.
-h number-of-hidden-sectors
Select the number of hidden sectors in the volume. Apparently some digital cameras
get indigestion if you feed them a CF card without such hidden sectors; this option
allows you to satisfy them. ‘0’ is assumed if no value is given on the command line.
-i volume-id
Sets the volume ID of the newly created filesystem; volume-id is a 32-bit
hexadecimal number (for example, 2e24ec82). The default is a number which
depends on the filesystem creation time.
-I
It is typical for fixed disk devices to be partitioned so, by default, you are not
permitted to create a filesystem across the entire device. mkfs.fat will complain and
tell you that it refuses to work. This is different when using MO disks. One doesn’t
always need partitions on MO disks. The filesystem can go directly to the whole disk.
Under other OSes this is known as the ‘superfloppy’ format.
This switch will force mkfs.fat to work properly.
-l filename
Read the bad blocks list from filename.
-m message-file
Sets the message the user receives on attempts to boot this filesystem without having
properly installed an operating system. The message file must not exceed 418 bytes
once line feeds have been converted to carriage return-line feed combinations, and
tabs have been expanded. If the filename is a hyphen (-), the text is taken from
standard input.
-n volume-name
Sets the volume name (label) of the filesystem. The volume name can be up to 11
characters long. The default is no label.
-r root-dir-entries
Select the number of entries available in the root directory. The default is 112 or 224
for floppies and 512 for hard disks.
-R number-of-reserved-sectors
Select the number of reserved sectors. With FAT32 format at least 2 reserved sectors
are needed, the default is 32. Otherwise the default is 1 (only the boot sector).
-s sectors-per-cluster
Specify the number of disk sectors per cluster. Must be a power of 2, i.e. 1, 2, 4, 8, …
128.
-S logical-sector-size
Specify the number of bytes per logical sector. Must be a power of 2 and greater than
or equal to 512, i.e. 512, 1024, 2048, 4096, 8192, 16384, or 32768.
-v
Verbose execution.
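Because the -C option writes the filesystem into an ordinary file, it is a safe way to experiment with the options above. A hedged sketch, assuming the hypothetical filename floppy.img and a 1.44M floppy layout; the command is printed rather than executed so no tool or device is touched:

```shell
# 1.44M floppy: 1474560 bytes at 1 KiB per block -> 1440 blocks.
# With -C the block-count argument is mandatory, since there is no device to measure.
blocks=$(( 1474560 / 1024 ))
cmd="mkfs.fat -C -F 12 -n TESTDISK floppy.img ${blocks}"
echo "$cmd"
```

The resulting sparse image could then be mounted through a loop device, as the -C description notes.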
› BUGS
mkfs.fat cannot create bootable filesystems. This isn’t as easy as you might think at
first glance for various reasons and has been discussed a lot already. mkfs.fat simply
will not support it ;)
› AUTHOR
Dave Hudson - <[email protected]>; modified by Peter Anvin
<[email protected]>. Fixes and additions by Roman Hodek <[email protected]>
for Debian GNU/Linux.
› ACKNOWLEDGMENTS
mkfs.fat is based on code from mke2fs (written by Remy Card -
<[email protected]>) which is itself based on mkfs (written by Linus Torvalds -
<[email protected]>).
› SEE ALSO
fsck.fat(8), fatlabel(8), mkfs(8)
mkfs.xfs
› NAME
mkfs.xfs - construct an XFS filesystem
› SYNOPSIS
mkfs.xfs [ -b block_size ] [ -m global_metadata_options ] [ -d data_section_options
] [ -f ] [ -i inode_options ] [ -l log_section_options ] [ -n naming_options ] [ -p
protofile ] [ -q ] [ -r realtime_section_options ] [ -s sector_size ] [ -L label ] [ -N ] [ -
K ] device
mkfs.xfs -V
› DESCRIPTION
mkfs.xfs constructs an XFS filesystem by writing on a special file using the values
found in the arguments of the command line. It is invoked automatically by mkfs(8)
when it is given the -t xfs option.
In its simplest (and most commonly used) form, the size of the filesystem is
determined from the disk driver. As an example, to make a filesystem with an
internal log on the first partition on the first SCSI disk, use:
mkfs.xfs /dev/sda1
The metadata log can be placed on another device to reduce the number of disk seeks. To
create a filesystem on the first partition on the first SCSI disk with a 10000 block log
located on the first partition on the second SCSI disk, use:
mkfs.xfs -l logdev=/dev/sdb1,size=10000b /dev/sda1
Each of the option elements in the argument list above can be given as multiple comma-
separated suboptions if multiple suboptions apply to the same option. Equivalently, each
main option can be given multiple times with different suboptions. For example, -l
internal,size=10000b and -l internal -l size=10000b are equivalent.
In the descriptions below, sizes are given in sectors, bytes, blocks, kilobytes, megabytes,
gigabytes, etc. Sizes are treated as hexadecimal if prefixed by 0x or 0X, octal if prefixed
by 0, or decimal otherwise. The following lists possible multiplication suffixes:
s - multiply by sector size (default = 512, see -s option below).
b - multiply by filesystem block size (default = 4K, see -b option below).
k - multiply by one kilobyte (1,024 bytes).
m - multiply by one megabyte (1,048,576 bytes).
g - multiply by one gigabyte (1,073,741,824 bytes).
t - multiply by one terabyte (1,099,511,627,776 bytes).
p - multiply by one petabyte (1,024 terabytes).
e - multiply by one exabyte (1,048,576 terabytes).
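As a hedged arithmetic example of these suffixes, the 10000b log size used in the DESCRIPTION above works out as follows under the default 4 KiB block size and 512-byte sector size:

```shell
block_size=4096                              # default filesystem block size (the b suffix)
sector_size=512                              # default sector size (the s suffix)
log_bytes=$(( 10000 * block_size ))          # size=10000b -> bytes
log_sectors=$(( log_bytes / sector_size ))   # the same size expressed with the s suffix
echo "10000b = ${log_bytes} bytes = ${log_sectors}s"
```

So -l size=10000b, -l size=40960000 and -l size=80000s all describe the same log size under the default geometry.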
› OPTIONS
-b block_size_options
This option specifies the fundamental block size of the filesystem. The valid
block_size_options are: log=value or size=value and only one can be supplied. The
block size is specified either as a base two logarithm value with log=, or in bytes with
size=. The default value is 4096 bytes (4 KiB), the minimum is 512, and the
maximum is 65536 (64 KiB). XFS on Linux currently only supports pagesize or
smaller blocks.
-m global_metadata_options
These options specify metadata format options that either apply to the entire
filesystem or aren’t easily characterised by a specific functionality group. The valid
global_metadata_options are:
crc=value
This is used to create a filesystem which maintains and checks CRC information in
all metadata objects on disk. The value is either 0 to disable the feature, or 1 to enable
the use of CRCs.
CRCs enable enhanced error detection due to hardware issues, whilst the format
changes also improve crash recovery algorithms and the ability of various tools to
validate and repair metadata corruptions when they are found. The CRC algorithm
used is CRC32c, so the overhead is dependent on CPU architecture as some CPUs
have hardware acceleration of this algorithm. Typically the overhead of calculating
and checking the CRCs is not noticeable in normal operation.
By default, mkfs.xfs will not enable metadata CRCs.
finobt=value
This option enables the use of a separate free inode btree index in each allocation
group. The value is either 0 to disable the feature, or 1 to create a free inode btree in
each allocation group.
The free inode btree mirrors the existing allocated inode btree index which indexes
both used and free inodes. The free inode btree does not index used inodes, allowing
faster, more consistent inode allocation performance as filesystems age.
By default, mkfs.xfs will not create free inode btrees. This feature is also currently
only available for filesystems created with the -m crc=1 option set.
-d data_section_options
These options specify the location, size, and other parameters of the data section of
the filesystem. The valid data_section_options are:
agcount=value
This is used to specify the number of allocation groups. The data section of the
filesystem is divided into allocation groups to improve the performance of XFS.
More allocation groups imply that more parallelism can be achieved when allocating
blocks and inodes. The minimum allocation group size is 16 MiB; the maximum size
is just under 1 TiB. The data section of the filesystem is divided into value allocation
groups (default value is scaled automatically based on the underlying device size).
agsize=value
This is an alternative to using the agcount suboption. The value is the desired size of
the allocation group expressed in bytes (usually using the m or g suffixes). This value
must be a multiple of the filesystem block size, and must be at least 16MiB, and no
more than 1TiB, and may be automatically adjusted to properly align with the stripe
geometry. The agcount and agsize suboptions are mutually exclusive.
name=value
This can be used to specify the name of the special file containing the filesystem. In
this case, the log section must be specified as internal (with a size, see the -l option
below) and there can be no real-time section.
file[=value]
This is used to specify that the file given by the name suboption is a regular file. The
value is either 0 or 1, with 1 signifying that the file is regular. This suboption is used
only to make a filesystem image. If the value is omitted then 1 is assumed.
size=value
This is used to specify the size of the data section. This suboption is required if -d
file[=1] is given. Otherwise, it is only needed if the filesystem should occupy less
space than the size of the special file.
sunit=value
This is used to specify the stripe unit for a RAID device or a logical volume. The
value has to be specified in 512-byte block units. Use the su suboption to specify the
stripe unit size in bytes. This suboption ensures that data allocations will be stripe
unit aligned when the current end of file is being extended and the file size is larger
than 512KiB. Also inode allocations and the internal log will be stripe unit aligned.
su=value
This is an alternative to using sunit. The su suboption is used to specify the stripe
unit for a RAID device or a striped logical volume. The value has to be specified in
bytes, (usually using the m or g suffixes). This value must be a multiple of the
filesystem block size.
swidth=value
This is used to specify the stripe width for a RAID device or a striped logical volume.
The value has to be specified in 512-byte block units. Use the sw suboption to
specify the stripe width size in bytes. This suboption is required if -d sunit has been
specified and it has to be a multiple of the -d sunit suboption.
sw=value
This is an alternative to using swidth. The sw suboption is used to specify the
stripe width for a RAID device or striped logical volume. The value is expressed as a
multiplier of the stripe unit, usually the same as the number of stripe members in the
logical volume configuration, or data disks in a RAID device.
When a filesystem is created on a logical volume device, mkfs.xfs will automatically
query the logical volume for appropriate sunit and swidth values.
noalign
This option disables automatic geometry detection and creates the filesystem without
stripe geometry alignment even if the underlying storage device provides this
information.
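A hedged sketch tying the geometry suboptions above together, assuming a hypothetical 64 GiB device (/dev/sdX1 is a placeholder), 8 allocation groups, and a RAID5-style layout with a 64 KiB stripe unit and 4 data disks; the mkfs.xfs commands are printed rather than run:

```shell
# Allocation groups: agcount and agsize describe the same split two ways.
dev_bytes=$(( 64 * 1024 * 1024 * 1024 ))   # assumed 64 GiB device
agcount=8
agsize=$(( dev_bytes / agcount ))          # 8 GiB per AG (>= 16 MiB, < 1 TiB)

# Stripe geometry: su/sw are in bytes/multiples, sunit/swidth in 512-byte units.
su_bytes=$(( 64 * 1024 ))                  # assumed 64 KiB stripe unit
data_disks=4
sunit=$(( su_bytes / 512 ))                # 64 KiB -> 128 units of 512 bytes
swidth=$(( sunit * data_disks ))           # must be a multiple of sunit

echo "mkfs.xfs -d agcount=${agcount},su=64k,sw=${data_disks} /dev/sdX1"
echo "equivalently: -d agsize=$(( agsize / (1024*1024*1024) ))g,sunit=${sunit},swidth=${swidth}"
```

Remember that agcount/agsize are mutually exclusive, as are su/sunit and sw/swidth; the sketch simply shows the conversion between the two spellings.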
-f
Force overwrite when an existing filesystem is detected on the device. By default,
mkfs.xfs will not write to the device if it suspects that there is a filesystem or partition
table on the device already.
-i inode_options
This option specifies the inode size of the filesystem, and other inode allocation
parameters. The XFS inode contains a fixed-size part and a variable-size part. The
variable-size part, whose size is affected by this option, can contain: directory data,
for small directories; attribute data, for small attribute sets; symbolic link data, for
small symbolic links; the extent list for the file, for files with a small number of
extents; and the root of a tree describing the location of extents for the file, for files
with a large number of extents. The valid inode_options are:
size=value | log=value | perblock=value
The inode size is specified either as a value in bytes with size=, a base two logarithm
value with log=, or as the number fitting in a filesystem block with perblock=. The
minimum (and default) value is 256 bytes. The maximum value is 2048 (2 KiB)
subject to the restriction that the inode size cannot exceed one half of the filesystem
block size.
XFS uses 64-bit inode numbers internally; however, the number of significant bits in
an inode number is affected by filesystem geometry. In practice, filesystem size and
inode size are the predominant factors. The Linux kernel (on 32 bit hardware
platforms) and most applications cannot currently handle inode numbers greater than
32 significant bits, so if no inode size is given on the command line, mkfs.xfs will
attempt to choose a size such that inode numbers will be < 32 bits. If an inode size is
specified, or if a filesystem is sufficiently large, mkfs.xfs will warn if this will create
inode numbers > 32 significant bits.
maxpct=value
This specifies the maximum percentage of space in the filesystem that can be
allocated to inodes. The default value is 25% for filesystems under 1TB, 5% for
filesystems under 50TB and 1% for filesystems over 50TB.
In the default inode allocation mode, inode blocks are chosen such that inode
numbers will not exceed 32 bits, which restricts the inode blocks to the lower portion
of the filesystem. The data block allocator will avoid these low blocks to
accommodate the specified maxpct, so a high value may result in a filesystem with
nothing but inodes in a significant portion of the lower blocks of the filesystem. (This
restriction is not present when the filesystem is mounted with the inode64 option on
64-bit platforms).
Setting the value to 0 means that essentially all of the filesystem can become inode
blocks, subject to inode32 restrictions.
This value can be modified with xfs_growfs(8).
align[=value]
This is used to specify that inode allocation is or is not aligned. The value is either 0
or 1, with 1 signifying that inodes are allocated aligned. If the value is omitted, 1 is
assumed. The default is that inodes are aligned. Aligned inode access is normally
more efficient than unaligned access; alignment must be established at the time the
filesystem is created, since inodes are allocated at that time. This option can be used
to turn off inode alignment when the filesystem needs to be mountable by a version
of IRIX that does not have the inode alignment feature (any release of IRIX before
6.2, and IRIX 6.2 without XFS patches).
attr=value
This is used to specify the version of extended attribute inline allocation policy to be
used. By default, this is 2, which uses an efficient algorithm for managing the
available inline inode space between attribute and extent data.
The previous version 1, which has fixed regions for attribute and extent data, is kept
for backwards compatibility with kernels older than version 2.6.16.
projid32bit[=value]
This is used to enable 32bit quota project identifiers. The value is either 0 or 1, with 1
signifying that 32bit projid are to be enabled. If the value is omitted, 1 is assumed.
(This default changed in release version 3.2.0.)
-l log_section_options
These options specify the location, size, and other parameters of the log section of the
filesystem. The valid log_section_options are:
internal[=value]
This is used to specify that the log section is a piece of the data section instead of
being another device or logical volume. The value is either 0 or 1, with 1 signifying
that the log is internal. If the value is omitted, 1 is assumed.
logdev=device
This is used to specify that the log section should reside on the device separate from
the data section. The internal=1 and logdev options are mutually exclusive.
size=value
This is used to specify the size of the log section.
If the log is contained within the data section and size isn’t specified, mkfs.xfs will
try to select a suitable log size depending on the size of the filesystem. The actual
logsize depends on the filesystem block size and the directory block size.
Otherwise, the size suboption is only needed if the log section of the filesystem
should occupy less space than the size of the special file. The value is specified in
bytes or blocks, with a b suffix meaning multiplication by the filesystem block size,
as described above. The overriding minimum value for size is 512 blocks. With some
combinations of filesystem block size, inode size, and directory block size, the
minimum log size is larger than 512 blocks.
version=value
This specifies the version of the log. The current default is 2, which allows for larger
log buffer sizes, as well as supporting stripe-aligned log writes (see the sunit and su
options, below).
The previous version 1, which is limited to 32k log buffers and does not support
stripe-aligned writes, is kept for backwards compatibility with very old 2.4 kernels.
sunit=value
This specifies the alignment to be used for log writes. The value has to be specified in
512-byte block units. Use the su suboption to specify the log stripe unit size in bytes.
Log writes will be aligned on this boundary, and rounded up to this boundary. This
gives major improvements in performance on some configurations such as software
RAID5 when the sunit is specified as the filesystem block size. The equivalent byte
value must be a multiple of the filesystem block size. Version 2 logs are
automatically selected if the log sunit suboption is specified.
The su suboption is an alternative to using sunit.
su=value
This is used to specify the log stripe. The value has to be specified in bytes, (usually
using the s or b suffixes). This value must be a multiple of the filesystem block size.
Version 2 logs are automatically selected if the log su suboption is specified.
lazy-count=value
This changes the method of logging various persistent counters in the superblock.
Under metadata intensive workloads, these counters are updated and logged
frequently enough that the superblock updates become a serialization point in the
filesystem. The value can be either 0 or 1.
With lazy-count=1, the superblock is not modified or logged on every change of the
persistent counters. Instead, enough information is kept in other parts of the
filesystem to be able to maintain the persistent counter values without needing to keep
them in the superblock. This gives significant improvements in performance on some
configurations. The default value is 1 (on) so you must specify lazy-count=0 if you
want to disable this feature for older kernels which don’t support it.
-n naming_options
These options specify the version and size parameters for the naming (directory) area
of the filesystem. The valid naming_options are:
size=value | log=value
The block size is specified either as a value in bytes with size=, or as a base two
logarithm value with log=. The block size must be a power of 2 and cannot be less
than the filesystem block size. The default size value for version 2 directories is 4096
bytes (4 KiB), unless the filesystem block size is larger than 4096, in which case the
default value is the filesystem block size. For version 1 directories the block size is
the same as the filesystem block size.
version=value
The naming (directory) version value can be either 2 or ‘ci’, defaulting to 2 if
unspecified. With version 2 directories, the directory block size can be any power of
2 size from the filesystem block size up to 65536.
The version=ci option enables ASCII only case-insensitive filename lookup and
version 2 directories. Filenames are case-preserving, that is, the names are stored in
directories using the case they were created with.
Note: Version 1 directories are not supported.
ftype=value
This feature allows the inode type to be stored in the directory structure so that the
readdir(3) and getdents(2) do not need to look up the inode to determine the inode
type.
The value is either 0 or 1, with 1 signifying that filetype information will be stored in
the directory structure. The default value is 0.
When CRCs are enabled via -m crc=1, the ftype functionality is always enabled. This
feature cannot be turned off for such filesystem configurations.
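Putting the metadata options together: a hedged one-line sketch (the device name /dev/sdX1 is a placeholder) that enables CRCs along with the free inode btree, which as noted in the -m section requires crc=1. The command is printed rather than executed:

```shell
# finobt=1 requires crc=1 (see the -m global_metadata_options section above);
# ftype is then always enabled implicitly. /dev/sdX1 is a hypothetical device.
cmd="mkfs.xfs -m crc=1,finobt=1 /dev/sdX1"
echo "$cmd"
```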
-p protofile
If the optional -p protofile argument is given, mkfs.xfs uses protofile as a prototype
file and takes its directions from that file. The blocks and inodes specifiers in the
protofile are provided for backwards compatibility, but are otherwise unused. The
syntax of the protofile is defined by a number of tokens separated by spaces or
newlines. Note that the line numbers are not part of the syntax but are meant to help
you in the following discussion of the file contents.
1 /stand/diskboot
2 4872 110
3 d--777 3 1
4 usr d--777 3 1
5 sh ---755 3 1 /bin/sh
6 ken d--755 6 1
7 $
8 b0 b--644 3 1 0 0
9 c0 c--644 3 1 0 0
10 fifo p--644 3 1
11 slink l--644 3 1 /a/symbolic/link
12 : This is a comment line
13 $
14 $
Line 1 is a dummy string. (It was formerly the bootfilename.) It is present for
backward compatibility; boot blocks are not used on SGI systems. Note that some
string of characters must be present as the first line of the proto file to cause it to be
parsed correctly; the value of this string is immaterial since it is ignored. Line 2
contains two numeric values (formerly the numbers of blocks and inodes). These are
also merely for backward compatibility: two numeric values must appear at this point
for the proto file to be correctly parsed, but their values are immaterial since they are
ignored. Lines 3 through 11 specify the files and directories you want to include in
this filesystem. Line 3 defines the root directory. Other directories and files that you
want in the filesystem are indicated by lines 4 through 6 and lines 8 through 10. Line
11 contains symbolic link syntax. Notice the dollar sign ($) syntax on line 7. This
syntax directs the mkfs.xfs command to terminate the branch of the filesystem it is
currently on and then continue from the directory specified by the next line, in this
case line 8. It must be the last character on a line. The colon on line 12 introduces a
comment; all characters up until the following newline are ignored. Note that this
means you cannot have a file in a prototype file whose name contains a colon. The $
on lines 13 and 14 end the process, since no additional specifications follow.
File specifications provide the following:
* file mode
* user ID
* group ID
* the file’s beginning contents
A 6-character string defines the mode for a file. The first character of this string
defines the file type. The character range for this first character is -bcdpl. A file may
be a regular file (-), a block special file (b), a character special file (c), a directory
(d), a named pipe (p; a first-in, first-out file), or a symbolic link (l). The second character of the mode
string is used to specify setuserID mode, in which case it is u. If setuserID mode is
not specified, the second character is -. The third character of the mode string is used
to specify the setgroupID mode, in which case it is g. If setgroupID mode is not
specified, the third character is -. The remaining characters of the mode string are a
three digit octal number. This octal number defines the owner, group, and other read,
write, and execute permissions for the file, respectively. For more information on file
permissions, see the chmod(1) command.
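The 6-character mode string described above can be sketched as a small parser. The function below is purely illustrative (it is not part of mkfs.xfs); the type letters follow the -bcdpl range given in the text:

```python
# Illustrative sketch: decode a 6-character protofile mode string
# (file type, setuid flag, setgid flag, octal permissions).
FILE_TYPES = {
    "-": "regular", "b": "block special", "c": "character special",
    "d": "directory", "p": "named pipe", "l": "symbolic link",
}

def parse_mode(mode: str) -> dict:
    if len(mode) != 6 or mode[0] not in FILE_TYPES:
        raise ValueError("bad mode string: %r" % mode)
    return {
        "type": FILE_TYPES[mode[0]],
        "setuid": mode[1] == "u",   # second char: 'u' or '-'
        "setgid": mode[2] == "g",   # third char: 'g' or '-'
        "perm": int(mode[3:], 8),   # three octal digits
    }
```

For example, parse_mode("d--777") describes the root directory entry on line 3 of the sample protofile.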
Following the mode character string are two decimal number tokens that specify the
user and group IDs of the file’s owner.
In a regular file, the next token specifies the pathname from which the contents and
size of the file are copied. In a block or character special file, the next tokens are two
decimal numbers that specify the major and minor device numbers. When a file is a
symbolic link, the next token specifies the contents of the link.
When the file is a directory, the mkfs.xfs command creates the entries dot (.) and
dot-dot (..) and then reads the list of names and file specifications in a recursive
manner for all of the entries in the directory. A scan of the protofile is always
terminated with the dollar ( $ ) token.
-q
Quiet option. Normally mkfs.xfs prints the parameters of the filesystem to be
constructed; the -q flag suppresses this.
-r realtime_section_options
These options specify the location, size, and other parameters of the real-time section
of the filesystem. The valid realtime_section_options are:
rtdev=device
This is used to specify the device which should contain the real-time section of the
filesystem. The suboption value is the name of a block device.
extsize=value
This is used to specify the size of the blocks in the real-time section of the filesystem.
This value must be a multiple of the filesystem block size. The minimum allowed
size is the filesystem block size or 4 KiB (whichever is larger); the default size is the
stripe width for striped volumes or 64 KiB for non-striped volumes; the maximum
allowed size is 1 GiB. The real-time extent size should be carefully chosen to match
the parameters of the physical media used.
size=value
This is used to specify the size of the real-time section. This suboption is only needed
if the real-time section of the filesystem should occupy less space than the size of the
partition or logical volume containing the section.
noalign
This option disables stripe size detection, enforcing a realtime device with no stripe
geometry.
-s sector_size
This option specifies the fundamental sector size of the filesystem. The sector_size is
specified either as a value in bytes with size=value or as a base two logarithm value
with log=value. The default sector_size is 512 bytes. The minimum value for sector
size is 512; the maximum is 32768 (32 KiB). The sector_size must be a power of 2
size and cannot be made larger than the filesystem block size.
-L label
Set the filesystem label. XFS filesystem labels can be at most 12 characters long; if
label is longer than 12 characters, mkfs.xfs will not proceed with creating the
filesystem. Refer to the mount(8) and xfs_admin(8) manual entries for additional
information.
-N
Causes the file system parameters to be printed out without really creating the file
system.
-K
Do not attempt to discard blocks at mkfs time.
-V
Prints the version number and exits.
› SEE ALSO
xfs(5), mkfs(8), mount(8), xfs_info(8), xfs_admin(8).
› BUGS
With a prototype file, it is not possible to specify hard links.
MKHOMEDIR_HELPER
› NAME
mkhomedir_helper - Helper binary that creates home directories
› SYNOPSIS
mkhomedir_helper {user} [umask [ path-to-skel ]]
› DESCRIPTION
mkhomedir_helper
is a helper program for the pam_mkhomedir module that creates home directories and
populates them with contents of the specified skel directory.
The default value of umask is 0022 and the default value of path-to-skel is /etc/skel.
The helper is separated from the module to not require direct access from login
SELinux domains to the contents of user home directories. The SELinux domain
transition happens when the module is executing the mkhomedir_helper.
The helper never touches home directories if they already exist.
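The behavior described above (populate a new home from a skel directory, apply the umask, never touch an existing home) can be sketched in Python. This is a simplified illustration of the idea, not the helper's actual implementation; the real helper also handles ownership and SELinux contexts:

```python
# Hedged sketch of mkhomedir_helper's core behavior: create a home
# directory from a skel directory, applying a umask to new entries,
# and leaving an already-existing home directory untouched.
import os
import shutil

def make_home(home: str, skel: str = "/etc/skel", umask: int = 0o022) -> bool:
    if os.path.exists(home):
        return False                          # never touch existing homes
    os.makedirs(home, mode=0o777 & ~umask)
    for name in os.listdir(skel):
        src, dst = os.path.join(skel, name), os.path.join(home, name)
        if os.path.isdir(src):
            shutil.copytree(src, dst)
        else:
            shutil.copy(src, dst)
        os.chmod(dst, os.stat(dst).st_mode & ~umask & 0o777)
    return True
```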
› SEE ALSO
pam_mkhomedir(8)
› AUTHOR
Written by Tomas Mraz based on the code originally in pam_mkhomedir module.
MKINITRD
› NAME
mkinitrd - compatibility wrapper that calls dracut to generate an initramfs
› SYNOPSIS
mkinitrd [OPTION…] [<initrd-image>] <kernel-version>
› DESCRIPTION
mkinitrd creates an initramfs image <initrd-image> for the kernel with version
<kernel-version> by calling “dracut”.
Important
If finer-grained control over the resulting image is needed, “dracut” should be
called directly.
› OPTIONS
—version
print info about the version
-v, —verbose
increase verbosity level
-f, —force
overwrite existing initramfs file.
—image-version
append the kernel version to the target image <initrd-image>-<kernel-version>.
—with=<module>
add the kernel module <module> to the initramfs.
—preload=<module>
preload the kernel module <module> in the initramfs before any other kernel
modules are loaded. This can be used to enforce a certain device naming, although
relying on module load order for naming is discouraged; using symbolic links in /dev
is the preferred approach.
—nocompress
do not compress the resulting image.
—help
print a help message and exit.
› AVAILABILITY
The mkinitrd command is part of the dracut package and is available from
https://dracut.wiki.kernel.org
› AUTHORS
Harald Hoyer
› SEE ALSO
dracut(8)
MKSWAP
› NAME
mkswap - set up a Linux swap area
› SYNOPSIS
mkswap [options] device [size]
› DESCRIPTION
mkswap sets up a Linux swap area on a device or in a file.
The device argument will usually be a disk partition (something like /dev/sdb7) but
can also be a file. The Linux kernel does not look at partition IDs, but many
installation scripts will assume that partitions of hex type 82 (LINUX_SWAP) are
meant to be swap partitions. (Warning: Solaris also uses this type. Be careful not
to kill your Solaris partitions.)
The size parameter is superfluous but retained for backwards compatibility. (It
specifies the desired size of the swap area in 1024-byte blocks. mkswap will use the
entire partition or file if it is omitted. Specifying it is unwise — a typo may destroy
your disk.)
After creating the swap area, you need the swapon command to start using it.
Usually swap areas are listed in /etc/fstab so that they can be taken into use at boot
time by a swapon -a command in some boot script.
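As noted above, swap areas are normally listed in /etc/fstab so swapon -a activates them at boot. A typical entry looks like the following (the device name is an example; your partition will differ):

```
# <device>   <mount point>  <type>  <options>   <dump>  <pass>
/dev/sdb7    none           swap    defaults    0       0
```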
› WARNING
The swap header does not touch the first block. A boot loader or disk label can be
there, but it is not a recommended setup. The recommended setup is to use a separate
partition for a Linux swap area.
mkswap, like many other mkfs-like utilities, erases the first partition block to make
any previous filesystem invisible.
However, mkswap refuses to erase the first block on a device with a disk label
(SUN, BSD, …).
› OPTIONS
-c, —check
Check the device (if it is a block device) for bad blocks before creating the swap area.
If any bad blocks are found, the count is printed.
-f, —force
Go ahead even if the command is stupid. This allows the creation of a swap area
larger than the file or partition it resides on.
Also, without this option, mkswap will refuse to erase the first block on a device
with a partition table.
-L, —label label
Specify a label for the device, to allow swapon by label.
-p, —pagesize size
Specify the page size (in bytes) to use. This option is usually unnecessary; mkswap
reads the size from the kernel.
-U, —uuid UUID
Specify the UUID to use. The default is to generate a UUID.
-v, —swapversion 1
Specify the swap-space version. (This option is currently pointless, as the old -v 0
option has become obsolete and now only -v 1 is supported. The kernel has not
supported v0 swap-space format since 2.5.22 (June 2002). The new version v1 is
supported since 2.1.117 (August 1998).)
-h, —help
Display help text and exit.
-V, —version
Display version information and exit.
› NOTES
The maximum useful size of a swap area depends on the architecture and the kernel
version. It is roughly 2GiB on i386, PPC, m68k and ARM, 1GiB on sparc, 512MiB
on mips, 128GiB on alpha, and 3TiB on sparc64. For kernels after 2.3.3 (May 1999)
there is no such limitation.
Note that before version 2.1.117 the kernel allocated one byte for each page, while it
now allocates two bytes, so that taking into use a swap area of 2 GiB might require 2
MiB of kernel memory.
Presently, Linux allows 32 swap areas (this was 8 before Linux 2.4.10 (Sep 2001)).
The areas in use can be seen in the file /proc/swaps (since 2.1.25 (Sep 1997)).
mkswap refuses areas smaller than 10 pages.
If you don’t know the page size that your machine uses, you may be able to look it up
with “cat /proc/cpuinfo” (or you may not — the contents of this file depend on
architecture and kernel version).
To set up a swap file, it is necessary to create that file before initializing it with
mkswap, e.g. using a command like
# fallocate --length 8GiB swapfile
Note that a swap file must not contain any holes (so, using cp(1) to create the file is not
acceptable).
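The no-holes requirement can be checked from a file's allocation information: a sparse file reports fewer allocated blocks than its apparent size needs. A hedged Python sketch (illustrative only; in practice you would use fallocate or dd as shown above):

```python
# Sketch: create a hole-free file by writing real zeros, and detect
# holes by comparing allocated blocks (st_blocks, 512-byte units)
# against the file size.
import os

def has_holes(path: str) -> bool:
    st = os.stat(path)
    return st.st_blocks * 512 < st.st_size

def make_swapfile(path: str, size: int) -> None:
    with open(path, "wb") as f:
        chunk = b"\0" * 65536
        remaining = size
        while remaining:
            n = min(remaining, len(chunk))
            f.write(chunk[:n])      # real zeros, so no holes are created
            remaining -= n
        f.flush()
        os.fsync(f.fileno())
    os.chmod(path, 0o600)           # swap files should not be world-readable
```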
› ENVIRONMENT
LIBBLKID_DEBUG=0xffff
enables debug output.
› SEE ALSO
fdisk(8), swapon(8)
› AVAILABILITY
The mkswap command is part of the util-linux package and is available from
ftp://ftp.kernel.org/pub/linux/utils/util-linux/.
mmcli
› NAME
mmcli - Control and monitor the ModemManager
› SYNOPSIS
mmcli [OPTION…]
› DESCRIPTION
ModemManager is a DBus-powered Linux daemon which provides a unified high
level API for communicating with (mobile broadband) modems. It acts as a standard
RIL (Radio Interface Layer) and may be used by different connection managers, like
NetworkManager. Thanks to the built-in plugin architecture, ModemManager talks to
very different kinds of modems with very different kinds of ports. In addition to the
standard AT serial ports, Qualcomm-based QCDM and QMI ports are also supported.
› HELP OPTIONS
-h, —help
Show summary of options by group.
—help-all
Show all groups and options.
—help-manager
Show manager specific options.
—help-common
Show common options. These are used for defining the device an option operates on.
For example, modems, bearers, SIMs, SMS’, etc.
—help-modem
Show modem specific options.
—help-3gpp
Show 3GPP specific options.
—help-cdma
Show CDMA specific options.
—help-simple
Show simple options. These are useful for getting connected or disconnected and
understanding the state of things as fast as possible without worrying so much about
the details.
—help-location
Show location or positioning specific options.
—help-messaging
Show messaging specific options. See also —help-sms which is related.
—help-time
Show time specific options.
—help-firmware
Show firmware specific options.
—help-oma
Show OMA specific options.
—help-sim
Show SIM card specific options.
—help-bearer
Show bearer specific options.
—help-sms
Show SMS specific options. See also —help-messaging which is related.
› MANAGER OPTIONS
-G, —set-logging=[ERR|WARN|INFO|DEBUG]
Set the logging level in ModemManager daemon. For debugging information you can
supply DEBUG. Each value above DEBUG provides less detail. In most cases ERR
(for displaying errors) shows the important messages.
The default mode is ERR.
-L, —list-modems
List available modems.
-M, —monitor-modems
List available modems and monitor modems added or removed.
-S, —scan-modems
Scan for any potential new modems. This is only useful when expecting pure RS232
modems, as they are not notified automatically by the kernel.
› COMMON OPTIONS
All options below take a PATH or INDEX argument. If no action is provided, the
default information about the modem, bearer, etc. is shown instead.
The PATH and INDEX are created automatically when the modem is plugged in.
They can be found using mmcli -L. This produces something like (for modems
only):

    Found 1 modems:
        /org/freedesktop/ModemManager1/Modem/4

In this case, the INDEX is 4 and the PATH is the entire string above.
However, for the bearers, SIMs and SMS cases, the PATH is slightly different. The
Modem is replaced with the object name in use, like Bearer. For example:
/org/freedesktop/ModemManager1/Bearer/4
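The relationship between PATH and INDEX is simple string handling: the INDEX is the final path component. A hypothetical helper (not part of mmcli) makes this concrete:

```python
# Sketch: derive the numeric INDEX from a ModemManager D-Bus object
# PATH such as /org/freedesktop/ModemManager1/Modem/4.
def object_index(path: str) -> int:
    last = path.rstrip("/").rsplit("/", 1)[-1]
    if not last.isdigit():
        raise ValueError("not a ModemManager object path: %r" % path)
    return int(last)
```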
-m, —modem=[PATH|INDEX]
Specify a modem.
-b, —bearer=[PATH|INDEX]
Specify a bearer.
-i, —sim=[PATH|INDEX]
Specify a SIM card.
-s, —sms=[PATH|INDEX]
Specify an SMS.
› MODEM OPTIONS
All of the modem options below make use of the —modem or -m switch to specify
the modem to act on.
Some operations require a MODE. MODE can be any combination of the modes
actually supported by the modem. In the perfect case, the following are possible:

    2G  - 2G technologies, e.g. EDGE, CDMA1x
    3G  - 3G technologies, e.g. HSPA, EV-DO
    4G  - 4G technologies, e.g. LTE
    ANY - for all supported modes.
-w, —monitor-state
Monitor the state of a given modem.
-e, —enable
Enable a given modem.
This powers the antenna, starts the automatic registration process and in general
prepares the modem to be connected.
-d, —disable
Disable a given modem.
This disconnects the existing connection(s) for the modem and puts it into a low
power mode.
-r, —reset
Resets the modem to the settings it had when it was power cycled.
—factory-reset=CODE
Resets the modem to its original factory default settings.
The CODE provided is vendor specific. Without the correct vendor code, it’s
unlikely this operation will succeed. This is not a common user action.
—command=COMMAND
Send an AT COMMAND to the given modem. For example, COMMAND could be
‘AT+GMM’ to probe for phone model information. This operation is only available
when ModemManager is run in debug mode.
—list-bearers
List packet data bearers that are available for the given modem.
—create-bearer=[‘KEY1=VALUE1,KEY2=VALUE2,…’]
Create a new packet data bearer for a given modem. The KEYs and some VALUEs
are listed below:
apn
Access Point Name. Required in 3GPP.
ip-type
Addressing type. Given as a MMBearerIpFamily value (e.g. ‘ipv4’, ‘ipv6’, ‘ipv4v6’).
Optional in 3GPP and CDMA.
allowed-auth
Authentication method to use. Given as a MMBearerAllowedAuth value (e.g.
‘none|pap|chap|mschap|mschapv2|eap’). Optional in 3GPP.
user
User name (if any) required by the network. Optional in 3GPP.
password
Password (if any) required by the network. Optional in 3GPP.
allow-roaming
Flag to tell whether connection is allowed during roaming, given as a boolean value
(i.e. ‘yes’ or ‘no’). Optional in 3GPP.
rm-protocol
Protocol of the Rm interface, given as a MMModemCdmaRmProtocol value (e.g.
‘async’, ‘packet-relay’, ‘packet-network-ppp’, ‘packet-network-slip’, ‘stu-iii’).
Optional in CDMA.
number
Telephone number to dial. Required in POTS.
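The KEY=VALUE string passed to —create-bearer can be assembled programmatically. The sketch below uses the key names from the list above; the helper function itself is hypothetical, not part of mmcli:

```python
# Sketch: build the 'KEY1=VALUE1,KEY2=VALUE2,...' argument for
# mmcli --create-bearer from keyword arguments.
def bearer_properties(**props) -> str:
    allowed = {"apn", "ip-type", "allowed-auth", "user", "password",
               "allow-roaming", "rm-protocol", "number"}
    pairs = []
    for key, value in props.items():
        key = key.replace("_", "-")   # accept pythonic underscore names
        if key not in allowed:
            raise ValueError("unknown bearer property: %r" % key)
        pairs.append("%s=%s" % (key, value))
    return ",".join(pairs)
```

For example, bearer_properties(apn="internet", ip_type="ipv4v6") yields a string suitable for mmcli -m 0 --create-bearer="…".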
—delete-bearer=PATH
Delete a bearer from a given modem. This option explicitly uses a PATH to define the
bearer; an INDEX cannot be used here.
—set-allowed-modes=[MODE1|MODE2|…]
Set allowed modes for a given modem. For possible modes, see the beginning of this
section.
—set-bands=[BAND1|BAND2|…]
Set bands to be used for a given modem. These are frequency ranges the modem
should use. There are quite a number of supported bands, and listing them all here
would be too extensive. For details, see the MMModemBand documentation.
An example would be: ‘egsm|dcs|pcs|g850’ to select all the GSM frequency bands.
—set-preferred-mode=MODE
Set the preferred MODE for the given modem. The MODE must be one of the allowed
modes as set with the —set-allowed-modes option. Possible MODE arguments are
detailed at the beginning of this section.
› 3GPP OPTIONS
The 3rd Generation Partnership Project (3GPP) is a collaboration between groups of
telecommunications associations. These options pertain to devices which support
3GPP.
Included are options to control USSD (Unstructured Supplementary Service Data)
sessions.
All of the 3GPP options below make use of the —modem or -m switch to specify the
modem to act on.
—3gpp-scan
Scan for available 3GPP networks.
—3gpp-register-home
Request a given modem to register in its home network.
This registers with the default network(s) specified by the modem.
—3gpp-register-in-operator=MCCMNC
Request a given modem to register on the network of the given MCCMNC (Mobile
Country Code, Mobile Network Code) based operator. This code is used for
GSM/LTE, CDMA, iDEN, TETRA and UMTS public land mobile networks and
some satellite mobile networks. The ITU-T Recommendation E.212 defines mobile
country codes.
—3gpp-ussd-status
Request the status of ANY ongoing USSD session.
—3gpp-ussd-initiate=COMMAND
Request the given modem to initiate a USSD session with COMMAND.
For example, COMMAND could be ‘*101#’ to give your current pre-pay balance.
—3gpp-ussd-respond=RESPONSE
When a USSD session is ongoing, a network-originated request may require a
RESPONSE. This option allows for that.
—3gpp-ussd-cancel
Cancel an ongoing USSD session for a given modem.
› CDMA OPTIONS
All CDMA (Code Division Multiple Access) options require the —modem or -m
option.
—cdma-activate=CARRIER
Activate the given modem using OTA (Over the Air) settings. The CARRIER is a
code provided by the network for the default settings they provide.
› SIMPLE OPTIONS
All simple options must be used with —modem or -m.
—simple-connect=[‘KEY1=VALUE1,KEY2=VALUE2,…’]
Run a full connection sequence using KEY / VALUE pairs. You can use the
—create-bearer options, plus any of the following ones:
pin
SIM-PIN unlock code.
operator-id
ETSI MCC-MNC of a network to force registration.
—create-file-with-data=PATH
This option takes an SMS that has DATA (not TEXT) and will create a local file
described by PATH and store the content of the SMS there.
› APPLICATION OPTIONS
-v, —verbose
Perform actions with more details reported and/or logged.
-V, —version
Returns the version of this program.
-a, —async
Use asynchronous methods. This is purely a development tool and has no practical
benefit to most user operations.
—timeout=SECONDS
Use SECONDS for the timeout when performing operations with this command.
This option is useful when executing long running operations, like —3gpp-scan.
› EXAMPLES
Send the PIN to the SIM card
First you need to know the proper path/index for the SIM in your modem:

    $ mmcli -m 0 | grep SIM
      SIM | path: '/org/freedesktop/ModemManager1/SIM/0'

And after that, you can just use the SIM index:

    $ mmcli -i 0 --pin=1234
    successfully sent PIN code to the SIM

You can launch the simple connection process like:

    $ mmcli -m 0 --simple-connect="pin=1234,apn=internet"
    successfully connected the modem

Scanning for 3GPP networks may really take a long time, so a specific timeout must
be given:

    $ mmcli -m 0 --3gpp-scan --timeout=300
    Found 4 networks:
        21404 - Yoigo (umts, available)
        21407 - Movistar (umts, current)
        21401 - vodafone ES (umts, forbidden)
        21403 - Orange (umts, forbidden)
When the receiver gets all the parts of the message, they can recover the sent file
with another mmcli command in their ModemManager setup:

    $ sudo mmcli -m 0 --messaging-list-sms
    Found 1 SMS messages:
        /org/freedesktop/ModemManager1/SMS/0 (received)

    $ sudo mmcli -s 0 --create-file-with-data=/path/to/the/output/file
You first need to check whether the modem has GPS-specific location capabilities.
Note that we’ll assume the modem is exposed as index 0; if you have more than one
modem, just use —list-modems to check the proper modem index:
    $ mmcli -m 0 --location-status
    /org/freedesktop/ModemManager1/Modem/0
      ----------------------------
      Location | capabilities: '3gpp-lac-ci, gps-raw, gps-nmea'
               |      enabled: 'none'
               |      signals: 'no'
The output says that the modem supports 3GPP Location area code/Cell ID, GPS raw
and GPS-NMEA location sources. None is enabled yet, as we didn’t enable the
modem, which we can do by issuing:

    $ sudo mmcli -m 0 --enable
    successfully enabled the modem

    $ mmcli -m 0 --location-status
    /org/freedesktop/ModemManager1/Modem/0
      ----------------------------
      Location | capabilities: '3gpp-lac-ci, gps-raw, gps-nmea'
               |      enabled: '3gpp-lac-ci'
               |      signals: 'no'
We can enable the RAW and NMEA GPS location sources using:
    $ sudo mmcli -m 0 \
        --location-enable-gps-raw \
        --location-enable-gps-nmea
    successfully setup location gathering
If we check the status again, we’ll see the GPS-specific locations are enabled:

    $ mmcli -m 0 --location-status
    /org/freedesktop/ModemManager1/Modem/0
      ----------------------------
      Location | capabilities: '3gpp-lac-ci, gps-raw, gps-nmea'
               |      enabled: '3gpp-lac-ci, gps-raw, gps-nmea'
               |      signals: 'no'
MODINFO
› NAME
modinfo - show information about a Linux Kernel module
› OPTIONS
-F, —field
Only print this field value, one per line. This is most useful for scripts. Field names
are case-insensitive. Common fields (which may not be in every module) include
author, description, license, parm, depends, and alias. There are often multiple parm,
alias and depends fields. The special field filename lists the filename of the module.
-b basedir, —basedir basedir
Root directory for modules, / by default.
-k kernel
Provide information about a kernel other than the running one. This is particularly
useful for distributions needing to extract information from a newly installed (but not
yet running) set of kernel modules. For example, you wish to find which firmware
files are needed by various modules in a new kernel for which you must make an
initrd/initramfs image prior to booting.
-0, —null
Use the ASCII zero character to separate field values, instead of a new line. This is
useful for scripts, since a new line can theoretically appear inside a field.
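The null-separated output produced by -0 is easy to consume from scripts. A Python sketch of splitting such output (the sample byte string is made up, not captured from a real module):

```python
# Sketch: split null-separated `modinfo -0 -F <field> <module>` style
# output into a list of field values.
def split_null_fields(raw: bytes) -> list:
    return [f.decode() for f in raw.split(b"\0") if f]

sample = b"pci:v00008086d*\0usb:v0BDAp8179*\0"  # made-up alias values
fields = split_null_fields(sample)
```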
-a —author, -d —description, -l —license, -p —parameters, -n —filename
These are shortcuts for the —field flag’s author, description, license, parm and
filename arguments, to ease the transition from the old modutils modinfo.
› COPYRIGHT
This manual page originally Copyright 2003, Rusty Russell, IBM Corporation.
Maintained by Jon Masters and others.
› SEE ALSO
modprobe(8)
› AUTHORS
Jon Masters <[email protected]>
Developer
MODPROBE
› NAME
modprobe - Add and remove modules from the Linux Kernel
› OPTIONS
-b, —use-blacklist
This option causes modprobe to apply the blacklist commands in the configuration
files (if any) to module names as well. It is usually used by udev(7).
-C, —config
This option overrides the default configuration directory (/etc/modprobe.d).
This option is passed through install or remove commands to other modprobe
commands in the MODPROBE_OPTIONS environment variable.
-c, —showconfig
Dump out the effective configuration from the config directory and exit.
—dump-modversions
Print out a list of module versioning information required by a module. This option is
commonly used by distributions in order to package up a Linux kernel module using
module versioning deps.
-d, —dirname
Root directory for modules, / by default.
—first-time
Normally, modprobe will succeed (and do nothing) if told to insert a module which
is already present or to remove a module which isn’t present. This is ideal for simple
scripts; however, more complicated scripts often want to know whether modprobe
really did something: this option makes modprobe fail in the case that it actually
didn’t do anything.
—force-vermagic
Every module contains a small string containing important information, such as the
kernel and compiler versions. If a module fails to load and the kernel complains that
the “version magic” doesn’t match, you can use this option to remove it. Naturally,
this check is there for your protection, so using this option is dangerous unless you
know what you’re doing.
This applies to any modules inserted: both the module (or alias) on the command line
and any modules on which it depends.
—force-modversion
When modules are compiled with CONFIG_MODVERSIONS set, a section detailing
the versions of every interface used by (or supplied by) the module is created. If a
module fails to load and the kernel complains that the module disagrees about a
version of some interface, you can use “—force-modversion” to remove the version
information altogether. Naturally, this check is there for your protection, so using this
option is dangerous unless you know what you’re doing.
This applies to any modules inserted: both the module (or alias) on the command line
and any modules on which it depends.
-f, —force
Try to strip any versioning information from the module which might otherwise stop
it from loading: this is the same as using both —force-vermagic and —force-
modversion. Naturally, these checks are there for your protection, so using this
option is dangerous unless you know what you are doing.
This applies to any modules inserted: both the module (or alias) on the command line
and any modules on which it depends.
-i, —ignore-install, —ignore-remove
This option causes modprobe to ignore install and remove commands in the
configuration file (if any) for the module specified on the command line (any
dependent modules are still subject to commands set for them in the configuration
file). Both install and remove commands will currently be ignored when this option
is used regardless of whether the request was more specifically made with only one
or other (and not both) of —ignore-install or —ignore-remove. See modprobe.d(5).
-n, —dry-run, —show
This option does everything but actually insert or delete the modules (or run the
install or remove commands). Combined with -v, it is useful for debugging problems.
For historical reasons both —dry-run and —show actually mean the same thing and
are interchangeable.
-q, —quiet
With this flag, modprobe won’t print an error message if you try to remove or insert
a module it can’t find (and isn’t an alias or install/remove command). However, it
will still return with a non-zero exit status. The kernel uses this to opportunistically
probe for modules which might exist using request_module.
-R, —resolve-alias
Print all module names matching an alias. This can be useful for debugging module
alias problems.
-r, —remove
This option causes modprobe to remove rather than insert a module. If the modules
it depends on are also unused, modprobe will try to remove them too. Unlike
insertion, more than one module can be specified on the command line (it does not
make sense to specify module parameters when removing modules).
There is usually no reason to remove modules, but some buggy modules require it.
Your distribution kernel may not have been built to support removal of modules at
all.
-S, —set-version
Set the kernel version, rather than using uname(2) to decide on the kernel version
(which dictates where to find the modules).
—show-depends
List the dependencies of a module (or alias), including the module itself. This
produces a (possibly empty) set of module filenames, one per line, each starting with
“insmod” and is typically used by distributions to determine which modules to
include when generating initrd/initramfs images. Install commands which apply are
shown prefixed by “install”. It does not run any of the install commands. Note that
modinfo(8) can be used to extract dependencies of a module from the module itself,
but knows nothing of aliases or install commands.
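The —show-depends output format described above (one "insmod" or "install" line per dependency) can be parsed like this; the sample input in the test is illustrative, not captured from a real system:

```python
# Sketch: split `modprobe --show-depends <module>` style output into
# plain module insertions and install commands.
def parse_show_depends(text: str):
    insmods, installs = [], []
    for line in text.splitlines():
        if line.startswith("insmod "):
            insmods.append(line.split(None, 1)[1])
        elif line.startswith("install "):
            installs.append(line.split(None, 1)[1])
    return insmods, installs
```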
-s, —syslog
This option causes any error messages to go through the syslog mechanism (as
LOG_DAEMON with level LOG_NOTICE) rather than to standard error. This is
also automatically enabled when stderr is unavailable.
This option is passed through install or remove commands to other modprobe
commands in the MODPROBE_OPTIONS environment variable.
-V, —version
Show version of program and exit.
-v, —verbose
Print messages about what the program is doing. Usually modprobe only prints
messages if something goes wrong.
This option is passed through install or remove commands to other modprobe
commands in the MODPROBE_OPTIONS environment variable.
› ENVIRONMENT
The MODPROBE_OPTIONS environment variable can also be used to pass
arguments to modprobe.
› COPYRIGHT
This manual page originally Copyright 2002, Rusty Russell, IBM Corporation.
Maintained by Jon Masters and others.
› SEE ALSO
modprobe.d(5), insmod(8), rmmod(8), lsmod(8), modinfo(8)
› AUTHORS
Jon Masters <[email protected]>
Developer
MOUNT.CIFS
› NAME
mount.cifs - mount using the Common Internet File System (CIFS)
› OPTIONS
password=arg
specifies the CIFS password. If this option is not given then the environment variable
PASSWD is used. If the password is not specified directly or indirectly via an
argument to mount, mount.cifs will prompt for a password, unless the guest option is
specified.
Note that a password which contains the delimiter character (i.e. a comma ‘,’) will
fail to be parsed correctly on the command line. However, the same password defined
in the PASSWD environment variable or via a credentials file (see below) or entered
at the password prompt will be read correctly.
credentials=filename
specifies a file that contains a username and/or password and optionally the name of
the workgroup. The format of the file is:
    username=value
    password=value
    domain=value
This is preferred over having passwords in plaintext in a shared file, such as /etc/fstab. Be
sure to protect any credentials file properly.
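A concrete, made-up example of a credentials file and a matching /etc/fstab line (the share name, path, and values are illustrative):

```
# /etc/cifs-credentials  (protect it, e.g. chmod 600)
username=jdoe
password=secret
domain=WORKGROUP

# matching /etc/fstab line:
# //server/share  /mnt/share  cifs  credentials=/etc/cifs-credentials  0  0
```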
uid=arg
sets the uid that will own all files or directories on the mounted filesystem when the
server does not provide ownership information. It may be specified as either a
username or a numeric uid. When not specified, the default is uid 0. The mount.cifs
helper must be at version 1.10 or higher to support specifying the uid in non-numeric
form. See the section on FILE AND DIRECTORY OWNERSHIP AND
PERMISSIONS below for more information.
forceuid
instructs the client to ignore any uid provided by the server for files and directories
and to always assign the owner to be the value of the uid= option. See the section on
FILE AND DIRECTORY OWNERSHIP AND PERMISSIONS below for more
information.
cruid=arg
sets the uid of the owner of the credentials cache. This is primarily useful with
sec=krb5. The default is the real uid of the process performing the mount. Setting this
parameter directs the upcall to look for a credentials cache owned by that user.
gid=arg
sets the gid that will own all files or directories on the mounted filesystem when the
server does not provide ownership information. It may be specified as either a
groupname or a numeric gid. When not specified, the default is gid 0. The mount.cifs
helper must be at version 1.10 or higher to support specifying the gid in non-numeric
form. See the section on FILE AND DIRECTORY OWNERSHIP AND
PERMISSIONS below for more information.
forcegid
instructs the client to ignore any gid provided by the server for files and directories
and to always assign the owner to be the value of the gid= option. See the section on
FILE AND DIRECTORY OWNERSHIP AND PERMISSIONS below for more
information.
port=arg
sets the port number on which the client will attempt to contact the CIFS server. If
this value is specified, look for an existing connection with this port, and use that if
one exists. If one doesn’t exist, try to create a new connection on that port. If that
connection fails, return an error. If this value isn’t specified, look for an existing
connection on port 445 or 139. If no such connection exists, try to connect on port
445 first and then port 139 if that fails. Return an error if both fail.
servernetbiosname=arg
Specify the server netbios name (RFC1001 name) to use when attempting to setup a
session to the server. Although rarely needed for mounting to newer servers, this
option is needed for mounting to some older servers (such as OS/2 or Windows 98
and Windows ME) since when connecting over port 139 they, unlike most newer
servers, do not support a default server name. A server name can be up to 15
characters long and is usually uppercased.
servern=arg
Synonym for servernetbiosname.
netbiosname=arg
When mounting to servers via port 139, specifies the RFC1001 source name to use to
represent the client netbios machine name when doing the RFC1001 netbios session
initialize.
file_mode=arg
If the server does not support the CIFS Unix extensions this overrides the default file
mode.
dir_mode=arg
If the server does not support the CIFS Unix extensions this overrides the default
mode for directories.
ip=arg
sets the destination IP address. This option is set automatically if the server name
portion of the requested UNC name can be resolved so rarely needs to be specified by
the user.
domain=arg
sets the domain (workgroup) of the user
guest
don’t prompt for a password
iocharset
Charset used to convert local path names to and from Unicode. Unicode is used by
default for network path names if the server supports it. If iocharset is not specified
then the nls_default specified during the local client kernel build will be used. If
server does not support Unicode, this parameter is unused.
ro
mount read-only
rw
mount read-write
setuids
If the CIFS Unix extensions are negotiated with the server the client will attempt to
set the effective uid and gid of the local process on newly created files, directories,
and devices (create, mkdir, mknod). If the CIFS Unix Extensions are not negotiated,
then for newly created files and directories the client caches the new file's uid and
gid locally instead of using the default uid and gid specified on the mount, which
means that the uid for the file can change when the inode is reloaded (or the user
remounts the share).
nosetuids
The client will not attempt to set the uid and gid on newly created files,
directories, and devices (create, mkdir, mknod), which will result in the server
setting the uid and gid to the default (usually the server uid of the user who
mounted the share). Letting the server (rather than the client) set the uid and gid is
the default. If the CIFS Unix Extensions are not negotiated then the uid and gid for
new files will appear to be the uid (gid) of the mounter or the uid (gid) parameter
specified on the mount.
perm
Client does permission checks (vfs_permission check of uid and gid of the file
against the mode and desired operation). Note that this is in addition to the normal
ACL check on the target machine done by the server software. Client permission
checking is enabled by default.
noperm
Client does not do permission checks. This can expose files on this mount to access
by other users on the local client system. It is typically only needed when the server
supports the CIFS Unix Extensions but the UIDs/GIDs on the client and server
system do not match closely enough to allow access by the user doing the mount.
Note that this does not affect the normal ACL check on the target machine done by
the server software (of the server ACL against the user name provided at mount
time).
dynperm
Instructs the client to maintain, in memory, ownership and permissions that can't be
stored on the server. This information can disappear at any time (whenever the inode
is flushed from the cache), so while this may help make some applications work, its
behavior is somewhat unreliable. See the section below on FILE AND DIRECTORY
OWNERSHIP AND PERMISSIONS for more information.
cache=
Cache mode. See the section below on CACHE COHERENCY for details. Allowed
values are:
none: do not cache file data at all
strict: follow the CIFS/SMB2 protocol strictly
loose: allow loose caching semantics
The default in kernels prior to 3.7 was “loose”. As of kernel 3.7 the default is “strict”.
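Because the default changed in 3.7, scripts that need predictable caching behavior should pass cache= explicitly. A sketch of the version check described above (the mount line is a placeholder):

```shell
# Work out which cache= default this kernel would use ("loose" before 3.7,
# "strict" from 3.7 on), then pass the mode explicitly anyway.
REL=$(uname -r)
MAJOR=${REL%%.*}
REST=${REL#*.}
MINOR=${REST%%[!0-9]*}
if [ "$MAJOR" -gt 3 ] || { [ "$MAJOR" -eq 3 ] && [ "$MINOR" -ge 7 ]; }; then
    DEFAULT=strict
else
    DEFAULT=loose
fi
echo "kernel $REL would default to cache=$DEFAULT"
# Passing cache= explicitly avoids depending on the kernel default:
# mount -t cifs //server/share /mnt -o cache=strict
```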
directio
Do not do inode data caching on files opened on this mount. This precludes mmapping
files on this mount. In some cases with fast networks and little or no caching benefits
on the client (e.g. when the application is doing large sequential reads bigger than
page size without rereading the same data) this can provide better performance than
the default behavior which caches reads (readahead) and writes (writebehind)
through the local Linux client pagecache if oplock (caching token) is granted and
held. Note that directio allows write operations larger than page size to be sent to the
server. On some kernels this requires the cifs.ko module to be built with the
CIFS_EXPERIMENTAL configure option.
This option will be deprecated in 3.7. Users should use cache=none instead on
more recent kernels.
strictcache
Switches on strict cache mode. In this mode the client reads from the cache as long
as it holds an Oplock Level II; otherwise it reads from the server. For writes, the
client stores data in the cache when it holds an Exclusive Oplock; otherwise it
writes directly to the server.
This option will be deprecated in 3.7. Users should use cache=strict instead on
more recent kernels.
rwpidforward
Forward the pid of the process that opened a file to any read or write operation on
that file. This prevents applications like WINE from failing on reads and writes
when mandatory brlock-style locking is in use.
mapchars
Translate six of the seven reserved characters (not backslash, but including the colon,
question mark, pipe, asterisk, greater-than and less-than characters) to the remap range
(above 0xF000), which also allows the CIFS client to recognize files created with
such characters by Windows’s POSIX emulation. This can also be useful when
mounting to most versions of Samba (which also forbids creating and opening files
whose names contain any of these seven characters). This has no effect if the server
does not support Unicode on the wire. Please note that the files created with
mapchars mount option may not be accessible if the share is mounted without that
option.
nomapchars
Do not translate any of these seven characters (default)
intr
currently unimplemented
nointr
(default) currently unimplemented
hard
The program accessing a file on the cifs mounted file system will hang when the
server crashes.
soft
(default) The program accessing a file on the cifs mounted file system will not hang
when the server crashes and will return errors to the user application.
noacl
Do not allow POSIX ACL operations even if the server would support them.
The CIFS client can get and set POSIX ACLs (getfacl, setfacl) to Samba servers
version 3.0.10 and later. Setting POSIX ACLs requires enabling both CIFS_XATTR
and then CIFS_POSIX support in the CIFS configuration options when building the
cifs module. POSIX ACL support can be disabled on a per mount basis by specifying
“noacl” on mount.
cifsacl
This option is used to map CIFS/NTFS ACLs to/from Linux permission bits, map
SIDs to/from UIDs and GIDs, and get and set Security Descriptors.
See sections on CIFS/NTFS ACL, SID/UID/GID MAPPING, SECURITY
DESCRIPTORS for more information.
backupuid=arg
Restrict access to files with the backup intent to a user. Either a name or an id must
be provided as an argument; there is no default value.
See the section ACCESSING FILES WITH BACKUP INTENT for more details.
backupgid=arg
Restrict access to files with the backup intent to a group. Either a name or an id must
be provided as an argument; there is no default value.
See the section ACCESSING FILES WITH BACKUP INTENT for more details.
nocase
Request case-insensitive path name matching (case sensitive is the default if the
server supports it).
ignorecase
Synonym for nocase.
sec=
Security mode. Allowed values are:
none - attempt to connect as a null user (no name)
krb5 - Use Kerberos version 5 authentication
krb5i - Use Kerberos authentication and forcibly enable packet signing
ntlm - Use NTLM password hashing
ntlmi - Use NTLM password hashing and force packet signing
ntlmv2 - Use NTLMv2 password hashing
ntlmv2i - Use NTLMv2 password hashing and force packet signing
ntlmssp - Use NTLMv2 password hashing encapsulated in Raw NTLMSSP message
ntlmsspi - Use NTLMv2 password hashing encapsulated in Raw NTLMSSP
message, and force packet signing
The default in mainline kernel versions prior to v3.8 was sec=ntlm. In v3.8, the default
was changed to sec=ntlmssp.
If the server requires signing during protocol negotiation, then it may be enabled
automatically. Packet signing may also be enabled automatically if it’s enabled in
/proc/fs/cifs/SecurityFlags.
nobrl
Do not send byte range lock requests to the server. This is necessary for certain
applications that break with cifs style mandatory byte range locks (and most cifs
servers do not yet support requesting advisory byte range locks).
sfu
When the CIFS Unix Extensions are not negotiated, attempt to create device files and
fifos in a format compatible with Services for Unix (SFU). In addition retrieve bits
10-12 of the mode via the SETFILEBITS extended attribute (as SFU does). In the
future the bottom 9 bits of the mode will also be emulated using queries of the
security descriptor (ACL). [NB: requires version 1.39 or later of the CIFS VFS. To
recognize symlinks and be able to create symlinks in an SFU interoperable form
requires version 1.40 or later of the CIFS VFS kernel module.]
serverino
Use inode numbers (unique persistent file identifiers) returned by the server instead
of automatically generating temporary inode numbers on the client. Although server
inode numbers make it easier to spot hardlinked files (as they will have the same
inode numbers) and inode numbers may be persistent (which is useful for some
software), the server does not guarantee that the inode numbers are unique if multiple
server side mounts are exported under a single share (since inode numbers on the
servers might not be unique if multiple filesystems are mounted under the same
shared higher level directory). Note that not all servers support returning server inode
numbers, although those that support the CIFS Unix Extensions, and Windows 2000
and later servers typically do support this (although not necessarily on every local
server filesystem). This parameter has no effect if the server lacks support for
returning inode numbers or equivalent. This behavior is enabled by default.
noserverino
Client generates inode numbers itself rather than using the actual ones from the
server.
See section INODE NUMBERS for more information.
nounix
Disable the CIFS Unix Extensions for this mount. This can be useful in order to turn
off multiple settings at once. This includes POSIX acls, POSIX locks, POSIX paths,
symlink support and retrieving uids/gids/mode from the server. This can also be
useful to work around a bug in a server that supports Unix Extensions.
See section INODE NUMBERS for more information.
nouser_xattr
Do not allow getfattr/setfattr to get/set xattrs, even if the server would support it
otherwise. The default is for xattr support to be enabled.
rsize=bytes
Maximum amount of data that the kernel will request in a read request in bytes. Prior
to kernel 3.2.0, the default was 16k, and the maximum size was limited by the
CIFSMaxBufSize module parameter. As of kernel 3.2.0, the behavior varies
according to whether POSIX extensions are enabled on the mount and the server
supports large POSIX reads. If they are, then the default is 1M, and the maximum
is 16M. If they are not supported by the server, then the default is 60k and the
maximum is around 127k. The reason for the 60k is that it's the maximum size
read that Windows servers can fill. Note that this value is a maximum, and the client
may settle on a smaller size to accommodate what the server supports. In kernels prior
to 3.2.0, no negotiation is performed.
wsize=bytes
Maximum amount of data that the kernel will send in a write request in bytes. Prior to
kernel 3.0.0, the default and maximum was 57344 (14 * 4096 pages). As of 3.0.0, the
default depends on whether the client and server negotiate large writes via POSIX
extensions. If they do, then the default is 1M, and the maximum allowed is 16M. If
they do not, then the default is 65536 and the maximum allowed is 131007.
Note that this value is just a starting point for negotiation in 3.0.0 and up. The client
and server may negotiate this size downward according to the server’s capabilities. In
kernels prior to 3.0.0, no negotiation is performed. If this value isn't specified, or
if it is greater than or equal to the existing one, the mount may end up reusing an
existing superblock.
fsc
Enable local disk caching using FS-Cache for CIFS. This option could be useful to
improve performance on a slow link, heavily loaded server and/or network where
reading from the disk is faster than reading from the server (over the network). This
could also impact the scalability positively as the number of calls to the server are
reduced. But be warned that local caching is not suitable for all workloads, e.g.
read-once type workloads. So you need to consider carefully the situation/workload
before using this option. Currently, local disk caching is enabled for CIFS files
opened as read-only.
NOTE: This feature is available only in the recent kernels that have been built with
the kernel config option CONFIG_CIFS_FSCACHE. You also need to have
cachefilesd daemon installed and running to make the cache operational.
multiuser
Map user accesses to individual credentials when accessing the server. By default,
CIFS mounts only use a single set of user credentials (the mount credentials) when
accessing a share. With this option, the client instead creates a new session with the
server using the user’s credentials whenever a new user accesses the mount. Further
accesses by that user will also use those credentials. Because the kernel cannot
prompt for passwords, multiuser mounts are limited to mounts using sec= options
that don’t require passwords.
With this change, it’s feasible for the server to handle permissions enforcement, so
this option also implies “noperm”. Furthermore, when unix extensions aren’t in use
and the administrator has not overridden ownership using the uid= or gid= options,
ownership of files is presented as the current user accessing the share.
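A typical multiuser setup might look like the following sketch; the sec=krb5 choice and all server/share names are assumptions, and cifscreds(1) is the cifs-utils helper commonly used for the per-user credential step:

```shell
# Root performs a single multiuser mount with a password-less sec= scheme;
# sec=krb5 and the cruid= value are assumptions for this sketch.
OPTS="sec=krb5,multiuser,cruid=$(id -u)"
echo "$OPTS"
# mount -t cifs //server/share /mnt -o "$OPTS"   # requires root and Kerberos
# Individual users then supply their own credentials, e.g. via kinit, or with
# cifscreds(1) for password-based sec= schemes:
# cifscreds add server
```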
actimeo=arg
The time (in seconds) that the CIFS client caches attributes of a file or directory
before it requests attribute information from a server. During this period the changes
that occur on the server remain undetected until the client checks the server again.
By default, the attribute cache timeout is set to 1 second. This means more frequent
on-the-wire calls to the server to check whether attributes have changed, which could
impact performance. With this option users can make a tradeoff between performance
and cache metadata correctness, depending on workload needs. Shorter timeouts
mean better cache coherency but an increased number of calls to the server; longer
timeouts mean a reduced number of calls to the server but looser cache coherency.
The actimeo value is a positive integer that can hold values between 0 and a
maximum of 2^30 * HZ (frequency of the timer interrupt).
noposixpaths
If unix extensions are enabled on a share, then the client will typically allow
filenames to include any character besides ‘/’ in a pathname component, and will use
forward slashes as a pathname delimiter. This option prevents the client from
attempting to negotiate the use of posix-style pathnames to the server.
posixpaths
Inverse of noposixpaths.
prefixpath=
It’s possible to mount a subdirectory of a share. The preferred way to do this is to
append the path to the UNC when mounting. However, it’s also possible to do the
same by setting this option and providing the path there.
vers=
SMB protocol version. Allowed values are:
1.0 - The classic CIFS/SMBv1 protocol. This is the default.
2.0 - The SMBv2.002 protocol. This was initially introduced in Windows Vista
Service Pack 1, and Windows Server 2008. Note that the initial release version of
Windows Vista spoke a slightly different dialect (2.000) that is not supported.
2.1 - The SMBv2.1 protocol that was introduced in Microsoft Windows 7 and
Windows Server 2008R2.
3.0 - The SMBv3.0 protocol that was introduced in Microsoft Windows 8 and
Windows Server 2012.
Note too that while this option governs the protocol version used, not all features of each
version are available.
—verbose
Print additional debugging information for the mount. Note that this parameter must
be specified before the -o. For example:
mount -t cifs //server/share /mnt —verbose -o user=username
› SERVICE FORMATTING AND DELIMITERS
It’s generally preferred to use forward slashes (/) as a delimiter in service names.
They are considered to be the “universal delimiter” since they are generally not
allowed to be embedded within path components on Windows machines and the
client can convert them to backslashes (\) unconditionally. Conversely, backslash
characters are allowed by POSIX to be part of a path component, and can’t be
automatically converted in the same way.
mount.cifs will attempt to convert backslashes to forward slashes where it’s able to
do so, but it cannot do so in any path component following the sharename.
› INODE NUMBERS
When Unix Extensions are enabled, we use the actual inode number provided by the
server in response to the POSIX calls as an inode number.
When Unix Extensions are disabled and the "serverino" mount option is enabled,
there is no way to get the true server inode number, so the client typically maps the
server-assigned "UniqueID" onto an inode number instead.
Note that the UniqueID is a different value from the server inode number. The
UniqueID value is unique over the scope of the entire server and is often greater than
2^32. This value often causes programs that are not compiled with LFS (Large
File Support) to trigger a glibc EOVERFLOW error, as it won't fit in the target
structure field. It is strongly recommended to compile your programs with LFS
support (i.e. with -D_FILE_OFFSET_BITS=64) to prevent this problem. You can
also use the "noserverino" mount option to generate inode numbers smaller than
2^32 on the client, but you may not be able to detect hardlinks properly.
› CACHE COHERENCY
With a network filesystem such as CIFS or NFS, the client must contend with the fact
that activity on other clients or the server could change the contents or attributes of a
file without the client being aware of it. One way to deal with such a problem is to
mandate that all file accesses go to the server directly. This is performance
prohibitive however, so most protocols have some mechanism to allow the client to
cache data locally.
The CIFS protocol mandates (in effect) that the client should not cache file data
unless it holds an opportunistic lock (aka oplock) or a lease. Both of these entities
allow the client to guarantee certain types of exclusive access to a file so that it can
access its contents without needing to continually interact with the server. The server
will call back the client when it needs to revoke either of them and allow the client a
certain amount of time to flush any cached data.
The cifs client uses the kernel’s pagecache to cache file data. Any I/O that’s done
through the pagecache is generally page-aligned. This can be problematic when
combined with byte-range locks as Windows’ locking is mandatory and can block
reads and writes from occurring.
cache=none means that the client never utilizes the cache for normal reads and
writes. It always accesses the server directly to satisfy a read or write request.
cache=strict means that the client will attempt to follow the CIFS/SMB2 protocol
strictly. That is, the cache is only trusted when the client holds an oplock. When the
client does not hold an oplock, then the client bypasses the cache and accesses the
server directly to satisfy a read or write request. By doing this, the client avoids
problems with byte range locks. Additionally, byte range locks are cached on the
client when it holds an oplock and are “pushed” to the server when that oplock is
recalled.
cache=loose allows the client to use looser protocol semantics which can sometimes
provide better performance at the expense of cache coherency. File access always
involves the pagecache. When an oplock or lease is not held, then the client will
attempt to flush the cache soon after a write to a file. Note that that flush does not
necessarily occur before a write system call returns.
In the case of a read without holding an oplock, the client will attempt to periodically
check the attributes of the file in order to ascertain whether it has changed and the
cache might no longer be valid. This mechanism is much like the one that NFSv2/3
use for cache coherency, but it is particularly problematic with CIFS. Windows is quite
“lazy” with respect to updating the “LastWriteTime” field that the client uses to
verify this. The effect is that cache=loose can cause data corruption when multiple
readers and writers are working on the same files.
Because of this, when multiple clients are accessing the same set of files, then
cache=strict is recommended. That helps eliminate problems with cache coherency
by following the CIFS/SMB2 protocols more strictly.
Note too that no matter what caching model is used, the client will always use the
pagecache to handle mmap’ed files. Writes to mmap’ed files are only guaranteed to
be flushed to the server when msync() is called, or on close().
The default in kernels prior to 3.7 was “loose”. As of 3.7, the default is “strict”.
› CIFS/NTFS ACL, SID/UID/GID MAPPING, SECURITY
DESCRIPTORS
This option is used to work with file objects which possess Security Descriptors and
a CIFS/NTFS ACL instead of UID, GID, file permission bits, and POSIX ACL as the user
authentication model. This is the most common authentication model for CIFS
servers and is the one used by Windows.
Support for this requires both CIFS_XATTR and CIFS_ACL support in the CIFS
configuration options when building the cifs module.
A CIFS/NTFS ACL is mapped to file permission bits using an algorithm specified in
the following Microsoft TechNet document:
http://technet.microsoft.com/en-us/library/bb463216.aspx
In order to map SIDs to/from UIDs and GIDs, the following is required:
a kernel upcall to the cifs.idmap utility set up via request-key.conf(5)
winbind support configured via nsswitch.conf(5) and smb.conf(5)
Please refer to the respective manpages of cifs.idmap(8) and winbindd(8) for more
information.
Security descriptors for a file object can be retrieved and set directly using the
extended attribute named system.cifs_acl. The security descriptors presented via this
interface are "raw" blobs of data and need a userspace utility, such as getcifsacl(1)
and setcifsacl(1) respectively, to parse and format or to assemble them.
Some of the things to consider while using this mount option:
There may be an increased latency when handling metadata due to additional
requests to get and set security descriptors.
The mapping between a CIFS/NTFS ACL and POSIX file permission bits is
imperfect and some ACL information may be lost in the translation.
If the upcall to cifs.idmap is not set up correctly or winbind is not configured and
running, ID mapping will fail. In that case uid and gid will default either to the
values of the share or to the values of the uid and/or gid mount options if specified.
› ACCESSING FILES WITH BACKUP INTENT
For a user on the server, desired access to a file is determined by the permissions
and rights associated with that file. This is typically accomplished using ownership
and ACLs. For a user who does not have access rights to a file, it is still possible to
access that file for a specific or targeted purpose by granting special rights. One of
these specific purposes is to access a file with the intent to either back up or restore
it, i.e. backup intent. The right to access a file with backup intent can typically be
granted by making that user a part of the built-in group Backup Operators. When
such a user attempts to open a file with backup intent, the open request is sent with
the bit FILE_OPEN_FOR_BACKUP_INTENT set as one of the CreateOptions.
As an example, on a Windows server a user named testuser cannot open a file
with the following security descriptor:
REVISION:0x1
CONTROL:0x9404
OWNER:Administrator
GROUP:Domain Users
ACL:Administrator:ALLOWED/0x0/FULL
But the user testuser, if it becomes part of the group Backup Operators, can open the
file with the backup intent.
Any user on the client side who can authenticate as such a user on the server can
access the files with backup intent. But it is desirable and preferable, for security
reasons among others, to restrict this special right.
The mount option backupuid is used to restrict this special right to a user which is
specified by either a name or an id. The mount option backupgid is used to restrict
this special right to the users in a group which is specified by either a name or an id.
These two mount options can be used together.
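For example, to restrict backup-intent access to a single account (the credentials path and the use of the current uid are placeholders for this sketch):

```shell
# Restrict backup-intent opens to one account; a name or a numeric id works.
# The current uid is used here only as a stand-in for a dedicated backup user.
OPTS="backupuid=$(id -u)"
echo "$OPTS"
# mount -t cifs //server/share /mnt -o credentials=/root/.smbcredentials,"$OPTS"
```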
› FILE AND DIRECTORY OWNERSHIP AND PERMISSIONS
The core CIFS protocol does not provide unix ownership information or mode for
files and directories. Because of this, files and directories will generally appear to be
owned by whatever values the uid= or gid= options are set to, and will have permissions
set to the default file_mode and dir_mode for the mount. Attempting to change these
values via chmod/chown will return success but have no effect.
When the client and server negotiate unix extensions, files and directories will be
assigned the uid, gid, and mode provided by the server. Because CIFS mounts are
generally single-user, and the same credentials are used no matter what user accesses
the mount, newly created files and directories will generally be given ownership
corresponding to whatever credentials were used to mount the share.
If the uid’s and gid’s being used do not match on the client and server, the forceuid
and forcegid options may be helpful. Note however, that there is no corresponding
option to override the mode. Permissions assigned to a file when forceuid or forcegid
are in effect may not reflect the real permissions.
When unix extensions are not negotiated, it's also possible to emulate them locally on
the client using the "dynperm" mount option. When this mount option is in effect,
newly created files and directories will receive what appear to be proper permissions.
These permissions are not stored on the server however and can disappear at any time
in the future (subject to the whims of the kernel flushing out the inode cache). In
general, this mount option is discouraged.
It’s also possible to override permission checking on the client altogether via the
noperm option. Server-side permission checks cannot be overridden. The permission
checks done by the server will always correspond to the credentials used to mount the
share, and not necessarily to the user who is accessing the share.
› ENVIRONMENT VARIABLES
The variable USER may contain the username to be used to authenticate to the
server. The variable can be used to set both username and password by using
the format username%password.
The variable PASSWD may contain the password of the person using the client.
The variable PASSWD_FILE may contain the pathname of a file to read the password
from. A single line of input is read and used as the password.
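A sketch of the PASSWD_FILE approach; the password and username are placeholders, and the file should be protected like a credentials file:

```shell
# Place the password alone on the first line of a protected file.
PWFILE=$(mktemp)
printf 'examplepass\n' > "$PWFILE"
chmod 600 "$PWFILE"
export PASSWD_FILE="$PWFILE"
# mount.cifs then reads the password from $PASSWD_FILE:
# mount -t cifs //server/share /mnt -o user=exampleuser
```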
› NOTES
This command may be used only by root, unless installed setuid, in which case the
noexec and nosuid mount flags are enabled. When installed as a setuid program, the
program follows the conventions set forth by the mount program for user mounts,
with the added restriction that users must be able to chdir() into the mountpoint prior
to the mount in order to be able to mount onto it.
Some samba client tools like smbclient(8) honour client-side configuration
parameters present in smb.conf. Unlike those client tools, mount.cifs ignores
smb.conf completely.
› CONFIGURATION
The primary mechanism for making configuration changes and for reading debug
information for the cifs vfs is via the Linux /proc filesystem. In the directory
/proc/fs/cifs are various configuration files and pseudo files which can display debug
information. There are additional startup options such as maximum buffer size and
number of buffers which only may be set when the kernel cifs vfs (cifs.ko module) is
loaded. These can be seen by running the modinfo utility against the file cifs.ko
which will list the options that may be passed to cifs during module installation
(device driver load). For more information see the kernel file fs/cifs/README.
› BUGS
Mounting using the CIFS URL specification is currently not supported.
The credentials file does not handle usernames or passwords with leading space.
Note that the typical response to a bug report is a suggestion to try the latest version
first. So please try doing that first, and always include which versions you use of
relevant software when reporting bugs (minimum: mount.cifs (try mount.cifs -V),
kernel (see /proc/version) and server type you are trying to contact).
› VERSION
This man page is correct for version 1.74 of the cifs vfs filesystem (roughly Linux
kernel 3.0).
› SEE ALSO
cifs.upcall(8), getcifsacl(1), setcifsacl(1)
Documentation/filesystems/cifs.txt and fs/cifs/README in the linux kernel source
tree may contain additional options and information.
› AUTHOR
Steve French
The syntax and manpage were loosely based on that of smbmount. It was converted
to Docbook/XML by Jelmer Vernooij.
The maintainer of the Linux cifs vfs and the userspace tool mount.cifs is Steve
French. The Linux CIFS Mailing list is the preferred place to ask questions regarding
these programs.
fuse
› NAME
fuse - format and options for the fuse file systems
› DESCRIPTION
FUSE (Filesystem in Userspace) is a simple interface for userspace programs to
export a virtual filesystem to the Linux kernel. FUSE also aims to provide a secure
method for non privileged users to create and mount their own filesystem
implementations.
› CONFIGURATION
Some options regarding mount policy can be set in the file /etc/fuse.conf. Currently
these options are:
mount_max = NNN
Set the maximum number of FUSE mounts allowed to non-root users. The default is
1000.
user_allow_other
Allow non-root users to specify the allow_other or allow_root mount options (see
below).
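An /etc/fuse.conf enabling both options might look like this sketch:

```shell
# Example fuse.conf contents; on a real system this lives at /etc/fuse.conf.
# A temporary path is used here so the sketch doesn't touch the real file.
FUSE_CONF=$(mktemp)
cat > "$FUSE_CONF" <<'EOF'
# cap the number of FUSE mounts allowed to non-root users
mount_max = 1000
# let non-root users pass allow_other / allow_root
user_allow_other
EOF
cat "$FUSE_CONF"
```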
› OPTIONS
Most of the generic mount options described in mount are supported (ro, rw, suid,
nosuid, dev, nodev, exec, noexec, atime, noatime, sync, async, dirsync).
Filesystems are mounted with nodev,nosuid by default, which can only be
overridden by a privileged user.
These are FUSE specific mount options that can be specified for all filesystems:
default_permissions
By default FUSE doesn't check file access permissions; the filesystem is free to
implement its access policy or leave it to the underlying file access mechanism (e.g.
in the case of network filesystems). This option enables permission checking,
restricting access based on file mode. This option is usually useful together with the
allow_other mount option.
allow_other
This option overrides the security measure restricting file access to the user mounting
the filesystem. So all users (including root) can access the files. This option is by
default only allowed to root, but this restriction can be removed with a configuration
option described in the previous section.
allow_root
This option is similar to allow_other but file access is limited to the user mounting
the filesystem and root. This option and allow_other are mutually exclusive.
kernel_cache
This option disables flushing the cache of the file contents on every open(2). This
should only be enabled on filesystems where the file data is never changed
externally (not through the mounted FUSE filesystem). Thus it is not suitable for
network filesystems and other intermediate filesystems.
NOTE: if this option is not specified (and neither direct_io) data is still cached after
the open(2), so a read(2) system call will not always initiate a read operation.
auto_cache
This option enables automatic flushing of the data cache on open(2). The cache will
only be flushed if the modification time or the size of the file has changed.
large_read
Issue large read requests. This can improve performance for some filesystems, but
can also degrade performance. This option is only useful on 2.4.X kernels, as on 2.6
kernels the request size is automatically determined for optimum performance.
direct_io
This option disables the use of page cache (file content cache) in the kernel for this
filesystem. This has several effects:
1.
Each read(2) or write(2) system call will initiate one or more read or write
operations, data will not be cached in the kernel.
2.
The return value of the read() and write() system calls will correspond to the return
values of the read and write operations. This is useful for example if the file size is
not known in advance (before reading it).
max_read=N
With this option the maximum size of read operations can be set. The default is
infinite. Note that the size of read requests is limited anyway to 32 pages (which is
128kbyte on i386).
max_readahead=N
Set the maximum number of bytes to read-ahead. The default is determined by the
kernel. On linux-2.6.22 or earlier it’s 131072 (128kbytes)
max_write=N
Set the maximum number of bytes in a single write operation. The default is
128kbytes. Note that, due to various limitations, the size of write requests can be
much smaller (4kbytes). This limitation will be removed in the future.
async_read
Perform reads asynchronously. This is the default.
sync_read
Perform all reads (even read-ahead) synchronously.
hard_remove
The default behavior is that if an open file is deleted, the file is renamed to a hidden
file (.fuse_hiddenXXX), and only removed when the file is finally released. This
relieves the filesystem implementation of having to deal with this problem. This
option disables the hiding behavior, and files are removed immediately in an unlink
operation (or in a rename operation which overwrites an existing file).
It is recommended that you not use the hard_remove option. When hard_remove is
set, the following libc functions fail on unlinked files (returning errno of ENOENT):
read(2), write(2), fsync(2), close(2), f*xattr(2), ftruncate(2), fstat(2), fchmod(2),
fchown(2)
debug
Turns on debug information printing by the library.
fsname=NAME
Sets the filesystem source (first field in /etc/mtab). The default is the mount program
name.
subtype=TYPE
Sets the filesystem type (third field in /etc/mtab). The default is the mount program
name. If the kernel supports it, /etc/mtab and /proc/mounts will show the filesystem
type as fuse.TYPE.
If the kernel doesn't support subtypes, the source field will be TYPE#NAME, or, if
the fsname option is not specified, just TYPE.
use_ino
Honor the st_ino field in kernel functions getattr() and fill_dir(). This value is used
to fill in the st_ino field in the stat(2), lstat(2), fstat(2) functions and the d_ino field
in the readdir(2) function. The filesystem does not have to guarantee uniqueness,
however some applications rely on this value being unique for the whole filesystem.
readdir_ino
If the use_ino option is not given, still try to fill in the d_ino field in readdir(2). If the
name was previously looked up, and is still in the cache, the inode number found
there will be used. Otherwise it will be set to -1. If the use_ino option is given, this
option is ignored.
nonempty
Allows mounts over a non-empty file or directory. By default these mounts are
rejected to prevent accidental covering up of data, which could for example prevent
automatic backup.
umask=M
Override the permission bits in st_mode set by the filesystem. The resulting
permission bits are the ones missing from the given umask value. The value is given
in octal representation.
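As a quick illustration (not FUSE source code), the effect of umask=M can be modeled as clearing the masked bits from the permission bits reported by the filesystem:

```python
def apply_umask(st_mode: int, umask: int) -> int:
    """Resulting permission bits: those NOT present in the umask (octal)."""
    return st_mode & ~umask & 0o7777

# A file the filesystem reports as mode 0666, mounted with umask=022,
# appears with the group/other write bits cleared.
```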
uid=N
Override the st_uid field set by the filesystem (N is numeric).
gid=N
Override the st_gid field set by the filesystem (N is numeric).
blkdev
Mount a filesystem backed by a block device. This is a privileged option. The device
must be specified with the fsname=NAME option.
entry_timeout=T
The timeout in seconds for which name lookups will be cached. The default is 1.0
second. For all the timeout options, it is possible to give fractions of a second as well
(e.g. entry_timeout=2.8)
negative_timeout=T
The timeout in seconds for which a negative lookup will be cached. This means that
if a file did not exist (the lookup returned ENOENT), the lookup will only be redone
after the timeout, and the file/directory will be assumed not to exist until then. The
default is 0.0 seconds, meaning that caching of negative lookups is disabled.
attr_timeout=T
The timeout in seconds for which file/directory attributes are cached. The default is
1.0 second.
ac_attr_timeout=T
The timeout in seconds for which file attributes are cached for the purpose of
checking if auto_cache should flush the file data on open. The default is the value of
attr_timeout.
intr
Allow requests to be interrupted. Turning on this option may result in unexpected
behavior, if the filesystem does not support request interruption.
intr_signal=NUM
Specify which signal number to send to the filesystem when a request is interrupted.
The default is hardcoded to USR1.
modules=M1[:M2…]
Add modules to the filesystem stack. Modules are pushed in the order they are
specified, with the original filesystem being on the bottom of the stack.
› FUSE MODULES (STACKING)
Modules add stacking support to the high-level API. Filesystem modules can
be built into libfuse or loaded from a shared object.
iconv
subdir
log-file=LOG-FILE
File to use for logging [default:/var/log/glusterfs/glusterfs.log]
log-level=LOG-LEVEL
Logging severity. Valid options are TRACE, DEBUG, WARNING, ERROR,
CRITICAL, INFO and NONE [default: INFO]
acl
Mount the filesystem with POSIX ACL support
fopen-keep-cache
Do not purge the cache on file open
selinux
Enable SELinux label (extended attributes) support on inodes
worm
Mount the filesystem in ‘worm’ mode
aux-gfid-mount
Enable access to filesystem through gfid directly
ro
Mount the filesystem read-only
enable-ino32=BOOL
Use 32-bit inodes when mounting to work around broken applications that don't
support 64-bit inodes
mem-accounting
Enable internal memory accounting
Advanced options
attribute-timeout=SECONDS
Set attribute timeout to SECONDS for inodes in fuse kernel module [default: 1]
entry-timeout=SECONDS
Set entry timeout to SECONDS in fuse kernel module [default: 1]
background-qlen=N
Set fuse module’s background queue length to N [default: 64]
gid-timeout=SECONDS
Set auxiliary group list timeout to SECONDS for fuse translator [default: 0]
negative-timeout=SECONDS
Set negative timeout to SECONDS in fuse kernel module [default: 0]
volume-name=VOLUME-NAME
Volume name to be used for MOUNT-POINT [default: topmost volume in
VOLUME-FILE]
direct-io-mode=disable
Disable direct I/O mode in fuse kernel module
congestion-threshold=N
Set fuse module’s congestion threshold to N [default: 48]
backup-volfile-servers=SERVERLIST
Provide a list of backup volfile servers in the following format [default: None]
$ mount -t glusterfs -obackup-volfile-servers=<server2>:<server3>:…:<serverN> <server1>:/<volname> <mount_point>
backupvolfile-server=SERVER
Provide a backup volfile server in the following format [default: None]
$ mount -t glusterfs -obackupvolfile-server=<server2> <server1>:/<volname> <mount_point>
fetch-attempts=N
Deprecated option - placed here for backward compatibility [default: 1]
no-root-squash=BOOL
disable root squashing for the trusted client [default: off]
root-squash=BOOL
enable root squashing for the trusted client [default: on]
use-readdirp=BOOL
Use readdirp() mode in fuse kernel module [default: on]
› FILES
/etc/fstab
A typical GlusterFS entry in /etc/fstab looks like this:
server1:/mirror /mnt/mirror glusterfs log-file=/var/log/mirror.log,acl,selinux 0 0
/proc/mounts
An example entry for a GlusterFS mountpoint in /proc/mounts looks like this:
server1:/mirror /mnt/glusterfs fuse.glusterfs
rw,allow_other,default_permissions,max_read=131072 0 0
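A /proc/mounts line like the one above has six whitespace-separated fields (source, mountpoint, fstype, options, dump, pass). A small sketch of parsing it, using the sample entry shown above (the helper is hypothetical, not part of any mount tool):

```python
def parse_mounts_line(line: str) -> dict:
    """Split one /proc/mounts line into its six named fields."""
    source, mountpoint, fstype, opts, dump, passno = line.split()
    options = {}
    for opt in opts.split(","):
        key, _, value = opt.partition("=")
        options[key] = value or None  # plain flag options map to None
    return {
        "source": source,
        "mountpoint": mountpoint,
        "fstype": fstype,
        "options": options,
        "dump": int(dump),
        "pass": int(passno),
    }

entry = parse_mounts_line(
    "server1:/mirror /mnt/glusterfs fuse.glusterfs "
    "rw,allow_other,default_permissions,max_read=131072 0 0"
)
```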
› SEE ALSO
glusterfs(8), mount(8), gluster(8)
› COPYRIGHT
Copyright(c) 2006-2013 Red Hat, Inc. <http://www.redhat.com>
MOUNT
› NAME
mount - mount a filesystem
› SYNOPSIS
mount [-lhV]
mount -a [-fFnrsvw] [-t vfstype] [-O optlist]
mount [-fnrsvw] [-o option[,option]…] device|dir
mount [-fnrsvw] [-t vfstype] [-o options] device dir
› DESCRIPTION
All files accessible in a Unix system are arranged in one big tree, the file hierarchy,
rooted at /. These files can be spread out over several devices. The mount command
serves to attach the filesystem found on some device to the big file tree. Conversely,
the umount(8) command will detach it again.
The standard form of the mount command is
mount -t type device dir
This tells the kernel to attach the filesystem found on device (which is of type type) at the
directory dir. The previous contents (if any) and owner and mode of dir become invisible,
and as long as this filesystem remains mounted, the pathname dir refers to the root of the
filesystem on device.
If only directory or device is given, for example:
mount /dir
then mount looks for a mountpoint and, if not found, for a device in the /etc/fstab file.
It's possible to use the --target or --source options to avoid ambiguous interpretation of
the given argument. For example:
mount --target /mountpoint
The listing and help.
The listing mode is maintained for backward compatibility only.
For more robust and definable output use findmnt(8), especially in your scripts.
Note that control characters in the mountpoint name are replaced with ‘?’.
mount [-l] [-t type]
lists all mounted filesystems (of type type). The option -l adds the labels in this
listing. See below.
The device indication.
Most devices are indicated by a file name (of a block special device), like /dev/sda1,
but there are other possibilities. For example, in the case of an NFS mount, device
may look like knuth.cwi.nl:/dir. It is possible to indicate a block special device using
its filesystem LABEL or UUID (see the -L and -U options below) and partition
PARTUUID or PARTLABEL (partition identifiers are supported for GUID Partition
Table (GPT) and MAC partition tables only).
The recommended setup is to use tags (e.g. LABEL=<label>) rather than
/dev/disk/by-{label,uuid,partuuid,partlabel} udev symlinks in the /etc/fstab file.
The tags are more readable, robust and portable. The mount(8) command internally
uses udev symlinks, so using the symlinks in /etc/fstab has no advantage over the tags.
For more details see libblkid(3).
Note that mount(8) uses UUIDs as strings. The UUIDs from command line or
fstab(5) are not converted to internal binary representation. The string representation
of the UUID should be based on lower case characters.
The proc filesystem is not associated with a special device, and when mounting it, an
arbitrary keyword, such as proc can be used instead of a device specification. (The
customary choice none is less fortunate: the error message 'none busy' from umount
can be confusing.)
The /etc/fstab, /etc/mtab and /proc/mounts files.
The file /etc/fstab (see fstab(5)), may contain lines describing what devices are
usually mounted where, using which options. The default location of the fstab(5) file
can be overridden by the --fstab <path> command line option (see below for more
details).
The command
mount -a [-t type] [-O optlist]
(usually given in a bootscript) causes all filesystems mentioned in fstab (of the proper type
and/or having or not having the proper options) to be mounted as indicated, except for
those whose line contains the noauto keyword. Adding the -F option will make mount
fork, so that the filesystems are mounted simultaneously.
When mounting a filesystem mentioned in fstab or mtab, it suffices to give only the
device, or only the mount point.
The programs mount and umount maintain a list of currently mounted filesystems in the
file /etc/mtab. If no arguments are given to mount, this list is printed.
The mount program does not read the /etc/fstab file if device (or LABEL, UUID,
PARTUUID or PARTLABEL) and dir are specified. For example:
mount /dev/foo /dir
If you want to override mount options from /etc/fstab you have to use:
mount device|dir -o <options>
and then the mount options from the command line will be appended to the list of options
from /etc/fstab. The usual behaviour is that the last option wins if there are duplicate
options.
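As a simplified sketch of the "last option wins" rule (this models only simple flag and key=value options; libmount's real merging also resolves paired flags such as ro/rw):

```python
def effective_options(fstab_opts: str, cmdline_opts: str) -> dict:
    """Append command-line options after the fstab options; for a
    duplicated key the later occurrence overwrites the earlier one."""
    merged = {}
    for opt in (fstab_opts + "," + cmdline_opts).split(","):
        key, _, value = opt.partition("=")
        merged[key] = value or None
    return merged

# fstab says uid=0; the command line overrides it with uid=1000.
opts = effective_options("noatime,uid=0", "uid=1000")
```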
When the proc filesystem is mounted (say at /proc), the files /etc/mtab and /proc/mounts
have very similar contents. The former has somewhat more information, such as the
mount options used, but is not necessarily up-to-date (cf. the -n option below). It is
possible to replace /etc/mtab by a symbolic link to /proc/mounts, and especially when you
have very large numbers of mounts things will be much faster with that symlink, but some
information is lost that way, and in particular using the “user” option will fail.
The non-superuser mounts.
Normally, only the superuser can mount filesystems. However, when fstab contains
the user option on a line, anybody can mount the corresponding filesystem.
Thus, given a line
/dev/cdrom /cd iso9660 ro,user,noauto,unhide
any user can mount the iso9660 filesystem found on his CDROM using the command
mount /dev/cdrom
or
mount /cd
For more details, see fstab(5). Only the user that mounted a filesystem can unmount it
again. If any user should be able to unmount, then use users instead of user in the fstab
line. The owner option is similar to the user option, with the restriction that the user must
be the owner of the special file. This may be useful e.g. for /dev/fd if a login script makes
the console user owner of this device. The group option is similar, with the restriction that
the user must be member of the group of the special file.
The bind mounts.
Since Linux 2.4.0 it is possible to remount part of the file hierarchy somewhere else.
The call is
mount --bind olddir newdir
or, using the short option:
mount -B olddir newdir
or via an fstab entry:
/olddir /newdir none bind
After this call the same contents are accessible in two places. One can also remount a
single file (on a single file). It's also possible to use the bind mount to create a mountpoint
from a
regular directory, for example:
mount --bind foo foo
The bind mount call attaches only (part of) a single filesystem, not possible submounts.
The entire file hierarchy, including submounts, is attached to a second place using:
mount --rbind olddir newdir
or, using the short option:
mount -R olddir newdir
Note that the filesystem mount options will remain the same as those on the original
mount point, and cannot be changed by passing the -o option along with --bind/--rbind.
The mount options can be changed by a separate remount command, for example:
mount --bind olddir newdir
mount -o remount,ro newdir
Note that behavior of the remount operation depends on the /etc/mtab file. The first
command stores the ‘bind’ flag to the /etc/mtab file and the second command reads the
flag from the file. If you have a system without the /etc/mtab file or if you explicitly
define source and target for the remount command (then mount(8) does not read
/etc/mtab), then you have to use the bind flag (or option) for the remount command too.
For example:
mount --bind olddir newdir
mount -o remount,ro,bind olddir newdir
Note that remount,ro,bind will create a read-only mountpoint (VFS entry), but the original
filesystem superblock will still be writable; this means that the olddir will be writable, but
the newdir will be read-only.
The move operation.
Since Linux 2.5.1 it is possible to atomically move a mounted tree to another place.
The call is
mount --move olddir newdir
or, using the short option:
mount -M olddir newdir
This will cause the contents which previously appeared under olddir to be accessed under
newdir. The physical location of the files is not changed. Note that the olddir has to be a
mountpoint.
Note that moving a mount residing under a shared mount is invalid and unsupported. Use
findmnt -o TARGET,PROPAGATION to see the current propagation flags.
The shared subtrees operations.
Since Linux 2.6.15 it is possible to mark a mount and its submounts as shared,
private, slave or unbindable. A shared mount provides the ability to create mirrors of
that mount, such that mounts and umounts within any of the mirrors propagate to the
other mirrors. A slave mount receives propagation from its master, but not vice
versa. A private mount carries no propagation abilities. An unbindable mount is a
private mount which cannot be cloned through a bind operation. Detailed semantics
are documented in the Documentation/filesystems/sharedsubtree.txt file in the kernel
source tree.
Supported operations:
mount --make-shared mountpoint
mount --make-slave mountpoint
mount --make-private mountpoint
mount --make-unbindable mountpoint
The following commands allow one to recursively change the type of all the mounts
under a given mountpoint.
mount --make-rshared mountpoint
mount --make-rslave mountpoint
mount --make-rprivate mountpoint
mount --make-runbindable mountpoint
mount(8) does not read fstab(5) when a --make-* operation is requested. All necessary
information has to be specified on the command line.
Note that the Linux kernel does not allow changing multiple propagation flags with a
single mount(2) syscall, and the flags cannot be mixed with other mount options.
Since util-linux 2.23 the mount command allows using multiple propagation flags
together with other mount operations. This feature is EXPERIMENTAL. The
propagation flags are applied by additional mount(2) syscalls after the previous
mount operation has succeeded. Note that this use case is not atomic. The
propagation flags can also be specified in fstab(5) as mount options (private, slave,
shared, unbindable, rprivate, rslave, rshared, runbindable).
For example
mount --make-private --make-unbindable /dev/sda1 /A
is the same as
mount /dev/sda1 /A
mount --make-private /A
mount --make-unbindable /A
› COMMAND LINE OPTIONS
The full set of mount options used by an invocation of mount is determined by first
extracting the mount options for the filesystem from the fstab table, then applying
any options specified by the -o argument, and finally applying a -r or -w option,
when present.
Command line options available for the mount command:
-V, --version
Output version.
-h, --help
Print a help message.
-v, --verbose
Verbose mode.
-a, --all
Mount all filesystems (of the given types) mentioned in fstab.
-F, --fork
(Used in conjunction with -a.) Fork off a new incarnation of mount for each device.
This will do the mounts on different devices or different NFS servers in parallel. This
has the advantage that it is faster; also NFS timeouts go in parallel. A disadvantage is
that the mounts are done in undefined order. Thus, you cannot use this option if you
want to mount both /usr and /usr/spool.
-f, --fake
Causes everything to be done except for the actual system call; if it's not obvious,
this "fakes" mounting the filesystem. This option is useful in conjunction with the -v
flag to determine what the mount command is trying to do. It can also be used to add
entries for devices that were mounted earlier with the -n option. The -f option checks
for an existing record in /etc/mtab and fails when the record already exists (with
regular non-fake mount, this check is done by the kernel).
-i, --internal-only
Don't call the /sbin/mount.<filesystem> helper even if it exists.
-l, --show-labels
Add the labels in the mount output. Mount must have permission to read the disk
device (e.g. be suid root) for this to work. One can set such a label for ext2, ext3 or
ext4 using the e2label(8) utility, or for XFS using xfs_admin(8), or for reiserfs using
reiserfstune(8).
-n, --no-mtab
Mount without writing in /etc/mtab. This is necessary for example when /etc is on a
read-only filesystem.
-c, --no-canonicalize
Don’t canonicalize paths. The mount command canonicalizes all paths (from
command line or fstab) and stores canonicalized paths to the /etc/mtab file. This
option can be used together with the -f flag for already canonicalized absolute paths.
-s
Tolerate sloppy mount options rather than failing. This will ignore mount options not
supported by a filesystem type. Not all filesystems support this option. This option
exists for support of the Linux autofs-based automounter.
--source src
If only one argument for the mount command is given, it might be interpreted as the
target (mountpoint) or the source (device). This option allows one to explicitly
define that the argument is the mount source.
-r, --read-only
Mount the filesystem read-only. A synonym is -o ro.
Note that, depending on the filesystem type, state and kernel behavior, the system
may still write to the device. For example, ext3 or ext4 will replay its journal if the
filesystem is dirty. To prevent this kind of write access, you may want to mount the
ext3 or ext4 filesystem with the "ro,noload" mount options or set the block device to
read-only mode; see the blockdev(8) command.
-w, --rw, --read-write
Mount the filesystem read/write. This is the default. A synonym is -o rw.
-L, --label label
Mount the partition that has the specified label.
-U, --uuid uuid
Mount the partition that has the specified uuid. These two options require the file
/proc/partitions (present since Linux 2.1.116) to exist.
-T, --fstab path
Specifies an alternative fstab file. If the path is a directory, the files in the directory
are sorted by strverscmp(3); files that start with "." or lack a .fstab extension are
ignored. The option can be specified more than once. This option is mostly designed
for initramfs or chroot scripts where additional configuration is specified outside the
standard system configuration.
Note that mount(8) does not pass the --fstab option to the /sbin/mount.<type>
helpers; this means that the alternative fstab files will be invisible to the helpers.
This is no problem for normal mounts, but user (non-root) mounts always require
fstab to verify the user's rights.
-t, --types vfstype
The argument following the -t is used to indicate the filesystem type. The filesystem
types which are currently supported include: adfs, affs, autofs, cifs, coda, coherent,
cramfs, debugfs, devpts, efs, ext, ext2, ext3, ext4, hfs, hfsplus, hpfs, iso9660, jfs,
minix, msdos, ncpfs, nfs, nfs4, ntfs, proc, qnx4, ramfs, reiserfs, romfs, squashfs, smbfs,
sysv, tmpfs, ubifs, udf, ufs, umsdos, usbfs, vfat, xenix, xfs, xiafs. Note that coherent,
sysv and xenix are equivalent and that xenix and coherent will be removed at some
point in the future; use sysv instead. Since kernel version 2.1.21 the types ext and
xiafs no longer exist. Earlier, usbfs was known as usbdevfs. Note that the real list of
all supported filesystems depends on your kernel.
The programs mount and umount support filesystem subtypes. The subtype is
defined by a '.subtype' suffix, for example 'fuse.sshfs'. It's recommended to use the
subtype notation rather than adding any prefix to the mount source (for example
'sshfs#example.com' is deprecated).
For most types all the mount program has to do is issue a simple mount(2) system
call, and no detailed knowledge of the filesystem type is required. For a few types
however (like nfs, nfs4, cifs, smbfs, ncpfs) ad hoc code is necessary. The nfs, nfs4,
cifs, smbfs, and ncpfs filesystems have a separate mount program. In order to make it
possible to treat all types in a uniform way, mount will execute the program
/sbin/mount.TYPE (if that exists) when called with type TYPE. Since various
versions of the smbmount program have different calling conventions,
/sbin/mount.smbfs may have to be a shell script that sets up the desired call.
If no -t option is given, or if the auto type is specified, mount will try to guess the
desired type. Mount uses the blkid library for guessing the filesystem type; if that
does not turn up anything that looks familiar, mount will try to read the file
/etc/filesystems, or, if that does not exist, /proc/filesystems. All of the filesystem types
listed there will be tried, except for those that are labeled “nodev” (e.g., devpts, proc
and nfs). If /etc/filesystems ends in a line with a single * only, mount will read
/proc/filesystems afterwards. All of the filesystem types will be mounted with mount
option “silent”.
The auto type may be useful for user-mounted floppies. Creating a file
/etc/filesystems can be useful to change the probe order (e.g., to try vfat before msdos
or ext3 before ext2) or if you use a kernel module autoloader.
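As a sketch, an /etc/filesystems implementing the probe order suggested above (vfat before msdos, ext3 before ext2) could contain:

```
vfat
msdos
ext3
ext2
*
```

The final line containing only * makes mount read /proc/filesystems afterwards for any type not matched above.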
More than one type may be specified in a comma separated list. The list of filesystem
types can be prefixed with no to specify the filesystem types on which no action
should be taken. (This can be meaningful with the -a option.) For example, the
command:
mount -a -t nomsdos,ext
mounts all filesystems except those of type msdos and ext.
--target dir
If only one argument for the mount command is given, it might be interpreted as the
target (mountpoint) or the source (device). This option allows one to explicitly define
that the argument is the mount target.
-O, --test-opts opts
Used in conjunction with -a, to limit the set of filesystems to which the -a is applied.
Like -t in this regard, except that it is useless except in the context of -a. For example,
the command:
mount -a -O no_netdev
mounts all filesystems except those which have the option _netdev specified in the options
field in the /etc/fstab file.
It is different from -t in that each option is matched exactly; a leading no at the beginning
of one option does not negate the rest.
The -t and -O options are cumulative in effect; that is, the command
mount -a -t ext2 -O _netdev
mounts all ext2 filesystems with the _netdev option, not all filesystems that are either ext2
or have the _netdev option specified.
-o, --options opts
Options are specified with a -o flag followed by a comma-separated string of options.
For example:
mount LABEL=mydisk -o noatime,nouser
For more details, see FILESYSTEM INDEPENDENT MOUNT OPTIONS and
FILESYSTEM SPECIFIC MOUNT OPTIONS sections.
-B, --bind
Remount a subtree somewhere else (so that its contents are available in both places).
See above.
-R, --rbind
Remount a subtree and all possible submounts somewhere else (so that its contents
are available in both places). See above.
-M, --move
Move a subtree to some other place. See above.
› FILESYSTEM INDEPENDENT MOUNT OPTIONS
Some of these options are only useful when they appear in the /etc/fstab file.
Some of these options could be enabled or disabled by default in the system kernel.
To check the current setting see the options in /proc/mounts. Note that filesystems
also have per-filesystem specific default mount options (see for example tune2fs -l
output for extN filesystems).
The following options apply to any filesystem that is being mounted (but not every
filesystem actually honors them; e.g., the sync option today has an effect only for ext2,
ext3, fat, vfat and ufs):
async
All I/O to the filesystem should be done asynchronously. (See also the sync option.)
atime
Do not use the noatime feature; the inode access time is then controlled by kernel
defaults. See also the descriptions of the strictatime and relatime mount options.
noatime
Do not update inode access times on this filesystem (e.g., for faster access on the
news spool to speed up news servers).
auto
Can be mounted with the -a option.
noauto
Can only be mounted explicitly (i.e., the -a option will not cause the filesystem to be
mounted).
context=context, fscontext=context, defcontext=context and rootcontext=context
The context= option is useful when mounting filesystems that do not support
extended attributes, such as a floppy or hard disk formatted with VFAT, or systems
that are not normally running under SELinux, such as an ext3 formatted disk from a
non-SELinux workstation. You can also use context= on filesystems you do not trust,
such as a floppy. It also helps in compatibility with xattr-supporting filesystems on
earlier 2.4.<x> kernel versions. Even where xattrs are supported, you can save time
not having to label every file by assigning the entire disk one security context.
A commonly used option for removable media is
context=system_u:object_r:removable_t.
Two other options are fscontext= and defcontext=, both of which are mutually
exclusive of the context option. This means you can use fscontext and defcontext
with each other, but neither can be used with context.
The fscontext= option works for all filesystems, regardless of their xattr support. The
fscontext option sets the overarching filesystem label to a specific security context.
This filesystem label is separate from the individual labels on the files. It represents
the entire filesystem for certain kinds of permission checks, such as during mount or
file creation. Individual file labels are still obtained from the xattrs on the files
themselves. The context option actually sets the aggregate context that fscontext
provides, in addition to supplying the same label for individual files.
You can set the default security context for unlabeled files using defcontext= option.
This overrides the value set for unlabeled files in the policy and requires a filesystem
that supports xattr labeling.
The rootcontext= option allows you to explicitly label the root inode of a FS being
mounted before that FS or inode becomes visible to userspace. This was found to be
useful for things like stateless linux.
Note that the kernel rejects any remount request that includes the context option,
even when unchanged from the current context.
Warning: the context value might contain commas, in which case the value has to be
properly quoted, otherwise mount(8) will interpret the comma as a separator between
mount options. Don’t forget that the shell strips off quotes and thus double quoting
is required. For example:
mount -t tmpfs none /mnt -o
‘context=system_u:object_r:tmp_t:s0:c127,c456,noexec’
The default is set from the 'dmask' option. (If the directory is writable, utime(2) is also
allowed, i.e. ~dmask & 022.)
Normally utime(2) checks that the current process is the owner of the file, or that it has
the CAP_FOWNER capability. But a FAT filesystem doesn't store uid/gid on disk, so the
normal check is too inflexible. With this option you can relax it.
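A worked sketch of the ~dmask & 022 check quoted above (illustrative only, not kernel code): utime(2) by a non-owner is permitted when applying dmask still leaves a write bit for group or other.

```python
def utime_allowed_for_nonowner(dmask: int) -> bool:
    """True when ~dmask & 022 is non-zero, i.e. directories stay
    group- or other-writable under the given dmask (octal)."""
    return (~dmask & 0o022) != 0

# dmask=000 leaves the group/other write bits set, so utime is allowed;
# dmask=022 clears both, so it is not.
```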
check=value
Three different levels of pickiness can be chosen:
r[elaxed]
Upper and lower case are accepted and equivalent, long name parts are truncated
(e.g. verylongname.foobar becomes verylong.foo), leading and embedded spaces are
accepted in each name part (name and extension).
n[ormal]
Like “relaxed”, but many special characters (*, ?, <, spaces, etc.) are rejected. This is
the default.
s[trict]
Like "normal", but names that contain long parts, or special characters that are
sometimes used on Linux but are not accepted by MS-DOS (+, =, spaces, etc.), are
rejected.
codepage=value
Sets the codepage for converting to shortname characters on FAT and VFAT
filesystems. By default, codepage 437 is used.
conv={b[inary]|t[ext]|a[uto]}
The fat filesystem can perform CRLF<->NL (MS-DOS text format to UNIX text
format) conversion in the kernel. The following conversion modes are available:
binary
no translation is performed. This is the default.
text
CRLF<->NL translation is performed on all files.
auto
CRLF<->NL translation is performed on all files that don't have a "well-known
binary” extension. The list of known extensions can be found at the beginning of
fs/fat/misc.c (as of 2.0, the list is: exe, com, bin, app, sys, drv, ovl, ovr, obj, lib, dll,
pif, arc, zip, lha, lzh, zoo, tar, z, arj, tz, taz, tzp, tpz, gz, tgz, deb, gif, bmp, tif, gl, jpg,
pcx, tfm, vf, gf, pk, pxl, dvi).
Programs that do computed lseeks won’t like in-kernel text conversion. Several people
have had their data ruined by this translation. Beware!
For filesystems mounted in binary mode, a conversion tool (fromdos/todos) is available.
This option is obsolete.
cvf_format=module
Forces the driver to use the CVF (Compressed Volume File) module cvf_module instead of auto-detection. If the kernel supports kmod, the cvf_format=xxx option also controls on-demand CVF module loading. This option is obsolete.
cvf_option=option
Option passed to the CVF module. This option is obsolete.
debug
Turn on the debug flag. A version string and a list of filesystem parameters will be printed (these data are also printed if the parameters appear to be inconsistent).
discard
If set, causes discard/TRIM commands to be issued to the block device when blocks are freed. This is useful for SSD devices and sparse/thinly-provisioned LUNs.
fat={12|16|32}
Specify a 12, 16 or 32 bit fat. This overrides the automatic FAT type detection routine. Use with caution!
iocharset=value
Character set to use for converting between 8 bit characters and 16 bit Unicode characters. The default is iso8859-1. Long filenames are stored on disk in Unicode format.
nfs
If set, enables in-memory indexing of directory inodes to reduce the frequency of ESTALE errors in NFS client operations. Useful only when the filesystem is exported via NFS.
tz=UTC
This option disables the conversion of timestamps between local time (as used by Windows on FAT) and UTC (which Linux uses internally). This is particularly useful when mounting devices (like digital cameras) that are set to UTC, in order to avoid the pitfalls of local time.
quiet
Turn on the quiet flag. Attempts to chown or chmod files do not return errors, although they fail. Use with caution!
showexec
If set, the execute permission bits of the file will be allowed only if the extension part of the name is .EXE, .COM, or .BAT. Not set by default.
sys_immutable
If set, the ATTR_SYS attribute on FAT is handled as the IMMUTABLE flag on Linux. Not set by default.
flush
If set, the filesystem will try to flush to disk earlier than usual. Not set by default.
usefree
Use the "free clusters" value stored in FSINFO. It will be used to determine the number of free clusters without scanning the disk. It is not used by default, because recent versions of Windows do not update it correctly in some cases. If you are sure the "free clusters" value in FSINFO is correct, this option lets you avoid scanning the disk.
dots, nodots, dotsOK=[yes|no]
Various misguided attempts to force Unix or DOS conventions onto a FAT filesystem.
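As an illustration of how several of these FAT options combine on one command line, a minimal sketch (the device, mount point, and values are placeholders):

```shell
# Hypothetical example: mount a VFAT USB stick with an explicit codepage and
# iocharset, owned by uid/gid 1000, honoring execute bits only for .EXE/.COM/.BAT:
mount -t vfat -o codepage=437,iocharset=iso8859-1,uid=1000,gid=1000,umask=022,showexec /dev/sdb1 /mnt/usb
```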
› MOUNT OPTIONS FOR HFS
creator=cccc, type=cccc
Set the creator/type values as shown by the MacOS finder used for creating new files.
Default values: ‘????’.
uid=n, gid=n
Set the owner and group of all files. (Default: the uid and gid of the current process.)
dir_umask=n, file_umask=n, umask=n
Set the umask used for all directories, all regular files, or all files and directories.
Defaults to the umask of the current process.
session=n
Select the CDROM session to mount. Defaults to leaving that decision to the
CDROM driver. This option will fail with anything but a CDROM as underlying
device.
part=n
Select partition number n from the device. Only makes sense for CDROMs. Defaults
to not parsing the partition table at all.
quiet
Don’t complain about invalid mount options.
› MOUNT OPTIONS FOR HPFS
uid=value and gid=value
Set the owner and group of all files. (Default: the uid and gid of the current process.)
umask=value
Set the umask (the bitmask of the permissions that are not present). The default is the
umask of the current process. The value is given in octal.
case={lower|asis}
Convert all file names to lower case, or leave them as-is. (Default: case=lower.)
conv={binary|text|auto}
For conv=text, delete some random CRs (in particular, all followed by NL) when
reading a file. For conv=auto, choose more or less at random between conv=binary
and conv=text. For conv=binary, just read what is in the file. This is the default.
nocheck
Do not abort mounting when certain consistency checks fail.
› MOUNT OPTIONS FOR ISO9660
ISO 9660 is a standard describing a filesystem structure to be used on CD-ROMs.
(This filesystem type is also seen on some DVDs. See also the udf filesystem.)
Normal iso9660 filenames appear in a 8.3 format (i.e., DOS-like restrictions on
filename length), and in addition all characters are in upper case. Also there is no
field for file ownership, protection, number of links, provision for block/character
devices, etc.
Rock Ridge is an extension to iso9660 that provides all of these UNIX-like features.
Basically there are extensions to each directory record that supply all of the
additional information, and when Rock Ridge is in use, the filesystem is
indistinguishable from a normal UNIX filesystem (except that it is read-only, of
course).
norock
Disable the use of Rock Ridge extensions, even if available. Cf. map.
nojoliet
Disable the use of Microsoft Joliet extensions, even if available. Cf. map.
check={r[elaxed]|s[trict]}
With check=relaxed, a filename is first converted to lower case before doing the
lookup. This is probably only meaningful together with norock and map=normal.
(Default: check=strict.)
uid=value and gid=value
Give all files in the filesystem the indicated user or group id, possibly overriding the
information found in the Rock Ridge extensions. (Default: uid=0,gid=0.)
map={n[ormal]|o[ff]|a[corn]}
For non-Rock Ridge volumes, normal name translation maps upper to lower case
ASCII, drops a trailing `;1’, and converts `;’ to `.’. With map=off no name translation
is done. See norock. (Default: map=normal.) map=acorn is like map=normal but
also applies Acorn extensions if present.
mode=value
For non-Rock Ridge volumes, give all files the indicated mode. (Default: read
permission for everybody.) Since Linux 2.1.37 one no longer needs to specify the
mode in decimal. (Octal is indicated by a leading 0.)
unhide
Also show hidden and associated files. (If the ordinary files and the associated or
hidden files have the same filenames, this may make the ordinary files inaccessible.)
block={512|1024|2048}
Set the block size to the indicated value. (Default: block=1024.)
conv={a[uto]|b[inary]|m[text]|t[ext]}
(Default: conv=binary.) Since Linux 1.3.54 this option has no effect anymore. (And
non-binary settings used to be very dangerous, possibly leading to silent data
corruption.)
cruft
If the high byte of the file length contains other garbage, set this mount option to
ignore the high order bits of the file length. This implies that a file cannot be larger
than 16MB.
session=x
Select number of session on multisession CD. (Since 2.3.4.)
sbsector=xxx
Session begins from sector xxx. (Since 2.3.4.)
The following options are the same as for vfat and specifying them only makes sense
when using discs encoded using Microsoft’s Joliet extensions.
iocharset=value
Character set to use for converting 16 bit Unicode characters on CD to 8 bit
characters. The default is iso8859-1.
utf8
Convert 16 bit Unicode characters on CD to UTF-8.
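Combining these options with a loop mount of an image file might look like the following sketch (the image path and mount point are placeholders):

```shell
# Mount an ISO image read-only through a loop device, disabling Joliet and
# relaxing filename checks (paths are hypothetical):
mount -t iso9660 -o loop,ro,nojoliet,check=relaxed /tmp/image.iso /mnt/cdrom
```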
› MOUNT OPTIONS FOR JFS
iocharset=name
Character set to use for converting from Unicode to ASCII. The default is to do no
conversion. Use iocharset=utf8 for UTF8 translations. This requires
CONFIG_NLS_UTF8 to be set in the kernel .config file.
resize=value
Resize the volume to value blocks. JFS only supports growing a volume, not
shrinking it. This option is only valid during a remount, when the volume is mounted
read-write. The resize keyword with no value will grow the volume to the full size of
the partition.
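A minimal sketch of the resize usage described above (the device and mount point are placeholders):

```shell
# Grow a mounted JFS volume to the full size of its partition via remount:
mount -o remount,resize /dev/sdb1 /mnt/jfs
# Or grow it to an explicit number of blocks:
mount -o remount,resize=1048576 /dev/sdb1 /mnt/jfs
```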
nointegrity
Do not write to the journal. The primary use of this option is to allow for higher
performance when restoring a volume from backup media. The integrity of the
volume is not guaranteed if the system abnormally ends.
integrity
Default. Commit metadata changes to the journal. Use this option to remount a
volume where the nointegrity option was previously specified in order to restore
normal behavior.
errors={continue|remount-ro|panic}
Define the behaviour when an error is encountered. (Either ignore errors and just
mark the filesystem erroneous and continue, or remount the filesystem read-only, or
panic and halt the system.)
noquota|quota|usrquota|grpquota
These options are accepted but ignored.
› MOUNT OPTIONS FOR MINIX
None.
› MOUNT OPTIONS FOR MSDOS
See mount options for fat. If the msdos filesystem detects an inconsistency, it reports
an error and sets the file system read-only. The filesystem can be made writable again
by remounting it.
› MOUNT OPTIONS FOR NCPFS
Just like nfs, the ncpfs implementation expects a binary argument (a struct
ncp_mount_data) to the mount system call. This argument is constructed by
ncpmount(8) and the current version of mount (2.12) does not know anything about
ncpfs.
› MOUNT OPTIONS FOR NFS AND NFS4
See the options section of the nfs(5) man page (nfs-utils package must be installed).
The nfs and nfs4 implementation expects a binary argument (a struct
nfs_mount_data) to the mount system call. This argument is constructed by
mount.nfs(8) and the current version of mount (2.13) does not know anything about
nfs and nfs4.
› MOUNT OPTIONS FOR NTFS
iocharset=name
Character set to use when returning file names. Unlike VFAT, NTFS suppresses
names that contain nonconvertible characters. Deprecated.
nls=name
New name for the option earlier called iocharset.
utf8
Use UTF-8 for converting file names.
uni_xlate={0|1|2}
For 0 (or `no’ or `false’), do not use escape sequences for unknown Unicode
characters. For 1 (or `yes’ or `true’) or 2, use vfat-style 4-byte escape sequences
starting with ":". Here 2 gives a little-endian encoding and 1 a byteswapped big-endian encoding.
posix=[0|1]
If enabled (posix=1), the filesystem distinguishes between upper and lower case. The
8.3 alias names are presented as hard links instead of being suppressed. This option is
obsolete.
uid=value, gid=value and umask=value
Set the file permission on the filesystem. The umask value is given in octal. By
default, the files are owned by root and not readable by somebody else.
› MOUNT OPTIONS FOR PROC
uid=value and gid=value
These options are recognized, but have no effect as far as I can see.
› MOUNT OPTIONS FOR RAMFS
Ramfs is a memory based filesystem. Mount it and you have it. Unmount it and it is
gone. Present since Linux 2.3.99pre4. There are no mount options.
› MOUNT OPTIONS FOR REISERFS
Reiserfs is a journaling filesystem.
conv
Instructs version 3.6 reiserfs software to mount a version 3.5 filesystem, using the 3.6
format for newly created objects. This filesystem will no longer be compatible with
reiserfs 3.5 tools.
hash={rupasov|tea|r5|detect}
Choose which hash function reiserfs will use to find files within directories.
rupasov
A hash invented by Yury Yu. Rupasov. It is fast and preserves locality, mapping
lexicographically close file names to close hash values. This option should not be
used, as it causes a high probability of hash collisions.
tea
A Davis-Meyer function implemented by Jeremy Fitzhardinge. It uses hash
permuting bits in the name. It gets high randomness and, therefore, low probability of
hash collisions at some CPU cost. This may be used if EHASHCOLLISION errors
are experienced with the r5 hash.
r5
A modified version of the rupasov hash. It is used by default and is the best choice
unless the filesystem has huge directories and unusual file-name patterns.
detect
Instructs mount to detect which hash function is in use by examining the filesystem
being mounted, and to write this information into the reiserfs superblock. This is only
useful on the first mount of an old format filesystem.
› MOUNT OPTIONS FOR TMPFS
The tmpfs mount options for sizing ( size, nr_blocks, and nr_inodes) accept a suffix k, m
or g for Ki, Mi, Gi (binary kilo, mega and giga) and can be changed on remount.
mode=
Set initial permissions of the root directory.
uid=
The user id.
gid=
The group id.
mpol=[default|prefer:Node|bind:NodeList|interleave|interleave:NodeList]
Set the NUMA memory allocation policy for all files in that instance (if the kernel
CONFIG_NUMA is enabled) - which can be adjusted on the fly via ‘mount -o
remount …’
default
prefers to allocate memory from the local node
prefer:Node
prefers to allocate memory from the given Node
bind:NodeList
allocates memory only from nodes in NodeList
interleave
prefers to allocate from each node in turn
interleave:NodeList
allocates from each node of NodeList in turn.
The NodeList format is a comma-separated list of decimal numbers and ranges, a range
being two hyphen-separated decimal numbers, the smallest and largest node numbers in
the range. For example, mpol=bind:0-3,5,7,9-15
Note that trying to mount a tmpfs with an mpol option will fail if the running kernel does
not support NUMA; and will fail if its nodelist specifies a node which is not online. If
your system relies on that tmpfs being mounted, but from time to time runs a kernel built
without NUMA capability (perhaps a safe recovery kernel), or with fewer nodes online,
then it is advisable to omit the mpol option from automatic mount options. It can be added
later, when the tmpfs is already mounted on MountPoint, by ‘mount -o
remount,mpol=Policy:NodeList MountPoint’.
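The advice above can be sketched as follows (the size, node list, and mount point are placeholders):

```shell
# Mount the tmpfs without mpol so it works even on non-NUMA kernels:
mount -t tmpfs -o size=512m tmpfs /mnt/scratch
# Later, on a NUMA-capable kernel, add the policy by remounting:
mount -o remount,mpol=bind:0-3,5 /mnt/scratch
```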
› MOUNT OPTIONS FOR UBIFS
UBIFS is a flash file system which works on top of UBI volumes. Note that atime is
not supported and is always turned off.
The device name may be specified as
ubiX_Y UBI device number X, volume number Y
ubiY
UBI device number 0, volume number Y
ubiX:NAME
UBI device number X, volume with name NAME
ubi:NAME
UBI device number 0, volume with name NAME
An alternative ! separator may be used instead of :.
The following mount options are available:
bulk_read
Enable bulk-read. VFS read-ahead is disabled because it slows down the file system. Bulk-read is an internal optimization. Some flashes may read faster if the data are read in one go, rather than in several read requests. For example, OneNAND can do "read-while-load" if it reads more than one NAND page.
no_bulk_read
Do not bulk-read. This is the default.
chk_data_crc
Check data CRC-32 checksums. This is the default.
no_chk_data_crc
Do not check data CRC-32 checksums. With this option, the filesystem does not check the CRC-32 checksum for data, but it does check it for the internal indexing information. This option only affects reading, not writing. CRC-32 is always calculated when writing the data.
compr={none|lzo|zlib}
Select the default compressor which is used when new files are written. It is still possible to read compressed files if mounted with the none option.
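The device-naming forms above can be illustrated as follows (volume names, numbers, and the mount point are placeholders; the first two lines are equivalent only if the volume named "rootfs" is volume 0):

```shell
mount -t ubifs ubi0_0 /mnt/ubifs       # UBI device 0, volume 0
mount -t ubifs ubi0:rootfs /mnt/ubifs  # UBI device 0, volume named "rootfs"
mount -t ubifs -o bulk_read,compr=lzo ubi0:rootfs /mnt/ubifs
```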
› MOUNT OPTIONS FOR UDF
udf is the “Universal Disk Format” filesystem defined by the Optical Storage
Technology Association, and is often used for DVD-ROM. See also iso9660.
gid=
Set the default group.
umask=
Set the default umask. The value is given in octal.
uid=
Set the default user.
unhide
Show otherwise hidden files.
undelete
Show deleted files in lists.
nostrict
Unset strict conformance.
iocharset
Set the NLS character set.
bs=
Set the block size. (May not work unless 2048.)
novrs
Skip volume sequence recognition.
session=
Set the CDROM session counting from 0. Default: last session.
anchor=
Override standard anchor location. Default: 256.
volume=
Override the VolumeDesc location. (unused)
partition=
Override the PartitionDesc location. (unused)
lastblock=
Set the last block of the filesystem.
fileset=
Override the fileset block location. (unused)
rootdir=
Override the root directory location. (unused)
› MOUNT OPTIONS FOR UFS
ufstype=value
UFS is a filesystem widely used in different operating systems. The problem is the differences among implementations. Features of some implementations are undocumented, so it's hard to recognize the type of ufs automatically. That's why the user must specify the type of ufs with a mount option. Possible values are:
old
Old format of ufs, this is the default, read only. (Don’t forget to give the -r option.)
44bsd
For filesystems created by a BSD-like system (NetBSD,FreeBSD,OpenBSD).
ufs2
Used in FreeBSD 5.x supported as read-write.
5xbsd
Synonym for ufs2.
sun
For filesystems created by SunOS or Solaris on Sparc.
sunx86
For filesystems created by Solaris on x86.
hp
For filesystems created by HP-UX, read-only.
nextstep
For filesystems created by NeXTStep (on NeXT station) (currently read only).
nextstep-cd
For NextStep CDROMs (block_size == 2048), read-only.
openstep
For filesystems created by OpenStep (currently read only). The same filesystem type
is also used by Mac OS X.
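A sketch of typical ufstype usage (devices and mount points are placeholders):

```shell
# Old-format UFS is read-only; pass -r as the description advises:
mount -t ufs -o ufstype=old -r /dev/sdb1 /mnt/ufs
# A FreeBSD UFS2 filesystem may be mounted read-write:
mount -t ufs -o ufstype=ufs2 /dev/sdb2 /mnt/ufs2
```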
One further possible type is a mount via the loop device. For example, the command
mount /tmp/disk.img /mnt -t vfat -o loop=/dev/loop3
will set up the loop device /dev/loop3 to correspond to the file /tmp/disk.img, and then mount this device on /mnt.
If no explicit loop device is mentioned (but just an option '-o loop' is given), then mount will try to find some unused loop device and use that, for example
mount /tmp/disk.img /mnt -o loop
The mount command automatically creates a loop device from a regular file if a
filesystem type is not specified or the filesystem is known for libblkid, for example:
mount /tmp/disk.img /mnt
mount -t ext3 /tmp/disk.img /mnt
This type of mount knows about three options, namely loop, offset and sizelimit, that are really options to losetup(8). (These options can be used in addition to those specific to the filesystem type.)
Since Linux 2.6.25, auto-destruction of loop devices is supported, and any loop device allocated by mount will be freed by umount independently of /etc/mtab.
You can also free a loop device by hand, using 'losetup -d' or 'umount -d'.
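Setting up the loop device by hand, as mentioned above, might be sketched as (paths are placeholders):

```shell
# Bind the image to a specific loop device, then mount it:
losetup /dev/loop3 /tmp/disk.img
mount /dev/loop3 /mnt
# Unmount and free the loop device in one step:
umount -d /mnt
```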
› RETURN CODES
mount has the following return codes (the bits can be ORed):
0
success
1
incorrect invocation or permissions
2
system error (out of memory, cannot fork, no more loop devices)
4
internal mount bug
8
user interrupt
16
problems writing or locking /etc/mtab
32
mount failure
64
some mount succeeded
The command mount -a returns 0 (all success), 32 (all failed) or 64 (some failed,
some success).
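Because the return codes are ORed bits, a caller can decode a non-zero status; a minimal POSIX sh sketch (the function name is our own, not part of util-linux):

```shell
#!/bin/sh
# Decode mount(8) exit status bits into human-readable labels.
decode_mount_status() {
    s=$1
    [ "$s" -eq 0 ] && { echo "success"; return 0; }
    [ $((s & 1))  -ne 0 ] && echo "incorrect invocation or permissions"
    [ $((s & 2))  -ne 0 ] && echo "system error"
    [ $((s & 4))  -ne 0 ] && echo "internal mount bug"
    [ $((s & 8))  -ne 0 ] && echo "user interrupt"
    [ $((s & 16)) -ne 0 ] && echo "problems writing or locking /etc/mtab"
    [ $((s & 32)) -ne 0 ] && echo "mount failure"
    [ $((s & 64)) -ne 0 ] && echo "some mount succeeded"
    return 0
}
decode_mount_status 96   # bits 64|32: some mounts failed, some succeeded
```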
› NOTES
The syntax of external mount helpers is:
/sbin/mount.<suffix> spec dir [-sfnv] [-o options] [-t type.subtype]
where <suffix> is the filesystem type and the -sfnvo options have the same meaning as the standard mount options. The -t option is used for filesystems with subtype support (for example /sbin/mount.fuse -t fuse.sshfs).
› FILES
/etc/fstab
filesystem table
/etc/mtab
table of mounted filesystems
/etc/mtab~
lock file
/etc/mtab.tmp
temporary file
/etc/filesystems
a list of filesystem types to try
› ENVIRONMENT
LIBMOUNT_FSTAB=<path>
overrides the default location of the fstab file
LIBMOUNT_MTAB=<path>
overrides the default location of the mtab file
LIBMOUNT_DEBUG=0xffff
enables debug output
› SEE ALSO
mount(2), umount(2), fstab(5), umount(8), swapon(8), findmnt(8), nfs(5), xfs(5),
e2label(8), xfs_admin(8), mountd(8), nfsd(8), mke2fs(8), tune2fs(8), losetup(8)
› BUGS
It is possible for a corrupted filesystem to cause a crash.
Some Linux filesystems don’t support -o sync and -o dirsync (the ext2, ext3, fat and
vfat filesystems do support synchronous updates (a la BSD) when mounted with the
sync option).
The -o remount may not be able to change mount parameters (all ext2fs-specific
parameters, except sb, are changeable with a remount, for example, but you can’t
change gid or umask for the fatfs).
It is possible for the files /etc/mtab and /proc/mounts not to match. The first file is based only on the mount command options, while the content of the second file also depends on the kernel and other settings (e.g. a remote NFS server). In particular, the mount command may report unreliable information about an NFS mount point, whereas the /proc/mounts file usually contains more reliable information.
Checking files on an NFS filesystem referenced by file descriptors (i.e. the fcntl and ioctl families of functions) may lead to inconsistent results due to the lack of a consistency check in the kernel, even if noac is used.
The loop option with the offset or sizelimit options used may fail when using older
kernels if the mount command can’t confirm that the size of the block device has
been configured as requested. This situation can be worked around by using the
losetup command manually before calling mount with the configured loop device.
› HISTORY
A mount command existed in Version 5 AT&T UNIX.
› AUTHORS
Karel Zak <[email protected]>
› AVAILABILITY
The mount command is part of the util-linux package and is available from
ftp://ftp.kernel.org/pub/linux/utils/util-linux/.
MOUNT.NFS
› NAME
mount.nfs, mount.nfs4 - mount a Network File System
› SYNOPSIS
mount.nfs remotetarget dir [-rvVwfnsh] [-o options]
› DESCRIPTION
mount.nfs is part of the nfs(5) utilities package, which provides NFS client functionality.
mount.nfs is meant to be used by the mount(8) command for mounting NFS shares.
This subcommand, however, can also be used as a standalone command with limited
functionality.
mount.nfs4 is used for mounting NFSv4 file systems, while mount.nfs is used to mount NFS file systems version 3 or 2. remotetarget is a server share, usually in the form servername:/path/to/share. dir is the directory on which the file system is to be mounted.
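A standalone invocation might look like this sketch (the server name and export path are placeholders):

```shell
# Mount an NFSv3 export read-only, verbosely:
mount.nfs server.example.com:/export/data /mnt/data -r -v
# Mount the same export over NFSv4:
mount.nfs4 server.example.com:/export/data /mnt/data -v
```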
› OPTIONS
-r
Mount file system readonly.
-v
Be verbose.
-V
Print version.
-w
Mount file system read-write.
-f
Fake mount. Don’t actually call the mount system call.
-n
Do not update /etc/mtab. By default, an entry is created in /etc/mtab for every
mounted file system. Use this option to skip making an entry.
-s
Tolerate sloppy mount options rather than fail.
-h
Print help message.
nfsoptions
Refer to nfs(5) or mount(8) manual pages.
› NOTE
For further information please refer to the nfs(5) and mount(8) manual pages.
› FILES
/etc/fstab
file system table
/etc/mtab
table of mounted file systems
/etc/nfsmount.conf
Configuration file for NFS mounts
› SEE ALSO
nfs(5), nfsmount.conf(5), mount(8)
› AUTHOR
Amit Gud <[email protected]>
rpc.mountd
› NAME
rpc.mountd - NFS mount daemon
› SYNOPSIS
/usr/sbin/rpc.mountd [options]
› DESCRIPTION
The rpc.mountd daemon implements the server side of the NFS MOUNT protocol,
an NFS side protocol used by NFS version 2 [RFC1094] and NFS version 3
[RFC1813].
An NFS server maintains a table of local physical file systems that are accessible to
NFS clients. Each file system in this table is referred to as an exported file system, or
export, for short.
Each file system in the export table has an access control list. rpc.mountd uses these
access control lists to determine whether an NFS client is permitted to access a given
file system. For details on how to manage your NFS server’s export table, see the
exports(5) and exportfs(8) man pages.
The NFS MOUNT protocol has several procedures. The most important of these are
MNT (mount an export) and UMNT (unmount an export).
A MNT request has two arguments: an explicit argument that contains the pathname
of the root directory of the export to be mounted, and an implicit argument that is the
sender’s IP address.
When receiving a MNT request from an NFS client, rpc.mountd checks both the
pathname and the sender’s IP address against its export table. If the sender is
permitted to access the requested export, rpc.mountd returns an NFS file handle for
the export’s root directory to the client. The client can then use the root file handle
and NFS LOOKUP requests to navigate the directory structure of the export.
The rpc.mountd daemon registers every successful MNT request by adding an entry
to the /var/lib/nfs/rmtab file. When receiving a UMNT request from an NFS client,
rpc.mountd simply removes the matching entry from /var/lib/nfs/rmtab, as long as
the access control list for that export allows that sender to access the export.
Clients can discover the list of file systems an NFS server is currently exporting, or
the list of other clients that have mounted its exports, by using the showmount(8)
command. showmount(8) uses other procedures in the NFS MOUNT protocol to
report information about the server’s exported file systems.
Note, however, that there is little to guarantee that the contents of /var/lib/nfs/rmtab
are accurate. A client may continue accessing an export even after invoking UMNT.
If the client reboots without sending a UMNT request, stale entries remain for that
client in /var/lib/nfs/rmtab.
› OPTIONS
-d kind or --debug kind
Turn on debugging. Valid kinds are: all, auth, call, general and parse.
-F or --foreground
Run in foreground (do not daemonize).
-f export-file or --exports-file export-file
This option specifies the exports file, listing the clients that this server is prepared to
serve and parameters to apply to each such mount (see exports(5)). By default,
export information is read from /etc/exports.
-h or --help
Display usage message.
-o num or --descriptors num
Set the limit of the number of open file descriptors to num. The default is to leave the limit unchanged.
-N mountd-version or --no-nfs-version mountd-version
This option can be used to request that rpc.mountd not offer certain versions of NFS. The current version of rpc.mountd supports NFS versions 2, 3 and 4. If one of these versions should not be offered, rpc.mountd must be invoked with the option --no-nfs-version <vers>.
-n or --no-tcp
Don't advertise TCP for mount.
-P
Ignored (compatibility with unfsd??).
-p num or --port num
Specifies the port number used for RPC listener sockets. If this option is not specified, rpc.mountd will try to consult /etc/services; if it succeeds in obtaining a port, it sets the same port for all listener sockets, otherwise it chooses a random ephemeral port for each listener socket.
This option can be used to fix the port value of rpc.mountd's listeners when NFS MOUNT requests must traverse a firewall between clients and servers.
-H prog or --ha-callout prog
Specify a high availability callout program. This program receives callouts for all
MOUNT and UNMOUNT requests. This allows rpc.mountd to be used in a High
Availability NFS (HA-NFS) environment.
The callout program is run with 4 arguments. The first is mount or unmount
depending on the reason for the callout. The second will be the name of the client
performing the mount. The third will be the path that the client is mounting. The last
is the number of concurrent mounts that we believe the client has of that path.
This callout is not needed with 2.6 and later kernels. Instead, mount the nfsd
filesystem on /proc/fs/nfsd.
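A skeletal callout handler illustrating the four arguments described above (this script is our own sketch, not part of nfs-utils):

```shell
#!/bin/sh
# Hypothetical rpc.mountd --ha-callout handler.
# $1: "mount" or "unmount"; $2: client name; $3: exported path;
# $4: number of concurrent mounts the client holds on that path.
ha_callout() {
    printf '%s %s %s count=%s\n' "$1" "$2" "$3" "$4"
}
# Example invocation, as rpc.mountd would perform it:
ha_callout mount client1.example.com /export/data 1
```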
-s, --state-directory-path directory
Specify a directory in which to place statd state information. If this option is not
specified the default of /var/lib/nfs is used.
-r, --reverse-lookup
rpc.mountd tracks IP addresses in the rmtab file. When a DUMP request is made (by
someone running showmount -a, for instance), it returns IP addresses instead of
hostnames by default. This option causes rpc.mountd to perform a reverse lookup on
each IP address and return that hostname instead. Enabling this can have a substantial
negative effect on performance in some situations.
-t N or --num-threads=N or --num-threads N
This option specifies the number of worker threads that rpc.mountd spawns. The
default is 1 thread, which is probably enough. More threads are usually only needed
for NFS servers which need to handle mount storms of hundreds of NFS mounts in a
few seconds, or when your DNS server is slow or unreliable.
-u or --no-udp
Don't advertise UDP for mounting.
-V version or --nfs-version version
This option can be used to request that rpc.mountd offer certain versions of NFS.
The current version of rpc.mountd can support both NFS version 2 and the newer
version 3.
-v or --version
Print the version of rpc.mountd and exit.
-g or --manage-gids
Accept requests from the kernel to map user id numbers into lists of group id numbers for use in access control. An NFS request will normally (except when using Kerberos or other cryptographic authentication) contain a user-id and a list of group-ids. Due to a limitation in the NFS protocol, at most 16 group ids can be listed. If you use the -g flag, then the list of group ids received from the client will be replaced by a list of group ids determined by an appropriate lookup on the server. Note that the 'primary' group id is not affected, so a newgroup command on the client will still be effective. This function requires a Linux kernel with version at least 2.6.21.
› TCP_WRAPPERS SUPPORT
You can protect your rpc.mountd listeners using the tcp_wrapper library or
iptables(8).
Note that the tcp_wrapper library supports only IPv4 networking.
Add the hostnames of NFS peers that are allowed to access rpc.mountd to
/etc/hosts.allow. Use the daemon name mountd even if the rpc.mountd binary has a
different name.
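The resulting /etc/hosts.allow entry might look like the following sketch (hostnames are placeholders):

```
# /etc/hosts.allow -- use the daemon name "mountd" regardless of the binary name:
mountd: client1.example.com client2.example.com
```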
Hostnames used in either access file will be ignored when they can not be resolved
into IP addresses. For further information see the tcpd(8) and hosts_access(5) man
pages.
TI-RPC is a pre-requisite for supporting NFS on IPv6. If TI-RPC support is built into
rpc.mountd, it attempts to start listeners on network transports marked ‘visible’ in
/etc/netconfig. As long as at least one network transport listener starts successfully,
rpc.mountd will operate.
› FILES
/etc/exports
input file for exportfs, listing exports, export options, and access control lists
/var/lib/nfs/rmtab
table of clients accessing server’s exports
› SEE ALSO
exportfs(8), exports(5), showmount(8), rpc.nfsd(8), rpc.rquotad(8), nfs(5),
tcpd(8), hosts_access(5), iptables(8), netconfig(5)
RFC 1094 - “NFS: Network File System Protocol Specification” RFC 1813 - “NFS
Version 3 Protocol Specification”
› AUTHOR
Olaf Kirch, H. J. Lu, G. Allan Morris III, and a host of others.
mountstats
› NAME
mountstats - Displays various NFS client per-mount statistics
› SYNOPSIS
mountstats [-h|--help] [-v|--version] [-f|--file infile] [-S|--since sincefile] [-n|--nfs] [-r|--rpc] [-R|--raw] [mountpoint]...
mountstats iostat [-h|--help] [-v|--version] [-f|--file infile] [-S|--since sincefile] [interval] [count] [mountpoint]...
mountstats nfsstat [-h|--help] [-v|--version] [-f|--file infile] [-S|--since sincefile] [-3] [-4] [mountpoint]...
› DESCRIPTION
The mountstats command displays various NFS client statistics for each given mountpoint.
If no mountpoint is given, statistics will be displayed for all NFS mountpoints on the
client.
› OPTIONS
-h, --help
show the help message and exit
-v, --version
show the program's version number and exit
-f infile, --file infile
Read stats from infile instead of /proc/self/mountstats. infile must be in the same format as /proc/self/mountstats. This may be used with the -S|--since option to display the delta between two different points in time. This may not be used with the interval or count options of the iostat sub-command.
-S sincefile, --since sincefile
Show the difference between current stats and those in sincefile. sincefile must be in the same format as /proc/self/mountstats. This may be used with the -f|--file option to display the delta between two different points in time. This may not be used with the interval or count options of the iostat sub-command.
interval
Specifies the amount of time in seconds between each report. The first report
contains statistics for the time since each file system was mounted. Each subsequent
report contains statistics collected during the interval since the previous report. This
may not be used with the -f|--file or -S|--since options.
count
Determines the number of reports generated at interval seconds apart. If the interval
parameter is specified without the count parameter, the command generates reports
continuously. This may not be used with the -f|--file or -S|--since options.
-3
Show only NFS version 3 statistics. The default is to show both version 3 and version
4 statistics.
-4
Show only NFS version 4 statistics. The default is to show both version 3 and version
4 statistics.
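The delta behavior of the -f|--file and -S|--since options can be sketched as follows. This is an illustrative Python sketch, not mountstats code; the snapshot strings are simplified, hypothetical stand-ins for /proc/self/mountstats content.

```python
# Sketch of how a "since" delta is computed: subtract the counters in an
# earlier snapshot from the current ones. The snapshots below are made-up,
# simplified stand-ins for /proc/self/mountstats content.

def parse_ops(snapshot):
    """Parse 'OP: count' lines into a dict of counters."""
    ops = {}
    for line in snapshot.strip().splitlines():
        name, count = line.split(":")
        ops[name.strip()] = int(count)
    return ops

def since(current, earlier):
    """Return per-op deltas, as a --since comparison would."""
    return {op: current[op] - earlier.get(op, 0) for op in current}

since_snapshot = """
READ: 100
WRITE: 40
GETATTR: 250
"""

current_snapshot = """
READ: 180
WRITE: 55
GETATTR: 400
"""

delta = since(parse_ops(current_snapshot), parse_ops(since_snapshot))
print(delta)  # per-operation counts for the interval between the snapshots
```

The same subtraction underlies combining -f with -S: both files are parsed, and only the differences are reported.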
› FILES
/proc/self/mountstats
› SEE ALSO
iostat(8), nfsiostat(8), nfsstat(8)
› AUTHOR
Chuck Lever <[email protected]>
MPATHCONF
› NAME
mpathconf - A tool for configuring device-mapper-multipath
› SYNOPSIS
mpathconf [commands] [options]
› DESCRIPTION
mpathconf is a utility that creates or modifies /etc/multipath.conf. It can enable or
disable multipathing and configure some common options. mpathconf can also load
the dm_multipath module, start and stop the multipathd daemon, and configure the
multipathd service to start automatically or not. If mpathconf is called with no
commands, it will display the current configuration.
The default option for mpathconf is --with_module. The --with_multipathd
option is not set by default. Enabling multipathing will load the dm_multipath
module, but it will not immediately start multipathd. This is so that users can
manually edit their config file if necessary, before starting multipathd.
If /etc/multipath.conf already exists, mpathconf will edit it. If it does not exist,
mpathconf will use /usr/share/doc/device-mapper-multipath-0.4.9/multipath.conf
as the starting file. This file has user_friendly_names set. If this file does not exist,
mpathconf will create /etc/multipath.conf from scratch. For most users, this means
that user_friendly_names will be set by default, unless they use the
--user_friendly_names n command.
› COMMANDS
--enable
Removes any line that blacklists all device nodes from the /etc/multipath.conf
blacklist section.
--disable
Adds a line that blacklists all device nodes to the /etc/multipath.conf blacklist
section. If no blacklist section exists, it will create one.
--user_friendly_names { y | n }
If set to y, this adds the line user_friendly_names yes to the /etc/multipath.conf
defaults section. If set to n, this removes the line, if present. This command can be
used along with any other command.
--find_multipaths { y | n }
If set to y, this adds the line find_multipaths yes to the /etc/multipath.conf defaults
section. If set to n, this removes the line, if present. This command can be used
along with any other command.
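As an illustration of the edits these commands make, a minimal /etc/multipath.conf fragment is sketched below. It is an assumption of what the relevant sections look like after the commands named in the comments, not a complete or authoritative configuration.

```
# Illustrative fragment only, not a complete /etc/multipath.conf.
# After "mpathconf --enable --user_friendly_names y", the defaults
# section carries the user_friendly_names line:
defaults {
        user_friendly_names yes
}
# After "mpathconf --disable", the blacklist section blacklists all
# device nodes:
blacklist {
        devnode "*"
}
```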
› OPTIONS
--with_module { y | n }
If set to y, this runs modprobe dm_multipath to install the multipath modules. This
option only works with the --enable command. This option is set to y by default.
--with_multipathd { y | n }
If set to y, this runs service multipathd start to start the multipathd daemon on
--enable, service multipathd stop to stop the multipathd daemon on --disable, and
service multipathd reload to reconfigure multipathd on --user_friendly_names and
--find_multipaths. This option is set to n by default.
› FILES
/etc/multipath.conf
› SEE ALSO
multipath.conf(5), modprobe(8), multipath(8), multipathd(8), service(8)
› AUTHOR
Benjamin Marzinski <[email protected]>
MPATHPERSIST
› NAME
mpathpersist - Manages SCSI persistent reservations on dm multipath devices
› SYNOPSIS
mpathpersist [OPTIONS] [DEVICE]
› DESCRIPTION
mpathpersist is used to manage SCSI persistent reservations on device-mapper
multipath devices.
Options:
--verbose|-v level
verbosity level
0
Critical and error messages
1
Warning messages
2
Informational messages
3
Informational messages with trace enabled
--clear|-C
PR Out: Clear
--device=DEVICE|-d DEVICE
query or change DEVICE
--help|-h
output this usage message
--hex|-H
output response in hex
--in|-i
request PR In command
--out|-o
request PR Out command
--param-aptpl|-Z
PR Out parameter ‘APTPL’
--param-sark=SARK|-S SARK
PR Out parameter service action reservation key (SARK is in hex)
--preempt|-P
PR Out: Preempt
--preempt-abort|-A
PR Out: Preempt and Abort
--prout-type=TYPE|-T TYPE
PR Out command type
--read-status|-s
PR In: Read Full Status
--read-keys|-k
PR In: Read Keys
--read-reservation|-r
PR In: Read Reservation
--register|-G
PR Out: Register
--register-ignore|-I
PR Out: Register and Ignore
--release|-L
PR Out: Release
--report-capabilities|-c
PR In: Report Capabilities
--reserve|-R
PR Out: Reserve
--transport-id=TIDS|-X TIDS
TransportIDs can be mentioned in several forms
Examples:
mpathpersist --out --register --param-sark=123abc --prout-type=5 /dev/mapper/mpath9
mpathpersist -i -k /dev/mapper/mpath9
MTR
› NAME
mtr - a network diagnostic tool
› SYNOPSIS
mtr [-BfhvrctglxspQemniuTP46] [--help] [--version] [--report] [--report-wide]
[--report-cycles COUNT] [--curses] [--split] [--raw] [--xml] [--mpls]
[--no-dns] [--show-ips] [--gtk] [--address IP.ADD.RE.SS]
[--interval SECONDS] [--max-ttl NUM] [--first-ttl NUM] [--bitpattern NUM]
[--tos NUM] [--psize BYTES | -s BYTES] [--tcp] [--udp] [--port PORT]
[--timeout SECONDS] HOSTNAME [PACKETSIZE]
› DESCRIPTION
mtr combines the functionality of the traceroute and ping programs in a single
network diagnostic tool.
As mtr starts, it investigates the network connection between the host mtr runs on
and HOSTNAME by sending packets with purposely low TTLs. It continues to
send packets with low TTL, noting the response time of the intervening routers. This
allows mtr to print the response percentage and response times of the internet route
to HOSTNAME. A sudden increase in packet loss or response time is often an
indication of a bad (or simply overloaded) link.
The results are usually reported as round-trip response times in milliseconds and the
percentage of packet loss.
› OPTIONS
-h
--help
Print the summary of command line argument options.
-v
--version
Print the installed version of mtr.
-r
--report
This option puts mtr into report mode. When in this mode, mtr will run for the
number of cycles specified by the -c option, and then print statistics and exit.
This mode is useful for generating statistics about network quality. Note that each
running instance of mtr generates a significant amount of network traffic. Using mtr
to measure the quality of your network may result in decreased network performance.
-w
--report-wide
This option puts mtr into wide report mode. When in this mode, mtr will not cut
hostnames in the report.
-c COUNT
--report-cycles COUNT
Use this option to set the number of pings sent to determine both the machines on the
network and the reliability of those machines. Each cycle lasts one second.
-s BYTES
--psize BYTES
PACKETSIZE
These options, or a trailing PACKETSIZE on the command line, set the packet size
used for probing. It is in bytes, inclusive of IP and ICMP headers.
If set to a negative number, every iteration will use a different, random packet size
up to that number.
-t
--curses
Use this option to force mtr to use the curses based terminal interface (if available).
-e
--mpls
Use this option to tell mtr to display information from ICMP extensions for MPLS
(RFC 4950) that are encoded in the response packets.
-n
--no-dns
Use this option to force mtr to display numeric IP numbers and not try to resolve the
host names.
-b
--show-ips
Use this option to tell mtr to display both the host names and numeric IP numbers. In
split mode this adds an extra field to the output. In report mode, there is usually too
little space to add the IPs, and they will be truncated. Use the wide report (-w) mode
to see the IPs in report mode.
-o fields order
--order fields order
Use this option to specify the fields and their order when loading mtr. Available
fields:
L Loss ratio
D Dropped packets
R Received packets
S Sent Packets
N Newest RTT(ms)
B Min/Best RTT(ms)
A Average RTT(ms)
W Max/Worst RTT(ms)
V Standard Deviation
G Geometric Mean
J Current Jitter
M Jitter Mean/Avg.
X Worst Jitter
I Interarrival Jitter
Example: -o "LSD NBAW"
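As a small worked example of how the loss-ratio field relates to the sent and received counters, here is a Python sketch; it is not mtr code, and the counts are made up for illustration:

```python
# Hypothetical sent/received counts for one hop; not real mtr output.
sent = 10        # S: Sent packets
received = 9     # R: Received packets
dropped = sent - received          # D: Dropped packets
loss_pct = dropped / sent * 100    # L: Loss ratio, as a percentage
print(f"S={sent} R={received} D={dropped} L={loss_pct:.1f}%")
```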
-g
--gtk
Use this option to force mtr to use the GTK+ based X11 window interface (if
available). GTK+ must have been available on the system when mtr was built for
this to work. See the GTK+ web page at http://www.gtk.org/ for more information
about GTK+.
-p
--split
Use this option to set mtr to spit out a format that is suitable for a split-user interface.
-l
--raw
Use this option to tell mtr to use the raw output format. This format is better suited
for archival of the measurement results. It could be parsed to be presented into any of
the other display methods.
-x
--xml
Use this option to tell mtr to use the xml output format. This format is better suited
for automated processing of the measurement results.
-a IP.ADD.RE.SS
--address IP.ADD.RE.SS
Use this option to bind the outgoing packets’ socket to a specific interface, so that
any packet will be sent through this interface. NOTE that this option doesn’t apply to
DNS requests (which may or may not be what you want).
-i SECONDS
--interval SECONDS
Use this option to specify the positive number of seconds between ICMP ECHO
requests. The default value for this parameter is one second.
-m NUM
--max-ttl NUM
Specifies the maximum number of hops (max time-to-live value) traceroute will
probe. Default is 30.
-f NUM
--first-ttl NUM
Specifies with what TTL to start. Defaults to 1.
-B NUM
--bitpattern NUM
Specifies bit pattern to use in payload. Should be within range 0 - 255.
-Q NUM
--tos NUM
Specifies value for type of service field in IP header. Should be within range 0 - 255.
-u
--udp
Use UDP datagrams instead of ICMP ECHO.
-T
--tcp
Use TCP SYN packets instead of ICMP ECHO. PACKETSIZE is ignored, since SYN
packets can not contain data.
-P PORT
--port PORT
The target port number for TCP traces.
--timeout SECONDS
The number of seconds to keep the TCP socket open before giving up on the
connection. This will only affect the final hop. Using large values for this, especially
combined with a short interval, will use up a lot of file descriptors.
-4
Use IPv4 only.
-6
Use IPv6 only.
› BUGS
Some modern routers give a lower priority to ICMP ECHO packets than to other
network traffic. Consequently, the reliability of these routers reported by mtr will be
significantly lower than the actual reliability of these routers.
› CONTACT INFORMATION
For the latest version, see the mtr web page at http://www.bitwizard.nl/mtr/.
The mtr mailing list was little used and is no longer active.
Bug reports and feature requests should be submitted to the Launchpad mtr
bug tracker.
› SEE ALSO
traceroute(8), ping(8) TCP/IP Illustrated (Stevens, ISBN 0201633469).
MULTIPATH
› NAME
multipath - Device mapper target autoconfig
› SYNOPSIS
multipath [-v verbosity] [-b bindings_file] [-d] [-h|-l|-ll|-f|-t|-F|-B|-c|-q|-r|-i|-a|-A|-w|-W]
[-p failover|multibus|group_by_serial|group_by_prio|group_by_node_name]
[device]
› DESCRIPTION
multipath is used to detect multiple paths to devices for fail-over or performance
reasons and to coalesce them.
› OPTIONS
-v level
verbosity, print all paths and multipaths
0
no output
1
print the created or updated multipath names only, for use to feed other tools like
kpartx
2 +
print all info: detected paths, coalesced paths (i.e. multipaths) and device maps
-h
print usage text
-d
dry run, do not create or update devmaps
-l
show the current multipath topology from information fetched in sysfs and the device mapper
-ll
show the current multipath topology from all available information (sysfs, the device mapper, path checkers …)
-f
flush a multipath device map specified as parameter, if unused
-F
flush all unused multipath device maps
-t
print internal hardware table to stdout
-r
force devmap reload
-i
ignore wwids file when processing devices
-B
treat the bindings file as read only
-b bindings_file
set user_friendly_names bindings file location. The default is /etc/multipath/bindings
-c
check if a block device should be a path in a multipath device
-q
allow device tables with queue_if_no_path when multipathd is not running
-a
add the wwid for the specified device to the wwids file
-A
add wwids from any kernel command line mpath.wwid parameters to the wwids file
-w
remove the wwid for the specified device from the wwids file
-W
reset the wwids file to only include the current multipath devices
-p policy
force new maps to use the specified policy:
failover
1 path per priority group
multibus
all paths in 1 priority group
group_by_serial
1 priority group per serial
group_by_prio
1 priority group per priority value. Priorities are determined by callout programs
specified as a global, per-controller or per-multipath option in the configuration file
group_by_node_name
1 priority group per target node name. Target node names are fetched in
/sys/class/fc_transport/target*/node_name.
Existing maps are not modified. If device is given, only the devmap that the path
pointed to by device belongs to is updated. device is in the /dev/sdb (as shown by
udev in the $DEVNAME variable) or major:minor format. device may alternatively
be a multipath mapname.
› SEE ALSO
multipathd(8), multipath.conf(5), kpartx(8), udev(8), dmsetup(8), hotplug(8)
› AUTHORS
multipath was developed by Christophe Varoqui,
<[email protected]> and others.
MULTIPATHD
› NAME
multipathd - multipath daemon
› SYNOPSIS
multipathd [options]
› DESCRIPTION
The multipathd daemon is in charge of checking for failed paths. When this
happens, it will reconfigure the multipath map the path belongs to, so that this map
regains its maximum performance and redundancy.
This daemon executes the external multipath config tool when events occur. In turn,
the multipath tool signals the multipathd daemon when it is done with devmap
reconfiguration, so that it can refresh its failed path list.
› OPTIONS
-d
Foreground Mode. Don’t daemonize, and print all messages to stdout and stderr.
-v level
Verbosity level. Print additional information while running multipathd. A level of 0
means only print errors. A level of 3 or greater prints debugging information as well.
-k
multipathd will enter interactive mode. From this mode, the available commands can
be viewed by entering “help”. When you are finished entering commands, press
CTRL-D to quit.
› COMMANDS
The following commands can be used in interactive mode:
list|show paths
Show the paths that multipathd is monitoring, and their state.
list|show paths format $format
Show the paths that multipathd is monitoring, using a format string with path format
wildcards.
list|show maps|multipaths
Show the multipath devices that the multipathd is monitoring.
list|show daemon
Show the current state of the multipathd daemon
list|show maps|multipaths format $format
Show the status of all multipath devices that the multipathd is monitoring, using a
format string with multipath format wildcards.
list|show maps|multipaths status
Show the status of all multipath devices that the multipathd is monitoring.
list|show maps|multipaths stats
Show some statistics of all multipath devices that the multipathd is monitoring.
list|show maps|multipaths topology
Show the current multipath topology. Same as “multipath -ll”.
list|show topology
Show the current multipath topology. Same as “multipath -ll”.
list|show map|multipath $map topology
Show topology of a single multipath device specified by $map, e.g.
36005076303ffc56200000000000010aa. This map could be obtained from “list
maps”.
list|show wildcards
Show the format wildcards used in interactive commands taking $format.
list|show config
Show the currently used configuration, derived from default values and values
specified within the configuration file /etc/multipath.conf.
list|show blacklist
Show the currently used blacklist rules, derived from default values and values
specified within the configuration file /etc/multipath.conf.
list|show devices
Show all available block devices by name including the information if they are
blacklisted or not.
list|show status
Show the number of path checkers in each possible state, the number of monitored
paths, and whether multipathd is currently handling a uevent.
add path $path
Add a path to the list of monitored paths. $path is as listed in /sys/block (e.g. sda).
remove|del path $path
Stop monitoring a path. $path is as listed in /sys/block (e.g. sda).
add map|multipath $map
Add a multipath device to the list of monitored devices. $map can either be a device-
mapper device as listed in /sys/block (e.g. dm-0) or it can be the alias for the
multipath device (e.g. mpath1) or the uid of the multipath device (e.g.
36005076303ffc56200000000000010aa).
remove|del map|multipath $map
Stop monitoring a multipath device.
resize map|multipath $map
Resizes map $map to the given size
switch|switchgroup map|multipath $map group $group
Force a multipath device to switch to a specific path group. $group is the path group
index, starting with 1.
reconfigure
Reconfigures the multipaths. This should be triggered automatically after any hotplug
event.
suspend map|multipath $map
Sets map $map into suspend state.
resume map|multipath $map
Resumes map $map from suspend state.
reset map|multipath $map
Reassign existing device-mapper table(s) to use the multipath device, instead of its
path devices.
reload map|multipath $map
Reload a multipath device.
fail path $path
Sets path $path into failed state.
reinstate path $path
Resumes path $path from failed state.
disablequeueing maps|multipaths
Disable queueing on all multipath devices.
restorequeueing maps|multipaths
Restore queueing on all multipath devices.
disablequeueing map|multipath $map
Disable queuing on multipathed map $map
restorequeueing map|multipath $map
Restore queuing on multipathed map $map
forcequeueing daemon
Forces multipathd into queue_without_daemon mode, so that no_path_retry queueing
will not be disabled when the daemon stops
restorequeueing daemon
Restores configured queue_without_daemon mode
map|multipath $map setprstatus
Enable persistent reservation management on $map
map|multipath $map unsetprstatus
Disable persistent reservation management on $map
map|multipath $map getprstatus
Get the current persistent reservation management status of $map
quit|exit
End interactive session.
shutdown
Stop multipathd.
› SEE ALSO
multipath(8) kpartx(8) hotplug(8)
› AUTHORS
multipathd was developed by Christophe Varoqui,
<[email protected]> and others.
NAMEIF
› NAME
nameif - name network interfaces based on MAC addresses
› SYNOPSIS
nameif [-c configfile] [-s]
nameif [-c configfile] [-s] {interface macaddress}
› NOTE
This program is obsolete. For replacement check ip link. This functionality is also
much better provided by udev methods.
› DESCRIPTION
nameif renames network interfaces based on MAC addresses. When no arguments
are given, /etc/mactab is read. Each line of it contains an interface name and an
Ethernet MAC address. Comments are allowed starting with #. Otherwise the
interfaces specified on the command line are processed. nameif looks for the
interface with the given MAC address and renames it to the name given.
When the -s argument is given all error messages go to the syslog.
When the -c argument is given with a file name that file is read instead of
/etc/mactab.
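A sketch of what /etc/mactab might contain, matching the format described above (one interface name and one Ethernet MAC address per line, # comments allowed); the names and addresses are invented for illustration:

```
# Illustrative /etc/mactab; MAC addresses are made up.
eth0 00:16:3e:aa:bb:cc
dmz0 00:16:3e:dd:ee:ff
```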
› NOTES
nameif should be run before the interface is up, otherwise it’ll fail.
› FILES
/etc/mactab
› SEE ALSO
ip(8), udev(7)
› BUGS
Only works for Ethernet currently.
ndptool
› NAME
ndptool - Neighbor Discovery Protocol tool
› SYNOPSIS
ndptool -h
ndptool [OPTIONS] COMMAND
› DESCRIPTION
ndptool is a tool which provides a wrapper over Neighbor Discovery Protocol
messages.
› OPTIONS
-h, --help
Print help text to console and exit.
-v, --verbose
Increase output verbosity.
-t type, --msg-type type
Specifies the message type. The following types are supported:
rs - Router Solicitation.
ra - Router Advertisement.
ns - Neighbor Solicitation.
na - Neighbor Advertisement.
-i ifname, --ifname ifname
Specifies the interface name.
› COMMAND
monitor
Monitor incoming NDP messages and print them out.
send
Send NDP message of specified type.
› AUTHOR
Jiri Pirko is the original author and current maintainer of libndp.
NET
› NAME
net - Tool for administration of Samba and remote CIFS servers.
› SYNOPSIS
net {<ads|rap|rpc>} [-h|--help] [-w|--workgroup workgroup]
[-W|--myworkgroup myworkgroup] [-U|--user user] [-I|--ipaddress ip-address]
[-p|--port port] [-n myname] [-s conffile] [-S|--server server] [-l|--long]
[-v|--verbose] [-f|--force] [-P|--machine-pass] [-d debuglevel] [-V]
[--request-timeout seconds] [-t|--timeout seconds] [-i|--stdin] [--tallocreport]
› DESCRIPTION
This tool is part of the samba(7) suite.
The Samba net utility is meant to work just like the net utility available for Windows
and DOS. The first argument should be used to specify the protocol to use when
executing a certain command. ADS is used for Active Directory, RAP is used for old
(Win9x/NT3) clients and RPC can be used for NT4 and Windows 2000. If this
argument is omitted, net will try to determine it automatically. Not all commands are
available on all protocols.
› OPTIONS
-?|--help
Print a summary of command line options.
-k|--kerberos
Try to authenticate with kerberos. Only useful in an Active Directory environment.
-w|--workgroup target-workgroup
Sets target workgroup or domain. You have to specify either this option or the IP
address or the name of a server.
-W|--myworkgroup workgroup
Sets client workgroup or domain
-U|--user user
User name to use
-I|--ipaddress ip-address
IP address of target server to use. You have to specify either this option or a target
workgroup or a target server.
-p|--port port
Port on the target server to connect to (usually 139 or 445). Defaults to trying 445
first, then 139.
-n|--netbiosname <primary NetBIOS name>
This option allows you to override the NetBIOS name that Samba uses for itself. This
is identical to setting the netbios name parameter in the smb.conf file. However,
a command line setting will take precedence over settings in smb.conf.
-s|--configfile=<configuration file>
The file specified contains the configuration details required by the server. The
information in this file includes server-specific information such as what printcap file
to use, as well as descriptions of all the services that the server is to provide. See
smb.conf for more information. The default configuration file name is determined at
compile time.
-S|--server server
Name of target server. You should specify either this option or a target workgroup or
a target IP address.
-l|--long
When listing data, give more information on each item.
-v|--verbose
When listing data, give more verbose information on each item.
-f|--force
Enforcing a net command.
-P|--machine-pass
Make queries to the external server using the machine account of the local server.
--request-timeout 30
Let client requests time out after 30 seconds; the default is 10 seconds.
-t|--timeout 30
Set timeout for client operations to 30 seconds.
--use-ccache
Try to use the credentials cached by winbind.
-i|--stdin
Take input for net commands from standard input.
--tallocreport
Generate a talloc report while processing a net command.
-T|--test
Only test command sequence, dry-run.
-F|--flags FLAGS
Pass down integer flags to a net subcommand.
-C|--comment COMMENT
Pass down a comment string to a net subcommand.
-n|--myname MYNAME
Use MYNAME as a requester name for a net subcommand.
-c|--container CONTAINER
Use a specific AD container for net ads operations.
-M|--maxusers MAXUSERS
Fill in the maxusers field in net rpc share operations.
-r|--reboot
Reboot a remote machine after a command has been successfully executed (e.g. in
remote join operations).
--force-full-repl
When calling “net rpc vampire keytab” this option enforces a full re-creation of the
generated keytab file.
--single-obj-repl
When calling “net rpc vampire keytab” this option allows replicating just a single
object to the generated keytab file.
--clean-old-entries
When calling “net rpc vampire keytab” this option allows cleaning up old entries
from the generated keytab file.
--db
Define dbfile for “net idmap” commands.
--lock
Activates locking of the dbfile for the “net idmap check” command.
-a|--auto
Activates noninteractive mode in “net idmap check”.
--repair
Activates repair mode in “net idmap check”.
--acls
Includes ACLs to be copied in “net rpc share migrate”.
--attrs
Includes file attributes to be copied in “net rpc share migrate”.
--timestamps
Includes timestamps to be copied in “net rpc share migrate”.
-X|--exclude DIRECTORY
Allows excluding directories when copying with “net rpc share migrate”.
--destination SERVERNAME
Defines the target servername of the migration process (defaults to localhost).
-L|--local
Sets the type of group mapping to local (used in “net groupmap set”).
-D|--domain
Sets the type of group mapping to domain (used in “net groupmap set”).
-N|--ntname NTNAME
Sets the ntname of a group mapping (used in “net groupmap set”).
-R|--rid RID
Sets the rid of a group mapping (used in “net groupmap set”).
--reg-version REG_VERSION
Assume database version {n|1,2,3} (used in “net registry check”).
-o|--output FILENAME
Output database file (used in “net registry check”).
--wipe
Create a new database from scratch (used in “net registry check”).
--precheck PRECHECK_DB_FILENAME
Defines filename for database prechecking (used in “net registry import”).
-e|--encrypt
This command line parameter requires that the remote server support the UNIX
extensions or that the SMB3 protocol has been selected. Requests that the connection
be encrypted. Negotiates SMB encryption using either SMB3 or POSIX extensions
via GSSAPI. Uses the given credentials for the encryption negotiation (either
kerberos or NTLMv1/v2 if given a domain/username/password triple). Fails the
connection if encryption cannot be negotiated.
-d|--debuglevel=level
level is an integer from 0 to 10. The default value if this parameter is not specified is
1.
The higher this value, the more detail will be logged to the log files about the
activities of the server. At level 0, only critical errors and serious warnings will be
logged. Level 1 is a reasonable level for day-to-day running - it generates a small
amount of information about operations carried out.
Levels above 1 will generate considerable amounts of log data, and should only be
used when investigating a problem. Levels above 3 are designed for use only by
developers and generate HUGE amounts of log data, most of which is extremely
cryptic.
Note that specifying this parameter here will override the log level parameter in
the smb.conf file.
-V|--version
Prints the program version number.
-s|--configfile=<configuration file>
The file specified contains the configuration details required by the server. The
information in this file includes server-specific information such as what printcap file
to use, as well as descriptions of all the services that the server is to provide. See
smb.conf for more information. The default configuration file name is determined at
compile time.
-l|--log-basename=logdirectory
Base directory name for log/debug files. The extension .progname will be appended
(e.g. log.smbclient, log.smbd, etc…). The log file is never removed by the client.
--option=<name>=<value>
Set the smb.conf(5) option “<name>” to value “<value>” from the command line.
This overrides compiled-in defaults and options read from the configuration file.
This overrides compiled-in defaults and options read from the configuration file.
› COMMANDS
CHANGESECRETPW
This command allows the Samba machine account password to be set from an
external application to a machine account password that has already been stored in
Active Directory. DO NOT USE this command unless you know exactly what you
are doing. The use of this command requires that the force flag (-f) be used also.
There will be NO command prompt. Whatever information is piped into stdin, either
by typing at the command line or otherwise, will be stored as the literal machine
password. Do NOT use this without care and attention as it will overwrite a
legitimate machine password without warning. YOU HAVE BEEN WARNED.
TIME
The NET TIME command allows you to view the time on a remote server or
synchronise the time on the local server with the time on the remote server.
TIME
Without any options, the NET TIME command displays the time on the remote
server. The remote server must be specified with the -S option.
TIME SYSTEM
Displays the time on the remote server in a format ready for /bin/date. The remote
server must be specified with the -S option.
TIME SET
Tries to set the date and time of the local server to that on the remote server using
/bin/date. The remote server must be specified with the -S option.
TIME ZONE
Displays the timezone in hours from GMT on the remote server. The remote server
must be specified with the -S option.
Join a domain. If the account already exists on the server, and [TYPE] is MEMBER,
the machine will attempt to join automatically. (Assuming that the machine has been
created in server manager) Otherwise, a password will be prompted for, and a new
account may be created.
[TYPE] may be PDC, BDC or MEMBER to specify the type of server joining the
domain.
[UPN] (ADS only) set the principalname attribute during the join. The default format
is host/netbiosname@REALM.
[OU] (ADS only) Precreate the computer account in a specific OU. The OU string
reads from top to bottom without RDNs, and is delimited by a ‘/’. Please note that ‘\’
is used for escape by both the shell and ldap, so it may need to be doubled or
quadrupled to pass through, and it is not used as a delimiter.
[PASS] (ADS only) Set a specific password on the computer account being created
by the join.
[osName=string osVer=String] (ADS only) Set the operatingSystem and
operatingSystemVersion attribute during the join. Both parameters must be specified
for either to take effect.
Join a domain. Use the OLDJOIN option to join the domain using the old style of
domain joining - you need to create a trust account in server manager first.
[RPC|ADS] USER
[RPC|ADS] USER
[RPC|ADS] USER ADD name [password] [-F user flags] [-C comment]
[RPC|ADS] GROUP
[RAP|RPC] SHARE
Adds a share from a server (makes the export active). Maxusers specifies the number
of users that can be connected to the share simultaneously.
[RPC|RAP] FILE
[RPC|RAP] FILE
Print information on specified fileid. Currently listed are: file-id, username, locks,
path, permissions.
List files opened by specified user. Please note that net rap file user does not work
against Samba servers.
SESSION
RAP SESSION
Without any other options, SESSION enumerates all active SMB/CIFS sessions on
the target server.
RAP DOMAIN
RAP PRINTQ
Lists the specified print queue and print jobs on the server. If the QUEUE_NAME is
omitted, all queues are listed.
Validate whether the specified user can log in to the remote server. If the password is
not specified on the commandline, it will be prompted.
Note
Currently NOT implemented.
RAP GROUPMEMBER
Execute the specified command on the remote server. Only works with OS/2 servers.
Note
Currently NOT implemented.
RAP SERVICE
Start the specified service on the remote server. Not implemented yet.
Note
Currently NOT implemented.
Note
Currently NOT implemented.
LOOKUP
Lookup the IP address of the given host with the specified type (netbios suffix). The type
defaults to 0x20 (workstation).
LOOKUP KDC [REALM]
Give IP address of KDC for the specified REALM. Defaults to local realm.
LOOKUP DC [DOMAIN]
Give IP’s of Domain Controllers for specified DOMAIN. Defaults to local domain.
LOOKUP MASTER [DOMAIN]
Give IP of master browser for specified DOMAIN or workgroup. Defaults to local domain.
CACHE
Samba uses a general caching interface called ‘gencache’. It can be controlled using ‘NET
CACHE’.
All the timeout parameters support the suffixes:
s - Seconds
m - Minutes
h - Hours
d - Days
w - Weeks
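The suffix convention above can be sketched numerically as follows; the conversion function is a hypothetical illustration of the multipliers, not Samba code:

```python
# Multipliers mirroring the NET CACHE timeout suffixes listed above.
MULTIPLIERS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}

def timeout_seconds(value):
    """Convert a suffixed timeout such as '2h' or '45s' to seconds."""
    return int(value[:-1]) * MULTIPLIERS[value[-1]]

print(timeout_seconds("2h"))  # 2 hours expressed in seconds
```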
CACHE LIST
CACHE FLUSH
GETLOCALSID [DOMAIN]
Prints the SID of the specified domain, or if the parameter is omitted, the SID of the local
server.
SETLOCALSID S-1-5-21-x-y-z
GETDOMAINSID
Prints the local machine SID and the SID of the current domain.
SETDOMAINSID
GROUPMAP
Manage the mappings between Windows group SIDs and UNIX groups. Common options
include:
unixgroup - Name of the UNIX group
ntgroup - Name of the Windows NT group (must be resolvable to a SID)
rid - Unsigned 32-bit integer
sid - Full SID in the form of “S-1-…”
type - Type of the group; either ‘domain’, ‘local’, or ‘builtin’
comment - Freeform text description of the group
GROUPMAP ADD
GROUPMAP DELETE
Delete a group mapping entry. If more than one group name matches, the first entry found
is deleted.
net groupmap delete {ntgroup=string|sid=SID}
GROUPMAP MODIFY
GROUPMAP LIST
List existing group mapping entries.
net groupmap list [verbose] [ntgroup=string] [sid=SID]
MAXRID
Prints out the highest RID currently in use on the local server (by the active ‘passdb
backend’).
RPC INFO
Print information about the domain of the remote server, such as the domain name, the domain SID, and the number of users and groups.
[RPC|ADS] TESTJOIN
[RPC|ADS] CHANGETRUSTPW
RPC TRUSTDOM
RPC TRUSTDOM ADD DOMAIN
Add an interdomain trust account for DOMAIN. This is in fact a Samba account named DOMAIN$ with the account flag 'I' (interdomain trust account). This is required for incoming trusts to work. It makes Samba a trusted domain of the foreign (trusting) domain. Users of the Samba domain will be made available in the foreign domain. If the command is used against localhost it has the same effect as smbpasswd -a -i DOMAIN. Please note that both commands expect an appropriate UNIX account.
RPC TRUSTDOM DEL DOMAIN
Remove the interdomain trust account for DOMAIN. If it is used against localhost it has the same effect as smbpasswd -x DOMAIN$.
RPC TRUST
RPC RIGHTS
This subcommand is used to view and manage Samba’s rights assignments (also referred
to as privileges). There are three options currently available: list, grant, and revoke. More
details on Samba’s privilege model and its use can be found in the Samba-HOWTO-
Collection.
RPC ABORTSHUTDOWN
Abort the shutdown of a remote server.
RPC SAMDUMP
Print out the SAM database of the remote server. You need to run this against the PDC, from a Samba machine joined as a BDC.
RPC VAMPIRE
Export users, aliases and groups from remote server to local server. You need to run this
against the PDC, from a Samba machine joined as a BDC. This vampire command cannot
be used against an Active Directory, only against an NT4 Domain Controller.
RPC GETSID
Fetch domain SID and store it in the local secrets.tdb (or secrets.ntdb).
ADS LEAVE
ADS STATUS
Print out the status of the machine account of the local machine in ADS. Prints out a fair amount of debug information. Aimed at developers; regular users should use NET ADS TESTJOIN.
ADS PRINTER
Lookup info for PRINTER on SERVER. The printer name defaults to “*”, the server name
defaults to the local host.
ADS DN DN (attributes)
Perform a raw LDAP search on an ADS server and dump the results. The DN is a standard LDAP DN, and the attributes are a list of LDAP fields to show in the result.
Example: net ads dn 'CN=administrator,CN=Users,DC=my,DC=domain' SAMAccountName
ADS WORKGROUP
ADS ENCTYPES
(Re)Create a BUILTIN group. Only a well-known set of BUILTIN groups can be created with this command. This is the list of currently recognized group names: Administrators, Users, Guests, Power Users, Account Operators, Server Operators, Print Operators, Backup Operators, Replicator, RAS Servers, Pre-Windows 2000 Compatible Access. This command requires a running winbindd with idmap allocation properly configured. The group gid will be allocated out of the winbindd range.
Create a LOCAL group (also known as Alias). This command requires a running
Winbindd with idmap allocation properly configured. The group gid will be allocated out
of the winbindd range.
Map an existing Unix group and make it a Domain Group; the domain group will have the same name.
Add a member to a Local group. The group can be specified only by name, the member
can be specified by name or SID.
Remove a member from a Local group. The group and the member must be specified by
name.
Show the full DOMAIN\NAME, the SID, and the type for the corresponding account.
Set or unset the “password must change” flag for a user account.
Set a value for the account policy. Valid values can be: “forever”, “never”, “off”, or a
number.
SAM PROVISION
Only available if ldapsam:editposix is set and winbindd is running. Properly populates the LDAP tree with the basic accounts (Administrator) and groups (Domain Users, Domain Admins, Domain Guests).
Dumps the mappings contained in the local tdb file specified. This command is useful to
dump only the mappings produced by the idmap_tdb backend.
Store a domain-range mapping for a given domain (and index) in autorid database.
Get the range for a given domain and index from autorid database.
Get ranges for all domains or for one identified by given SID.
Delete a mapping sid <-> gid or sid <-> uid from the IDMAP database. The mapping is
given by <ID> which may either be a sid: S-x-…, a gid: “GID number” or a uid: “UID
number”. Use -f to delete an invalid partial mapping <ID> -> xx
Use “smbcontrol all idmap …” to notify running smbd instances. See the smbcontrol(1)
manpage for details.
Delete a domain range mapping identified by ‘RANGE’ or “domain SID and INDEX”
from autorid database. Use -f to delete invalid mappings.
Delete all domain range mappings for a domain identified by SID. Use -f to delete invalid
mappings.
Check and repair the IDMAP database. If no option is given, a read-only check of the database is done. Among other things, an interactive or automatic repair mode may be chosen with one of the following options:
-r|--repair
Interactive repair mode, ask a lot of questions.
-a|--auto
Noninteractive repair mode, use default answers.
-v|--verbose
Produce more output.
-f|--force
Try to apply changes, even if they do not apply cleanly.
-T|--test
Dry run, show what changes would be made but don't touch anything.
-l|--lock
Lock the database while doing the check.
--db <DB>
Check the specified database.
It reports the following errors:
Missing reverse mapping:
A record with mapping A->B where there is no B->A. Default action in repair mode
is to “fix” this by adding the reverse mapping.
Invalid mapping:
A record with mapping A->B where B->C. Default action is to “delete” this record.
Missing or invalid HWM:
A high water mark that is not at least equal to the largest ID in the database. Default action is to "fix" this by setting it to the largest ID found + 1.
Invalid record:
Something we failed to parse. Default action is to “edit” it in interactive and “delete”
it in automatic mode.
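A hypothetical check-and-repair workflow using the options above (illustrative only; a Samba installation is required, and the default database location varies by distribution):

```shell
# Illustrative only: read-only check first, then a dry run of the
# automatic repair, and finally the actual noninteractive repair.
net idmap check
net idmap check -a -T
net idmap check -a
```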
USERSHARE
Starting with version 3.0.23, a Samba server supports the ability for non-root users to add user-defined shares to be exported using the "net usershare" commands.
To set this up, first set up your smb.conf by adding to the [global] section: usershare path = /usr/local/samba/lib/usershares. Next create the directory /usr/local/samba/lib/usershares, change the owner to root, and set the group owner to the UNIX group that should have the ability to create usershares, for example a group called "serverops". Set the permissions on /usr/local/samba/lib/usershares to 01770 (owner and group all access, no access for others, plus the sticky bit, which means that a file in that directory can be renamed or deleted only by the owner of the file). Finally, tell smbd how many usershares you will allow by adding to the [global] section of smb.conf a line such as: usershare max shares = 100, to allow 100 usershare definitions. Now, members of the UNIX group "serverops" can create user-defined shares on demand using the commands below.
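The setup steps above can be sketched as a short shell sequence. The directory path and the "serverops" group come from the text; the path defaults to a temporary directory here so the sketch can run unprivileged (use the real /usr/local/samba/lib/usershares on an actual server):

```shell
# Directory that will hold the usershare definition files.
# The text uses /usr/local/samba/lib/usershares; a temporary
# directory is substituted here so this runs without root.
USHARE_DIR=${USHARE_DIR:-$(mktemp -d)/usershares}

mkdir -p "$USHARE_DIR"
# On a real server: chown root:serverops "$USHARE_DIR"
# 01770 = owner and group full access, no access for others,
# plus the sticky bit so users can only delete their own files.
chmod 01770 "$USHARE_DIR"
```

The matching smb.conf [global] lines from the text are usershare path = /usr/local/samba/lib/usershares and usershare max shares = 100.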
The usershare commands are:
net usershare add sharename path [comment [acl] [guest_ok=[y|n]]] - to add or
change a user defined share.
net usershare delete sharename - to delete a user defined share.
net usershare info [-l|--long] [wildcard sharename] - to print info about a user defined share.
net usershare list [-l|--long] [wildcard sharename] - to list user defined shares.
Deletes the user defined share by name. The Samba smbd daemon immediately notices
this change, although it will not disconnect any users currently connected to the deleted
share.
Get info on user defined shares owned by the current user matching the given pattern, or
all users.
net usershare info on its own dumps out info on the user-defined shares that were created by the current user, or restricts them to share names that match the given wildcard pattern ('*' matches one or more characters, '?' matches only one character). If the '-l' or '--long' option is also given, it prints out info on user-defined shares created by other users.
The information given about a share looks like: [foobar] path=/home/jeremy comment=testme usershare_acl=Everyone:F guest_ok=n This is a list of the current settings of the user-defined share that can be modified by the "net usershare add" command.
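The [foobar] output shown above corresponds to a share created roughly as follows (illustrative only; a running smbd with usershares enabled is required):

```shell
# Illustrative only: create the share shown in the info output above,
# then query it. Arguments: sharename, path, comment, ACL, guest flag.
net usershare add foobar /home/jeremy testme Everyone:F guest_ok=n
net usershare info foobar
```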
List all the user defined shares owned by the current user matching the given pattern, or all
users.
net usershare list on its own lists out the names of the user-defined shares that were created by the current user, or restricts the list to share names that match the given wildcard pattern ('*' matches one or more characters, '?' matches only one character). If the '-l' or '--long' option is also given, it includes the names of user-defined shares created by other users.
[RPC] CONF
Starting with version 3.2.0, a Samba server can be configured by data stored in the registry. This configuration data can be edited with the new "net conf" commands. It is also possible to configure a remote Samba server by enabling the RPC conf mode and specifying the address of the remote server.
The deployment of this configuration data can be activated at two levels from the smb.conf file: share definitions from the registry are activated by setting registry shares to "yes" in the [global] section, and global configuration options are activated by setting include = registry in the [global] section for a mixed configuration, or by setting config backend = registry in the [global] section for a registry-only configuration. See the smb.conf(5) manpage for details.
The conf commands are:
net [rpc] conf list - Dump the complete configuration in smb.conf like format.
net [rpc] conf import - Import configuration from file in smb.conf format.
net [rpc] conf listshares - List the registry shares.
net [rpc] conf drop - Delete the complete configuration from registry.
net [rpc] conf showshare - Show the definition of a registry share.
net [rpc] conf addshare - Create a new registry share.
net [rpc] conf delshare - Delete a registry share.
net [rpc] conf setparm - Store a parameter.
net [rpc] conf getparm - Retrieve the value of a parameter.
net [rpc] conf delparm - Delete a parameter.
net [rpc] conf getincludes - Show the includes of a share definition.
net [rpc] conf setincludes - Set includes for a share.
net [rpc] conf delincludes - Delete includes from a share definition.
Print the configuration data stored in the registry in a smb.conf-like format to standard
output.
Show the definition of the share or section specified. It is valid to specify “global” as
sharename to retrieve the global configuration options from registry.
Create a new share definition in registry. The sharename and path have to be given. The
share name may not be “global”. Optionally, values for the very common options
“writeable”, “guest ok” and a “comment” may be specified. The same result may be
obtained by a sequence of “net conf setparm” commands.
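A hypothetical "net conf" session combining the commands above (the share name and path are invented; registry configuration must be enabled, so this is illustrative only):

```shell
# Illustrative only: create a registry share, set an extra parameter,
# inspect the result, then remove the share again.
net conf addshare testshare /srv/testshare writeable=y guest_ok=n "test share"
net conf setparm testshare "browseable" "yes"
net conf showshare testshare
net conf delshare testshare
```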
Store a parameter in registry. The section may be global or a sharename. The section is
created if it does not exist yet.
Get the list of includes for the provided section (global or share).
Note that due to the nature of the registry database and the nature of include directives, the
includes need special treatment: Parameters are stored in registry by the parameter name
as valuename, so there is only ever one instance of a parameter per share. Also, a specific
order like in a text file is not guaranteed. For all real parameters, this is perfectly ok, but
the include directive is rather a meta parameter, for which, in the smb.conf text file, the
place where it is specified among the other parameters is very important. This cannot be
achieved by the simple registry smbconf data model, so there is one ordered list of
includes per share, and this list is evaluated after all the parameters of the share.
Further note that currently, only files can be included from registry configuration. In the
future, there will be the ability to include configuration data from other registry keys.
Set the list of includes for the provided section (global or share) to the given list of one or
more filenames. The filenames may contain the usual smb.conf macros like %I.
Delete the list of includes from the provided section (global or share).
REGISTRY
Manipulate Samba’s registry.
The registry commands are:
net registry enumerate - Enumerate registry keys and values.
net registry enumerate_recursive - Enumerate registry key and its subkeys.
net registry createkey - Create a new registry key.
net registry deletekey - Delete a registry key.
net registry deletekey_recursive - Delete a registry key with subkeys.
net registry getvalue - Print a registry value.
net registry getvalueraw - Print a registry value (raw format).
net registry setvalue - Set a new registry value.
net registry increment - Increment a DWORD registry value under a lock.
net registry deletevalue - Delete a registry value.
net registry getsd - Get security descriptor.
net registry getsd_sddl - Get security descriptor in SDDL format.
net registry setsd_sddl - Set security descriptor from an SDDL format string.
net registry import - Import a registration entries (.reg) file.
net registry export - Export a registration entries (.reg) file.
net registry convert - Convert a registration entries (.reg) file.
net registry check - Check and repair a registry database.
Delete the given key and its values from the registry, if it has no subkeys.
Delete the given key and all of its subkeys and values from the registry.
Output type and actual value of the value name of the given key.
Set the value name of an existing key. type may be one of sz, multi_sz or dword. In case of multi_sz, value may be given multiple times.
Increment the DWORD value name of key by inc while holding a g_lock. inc defaults to
1.
Get the security descriptor of the given key as a Security Descriptor Definition Language
(SDDL) string.
Set the security descriptor of the given key from a Security Descriptor Definition
Language (SDDL) string sd.
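A hypothetical registry session using the commands above (the key and value names are invented; a Samba installation is required, so this is illustrative only):

```shell
# Illustrative only: create a key, store a string value, read it
# back, then clean up.
net registry createkey 'HKLM\Software\Example'
net registry setvalue 'HKLM\Software\Example' greeting sz 'hello'
net registry getvalue 'HKLM\Software\Example' greeting
net registry deletekey 'HKLM\Software\Example'
```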
Check and repair the registry database. If no option is given, a read-only check of the database is done. Among other things, an interactive or automatic repair mode may be chosen with one of the following options:
-r|--repair
Interactive repair mode, ask a lot of questions.
-a|--auto
Noninteractive repair mode, use default answers.
-v|--verbose
Produce more output.
-T|--test
Dry run, show what changes would be made but don't touch anything.
-l|--lock
Lock the database while doing the check.
--reg-version={1,2,3}
Specify the format of the registry database. If not given, it defaults to the value of the binary or, if a registry.tdb is explicitly stated on the command line, to the value found in the INFO/version record.
[--db] <DB>
Check the specified database.
-o|--output <ODB>
Create a new registry database <ODB> instead of modifying the input. If <ODB> already exists, --wipe may be used to overwrite it.
--wipe
Replace the registry database instead of modifying the input, or overwrite an existing output database.
EVENTLOG
Starting with version 3.4.0, net can read, dump, import and export native Win32 eventlog files (usually *.evt). .evt files are used by the native Windows event viewer tools.
The import and export of .evt files can only succeed when eventlog list is used in the smb.conf file. See the smb.conf(5) manpage for details.
The eventlog commands are:
net eventlog dump - Dump an eventlog *.evt file on the screen.
net eventlog import - Import an eventlog *.evt file into the Samba internal tdb-based representation of eventlogs.
net eventlog export - Export the Samba internal tdb-based representation of eventlogs into an eventlog *.evt file.
Imports an eventlog *.evt file defined by filename into the Samba internal tdb representation of eventlogs defined by eventlog. eventlog needs to be part of the eventlog list defined in smb.conf. See the smb.conf(5) manpage for details.
DOM
Starting with version 3.2.0, Samba has support for remote join and unjoin APIs, both client- and server-side. Windows has supported remote join capabilities since Windows 2000.
In order for Samba to be joined or unjoined remotely, an account must be used that is either a member of the Domain Admins group, a member of the local Administrators group, or a user that is granted the SeMachineAccountPrivilege privilege.
The client side support for remote join is implemented in the net dom commands which
are:
net dom join - Join a remote computer into a domain.
net dom unjoin - Unjoin a remote computer from a domain.
net dom renamecomputer - Renames a remote computer joined to a domain.
Unjoins a computer from a domain. This command supports the following additional
parameters:
ACCOUNT defines a domain account that will be used to unjoin the machine from
the domain. This domain account needs to have sufficient privileges to unjoin
machines.
PASSWORD defines the password for the domain account defined with ACCOUNT.
REBOOT is an optional parameter that can be set to reboot the remote machine after
successful unjoin from the domain.
Note that you also need to use standard net parameters to connect and authenticate to the
remote machine that you want to unjoin. These additional parameters include: -S computer
and -U user.
Example: net dom unjoin -S xp -U XP\administrator%secret
account=MYDOM\administrator password=topsecret reboot.
This example would connect to a computer named XP as the local administrator using
password secret, and unjoin the computer from the domain using the MYDOM domain
administrator account and password topsecret. After successful unjoin, the computer
would reboot.
Renames a computer that is joined to a domain. This command supports the following
additional parameters:
NEWNAME defines the new name of the machine in the domain.
ACCOUNT defines a domain account that will be used to rename the machine in the
domain. This domain account needs to have sufficient privileges to rename machines.
PASSWORD defines the password for the domain account defined with ACCOUNT.
REBOOT is an optional parameter that can be set to reboot the remote machine after
successful rename in the domain.
Note that you also need to use standard net parameters to connect and authenticate to the
remote machine that you want to rename in the domain. These additional parameters
include: -S computer and -U user.
Example: net dom renamecomputer -S xp -U XP\administrator%secret
newname=XPNEW account=MYDOM\administrator password=topsecret reboot.
This example would connect to a computer named XP as the local administrator using
password secret, and rename the joined computer to XPNEW using the MYDOM domain
administrator account and password topsecret. After successful rename, the computer
would reboot.
G_LOCK
Execute a shell command under a global lock. This might be useful to define the order in
which several shell commands will be executed. The locking information is stored in a file
called g_lock.tdb. In setups with CTDB running, the locking information will be available
on all cluster nodes.
LOCKNAME defines the name of the global lock.
TIMEOUT defines the timeout.
COMMAND defines the shell command to execute.
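Put together, a g_lock invocation might look like the following sketch (the lockname and timeout value are invented; Samba's net binary is required, so this is illustrative only):

```shell
# Illustrative only: run a shell command while holding the global
# lock "mylock"; the timeout bounds how long to wait for the lock.
net g_lock do mylock 5000 'echo protected work'
```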
G_LOCK LOCKS
Print a list of all currently existing locknames.
HELP [COMMAND]
Gives usage information for the specified command.