While reading a shell script today I ran into the dd command. The script's author used dd to copy out the first 1024 KB of each file (the script's job is to delete duplicate files under a given path), in order to avoid the efficiency hit of running cp on very large files. I had seen dd back when I first started using Linux but had hardly ever used it, so this time I decided to actually learn it. Hence the notes below:
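The trick that script relies on can be sketched roughly like this. This is my own illustration, not the script's actual code; the file names and helper function are made up:

```shell
#!/bin/sh
# Sketch of the dedup trick: instead of comparing whole files, hash only
# the first 1024 KiB of each with dd. Equal prefix hashes only mark
# *candidate* duplicates; a full comparison is still needed before deleting.
first_mb_hash() {
    # bs=1024 count=1024 -> at most 1024 KiB is read, however large the file
    dd if="$1" bs=1024 count=1024 2>/dev/null | md5sum | awk '{print $1}'
}

printf 'same content' > /tmp/a.txt
printf 'same content' > /tmp/b.txt
printf 'other stuff'  > /tmp/c.txt

h_a=$(first_mb_hash /tmp/a.txt)
h_b=$(first_mb_hash /tmp/b.txt)
h_c=$(first_mb_hash /tmp/c.txt)

[ "$h_a" = "$h_b" ] && echo "a and b are candidate duplicates"
```

For files larger than 1 MiB this reads a fixed amount of data per file, which is where the speedup over copying or hashing whole files comes from.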
Search keywords:
http://search.aol.com/aol/search?q=linux+time+dd+command+i%2Fo+speed
==
Purpose: read, convert, and output data.
Syntax: dd [bs=<bytes>][cbs=<bytes>][conv=<keywords>][count=<blocks>][ibs=<bytes>][if=<file>][obs=<bytes>][of=<file>][seek=<blocks>][skip=<blocks>][--help][--version]
Notes: dd reads data from standard input or a file, converts it according to the specified format, and writes it to a file, a device, or standard output.
Parameters:
bs=<bytes> set both ibs (input) and obs (output) to the given number of bytes.
cbs=<bytes> convert the given number of bytes at a time.
conv=<keywords> specify how the data is converted.
count=<blocks> copy only the given number of input blocks.
ibs=<bytes> read the given number of bytes at a time.
if=<file> read from the given file.
obs=<bytes> write the given number of bytes at a time.
of=<file> write to the given file.
seek=<blocks> skip the given number of blocks at the start of the output.
skip=<blocks> skip the given number of blocks at the start of the input.
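seek= and skip= are easy to mix up, so here is a small sketch (the /tmp paths are just examples) showing that skip= drops blocks from the input, while seek= offsets the write position in the output:

```shell
# skip=: drop blocks from the INPUT before reading.
printf 'AAAABBBB' > /tmp/dd_in.bin
dd if=/tmp/dd_in.bin of=/tmp/dd_skip.bin bs=4 skip=1 2>/dev/null
cat /tmp/dd_skip.bin    # BBBB

# seek=: start writing at an offset in the OUTPUT; conv=notrunc keeps the
# existing bytes before that offset instead of truncating the file.
printf 'XXXXXXXX' > /tmp/dd_out.bin
printf 'YYYY' | dd of=/tmp/dd_out.bin bs=4 seek=1 conv=notrunc 2>/dev/null
cat /tmp/dd_out.bin     # XXXXYYYY
```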
Examples:
Save the first 512 bytes of the first hard disk to a file:
$ dd if=/dev/hda of=disk.mbr bs=512 count=1
=
"Repair" a hard disk (reading every sector and writing it back can prompt the drive to reallocate marginal sectors; use with caution, and back up first):
$ dd if=/dev/sda of=/dev/sda
=
Determine the optimal block size for a disk:
$ dd if=/dev/zero bs=1024 count=1000000 of=/root/1Gb.file
$ dd if=/dev/zero bs=2048 count=500000 of=/root/1Gb.file
$ dd if=/dev/zero bs=4096 count=250000 of=/root/1Gb.file
$ dd if=/dev/zero bs=8192 count=125000 of=/root/1Gb.file
Compare the execution times reported by the commands above to find the block size that works best on the system.
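The four commands above can be wrapped in a loop. This sketch scales the test file down to 64 MiB so it finishes quickly; the path and sizes are illustrative:

```shell
#!/bin/sh
# Time the same total amount of data written with different block sizes.
# 64 MiB total here instead of the 1 GB used above, purely to keep the
# demo fast; the comparison works the same way.
total=67108864                          # 64 MiB
for bs in 1024 2048 4096 8192; do
    count=$((total / bs))               # keep the total size constant
    echo "bs=$bs count=$count"
    time dd if=/dev/zero of=/tmp/bs_test.file bs="$bs" count="$count" 2>/dev/null
done
size=$(wc -c < /tmp/bs_test.file)       # sanity check: file is 64 MiB
rm -f /tmp/bs_test.file
```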
=
Destroy the data on a disk:
$ sudo dd if=/dev/urandom of=/dev/hda1
Note: filling a drive with random data destroys its contents, which is exactly what you want in certain situations.
=
Measure sequential write speed (the trailing sync flushes cached data so the timing reflects what actually reached the disk):
$ time sh -c "dd if=/dev/zero of=ddfile bs=8k count=250000 && sync"; rm ddfile
References:
- http://linux.die.net/man/1/dd
- http://www.cnblogs.com/sopost/archive/2010/08/13/2190102.html
- http://jingyan.baidu.com/article/d45ad148e203f969552b800a.html
- http://www.cyberciti.biz/faq/linux-unix-dd-command-show-progress-while-coping/
- http://ganquan.info/linux/command/dd
- https://www.thomas-krenn.com/en/wiki/Linux_I/O_Performance_Tests_using_dd
- http://unix.stackexchange.com/questions/108838/how-can-i-benchmark-my-hdd
Getting the content at a specified position of a file on Linux
Search keywords:
http://search.aol.com/aol/search?q=linux+get+first+1000bytes+of+a+file
Reference answers:
$ head -c 6000k /var/dump.log
$ dd if=/var/dump.log of=/root/1024bytes.txt bs=1024 count=1
$ dd if=infile of=outfile bs=10 skip=1
References:
- http://stackoverflow.com/questions/4411014/how-to-get-only-the-first-ten-bytes-of-a-binary-file
- http://stackoverflow.com/questions/218912/linux-command-like-cat-to-read-a-specified-quantity-of-characters
- http://www.unix.com/shell-programming-and-scripting/223589-get-files-first-x-bytes.html
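Combining count= and skip=, dd can pull out an arbitrary byte range, not just a prefix. A small sketch with bs=1 (the demo file and its contents are made up):

```shell
# 16-byte demo file so the offsets are easy to follow.
printf '0123456789ABCDEF' > /tmp/range_demo.bin

# First 10 bytes (equivalent to head -c 10).
head10=$(dd if=/tmp/range_demo.bin bs=1 count=10 2>/dev/null)
echo "$head10"    # 0123456789

# 5 bytes starting at offset 10: skip 10 one-byte blocks, then read 5.
mid5=$(dd if=/tmp/range_demo.bin bs=1 skip=10 count=5 2>/dev/null)
echo "$mid5"      # ABCDE

rm -f /tmp/range_demo.bin
```

bs=1 is slow for large offsets; for big files a larger bs with skip counted in block units (as in the commands above) is much faster.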
Differences between dd and cp/cat
Having learned all of the above, I became even more curious: if dd looks this good, why does everyone use cp to copy files on Linux these days? Is it really just because cp is simpler to use?
Search keywords:
http://search.aol.com/aol/search?q=Linux+dd+vs+cp+efficiency
Conclusions:
dd works on the file you specify, making it able to copy data between devices, or from a device to a file. This is commonly used for moving data when devices specifically are involved (creating an iso image from a cd-rom disc, for example: dd if=/dev/cdrom of=mycdrom.iso), or for backing up raw devices (sometimes used with RAC databases: dd if=/dev/raw/raw1 of=device_raw1).
cp is used for duplicating file content to a new file or to a new location. Things you specifically want there are preservation of ownership, timestamps, and mode (permissions), and being able to recurse the operation (i.e. being able to copy directories).
cp is for duplicating file contents (while preserving ownership, timestamps, permissions, and so on) and supports recursion, which makes directory-level copies convenient;
dd is generally used to copy data between different devices or to back up raw devices (more common in earlier days).
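One concrete difference between cp and dd is easy to demonstrate: cp -p carries the source file's permissions over, while a file created by dd's of= just gets the default mode from the umask. The paths below are illustrative:

```shell
printf 'data' > /tmp/perm_src.txt
chmod 600 /tmp/perm_src.txt             # restrictive mode on the source

cp -p /tmp/perm_src.txt /tmp/perm_cp.txt
dd if=/tmp/perm_src.txt of=/tmp/perm_dd.txt 2>/dev/null

stat -c '%a' /tmp/perm_cp.txt           # 600: mode preserved by cp -p
stat -c '%a' /tmp/perm_dd.txt           # umask-dependent, e.g. 644
```

Both copies have identical contents; only the metadata handling differs.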
References:
- http://superuser.com/questions/609211/why-do-we-use-cp-to-copy-files-and-not-dd-in-unix-derivatives
- http://stackoverflow.com/questions/150697/is-dd-better-than-cat
- http://unix.stackexchange.com/questions/30295/cp-vs-cat-to-copy-a-file
- http://unix.stackexchange.com/questions/9432/is-there-a-way-to-determine-the-optimal-value-for-the-bs-parameter-to-dd
- http://serverfault.com/questions/43014/copying-a-large-directory-tree-locally-cp-or-rsync
- http://serverfault.com/questions/214917/best-way-to-copy-large-amount-of-data-between-partitions
- http://serverfault.com/questions/208300/quickest-way-to-transfer-55gb-of-images-to-new-server
Best practices for transferring data between Linux servers
Search keywords:
http://search.aol.com/aol/search?q=Linux+best+way+to+transfer+files+between+servers
Conclusions:
For safety, transfer over SSH, which in practice means scp or rsync; remember to pack and compress the data before transferring (a pipe lets you skip the intermediate temporary file).
There are two ways I usually do this, both use ssh:
$ scp -r sourcedir/ [email protected]:/dest/dir/
or, the more robust and faster (in terms of transfer speed) method:
$ rsync -auv -e ssh --progress sourcedir/ [email protected]:/dest/dir/
Read the man pages for each command if you want more details about how they work.
==
Instead of using tar to write to your local disk, you can write directly to the remote server over the network using ssh.
server1$ tar -zc ./path | ssh server2 "cat > ~/file.tar.gz"
Any string that follows your “ssh” command will be run on the remote server instead of the interactive logon. You can pipe input/output to and from those remote commands through SSH as if they were local. Putting the command in quotes avoids any confusion, especially when using redirection.
Or, you can extract the tar file on the other server directly:
server1$ tar -zc ./path | ssh server2 "tar -zx -C /destination"
Note the seldom-used -C option. It means “change to this directory first before doing anything.”
Or, perhaps you want to “pull” from the destination server:
server2$ tar -zx -C /destination < <(ssh server1 "tar -zc -C /srcdir ./path")
Note that the <(cmd) construct is new to bash and doesn’t work on older systems. It runs a program and sends the output to a pipe, and substitutes that pipe into the command as if it was a file.
I could just as easily have written the above as follows:
server2$ tar -zx -C /destination -f <(ssh server1 "tar -zc -C /srcdir ./path")
Or as follows:
server2$ ssh server1 "tar -zc -C /srcdir ./path" | tar -zx -C /destination
Or, you can save yourself some grief and just use rsync:
server1$ rsync -az ./path server2:/destination/
Finally, remember that compressing the data before transfer will reduce the amount of data sent over the network, but on a very fast connection it may actually make the operation take more time. This is because your computer may not be able to compress fast enough to keep up: if compressing 100MB takes longer than it would take to send 100MB, then it's faster to send it uncompressed.
Alternately, you may want to consider piping to gzip yourself (rather than using the -z option) so that you can specify a compression level. It’s been my experience that on fast network connections with compressible data, using gzip at level 2 or 3 (the default is 6) gives the best overall throughput in most cases. Like so:
server1$ tar -c ./path | gzip -2 | ssh server2 "cat > ~/file.tar.gz"