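Today is February 14 again, Western Valentine's Day. Let me start with something related to Valentine's Day, but don't worry: this is not "dog food" (showing off a relationship in front of singles), so read on.
Honestly, if you are single, you should still settle down and study in your spare time to improve yourself. If you are not single, then besides that, you should also spend more time with your wife and family. Nothing matters more than your loved ones and your own growth! Whether you are improving yourself or keeping your family company, don't just go through the motions. It is like Valentine's Day today: expressing love is not limited to this one day or a few special days. It is long-term persistence and tolerance from both partners.
To borrow a line others have used before:
When your talent cannot support your ambition, settle down and study; when your money cannot keep up with your dreams, sit down and work hard; when your abilities cannot yet carry your goals, calm down and gain experience!
Now for the main text: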
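This article offers a simple, easy-to-implement approach to using a Python script to verify the integrity of file-system data and detect tampering. It really is simple: all you need is basic Python plus hashlib and file operations.
There are already many more complete solutions for verifying data integrity. Even so, this makes a good practice exercise for Python beginners: it trains logical thinking and helps ward off "senility".
It has been tested on Windows 10 and on Ubuntu (Python 2.7). Other platforms should work too, and help with testing is welcome.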
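The only non-trivial building block here is hashing a file with hashlib. Below is a minimal Python 3 sketch of that step. The name `file_hash` is hypothetical and does not come from the original script. The sketch reads the file in fixed-size blocks, as the script's `get_hash_sum()` does, so memory use stays flat even for large files. It selects the algorithm with `hashlib.new()` instead of an if/elif chain.

```python
import hashlib


def file_hash(filename, method="sha256", block_size=65536):
    """Return the hex digest of a file, hashing it block by block.

    file_hash is a hypothetical name used only for this sketch.
    """
    checksum = hashlib.new(method)  # raises ValueError for an unsupported algorithm name
    with open(filename, "rb") as f:
        # iter(callable, sentinel) keeps calling f.read() until it returns b"" (EOF)
        for block in iter(lambda: f.read(block_size), b""):
            checksum.update(block)
    return checksum.hexdigest()
```

For an empty file, `file_hash()` returns e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855, the well-known SHA-256 digest of zero bytes.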
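The design and execution flow, briefly:
1. Enter the path of the directory whose data integrity is to be checked (a single file is also supported), and the path of the checksum file in which the hash values will be saved. If the data path does not exist, an exception is raised. If the directory for the checksum file does not exist, it is created.
Passing arguments (the latest version reads its arguments from the command line; the screenshots below show how arguments were passed in the old version):
In the just-updated version, argument parsing and command help are implemented with the docopt module, which makes the script easier to use.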
```
Python script to check data integrity on UNIX/Linux or Windows
accept options using 'docopt' module, using 'docopt' to accept parameters and command switch

Usage:
  checkDataIntegrity.py [-g FILE HASH_FILE]
  checkDataIntegrity.py [-c FILE HASH_FILE]
  checkDataIntegrity.py [-r HASH_FILE]
  checkDataIntegrity.py generate FILE HASH_FILE
  checkDataIntegrity.py validate FILE HASH_FILE
  checkDataIntegrity.py reset HASH_FILE
  checkDataIntegrity.py (--version | -v)
  checkDataIntegrity.py --help | -h | -?

Arguments:
  FILE          the path to single file or directory to data protect
  HASH_FILE     the path to hash data saved

Options:
  -? -h --help     show this help message and exit
  -v --version     show version and exit

Example, try:
  checkDataIntegrity.py generate /tmp /tmp/data.json
  checkDataIntegrity.py validate /tmp /tmp/data.json
  checkDataIntegrity.py reset /tmp/data.json
  checkDataIntegrity.py -g /tmp /tmp/data.json
  checkDataIntegrity.py -c /tmp /tmp/data.json
  checkDataIntegrity.py -r /tmp/data.json
  checkDataIntegrity.py --help
```
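The script relies on the third-party docopt package. If docopt is not installed, a roughly equivalent interface can be built with the standard-library argparse module. The sketch below is that swapped-in alternative, not the author's code. It covers only the `generate`, `validate` and `reset` subcommands plus `--version`; the `-g`/`-c`/`-r` short forms are left out. It assumes Python 3.7+, because it passes `required=True` to `add_subparsers()`. The function name `build_parser` is hypothetical.

```python
import argparse


def build_parser():
    """Hypothetical argparse counterpart of the docopt usage text (subcommands only)."""
    parser = argparse.ArgumentParser(prog="checkDataIntegrity.py",
                                     description="check data integrity on UNIX/Linux or Windows")
    parser.add_argument("-v", "--version", action="version", version="1.0.0rc2")
    sub = parser.add_subparsers(dest="command", required=True)  # required= needs Python 3.7+
    # generate and validate take the same two positional arguments
    for name in ("generate", "validate"):
        cmd = sub.add_parser(name)
        cmd.add_argument("FILE", help="the path to single file or directory to data protect")
        cmd.add_argument("HASH_FILE", help="the path to hash data saved")
    reset = sub.add_parser("reset")
    reset.add_argument("HASH_FILE", help="the path to hash data saved")
    return parser
```

For example, `build_parser().parse_args(["generate", "/tmp", "/tmp/data.json"])` yields a namespace with `command="generate"`, `FILE="/tmp"` and `HASH_FILE="/tmp/data.json"`. argparse also generates the `-h`/`--help` output that docopt builds from the docstring.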
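Valid arguments and paths:
An exception is raised when a path does not exist:
The remaining exception handling can be seen in the script itself.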
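The path handling in step 1 can be sketched in Python 3 as follows. `prepare_paths` is a hypothetical helper; the original script spreads this logic across `makeDataIntegrity()` and `saveDataIntegrity()`. A missing data path is an error. A missing folder for the checksum file is created. Calling `os.path.abspath()` first matters because `os.path.dirname("data.json")` is an empty string, and `os.makedirs("")` would fail.

```python
import os


def prepare_paths(data_path, hash_file):
    """Hypothetical helper for step 1: validate FILE and prepare HASH_FILE's folder.

    Returns both paths in absolute form.
    """
    if not os.path.exists(data_path):
        # a missing data path is an error: there is nothing to protect
        raise RuntimeError("cannot access %s: No such file or directory" % data_path)
    hash_file = os.path.abspath(hash_file)
    # abspath() guarantees a non-empty dirname even for a bare name like "data.json"
    os.makedirs(os.path.dirname(hash_file), exist_ok=True)
    return os.path.abspath(data_path), hash_file
```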
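2. On the first run, the hash values of the files to be checked are saved to the checksum file. On later runs, the script reads the saved file and compares its hash values with those of the files currently in the directory being checked. Both the generate and validate commands run this same routine, which decides what to do based on whether the checksum file already exists. If a hash value differs, the file's path is printed, and a recorded file that no longer exists is reported as an error. If everything matches, a success message is printed.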
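The comparison in step 2 boils down to comparing two `{path: hash}` dictionaries. Below is a minimal Python 3 sketch; the function name `compare_hashes` is hypothetical. It returns changed, missing and added paths. The "added" list is an extension not present in the original script. The original loops only over the saved entries, so it does not notice files created after the checksum file was generated.

```python
def compare_hashes(old, new):
    """Compare two {path: hash} dictionaries (hypothetical helper).

    Returns (changed, missing, added) as sorted lists of paths.
    """
    # same path, different hash: content was modified
    changed = sorted(p for p in old if p in new and old[p] != new[p])
    # recorded earlier but gone now: deleted or renamed
    missing = sorted(p for p in old if p not in new)
    # present now but never recorded (the original script does not report these)
    added = sorted(p for p in new if p not in old)
    return changed, missing, added
```

For example, comparing `{"a.txt": "111", "b.txt": "222", "dir": ""}` with `{"a.txt": "111", "b.txt": "999", "c.txt": "333"}` gives `(["b.txt"], ["dir"], ["c.txt"])`. The check passes only when all three lists are empty.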
First run:
Subsequent run (check passed):
Check failed:
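3. When files have changed legitimately and you want to update the checksum data, run the reset (or -r) command. It calls remakeDataIntegrity(), which asks for confirmation and then deletes the saved checksum file. The next generate run then builds a new one.
Test on Linux: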
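Saving, reloading and resetting the checksum file can be sketched in Python 3 as below. The original calls `json.dumps(data, encoding='utf-8')`, which works only on Python 2. The names `save_hashes`, `load_hashes` and `reset_hashes` are hypothetical. The `confirm` parameter is an assumed hook that stands in for the interactive `raw_input` prompt, so the logic can run without a terminal.

```python
import json
import os


def save_hashes(data, filename):
    """Write the {path: hash} dictionary to filename as UTF-8 JSON."""
    os.makedirs(os.path.dirname(os.path.abspath(filename)), exist_ok=True)
    with open(filename, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII (e.g. Chinese) paths readable in the file
        json.dump(data, f, ensure_ascii=False, indent=2, sort_keys=True)


def load_hashes(filename):
    """Read back the dictionary written by save_hashes()."""
    with open(filename, encoding="utf-8") as f:
        return json.load(f)


def reset_hashes(filename, confirm=lambda question: True):
    """Delete the saved hash file if confirm() agrees; return True if it was removed."""
    if not os.path.exists(filename):
        return False
    if confirm("remake data integrity file '%s'?" % filename):
        os.remove(filename)
        return True
    return False
```

After `reset_hashes()` returns True, the next generate run finds no checksum file and records the current state from scratch.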
The latest code is available on GitHub; link:
The code is as follows:
```python
#!/usr/bin/python
# encoding: utf-8
# -*- coding: utf8 -*-
"""
Created by PyCharm.
File:               LinuxBashShellScriptForOps:checkDataIntegrity.py
User:               Guodong
Create Date:        2017/2/14
Create Time:        14:45

Python script to check data integrity on UNIX/Linux or Windows
accept options using 'docopt' module, using 'docopt' to accept parameters and command switch

Usage:
  checkDataIntegrity.py [-g FILE HASH_FILE]
  checkDataIntegrity.py [-c FILE HASH_FILE]
  checkDataIntegrity.py [-r HASH_FILE]
  checkDataIntegrity.py generate FILE HASH_FILE
  checkDataIntegrity.py validate FILE HASH_FILE
  checkDataIntegrity.py reset HASH_FILE
  checkDataIntegrity.py (--version | -v)
  checkDataIntegrity.py --help | -h | -?

Arguments:
  FILE          the path to single file or directory to data protect
  HASH_FILE     the path to hash data saved

Options:
  -? -h --help     show this help message and exit
  -v --version     show version and exit

Example, try:
  checkDataIntegrity.py generate /tmp /tmp/data.json
  checkDataIntegrity.py validate /tmp /tmp/data.json
  checkDataIntegrity.py reset /tmp/data.json
  checkDataIntegrity.py -g /tmp /tmp/data.json
  checkDataIntegrity.py -c /tmp /tmp/data.json
  checkDataIntegrity.py -r /tmp/data.json
  checkDataIntegrity.py --help
"""
from docopt import docopt
import hashlib
import json
import os
import sys
from time import sleep


def get_hash_sum(filename, method="sha256", block_size=65536):
    """Return the hex digest of a regular file, read block by block."""
    if not os.path.exists(filename):
        raise RuntimeError("cannot open '%s' (No such file or directory)" % filename)
    if not os.path.isfile(filename):
        raise RuntimeError("'%s': not a regular file" % filename)

    if "md5" in method:
        checksum = hashlib.md5()
    elif "sha1" in method:
        checksum = hashlib.sha1()
    elif "sha256" in method:
        checksum = hashlib.sha256()
    elif "sha384" in method:
        checksum = hashlib.sha384()
    elif "sha512" in method:
        checksum = hashlib.sha512()
    else:
        raise RuntimeError("unsupported method %s" % method)

    with open(filename, 'rb') as f:
        buf = f.read(block_size)
        while len(buf) > 0:
            checksum.update(buf)
            buf = f.read(block_size)
    return checksum.hexdigest()


def makeDataIntegrity(path):
    """Return {path: hash} for a single file, or for every file under a directory.

    Sub-directories are recorded with an empty hash so that a deleted folder is noticed.
    """
    path = unicode(path, 'utf8')  # For Chinese Non-ASCII character
    if not os.path.exists(path):
        raise RuntimeError("Error: cannot access %s: No such file or directory" % path)
    elif os.path.isfile(path):
        return {os.path.abspath(path): get_hash_sum(path)}
    elif os.path.isdir(path):
        dict_all = dict()
        for top, dirs, nondirs in os.walk(path, followlinks=True):
            for item in nondirs:
                # Do NOT use os.path.abspath(item) here, else it will make a serious bug because
                # os.path.abspath(item) returns os.getcwd() + filename in some cases.
                dict_all[os.path.join(top, item)] = get_hash_sum(os.path.join(top, item))
            for item in dirs:
                dict_all[os.path.join(top, item)] = r""
        return dict_all
    else:
        raise RuntimeError("'%s': not a regular file or directory" % path)


def saveDataIntegrity(data, filename):
    """Save the {path: hash} dict to filename as JSON."""
    data_to_save = json.dumps(data, encoding='utf-8')
    dir_name = os.path.dirname(filename)
    # dirname() is '' for a bare file name such as "data.json", and makedirs('') would fail
    if dir_name and not os.path.exists(dir_name):
        os.makedirs(dir_name)
    with open(filename, 'wb') as f:
        f.write(data_to_save)


def readDataIntegrity(filename):
    """Load the {path: hash} dict written by saveDataIntegrity()."""
    if not os.path.exists(filename):
        raise RuntimeError("cannot open '%s' (No such file or directory)" % filename)
    with open(filename, 'rb') as f:
        # an empty dict (e.g. an empty directory) is valid, so always return it
        return json.loads(f.read())


def remakeDataIntegrity(filename):
    def confirm(question, default=True):
        """
        Ask user a yes/no question and return their response as True or False.

        :parameter question:
        ``question`` should be a simple, grammatically complete question such as
        "Do you wish to continue?", and will have a string similar to " [Y/n] "
        appended automatically. This function will *not* append a question mark for you.
        The prompt string, if given, is printed without a trailing newline before reading.

        :parameter default:
        By default, when the user presses Enter without typing anything, "yes" is
        assumed. This can be changed by specifying ``default=False``.

        :return True or False
        """
        # Set up suffix
        if default:
            suffix = "Y/n"
        else:
            suffix = "y/N"
        # Loop till we get something we like
        while True:
            response = raw_input("%s [%s] " % (question, suffix)).lower()
            # Default
            if not response:
                return default
            # Yes
            if response in ['y', 'yes']:
                return True
            # No
            if response in ['n', 'no']:
                return False
            # Didn't get empty, yes or no, so complain and loop
            print("I didn't understand you. Please specify '(y)es' or '(n)o'.")

    if os.path.exists(filename):
        if confirm("[warning] remake data integrity file '%s'?" % filename):
            os.remove(filename)
            print "[successful] data integrity file '%s' has been removed." % filename
            sys.exit(0)
        else:
            print "[warning] data integrity file '%s' is not remade." % filename
            sys.exit(0)
    else:
        print >> sys.stderr, "[error] data integrity file '%s' does not exist." % filename
        sys.exit(1)


def checkDataIntegrity(path_to_check, file_to_save):
    """Create the hash file on the first run; otherwise compare against it."""
    if not os.path.exists(file_to_save):
        print "[info] data integrity file '%s' does not exist." % file_to_save
        print "[info] make a data integrity file to '%s'" % file_to_save
        data = makeDataIntegrity(path_to_check)
        saveDataIntegrity(data, file_to_save)
        print "[successful] make a data integrity file to '%s', finished!" % file_to_save,
        print "Now you can use this script later to check data integrity."
    else:
        old_data = readDataIntegrity(file_to_save)
        new_data = makeDataIntegrity(path_to_check)
        all_ok = True
        for item in old_data.keys():
            try:
                if old_data[item] != new_data[item]:
                    print >> sys.stderr, new_data[item], item
                    sleep(0.01)  # give stderr a moment so the two lines stay in order
                    print "\told hash data is %s" % old_data[item], item
                    all_ok = False
            except KeyError as e:
                # recorded in the hash file but missing from the current data
                print >> sys.stderr, "[error]", e.args[0], "Not Exist!"
                all_ok = False
        if all_ok:
            print "[ successful ] passed, All files integrity is ok!"


if __name__ == '__main__':
    arguments = docopt(__doc__, version='1.0.0rc2')
    if arguments['-r'] or arguments['reset']:
        if arguments['HASH_FILE']:
            remakeDataIntegrity(arguments['HASH_FILE'])
    elif arguments['-g'] or arguments['generate']:
        if arguments['FILE'] and arguments['HASH_FILE']:
            checkDataIntegrity(arguments['FILE'], arguments['HASH_FILE'])
    elif arguments['-c'] or arguments['validate']:
        if arguments['FILE'] and arguments['HASH_FILE']:
            checkDataIntegrity(arguments['FILE'], arguments['HASH_FILE'])
    else:
        print >> sys.stderr, "bad parameters"
        sys.stderr.flush()
        print docopt(__doc__, argv="--help")
```
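The script above is Python 2.7 code (`unicode()`, `raw_input`, print statements). For readers on Python 3, its directory walk can be sketched as follows. `build_manifest` is a hypothetical name. Like `makeDataIntegrity()`, it maps each file to its hash and each sub-directory to an empty string, and it joins names with the walked folder (`top`) instead of calling `os.path.abspath(name)`. One deliberate difference: it keeps `os.walk()`'s default `followlinks=False` rather than following symlinks as the original does. This avoids endless loops on circular links, at the cost of not descending into linked folders.

```python
import hashlib
import os


def build_manifest(path, method="sha256"):
    """Map every file under path to its hash and every sub-directory to ""."""
    def digest(filename):
        h = hashlib.new(method)
        with open(filename, "rb") as f:
            for block in iter(lambda: f.read(65536), b""):
                h.update(block)
        return h.hexdigest()

    if os.path.isfile(path):
        return {os.path.abspath(path): digest(path)}
    if not os.path.isdir(path):
        raise RuntimeError("cannot access %s: No such file or directory" % path)
    manifest = {}
    for top, dirs, files in os.walk(path):
        for name in dirs:
            manifest[os.path.join(top, name)] = ""
        for name in files:
            # join with top, not os.path.abspath(name): abspath() would resolve the
            # bare name against the current working directory, not the walked folder
            full = os.path.join(top, name)
            manifest[full] = digest(full)
    return manifest
```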
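This world belongs to the talented, and also to the diligent, but above all to those who work diligently in the field where their talent lies.
Keep it up, together!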
--end--
This article was reposted from urey_pp's 51CTO blog. Original link: http://blog.51cto.com/dgd2010/1897799. To repost it, please contact the original author.