Last edited by SuperElephant on 2023-11-13 19:46
Reply to #5 ericauky
Comparing large files takes time. The answer below compares the full contents of every file (by hash), not the file names.
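To show what "compare contents, not names" means, here is a minimal sketch. It writes three throwaway files into a temporary folder and hashes them. The file names and the helper names sha256_of_file and demo_same_content_different_names are made up for this illustration only.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path):
    """Return the SHA-256 hex digest of the file at `path`."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def demo_same_content_different_names():
    """Create three small files and return a {name: digest} mapping."""
    with tempfile.TemporaryDirectory() as folder:
        # Hypothetical file names, chosen only for this illustration
        contents = {
            "holiday.jpg": b"same bytes",
            "copy_of_holiday.jpg": b"same bytes",
            "other.jpg": b"different bytes",
        }
        digests = {}
        for name, data in contents.items():
            path = os.path.join(folder, name)
            with open(path, "wb") as handle:
                handle.write(data)
            digests[name] = sha256_of_file(path)
    return digests


digests = demo_same_content_different_names()
# Same content under different names: the digests match
print(digests["holiday.jpg"] == digests["copy_of_holiday.jpg"])  # True
# Different content: the digests differ
print(digests["holiday.jpg"] == digests["other.jpg"])  # False
```

So a renamed copy is still caught, while two unrelated files that happen to share a name are not reported.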
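Open cmd, type "python --version" and press Enter. If no version is shown, install Python first.
Create the file C:\dedup.py, copy and paste the code below into dedup.py, then save it.
Open cmd again and run: python "C:\dedup.py" "D:\target_directory"
Change "D:\target_directory" to the folder you want to search for duplicates.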
Code:
import os
import hashlib
import argparse


def calculate_sha256(file_path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file's contents."""
    sha256 = hashlib.sha256()
    with open(file_path, 'rb') as file:
        # Read in chunks so a large file is not loaded into memory all at once
        for chunk in iter(lambda: file.read(chunk_size), b''):
            sha256.update(chunk)
    return sha256.hexdigest()


def find_duplicates(target_directory):
    """Map each content hash to the files sharing it, keeping only duplicates."""
    file_hashes = {}
    for dirpath, dirnames, filenames in os.walk(target_directory):
        for filename in filenames:
            file_path = os.path.join(dirpath, filename)
            try:
                file_hash = calculate_sha256(file_path)
            except OSError as error:
                # Skip files that cannot be read (locked, no permission, etc.)
                print(f"Skipped {file_path}: {error}")
                continue
            file_hashes.setdefault(file_hash, []).append(file_path)
    return {k: v for k, v in file_hashes.items() if len(v) > 1}


if __name__ == '__main__':
    # Parse command-line arguments
    parser = argparse.ArgumentParser(description='Find duplicate files in a directory.')
    parser.add_argument('directory', type=str, help='The target directory.')
    args = parser.parse_args()

    duplicates = find_duplicates(args.directory)
    for file_hash, file_paths in duplicates.items():
        print(f"Duplicate files for hash {file_hash}:")
        for file_path in file_paths:
            print(f"\t{file_path}")
    print(f"Scan completed. Found {len(duplicates)} duplicate hashes.")
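The script above hashes every file, which is why big folders take a while. One possible speed-up, not part of the original answer, is to group files by size first and only hash the ones that share a size with another file, since files of different sizes cannot have identical contents. A rough sketch, where the function names hash_file and find_duplicates_by_size_then_hash are my own:

```python
import hashlib
import os
from collections import defaultdict


def hash_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha256.update(chunk)
    return sha256.hexdigest()


def find_duplicates_by_size_then_hash(target_directory):
    """Return {digest: [paths]} for files whose contents are identical."""
    # Pass 1: group paths by file size (cheap, no file contents are read)
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(target_directory):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                by_size[os.path.getsize(path)].append(path)
            except OSError:
                continue  # entry could not be inspected, skip it

    # Pass 2: hash only files that share their size with at least one other
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        for path in paths:
            try:
                by_hash[hash_file(path)].append(path)
            except OSError:
                continue  # file could not be read, skip it
    return {digest: paths for digest, paths in by_hash.items() if len(paths) > 1}
```

To try it, swap the call to find_duplicates in dedup.py for find_duplicates_by_size_then_hash. How much time it saves depends on how many files in the folder happen to share a size.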