xbonez
I needed to sort through a directory containing about 500 files (many of them identical in content) and pick out only the unique ones. I ended up writing a short C# program to do it for me. I'm posting the executable and code below in case anyone ever needs it.
Summary
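The program iterates over the files directly inside the input directory (subdirectories are not searched) and calculates the MD5 hash of each file's contents. The first file found with a given hash is copied to the output directory; any later file with the same hash is treated as a duplicate and skipped.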
Executable: http://dl.dropbox.com/u/1276196/DND/IdenticalFileFinder.exe
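To make the idea in the summary concrete, here is a minimal sketch of the same "first file with each hash wins" check, run on in-memory byte arrays instead of files. The names `UniqueByHashSketch`, `Md5Hex` and `FirstOfEachContent` are made up for this illustration and are not part of the posted program.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Illustrative only: hypothetical helper names, in-memory data instead of files.
static class UniqueByHashSketch
{
    // Returns the MD5 digest of the data as a lowercase hex string.
    public static string Md5Hex(byte[] data)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] digest = md5.ComputeHash(data);
            StringBuilder sb = new StringBuilder(digest.Length * 2);
            foreach (byte b in digest)
                sb.Append(b.ToString("x2"));
            return sb.ToString();
        }
    }

    // Returns the names of items whose content hash was not seen earlier in the sequence.
    public static List<string> FirstOfEachContent(IEnumerable<KeyValuePair<string, byte[]>> items)
    {
        HashSet<string> seen = new HashSet<string>();
        List<string> unique = new List<string>();
        foreach (KeyValuePair<string, byte[]> item in items)
        {
            // HashSet.Add returns false when the hash is already present.
            if (seen.Add(Md5Hex(item.Value)))
                unique.Add(item.Key);
        }
        return unique;
    }
}
```

Note that this only compares hashes. MD5 is fine for catching accidental duplicates, but two different files could in principle produce the same hash, so a byte-by-byte comparison would be needed if that matters for your data.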
Usage
The program expects two command line arguments.
The first argument is the input directory.
The second argument is the output directory where the unique files will get copied.
A trailing backslash on either directory path is optional.
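As a side note on the optional trailing separator: one way to get that behaviour is `Path.Combine`, which only inserts a separator when the directory path does not already end with one. This is a sketch of an alternative, not how the posted source does it, and the helper name `DestinationFor` is hypothetical.

```csharp
using System;
using System.IO;

// Illustrative only: DestinationFor is a hypothetical helper, not part of the posted program.
static class TrailingSlashSketch
{
    // Builds the path a source file would be copied to inside outputDir.
    // Path.Combine adds a directory separator only if outputDir lacks one.
    public static string DestinationFor(string outputDir, string sourceFile)
    {
        return Path.Combine(outputDir, Path.GetFileName(sourceFile));
    }
}
```

With this, passing the output directory with or without a trailing separator gives the same destination path.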
Source
Code:
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;
using System.Text;

namespace IdenticalFileFinder
{
    class Program
    {
        static string baseDir = String.Empty;
        static string outputDir = String.Empty;

        static void Main(string[] args)
        {
            getArgs(args);
            normalizeDirectories();

            // Hashes of the files copied so far. A HashSet gives fast lookups.
            HashSet<string> hashes = new HashSet<string>();
            // Top-level files only; subdirectories are not searched.
            string[] files = Directory.GetFiles(baseDir);

            if (!Directory.Exists(outputDir))
            {
                try
                {
                    Directory.CreateDirectory(outputDir);
                }
                catch (Exception e)
                {
                    Console.WriteLine(e.ToString());
                    Environment.Exit((int)ExitScenarios.READWRITEERROR);
                }
            }

            foreach (string fileName in files)
            {
                try
                {
                    // Hashing is inside the try so an unreadable file is reported
                    // the same way as a failed copy.
                    string hash = GetMD5HashFromFile(fileName);
                    // Add returns false if the hash was already seen (a duplicate).
                    if (hashes.Add(hash))
                    {
                        string dest = outputDir + Path.GetFileName(fileName);
                        // Throws if dest already exists, e.g. when the output
                        // directory still holds files from an earlier run.
                        File.Copy(fileName, dest);
                    }
                }
                catch (Exception e)
                {
                    Console.WriteLine(e.ToString());
                    Environment.Exit((int)ExitScenarios.READWRITEERROR);
                }
            }

            Console.WriteLine("{0} unique files found in {1} total files.", hashes.Count, files.Length);
            Environment.Exit((int)ExitScenarios.SUCCESS);
        }

        // Makes sure both directory paths end with a backslash.
        public static void normalizeDirectories()
        {
            if (!baseDir.EndsWith("\\"))
                baseDir += "\\";
            if (!outputDir.EndsWith("\\"))
                outputDir += "\\";
        }

        // Reads and validates the two command line arguments.
        public static void getArgs(string[] args)
        {
            if (args.Length == 2)
            {
                baseDir = args[0];
                outputDir = args[1];
            }
            else
            {
                Console.WriteLine("Incorrect arguments passed in. First argument should be input directory and second should be output directory.");
                Environment.Exit((int)ExitScenarios.ARGFAIL);
            }

            if (!Directory.Exists(baseDir))
            {
                Console.WriteLine("Input directory not found.");
                Environment.Exit((int)ExitScenarios.ARGFAIL);
            }

            // An empty output path would otherwise be normalized to "\" (the drive root).
            if (outputDir.Trim().Length == 0)
            {
                Console.WriteLine("Output directory must not be empty.");
                Environment.Exit((int)ExitScenarios.ARGFAIL);
            }
        }

        // Returns the MD5 hash of the file's contents as a lowercase hex string.
        static public string GetMD5HashFromFile(string fileName)
        {
            // File.OpenRead asks for read access only, so read-only files can be hashed.
            // The using blocks release the stream and hash object even if hashing throws.
            using (FileStream file = File.OpenRead(fileName))
            using (MD5 md5 = MD5.Create())
            {
                byte[] retVal = md5.ComputeHash(file);
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < retVal.Length; i++)
                {
                    sb.Append(retVal[i].ToString("x2"));
                }
                return sb.ToString();
            }
        }
    }

    // Process exit codes: SUCCESS = 0, ARGFAIL = 1, READWRITEERROR = 2.
    public enum ExitScenarios
    {
        SUCCESS,
        ARGFAIL,
        READWRITEERROR
    }
}