思路
刚开始我以为这题的考点是如何快速读取文件(因为这是公司多线程学习分享后布置的作业),我就用多线程来解题。后来出题人跟我说:200m测试数据时我的程序OOM了,我才醒悟这题的考点不是快速读取文件,而是大文件排序。
这题挺有意思的,解题运用了多路归并,有个巧妙的地方估计只有实操才知道——复用流。
题目
查找第K小/大数据
每个法官都有不同的办案能力,假设每个案子的难易程度都一样。现在到年底了,各个领导关注的点可能不太一样。
- 领导A:想要知道每家院办案能力最强的和办案能力最弱的法官。
- 领导B:想要知道每家院办案能力第二强的和办案能力第二弱的法官。
- 领导K:想要知道每家院办案能力第K强的和办案能力第K弱的法官。
由于目前管理比较混乱,法官的办案情况存在若干个文件中,每个文件中有若干条记录,文件中每行存储一个法官的办案数,其中包括:法院ID、用户ID(32位UUID)和办案数,每个数据中间用一个空格隔开,例如:
- 2400 UUID 2
其中法官的工作量可能是相同的。
现在需要设计一个程序从这些文件中找到每家法院第K小/大工作量的法官,输出结果具体要求如下:
输出结果按照法院ID进行排序,从小到大
工作量相同的按照用户ID进行排序,从小到大
K的取值范围
- 1 <= K
- 针对每家法院,K = min(K,n) ,其中n为该法院的用户数。
- 举个栗子:如果最后法院2400的用户数为100,传入K的值为101,则当前法院的需要查找的K = 100 = min(101,100)
实现指定方法findKthNumbers
/**
* description: 查找第K小/大数据 <br/>
* @date 20-6-18
*/
public class KthNumbers implements IKthNumbers {
/**
* 查找第K小/大数据
*
* @param dir 数据文件存放路径<br/>
* 文件个数不定,文件内容格式如下:<br/>
* 2400 69890e27cd5a47b38093b520f0eb454b 26 <br/>
* 2401 a6fd9d728a814cbcb240b520f0eb454b 32 <br/>
* 2400 751628a92f264995a268b520f0eb454b 33 <br/>
* 1908 48e102e47a704c87aee3b520f0eb454b 13 <br/>
* ......
* @param k 第K小/大
* @return 查找的结果集<br/>
* 输出结果按照法院ID进行排序,从小到大<br/>
* 工作量相同的按照用户ID进行排序,从小到大<br/>
* 输出格式如下:<br/>
* 2400 <br/>
* 69890e27cd5a47b38093b520f0eb454b 26 <br/>
* 751628a92f264995a268b520f0eb454b 33
*/
@Override
public String findKthNumbers(String dir, int k) {
return null;
}
}
输出格式,第一行法院ID,第二行工作量第K小的用户ID和工作量(以空格分割),第三行工作连第K大的用户ID和工作量(以空格分割),如下:
代码语言:javascript复制2400
69890e27cd5a47b38093b520f0eb454b 26
751628a92f264995a268b520f0eb454b 33
......
按公司要求的代码模板,代码质量优,逻辑清晰,注释中编程思路详细
环境要求
- 不能使用第三方库
- 不能使用数据库
- jdk1.8
评定标准
- 程序的正确性,能输出正确答案,性能良好 ,70%
- 程序设计合理,结构清晰,代码质量优 30%
奖励
- 神秘大奖一份
代码
代码语言:javascript复制@Data
public class FileAndReader {
private File file;
private BufferedReader reader;
private String currentRow;
public FileAndReader(File file) {
this.file = file;
try {
reader = new BufferedReader(new FileReader(file));
currentRow = reader.readLine();
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
public String readNextLine() {
try {
currentRow=reader.readLine();
if (currentRow == null) {
reader.close();
}
} catch (IOException e) {
e.printStackTrace();
}
return currentRow;
}
}
@Data
public class FileAndWriter {
private File file;
private BufferedWriter writer;
private Long rows;
public FileAndWriter(File file) {
this.file = file;
try {
writer = new BufferedWriter(new FileWriter(file));
} catch (IOException e) {
e.printStackTrace();
}
rows=0L;
}
public void write(String str){
rows ;
try {
writer.write(str);
} catch (IOException e) {
e.printStackTrace();
}
}
}
public class KthNumbers implements IKthNumbers {
/** 换行符 */
public static final String WRAP = "rn";
/** 拆分的小文件最大行数 */
public static final int MAX_ROW = 10_000;
public static String PATH;
private Map<String, FileAndWriter> corpFiles = new HashMap<>();
public String findKthNumbers(String dir, int k) {
long allTimeStart = System.currentTimeMillis();
PATH = dir;
final File slipFile = new File(PATH "/slip");
final File cropFile = new File(PATH "/crop");
final File sortedFile = new File(PATH "/sorted");
deleteFile(cropFile);
deleteFile(slipFile);
deleteFile(sortedFile);
cropFile.mkdirs();
slipFile.mkdirs();
sortedFile.mkdir();
final File file = new File(dir);
// 根据法院拆分文件
if (file.isDirectory()) {
final long start = System.currentTimeMillis();
for (File singleFile : file.listFiles()) {
if (singleFile.isFile()) {
slipWithCorp(singleFile);
}
}
corpFiles.forEach((k2, v) -> {
try {
v.getWriter().close();
} catch (IOException e) {
e.printStackTrace();
}
});
final long endTime = System.currentTimeMillis();
System.out.printf("根据法院拆分文件耗时:【%s】毫秒%n", endTime - start);
}
// 拆分为小文件
final long startTimeOfSlipFile = System.currentTimeMillis();
corpFiles.forEach((k1, v) -> slipWithMaxRow(v.getFile()));
final long endTimeOfSlipFile = System.currentTimeMillis();
System.out.printf("拆分单个法院文件为小文件耗时:【%s】毫秒%n", endTimeOfSlipFile - startTimeOfSlipFile);
// 排序所有小文件
long startTime = System.currentTimeMillis();
Arrays.stream(slipFile.listFiles()).forEach(e -> {
Arrays.stream(e.listFiles()).forEach(this::sortFile);
});
long endTime = System.currentTimeMillis();
System.out.printf("排序所有小文件耗时:【%s】毫秒%n", endTime - startTime);
// 合并所有小文件
final long startTimeOfMearge = System.currentTimeMillis();
Arrays.stream(slipFile.listFiles()).forEach(e -> {
meargeSortFile(e, new File(PATH "/sorted/" e.getName() ".txt"));
});
final long endTimeOfMearge = System.currentTimeMillis();
System.out.printf("合并所有小文件耗时:【%s】毫秒%n", endTimeOfMearge - startTimeOfMearge);
// 计算目标行
final Map<String, Pair<Long, Long>> targetLine = new HashMap<>();
corpFiles.forEach((k4, v) -> {
Long first;
Long last;
if (v.getRows() < k) {
first = 1L;
last = v.getRows();
} else {
first = k - 1L;
last = v.getRows() - k;
}
targetLine.put(k4, new Pair<>(first, last));
});
// 获取目标字符串
final StringBuilder builder = new StringBuilder();
final long getResultStart = System.currentTimeMillis();
Arrays.stream(sortedFile.listFiles()).forEach(e -> {
try (final BufferedReader reader = new BufferedReader(new FileReader(e))) {
String temp;
Long count = 1L;
final String corp = e.getName().split("\.")[0];
final Pair<Long, Long> pair = targetLine.get(corp);
String first = "";
String last = "";
while ((temp = reader.readLine()) != null) {
if (count.equals(pair.getKey())) {
first = temp;
} else if (count.equals(pair.getValue())) {
last = temp;
}
count ;
}
builder.append(corp).append(WRAP)
.append(first.substring(2)).append(" ")
.append(first.substring(0, 2).startsWith("0") ? first.substring(1, 2) : first.substring(0, 2))
.append(WRAP)
.append(last.substring(2)).append(" ")
.append(last.substring(0, 2).startsWith("0") ? last.substring(1, 2) : last.substring(0, 2))
.append(WRAP);
} catch (FileNotFoundException fileNotFoundException) {
fileNotFoundException.printStackTrace();
} catch (IOException ioException) {
ioException.printStackTrace();
}
});
final long getResultEnd = System.currentTimeMillis();
System.out.printf("读取文件获取结果耗时:【%s】毫秒%n", getResultEnd - getResultStart);
long allTimeEnd = System.currentTimeMillis();
System.out.printf("总耗时:【%s】毫秒%n", allTimeEnd - allTimeStart);
return builder.toString();
}
/**
* description: 递归删除文件
* version: 1.0
* date: 2021/3/17 11:47
* author: 余游洋
*
* @param file
* @return void
*/
private void deleteFile(File file) {
if (file.exists()) {
if (file.isDirectory()) {
File[] files = file.listFiles();
for (int i = 0; i < files.length; i ) {
deleteFile(files[i]);
}
} else {
file.delete();
}
}
}
/**
* description: 根据法院编号拆分文件
* version: 1.0
* date: 2021/3/17 11:48
* author: 余游洋
*
* @param file
* @return void
*/
public void slipWithCorp(File file) {
try (final BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(new FileInputStream(file)))) {
String temp;
while ((temp = bufferedReader.readLine()) != null) {
final String[] s = temp.split(" ");
final FileAndWriter corpFile = corpFiles.computeIfAbsent(s[0], e -> {
File newCorpFile = new File(PATH "/crop/" e ".txt");
try {
newCorpFile.createNewFile();
} catch (IOException ioException) {
ioException.printStackTrace();
}
return new FileAndWriter(newCorpFile);
});
corpFile.write((s[2].length() == 2 ? s[2] : "0" s[2]) s[1] WRAP);
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
/**
* description: 拆分为指定行数的小文件
* version: 1.0
* date: 2021/3/17 11:48
* author: 余游洋
*
* @param file
* @return void
*/
public void slipWithMaxRow(File file) {
final String slipDir = PATH "/slip/" file.getName().split("\.")[0];
new File(slipDir).mkdirs();
int fileCount = 0;
File slipFile = getSlipFile(slipDir, fileCount);
BufferedWriter writer = null;
try (
BufferedReader reader = new BufferedReader(new FileReader(file));
) {
writer = new BufferedWriter(new FileWriter(slipFile));
String temp;
int count = 0;
while ((temp = reader.readLine()) != null) {
writer.append(temp).append(WRAP);
if ( count == MAX_ROW) {
count = 0;
writer.close();
slipFile = getSlipFile(slipDir, fileCount);
writer = new BufferedWriter(new FileWriter(slipFile));
}
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
writer.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
/**
* description: 将文件加载到内存排序
* version: 1.0
* date: 2021/3/17 11:49
* author: 余游洋
*
* @param file
* @return void
*/
public void sortFile(File file) {
final ArrayList<String> strings = new ArrayList<>();
try (final BufferedReader reader = new BufferedReader(new FileReader(file))) {
String temp;
while ((temp = reader.readLine()) != null) {
strings.add(temp);
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
Collections.sort(strings);
try (final BufferedWriter writer = new BufferedWriter(new FileWriter(file))) {
strings.forEach(e -> {
try {
writer.append(e).append(WRAP);
} catch (IOException ioException) {
ioException.printStackTrace();
}
});
} catch (IOException e) {
e.printStackTrace();
}
}
/**
* description: 利用多路归并,合并小文件
* version: 1.0
* date: 2021/3/17 12:40
* author: 余游洋
*
* @param fromDir
* @param target
* @return void
*/
public void meargeSortFile(File fromDir, File target) {
final TreeSet<FileAndReader> readerTreeSet = new TreeSet<>(Comparator.comparing(FileAndReader::getCurrentRow));
for (File file : fromDir.listFiles()) {
FileAndReader fileAndReader = new FileAndReader(file);
if (fileAndReader.getCurrentRow() != null) {
readerTreeSet.add(fileAndReader);
}
}
try (final BufferedWriter writer = new BufferedWriter(new FileWriter(target))) {
while (!readerTreeSet.isEmpty()) {
final FileAndReader first = readerTreeSet.pollFirst();
writer.append(first.getCurrentRow()).append(WRAP);
final String nextLine = first.readNextLine();
if (null != nextLine) {
readerTreeSet.add(first);
}
}
} catch (IOException e) {
e.printStackTrace();
}
}
/**
* description: 创建拆分的小文件
* version: 1.0
* date: 2021/3/17 12:42
* author: 余游洋
*
* @param slipDir
* @param fileCount
* @return java.io.File
*/
public File getSlipFile(String slipDir, int fileCount) {
File slipFile = new File(slipDir "/slip" ( fileCount) ".txt");
try {
slipFile.createNewFile();
} catch (IOException e) {
e.printStackTrace();
}
return slipFile;
}
}