UQ Students should read the Disclaimer & Warning
Note: This page dates from 2005, and is kept for historical purposes.
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>COMP3300 - Assignment 1 - Threaded Merge Sort</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<style type="text/css">
<!--
.wrong {
background: #FF9999;
}
body {
background: url(_img/DSC04989.jpg) fixed center;
font-family: "Arial Unicode MS", Arial, Helvetica, sans-serif;
}
th, td, textarea {
border: 1px solid #000000;
padding: 0 1ex;
background: transparent;
overflow: hidden;
}
table {
border: none;
}
-->
</style>
</head>
<body>
<h1>COMP3300 – Assignment One – Threaded Merge Sort </h1>
<table border="1">
<tr>
<td>My Result </td>
<td>70 out of 100 </td>
</tr>
<tr>
<td>Comments</td>
<td><ul>
<li> bonus early submission +5%</li>
<li> does not maximise parallelism -15%</li>
<li> cascading merge implementation (cap 80%) </li>
</ul></td>
</tr>
<tr>
<td>Class Average</td>
<td>62</td>
</tr>
<tr>
<td>Class Median</td>
<td>70</td>
</tr>
<tr>
<td>Class Standard Deviation </td>
<td>28.7</td>
</tr>
</table>
<p><img src="_img/COMP3300-ass1-marks.png" alt="Marks Distribution" width="436" height="286" /></p>
<p><em>Time constraints prevented me from implementing a parallel merge solution,
as I was away for a large part of this assignment at a wedding. Then, due to
an inadvertent design error, the loop that launches read and sort threads will
wait for each read/sort thread pair to terminate before launching the next
pair, thus entirely defeating the parallelism aspect of this assignment. </em></p>
<p>The purpose of this assignment is to show your understanding of threads and
of C programming. </p>
<p> You are required to use existing code as a base for developing a multithreaded
multifile sort. You have the option of developing a simplified version which
sorts exactly 4 files, if you find the general case of an arbitrary number of
files too hard. </p>
<p> <em>Multifile </em> here means that you should sort multiple files into <em>one </em> overall
sorted result, and you should use the given printBuffer function to produce
the output. </p>
<p> The notion is that because file reading is so slow, it you are sorting multiple
files, it would be nice to start sorting as soon as the smallest file has been
read, and to overlap as much work as possible with waiting for file reads.</p>
<p> The provided code shows how to read 1 file, sort it and print it, using threads.
You need to work out how to merge multiple sorted files (in memory-based data
structures), using threads.</p>
<h2>Detailed Requirements </h2>
<ol>
<li>Your program should use the same files as are supplied, and you should only
modify the files you are told to modify. </li>
<li> A command line is supplied to compile the program. Whatever development
environment you use, your program must compile with the given command line
on the student UNIX machines (Solaris). The provided code has been tested on
Linux and Solaris, but your solution must run on Solaris. </li>
<li> Your program should run with the following command: <br />
./launch_parallel_sort data <br />
where the file name data can be replaced by an arbitrary number of files. </li>
<li> The input data to your program should be any number of text files named
on the command line. </li>
<li> The output of your program should only consist of the output produced by
calling printBuffer once as in the supplied code; the output should be the
contents of all the files named on the command line, individually sorted then
merged (i.e., the original data sorted into one data structure). </li>
<li> Your program should be written as follows (alternative organizations will
be accepted if they are justified as simpler or more efficient, but the basic
principle of doing work as soon as the data is available by using multiple
threads must apply):
<ol>
<li>it should have one thread for each file read (using the supplied code
for reading a file) </li>
<li> it should have one thread for each sort (using the supplied sorting code) </li>
<li>it should have one or more threads for merging sorted results: each of
these threads should merge two sorted results into one; and merges may be
cascaded to handle arbitrary numbers of files </li>
</ol>
</li>
<li>You need to write code to do merging and code to combine reading, sorting
and merging, using the given thread primitives. </li>
<li>You should only use the following threading primitives, and not other parts
of Pthreads or other libraries or APIs:
<ol>
<li>init_threads </li>
<li>launch_thread </li>
<li>join_thread </li>
</ol>
</li>
<li>You are supplied with 4 data files for testing, but your final code will
be tested with other files. </li>
<li>You should only modify the following files; all other files will be used
unaltered when we test your code: <br />
launch_parallel_sort.c <br />
parallel-sort.c <br />
if you modify any other files or submit files with different names to
these your assignment will be graded as “doesn't compile”. </li>
<li>Your program should conform to the following formatting requirements:
<ol>
<li>All code at the same level should be left-aligned; an increase of level
by introducing “ { ” should result in an indentation of 2 - 4 spaces; opening
a new “ { ” block should usually be done at the end of the previous
line and the closing “ } ” should be on a line of its own unless readability
dictates otherwise (for example, it is permissible to write “ } else { ”);
if in doubt ask. </li>
<li>Type names should start with an initial capital; all other names with
an initial lowercase letter. For readability you may use underscores or capitals
within a name. (Violations of these guidelines in the original code should
not be altered.) </li>
<li>Any types or variables specific to one compilable (“ .c ”) file should
appear at the top of the file after the “ #includes ”. </li>
</ol>
If in doubt about any formatting requirements, please ask: these guidelines
can be extended.</li>
<li>There will be a bonus for early completion: any assignment handed in 24
hours or more before the due date, which achieves at least 50% of available
marks, will receive a 5% bonus. The maximum available for the assignment therefore
is 105% of available marks. Recall that if you fail to achieve 50% overall
for assignments, your exam alone will be used to determine your final grade,
but with a maximum grade of 5. </li>
</ol>
<p><a href="#c1">launch_parallel_sort.c</a><br />
<a href="#c2">parallel-sort.c</a></p>
<p id="c1">launch_parallel_sort.c<br />
<textarea name="c" cols="82" rows="179" readonly="readonly" title="C Code - Copyright 2004 Ned Martin">/*
* Author: Ned Martin #40529927
*
* Creation Date: 31-MAR-2004
*
* Copyright: Ned Martin, Student s40529927, for COMP3300,
* University of Queensland
*
* Description: Assignment 1 for COMP3300
* Uses Pthreads to implement a parallel sort using the
* standard qsort function on individual files, and merging
* the results.
*
* Two threads are created for each input file, one to read
* the file and another which waits for that read thread
* and then sorts it. Once all input has been read and
* sorted, it is merged and output.
*
* Version: 31-MAR-2004-FINAL
*
* Tab Stops: Eight spaces
*/
#include <stdio.h>
#include "sorttypes.h"
#include "threadLaunch.h"
#include "parallel-sort.h"
/**
* Function: sort
*
* Purpose: Joins a read thread and sorts its output.
*
* Method: Waits for the thread with an ID directly below its own and joins
* that thread which will be a read thread, and sorts its output.
*
* Returns: Void pointer which is a pointer o a CharBuf containing sorted
* data.
*/
void * sort
(
void *arg
)
{
// cast arg as ThreadInfo so we can access the thread ID
ThreadInfo * info = (ThreadInfo*) arg;
// wait for the read thread for this file
void * filedata = join_thread (info->which_thread - 1);
// return sorted file
return sortBuffer (filedata);
} // end sort
/*
* Structure containing a pointer to a CharBuf and an integer length
* indicating the number of CharBufs stored in this structure as an array.
*/
typedef struct {
CharBuf * chars; // Pointer to a CharBuf
int length; // number of CharBuf pointers
} SortedBuf;
/*
* Define these names globally so the data continues
* to exist after launch_threads exits; use static
* so they aren't visible in other separately compiled
* files.
*/
static ThreadInfo sortsetup = {0, 0}; // Stores thread info for sort threads
static FileInfo filesetup = {"", "r"}; // Stores thread info for read threads
static FileInfo* fileInfos; // Array of FileInfo
static ThreadInfo* threadInfos; // Array of ThreadInfo
static SortedBuf* sortedChars; // Array of SortedBuf to store pointers
// to sorted output
/**
* Function: launch_threads
*
* Purpose: Launches threads to read files and sort files.
*
* Method: Accepts an integer number of filenames and an array containing
* the filenames. Launches two threads for each file, one which
* reads the file into an array, and another which then sorts the
* array. Read threads are numbered 0, 2, 4, ... and sort threads
* are numbered 1, 3, 5, ... This is because a sort thread will
* join the thread with an ID directly below it.
*
* Returns: None
*/
void launch_threads
(
int N, // number of filenames
char *fileNames [] // array of filenames
)
{
int ii; // counter for loop
// initialize two threads for each file.
// One will read file, one will sort file.
init_threads (N * 2);
// allocate arrays to store thread information
fileInfos = (FileInfo*) malloc (sizeof (FileInfo) * N);
threadInfos = (ThreadInfo*) malloc (sizeof (ThreadInfo) * N);
sortedChars = (SortedBuf*) malloc (sizeof (SortedBuf) * N);
for( ii = 0; ii < N; ii++ ) // number of input files
{
// fileNames comes from the command line: ignore first one:
// that's the name of the program
// Add filenames and read method for threads 1, 2, 3, ...
fileInfos[ii].filename = fileNames[ii + 1];
fileInfos[ii].filemode = "r";
// Launch read threads. These will return file as an array of
// strings in CharBuf struct.
// Thread id's are 0, 2, 4, ...
launch_thread (readFile, (void*)&fileInfos[ii], 2 * ii);
// Add thread info for sort threads 1, 3, 5, ...
threadInfos[ii].which_thread = 2 * ii + 1;
threadInfos[ii].N = N;
// Launch sorting threads 1, 3, 5, ...
launch_thread (sort, (void*)&threadInfos[ii], 2 * ii + 1);
// Store pointers to sorted output by joining sort threads
// Join threads 1, 3, 5, ...
sortedChars[ii].chars = (CharBuf*) join_thread(2 * ii + 1);
sortedChars[ii].length = N;
}
} // end launch_threads
/**
* Function: main
*
* Purpose: Main function. Calls functions to read, sort and merge input
* files and then prints the output.
*
* Method: Accepts commandline arguments which are filenames, calls a
* function to launch threads to read and sort these files, and
* then merges the resulting sorted files prints output.
*
* Returns: Integer
* 0 when successfull
* nonzero otherwise
*/
int main
(
int argc, // number of commandline input
char *argv[] // array of commandline input
)
{
int N = argc - 1; // number of input filenames
CharBuf * result; // character buffer to store output
if (N > 0) // there is some input
{
// launch threads to read and sort input
launch_threads (N, argv);
// merge data for output
result = (CharBuf*) merge(sortedChars);
// print output
printBuffer ("data", result);
}
else // error
{
fprintf (stderr, "usage %s filename [filename ...]\n",
argv [0]);
}
} // end main
</textarea>
</p>
<p id="c2">parallel-sort.c<br />
<textarea name="k" cols="82" rows="144" readonly="readonly" title="Logic behind the key creation">/*
* Author: Ned Martin #40529927
*
* Creation Date: 31-MAR-2004
*
* Copyright: Ned Martin, Student s40529927, for COMP3300,
* University of Queensland
*
* Description: Assignment 1 for COMP3300
* Uses Pthreads to implement a parallel sort using the
* standard qsort function on individual files, and merging
* the results.
*
* This file contains definitions for merge routines that
* merge sorted data into a single output, called from
* launch_parallel_sort.c
*
* Version: 31-MAR-2004-FINAL
*
* Tab Stops: Eight spaces
*/
#include "parallel-sort.h"
#include "sorttypes.h"
#include "threadLaunch.h"
// not needed in other files, hence static
static CharBuf * merge_sorted_lines(CharBuf* first, CharBuf* second);
// This would normally go in a header, but for this assignment we are not
// allowed to modify headers, so this is put here instead.
// See launch_parallel_sort.c for description.
typedef struct {
CharBuf * chars;
int length;
} SortedBuf;
/**
* Function: merge
*
* Purpose: Merge sorted files into a single sorted file.
*
* Method: Accepts an array of CharBufs, and iterates through this array,
* sending each pair of CharBufs to a merging function and merging
* the output with the previous result, returning this merged
* CharBuf.
*
* Returns: CharBuf containing all the input merged in a sorted manner.
*/
void * merge
(
void *arg // array of CharBufs
)
{
CharBuf * return_result; // CharBuf to store merged results
// cast input void pointer to SortedBuf
SortedBuf* sortedBuf = (SortedBuf*) arg;
int ii, // loop counter
length = sortedBuf[0].length; // set length to number of files
// first file doesn't need to be merged
return_result = sortedBuf[0].chars;
// for each subsequent file, merge with previously merged files
for( ii = 1; ii < length; ii++ )
{
// merge previous with current and store in previous
return_result = merge_sorted_lines(sortedBuf[ii].chars,
return_result);
}
// return the merged results as a void pointer
return (void*) return_result;
} // end merge
/**
* Function: merge_sorted_lines
*
* Purpose: Merge sorted lines from two arrays and return the result.
*
* Method: Accepts pointers to two CharBufs and merges them together.
* Creates and allocates space for an output merged CharBuf, then
* loops through arrays, checking if strings are less than each
* other, and outputting those that are less, incrementing a
* counter for each array. When one array is exhausted, the
* remainder of the remaining array is added to the end of the
* merged array, and a pointer to this merged array is returned.
*
* Returns: Pointer to CharBuf containing merged and sorted array of
* strings.
*/
CharBuf * merge_sorted_lines
(
CharBuf* first, // first CharBuf to merge
CharBuf* second // second CharBuf to merge
)
{
CharBuf * merged; // used to store the merged lines
int firstCounter = 0, // first array counter
secondCounter = 0, // second array counter
outputCounter = 0, // counter for output array
firstLength = first->length, // length of first array
secondLength = second->length; // length of second array
// allocate space for merged lines
merged = (CharBuf*) malloc(sizeof(CharBuf));
merged->lines = (char**) malloc(sizeof(char*) * (firstLength +
secondLength));
merged->length = firstLength + secondLength;
// while inside the array bounds
while( firstCounter < firstLength && secondCounter < secondLength )
{
// if first string is less than second string
if( strcmp(first->lines[firstCounter],
second->lines[secondCounter]) < 0 )
{
merged->lines[outputCounter++] =
first->lines[firstCounter++];
}
else
{
merged->lines[outputCounter++] =
second->lines[secondCounter++];
}
}
// after one array is finished, merge remainder of other array
while( firstCounter < firstLength ) // first array has not finished
{
merged->lines[outputCounter++] = first->lines[firstCounter++];
}
while( secondCounter < secondLength ) // second array has not finished
{
merged->lines[outputCounter++] = second->lines[secondCounter++];
}
// return the resulting merged array
return merged;
} // end merge_sorted_lines
</textarea>
<br />
Code © Copyright 2004 Ned Martin</p>
<p>23-Apr-2004</p>
</body>
</html>