Friday, January 16, 2009

Matlab application



The main goal of my project is to use Matlab's parallel toolbox
to parallelize another Matlab toolbox, IRtoolbox (an extension of pdetoolbox).

The motivation for this parallelization is the nature of the problems solved by IRtoolbox, which uses a domain decomposition technique: the interface relaxation method.


domain decomposition

Multi-domain/multi-physics problems consist of many different components that have distinct natures, shapes and capabilities. In order to model these systems, we must use parallel computing strategies that treat the systems' components as independent, because the existing simulation software handles only simple geometrical shapes. The computation of these systems is achieved using one of the several methods that have been proposed. The first and most common computational approach is domain decomposition. Domain decomposition refers to discretizing the geometry using grids and meshes. The decomposition creates small, discrete, inter-connected problems. Each discrete domain has its own equation, but the basic idea is that neighboring components communicate and exchange details and useful information.


interface relaxation method

The domain is decomposed into sub-domains, which may carry different mathematical models. On the interface between two neighboring components we can impose interface conditions derived from the physical phenomena (e.g. continuity of mass and temperature, conservation of momentum).

The models on each sub-domain are solved inside the loop of the interface relaxation iteration to compute the global solution.

Saturday, December 13, 2008

Parallel matrix-vector and matrix-matrix multiplication

Problem:
Implement parallel multiplication of a matrix with a vector and of a matrix with a matrix
using pthreads, OpenMP and MPI, and compare the execution times between them and the serial program.
I will try to run them on a dual-core machine, the PS3 and the cluster.

To simplify the description below, let's consider:
Matrix X Vector = A x V
Matrix X Matrix = A X B

How to make it Parallel:
With pthreads:
The main process is responsible for splitting the data, creating the threads and collecting the results.
Every thread multiplies a row of A with V (or with a column of B) and returns the result to the main process.

OpenMP:
With OpenMP it is easy to write the sequential algorithm and let OpenMP parallelize the for loops.

MPI:
With MPI the approach is similar to pthreads. The processor with id = 0 distributes the data to the other processors and collects the results.
For A x V, a processor receives a row of A together with V and multiplies them.
For A x B, a processor first receives (from processor 0) the number of iterations it will have to run.
After that, inside a for loop, it receives a row of A and a column of B, performs the multiplication and sends the result back to processor 0.
The main problem of the MPI parallelization:
how to split the data and synchronize the processors so as to avoid useless data transfers.

Furthermore, I may implement the A x B multiplication with Strassen's algorithm (probably with the three approaches above) and compare the execution times as well.

Friday, December 5, 2008

Distributed brute force Password cracker


General problem description – skip to “Problem” if you are bored

Many applications (such as forums, portals etc.), as an extra security measure, don't store passwords as plain text in the database; instead, they first encrypt them with some hashing algorithm and store the hash in the database (more).

So someone who gains access to the database won't see the actual passwords, only the hashes.
The most common hashing algorithm used for password hashing is md5. (The method deployed in this project can use any known hashing algorithm.)
So what is stored in the database is usually something like this:

Md5(“password”) = “5F4DCC3B5AA765D61D8327DEB882CF99” (more on md5).

A malicious user that somehow has access to the database, will only see the hashes, but not the actual passwords.

That is good for the application because md5 is not reversible.

So the only way to find out the password that created the hash is to generate md5 hashes for as many candidate passwords as you can and compare each hash with the one from the database.

If the password is a word out of a dictionary, that is very easy to do (very few passwords to try; usually the password is cracked in a few seconds). But when the password is random, things are more complicated (more), especially when it has many characters and the charset is big.

So in many cases the only way to find the password is brute force (more). There are some time-memory tradeoff techniques such as Rainbow Tables. A time-memory tradeoff basically means that you pre-calculate hashes for many passwords, store them in huge files, and during the cracking (recovery) process you just compare the hash with the stored ones. That saves much time (some days -> some minutes).

One drawback is that you need many GBs of storage for the files (which is not much of a problem today, as GBs are getting cheaper every day).

The other drawback is that there are many ways to defeat time-memory tradeoff techniques.

- Use md5(“sometext”+password), md5(md5(password)) or md5(md5(md5(“sometext”+password))), where in every application the sometext or the exact function is different

- Salt

What is clear is that you can’t precalculate something that can be used for all applications, and even if you do, the application developer might just change the function and all the precalculations become useless.

So brute force is the only way to go in many cases.

But on the other hand brute force is very time consuming, and some passwords are nearly impossible to crack

… using just one PC.

There are many md5 password cracking tools written in highly optimized assembly code to crack faster, but this is just not enough.

Another way is to use a modern graphics card GPU, which is much faster than a general-purpose CPU at integer processing and can give you much more processing power (~25x over a normal CPU (more)).

But even that isn’t enough in many cases. There is a need for a mechanism to utilize every machine we have available.

OR

Distributed brute force Password cracker

There are many free tools for md5 password cracking, BUT none of them supports distributed cracking, and none of them is cross-platform.

There are SOME too-expensive tools available (the cheapest I could find was ~$200).

So:

Problem: brute force md5 (or other) password cracking over a network (the internet)

We can use TCP/IP to communicate over the network, and OpenMP to parallelize across the cores of every single machine.
We have a client/server architecture.
The server takes the following parameters:

· Alphabet (charset, lowercase, lowercase+Uppercase, custom, etc)

· The hash

· Optional length

· Some prefix (if the password is generated with something like md5(“somestring”+password))

· Hash rules (md5(md5(hash)) or md5(md5(“somestring”+hash))) or whatever

Then the server must split the task, create a pool of smaller tasks and distribute them to the clients.

Each client, for each task, takes the charset, some prefix, the hash, and the length of the passwords to generate.

Each client must be implemented with OpenMP or some other method so that it runs in parallel on each machine.

A client version can be implemented for every machine available (multicore PCs (Windows, Linux), the PS3, the cluster).

Some difficult parts of the problem:

· Splitting the job

· The jobs must be the right size: too small and we waste time on I/O over the network; too big and we risk jobs being repeated because of malfunctions or errors.

· Cluster implementation

· Implementation of a pause/resume (or error recovery or save state) mechanism

· I have to code everything myself, because god is busy helping George

Tuesday, November 18, 2008

Some random walk model

Problem:
Random walks are initiated on an NxN grid. A walk may leave a trail. Different walks interact with each other on encounter.

Parameters (some of them; there are -too- many options):
-Interaction takes place when two walks encounter each other, or even when a walk encounters the trail of another walk.
-The grid may have boundaries or be torus-like.
-The possible types of interaction between random walks are many (e.g. on encounter, alter -temporarily- the probabilities of a walk to move north, west, south or east).

Parallelization:
For n processors, the grid is partitioned in n areas. Every processor is assigned an area and is responsible for all the random walks which take place in it.

Parallelization's main problem:
The main problem with parallelization is that, because of the different load between processors at a given period in time, a processor (say A) might get ahead in time in comparison to one of its neighbors (say B). Therefore, if a random walk (say wB) crosses from the area of B to the area of A, we have to remember previous states of the area of A in order to handle the interactions -in the past, as far as A is concerned- of wB with the random walks taking place in A.

------------------------------------------------------------------

The way to go (or, some way to go):
-Math: If it doesn't end up seeming too difficult, I may try to find somewhere, or somehow write down, a ~probabilistic analysis of the model (with simple parameters), in order to check the requisites for a speedup when going from a serial to a parallel implementation.
Two books I'm ~checking out are these:
Reversible Markov Chains and Random Walks on Graphs (http://www.stat.berkeley.edu/~aldous/RWG/book.html)
Reversibility and Stochastic Networks (http://www.statslab.cam.ac.uk/~frank/rsn.html)
-Programming: an implementation (to be run on a cluster) using MPI, OpenMP or pthreads (or something else, God only knows).

Thursday, November 6, 2008

New cluster installation

We will start the installation (unpacking and putting the components on the rack) today at 2 in our building at Iasonos 10. You are welcome to participate.

PDE based compression on the CELL PE

Here is an interesting article. It is easy to read too.

Friday, October 31, 2008

Proposed tasks

Please find below a list of suggested tasks. Select yours, propose your own, or just comment or even reject them.

A - OPENMP
  1. Complete a brief review of the available OpenMP implementations. Categorizations and comparisons at any level and in any respect are highly desirable.
  2. Install OpenMP on (a) your own machine, (b) the PS3, (c) the existing cluster, (d) the new cluster. Preliminary (both light and heavy) experiments should be performed.
  3. Provide introductory material for OpenMP that is, in your opinion, superior to the one available through our portal.
B - Clusters
  1. Participate in the installation of the new cluster.
  2. Participate in the transfer of the old cluster to its new location (on the same rack as the new cluster).
C - Numerical Linear Algebra: Dense matrices in the beginning, sparse ones next. PCs, clusters and/or the CELL PE will be used.
  1. Advanced matrix-vector multiplication.
  2. Gauss elimination with pivoting on parallel machines.