The main objective of this book is to investigate the feasibility of an automatic term-conflation method for the Agew language. People all over the world need to be able to use their own language when working with computers or accessing information on the Internet, and this requires the existence of a stemmer. After analyzing different approaches to stemming, the longest-match approach was adopted and has shown very good performance.
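The longest-match strategy can be sketched as follows; the suffix list here is purely illustrative (the book's Agew suffix inventory is not reproduced), as is the minimum-stem-length guard:

```python
def longest_match_stem(word, suffixes, min_stem_len=2):
    """Strip the longest matching suffix while keeping a minimum
    stem length, so short words are not reduced to nothing."""
    # Try suffixes from longest to shortest so the longest match wins.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[:-len(suffix)]
    return word  # no suffix applies; the word is its own stem

# Hypothetical English-like suffixes, for illustration only.
suffixes = {"ness", "ing", "ed", "s"}
print(longest_match_stem("walking", suffixes))  # -> walk
```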
This book deals with the design and implementation of a stemming algorithm for the Albanian language, and then with using it to classify a corpus of documents. The work is based on research on stemming algorithms for other languages and on the morphology of Albanian. Text mining is a knowledge-intensive technique for interacting with a collection of documents through a set of analysis tools. Data/text mining (data can be text) has become a very useful process for extracting information from stored data; among the fields where it helps most are medicine, banking, finance, marketing, and spam filtering. A stemming algorithm is a procedure that removes suffixes from words to yield the root (stem) of each word. Stemming is needed in search engines to conflate words sharing the same stem, reducing the number of index entries. This book presents a first set of stemming rules for Albanian and, for the first time, a list of Albanian stopwords.
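A minimal sketch of the indexing pipeline the book describes (stopword removal followed by stemming), with an illustrative stopword set and a trivial stand-in stemmer; the actual Albanian rules and stopword list are the book's contribution and are not reproduced here:

```python
def build_index_terms(text, stopwords, stem):
    """Tokenise, drop stopwords, stem, and deduplicate index terms."""
    tokens = text.lower().split()
    return sorted({stem(t) for t in tokens if t not in stopwords})

# Illustrative stopwords and a trivial stand-in stemmer; the book's
# Albanian stopword list and stemming rules are not reproduced here.
stopwords = {"dhe", "the", "and"}
stem = lambda w: w[:-1] if w.endswith("s") else w
print(build_index_terms("the cats and dogs", stopwords, stem))  # -> ['cat', 'dog']
```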
This study describes the design of a stemming algorithm for the Wolaytta language. To give the thesis a solid background, the literature on conflation in general, and on stemming algorithms in particular, was reviewed. The result of the study is a prototype context-sensitive iterative stemmer for Wolaytta. An error-counting technique was employed to evaluate the performance of this stemmer. The stemmer was trained on 3537 words (80% of the sample text), and the improved version shows an accuracy of 90.6% on the training set. The proportions of overstemmed and understemmed words on the training set were 8.6% (304 words) and 0.8% (28 words), respectively. When the stemmer was run on the unseen sample of 884 words (20% of the sample text), it performed with an accuracy of 86.9%; the understemming and overstemming error rates on this test set were 9% and 4.1%, respectively. Moreover, a dictionary reduction of 38.92% was attained on the test set. The major sources of error are also reported, together with recommendations for further improving the stemmer's performance and for further research.
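The error-counting evaluation can be approximated as below; classifying errors by comparing stem lengths is a simplification of the study's manual error analysis, used here only for illustration:

```python
def evaluate_stemmer(predicted, gold):
    """Return (accuracy, overstemming rate, understemming rate).

    Here a produced stem shorter than the gold stem counts as
    overstemmed, and a longer one (suffix material left behind)
    as understemmed -- a simplification for illustration."""
    correct = over = under = 0
    for p, g in zip(predicted, gold):
        if p == g:
            correct += 1
        elif len(p) < len(g):
            over += 1
        else:
            under += 1
    n = len(gold)
    return correct / n, over / n, under / n

acc, over, under = evaluate_stemmer(["walk", "ru", "jumped"],
                                    ["walk", "run", "jump"])
```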
In this dissertation, we describe the Word Stemming/Hashing Algorithm as an approach to help regain a spam-free inbox. Since it is impossible to foresee future spam-creation techniques, it is important to react quickly to their development. The Word Stemming/Hashing Algorithm has been designed to detect the modified suspicious terms of harmful messages: content-based spam filters work well only when the suspicious terms are lexically correct, because spammers rearrange suspicious terms to foil the filter. That is, the terms must be valid and correctly spelled; otherwise, most content-based spam filters will be unable to detect them. In this dissertation, we show that a word stemming or word hashing technique that can extract the base or stem of a misspelled or modified term can significantly improve the mail-classification efficiency of any content-based spam filter. Finally, we present a simple rule-based Word Stemming Algorithm that can fish out and handle modified suspicious terms for effective mail classification.
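One way such a normalization step might look is sketched below; the character map, the rewrite rules, and the suspicious-term list are illustrative assumptions, not the dissertation's actual rule set:

```python
import re

# Illustrative substitution map for common spammer obfuscations;
# the dissertation's actual rule set is not reproduced here.
LEET = str.maketrans("013457@$", "oieastas")

def normalize(term):
    """Collapse a possibly obfuscated term to a canonical stem."""
    t = term.lower().translate(LEET)
    t = re.sub(r"[^a-z]", "", t)     # drop separators such as '.' or '-'
    t = re.sub(r"(.)\1+", r"\1", t)  # collapse repeated letters
    return t

SUSPICIOUS = {"viagra", "lotery"}    # canonical stems (illustrative)

def is_suspicious(term):
    return normalize(term) in SUSPICIOUS

print(is_suspicious("V.1.A.G.R.A"))  # -> True
```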
Steganography is an ancient art. It is used for security in open systems and focuses on hiding secret messages inside a cover medium. The most important property of a cover medium is the amount of data that can be stored inside it without changing its noticeable properties. There are many sophisticated techniques with which to hide, analyze, and recover such hidden information. This research work explores the use of Genetic Algorithm operators on the cover medium. We worked with text as the cover medium, with the aim of increasing the robustness and capacity of the hidden data; elitism is used alongside the fitness function. The model presented here is applied to text files, though the idea can also be used on other file types. Our results show that this approach satisfies both security and hiding-capacity requirements. Furthermore, we found that an increase in the size of the secret message results in an exponential increase in the size of the generated cover text, and that there is a close relationship between the chromosome size used and the population size.
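A minimal genetic algorithm with elitism can be sketched as follows; the toy fitness problem here merely stands in for the book's cover-text chromosome encoding, which is not reproduced:

```python
import random

def evolve(fitness, random_individual, mutate, crossover,
           pop_size=20, elite=2, generations=50):
    """Minimal genetic algorithm with elitism: the best `elite`
    individuals pass into the next generation unchanged."""
    pop = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        nxt = pop[:elite]                          # elitism
        while len(nxt) < pop_size:
            # parents drawn from the fitter half of the population
            a, b = random.sample(pop[:pop_size // 2], 2)
            nxt.append(mutate(crossover(a, b)))
        pop = nxt
    return max(pop, key=fitness)

# Toy stand-in problem: maximise the ones in a bitstring.
random.seed(1)  # reproducibility for the sketch
N = 16
best = evolve(
    fitness=sum,
    random_individual=lambda: [random.randint(0, 1) for _ in range(N)],
    mutate=lambda x: [b ^ (random.random() < 0.05) for b in x],
    crossover=lambda a, b: a[:N // 2] + b[N // 2:],
)
```

Because the elite individuals survive unchanged, the best fitness in the population never decreases from one generation to the next.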
Scheduling is a central concept in operating systems: it governs how processes are chosen for execution. Many scheduling algorithms are available for a multi-programmed operating system, such as FCFS, SJF, Priority, and Round Robin. We focus on the Round Robin scheduling algorithm and propose a new algorithm, titled “A NOVEL APPROACH FOR SCHEDULING”, which combines Round Robin scheduling with a dynamic time quantum. It yields better results in terms of waiting time, turnaround time, and number of context switches than Round Robin with a static time quantum, the Average Mid Max Round Robin scheduling algorithm, and the Min-Max Round Robin scheduling algorithm.
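One plausible form of Round Robin with a dynamic time quantum is sketched below, using the mean of the remaining burst times as each round's quantum; the book's exact quantum rule may differ:

```python
def rr_dynamic_quantum(bursts):
    """Round Robin with a dynamic time quantum: each round's quantum
    is the mean of the remaining burst times (one plausible rule;
    the book's exact rule may differ). All processes arrive at t=0.
    Returns per-process waiting times and the context-switch count."""
    remaining = bursts[:]
    waiting = [0] * len(bursts)
    dispatches = 0
    while any(remaining):
        active = [r for r in remaining if r > 0]
        quantum = max(1, round(sum(active) / len(active)))
        for i, r in enumerate(remaining):
            if r == 0:
                continue
            run = min(r, quantum)
            for j, rj in enumerate(remaining):
                if j != i and rj > 0:
                    waiting[j] += run  # others wait while i runs
            remaining[i] -= run
            dispatches += 1
    return waiting, dispatches - 1  # first dispatch is not a switch

waiting, switches = rr_dynamic_quantum([5, 15, 4])
print(waiting, switches)  # -> [0, 9, 13] 3
```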
Automatic Algorithm Recognition and Replacement – A New Approach to Program Optimization
In this book, a fuzzy-based algorithm for predicting the next CPU burst is proposed. The algorithm uses an intelligent fuzzy system to estimate the execution time of a process based on its past behavior; the heart of the system is a database of if-then rules. A comparative analysis of the Exponential Average algorithm and the fuzzy-based algorithm shows that the fuzzy-based approach performs better, predicting values closer to the real CPU burst than the Exponential Average algorithm does.
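The Exponential Average baseline the comparison refers to is the classic predictor tau_{n+1} = alpha * t_n + (1 - alpha) * tau_n:

```python
def exponential_average(history, alpha=0.5, tau0=10.0):
    """Predict the next CPU burst from past bursts using
    tau_{n+1} = alpha * t_n + (1 - alpha) * tau_n."""
    tau = tau0  # initial guess before any bursts are observed
    for t in history:
        tau = alpha * t + (1 - alpha) * tau
    return tau

print(exponential_average([6, 4, 6, 4]))  # -> 5.0
```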
We present various parallel and bit-parallel text algorithms: a parallel solution of the arithmetic coding compression algorithm, a parallel computation of the border array, and a new approach to pattern matching problems. This approach uses non-deterministic finite automata for pattern matching, together with their bit-parallel simulation. We also present a new solution to the weighted degenerate pattern matching problem, which places new conditions on the searched pattern, and a pattern matching algorithm that matches any subpattern of a specified length; this algorithm likewise uses the bit-vector extension. Moreover, we present a new bit-parallel simulation of the determinisation of pattern matching automata and suffix automata. Our determinisation is faster than the standard subset-construction determinisation algorithm.
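The bit-parallel simulation of a pattern-matching NFA can be illustrated with the classic Shift-And algorithm, which keeps the set of active NFA states in a single machine word:

```python
def shift_and(text, pattern):
    """Bit-parallel Shift-And: the set of active NFA states is kept
    in one machine word and updated with shifts and ANDs."""
    # masks[c] has bit i set iff pattern[i] == c
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << i)
    accept = 1 << (len(pattern) - 1)
    state = 0
    for pos, c in enumerate(text):
        state = ((state << 1) | 1) & masks.get(c, 0)
        if state & accept:
            yield pos - len(pattern) + 1  # start index of a match

print(list(shift_and("abracadabra", "abra")))  # -> [0, 7]
```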
Wormhole routing is a popular switching technique used in current-generation parallel systems. Because of its pipelined operation, wormhole routing is susceptible to deadlock in the network. In this book, two deadlock-free wormhole routing algorithms are given for torus networks. First, a simple and efficient minimal adaptive wormhole routing algorithm is presented that recovers from potential deadlocks by removing from the network any packet that has stalled in a router for longer than a given timeout. Second, a fault-tolerant, non-adaptive routing algorithm is proposed for wormhole-routed torus networks. It can tolerate any number of non-overlapping rectangular faulty blocks with simple logic and requires only three virtual channels; the algorithm is proved to provide deadlock- and livelock-free routing with non-overlapping f-rings. The use of such a limited number of virtual channels significantly reduces the implementation cost of the algorithm in router hardware.
In every language there are words that can have different meanings in different contexts. Word Sense Disambiguation is the process of automatically determining the accurate meaning of such a word when it is used in a sentence. One of the many approaches proposed in the past is Michael Lesk's 1986 algorithm, which rests on two assumptions: first, words in a given section of text tend to share a common topic; and second, if one sense from each of two words shares that topic, then their dictionary definitions must use some common words. It is therefore possible to disambiguate neighbouring words in a sentence by comparing their definitions and picking the senses whose definitions have the greatest number of words in common. The biggest drawback of this algorithm is that dictionary definitions are often very short and do not contain enough words for the algorithm to work well. This book illustrates an approach to dealing with this problem by modifying the algorithm. An attempt to explore the vast field of Word Sense Disambiguation has also been made.
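A simplified version of the Lesk overlap computation, with a tiny illustrative sense inventory rather than a real dictionary:

```python
def lesk(word, context, definitions):
    """Pick the sense whose definition shares the most words with the
    sentence context (simplified Lesk overlap)."""
    ctx = set(context.lower().split())
    def overlap(sense):
        return len(set(definitions[word][sense].lower().split()) & ctx)
    return max(definitions[word], key=overlap)

# Tiny illustrative sense inventory, not a real dictionary.
defs = {"bank": {
    "finance": "an institution for money deposits and loans",
    "river": "the sloping land beside a body of water",
}}
print(lesk("bank", "he sat on the land beside the water", defs))  # -> river
```

The weakness noted above shows up directly here: with definitions this short, a context sharing even one or two words can decide the outcome.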
Revision with unchanged content. As urban areas continue to grow in population and as congestion continues to worsen, future demands for transportation can only be met by an efficiently managed mass transportation system. Minimizing transfer time through schedule coordination is one means of reducing the total travel time for passengers who need to transfer from one route to another to complete their trips. However, providing coordinated transit services for an entire transit system, in which transfer times are minimized, is not easily achieved. This book presents a system-wide approach, based on genetic algorithms, for the optimization of transfer times for an entire bus transit system. Optimization of transfer times in a transit system is a complex problem because of the large set of binary and discrete variables involved; the combinatorial nature of the problem imposes a computational burden and makes it difficult to solve with classical mathematical programming methods. As this book provides good coverage of the issues involved in transit transfers and transfer optimization, investigators conducting research in public transportation may find it useful. In addition, software developers can benefit from the code provided in the appendixes, and master's and Ph.D. students may find the book a useful resource for a thesis or dissertation.
Graph theory is a well-explored but still expanding area of mathematics and computer science, and efficient algorithms for graph-theoretic problems are of immense practical importance. Numerous problems in graph theory are NP-complete, i.e., no polynomial-time algorithms are known for them. Genetic algorithms (GAs) are heuristic search and optimization techniques whose search methods mimic natural phenomena: genetic inheritance and survival of the fittest. In this work, GAs are applied successfully to some well-known NP-complete graph-theoretic problems. Some other problems in fuzzy environments are also considered.
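A typical GA encoding for an NP-complete graph problem such as minimum vertex cover uses one bit per vertex and a penalty-based fitness; a minimal sketch of such a fitness function (the encodings used in the book may differ):

```python
def cover_fitness(bits, edges):
    """GA fitness for minimum vertex cover: one bit per vertex.
    Uncovered edges are penalised heavily so that feasible covers
    always dominate; among feasible covers, smaller ones score higher."""
    uncovered = sum(1 for u, v in edges if not (bits[u] or bits[v]))
    return -((len(bits) + 1) * uncovered) - sum(bits)

# Triangle 0-1-2 plus pendant vertex 3 attached to 1; {0, 1} is optimal.
edges = [(0, 1), (1, 2), (0, 2), (1, 3)]
print(cover_fitness([1, 1, 0, 0], edges))  # -> -2 (feasible cover of size 2)
print(cover_fitness([0, 0, 0, 0], edges))  # -> -20 (all four edges uncovered)
```

The penalty weight n+1 exceeds the cost of selecting every vertex, so any infeasible chromosome scores strictly worse than any feasible cover.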
TORDES: a symmetric key algorithm. TORDES is a block cipher with a unique, independent approach that uses several computational steps, together with a string of operators and randomized delimiter selections, based on suitable mathematical logic. It is specially designed to produce different cipher texts when the same key is applied to the same plain text, and it is one of the best-performing partial symmetric key algorithms, particularly for text messages of limited size. It also protects the cipher text from attacks such as brute force, because it is fully dependent on the key and the code cannot be deciphered by trying all possible key combinations. TORDES invariably uses the following information in its encryption technique: key values; a code-sequence string generated from a particular process; a transformation of that string; and a mirror image of the string. Advantages of TORDES: the algorithm is simple in nature; the additional operations it performs make it more secure; it works very smoothly for small amounts of data; and it can be implemented in any language (C, C++, Java) and on any platform (.NET, PHP).
The book is mainly dedicated to students, researchers, and developers working, or interested, in the field of image processing, with a specialization in handwritten text recognition using a Java IDE. Handwritten text recognition is a vast field with great scope for researchers, and the book includes chapters presenting various approaches to it. A new approach to handwritten text recognition is designed, with better results. The book includes various key features.