Simple version of TimSort sorting algorithm

高洛峰

Oct 31, 2016 am 10:53 AM

1. Principle and implementation of the simple version of TimSort sorting algorithm

TimSort is the default sorting algorithm for object arrays in Java and for lists in Python. At its core TimSort is a merge sort, but with many optimizations layered on top. Real-world data is usually not completely random; it tends to be partially ordered or partially reversed, and TimSort takes full advantage of those already ordered parts when merging. This article presents a simple version of TimSort that makes the following optimizations:

1.1 Utilize the originally ordered fragments

First, define a minimum merge length. Scan the array for fragments that are already ordered. If an ordered fragment is shorter than the minimum merge length, extend it with insertion sort (this avoids merging very short fragments, which is relatively inefficient). Then push the starting index and the length of the ordered fragment onto the stack.
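The step above can be sketched on a plain int array as follows. This is only an illustrative sketch, not the article's generic code: the class name RunSketch and the methods countRun and extendRun are hypothetical names introduced here, and the minimum length of 6 in main is an arbitrary value chosen to keep the example small.

```java
import java.util.Arrays;

public class RunSketch {
    // Length of the ordered fragment starting at a[from].
    // A strictly descending fragment is reversed in place so it becomes ascending.
    public static int countRun(int[] a, int from) {
        if (from >= a.length) return 0;
        int i = from + 1;
        if (i == a.length) return 1;
        if (a[from] <= a[i]) {
            while (i < a.length && a[i - 1] <= a[i]) i++;
        } else {
            while (i < a.length && a[i - 1] > a[i]) i++;
            for (int lo = from, hi = i - 1; lo < hi; lo++, hi--) {
                int t = a[lo]; a[lo] = a[hi]; a[hi] = t;
            }
        }
        return i - from;
    }

    // Extend the sorted fragment a[from, from + runLen) to minRun elements
    // (or to the end of the array) with insertion sort; returns the new length.
    public static int extendRun(int[] a, int from, int runLen, int minRun) {
        int end = Math.min(from + minRun, a.length);
        for (int i = from + runLen; i < end; i++) {
            int tmp = a[i];
            int j = i - 1;
            while (j >= from && a[j] > tmp) { a[j + 1] = a[j]; j--; }
            a[j + 1] = tmp;
        }
        return Math.max(runLen, end - from);
    }

    public static void main(String[] args) {
        int[] a = {5, 4, 3, 9, 1, 7, 2, 8};
        int len = countRun(a, 0);                 // descending 5,4,3 is reversed
        System.out.println(len + " " + Arrays.toString(a));
        len = extendRun(a, 0, len, 6);            // grow the fragment to 6 elements
        System.out.println(len + " " + Arrays.toString(a));
    }
}
```

With this input the first line printed is "3 [3, 4, 5, 9, 1, 7, 2, 8]" and the second is "6 [1, 3, 4, 5, 7, 9, 2, 8]"; the pair (0, 6) is what would be pushed onto the stack.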

1.2 Avoid merging a longer ordered fragment with a smaller ordered fragment, because this is less efficient:

(1) If there are at least three ordered fragments on the stack, let X, Y and Z denote the lengths of the three fragments from the top of the stack downward. A merge is performed when X+Y>=Z.

(1.1) If Z<X, merge Y and Z, and push the merged result onto the stack.

(1.2) Otherwise, pop X and Y off the stack, merge them, and push the merged result onto the stack. Note that the code does not actually pop and push; it updates the stack entries in place, which achieves the same effect more efficiently.

(2) If the condition X+Y>=Z is not met, or there are only two fragments on the stack, let X and Y denote the lengths of the two fragments from the top of the stack downward. If Y<=X, merge X and Y, and push the merged ordered fragment onto the stack.
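The decision rules above can be sketched as a small standalone function. This is a sketch under assumptions: the class name MergePolicySketch and the method chooseMerge are hypothetical, and the stack is modeled as a plain array of fragment lengths with the bottom of the stack at index 0.

```java
public class MergePolicySketch {
    // runLen[0..size) holds the fragment lengths, bottom of the stack first.
    // Returns i, meaning "merge fragment i with fragment i + 1", or -1 for no merge.
    public static int chooseMerge(int[] runLen, int size) {
        if (size < 2) return -1;
        int y = size - 2;                       // Y; X is at y + 1, Z (if any) at y - 1
        if (y > 0 && runLen[y - 1] <= runLen[y] + runLen[y + 1]) {
            // Rule (1): Z <= X + Y. Merge Y with the shorter of its neighbours.
            return runLen[y - 1] < runLen[y + 1] ? y - 1 : y;
        }
        // Rule (2): merge X and Y only when Y <= X.
        return runLen[y] <= runLen[y + 1] ? y : -1;
    }

    public static void main(String[] args) {
        System.out.println(chooseMerge(new int[]{30, 20, 40}, 3));  // 0: Z < X, merge Z and Y
        System.out.println(chooseMerge(new int[]{60, 40, 30}, 3));  // 1: merge Y and X
        System.out.println(chooseMerge(new int[]{100, 40, 30}, 3)); // -1: nothing to merge
        System.out.println(chooseMerge(new int[]{20, 30}, 2));      // 0: Y <= X
    }
}
```

In the sorting loop this check would be repeated after every merge until it returns -1, since one merge can make the next pair eligible.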

1.3 When merging two ordered fragments, the so-called gallop mode is used, which can reduce the length of data involved in the merge

Assume that the two ordered fragments to be merged are X and Y. If the first m elements of X are no larger than the first element of Y, these m elements do not need to take part in the merge, because they are still at their original positions after the merge. In the same way, if the last n elements of Y are no smaller than the last element of X, the last n elements of Y do not need to take part in the merge either. This reduces the length of the merged array (the simple version does not do this), and also reduces the amount of data copied back and forth between the array being sorted and the auxiliary array, which makes the merge more efficient.
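As a small worked example of this trimming, the sketch below computes the range that really has to be merged for two adjacent ordered fragments of an int array. The class name TrimSketch and the methods skipLeft and skipRight are hypothetical names used only for this illustration; like the simple version, they scan linearly.

```java
public class TrimSketch {
    // First index in X = a[base1, base1 + len1) whose element is greater than key,
    // where key is the first element of Y. Everything before it is already in place.
    public static int skipLeft(int[] a, int base1, int len1, int key) {
        int i = base1;
        while (i < base1 + len1 && a[i] <= key) i++;
        return i;
    }

    // Last index in Y = a[base2, base2 + len2) whose element is less than key,
    // where key is the last element of X. Everything after it is already in place.
    public static int skipRight(int[] a, int base2, int len2, int key) {
        int i = base2 + len2 - 1;
        while (i >= base2 && a[i] >= key) i--;
        return i;
    }

    public static void main(String[] args) {
        // X = a[0..4] = 1 2 3 8 9, Y = a[5..8] = 4 5 10 11
        int[] a = {1, 2, 3, 8, 9, 4, 5, 10, 11};
        int from = skipLeft(a, 0, 5, a[5]);   // 1, 2, 3 stay where they are
        int to = skipRight(a, 5, 4, a[4]);    // 10, 11 stay where they are
        System.out.println(from + ".." + to); // 3..6
    }
}
```

Only a[3..6], that is 8 9 4 5, has to be copied to the auxiliary array and merged; the other five elements are untouched.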

2. Java source code

package datastruct;
 
import java.lang.reflect.Array;
import java.util.Arrays;
import java.util.Random;
 
public class SimpleTimSort<T extends Comparable<? super T>>{
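    // Minimum merge length
    private static final int MIN_MERGE = 16;
    // Array to be sorted
    private final T[] a;
    // Auxiliary array
    private T[] aux;
    // The stack is represented by two parallel arrays
    private int[] runsBase = new int[40];
    private int[] runsLen = new int[40];
    // Stack top pointer (index of the next free slot)
    private int stackTop = 0;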
     
    @SuppressWarnings("unchecked")
    public SimpleTimSort(T[] a){
        this.a = a;
        // Use the array's component type so that an empty array does not fail on a[0]
        aux = (T[]) Array.newInstance(a.getClass().getComponentType(), a.length);
    }
     
    // a[from..to] is already sorted; insert the n elements after a[to] into the sorted range
    private void insertSort(T[] a, int from, int to, int n){
        int i = to + 1;
        while(n > 0){
            T tmp = a[i];
            int j;
            for(j = i-1; j >= from && tmp.compareTo(a[j]) < 0; j--){
                a[j+1] = a[j];
            }
            a[++j] = tmp;
            i++;
            n--;
        }
    }
     
    // Returns the length of the longest ordered fragment starting at a[from]
    private int maxAscendingLen(T[] a, int from){
        int n = 1;
        int i = from;
         
        if(i >= a.length){// out of range
            return 0;
        }
         
        if(i == a.length-1){// only one element
            return 1;
        }
         
        // at least two elements
        if(a[i].compareTo(a[i+1]) < 0){// ascending fragment
            while(i+1 <= a.length-1 && a[i].compareTo(a[i+1]) <= 0){
                i++;
                n++;
            }
            return n;
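        }else{// descending fragment; it must be strictly descending (no >=), otherwise stability is not guaranteed
            while(i+1 <= a.length-1 && a[i].compareTo(a[i+1]) > 0){
                i++;
                n++;
            }
            // reverse the descending fragment in place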
            int j = from;
            while(j < i){
                T tmp = a[i];
                a[i] = a[j];
                a[j] = tmp;
                j++;
                i--;
            }
            return n;
        }
    }
     
    // Push the starting index and length of an ordered fragment onto the stack
    private void pushRun(int base, int len){
        runsBase[stackTop] = base;
        runsLen[stackTop] = len;
        stackTop++;
    }
     
    // Returns the stack index of the lower of the two runs to merge,
    // or -1 if the runs on the stack do not need to be merged
    public int needMerge(){
        if(stackTop > 1){// at least two runs
            int x = stackTop - 2;
            // x > 0 means there are at least three runs
            if(x > 0 && runsLen[x-1] <= runsLen[x] + runsLen[x+1]){
                if(runsLen[x-1] < runsLen[x+1]){
                    // runsLen[x+1] is larger than runsLen[x-1],
                    // so merge run[x-1] and run[x] first
                    return --x;
                }else{
                    return x;
                }
            }else
            if(runsLen[x] <= runsLen[x+1]){
                return x;
            }else{
                return -1;
            }
        }
        return -1;
    }
     
    // Returns the position in the first fragment where the first element of the second fragment belongs
    private int gallopLeft(T[] a, int base, int len, T key){
        int i = base;
        while(i <= base + len - 1){
            if(key.compareTo(a[i]) >= 0){
                i++;
            }else{
                break;
            }
        }
        return i;
    }
     
    // Returns the position in the second fragment where the last element of the first fragment belongs
    private int gallopRight(T[] a, int base, int len, T key){
        int i = base + len -1;
        while(i >= base){
            if(key.compareTo(a[i]) <= 0){
                i--;
            }else{
                break;
            }
        }
        return i;
    }
     
    public void mergeAt(int x){
        int base1 = runsBase[x];
        int len1 = runsLen[x];
         
        int base2 = runsBase[x+1];
        int len2 = runsLen[x+1];
         
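        // Merge run[x] and run[x+1]; the base stays the same, only the length changes
        runsLen[x] = len1 + len2; 
        if(stackTop == x + 3){
            // Move the top entry down one slot; this replaces the pop-then-push after a merge
            runsBase[x+1] = runsBase[x+2];
            runsLen[x+1] = runsLen[x+2];
        }
        stackTop--;
         
        // Gallop mode: shrink the range that has to be merged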
        int from = gallopLeft(a, base1, len1, a[base2]);
        if(from == base1+len1){
            return;
        }
        int to = gallopRight(a, base2, len2, a[base1+len1-1]);
         
        // Merge the remaining (trimmed) parts of the two fragments
        System.arraycopy(a, from, aux, from, to - from + 1);
        int i = from;
        int iend = base1 + len1 - 1;
         
        int j = base2;
        int jend = to;
         
        int k = from;
        int kend = to;
         
        while(k <= kend){
            if(i > iend){
                a[k] = aux[j++];
            }else
            if(j > jend){
                a[k] = aux[i++];
            }else
            if(aux[i].compareTo(aux[j]) <= 0){// the equals sign keeps the sort stable
                a[k] = aux[i++];
            }else{
                a[k] = aux[j++];
            }
            k++;
        }
    }
     
    // Force-merge all runs that remain on the stack
    private void forceMerge(){
        while(stackTop > 1){
            mergeAt(stackTop-2);
        }
    }
     
    // Main method of the sort
    public void timSort(){
        // n is the number of elements not yet processed
        int n = a.length; 
         
        if(n < 2){
            return;
        }
         
        // Fewer than MIN_MERGE elements: sort directly with insertion sort
        if(n < MIN_MERGE){
            insertSort(a, 0, 0, a.length-1);
            return;
        }
         
        int base = 0;
        while(n > 0){
            int len = maxAscendingLen(a, base);
            if(len < MIN_MERGE){
                int abscent = n > MIN_MERGE ?  MIN_MERGE - len : n - len;
                insertSort(a, base, base + len-1, abscent);
                len = len + abscent;
            }
            pushRun(base, len);
            n = n - len;
            base = base + len;
             
            int x;
            while((x  = needMerge()) >= 0 ){
                mergeAt(x);
            }
        }
        forceMerge();
    }
     
    public static void main(String[] args){
         
        // Generate random test cases; the loop only stops when a mismatch is found
        Random rnd = new Random(System.currentTimeMillis());
        boolean flag = true;
        while(flag){
             
            // Start with a fully ordered array
            Integer[] arr1 = new Integer[1000];
            for(int i = 0; i < arr1.length; i++){
                arr1[i] = i;
            }
             
            // Randomly swap some values in the ordered array
            for(int i = 0; i < (int)(0.1*arr1.length); i++){
                int x,y,tmp;
                x = rnd.nextInt(arr1.length);
                y = rnd.nextInt(arr1.length);
                tmp = arr1[x];
                arr1[x] = arr1[y];
                arr1[y] = tmp;
            }
             
            // Reverse some short ranges
            for(int i = 0; i <(int)(0.05*arr1.length); i++){
                int x = rnd.nextInt(arr1.length);
                int y = rnd.nextInt((int)(arr1.length*0.01)+x);
                if(y >= arr1.length){
                    continue;
                }
                while(x < y){
                    int tmp;
                    tmp = arr1[x];
                    arr1[x] = arr1[y];
                    arr1[y] = tmp;
                    x++;
                    y--;
                }
            }
             
            Integer[] arr2 = arr1.clone();
            Integer[] arr3 = arr1.clone();
            Arrays.sort(arr2);
             
            SimpleTimSort<Integer> sts = new SimpleTimSort<Integer>(arr1);
            sts.timSort();
             
            // Compare the SimpleTimSort result with the result of the library sort;
            // if nothing is printed, the results match
            if(!Arrays.deepEquals(arr1, arr2)){
                for(int i = 0; i < arr1.length; i++){
                    if(!arr1[i].equals(arr2[i])){
                        System.out.printf("%d: arr1 %d  arr2 %d\n",i,arr1[i],arr2[i]);
                    }
                }
                System.out.println(Arrays.deepToString(arr3));
                flag = false;
            }
        }
    }
}

3. Issues that should be paid attention to with the TimSort algorithm

The TimSort algorithm only ever merges two adjacent fragments, which is what keeps the algorithm stable.
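To see why merging adjacent fragments keeps equal elements in their original order, here is a minimal sketch. It assumes strings whose first character is the sort key and whose second character only marks the original position; the class name StableMergeSketch and its merge method are hypothetical and simpler than the article's mergeAt.

```java
import java.util.Arrays;

public class StableMergeSketch {
    // Merge the sorted adjacent fragments a[lo, mid) and a[mid, hi),
    // comparing only the first character of each string.
    public static void merge(String[] a, int lo, int mid, int hi) {
        String[] aux = Arrays.copyOfRange(a, lo, hi);
        int i = 0, iEnd = mid - lo;      // left fragment inside aux
        int j = iEnd, jEnd = hi - lo;    // right fragment inside aux
        for (int k = lo; k < hi; k++) {
            if (i >= iEnd) {
                a[k] = aux[j++];
            } else if (j >= jEnd) {
                a[k] = aux[i++];
            } else if (aux[i].charAt(0) <= aux[j].charAt(0)) {
                a[k] = aux[i++];         // on a tie the left element goes first
            } else {
                a[k] = aux[j++];
            }
        }
    }

    public static void main(String[] args) {
        String[] a = {"1a", "3b", "1c", "3d"};
        merge(a, 0, 2, 4);
        System.out.println(Arrays.toString(a)); // [1a, 1c, 3b, 3d]
    }
}
```

Because the left fragment sits immediately before the right one in the array, taking the left element on a tie preserves the original order of equal keys. If two non-adjacent fragments were merged, elements of the fragment in between could be overtaken by equal elements from further right.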

The minimum merge length and the size of the stack are related. For a given array length, a smaller minimum merge length allows more fragments to sit on the stack at the same time, so if the minimum merge length is decreased, the stack size should be increased; otherwise the stack may go out of bounds (the stack in the code is implemented with arrays of length 40).

4. The full version of the TimSort algorithm

The full version of the TimSort algorithm adds many optimizations on top of the simple version described above. For example, when an ordered fragment is shorter than the minimum merge length, a binary search can be used to find the insertion position of each element while extending the fragment. Likewise, in gallop mode, binary search is used to find the position of the first element of the second fragment within the first fragment. The full version can also complete the merge with a smaller auxiliary space. Interested readers can study the TimSort source code in the Java class library.
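The two binary-search ideas mentioned above can be sketched on an int array as follows. This is an illustrative sketch only, not the JDK implementation: the class name BinarySketch and the methods upperBound and binaryInsertionSort are hypothetical, and the full version is more elaborate than a single binary search.

```java
import java.util.Arrays;

public class BinarySketch {
    // Index of the first element in the sorted range a[lo, hi) that is greater than key.
    // Searching for "greater than" (not "greater or equal") keeps equal elements in order.
    public static int upperBound(int[] a, int lo, int hi, int key) {
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (a[mid] <= key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // a[lo, start) is already sorted; insert a[start, hi) one element at a time,
    // finding each insertion position with binary search instead of a linear scan.
    public static void binaryInsertionSort(int[] a, int lo, int start, int hi) {
        for (int i = start; i < hi; i++) {
            int tmp = a[i];
            int pos = upperBound(a, lo, i, tmp);
            System.arraycopy(a, pos, a, pos + 1, i - pos); // shift right by one
            a[pos] = tmp;
        }
    }

    public static void main(String[] args) {
        int[] a = {2, 4, 6, 8, 5, 1, 7};       // a[0..3] is the ordered fragment
        binaryInsertionSort(a, 0, 4, a.length);
        System.out.println(Arrays.toString(a)); // [1, 2, 4, 5, 6, 7, 8]

        // Position of the first element of Y (4) within X = 1 2 3 8 9
        System.out.println(upperBound(new int[]{1, 2, 3, 8, 9}, 0, 5, 4)); // 3
    }
}
```

The same upperBound call could stand in for the linear scan in gallopLeft of the simple version; the number of element moves is unchanged, but the number of comparisons drops from linear to logarithmic in the fragment length.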
