
Talk:Red–black tree

From Wikipedia, the free encyclopedia

Possible insert case 5 error


Insertion, case 5 says:

"In this case, a right rotation on the parent P is performed; the result is a tree where the former parent P is now the parent of both the new node N and the former grandparent G."

It should say

"...a right rotation on the GRANDparent G is performed;..."

Most implementations define the rotation functions RotateRight(Node* n) or RotateLeft(Node* n) such that n is the root of the subtree being rotated, rather than one of the children. The code to implement the line in question would then be:

RotateRight(g);

rather than

RotateRight(p);

This could be just a terminology issue, but it will confuse people when they read source code and see irreconcilable differences with what is listed here.

Insertion/Deletion average complexity


The run time complexity of insertion and deletion from any binary search tree will be O(log n) because you must first either find where the element should be inserted or find the element to delete. The mention of an insertion after a supposed "NEXT" operation does not represent the average case of the insert or delete operations. Furthermore, there is no standard NEXT operation for binary search trees. If you have a reference that suggests a NEXT operation is indeed part of the standard operations supported by general binary search trees, please add it. Jmonty42 (talk) 21:17, 29 September 2015 (UTC)[reply]

@Jmonty42 Maybe INSERT or DELETE without SEARCH is not your "average case". Nevertheless, they have an average complexity excluding SEARCH, and this
  1. is of interest for some applications (whether or not the RB tree is used as a search tree).
  2. If SEARCH were always included, there would be no need to mention the average case at all, because it would always be logarithmic.
  3. Some authors strongly emphasise that their amortized complexity is constant. Then there has to be some means other than SEARCH for accessing a node.
--Nomen4Omen (talk) 21:35, 29 September 2015 (UTC)[reply]
I fixed the article to report O(log n) costs for all amortized operations. 24.206.111.9 (talk) 14:18, 29 May 2024 (UTC)[reply]

@Nomen4Omen, please cite your references for authors claiming that the amortized complexity of insert into a binary search tree is constant. A red-black tree is by definition a binary search tree. The rules for coloring nodes red/black and for inserting, deleting, and rebalancing afterwards would be completely meaningless if each node did not conform to the rules of a node in a binary search tree (greater than every key in its left subtree, less than every key in its right subtree).

There are data structures that have constant insert and delete operations, such as a hash map. That is an entirely different class of data structure intended for a completely different purpose. Every single binary search tree will have average complexities of O(log n) for insert and delete operations. Jmonty42 (talk) 21:43, 29 September 2015 (UTC)[reply]

Each of these references lists the run-time complexity of insert and delete as O(log n): http://pages.cs.wisc.edu/~paton/readings/Red-Black-Trees/ https://www.topcoder.com/community/data-science/data-science-tutorials/an-introduction-to-binary-search-and-red-black-trees/ https://www.cs.princeton.edu/~rs/talks/LLRB/RedBlack.pdf

Please stop spreading misinformation. Jmonty42 (talk) 21:51, 29 September 2015 (UTC)[reply]

I'm not spreading misinformation. The other sources implicitly include SEARCH, and I explicitly exclude it by use of a footnote.
  1. Amortized modification costs are proved in Kurt Mehlhorn, Peter Sanders: Algorithms and Data Structures. The Basic Toolbox. Springer, Berlin/Heidelberg 2008, ISBN 978-3-540-77977-3, doi:10.1007/978-3-540-77978-0 p. 165.
  2. Although an RBtree is called a binary search tree, it also is a binary tree and has all these properties in addition to a SEARCH operation.
--Nomen4Omen (talk) 21:58, 29 September 2015 (UTC)[reply]

@Nomen4Omen, I looked up your reference in Mehlhorn and Sanders and you are misunderstanding the text. From the text on page 165, in section 7.7 "Historical Notes and Further Findings": "On the other hand, AVL trees do not have constant amortized update costs. ... In comparison [to AVL trees], red-black trees have slightly higher costs for locate, but they have faster updates ..." It makes no claim that an insert operation has constant amortized cost. You may have been confused by section 7.4, titled "Amortized Analysis of Update Operations". The update operations refer to the rebalancing and updating of binary search trees after the initial insert or delete is done. This is only part of the operation that we are discussing when we refer to the insert operation of red-black trees. The insert operation (including searching for the place to insert, actually inserting the new node as a leaf in the correct spot, and then balancing the tree, or "updating" to use the terms of your reference) is O(log n), when compared to a similar operation on, say, a linked list (where the insert operation is O(n)) or a hash map (where the insert operation, which includes computing a hash function, is O(1)).

Again, to quote from your reference material, please turn to page 151. This section is talking about (a,b) trees and red-black trees. The paragraph directly under Exercise 7.5 clearly states "To insert an element e, we first descend the tree recursively to find ...". When discussing the complexity of the insert operation on data structures, you must include the entire operation. On page 149, in the introduction of binary search trees, it also states "Thus any insertion requires O(log n) comparisons on average." Again, that will be true for any binary search tree, of which red-black trees are a type. Jmonty42 (talk) 22:50, 29 September 2015 (UTC)[reply]

This link might help to clear things up when discussing the complexity of these common operations: http://bigocheatsheet.com/. Jmonty42 (talk) 23:22, 29 September 2015 (UTC)[reply]

@Jmonty42 The latter source is not consistent either: it excludes search for stacks and lists but does not always INclude it for the other data structures. Anyway, giving the time complexity EXcluding search (and telling the reader so) is more informative, because adding the logarithmic SEARCH cost to a constant update cost is an easy task. (Why should emphasizing amortized constant update costs be so important for Mehlhorn/Sanders if UPDATE did not exist without SEARCH?) --Nomen4Omen (talk) 02:37, 30 September 2015 (UTC)[reply]

@Nomen4Omen You're missing the fundamental concept here. Inserting into a red-black tree depends on search. You cannot insert an element into a red-black tree without first searching for where it goes; it is meaningless to analyze the insert operation without that necessary step. You are correct that the linked resource does not include searching as part of insertion for other data structures, because other data structures don't require searching for insertion. Take a stack, for example. Placing an element onto a stack is always an O(1) operation because it just goes onto the top of the stack (think of stacking plates). Stacks do not insert elements in sorted order; a stack is a last-in, first-out (LIFO) data structure by definition. A red-black tree inserts elements in sorted order by definition; you cannot insert an element into a red-black tree without log n comparisons on average. An insert operation on an array doesn't require searching first, either (assuming you're inserting by index). But inserting into an array requires all of the elements after the insertion point to be moved down, resulting in an average O(n) operation, the same complexity as a linear search of an array, O(n). The two operations having the same complexity does not mean that one depends on the other.

Your changes only affected the summary section of the article, which is supposed to give a way to quickly compare this particular data structure to other data structures with similar summary sections. If you wish to provide more information about the amortized update operation from Melhorn/Sanders, include it in the relevant section in the article. Jmonty42 (talk) 05:06, 30 September 2015 (UTC)[reply]

@Jmonty42 My very last attempt!
  1. The main (possibly only) difference is that SEARCH is absolutely mandatory for you − and not for me. I say you cannot know (and should not preclude) how the keys in the binary search tree are organized and how the user accesses a specific element. An example from real life: an application (e.g. in the context of the Windows registry) may have keys which have a major part (e.g. the registry "key") and a subordinate part (e.g. the registry "value"). The comparison subroutine of the binary search tree can compare every leading subset (substring starting at offset 0) of the hierarchical key := ("key","value"). Parts of the application access the data by SEARCHing for ("key","value"). But there may also be other parts of the application which SEARCH only for the registry "key" and access the subordinate registry "value"s and associated "data" by means of (in-order) sequential NEXT operations (using only the doubly linked property of the binary tree), thereby possibly INSERTing and/or DELETEing. My complexity assessment takes care of this latter kind of access, too. Because a reader can be assumed to be able to add: constant (avg INSERT without SEARCH) + logarithmic (avg SEARCH) = logarithmic (avg INSERT with SEARCH), my complexity assessment covers both cases, but yours (in my opinion unnecessarily) withholds some information from the reader.
  2. IMHO my changes do not at all affect the summary section of the article, because only the worst-case ("guarantee"d) complexities are given there − and no average complexities. (This is quite common for summary sections.)
--Nomen4Omen (talk) 10:32, 30 September 2015 (UTC)[reply]

@Nomen4Omen, no, search is not only mandatory for me; search is mandatory for all inserts into any binary search tree. Every single reference I have cited, including yours, explicitly states as much. In your example of inserting while traversing the tree in order, you've increased the complexity of the insert operation to O(n). Insert cannot be considered independent of search for binary search trees. Even if you disregard how the operation got to the point where the element is to be inserted, in the case of a red-black or AVL tree you'll still need to rebalance the tree, which on average is O(log n).

Also, regarding your second point, the only change of yours I reverted on this page was the change to the info box (which I mistakenly called the summary). You had incorrectly (according to every reference we have discussed here) edited the average complexity for insert and delete operations to O(1). That is the change I am referring to. It is factual and supported by references that these operations have a complexity of O(log n).

Your example of the Windows registry containing key-value pairs has no relevance to the discussion of the complexity of the insert operation. The elements in a red-black tree can be of any homogeneous data type, so long as that data type supports comparison. Jmonty42 (talk) 16:22, 30 September 2015 (UTC)[reply]

To offer a third perspective: I agree with Jmonty42 that listing the amortized complexity of Insert and Delete operations as O(1) is potentially confusing and unintuitive (at least, it confused me, even with the footnote). I would second Jmonty42's suggestion to remove these from the table, leaving only the O(log n) costs (as treated in standard textbooks like "Introduction to Algorithms, 4th Edition" by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein). Einabla (talk) 16:42, 8 November 2022 (UTC)[reply]

Why is Sedgewick not listed as the inventor?


It looks like the infobox lists the inventor of the antecedent to red-black trees rather than the inventor of red-black trees. — Preceding unsigned comment added by 45.49.18.32 (talkcontribs) 09:47, 23 January 2016

Changes as of 2 Mar 2021


Mainly the sections #Terminology, #Properties, #Operations, #Proof of bounds have been revised.

  • section #Properties
    • without rule 3: "root is black" (because this disturbs recursions)
    • the former rules 4 and 5 thereby become 3 and 4
    • NIL leaves are never individuals, never "null leaves as actual node objects".
  • section #Operations
    • Both insertion and deletion are now programmed iteratively. Advantages:
      • more precise visibility of the logic and of the conditions of the cases
      • no need to observe tail recursion
      • rotations involving the root are easier
      • if-statements with goto separate the iteration from the solution, which improves the structure
      • (performance)
    • The insertion and deletion cases are slightly renumbered: case 1 = iteration, case 2 = loop break, insert case 3 = simple exit from the loop.

Rotations commute with color changes, but are consistently placed after them.

  • diagrams:
    • 2 to 4 phases, shown from top to bottom
    • 2 or 3 phases if the case completes
    • a 3rd or 4th phase if the current node is reassigned for further processing
  • section #Insertion:
    • new simple insert case 6
  • section #Removal split into 2 sections:
    • the simple cases (which never loop)
    • the case of a black leaf without a child, which is more complex and possibly loops
  • Finally, section #Proof of bounds
    • The section "Proof of asymptotic bounds" has been rewritten, mainly because the old version was introducing slightly deviating definitions of height and black height.
      The new version uses the definitions of #Properties.
    • It additionally gives an exact formula of the number of nodes of minimal RB trees.
  • stylistic:
    • fewer "Note that ..."
    • fewer "we"
    • fewer "in this case"
    • no "this test is trivial due to ..."

i.e. versus i. .e.


This page uses i. .e., i .e. and e. g.. The first two, i. .e. and i .e., seem inconsistent with each other. Checking the style guide's acronym section, I found that the first two should be i.e. and the last should be e.g..

Since these currently seem to be inconsistent, I'm going to make a preliminary edit to change them to what is shown in the manual of style's table. (This post is mostly to say my reasoning in case I'm wrong) KindlingFlame (talk) 06:51, 6 March 2024 (UTC)[reply]

History section confusion


"Sedgewick originally allowed nodes whose two children are red, making his trees more like 2–3–4 trees, but later this restriction was added"

What does "this restriction" refer to? Gwideman (talk) 11:44, 30 March 2024 (UTC)[reply]

The 'restriction' added later is that 'a black node may not have two red children'. This means that only 2-nodes and 3-nodes are allowed, making them isomorphic to 2–3 trees instead. Left-leaning RB trees have a 2–3 version and a 2–3–4 version. Ordinary RB trees also have a 2–3 version and a 2–3–4 version. IntGrah (talk) 19:48, 26 April 2024 (UTC)[reply]

"B-trees of order 4" versus "2–3–4 tree"


There is a section titled "B-trees of order 4". Would it not be clearer to just say "2–3–4 tree" instead, given that most textbooks introduce 2–3–4 trees first? I.e.

Red–black trees are similar in structure to 2–3–4 trees (B-trees of order 4), ...

and proceed in the rest of the section to refer to them as 2–3–4 trees. IntGrah (talk) 19:53, 26 April 2024 (UTC)[reply]

C not C++


The "Operations" section says that the examples are C++.

Are they? It looks like plain C to me. I didn't want to change it, in case there is some C++ usage in there which I didn't spot.

It's certainly 90%+ C, and a proper C++ implementation would look very different.

Good to change to "C" Oschonrock (talk) 16:33, 17 July 2024 (UTC)[reply]

struct RBnode {
  RBnode* parent;        // C would require 'struct RBnode* parent',
  RBnode* child[2];      // but C++ allows this kind of declaration
  enum color_t color;
  int key;
};

Technically no, but it could be changed easily. The style isn't consistent in the article. I suggest changing it all to the C++ style, or removing it entirely. (edited 10:40, 21 July 2024 (UTC)) IntGrah (talk) 17:07, 17 July 2024 (UTC)[reply]

Re-writing the code


The code in this article is not very well-written. In my opinion it should be pseudocode, since that's what most other CS articles seem to do. If not it should still be cleaned up a bit. Some glaring issues:

  • Although the code is technically C++, it doesn't really use any C++ features; the style is much closer to C
  • RBNode's children being an array of length 2 with a bunch of associated macros is needlessly confusing. The children should be `left` and `right`. I understand the purpose of the array is to combine the left/right rotation functions into a single function, but why not just have two separate functions if it makes things clearer?
  • The type `byte` is used arbitrarily in spots where it doesn't make sense
  • The `goto` statements are confusing; it would be easier to bunch all of the code together and then have the sections afterwards explain each part, or at least to use something that doesn't rely on `goto`.

The pseudocode in the Introduction To Algorithms textbook (linked below, start at page 331) is very concise but I'm unsure of how to carry it over without it being considered plagiarism.


[1]https://dl.ebooksworld.ir/books/Introduction.to.Algorithms.4th.Leiserson.Stein.Rivest.Cormen.MIT.Press.9780262046305.EBooksWorld.ir.pdf#page=353 Begilbert238 (talk) 23:56, 20 July 2024 (UTC)[reply]

I agree that the code is quite repugnant: GOTO considered harmful, inconsiderate use of macros, even the fact that it is written in C.
My proposal is to just remove it outright. In my opinion, the pseudocode is useless in the current state of the article; most red–black tree implementations take several hundred lines of code, and spreading this across the article makes the operations seem a lot more complicated than they really are. The cryptic tables used in the explanation also have to go (the section Notes to the sample code and diagrams of insertion and removal > Notes to the insert diagrams).
IntGrah (talk) 10:36, 21 July 2024 (UTC)[reply]