Statistics 135: Matrix Theory for Statistics
1 INTRODUCTION
Algebra is mathematical shorthand for language, and matrices are shorthand for algebra. A special value of matrices is that they enable many mathematical operations, especially those arising in statistics and the quantitative sciences, to be expressed concisely and with clarity.
Scientists are being confronted more and more with large amounts of numerical data. But the mere collecting and recording of data achieves nothing; having been collected, data must be analyzed and interpreted. One of the most useful branches of mathematics for describing such analysis and interpretation is matrix algebra. It is useful not only for simplifying the description and promoting the development of many analysis methods, but also for organizing the computer techniques that execute those methods and present the results.
In this chapter we define the matrix and some basic concepts related to it. In Section 1.2, vectors and scalars are introduced. A review of summation and dot notation follows in Section 1.3. An example of the useful applications of matrices in statistics, particularly in regression analysis, is given in the last section of the chapter.
1.1 DEFINITION OF TERMS
Suppose there are 4 students with the following exam scores for the 3 exams in Stat 135:
            Exam 1   Exam 2   Exam 3
Student 1     67       84       78
Student 2     98       76       81
Student 3     48       59       60
Student 4     77       82       53
Consider the array of numbers in the table, extracted and written simply as

⎡ 67 84 78 ⎤
⎢ 98 76 81 ⎥
⎢ 48 59 60 ⎥
⎣ 77 82 53 ⎦
where each value has a particular meaning; for example, the entry in the 4th row and 2nd column, 82, represents the score of student 4 in the second exam. A row represents a particular student and a column represents a particular exam that the student had taken. Such an array of numbers is called a matrix.
EXERCISES:
1. Is

⎡ 67 98 48 77 ⎤
⎢ 84 76 59 82 ⎥
⎣ 78 81 60 53 ⎦

a matrix?
2. Suppose you obtain data by measuring heights (in inches) and weights (in pounds) of 19 school children:
Student        Height   Weight
Sarah           69.0    112.5
Mark            56.5     84.0
Fantasia        65.3     98.0
Diana           62.8    102.5
Jasmine         63.5    102.5
Latoya          57.3     83.0
Eric            59.8     84.5
Sheryn          62.5    112.5
Christian       62.5     84.0
Carrie          59.0     99.5
Bo              51.3     50.5
Vonzell         64.3     90.0
Anthony         56.3     77.0
Scott           66.5    112.0
Rachel Anne     72.0    150.0
Frenchie        64.8    128.0
Taylor          67.0    133.0
Katharine       57.5     85.0
Elliot          66.5    112.0
Can you represent the data set in matrix form? If yes, how?
Definition 1.1.1 A matrix is a rectangular or square array of numbers arranged in
rows and columns.
Definition 1.1.2 The individual entries in the array are called the elements or terms of the matrix, and they can be numbers of any sort (real or complex, rational or irrational). For our purposes we will study elements that are real numbers: positive, negative, or zero.
REMARKS:
1. The rows of a matrix are of equal length, as are the columns.
2. The notation for a matrix is an underlined capital letter. A matrix A is denoted by
A =

⎡ a11  a12  …  a1j  …  a1c ⎤
⎢ a21  a22  …  a2j  …  a2c ⎥
⎢  ⋮     ⋮        ⋮        ⋮  ⎥
⎢ ai1  ai2  …  aij  …  aic ⎥
⎢  ⋮     ⋮        ⋮        ⋮  ⎥
⎣ ar1  ar2  …  arj  …  arc ⎦

or A = {aij} , i = 1, 2, … , r and j = 1, 2, … , c

where aij – the element in the ith row and jth column
r – no. of rows
c – no. of columns

Definition 1.1.3 The size of the matrix, i.e., the number of rows and columns, is referred to as its order, or as its dimension. Thus, the matrix A with r rows and c columns has order r x c. Hence, we say that A is an r x c matrix, written as Arxc.
Definition 1.1.4 a11 is called the leading element of the matrix A.
EXERCISES:
1. Write down the matrix A = {aij} where aij = i + j , i = 1, 2, 3 and j = 1, 2, 3, 4.

2. Write down the matrix B = {bhk} where bhk = 2h − k , h = 1, 2, 3, 4 and k = 1, 2.
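A matrix defined by a rule such as aij = i + j can be generated element by element on a computer. The following is a minimal SAS/IML sketch (PROC IML is introduced later in these notes); the loop bounds match Exercise 1, and the output can be used to check your answer.

  proc iml;
    A = j(3, 4, 0);            /* start with a 3 x 4 matrix of zeros    */
    do i = 1 to 3;
      do jj = 1 to 4;
        A[i, jj] = i + jj;     /* fill each element using aij = i + j   */
      end;
    end;
    print A;                   /* first row should be 2 3 4 5           */
  run;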
Definition 1.1.5 A is a square matrix if r = c, i.e., the number of rows equals the number of columns. Thus, we write Ar.

Definition 1.1.5.1 The elements a11, a22, … , arr of a square matrix are referred to as the diagonal elements or diagonal of the matrix.

Definition 1.1.5.2 The elements of a square matrix that lie in a line parallel to and just below the diagonal are sometimes referred to as the subdiagonal elements.

Definition 1.1.5.3 Elements of a square matrix other than the diagonal elements are called off-diagonal or nondiagonal elements.
EXERCISES:

A4x4 =

⎡ 4 3 1 7 ⎤
⎢ 2 1 5 3 ⎥
⎢ 9 4 0 2 ⎥
⎣ 6 0 8 9 ⎦

Given the above matrix, write down the
1. diagonal elements
2. subdiagonal elements
Definition 1.1.6 A triangular matrix is a square matrix with all elements above (or below) the diagonal being zero.

Definition 1.1.6.1 An upper triangular matrix is a triangular matrix whose elements below the diagonal are all zero.

Definition 1.1.6.2 A lower triangular matrix is a triangular matrix whose elements above the diagonal are all zero.
Example 1.1.1
upper triangular matrix        lower triangular matrix

⎡ 9 1 5 ⎤                      ⎡ 2 0 0 ⎤
⎢ 0 2 8 ⎥                      ⎢ 7 6 0 ⎥
⎣ 0 0 3 ⎦                      ⎣ 4 3 5 ⎦
Definition 1.1.7 A diagonal matrix is a square matrix having zero for all its
nondiagonal elements, i.e., a matrix D = [dij]rxr is a diagonal matrix if dij = 0 ∀ i ≠ j, i = 1, 2, … , r. It is denoted by D{ a11, a22 , … , arr } = diag {a11, a22 , … , arr } where aii are the diagonal elements,
i = 1, 2, … , r. Example 1.1.2
D =

⎡ 5 0 0 ⎤
⎢ 0 2 0 ⎥
⎣ 0 0 8 ⎦   = D{5, 2, 8} = diag{5, 2, 8}
Definition 1.1.8 A scalar matrix is a diagonal matrix with all diagonal elements
equal. Example 1.1.3
S =

⎡ 5 0 0 ⎤
⎢ 0 5 0 ⎥
⎣ 0 0 5 ⎦
Definition 1.1.9 An identity matrix of order r, denoted by Ir, is a diagonal matrix
having all diagonal elements equal to one. Example 1.1.4
I2 =

⎡ 1 0 ⎤
⎣ 0 1 ⎦

I3 =

⎡ 1 0 0 ⎤
⎢ 0 1 0 ⎥
⎣ 0 0 1 ⎦
Definition 1.1.10 A null matrix, denoted by O, is a matrix of zeros, i.e., every element
is zero. It is also referred to as a zero matrix. Example 1.1.5
O =

⎡ 0 0 0 ⎤
⎣ 0 0 0 ⎦

O2 =

⎡ 0 0 ⎤
⎣ 0 0 ⎦
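The special matrices of Definitions 1.1.7–1.1.10 are easy to generate directly on a computer. A minimal SAS/IML sketch (the diag, I, and j functions are standard IML matrix builders; the matrix names are only illustrative):

  proc iml;
    D  = diag({5 2 8});     /* diagonal matrix D{5, 2, 8} of Example 1.1.2 */
    S  = 5 * I(3);          /* scalar matrix: all diagonal elements equal  */
    I2 = I(2);              /* identity matrix of order 2                  */
    O  = j(2, 3, 0);        /* 2 x 3 null (zero) matrix                    */
    print D S I2 O;
  run;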
1.2 VECTORS AND SCALARS

Definition 1.2.1 A matrix consisting of only a single column is called a column vector.

Definition 1.2.2 A matrix consisting of only a single row is a row vector.

REMARKS:
1. A column vector is denoted by an underlined small letter.
2. A row vector is denoted by an underlined small letter with a prime.
Example 1.2.1
x =

⎡ 0 ⎤
⎢ 9 ⎥
⎢ 4 ⎥
⎣ 5 ⎦

y’ = [ 8 0 7 ]
Definition 1.2.3 A single number is called a scalar.

REMARK: Sometimes it is convenient to think of a scalar as a matrix of order 1 x 1.

1.3 REVIEW OF SUMMATION AND DOT NOTATION

1.3.1 SUMMATION NOTATION
1. Σ_{i=1}^{n} Xi =
2. Σ_{i=1}^{n} Xi² =

3. Σ_{i=1}^{n} c =

4. Σ_{i=1}^{n} Xi Yi =

5. ( Σ_{i=1}^{n} Xi ) ( Σ_{i=1}^{n} Yi ) =

6. ( Σ_{i=1}^{n} Xi )² =

7. Σ_{j=1}^{n} a1j bj1 =

8. Σ_{i=1}^{n} k Xi =

9. Σ_{i=1}^{n} Σ_{j=1, j≠i}^{n} Xij =
REMARK:

Note that Σ_{i=1}^{m} Σ_{j=1}^{n} aij = Σ_{j=1}^{n} Σ_{i=1}^{m} aij .

In terms of a matrix of m rows and n columns, the left-hand side is the sum of all the row totals and the right-hand side is the sum of all the column totals, both sums equaling the total of all elements.

1.3.2 DOT NOTATION
a.j = Σ_{i=1}^{r} aij

ai. = Σ_{j=1}^{c} aij
a.. = Σ_{i=1}^{r} ai. = Σ_{j=1}^{c} a.j = Σ_{i=1}^{r} Σ_{j=1}^{c} aij
EXERCISES:
Given a matrix A =

⎡ 1 0 1 2 ⎤
⎢ 0 2 3 1 ⎥
⎣ 1 1 0 3 ⎦   = {aij} , i = 1, 2, 3 , j = 1, 2, 3, 4
Find the values of the following:
1. a1. , a2. , a3.
2. a.1 , a.2 , a.3 , a.4
3. a..
4. Σ_{i=1}^{3} aii

5. Σ_{j=1, j≠2}^{4} a2j

6. Σ_{i=1, i≠2}^{3} Σ_{j=1, j≠3}^{4} aij
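The row totals ai., column totals a.j, and grand total a.. asked for in these exercises can be checked numerically. A minimal SAS/IML sketch, using the exercise matrix as reconstructed above and IML's subscript-reduction operators:

  proc iml;
    A = {1 0 1 2,
         0 2 3 1,
         1 1 0 3};
    rowTotals  = A[,+];      /* ai. : sum across each row (column vector) */
    colTotals  = A[+,];      /* a.j : sum down each column (row vector)   */
    grandTotal = A[+];       /* a.. : sum of all elements                 */
    print rowTotals colTotals grandTotal;
  run;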
1.3.3 PRODUCT NOTATION
∏_{i=1}^{n} bi = b1 b2 b3 … bn
1.4 APPLICATIONS OF MATRIX ALGEBRA
Some applications of matrix algebra are: (1) in regression analysis, for calculating the estimates of β; (2) in population dynamics, where, in studying a biological population, we want to investigate the distribution of individuals according to their age; (3) in performing multivariate analysis on data; and (4) in estimating the parameters of an experimental design study. Refer to Exercise 2 in Section 1.1. One can investigate the relationship between the weights and heights of school children. A mathematical equation that
allows us to predict values of the variable WEIGHT (dependent variable) from known values of the variable HEIGHT (independent variable) is called a regression equation. Refer to Figure 1 below. From an inspection of this scatter diagram, it is seen that the points follow a straight line fairly closely, indicating that the two variables are to some extent linearly related.

Once a reasonable linear relationship has been ascertained, we usually try to express it mathematically by a straight-line equation called the linear regression line. Denoting the variable WEIGHT by Y and HEIGHT by X, we know that the slope-intercept form of a straight line can be written as

Ŷ = β̂0 + β̂1 X

where the constants β̂0 and β̂1 represent the y-intercept and slope, respectively. The symbol Ŷ is used to distinguish the predicted value given by the regression line from an actual observed value y for some value of x. Note that the β̂0 and β̂1 of the best-fitting line can be computed using matrix algebra.
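As a preview of the matrix computation behind this remark: writing the heights (with a leading column of ones) as a matrix X and the weights as a vector y, the least-squares estimates are b = (X’X)⁻¹X’y. The formula is not derived here; the sketch below is a minimal SAS/IML illustration using only the first few height-weight pairs from Exercise 2.

  proc iml;
    height = {69.0, 56.5, 65.3, 62.8, 63.5};     /* a few heights from the table */
    weight = {112.5, 84.0, 98.0, 102.5, 102.5};  /* the corresponding weights    */
    X = j(nrow(height), 1, 1) || height;         /* column of ones, column of X  */
    y = weight;
    b = inv(t(X) * X) * t(X) * y;                /* b = (X'X)^(-1) X'y           */
    print b;                                     /* intercept and slope          */
  run;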
Another application of matrix algebra in statistics is in the field of Multivariate Analysis. Here, rather than analyzing one variable’s statistics, we analyze multiple variables simultaneously without necessarily using regression. For example, in our Height‐Weight data, we can get what we call a sample mean vector which is simply a vector with elements being the means of the individual variables. In our example, it is:
[Figure 1. Scatter plot of Weight (pounds, axis 0–200) against Height (inches, axis 0–80) for the school-children data.]
x̄ =

⎡  62.33684 ⎤
⎣ 100.0263  ⎦
where the first element corresponds to height and the second element corresponds to weight. We also have what we call a sample variance-covariance matrix, which is a symmetric matrix with the diagonal elements as variances and the off-diagonal elements as covariances. An example from the height-weight data:
S =

⎡ 26.2869   97.09903 ⎤
⎣ 97.09903  518.652  ⎦
We also have what we call a sample correlation matrix where the elements are bivariate correlations of the data. It is a symmetric matrix with diagonal elements of 1 and correlation coefficients of two variables as non‐diagonal elements. Example for the height‐weight data:
R =

⎡ 1         0.877785 ⎤
⎣ 0.877785  1        ⎦
We are not limited only to these methods, since we can also make inferences from these statistics (confidence intervals, hypothesis testing, and other methodologies beyond regression).
For the Examination data, you can get:
x̄ = ⎡ 72.5  ⎤     S = ⎡ 433.6667   99.875    101.25 ⎤     R = ⎡ 1         0.5632    0.475334 ⎤
    ⎢ 75.25 ⎥         ⎢  99.875   128.9167    31.5  ⎥         ⎢ 0.5632    1         0.27123  ⎥
    ⎣ 68    ⎦         ⎣ 101.25     31.5      186    ⎦         ⎣ 0.475334  0.27123   1        ⎦
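These three summary matrices can be reproduced directly from the exam-score matrix. A minimal SAS/IML sketch is given below; the mean, cov, and corr functions are assumed to be available (they exist in recent SAS/IML releases), and the results should match the matrices shown above up to rounding.

  proc iml;
    scores = {67 84 78,
              98 76 81,
              48 59 60,
              77 82 53};
    xbar = mean(scores);      /* 1 x 3 vector of column (exam) means     */
    S    = cov(scores);       /* 3 x 3 sample variance-covariance matrix */
    R    = corr(scores);      /* 3 x 3 sample correlation matrix         */
    print xbar, S, R;
  run;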
ASSIGNMENT 1! Submit next meeting day, before class hours on stapled whole yellow pad papers, with names
on the upper left side of the papers and answers written clearly with pertinent solutions and reasons for every line of proof
Searle, pages 17‐21: Numbers 3 (2 pts each matrix, 3 matrices), 4(f) (5 points), 4(g) (5 points), 4(h) (3 points), 4(i) (3 points), 9 (4 points), 10 (4 points) Total of 30 points
�� ����������������
�
����������� � �� ��� ��������
����� �������� �������� ��� ������ �������� ��� !��" � ��#� ��$%&�� '(�� ���)�� �(�� ���" ����
����(���" ������������(*�����+,��������+������*��!��- ���#���������������.�������������������#�
�
�
��������)��(���������������(����!�" �������������" �� ���!�" �� � *!(������������ ����� �
- ��� � ����(���#� �- �)���� ���� ���������� � �!� " �������� ��� ��� ��" ����� ��� ����� �!� �� )� �� ��
������ #�/ �- ���*���!�����'������������� �����!��" ���� �" ��������� ���!������(�������+��'�+�
- ����!����������������(���� ����)� *��� �����+���" ��� ����������" �(����� ������'��" ��� ���!����� ��
�!!���� ��" � ��#�
�
� ������ �� �#0� ��� �#1�� ���� '����� �������� �� ���!��" ��� � � " �������� ���� ����(����#� ��
�����(*�� ( ������ �� *� �!� �(��� �������� �� ��� ��2(����� � � *����� *� ���� ����� �� � � ����
�(������ *� ��������� �!� ����� ��(���#� � ������ � �#3�� ���� ��- �� �!� " ����.� ��*�'��� ���� �(��� ��#���
�(" " ��+��!������� ��������!�" ����.���*�'���- �������������*�'���!����- ��� ������� ��#04#�
�
��� � ��������� ��
�
!���5�{ }aij �� ����5�{ }bij ������'������.���" ������������� ��������(" ����6������� ���.���" ����.�
��5�{ }eij ���!� ���'+�eij �5�aij 6�bij ����5�0�����7������ ��8�5�0�����7����#�
�
��������������
9�����5� ��
���
�
4231
�� ����5� ��
���
�
8675
#���� ���5���6������
�
���� ��
�- ��" ���������� �'��������� �+��!����+���)��������" �� (" '����!� ��- ��� ��������" ��
(" '����!����(" �:��#�#�����+���)��������" �������#���(���- ����+���������+���������������������
��������#�
�
���������
�
0# �� " � (!���(���� �!� �� ������ � ����(��� " �;��� ������ " ������� ��� ��� � �� �#� �����
" ����� ��� ��������+� " ���� � � �� !�����+� � � ���� �������� ��� � �� ��� � !� ������ � � ��
!�����+�� �����< �����������#�� �)� �� �������'����'���- ���������" � (!���(�� *�� ��
������ *� ������ =� � �������>� !��� ����� ����(��� � � ���� !����� � �� ���� �� !���������
��������)��+#��
�
Page 12 Statistics 135: Matrix Theory for Statistics
11
�������������� � ?� (!���(�� *������ ������ *������
?������� &4� @0�
?������� %�� A%�
?������� B1� 0A�
� � �
���������!"� � ?� (!���(�� *������ ������ *������
?������� &@� %1�
?������� %4� @B�
?������� 0&&� 03�
�
����������������!����������(����� �������!�����" � (!���(�� *������� ������
������ *�����#�
�# ������� ����������������� �" ����.�!��" #�
'# � �)������������" � (!���(�� *�� �������� *�������!������������(��#�
�
�# �� ���6���!��5 ��
���
�
−−140321�� ���5 �
�
���
�
− 2153
#�
�
��� "#�$���� !$��$#��� �
�
!���5�{ }aij ����� ���.���" ����.�� ��;����������� (" '������� ������������" (��������!���'+�;��
�� �����'+�;������������.���" ����.���5�{ }eij ��- �����eij �5�;aij ����5�0�����7������ ��8�5�0�����7����#�
�
��������������
9�����5� ��
���
�
4321
����� �&��5��
�
��%�&����&�������������������������!�� ��.��" ����.��5C��8D���������.��" ����.�E�5=$0>�5�C$��8D#��
�
��������������
9�����5�
���
�
�
���
�
�
512031
���� �$��5�
�
��' � ����"!(���#�� �
�
If A and B are r x c matrices, then the difference between A and B, denoted by A – B, is defined as A – B = A + (-1) B.�
Page 13 Statistics 135: Matrix Theory for Statistics
12
Example 2.3.1�
9�����5�
���
�
�
���
�
�
064912
�� ����5�
���
�
�
���
�
�
922347
#���� ���E���5��
�
���� ���
0# ���- ����������� ��� �+�" ������������������!�������" ���������� �'���('��������!��" �
� �� � �����#� � ��(�� " �������� ����� ���� �� !��" �'��� !��� ������� � ���� �����
�� !��" �'���!����('������� ��� ��)����)����#�
�#� ���� �*���)���!���*�)� �" ����.����� �'����!� ���������� =( �2(�>�" ����.�- �����
�(" �- ������������� (���" ����.������������6=$�>�5��$�5�#�
�
������
���- �'���- ���������*������ ��" ��� ���.�� �����!�� �� ��� ���� ������������� �
!���0310�� ��031��� �!�(����( ���������������)��+#�
� �
)�����&�������&�*�+,,,+,,,�-&��� �
F���� < ������������ �� ���� �(�������� < �����/ � *��" �
0310� �A� 0%� 01� �0�
031�� &�� 0@� �0� &4�
� �
����&������&�*�+,,,+,,,�-&��� �
F���� < ������������ �� ���� �(�������� < �����/ � *��" �
0310� 03� 3� 00� 0A�
031�� ��� 04� 0&� �@�
� ��������!��������" �(�������������!!��� ����!��.�� ����!��" �����*������ ��" �#�
�# ������� ����������������� �" ����.�!��" #�
'# ��������+�����0310�� ��031���*�)������� ��� ���� ������������� ,��*��������!���� �
�������( ��+#�
�
� �!��"��#�$��������%��- ����.����" �����������5�{ }aij ��� �����5�{ }bij ������2(����!�aij �5�
bij �!�����5�0�����7������ ��8�5�0�����7����#��2(����+��!��- ��" ������������ ��" �� � *�( �����
���+������!�������" �������#�
�
9�����5� ��
���
�
3851
�� ����5� ��
���
�
y
x
85
��!���- ����)��(����!�.�� ��+�- ������5���G�
�
�
�
Page 14 Statistics 135: Matrix Theory for Statistics
13
��. � ����� !$��$#��� �
�
!���5�{ }aik ����� ���.���" ����.�� ����5�{ }bkj ��������.���" ����.����� ����������(����!���� ��
�������5���5�{ }eij ����� ���.���" ����.���!� ���'+�
eij 5��=
c
kkjik ba
1�5� ba ji 11 �6�� ba ji 22 �6�7�6� ba njin ���
�
��5�0���� ��7�������8�5�0�����7������ ��;�5�0�����7����#�
�
� ����" �������+�������������� ��� �'��������� �������!����- ���
�
[ ] → row ith�.��[ ]column jth↓ �.���5� { }[ ]elementj)th (i, �.������
!�����5�0�����7������ ��8�5�0�����7����#�
�
�
����������.���
9��� Ax22
�5� ��
���
�
4021
�� �� Bx325� �
�
���
�
−−
206121
#���� ����5�
�
$%&%'� �������( ����#�$�)������
!� a ,�5� [ ]aaaa n...321 �� �� x �5�
���������
�
�
���������
�
�
x
xxx
n
.
.
.3
2
1
����� ������ �������(����!���
� ��.������!� ������
a , x �5�a1 x1�6�a2 x2 �6�a3 x3 �6�7�6�an xn �5��=
n
iii xa
1�
Page 15 Statistics 135: Matrix Theory for Statistics
14
�
����������.���
9��� ax31,�5�[ ]523 �� �� x
x135�
���
�
�
���
�
�
604020
#���� �a , x �5 �
��������(-��&/�$�0���������&����� �
�
�� ������ *� '(+� *� �(������� �!� �.����" � ���� ������ " ���� � �� ��''���� !��� ��'������+�
��(������ ��������" ����+��'�����" ����+�� (������ �� ����+�����*+�������" � ����!���( �)�����+#�
�(��������������������� �" ����!�������" ����� ����''����� �������" ���- ����H&��H0�� ��H04��� ��
������������" ����+�������" � �� �����%4��044�� ��&4�� �" �������������)��+#�
�# I ����� ���� ������� ��� �� ��- � )������ �,� � �� ���� (" '���� �!� � �" ���� ������ ��� ��
���(" �)������ #�
'# I ���� ��� ���� ������ ����� �!� ���� ��2(����� � �" ���� '�(*��� � � ���� ��" ���- � ��� ����
���" ����+�������" � �G�
�
���� ��
����� �������(����,.��!�����)���������� ��.��.������ �+�- �� ���� ��.���)��������" ���������
���������- �� ����� (" '����!����(" ��� �a ,�����2(���������� (" '����!���- ��� � x #�
�
$%&%$ � �����( ����#��* ��)������
!�a ,�5�1 2
. . .na a a� �� � �� �� x �5�
1
2
.
.
.
n
xx
x
� �� �� �� �� �� �� �� �� �� �� �
����� ������(��������(����.�,������!� ������
x a ,5
��������������
�
�
��������������
�
�
nnjn2n1n
niji2i1i
n2j22212
n1j12111
axaxaxax
axaxaxax
axaxaxaxaxaxaxax
....................................
....................................
......
......
��I ������������(��������(�����.,G�
Page 16 Statistics 135: Matrix Theory for Statistics
15
�
�����������%������(-��&/�$�0���������&��������������� �
�
0# � ���� ,#�I ��������������" � ����!������" ����.�������� �G�
�# � ��� �,#�I ��������������" � ����!������" ����.�������� �G�
�
$%&%+� ������,)�������( ���
�
!���5�{ }aij �� �� x 5�{ }x j �!�����5�0�����7������ ��8�5�0�����7��������� ��
�.�5�
�
�� �=
c
jjij xa
1�!�����5�0�����7����#�
��������(-��&/�$�0���������&������1�#�&��&-�2 �
�
�(����������� �" ����������� ��� ��*�'��� *���- �- ����H���H���� ��H1����������)��+#��
0# ����� ��������������!������ �" ����� �������" ���- �� �� ��*�'��� *���- �� �
" ����.�!��" #�� � �������'+��#�
�# �- � " (��� - �(��� ��� ����� ���� ���" ����+� ������" � �� ��� �(������� ���� ��2(�����
� �" ���� � � ���� ��" ���- G� � � ���� ��*�'��� *� ��- G� ����� �� ���� ������ ������
��" (��� ��(��+#�
�
$%&%&�� ��( ����#��* ����������
� �
� ?(�����+� *� �- �� " �������� �� � '�� �.���� ��� ��� �� ��" ���� ��������)�� �.�� ��� � �!�
" (�����+� *���" ����.�'+���)�����#�
�
�����������%������(-��&/�$�0���������&��������������� �
�
�(������ ���� '�����" ����+� ������" � �� ������ B4� ������ 14� " ����� � �� @4� ��''���#�
���� (������ � ������" � �� ������ 34�� &4� � �� �4� � �" ���#� ���� ��+�����*+� ������" � ��
������&4���4�� ��04�� �" ���#�
0# �- �" (���- �(���������������'�����" ����+�������" � ������(�������������2(�����
� �" ���� � � ���� ��" ���- G� � � ���� ��*�'��� *� ��- G� ����� �� ���� ������ ������
��" (��� ��(��+#�
�# �- � " (��� - �(��� ��� ����� ���� (������ � ������" � �� ��� �(������� ���� ��2(�����
� �" ���� � � ���� ��" ���- G� � � ���� ��*�'��� *� ��- G� ����� �� ���� ������ ������
��" (��� ��(��+#�
&#� �- � " (��� - �(��� ��� ����� ���� ��+�����*+� ������" � �� ��� �(������� ���� ��2(�����
� �" ���� � � ���� ��" ���- G� � � ���� ��*�'��� *� ��- G� ����� �� ���� ������ ������
��" (��� ��(��+#�
&# ����� ������� �" �����2(���" � ����!�����!�(��������" � ���� �" ����.�!��" #�� � ����
���'+��#�
Page 17 Statistics 135: Matrix Theory for Statistics
16
@# ����� �� ���� ������ ������ �!� '(+� *� ���� � �" ���� � � ���� ��" ���- � � �� � � ����
��*�'��� *���- ��������!�(��������" � �����" (��� ��(��+#��
�
���� ����
�
0#�� ��������(�������!��- ��" ����������� ���������!� ���� �������!�����.������ �+��!�
���� (" '����!� ���(" ��� ����2(�������� (" '����!� ��- ��� ��:�����" ������������
��� ���������'�������������������-���.��������������/��.���-����0#�
�#�� ���" �������!� ���� ��!�" ����.�" (����������� ����������(����������� ��� ���������+�
�.������)� ��!��������#�
&#�� !�������!���������.����
�#� �- �" � +���- ��" (�������)��!����������.���G�
'#� �- �" � +����(" ��" (�������)��!����������.���G�
�#����� �����'�����.����� �+��!�������!�- ���������G�
�#�I ������������� ����� �!������5�������.���G�
@#�� ��������� �������- �+���.����� �������!�������" ��������- �� ���� ���������2(����
� ���!�������" �������#������ ��������� ��� ���������+��2(��#�
�
���(" � *����������" ����.�����(����( ������ ��������� ��.�����
�
%#�� � *���� *� ���� ����(��� ���� - ��.���-���.�1� �� ��� 0� ���.���-���.�1� 0� �1� �#� �
*���� *���������(�������- ��.�2��-���.�1������0����.�2��-���.�1�0��1��#�
B#�� ����- �)����������" (���������'+������(" �)�����������������#�
A#�� �����(" �)����������" (���������'+�����- �)�����������" ����.#�
1#�� ��" ����.�����" (���������'+������(" �)��������������(" �)�����#�
3# ����- �)����������" (���������'+���" ����.�)�������������- �)�����#�
�
����������.�'�
9�����5� ��
���
�
4321
�� ����5� ��
���
�
−−
1110
#���� ����5��������������������������� �����5�
�
����������%������(-��&/�$�0���������&��������������� �
� �
� ����" (�����+�������#��
�
��3 ��� "��"�������� ����
�
!���5�{ }aij ����� ���.���" ����.����� ���������2.�2���!������ �����'+��,�5C�aij ,�D���������
.���" ����.���!� ���'+�aij ,�5�a ji #����(����,��������" ����.�- ��������(" ������������- ���!���
- ��������������� ����!��" �!������������#�
�
Page 18 Statistics 135: Matrix Theory for Statistics
17
����������3���
A3x4�5�
����
�
�
����
�
�
128411731062951
������� �� A4x3,�5�
���
�
�
���
�
�
121110987654321
�
�
���� ���
0# ������- ���!��,�����������" ������������(" ���!��#�
�# !������� ���.���" ����.����� ��,��������.���" ����.#�
&# !���8�����������" �� �����������- �� ��8������(" ��!����������������������" �� �����8���
��- �� ���������(" ��!��,#�
@# ���� ������ �!���������� �������!��������,������#�����������+��� ���!������2(�)��� ��
!��" �� =���
� >,� ��� =���
�� >� " (��� '�� (���� - �� �)��� ��� ��� �������+� ��� ��)�� �('�������
������ �!��������������!�������� �������" ����.#�
��
������������#��3����������#�����������
�
0#� ��%����4����������5������ +���.���" ����.����=��,�>,�5��#��
����!��
�# ����������� +���.���" ����������� �����=���6����>,�5��,�6���,�
����!���
&# ������9�����'��� ���.���" ����.�� ��;�������� (" '��#���� ��=�;��>,�5�;�,#�
����!��
@#� 9������ ���.���" ����.�� ��������.���" ����.#���� ��=����>,�5��,�,#�
����!��
�
���� ���
�
0#� ����!�(�����������+��� �'���.�� �����������+����������� �������!���������(����!�
" ������� ���" �������#����(���!�����!� ���� (" '����!�" ���������
� � =��0���&�7��;$0�;�>,�5��;,��;$0,�7���,�0,��
� �#� ������� �������!������(" �)�������������- �)������� ��)���$)����#�
�Example 2.5.2�
� � x1x35�
���
�
�
���
�
�
321
��� ��� x3x1,�5� [ ]321 �
�
�
�
�
Page 19 Statistics 135: Matrix Theory for Statistics
18
��6 ���#�������� ����
�
!���5� { }aij ����� ���.���" ����.����� ������������!������=�>�������!� �����������(" ��!�
�������*� ������" � ����!�����#�#��
( ) � +++===
r
1irr2211ii a...aaaAtr �
�
���� ��I �� ������ ����2(������������������ �����!� �����#�#���������� ����.���#�
Example 2.6.1�
0# !�
���
�
�
���
�
�
=1085165757
A ����� ���=�>�5��
�
�# !�
����
�
�
����
�
�
−
−
=
4109439632412013112
B ����� ���=�>5�
�
������������#��3�������#�����������
�
0# 9�����'��� ���.���" ����.#���� ����=�>5��=�,>#�
����!��
����������6���
!�
���
�
�
���
�
�
=1383125749
A ����� ���=�>�5��
�
� ��
���
�
�
���
�
�
=′1317824359
A ����� ���=�,>�5��
�
�# ��=������>�5����������#�#������������������������0.0�" ����.�
�
����������6�'�
�
� ��=0@>�5�� �� � ��=1>�5�
�
&#� 9������ ����'����.���" �������#���� ����=���6���>�5���=�>�6���=�>�
� � ����!#�
Page 20 Statistics 135: Matrix Theory for Statistics
19
����������6�.�
!�
���
�
�
���
�
�
−−=
321871342
A �� ��
���
�
�
���
�
�
=1253642623
B ����� ���=�>�5����������=�>�5� �����
�
���
�
�
���
�
�
=+157214111965
BA �� ������=��6��>�5��������
�
@#� 9�����'��� ���.���" ����.�� ��;���������#���� �����=;�>�5�;���=�>#�
� � ����!��
����������6�3�
!��
���
�
�
���
�
�
=149
11678105
A ����� ���=�>�5������������
���
�
�
���
�
�
=31227
331821243015
A3 ��� �����=&�>5��������#�
�
%#�� 9�����'��� ���.���" ����.�� ��������.���" ����.#���� ����=��>�5���=��>#�
� � ����!��
����������6�6�
!�� ��
���
�=
65
73
42
A �� ��
���
�
�
���
�
�
=324
123
B ���� �AB 5 ��
���
�
48322917
��� ����=AB >�5��
� � ��
���
�
�
���
�
�
=232414222012393722
BA �� ����=BA >5�
�
��?��/ ���
�
0# ���� �������+� �'�)�� �� � '�� �.�� ���� ��� ����(���� �!� " ���� ��� � �- �� " �������#�
��(�����=ABC >�5���=CAB >5���=BCA >�����)��������������" ��������������� ��������
�� !��" �'���!���" (����������� �!�����������(�������������� �����#�
�# ��= 'AA >�5����= A'A >�5�� �= =
r
1i
c
1j
2ija ≥4#�
�
�
Page 21 Statistics 135: Matrix Theory for Statistics
20
��7�� ��8 ��"������� ����
�
9�����'�����2(����" ����.#���� �kA �.�����!�������������)��� ��*����;#�
A...AAA k = ��;���" ���
�REMARK: In keeping with scalar arithmetic where x0 = 1, we take A0 = I for A square.��
��?�����< 9����!����������0������4����������2�����/��2���������%�
�
0# ���� � �*���)��� ��*������� ��2��qpqp AAA += �� ��
pqqp A)A( = #�
����!��
�# !����5������ ��������� � �*���)��� ��*������� �=AB >��5�ppBA #�
����!��
�
����������7���
�
�9������5���#�=��>&�5��������5��������5��������5��&�&�
�
&# !�������� � �*���)��� ��*���� �������������������� �=��>��5�pp Ac �
����!��
�
��9 ������ ���� ���#�"�
�
�� " ����.� �� � '�� �������� ��� � ��� �('" �������� '+� ���- � *� ����J� ���� �� ���
'��- �� � ��- �� � �� ���(" �#� �!� ��(����� ���� �������� � *� �� � '�� �������� �(�� � � " � +�
��!!��� ��- �+�#�
�
����� ��� ������ ����������" ����.�
�
������
�
�
������
�
�
=
565554535251
464544434241
363534333231
262524232221
161514131211
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
A���
�
�
� ���������������!���������+���!� (" '����� �����@������� ���!���� *� ������'+�����
��������� ��������" ����.��
�
Page 22 Statistics 135: Matrix Theory for Statistics
21
1143
Ax
5�
���
�
�
���
�
�
aaaaaaaaaaaa
34333231
24232221
14131211
����������������� 1223
Ax
5�
���
�
�
���
�
�
aaaaaa
3635
2625
1615
�
�
�
2142
Ax
5� ��
���
�
aaaaaaaa
54535251
44434241��� ����� 22
22A
x5� �
�
���
�
aaaa
5655
4645�
�
��� ������" ����.����� � �- �'��- ����� ������" ����.��!�" ��������
�
� � �
����
�
�
����
�
�
=2221
1211
2242
2343
65 AA
AAA
xx
xx
x���
�
- �����*�)�����.����������������#��00���0�����0��� ������������������'��2-��������2��!�����
�������� ���" ����.��#�
�
I ����(��������- �����
�
����
�
�
����
�
�
=
������
�
�
������
�
�
=
2221
1211
ˆˆ
ˆˆ
3333
3232
565554535251
464544434241
363534333231
262524232221
161514131211
65 AA
AAA
xx
xx
x
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
K�
�
- �����*�)���� �������������� � *��!��#��
�
� ���������� �����!������������� � *���!�����
� � �00�� ����0���)��������" �� (" '����!����(" �#�
� �0��� �������������)��������" �� (" '����!����(" �#�
� �00�� ���0����)��������" �� (" '����!���- �#�
� ��0�� �������������)��������" �� (" '����!���- �#�
�
����������" " � ��� �������" � ��� ���!������('" ��������� ��������� ���������� � *��!��#�
�
�*� ������������������ � *��!�� ���.���" ����.���� ���!�(���('" ���������� �'���� ��'+�
Page 23 Statistics 135: Matrix Theory for Statistics
22
� �
Brxc
�5�
���
�
�
���
�
�
−−−
−
NM
LK
qcxprxqpr
qcpxpxq
)()()(
)(
����� ���
�
0# �������� � *���� �������������������)��� *���" ����.�� ���8(���!�(���('" �������#����� �'��
��)����� � ��� (" ���(�� ��- �� � �� ���(" �� �!� " �������#� � �(�� �������� � *� � � � +�
���**�����" � ����(������
�
� � � � �
� � � � �
� � � � �
� � � � �
�
��� �������- ��#�
�
�# �*� ��������" ����.����!���������.�2��� �'���������� ���� �������- ��� �������(" ���!�
�('" �����������
�
� � � ��5�
�������
�
�
�������
�
�
AAA
AAAAAA
rcrr
c
c
.....................
...
...
21
22221
11211
�
� �
� - ����� Aij ����������� pi �.�q j �� ���=
r
iip
1��5� p � ���
=
c
jjq
1�5�q �
� � ���� ���������������� ���" ����.��������
�
��
��
= ijAAji xqppxq��!�����5�0�7���� ��8�5�0�7���
&# �������� ���" ����������������������������5��������2%�
�
�
�
�
Page 24 Statistics 135: Matrix Theory for Statistics
23
$%6%' �((�������#���������(���������
�(������" ����������� ���������������� ����(����������
�
��
��
= ijAAji xqppxq���
��
�
��
��
= ijBBji xqppxq��!�����5�0�7���
� ��8�5�0�7���� ���=
r
iip
1��5� p � ���
=
c
jjq
1�5�q
#�
��� ���6������'��� �����" ��+�'+����� *������������� �� *��('" ���������!���� ���#�
�
�������
9�����
���
�=
����
�
�
����
�
�
=EDC
A�
83625471
� ����
���
�=
����
�
�
����
�
�
=ZYXW
V
44597283
#�� ��������(" ��!������������� ���
" �������#�
�
�
�
$%6%$ ���!���� !��!��������
�(������;����������� (" '���� ����" ����.�������������� ����(����������
�
��
��
= ijAAji xqppxq���!�����5�
0�7���� ��8�5�0�7���� ���=
r
iip
1��5� p � ���
=
c
jjq
1�5�q
�
��� �;������'��� ���'+�!��" � *������������" (��������!�������('" ����.#�
�
����������%�������������4��-����������� �
� �
� �'��� �%�#�
�
� $%6%+����������#�����������(��������
�(��������" ����.�������������� ����(��������
��
�
��
��
= ijAAji xqppxq
���!�����5�0�7���� ��8�5�0�7���� ��
�=
r
iip
1��5� p � ���
=
c
jjq
1�5�q �
Page 25 Statistics 135: Matrix Theory for Statistics
24
������� �������!����������� ���" ����.����������� �������" ����.��!���� ��������('$
" �������#�
�
����������9��5�
[ ]YX ,�5� ��
���
�
YX
'
'
�� ��
T
FEDCBA��
���
����5�
���
�
�
���
�
�
FCEBDA
''
''
''
�
�������
� 9��� ��
���
�=
����
�
�
����
�
�
=DCBA
X
121110987654321
#�� ���������� �������!������������� ���" ����.#��
�
$%6%&� � !��!���������#���������(���������
�(��������" ����.�������������� ����(����������
�
��
��
= ijAAji xqppxq���!�����5�0�7���� ��8�5�0�7���
� �� ppr
ii =�
=1� �� qq
c
jj =�
=1�� �����" ����.�������������� ����(��������
��
�
��
��
= jkBBkj xsqqxs
��� !��� 850�7��� � �� ;50�7��� � �� qqc
jj =�
=1� � �� ss
d
kk =�
=1#� ��� ��
��
�
��
��
= �=
c
j xsppxsjkij BAAB
ki1
#�
�
��������
�� ������ ��
���
�=
���
�
�
���
�
�
=2221
1211
987654
321
AAAA
A � �� ��
���
�=
���
�
�
���
�
�
=21
11
1111
11
BB
B #� � ��� ���� ����(��� �!� ����
�������� ���" �������#�
�
���� ��
������������ � *��!������ *��������(" ��" (���'��������" �����������!������ *�������- �#������
� ��� �����00�����������" �� (" '����!����(" ������00�������- �#�
� � �
Page 26 Statistics 135: Matrix Theory for Statistics
25
��: $�;���%�� ��������/�0����&2�"��������/�0���
�
0> ��" " (����)��9�- ��
�
�> ��!���� ����������.���" ������������ ���6���5���6��#��
����!��
�
'> ?(����������� ��!�" ����������� ���� �*� �������" " (����)����#�#����≠ ���#�
�
�> �����5������ �'���� �������������'���� �+�- �� ���� ��������'�����2(����� ����)��������" �������#�
��> ��!���������.���� ���������.������������.���'(�������� �����!� ��#����> �������.���� ���������.������������.���� ����������.��#���')��(��+�����≠ ����'���(���
���+���)����!!��� ��������#�
�
� �- �����������������!�?����.�?(����������� �'�� *���" " (����)���
�
�> ��5���5������!������2(����
�������!���������.��������5����5����
�
'> ���5����5������!������2(����
�������!���������.����� Opxr
Arxc
5� Opxc
�� �� Arxc
Ocxs
�5�Orxs
�
�
�> ���������)��9�- ��
�
�> !������� ����������.���" ������������ ���6�=���6���>�5�=���6���>�6���
����!��
'> !������� ���.���" ����.�����������.���" ����.��� �������� ���.���" ����.����� ��=��>�5�=��>�#�
����!��
�
&> � �����'(��)��9�- �
�
�> !������� ��������������.����" ���������� ������������.���" ����.����� �=���6����>���5����6���#��
'> !���������� ����.����" ����.��� ����� ����������.���" ������������ ���=���6���>�5����6���#�
����!���.�������
�
@> !�.�� ��+���������� ��#�������� ���.���" ����.��� �����������.���" ����.����� ��
�> =�.�6�+�>���5�.��6�+�#�
'> .�=�+��>�5�=�.+�>���5�+�=�.��>#��> ��=�.��>�5�.�=����>#�
�> .�=���6���>�5�.��6�.�����������.��#�����!���.�������
Page 27 Statistics 135: Matrix Theory for Statistics
26
%> ������� ������)� *�������" ���������
=���6���>�=���6���>,�5�=���6���>�=��,�6��,�>5���,�6���,�6���,�6���,#�
����!��
B> ������� �����2(�����=���6���>��5����6����6����6���#�
����!��
A> � �)� ��� +���.���" ����.�����������.������ ���.���" ����.����(����������6���5��#�
����!���.�������
����������( �2(��� ����5�$�#�
� �
���,�#�&�������;����"��������/�0���
�
� ���� !����- � *� ���� �.�" ����� �!� ���(���� � � " ����.� ��*�'��� ����� �� �������� ������ �������
� ���*(��#�
�
0> �L�6��L�5�=���6���>�L�� ��L��6�L��5�L�=���6���>��L�=���6���>���� ��� ���������+��2(������=���6�
��>�L:�- �������� ����������*�'�����.�6�'.�5�=���6�'�>�.�5�.�=���6�'>#�
�����L��6�M L�*� �����+����� ,����)��L������!�����#�
�
�> LF�E�L�5�L�=�F�E��>��- ������F�����2(���:��- ��������� ����������*�'����.+�E�.�5�.=+�E�0>#�
��������- �+���� �������� !��" �'����+#�
�
&> �)� �- �� ����� �����'�����.����� �������!�������" ������������+����� ���� �*� ������2(��:�
- �������� ����������*�'�����'�5�'�#�
�
@> �����2(���� ����5�������� ����" ��+�����������������#�
�
�����������,���
9�����5� ��
���
�
1111
����5� ��
���
�
−− 2222
���� ����5��
�
%> �����2(���� ����5�������� ���" �� ��������5��#�
�
�����������,���
9�����5�
���
�
�
���
�
�
−−− 5211042521
����� ����5��
�
�
�
�
Page 28 Statistics 135: Matrix Theory for Statistics
27
�
�
B> F��5���" ������ �������F�5�� ���F�5�$#�
�
�����������,�'�
F�5� ��
���
�
−1401
�≠ ����'(���F��5���5� ��
���
�
1001
�
�
A> I ���� ���)��?��5�?�- ����'����?�≠ ��� ��?�≠ ��#�
�
�����������,�.�
9���?�5� ��
���
�
−−
2323
����� �?��5��
�
������(���" ����.�?�- ����?�5�?�������������'������.�����#�
�
1> �L�5��L������ ����" ��+���5���
�
�����������,�3�
9�����5�
���
�
�
���
�
�
202110201
������5�
���
�
�
���
�
�
−032140
031����L�5�
���
�
�
���
�
�
633422756
�
�
��� ��L��5�
���
�
�
���
�
�
2616181055191112
���L��5�
���
�
�
���
�
�
2616181055191112
�
�
��(������L�5��L�'(����≠ ��#�
�
�
���� � ����� ?� ��������������������&����#�B4$A1#����������- ���������.�������#�
�
�
�
�
�
�
Page 29 Statistics 135: Matrix Theory for Statistics
28
#<������������ ���5�� ����������� "�!" )��#�$�
�
������� �!�� )�� ���#�"�
�
� �?���.�����������)��+����+����" �;����" ����.��!� (" '�����8(�����" ��+�� �(������" ����.�� �
�����������������- ��������������!�����������" � ���!�����" ����.��� ������� �(���!����" � ���������
���������(���'����" ���������������* " � ���!��������" � ����!�����" ����.#�
�
�.�" �����#�#0#0#�=!��" ��.��#0#0>�
�
9����5 ��
���
�
4231
��� ���5 ��
���
�
8675
#��
��� � �(�� ���� " ��������� - �� 8(��� ������ � � �(��
����������������� ��)��(������" � ����!�����" ����.���
�
�
�������� ��������� ������#�$�
�
� � �)� �������- ��" �������������� !��" �'������������� ��������" ����.�������� �� ��.�������
8(�������� ��)��(���������� ��!����" � ����!�������������������� �� *����" � ���� ��#��
�
�.�" �����#�#�#0#�=!��" ��.#��#0#0#>�
���- ��; �- ����6���5� ��
���
�
4231
�6� ��
���
�
8675
�5�
6 108 12� �� �� �
�
�
�
����'��"#�$���� !$��$#��� �
�
� �������" (����������� �� ��.�������8(�����" ����������" ����" (����������� ��'(��- ��" (�����+�����
���" � ����!�����" ����.�- ������!�.��$��!��� ������ ��� �� (" '��#�
�
�.�" �����#�#�#�=!��" ��.��#�#0>�
�
9����51 32 4� �� �� �
��� ��- ��- � ��������)��!���&�#�
���" �;����!��� �������������� ��� �����!�.����
- �������������������* �=H>�'�!���������������- �
��!��� ���� �����(" ���!��� ����!�����������!�
Page 30 Statistics 135: Matrix Theory for Statistics
29
����������#����" ��.�����&��5�3 96 12� �� �� �
#�
�
�.�" �����#�#&#�#�=!��" ��.��#�#�#>�
�
9��� �� 5�
���
�
�
���
�
�
512031
#� ��� ���)�� !��� E��� ����
�*���)�� " ����.� �!� ��� ��� � ��� ��� 8(���
��" ��+������ *������������)��(��'���2(���
���$0#��
�
����.��� ����"!(���#�� �
�
�.�" �����#�#@#0�=!��" ��.��#�>�
9�����5�
���
�
�
���
�
�
064912
�� ����5�
���
�
�
���
�
�
922347
#��
��� ���E���5�
5 36 24 9
− −� �� �� �� �−� �
�
�
�
�
����3��� ����� !$��$#��� �
�
�)� � � ��.����� - �� �� ����!��" � " ����.� " (����������� �� '(�� ��� ��� �� ������� '��� ��" ��������#� ��� *���
��2(�� �����- ��- �������- �����������(���������'+�����#�
�
������������3���=!��" ��.#��#@#0>�
9��� Ax22
�5� ��
���
�
4021
�� �� Bx325� �
�
���
�
−−
206121
#�I ��- �������)��!�������" ����.���#�
�
Page 31 Statistics 135: Matrix Theory for Statistics
30
0#��!����� �(��� *�����" ����������� �����- ��
(��������.��������+�!( ���� �
5??< 9�=����+0�����+�>��
�
�- ���!����� �(��� *�����!��" (����+�(�*�����
�� *��� (" '��� � �- ���� - ����� � � �(�� �����
- ���� '�� 0&#� �- �� ����� ��� 8(��� ���� ����� *�
���" � �� �!� ���� ����(��� " ����.�� ��� ���� !(���
� �- ��� !��� ���� " (����������� �� �� ��� - ��
�.�������" ����.��!�������=�.&>#�
�
�
�
�
�#� ��� *��� ���� !� ��� ������������ � �- ���� - �� !�����
��*���*��� �(�� �.������� " ����.� ��J�� � ��� ����
�����������#��
�
�
�
�
�
�
�
�
�
&#��!������*���*��� *��- ������������!( ���� �;�+�����
� ��- ��- ����*�������!����- � *��(���" ���
�
�
�
�
�
�
�
�
@#� �!���� ������ *� ���� - �� !����- � '+� ������ *� � ��
����� *�����6���!�6� ����� ��- ��- ����*���������������
����(���" ����.���
�
���5�13 2 524 0 8
−� �� �−� �
�
Page 32 Statistics 135: Matrix Theory for Statistics
31
NNN�I �� � *���.����- ����" ��!( ���� ��!�+�(�" ������- �� *��������!���������(���� ��- �� �+�(�
- � ���������������#���+����+�(����!#�NNN�
�
����3���� �������!#������8 ��=�#���"�
�
���*�������� �������(����!��- ��)��������- ���� ,����)������������.����;�+�#��������(������������+�
������ *��� (" '�������5??< 9�=����+0�����+�>��(��(��#�
�
������������3������=!��" ��.#��#@#�>�
9��� ax31,�5�[ ]523 �� �� x
x135�
���
�
�
���
�
�
604020
#��
��� �a , x �5�@@4��
�
�
����3�����!��������!#������8 ��=�#���"�
�
���*��������(��������(����!��- ��)��������+�(���)��
�����*���*��������.���������J���!���������(���" ����.�
�
������������3������
9��� ax31,�5�[ ]523 �� �� x
x135�
2040� �� �� �
#��
��� � x a ,�5�
60 40 100120 80 200� �� �� �
�
�
����6����� "��" )�� ���#�"�
�
�����������- ��- �+�������� ��������" ����.�� ��.������
� 0#�< �� *�����5��� �����=����+0>�!( ���� �� �������� *�������� �����6���!�6� ����
� �#��������+$��������������?������
�
����6�������� "��"�������� ��! #�� �
�
���(�������5��� �����=����+0>�!( ���� ����������" ������������5??< 9��!( ���� ��'(��+�(�8(�����)��
���� �(��� ������+�� ��������� ������#�
Page 33 Statistics 135: Matrix Theory for Statistics
32
�
0#�9��,����+�- ����)��" ����.����������- �� �������*��#�
������ ����������" ����.�- ���+�������!��" (���!( ���� �
������- #�
�
�
�
�
�
�#�I �� �+�(�!� ����� �(��� *�����!��" (���'+������� *�
� ���� ��� ������� *� � +� ������ �.���� - ���� !����� ����
OP�9< ���������- ���������.��������� �������" ����.����
'����� ��������� ���!��" �� ������#�
�
�
�
�
�
&#� �*���*��� ���� �.������� ������ �!� ���� ��� �������
" ����.�� ��������������� �����6���!�6� �������*�������
����������� �������" ����.#��
�
�
�
�
�
�
����6�����#��>1��"���"��#�$�� ��<���
�
������������������������*�'�.�����������������;�'�.�
!���Q��� �����#R���" ��+�(����������������+���� ������
��" ����.#��
�
�
�
�
�
�
�
�
#<������ �� ���� �� (5� � ����
������� "�!" )�"�"�����#�� $ �
�
Page 34 Statistics 135: Matrix Theory for Statistics
33
��� (��� ���� !��� " ����.� ��*�'��� ��" �(����� ��� - �� (��� ���� �������)��?����.� 9� *(�*�� =?9>�
������(���� - ����� ��� �� ���;�*�� ����'��� �!� " ���� �!� �(�� " ����.� �������� �� � �� ������ " ����.�
� ��+�������� �2(��#��������- ��������- ������" ��'������������� ������(������ ����������#�
�
��(������"#�( )�� ���#�"�
�
���������'�����- �������" ����.�� ����S?9��- ���+�������!����- � *����*��" �!����.�" �����'���- ���
�
�.�" �����#�#0#0#�=!��" ��.��#0#0>�
�
9����5 ��
���
�
4231
��� ���5 ��
���
�
8675
#����� �(������" ����������
proc iml;        /* initializes the IML package                         */
A={1 3,          /* to declare one row of elements, there should        */
   2 4};         /* be a space between elements and a row ends          */
B={5 7,          /* with a comma. Always place a matrix name e.g., A, B */
   6 8};         /* to identify different matrices.                     */
print A, B;      /* prints the matrices to the output window.           */
run;
proc iml;
A={1 3,
   2 4};
B={5 7,
   6 8};
C=A+B;            /* C to be the sum of the two matrices        */
D=A-B;            /* D to be the difference of the two matrices */
print A B C D;    /* prints the matrices to the output window.  */
run;
proc iml;
A={1 3,
   2 4};
B={5 7,
   6 8};
E=2*A;            /* E to be the product of a scalar and a matrix */
F=A*B;            /* F to be the product of the two matrices      */
print A B E F;    /* prints the matrices to the output window.    */
run;
proc iml;
A={1 3,
   2 4};
B={5 7,
   6 8};
E=A`;             /* E to be the transpose of A                */
F=A**2;           /* F to be the A matrix multiplied to itself */
G=A**3;           /* G to be the A matrix powered by 3         */
print A B E F G;  /* prints the matrices to the output window. */
run;
proc iml;
A={1 3,
   2 4};
B={5 7,
   6 8};
F=trace(A);       /* F to be the trace of A                    */
G=trace(B);       /* G to be the trace of B                    */
print A B F G;    /* prints the matrices to the output window. */
run;
3 SPECIAL MATRICES

3.1 SYMMETRIC MATRICES

Defn: A matrix A is symmetric if and only if A’ = A, i.e., aij = aji ∀ i, j.
Example 3.1.1
A =

⎡ 2 1 0 ⎤
⎢ 1 4 6 ⎥
⎣ 0 6 3 ⎦   is symmetric
Defn: A matrix A is skew-symmetric if and only if A’ = −A, i.e., aij = −aji ∀ i, j.
Example 3.1.2
A =

⎡  0  2  3 ⎤
⎢ −2  0  4 ⎥
⎣ −3 −4  0 ⎦   is skew-symmetric
Remarks:
1. If A is symmetric or skew-symmetric, then A is a square matrix.
2. If A is skew-symmetric, then the elements on the diagonal of A are all zero and each off-diagonal element is minus its symmetric partner.

Products of Symmetric Matrices

Products of symmetric matrices are not generally symmetric. If A and B are symmetric matrices of the same order, then the transpose of the product AB is ( AB )’ = B’A’ = BA. Since BA is generally not the same as AB, this means AB is generally not symmetric.

Example 3.1.3
Let A =

⎡ 1 2 ⎤
⎣ 2 3 ⎦

and B =

⎡ 3 7 ⎤
⎣ 7 6 ⎦ .   Then AB =
And ( AB )’ =

Remarks:
1. If A is symmetric, then A’ is symmetric.
2. If A is an r x r matrix, then A = S + K (the decomposition is unique), where S is symmetric and K is skew-symmetric; in fact S = ½( A + A’ ) and K = ½( A − A’ ). Also,
   a. A + A’ is symmetric.
   b. A – A’ is skew-symmetric.
3. Let A and B be symmetric matrices. Then
   a. A + B is symmetric.
   b. AB is symmetric if and only if AB = BA.
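A quick numerical check of these remarks, using the two symmetric matrices of Example 3.1.3, can be run in SAS/IML; it shows that A + B is symmetric while AB is not (since AB ≠ BA). This is only an illustration, not a proof.

  proc iml;
    A = {1 2,
         2 3};
    B = {3 7,
         7 6};
    sum = A + B;       /* symmetric                             */
    AB  = A * B;       /* generally not symmetric               */
    BA  = B * A;       /* equals (AB)' = B'A'                   */
    print sum AB BA;   /* AB and BA differ, so AB is not (AB)'  */
  run;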
Properties of AA’ and A’A 1) Products of a matrix and its transpose always exist and are symmetric.
If A is an r x c matrix, then i) AA’ is symmetric, since ( AA’ )’ = ( A’ )’ A’ = AA’. ii) A’A is symmetric, since ( A’A )’ = ( A’ ) ( A’ )’ = A’A.
Note: AA’ and A’A are not necessarily equal. Remark: Matrix multiplication ensures that elements of AA’ are inner products of rows
of A with themselves and with each other:
Suppose Arxc =

⎡ a11 a12 … a1c ⎤
⎢ a21 a22 … a2c ⎥
⎢  ⋮    ⋮       ⋮  ⎥
⎣ ar1 ar2 … arc ⎦

and Acxr’ =

⎡ a11 a21 … ar1 ⎤
⎢ a12 a22 … ar2 ⎥
⎢  ⋮    ⋮       ⋮  ⎥
⎣ a1c a2c … arc ⎦ ,

then

AArxr’ =

⎡ Σ_{j=1}^{c} a1j²       Σ_{j=1}^{c} a1j a2j   …   Σ_{j=1}^{c} a1j arj ⎤
⎢ Σ_{j=1}^{c} a2j a1j    Σ_{j=1}^{c} a2j²        …   Σ_{j=1}^{c} a2j arj ⎥
⎢         ⋮                       ⋮                           ⋮          ⎥
⎣ Σ_{j=1}^{c} arj a1j    Σ_{j=1}^{c} arj a2j   …   Σ_{j=1}^{c} arj²     ⎦

Q: How about A’A ?

Remark: AA’ and A’A have diagonal elements that are nonnegative since the sum of
squares is always nonnegative.

2) A’A = O implies A = O. ( AA’ = O implies A = O. )
Proof:

3) tr( A’A ) = 0 implies A = O. ( tr( AA’ ) = 0 implies A = O. )
Proof:
Remark: Results 2) and 3) are seldom useful for the sake of some particular matrix A, but they are often helpful in developing other results in matrix algebra when A is a function of other matrices. Example 3.1.4
For matrices Prxc
, Qrxc , X
cxs , P X X ’ = Q X X ’ implies P X = Q X .
Proof: Products of Vectors 1) The inner product of 2 vectors x and y is a scalar, thus it is always symmetric.
x’y = y’x = k, k a scalar 2) The outer product of 2 vectors x and y is not necessarily symmetric xy’ is not
generally equal to yx’. Example 3.1.5
Let x =

⎡ 1 ⎤
⎢ 0 ⎥
⎣ 2 ⎦

and y =

⎡ 2 ⎤
⎢ 4 ⎥
⎣ 6 ⎦ .

Then the inner product of x and y is

and the outer product is

Sums of Outer Products

Consider Arxc = [ a1  a2  …  ac ] where aj has dimension r x 1, j = 1, 2, … , c, and

Bcxs =

⎡ b1’ ⎤
⎢ b2’ ⎥
⎢  ⋮  ⎥
⎣ bc’ ⎦

where bj’ has dimension 1 x s.
Then ABrxs = [ a1  a2  …  ac ] ⎡ b1’ ⎤
                               ⎢ b2’ ⎥
                               ⎢  ⋮  ⎥
                               ⎣ bc’ ⎦   = Σ_{j=1}^{c} aj bj’

= sum of outer products of columns of A with corresponding rows in B
Example 3.1.6

Let A =

⎡ 1 4 ⎤
⎢ 2 5 ⎥
⎣ 3 6 ⎦   (columns a1, a2)

and B =

⎡ 7  8 ⎤
⎣ 9 10 ⎦   (rows b1’, b2’).

Then AB =

Special case: B = A’

AA’ = Σ_{j=1}^{c} aj aj’
Elementary Vectors

Defn: A vector with unity for its ith element and zeros elsewhere is called an elementary vector.

Notation: ei(n), where i indicates the position of the 1 and n denotes its order.
Example 3.1.7

e1(3) = [ 1 0 0 ]’ ,   e3(4) = [ 0 0 1 0 ]’ ,   e5(6) = [ 0 0 0 0 1 0 ]’ ,   e2(2) = [ 0 1 ]’
Remarks:

1) Eij = ei(n) ej(n)’ = null matrix except for the (i,j)th element, which is unity.

Example 3.1.8
Let e1(3) = [ 1 0 0 ]’ and e2(3) = [ 0 1 0 ]’. Then E12 = e1(3) e2(3)’ =

2) In = Σ_{i=1}^{n} ei(n) ei(n)’ = Σ_{i=1}^{n} Eii
Example 3.1.9

I4 = ⎡ 1 0 0 0 ⎤
     ⎢ 0 1 0 0 ⎥
     ⎢ 0 0 1 0 ⎥
     ⎣ 0 0 0 1 ⎦   = e1(4) e1(4)’ + e2(4) e2(4)’ + e3(4) e3(4)’ + e4(4) e4(4)’

                    = E11 + E22 + E33 + E44
3) Let A be an r x c matrix. Then
   a. ei(r)’ A = ith row of A
   b. A ej(c) = jth column of A
Example 3.1.10
Let A =
⎥⎥⎥⎥
⎦
⎤
⎢⎢⎢⎢
⎣
⎡
9563710124231
, e )4(3 =
⎥⎥⎥⎥
⎦
⎤
⎢⎢⎢⎢
⎣
⎡
0100
, and e )3(2 =
⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡
010
Then e )4(3 ’ A = and A e )3(
2 =
3.2 MATRICES WITH EQUAL ELEMENTS

Defn: Vectors whose every element is 1 are called summing vectors. They can be used to express a sum of numbers in matrix notation as an inner product.

Notation: 1n’ = [ 1 1 1 … 1 ]   (n elements)
Example 3.2.1
1) 14’ = [ 1 1 1 1 ] ,  x’ = [ x1 x2 x3 x4 ]
   14’ x =

2) A = ⎡ 1 3 5 ⎤
       ⎣ 2 4 6 ⎦
   12’ A =        A 13 =
Remark: The inner product of a summing vector with itself is a scalar, the vectors’
order, i.e. , 1n ’ 1n = n.
Defn: Let J denote the outer product of 2 summing vectors, 1r and 1s ’. Then, J is a
matrix with all elements equal to one. Notation: J
rxs = 1r 1s ’ , J n = 1n 1n ’
Example 3.2.2
1312 ’ =
14 14 ’ =
Remarks:
1) λ Jrxs = matrix with all elements equal to λ
2) Jrxs Jsxt = s Jrxt
3) 1r’ Jrxs = r 1s’
4) Jrxs 1s = s 1r
5) Jn = 1n 1n’ and Jn² = n Jn
6) J̄n = (1/n) Jn and J̄n² = J̄n
Defn: Let Cn = In − J̄n = In − (1/n) Jn . Then Cn is called the centering matrix.
Example 3.2.3

n = 3:   C3 = ⎡ 1 0 0 ⎤          ⎡ 1 1 1 ⎤
              ⎢ 0 1 0 ⎥ − (1/3)  ⎢ 1 1 1 ⎥
              ⎣ 0 0 1 ⎦          ⎣ 1 1 1 ⎦
Note:
1) C’ = C and C² = C
2) C 1 = O
3) C J = J C = O

Remark: The mean and sum of squares about the mean for the data x1, x2, … , xn can be expressed in terms of 1-vectors and J-matrices.
Let x’ = [ x1 x2 … xn ]. Then

1) Sample mean in matrix form:
   x̄ = (1/n) Σ_{i=1}^{n} xi = (1/n) x’1 = (1/n) 1’x

2) Each observation as a deviation from x̄, in matrix form:
   [ x1 − x̄   x2 − x̄   …   xn − x̄ ] = x’ − x̄ 1’ =

Exercise:
Show that x’C x / (n − 1) is the sample variance.
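A numerical check of this exercise (not a proof) can be run in SAS/IML: build the centering matrix, form x’Cx/(n−1), and compare it with the built-in sample variance. The data values below are arbitrary, and the var function is assumed to be available (it exists in recent SAS/IML releases).

  proc iml;
    x = {2, 5, 7, 10, 11};               /* arbitrary data, n = 5              */
    n = nrow(x);
    C = I(n) - (1/n) * j(n, n, 1);       /* centering matrix Cn = In - (1/n)Jn */
    quadForm  = t(x) * C * x / (n - 1);  /* x'Cx / (n - 1)                     */
    sampleVar = var(x);                  /* built-in sample variance           */
    print quadForm sampleVar;            /* the two values should agree        */
  run;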
3.3 IDEMPOTENT MATRICES

Defn: A matrix A is idempotent if and only if A² = A.

Example 3.3.1
Identity matrices, square null matrices, J̄n = (1/n) Jn

Defn: A matrix A satisfying A² = O is called nilpotent, and one for which A² = I could be called unipotent.

Example 3.3.2
1) A =

⎡  1  2   5 ⎤
⎢  2  4  10 ⎥
⎣ −1 −2  −5 ⎦   is nilpotent
2) B =

⎡ 1 0  2  4 ⎤
⎢ 0 1  3  5 ⎥
⎢ 0 0 −1  0 ⎥
⎣ 0 0  0 −1 ⎦   is unipotent
Remarks:
1. Idempotent matrices are necessarily square; otherwise A² does not exist.
2. When A is idempotent, Ar = A, r = 1, 2, …

Theorem: Let A and B be idempotent matrices. Then
a) A + B is idempotent if AB = BA = O.
b) AB is idempotent if AB = BA.
c) ( I – A ) is idempotent, but not ( A – I ).
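The nilpotent and unipotent claims of Example 3.3.2 can be checked numerically (using the matrices as reconstructed above); a minimal SAS/IML sketch:

  proc iml;
    A = { 1  2   5,
          2  4  10,
         -1 -2  -5};
    B = {1 0  2  4,
         0 1  3  5,
         0 0 -1  0,
         0 0  0 -1};
    A2 = A * A;        /* should be the 3 x 3 null matrix: A is nilpotent */
    B2 = B * B;        /* should be the 4 x 4 identity:    B is unipotent */
    print A2 B2;
  run;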
3.4 ORTHOGONAL MATRICES

Defn: The norm of a real vector x’ = [ x1 x2 … xn ] is defined as

‖x‖ = (x’x)^(1/2) = ( Σ_{i=1}^{n} xi² )^(1/2)

Example 3.4.1 Let x’ = [ 1 2 2 4 ]
Defn: A vector x is said to be a unit vector (normal vector) when its norm is unity, i.e., x’x = 1.

Example 3.4.2 Let x’ = [ 0.2 0.4 0.4 0.8 ]
Note: Given a non-null vector x, let u = x / (x’x)^(1/2). Then u is the normalized form of x, since

u’u = [ x’ / (x’x)^(1/2) ] [ x / (x’x)^(1/2) ] = x’x / x’x = 1.
Example 3.4.3
Let x’ = [ 1 2 2 4 ].  Then ‖x‖ = 5 and u’ = [ 1/5  2/5  2/5  4/5 ].
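The norm and the normalized form of Example 3.4.3 can be verified directly; a minimal SAS/IML sketch:

  proc iml;
    x = {1, 2, 2, 4};
    normX = sqrt(t(x) * x);    /* norm of x: sqrt(x'x) = 5     */
    u = x / normX;             /* normalized (unit) form of x  */
    check = t(u) * u;          /* should equal 1               */
    print normX u check;
  run;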
Defn: The non-null vectors x and y are said to be orthogonal when x’y = 0 (or y’x = 0).

Example 3.4.4
1) Let x’ = [ 2 3 4 ] and y’ = [ −1 −2 2 ]. Then x’y =
2) Let x’ = [ 1 2 2 4 ] and y’ = [ 6 3 −2 −2 ]. Then x’y =
Defn: The vectors x and y are defined as orthonormal when they are normal and orthogonal, i.e., x’x = y’y = 1 and x’y = y’x = 0.

Example 3.4.5
Let x’ = [ 1/6  1/6  3/6  3/6  4/6 ] and y’ = [ −1/10  −9/10  −1/10  −1/10  4/10 ]
Defn: A group, or collection, of vectors all of the same order is called a set of vectors. Defn: A set of vectors xi for i = 1, 2, … , n is said to be an orthonormal set of vectors
when every vector in the set is normal, xi ’ xi = 1 for all i, and when every pair
of different vectors in the set is orthogonal, xi ’ x j = 0 for i ≠ j = 1, 2, … , n.
Remarks:
1. The vectors of an orthonormal set are all normal, and pairwise orthogonal.
2. A matrix Arxc whose rows constitute an orthonormal set of vectors is said to have orthonormal rows, whereupon AA’ = Ir. But then A’A is not necessarily an identity matrix Ic. Conversely, when Arxc has orthonormal columns, A’A = Ic but AA’ may not be an identity matrix.
Example 3.4.6
Let A = ⎥⎦
⎤⎢⎣
⎡010001 . Then AA ’ =
and AA' =
Defn: Let A be a square matrix. Then A is said to be an orthogonal matrix if A’A = AA’ = I.

Remark: A orthogonal ⇒ A has orthonormal rows and orthonormal columns, since square matrices with orthonormal rows have orthonormal columns.

Example 3.4.7
1) Show that A = 6
1
⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡
−−
211033222 is an orthogonal matrix.
2) I n is an orthogonal matrix.
Remark: Let An and Bn be orthogonal matrices. Then AB is orthogonal.
Proof:

3.5 QUADRATIC FORMS

Defn: Let xnx1 be a vector and An a square matrix. Then the product x’Ax is called a quadratic form.
Example 3.5.1

1) Let x’ = [ x1 x2 x3 ] and A = ⎡ 1 2 −5 ⎤
                                 ⎢ 4 4  8 ⎥
                                 ⎣ 6 3  3 ⎦ .
   Then x’Ax =

2) Let x’ = [ x1 x2 ] and A = ⎡ a11 a12 ⎤
                              ⎣ a21 a22 ⎦ .
   Then x’Ax =
The results are quadratic functions of the x’s; hence the name quadratic form.
Note: If x’ = [ x1 x2 … xn ] and A = {aij}, i, j = 1, 2, …, n, then

x’Ax = Σ_{i=1}^{n} Σ_{j=1}^{n} aij xi xj

     = Σ_{i=1}^{n} aii xi² + Σ_{i≠j} aij xi xj

     = Σ_{i=1}^{n} aii xi² + Σ_{i<j} aij xi xj + Σ_{i>j} aij xi xj

     = Σ_{i=1}^{n} aii xi² + Σ_{i<j} ( aij + aji ) xi xj

Thus, there is no unique matrix A for which any particular quadratic form can be expressed as x’Ax.
Example 3.5.2

x’Ax = x’ ⎡ 1 2 −5 ⎤         is the same as   x’Bx = x’ ⎡ 1 1 0 ⎤
          ⎢ 4 4  8 ⎥ x                                  ⎢ 5 4 6 ⎥ x
          ⎣ 6 3  3 ⎦                                    ⎣ 1 5 3 ⎦
The quadratic form is the same even though the associated matrix A in the first product is different from matrix B of the second product. A and B have the same diagonal elements, and in each of them the sum of each pair of symmetrically placed off‐diagonal elements aij and a ji are the same.
Remark: For any particular quadratic form, there is a unique symmetric matrix A for which the quadratic form can be expressed as x’Ax. It can be found in any particular case by rewriting the quadratic form x’Ax, where A is not symmetric, as x’[ ½( A + A’ ) ]x, because ½( A + A’ ) is symmetric.
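A small numerical check of this remark can be run in SAS/IML, using the non-symmetric matrix of Example 3.5.1 (as reconstructed above) and an arbitrary x; the quadratic form is unchanged when A is replaced by its symmetric counterpart ½(A + A’).

  proc iml;
    A = {1 2 -5,
         4 4  8,
         6 3  3};                /* a non-symmetric matrix           */
    S = (A + t(A)) / 2;          /* its unique symmetric counterpart */
    x = {1, 2, 3};               /* an arbitrary vector              */
    q1 = t(x) * A * x;           /* quadratic form using A           */
    q2 = t(x) * S * x;           /* quadratic form using (A + A')/2  */
    print S q1 q2;               /* q1 and q2 are equal              */
  run;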
Example 3.5.3
Axx' = x ’
⎥⎥⎥⎥
⎦
⎤
⎢⎢⎢⎢
⎣
⎡
32152
12
15432
131 x
Hence, if A is symmetric, i.e., aij = aji, we can express x’Ax as

x’Ax = Σ_{i=1}^{n} aii xi² + 2 Σ_{i<j} aij xi xj
Therefore, when dealing with quadratic forms, we can always take A as symmetric. This will be convenient not only because the symmetric A is unique for any particular quadratic form, but also because symmetric matrices have many properties that are useful in studying quadratic forms, particularly those associated with analysis of variance.
Hereafter, whenever we deal with a quadratic form x’Ax, we assume A = A’.
( or if not, we express A in terms of its symmetric counterpart )

3.6 NON-NEGATIVE DEFINITE MATRICES

All quadratic forms x’Ax are zero for x = o. For some matrices A, the corresponding quadratic form is zero only for x = o.

EXAMPLE 3.6.1
Let A = ⎡ 1 1 0 ⎤
        ⎢ 1 3 2 ⎥
        ⎣ 0 2 6 ⎦ .   Then x’Ax =
Defn: When x’Ax > 0 for all x other than x = o, then x’Ax is a positive definite quadratic form, and A = A’ is correspondingly a positive definite (p.d.) matrix.

There are also symmetric matrices A for which x’Ax is zero for some non-null x as well as for x = o.

EXAMPLE 3.6.2
Let A = ⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡
−−−−−−
173243132
24237. Then Axx' =
Defn: When x’Ax ≥ 0 for all x and x’Ax = 0 for some x ≠ o, then x’Ax is a positive semidefinite quadratic form, and hence A = A’ is a positive semidefinite (p.s.d.) matrix.

Notations: A > 0 ⇒ A is p.d. ;   A ≥ 0 ⇒ A is p.s.d.

Remarks:
1. Positive definite and positive semidefinite matrices are called non-negative definite (n.n.d.) matrices.
2. All symmetric idempotent matrices are p.s.d. (except I, which is the only p.d. idempotent matrix).

Verify:
1. ( I − J̄ ) is idempotent, hence p.s.d.
2. Σ_{i=1}^{n} ( xi − x̄ )² = x’C x is p.s.d., because it is positive except for being zero when all the xi’s are equal.

Reading Assignment: Searle, Chapter 4: Determinants, pp. 84-118. Do the exercises for practice.
4 DETERMINANTS

4.1 DEFINITIONS

Defn: Let S = { 1, 2, … , n } be the set of integers from 1 to n, arranged in ascending order. A
rearrangement j1j2j3…jn of the elements of S is called a permutation of S. The total number of permutations of S is n!. We denote the set of all permutations of S by Sn. Example 4.1
S = { 1, 2, 3 } S3 = { 123, 132, 213, 231, 312, 321 } → 3! = 6 permutations Defn: A permutation j1j2j3…jn of S is said to have an inversion if a larger integer, say jq, precedes
a smaller one, say, jr. A permutation is called an even permutation if the total number of inversions in it is even, or odd if the total number of inversions in it is odd.
If n ≥ 2, there are n!/2 even and n!/2 odd permutations in S_n.
Example 4.2 1) S1 has 1 permutation: 1, which is even ( no inversion ) 2) S2 has 2 permutations: 12, which is even (no inversion) and 21, which is odd ( 1 inversion ) 3) In the permutation 4312 in S4, the total number of inversions is 5. Thus, 4312 is an odd
permutation. 4) In the permutation 15342 in S5, the total number of inversions is 5, and 51342 has 6
inversions.
Defn 1: Let A = {a_ij} be an n x n matrix. The determinant of A, denoted by det(A) or |A|, is defined by
|A| = Σ_{S_n} (±) a_{1j1} a_{2j2} a_{3j3} ... a_{njn}
where the summation is over all permutations j1 j2 j3 ... jn of the set S = { 1, 2, ..., n }. The sign is taken as (+) when the permutation is even and (-) when it is odd.
Illustration:
1) A = [ a11 a12; a21 a22 ];  S = { 1, 2 }: thus |A| =
2) B = [ b11 b12 b13; b21 b22 b23; b31 b32 b33 ];  S = { 1, 2, 3 }: thus |B| =
Example 4.3
Let A = [ 1 2 1; 4 9 1; 3 6 1 ]. Then |A| =
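A possible working for this blank, using the cofactor expansion introduced below (the value agrees with the MDETERM/DET output shown in the chapter appendix): expanding about the first row,
|A| = 1(9·1 - 1·6) - 2(4·1 - 1·3) + 1(4·6 - 9·3) = 3 - 2 - 3 = -2.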
• Basket Rule for 2 x 2 and 3 x 3 matrices:
|A| = det[ a11 a12; a21 a22 ] =
|B| = det[ b11 b12 b13; b21 b22 b23; b31 b32 b33 ] =
Remarks: 1. Determinants are defined only for square matrices. The determinant of a non square
matrix is undefined and therefore does not exist. 2. det(scalar) = scalar.
Defn: Let A = {a_ij} be an n x n matrix. Let M_ij be the (n-1) x (n-1) submatrix of A obtained by deleting the ith row and jth column of A. The determinant |M_ij| is called the minor of a_ij.
Defn: Let A = {a_ij} be an n x n matrix. The cofactor, A_ij, of a_ij is defined as A_ij = (-1)^(i+j) |M_ij|. (no. of cofactors = n²)
Determinant, Def'n. 2: (Using cofactor expansion)
Let A = {a_ij} be an n x n matrix. Then
|A| = Σ_{j=1}^{n} a_ij A_ij, for any i, i.e., |A| = Σ_{j=1}^{n} a_ij (-1)^(i+j) |M_ij|   (expansion of |A| about the ith row)
|A| = Σ_{i=1}^{n} a_ij A_ij, for any j, i.e., |A| = Σ_{i=1}^{n} a_ij (-1)^(i+j) |M_ij|   (expansion of |A| about the jth column)
Note:
1) The cofactor expansion is used recurrently when n is large, i.e., each M ij is expanded by
the same procedure. 2) Expansion about any row will produce a determinant which is the same as when expansion
is done about any column. 3) Expansion should be done about the row/column which has the largest no. of zeros.
Computations:
A. First-Order Determinant
|a| = a, for a ∈ ℝ
Example 4.4
|7| = ,   |18| = ,   |-5| =
B. Second‐Order and Third‐Order Determinants
‐ use Basket Rule Example 4.5
1) A = ⎥⎦
⎤⎢⎣
⎡−− 4356
, then A =
2) B = ⎥⎦
⎤⎢⎣
⎡3109
, then B =
3) D =
⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡
−
−
963852741
, then D =
4) F =
⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡−−
040530261
, then F =
C. Higher‐Order Determinants
- use cofactor expansion
Example 4.6
1) A = [ 1 2 -3 4; -4 2 1 3; 3 0 0 -3; 2 0 -2 3 ],  |A| =
2) B =
⎥⎥⎥⎥
⎦
⎤
⎢⎢⎢⎢
⎣
⎡
−−
1462132010011131
, B =
4.2 Properties of Determinants
The following results are theorems (for proofs, see Searle). Let A = { }aij be an n x n matrix
1) |A'| = |A|.
→ since expansion about a row is equivalent to expansion about the corresponding column
2) If 2 rows (cols) of A are the same, then |A| = 0.
→ since if A has 2 rows which are the same, we can expand |A| by minors so that the 2 x 2 minors in the last step of the expansion come from 2 equal rows. Then det[ a b; a b ] = ab - ab = 0 for all such minors ⇒ |A| = 0
3) If one row (col) of a matrix is a multiple of another row (col), the determinant is 0.
→ factor out the constant (multiplier) to produce a determinant with 2 rows (cols) the same
4) If A has a zero col (row), then |A| = 0.
→ expand about that col (row)
5) If A is a triangular matrix, then |A| = Π_{i=1}^{n} a_ii.
→ use cofactor expansion recurrently along the row/col with the most 0's
6) If A is a diagonal matrix, then |A| = Π_{i=1}^{n} a_ii.
7) When a nonzero scalar λ is a factor of a row (col) of A, then it is also a factor of |A|, i.e.,
|A| = λ · |A with λ factored out of that row (col)|
Example 4.7
A = [ 4 6; 1 7 ]  ⇒  |A| =
8) If λ is a scalar, |λA| = λ^n |A|.
9) If A is skew-symmetric and n is odd, then |A| = 0.
→ |A| = |A'| = |-A| = (-1)^n |A|; with n odd this gives |A| = -|A|, and a = -a iff a = 0, ∴ |A| = 0.
10) If A and B are square matrices of the same order, then |AB| = |A| |B|.
11) For A and B square matrices of the same order, |AB| = |BA|.
→ since |A| |B| = |B| |A|.
12) |A^k| = |A|^k, where k is a positive integer.
13) If A is orthogonal, then |A| = ±1.
→ since AA' = I and 1 = |I| = |A| |A'| = |A|²  ⇒  |A| = ±1
14) If A is idempotent, then |A| = 0 or 1.
→ since A² = A ⇒ |A|² = |A|  ⇒  |A| = 0 or 1
15) For A and B square matrices of the same order, if AB = I, then |A| ≠ 0 and |B| ≠ 0.
→ since |AB| = |A| |B| = |I| = 1  ⇒  |A| and |B| ≠ 0
16) For A and B square matrices of the same order, det[ O A; -I B ] = |A|.
17) If A and B are square matrices, not necessarily of the same order, then det[ A O; O B ] = |A| |B|.
18) If A, B and C are matrices of order n x n, then det[ A O; C B ] = |A| |B|.
4.3 Elementary Row Operations Defn: An elementary row (col) operation on a matrix A
mxn is any one of the following
operations: a) Type I operation: interchange row (col) i and row (col) h. b) Type II operation: multiply row (col) i by c ≠ 0. c) Type III operation: add a multiple of row (col) i to row (col) h, i ≠ h.
4.3. A. How a Type I elementary row or column operation can be done using matrix operations:
Let A = [ 4 3 2; 3 6 1; 2 7 1 ].
To interchange the 2nd and 3rd rows of A, we pre-multiply A by
E* = [ 1 0 0; 0 0 1; 0 1 0 ]
(the identity matrix with its 2nd and 3rd rows interchanged). Doing the multiplication gives
B = E* A = [ 4 3 2; 2 7 1; 3 6 1 ].
To switch the 2nd and 3rd columns of A instead, we post-multiply A by E*:
B = A E* = [ 4 2 3; 3 1 6; 2 1 7 ].
Exercise (Assignment):
1. Using the matrix A above, switch the 1st row with the 2nd row, by defining a new matrix to
pre‐multiply to A .
2. Using the matrix A above, switch the 1st column with the 3rd column, by defining a new
matrix to post-multiply to A.
4.3. B. How Type II elementary row and column operations are done:
Pre-multiply the matrix G = [ a 0 0; 0 b 0; 0 0 c ] to A. Post-multiply the matrix G to A. What are the results? What have you noticed?
4.3. C. How Type III elementary row and column operations are done:
Examples:
1. Adding the first column of A to its third column:
[ 4 3 2; 3 6 1; 2 7 1 ] [ 1 0 1; 0 1 0; 0 0 1 ] =
2. Adding the third row to the first row:
[ 1 0 1; 0 1 0; 0 0 1 ] [ 4 3 2; 3 6 1; 2 7 1 ] =
Exercise (Assignment):
1. Add the second row of A to its third row and show the matrix multiplication that does this.
2. Add the second column of A to its first column and show the matrix multiplication that does this.
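The Type II and Type III questions above can also be answered by experiment in PROC IML; a minimal sketch (the particular nonzero values chosen for a, b, c are arbitrary):
proc iml;
A  = {4 3 2, 3 6 1, 2 7 1};
G  = {2 0 0, 0 5 0, 0 0 7};   /* Type II: diagonal matrix with nonzero a, b, c */
E3 = {1 0 1, 0 1 0, 0 0 1};   /* Type III matrix used in the examples above    */
GA = G*A;  AG = A*G;          /* row scaling vs. column scaling                */
EA = E3*A; AE = A*E3;         /* row operation vs. column operation            */
print GA AG EA AE;
run;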
Additional Results on Determinants: Let A be an n x n matrix
1) If matrix B is obtained from A by interchanging 2 rows (cols) of A, then |B| = -|A|.
Example: Solve for |E*|, |A| and |E*A|. Show that |E*| |A| = |E*A|. What is the form of |E*|?
2) If B is obtained from A by multiplying a row (col) of A by a real no. k, then |B| = k|A|.
Example: Solve for |G|, |AG| and |GA|. Show that |G| |A| = |GA| and |A| |G| = |AG|. What is the form of |G|?
3) If B is obtained from A by adding a multiple of row (col) i to row (col) h, i ≠ h, then |B| = |A|.
Example 4.8
1) Let A = [ 2 1 3; 1 1 0; 4 1 3 ] and B = [ 3 1 2; 0 1 1; 3 1 4 ] (B is A with its 1st and 3rd columns interchanged). Then |A| =
and |B| =
2) If |A| = det[ 2 1; 2 3 ] = 6 - 2 = 4, then 2|A| =
but |2A| =
3) Let A = ⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡
− 122640101
, then A =
4.4 Diagonal Expansion Defn: Deleting any r rows and r cols from a square matrix of order n leaves a submatrix of
order (n – r). The determinant of this submatrix is a minor of order (n – r), or an (n – r) ‐ order minor.
Defn: A principal minor is a minor whose diagonal elements are coincident with the diagonal
elements of the original matrix.
A matrix, say X , can always be expressed as the sum of two matrices, one of which is a diagonal matrix, i.e.,
X = DA + where A = { }aij for i , j = 1, 2, …, n and
D is a diagonal matrix of order n.
The determinant of X can then be obtained as a polynomial of the elements of D .
Consider the matrices A = {a_ij}, i, j = 1, 2 and D = diag{d1, d2}. Then |A + D| =
In similar fashion, it can be shown that
det[ a11+d1  a12     a13
     a21     a22+d2  a23
     a31     a32     a33+d3 ]  =
Considered as a polynomial in the d ’s, we can see that i) 1 is the coefficient of the product of all the d ’s.
ii) diagonal elements of A are the coefficients of the 2nd‐degree terms in the d ’s.
iii) 2nd order principal minors of A are coefficients of the 1st –degree terms in the d ’s.
iv) A is the term independent of d ’s.
This method of expansion is known as expansion by diagonal elements or simply
diagonal expansion. This method of expansion is useful on many occasions because the
determinantal form DA + occurs quite often, and when A is such that many of its
principal minors are zeros, the expression DA + by this method is greatly simplified.
Example 4.9
1) Let X = [ 7 2 2; 2 8 2; 2 2 9 ], then we have |X| = |A + D| =
2) |X| = det[ 4 4 6; 1 2 4; 2 4 6 ] =
Remark: If D is a scalar matrix, i.e., the d i ’s are equal, then
DA + =
The general diagonal expansion of a determinant of order n, DA + consists of the
sum of all possible products of the d i ’s taken r at a time for r = n, n‐1, …, 2, 1, 0, each product
being multiplied by its complementary principal minor of order (n‐r) in A . By
complementary principal minor in A is meant the principal minor having diagonal elements
other than those associated in DA + with the d ’s of the particular product concerned.
When all the d's are equal, the expression becomes
|A + D| = Σ_{i=0}^{n} d^(n-i) tr_i(A)
where tr_i(A) is the sum of the principal minors of order i of A. By definition, tr_0(A) = 1 and tr_n(A) = |A|.
4.5 Sums and Differences of Determinants
1) In general, |A + B| ≠ |A| + |B|
Example 4.10:  |A| + |B| =
2) In general, |A - B| ≠ |A| - |B|
Note: If A is an n x n matrix and B is an m x m matrix (m ≠ n), |A| ± |B| is defined but A ± B (and hence |A ± B|) is not.
3) If A = {a_ij} and B = {b_ij} are n x n matrices that are identical for all elements except for the corresponding elements in the kth row, and if C = {c_ij} is an n x n matrix, then |A| + |B| = |C|, where c_ij = a_ij except in the kth row, in which c_kj = a_kj + b_kj, j = 1, 2, ..., n.
Proof: Exercise
READING ASSIGNMENT: Read Chapter 5: Inverse Matrices, pp. 119‐139. Do the Exercises on pp 148‐154 (for inverse matrices) and pp. 112 to 118 (for determinants)
CHAPTER 4 APPENDIX A: EVALUATING DETERMINANTS USING EXCEL AND SAS We can also use Excel and SAS to simply get the determinants, without going to the trouble of using the basket rule, the cofactor expansion methods, or diagonalization. In Excel, we simply use the =MDETERM(array) function.
Example, we have A =
⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡
163194121
.
To solve for A , we type the matrix and use the
function on another cell. By the output, A = ‐2.
In SAS, this is simply the DET() function: SAS Code: proc iml; A= {1 2 1, 4 9 1, 3 6 1}; B= Det(A); print A B; run; SAS Output: The SAS System A B 1 2 1 -2 4 9 1 3 6 1
CHAPTER 4 APPENDIX B: HELPFUL MATRIX FUNCTIONS IN SAS Recently going through the IML Language Reference in SAS, I’d like to give you some of the functions that may help you in using SAS for matrix algebra. Generally, these functions will show you how to make the special matrices that we use in class, such as the identity matrix, matrix of ones, and diagonal matrices. I’ll thrown in more functions in other appendices in later chapters. 1. ABS(matrix) = it gives the absolute values of the elements of the original matrix. Example: proc iml; A={1 2 -3 4, -4 2 1 3, 3 0 0 -3, 2 0 -2 3}; Abs_A = abs(A); print A Abs_A; run; A ABS_A 1 2 -3 4 1 2 3 4 -4 2 1 3 4 2 1 3 3 0 0 -3 3 0 0 3 2 0 -2 3 2 0 2 3 2. BLOCK(matrix1 <,matrix2,…,matrix15>) = it create a matrix with submatrices arranged diagonally. Example: proc iml; a={2 2, 4 4} ; b={6 6, 8 8} ; c=block(a,b); print c; run; C 2 2 0 0 4 4 0 0 0 0 6 6 0 0 8 8
3. DIAG(argument) = if the argument is a matrix, it returns with the diagonal elements. If the argument is a vector, then it gives a diagonal matrix with the elements on the argument. Example: proc iml; a={4 3,2 1}; c=diag(a); b={1 2 3}; d=diag(b); print c d; run;
C D 4 0 1 0 0 0 1 0 2 0 0 0 3
4. EXP(matrix) = calculates the exponential at each element of the matrix Example: proc iml; b={2 3 4}; a=exp(b); print a; run; A 7.3890561 20.085537 54.59815
5. I(dimension) = it gives the identity matrix of the given dimension Example: proc iml; a=I(3); print a; run; A 1 0 0 0 1 0 0 0 1
6. J(nrow <, ncol <, value > > ) = it gives a matrix with a common value Example: proc iml; b=j(3); r=j(5,2,'xyz'); k=j(4)*4; print b r k; run; B R K 1 1 1 xyz xyz 4 4 4 4 1 1 1 xyz xyz 4 4 4 4 1 1 1 xyz xyz 4 4 4 4 xyz xyz 4 4 4 4 xyz xyz
7. T(matrix) = this is another function that gives the transpose of the original matrix argument. Example: proc iml; x={1 2, 3 4}; y=t(x); print x y; run; X Y 1 2 1 3 3 4 2 4
8. XMULT(matrix1, matrix2) = also performs matrix multiplication but with greater accuracy. Example: proc iml; x={1 2, 3 4}; y=t(x); z=xmult(x,y); print x y z; run; X Y Z 1 2 1 3 5 11 3 4 2 4 11 25
5. INVERSE MATRICES
Division in the ordinary sense is undefined in matrix algebra. The expression B/A has no meaning when A is a matrix. We deal with inverse matrices instead and use them as multipliers, e.g., A⁻¹B, where the product of a matrix and its inverse is an identity matrix.
The concept of a matrix inverse has been established in the context of solving
simultaneous linear equations. Consider the linear system Ax = b that has a unique solution. A solution for the above system may be obtained if there exists a matrix B such that BA = I . Then the solution is x = Bb . We call B the inverse of A.
5.1 Definition of the Inverse
Defn: If for a given n x n matrix A there is an n x n matrix, denoted by A⁻¹, such that
A A⁻¹ = A⁻¹ A = I_n,
then A⁻¹ is an inverse of A with respect to matrix multiplication. An n x n matrix A is said to be invertible if A⁻¹ exists and noninvertible if A does not have an inverse.
Example 5.1
1) A = [ 1 2; 3 4 ],  A⁻¹ = [ -2 1; 3/2 -1/2 ]
2) B = [ 2 1 0; 0 2 1; 3 0 2 ],  B⁻¹ = (1/11) [ 4 -2 1; 3 4 -2; -6 3 4 ]
5.2 Adjoint of a Matrix
Defn: Let A_{nxn} = {a_ij}. The matrix B = {A_ij}' = {A_ji}, where A_ij is the cofactor of a_ij, is called the adjoint matrix of A, denoted by adj A.
How to get the adjoint:
1. Solve for the matrix of cofactors of the elements of A, which is cof A = {A_ij} = {(-1)^(i+j) |M_ij|} for all i and j.
2. The adjoint of A is simply the transpose of the matrix of cofactors, that is: adj A = (cof A)'.
Example 5.2
1) A = [ 1 2; 3 4 ],  adj A =
2) B = [ 2 1 0; 0 2 1; 3 0 2 ],  adj B =
3) F = [ 1 2 4; 0 2 1; 1 3 0 ],  adj F =
Note: |A| = Σ_{j=1}^{n} a_ij A_ij = Σ_{i=1}^{n} a_ij A_ij, while
Σ_{j=1}^{n} a_ij A_kj = a_i1 A_k1 + a_i2 A_k2 + ... + a_in A_kn = 0 for i ≠ k, and
Σ_{i=1}^{n} a_ij A_ik = a_1j A_1k + a_2j A_2k + ... + a_nj A_nk = 0 for j ≠ k.
⇒ each of these represents expansion about the kth row (kth col) of a matrix whose kth row (col) has been replaced by its ith row (jth col), and so is the "determinant of a matrix" having 2 rows (cols) the same ⇒ the determinant is 0.
Defn: An n x n matrix A is invertible if and only if |A| ≠ 0. Moreover, if n ≥ 2 and if A⁻¹ exists, then
A⁻¹ = (1/|A|) adj A
Show: A (1/|A|) adj A = I_n by definition of inverse
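For a nonsingular A, the relation A⁻¹ = (1/|A|) adj A can also be turned around to compute the adjoint numerically as adj A = |A| A⁻¹. A minimal PROC IML sketch along those lines, using the matrix B of Example 5.1/5.2:
proc iml;
B = {2 1 0, 0 2 1, 3 0 2};
adjB  = det(B) * inv(B);     /* adj(B) = |B| * inverse of B, valid since |B| is nonzero */
check = B * adjB / det(B);   /* should reproduce the identity matrix                    */
print (det(B)) adjB check;
run;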
Example 5.3
1) A = [ 1 2; 3 4 ],  A⁻¹ =
2) B = [ 2 1 0; 0 2 1; 3 0 2 ],  B⁻¹ =
3) F = [ 1 2 4; 0 2 1; 1 3 0 ],  F⁻¹ =
5.3 Conditions for the Existence of A 1−
i) A 1− can exist only when A is square. ii) A 1− does exist only if A ≠ 0 .
Defn: A square matrix A is said to be singular if A =0 , and nonsingular if A ≠ 0 .
Singularity is a property of square matrices only; and only nonsingular matrices
have inverses. 5.4 Properties of the Inverse
If A is a square, nonsingular matrix, its inverse A⁻¹ has the following properties:
1) A⁻¹ A = A A⁻¹ = I. Proof:
2) A⁻¹ is unique. Proof:
3) If k is a nonzero scalar, then (kA)⁻¹ = (1/k) A⁻¹. Proof:
4) |A⁻¹| = 1/|A|. Proof:
5) A⁻¹ is nonsingular. Proof:
6) (A⁻¹)⁻¹ = A. Proof:
7) (A')⁻¹ = (A⁻¹)'. Proof:
8) If A = A', then (A⁻¹)' = A⁻¹. Proof:
9) If A⁻¹ and B⁻¹ exist, then (AB)⁻¹ = B⁻¹ A⁻¹. Proof:
In general, if A_1, A_2, ..., A_{r-1}, A_r are n x n nonsingular matrices, then
(A_1 A_2 ... A_{r-1} A_r)⁻¹ = A_r⁻¹ A_{r-1}⁻¹ ... A_2⁻¹ A_1⁻¹.
10) If A and B are n x n matrices and if OAB = , then OA = or OB = or both
A and B are noninvertible. Proof: 11) If A, B and F are n x n matrices and if A is invertible, then AFAB =
implies that FB = . (cancellation property for matrix multiplication over the set of all invertible n x n matrices)
Proof: 5.5 Some Special Cases 1) Inverse of order 2
Let A = [ a x; y b ]. Then A⁻¹ = 1/(ab - xy) [ b -x; -y a ], for ab - xy ≠ 0.
→ if the determinant of A_{2x2} is nonzero, interchange the diagonal elements, change the sign of the off-diagonal elements, and divide by ab - xy.
Diagonal Matrices
If D = {d_ii} is a diagonal matrix with d_ii ≠ 0 ∀ i, then D⁻¹ = {1/d_ii}.
Special case: identity matrices
Example 5.4
[ 3 0 0; 0 5 0; 0 0 8 ]⁻¹ = [ 1/3 0 0; 0 1/5 0; 0 0 1/8 ]
2) I_n and J_n matrices
I_n⁻¹ = I_n
|J_n| = 0, and so J_n has no inverse, but for a ≠ 0 and a + nb ≠ 0,
(a I_n + b J_n)⁻¹ = (1/a) ( I_n - (b/(a + nb)) J_n ).
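One way to verify this formula is to multiply out, using the fact that J_n² = n J_n (a sketch):
(aI + bJ) (1/a)( I - (b/(a+nb)) J )
  = I + (b/a) J - (b/(a+nb)) J - (b²/(a(a+nb))) J²
  = I + [ b/a - b/(a+nb) - nb²/(a(a+nb)) ] J
  = I + [ ( b(a+nb) - ab - nb² ) / ( a(a+nb) ) ] J
  = I,
since the bracketed coefficient is zero.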
3) Orthogonal Matrices
If P is orthogonal, |P| ≠ 0, i.e., P⁻¹ exists, and P has the following properties:
i) P is a square matrix;
ii) |P| = ±1;
iii) its rows are orthonormal, i.e., PP' = I;
iv) its columns are orthonormal, i.e., P'P = I.
Properties iii) and iv) are the defining conditions; the other two follow from them. Thus, another definition of an orthogonal matrix P is one such that P' = P⁻¹.
4) Idempotent Matrices
The only idempotent matrices that are nonsingular are identity matrices. But idempotent matrices can involve nonsingular matrices.
EXAMPLE 5.5
P (QP)⁻¹ Q is idempotent when (QP)⁻¹ exists.
CHAPTER 5 APPENDIX: MATRIX INVERSES USING SAS AND EXCEL We can use SAS and Excel to easily get inverses without using the adjoint formula, which could be very tedious to work on. In Excel, to solve the inverse of a matrix, we use the =MINVERSE(array) function. This is an “array function” in Excel, you may the procedures for using this is similar for the =MMULT(array) function used in the Chapter II Appendix. Example:
Let’s say we have A = ⎥⎦
⎤⎢⎣
⎡4321
. We solve for
A 1− by using the function. After inputting the matrix, we input the function and argument on another cell. Again, we need to highlight the space needed to be occupied by the inverse matrix. Then we press F2, followed by Ctrl+Shift+Enter. And the result will be: In SAS, it is simply by using the INV(matrix) function. See the syntax and output below. SAS Code: Output: proc iml; A={1 2, 3 4}; Inv_A = inv(A); print A Inv_A; run;
A INV_A 1 2 -2 1 3 4 1.5 -0.5
6. LINEAR SYSTEMS
6.1 Definitions
Defn: The equation a1 x1 + a2 x2 + ... + an xn = b, which expresses b in terms of the variables x1, x2, ..., xn and the constants a1, a2, ..., an, is called a linear equation in x1, x2, ..., xn. A set of one or more linear equations in the same variables is called a linear system.
Example 6.1
a11 x1 + a12 x2 + ... + a1n xn = b1
a21 x1 + a22 x2 + ... + a2n xn = b2
  ...                                 (1)
am1 x1 + am2 x2 + ... + amn xn = bm
is a system of m linear equations in n unknowns.
In matrix form, (1) can be written as Ax = b where
A_{mxn} = [ a11 a12 ... a1n
            a21 a22 ... a2n
             ...
            am1 am2 ... amn ]  : coefficient matrix of the system,
x_{nx1} = ( x1, x2, ..., xn )'  : vector of unknowns;
b_{mx1} = ( b1, b2, ..., bm )'  : vector of constants (known values).
Defn: A set of values for the variables that satisfy each equation in the system defined in ( )1 , is called a solution of that linear system.
Note: To solve a linear system means to find all its solutions. Remarks:
1) If the linear system in (1) has no solution, it is said to be inconsistent; otherwise, it is called consistent.
2) If b = o, i.e., b1 = b2 = ... = bm = 0, then (1) is called a homogeneous system.
3) The solution x = o to a homogeneous system is called a trivial solution. A solution to a homogeneous system in which not all of x1, x2, ..., xn are zero is called a nontrivial solution.
6.2 Gaussian Elimination and Gauss‐Jordan Reduction
Defn: An m x n matrix A is said to be in reduced row (col) echelon form if it satisfies the following properties:
i) All rows (cols) consisting entirely of zeros, if any, are at the bottom (right) of the
matrix. ii) The first nonzero entry in each row (col) that does not consist entirely of zeros is a 1,
called the leading entry of its row (col). iii) If rows i and (i+1) (cols j and (j+1)) are 2 successive rows (cols) that do not consist
entirely of zeros, then the leading entry of row (col) i+1 (j+1) is to the right of (located below) the leading entry of row (col) i.
iv) If a col (row) contains a leading entry of some row (col), then all other entries in that col (row) are zero.
If A satisfies i), ii), and iii), it is said to be in row (col) echelon form.
Example 6.2 Matrices in row echelon form:
A = ⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡
310001010002211
, B =
⎥⎥⎥⎥⎥⎥
⎦
⎤
⎢⎢⎢⎢⎢⎢
⎣
⎡
−
−
000000000000271000
843010422051
Matrices in reduced row echelon form:
A=
⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡
000003210010021
, B =
⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡
000000100000011
, C =
⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡
100010001
Matrices not in reduced row echelon form:
A = ⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡−
210052204301
, B =
⎥⎥⎥⎥
⎦
⎤
⎢⎢⎢⎢
⎣
⎡−
0000221052104301
Theorem: Every matrix can be put into row (col) echelon form, or into reduced row (col) echelon
form, by means of elementary row (col) operations.
Defn: An m x n matrix A is said to be row (col) equivalent to an m x n matrix B if B can be
obtained by applying a finite sequence of elementary row (col) operations to A .
Notation: A ∼ B Theorem: Row equivalence is an equivalence relation, i.e.,
a) every matrix is row equivalent to itself, i.e., A ∼ A .
b) If A ∼ B , then B ∼ A .
c) If A ∼ B and B ∼ C , then A ∼ C .
Theorem: Every nonzero m x n matrix A is row (col) equivalent to a matrix in [reduced] row (col) echelon form. Example 6.3
Let A =
⎥⎥⎥⎥⎥⎥
⎦
⎤
⎢⎢⎢⎢⎢⎢
⎣
⎡
−−−−−−
−−−−−
304212273134201
1453213111
a) Find a matrix B in row echelon form that is row equivalent to A .
b) Find a matrix F in reduced row echelon form that is row equivalent to A . Note: There is only one matrix in reduced row echelon form that is row equivalent to a given matrix. (unique)
Let the augmented matrix [ ]bA represent the linear system Ax b = .
Theorem: Let b =Ax and g =Fx be 2 linear systems, each of m equations in n unknowns.
If the augmented matrices [ ]bA and [ ]gF are row equivalent, then the linear
systems are equivalent, i.e., they have exactly the same solutions. Example 6.4
32423
=−−=+
yxyx and
62434
32
=−
−=+
yx
yx are equivalent systems.
Corollary: If A and B are row equivalent m x n matrices, then the homogeneous systems
o =Ax and o =Bx are equivalent.
Solving the Linear System Ax = b:
1) Using Gaussian Elimination (or Backward Substitution)
Obtain a matrix [F g] in row echelon form that is row equivalent to [A b]. ([F g] represents the linear system Fx = g, and the set of solutions to this system is precisely the set of solutions to Ax = b.) Obtain the solution/s using backward substitution.
2) Using Gauss-Jordan Reduction
Obtain a matrix [F g] in reduced row echelon form that is row equivalent to [A b]. The solution can then be read off the reduced row echelon form of the augmented matrix.
Remarks:
1) If [A b] has a zero row, that row is redundant: the solutions are determined by the remaining rows.
2) If [A b] has identical rows, then [A b] is row equivalent to an [F g] with a zero row.
3) If [A b] has a row which is a linear combination of the other rows, then [A b] is row equivalent to an [F g] with a zero row.
4) If a consistent system of m equations in n unknowns has m < n, then the system has infinitely many solutions.
Example 6.5 1) Solve: x1 + 3x2 – 2x3 = 1 2x1 + 5x2 – 3x3 = 2 ‐3x1 + 2x2 – 4x3 = 3 2) Solve: x1 + x2 – x3 – 3x4 – x5 = 0 2x1 + 3x2 – 5x3 – 4x4 + x5 = 0 x1 + 2x3 – 4x4 – 3x5 = 0 ‐x1 – 3x2 + 7x3 + 2x4 – 2x5 = 0 x1 + 2x2 – 4x3 + 3x5 = 0
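A possible working of system 1) by Gauss-Jordan reduction (a sketch; the answer matches the ECHELON output shown in the Chapter 6 appendix):
Start from the augmented matrix [A b] = [ 1 3 -2 1; 2 5 -3 2; -3 2 -4 3 ].
R2 ← R2 - 2R1 and R3 ← R3 + 3R1 give [ 1 3 -2 1; 0 -1 1 0; 0 11 -10 6 ].
R2 ← -R2, then R3 ← R3 - 11R2 give [ 1 3 -2 1; 0 1 -1 0; 0 0 1 6 ].
Back-substituting (or continuing to reduced row echelon form): x3 = 6, x2 = x3 = 6, x1 = 1 - 3(6) + 2(6) = -5.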
3) Solve: x + 2y – z =0 x + 3y + 2z = 0 3x + 8y + 3z = 0 6.3 Homogeneous Systems Defn: A linear system is homogeneous if the constants on the right are all equal to zero, i.e.,
o =Ax is a homogeneous system.
Defn: The solution o =x to a homogeneous system is called a trivial solution. A solution to a homogeneous system in which not all of x1 , x2 , …, xn are zero is called a nontrivial solution.
Defn: The row rank of a matrix is the number of non-zero rows in any of its equivalent row echelon forms. (We denote it by rk(A).)
Some Properties of the Rank of a Matrix
1) rk(A) is a positive integer, except that rk(O) = 0.
2) rk(A_{nxp}) ≤ p and ≤ n; the rank of a matrix equals or is less than the smaller of its number of rows and columns.
3) rk(A_{nxn}) ≤ n; a square matrix has rank not exceeding its order.
4) When rk(A_{nxn}) = n, then A is nonsingular, i.e., A⁻¹ exists.
5) When rk(A_{pxq}) = p < q, A is said to be of full row rank.
6) When rk(A_{pxq}) = q < p, A is said to be of full column rank.
7) When rk(A_{nxn}) = n, A is said to be of full rank.
8) rk(AB) ≤ smaller of rk(A) and rk(B).
9) If M is idempotent of order n, rk(I_n - M) = n - rk(M).
Solution Sets of Homogeneous Linear Systems Let n be the number of variables in a homogeneous system and r be the row rank of the coefficient matrix. 1) If nr = , the system has only the trivial solution. 2) If nr < , we may solve the r nonzero equations in any equivalent row echelon system
for the leading variables in terms of the ( )rn − remaining variables. These ( )rn − variables then become arbitrary constants for the solution set.
Example 6.6 1) Solve: x – 2y + 2z = 0
4x – 7y + 3z = 0 2x – y + 2z = 0
2) Solve: x + 4y – 3z = 0 4x + 16y – 12z = 0
‐3x – 12y + 9z = 0
6.4 General Linear System
Defn: A linear system is said to be consistent if it has at least one solution, otherwise, it is inconsistent.
Remark: Any homogeneous system is consistent.
CONSISTENCY TEST To test an m x n system (m equations, n variables) for consistency, bring the augmented
matrix to row echelon form. If this matrix has a row in which the first n entries equal zero and the (n+1)th entry is nonzero, then the system is inconsistent. If the row echelon matrix has no such row, then the system is consistent.
Corollary: A linear system is consistent if and only if the row rank of the coefficient
matrix equals the row rank of the augmented matrix.
Consider a linear system of 2 equations in the unknowns x1and x2 : cxaxa 12211 =+ l1 → cxbxb 22211 =+ l2 →
Example 6.7 1) Solve: x + 2y = 1 2x – y = 12 3x + 2y = 12 2) Solve: x1 + x2 = 3 x1 + (a2 – 8)x2 = a 3) Solve: x1 + x2 – x3 = 2 x1 + 2x2 + x3 = 3 x1 + x2 + (a2 – 5)x3 = a 6.5 Determinants and Linear Systems
The determinant provides a condition for the existence of a unique solution to a square linear system.
Theorem: A linear system with a square coefficient matrix A , has a unique solution if and only
if 0≠A .
Proof:
Remark: If |A| = 0, then Ax = b may or may not have a solution. When it does, the solution is not unique.
Corollary: A homogeneous linear system with a square coefficient matrix has a non-trivial solution if and only if |A| = 0.
Note: Let A be n x n. |A| = 0 ⇒ rk(A) < n;  |A| ≠ 0 ⇒ rk(A) = n ⇒ A has full rank.
CHAPTER 6 APPENDIX A: USING SAS TO ANSWER LINEAR ALGEBRA QUESTIONS Surprisingly, SAS has functions that can answer questions of systems of linear equations with unknowns. We will cite some helpful function in PROC IML. 1. ECHELON(matrix) = this function reduces a matrix to its row‐echelon form. Example: x1 + 3x2 – 2x3 = 1 2x1 + 5x2 – 3x3 = 2 => => ‐3x1 + 2x2 – 4x3 = 3 SAS Code: proc iml; x={1 3 -2 1, 2 5 -3 2, -3 2 -4 3}; z=echelon(x); print x z; run; X Z 1 3 -2 1 1 0 0 -5 2 5 -3 2 0 1 0 6 -3 2 -4 3 0 0 1 6 This tells us that x1=‐5, x2=6, and x3=6. 2. HOMOGEN(matrix) function = gives us only some of the nontrivial solutions for a homogenous linear system of equations with the matrix argument equal to the matrix of coefficient. [Note that the vector of constants in a homogenous system is the null vector.] The number of nontrivial observations it will give is equal to the rank of the matrix minus one (since the other solution is the null vector). Read SAS help further for more details. Example: five unknown values proc iml; a={22 10 2 3 7, 14 7 10 0 8, -1 13 -1 -11 3, -3 -2 13 -2 4, 9 8 1 -2 4, 9 1 -7 5 -1, 2 -6 6 5 1, 4 5 0 -2 2}; x=homogen(a); print x; run;
(The system of the ECHELON example above, written in matrix form:)
[ 1 3 -2; 2 5 -3; -3 2 -4 ] ( x1, x2, x3 )' = ( 1, 2, 3 )',
with augmented matrix [ 1 3 -2 1; 2 5 -3 2; -3 2 -4 3 ].
X -0.419095 0 0.4405091 0.4185481 -0.052005 0.3487901 0.6760591 0.244153 0.4129773 -0.802217
3. SOLVE(A,B) = solves for the system of linear equations Ax=b, where A is square and nonsingular. The answer that the function will give is equal to x = A‐1b. Example: proc iml; a={1 3 -2, 2 5 -3 , -3 2 -4}; b={1,2,3}; x=solve(a,b); print a b x ; run; A B X 1 3 -2 1 -5 2 5 -3 2 6 -3 2 -4 3 6 The aethereal question that SAS programmers have to address: How to solve for the rank of a matrix? They don’t have a direct function for solving the rank of the matrix. They do have a RANK(matrix) function, but what it does is it solves for the ranking order of the elements of the matrix. What they have is a resolve for solving ranks of matrices, given by the formula below: rank=round(trace(ginv(a)*a)); Where “a” is the matrix argument. For example: proc iml; a={22 10 2 3 7, 14 7 10 0 8, -1 13 -1 -11 3, -3 -2 13 -2 4, 9 8 1 -2 4, 9 1 -7 5 -1, 2 -6 6 5 1, 4 5 0 -2 2}; rank=round(trace(ginv(a)*a)); print rank; run;
Rank was equal to 3.
7. VECTOR SPACES 7.1 Definition of a Vector Space Definition: A (real) vector space is a nonempty set V of elements in which 2 operations ⊕ and Θ are defined with the following properties:
a. If α, β ∈ V then α ⊕ β ∈ V.
  1. α ⊕ β = β ⊕ α, ∀ α, β ∈ V
  2. (α ⊕ β) ⊕ γ = α ⊕ (β ⊕ γ), ∀ α, β, γ ∈ V
  3. There exists a unique element θ ∈ V such that α ⊕ θ = θ ⊕ α = α, ∀ α ∈ V
  4. For each α ∈ V, there exists a unique β ∈ V such that α ⊕ β = β ⊕ α = θ (we denote β by -α, called the negative of α)
b. If α ∈ V and c is any real number then c Θ α ∈ V.
  5. c Θ (α ⊕ β) = (c Θ α) ⊕ (c Θ β), ∀ α, β ∈ V, c ∈ ℝ
  6. (c + d) Θ α = (c Θ α) ⊕ (d Θ α), ∀ α ∈ V, c, d ∈ ℝ
  7. c Θ (d Θ α) = (cd) Θ α, ∀ α ∈ V, c, d ∈ ℝ
  8. 1 Θ α = α, ∀ α ∈ V
The elements of V are called vectors. The operation ⊕ is called vector addition and the operation Θ is called scalar multiplication. The vector θ is called the zero vector. Remark: To verify that a given set V is a vector space, it must satisfy all the properties of the above definition. Check (a) and (b) first, for, if either of these fails, V is not a vector space. Example 7.1.1 Examples of Vector Spaces 1. Let ℜ=V : set of all real numbers ⊕ addition of real numbers; Θ multiplication of real numbers then Θ,,⊕ℜ is a vector space. 2. Let nV ℜ= : set of all n x1 vectors with real components ⊕ vector addition; Θ multiplication of a vector by a real number then Θ,,⊕ℜn is a vector space.
3. Let IV = : set of all integers ⊕ scalar addition; Θ scalar multiplication then Θ,,⊕I is not a vector space. 4. Let MV = : set of all m x n matrices ⊕ matrix addition; Θ multiplication of a matrix by a real number then Θ,,⊕M is a vector space.
5. Let ⎭⎬⎫
⎩⎨⎧
=∋⎥⎦
⎤⎢⎣
⎡== 12
2
1 x21x
xx
xxV |
⊕ vector addition; Θ multiplication of a vector by a constant then Θ,,⊕V is a vector space.
6. Let ⎭⎬⎫
⎩⎨⎧
+=∋⎥⎦
⎤⎢⎣
⎡== 1x
21x
xx
xxV 122
1|
⊕ vector addition; Θ multiplication of a vector by a constant
then Θ,,⊕V is not a vector space since .V00
∉⎥⎦
⎤⎢⎣
⎡
7. Let =V set of all real numbers ⊕ ordinary subtraction, i.e., βαβα −=⊕ ; Θ ordinary multiplication, i.e., ααΘ cc = Is V a vector space? 8. Let nV ℘= : set of all polynomials of degree n≤ , including the zero polynomial.
Recall ( ) →++++= −−
n1n1n
1n
0 atatatatp ... polynomial of degree less than or equal to n →++++ − 0t0t0t0 1nn ... zero polynomial has no degree ⊕ addition of 2 polynomials Θ multiplication by a scalar Thus, Θ,,⊕℘n is a vector space.
9. Let =V set of all ordered triples of real numbers ( )zyx ,, . ⊕ is ( ) ( ) ( )'''''' ,,,,,, zzyyxzyxzyx ++=⊕∋ ; Θ is ( ) ( )czcycxzyxc ,,,, =∋ Θ Verify that properties (1), (3), (4), and (6) of the definition of a vector space fail to
hold. Thus, Θ,,⊕V is not a vector space.
10. Let V = { at³ + bt² + ct + d | a ≠ 0; b, c, d ∈ ℝ }: the set of all polynomials of degree 3 only, with ⊕ addition of 2 polynomials and Θ multiplication by a scalar.
Consider α = 3t³ + 2t² + t + 1 ∈ V and β = -3t³ + 4t² - 2t - 6 ∈ V. Then α ⊕ β = 6t² - t - 5 ∉ V. Thus, V is not a vector space.
Some Consequences of the Properties or Axioms of Vector Spaces
1. The vector θ (the identity) in V is unique. 2. The inverse of any vector α in V is unique. 3. θαΘ =0 for any V∈α . 4. θαΘ =c for any ℜ∈c . 5. If θαΘ =c , then either 0c = or θα = . 6. ( ) ααΘ −=− 1 for any V∈α .
7.2 Subspaces Definition: Let V be a vector space and VW ⊂ , i.e., W is a non‐empty subset of V . Then W is a subspace of V if and only if W is a vector space with respect to the operations in V . Example 7.2.1
1. 2V ℜ= is a vector space.
( )⎭⎬⎫
⎩⎨⎧ =ℜ∈= 12121 x
21xxxxW ,|,
W is a vector space and VW ⊂ then W is a subspace of V .
2. Let 2V ℘= : set of all polynomials of degree 2≤ , including the zero polynomial. 2℘ is a vector space.
a) Let 1W ℘= : set of all polynomials with degree 1≤ , including the zero polynomial.
1℘ is a vector space and 21 ℘⊂℘ . Thus, 1℘ is a subspace of 2℘ .
b) Let =W : set of all polynomials of degree 2 only.
2W ℘⊂ but W is not a vector space since if 1t3t2 2 ++=α , 2tt2 2 ++−=β W∈α , W∈β but W∉⊕ βα . Thus, W is not a subspace of 2℘ .
Theorem: Let V be a vector space with operations ⊕ and Θ , and let W be a nonempty subset of V . Then W is a subspace of V if and only if W is closed under ⊕ and Θ . Example 7.2.2 1. 2V ℜ= , ( ){ } 2
2121 0bxaxxxW ℜ⊂=+= |, Verify that W is a subspace.
2. Let W : set of all vectors in 3ℜ of the form ⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡
+ baba
, ℜ∈ba, .
Is W a subspace of 3ℜ ?
Remarks: Every vector space V has 2 trivial subspaces. i. V , the vector space V itself since VV ⊂ . ii. subspace { }θ consisting only of the identity θ .
Theorem: The set of all solutions of a homogeneous linear system with an m x n coefficient matrix A is a subspace of nℜ . This space is called the null space of A . Proof: Example 7.2.3 1. Let W : set containing the solution of the homogeneous system
0yx0z4y5x3
0z2y2x
=−=+−=+−
Then W is a subspace of 3ℜ . Solving this system shows that W consists of all triples of the form ( )zz2z2 ,, .
2. Let W : set containing the solution of the nonhomogeneous system
7yx21y2x
=−=+
W is not a vector space since ⎥⎦
⎤⎢⎣
⎡=⎥
⎦
⎤⎢⎣
⎡00
yx
is not a solution to the system, there exists no
identity element θ . 7.3 Linear Combination of Vectors If V is a vector space, it has infinitely many vectors in it. But there is a finite number of vectors in V that completely describesV . Definition: Let { }n21 vvvS ,...,,= be a set of vectors in a vector spaceV . A vector v in V is
called a linear combination of the vectors in S if
v = a1 v1 + a2 v2 + ... + an vn = Σ_{i=1}^{n} a_i v_i
for at least one nonzero a_i ∈ ℝ.
Example 7.3.1
1. Consider ℝ³. Let x1 = (1, 2, 1)', x2 = (1, 0, 2)', x3 = (1, 1, 0)', with x_i ∈ ℝ³, i = 1, 2, 3.
Define x = (2, 1, 5)' = x1 + 2x2 - x3.
Thus, x is a linear combination of x1, x2 and x3.
2. Let x1 = (1, 1, 1)', x2 = (1, 1, 0)', x3 = (1, 0, 0)'. Then x = (0, -1, 3)' = 3x1 - 4x2 + x3 is a linear combination of the x_i's.
3. Express (7, -3)' as a linear combination of (1, 1)' and (1, -1)'.
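Finding the coefficients in part 3 amounts to solving a small linear system, which can be done with the SOLVE function described in the Chapter 6 appendix. A minimal sketch (assuming the target vector is (7, -3)' as written above, and taking the two given vectors as the columns of the coefficient matrix):
proc iml;
M = {1  1,
     1 -1};       /* columns are the vectors (1,1)' and (1,-1)' */
v = {7, -3};
a = solve(M, v);  /* coefficients of the linear combination      */
print a;
run;
This should return the coefficients 2 and 5, since 2(1, 1)' + 5(1, -1)' = (7, -3)'.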
4. Express ⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡
111 as a linear combination of
⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡
− 221
, ⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡
− 312 and 3
15
2ℜ∈
⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡
−− .
Theorem: If V is a vector space and W is a set of all linear combinations of the vectors 1x , 2x ,…, nx in V , then W is a subspace in V . Definition: Let { }nvvvS ,...,, 21= be a set of vectors in the vector space V . The set S spans V , or V is spanned by S , if every vector in V is a linear combination of the vectors in S . Example 7.3.2
1. Consider the vector space 3ℜ . Let ⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡=
⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡=
⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡=
011
201
121
321 xxx ,, . Does { }321 xxx ,, span
3ℜ ?
2. Consider the vector space 3ℜ . Let ⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡=
⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡=
011
111
21 xx , . Does { }21 xx , span 3ℜ ?
3. Consider the vector space 2℘ : set of polynomials of degree < 2 including the zero polynomial. Let 122
1 ++= ttα and 222 += tα . Does { }21 αα , span 2℘ ?
7.4 LINEAR DEPENDENCE AND INDEPENDENCE Definition: Let { }nvvvS ,...,, 21= be a set of distinct vectors in a vector space V . Then S is said to be linearly dependent if there exists constants naaa ,...,, 21 not all zero such that
.01
=∑=
n
iii va (*)
Otherwise, S is linearly independent. That is, S is linearly independent if (*) holds only when iai ∀= 0 . Example 7.4.1
1. ⎭⎬⎫
⎩⎨⎧
⎥⎦
⎤⎢⎣
⎡−⎥⎦
⎤⎢⎣
⎡− 8
241
, is linearly dependent in 2ℜ since .⎥⎦
⎤⎢⎣
⎡=⎥
⎦
⎤⎢⎣
⎡−+⎥
⎦
⎤⎢⎣
⎡− 0
082
141
2
2. Consider 4ℜ . Let ⎥⎥⎥⎥
⎦
⎤
⎢⎢⎢⎢
⎣
⎡
=
⎥⎥⎥⎥
⎦
⎤
⎢⎢⎢⎢
⎣
⎡
=
⎥⎥⎥⎥
⎦
⎤
⎢⎢⎢⎢
⎣
⎡
=
3111
2110
2101
321 xxx ,, . To find out if { }321 xxx ,, is linearly
dependent, we form 0332211 =++ xaxaxa and solve for 21 aa , and 3a .
3. Let ⎪⎭
⎪⎬
⎫
⎪⎩
⎪⎨
⎧ℜ∈
⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡=ℜ zyx
zyx
,,:3 and ⎪⎭
⎪⎬
⎫
⎪⎩
⎪⎨
⎧
⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡
⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡
⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡−=
120
102
111
,,S . Is S linearly independent?
Remarks: 1. The set { }0=S is linearly dependent. Thus, if S is any set of vectors that contain 0,
then S must be linearly dependent. 2. A set consisting of a single nonzero vector is linearly independent. 3. The set S is linearly dependent if and only if S contains at least one vector that is a
linear combination of all other vectors in S . 4. Let A be an mxn matrix in reduced row echelon form. The nonzero rows of A,
viewed as vectors in nℜ , forms linearly independent set of vectors. 5. In nℜ , any set containing more than n vectors is linearly dependent. Example 7.4.2 The set ( ) ( ) ( )},,,,,{ 132101 −=S is linearly dependent in 2ℜ . In fact the linear system
0203
32
321
=−=++
ccccc
has the nontrivial solution .,, 217 321 −=−== ccc Theorem: Let 1S and 2S be finite subsets of a vector space and let 21 SS ⊂ . Then
a. If 1S is linearly dependent, so is 2S . b. If 2S is linearly independent, so is 1S .
Remarks: 1. Suppose p linearly dependent vectors of order p are used as columns of a matrix,
say A. Then det(A)=0 ⇒ A is singular. 2. Linear dependence Test in nℜ : For the set { }mxxxS ,..., 21= in nℜ , let A be the matrix
whose ith row coincides with ix . If row rank of A is r, then S is linearly independent if and only if r=m.
Example 7.4.3
Determine if the set ⎪⎭
⎪⎬
⎫
⎪⎩
⎪⎨
⎧
⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡
−⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡
⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡
−=
342
101
121
,,S is linearly independent in 3ℜ .
3. The number of linearly independent rows of a matrix is the same as the number of
linearly independent columns. 4. When p vectors of order p are linearly independent, any other vector of order p can
be expressed as a linear combination of these p vectors. Definition: A set of vectors { }nvvvS ,..., 21= in a vector space V is called a basis for V if
a. S spans V , and b. S is linearly independent.
Remarks: 1. A basis is a spanning set with no algebraic redundancies. 2. If S is a basis for V then every vector in S must lie in V since S is contained in
its own span. Example 7.4.4
1. Let 3ℜ=V and ⎪⎭
⎪⎬
⎫
⎪⎩
⎪⎨
⎧
⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡
⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡
⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡=
100
010
001
,,S . Then S is a basis for 3ℜ , called the natural basis
for 3ℜ . 2. Let 2ℜ=V . The following sets are bases for 2ℜ :
⎭⎬⎫
⎩⎨⎧
⎥⎦
⎤⎢⎣
⎡⎥⎦
⎤⎢⎣
⎡
⎭⎬⎫
⎩⎨⎧
⎥⎦
⎤⎢⎣
⎡−⎥⎦
⎤⎢⎣
⎡
⎭⎬⎫
⎩⎨⎧
⎥⎦
⎤⎢⎣
⎡⎥⎦
⎤⎢⎣
⎡32
14
11
11
10
11
,,,,,
3. Let W be a subspace of 3ℜ where ⎪⎭
⎪⎬
⎫
⎪⎩
⎪⎨
⎧ℜ∈
⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡+= bab
baa
W ,: . Find a basis for W .
4. Find a basis for the solution space V of the homogeneous system
⎥⎥⎥⎥⎥⎥
⎦
⎤
⎢⎢⎢⎢⎢⎢
⎣
⎡
=
⎥⎥⎥⎥⎥⎥
⎦
⎤
⎢⎢⎢⎢⎢⎢
⎣
⎡
⎥⎥⎥⎥⎥⎥
⎦
⎤
⎢⎢⎢⎢⎢⎢
⎣
⎡
00000
2523226053122111303213021
5
4
3
2
1
xxxxx
5. The set ⎭⎬⎫
⎩⎨⎧
⎥⎦
⎤⎢⎣
⎡⎥⎦
⎤⎢⎣
⎡⎥⎦
⎤⎢⎣
⎡11
01
10
,, does not form a basis for the vector space 2ℜ because the set is
not linearly independent, although the set does span the vector space. Results: 1. For a given vector space, the basis is not unique. 2. If { }nvvvS ,..., 21= is a basis for a vector space V , then every vector Vv∈ can be
uniquely expressed as a linear combination of the vectors in S . 3. If { }nvvvS ,..., 21= is a set of nonzero vectors in S which spans V , then S contains a
basis for V . Example 7.4.5
Let 3ℜ=V and { }521 xxxS ,...,,= where ,,,⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡=
⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡=
⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡=
211
110
101
321 xxx ⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡
−
−=
⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡=
211
121
54 xx , . S
spans 3ℜ and linearly dependent. To find a subset of S that is a basis for 3ℜ :
4. { }nvvvS ,..., 21= is a basis for a vector space V and { }rwwwT ,...,, 21= is a set of linearly independent vectors in V then r<n. Note: For a set of linearly independent vectors in V to be a basis for V , S must contain the maximum number of linearly independent vectors.
5. { }nvvvS ,..., 21= and { }rwwwT ,...,, 21= are bases for a vector space V , then n=r. Definition: The dimension of a nonzero vector space V , denoted as dim(V ), is the number of nonzero vectors in a basis for V . The dimension of { }0 is zero. Example 7.4.6 1. dim ( )n℘ =n+1 but the vector space ℘ of all polynomials is an infinite‐dimensional
vector space. e.g. The set { }12 ,, ttS = is a basis for 2℘ , so dim ( )2℘ =3.
2. dim ( )nℜ =n.
Results: 1. Suppose V is an n‐dimensional vector space, then (i) any set of n+1 vectors in V is
necessarily linearly dependent, and also (ii) a set if n‐1 vectors cannot span V . 2. Suppose V is an n‐dimensional vector space and let { }mvvvS ,..., 21= be a set of m
vectors in V . i. If m>n, then S must be linearly dependent. ii. If m<n, then S cannot span V .
3. If W is nonzero subspace of a finite‐dimensional vector space V , then dim ( )W <dim ( )V .
4. If W is a subspace of a finite dimensional vector space V and dim ( )W =dim ( )V , then VW = .
5. If S is a linearly independent set of vectors in a finite‐dimensional vector space V , then there is a basis T for V which S .
Example 7.4.7
Find a basis for 3ℜ that contains ⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡=101
x .
6. Let V be an n‐dimensional vector space and { }nvvvS ,..., 21= be a set of n vectors in V .
i. If S is a linearly independent set of vectors in V , then S is a basis for V . ii. If S spans V , then S is a basis for V .
7. Corollary to 6: Any linearly independent set containing n vectors in nℜ is a basis for nℜ .
8. The rank of a matrix is defined as the largest number of linearly independent rows (cols) of the matrix.
8. EIGENVALUES AND EIGENVECTORS 8.1 Definitions Defn: Let A be an n × n matrix. A scalar λ is an eigenvalue of A if ∃ a nonzero vector x
∈ ℜn ∋ Ax = λx. Any x ≠ 0 satisfying the above equation is called an eigenvector of A corresponding to the eigenvalue λ.
Remarks: 1. The equation Ax = λx holds if and only if (A ‐ λI)x = 0, a homogeneous linear
system, which we will assume to have a nontrivial solution. 2. In addition, the homogeneous system in (1) will have a nontrivial solution if and
only if |A ‐ λI| = 0. 3. The eigenvalues λ1, λ2, …, λn of A are the real roots of the characteristic
polynomial (of degree n) |A ‐ λI| = 0. The roots are sometimes called latent/ proper/ characteristic roots.
4. Associated with each λi is a vector xi ∋ Axi = λixi for i = 1, 2, …, n, and these vectors are called latent/ proper/ characteristic vectors or eigenvectors.
Example 8.1: Determine the eigenvalues of the following matrices
1. A = [ 3 2; 2 0 ]
2. B = ⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡
−− 327112022
3. C = [ 3 0; 0 2 ]
4. D = ⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡
−−
223031001
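For instance, for the first matrix the computation runs as follows (a sketch of the kind of working the example asks for):
|A - λI| = det[ 3-λ  2; 2  -λ ] = (3-λ)(-λ) - 4 = λ² - 3λ - 4 = (λ - 4)(λ + 1),
so the eigenvalues of A are λ1 = 4 and λ2 = -1. (The matrix C, being diagonal, has its diagonal entries 3 and 2 as eigenvalues.)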
8.2 Properties of Eigenvalues 1. Eigenvalues of Powers of a Matrix
If λ is an eigenvalue of A, then λk is an eigenvalue of Ak, where k is positive if A is singular and k is positive or negative if A is nonsingular.
When A is nonsingular with eigenvalue λ, the inverse A‐1 has 1/λ as an eigenvalue, where λ ≠ 0.
2. Eigenvalues of a Scalar‐by‐Matrix Product If λ is an eigenvalue of A, with associated eigenvector x, then cλ is an eigenvalue of cA, with associated eigenvector cx. When A has an eigenvalue λ, then (A + cI) for a scalar c has an eigenvalue λ + c. In addition, when (A + cI) is invertible, then (A + cI)‐1 will have (λ + c)‐1 as eigenvalue.
3. Eigenvalues of Polynomials When A has an eigenvalue λ, then the polynomial in A, say f (A), has an eigenvalue f (λ). Example 8.2: Derive the eigenvalue and eigenvector of f (A) = A3 + 17A2 + 5A +3I.
4. The Sum and Product of Eigenvalues
If A has eigenvalues λ1, λ2, …, λn, then
tr(A) = Σ_{i=1}^{n} λ_i   and   |A| = Π_{i=1}^{n} λ_i.
Results: 1. A is singular if and only if 0 is an eigenvalue of A. 2. The characteristic polynomials of A and A’ are identical, so A and A’ have the
same eigenvalues. However, their eigenvectors are not identical. 8.3 (Steps in) Calculating Eigenvectors 1. Form the characteristic equation |An ‐ λIn| = 0. 2. Find all roots λ1, λ2, …, λn of the characteristic equation. 3. For each λk found in (2), solve the homogeneous linear system (A ‐ λkI)x = 0. This
system has n – rk (A ‐ λkI) linearly independent solutions. A. Simple Roots Whenever λk is a simple root (i.e., it is not a multiple root), rk (A ‐ λkI) = n – 1 and hence, there is only one linearly independent eigenvector associated with λk. Example 8.3: Find the eigenvectors of the matrices in Example 8.1. B. Multiple Roots Whenever λk is a multiple root of a characteristic equation, the number of times it is a solution mk is called its multiplicity. Thus, we formulate An as having distinctly different
eigenvalues λ1, λ2, …, λs with λk having multiplicity mk for k = 1, 2, …, s and Σ_{k=1}^{s} m_k = n.
Example 8.4: Find the characteristic polynomial and the eigenvalues and their associated eigenvectors of the following matrices.
1. A = ⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡
−−
−−−
011121221
2. B = ⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡
540032210
8.4 DIAGONALIZATION Theorem : If x1, x2, …, xk are eigenvectors of a matrix A for distinct eigenvalues λ1, λ2,
…, λk, respectively, then the set { x1, x2, …, xk} is LIN. Pf : Assign. ( Hint: Use Mathematical Induction) Corollary : If an nxn matrix A has n distinct eigenvalues, then the eigenvectors of A form a basis of Rn. Definition : Let A be an nxn matrix having eigenvalues λ1, λ2, …, λn , not necessarily
distinct, and let x1, x2, …, xn be the corresponding eigenvectors. Let P be the matrix such that P = [x1 x2 … xn]. A is said to be diagonable or diagonalizable if P is nonsingular, i.e. P‐1 exists, and P‐1AP = diag { λ1, λ2, …, λn} = D.
Results: 1. A matrix is digonalizable if all the roots of its characteristic polynomial are real
and distinct.
Proof: Let λ1, λ2, …, λn be the roots of (nxn) A. By assumption, the λiʹs are real and distinct eigenvalues →{ x1, x2, …, xn} is LIN (from the first theorem of this handout) where xi is the eigenvector corresponding to λi , i = 1,2, …, n → P = [x1 x2 … xn] has rank n, thus it is nonsingular and hence, invertible. Now, PD = AP ( why?) → P‐1PD = P‐1AP → D = P‐1AP where D = diag { λ1, λ2, …, λn}
→ A is diagonalizable by definition.
2. From the definition, if A is diagonalizable, then A = PDP‐1. Also,
Ak = PDkP‐1 where k is an integer; k can be negative if A is nonsingular. A‐1 = PD‐1P‐1 if A is nonsingular.
3. If all the roots of the characteristic polynomial of A are real and not all distinct,
then A may or may not be diagonalizable. 4. If the roots of the characteristic polynomial of A are real, then A can be
diagonalized if, for each eigenvalue λk of multiplicity mk, we can find mk LIN eigenvectors. The solution space of the system ( A ‐ λkI) x = 0 has dimension mk.
5. If λk is an eigenvalue of A with multiplicity mk, then we can never find more than
mk LIN eigenvectors associated with λk. Remark : A matrix may fail to be diagonalizable because not all the roots of its
characteristic polynomial are real numbers, or because its eigenvectors do not form a basis for Rn.
Definition : Let A and A* be nxn matrices. Then A is similar to A* if A* = P‐1AP for
some invertible matrix P. Theorem : Let A be an nxn matrix. Then A is similar to a diagonal matrix D if and
only if Rn has a basis consisting of eigenvectors of A. Moreover, the elements on the main diagonal of D are the eigenvalues of A.
Proof: Sufficiency: A is similar to D → ∃ an invertible P ∋ P‐1AP =D= diag { d1, d2, …, dn}
Then AP = PD. → [Ax1 Ax2 … Axn] = [d1x1 d2x2 … dnxn] where x1, x2, …, xn are the column vectors of P. → Axi= dixi i = 1,2,…,n Since P is invertible, its columns are nonzero vectors, hence di is an eigenvalue with corresponding eigenvector xi. Also, P is invertible → the eigenvectors xiʹs are LIN. → The eigenvectors of A form a basis for Rn.
Necessity : Suppose λ1, λ2, …, λn are eigenvalues of A ( not necessarily distinct) and the corresponding eigenvectors x1, x2, …, xn form a basis for Rn and thus a LIN set.
Let P = [x1 x2 … xn]. Note that P is invertible.
Now, AP = [Ax1 Ax2… Axn] = [λ1x1 λ2x2, …, λnxn] by defn. of eigenvalues
and eigenvectors Note that P‐1xi = ith column of the identity matrix → P‐1 λi xi = ith column of the identity matrix multiplied by λi. → P‐1AP = diag { λ1, λ2, …, λn} = D. → A is similar to D by definition. Remark: Similar matrices have the same characteristic polynomial, thus they have
the same eigenvalues. Results for Symmetric Matrices : 1. All roots of the characteristic polynomial of a real symmetric matrix are real
numbers. 2. If A is symmetric, then A is diagonalizable. 3. If A is symmetric, then the eigenvectors that belong to distinct eigenvalues of A
are orthogonal.
The mk LIN eigenvectors corresponding to any eigenvalue λk with multiplicity mk are orthogonal not only to each other but also to the mt eigenvectors corresponding to each other eigenvalue λt.
4. The rank of a symmetric matrix equals the number of nonzero eigenvalues. (This is also true for all diagonable matrices.)
5. If A is symmetric, then ∃ an orthogonal matrix P ∋ P‐1AP = PʹAP =D, a diagonal
matrix. The eigenvalues of A lie on the main diagonal of D. 6. SPECTRAL DECOMPOSITION of A:
Let U be an orthogonal matrix (columns are normalized eigenvectors of A)
I = UU' = Σ_{i=1}^{n} u_i u_i' ;   A = A·UU' = Σ_{i=1}^{n} A u_i u_i'
A = Σ_{i=1}^{n} λ_i u_i u_i'  is the spectral decomposition of A.
Also, A^k = Σ_{i=1}^{n} λ_i^k u_i u_i'  for any positive integer k, and for negative integer k if A is nonsingular (and later on, even for decimal or fractional exponents).
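A small PROC IML sketch of the spectral decomposition, using the symmetric (and positive definite) matrix of Example 4.9; the EIGEN call and its argument order are taken from the SAS/IML documentation rather than from these notes, so treat the details as an assumption:
proc iml;
A = {7 2 2, 2 8 2, 2 2 9};              /* symmetric, so an orthogonal U exists           */
call eigen(lambda, U, A);               /* lambda: eigenvalues; U: eigenvectors in columns */
recon = U * diag(lambda) * t(U);        /* should reproduce A (spectral decomposition)     */
Ahalf = U * diag(sqrt(lambda)) * t(U);  /* example of a fractional power, valid since A>0  */
print lambda U recon Ahalf;
run;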
6. Non‐negative definite matrices
i) All eigenvalues are real. ii) They are all diagonable. iii) Rank is the number of nonzero eigenvalues.
• The eigenvalues of a symmetric matrix are all non-negative if and only if the matrix is non-negative definite.
• The eigenvalues of a symmetric matrix are all positive if and only if the matrix is positive definite.
Pf: Assignment (Hint: Start with (A - λI)x = 0 and use the quadratic form).