Socket Programming

1. Socket Programming: outline

- Introduction to TCP/IP
- TCP/IP module architecture
- Introduction to sockets
- Socket classification
- Client/Server model
- Related functions
- TCP socket programming
- UDP socket programming
- Socket Read/Write Inside Out
- Performance Matters: Interrupt and Memory Copy at Socket
- Open Source Implementation: Linux Socket Filter

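2. Introduction to TCP/IP

Transmission Control Protocol / Internet Protocol (TCP/IP) grew out of the ARPANET.

3. TCP/IP module architecture

TCP/IP is organized in four layers. Data crosses the network in packets, which are delivered by the Internet Protocol (IP). On top of IP, TCP/IP offers two transport protocols, TCP and UDP. Application protocols built on them include FTP, HTTP, SMTP, TELNET, and DNS.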

4. Introduction to sockets

A socket is an abstraction of the end point of a communication channel. As its name suggests, the end-to-end protocol layer controls the data communications between the two end points of a channel. The end points are created by networking applications using socket APIs of an appropriate type. Networking applications can then perform a series of operations on that socket. The operations that can be performed on a socket include:

- control operations, such as associating a port number with the socket, initiating or accepting a connection on the socket, or releasing the socket;
- data transfer operations, such as writing data through the socket to some peer application, or reading data from some peer application through the socket;
- status operations, such as finding the IP address associated with the socket.

The complete set of operations that can be performed on a socket constitutes the socket APIs.

To open a socket, an application program first calls the socket() function to initialize an end-to-end channel. The standard socket call, sk = socket(domain, type, protocol), requires three parameters. The first parameter specifies the domain or address family. Commonly used families are AF_UNIX for communications confined to the local machine, and AF_INET for communications based on IPv4 protocols. The second parameter specifies the type of socket. Common values for socket type, when dealing with the AF_INET family, include SOCK_STREAM (typically associated with TCP) and SOCK_DGRAM (associated with UDP). Socket type influences how packets are handled by the kernel before being passed up to the application. The last parameter specifies the protocol that handles the packets flowing through the socket. The socket function returns a file descriptor through which operations on the socket can be applied.

The values of the socket parameters depend on what underlying protocols are used. In the next two subsections we investigate three types of socket APIs. They correspond to accessing the transport layer, the IP layer, and the link layer, respectively, as we can see in their open source implementations.
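The socket() call described above can be sketched as follows. This is a minimal illustration, not code from the slides: the helper names open_inet_socket() and socket_type() are hypothetical, and the second helper shows one status operation (reading back the socket type with getsockopt()).

```c
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Open an IPv4 socket. type is SOCK_STREAM or SOCK_DGRAM; protocol may be
 * IPPROTO_TCP, IPPROTO_UDP, or 0 to let the kernel pick the default for
 * the given type. Returns the descriptor, or -1 with errno set. */
int open_inet_socket(int type, int protocol)
{
    int sk = socket(AF_INET, type, protocol);
    if (sk < 0)
        perror("socket");
    return sk;
}

/* Status operation: ask the kernel which type a descriptor was created
 * with. Returns SOCK_STREAM, SOCK_DGRAM, ..., or -1 on error. */
int socket_type(int sk)
{
    int type = 0;
    socklen_t len = sizeof(type);

    if (getsockopt(sk, SOL_SOCKET, SO_TYPE, &type, &len) < 0)
        return -1;
    return type;
}
```

A caller would typically write open_inet_socket(SOCK_STREAM, IPPROTO_TCP) for a TCP socket or open_inet_socket(SOCK_DGRAM, 0) for a UDP socket, and release the descriptor with close() when done.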

5. Socket classification

1. Datagram sockets (connectionless): datagram sockets use UDP.
2. Stream sockets (connection-oriented): stream sockets use TCP.

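6. Client/Server model

Sockets come in two types: stream (TCP) and datagram (UDP). Under TCP/IP, a socket name is made up of an IP address and a port number. In Windows Sockets terminology, two sockets, one at each end, form an association, which is identified by the protocol in use and the IP address and port of both ends.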

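7. socket()

NAME
    socket() - create an endpoint for communication

SYNOPSIS
    #include <sys/types.h>
    #include <sys/socket.h>

    int socket(int domain, int type, int protocol);

1. Called by both the server and the client.
2. domain: AF_INET
3. type: SOCK_STREAM or SOCK_DGRAM
4. protocol: 0 (socket() chooses the protocol from the type)
5. Returns a socket descriptor, or -1 on error (errno is set).

socket() creates a communication endpoint and returns a descriptor for it. It takes three arguments:

- domain: the protocol family of the socket, for example AF_INET (IPv4), AF_INET6 (IPv6), or AF_UNIX (local sockets).
- type: SOCK_STREAM, SOCK_DGRAM, SOCK_SEQPACKET, or SOCK_RAW.
- protocol: the transport protocol to use, such as IPPROTO_TCP, IPPROTO_SCTP, IPPROTO_UDP, or IPPROTO_DCCP. A value of 0 selects the default protocol for the given domain and type.

On error, socket() returns -1.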

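8. bind()

NAME
    bind() - bind a name to a socket

SYNOPSIS
    #include <sys/types.h>
    #include <sys/socket.h>

    int bind(int sockfd, const struct sockaddr *my_addr, socklen_t addrlen);

1. Called by the server.
2. sockfd: the descriptor returned by socket().
3. my_addr: the local port and IP address. Fill in a struct sockaddr_in and cast its address to struct sockaddr *.
4. addrlen: the size of the address structure, e.g. sizeof(struct sockaddr_in).
5. bind() attaches a local endpoint to the socket.

bind() assigns an address to a socket. A socket created with socket() has a protocol family but no address, so bind() must be called before the socket can accept connections. It takes three arguments:

- sockfd: the descriptor of the socket to bind.
- my_addr: a pointer to a sockaddr structure holding the address to bind to.
- addrlen: a socklen_t giving the size of that sockaddr structure.

bind() returns 0 on success and -1 on error.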
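As a sketch of the steps listed for bind(), the helper below fills a struct sockaddr_in, casts it to struct sockaddr *, and binds a TCP socket to the loopback address. The names bind_loopback() and local_port() are hypothetical, and binding to 127.0.0.1 is an assumption made so the example is harmless to run; a real server would often bind to INADDR_ANY and a well-known port.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/* Create a TCP socket and bind it to 127.0.0.1:port.
 * port == 0 asks the kernel to choose a free port.
 * Returns the bound descriptor, or -1 on failure. */
int bind_loopback(unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);                   /* network byte order */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); /* 127.0.0.1 */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Return the local port of a bound socket in host byte order, or -1. */
int local_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return -1;
    return ntohs(addr.sin_port);
}
```

Calling bind_loopback(0) followed by local_port(fd) reveals which port the kernel picked.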

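9. connect()

NAME
    connect() - initiate a connection on a socket

SYNOPSIS
    #include <sys/types.h>
    #include <sys/socket.h>

    int connect(int sockfd, const struct sockaddr *serv_addr, socklen_t addrlen);

1. connect() is called by the client.
2. sockfd: the descriptor returned by socket(). serv_addr: the IP address and port of the server.
3. addrlen: the size of the address structure, e.g. sizeof(struct sockaddr_in).

For a connection-oriented protocol such as TCP, connect() establishes a connection to the remote host identified by serv_addr.

For a connectionless protocol such as UDP, connect() only records the default peer address, after which send() and recv() can be used on the socket. connect() returns 0 on success and -1 on error.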
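A minimal sketch of the client side follows. The helper name connect_loopback() is hypothetical and the peer is assumed to be on 127.0.0.1; the type parameter lets the same code show the TCP case (a handshake takes place) and the UDP case (only the default peer is recorded).

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Create a socket of the given type (SOCK_STREAM or SOCK_DGRAM) and
 * connect it to 127.0.0.1:port. Returns the descriptor, or -1 on error. */
int connect_loopback(int type, unsigned short port)
{
    struct sockaddr_in serv_addr;
    int fd = socket(AF_INET, type, 0);

    if (fd < 0)
        return -1;

    memset(&serv_addr, 0, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(port);
    serv_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    /* TCP: performs the three-way handshake.
     * UDP: no packets are sent; the peer address is just remembered. */
    if (connect(fd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Note that the client never calls bind(); the kernel chooses its local address and port during connect().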

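10. listen()

NAME
    listen() - listen for connections on a socket

SYNOPSIS
    #include <sys/socket.h>

    int listen(int sockfd, int backlog);

1. Called by the server.
2. sockfd: the descriptor returned by socket(). backlog: the length of the queue of connections waiting to be taken by accept().
3. Returns -1 on error.

After a socket has been bound to an address, listen() prepares it for incoming connections. It applies only to connection-oriented socket types (SOCK_STREAM, SOCK_SEQPACKET). listen() takes two arguments:

- sockfd: a valid socket descriptor.
- backlog: an integer giving the number of pending connections that may be queued.

listen() returns 0 on success and -1 on error.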

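11. accept()

NAME
    accept() - accept a connection on a socket

SYNOPSIS
    #include <sys/types.h>
    #include <sys/socket.h>

    int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);

1. sockfd: the descriptor that was passed to listen().
2. addr: filled in with the address of the client.
3. addrlen: a pointer to a socklen_t that holds the size of the structure addr points to on entry and the actual address size on return.
4. Returns -1 on error.
5. After listen(), the server calls accept(). accept() returns a new file descriptor that is used for I/O on this connection. The server keeps using the descriptor passed to listen() to accept further connections.

accept() is used with TCP. It creates a new socket for each connection and removes that connection from the listening queue. On Unix, select() can be used to wait until a connection is pending before accept() is called. The arguments are sockfd (the listening descriptor), cliaddr (a pointer to a sockaddr structure that receives the client address), and addrlen (a pointer to a socklen_t). accept() returns the new descriptor, or -1 on error. Datagram sockets do not use accept().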
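The listen() and accept() steps can be sketched together. The helper names make_listener() and accept_client() are hypothetical; the listener is bound to the loopback address with a kernel-chosen port as an assumption that keeps the example self-contained.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Create a TCP socket bound to 127.0.0.1 on a kernel-chosen port and put
 * it into the listening state. Stores the port (host byte order) in
 * *port_out. Returns the listening descriptor, or -1 on error. */
int make_listener(int backlog, unsigned short *port_out)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);
    return fd;
}

/* Block until a client has connected, then return a NEW descriptor for
 * that connection. listen_fd itself stays open and keeps listening.
 * *peer receives the client's address. Returns -1 on error. */
int accept_client(int listen_fd, struct sockaddr_in *peer)
{
    socklen_t len = sizeof(*peer); /* in: buffer size, out: address size */

    return accept(listen_fd, (struct sockaddr *)peer, &len);
}
```

A server loop would call accept_client() repeatedly on the same listening descriptor and close each returned descriptor when that client is done.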

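12. send() and sendto()

SYNOPSIS
    #include <sys/types.h>
    #include <sys/socket.h>

    ssize_t send(int sockfd, const void *msg, size_t len, int flags);
    ssize_t sendto(int sockfd, const void *msg, size_t len, int flags,
                   const struct sockaddr *dest_addr, socklen_t addrlen);

1. sockfd: on the server, the descriptor returned by accept(), not the listening descriptor passed to listen().
2. msg: the data to send. len: the length of the data in bytes. flags: normally 0.
3. Returns the number of bytes sent, or -1 on error.

sendto() takes two extra arguments, the destination address and its length, so it can be used on a socket that has not been connected. The pairs send()/recv(), write()/read(), and sendto()/recvfrom() are all used to send and receive data through a socket.

flags is 0 or a combination of:

- MSG_OOB: send out-of-band data.
- MSG_DONTROUTE: do not route the packet through a gateway.
- MSG_DONTWAIT: do not block.
- MSG_NOSIGNAL: do not raise SIGPIPE when the peer has closed the connection.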

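13. recv() and recvfrom()

The receiving calls are similar in many respects:

SYNOPSIS
    #include <sys/types.h>
    #include <sys/socket.h>

    ssize_t recv(int sockfd, void *buf, size_t len, int flags);
    ssize_t recvfrom(int sockfd, void *buf, size_t len, int flags,
                     struct sockaddr *src_addr, socklen_t *addrlen);

1. sockfd: the descriptor to read from.
2. A return value of 0 means the peer has closed the connection.
3. Returns the number of bytes received, or -1 on error.

recvfrom() takes two extra arguments through which the address of the sender is returned. buf is the buffer that receives the data and len is its size.

flags is 0 or a combination of:

- MSG_OOB: receive out-of-band data.
- MSG_PEEK: return the data without removing it from the socket's receive queue.
- MSG_WAITALL: block until len bytes have been received.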
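The flags described for send() and recv() can be tried on a connected pair of sockets. This is a sketch under stated assumptions: the helper names send_text() and peek_then_recv() are hypothetical, and MSG_NOSIGNAL is assumed to be available (it is on Linux, but not on every platform).

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Send a NUL-terminated string (without the terminator). MSG_NOSIGNAL
 * turns a write to a closed connection into an EPIPE error instead of a
 * SIGPIPE signal. Returns the number of bytes sent, or -1. */
ssize_t send_text(int fd, const char *text)
{
    return send(fd, text, strlen(text), MSG_NOSIGNAL);
}

/* First look at the pending data with MSG_PEEK, which leaves it in the
 * receive queue, then receive it for real. *peeked gets the byte count
 * of the peek; the return value is the byte count of the second recv().
 * Returns -1 on error and 0 if the peer closed the connection. */
ssize_t peek_then_recv(int fd, char *buf, size_t cap, ssize_t *peeked)
{
    *peeked = recv(fd, buf, cap, MSG_PEEK);
    if (*peeked < 0)
        return -1;
    return recv(fd, buf, cap, 0);
}
```

Both helpers work on any connected stream socket, for example the descriptor returned by accept() or one end of a socketpair().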

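14. write()

NAME
    write() - write to a file descriptor

SYNOPSIS
    #include <unistd.h>

    ssize_t write(int fd, const void *buf, size_t count);

1. write() works on a file, a device, or a socket.

When used on a socket, fd is the socket descriptor, buf points to the data to send, and count is the number of bytes to send. write() returns the number of bytes written, or -1 on error.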

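15. read()

NAME
    read() - read from a file descriptor

SYNOPSIS
    #include <unistd.h>

    ssize_t read(int fd, void *buf, size_t count);

1. read() works on a file, a device, or a socket.
2. If no data is available, read() blocks.
3. On a socket, read() returns as soon as some data is available, even if that is fewer than count bytes; it does not block until count bytes have arrived.

When used on a socket, fd is the socket descriptor, buf is the buffer that receives the data, and count is the size of the buffer. read() returns the number of bytes read, 0 when the peer has closed the connection, or -1 on error, as write() does.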

16. close()

NAME
    close() - close a file descriptor

SYNOPSIS
    #include <unistd.h>

    int close(int sockfd);

1. close() returns 0 on success and -1 on error.

17. TCP Socket

Similarly, a socket file descriptor returned from socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) is initialized as a TCP socket, where AF_INET indicates Internet address family, SOCK_STREAM stands for the reliable byte-stream service, and IPPROTO_TCP means the TCP protocol. The functions to be performed on the descriptor are depicted in Figure 5.38. Here by default bind() is not called at the client.

The flowchart of the simple TCP client-server programs is a little bit complex due to the connection-oriented property of TCP. It contains connection establishment, data transfer, and connection termination stages. Besides bind(), the server calls listen() to allocate the connection queue to the socket and waits for connection requests from clients. The listen() system call expresses the willingness of the server to start accepting incoming connection requests. Each listening socket contains two queues: (1) the partially established request queue and (2) the fully established request queue. A request would first stay in the partially established queue during the three-way handshake. After the connection is established with the three-way handshake finished, the request would be moved to the fully established request queue.

The partially established request queue in most operating systems has a maximum queue length, e.g., 5, even if the user specifies a value larger than that. Thus, the partially established request queue could be the target of a denial of service (DoS) attack. If a hacker continuously sends SYN requests without finishing the three-way handshake, the request queue will be saturated and cannot accept new connection requests from well-behaving clients.

The listen() system call is commonly followed by the accept() system call, whose job is to de-queue the first request from the fully established request queue, initialize a new socket pair, and return the file descriptor of the new socket created for the client. That is, the accept() system call provided by the BSD socket results in the automatic creation of a new socket, largely different from that in the TLI sockets where an application must explicitly create a new socket for the new connection. Note that the original listening socket is still listening on the well-known port for new connection requests. Of course the new socket pair contains the IP ...
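The connection establishment and data transfer stages described above can be walked through in a single process. This is only a sketch: the function name tcp_echo_once() is hypothetical, both ends are assumed to live on the loopback interface, and the request is assumed to be non-empty and shorter than 256 bytes so that one read() returns it whole. It works without threads because the kernel completes the handshake and holds the connection in the fully established queue until accept() is called.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* One request/reply exchange between a TCP client and an echoing TCP
 * server in the same process. Returns the reply length, or -1 on error. */
ssize_t tcp_echo_once(const char *msg, char *reply, size_t cap)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    char req[256];
    ssize_t n = -1;
    int lfd, cfd = -1, conn = -1;

    /* Server: socket(), bind(), listen(). Port 0 = kernel picks one. */
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    lfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (lfd < 0 ||
        bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 5) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &alen) < 0)
        goto out;

    /* Client: socket(), connect() (no bind()), write() the request. */
    cfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (cfd < 0 ||
        connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        write(cfd, msg, strlen(msg)) < 0)
        goto out;

    /* Server: accept() de-queues the connection and returns a new
     * descriptor; lfd is still listening. Then read and echo. */
    conn = accept(lfd, NULL, NULL);
    if (conn < 0)
        goto out;
    n = read(conn, req, sizeof(req));
    if (n <= 0 || write(conn, req, (size_t)n) != n) {
        n = -1;
        goto out;
    }

    /* Client: read() the reply. */
    n = read(cfd, reply, cap);
out:
    if (conn >= 0) close(conn);
    if (cfd >= 0) close(cfd);
    if (lfd >= 0) close(lfd);
    return n;
}
```

In a real deployment the client and server are separate programs, and the server loops over accept() instead of handling a single connection.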

18. UDP Socket

The services most widely used by networking applications are those provided by transport protocols such as UDP and TCP. A socket file descriptor is returned from the socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP) function and initialized as a UDP socket, where AF_INET indicates Internet address family, SOCK_DGRAM stands for datagram service, and IPPROTO_UDP indicates the UDP protocol. A series of operations can be performed on the descriptor, such as those functions in Figure 5.37.

In Figure 5.37, before any data is exchanged, the UDP server as well as the client creates a socket and uses the bind() system call to assign an IP address and a port number to the socket. Note that bind() is optional and is usually not called at the client. When bind() is not called, the kernel selects the default IP address and a port number for the client. Then, after a UDP server binds to a port, it is ready to receive requests from the UDP client. The UDP client may loop through the sendto() and recvfrom() functions to do some useful work until it finishes its job. The UDP server continues accepting requests, processing the requests, and sending back the results using sendto() and recvfrom(). Normally, a UDP client does not need to call bind() as it does not need to use well-known ports. The kernel dynamically assigns an unused port to the client when it calls sendto().

(Source: Computer Networks: An Open Source Approach, p. 392.)
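The UDP exchange described above can be sketched in one process. The function name udp_roundtrip() is hypothetical, and both sockets are assumed to be on the loopback interface. The client never calls bind(); the port the kernel assigned to it at sendto() time is reported through client_port so the behaviour can be observed.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* One request/reply exchange between a UDP client and an echoing UDP
 * server in the same process. Returns the reply length, or -1 on error.
 * *client_port receives the client's kernel-assigned source port. */
ssize_t udp_roundtrip(const char *msg, char *reply, size_t cap,
                      unsigned short *client_port)
{
    struct sockaddr_in srv, peer;
    socklen_t slen = sizeof(srv), plen = sizeof(peer);
    char req[256];
    ssize_t n = -1;
    int sfd, cfd = -1;

    /* Server: socket() and bind(). Port 0 = kernel picks one. */
    memset(&srv, 0, sizeof(srv));
    srv.sin_family = AF_INET;
    srv.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sfd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
    if (sfd < 0 ||
        bind(sfd, (struct sockaddr *)&srv, sizeof(srv)) < 0 ||
        getsockname(sfd, (struct sockaddr *)&srv, &slen) < 0)
        goto out;

    /* Client: socket() and sendto() only, no bind(). */
    cfd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
    if (cfd < 0 ||
        sendto(cfd, msg, strlen(msg), 0,
               (struct sockaddr *)&srv, sizeof(srv)) < 0)
        goto out;

    /* Server: recvfrom() also reports who sent the datagram, which is
     * the address the reply is sent back to. */
    n = recvfrom(sfd, req, sizeof(req), 0, (struct sockaddr *)&peer, &plen);
    if (n < 0)
        goto out;
    *client_port = ntohs(peer.sin_port);
    if (sendto(sfd, req, (size_t)n, 0, (struct sockaddr *)&peer, plen) != n) {
        n = -1;
        goto out;
    }

    /* Client: recvfrom() the reply; the sender address is not needed. */
    n = recvfrom(cfd, reply, cap, 0, NULL, NULL);
out:
    if (cfd >= 0) close(cfd);
    if (sfd >= 0) close(sfd);
    return n;
}
```

Unlike the TCP case there is no listen() or accept(): the server uses the same descriptor for every client and tells them apart by the address recvfrom() returns.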

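19. Working with sockets

- A socket is created with socket().
- bind() gives the socket a name by assigning it an IP address and a port.
- An association between two sockets is set up through the WinSock API: the server calls listen() and waits for connections with accept(), and the client calls connect().
- Data is exchanged with recv() and send(), or with recvfrom() and sendto().
- Sockets can be TCP sockets or UDP sockets.
- A socket is closed with closesocket().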

20. Socket Read/Write Inside Out

[Figure 5.40: call graph of the socket APIs used by simple TCP client-server programs in Linux. In user space, the server creates its socket and sends data with socket(), bind(), listen(), accept(), and write(); the client creates its socket and receives data with socket(), connect(), and read(). The kernel-space call chains shown are:

    socket():  sys_socketcall -> sys_socket -> sock_create -> inet_create
    bind():    sys_socketcall -> sys_bind -> inet_bind
    listen():  sys_socketcall -> sys_listen -> inet_listen
    accept():  sys_socketcall -> sys_accept -> inet_accept -> tcp_accept -> wait_for_connection
    connect(): sys_socketcall -> sys_connect -> inet_stream_connect -> tcp_v4_getport -> tcp_v4_connect -> inet_wait_connect
    write():   sys_write -> do_sock_write -> sock_sendmsg -> inet_sendmsg -> tcp_sendmsg -> tcp_write_xmit
    read():    sys_read -> do_sock_read -> sock_recvmsg -> sock_common_recvmsg -> tcp_recvmsg -> memcpy_toiovec

The two hosts are connected through the Internet.]

The internals of the socket APIs used by simple TCP client-server programs in Linux are illustrated in Figure 5.40. Programming APIs invoked from the user-space programs are translated into the sys_socketcall() kernel call and are then dispatched to their corresponding sys_*() calls. The sys_socket() (in net/socket.c) calls sock_create() to allocate the socket and then calls inet_create() to initialize the sock structure according to the given parameters. The other sys_*() functions call their corresponding inet_*() functions because the sock structure is initialized to Internet address family (AF_INET). Since read() and write() in Figure 5.40 are not socket-specific APIs but are commonly used by file I/O operations, their call flows follow their inode operations in the file system to find that the given file descriptor is actually related to a sock structure. Subsequently they are translated into the corresponding do_sock_read() and do_sock_write() functions, and so on, which are socket-aware.

In most UNIX systems the read()/write() functions are integrated into the Virtual File System (VFS). VFS is the software layer in the kernel that provides the file system interface to user space programs. It also provides an abstraction within the kernel which allows different file system implementations to coexist.

21. Socket Read/Write Inside Out (continued)

As shown in Figure 5.41, the structure proto in the structure sock provides a list of function pointers that link to the necessary operations of a socket, such as connect, sendmsg, and recvmsg. By linking different sets of functions to the list, a socket can send or receive data over different protocols. Find out and read the function sets of other protocols such as UDP.
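The idea of a per-protocol table of function pointers can be illustrated in user space. The sketch below is a toy model, not the kernel's definitions: every name prefixed with toy_ is hypothetical, and the real struct proto and struct sock in net/sock.h have many more members.

```c
#include <stddef.h>

struct toy_sock;

/* A table of operations, one instance per protocol. */
struct toy_proto {
    const char *name;
    int  (*connect)(struct toy_sock *sk);
    long (*sendmsg)(struct toy_sock *sk, const void *buf, size_t len);
};

/* A socket points at the table of the protocol it was created for. */
struct toy_sock {
    const struct toy_proto *prot;
    int connected;
    size_t bytes_sent;
};

/* "TCP": sending is only allowed once the socket is connected. */
static int toy_tcp_connect(struct toy_sock *sk)
{
    sk->connected = 1;
    return 0;
}

static long toy_tcp_sendmsg(struct toy_sock *sk, const void *buf, size_t len)
{
    (void)buf;
    if (!sk->connected)
        return -1;
    sk->bytes_sent += len;
    return (long)len;
}

/* "UDP": connect only marks a default peer; sending always works. */
static int toy_udp_connect(struct toy_sock *sk)
{
    sk->connected = 1;
    return 0;
}

static long toy_udp_sendmsg(struct toy_sock *sk, const void *buf, size_t len)
{
    (void)buf;
    sk->bytes_sent += len;
    return (long)len;
}

const struct toy_proto toy_tcp_prot = { "TCP", toy_tcp_connect, toy_tcp_sendmsg };
const struct toy_proto toy_udp_prot = { "UDP", toy_udp_connect, toy_udp_sendmsg };

/* Protocol-independent entry point: dispatch through the table, so the
 * caller does not need to know which protocol the socket uses. */
long toy_send(struct toy_sock *sk, const void *buf, size_t len)
{
    return sk->prot->sendmsg(sk, buf, len);
}
```

Swapping the prot pointer is all it takes for the same toy_send() call to reach a different protocol's implementation, which mirrors the role the text assigns to the proto structure.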

22. Performance Matters: Interrupt and Memory Copy at Socket

[Figure 5.42: Latency in receiving TCP segments in the TCP layer]
[Figure 5.43: Latency in transmitting TCP segments in the TCP layer]

Receiving segments at a socket actually invokes two processing flows, as shown in the call graph of Figure 5.42. The first flow starts from the system call, read(), later waits on the tcp_recvmsg() (for the case of TCP), which needs to be triggered by sk_data_ready(), and ends at the return to the user space. Thus, the time spent on this flow presents the user-perceived latency. The second flow starts from tcp_v4_rcv() (for the case of TCP) called by the IP layer with an incoming packet and ends at calling sk_data_ready() to trigger the resumption of the first flow. Figure 5.42 shows the time spent on receiving TCP segments in the transport layer. tcp_recvmsg() takes the responsibility to copy data from the kernel structure into the user buffer, and therefore consumes the most time (2.6 μs). The system call, read(), spends time on mode switching between user and kernel modes. Besides, it also spends time on system table lookup. Therefore, read() spends significant time (2.4 μs). Finally, in the second flow, the time spent in tcp_data_queue() and tcp_v4_rcv() goes to queuing and validating segments, respectively.

Figure 5.43 shows the time spent in transmitting TCP segments. The top two most time-consuming functions are functionally similar to the ones in the receiving case. They are tcp_sendmsg(), which copies data from the user buffer to the kernel structure, and the system call write(), switching between user and kernel modes. After examining the time of both TCP segment transmission and reception, we can conclude that the bottlenecks of the TCP layer occur at two places: memory copy between the user buffer and the kernel structure, and switching between user and kernel modes.

23. Open Source Implementation: Linux Socket Filter

Linux Socket Filter (net/core/filter.c)
Similar to BPF (Berkeley Packet Filter)

[Figure 5.44: layered model for packet capturing and filtering. In user space, two network monitors and rarpd each read from their own buffer. In the kernel, each buffer is fed by its own filter inside BPF, which sits beside the protocol stack and above the link-level drivers attached to the network.]

Figure 5.44 presents a layered model for packet capturing and filtering. The incoming packets are cloned from the normal protocol stack to the BPF, which then filters packets within the kernel level according to the BPF instructions installed by the corresponding applications. Since only the packets passing through BPF will be directed to the user-space programs, the overhead of the data exchange between user and kernel spaces can be significantly reduced.

To employ a Linux socket filter with a socket, the BPF instruction can be passed to the kernel by using the setsockopt() function implemented in socket.c, and setting the parameter optname to SO_ATTACH_FILTER. The function will assign the BPF instruction to the sock->sk_filter illustrated in Figure 5.41. The BPF packet-filtering engine was written in a specific pseudo-machine code language inspired by Steve McCanne and Van Jacobson. BPF actually looks like a real assembly language with a couple of registers and a few instructions to load and store values and perform arithmetic operations and conditionally branch.

The filter code examines each packet on the attached socket.
The result of the filter processing is an integer that indicates how many bytes of the packet (if any) the socket should pass to the application level. This brings a further advantage: since packet capturing and filtering are often interested in just the first few bytes of a packet, processing time is saved by not copying the excess bytes.

24. Figures

[Figure 5.38: flowchart of the simple TCP client-server programs.
TCP server: socket() (obtain a descriptor) -> bind() (assign IP & port to the socket) -> listen() (1. switch to passive socket, 2. create connection queue) -> accept() (blocks until connection from client) -> read() -> process request -> write() -> read() -> close().
TCP client: socket() (obtain a descriptor) -> connect() (initiate 3-way handshake) -> write() -> read() -> close().
Messages between the two: connection establishment (TCP three-way handshake, after which both ends enter the ESTABLISHED state), data (request), data (reply), end-of-life notification.]

[Figure 5.37: flowchart of the simple UDP client-server programs.
UDP server: socket() (obtain a descriptor) -> bind() (assign IP & port to the socket) -> recvfrom() (blocks until a request arrives) -> process request -> sendto().
UDP client: socket() (obtain a descriptor) -> sendto() -> recvfrom() -> close().
Messages between the two: data (request), data (reply).]

[Figure 5.41: kernel data structures of an opened Linux socket.
struct files_struct (linux/sched.h): count, file_lock, max_fds, next_fd, max_fdset, fd[0], fd[1], ..., fd[255].
struct file (linux/fs.h): f_dentry, f_list, f_op, f_vfsmnt, f_count, f_flags, f_mode, f_pos.
struct dentry (linux/dentry.h): d_flags, d_count, d_inode, d_parent.
struct inode (linux/fs.h), which contains struct socket: inode, file, sk.
struct sock (net/sock.h): s_addr, d_addr, dport, bound_dev_if, sport, receive_queue, write_queue, proto, union tp_pinfo (struct tcp_opt, with snd_cwnd), sk_filter, socket.
struct proto (net/sock.h): connect, close, disconnect, ioctl, accept, init, destroy, shutdown, setsockopt, getsockopt, sendmsg, recvmsg.
For TCP (ipv4/tcp_ipv4.c) these point to tcp_v4_connect, tcp_close, tcp_disconnect, tcp_ioctl, tcp_accept, tcp_v4_init_sock, tcp_v4_destroy_sock, tcp_shutdown, tcp_setsockopt, tcp_getsockopt, tcp_sendmsg, tcp_recvmsg.]