path: root/net
Age  Commit message  Author
2011-05-16  ipv4: more compliant RFC 3168 support  Eric Dumazet
Commit 6623e3b24a5e (ipv4: IP defragmentation must be ECN aware) was an attempt to not lose "Congestion Experienced" (CE) indications when performing datagram defragmentation. Stefanos Harhalakis raised the point that RFC 3168 requirements were not completely met by this commit. In particular, we MUST detect invalid combinations and eventually drop illegal frames. Reported-by: Stefanos Harhalakis <v13@v13.gr> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
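The rule described above can be modelled in a few lines. This is a minimal userspace sketch, not the kernel code: the function name `reassembled_ecn` is hypothetical, and two behaviours are assumptions made for illustration (every mix involving Not-ECT is dropped, and an ECT(0)/ECT(1) mix without CE keeps the head fragment's codepoint).

```python
from typing import Optional, Sequence

# ECN codepoints from RFC 3168: the two low-order bits of the IPv4 TOS byte.
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def reassembled_ecn(fragments: Sequence[int]) -> Optional[int]:
    """Return the ECN codepoint of the reassembled datagram, or None to drop it.

    `fragments` holds the ECN codepoint of each fragment, head fragment first.
    """
    if not fragments:
        raise ValueError("no fragments to reassemble")
    seen = set(fragments)
    if len(seen) == 1:
        # All fragments agree: reassembly must not change the codepoint.
        return fragments[0]
    if NOT_ECT in seen:
        # Assumption: any mix of Not-ECT with ECT or CE fragments is invalid.
        return None
    if CE in seen:
        # CE mixed with ECT(0)/ECT(1): the congestion mark must not be lost.
        return CE
    # Assumption: ECT(0) mixed with ECT(1) keeps the head fragment's value.
    return fragments[0]
```

For example, `reassembled_ecn([ECT_0, CE, ECT_0])` yields `CE`, while `reassembled_ecn([NOT_ECT, CE])` yields `None`, meaning the frame is dropped.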
2011-05-16  ipv4: Trivial rt->rt_src conversions in net/ipv4/route.c  David S. Miller
At these points we have a fully filled in value via the IP header in the form of ip_hdr(skb)->saddr Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-16  net: ping: dont call udp_ioctl()  Eric Dumazet
udp_ioctl() really handles UDP and UDPLite protocols. 1) It can increment UDP_MIB_INERRORS in case first_packet_length() finds a frame with bad checksum. 2) It has a dependency on sizeof(struct udphdr), not applicable to ICMP/PING. If ping sockets need to handle SIOCINQ/SIOCOUTQ ioctl, this should be done differently. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Vasiliy Kulikov <segoon@openwall.com> Acked-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-15  caif: remove unesesarry exports  sjur.brandeland@stericsson.com
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-15  caif: Bugfix debugfs directory name must be unique.  sjur.brandeland@stericsson.com
Race condition caused debugfs_create_dir() to fail due to duplicate name. Use atomic counter to create unique directory name. net_ratelimit() is introduced to limit debug printouts. Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
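The counter-based naming idea can be illustrated in userspace. This is only a sketch under stated assumptions: `unique_dir_name` and the `layer` prefix are hypothetical, and a lock-protected counter stands in for the kernel's atomic counter.

```python
import itertools
import threading

# A process-wide counter; the lock plays the role of the atomic increment.
_next_id = itertools.count(1)
_next_id_lock = threading.Lock()


def unique_dir_name(prefix: str) -> str:
    """Return a directory name that no other caller will receive."""
    with _next_id_lock:
        return f"{prefix}{next(_next_id)}"
```

Because every caller consumes a distinct counter value, two threads racing to create a directory can no longer pick the same name.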
2011-05-15  caif: Handle dev_queue_xmit errors.  sjur.brandeland@stericsson.com
Do proper handling of dev_queue_xmit errors in order to avoid double free of skb and leaks in error conditions. In cfctrl pending requests are removed when CAIF Link layer goes down. Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-15  caif: prepare support for namespaces  sjur.brandeland@stericsson.com
Use struct net to reference CAIF configuration object instead of static variables. Refactor functions caif_connect_client, caif_disconnect_client and squash files cfcnfg.c and caif_config_utils. Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-15  caif: Protected in-flight packets using dev or sock refcont.  sjur.brandeland@stericsson.com
CAIF Socket Layer and ip-interface registers reference counters in CAIF service layer. The functions sock_hold, sock_put and dev_hold, dev_put are used by CAIF Stack to protect from freeing memory while packets are in-flight. Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-15  caif: Move refcount from service layer to sock and dev.  sjur.brandeland@stericsson.com
Instead of having reference counts in caif service layers, we hook into existing refcount handling in socket layer and netdevice. Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-15  caif: Add ref-count to framing layer  sjur.brandeland@stericsson.com
Introduce Per-cpu reference for lower part of CAIF Stack. Before freeing payload is disabled, synchronize_rcu() is called, and then ref-count verified to be zero. Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-15  caif: Use RCU and lists in cfcnfg.c for managing caif link layers  sjur.brandeland@stericsson.com
RCU lists are used for handling the link layers instead of array. When generating CAIF phy-id, ifindex is used as base. Legal range is 1-6. Introduced set_phy_state() for managing CAIF Link layer state. Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-15  caif: Use RCU instead of spin-lock in caif_dev.c  sjur.brandeland@stericsson.com
RCU read_lock and refcount is used to protect in-flight packets. Use RCU and counters to manage freeing lower part of the CAIF stack if CAIF-link layer is removed. Old solution based on delaying removal of device is removed. When CAIF link layer goes down the use of CAIF link layer is disabled (by calling caif_set_phy_state()), but removal and freeing of the lower part of the CAIF stack is done when Link layer is unregistered. Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-15  caif: Use rcu_read_lock in CAIF mux layer.  sjur.brandeland@stericsson.com
Replace spin_lock with rcu_read_lock when accessing lists to layers and cache. While packets are in flight rcu_read_lock should not be held, instead ref-counters are used in combination with RCU. Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-15  net: ping: small changes  Eric Dumazet
ping_table is not __read_mostly, since it contains one rwlock, and is static to ping.c ping_port_rover & ping_v4_lookup are static Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-14  Merge branch 'batman-adv/next' of git://git.open-mesh.org/ecsv/linux-merge  David S. Miller
2011-05-15  batman-adv: reset broadcast flood protection on error  Marek Lindner
The broadcast flood protection should be reset to its original value if the primary interface could not be retrieved. Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-05-15  batman-adv: Add missing hardif_free_ref in forw_packet_free  Sven Eckelmann
add_bcast_packet_to_list increases the refcount for if_incoming but the reference count is never decreased. The reference count must be increased for all kinds of forwarded packets which have the primary interface stored and forw_packet_free must decrease them. Also purge_outstanding_packets has to invoke forw_packet_free when a work item was really cancelled. This regression was introduced in 32ae9b221e788413ce68feaae2ca39e406211a0a. Reported-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-05-13  ipv4: Remove rt->rt_dst reference from ip_forward_options().  David S. Miller
At this point iph->daddr equals what rt->rt_dst would hold. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-13  ipv4: Remove route key identity dependencies in ip_rt_get_source().  David S. Miller
Pass in the sk_buff so that we can fetch the necessary keys from the packet header when working with input routes. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-13  ipv4: Always call ip_options_build() after rest of IP header is filled in.  David S. Miller
This will allow ip_options_build() to reliably look at the values of iph->{daddr,saddr} Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-13  ipv4: Kill spurious write to iph->daddr in ip_forward_options().  David S. Miller
This code block executes when opt->srr_is_hit is set. It will be set only by ip_options_rcv_srr(). ip_options_rcv_srr() walks until it hits a matching nexthop in the SRR option addresses, and when it matches one 1) looks up the route for that nexthop and 2) on route lookup success it writes that nexthop value into iph->daddr. ip_forward_options() runs later, and again walks the SRR option addresses looking for the option matching the destination of the route stored in skb_rtable(). This route will be the same exact one looked up for the nexthop by ip_options_rcv_srr(). Therefore "rt->rt_dst == iph->daddr" must be true. All it really needs to do is record the route's source address in the matching SRR option address. It need not write iph->daddr again, since that has already been done by ip_options_rcv_srr() as detailed above. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-13  net:set valid name before calling ndo_init()  Peter Pan(潘卫平)
Commit 1c5cae815d19 (net: call dev_alloc_name from register_netdevice) introduced a bug that affects bonding; see examples 1 and 2. In register_netdevice(), the name of net_device is not valid until dev_get_valid_name() is called. But dev->netdev_ops->ndo_init (that is, bond_init) is called before dev_get_valid_name(), and it uses the invalid name of net_device. I think register_netdevice() should make sure that the name of net_device is valid before calling ndo_init().
example 1:
modprobe bonding
ls /proc/net/bonding/bond%d
ps -eLf
root 3398 2 3398 0 1 21:34 ? 00:00:00 [bond%d]
example 2:
modprobe bonding max_bonds=3
[ 170.100292] bonding: Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
[ 170.101090] bonding: Warning: either miimon or arp_interval and arp_ip_target module parameters must be specified, otherwise bonding will not detect link failures! see bonding.txt for details.
[ 170.102469] ------------[ cut here ]------------
[ 170.103150] WARNING: at /home/pwp/net-next-2.6/fs/proc/generic.c:586 proc_register+0x126/0x157()
[ 170.104075] Hardware name: VirtualBox
[ 170.105065] proc_dir_entry 'bonding/bond%d' already registered
[ 170.105613] Modules linked in: bonding(+) sunrpc ipv6 uinput microcode ppdev parport_pc parport joydev e1000 pcspkr i2c_piix4 i2c_core [last unloaded: bonding]
[ 170.108397] Pid: 3457, comm: modprobe Not tainted 2.6.39-rc2+ #14
[ 170.108935] Call Trace:
[ 170.109382] [<c0438f3b>] warn_slowpath_common+0x6a/0x7f
[ 170.109911] [<c051a42a>] ? proc_register+0x126/0x157
[ 170.110329] [<c0438fc3>] warn_slowpath_fmt+0x2b/0x2f
[ 170.110846] [<c051a42a>] proc_register+0x126/0x157
[ 170.111870] [<c051a4dd>] proc_create_data+0x82/0x98
[ 170.112335] [<f94e6af6>] bond_create_proc_entry+0x3f/0x73 [bonding]
[ 170.112905] [<f94dd806>] bond_init+0x77/0xa5 [bonding]
[ 170.113319] [<c0721ac6>] register_netdevice+0x8c/0x1d3
[ 170.113848] [<f94e0e30>] bond_create+0x6c/0x90 [bonding]
[ 170.114322] [<f94f4763>] bonding_init+0x763/0x7b1 [bonding]
[ 170.114879] [<c0401240>] do_one_initcall+0x76/0x122
[ 170.115317] [<f94f4000>] ? 0xf94f3fff
[ 170.115799] [<c0463f1e>] sys_init_module+0x1286/0x140d
[ 170.116879] [<c07c6d9f>] sysenter_do_call+0x12/0x28
[ 170.117404] ---[ end trace 64e4fac3ae5fff1a ]---
[ 170.117924] bond%d: Warning: failed to register to debugfs
[ 170.128728] ------------[ cut here ]------------
[ 170.129360] WARNING: at /home/pwp/net-next-2.6/fs/proc/generic.c:586 proc_register+0x126/0x157()
[ 170.130323] Hardware name: VirtualBox
[ 170.130797] proc_dir_entry 'bonding/bond%d' already registered
[ 170.131315] Modules linked in: bonding(+) sunrpc ipv6 uinput microcode ppdev parport_pc parport joydev e1000 pcspkr i2c_piix4 i2c_core [last unloaded: bonding]
[ 170.133731] Pid: 3457, comm: modprobe Tainted: G W 2.6.39-rc2+ #14
[ 170.134308] Call Trace:
[ 170.134743] [<c0438f3b>] warn_slowpath_common+0x6a/0x7f
[ 170.135305] [<c051a42a>] ? proc_register+0x126/0x157
[ 170.135820] [<c0438fc3>] warn_slowpath_fmt+0x2b/0x2f
[ 170.137168] [<c051a42a>] proc_register+0x126/0x157
[ 170.137700] [<c051a4dd>] proc_create_data+0x82/0x98
[ 170.138174] [<f94e6af6>] bond_create_proc_entry+0x3f/0x73 [bonding]
[ 170.138745] [<f94dd806>] bond_init+0x77/0xa5 [bonding]
[ 170.139278] [<c0721ac6>] register_netdevice+0x8c/0x1d3
[ 170.139828] [<f94e0e30>] bond_create+0x6c/0x90 [bonding]
[ 170.140361] [<f94f4763>] bonding_init+0x763/0x7b1 [bonding]
[ 170.140927] [<c0401240>] do_one_initcall+0x76/0x122
[ 170.141494] [<f94f4000>] ? 0xf94f3fff
[ 170.141975] [<c0463f1e>] sys_init_module+0x1286/0x140d
[ 170.142463] [<c07c6d9f>] sysenter_do_call+0x12/0x28
[ 170.142974] ---[ end trace 64e4fac3ae5fff1b ]---
[ 170.144949] bond%d: Warning: failed to register to debugfs
Signed-off-by: Weiping Pan <panweiping3@gmail.com> Reviewed-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
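The ordering problem in this commit can be shown with a small model. This is a hedged sketch, not the kernel's register_netdevice(): `alloc_name`, `register_device` and the `ndo_init` callback parameter are hypothetical stand-ins, and the name search is a plain linear scan.

```python
import itertools
from typing import Callable, Set


def alloc_name(template: str, existing: Set[str]) -> str:
    """Expand a 'bond%d'-style template to the first unused name."""
    if "%d" not in template:
        if template in existing:
            raise ValueError(f"name {template!r} already in use")
        return template
    for index in itertools.count():
        candidate = template.replace("%d", str(index), 1)
        if candidate not in existing:
            return candidate


def register_device(template: str, existing: Set[str],
                    ndo_init: Callable[[str], None]) -> str:
    """Resolve the final name first, then run the driver's init hook."""
    name = alloc_name(template, existing)  # the name is valid from here on
    ndo_init(name)                         # init never sees the raw template
    existing.add(name)
    return name
```

With this order, an init hook that registers a proc or debugfs entry under the device name receives `bond0`, `bond1`, and so on, never the literal `bond%d` seen in the traces above.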
2011-05-13  net: ipv4: add IPPROTO_ICMP socket kind  Vasiliy Kulikov
This patch adds IPPROTO_ICMP socket kind. It makes it possible to send ICMP_ECHO messages and receive the corresponding ICMP_ECHOREPLY messages without any special privileges. In other words, the patch makes it possible to implement setuid-less and CAP_NET_RAW-less /bin/ping. In order not to increase the kernel's attack surface, the new functionality is disabled by default, but is enabled at bootup by supporting Linux distributions, optionally with restriction to a group or a group range (see below). Similar functionality is implemented in Mac OS X: http://www.manpagez.com/man/4/icmp/ A new ping socket is created with socket(PF_INET, SOCK_DGRAM, IPPROTO_ICMP). Message identifiers (octets 4-5 of ICMP header) are interpreted as local ports. Addresses are stored in struct sockaddr_in. No port numbers are reserved for privileged processes, port 0 is reserved for API ("let the kernel pick a free number"). There is no notion of remote ports, remote port numbers provided by the user (e.g. in connect()) are ignored. Data sent and received include ICMP headers. This is deliberate to: 1) Avoid the need to transport headers values like sequence numbers by other means. 2) Make it easier to port existing programs using raw sockets. ICMP headers given to send() are checked and sanitized. The type must be ICMP_ECHO and the code must be zero (future extensions might relax this, see below). The id is set to the number (local port) of the socket, the checksum is always recomputed. ICMP reply packets received from the network are demultiplexed according to their id's, and are returned by recv() without any modifications. IP header information and ICMP errors of those packets may be obtained via ancillary data (IP_RECVTTL, IP_RETOPTS, and IP_RECVERR). ICMP source quenches and redirects are reported as fake errors via the error queue (IP_RECVERR); the next hop address for redirects is saved to ee_info (in network order).
socket(2) is restricted to the group range specified in "/proc/sys/net/ipv4/ping_group_range". It is "1 0" by default, meaning that nobody (not even root) may create ping sockets. Setting it to "100 100" would grant permissions to the single group (to either make /sbin/ping g+s and owned by this group or to grant permissions to the "netadmins" group), "0 4294967295" would enable it for the world, "100 4294967295" would enable it for the users, but not daemons. The existing code might be (in the unlikely case anyone needs it) extended rather easily to handle other similar pairs of ICMP messages (Timestamp/Reply, Information Request/Reply, Address Mask Request/Reply etc.). Userspace ping util & patch for it: http://openwall.info/wiki/people/segoon/ping For Openwall GNU/*/Linux it was the last step on the road to the setuid-less distro. A revision of this patch (for RHEL5/OpenVZ kernels) is in use in Owl-current, such as in the 2011/03/12 LiveCD ISOs: http://mirrors.kernel.org/openwall/Owl/current/iso/ Initially this functionality was written by Pavel Kankovsky for Linux 2.4.32, but unfortunately it was never made public. All ping options (-b, -p, -Q, -R, -s, -t, -T, -M, -I), are tested with the patch. PATCH v3: - switched to flowi4. - minor changes to be consistent with raw sockets code. PATCH v2: - changed ping_debug() to pr_debug(). - removed CONFIG_IP_PING. - removed ping_seq_fops.owner field (unused for procfs). - switched to proc_net_fops_create(). - switched to %pK in seq_printf(). PATCH v1: - fixed checksumming bug. - CAP_NET_RAW may not create icmp sockets anymore. RFC v2: - minor cleanups. - introduced sysctl'able group range to restrict socket(2). Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: David S. Miller <davem@davemloft.net>
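A userspace view of the interface described above might look like the following. It is a sketch under assumptions: the helper names are hypothetical, `group_allowed` checks a single group id rather than a process's full group list, and `open_ping_socket` may raise `PermissionError` when ping_group_range excludes the caller.

```python
import socket
import struct

ICMP_ECHO = 8  # ICMP type of an echo request


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over `data` (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident: int, seq: int, payload: bytes) -> bytes:
    """Build an ICMP echo request; id sits in octets 4-5 of the header."""
    header = struct.pack("!BBHHH", ICMP_ECHO, 0, 0, ident, seq)
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", ICMP_ECHO, 0, csum, ident, seq) + payload


def group_allowed(range_text: str, gid: int) -> bool:
    """Interpret a 'low high' ping_group_range string for one group id."""
    low, high = (int(field) for field in range_text.split())
    return low <= gid <= high


def open_ping_socket() -> socket.socket:
    """Create a datagram ICMP socket as described in the commit message."""
    return socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_ICMP)
```

Per the commit message, the id and checksum supplied by the sender are overwritten by the kernel, so filling them in as `build_echo_request` does mainly helps code that is shared with a raw-socket path.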
2011-05-13  convert old cpumask API into new one  KOSAKI Motohiro
Adapt new API. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-13  af_iucv: get rid of compile warning  Ursula Braun
-Wunused-but-set-variable generates compile warnings. The affected variables are removed. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-13  iucv: get rid of compile warning  Ursula Braun
-Wunused-but-set-variable generates a compile warning. The affected variable is removed. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-13  ethtool: Added support for FW dump  Anirban Chakraborty
Added code to take FW dump via ethtool. Dump level can be controlled via setting the dump flag. A get function is provided to query the current setting of the dump flag. Dump data is obtained from the driver via a separate get function. Changes from v3: Fixed buffer length issue in ethtool_get_dump_data function. Updated kernel doc for ethtool_dump struct and get_dump_flag function. Changes from v2: Provided separate commands for get flag and data. Check for minimum of the two buffer length obtained via ethtool and driver and use that for dump buffer Pass up the driver return error codes up to the caller. Added kernel doc comments. Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com> Reviewed-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-12  ipv4: Fix 'iph' use before set.  David S. Miller
I swear none of my compilers warned about this, yet it is so obvious. > net/ipv4/ip_forward.c: In function 'ip_forward': > net/ipv4/ip_forward.c:87: warning: 'iph' may be used uninitialized in this function Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-12  ipv4: Elide use of rt->rt_dst in ip_forward()  David S. Miller
No matter what kind of header mangling occurs due to IP options processing, rt->rt_dst will always equal iph->daddr in the packet. So we can safely use iph->daddr instead of rt->rt_dst here. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-12  ipv4: Simplify iph->daddr overwrite in ip_options_rcv_srr().  David S. Miller
We already copy the 4-byte nexthop from the options block into local variable "nexthop" for the route lookup. Re-use that variable instead of memcpy()'ing again when assigning to iph->daddr after the route lookup succeeds. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-12  ipv4: Kill spurious opt->srr check in ip_options_rcv_srr().  David S. Miller
All call sites conditionalize the call to ip_options_rcv_srr() with a check of opt->srr, so no need to check it again there. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-12  net: introduce netdev_change_features()  Michał Mirosław
It will be needed by bonding and other drivers changing vlan_features after ndo_init callback. As a bonus, this includes kernel-doc for netdev_update_features(). Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-12  ipvs: Remove all remaining references to rt->rt_{src,dst}  Julian Anastasov
Remove all remaining references to rt->rt_{src,dst} by using dest->dst_saddr to cache saddr (used for TUN mode). For ICMP in FORWARD hook just restrict the rt_mode for NAT to disable LOCALNODE. All other modes do not allow IP_VS_RT_MODE_RDR, so we should be safe with the ICMP forwarding. Using cp->daddr as replacement for rt_dst is safe for all modes except BYPASS, even when cp->dest is NULL because it is cp->daddr that is used to assign cp->dest for sync-ed connections. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-12  ipvs: Eliminate rt->rt_dst usage in __ip_vs_get_out_rt().  David S. Miller
We can simply track what destination address is used based upon which code block is taken at the top of the function. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-12  ipvs: Use IP_VS_RT_MODE_* instead of magic constants.  David S. Miller
[ Add some cases I missed, from Julian Anastasov ] Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-12  net/irda/ircomm_tty.c: Use flip buffers to deliver data  Amit Virdi
Use tty_insert_flip_string and tty_flip_buffer_push to deliver incoming data packets from the IrDA device instead of delivering the packets directly to the line discipline. The latter approach resulted in the warning "Sleeping function called from invalid context". Signed-off-by: Amit Virdi <amit.virdi@st.com> Acked-by: Alan Cox <alan@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-12  net: Fix vlan_features propagation  Michał Mirosław
Fix VLAN features propagation for devices which change vlan_features. For this to work, driver needs to make sure netdev_features_changed() gets called after the change (it is e.g. after ndo_set_features()). Side effect is that a user might request features that will never be enabled on a VLAN device. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-12  ethtool: bring back missing comma in netdev features strings  Franco Fichtner
The issue was introduced in commit eed2a12f1ed9aabf. Signed-off-by: Franco Fichtner <franco@lastsummer.de> Acked-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Acked-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-12  garp: remove last synchronize_rcu() call  Eric Dumazet
When removing last vlan from a device, garp_uninit_applicant() calls synchronize_rcu() to make sure no user can still manipulate struct garp_applicant before we free it. Use call_rcu() instead, as a step to further net_device dismantle optimizations. Add the temporary garp_cleanup_module() function to make sure no pending call_rcu() are left at module unload time [ this will be removed when kfree_rcu() is available ] Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-12  sctp: sctp_sendmsg: Don't test known non-null sinfo  Joe Perches
It's already known non-null above. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-12  sctp: sctp_sendmsg: Don't initialize default_sinfo  Joe Perches
This variable only needs initialization when cmsgs.info is NULL. Use memset to ensure padding is also zeroed so kernel doesn't leak any data. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-12  l2tp: fix potential rcu race  Eric Dumazet
While trying to remove useless synchronize_rcu() calls, I found l2tp is indeed incorrectly using two such calls, but also bumps tunnel refcount after list insertion. The tunnel refcount must be incremented before the tunnel is made publicly visible to rcu readers. This fix can be applied to 2.6.35+ and might need a backport for older kernels, since things were shuffled in commit fd558d186df2c (l2tp: Split pppol2tp patch into separate l2tp and ppp parts) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com> CC: James Chapman <jchapman@katalix.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
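The "take the reference before publishing" rule can be sketched outside the kernel. This is an illustrative model only: `Tunnel` and `TunnelList` are hypothetical classes, and a lock-protected Python list stands in for the RCU-protected tunnel list.

```python
import threading
from typing import List, Optional


class Tunnel:
    def __init__(self, tunnel_id: int) -> None:
        self.tunnel_id = tunnel_id
        self.refcount = 0


class TunnelList:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._tunnels: List[Tunnel] = []

    def publish(self, tunnel: Tunnel) -> None:
        # Take the list's reference while the tunnel is still private ...
        tunnel.refcount += 1
        # ... and only then make it visible to lookups.
        with self._lock:
            self._tunnels.append(tunnel)

    def lookup(self, tunnel_id: int) -> Optional[Tunnel]:
        with self._lock:
            for tunnel in self._tunnels:
                if tunnel.tunnel_id == tunnel_id:
                    # Holds because publish() incremented before inserting.
                    assert tunnel.refcount > 0
                    tunnel.refcount += 1  # the caller's own reference
                    return tunnel
        return None
```

If `publish` inserted first and incremented afterwards, a concurrent `lookup` could observe a tunnel whose count is still zero, which is the window the commit closes.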
2011-05-11  Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6  David S. Miller
Conflicts: drivers/net/benet/be_main.c
2011-05-11  Merge branch 'tipc-May10-2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/net-next-2.6  David S. Miller
2011-05-10  Merge branch 'pablo/nf-2.6-updates' of git://1984.lsi.us.es/net-2.6  David S. Miller
2011-05-10  xfrm: Don't allow esn with disabled anti replay detection  Steffen Klassert
Unlike the standard case, disabled anti replay detection needs some nontrivial extra treatment on ESN. RFC 4303 states: Note: If a receiver chooses to not enable anti-replay for an SA, then the receiver SHOULD NOT negotiate ESN in an SA management protocol. Use of ESN creates a need for the receiver to manage the anti-replay window (in order to determine the correct value for the high-order bits of the ESN, which are employed in the ICV computation), which is generally contrary to the notion of disabling anti-replay for an SA. So return an error if an ESN state with disabled anti replay detection is inserted for now and add the extra treatment later if we need it. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
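The check this commit describes amounts to rejecting one flag combination. The sketch below is an assumption-laden illustration rather than the kernel code: `validate_state` is a hypothetical helper, a replay window of zero is taken to mean that anti-replay detection is disabled, and the flag value is illustrative.

```python
import errno

XFRM_STATE_ESN = 128  # illustrative flag bit for "extended sequence numbers"


def validate_state(flags: int, replay_window: int) -> int:
    """Return 0 if the state may be inserted, or a negative errno value."""
    esn_requested = bool(flags & XFRM_STATE_ESN)
    anti_replay_disabled = replay_window == 0
    if esn_requested and anti_replay_disabled:
        # ESN needs the replay window to infer the high-order sequence bits.
        return -errno.EINVAL
    return 0
```

So `validate_state(XFRM_STATE_ESN, 0)` is rejected, while the same flags with a non-zero window, or a zero window without ESN, are accepted.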
2011-05-10  xfrm: Assign the inner mode output function to the dst entry  Steffen Klassert
As it is, we assign the outer mode's output function to the dst entry when we create the xfrm bundle. This leads to two problems on interfamily scenarios. We might insert ipv4 packets into ip6_fragment when called from xfrm6_output. The system crashes if we try to fragment an ipv4 packet with ip6_fragment. This issue was introduced with git commit ad0081e4 (ipv6: Fragment locally generated tunnel-mode IPSec6 packets as needed). The second issue is that we might insert ipv4 packets in netfilter6 and vice versa on interfamily scenarios. With this patch we assign the inner mode output function to the dst entry when we create the xfrm bundle. So xfrm4_output/xfrm6_output from the inner mode is used and the right fragmentation and netfilter functions are called. We switch then to outer mode with the output_finish functions. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-10  net: dev_close() should check IFF_UP  Eric Dumazet
Commit 443457242beb (factorize sync-rcu call in unregister_netdevice_many) mistakenly removed one test from dev_close(). The following actions trigger a BUG:
modprobe bonding
modprobe dummy
ifconfig bond0 up
ifenslave bond0 dummy0
rmmod dummy
dev_close() must not close a non IFF_UP device. With help from Frank Blaschka and Einar EL Lueck. Reported-by: Frank Blaschka <blaschka@linux.vnet.ibm.com> Reported-by: Einar EL Lueck <ELELUECK@de.ibm.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
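The restored test is a guard on the IFF_UP flag. A minimal sketch of that guard, assuming a hypothetical `Device` class whose `stop_calls` counter stands in for the driver's stop routine:

```python
IFF_UP = 0x1  # interface is administratively up


class Device:
    def __init__(self, flags: int = 0) -> None:
        self.flags = flags
        self.stop_calls = 0  # how many times the stop path actually ran


def dev_close(dev: Device) -> None:
    """Close the device only if it is currently up."""
    if not dev.flags & IFF_UP:
        return  # already down: closing again must be a no-op
    dev.stop_calls += 1
    dev.flags &= ~IFF_UP
```

Calling `dev_close` twice on the same device runs the stop path once; without the guard, the second call would tear down state that is already gone.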
2011-05-10  vlan: fix GVRP at dismantle time  Eric Dumazet
ip link add link eth2 eth2.103 type vlan id 103 gvrp on loose_binding on
ip link set eth2.103 up
rmmod tg3 # driver providing eth2
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffa0030c9e>] garp_request_leave+0x3e/0xc0 [garp]
PGD 11d251067 PUD 11b9e0067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/virtual/net/eth2.104/ifindex
CPU 0
Modules linked in: tg3(-) 8021q garp nfsd lockd auth_rpcgss sunrpc libphy sg [last unloaded: x_tables]
Pid: 11494, comm: rmmod Tainted: G W 2.6.39-rc6-00261-gfd71257-dirty #580 HP ProLiant BL460c G6
RIP: 0010:[<ffffffffa0030c9e>] [<ffffffffa0030c9e>] garp_request_leave+0x3e/0xc0 [garp]
RSP: 0018:ffff88007a19bae8 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff88011b5e2000 RCX: 0000000000000002
RDX: 0000000000000000 RSI: 0000000000000175 RDI: ffffffffa0030d5b
RBP: ffff88007a19bb18 R08: 0000000000000001 R09: ffff88011bd64a00
R10: ffff88011d34ec00 R11: 0000000000000000 R12: 0000000000000002
R13: ffff88007a19bc48 R14: ffff88007a19bb88 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff88011fc00000(0063) knlGS:00000000f77d76c0
CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 0000000000000000 CR3: 000000011a675000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rmmod (pid: 11494, threadinfo ffff88007a19a000, task ffff8800798595c0)
Stack:
ffff88007a19bb36 ffff88011c84b800 ffff88011b5e2000 ffff88007a19bc48
ffff88007a19bb88 0000000000000006 ffff88007a19bb38 ffffffffa003a5f6
ffff88007a19bb38 670088007a19bba8 ffff88007a19bb58 ffffffffa00397e7
Call Trace:
[<ffffffffa003a5f6>] vlan_gvrp_request_leave+0x46/0x50 [8021q]
[<ffffffffa00397e7>] vlan_dev_stop+0xb7/0xc0 [8021q]
[<ffffffff8137e427>] __dev_close_many+0x87/0xe0
[<ffffffff8137e507>] dev_close_many+0x87/0x110
[<ffffffff8137e630>] rollback_registered_many+0xa0/0x240
[<ffffffff8137e7e9>] unregister_netdevice_many+0x19/0x60
[<ffffffffa00389eb>] vlan_device_event+0x53b/0x550 [8021q]
[<ffffffff8143f448>] ? ip6mr_device_event+0xa8/0xd0
[<ffffffff81479d03>] notifier_call_chain+0x53/0x80
[<ffffffff81062539>] __raw_notifier_call_chain+0x9/0x10
[<ffffffff81062551>] raw_notifier_call_chain+0x11/0x20
[<ffffffff8137df82>] call_netdevice_notifiers+0x32/0x60
[<ffffffff8137e69f>] rollback_registered_many+0x10f/0x240
[<ffffffff8137e85f>] rollback_registered+0x2f/0x40
[<ffffffff8137e8c8>] unregister_netdevice_queue+0x58/0x90
[<ffffffff8137e9eb>] unregister_netdev+0x1b/0x30
[<ffffffffa005d73f>] tg3_remove_one+0x6f/0x10b [tg3]
We should call vlan_gvrp_request_leave() from unregister_vlan_dev(), not from vlan_dev_stop(), because vlan_gvrp_uninit_applicant() is called right after unregister_netdevice_queue(). In batch mode, unregister_netdevice_queue() doesn’t immediately call vlan_dev_stop(). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-10  net: fix two lockdep splats  Eric Dumazet
Commit e67f88dd12f6 (net: dont hold rtnl mutex during netlink dump callbacks) switched rtnl protection to RCU, but we forgot to adjust two rcu_dereference() lockdep annotations : inet_get_link_af_size() or inet_fill_link_af() might be called with rcu_read_lock or rtnl held, so use rcu_dereference_rtnl() instead of rtnl_dereference() Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>