Friday 14 March 2014

Mac OS X - how to "burn" a CD image to a USB stick

Insert the USB stick and find its device node and mount point:
dhcp54:~ bgerofi$ mount
/dev/disk0s2 on / (hfs, local, journaled)
devfs on /dev (devfs, local, nobrowse)
map -hosts on /net (autofs, nosuid, automounted, nobrowse)
map auto_home on /home (autofs, automounted, nobrowse)
/dev/disk1s2 on /Volumes/Adium 1.5.9 (hfs, local, nodev, nosuid, read-only, noowners, quarantine, mounted by bgerofi)
/dev/disk2s1 on /Volumes/USB20FD (msdos, local, nodev, nosuid, noowners)

Unmount (without eject) the partition:
dhcp54:Downloads bgerofi$ sudo diskutil umount /dev/disk2s1
Volume USB20FD on disk2s1 unmounted

Overwrite block device with .iso image:
dhcp54:Downloads bgerofi$ sudo dd bs=2097152 if=~/Downloads/ubuntu-13.10-desktop-amd64.iso of=/dev/disk2
441+1 records in
441+1 records out
925892608 bytes transferred in 360.621256 secs (2567493 bytes/sec)
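
As a sanity check, the record counts and the rate in the dd output can be reproduced from the numbers it printed. A small sketch in Python, using only values copied from the transcript above (the function name dd_summary is made up for illustration):

```python
# Values copied from the dd transcript above.
BLOCK_SIZE = 2097152        # the bs= argument (2 MiB)
TOTAL_BYTES = 925892608     # "bytes transferred"
ELAPSED_SECS = 360.621256   # "secs"


def dd_summary(total_bytes, block_size, elapsed_secs):
    """Return (full_records, partial_bytes, bytes_per_sec) for a dd run."""
    full_records, partial_bytes = divmod(total_bytes, block_size)
    return full_records, partial_bytes, total_bytes / elapsed_secs


full, partial, rate = dd_summary(TOTAL_BYTES, BLOCK_SIZE, ELAPSED_SECS)
print(f"{full} full records, {partial} bytes in the partial record")
print(f"{rate / 1e6:.2f} MB/s")
```

So "441+1" means 441 full 2 MiB blocks plus one partial block.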

Wednesday 7 September 2011

regarding TCP checksum offload and linux kernel netfilters

Rewriting parts of a TCP packet in a netfilter hook can be tricky if your NIC does TCP checksum offloading. Apparently, th->check needs to hold only the checksum of the pseudo header; the hardware finishes the rest. The IP header checksum, on the other hand, may still have to be filled in correctly in software. Why is that not computed in hardware as well?
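
To make the split between "pseudo header part" and "the rest" concrete, here is a minimal sketch of the checksum arithmetic in Python. It is not kernel code and does not claim to match the exact form the kernel expects in th->check; the addresses, the sample segment and the helper names (tcp_pseudo_header, finish_checksum) are assumptions for illustration only.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum of data (folded, not inverted)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def tcp_pseudo_header(src_ip: str, dst_ip: str, tcp_length: int) -> bytes:
    """IPv4 pseudo header: src, dst, zero byte, protocol, TCP length."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, socket.IPPROTO_TCP, tcp_length))


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Full checksum; the checksum field inside segment must be zero."""
    pseudo = tcp_pseudo_header(src_ip, dst_ip, len(segment))
    return ~ones_complement_sum(pseudo + segment) & 0xFFFF


def finish_checksum(partial: int, segment: bytes) -> int:
    """Hypothetical 'hardware' step: fold the segment into a partial sum."""
    return ~ones_complement_sum(struct.pack("!H", partial) + segment) & 0xFFFF


SRC, DST = "192.0.2.1", "192.0.2.2"  # documentation addresses, made up
# 20-byte TCP header (SYN, checksum field zero) plus a small payload.
segment = struct.pack("!HHIIBBHHH", 12345, 80, 0, 0, 5 << 4, 0x02,
                      65535, 0, 0) + b"hello"

# The part software can compute before the payload is final.
partial = ones_complement_sum(tcp_pseudo_header(SRC, DST, len(segment)))
print(hex(partial), hex(finish_checksum(partial, segment)))
```

The point of the sketch is only that the sum is associative: the pseudo header part can be computed first and the rest folded in later, giving the same result as computing everything at once.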

Tuesday 14 December 2010

qemu-kvm w2k3 virtio disk drivers

1.) Download the disk-virtio ISO and copy the content into the guest's partition.
2.) Start VM and attach a disk:
virsh attach-disk w2k3 /mnt/bgerofi-virtual_machines/winxp/winxp.raw vdb --type disk --type file
3.) Specify the disk-virtio driver for the new SCSI device.
4.) Shut down and undefine the VM, modify the XML, and redefine the VM so that the disk is virtio.

Windows Server 2003 over qemu-kvm, "A disk read error occurred."

If you get the following error message during boot after installing Windows Server 2003/XP in qemu-kvm:

"A disk read error occurred."

that is because:
(according to http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=579166#17)

A run of testdisk on the raw image file detects a difference between
the disc geometry mentioned in the MBR and in the NTFS partitions
start sector. MBR says that there should be 255 heads and NTFS
partition boot sector assumes 16 heads.
I edidet the NTFS partition bootsector manual, and the problem went
away. When i do the setup with the plain qemu emulator there is also
no problem. For me it seems, that the it must be a problem between
seabios und the kvm. may be the kvm handels the bios information in a
different way?

Assume: a complete raw hard disk image with a bootable NTFS partition that has XP or w2k3 on it.
Incident: when this image is used with KVM-based qemu, the system won't boot anymore.

Solution:

1) set up the whole discimage as a loop device
- losetup /dev/loop0 /path/to/my/diskimage.raw
2) let kpartx create drive mappings for all partitions within the loop device
- kpartx -a /dev/loop0
3) you need to know on which partition your NTFS partition resides
- fdisk -l /dev/loop0
4) open the right partition mapping in hexedit (e.g. partition 1)
- hexedit /dev/mapper/loop0p1
5) look at hex offset 0x1A for the number of heads NTFS assumes
- in hexedit type enter and then 1A
6) change the value to 0xFF
- in hexedit type FF
7) save and exit hexedit
- press Ctrl+X to end
8) remove the partition mappings
- kpartx -d /dev/loop0
9) remove loop device
- losetup -d /dev/loop0
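
The hexedit step can also be sketched in Python. This is only an illustration on a synthetic 512-byte sector, not something run against a real image; it assumes the heads count at offset 0x1A is a 2-byte little-endian field (so 255 heads is FF 00), and the function names are made up.

```python
import struct

HEADS_OFFSET = 0x1A  # "number of heads" field in the partition boot sector


def read_heads(boot_sector: bytes) -> int:
    """Read the heads count (assumed 16-bit little-endian) at offset 0x1A."""
    return struct.unpack_from("<H", boot_sector, HEADS_OFFSET)[0]


def patch_heads(boot_sector: bytes, heads: int = 255) -> bytes:
    """Return a copy of the boot sector with the heads count replaced."""
    patched = bytearray(boot_sector)
    struct.pack_into("<H", patched, HEADS_OFFSET, heads)
    return bytes(patched)


# Synthetic sector that claims 16 heads, like the broken NTFS boot sector.
sector = bytearray(512)
struct.pack_into("<H", sector, HEADS_OFFSET, 16)
original = bytes(sector)

fixed = patch_heads(original)
print(read_heads(original), "->", read_heads(fixed))
```

On a real system the same read/patch would be applied to the first sector of the partition mapping (e.g. /dev/mapper/loop0p1), ideally after taking a backup of the image.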

Friday 23 April 2010

VM migration call stack (libvirt and qemu-KVM)

libvirt-0.7.7/tools/virsh.c: cmdMigrate() -> virDomainMigrate() -> virDomainMigrateVersion2() -> domain->conn->driver->domainMigratePerform() ->

libvirt-0.7.7/src/qemu/qemu_driver.c: qemudDomainMigratePerform() -> doNativeMigrate() -> qemuMonitorMigrateToHost() -> qemuMonitorTextMigrateToHost() -> qemuMonitorCommandWithHandler:230:
Send command 'migrate -d "tcp:tsurugi8.il.is.s.u-tokyo.ac.jp:49157"'

qemu-kvm-0.12.3/migration.c: do_migrate() -> tcp_start_outgoing_migration() -> migrate_fd_connect()

migrate_fd_connect():

qemu_fopen_ops_buffered(): creates a QEMUFileBuffered object and sets up a timer in qemu_new_timer(), where buffered_rate_tick() is the actual callback function.

buffered_rate_tick(): sets the new timer deadline and calls the QEMUFileBuffered obj's put_buffer() and put_ready() functions that were registered as migrate_fd_put_buffer() and migrate_fd_put_ready() respectively.

migrate_fd_put_ready(): calls qemu_savevm_state_iterate() that iterates the savevm_handlers and calls save_live_state() on each.

two live save handlers are registered: ram_save_live() and block_save_live() for the memory and the disk respectively.
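
A toy model of the iteration described above, in Python. This is not qemu code: the class, the "chunks" and the tick loop are hypothetical stand-ins, meant only to show the shape of "on every timer tick, call each registered live save handler until all of them report they are done".

```python
class LiveSaveHandler:
    """Hypothetical stand-in for a registered live save handler."""

    def __init__(self, name, pending_chunks):
        self.name = name
        self.pending = pending_chunks  # made-up unit of remaining work

    def save_live_state(self):
        """Send one chunk; return True once nothing is left to send."""
        if self.pending > 0:
            self.pending -= 1
        return self.pending == 0


def savevm_state_iterate(handlers):
    """One iteration: call every handler, report whether all are done."""
    all_done = True
    for handler in handlers:
        # Every handler is called, even if an earlier one is not done yet.
        done = handler.save_live_state()
        all_done = all_done and done
    return all_done


def run_migration(handlers, max_ticks=1000):
    """Model the timer: iterate once per tick until all handlers finish."""
    for tick in range(1, max_ticks + 1):
        if savevm_state_iterate(handlers):
            return tick
    raise RuntimeError("handlers did not finish within max_ticks")


handlers = [LiveSaveHandler("ram", 3), LiveSaveHandler("block", 5)]
ticks = run_migration(handlers)
print("finished after", ticks, "ticks")
```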

ram_save_live():

Thursday 15 April 2010

KVM bridge network + VM live migration

/etc/network/interfaces on the host machine:
#auto eth0
#iface eth0 inet dhcp

auto br0
iface br0 inet dhcp
bridge_ports eth0
bridge_stp off
bridge_fd 0
bridge_maxwait 0

domain description XML for the guest (interface bridge is the point!):

<domain type='kvm' id='3'>
<name>karmic_qemu</name>
<uuid>11111111-50c2-1590-c9fc-0d3afa1c1b97</uuid>
<memory>524288</memory>
<currentMemory>524288</currentMemory>
<vcpu>1</vcpu>
<os>
<type arch='x86_64' machine='pc-0.11'>hvm</type>
<boot dev='hd'/>
</os>
<features>
<acpi/>
<apic/>
<pae/>
</features>
<clock offset='utc'/>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>restart</on_crash>
<devices>
<emulator>/usr/local/bin/qemu-system-x86_64</emulator>
<disk type='file' device='cdrom'>
<target dev='hdc' bus='ide'/>
<readonly/>
</disk>
<disk type='file' device='disk'>
<source file='/home/bgerofi/qemu-images/karmic.qcow2'/>
<target dev='vda' bus='virtio'/>
</disk>
<interface type='bridge'>
<source bridge='br0'/>
<mac address="00:6E:01:69:3A:11"/>
<model type='virtio'/>
</interface>
<serial type='pty'>
<source path='/dev/pts/1'/>
<target port='0'/>
</serial>
<console type='pty' tty='/dev/pts/1'>
<source path='/dev/pts/1'/>
<target port='0'/>
</console>
<input type='mouse' bus='ps2'/>
<graphics type='vnc' port='5900' autoport='yes' keymap='en-us'/>
<video>
<model type='cirrus' vram='9216' heads='1'/>
</video>
</devices>
</domain>
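
Since the bridge interface is the part that matters, a quick way to check it in a domain description is to parse the XML. A minimal sketch in Python using only the standard library; the XML string is a trimmed copy of the relevant part of the domain above, and bridge_interfaces is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the domain description above (only the relevant parts).
DOMAIN_XML = """
<domain type='kvm'>
  <name>karmic_qemu</name>
  <devices>
    <interface type='bridge'>
      <source bridge='br0'/>
      <mac address='00:6E:01:69:3A:11'/>
      <model type='virtio'/>
    </interface>
  </devices>
</domain>
"""


def bridge_interfaces(domain_xml: str):
    """Return (bridge, mac, model) for each bridged interface in the domain."""
    root = ET.fromstring(domain_xml)
    result = []
    for iface in root.findall("./devices/interface[@type='bridge']"):
        source = iface.find("source")
        mac = iface.find("mac")
        model = iface.find("model")
        result.append((
            source.get("bridge") if source is not None else None,
            mac.get("address") if mac is not None else None,
            model.get("type") if model is not None else None,
        ))
    return result


for bridge, mac, model in bridge_interfaces(DOMAIN_XML):
    print(f"bridge={bridge} mac={mac} model={model}")
```

On a live host the XML would come from the output of virsh dumpxml instead of a string literal.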


/etc/network/interfaces on the guest machine (for tsurugi network):

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth1
iface eth1 inet static
address 172.16.16.99
netmask 255.255.0.0
gateway 172.16.0.1

(I have no clue why it doesn't show up as eth0...)

on both machines we need to set up sasl:
bgerofi@tsurugi8:~$ sudo saslpasswd2 -a libvirt bgerofi -f /etc/libvirt/passwd.db
(type password)

(Note: it is possible you need to add the user/pass to /etc/sasldb2 as well!)

bgerofi@tsurugi8:~$ sudo sasldblistusers2 -f /etc/libvirt/passwd.db
bgerofi@tsurugi8: userPassword


edit libvirtd.conf so that it uses TCP and no TLS:
bgerofi@tsurugi8:~$ sudo vim /usr/local/etc/libvirt/libvirtd.conf

# This is enabled by default, uncomment this to disable it
listen_tls = 0 <<<----- THIS CHANGED

# Listen for unencrypted TCP connections on the public TCP/IP port.
# NB, must pass the --listen flag to the libvirtd process for this to
# have any effect.
#
# Using the TCP socket requires SASL authentication by default. Only
# SASL mechanisms which support data encryption are allowed. This is
# DIGEST_MD5 and GSSAPI (Kerberos5)
#
# This is disabled by default, uncomment this to enable it.
listen_tcp = 1 <<<----- THIS CHANGED


start libvirtd in listening mode:
root@tsurugi8:~# libvirtd --listen


live migration:
bgerofi@tsurugi7:~$ sudo virsh list --all
Id Name State
----------------------------------
1 karmic_qemu running

bgerofi@tsurugi7:~$ sudo virsh migrate --live karmic_qemu qemu+tcp://tsurugi8/system
Please enter your authentication name: bgerofi
Please enter your password:

bgerofi@tsurugi7:~$ sudo virsh list --all
Id Name State
----------------------------------
- karmic_qemu shut off

what we got on tsurugi8:
bgerofi@tsurugi8:~$ sudo virsh list
Id Name State
----------------------------------
1 karmic_qemu running



QEMU log file: /usr/local/var/log/libvirt/qemu/karmic_qemu.log

replication:
sudo virsh replicate karmic_qemu qemu+tcp://tsurugi8/system

Wednesday 15 July 2009

tracking dirty pages without patching the kernel?

The main idea is to replace all writable mappings (both anonymous and vm_file) with a VM_SHARED pseudo file mapping, which will imitate the right behavior of the vm_area.
Each vm_area gets a separate pseudo file/address space which stores the original mapping's properties.

Initially all pages (also the anonymous ones that got COW-ed from a private mapping) are converted to shared mappings of this pseudo file. Reverse mappings and page cache need to be updated consistently (i.e. each page has to be linked to the pseudo address space and the vm_area has to be included into the address space's priority tree.)

Each iteration of the incremental update clears the write bit of all PTEs (belonging to the dirty pages).

The benefit of the pseudo file mapping is the callbacks it gives us: page_mkwrite() in do_wp_page() and fault() in __do_fault().
We will always be notified on the first write to a page, while we don't rely on the dirty bit (so swapping can still work) and don't miss page writes in case of an mprotect() call after the write (since the write itself was logged before the mprotect()).
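
The logging scheme can be illustrated with a toy model in Python. This is user-space pseudo-behavior, not kernel code: the class and its fields are hypothetical, and a list of booleans stands in for the write bits of the PTEs.

```python
class TrackedRegion:
    """Toy model of a mapping whose pages are write-protected between rounds."""

    def __init__(self, num_pages):
        self.writable = [False] * num_pages  # stands in for the PTE write bit
        self.dirty_log = set()               # pages written since last round
        self.faults = 0                      # how often page_mkwrite() ran

    def page_mkwrite(self, page):
        """Called on the first write to a write-protected page."""
        self.faults += 1
        self.dirty_log.add(page)
        self.writable[page] = True

    def write(self, page):
        """A store to the page; only faults if the page is write-protected."""
        if not self.writable[page]:
            self.page_mkwrite(page)
        # Later writes to the same page in this round do not fault again.

    def start_iteration(self):
        """Collect the logged pages and clear their write bits again."""
        dirty = sorted(self.dirty_log)
        for page in dirty:
            self.writable[page] = False
        self.dirty_log.clear()
        return dirty


region = TrackedRegion(8)
for page in (1, 1, 5):
    region.write(page)
first_round = region.start_iteration()   # pages dirtied in round one
region.write(5)
second_round = region.start_iteration()  # pages dirtied in round two
print(first_round, second_round, region.faults)
```

Each page costs one fault per round no matter how often it is written, which is the property the incremental update relies on.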


Considering the different cases of the original mappings:
(anon vs. file backed / private vs. shared, bold text means the original)

vm_file && VM_SHARED:

init: we need to replace the address space operations, all pages are mapped as shared anyway, no COW necessary.
(the pseudo file's write_page will call the original write_page, ensuring that we actually modify the original file)

fault(): load the page through the original fault().

page_mkwrite(): calls the original and logs the event.

vm_file && !VM_SHARED:

init: we need to iterate over the page table and find the pages that are present. The ones that are writable are now anonymous, so we have to convert them to shared mappings of the pseudo file; most importantly, the reverse mappings have to be taken care of. The usage counter of these pages will be 1, so we will never copy them on subsequent writes.

(the pseudo file's write_page will not call the original write_page, ensuring that we only work on our private copy in the memory)

fault(): load the page through the original read

page_mkwrite(): here we have to make a copy of the original page, because normally COW would ensure that we get our own private copy. However, instead of mapping it as an anonymous page, we map it as a shared page of the pseudo file, so that subsequent write faults will not find the page anonymous and page_mkwrite() will be called again.
(we either make a new copy of all written pages in each iteration, or keep track of the ones that we have already copied, since it is not necessary to copy those again)

!vm_file && VM_SHARED:

init: (this is the case of shared memory) update the vma so that it is now a file mapping, and see which other vmas map these pages (update those as well? -> we can find all mappings through the reverse priority tree)

BUT: are we supposed to migrate a process that shares memory with someone else??
(if we ensure that all the processes that share the memory are our targets, it could be done..)

fault(): creates a new mapping and zeros the page

page_mkwrite(): logs the write


!vm_file && !VM_SHARED:

init: our private malloc()ed memory and memory that we inherited from fork() (if no exec() occurred after)
(this is a problem if our process was fork()ed from a big address space and is actually not using most of it, but is that really a relevant case?)

1.) update the vma that it is a file mapping now with the pseudo file
2.) iterate PTEs and check the ones which are present:
- if writable, we have already COW-ed it; map it as a shared file mapping of the pseudo file (this is the most common case, malloc()ed memory)
- if not yet writable, make a copy (as COW would) and map it as a shared file mapping of the pseudo file; this can be very expensive if the address space is big... *

(* to avoid the immediate copy, we could create a fake vma, link the page to it anonymously, store the address that the page belonged to, and mark the real pte as non-present. In this case, during fault() we could check this fake vma and see whether the given address was originally present; if so, we make a copy and actually map it in.)

fault(): creates a new mapping and zeros the page (or see *)

page_mkwrite(): we have our own copy, just log the write